3.1. Evaluation
In this paper, we use a variety of metrics to quantify the prediction performance of the three models, including SSIM (structural similarity index measure), LPIPS (learned perceptual image patch similarity) and CSI (critical success index). The SSIM and LPIPS metrics quantify image similarity from the perspective of image analysis, while the CSI metric is commonly used in meteorology and represents the success rate of a prediction.
SSIM is an index that measures the similarity of two images. Given two images $x$ and $y$, their structural similarity can be calculated according to Equation (11):

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (11)$$

where $\mu_x$ is the average of $x$, $\mu_y$ is the average of $y$, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, and $\sigma_{xy}$ is the covariance of $x$ and $y$. $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are constants used to maintain stability, $L$ is the dynamic range of the pixel values, $k_1 = 0.01$, and $k_2 = 0.03$.
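As a minimal illustration, the global form of Equation (11) can be computed with NumPy as sketched below. This is a simplified sketch: standard SSIM implementations (e.g., skimage.metrics.structural_similarity) average the statistic over local sliding windows rather than computing it once over the whole image.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, L: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Global SSIM following Equation (11), computed once over the whole image."""
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sigma_xy
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```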
LPIPS learns a reverse mapping from the generated image to the ground truth, forcing the generator to reconstruct the real image from the false image, and prioritizes the perceptual similarity between them, which is more in line with human perception. The lower the LPIPS value, the more similar the two images are. Here $d(x, x_0)$ denotes the LPIPS distance between $x$ and $x_0$, as shown in Equation (12):

$$d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left( \hat{y}^l_{hw} - \hat{y}^l_{0hw} \right) \right\|_2^2 \qquad (12)$$

where $H_l$ and $W_l$ represent the height and width of the feature map of convolutional layer $l$, $\hat{y}^l$ and $\hat{y}^l_0$ are the results of normalizing the output of each convolutional layer after activation, the vector $w_l$ is used to scale the active channels, and $\odot$ represents the element-wise multiplication of $w_l$ and $\left( \hat{y}^l - \hat{y}^l_0 \right)$.
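In practice, the distance of Equation (12) is typically evaluated with the reference lpips package rather than re-implemented. The snippet below is a sketch assuming that package, with random tensors standing in for the predicted and observed images; single-channel visibility fields would first be replicated to three channels and scaled to [-1, 1].

```python
import torch
import lpips  # reference implementation: pip install lpips

loss_fn = lpips.LPIPS(net='alex')  # AlexNet features; evaluates Equation (12)

# Stand-in images in [-1, 1] with shape (N, 3, H, W).
img0 = torch.rand(1, 3, 64, 64) * 2 - 1
img1 = torch.rand(1, 3, 64, 64) * 2 - 1

d = loss_fn(img0, img1)  # lower value = more perceptually similar
print(d.item())
```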
For the construction of the CSI metric in this study, since the prediction task is performed at the pixel level, the pixel values are projected back to atmospheric visibility, and the visibility of each pixel is calculated. This metric resembles a classification metric and mainly focuses on whether the prediction at each location hits within a given threshold range. For example, with a threshold of 1000 m, binarization converts 999 m to 0 and 1001 m to 1. After each pixel of the predicted and observed values is converted to 0/1, the TP (true positive, predicted value = 1, real value = 1), FP (false positive, predicted value = 1, real value = 0) and FN (false negative, predicted value = 0, real value = 1) counts are calculated. To fully evaluate the performance of the algorithm, this paper calculates this metric under three thresholds, namely 1000 m, 4000 m and 10,000 m, corresponding to different levels of atmospheric visibility. The CSI metric is shown in Equation (13):

$$\mathrm{CSI} = \frac{TP}{TP + FP + FN} \qquad (13)$$
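A minimal sketch of this computation is given below, following the binarization convention in the text (values above the threshold map to 1); the function name and array layout are illustrative, not from the paper.

```python
import numpy as np

def csi(pred_vis: np.ndarray, obs_vis: np.ndarray, threshold: float) -> float:
    """CSI of Equation (13) after binarizing visibility at `threshold`
    (e.g., 999 m -> 0 and 1001 m -> 1 for a 1000 m threshold)."""
    p = pred_vis > threshold
    o = obs_vis > threshold
    tp = np.sum(p & o)    # predicted 1, observed 1
    fp = np.sum(p & ~o)   # predicted 1, observed 0
    fn = np.sum(~p & o)   # predicted 0, observed 1
    return tp / (tp + fp + fn)
```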
3.2. Results
In this section, we compare the SwiftRNN model with the classical ConvLSTM and PredRNN models based on the SSIM, LPIPS and CSI metrics.
To intuitively analyze the prediction performance, a group of individual cases is selected from the test set for analysis.
Figure 5 shows the spatial distribution of atmospheric visibility images predicted by the ConvLSTM, PredRNN and SwiftRNN models at 12:00–17:00 on 31 October 2020, compared with the observed atmospheric visibility images. The spatial distributions predicted by all three models are in good agreement with the observations. However, since the predictions of the ConvLSTM model are relatively unsatisfactory compared with the other two models, we only analyze the results of the proposed SwiftRNN model and the PredRNN model. Generally, both the SwiftRNN and PredRNN models predict the two large low-visibility areas in northern and central China well. Some fine local characteristics, such as the low visibility over Wuhan, Tianjin and Shijiazhuang, are also captured well. However, the overall intensity of the predicted images is slightly stronger than that of the observed images, and the low visibility in Henan is underestimated in the following hours. Compared with the PredRNN model, the visibility distribution predicted by the SwiftRNN model is closer to the observed images. In the example in Figure 5, the PredRNN model forecasts lower visibility in Shanxi, Shaanxi and the Yangtze River Delta, while the results predicted by the SwiftRNN model are more accurate and differ little from the observed images. In addition, the advantage of the SwiftRNN model is more obvious in the last 6 h.
Figure 6 shows a comparison of the predicted results from 18:00 to 23:00 on 31 October 2020. The SwiftRNN model predicts a larger area of low visibility in central China and Guizhou and a smaller area of low visibility in northern Jiangsu, northern Anhui and Henan, which is closer to the observed images. The PredRNN model, by contrast, predicts low visibility over more areas, and the spatial details of its predictions are not as good as those of the SwiftRNN model. Overall, the PredRNN model is less accurate than the SwiftRNN model.
Table 3 and
Table 4 show the SSIM and LPIPS metrics of the 12 h predictions by the ConvLSTM, PredRNN and SwiftRNN models. These tables show that the SwiftRNN model outperforms the PredRNN model in all four seasons. Thus, the proposed SwiftRNN model is conducive to predicting clearer and more detailed atmospheric visibility images.
Figure 7 compares the image similarity metrics of the SwiftRNN and PredRNN models across the four seasons. In winter, spring and autumn, the SSIM values of the atmospheric visibility images predicted by the SwiftRNN model are lower than those of the PredRNN model in the first four hours but higher in the following hours, which indicates that the SwiftRNN model can capture characteristics over a longer time horizon. For the SSIM metric in summer, the SwiftRNN model performs better than the PredRNN model at all hours. In terms of the LPIPS metric, the values of the SwiftRNN model are consistently lower than those of the PredRNN model. Therefore, in general, the images predicted by the SwiftRNN model are more detailed and closer to the observed images than those predicted by the PredRNN model.
Table 5,
Table 6 and
Table 7 show the CSI metrics of the 12 h predictions by the ConvLSTM, PredRNN and SwiftRNN models. To compare fairly and fully evaluate the performance of the algorithms, we calculate the CSI metric under three atmospheric visibility thresholds (1000 m, 4000 m and 10,000 m) in different seasons.
It can be seen that, in the deep learning methods, the nonlinear convolutional structure of the network can learn complex spatiotemporal patterns in the data set. The proposed SwiftRNN model clearly achieves higher CSI scores than the PredRNN model. At the 1000 m threshold, the CSI metrics of the SwiftRNN model in winter, spring, summer and autumn are increased by about 7.82%, 6.11%, 7.91% and 8.12%, respectively, compared with those of the PredRNN model. At the 4000 m threshold, the corresponding increases are about 6.24%, 5.59%, 6.15% and 6.08%, and at the 10,000 m threshold they are about 4.93%, 4.51%, 5.06% and 5.18%. Although the SwiftRNN model obtains promising performance at the 4000 m and 10,000 m thresholds, the largest percentage increase occurs at the 1000 m threshold, which means that the proposed model has better prediction performance for low visibility. In addition, the improvements in winter, summer and autumn are similar, while the improvement in spring is less pronounced.
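Assuming these percentages are relative gains over the PredRNN baseline (the usual convention, though the paper does not state it explicitly), each value follows from

$$\Delta_{\mathrm{CSI}} = \frac{\mathrm{CSI}_{\mathrm{SwiftRNN}} - \mathrm{CSI}_{\mathrm{PredRNN}}}{\mathrm{CSI}_{\mathrm{PredRNN}}} \times 100\%.$$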
Figure 8 shows the CSI metrics of the three models in the four seasons. Comparing (a) January, (b) April, (c) July and (d) October horizontally shows that the models achieve the best prediction performance in summer and the worst in winter. Comparing them vertically shows that the prediction performance is best at the 10,000 m threshold, second best at the 4000 m threshold, and worst at the 1000 m threshold. Additionally, the SwiftRNN model performs better than the PredRNN model at all three thresholds.
Table 8 shows the training speed of the three models. Over 40,000 training epochs, the PredRNN model takes about 6.132 s per epoch and the SwiftRNN model about 5.255 s per epoch, a speed improvement of about 14.3%.
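The reported speedup follows directly from the per-epoch times:

$$\frac{6.132 - 5.255}{6.132} \times 100\% \approx 14.3\%.$$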