Article

A Lightweight Feature Distillation and Enhancement Network for Super-Resolution Remote Sensing Images

1 College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2 Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi 830046, China
3 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
4 Shanghai Institute of Satellite Engineering, Shanghai 201109, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(8), 3906; https://doi.org/10.3390/s23083906
Submission received: 5 March 2023 / Revised: 22 March 2023 / Accepted: 23 March 2023 / Published: 12 April 2023
(This article belongs to the Section Remote Sensors)

Abstract

Super-resolution (SR) methods based on deep networks have achieved great success in recent years, but the large numbers of parameters they require are not conducive to use on equipment with limited capabilities in real life. Therefore, we propose a lightweight feature distillation and enhancement network (FDENet). Specifically, we propose a feature distillation and enhancement block (FDEB), which contains two parts: a feature-distillation part and a feature-enhancement part. Firstly, the feature-distillation part uses a stepwise distillation operation to extract layered features; here we use the proposed stepwise fusion mechanism (SFM) to fuse the features retained after stepwise distillation to promote information flow, and we use the shallow pixel attention block (SRAB) to extract information. Secondly, we use the feature-enhancement part to enhance the extracted features. The feature-enhancement part is composed of well-designed bilateral bands: the upper sideband is used to enhance the features, and the lower sideband is used to extract the complex background information of remote sensing images. Finally, we fuse the features of the upper and lower sidebands to enhance the expression ability of the features. A large number of experiments show that the proposed FDENet has fewer parameters and performs better than most existing advanced models.

1. Introduction

Single-image super-resolution (SISR), which aims to recover high-resolution (HR) images from low-resolution (LR) images, is a hot topic in computer vision and is closely related to various other computer vision tasks, such as target detection [1,2] and scene classification [3,4].
SISR is an ill-posed problem; that is, there are many possible HR images corresponding to a single LR image. At present, image SR reconstruction methods are mainly divided into three types: interpolation-based methods [5], reconstruction-based methods [6], and learning-based methods [7]. The first type includes methods such as bicubic interpolation [8]; although such algorithms are simple, the reconstructed images lose details because edge information is not given enough attention. Reconstruction-based methods require prior information to constrain the reconstruction process, and when dealing with large magnification factors, their performance becomes poor because of the lack of prior information.
With the rapid development of computer vision in recent years, deep-learning-based methods have gradually become mainstream, and more and more deep networks with excellent performance are being created. Dong et al. [9] proposed the first model to reconstruct HR images using a convolutional neural network (CNN), achieving remarkable results compared to traditional methods, but its heavy computational load limited its practicality. Deconvolution [10] and sub-pixel convolution [11] were then proposed to perform upsampling at the end of the network, which greatly reduces the computational cost. Kim et al. designed a very deep network [12], in which the introduction of residual learning [13] effectively alleviates the gradient problem and promotes information flow to accelerate convergence. Zhang et al. [14] considered the correlation between feature-map channels and proposed a channel attention mechanism combined with residual learning, which allows the network to be more inclined to learn high-frequency information. Haris et al. [15] introduced an error feedback mechanism to obtain better reconstruction results by calculating up- and down-projection errors. Dai et al. [16] proposed a second-order channel attention module and a non-locally enhanced residual group structure, realizing more powerful feature representation and feature-correlation learning.
As a branch of image SR, remote sensing image SR has also developed rapidly. Lei et al. [17] proposed a multi-level representation for learning remote sensing images, concatenating the results obtained after different layers of convolution and then combining these groups with a convolution layer, which can represent local details and global environment priors. RDBPN [18] was improved from DBPN [15] by replacing its downsampling unit with a simpler downscaling unit, which greatly simplified the network. Haut et al. [19] adopted residual connections, skip connections, and parallel convolution layers with a kernel size of 1 × 1 to extract more informative features and reduce the network's information loss. Zhang et al. [20] proposed a parallel multi-scale convolution method to extract multi-scale features and combined it with a channel attention mechanism to further utilize multi-scale features. Zhang et al. [21] replaced element-wise addition with weighted channel connections in skip connections and performed feature optimization by modeling complex high-order statistics to further refine the extracted features. Dong et al. [22] proposed a second-order, multi-scale super-resolution network that subtly captures multi-scale feature information by aggregating features from different deep learning algorithms in a single path. Xu et al. [23] combined the details of remote sensing images with the background information by connecting local and global memory to increase the receptive field; to speed up the model, the spatial size of the feature map is also reduced by downsampling. Aiming at the problem that traditional supervised methods struggle to obtain paired HR and LR images, Zhang et al. [24] designed a cyclic convolutional neural network composed of two cyclic modules, which could be trained with unpaired data and had good robustness to image noise and blur. Li et al. [25] designed a recursive block, which focuses on high-frequency information through the attention mechanism and combines low-resolution and high-resolution hierarchical local information to reconstruct the image. Dong et al. [26] proposed a dense-sampling network, which enabled the network to jointly consider multiple levels of reconstruction priors and achieved good experimental results.
All the above methods achieved state-of-the-art performance at the time. However, their biggest problem is that they have too many parameters, which places a heavy computational burden on hardware and makes them hard to use effectively in real life. In recent years, some scholars have therefore begun to focus on lightweight image SR networks that can be used in daily life, for example, networks based on feature distillation [27,28,29]. Although the channel-separation operation can gradually expand the receptive field and extract more comprehensive information, it also leads to insufficient information flow between the separated channels [30], thereby hindering the expression of feature information. There is also MADNet, the multi-scale feature-extraction network with an attention mechanism proposed by Lan et al. [31], whose repeated feature-extraction blocks not only make the network structure redundant, but also increase the number of network parameters.
In the field of remote sensing, it is often impractical to obtain high-quality images simply by spending more on high-precision sensors, especially in specific fields of application such as field surveying, individual reconnaissance, and vehicular satellite positioning and navigation. Most of the devices used there are portable, which places higher requirements on the weight of remote sensing image SR algorithms. Therefore, in order to build a network with fewer parameters and more competitive performance, we propose a lightweight feature distillation and enhancement network (FDENet). It has only 501K parameters, almost half the number in the advanced MADNet, and its experimental performance is also better. Figure 1 gives the overall architecture of FDENet. We exploit the backward fusion module (BFM) [32] to fuse the features extracted by four cascaded FDEBs, and then use the Gaussian context transformer (GCT) [33] to improve the feature-expression ability. The FDEB contains two parts: a feature-distillation part and a feature-enhancement part. The feature-distillation part uses channel separation to extract layered features. To avoid the insufficient information expression caused by this operation, we use the proposed stepwise fusion mechanism (SFM) to fuse the features retained after stepwise distillation to promote information flow. The bilateral bands in the feature-enhancement part are used to enhance features and extract the complex background information of remote sensing images. Finally, the features of the two bands are fused to enhance the feature expression.
Overall, the main contributions are as follows:
  • We propose a shallow pixel attention block (SRAB), which introduces the pixel attention mechanism so that the network pays attention to repairing missing texture details with very few parameters.
  • We propose the SFM, which fuses the features retained after stepwise distillation to make full use of the reserved features and promote information flow, making the feature expression more comprehensive.
  • We propose a bilateral feature-enhancement module (BFEM), which extracts contextual information and enhances the resulting feature separately by means of a bilateral band.

2. Proposed Method

In this section, we first introduce the overall structure of the proposed network and then describe the proposed feature-distillation part and feature-enhancement part in detail.

2.1. Network Architecture

FDENet’s overall structure is shown in Figure 1. We first extract primary features from the LR image, then extract deep features through four cascaded FDEBs, and finally, pass the data through a 3 × 3 convolution layer and an upsampling layer to obtain the SR image.
(1) Primary feature extraction: Given an LR image $I_{\text{LR}} \in \mathbb{R}^{H \times W \times 3}$, where H, W, and 3 are the height, width, and number of channels, to keep the network as lightweight as possible, we only use a 3 × 3 convolution layer for primary feature extraction. Let $F_{\text{init}}$ denote a convolution layer with a kernel size of 3 × 3 and C output channels. Then, the obtained primary feature $F_0$ is:
$F_0 = F_{\text{init}}(I_{\text{LR}}) \in \mathbb{R}^{H \times W \times C}$ (1)
(2) Deep feature extraction: We use four lightweight cascaded FDEBs to extract deep features. Let $F_{\text{FDEB}}^{i}$ and $F_{\text{GCT}}$ denote the feature mappings of the ith FDEB block and of the GCT module, respectively, where $i \in [1, 4]$. Then, the output deep feature $F_d$ is:
$F_d = F_{\text{GCT}}\left(F_{\text{FDEB}}^{4}\left(\cdots F_{\text{FDEB}}^{i}\left(\cdots F_{\text{FDEB}}^{1}(F_0)\right)\right)\right) \in \mathbb{R}^{H \times W \times C}$ (2)
(3) Reconstruction layer: Let $F_{\text{up}}$ denote a convolution layer with a kernel size of 3 × 3 followed by an upsampling layer. Then, the final reconstructed SR image $I_{\text{SR}}$ is:
$I_{\text{SR}} = F_{\text{up}}(F_0 + F_d)$ (3)
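To make the pipeline of Equations (1)–(3) concrete, the following is a minimal PyTorch sketch of the three-stage forward pass. It is an illustration, not the authors' released code: the channel count of 48, the `FDEB`/`GCT` stand-ins, and the PixelShuffle-based upsampler are our assumptions, and the backward fusion module (BFM) connections are omitted for brevity.

```python
import torch
import torch.nn as nn

class FDENetSketch(nn.Module):
    """Minimal sketch of the FDENet pipeline described by Equations (1)-(3).

    FDEB, GCT and the BFM are assumed to be defined elsewhere; here they are
    stand-ins that keep the channel count C unchanged.
    """
    def __init__(self, channels=48, num_blocks=4, scale=4, fdeb=None, gct=None):
        super().__init__()
        C = channels
        # (1) Primary feature extraction: a single 3x3 convolution.
        self.init_conv = nn.Conv2d(3, C, kernel_size=3, padding=1)
        # (2) Deep feature extraction: four cascaded FDEBs followed by the GCT.
        self.fdebs = nn.ModuleList([fdeb() if fdeb else nn.Identity()
                                    for _ in range(num_blocks)])
        self.gct = gct() if gct else nn.Identity()
        # (3) Reconstruction: 3x3 convolution + sub-pixel (PixelShuffle) upsampling.
        self.up = nn.Sequential(
            nn.Conv2d(C, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        f0 = self.init_conv(lr)          # F_0, Eq. (1)
        feat = f0
        for block in self.fdebs:         # cascaded FDEBs
            feat = block(feat)
        fd = self.gct(feat)              # F_d, Eq. (2)
        return self.up(f0 + fd)          # I_SR, Eq. (3)
```

With these assumed settings, a call such as `FDENetSketch(scale=4)(torch.rand(1, 3, 48, 48))` returns a 192 × 192 SR image.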

2.2. The Proposed FDEB

The proposed FDEB consists of a feature-distillation part and a feature-enhancement part. Next, we will give more details.

2.2.1. Feature-Distillation Part

The structure of the feature-distillation part is shown in the blue box on the right of Figure 2. First, we use a convolution layer with a kernel size of 3 × 3 to extract features roughly, and then use stepwise distillation to enlarge the receptive field and further extract layered features. Specifically, we use a stepwise channel-separation operation to retain part of the features and extract information from the other part. However, the channel-separation operation of stepwise feature distillation inevitably leads to insufficient information flow between channels, hindering the expression of features. Thus, we propose the SFM, which fuses the features retained after each distillation step and uses the SRAB to extract information. This not only makes full use of the retained features, but also effectively avoids the problem of insufficient information flow between channels. We take the proposed SRAB (shown in Figure 3) as the basic feature-extraction unit of the FDEB. On top of the SRB proposed in RFDN [29], we introduce the pixel attention mechanism, which enables the network to focus on repairing missing textural details when extracting features. Let the input feature of the nth FDEB be $F_{\text{in}}^{n}$; then, the output feature $F_{D}^{n}$ of this process can be described as:
$F_{\text{refined}_1}^{n}, F_{\text{coarse}_1}^{n} = \text{split}_1^{n}\left(\text{Conv}_3(F_{\text{in}}^{n})\right)$
$F_{\text{refined}_2}^{n}, F_{\text{coarse}_2}^{n} = \text{split}_2^{n}\left(\text{SRAB}_{\text{coarse}_1}(F_{\text{coarse}_1}^{n})\right)$
$F_{\text{refined}_3}^{n}, F_{\text{coarse}_3}^{n} = \text{split}_3^{n}\left(\text{SRAB}_{\text{coarse}_2}(F_{\text{coarse}_2}^{n})\right)$
$F_{\text{refined}_4}^{n} = \text{SRAB}_{\text{coarse}_3}(F_{\text{coarse}_3}^{n})$ (4)
$F_{\text{distil}_1}^{n} = \text{SRAB}_{\text{refined}_1}\left(\text{concate}(F_{\text{refined}_1}^{n}, F_{\text{refined}_2}^{n})\right)$
$F_{\text{distil}_2}^{n} = \text{SRAB}_{\text{refined}_2}\left(\text{concate}(F_{\text{distil}_1}^{n}, F_{\text{refined}_3}^{n})\right)$
$F_{D}^{n} = \text{Conv}_1\left(\text{concate}(F_{\text{refined}_4}^{n}, F_{\text{distil}_2}^{n})\right) \oplus F_{\text{in}}^{n}$ (5)
$\text{SRAB}_i = \sigma\left(\text{Conv}_1(F_i)\right) \otimes \text{SiLU}\left(\text{Conv}_3(F_i) \oplus F_i\right)$ (6)
Equations (4)–(6) describe the feature-distillation process used to extract the layered features, the SFM, and the general formula of the SRAB, respectively. $F_{\text{in}}^{n}$ represents the input features of the nth FDEB; $F_{\text{coarse}_i}^{n}$ and $F_{\text{refined}_i}^{n}$ represent the ith distilled feature and the ith retained feature in the nth FDEB, respectively. $\text{split}_j^{n}$ represents the jth channel-separation operation in the nth FDEB, SRAB represents our shallow pixel attention block, $F_i$ represents the input features of the corresponding $\text{SRAB}_i$, $F_{\text{distil}_i}^{n}$ represents the retained features after fusion and feature extraction, and $F_{D}^{n}$ represents the output features of this whole process.
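As an illustration of Equations (4)–(6), the sketch below implements the SRAB and the distillation-plus-SFM data flow in PyTorch. The equal refined/coarse channel split at every stage (chosen so that the final concatenation restores exactly C channels) and the channel count of 48 are our assumptions; the paper's exact split ratios and layer hyperparameters may differ.

```python
import torch
import torch.nn as nn

class SRAB(nn.Module):
    """Shallow pixel attention block, Eq. (6):
    SRAB(F) = sigmoid(Conv1(F)) * SiLU(Conv3(F) + F)."""
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.pa = nn.Conv2d(channels, channels, 1)   # pixel attention branch
        self.act = nn.SiLU()

    def forward(self, x):
        attn = torch.sigmoid(self.pa(x))              # per-pixel attention map
        return attn * self.act(self.conv3(x) + x)     # attention-weighted shallow residual

class FDEBDistillation(nn.Module):
    """Feature-distillation part of the FDEB with the SFM, Eqs. (4)-(5).
    The equal refined/coarse split at every stage is an illustrative assumption."""
    def __init__(self, channels=48):
        super().__init__()
        c1, c2, c3 = channels // 2, channels // 4, channels // 8
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.srab_c1, self.srab_c2, self.srab_c3 = SRAB(c1), SRAB(c2), SRAB(c3)
        self.srab_r1 = SRAB(c1 + c2)       # SFM: fuses refined_1 with refined_2
        self.srab_r2 = SRAB(c1 + c2 + c3)  # SFM: fuses distil_1 with refined_3
        self.conv1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        f = self.conv3(x)
        r1, co1 = f.chunk(2, dim=1)                       # split_1
        r2, co2 = self.srab_c1(co1).chunk(2, dim=1)       # split_2
        r3, co3 = self.srab_c2(co2).chunk(2, dim=1)       # split_3
        r4 = self.srab_c3(co3)
        d1 = self.srab_r1(torch.cat([r1, r2], dim=1))     # SFM fusion, Eq. (5)
        d2 = self.srab_r2(torch.cat([d1, r3], dim=1))
        return self.conv1(torch.cat([r4, d2], dim=1)) + x  # F_D^n with residual
```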

2.2.2. Bilateral Feature Enhancement Module

Compared with natural images, remote sensing images have more complex structural and background information. Therefore, in order to make full use of this background information, we propose the BFEM (shown in Figure 4), which can focus on extracting the background information of remote sensing images while enhancing features. Let $F_{D}^{n}$ represent the input features of the enhancement block. Then, the output features $F_{\text{in}}^{n+1}$ of this process are:
$F_{\text{in}}^{n+1} = \text{Conv}_1\left(\text{concate}(F_{\text{BFEM}}^{\text{up}}, F_{\text{BFEM}}^{\text{down}})\right)$ (7)
where $\text{Conv}_1$ represents a convolution layer with a kernel size of 1 × 1; concate represents the feature-fusion (concatenation) operation; and $F_{\text{BFEM}}^{\text{up}}$ and $F_{\text{BFEM}}^{\text{down}}$ represent the output features of the upper and lower sidebands, respectively.
In the upper sideband, we use enhanced spatial attention [34] (ESA) to expand the receptive field of the features extracted by the FDEBs and help obtain a clearer reconstructed image. This part is composed of a strided convolution layer with a stride of 2 and a kernel size of 3 × 3; a maxpooling layer with a stride of 3 and a kernel size of 7 × 7; and three convolution layers with a kernel size of 3 × 3. Let $\hat{F}$ represent the features obtained through these steps and $F_{D}^{n}$ represent the input features of the upper sideband. Then, the feature $F_{\text{BFEM}}^{\text{up}}$ obtained from the upper sideband can be expressed as:
$F_{\text{BFEM}}^{\text{up}} = F_{D}^{n} \otimes \sigma\left(\text{Conv}_1\left(\text{Conv}_1(F_{D}^{n}) \oplus \hat{F}\right)\right)$ (8)
where ⊗ and ⊕ represent element-wise multiplication and element-wise summation, respectively; $\text{Conv}_1$ represents a convolution layer with a kernel size of 1 × 1; and σ represents the sigmoid function.
The lower sideband is used to extract the contextual feature information of remote sensing images to help obtain more details from the complex background. This part is composed of an avgpooling layer with a stride of 2 and a kernel size of 2 × 2, a convolution layer with a kernel size of 1 × 1, and a bilinear upsampling layer. Let $\tilde{F}$ represent the features obtained through these steps and $F_{D}^{n}$ represent the input features of the lower sideband. Then, the feature $F_{\text{BFEM}}^{\text{down}}$ obtained from the lower sideband can be expressed as:
$F_{\text{BFEM}}^{\text{down}} = F_{D}^{n} \otimes \sigma\left(\text{Conv}_1\left(\text{Conv}_1(F_{D}^{n}) \oplus \tilde{F}\right)\right)$ (9)
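The following PyTorch sketch shows one way to realize the two sidebands of Equations (7)–(9). The channel-reduction factor of 4 inside both sidebands and the bilinear resize that brings the pooled features back to the input resolution before the ⊕ are assumptions on our part, modeled on the ESA design [34].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BFEMSketch(nn.Module):
    """Bilateral feature-enhancement module, Eqs. (7)-(9) (illustrative sketch)."""
    def __init__(self, channels=48, reduction=4):
        super().__init__()
        r = channels // reduction
        # Upper sideband (ESA-style spatial attention).
        self.up_reduce = nn.Conv2d(channels, r, 1)
        self.up_body = nn.Sequential(
            nn.Conv2d(r, r, 3, stride=2, padding=1),   # strided 3x3 convolution
            nn.MaxPool2d(kernel_size=7, stride=3),     # 7x7 maxpooling, stride 3
            nn.Conv2d(r, r, 3, padding=1),
            nn.Conv2d(r, r, 3, padding=1),
            nn.Conv2d(r, r, 3, padding=1),
        )
        self.up_expand = nn.Conv2d(r, channels, 1)
        # Lower sideband (contextual/background branch).
        self.low_reduce = nn.Conv2d(channels, r, 1)
        self.low_pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.low_conv = nn.Conv2d(r, r, 1)
        self.low_expand = nn.Conv2d(r, channels, 1)
        # Fusion of the two sidebands, Eq. (7).
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Upper sideband, Eq. (8).
        u = self.up_reduce(x)
        u_hat = F.interpolate(self.up_body(u), size=(h, w), mode='bilinear',
                              align_corners=False)
        up = x * torch.sigmoid(self.up_expand(u + u_hat))
        # Lower sideband, Eq. (9).
        l = self.low_reduce(x)
        l_tilde = F.interpolate(self.low_conv(self.low_pool(l)), size=(h, w),
                                mode='bilinear', align_corners=False)
        low = x * torch.sigmoid(self.low_expand(l + l_tilde))
        return self.fuse(torch.cat([up, low], dim=1))
```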

2.3. Gaussian Context Transformer

The structure of the GCT is shown in Figure 5. Compared with some other attention mechanisms, it is not only lighter but can also achieve contextual feature excitation, leading to better performance. Therefore, we pass the features through the GCT to improve their expression ability before upsampling.
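For reference, the sketch below follows a commonly published, parameter-free formulation of the Gaussian context transformer: global average pooling, normalization across channels, and a Gaussian excitation. Whether FDENet uses exactly this variant, and the value of the constant c, are assumptions here.

```python
import torch
import torch.nn as nn

class GCTSketch(nn.Module):
    """Parameter-free Gaussian context transformer (assumed formulation)."""
    def __init__(self, c=0.5, eps=1e-5):
        super().__init__()
        self.c, self.eps = c, eps

    def forward(self, x):
        ctx = x.mean(dim=(2, 3), keepdim=True)            # global context per channel
        mu = ctx.mean(dim=1, keepdim=True)
        std = ctx.std(dim=1, keepdim=True) + self.eps
        z = (ctx - mu) / std                              # normalize across channels
        attn = torch.exp(-z.pow(2) / (2 * self.c ** 2))   # Gaussian excitation
        return x * attn                                   # channel-wise re-weighting
```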

3. Results

3.1. Experimental Settings

3.1.1. Dataset

DIV2K [35] is a dataset of 900 natural images with 2K resolution, including various natural images of buildings, animals, plants, etc. Following SMSR [26], we chose the first 800 images as the training set and the last 100 images as the validation set. Following FeNet [30], we randomly selected 240 images from the UC Merced dataset [36], which contains 21 scene classes, to make two test sets, RS-1 and RS-2, each containing 120 images. RS-1 contains 120 images from ten classes: agricultural, airplane, baseballdiamond, beach, buildings, chaparral, denseresidential, forest, freeway, and golfcourse (12 images per class). RS-2 contains 120 images from ten classes: intersection, mediumresidential, mobilehomepark, overpass, parkinglot, river, runway, sparseresidential, storagetanks, and tenniscourt (12 images per class). To further prove the generalization ability of the proposed model, we also tested it on four natural benchmark datasets—Set5 [37], Set14 [38], BSD100 [39], and Urban100 [40].

3.1.2. Degradation Method

We used bicubic interpolation in MATLAB R2018a to downsample the original high-resolution images by factors of ×2, ×3, and ×4 to obtain the LR images used as training and test data.

3.1.3. Training Details

We chose the L1 loss [41] as the training loss function, which computes the mean absolute difference between the predicted value and the target value. Let $\hat{y}_i$ represent the SR image and $y_i$ represent the real HR image. Then, the loss function can be expressed as:
$L_1(y_i, \hat{y}_i) = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$ (10)
To get the most out of the training data, we used random rotation and flipping for data augmentation. The randomly cropped HR training patch size was 192 × 192, and the pixel range of the input images was normalized to [0, 1]. ADAM [42] was used as the optimizer with $\beta_1 = 0.9$ and $\beta_2 = 0.999$; the initial learning rate was set to $5 \times 10^{-4}$ and decayed by half every 200 epochs, for a total of 500 epochs. All experiments were implemented in the PyTorch framework, and we used an NVIDIA Tesla V100 GPU to complete the entire training and testing process.
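The settings above translate into a short training loop like the following sketch. The model here is a stand-in (FDENet would be used in practice), and the synthetic random patches merely make the snippet self-contained; they are not the DIV2K/UC Merced data.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

scale = 4
# Stand-in SR model (replace with FDENet).
model = nn.Sequential(nn.Conv2d(3, 3 * scale**2, 3, padding=1), nn.PixelShuffle(scale))
criterion = nn.L1Loss()                                        # Eq. (10)
optimizer = Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
scheduler = StepLR(optimizer, step_size=200, gamma=0.5)        # halve the LR every 200 epochs

hr = torch.rand(8, 3, 192, 192)                                # 192x192 HR patches in [0, 1]
lr = nn.functional.interpolate(hr, scale_factor=1 / scale, mode='bicubic')
loader = DataLoader(TensorDataset(lr, hr), batch_size=4, shuffle=True)

for epoch in range(500):
    for lr_img, hr_img in loader:
        loss = criterion(model(lr_img), hr_img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```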

3.1.4. Evaluation Index

We used the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the results [43]. Let x and y be the ground-truth HR image and the reconstructed SR image, respectively. Then, the PSNR value is:
$\text{PSNR}(x, y) = 10 \log_{10}\frac{255^2}{\text{MSE}(x, y)}$ (11)
$\text{MSE}(x, y) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(X(i, j) - Y(i, j)\right)^2$ (12)
where H and W represent the height and width of the given image; 255 is the maximum RGB value of each pixel; and $X(i, j)$ and $Y(i, j)$ represent the pixel values of the real HR image and the generated SR image, respectively. The SSIM value is:
$\text{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}$ (13)
$\sigma_{xy} = \frac{1}{N - 1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$ (14)
where $\mu_x$ and $\mu_y$ represent the means and $\sigma_x^2$ and $\sigma_y^2$ the variances of x and y; $\sigma_{xy}$ is the covariance of x and y; and $C_1$ and $C_2$ are constants. We evaluate the PSNR and SSIM values on the Y channel of the transformed YCbCr space [12]. The conversion is as follows:
$Y = 0.257 \times R + 0.564 \times G + 0.098 \times B + 16$
$Cb = -0.148 \times R - 0.291 \times G + 0.439 \times B + 128$
$Cr = 0.439 \times R - 0.368 \times G - 0.071 \times B + 128$ (15)
where Y, Cb, and Cr represent the brightness (luma) component, the blue-difference chroma component, and the red-difference chroma component of the input signal, respectively.
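As a small worked example of Equations (11), (12) and (15), the Python sketch below converts RGB arrays to the Y channel with the BT.601 coefficients above and computes the Y-channel PSNR; the windowed SSIM computation is omitted.

```python
import numpy as np

def rgb_to_y(img):
    """Luminance channel of the YCbCr conversion in Eq. (15).
    `img` is an H x W x 3 RGB array with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.257 * r + 0.564 * g + 0.098 * b + 16

def psnr_y(sr, hr):
    """PSNR between two RGB images computed on the Y channel, Eqs. (11)-(12)."""
    diff = rgb_to_y(sr.astype(np.float64)) - rgb_to_y(hr.astype(np.float64))
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(255.0 ** 2 / mse)
```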

3.2. Comparison of Visualization Results

3.2.1. Results on Remote Sensing Images

We quantitatively compare FDENet with models published over the years at CVPR, ECCV, in TGRS, and in other well-known conferences and journals. Table 1 clearly shows that our model performs well at scale factors of ×2, ×3, and ×4. Taking the advanced FeNet [30] as an example, our PSNR values on the RS-1 and RS-2 datasets are 0.02 and 0.09 dB higher, respectively. On the whole, we have fewer parameters and multi-adds than most models. Figure 6 and Figure 7 show the visualization results of each model on the remote sensing image datasets. Taking Figure 6 as an example, compared with the advanced FeNet, the white car and red car in our reconstructed image have clearer outlines and more complete details. The other comparison images also show that the edge details in our model's results are richer.

3.2.2. Results on Natural Images

To further prove the generalization ability of our model, we compared it with remote sensing image SR models on four natural benchmark datasets: Set5 [37], Set14 [38], BSD100 [39], and Urban100 [40]. Table 2 shows the quantitative comparison results. It can be seen from the table that our model still performs better on natural images than the other remote sensing image SR models. Although its performance at ×2 magnification is slightly inferior to that of the advanced MADNet [31] and FeNet [30], it is superior on all datasets at ×3 and ×4. Figure 8 shows the visualization results of each model at the ×3 magnification factor on the BSD100 [39] and Urban100 [40] datasets, from which we can see that after our model's reconstruction, the window shapes are better restored, the outlines are clearer, and more details are retained.

4. Discussion

4.1. Comparison of SRB and SRAB

The basic unit of our feature extractor, the SRAB, introduces the pixel attention mechanism, which has been proven suitable for lightweight networks and can repair the missing texture details of images during feature extraction [32]. The pixel attention mechanism only uses a convolution layer with a kernel size of 1 × 1 and a sigmoid function to obtain the attention maps, which are then multiplied with the input features. Because this mechanism is used in the feature-distillation part, where the number of channels of the feature map gradually decreases during distillation, the number of parameters it introduces is almost negligible. Table 3 shows that, under the same conditions, the results of FDENet with the SRAB are better than those with the SRB on the four test sets, which fully proves the effectiveness of the SRAB.

4.2. Comparison of ESA and BFEM

The background information of remote sensing images is more important than that of natural images: it contains a variety of complex scenes, and the scales of features differ between scenes. Therefore, we propose the BFEM, which focuses on extracting the contextual information of remote sensing images. Compared with the ESA proposed in RFANet [34], our BFEM adds a lower sideband for extracting contextual information. To avoid introducing a large number of parameters, instead of using a large convolution kernel, we use an avgpooling layer, a bilinear upsampling layer, and several convolution layers with a kernel size of 1 × 1 to achieve this goal. Table 4 shows the results of FDENet at an upsampling factor of four when the feature-enhancement module is ESA [34] or the BFEM. Under the same conditions, our BFEM has only 37K more parameters than ESA while showing more powerful performance on the four test sets.

4.3. Analysis of SFM

The proposed SFM fuses the reserved features after each distillation step and extracts features from them through the SRAB, which not only makes full use of the reserved features, but also alleviates the problem of insufficient information flow during feature extraction. Table 5 shows the results of our ablation experiment on the SFM. Because the SFM fuses the reserved features and our SRAB extracts features efficiently, using the reserved features keeps the model very light. From the table, we can see that the SFM introduces only 9K parameters, while the PSNR values on the four datasets are 0.03, 0.01, 0.01, and 0.09 dB higher, respectively.

4.4. Analysis of Model Complexity

The number of parameters is an important indicator for evaluating lightweight models. As shown in Table 1 and Table 2, although our parameter count exceeds those of SRCNN [9], LGCNet [17], and the advanced lightweight model FeNet [30], our performance is well ahead of theirs, which compensates for the larger parameter count. In a comprehensive comparison, our number of parameters is still lower than those of most models, and our performance is more competitive. In addition to evaluating model complexity by parameter count, we also used multi-adds to evaluate the computational complexity of the network, with the size of the query (HR) image set to 1280 × 720. Compared with some recent models, such as IDN [27], LESRCNN [44], and MADNet [31], FDENet also has relatively few multi-adds.

5. Conclusions

In this article, we proposed a lightweight feature distillation and enhancement network for SR tasks on remote sensing images. Specifically, we proposed the SFM, which can effectively alleviate the problem of insufficient information flow caused by channel separation during feature distillation. We use the designed lightweight SRAB as the main feature extractor of the FDEB, which makes the network more inclined to extract high-frequency details without introducing a large number of parameters. After feature extraction, we enhance the features with the BFEM, which can extract the background information of remote sensing images while enhancing the features. Extensive experiments showed that our model is highly competitive with some advanced models in terms of both performance and parameter count. This provides an application foundation for lightweight remote sensing image super-resolution reconstruction in field surveying, individual reconnaissance, and other fields of application.

Author Contributions

Conceptualization, F.G. and L.L.; data curation, J.W.; investigation, K.S.; methodology, F.G. and L.L.; resources, M.L. and H.M.; software, F.G. and L.L.; visualization, F.G.; writing—original draft, F.G.; writing—review and editing, Z.J. and H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shanghai Aerospace Science and Technology Innovation Fund under Grant No. SAST2019-048; the Cross-Media Intelligent Technology Project of Beijing National Research Center for Information Science and Technology (BNRist) under Grant No. BNR2019TD01022.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, W.; Wei, W.; Zhang, L. GSDet: Object Detection in Aerial Images Based on Scale Reasoning. IEEE Trans. Image Process. 2021, 30, 4599–4609.
  2. Li, Y.; Chen, W.; Zhang, Y.; Tao, C.; Xiao, R.; Tan, Y. Accurate Cloud Detection in High-Resolution Remote Sensing Imagery by Weakly Supervised Deep Learning. Remote Sens. Environ. 2020, 250, 112045.
  3. Zhu, Q.; Fan, X.; Zhong, Y.; Guan, Q.; Zhang, L.; Li, D. Super Resolution Generative Adversarial Network Based Image Augmentation for Scene Classification of Remote Sensing Images. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 573–576.
  4. Wang, L.; Guo, S.; Huang, W.; Xiong, Y.; Qiao, Y. Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs. IEEE Trans. Image Process. 2017, 26, 2055–2068.
  5. Li, R.; Lv, Q. Image Sharpening Algorithm Based on a Variety of Interpolation Methods. In Proceedings of the 2012 International Conference on Image Analysis and Signal Processing, Huangzhou, China, 9–11 November 2012; pp. 1–4.
  6. Zhang, K.; Gao, X.; Tao, D.; Li, X. Single Image Super-Resolution with Non-Local Means and Steering Kernel Regression. IEEE Trans. Image Process. 2012, 21, 4544–4556.
  7. Timofte, R.; De, V.; Gool, L.V. Anchored Neighborhood Regression for Fast Example-Based Super-Resolution. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1920–1927.
  8. Thurnhofer, S.; Mitra, S.K. Edge-enhanced image zooming. Opt. Eng. 1996, 35, 1862–1870.
  9. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
  10. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 391–407.
  11. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
  12. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
  13. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  14. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 294–310.
  15. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep Back-Projection Networks for Super-Resolution. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1664–1673.
  16. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.T.; Zhang, L. Second-Order Attention Network for Single Image Super-Resolution. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11057–11066.
  17. Lei, S.; Shi, Z.; Zou, Z. Super-Resolution for Remote Sensing Images via Local-Global Combined Network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247.
  18. Pan, Z.; Ma, W.; Guo, J.; Lei, B. Super-Resolution of Single Remote Sensing Image Based on Residual Dense Backprojection Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7918–7933.
  19. Haut, J.M.; Paoletti, M.E.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.; Li, J. Remote Sensing Single-Image Superresolution Based on a Deep Compendium Model. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1432–1436.
  20. Zhang, S.; Yuan, Q.; Li, J.; Sun, J.; Zhang, X. Scene-Adaptive Remote Sensing Image Super-Resolution Using a Multiscale Attention Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4764–4779.
  21. Zhang, D.; Shao, J.; Li, X.; Shen, H.T. Remote Sensing Image Super-Resolution via Mixed High-Order Attention Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5183–5196.
  22. Dong, X.; Wang, L.; Sun, X.; Jia, X.; Gao, L.; Zhang, B. Remote Sensing Image Super-Resolution Using Second-Order Multi-Scale Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3473–3485.
  23. Xu, W.; Xu, G.; Wang, Y.; Sun, X.; Lin, D.; Wu, Y. High Quality Remote Sensing Image Super-Resolution Using Deep Memory Connected Network. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8889–8892.
  24. Zhang, H.; Wang, P.; Jiang, Z. Nonpairwise-Trained Cycle Convolutional Neural Network for Single Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4250–4261.
  25. Li, X.; Zhang, D.; Liang, Z.; Ouyang, D.; Shao, J. Fused Recurrent Network Via Channel Attention For Remote Sensing Satellite Image Super-Resolution. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
  26. Dong, X.; Sun, X.; Jia, X.; Xi, Z.; Gao, L.; Zhang, B. Remote Sensing Image Super-Resolution Using Novel Dense-Sampling Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1618–1633.
  27. Hui, Z.; Wang, X.; Gao, X. Fast and Accurate Single Image Super-Resolution via Information Distillation Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 723–731.
  28. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight Image Super-Resolution with Information Multi-Distillation Network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032.
  29. Liu, J.; Tang, J.; Wu, G. Residual Feature Distillation Network for Lightweight Image Super-Resolution. In Proceedings of the European Conference on Computer Vision AIM Workshops, Glasgow, UK, 23–28 August 2020.
  30. Wang, Z.; Li, L.; Xue, Y.; Jiang, C.; Wang, J.; Sun, K.; Ma, H. FeNet: Feature Enhancement Network for Lightweight Remote-Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12.
  31. Lan, R.; Sun, L.; Liu, Z.; Lu, H.; Pang, C.; Luo, X. MADNet: A Fast and Lightweight Network for Single-Image Super Resolution. IEEE Trans. Cybern. 2021, 51, 1443–1453.
  32. Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient Image Super-Resolution Using Pixel Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
  33. Yang, Z.; Zhu, L.; Wu, Y.; Yang, Y. Gated Channel Transformation for Visual Recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11791–11800.
  34. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2356–2365.
  35. Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1122–1131.
  36. Yang, Y.; Newsam, S. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
  37. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In Proceedings of the 2012 British Machine Vision Conference, Surrey, UK, 3–7 September 2012.
  38. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873.
  39. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423.
  40. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206.
  41. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss Functions for Image Restoration with Neural Networks. IEEE Trans. Comput. Imaging 2017, 3, 47–57.
  42. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
  43. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  44. Tian, C.; Zhuge, R.; Wu, Z.; Xu, Y.; Zuo, W.; Chen, C.; Lin, C.W. Lightweight image super-resolution with enhanced CNN. Knowl.-Based Syst. 2020, 205, 106235.
  45. Zhang, K.; Zuo, W.; Zhang, L. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3262–3271.
Figure 1. The overall network architecture of the proposed FDENet. The content in the green box represents the backward fusion module; ⊕ represents element-wise summation.
Figure 2. Comparison between the RFDB and the FDEB. (Left), the structure of the RFDB. (Right), the structure of our FDEB. © represents feature fusion; ⊕ and ⊗ represent element-wise summation and element-wise multiplication, respectively. The green and brown boxes represent the basic feature-extraction unit and the feature-enhancement block, respectively.
Figure 3. Comparison between the SRB and the SRAB. (Left), the structure of the SRB. (Right), the structure of the SRAB. ⊕ and ⊗ represent element-wise summation and element-wise multiplication, respectively.
Figure 4. The structure of our well-designed bilateral bands. The upper sideband is used to enhance the features, and the lower sideband is used to extract the complex background information. © represents feature fusion; ⊕ and ⊗ represent element-wise summation and element-wise multiplication, respectively.
Figure 5. The structure of the Gaussian context transformer (GCT).
Figure 6. Visualization results of several SR methods and our proposed network, FDENet, on the RS-1 dataset for ×3 SR. Zoom in with the blue box for the best view.
Figure 7. Visualization results of several SR methods and our proposed network, FDENet, on the RS-2 dataset for ×3 SR. Zoom in with the blue box for the best view.
Figure 8. Visualization results of several SR methods and our proposed network, FDENet, on natural datasets for ×3 SR. Zoom in with the blue box for the best view.
Table 1. Quantitative evaluation results on the remote sensing datasets. "Params" represents the model parameter quantity; the best and second-best results are red and blue, respectively. "-" indicates that no result is provided.
Method | Scale | Params | RS-1 PSNR/SSIM | RS-2 PSNR/SSIM
Bicubic | ×2 | - | 33.25/0.8934 | 30.64/0.8837
SRCNN [9] | ×2 | 57 K | 35.18/0.9243 | 32.87/0.9209
VDSR [12] | ×2 | 666 K | 35.85/0.9312 | 33.86/0.9312
LGCNet [17] | ×2 | 193 K | 35.65/0.9298 | 33.47/0.9281
IDN [27] | ×2 | 553 K | 36.13/0.9339 | 34.07/0.9329
LESRCNN [44] | ×2 | 626 K | 36.04/0.9328 | 34.00/0.9320
FeNet [30] | ×2 | 351 K | 36.23/0.9341 | 34.22/0.9337
FDENet (ours) | ×2 | 480 K | 36.26/0.9346 | 34.28/0.9338
Bicubic | ×3 | - | 29.73/0.7818 | 27.23/0.7697
SRCNN [9] | ×3 | 57 K | 30.95/0.8228 | 28.59/0.8180
VDSR [12] | ×3 | 666 K | 31.55/0.8352 | 29.40/0.8391
LGCNet [17] | ×3 | 193 K | 31.30/0.8314 | 29.03/0.8312
IDN [27] | ×3 | 553 K | 31.73/0.8430 | 29.59/0.8450
LESRCNN [44] | ×3 | 810 K | 31.68/0.8398 | 29.65/0.8444
FeNet [30] | ×3 | 357 K | 31.89/0.8432 | 29.80/0.8481
FDENet (ours) | ×3 | 488 K | 31.98/0.8488 | 29.88/0.8489
Bicubic | ×4 | - | 27.91/0.6968 | 25.40/0.6770
SRCNN [9] | ×4 | 57 K | 28.87/0.7382 | 26.46/0.7296
VDSR [12] | ×4 | 666 K | 29.33/0.7546 | 27.03/0.7525
LGCNet [17] | ×4 | 193 K | 29.13/0.7481 | 26.76/0.7426
IDN [27] | ×4 | 553 K | 29.56/0.7623 | 27.31/0.7627
LESRCNN [44] | ×4 | 774 K | 29.62/0.7625 | 27.41/0.7646
FeNet [30] | ×4 | 366 K | 29.70/0.7688 | 27.45/0.7672
FDENet (ours) | ×4 | 501 K | 29.72/0.7658 | 27.54/0.7697
Table 2. Quantitative results on four super-resolution benchmark datasets. "Params" and "Multi-Adds" represent the model's parameter quantity and computational complexity, respectively. The best and second-best results are red and blue, respectively. "-" indicates that no result was provided.
Method | Scale | Params | Multi-Adds | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM
Bicubic | ×2 | - | - | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403
SRCNN [9] | ×2 | 57 K | 52.7 G | 36.66/0.9542 | 32.45/0.9067 | 31.36/0.8879 | 29.50/0.8946
VDSR [12] | ×2 | 666 K | 612.6 G | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140
LGCNet [17] | ×2 | 193 K | 178.1 G | 37.31/0.9580 | 32.94/0.9120 | 31.74/0.8939 | 30.53/0.9112
SRMDNF [45] | ×2 | 1513 K | 347.7 G | 37.79/0.9600 | 33.32/0.9150 | 32.05/0.8980 | 31.33/0.9200
IDN [27] | ×2 | 553 K | 124.6 G | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196
LESRCNN [44] | ×2 | 626 K | 281.5 G | 37.65/0.9586 | 33.32/0.9148 | 31.95/0.8964 | 31.45/0.9206
MADNet [31] | ×2 | 878 K | 187.1 G | 37.94/0.9604 | 33.46/0.9167 | 32.10/0.8988 | 31.74/0.9246
FeNet [30] | ×2 | 351 K | 77.9 G | 37.90/0.9602 | 33.45/0.9162 | 32.09/0.8985 | 31.75/0.9245
FDENet (ours) | ×2 | 480 K | 138.7 G | 37.89/0.9594 | 33.50/0.9170 | 32.15/0.8988 | 32.02/0.9270
Bicubic | ×3 | - | - | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349
SRCNN [9] | ×3 | 57 K | 52.7 G | 32.75/0.9090 | 29.30/0.8215 | 28.41/0.7863 | 26.24/0.7989
VDSR [12] | ×3 | 666 K | 612.6 G | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279
LGCNet [17] | ×3 | 193 K | 79.0 G | 33.32/0.9172 | 29.67/0.8289 | 28.63/0.7923 | 26.77/0.8180
SRMDNF [45] | ×3 | 1530 K | 156.3 G | 34.12/0.9250 | 30.04/0.8370 | 28.97/0.8030 | 27.57/0.8400
IDN [27] | ×3 | 553 K | 56.3 G | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359
LESRCNN [44] | ×3 | 810 K | 238.9 G | 33.93/0.9231 | 30.12/0.8380 | 28.91/0.8005 | 27.70/0.8415
MADNet [31] | ×3 | 930 K | 88.4 G | 34.26/0.9262 | 30.29/0.8410 | 29.04/0.8033 | 27.91/0.8464
FeNet [30] | ×3 | 357 K | 35.2 G | 34.21/0.9256 | 30.15/0.8383 | 28.98/0.8020 | 27.82/0.8447
FDENet (ours) | ×3 | 488 K | 61.7 G | 34.28/0.9253 | 30.33/0.8415 | 29.05/0.8033 | 28.03/0.8494
Bicubic | ×4 | - | - | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577
SRCNN [9] | ×4 | 57 K | 52.7 G | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221
VDSR [12] | ×4 | 666 K | 612.6 G | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524
LGCNet [17] | ×4 | 193 K | 44.5 G | 30.87/0.8746 | 27.82/0.7630 | 27.08/0.7186 | 24.82/0.7399
SRMDNF [45] | ×4 | 1555 K | 89.3 G | 31.96/0.8930 | 28.35/0.7770 | 27.49/0.7340 | 25.68/0.7730
IDN [27] | ×4 | 553 K | 32.3 G | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632
LESRCNN [44] | ×4 | 774 K | 241.6 G | 31.88/0.8903 | 28.44/0.7772 | 27.45/0.7313 | 25.77/0.7732
MADNet [31] | ×4 | 1002 K | 54.1 G | 32.11/0.8939 | 28.52/0.7799 | 27.52/0.7340 | 25.89/0.7782
FeNet [30] | ×4 | 366 K | 20.4 G | 32.02/0.8919 | 28.38/0.7764 | 27.47/0.7319 | 25.75/0.7747
FDENet (ours) | ×4 | 501 K | 35.9 G | 32.12/0.8929 | 28.52/0.7795 | 27.53/0.7339 | 25.97/0.7811
Table 3. Results of our model on four test sets when using the SRAB or the SRB.
Method | Params | RS-1 | RS-2 | BSD100 | Urban100
With SRB | 501 K | 29.68/0.7675 | 27.52/0.7697 | 27.52/0.7341 | 25.94/0.7815
With SRAB | 501 K | 29.72/0.7658 | 27.54/0.7697 | 27.53/0.7339 | 25.97/0.7811
Notes: The best results are indicated in bold font.
Table 4. Results of our model on four test sets when using ESA or the BFEM.
Method | Params | RS-1 | RS-2 | BSD100 | Urban100
With ESA | 463 K | 29.70/0.7656 | 27.55/0.7692 | 27.51/0.7335 | 25.88/0.7779
With BFEM | 501 K | 29.72/0.7658 | 27.54/0.7697 | 27.53/0.7339 | 25.97/0.7811
Notes: The best results are indicated in bold font.
Table 5. Results of our model on four test sets with and without the SFM.
Method | Params | RS-1 | RS-2 | BSD100 | Urban100
w/o SFM | 492 K | 29.69/0.7652 | 27.53/0.7691 | 27.52/0.7335 | 25.88/0.7796
w/ SFM | 501 K | 29.72/0.7658 | 27.54/0.7697 | 27.53/0.7339 | 25.97/0.7811
Notes: The best results are indicated in bold font.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gao, F.; Li, L.; Wang, J.; Sun, K.; Lv, M.; Jia, Z.; Ma, H. A Lightweight Feature Distillation and Enhancement Network for Super-Resolution Remote Sensing Images. Sensors 2023, 23, 3906. https://doi.org/10.3390/s23083906