Abstract
Recently, significant progress has been made in unmanned aerial vehicle (UAV) object detection through deep learning, and the proliferation of UAVs has greatly eased the acquisition of corresponding data. However, UAV data sets contain many rotated objects in arbitrary orientations, which poses a challenge for traditional horizontal-box object detection methods: these approaches struggle to locate rotated objects precisely. Algorithms for rotated bounding-box object detection have therefore been proposed, but some existing methods suffer from angle periodicity and edge exchangeability. To address these problems, we propose an object detection network that combines a keypoint representation with a rotated distance-IoU loss. It consists mainly of a keypoint representation module and the rotated distance-IoU loss. The keypoint representation encodes the angle parameter of the rotated bounding box indirectly, as the angle between the horizontal line and the line connecting the center point of the box to the midpoint of one of its edges. The coordinates of the anchor's center point and of that edge midpoint then yield the height of the rotated bounding box, and the width is introduced as a separate dimension, so the rotated bounding box is fully represented by two points and a width. In addition, because the traditional rotated IoU loss does not incorporate the distance between the center points of the predicted box and the ground truth during regression, we propose a rotated distance-IoU loss to replace it, which speeds up the convergence of the network.
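The keypoint parameterization described above can be illustrated with a minimal sketch. Here we assume the network predicts the box center, the midpoint of one edge, and the box width; the function name and the exact angle/height conventions (angle from the center-to-midpoint line, height as twice the center-to-midpoint distance) are our own illustrative assumptions, not the paper's exact formulation.

```python
import math

def box_from_keypoints(center, edge_mid, width):
    """Recover a rotated box (cx, cy, w, h, theta) from two keypoints
    and a width: the box center, the midpoint of one edge, and the
    box width. The angle theta is taken between the horizontal axis
    and the line from the center to the edge midpoint; the height is
    twice the center-to-midpoint distance."""
    cx, cy = center
    ex, ey = edge_mid
    dx, dy = ex - cx, ey - cy
    theta = math.atan2(dy, dx)          # angle w.r.t. the horizontal line
    height = 2.0 * math.hypot(dx, dy)   # midpoint lies on the box boundary
    return cx, cy, width, height, theta
```

Note that the angle never has to be regressed directly, which is how this representation sidesteps the periodicity problem: the two keypoints determine it unambiguously.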
We have conducted extensive experiments on the DOTA and DroneVehicle data sets, which demonstrate the effectiveness of the proposed method.
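The rotated distance-IoU loss mentioned in the abstract follows the distance-IoU idea of adding a normalized center-distance penalty to the IoU term. The sketch below is a simplified illustration under two assumptions of ours: the rotated IoU itself is computed elsewhere (e.g. by polygon intersection) and passed in, and the normalizing diagonal is taken from the axis-aligned box enclosing both sets of corners rather than the paper's exact enclosing region.

```python
import math

def rotated_diou_loss(pred, gt, iou):
    """Distance-IoU style loss for rotated boxes (illustrative sketch).
    pred and gt are (cx, cy, w, h, theta); iou is their rotated IoU,
    assumed precomputed. Loss = 1 - IoU + rho^2 / c^2, where rho is the
    center distance and c the diagonal of the smallest axis-aligned box
    enclosing all corners of both boxes."""
    def corners(box):
        cx, cy, w, h, t = box
        cos_t, sin_t = math.cos(t), math.sin(t)
        return [(cx + sx * w / 2 * cos_t - sy * h / 2 * sin_t,
                 cy + sx * w / 2 * sin_t + sy * h / 2 * cos_t)
                for sx, sy in ((1, 1), (1, -1), (-1, 1), (-1, -1))]

    pts = corners(pred) + corners(gt)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    c2 = (max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2  # enclosing diagonal^2
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2    # center distance^2
    return 1.0 - iou + rho2 / c2
```

The extra penalty term keeps a gradient on the center coordinates even when the two boxes barely overlap, which is what accelerates convergence relative to a plain rotated IoU loss.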
Data availability statement
These data were derived from the following resources available in the public domain: DOTA v1.0 (https://captain-whu.github.io/DOTA/dataset.html) and DroneVehicle (https://github.com/VisDrone/DroneVehicle).
Acknowledgements
This study was funded by Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515011576), Guangdong Science and Technology Planning Project (No. 2021A0505030080, No. 2021A0505060011), Guangdong Higher Education Innovation and Strengthening School Project (No. 2020ZDZX3031, No. 2022ZDZX1032, No. 2023ZDZX1029), Wuyi University Hong Kong and Macao Joint Research and Development Fund (No. 2022WGALH19), Guangdong Jiangmen Science and Technology Research Project (No. 2220002000246, No. 2023760300070008390), Guangdong Science and Technology Innovation Strategy Special Fund (pdjh2022b0528, pdjh2024a374).
Author information
Authors and Affiliations
Contributions
Methodology, H.Z. and Y.H.; investigation, Y.Z. and H.Z.; data curation, J.Z. and F.D.; validation, Y.X.; writing-original draft preparation, Y.H. and J.Z.; writing-review and editing, Y.Z. and Y.X. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, H., Huang, Y., Xu, Y. et al. Unmanned aerial vehicle (UAV) object detection algorithm based on keypoints representation and rotated distance-IoU loss. J Real-Time Image Proc 21, 58 (2024). https://doi.org/10.1007/s11554-024-01444-6