Abstract
Detecting small and densely distributed (SDD) objects in large-size images is quite challenging: directly feeding such images to a detection network causes severe geometric deformation, and overcrowded objects lead to unclear feature expression. In this paper, we propose a two-level data augmentation method, referred to as MultiCut-MultiMix, to solve these problems, in which MultiCut is developed to avoid feature distortion at the physical level and MultiMix is designed to enrich the background at the pixel level. Specifically, according to the image size required by the detection network, MultiCut, by appropriately tuning two introduced parameters, cuts large-size images into a series of image chips suitable for training, while ensuring that objects at the cutting edges are not lost. Furthermore, to strengthen the feature information in the obtained chips, MultiMix fuses different chips into new ones, in which the chip with the most SDD objects is retained as the major information and the others serve as the background. The fused chips from MultiMix, together with the original chips from MultiCut, then serve as new data to train the detection network, by which the dataset is enlarged and overfitting can thus be effectively avoided. Extensive ablation experiments show that, compared with existing approaches, our method significantly helps detection networks identify SDD objects in large-size images. For example, on the bacterial dataset, it achieves a 14.69% improvement in mean average precision over the classical CutMix on Darknet53.
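The cutting step described above can be sketched as an overlapping tiling: each chip has the input size the detector expects, and consecutive chips overlap so that an object lying on a cutting edge appears whole in at least one chip. This is only a minimal illustration of the idea, not the paper's implementation; the names `chip_size` and `overlap` stand in for the two tunable parameters mentioned in the abstract.

```python
import numpy as np


def multicut(image, chip_size=416, overlap=64):
    """Cut a large image into overlapping chips of fixed size.

    The stride between chips is chip_size - overlap, so objects
    straddling a cutting edge are contained whole in a neighboring
    chip. Chips near the right/bottom border are shifted inward so
    every chip has exactly the requested size.
    """
    h, w = image.shape[:2]
    stride = chip_size - overlap
    chips = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # Shift the window inward at the borders instead of padding.
            y0 = min(y, max(h - chip_size, 0))
            x0 = min(x, max(w - chip_size, 0))
            chips.append(image[y0:y0 + chip_size, x0:x0 + chip_size])
    return chips
```

For example, a 1000x1000 image with `chip_size=416` and `overlap=64` yields a 3x3 grid of uniformly sized chips; a MultiMix-style step would then fuse such chips, keeping the one with the most objects as the major content.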
Data availability
Three datasets are used in this paper, including bacterial dataset, Airbus Ship dataset, and UCAS-AOD dataset, which can be obtained in the following ways: (i) The bacterial dataset generated during and/or analyzed during the current study is not publicly available due to the privacy protection but is available from the corresponding author on reasonable request. (ii) The Airbus Ship dataset generated during and/or analyzed during the current study is available in the Kaggle repository, [https://www.kaggle.com/datasets/mikaelstrauhs/airbus-ship-detection-train-set-70]. (iii) The UCAS-AOD dataset generated during and/or analyzed during the current study is available in the OpenDataLab repository, [https://opendatalab.com/UCAS-AOD/download].
Notes
Note that we take the bacterial dataset as the example to illustrate the introduced operations, such as cutting and fusing, throughout this paper.
References
Lorenz, K.S., Serrano, F., Salama, P., Delp, E.J.: Segmentation and registration based analysis of microscopy images. In: Proceedings of International Conference on Image Processing (ICIP), Cairo, Egypt, pp. 4213–4216 (2009)
Rohith, G., Kumar, L.S.: Paradigm shifts in super-resolution techniques for remote sensing applications. Vis. Comput. 37(7), 1965–2008 (2021)
Hua, W., Wang, R., Zeng, X., Tang, Y., Wang, H., Bao, H.: Compressing repeated content within large-scale remote sensing images. Vis. Comput. 28(6), 755–764 (2012)
Shawky, O.A., Hagag, A., El-Dahshan, E.-S.A., Ismail, M.A.: Remote sensing image scene classification using CNN-MLP with data augmentation. Optik 221, 165356 (2020)
Wu, M., Jin, X., Jiang, Q., Lee, S.-J., Liang, W., Lin, G., Yao, S.: Remote sensing image colorization using symmetrical multi-scale DCGAN in YUV color space. Vis. Comput. 37(7), 1707–1729 (2021)
Sadgal, M., El Fazziki, A., Ait Ouahman, A.: Aerial image processing and object recognition. Vis. Comput. 21(1), 118–123 (2005)
Lu, A.X., Kraus, O.Z., Cooper, S., Moses, A.M.: Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting. PLoS Comput. Biol. 15(9), 1007348 (2019)
Cheng, G., Han, J.: A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote. Sens. 117, 11–28 (2016)
Aftab, U., Siddiqui, G.F.: Big data augmentation with data warehouse: a survey. In: Proceedings of IEEE Big Data, Seattle, WA, USA, pp. 2775–2784 (2018)
Shin, H., Lee, K., Lee, C.: Data augmentation method of object detection for deep learning in maritime image. In: Proceedings of IEEE BigComp, Busan, Korea (South), pp. 463–466 (2020)
Ametefe, D.S., Sarnin, S.S., Ali, D.M., Muhammad, Z.Z.: Fingerprint pattern classification using deep transfer learning and data augmentation. Vis. Comput. (2022)
Ben Fredj, H., Bouguezzi, S., Souani, C.: Face recognition in unconstrained environment with CNN. Vis. Comput. 37(2), 217–226 (2021)
Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)
Zoph, B., Cubuk, E.D., Ghiasi, G., Lin, T.-Y., Shlens, J., Le, Q.V.: Learning data augmentation strategies for object detection. arXiv preprint arXiv:1906.11172 (2019)
Zhou, F., Hu, Y., Shen, X.: MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition. Vis. Comput. 35(11), 1583–1594 (2019)
Li, N., Ai, H.: EfiLoc: large-scale visual indoor localization with efficient correlation between sparse features and 3D points. Vis. Comput. 38(6), 2091–2106 (2022)
Khan, M.J., Khan, M.J., Siddiqui, A.M., Khurshid, K.: An automated and efficient convolutional architecture for disguise-invariant face recognition using noise-based data augmentation and deep transfer learning. Vis. Comput. 38(2), 509–523 (2022)
Asad, M., Yang, J., He, J., Shamsolmoali, P., He, X.: Multi-frame feature-fusion-based model for violence detection. Vis. Comput. 37(6), 1415–1431 (2021)
Bang, S., Baek, F., Park, S., Kim, W., Kim, H.: Image augmentation to improve construction resource detection using generative adversarial networks, cut-and-paste, and image transformation techniques. Autom. Constr. 115, 103198 (2020)
Xi, Y., Zheng, J., Li, X., Xu, X., Ren, J., Xie, G.: SR-POD: sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection. Cogn. Syst. Res. 52, 144–154 (2018)
Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: Fast and flexible image augmentations. Information 11(2), 125 (2020)
Van Etten, A.: You only look twice: rapid multi-scale object detection in satellite imagery. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA (2018)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Yu, X., Zhao, Y., Gao, Y., Xiong, S.: MaskCOV: a random mask covariance network for ultra-fine-grained visual categorization. Pattern Recogn. 119, 108067 (2021)
Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017)
Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y., Choe, J.: CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of IEEE Conference on International Conference on Computer Vision (ICCV), Seoul, Korea (South), pp. 6022–6031 (2019)
Yoo, J., Ahn, N., Sohn, K.-A.: Rethinking data augmentation for image super-resolution: a comprehensive analysis and a new strategy. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 8372–8381 (2020)
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: Proceedings of 6th International Conference on Learning Representations (ICLR), Vancouver, Canada (2018)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 6517–6525 (2017)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 5987–5995 (2017)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 2261–2269 (2017)
Girshick, R.: Fast R-CNN. In: Proceedings of IEEE Conference on International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 1440–1448 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
Sun, X., Wu, P., Hoi, S.C.H.: Face detection using deep learning: an improved faster RCNN approach. Neurocomputing 299, 42–50 (2018)
Wei, B., Hao, K., Gao, L., Tang, X.-S.: Detecting textile micro-defects: a novel and efficient method based on visual gain mechanism. Inf. Sci. 541, 60–74 (2020)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017)
Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: Proceedings of the 2016 ACM Multimedia Conference (ACM MM), Amsterdam, The Netherlands, pp. 516–520 (2016)
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 658–666 (2019)
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of 34th AAAI Conference on Artificial Intelligence (AAAI), New York, USA (2020)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 2818–2826 (2016)
Misra, D.: Mish: a self regularized non-monotonic activation function. arXiv preprint arXiv:1908.08681 (2019)
Neubeck, A., Van Gool, L.: Efficient non-maximum suppression. In: Proceedings of IEEE International Conference on Pattern Recognition (ICPR), Hong Kong, China, vol. 3, pp. 850–855 (2006)
Zhu, H., Chen, X., Dai, W., Fu, K., Ye, Q., Jiao, J.: Orientation robust object detection in aerial images using deep convolutional neural network. In: Proceedings of IEEE International Conference on Image Processing (ICIP), Quebec, QC, Canada, pp. 3735–3739 (2015)
Acknowledgements
This work is partially supported by the National Key R&D Program of China under Grant 2022YFC3301704, Cooperation Project of Industry, Education, and Research of Zhuhai under Grant ZH22017001210089PWC, NSFC under Grant 61772220, Special Projects for Technological Innovation in Hubei Province under Grant 2018ACA135, and Key R&D Plan of Hubei Province under Grant 2020BAB027.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xin, Z., Lu, T., Li, Y. et al. MultiCut-MultiMix: a two-level data augmentation method for detecting small and densely distributed objects in large-size images. Vis Comput 40, 2347–2361 (2024). https://doi.org/10.1007/s00371-023-02920-z