Link to original content: https://doi.org/10.1007/s00371-023-02920-z

MultiCut-MultiMix: a two-level data augmentation method for detecting small and densely distributed objects in large-size images

  • Original article
  • The Visual Computer

Abstract

Detecting small and densely distributed (SDD) objects in large-size images is challenging: feeding such images directly into a detection network causes severe geometric deformation, and the overcrowded objects lead to unclear feature expression. In this paper, we propose a two-level data augmentation method, referred to as MultiCut-MultiMix, to address these problems, in which MultiCut avoids feature distortion at the physical level and MultiMix enriches the background at the pixel level. Specifically, according to the image size required by the detection network, MultiCut cuts large-size images into a series of image chips suitable for training by appropriately tuning two introduced parameters, while ensuring that objects at the cutting edges are not lost. Furthermore, to strengthen the feature information in the resulting chips, MultiMix fuses different chips into new ones, retaining the chip with the most SDD objects as the major information and using the others as background. The fused chips from MultiMix, together with the original chips from MultiCut, then serve as new data to train the detection network, enlarging the dataset and thus effectively avoiding overfitting. Extensive ablation experiments show that, compared with existing approaches, our method usually brings significant gains when detection networks identify SDD objects in large-size images; for example, on the bacterial dataset, a 14.69% improvement in mean average precision is achieved over the classical CutMix on Darknet53.
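
To make the two operations concrete, below is a minimal NumPy sketch of the core ideas, not the authors' implementation: the parameter names (`chip_size`, `overlap`, `alpha`), the clamped-stride tiling, and the weighted-mean fusion rule are all our assumptions for illustration, and the remapping of bounding-box annotations to each chip is omitted.

```python
import numpy as np

def multicut(image, chip_size=416, overlap=64):
    """MultiCut-style tiling: cut a large image into overlapping chips.

    `chip_size` and `overlap` stand in for the paper's two tunable
    parameters (assumed here); because consecutive chips overlap, an
    object lying on a cutting edge appears intact in at least one chip.
    Assumes image height/width >= chip_size and chip_size > overlap.
    """
    h, w = image.shape[:2]
    stride = chip_size - overlap
    chips = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # Clamp so chips at the right/bottom border keep the full size.
            y0 = max(0, min(y, h - chip_size))
            x0 = max(0, min(x, w - chip_size))
            chips.append(image[y0:y0 + chip_size, x0:x0 + chip_size].copy())
    return chips

def multimix(chips, object_counts, alpha=0.7):
    """MultiMix-style fusion: blend several chips into one new sample.

    The chip with the most SDD objects is kept as the dominant component
    and the remaining chips contribute background context. A weighted
    mean is one plausible fusion rule, assumed here for illustration.
    """
    if len(chips) < 2:
        return chips[0]
    order = np.argsort(object_counts)[::-1]  # most-crowded chip first
    major = chips[order[0]].astype(np.float32)
    background = np.mean([chips[i].astype(np.float32) for i in order[1:]], axis=0)
    fused = alpha * major + (1.0 - alpha) * background
    return fused.astype(chips[0].dtype)
```

In training, fused chips produced this way would be added alongside the original MultiCut chips, enlarging the dataset as the abstract describes.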


Data availability

Three datasets are used in this paper: the bacterial dataset, the Airbus Ship dataset, and the UCAS-AOD dataset, which can be obtained as follows: (i) The bacterial dataset generated and/or analyzed during the current study is not publicly available due to privacy protection but is available from the corresponding author on reasonable request. (ii) The Airbus Ship dataset is available in the Kaggle repository, https://www.kaggle.com/datasets/mikaelstrauhs/airbus-ship-detection-train-set-70. (iii) The UCAS-AOD dataset is available in the OpenDataLab repository, https://opendatalab.com/UCAS-AOD/download.

Notes

  1. Note that we take the bacterial dataset as the running example to illustrate the introduced operations, such as cutting and fusing, throughout this paper.


Acknowledgements

This work is partially supported by the National Key R&D Program of China under Grant 2022YFC3301704, the Cooperation Project of Industry, Education, and Research of Zhuhai under Grant ZH22017001210089PWC, NSFC under Grant 61772220, the Special Projects for Technological Innovation in Hubei Province under Grant 2018ACA135, and the Key R&D Plan of Hubei Province under Grant 2020BAB027.

Author information

Corresponding author

Correspondence to Zhimeng Xin.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xin, Z., Lu, T., Li, Y. et al. MultiCut-MultiMix: a two-level data augmentation method for detecting small and densely distributed objects in large-size images. Vis Comput 40, 2347–2361 (2024). https://doi.org/10.1007/s00371-023-02920-z

