Joint Semantic Segmentation and Object Detection Based on Relational Mask R-CNN

Zhang, Yanni; Xu, Hui; Fan, Jingxuan; Qi, Miao; Liu, Tao; Wang, Jianzhong

doi:10.1007/978-3-031-13870-6_43

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13393)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

As fundamental problems in computer vision, semantic segmentation and object detection have seen a series of breakthroughs in recent years. Although existing semantic segmentation and object detection methods achieve impressive performance on several benchmarks, they focus only on local information around object regions. An image, however, usually contains rich semantic information, including scene context and dependencies between objects, and ignoring this information inevitably degrades performance. In this paper, we propose a novel network for joint semantic segmentation and object detection based on a relational Mask R-CNN (RM-RCNN) to address these limitations. We design an object dependence calculation module (DCM) that models the relationships between objects from their geometric and appearance features, improving the accuracy of both semantic segmentation and object detection. We also design a cross-scale information transmission module (CSITM) that allows features at different levels to exchange information. With CSITM, our method retains useful information and discards useless information, further improving performance. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed network.
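The abstract does not describe the internals of DCM or CSITM, so the following NumPy sketch is only a rough illustration of the two ideas it names, not the authors' implementation. The relation step is modelled on the object relation module of Hu et al. (CVPR 2018): the weight between objects m and n combines a non-negative geometric term w_G, computed from relative box offsets and sizes, with an appearance term w_A, a scaled dot product of projected features, as w_G·exp(w_A) normalised over the other objects. For brevity the geometry is projected linearly rather than through a sinusoidal embedding. The cross-scale step is an assumed sigmoid-gated fusion in which a learned gate decides how much of an upsampled coarse feature passes into a finer level. All function names and weight arrays (`Wq`, `Wk`, `Wv`, `wg`, `Wgate`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def box_geometry(boxes):
    """Pairwise relative geometry, shape (N, N, 4), from boxes [x1, y1, x2, y2]."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    # Centre offsets normalised by box m's size (1e-3 avoids log(0) on the diagonal).
    dx = np.log(np.abs(cx[:, None] - cx[None, :]) / w[:, None] + 1e-3)
    dy = np.log(np.abs(cy[:, None] - cy[None, :]) / h[:, None] + 1e-3)
    # Scale of box n relative to box m.
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)


def relation_module(feats, boxes, Wq, Wk, Wv, wg):
    """Hypothetical DCM-style relation step: feats (N, D) -> (N, D)."""
    q, k = feats @ Wq, feats @ Wk
    w_app = q @ k.T / np.sqrt(q.shape[1])            # appearance weight w_A, (N, N)
    w_geo = np.maximum(box_geometry(boxes) @ wg, 0)  # geometric weight w_G >= 0, (N, N)
    # Softmax over log(w_G) + w_A equals w_G * exp(w_A) normalised over n.
    attn = softmax(np.log(np.maximum(w_geo, 1e-6)) + w_app, axis=1)
    return feats + attn @ (feats @ Wv)               # residual relation feature


def gated_cross_scale(fine, coarse, Wgate):
    """Hypothetical CSITM-style fusion of a coarse level (H/2, W/2, C) into a fine level (H, W, C)."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour 2x upsampling
    gate = sigmoid(np.concatenate([fine, up], axis=-1) @ Wgate)  # per-position, per-channel gate in (0, 1)
    return fine + gate * up                          # pass only the gated part of the coarse signal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D = 5, 16
    feats = rng.standard_normal((N, D))
    xy = rng.uniform(0, 50, (N, 2))
    boxes = np.concatenate([xy, xy + rng.uniform(10, 40, (N, 2))], axis=1)
    out = relation_module(feats, boxes, rng.standard_normal((D, 8)), rng.standard_normal((D, 8)),
                          0.1 * rng.standard_normal((D, D)), rng.standard_normal(4))
    print(out.shape)  # (5, 16)
    fused = gated_cross_scale(rng.standard_normal((8, 8, 4)), rng.standard_normal((4, 4, 4)),
                              rng.standard_normal((8, 4)))
    print(fused.shape)  # (8, 8, 4)
```

The gate is one plausible reading of "retaining useful information and discarding useless information"; the paper itself may use a different mechanism for CSITM.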

Supported by the Fund of the Jilin Provincial Science and Technology Department (20210101187JC) and the Fundamental Research Funds for the Central Universities (2412020FZ029).

Author information

Authors and Affiliations

College of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
Yanni Zhang, Hui Xu, Jingxuan Fan, Miao Qi & Jianzhong Wang
Changchun Humanities and Sciences College, Changchun, 130117, China
Miao Qi & Tao Liu

Corresponding author

Correspondence to Jianzhong Wang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Xi'an Polytechnic University, Xi'an, China
Junfeng Jing
The University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
Liverpool John Moores University, Liverpool, UK
Abir Hussain

About this paper

Cite this paper

Zhang, Y., Xu, H., Fan, J., Qi, M., Liu, T., Wang, J. (2022). Joint Semantic Segmentation and Object Detection Based on Relational Mask R-CNN. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_43

DOI: https://doi.org/10.1007/978-3-031-13870-6_43
Published: 15 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13869-0
Online ISBN: 978-3-031-13870-6
eBook Packages: Computer Science, Computer Science (R0)