Abstract
Semantic segmentation and object detection are fundamental problems in computer vision, and both have seen a series of breakthroughs in recent years. Although existing methods achieve impressive performance on some benchmarks, they focus only on local information near object regions. An image, however, usually contains rich semantic information, including scene context and dependencies between objects, and ignoring this information inevitably degrades performance. In this paper, we propose a novel network for joint semantic segmentation and object detection based on relational Mask R-CNN (RM-RCNN) to address these limitations. An object dependence calculation module (DCM) models the relationships between objects from their geometric and appearance features, improving the accuracy of both semantic segmentation and object detection. We also design a cross-scale information transmission module (CSITM), which lets features at different levels exchange information. With CSITM, our method effectively retains useful information and discards useless information, further improving performance. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed network.
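The abstract does not give the DCM equations, so the following is only an illustrative sketch of how relationships between objects might be scored from geometric and appearance features. It assumes a generic attention-style weighting (scaled dot-product appearance similarity, gated by a learned linear score over relative box geometry); it is not the authors' module. All names (`box_geometry`, `relation_weights`, `relate`, `geo_w`) and the toy values are hypothetical.

```python
import math

def box_geometry(a, b):
    """Relative geometry of box b w.r.t. box a; boxes are (cx, cy, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return [
        math.log(max(abs(bx - ax), 1e-3) / aw),  # clamp avoids log(0)
        math.log(max(abs(by - ay), 1e-3) / ah),
        math.log(bw / aw),
        math.log(bh / ah),
    ]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def relation_weights(boxes, feats, geo_w):
    """Row-normalized weights w[i][j]: how much object j informs object i."""
    scale = math.sqrt(len(feats[0]))
    weights = []
    for i in range(len(boxes)):
        scores = []
        for j in range(len(boxes)):
            appearance = dot(feats[i], feats[j]) / scale
            # ReLU keeps the geometric gate non-negative (assumed design)
            geometric = max(dot(geo_w, box_geometry(boxes[i], boxes[j])), 0.0)
            scores.append(math.log(geometric + 1e-6) + appearance)
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def relate(boxes, feats, geo_w):
    """Add to each object's feature the relation-weighted sum of all features."""
    w = relation_weights(boxes, feats, geo_w)
    out = []
    for i, f in enumerate(feats):
        ctx = [sum(w[i][j] * feats[j][k] for j in range(len(feats)))
               for k in range(len(f))]
        out.append([a + b for a, b in zip(f, ctx)])
    return out

# Toy inputs (hypothetical): two nearby, similar objects and one distant object.
boxes = [(10, 10, 4, 4), (12, 10, 4, 4), (50, 50, 8, 8)]
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
geo_w = [-0.5, -0.5, 0.1, 0.1]  # stands in for learned parameters
weights = relation_weights(boxes, feats, geo_w)
enhanced = relate(boxes, feats, geo_w)
```

In this toy setup the nearby, similar-looking object receives a larger weight than the distant one, which is the kind of dependency the abstract says purely local methods ignore. In a real network the geometric weights would be learned and the features would come from RoI pooling rather than being fixed by hand.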
Supported by the Fund of the Jilin Provincial Science and Technology Department (20210101187JC) and the Fundamental Research Funds for the Central Universities (2412020FZ029).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Xu, H., Fan, J., Qi, M., Liu, T., Wang, J. (2022). Joint Semantic Segmentation and Object Detection Based on Relational Mask R-CNN. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_43
DOI: https://doi.org/10.1007/978-3-031-13870-6_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13869-0
Online ISBN: 978-3-031-13870-6
eBook Packages: Computer Science, Computer Science (R0)