Abstract
Semantic segmentation is a fundamental problem in computer vision. End-to-end convolutional neural networks have become the standard solution and are far more accurate than traditional methods, and attention-based decoders now achieve state-of-the-art (SOTA) performance on various datasets. However, these networks are usually validated only by comparing their mIoU against previous SOTA networks, ignoring characteristics such as computational complexity and per-category accuracy that are essential for engineering applications. Moreover, the protocols used to report FLOPs and memory are inconsistent across networks, which makes the published comparisons hard to use. Finally, although many methods apply attention to semantic segmentation, a systematic summary of these methods is still lacking. This paper first conducts experiments to analyze the computational complexity of representative attention networks and to compare their performance. It then summarizes the scenarios each network suits and distills the key points to consider when constructing an attention network. Finally, it points out some future directions for attention networks.
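As a minimal sketch of the kind of uniform protocol such a comparison needs (this is not taken from the paper; the model, input resolution, and the third-party fvcore dependency are assumptions), one forward pass at a fixed resolution can report FLOPs, parameter count, and peak GPU memory the same way for every network:

```python
import torch
from fvcore.nn import FlopCountAnalysis  # third-party FLOP counter (assumed dependency)

def profile_segmentation_net(model, input_size=(1, 3, 512, 1024), device="cuda"):
    """Profile one forward pass under a fixed protocol so that FLOPs,
    parameters, and peak GPU memory are comparable across networks."""
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)  # same resolution for every model

    with torch.no_grad():
        flops = FlopCountAnalysis(model, x).total()   # fvcore counts one multiply-add as one FLOP
        params = sum(p.numel() for p in model.parameters())

        torch.cuda.reset_peak_memory_stats(device)    # requires a CUDA device
        model(x)
        peak_bytes = torch.cuda.max_memory_allocated(device)

    return {"GFLOPs": flops / 1e9,
            "params (M)": params / 1e6,
            "peak memory (MiB)": peak_bytes / 2**20}
```

For example, `profile_segmentation_net(torchvision.models.segmentation.fcn_resnet50())` would profile a stock baseline. Fixing both the input resolution and the counting convention is precisely what makes numbers from different networks comparable, which is the inconsistency the abstract criticizes.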
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Guo, H., et al. (2024). An Empirical Study of Attention Networks for Semantic Segmentation. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds.) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol. 14331. Springer, Singapore. https://doi.org/10.1007/978-981-97-2303-4_15
Print ISBN: 978-981-97-2302-7
Online ISBN: 978-981-97-2303-4