Abstract
Recent years have witnessed rapid development of deep learning technology and its successful application in the saliency prediction of traditional 2D images. However, when using deep neural network (DNN) models to perform saliency prediction on omnidirectional images (ODIs), there are two critical issues: (1) the datasets for ODIs are small-scale and cannot support the training of DNN-based models; (2) saliency prediction is challenging because some ODIs contain complex background clutter. To solve these two problems, we propose a novel Attention-Aware Features Fusion Network (AAFFN) model, which is first trained on traditional 2D images and then transferred to ODIs for saliency prediction. Specifically, our proposed AAFFN model consists of three modules: a Part-guided Attention (PA) module, a Visibility Score (VS) module, and an Attention-Aware Features Fusion (AAFF) module. The PA module extracts precise features to estimate the attention of finer parts of ODIs and to eliminate the influence of the cluttered background. Meanwhile, the VS module measures the proportion of the foreground and background parts and generates visibility scores during feature learning. Finally, in the AAFF module, we fuse the attention maps weighted by the visibility scores to generate the final saliency map. Extensive experiments and ablation analysis demonstrate that the proposed model achieves superior performance and outperforms other state-of-the-art methods on public benchmark datasets.
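The final fusion step described above — combining per-part attention maps using visibility scores as weights — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `fuse_saliency`, the array shapes, and the convex-combination weighting are assumptions; in the actual AAFF module both the attention maps and the visibility scores are learned by the network.

```python
import numpy as np

def fuse_saliency(attention_maps, visibility_scores):
    """Weighted fusion of per-part attention maps into one saliency map.

    attention_maps: array of shape (P, H, W), one attention map per image part.
    visibility_scores: length-P sequence of nonnegative scores (hypothetical
    stand-in for the VS module's learned foreground/background proportions).
    """
    maps = np.asarray(attention_maps, dtype=np.float64)
    weights = np.asarray(visibility_scores, dtype=np.float64)
    weights = weights / weights.sum()  # normalize to a convex combination

    # Weighted sum over the part axis: (P,) x (P, H, W) -> (H, W)
    fused = np.tensordot(weights, maps, axes=1)

    # Rescale to [0, 1] so the result reads as a saliency map
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    return fused
```

In this sketch a part with a higher visibility score contributes proportionally more to the fused map, which matches the abstract's intuition that clearly visible foreground parts should dominate the final saliency prediction.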
Zhu, D., Chen, Y., Zhao, D. et al. Saliency prediction on omnidirectional images with attention-aware feature fusion network. Appl Intell 51, 5344–5357 (2021). https://doi.org/10.1007/s10489-020-01857-3