Abstract
As an important research field in computer vision, gaze detection has attracted considerable attention from researchers. When dealing with head pose changes and incomplete head images, some existing gaze detection methods extract only global head features and ignore local features. Moreover, as the network deepens, the feature extraction network tends to neglect small objects or objects with insignificant features in the scene. To address these issues, we propose a gaze detection model based on local feature fusion (Local-GazeNet). First, attention mechanisms are introduced into the scene feature extraction network to enhance the saliency of objects of interest and suppress redundant information in the image. Second, a local feature extractor is added to the gaze direction detection module, and the extracted local features are added to the global features as residuals, so that the model obtains richer head features for handling complex changes in head pose. We validate the Local-GazeNet model on the GazeFollow dataset. The experimental results show that the proposed method achieves satisfactory performance and outperforms some existing state-of-the-art gaze detection methods.
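The residual-style fusion of local and global head features described in the abstract can be illustrated with a minimal sketch. This is not the Local-GazeNet implementation: the function names (`mean_pool`, `residual_fuse`), the mean pooling of local features, and the `scale` parameter are all assumptions made for illustration, and the real model presumably operates on convolutional feature maps rather than short vectors.

```python
from typing import List

Vector = List[float]


def mean_pool(local_feats: List[Vector]) -> Vector:
    """Average a non-empty list of equally sized local feature vectors."""
    n = len(local_feats)
    dim = len(local_feats[0])
    return [sum(f[i] for f in local_feats) / n for i in range(dim)]


def residual_fuse(global_feat: Vector,
                  local_feats: List[Vector],
                  scale: float = 1.0) -> Vector:
    """Add pooled local features to the global feature as a residual term.

    Hypothetical helper: fused = global + scale * pooled_local.
    """
    pooled = mean_pool(local_feats)
    if len(pooled) != len(global_feat):
        raise ValueError("local and global features must share a dimension")
    return [g + scale * p for g, p in zip(global_feat, pooled)]


# Toy example: one global head feature and two local (patch-level) features.
global_feat = [1.0, 2.0, 3.0]
local_feats = [[0.5, 0.0, 1.0], [1.5, 2.0, 1.0]]
fused = residual_fuse(global_feat, local_feats)
print(fused)  # [2.0, 3.0, 4.0]
```

Because the local term is added rather than concatenated, setting `scale` to zero recovers the global feature unchanged, which is the usual motivation for a residual connection: the local branch only has to learn a correction to the global representation.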
Y. Dong: This work is supported by the Science and Technology Research Project of the Education Department of Jilin Province (No. JJKH20170976SK), the Humanities and Social Science Research Project of the Education Department of Jilin Province (No. JJH20221328SK), and the Jilin Provincial Science and Technology Department Project (No. 20200401081GX).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Dong, Y., Xu, H., Sun, H., Qi, M. (2022). A Novel Gaze Detection Method Based on Local Feature Fusion. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science, vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_32
DOI: https://doi.org/10.1007/978-3-031-13832-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13831-7
Online ISBN: 978-3-031-13832-4
eBook Packages: Computer Science, Computer Science (R0)