FreeGaze: A Framework for 3D Gaze Estimation Using Appearance Cues from a Facial Video
Abstract
1. Introduction
- We develop FreeGaze, a framework for appearance-based 3D gaze estimation from facial videos, and study the respective contributions of face and eye features.
- We improve the data normalization method using orthogonal rotation matrices and show that the improved normalization achieves higher accuracy and lower computational time in gaze estimation (a sketch of the normalization step follows this list).
- We propose a dual-branch CNN that combines face and eye appearance for gaze estimation and evaluate the contributions of the face and eye features separately (see the architecture sketch after this list).
- We study how the choice of facial landmarks from different facial regions for normalization affects gaze estimation accuracy.
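As background for the normalization contribution above, the following is a minimal NumPy sketch of the widely used camera-normalization rotation for appearance-based gaze estimation (in the spirit of the learning-by-synthesis normalization [14]); it is not the paper's improved variant, and all names are ours:

```python
import numpy as np

def normalization_rotation(face_center, head_rotation):
    """Build the orthogonal rotation M that maps camera coordinates to the
    normalized space: the z-axis points at the face center and the x-axis
    follows the head's horizontal axis. The rows of M are orthonormal, so
    M is orthogonal and M^{-1} = M^T."""
    z = face_center / np.linalg.norm(face_center)   # camera-to-face direction
    head_x = head_rotation[:, 0]                    # head's x-axis in camera coordinates
    y = np.cross(z, head_x)                         # axis orthogonal to both
    y /= np.linalg.norm(y)
    x = np.cross(y, z)                              # completes the right-handed frame
    return np.stack([x, y, z])                      # rows are the normalized axes

# The 3D gaze label is rotated into the same space: g_n = M @ g.
```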
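The dual-branch idea can be sketched in PyTorch as follows; the layer sizes and the two-angle (pitch, yaw) output head are illustrative placeholders, not the actual FG-Net configuration:

```python
import torch
import torch.nn as nn

class DualBranchGazeNet(nn.Module):
    """Illustrative dual-branch regressor: one CNN branch for the face crop,
    one for the eye crop; features are concatenated and mapped to a 2D gaze
    angle (pitch, yaw)."""

    def __init__(self):
        super().__init__()

        def branch():
            # Small placeholder CNN; FG-Net's real backbone differs.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, 64)
            )

        self.face_branch = branch()
        self.eye_branch = branch()
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, face_img, eye_img):
        f = self.face_branch(face_img)               # (B, 64) face features
        e = self.eye_branch(eye_img)                 # (B, 64) eye features
        return self.head(torch.cat([f, e], dim=1))   # (B, 2): pitch, yaw
```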
2. Method
2.1. Facial Landmarks Detection and 3D Head Pose Estimation
2.2. Normalization
2.3. The Architecture of FG-Net
2.4. 3D Gaze Estimation
3. Experiments and Results Analysis
3.1. Datasets and Preprocessing
3.2. Implementation Details
3.3. Ten-Fold Cross-Validation Evaluation
3.4. Ablation Studies
3.4.1. The Effectiveness of Facial Landmarks in Different Facial Regions for Normalization
3.4.2. The Effectiveness of Dual-Branch Architecture
3.4.3. The Effectiveness of the Improved Normalization Method
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Raković, M.; Duarte, N.F.; Marques, J.; Billard, A.; Santos-Victor, J. The Gaze Dialogue Model: Nonverbal Communication in HHI and HRI. IEEE Trans. Cybern. 2022. [Google Scholar] [CrossRef]
- Menges, R.; Kumar, C.; Staab, S. Improving user experience of eye tracking-based interaction: Introspecting and adapting interfaces. ACM Trans. Comput. Hum. Interact. 2019, 26, 1–46. [Google Scholar] [CrossRef]
- Wang, F.S.; Kreiner, T.; Lutz, A.; Lohmeyer, Q.; Meboldt, M. What we see is what we do: A practical Peripheral Vision-Based HMM framework for gaze-enhanced recognition of actions in a medical procedural task. User Model User-Adap. 2023, 33, 939–965. [Google Scholar] [CrossRef]
- Mao, C.; Go, K.; Kinoshita, Y.; Kashiwagi, K.; Toyoura, M.; Fujishiro, I.; Li, J.; Mao, X. Different Eye Movement Behaviors Related to Artificial Visual Field Defects—A Pilot Study of Video-Based Perimetry. IEEE Access 2021, 9, 77649–77660. [Google Scholar] [CrossRef]
- Yu, W.; Zhao, F.; Ren, Z.; Jin, D.; Yang, X.; Zhang, X. Mining attention distribution paradigm: Discover gaze patterns and their association rules behind the visual image. Comput. Methods Programs Biomed. 2023, 230, 107330. [Google Scholar] [CrossRef]
- Fan, K.; Cao, J.; Meng, Z.; Zhu, J.; Ma, H.; Ng, A.C.M.; Ng, T.; Qian, W.; Qi, S. Predicting the Reader’s English Level From Reading Fixation Patterns Using the Siamese Convolutional Neural Network. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 1071–1080. [Google Scholar] [CrossRef]
- Hansen, D.W.; Ji, Q. In the Eye of the Beholder: A Survey of Models for Eyes and Gaze. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 478–500. [Google Scholar] [CrossRef]
- Guestrin, E.D.; Eizenman, M. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Trans. Biomed. Eng. 2006, 53, 1124–1133. [Google Scholar] [CrossRef]
- Nakazawa, A.; Nitschke, C. Point of gaze estimation through corneal surface reflection in an active illumination environment. In Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), Part II, Florence, Italy, 7–13 October 2012; Volume 7573, pp. 159–172. [Google Scholar]
- Funes Mora, K.A.; Odobez, J.M. Geometric generative gaze estimation (G3E) for remote RGB-D cameras. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1773–1780. [Google Scholar]
- Lu, F.; Gao, Y.; Chen, X. Estimating 3D gaze directions using unlabeled eye images via synthetic iris appearance fitting. IEEE Trans. Multimed. 2016, 18, 1772–1782. [Google Scholar] [CrossRef]
- Valenti, R.; Sebe, N.; Gevers, T. Combining head pose and eye location information for gaze estimation. IEEE Trans. Image Process. 2011, 21, 802–815. [Google Scholar] [CrossRef]
- Schneider, T.; Schauerte, B.; Stiefelhagen, R. Manifold Alignment for Person Independent Appearance-Based Gaze Estimation. In Proceedings of the 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 1167–1172. [Google Scholar]
- Sugano, Y.; Matsushita, Y.; Sato, Y. Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1821–1828. [Google Scholar]
- Guo, Y.; Liu, Y.; Oerlemans, A.; Lao, S.; Wu, S.; Lew, M.S. Deep learning for visual understanding: A review. Neurocomputing 2016, 187, 27–48. [Google Scholar] [CrossRef]
- Zhang, X.; Sugano, Y.; Fritz, M.; Bulling, A. It’s written all over your face: Full-face appearance-based gaze estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 2299–2308. [Google Scholar]
- Palmero, C.; Selva, J.; Bagheri, M.A.; Escalera, S. Recurrent CNN for 3D gaze estimation using appearance and shape cues. arXiv 2018, arXiv:1805.03064. [Google Scholar]
- Zhang, X.; Sugano, Y.; Fritz, M.; Bulling, A. Appearance-based gaze estimation in the wild. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4511–4520. [Google Scholar]
- Funes Mora, K.A.; Monay, F.; Odobez, J.M. EYEDIAP: A Database for the Development and Evaluation of Gaze Estimation Algorithms from RGB and RGB-D Cameras. In Proceedings of the Symposium on Eye Tracking Research and Applications, Safety Harbor, FL, USA, 26–28 March 2014; pp. 255–258. [Google Scholar]
- Park, S.; Spurr, A.; Hilliges, O. Deep pictorial gaze estimation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 741–757. [Google Scholar]
- Lian, D.; Hu, L.; Luo, W.; Xu, Y.; Duan, L.; Yu, J.; Gao, S. Multiview multitask gaze estimation with deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 3010–3023. [Google Scholar] [CrossRef]
- Liu, G.; Yu, Y.; Mora, K.A.F.; Odobez, J.M. A differential approach for gaze estimation with calibration. In Proceedings of the 2018 British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; p. 6. [Google Scholar]
- Huang, L.; Li, Y.; Wang, X.; Wang, H.; Bouridane, A.; Chaddad, A. Gaze Estimation Approach Using Deep Differential Residual Network. Sensors 2022, 22, 5462. [Google Scholar] [CrossRef]
- Yu, Y.; Odobez, J.M. Unsupervised Representation Learning for Gaze Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7312–7322. [Google Scholar] [CrossRef]
- Ren, D.; Chen, J.; Zhong, J.; Lu, Z.; Jia, T.; Li, Z. Gaze estimation via bilinear pooling-based attention networks. J. Vis. Commun. Image Represent. 2021, 81, 103369. [Google Scholar] [CrossRef]
- Gu, S.; Wang, L.; He, L.; He, X.; Wang, J. Gaze estimation via a differential eyes’ appearances network with a reference grid. Engineering 2021, 7, 777–786. [Google Scholar] [CrossRef]
- Krafka, K.; Khosla, A.; Kellnhofer, P.; Kannan, H.; Bhandarkar, S.; Matusik, W.; Torralba, A. Eye tracking for everyone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2176–2184. [Google Scholar]
- Zhou, X.; Lin, J.; Zhang, Z.; Shao, Z.; Chen, S.; Liu, H. Improved itracker combined with bidirectional long short-term memory for 3D gaze estimation using appearance cues. Neurocomputing 2020, 390, 217–225. [Google Scholar] [CrossRef]
- Kellnhofer, P.; Recasens, A.; Stent, S.; Matusik, W.; Torralba, A. Gaze360: Physically Unconstrained Gaze Estimation in the Wild. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6911–6920. [Google Scholar]
- Chen, Z.; Shi, B.E. Towards high performance low complexity calibration in appearance based gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1174–1188. [Google Scholar] [CrossRef]
- Li, Y.; Huang, L.; Chen, J.; Wang, X.; Tan, B. Appearance-Based Gaze Estimation Method Using Static Transformer Temporal Differential Network. Mathematics 2023, 11, 686. [Google Scholar] [CrossRef]
- Bazarevsky, V.; Kartynnik, Y.; Vakunov, A.; Raveendran, K.; Grundmann, M. BlazeFace: Sub-millisecond neural face detection on mobile GPUs. arXiv 2019, arXiv:1907.05047v2. [Google Scholar]
- Grishchenko, I.; Ablavatski, A.; Kartynnik, Y.; Raveendran, K.; Grundmann, M. Attention mesh: High-fidelity face mesh prediction in real-time. arXiv 2020, arXiv:2006.10962. [Google Scholar]
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis. 2009, 81, 155–166. [Google Scholar] [CrossRef]
- Chen, Z.; Shi, B.E. Appearance-based gaze estimation using dilated-convolutions. In Proceedings of the Computer Vision—ACCV 2018, Perth, Australia, 2–6 December 2018; pp. 309–324. [Google Scholar]
- Abdelrahman, A.A.; Hempel, T.; Khalifa, A.; Al-Hamadi, A. L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments. arXiv 2022, arXiv:2203.03339. [Google Scholar]
- Cheng, Y.; Huang, S.; Wang, F.; Qian, C.; Lu, F. A coarse-to-fine adaptive network for appearance-based gaze estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10623–10630. [Google Scholar]
- Cheng, Y.; Zhang, X.; Lu, F.; Sato, Y. Gaze estimation by exploring two-eye asymmetry. IEEE Trans. Image Process. 2020, 29, 5259–5272. [Google Scholar] [CrossRef]
| Method | Eye | Face | Advantages |
|---|---|---|---|
| Multimodal CNN [18] | Yes | No | Low complexity |
| Gazemap [20] | Yes | No | Robustness to head pose and image quality |
| Multiview CNN [21] | Yes | No | Multitask solution |
| Differential NN [22] | Yes | No | Less calibration required |
| DRNet [23] | Yes | No | Robustness to noise |
| U-Train [24] | Yes | No | Unsupervised |
| Spatial weights CNN [16] | No | Yes | Robustness to facial appearance variation |
| BPA-Net [25] | No | Yes | Robustness to facial appearance variation |
| Recurrent CNN [17] | No | Yes | Temporal modality |
| DEA-Net [26] | Yes | No | Fewer training samples |
| iTracker [27] | Yes | Yes | High generalization across datasets |
| Bi-LSTM [28] | Yes | Yes | Low complexity and robustness to resolution |
| Gaze360 [29] | No | Yes | High generalization in real scenes |
| GEDD-Net [30] | Yes | Yes | Low-complexity, high-performance calibration |
| STTDN [31] | Yes | Yes | Feature fusion and dynamic feature extraction |
| FreeGaze (Ours) | Yes | Yes | Improved normalization method; analysis of landmarks' impact on gaze estimation |
| Method | MPIIGaze, 3D Angular Error (°) | EyeDiap, 3D Angular Error (°) |
|---|---|---|
| Multimodal CNN [18] | 6.3 | - |
| Spatial weights CNN [16] | 4.8 | 6.0 |
| Dilated-Convolutions [35] | 4.8 | - |
| Recurrent CNN [17] | - | 3.4 |
| L2CS-Net [36] | 3.92 | - |
| Bi-LSTM [28] | 4.18 | 5.84 |
| CA-Net [37] | 4.1 | 5.3 |
| FARE-Net [38] | 4.3 | 5.71 |
| DEA-Net [26] | 4.38 | - |
| GEDD-Net [30] | 4.5 | 5.4 |
| STTDN [31] | 3.73 | 5.02 |
| U-Train [24] | - | 6.79 |
| DRNet [23] | 4.57 | 6.14 |
| FreeGaze | 3.11 | 2.75 |
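For reference, the 3D angular error reported above is the standard metric: the angle between the predicted and ground-truth gaze vectors. A minimal sketch:

```python
import numpy as np

def angular_error_deg(g_pred, g_true):
    """Angle in degrees between predicted and ground-truth 3D gaze vectors."""
    g_pred = g_pred / np.linalg.norm(g_pred, axis=-1, keepdims=True)
    g_true = g_true / np.linalg.norm(g_true, axis=-1, keepdims=True)
    cos_sim = np.clip(np.sum(g_pred * g_true, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim))
```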
| Facial Regions for Preprocessing | Number of Landmarks | MPIIGaze, 3D Angular Error (°) | EyeDiap, 3D Angular Error (°) |
|---|---|---|---|
| Corners of eyes and mouth | 6 | 3.26 | 2.79 |
| Eyes and nose | 92 | 3.11 | 2.79 |
| Eyes and mouth | 112 | 3.22 | 2.80 |
| Eyes, nose, and mouth | 166 | 3.06 | 2.78 |
| Full face | 468 | 3.11 | 2.75 |
| Number of Landmarks (Facial Regions) | Branch | MPIIGaze, 3D Angular Error (°) | EyeDiap, 3D Angular Error (°) |
|---|---|---|---|
| 468 (full face) | Eye branch | 6.33 | 2.88 |
| 468 (full face) | Face branch | 3.13 | 2.73 |
| 468 (full face) | Dual branch | 3.11 | 2.75 |
| 166 (eyes, nose, and mouth) | Eye branch | 6.39 | 2.90 |
| 166 (eyes, nose, and mouth) | Face branch | 3.13 | 2.76 |
| 166 (eyes, nose, and mouth) | Dual branch | 3.06 | 2.78 |
| Normalization Method | MPIIGaze, Angular Error (°) | EyeDiap, Angular Error (°) | MPIIGaze, Time (ms) | EyeDiap, Time (ms) |
|---|---|---|---|---|
| Original | 8.00 | 5.58 | 5.96 | 5.11 |
| Improved | 3.11 | 2.75 | 5.26 | 4.67 |
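For context, both normalization variants produce the network input with a single perspective warp of the frame; a minimal OpenCV sketch, where the focal length, normalized distance, and crop size are illustrative assumptions rather than the paper's settings:

```python
import cv2
import numpy as np

def warp_to_normalized(image, camera_matrix, M, distance,
                       focal_norm=960.0, dist_norm=600.0, size=(224, 224)):
    """Warp a frame into the normalized camera with W = C_n @ S @ M @ C^{-1},
    where M is the orthogonal normalization rotation, S rescales depth to a
    fixed distance, and C_n is a virtual camera intrinsic matrix."""
    C_n = np.array([[focal_norm, 0.0, size[0] / 2],
                    [0.0, focal_norm, size[1] / 2],
                    [0.0, 0.0, 1.0]])
    S = np.diag([1.0, 1.0, dist_norm / distance])  # scale depth to dist_norm
    W = C_n @ S @ M @ np.linalg.inv(camera_matrix)
    return cv2.warpPerspective(image, W, size)
```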
Cite as: Tian, S.; Tu, H.; He, L.; Wu, Y.I.; Zheng, X. FreeGaze: A Framework for 3D Gaze Estimation Using Appearance Cues from a Facial Video. Sensors 2023, 23, 9604. https://doi.org/10.3390/s23239604