Abstract
This paper proposes an end-to-end temporal attention learning method to improve action quality assessment in sports videos. For temporally weighted training, an attention-learning module is built to simulate the attention mechanism and the judging preferences of human perception in action quality assessment. The attention weights are learned from the loss on the per-segment prediction errors and are used to balance the significance of the segment features. We evaluate the proposed method on the diving and gymnastic vault actions of the AQA-7 benchmark dataset. The experimental results show that the proposed attention-aware feature training method is more effective than temporal aggregation and existing temporal relationship learning methods. Furthermore, although it is trained only with a distance loss between the predicted and ground-truth scores, without a ranking loss across videos, the method achieves state-of-the-art performance in both the Spearman rank correlation and the mean Euclidean distance between the predicted scores and the judges' scores.
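The abstract describes segment features that are weighted by learned attention weights before a score is regressed, and it reports the Spearman rank correlation as an evaluation metric. The following plain-Python sketch illustrates both ideas under stated assumptions. The linear attention vector `w_att`, the linear regression head `w_reg`, the squared-error form of the distance loss, and all function names are hypothetical. The abstract does not specify the module's architecture, so this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(segments: Sequence[Sequence[float]],
                   w_att: Sequence[float]) -> Tuple[Vector, Vector]:
    """Weight each segment feature by a softmax attention weight.

    The attention logit of a segment is dot(feature, w_att); w_att is a
    hypothetical learned vector, not the paper's attention module.
    Returns (pooled feature, per-segment weights).
    """
    weights = softmax([dot(seg, w_att) for seg in segments])
    dim = len(segments[0])
    pooled = [sum(w * seg[i] for w, seg in zip(weights, segments))
              for i in range(dim)]
    return pooled, weights


def predict_score(segments: Sequence[Sequence[float]],
                  w_att: Sequence[float],
                  w_reg: Sequence[float],
                  bias: float = 0.0) -> float:
    """Assumed linear regression head on the attention-pooled feature."""
    pooled, _ = attention_pool(segments, w_att)
    return dot(pooled, w_reg) + bias


def distance_loss(pred: float, target: float) -> float:
    """Squared distance between predicted and ground-truth score (assumed form)."""
    return (pred - target) ** 2


def ranks(values: Sequence[float]) -> Vector:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("Spearman's rho is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data: three segments with 2-D features (made-up numbers).
    segs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(segs, w_att=[0.0, 0.0])
    print(weights, pooled)  # all-zero w_att: every weight is 1/3
    print(spearman_rho([72.0, 85.5, 60.0], [70.0, 90.0, 65.0]))
```

With an all-zero `w_att`, every segment receives the same weight, so the pooled feature reduces to plain temporal average pooling. That is one form of the temporal aggregation the abstract compares against. Learning `w_att` lets the more informative segments dominate the pooled feature instead.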
Funding
This work was supported by the National Natural Science Foundation of China (61871196, 62001176), the Natural Science Foundation of Fujian Province, China (2019J01082, 2020J01085), and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQN-YX601).
About this article
Cite this article
Lei, Q., Zhang, H. & Du, J. Temporal attention learning for action quality assessment in sports video. SIViP 15, 1575–1583 (2021). https://doi.org/10.1007/s11760-021-01890-w
DOI: https://doi.org/10.1007/s11760-021-01890-w