Abstract
Temporal action localization aims to localize the segments of an untrimmed video that contain different actions. Because the context on either side of a boundary between an action instance and the background is often similar, separating action instances from their surroundings is a key challenge. The similarity or dissimilarity of content across frames plays an important role in this task: intuitively, instances with the same class label are affinitive, while those with different labels are divergent. In this paper, we propose the Centroid Radiation Network (CRNet), a novel method that models the relations between pairs of frames and generates precise action boundaries from these relations. Specifically, a Relation Network (RelNet) represents the relations between sampled pairs of frames with an affinity matrix, and an Offset Network (OffNet) estimates the centroid of each action segment together with its class label. Based on the assumption that a centroid and the areas it propagates to share the same action label, we obtain action boundaries by applying a random walk that propagates each centroid to its related areas. CRNet is a one-stage method and can be trained in an end-to-end fashion. Experimental results show that our approach outperforms state-of-the-art methods on THUMOS14 and ActivityNet.
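To make the propagation step concrete, below is a minimal NumPy sketch of random-walk label propagation from centroid frames over a frame-pair affinity matrix, in the spirit described above. It is not the authors' implementation: all names (propagate_centroids, alpha, iters, labels_to_segments) and the restart-style update rule are illustrative assumptions.

```python
# A minimal, illustrative sketch (not the paper's code): row-normalize a
# frame-pair affinity matrix into a random-walk transition matrix, seed it
# with centroid frames and their predicted class labels, iterate label
# propagation, and read off segments as contiguous runs sharing a label.
import numpy as np

def propagate_centroids(affinity, centroids, labels, num_classes,
                        alpha=0.9, iters=50):
    """affinity:  (T, T) nonnegative pairwise frame affinities.
    centroids: indices of estimated centroid frames.
    labels:    class label of each centroid.
    Returns a per-frame label array of shape (T,)."""
    T = affinity.shape[0]
    # Row-normalize affinities into random-walk transition probabilities.
    P = affinity / affinity.sum(axis=1, keepdims=True).clip(min=1e-8)
    # One-hot label seeds at the centroid frames.
    Y = np.zeros((T, num_classes))
    Y[centroids, labels] = 1.0
    # Random walk with restart: F <- alpha * P @ F + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (P @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)

def labels_to_segments(frame_labels, background=0):
    """Turn contiguous runs of identical non-background frame labels
    into (start, end, class) action segments."""
    segments, start, cls = [], None, background
    for t, c in enumerate(frame_labels):
        if start is None and c != background:
            start, cls = t, c
        elif start is not None and c != cls:
            segments.append((start, t, cls))
            start = None if c == background else t
            cls = c
    if start is not None:
        segments.append((start, len(frame_labels), cls))
    return segments
```

Under these assumptions, frames strongly connected to a centroid in the affinity matrix inherit its class, so the segment boundaries fall where the affinity (and hence the propagated label mass) drops off.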
This work was supported in part by the National Key Research and Development Program of China under Grant 2018AAA0103202; in part by the National Natural Science Foundation of China under Grants 62036007, 61922066, 61876142, 61772402, and 62050175; in part by the Xidian University Intellifusion Joint Innovation Laboratory of Artificial Intelligence; and in part by the Fundamental Research Funds for the Central Universities.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, X., Wang, N., Li, J., Gao, X. (2021). CRNet: Centroid Radiation Network for Temporal Action Localization. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol. 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_3
DOI: https://doi.org/10.1007/978-3-030-88004-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science, Computer Science (R0)