Abstract
A video file is a sequence of images, and this image sequence carries both spatial and temporal information. Optical flow and motion history images are two well-known representations for recognizing human activities. Optical flow describes the velocity of every individual pixel in the image, but this motion information alone cannot capture a complete action or distinguish different movement speeds. In a motion history image, local body parts that move for similar durations produce almost identical intensities, so similar actions are not identified with good precision. In this paper, a deep convolutional neural model for human activity recognition in video is proposed, in which multiple CNN streams are combined so that the model exploits both spatial and temporal information. Two schemes for fusing the spatial and temporal streams, average fusion and convolution fusion, are discussed. The proposed method outperforms other human activity recognition approaches on the benchmark datasets UCF101 and HMDB51: on UCF101, average fusion achieves 95.4% test accuracy and convolution fusion 97.2%; on HMDB51, average fusion achieves 84.3% and convolution fusion 85.1%, respectively.
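The two fusion schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, array shapes, and example scores below are illustrative assumptions: average fusion is taken as a per-class mean of the two streams' softmax scores, and convolution fusion as stacking the two streams' feature maps along the channel axis and mixing them with a 1x1 convolution (here written as a per-pixel matrix multiply).

```python
import numpy as np

def average_fusion(spatial_scores, temporal_scores):
    """Late fusion: average the per-class scores of the two streams."""
    return (np.asarray(spatial_scores) + np.asarray(temporal_scores)) / 2.0

def conv_fusion(spatial_feat, temporal_feat, weights):
    """Stack the streams' feature maps along the channel axis (C, H, W) ->
    (2C, H, W), then mix channels with a 1x1 convolution, i.e. a matrix
    multiply applied independently at every spatial location."""
    stacked = np.concatenate([spatial_feat, temporal_feat], axis=0)  # (2C, H, W)
    # weights has shape (out_channels, 2C); einsum applies it per pixel.
    return np.einsum('oc,chw->ohw', weights, stacked)                # (O, H, W)

# Hypothetical per-class softmax scores from each stream for one clip.
spatial = [0.7, 0.2, 0.1]   # spatial (RGB appearance) stream
temporal = [0.5, 0.4, 0.1]  # temporal (optical-flow) stream
fused = average_fusion(spatial, temporal)
predicted_class = int(np.argmax(fused))  # class with highest fused score
```

In average fusion no parameters are learned at the fusion point, whereas the 1x1 weights of convolution fusion are trained, letting the network learn how to weight and combine channels from the two streams, which is consistent with convolution fusion scoring higher in the reported results.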
Cite this article
Varshney, N., Bakariya, B. Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams. Multimed Tools Appl 81, 42117–42129 (2022). https://doi.org/10.1007/s11042-021-11220-4