iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://unpaywall.org/10.1007/S11042-021-11220-4
Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams | Multimedia Tools and Applications

Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams

  • 1220: Visual and Sensory Data Processing for Real Time Intelligent Surveillance System
Multimedia Tools and Applications

Abstract

A video file is a sequence of images, and this image sequence holds both spatial and temporal information. Optical flow and motion history images are two well-known methods for identifying human activities. Optical flow describes the speed of every individual pixel in the picture. Still, this motion information cannot represent the complete action or different movement speeds. In the motion history image, local body-part movements of different durations show almost the same intensity. Therefore, similar actions are not identified with good precision. In this paper, a deep convolutional neural model for human activity recognition in video is proposed in which multiple CNN streams are combined. The model combines spatial and temporal information. Two fusion schemes, average fusion and convolution fusion of the spatial and temporal streams, are discussed. The proposed method performs better than other human activity recognition approaches on the benchmark datasets UCF101 and HMDB51. On UCF101, average fusion scores 95.4% test accuracy and convolution fusion scores 97.2%; on HMDB51, average fusion scores 84.3% and convolution fusion scores 85.1%.
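The two fusion schemes named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the network details, so everything here is an assumption: the function names, the tensor shapes, and the choice of a 1x1 convolution over channel-stacked feature maps for convolution fusion (a common formulation in two-stream work, not necessarily the authors' exact layer). Average fusion is shown as the mean of the per-class softmax scores of the two streams.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def average_fusion(spatial_logits, temporal_logits):
    """Late fusion: average the class probabilities of the two streams."""
    return 0.5 * (softmax(spatial_logits) + softmax(temporal_logits))

def conv_fusion(spatial_maps, temporal_maps, weights, bias):
    """Stack feature maps on the channel axis, then apply a 1x1 convolution.

    spatial_maps, temporal_maps: arrays of shape (H, W, C)
    weights: array of shape (2C, C_out); bias: array of shape (C_out,)
    A 1x1 convolution is a per-pixel linear map, so a matrix product suffices.
    """
    stacked = np.concatenate([spatial_maps, temporal_maps], axis=-1)  # (H, W, 2C)
    return stacked @ weights + bias                                   # (H, W, C_out)

# Hypothetical three-class scores from each stream.
fused = average_fusion(np.array([2.0, 1.0, 0.1]), np.array([0.5, 2.5, 0.3]))
predicted_class = int(fused.argmax())
```

In a trained model the `weights` and `bias` of the fusion layer would be learned jointly with the classifier, whereas average fusion has no trainable parameters.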



Author information

Corresponding author

Correspondence to Neeraj Varshney.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Varshney, N., Bakariya, B. Deep convolutional neural model for human activities recognition in a sequence of video by combining multiple CNN streams. Multimed Tools Appl 81, 42117–42129 (2022). https://doi.org/10.1007/s11042-021-11220-4

  • DOI: https://doi.org/10.1007/s11042-021-11220-4
