Abstract
Understanding the behavior of human motion in social environments is important for various domains of a smart city, e.g., smart transportation, automatic navigation of service robots, efficient navigation of autonomous cars, and surveillance systems. Examining past trajectories or environmental factors alone is not enough to address this problem. We propose a novel methodology to predict the future motion trajectories of humans based on the past attitude of individuals, crowd attitude, and environmental context. Many researchers have proposed techniques based on different feature extraction and feature fusion schemes to predict the future motion trajectory. They used algorithms such as SVM, social forces, probabilistic models, and LSTM to analyze the heuristic motion trajectories, but they did not consider other environmental factors that can affect human motion trajectories, e.g., the relative positions of other humans and of objects present in the environment. We intend to achieve this goal by employing Long Short-Term Memory (LSTM) units to analyze motion histories, convolutional neural networks (CNNs) to model environmental factors, e.g., human-human interaction, human-object interaction, and the relative positioning of 80 different objects including pedestrians, and generative adversarial networks (GANs) to predict possible future motion paths. Our proposed method achieved 70% lower Average Displacement Error (ADE) and 41% lower Final Displacement Error (FDE) in comparison to other state-of-the-art techniques.
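The two metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions used in trajectory prediction: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is the Euclidean distance at the final predicted time step. The function names and the 2D point representation are illustrative assumptions, not taken from the article.

```python
import math
from typing import Sequence, Tuple

# A trajectory is a sequence of (x, y) positions, one per time step.
# This representation is an assumption made for illustration.
Point = Tuple[float, float]


def _check(pred: Sequence[Point], truth: Sequence[Point]) -> None:
    """Reject empty or mismatched trajectories."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")


def average_displacement_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and true positions over all steps."""
    _check(pred, truth)
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def final_displacement_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Euclidean distance between the last predicted and last true position."""
    _check(pred, truth)
    return math.dist(pred[-1], truth[-1])
```

For a single pedestrian, both functions take the predicted and ground-truth paths over the same prediction horizon. Averaging the per-pedestrian values over a dataset gives the kind of aggregate figures that a statement such as "70% lower ADE" compares.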
Acknowledgments
Financial support for this study was provided by a grant from the National Center For Artificial Intelligence at University of Engineering and Technology, Lahore, Pakistan. The authors wish to thank Al-Khawarizimi Institute of Computer Science, UET Lahore, for providing a research platform and technical support.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hassan, M.A., Khan, M.U.G., Iqbal, R. et al. Predicting humans future motion trajectories in video streams using generative adversarial network. Multimed Tools Appl 83, 15289–15311 (2024). https://doi.org/10.1007/s11042-021-11457-z
DOI: https://doi.org/10.1007/s11042-021-11457-z