Multi-Target Tracking Based on a Combined Attention Mechanism and Occlusion Sensing in a Behavior-Analysis System
Abstract
1. Introduction
- (1) In complex environments, such as scenes with many occlusions, the algorithm's ability to keep tracking the same object consistently still needs to be improved.
- (2) Operational efficiency. Typical application scenarios of multi-target tracking require the algorithm to run at, or even beyond, real-time speed (about 20 FPS), which places particularly high running-speed demands on online algorithms; a simple way to check this requirement is shown in the timing sketch below.
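Whether a given tracker clears this bar can be checked with a plain wall-clock measurement over a test video. The following is a minimal, framework-agnostic sketch; the `tracker.update()` call and the OpenCV frame source are illustrative placeholders, not part of the proposed method.

```python
import time
import cv2  # assumed available; any frame source would work


def measure_fps(tracker, video_path, warmup=10):
    """Measure average tracking throughput (frames per second) on a video file."""
    cap = cv2.VideoCapture(video_path)
    n_frames, elapsed = 0, 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        start = time.perf_counter()
        tracker.update(frame)           # placeholder: one tracking step per frame
        dt = time.perf_counter() - start
        if n_frames >= warmup:          # skip warm-up frames (model loading, caches)
            elapsed += dt
        n_frames += 1
    cap.release()
    timed = max(n_frames - warmup, 1)
    return timed / max(elapsed, 1e-9)   # FPS over the timed portion

# Under this convention, a tracker is "real-time" if measure_fps(...) >= 20.
```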
2. Related Work
2.1. MOT Algorithm
2.1.1. Two-Step MOT Algorithm
2.1.2. One-Step MOT Algorithm
2.2. Attention Mechanism
2.3. Shortcomings or Research Gaps
- Robustness: Multi-object tracking algorithms still need to improve their robustness to external factors, such as lighting changes, occlusions, and motion blur.
- Long-term tracking: Long-term tracking involves cross-frame object re-identification and model updates, and there are still some challenges, such as model drift, occlusions, and motion blur.
- Object re-identification: Object re-identification is one of the key technologies of multi-object tracking; however, there are still certain issues, such as object deformations and viewpoint changes.
- Algorithm efficiency: Multi-object tracking algorithms usually need to process a large amount of data, and fast and efficient algorithms are needed for real-time applications.
3. Proposed Method
3.1. FairMOT Framework
3.1.1. DLAseg-Based Backbone
3.1.2. Object Detection Branch
3.1.3. Object Recognition Branch
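For readers unfamiliar with the FairMOT layout referred to in Sections 3.1.1–3.1.3, the sketch below shows schematically how a detection branch (center heatmap, box size, center offset) and a re-identification branch (per-pixel identity embedding) share a single backbone feature map, following the general design of [25]. It is a minimal PyTorch sketch, not the exact network of this paper; the channel width (64), embedding dimension (128), and stride-4 feature map are assumptions based on the original FairMOT configuration.

```python
import torch
import torch.nn as nn


def head(in_ch, out_ch, hidden=256):
    """A 3x3 conv + ReLU + 1x1 conv head, as commonly used on CenterNet-style outputs."""
    return nn.Sequential(
        nn.Conv2d(in_ch, hidden, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, out_ch, 1),
    )


class JointDetReIDHead(nn.Module):
    """Detection + re-ID branches on a shared stride-4 feature map (FairMOT-style layout)."""

    def __init__(self, in_ch=64, emb_dim=128, num_classes=1):
        super().__init__()
        self.heatmap = head(in_ch, num_classes)  # object-center heatmap
        self.size = head(in_ch, 2)               # box width/height
        self.offset = head(in_ch, 2)             # sub-pixel center offset
        self.reid = head(in_ch, emb_dim)         # per-pixel identity embedding

    def forward(self, feat):                     # feat: backbone output, (B, in_ch, H/4, W/4)
        return {
            "hm": torch.sigmoid(self.heatmap(feat)),
            "wh": self.size(feat),
            "reg": self.offset(feat),
            "id": self.reid(feat),
        }

# Usage: outputs = JointDetReIDHead()(backbone(image))
```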
3.2. Convolutional Attention Module
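The convolutional attention discussed here builds on CBAM [30], which sequentially applies channel attention and spatial attention to a feature map. Below is a minimal PyTorch sketch of a generic CBAM-style block for reference; the reduction ratio (16) and spatial kernel size (7) are the defaults from [30] and need not match the module actually used in this paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global average- and max-pooled descriptors."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """Spatial attention: 7x7 conv over channel-wise average- and max-pooled maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention, each applied multiplicatively."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```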
3.3. Occlusion-Sensing Module
3.4. Loss Function
3.4.1. Heatmap Loss Function
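Assuming the heatmap branch follows the penalty-reduced pixel-wise focal loss used by FairMOT [25], its standard form is given below, where \(\hat{Y}_{xy}\) is the predicted heatmap response at location \((x, y)\), \(Y_{xy}\) is the Gaussian-smoothed ground truth, \(N\) is the number of objects in the image, and \(\alpha = 2\), \(\beta = 4\) are the usual hyperparameters:

```latex
L_{\mathrm{heat}} = -\frac{1}{N}\sum_{xy}
\begin{cases}
\left(1-\hat{Y}_{xy}\right)^{\alpha}\log\hat{Y}_{xy}, & Y_{xy}=1,\\[4pt]
\left(1-Y_{xy}\right)^{\beta}\hat{Y}_{xy}^{\alpha}\log\left(1-\hat{Y}_{xy}\right), & \text{otherwise}.
\end{cases}
```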
3.4.2. Box Offset and Size Prediction Loss Function
3.4.3. Object Recognition Loss Function
3.4.4. Overall Loss Function
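If the overall objective follows FairMOT [25], the detection loss (heatmap plus box offset/size terms) and the re-identification loss are balanced by learnable uncertainty weights. The form below is the standard FairMOT formulation, stated here as an assumption rather than the exact objective of this paper; \(L_{\mathrm{off}}\) and \(L_{\mathrm{size}}\) are L1 losses on the center offset and box size, \(L_{\mathrm{id}}\) is the identity cross-entropy loss, and \(w_{1}\), \(w_{2}\) are learnable scalars:

```latex
L_{\mathrm{det}} = L_{\mathrm{heat}} + \lambda_{\mathrm{box}}\left(L_{\mathrm{off}} + L_{\mathrm{size}}\right),
\qquad
L_{\mathrm{total}} = \frac{1}{2}\left(e^{-w_{1}}L_{\mathrm{det}} + e^{-w_{2}}L_{\mathrm{id}} + w_{1} + w_{2}\right).
```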
4. Experiments and Analysis
4.1. Experimental Datasets
4.1.1. MOT Series Dataset
4.1.2. CrowdHuman Dataset
4.1.3. MIX Dataset
4.2. Evaluation Metrics
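The headline metrics in the result tables are the CLEAR MOT accuracy (MOTA) [40] and the identity F1 score (IDF1) [41]; MT/ML denote the fraction of ground-truth trajectories that are mostly tracked (covered for at least 80% of their length) or mostly lost (at most 20%), and IDs counts identity switches. The standard definitions of MOTA and IDF1 are:

```latex
\mathrm{MOTA} = 1-\frac{\sum_{t}\left(\mathrm{FN}_{t}+\mathrm{FP}_{t}+\mathrm{IDSW}_{t}\right)}{\sum_{t}\mathrm{GT}_{t}},
\qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP}+\mathrm{IDFP}+\mathrm{IDFN}},
```

where FN, FP, IDSW, and GT are the per-frame false negatives, false positives, identity switches, and ground-truth objects, and IDTP/IDFP/IDFN are identity-level true positives, false positives, and false negatives after optimal identity matching.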
4.3. Analysis of Experimental Results
4.3.1. Training Process and Results
4.3.2. Model Comparison Experiment
4.4. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ravì, D.; Wong, C.; Deligianni, F.; Berthelot, M.; Andreu-Perez, J.; Lo, B.; Yang, G.Z. Deep learning for health informatics. IEEE J. Biomed. Health Inform. 2016, 21, 4–21.
- Nasri, N.; López-Sastre, R.J.; Pacheco-da Costa, S.; Fernández-Munilla, I.; Gutiérrez-Álvarez, C.; Pousada-García, T.; Acevedo-Rodríguez, F.J.; Maldonado-Bascón, S. Assistive Robot with an AI-Based Application for the Reinforcement of Activities of Daily Living: Technical Validation with Users Affected by Neurodevelopmental Disorders. Appl. Sci. 2022, 12, 9566.
- Yu, J.; Gao, H.; Zhou, D.; Liu, J.; Gao, Q.; Ju, Z. Deep temporal model-based identity-aware hand detection for space human–robot interaction. IEEE Trans. Cybern. 2021, 52, 13738–13751.
- Huang, C.; Wu, Z.; Wen, J.; Xu, Y.; Jiang, Q.; Wang, Y. Abnormal event detection using deep contrastive learning for intelligent video surveillance system. IEEE Trans. Ind. Inform. 2021, 18, 5171–5179.
- Chen, J.; Li, K.; Deng, Q.; Li, K.; Yu, P.S. Distributed deep learning model for intelligent video surveillance systems with edge computing. IEEE Trans. Ind. Inform. 2019, 1–8.
- Qureshi, S.A.; Hussain, L.; Chaudhary, Q.u.a.; Abbas, S.R.; Khan, R.J.; Ali, A.; Al-Fuqaha, A. Kalman filtering and bipartite matching based super-chained tracker model for online multi object tracking in video sequences. Appl. Sci. 2022, 12, 9538.
- Shuai, B.; Berneshawi, A.; Li, X.; Modolo, D.; Tighe, J. SiamMOT: Siamese multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12372–12382.
- Li, M.; Gao, J.; Zhao, L.; Shen, X. Adaptive computing scheduling for edge-assisted autonomous driving. IEEE Trans. Veh. Technol. 2021, 70, 5318–5331.
- Gad, A.; Basmaji, T.; Yaghi, M.; Alheeh, H.; Alkhedher, M.; Ghazal, M. Multiple Object Tracking in Robotic Applications: Trends and Challenges. Appl. Sci. 2022, 12, 9408.
- Jin, X.; Zhang, J.; Kong, J.; Su, T.; Bai, Y. A reversible automatic selection normalization (RASN) deep network for predicting in the smart agriculture system. Agronomy 2022, 12, 591.
- Shadrin, D.; Menshchikov, A.; Somov, A.; Bornemann, G.; Hauslage, J.; Fedorov, M. Enabling precision agriculture through embedded sensing with artificial intelligence. IEEE Trans. Instrum. Meas. 2019, 69, 4103–4113.
- Qiu, J.; Yan, X.; Wang, W.; Wei, W.; Fang, K. Skeleton-Based Abnormal Behavior Detection Using Secure Partitioned Convolutional Neural Network Model. IEEE J. Biomed. Health Inform. 2021, 26, 5829–5840.
- Dawadi, P.N.; Cook, D.J.; Schmitter-Edgecombe, M. Automated cognitive health assessment from smart home-based behavior data. IEEE J. Biomed. Health Inform. 2015, 20, 1188–1194.
- Sivaraman, S.; Trivedi, M.M. Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1773–1795.
- Bochinski, E.; Eiselein, V.; Sikora, T. High-Speed tracking-by-detection without using image information. In Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Zhang, Y.; Sheng, H.; Wu, Y.; Wang, S.; Lyu, W.; Ke, W.; Xiong, Z. Long-term tracking with deep tracklet association. IEEE Trans. Image Process. 2020, 29, 6694–6706.
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.T.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing, ICIP 2016, Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
- Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation from Predicting 10,000 Classes. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; pp. 1891–1898.
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing, ICIP 2017, Beijing, China, 17–20 September 2017; pp. 3645–3649.
- Zhang, G.; Yin, J.; Deng, P.; Sun, Y.; Zhou, L.; Zhang, K. Achieving Adaptive Visual Multi-Object Tracking with Unscented Kalman Filter. Sensors 2022, 22, 9106.
- Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards Real-Time Multi-Object Tracking. In Proceedings of the Computer Vision-ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12356, pp. 107–122.
- Yoo, Y.S.; Lee, S.H.; Bae, S.H. Effective Multi-Object Tracking via Global Object Models and Object Constraint Learning. Sensors 2022, 22, 7943.
- Boragule, A.; Jang, H.; Ha, N.; Jeon, M. Pixel-Guided Association for Multi-Object Tracking. Sensors 2022, 22, 8922.
- Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking without bells and whistles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 941–951.
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6568–6577.
- Xu, Y.; Ban, Y.; Delorme, G.; Gan, C.; Rus, D.; Alameda-Pineda, X. TransCenter: Transformers with Dense Queries for Multiple-Object Tracking. arXiv 2021, arXiv:2103.15145.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial Transformer Networks. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015; Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R., Eds.; pp. 2017–2025.
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision-ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science, Part VII; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11211, pp. 3–19.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 3146–3154.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep Layer Aggregation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2403–2412.
- Zhou, X.; Jia, Y.; Bai, C.; Zhu, H.; Chan, S. Multi-object tracking based on attention networks for Smart City system. Sustain. Energy Technol. Assess. 2022, 52, 102216.
- Mahmoudi, N.; Ahadi, S.M.; Rahmati, M. Multi-target tracking using CNN-based features: CNNMTT. Multim. Tools Appl. 2019, 78, 7077–7096.
- Leal-Taixé, L.; Milan, A.; Reid, I.D.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv 2015, arXiv:1504.01942.
- Milan, A.; Leal-Taixé, L.; Reid, I.D.; Roth, S.; Schindler, K. MOT16: A Benchmark for Multi-Object Tracking. arXiv 2016, arXiv:1603.00831.
- Dendorfer, P.; Rezatofighi, H.; Milan, A.; Shi, J.; Cremers, D.; Reid, I.D.; Roth, S.; Schindler, K.; Leal-Taixé, L. MOT20: A benchmark for multi object tracking in crowded scenes. arXiv 2020, arXiv:2003.09003.
- Shao, S.; Zhao, Z.; Li, B.; Xiao, T.; Yu, G.; Zhang, X.; Sun, J. CrowdHuman: A Benchmark for Detecting Human in a Crowd. arXiv 2018, arXiv:1805.00123.
- Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. EURASIP J. Image Video Process. 2008, 2008, 246309.
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 17–35.
- Sanchez-Matilla, R.; Poiesi, F.; Cavallaro, A. Online Multi-target Tracking with Strong and Weak Detections. In Proceedings of the Computer Vision-ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 October 2016; Lecture Notes in Computer Science, Part II; Hua, G., Jégou, H., Eds.; Volume 9914, pp. 84–99.
- Wan, X.; Wang, J.; Kong, Z.; Zhao, Q.; Deng, S. Multi-Object Tracking Using Online Metric Learning with Long Short-Term Memory. In Proceedings of the 2018 IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, 7–10 October 2018; pp. 788–792.
- Pang, B.; Li, Y.; Zhang, Y.; Li, M.; Lu, C. TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 13–19 June 2020; pp. 6307–6317.
- Peng, J.; Wang, C.; Wan, F.; Wu, Y.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F.; Fu, Y. Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking. In Proceedings of the Computer Vision-ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science, Part IV; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12349, pp. 145–161.
- Sun, S.; Akhtar, N.; Song, H.; Mian, A.; Shah, M. Deep Affinity Network for Multiple Object Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 104–119.
- Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking Objects as Points. In Proceedings of the Computer Vision-ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science, Part IV; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12349, pp. 474–490.
| Method | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓ |
|---|---|---|---|---|---|
| FairMOT [25] | 61.00% | 65.30% | 66.80% | 7.60% | 5243 |
| Ours | 62.70% | 66.84% | 66.92% | 7.63% | 4806 |
| Method | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓ |
|---|---|---|---|---|---|
| EAMTT [42] | 52.5% | 53.3% | 19.9% | 34.9% | 910 |
| SORT [17] | 59.8% | 53.8% | 25.4% | 22.7% | 1423 |
| VMaxx [43] | 62.6% | 49.2% | 32.7% | 21.1% | 1389 |
| TubeTK [44] | 64.0% | 59.4% | 33.5% | 19.4% | 1117 |
| JDE [21] | 64.4% | 55.8% | 35.4% | 20.0% | 1544 |
| CNNMTT [35] | 65.2% | 62.2% | 32.4% | 21.3% | 946 |
| CTrackV1 [45] | 67.6% | 57.2% | 32.9% | 23.1% | 5529 |
| FairMOT [25] | 74.7% | 73.4% | 44.7% | 15.9% | 1074 |
| Ours | 74.8% | 74.7% | 41.5% | 19.0% | 819 |
| Method | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓ |
|---|---|---|---|---|---|
| SST [46] | 52.40% | 49.50% | 21.40% | 30.70% | 8431 |
| TubeTK [44] | 63.00% | 58.60% | 31.20% | 19.90% | 4137 |
| CTrackV1 [45] | 66.60% | 57.40% | 32.20% | 24.20% | 5529 |
| CenterTrack [47] | 67.80% | 64.70% | 34.60% | 24.60% | 2583 |
| FairMOT [25] | 73.10% | 72.70% | 41.10% | 19.00% | 2964 |
| Ours | 73.20% | 73.90% | 39.70% | 21.00% | 2553 |
| Method | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓ |
|---|---|---|---|---|---|
| Without OS | 61.21% | 64.70% | 66.63% | 8.60% | 5568 |
| With OS | 62.70% | 66.84% | 66.92% | 7.63% | 4806 |