


Link to original content: https://unpaywall.org/10.1007/S11042-023-17983-2
Exploring the State-of-the-Art in Multi-Object Tracking: A Comprehensive Survey, Evaluation, Challenges, and Future Directions

Multimedia Tools and Applications

Abstract

Multiple object tracking (MOT), a typical application scenario of computer vision, has attracted significant attention from both the academic and industrial communities. With its rapid development, MOT has become a hot topic. However, robust MOT in complex scenarios still faces significant challenges, such as irregular motion patterns, similar appearances, and frequent occlusions. Based on an extensive investigation of state-of-the-art MOT, this survey makes the following contributions: 1) listing preceding MOT approaches and current classifications; 2) surveying MOT metrics and benchmark databases; 3) evaluating frequently employed MOT approaches; 4) discussing the main challenges for MOT; and 5) proposing potential directions for the development of future MOT approaches. In doing so, it strives to provide a systematic and comprehensive overview of existing MOT methods from SDE to TBA perspectives, thereby promoting further research into this emerging and important field.


Data availability

All relevant data are within the paper.

References

  1. Seidenschwarz J, Brasó G, Serrano VC, Elezi I, Leal-Taixé L (2023) Simple cues lead to a strong multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13813–13823. https://doi.org/10.1109/CVPR52729.2023.01327

    Book  Google Scholar 

  2. Li S, Fischer T, Ke L, Ding H, Danelljan M, Yu F (2023) Ovtrack: Open vocabulary multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5567–5577. https://doi.org/10.1109/CVPR52729.2023.00539

    Book  Google Scholar 

  3. Wu D, Han W, Wang T, Dong X, Zhang X, Shen J (2023) Referring multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14633–14642. https://doi.org/10.1109/CVPR52729.2023.01406

    Book  Google Scholar 

  4. Meimetis D, Daramouskas I, Perikos I, Hatzilygeroudis I (2023) Real-time multiple object tracking using deep learning methods. Neural Comput Appl 35(1):89–118

    Google Scholar 

  5. Yin J, Wang W, Meng Q, Yang R, Shen J (2020) A unified object motion and affinity model for online multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6768–6777. https://doi.org/10.1109/CVPR42600.2020.00680

    Book  Google Scholar 

  6. Welch G, Bishop G (1995) An introduction to the kalman filter. In: Proceedings of international conference on computer graphics and interactive techniques, pp 1–16

    Google Scholar 

  7. Hu W, Li X, Luo W, Zhang X, Maybank S, Zhang Z (2012) Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model. IEEE Trans Pattern Anal Mach Intell 34(12):2420–2440

    Google Scholar 

  8. Zhang L, Van Der Maaten L (2013) Preserving structure in model-free tracking. IEEE Trans Pattern Anal Mach Intell 36(4):756–769

    Google Scholar 

  9. Morimitsu H, Bloch I, Cesar-Jr RM (2017) Exploring structure for long-term tracking of multiple objects in sports videos. Comput Vis Image Underst 159:89–104

    Google Scholar 

  10. Ošep A, Mehner W, Voigtlaender P, Leibe B (2018) Track, then decide: Category-agnostic vision-based multi-object tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 3494–3501. https://doi.org/10.1109/ICRA.2018.8460975

    Chapter  Google Scholar 

  11. Zhang L, Maaten L (2013) Structure preserving object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1838–1845. https://doi.org/10.1109/CVPR.2013.240

    Chapter  Google Scholar 

  12. Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003

    Chapter  Google Scholar 

  13. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962

    Chapter  Google Scholar 

  14. Cao J, Pang J, Weng X, Khirodkar R, Kitani K (2023) Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9686–9696. https://doi.org/10.1109/CVPR52729.2023.00934

    Book  Google Scholar 

  15. Meneses M, Matos L, Prado B, Carvalho A, Macedo H (2020) Learning to associate detections for real-time multiple object tracking. https://doi.org/10.48550/arXiv.2007.06041

  16. Aharon N, Orfaig R, Bobrovsky BZ (2022) Bot-sort: Robust associations multi-pedestrian tracking. Comput Vis Pattern Recognit. https://doi.org/10.48550/arXiv.2206.14651

  17. Du Y, Zhao Z, Song Y, Zhao Y, Su F, Gong T, Meng H (2023) Strongsort: Make deepsort great again. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2023.3240881

  18. Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, Luo P, Liu W, Wang X (2022) Bytetrack: Multi-object tracking by associating every detection box. In: Proceedings of the european conference on computer vision, pp 1–21. https://doi.org/10.48550/arXiv.2110.06864

    Chapter  Google Scholar 

  19. Ren H, Han S, Ding H, Zhang Z, Wang H, Wang F (2023) Focus on details: Online multi-object tracking with diverse fine-grained representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11289–11298. https://doi.org/10.1109/CVPR52729.2023.01086

    Book  Google Scholar 

  20. Kong J, Mo E, Jiang M, Liu T (2022) Motfr: Multiple object tracking based on feature recoding. IEEE Trans Circuits Syst Video Technol 32(11):7746–7757

    Google Scholar 

  21. Jiang M, Zhou C, Kong J (2022) Aoh: Online multiple object tracking with adaptive occlusion handling. IEEE Signal Process Lett 29:1644–1648

    Google Scholar 

  22. Li C, Dobler G, Feng X, Tracknet WY (2019) Tracknet: Simultaneous object detection and tracking and its application in traffic video analysis. https://doi.org/10.48550/arXiv.1902.01466

  23. Sun S, Akhtar N, Song H, Mian A, Shah M (2019) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43(1):104–119

    Google Scholar 

  24. Liang C, Zhang Z, Zhou X, Li B, Zhu S, Hu W (2022) Rethinking the competition between detection and reid in multiobject tracking. IEEE Trans Image Process 31:3182–3196

    Google Scholar 

  25. Chu P, Wang J, You Q, Ling H, Liu Z (2023) Transmot: Spatial-temporal graph transformer for multiple object tracking. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4870–4880. https://doi.org/10.1109/WACV56688.2023.00485

    Book  Google Scholar 

  26. Xu J, Cao Y, Zhang Z, Hu H (2019) Spatial-temporal relation networks for multi-object tracking. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3988–3998. https://doi.org/10.1109/ICCV.2019.00409

    Book  Google Scholar 

  27. Ciaparrone G, Sánchez FL, Tabik S, Troiano L, Tagliaferri R, Herrera F (2020) Deep learning in video multi-object tracking: A survey. Neurocomputing 381:61–88

    Google Scholar 

  28. Emami P, Pardalos PM, Elefteriadou L, Ranka S (2020) Machine learning methods for data association in multi-object tracking. ACM Computing Surveys (CSUR) 53(4):1–34

    Google Scholar 

  29. Rakai L, Song H, Sun S, Zhang W, Yang Y (2022) Data association in multiple object tracking: A survey of recent techniques. Expert Syst Appl 192:116300

    Google Scholar 

  30. Park Y, Dang LM, Lee S, Han D, Moon H (2021) Multiple object tracking in deep learning approaches: A survey. Electronics 10(19):2406

    Google Scholar 

  31. Camplani M, Paiement A, Mirmehdi M, Damen D, Hannuna S, Burghardt T, Tao L (2017) Multiple human tracking in rgbdepth data: A survey. IET Comput Vision 11(4):265–285

    Google Scholar 

  32. Luo W, Xing J, Milan A, Zhang X, Liu W, Kim TK (2021) Multiple object tracking: A literature review. Artif Intell 293:103448

    MathSciNet  Google Scholar 

  33. Cao ZQ, Sai B, Lu X (2020) Review of pedestrian tracking: Algorithms and applications. Acta Phys Sin 69(8):084203-1-084203-18

    Google Scholar 

  34. Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell 51:6400–6429

    Google Scholar 

  35. Sun P, Cao JK, Jiang Y, Yuan ZH, Bai S, Kitani K, Luo P (2022) DanceTrack: Multi-object tracking in uniform appearance and diverse motion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 20961–20970. https://doi.org/10.1109/CVPR52688.2022.02032

    Chapter  Google Scholar 

  36. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386

    Article  Google Scholar 

  37. Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In: Proceedings of the neural information processing systems, pp 2553–2561

    Google Scholar 

  38. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the international conference on learning representations

    Google Scholar 

  39. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

    Article  Google Scholar 

  40. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969. https://doi.org/10.1109/ICCV.2017.322

    Book  Google Scholar 

  41. Sun J, Chen L, Xie Y, Zhang S, Jiang Q, Zhou X, Bao H (2020) Disp R-CNN: Stereo 3d object detection via shape prior guided instance disparity estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10548–10557. https://doi.org/10.1109/CVPR42600.2020.01056

    Book  Google Scholar 

  42. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg A.C (2016) Ssd: Single shot multibox detector. In: Proceedings of the european conference on computer vision, pp 21–37. https://doi.org/10.1007/978-3-319-46448-0_2

  43. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 99:2999–3007

    Google Scholar 

  44. Wang CY, Bochkovskiy A, Liao HYM (2023) Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721

    Book  Google Scholar 

  45. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: Proceedings of the european conference on computer vision (ECCV), pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13

    Chapter  Google Scholar 

  46. Gupta A, Narayan S, Joseph KJ, Khan S, Khan FS, Shah M (2022) Ow-detr: Open-world detection transformer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9225–9234. https://doi.org/10.1109/CVPR52688.2022.00902

    Book  Google Scholar 

  47. Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: Deformable transformers for end-to-end object detection. https://doi.org/10.48550/arXiv.2010.04159

  48. Sun P, Tan M, Wang W, Liu C, Xia F, Leng Z, Anguelov D (2022) Swformer: Sparse window transformer for 3d object detection in point clouds. In: Proceedings of the European conference on computer vision, pp 426–442. https://doi.org/10.1007/978-3-031-20080-9_25

    Book  Google Scholar 

  49. Wang X, Doretto G, Sebastian T, Rittscher J, Tu P (2007) Shape and appearance context modeling. In: Proceedings of the IEEE 11th international conference on computer vision, pp 1–8. https://doi.org/10.1109/ICCV.2007.4409019

  50. Farenzena M, Bazzani L, Perina A, Murino V, Cristani M (2010) Person re-identification by symmetry-driven accumulation of local features. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2360–2367. https://doi.org/10.1109/CVPR.2010.5539926

    Book  Google Scholar 

  51. Zhao R, Ouyang W, Wang X (2013) Unsupervised salience learning for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3586–3593. https://doi.org/10.1109/CVPR.2013.460

    Book  Google Scholar 

  52. Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2197–2206. https://doi.org/10.1109/CVPR.2015.7298832

    Book  Google Scholar 

  53. Zhang Y, Wang C, Wang X, Zeng W, Liu W (2021) Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int J Comput Vision 129:3069–3087

    Google Scholar 

  54. Xiao T, Li S, Wang B, Lin WX (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3415–3424. https://doi.org/10.1109/CVPR.2017.360

    Book  Google Scholar 

  55. Liu H, Feng J, Qi M, Jiang J, Yan S (2017) End-to-end comparative attention networks for person re-identification. IEEE Trans Image Process 26(7):3492–3506

    MathSciNet  Google Scholar 

  56. Chang X, Huang PY, Shen YD, Liang X, Yang Y, Hauptmann AG (2018) Rcaa: Relational context-aware agents for person search. In: Proceedings of the European conference on computer vision (ECCV), pp 84–100. https://doi.org/10.1007/978-3-030-01240-3_6

  57. Wang Z, Zheng L, Liu Y, Li Y, Wang S (2020) Towards real-time multi-object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 107–122. https://doi.org/10.1007/978-3-030-58621-8_7

  58. Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14668–14678. https://doi.org/10.1109/CVPR42600.2020.01468

    Book  Google Scholar 

  59. Chen D, Zhang S, Yang J, Schiele B (2021) Norm-aware embedding for efficient person search and tracking. Int J Comput Vision 129:3154–3168

    Google Scholar 

  60. Yoon JH, Lee CR, Yang MH, Yoon KJ (2016) Online multi-object tracking via structural constraint event aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1392–1400. https://doi.org/10.1109/CVPR.2016.155

    Book  Google Scholar 

  61. Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: Proceedings of the 14th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–6. https://doi.org/10.1109/avss.2017.8078516

  62. Zhou H, Ouyang W, Cheng J, Wang X, Li H (2018) Deep continuous conditional random fields with asymmetric inter-object constraints for online multi-object tracking. IEEE Trans Circuits Syst Video Technol 29(4):1011–1022

    Google Scholar 

  63. Shan C, Wei C, Deng B, Huang J, Hua XS, Cheng X, Liang K (2020) Tracklets predicting based adaptive graph tracking. https://doi.org/10.48550/arXiv.2010.09015

  64. Girbau A, Giró-i-Nieto X, Rius I, Marqués F (2021) Multiple object tracking with mixture density networks for trajectory estimation. https://doi.org/10.48550/arXiv:2106.10950

  65. Peng J, Wang C, Wan F, Wu Y, Wang Y, Tai Y, Wang C, Li J, Huang F, Fu Y (2020) Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 145–161. https://doi.org/10.1007/978-3-030-58548-8_9

  66. Pang B, Li Y, Zhang Y, Li LC (2020) Tubetk: Adopting tubes to track multi-object in a one-step training model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6308–6318. https://doi.org/10.1109/CVPR42600.2020.00634

    Book  Google Scholar 

  67. Han S, Huang P, Wang H, Yu E, Liu D, Pan X (2022) Mat: Motion-aware multi-object tracking. Neurocomputing 476:75–86

    Google Scholar 

  68. Bergmann P, Meinhardt T, Leal-Taixe L (2019) Tracking without bells and whistles. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 941–951. https://doi.org/10.1109/ICCV.2019.00103

    Book  Google Scholar 

  69. Yu E, Li Z, Han S, Wang H (2022) Relationtrack: Relation-aware multiple object tracking with decoupled representation. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2022.3150169

  70. Liang C, Zhang Z, Zhou X, Li B, Lu Y (2022) One more check: Making “fake background” be tracked again. In: Proceedings of the AAAI conference on artificial intelligence, pp 1546–1554. https://doi.org/10.1609/aaai.v36i2.20045

    Book  Google Scholar 

  71. Liu Q, Chen D, Chu Q, Yuan L, Liu B, Zhang L, Yu N (2022) Online multi-object tracking with unsupervised re-identification learning and occlusion estimation. Neurocomputing 483:333–347

    Google Scholar 

  72. Cui YM, Yan LQ, Cao ZW, Liu DF (2021) TF-Blender: Temporal feature blender for video object detection. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 8118–8127. https://doi.org/10.1109/ICCV48922.2021.00803

  73. Liu DF, Cui YM, Chen YJ, Zhang JY, Fan B (2020) Video object detection for autonomous driving: Motion-aid feature calibration. Neurocomputing 409:1–11

    Google Scholar 

  74. Sheng H, Zhang Y, Wu YB, Wang S, Lyu WF, Ke W, Xiong Z (2020) Hypothesis testing based tracking with spatio-temporal joint interaction modeling. IEEE Trans Circuits Syst Video Technol 30(9):2971–2983

    Google Scholar 

  75. Wang S, Sheng H, Zhang Y, Wu YB, Xiong Z (2021) A general recurrent tracking framework without real data. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 13219–13228. https://doi.org/10.1109/ICCV48922.2021.01297

    Chapter  Google Scholar 

  76. Wu H, Nie JH, Zhu ZM, He ZW, Gao MY (2022) Leveraging temporal-aware FNE-grained features for robust multiple object tracking. J Supercomput 79:2910–2931

    Google Scholar 

  77. Lang C, Braun A, Schillingmann L, Valada A (2023) Self-supervised multi-object tracking for autonomous driving from consistency across timescales. IEEE Robot Autom Lett 8(11):7711–7718

    Google Scholar 

  78. Zhou TF, Li JW, Li XY, Shao L (2021) Target-aware object discovery and association for unsupervised video multi-object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6985–6994. https://doi.org/10.1109/CVPR46437.2021.00691

  79. Peng JL, Wang T, Lin WY, Wang J, See J, Wen SL, Ding E (2020) TPM: Multiple object tracking with tracklet-plane matching. Pattern Recogn 107:107480

    Google Scholar 

  80. Mhalla A, Chateau T (2019) Improving multi-object tracking-by-detection model using a temporal interlaced encoding and a specialized deep detector. In: Proceedings of the IEEE intelligent vehicles symposium, pp 510–516. https://doi.org/10.1109/IVS.2019.8814102

    Book  Google Scholar 

  81. Zhao SY, Wu YB, Wang S, Ke W, Sheng H (2022) Mask guided spatial-temporal fusion network for multiple object tracking. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3231–3235. https://doi.org/10.1109/ICIP46576.2022.9898054

    Chapter  Google Scholar 

  82. Zhang JJ, Wang MY, Jiang HR, Zhang XY, Yan CG, Zeng D (2023) STAT: Multi-object tracking based on spatio-temporal topological constraints. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2023.3323852

  83. You SS, Yao HT, Xu CS (2022) Multi-object tracking with spatial-temporal topology-based detector. IEEE Trans Circuits Syst Video Technol 32(5):3023–3035

    Google Scholar 

  84. Pang ZQ, Li J, Tokmakov P, Chen D, Zagoruyko S, Wang YX (2023) Standing between past and future spatio-temporal modeling for multi-camera 3D multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 17928–17938. https://doi.org/10.1109/CVPR52729.2023.01719

  85. Wang YX, Kitani K, Weng XS (2021) Joint object detection and multi-object tracking with graph neural networks. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 13708–13715. https://doi.org/10.1109/ICRA48506.2021.9561110

  86. Wang SK, Sun YX, Wang Z, Liu M (2024) ST-TrackNet: A multiple-object tracking network using spatio-temporal information. IEEE Trans Autom Sci Eng 21(1):284–295. https://doi.org/10.1109/TASE.2022.3216450

    Article  Google Scholar 

  87. Zhu TY, Hiller M, Ehsanpour M, Ma RK, Drummond T, Rezatofighi H (2021) Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE Trans Pattern Anal Mach Intell 45:12783–12797

    Google Scholar 

  88. Hu MJ, Zhu XT, Wang HT, Cao SX, Liu C, Song Q (2023) STDFormer: Spatial-temporal motion transformer for multiple object tracking. IEEE Trans Circuits Syst Video Technol 33(11):6571–6594

    Google Scholar 

  89. Yang M, Wu Y, Jia Y (2017) A hybrid data association framework for robust online multi-object tracking. IEEE Trans Image Process 26(12):5667–5679

    MathSciNet  Google Scholar 

  90. Yang M, Jia Y (2016) Temporal dynamic appearance modeling for online multi-person tracking. Comput Vis Image Underst 153:16–28

    Google Scholar 

  91. Guo S, Wang J, Wang X, Tao D (2021) Online multiple object tracking with cross-task synergy. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8136–8145. https://doi.org/10.1109/CVPR46437.2021.00804

    Chapter  Google Scholar 

  92. Xu Y, Osep A, Ban Y, Horaud R, LealTaixé L, Alameda-Pineda X (2020) How to train your deep multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6787–6796. https://doi.org/10.1109/CVPR42600.2020.00682

    Book  Google Scholar 

  93. Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE international conference on computer vision, pp 300–311. https://doi.org/10.1109/ICCV.2017.41

    Book  Google Scholar 

  94. Rezatofighi SH, Milan A, Zhang Z, Shi Q, Dick A, Reid I (2015) Joint probabilistic data association revisited. In: Proceedings of the IEEE international conference on computer vision, pp 3047–3055. https://doi.org/10.1109/ICCV.2015.349

    Book  Google Scholar 

  95. Benfold B, Reid I (2011) Stable multi-target tracking in real-time surveillance video. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3457–3464. https://doi.org/10.1109/CVPR.2011.5995667

    Book  Google Scholar 

  96. Kim C, Li F, Ciptadi A, Rehg JM (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704. https://doi.org/10.1109/ICCV.2015.533

    Book  Google Scholar 

  97. Brasó G, Leal-Taixé L (2020) Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6247–6257. https://doi.org/10.1109/CVPR42600.2020.00628

    Book  Google Scholar 

  98. Gori M, Monfardini G, Scarselli F (2005) A new model for learning in graph domains. In: Proceedings of 2005 IEEE international joint conference on neural networks, pp 729–734. https://doi.org/10.1109/IJCNN.2005.1555942

  99. Zhang L, Li Y, Nevatia R (2008) Global data association for multi-object tracking using network flows. In: Proceedings of 2008 IEEE conference on computer vision and pattern recognition, pp 1–8. https://doi.org/10.1109/CVPR.2008.4587584

  100. Chari V, Lacoste-Julien S, Laptev I, Sivic J (2015) On pairwise costs for network flow multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5537–5545. https://doi.org/10.1109/CVPR.2015.7299193

    Book  Google Scholar 

  101. Butt AA, Collins RT (2013) Multi-target tracking by lagrangian relaxation to mincost network flow. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 1846–1853. https://doi.org/10.1109/CVPR.2013.241

  102. Berclaz J, Fleuret F, Turetken E, Fua P (2011) Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 33(9):1806–1819

    Google Scholar 

  103. Jiang H, Fels S, Little JJ (2007) A linear programming approach for multiple object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8. https://doi.org/10.1109/CVPR.2007.383180

    Book  Google Scholar 

  104. Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1201–1208. https://doi.org/10.1109/CVPR.2011.5995604

    Book  Google Scholar 

  105. Roshan Zamir A, Dehghan A, Shah M (2012) Gmcp-tracker: Global multi-object tracking using generalized minimum clique graphs. In: Proceedings of the European conference on computer vision (ECCV), pp 343–356. https://doi.org/10.1007/978-3-642-33709-3_25

  106. Wang B, Wang G, Chan KL, Wang L (2016) Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans Pattern Anal Mach Intell 39(3):589–602

    Google Scholar 

  107. Xiang J, Xu G, Ma C, Hou J (2020) End-to-end learning deep crf models for multi-object tracking deep crf models. IEEE Trans Circuits Syst Video Technol 31(1):275–288

    Google Scholar 

  108. Brendel W, Amer M, Todorovic S (2011) Multiobject tracking as maximum weight independent set. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1273–1280. https://doi.org/10.1109/CVPR.2011.5995395

    Book  Google Scholar 

  109. Wang T, Chen K, Lin W, See J, Zhang Z, Xu Q, Jia X (2023) Spatio-temporal point process for multiple object tracking. IEEE Trans Neural Netw Learn Syst 34(4):1777–1788. https://doi.org/10.1109/TNNLS.2020.2997006

    Article  Google Scholar 

  110. Peng J, Gu Y, Wang Y, Wang C, Li J, Huang F (2020) Dense scene multiple object tracking with box-plane matching. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 4615–4619. https://doi.org/10.1145/3394171.3416283

  111. Ren W, Wang X, Tian J, Tang Y, Chan AB (2020) Tracking-by-counting: Using network flows on crowd density maps for tracking multiple targets. IEEE Trans Image Process 30:1439–1452

    MathSciNet  Google Scholar 

  112. He Y, Wei X, Hong X, Ke W, Gong Y (2022) Identity-quantity harmonic multi-object tracking. IEEE Trans Image Process 31:2201–2215

    Google Scholar 

  113. Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) Poi: Multiple object tracking with high performance detection and appearance feature. In: Proceedings of the European conference on computer vision (ECCV), pp 36–42. https://doi.org/10.1007/978-3-319-48881-3_3

  114. Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 466–475. https://doi.org/10.1109/WACV.2018.00057

  115. Zhou Z, Xing J, Zhang M, Hu W (2018) Online multi-target tracking with tensor-based high-order graph matching. In: Proceedings of the 24th international conference on pattern recognition (ICPR), pp 1809–1814. https://doi.org/10.1109/ICPR.2018.8545450

  116. Mahmoudi N, Ahadi SM, Rahmati M (2019) Multi-target tracking using CNN-based features: CNNMTT. Multimed Tools Appl 78:7077–7096

    Google Scholar 

  117. Baisa NL (2021) Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification. J Vis Commun Image Represent 80:103279

    Google Scholar 

  118. Yan LQ, Wang QF, Ma SQ, Wang JG, Yu CB (2022) Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-temporal collaboration. IEEE Trans Circuits Syst Video Technol 33:393–406

    Google Scholar 

  119. Liu DF, Cui YM, Yan LQ, Mousas C, Yang B, Chen YJ (2021) Densernet: Weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI conference on artificial intelligence, pp 6101–6109. https://doi.org/10.1609/aaai.v35i7.16760

    Book  Google Scholar 

  120. Bastani F, He ST, Madden S (2021) Self-supervised multi-object tracking with cross-input consistency. Adv Neural Inf Process Syst 34:13695–13706

    Google Scholar 

  121. Su C, Zhang SL, Xing JL, Gao W, Tian Q (2016) Deep attributes driven multi-camera person re-identification. In: Proceedings of the European conference on computer vision (ECCV), pp 475–491. https://doi.org/10.1007/978-3-319-46475-6_30

  122. Huang K, Lertniphonphan K, Chen F, Li J, Wang ZP (2023) Multi-object tracking by self-supervised learning appearance model. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 3163–3169. https://doi.org/10.1109/CVPRW59228.2023.00318

  123. Engilberge M, Liu WZ, Fua P (2023) Multi-view tracking using weakly supervised human motion prediction. In: Proceedings of the IEEE Winter conference on applications of computer vision (WACV), pp 1582–1592. https://doi.org/10.1109/WACV56688.2023.00163

  124. Cucchiara R, Fabbri M (2022) Fine-grained human analysis under occlusions and perspective constraints in multimedia surveillance. ACM Trans Multimed Comput Commun Appl (TOMM) 18:1–23. https://doi.org/10.1145/3476839

    Article  Google Scholar 

  125. Kieritz H, Hubner W, Arens M (2018) Joint detection and online multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1459–1467. https://doi.org/10.1109/CVPRW.2018.00195

    Book  Google Scholar 

  126. Shuai B, Berneshawi A, Wang M, Liu C, Modolo D, Li X, Tighe J (2020) Application of multi-object tracking with siamese track-RCNN to the human in events dataset. In: Proceedings of the 28th ACM international conference on multimedia, pp 4625–4629. https://doi.org/10.1145/3394171.3416297

  127. Liu K, Jin S, Fu ZH, Chen Z, Jiang RX, Ye JP (2023) Uncertainty-aware unsupervised multi-object tracking. In: Proceedings of the IEEE International conference on computer vision, pp 9962–9971. https://doi.org/10.1109/ICCV51070.2023.00917

  128. Li YL, Lu Y, Li J, Wang HZ (2023) Learning to reconnect interrupted trajectories for weakly supervised multi-object tracking. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095463

  129. Ruiz I, Porzi L, Bulò SR, Kontschieder P, Serrat J (2021) Weakly supervised multi-object tracking and segmentation. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 125–133. https://doi.org/10.1109/WACVW52041.2021.00018

  130. Chu P, Ling H (2019) Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6172–6181. https://doi.org/10.1109/ICCV.2019.00627

  131. Shuai B, Berneshawi AG, Li XY, Modolo D, Tighe J (2021) SiamMOT: Siamese multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 12372–12382. https://doi.org/10.1109/CVPR46437.2021.01219

  132. Pang JM, Qiu LL, Li X, Chen HF, Li Q, Darrell T, Yu F (2021) Quasi-dense similarity learning for multiple object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 164–173. https://doi.org/10.1109/CVPR46437.2021.00023

  133. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: Proceedings of the European conference on computer vision, pp 850–865. https://doi.org/10.1007/978-3-319-48881-3_56

  134. Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1420–1429. https://doi.org/10.1109/CVPR.2016.158

  135. Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971–8980. https://doi.org/10.1109/CVPR.2018.00935

  136. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4282–4291. https://doi.org/10.1109/CVPR.2019.00441

  137. Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. In: Proceedings of the European conference on computer vision (ECCV), pp 474–490. https://doi.org/10.1007/978-3-030-58548-8_28

  138. Silva D, Alemu LT, Shah M (2020) CL-MOT: A contrastive learning framework for multi-object tracking. In: Proceedings of the British machine vision conference (BMVC), pp 1–13.

  139. Chung T, Cho M, Lee H, Lee S (2022) SSAT: Self-supervised associating network for multiobject tracking. IEEE Trans Circuits Syst Video Technol 32(11):7858–7868

  140. Kim S, Lee J, Ko BC (2022) SSL-MOT: Self-supervised learning based multi-object tracking. Appl Intell 53:930–940

  141. Wang Q, Zheng Y, Pan P, Xu Y (2021) Multiple object tracking with correlation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3876–3886. https://doi.org/10.1109/CVPR46437.2021.00387

  142. Tokmakov P, Li J, Burgard W, Gaidon A (2021) Learning to track with object permanence. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10860–10869. https://doi.org/10.1109/ICCV48922.2021.01068

  143. Wang G, Wang Y, Gu R, Hu W, Hwang JN (2022) Split and connect: A universal tracklet booster for multi-object tracking. IEEE Trans Multimed 25:1256–1268. https://doi.org/10.1109/TMM.2022.3140919

  144. Yang M, Liu S, Chen K, Zhang H, Zhao E, Zhao T (2020) A hierarchical clustering approach to fuzzy semantic representation of rare words in neural machine translation. IEEE Trans Fuzzy Syst 28(5):992–1002

  145. Sun P, Cao J, Jiang Y, Zhang R, Xie E, Yuan Z, Wang C, Luo P (2020) Transtrack: Multiple object tracking with transformer. https://doi.org/10.48550/arXiv.2012.15460

  146. Meinhardt T, Kirillov A, Leal-Taixe L, Feichtenhofer C (2022) Trackformer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8844–8854. https://doi.org/10.1109/CVPR52688.2022.00864

  147. Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-Pineda X (2021) Transcenter: Transformers with dense queries for multiple-object tracking. https://doi.org/10.48550/arXiv.2103.1514

  148. Zeng F, Dong B, Zhang Y, Wang T, Zhang X, Wei Y (2022) Motr: End-to-end multiple-object tracking with transformer. In: Proceedings of the European conference on computer vision (ECCV), pp 659–675. https://doi.org/10.1007/978-3-031-19812-0_38

  149. Chen X, Iranmanesh SM, Lien KC (2022) Patchtrack: Multiple object tracking using frame patches. https://doi.org/10.48550/arXiv.2201.00080

  150. Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking. https://doi.org/10.48550/arXiv.1504.01942

  151. Yang B, Yan J, Lei Z, Li SZ (2014) Aggregate channel features for multi-view face detection. In: Proceedings of the IEEE international joint conference on biometrics, pp 1–8. https://doi.org/10.1109/BTAS.2014.6996284

  152. Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) Mot16: A benchmark for multi-object tracking. https://doi.org/10.48550/arXiv.1603.00831

  153. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

  154. Dendorfer P, Osep A, Milan A, Schindler K, Cremers D, Reid I, Roth S, Leal-Taixé L (2021) Motchallenge: A benchmark for single-camera multiple target tracking. Int J Comput Vision 129:845–881

  155. Yang F, Choi W, Lin Y (2016) Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2129–2137. https://doi.org/10.1109/CVPR.2016.234

  156. Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020) Mot20: A benchmark for multi object tracking in crowded scenes. https://doi.org/10.48550/arXiv.2003.09003

  157. Cheng ZY, Liang J, Tao GH, Liu DF, Zhang XY (2023) Adversarial training of self-supervised monocular depth estimation against physical-world attacks. Comput Vis Pattern Recognit. https://doi.org/10.48550/arXiv.2301.13487

  158. Qin ZY, Lu XK, Liu DF, Nie XS, Yin YL, Shen JB, Loui AC (2023) Reformulating graph kernels for self-supervised space-time correspondence learning. IEEE Trans Image Process 32:6543–6557

  159. Wang WG, Han C, Zhou TF, Liu DF (2022) Visual recognition with deep nearest centroids. In: Proceedings of the international conference on learning representations (ICLR), pp 1–30

  160. Qin ZY, Lu XK, Nie XS, Liu DF, Yin YL, Wang WG (2023) Coarse-to-fine video instance segmentation with factorized conditional appearance flows. IEEE/CAA J Autom Sin 10:1192–1208

  161. Liu DF, Liang J, Geng T, Loui AC, Zhou TF (2023) Tripartite feature enhanced pyramid network for dense prediction. IEEE Trans Image Process 32:2678–2692

  162. Zhu P, Wen L, Du D, Bian X, Hu Q, Ling H (2020) Vision meets drones: Past, present and future. https://doi.org/10.48550/arXiv.2001.06303

  163. Du D, Qi Y, Yu H, Yang Y, Duan K, Li G, Zhang W, Huang Q, Tian Q (2018) The unmanned aerial vehicle benchmark: Object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 370–386. https://doi.org/10.1007/978-3-030-01249-6_23

  164. Dave A, Khurana T, Tokmakov P, Schmid C, Ramanan D (2020) Tao: A large-scale benchmark for tracking any object. In: Proceedings of the European conference on computer vision (ECCV), pp 436–454. https://doi.org/10.1007/978-3-030-58558-7_26

  165. Gupta A, Dollar P, Girshick R (2019) Lvis: A dataset for large vocabulary instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5356–5364. https://doi.org/10.1109/CVPR.2019.00550

  166. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The kitti vision benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074

  167. Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2636–2645. https://doi.org/10.1109/CVPR42600.2020.00271

  168. Wen L, Du D, Cai Z, Lei Z, Chang MC, Qi H, Lim J, Yang MH, Lyu S (2020) UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput Vis Image Underst 193:102907

  169. Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, Zhou Y, Chai Y, Caine B, Vasudevan V, Han W, Ngiam J, Zhao H, Timofeev A, Ettinger S, Krivokon M, Gao A, Joshi A, Anguelov D (2020) Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2443–2451. https://doi.org/10.1109/CVPR42600.2020.00252

  170. Lin W, Liu H, Liu S, Li Y, Qian R, Wang T, Xu N, Xiong H, Qi GJ, Sebe N (2020) Human in events: A large-scale benchmark for human-centric video analysis in complex events. https://doi.org/10.48550/arXiv.2005.04490

  171. Athar A, Luiten J, Voigtlaender P, Khurana T, Dave A, Leibe B, Ramanan D (2023) Burst: A benchmark for unifying object recognition, segmentation and tracking in video. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1674–1683. https://doi.org/10.1109/WACV56688.2023.00172

  172. Voigtlaender P, Luo L, Yuan C, Jiang Y, Leibe B (2021) Reducing the annotation effort for video object segmentation datasets. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3060–3069. https://doi.org/10.1109/WACV48630.2021.00310

  173. Sundararaman R, De Almeida BC, Marchand E, Pettre J (2021) Tracking pedestrian heads in dense crowd. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3865–3875. https://doi.org/10.1109/CVPR46437.2021.00386

  174. Weber M, Xie J, Collins M, Zhu Y, Voigtlaender P, Adam H, Green B, Geiger A, Leibe B, Cremers D, Osep A, Leal-Taixé L, Chen LC (2021) Step: Segmenting and tracking every pixel. https://doi.org/10.48550/arXiv.2102.11859

  175. Fabbri M, Brasó G, Maugeri G, Cetintas O, Gasparini R, Ošep A, Calderara S, Leal-Taixé L, Cucchiara R (2021) Motsynth: How can synthetic data help pedestrian detection and tracking? In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10849–10859. https://doi.org/10.1109/ICCV48922.2021.01067

  176. Pedersen M, Haurum JB, Bengtson SH, Moeslund TB (2020) 3d-zef: A 3d zebrafish tracking benchmark dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2426–2436. https://doi.org/10.1109/CVPR42600.2020.00250

  177. Anjum S, Gurari D (2020) Ctmc: Cell tracking with mitosis detection dataset challenge. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 982–983. https://doi.org/10.1109/CVPRW50498.2020.00499

  178. Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: Multi-object tracking and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7942–7951. https://doi.org/10.1109/CVPR.2019.00813

  179. Andriluka M, Roth S, Schiele B (2010) Monocular 3d pose estimation and tracking by detection. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 623–630. https://doi.org/10.1109/CVPR.2010.5540156

  180. Ferryman J, Shahrokni A (2009) Pets2009: Dataset and challenge. In: Proceedings of the twelfth IEEE International workshop on performance evaluation of tracking and surveillance, pp 1–6. https://doi.org/10.1109/PETS-WINTER.2009.5399556

  181. Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Vid Process 2008:1–10

  182. Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taixé L, Leibe B (2021) Hota: A higher order metric for evaluating multi-object tracking. Int J Comput Vision 129:548–578

  183. Wu Y, Sheng H, Zhang Y, Wang S, Xiong Z, Ke W (2022) Hybrid motion model for multiple object tracking in mobile devices. IEEE Internet Things J 10(6):4735–4748. https://doi.org/10.1109/JIOT.2022.3219627

  184. Hornakova A, Kaiser T, Swoboda P, Rolinek M, Rosenhahn B, Henschel R (2021) Making higher order mot scalable: An efficient approximate solver for lifted disjoint paths. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6330–6340. https://doi.org/10.1109/ICCV48922.2021.00627

  185. Zhang J, Zhou S, Chang X, Wan F, Wang J, Wu Y, Huang D (2020) Multiple object tracking by flowing and fusing. https://doi.org/10.48550/arXiv.2001.11180

  186. Zhang Y, Sheng H, Wu Y, Wang S, Ke W, Xiong Z (2020) Multiplex labeling graph for near-online tracking in crowded scenes. IEEE Internet Things J 7(9):7892–7902

  187. Chen L, Ai H, Zhuang Z, Shang C (2018) Real-time multiple people tracking with deeply learned candidate selection and person reidentification. In: Proceedings of 2018 IEEE international conference on multimedia and expo (ICME), pp 1–6. https://doi.org/10.1109/ICME.2018.8486597

  188. Son J, Baek M, Cho M, Han B (2017) Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5620–5629. https://doi.org/10.1109/CVPR.2017.403

  189. Chen J, Sheng H, Zhang Y, Xiong Z (2017) Enhancing detection model for multiple hypothesis tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 18–27. https://doi.org/10.1109/CVPRW.2017.266

Funding

This work was supported in part by the Natural Science Foundation of China under Grant 61671192, in part by the National Science Foundation for Post-Doctoral Scientists of China under Grant 2017M114, and in part by the Top-Ranking Discipline a Class of Electronics Science and Technology in Zhejiang Province, China.

Author information

Corresponding author

Correspondence to Yingbiao Yao.

Ethics declarations

Conflicts of Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Du, C., Lin, C., Jin, R. et al. Exploring the State-of-the-Art in Multi-Object Tracking: A Comprehensive Survey, Evaluation, Challenges, and Future Directions. Multimed Tools Appl 83, 73151–73189 (2024). https://doi.org/10.1007/s11042-023-17983-2

  • DOI: https://doi.org/10.1007/s11042-023-17983-2
