Abstract
Multiple object tracking (MOT), a typical application scenario of computer vision, has attracted significant attention from both academic and industrial communities. With its rapid development, MOT has become a hot topic. However, maintaining robust MOT in complex scenarios still faces significant challenges, such as irregular motion patterns, similar appearances, and frequent occlusions. Based on an extensive investigation of state-of-the-art MOT, this survey makes the following contributions: 1) listing preceding MOT approaches and current classifications; 2) surveying MOT metrics and benchmark databases; 3) evaluating frequently employed MOT approaches; 4) discussing the main challenges for MOT; and 5) putting forward potential directions for the development of future MOT approaches. In doing so, it strives to provide a systematic and comprehensive overview of existing MOT methods from SDE to TBA perspectives, thereby promoting further research into this emerging and important field.
Data availability
All relevant data are within the paper.
References
Seidenschwarz J, Brasó G, Serrano VC, Elezi I, Leal-Taixé L (2023) Simple cues lead to a strong multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13813–13823. https://doi.org/10.1109/CVPR52729.2023.01327
Li S, Fischer T, Ke L, Ding H, Danelljan M, Yu F (2023) Ovtrack: Open vocabulary multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5567–5577. https://doi.org/10.1109/CVPR52729.2023.00539
Wu D, Han W, Wang T, Dong X, Zhang X, Shen J (2023) Referring multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14633–14642. https://doi.org/10.1109/CVPR52729.2023.01406
Meimetis D, Daramouskas I, Perikos I, Hatzilygeroudis I (2023) Real-time multiple object tracking using deep learning methods. Neural Comput Appl 35(1):89–118
Yin J, Wang W, Meng Q, Yang R, Shen J (2020) A unified object motion and affinity model for online multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6768–6777. https://doi.org/10.1109/CVPR42600.2020.00680
Welch G, Bishop G (1995) An introduction to the kalman filter. In: Proceedings of international conference on computer graphics and interactive techniques, pp 1–16
Hu W, Li X, Luo W, Zhang X, Maybank S, Zhang Z (2012) Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model. IEEE Trans Pattern Anal Mach Intell 34(12):2420–2440
Zhang L, Van Der Maaten L (2013) Preserving structure in model-free tracking. IEEE Trans Pattern Anal Mach Intell 36(4):756–769
Morimitsu H, Bloch I, Cesar-Jr RM (2017) Exploring structure for long-term tracking of multiple objects in sports videos. Comput Vis Image Underst 159:89–104
Ošep A, Mehner W, Voigtlaender P, Leibe B (2018) Track, then decide: Category-agnostic vision-based multi-object tracking. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 3494–3501. https://doi.org/10.1109/ICRA.2018.8460975
Zhang L, Maaten L (2013) Structure preserving object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1838–1845. https://doi.org/10.1109/CVPR.2013.240
Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003
Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3645–3649. https://doi.org/10.1109/ICIP.2017.8296962
Cao J, Pang J, Weng X, Khirodkar R, Kitani K (2023) Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9686–9696. https://doi.org/10.1109/CVPR52729.2023.00934
Meneses M, Matos L, Prado B, Carvalho A, Macedo H (2020) Learning to associate detections for real-time multiple object tracking. https://doi.org/10.48550/arXiv.2007.06041
Aharon N, Orfaig R, Bobrovsky BZ (2022) Bot-sort: Robust associations multi-pedestrian tracking. Comput Vis Pattern Recognit. https://doi.org/10.48550/arXiv.2206.14651
Du Y, Zhao Z, Song Y, Zhao Y, Su F, Gong T, Meng H (2023) Strongsort: Make deepsort great again. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2023.3240881
Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, Luo P, Liu W, Wang X (2022) Bytetrack: Multi-object tracking by associating every detection box. In: Proceedings of the european conference on computer vision, pp 1–21. https://doi.org/10.48550/arXiv.2110.06864
Ren H, Han S, Ding H, Zhang Z, Wang H, Wang F (2023) Focus on details: Online multi-object tracking with diverse fine-grained representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11289–11298. https://doi.org/10.1109/CVPR52729.2023.01086
Kong J, Mo E, Jiang M, Liu T (2022) Motfr: Multiple object tracking based on feature recoding. IEEE Trans Circuits Syst Video Technol 32(11):7746–7757
Jiang M, Zhou C, Kong J (2022) Aoh: Online multiple object tracking with adaptive occlusion handling. IEEE Signal Process Lett 29:1644–1648
Li C, Dobler G, Feng X, Tracknet WY (2019) Tracknet: Simultaneous object detection and tracking and its application in traffic video analysis. https://doi.org/10.48550/arXiv.1902.01466
Sun S, Akhtar N, Song H, Mian A, Shah M (2019) Deep affinity network for multiple object tracking. IEEE Trans Pattern Anal Mach Intell 43(1):104–119
Liang C, Zhang Z, Zhou X, Li B, Zhu S, Hu W (2022) Rethinking the competition between detection and reid in multiobject tracking. IEEE Trans Image Process 31:3182–3196
Chu P, Wang J, You Q, Ling H, Liu Z (2023) Transmot: Spatial-temporal graph transformer for multiple object tracking. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4870–4880. https://doi.org/10.1109/WACV56688.2023.00485
Xu J, Cao Y, Zhang Z, Hu H (2019) Spatial-temporal relation networks for multi-object tracking. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3988–3998. https://doi.org/10.1109/ICCV.2019.00409
Ciaparrone G, Sánchez FL, Tabik S, Troiano L, Tagliaferri R, Herrera F (2020) Deep learning in video multi-object tracking: A survey. Neurocomputing 381:61–88
Emami P, Pardalos PM, Elefteriadou L, Ranka S (2020) Machine learning methods for data association in multi-object tracking. ACM Computing Surveys (CSUR) 53(4):1–34
Rakai L, Song H, Sun S, Zhang W, Yang Y (2022) Data association in multiple object tracking: A survey of recent techniques. Expert Syst Appl 192:116300
Park Y, Dang LM, Lee S, Han D, Moon H (2021) Multiple object tracking in deep learning approaches: A survey. Electronics 10(19):2406
Camplani M, Paiement A, Mirmehdi M, Damen D, Hannuna S, Burghardt T, Tao L (2017) Multiple human tracking in rgbdepth data: A survey. IET Comput Vision 11(4):265–285
Luo W, Xing J, Milan A, Zhang X, Liu W, Kim TK (2021) Multiple object tracking: A literature review. Artif Intell 293:103448
Cao ZQ, Sai B, Lu X (2020) Review of pedestrian tracking: Algorithms and applications. Acta Phys Sin 69(8):084203-1-084203-18
Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell 51:6400–6429
Sun P, Cao JK, Jiang Y, Yuan ZH, Bai S, Kitani K, Luo P (2022) DanceTrack: Multi-object tracking in uniform appearance and diverse motion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 20961–20970. https://doi.org/10.1109/CVPR52688.2022.02032
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In: Proceedings of the neural information processing systems, pp 2553–2561
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the international conference on learning representations
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969. https://doi.org/10.1109/ICCV.2017.322
Sun J, Chen L, Xie Y, Zhang S, Jiang Q, Zhou X, Bao H (2020) Disp R-CNN: Stereo 3d object detection via shape prior guided instance disparity estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10548–10557. https://doi.org/10.1109/CVPR42600.2020.01056
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg A.C (2016) Ssd: Single shot multibox detector. In: Proceedings of the european conference on computer vision, pp 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 99:2999–3007
Wang CY, Bochkovskiy A, Liao HYM (2023) Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: Proceedings of the european conference on computer vision (ECCV), pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
Gupta A, Narayan S, Joseph KJ, Khan S, Khan FS, Shah M (2022) Ow-detr: Open-world detection transformer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9225–9234. https://doi.org/10.1109/CVPR52688.2022.00902
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: Deformable transformers for end-to-end object detection. https://doi.org/10.48550/arXiv.2010.04159
Sun P, Tan M, Wang W, Liu C, Xia F, Leng Z, Anguelov D (2022) Swformer: Sparse window transformer for 3d object detection in point clouds. In: Proceedings of the European conference on computer vision, pp 426–442. https://doi.org/10.1007/978-3-031-20080-9_25
Wang X, Doretto G, Sebastian T, Rittscher J, Tu P (2007) Shape and appearance context modeling. In: Proceedings of the IEEE 11th international conference on computer vision, pp 1–8. https://doi.org/10.1109/ICCV.2007.4409019
Farenzena M, Bazzani L, Perina A, Murino V, Cristani M (2010) Person re-identification by symmetry-driven accumulation of local features. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2360–2367. https://doi.org/10.1109/CVPR.2010.5539926
Zhao R, Ouyang W, Wang X (2013) Unsupervised salience learning for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3586–3593. https://doi.org/10.1109/CVPR.2013.460
Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2197–2206. https://doi.org/10.1109/CVPR.2015.7298832
Zhang Y, Wang C, Wang X, Zeng W, Liu W (2021) Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int J Comput Vision 129:3069–3087
Xiao T, Li S, Wang B, Lin WX (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3415–3424. https://doi.org/10.1109/CVPR.2017.360
Liu H, Feng J, Qi M, Jiang J, Yan S (2017) End-to-end comparative attention networks for person re-identification. IEEE Trans Image Process 26(7):3492–3506
Chang X, Huang PY, Shen YD, Liang X, Yang Y, Hauptmann AG (2018) Rcaa: Relational context-aware agents for person search. In: Proceedings of the European conference on computer vision (ECCV), pp 84–100. https://doi.org/10.1007/978-3-030-01240-3_6
Wang Z, Zheng L, Liu Y, Li Y, Wang S (2020) Towards real-time multi-object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 107–122. https://doi.org/10.1007/978-3-030-58621-8_7
Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14668–14678. https://doi.org/10.1109/CVPR42600.2020.01468
Chen D, Zhang S, Yang J, Schiele B (2021) Norm-aware embedding for efficient person search and tracking. Int J Comput Vision 129:3154–3168
Yoon JH, Lee CR, Yang MH, Yoon KJ (2016) Online multi-object tracking via structural constraint event aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1392–1400. https://doi.org/10.1109/CVPR.2016.155
Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: Proceedings of the 14th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–6. https://doi.org/10.1109/avss.2017.8078516
Zhou H, Ouyang W, Cheng J, Wang X, Li H (2018) Deep continuous conditional random fields with asymmetric inter-object constraints for online multi-object tracking. IEEE Trans Circuits Syst Video Technol 29(4):1011–1022
Shan C, Wei C, Deng B, Huang J, Hua XS, Cheng X, Liang K (2020) Tracklets predicting based adaptive graph tracking. https://doi.org/10.48550/arXiv.2010.09015
Girbau A, Giró-i-Nieto X, Rius I, Marqués F (2021) Multiple object tracking with mixture density networks for trajectory estimation. https://doi.org/10.48550/arXiv:2106.10950
Peng J, Wang C, Wan F, Wu Y, Wang Y, Tai Y, Wang C, Li J, Huang F, Fu Y (2020) Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 145–161. https://doi.org/10.1007/978-3-030-58548-8_9
Pang B, Li Y, Zhang Y, Li LC (2020) Tubetk: Adopting tubes to track multi-object in a one-step training model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6308–6318. https://doi.org/10.1109/CVPR42600.2020.00634
Han S, Huang P, Wang H, Yu E, Liu D, Pan X (2022) Mat: Motion-aware multi-object tracking. Neurocomputing 476:75–86
Bergmann P, Meinhardt T, Leal-Taixe L (2019) Tracking without bells and whistles. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 941–951. https://doi.org/10.1109/ICCV.2019.00103
Yu E, Li Z, Han S, Wang H (2022) Relationtrack: Relation-aware multiple object tracking with decoupled representation. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2022.3150169
Liang C, Zhang Z, Zhou X, Li B, Lu Y (2022) One more check: Making “fake background” be tracked again. In: Proceedings of the AAAI conference on artificial intelligence, pp 1546–1554. https://doi.org/10.1609/aaai.v36i2.20045
Liu Q, Chen D, Chu Q, Yuan L, Liu B, Zhang L, Yu N (2022) Online multi-object tracking with unsupervised re-identification learning and occlusion estimation. Neurocomputing 483:333–347
Cui YM, Yan LQ, Cao ZW, Liu DF (2021) TF-Blender: Temporal feature blender for video object detection. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 8118–8127. https://doi.org/10.1109/ICCV48922.2021.00803
Liu DF, Cui YM, Chen YJ, Zhang JY, Fan B (2020) Video object detection for autonomous driving: Motion-aid feature calibration. Neurocomputing 409:1–11
Sheng H, Zhang Y, Wu YB, Wang S, Lyu WF, Ke W, Xiong Z (2020) Hypothesis testing based tracking with spatio-temporal joint interaction modeling. IEEE Trans Circuits Syst Video Technol 30(9):2971–2983
Wang S, Sheng H, Zhang Y, Wu YB, Xiong Z (2021) A general recurrent tracking framework without real data. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 13219–13228. https://doi.org/10.1109/ICCV48922.2021.01297
Wu H, Nie JH, Zhu ZM, He ZW, Gao MY (2022) Leveraging temporal-aware FNE-grained features for robust multiple object tracking. J Supercomput 79:2910–2931
Lang C, Braun A, Schillingmann L, Valada A (2023) Self-supervised multi-object tracking for autonomous driving from consistency across timescales. IEEE Robot Autom Lett 8(11):7711–7718
Zhou TF, Li JW, Li XY, Shao L (2021) Target-aware object discovery and association for unsupervised video multi-object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6985–6994. https://doi.org/10.1109/CVPR46437.2021.00691
Peng JL, Wang T, Lin WY, Wang J, See J, Wen SL, Ding E (2020) TPM: Multiple object tracking with tracklet-plane matching. Pattern Recogn 107:107480
Mhalla A, Chateau T (2019) Improving multi-object tracking-by-detection model using a temporal interlaced encoding and a specialized deep detector. In: Proceedings of the IEEE intelligent vehicles symposium, pp 510–516. https://doi.org/10.1109/IVS.2019.8814102
Zhao SY, Wu YB, Wang S, Ke W, Sheng H (2022) Mask guided spatial-temporal fusion network for multiple object tracking. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 3231–3235. https://doi.org/10.1109/ICIP46576.2022.9898054
Zhang JJ, Wang MY, Jiang HR, Zhang XY, Yan CG, Zeng D (2023) STAT: Multi-object tracking based on spatio-temporal topological constraints. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2023.3323852
You SS, Yao HT, Xu CS (2022) Multi-object tracking with spatial-temporal topology-based detector. IEEE Trans Circuits Syst Video Technol 32(5):3023–3035
Pang ZQ, Li J, Tokmakov P, Chen D, Zagoruyko S, Wang YX (2023) Standing between past and future spatio-temporal modeling for multi-camera 3D multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 17928–17938. https://doi.org/10.1109/CVPR52729.2023.01719
Wang YX, Kitani K, Weng XS (2021) Joint object detection and multi-object tracking with graph neural networks. In: Proceedings of the IEEE international conference on robotics and automation (ICRA), pp 13708–13715. https://doi.org/10.1109/ICRA48506.2021.9561110
Wang SK, Sun YX, Wang Z, Liu M (2024) ST-TrackNet: A multiple-object tracking network using spatio-temporal information. IEEE Trans Autom Sci Eng 21(1):284–295. https://doi.org/10.1109/TASE.2022.3216450
Zhu TY, Hiller M, Ehsanpour M, Ma RK, Drummond T, Rezatofighi H (2021) Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE Trans Pattern Anal Mach Intell 45:12783–12797
Hu MJ, Zhu XT, Wang HT, Cao SX, Liu C, Song Q (2023) STDFormer: Spatial-temporal motion transformer for multiple object tracking. IEEE Trans Circuits Syst Video Technol 33(11):6571–6594
Yang M, Wu Y, Jia Y (2017) A hybrid data association framework for robust online multi-object tracking. IEEE Trans Image Process 26(12):5667–5679
Yang M, Jia Y (2016) Temporal dynamic appearance modeling for online multi-person tracking. Comput Vis Image Underst 153:16–28
Guo S, Wang J, Wang X, Tao D (2021) Online multiple object tracking with cross-task synergy. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8136–8145. https://doi.org/10.1109/CVPR46437.2021.00804
Xu Y, Osep A, Ban Y, Horaud R, LealTaixé L, Alameda-Pineda X (2020) How to train your deep multi-object tracker. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6787–6796. https://doi.org/10.1109/CVPR42600.2020.00682
Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE international conference on computer vision, pp 300–311. https://doi.org/10.1109/ICCV.2017.41
Rezatofighi SH, Milan A, Zhang Z, Shi Q, Dick A, Reid I (2015) Joint probabilistic data association revisited. In: Proceedings of the IEEE international conference on computer vision, pp 3047–3055. https://doi.org/10.1109/ICCV.2015.349
Benfold B, Reid I (2011) Stable multi-target tracking in real-time surveillance video. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3457–3464. https://doi.org/10.1109/CVPR.2011.5995667
Kim C, Li F, Ciptadi A, Rehg JM (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704. https://doi.org/10.1109/ICCV.2015.533
Brasó G, Leal-Taixé L (2020) Learning a neural solver for multiple object tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6247–6257. https://doi.org/10.1109/CVPR42600.2020.00628
Gori M, Monfardini G, Scarselli F (2005) A new model for learning in graph domains. In: Proceedings of 2005 IEEE international joint conference on neural networks, pp 729–734. https://doi.org/10.1109/IJCNN.2005.1555942
Zhang L, Li Y, Nevatia R (2008) Global data association for multi-object tracking using network flows. In: Proceedings of 2008 IEEE conference on computer vision and pattern recognition, pp 1–8. https://doi.org/10.1109/CVPR.2008.4587584
Chari V, Lacoste-Julien S, Laptev I, Sivic J (2015) On pairwise costs for network flow multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5537–5545. https://doi.org/10.1109/CVPR.2015.7299193
Butt AA, Collins RT (2013) Multi-target tracking by lagrangian relaxation to mincost network flow. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 1846–1853. https://doi.org/10.1109/CVPR.2013.241
Berclaz J, Fleuret F, Turetken E, Fua P (2011) Multiple object tracking using k-shortest paths optimization. IEEE Trans Pattern Anal Mach Intell 33(9):1806–1819
Jiang H, Fels S, Little JJ (2007) A linear programming approach for multiple object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8. https://doi.org/10.1109/CVPR.2007.383180
Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1201–1208. https://doi.org/10.1109/CVPR.2011.5995604
Roshan Zamir A, Dehghan A, Shah M (2012) Gmcp-tracker: Global multi-object tracking using generalized minimum clique graphs. In: Proceedings of the European conference on computer vision (ECCV), pp 343–356. https://doi.org/10.1007/978-3-642-33709-3_25
Wang B, Wang G, Chan KL, Wang L (2016) Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans Pattern Anal Mach Intell 39(3):589–602
Xiang J, Xu G, Ma C, Hou J (2020) End-to-end learning deep crf models for multi-object tracking deep crf models. IEEE Trans Circuits Syst Video Technol 31(1):275–288
Brendel W, Amer M, Todorovic S (2011) Multiobject tracking as maximum weight independent set. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1273–1280. https://doi.org/10.1109/CVPR.2011.5995395
Wang T, Chen K, Lin W, See J, Zhang Z, Xu Q, Jia X (2023) Spatio-temporal point process for multiple object tracking. IEEE Trans Neural Netw Learn Syst 34(4):1777–1788. https://doi.org/10.1109/TNNLS.2020.2997006
Peng J, Gu Y, Wang Y, Wang C, Li J, Huang F (2020) Dense scene multiple object tracking with box-plane matching. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 4615–4619. https://doi.org/10.1145/3394171.3416283
Ren W, Wang X, Tian J, Tang Y, Chan AB (2020) Tracking-by-counting: Using network flows on crowd density maps for tracking multiple targets. IEEE Trans Image Process 30:1439–1452
He Y, Wei X, Hong X, Ke W, Gong Y (2022) Identity-quantity harmonic multi-object tracking. IEEE Trans Image Process 31:2201–2215
Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) Poi: Multiple object tracking with high performance detection and appearance feature. In: Proceedings of the European conference on computer vision (ECCV), pp 36–42. https://doi.org/10.1007/978-3-319-48881-3_3
Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 466–475. https://doi.org/10.1109/WACV.2018.00057
Zhou Z, Xing J, Zhang M, Hu W (2018) Online multi-target tracking with tensor-based high-order graph matching. In: Proceedings of the 24th international conference on pattern recognition (ICPR), pp 1809–1814. https://doi.org/10.1109/ICPR.2018.8545450
Mahmoudi N, Ahadi SM, Rahmati M (2019) Multi-target tracking using CNN-based features: CNNMTT. Multimed Tools Appl 78:7077–7096
Baisa NL (2021) Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification. J Vis Commun Image Represent 80:103279
Yan LQ, Wang QF, Ma SQ, Wang JG, Yu CB (2022) Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-temporal collaboration. IEEE Trans Circuits Syst Video Technol 33:393–406
Liu DF, Cui YM, Yan LQ, Mousas C, Yang B, Chen YJ (2021) Densernet: Weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI conference on artificial intelligence, pp 6101–6109. https://doi.org/10.1609/aaai.v35i7.16760
Bastani F, He ST, Madden S (2021) Self-supervised multi-object tracking with cross-input consistency. Adv Neural Inf Process Syst 34:13695–13706
Su C, Zhang SL, Xing JL, Gao W, Tian Q (2016) Deep attributes driven multi-camera person re-identification. In: Proceedings of the European conference on computer vision (ECCV), pp 475–491. https://doi.org/10.1007/978-3-319-46475-6_30
Huang K, Lertniphonphan K, Chen F, Li J, Wang ZP (2023) Multi-object tracking by self-supervised learning appearance model. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 3163–3169. https://doi.org/10.1109/CVPRW59228.2023.00318
Engilberge M, Liu WZ, Fua P (2023) Multi-view tracking using weakly supervised human motion prediction. In: Proceedings of the IEEE Winter conference on applications of computer vision (WACV), pp 1582–1592. https://doi.org/10.1109/WACV56688.2023.00163
Cucchiara R, Fabbri M (2022) Fine-grained human analysis under occlusions and perspective constraints in multimedia surveillance. ACM Trans Multimed Comput Commun Appl (TOMM) 18:1–23. https://doi.org/10.1145/3476839
Kieritz H, Hubner W, Arens M (2018) Joint detection and online multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1459–1467. https://doi.org/10.1109/CVPRW.2018.00195
Shuai B, Berneshawi A, Wang M, Liu C, Modolo D, Li X, Tighe J (2020) Application of multi-object tracking with siamese track-RCNN to the human in events dataset. In: Proceedings of the 28th ACM international conference on multimedia, pp 4625–4629. https://doi.org/10.1145/3394171.3416297
Liu K, Jin S, Fu ZH, Chen Z, Jiang RX, Ye JP (2023) Uncertainty-aware unsupervised multi-object tracking. In: Proceedings of the IEEE International conference on computer vision, pp 9962–9971. https://doi.org/10.1109/ICCV51070.2023.00917
Li YL, Lu Y, Li J, Wang HZ (2023) Learning to reconnect interrupted trajectories for weakly supervised multi-object tracking. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095463
Ruiz I, Porzi L, Bulò SR, Kontschieder P, Serrat J (2021) Weakly supervised multi-object tracking and segmentation. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 125–133. https://doi.org/10.1109/WACVW52041.2021.00018
Chu P, Ling H (2019) Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6172–6181. https://doi.org/10.1109/ICCV.2019.00627
Shuai B, Berneshawi AG, Li XY, Modolo D, Tighe J (2021) SiamMOT: Siamese multi-object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 12372–12382. https://doi.org/10.1109/CVPR46437.2021.01219
Pang JM, Qiu LL, Li X, Chen HF, Li Q, Darrell T, Yu F (2021) Quasi-dense similarity learning for multiple object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 164–173. https://doi.org/10.1109/CVPR46437.2021.00023
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: Proceedings of the European conference on computer vision, pp 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1420–1142. https://doi.org/10.1109/CVPR.2016.158
Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971–8980. https://doi.org/10.1109/CVPR.2018.00935
Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4282–4291. https://doi.org/10.1109/CVPR.2019.00441
Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. In: Proceedings of the European conference on computer vision (ECCV), pp 474–490. https://doi.org/10.1007/978-3-030-58548-8_28
Silva D, Alemu LT, Shah M (2020) CL-MOT: A contrastive learning framework for multi-object tracking. In: Proceedings of the British machine vision conference (BMCV), pp 1–13.
Chung T, Cho M, Lee H, Lee S (2022) SSAT: Self-supervised associating network for multiobject tracking. IEEE Trans Circuits Syst Video Technol 32(11):7858–7868
Kim S, Lee J, Ko BC (2022) SSL-MOT: Self-supervised learning based multi-object tracking. Appl Intell 53:930–940
Wang Q, Zheng Y, Pan P, Xu Y (2021) Multiple object tracking with correlation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3876–3886. https://doi.org/10.1109/CVPR46437.2021.00387
Tokmakov P, Li J, Burgard W, Gaidon A (2021) Learning to track with object permanence. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10860–10869. https://doi.org/10.1109/ICCV48922.2021.01068
Wang G, Wang Y, Gu R, Hu W, Hwang JN (2022) Split and connect: A universal tracklet booster for multi-object tracking. IEEE Trans Multimed 25:1256–1268. https://doi.org/10.1109/TMM.2022.3140919
Yang M, Liu S, Chen K, Zhang H, Zhao E, Zhao T (2020) A hierarchical clustering approach to fuzzy semantic representation of rare words in neural machine translation. IEEE Trans Fuzzy Syst 28(5):992–1002
Sun P, Cao J, Jiang Y, Zhang R, Xie E, Yuan Z, Wang C, Luo P (2020) Transtrack: Multiple object tracking with transformer. https://doi.org/10.48550/arXiv.2012.15460
Meinhardt T, Kirillov A, Leal-Taixe L, Feichtenhofer C (2022) Trackformer: Multi-object tracking with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8844–8854. https://doi.org/10.1109/CVPR52688.2022.00864
Xu Y, Ban Y, Delorme G, Gan C, Rus D, Alameda-Pineda X (2021) Transcenter: Transformers with dense queries for multiple-object tracking. https://doi.org/10.48550/arXiv.2103.1514
Zeng F, Dong B, Zhang Y, Wang T, Zhang X, Wei Y (2022) Motr: End-to-end multiple-object tracking with transformer. In: Proceedings of the European conference on computer vision (ECCV), pp 659–675. https://doi.org/10.1007/978-3-031-19812-0_38
Chen X, Iranmanesh SM, Lien KC (2022) Patchtrack: Multiple object tracking using frame patches. https://doi.org/10.48550/arXiv.2201.00080
Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking. https://doi.org/10.48550/arXiv.1504.01942
Yang B, Yan J, Lei Z, Li SZ (2014) Aggregate channel features for multi-view face detection. In: Proceedings of the IEEE international joint conference on biometrics, pp 1–8. https://doi.org/10.1109/BTAS.2014.6996284
Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) Mot16: A benchmark for multi-object tracking. https://doi.org/10.48550/arXiv.1603.00831
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Dendorfer P, Osep A, Milan A, Schindler K, Cremers D, Reid I, Roth S, Leal-Taixé L (2021) Motchallenge: A benchmark for single-camera multiple target tracking. Int J Comput Vision 129:845–881
Yang F, Choi W, Lin Y (2016) Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2129–2137. https://doi.org/10.1109/CVPR.2016.234
Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020) Mot20: A benchmark for multi object tracking in crowded scenes. https://doi.org/10.48550/arXiv.2003.09003
Cheng ZY, Liang J, Tao GH, Liu DF, Zhang XY (2023) Adversarial training of self-supervised monocular depth estimation against physical-world attacks. Comput Vis Pattern Recognit. https://doi.org/10.48550/arXiv.2301.13487
Qin ZY, Lu XK, Liu DF, Nie XS, Yin YL, Shen JB, Loui AC (2023) Reformulating graph kernels for self-supervised space-time correspondence learning. IEEE Trans Image Process 32:6543–6557
Wang WG, Han C, Zhou TF, Liu DF (2022) Visual recognition with deep nearest centroids. In: Proceedings of the international conference on learning representations (ICLR), pp 1–30
Qin ZY, Lu XK, Nie XS, Liu DF, Yin YL, Wang WG (2023) Coarse-to-fine video instance segmentation with factorized conditional appearance flows. IEEE/CAA J Autom Sin 10:1192–1208
Liu DF, Liang J, Geng T, Loui AC, Zhou TF (2023) Tripartite feature enhanced pyramid network for dense prediction. IEEE Trans Image Process 32:2678–2692
Zhu P, Wen L, Du D, Bian X, Hu Q, Ling H (2020) Vision meets drones: Past, present and future. https://doi.org/10.48550/arXiv.2001.06303
Du D, Qi Y, Yu H, Yang Y, Duan K, Li G, Zhang W, Huang Q, Tian Q (2018) The unmanned aerial vehicle benchmark: Object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 370–386. https://doi.org/10.1007/978-3-030-01249-6_23
Dave A, Khurana T, Tokmakov P, Schmid C, Ramanan D (2020) Tao: A large-scale benchmark for tracking any object. In: Proceedings of the European conference on computer vision (ECCV), pp 436–454. https://doi.org/10.1007/978-3-030-58558-7_26
Gupta A, Dollar P, Girshick R (2019) Lvis: A dataset for large vocabulary instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5356–5364. https://doi.org/10.1109/CVPR.2019.00550
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The kitti vision benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074
Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2636–2645. https://doi.org/10.1109/CVPR42600.2020.00271
Wen L, Du D, Cai Z, Lei Z, Chang MC, Qi H, Lim J, Yang MH, Lyu S (2020) UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput Vis Image Underst 193:102907
Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, Zhou Y, Chai Y, Caine B, Vasudevan V, Han W, Ngiam J, Zhao H, Timofeev A, Ettinger S, Krivokon M, Gao A, Joshi A, Anguelov D (2020) Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2443–2451. https://doi.org/10.1109/CVPR42600.2020.00252
Lin W, Liu H, Liu S, Li Y, Qian R, Wang T, Xu N, Xiong H, Qi GJ, Sebe N (2020) Human in events: A large-scale benchmark for human-centric video analysis in complex events. https://doi.org/10.48550/arXiv.2005.04490
Athar A, Luiten J, Voigtlaender P, Khurana T, Dave A, Leibe B, Ramanan D (2023) Burst: A benchmark for unifying object recognition, segmentation and tracking in video. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1674–1683. https://doi.org/10.1109/WACV56688.2023.00172
Voigtlaender P, Luo L, Yuan C, Jiang Y, Leibe B (2021) Reducing the annotation effort for video object segmentation datasets. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3060–3069. https://doi.org/10.1109/WACV48630.2021.00310
Sundararaman R, De Almeida BC, Marchand E, Pettre J (2021) Tracking pedestrian heads in dense crowd. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3865–3875. https://doi.org/10.1109/CVPR46437.2021.00386
Weber M, Xie J, Collins M, Zhu Y, Voigtlaender P, Adam H, Green B, Geiger A, Leibe B, Cremers D, Osep A, Leal-Taixé L, Chen LC (2021) Step: Segmenting and tracking every pixel. https://doi.org/10.48550/arXiv.2102.11859
Fabbri M, Brasó G, Maugeri G, Cetintas O, Gasparini R, Ošep A, Calderara S, Leal-Taixé L, Cucchiara R (2021) Motsynth: How can synthetic data help pedestrian detection and tracking? In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10849–10859. https://doi.org/10.1109/ICCV48922.2021.01067
Pedersen M, Haurum JB, Bengtson SH, Moeslund TB (2020) 3d-zef: A 3d zebrafish tracking benchmark dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2426–2436. https://doi.org/10.1109/CVPR42600.2020.00250
Anjum S, Gurari D (2020) Ctmc: Cell tracking with mitosis detection dataset challenge. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 982–983. https://doi.org/10.1109/CVPRW50498.2020.00499
Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: Multi-object tracking and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7942–7951. https://doi.org/10.1109/CVPR.2019.00813
Andriluka M, Roth S, Schiele B (2010) Monocular 3d pose estimation and tracking by detection. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 623–630. https://doi.org/10.1109/CVPR.2010.5540156
Ferryman J, Shahrokni A (2009) Pets2009: Dataset and challenge. In: Proceedings of the twelfth IEEE international workshop on performance evaluation of tracking and surveillance, pp 1–6. https://doi.org/10.1109/PETS-WINTER.2009.5399556
Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP J Image Vid Process 2008:1–10
Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taixé L, Leibe B (2021) Hota: A higher order metric for evaluating multi-object tracking. Int J Comput Vision 129:548–578
Wu Y, Sheng H, Zhang Y, Wang S, Xiong Z, Ke W (2022) Hybrid motion model for multiple object tracking in mobile devices. IEEE Internet Things J 10(6):4735–4748. https://doi.org/10.1109/JIOT.2022.3219627
Hornakova A, Kaiser T, Swoboda P, Rolinek M, Rosenhahn B, Henschel R (2021) Making higher order mot scalable: An efficient approximate solver for lifted disjoint paths. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6330–6340. https://doi.org/10.1109/ICCV48922.2021.00627
Zhang J, Zhou S, Chang X, Wan F, Wang J, Wu Y, Huang D (2020) Multiple object tracking by flowing and fusing. https://doi.org/10.48550/arXiv.2001.11180
Zhang Y, Sheng H, Wu Y, Wang S, Ke W, Xiong Z (2020) Multiplex labeling graph for near-online tracking in crowded scenes. IEEE Internet Things J 7(9):7892–7902
Chen L, Ai H, Zhuang Z, Shang C (2018) Real-time multiple people tracking with deeply learned candidate selection and person reidentification. In: Proceedings of 2018 IEEE international conference on multimedia and expo (ICME), pp 1–6. https://doi.org/10.1109/ICME.2018.8486597
Son J, Baek M, Cho M, Han B (2017) Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5620–5629. https://doi.org/10.1109/CVPR.2017.403
Chen J, Sheng H, Zhang Y, Xiong Z (2017) Enhancing detection model for multiple hypothesis tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 18–27. https://doi.org/10.1109/CVPRW.2017.266
Funding
This work was supported in part by the Natural Science Foundation of China under Grant 61671192, in part by the National Science Foundation for Post-Doctoral Scientists of China under Grant 2017M114, and in part by the Top-Ranking Discipline a Class of Electronics Science and Technology in Zhejiang Province, China.
Ethics declarations
Conflicts of Interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
Du, C., Lin, C., Jin, R. et al. Exploring the State-of-the-Art in Multi-Object Tracking: A Comprehensive Survey, Evaluation, Challenges, and Future Directions. Multimed Tools Appl 83, 73151–73189 (2024). https://doi.org/10.1007/s11042-023-17983-2
DOI: https://doi.org/10.1007/s11042-023-17983-2