Abstract
Person re-identification plays a central role in tracking and monitoring crowd movement in public places, and hence it serves as an important means for providing public security in video surveillance application sites. The problem of person re-identification has received significant attention in the past few years, and with the introduction of deep learning, several interesting approaches have been developed. In this paper, we propose an ensemble model called Temporal Motion Aware Network (T-MAN) for handling the visual context and spatio-temporal information jointly from the input video sequences. Our methodology makes use of the long-range motion context with recurrent information for establishing correspondences among multiple cameras. The proposed T-MAN approach first extracts explicit frame-level feature descriptors from a given video sequence by using three different sub-networks (FPAN, MPN, and LSTM), and then aggregates these models using an ensemble technique to perform re-identification. The method has been evaluated on three publicly available data sets, namely, the PRID-2011, iLIDS-VID, and MARS, and re-identification accuracy of 83.0%, 73.5%, and 83.3% have been obtained from these three data sets, respectively. Experimental results emphasize the effectiveness of our approach and its superiority over the state-of-the-art techniques for video-based person re-identification.
Similar content being viewed by others
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16), pp 265–283
Ahmed E, Jones M, Marks TK (2015) An improved deep learning architecture for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3908–3916
Cai W, Wei Z (2020) PiiGAN: generative adversarial networks for pluralistic image inpainting. IEEE Access 8:48451–48463
Chen L, Lou J, Xu F, Ren M (2019) Grid-based multi-object tracking with siamese CNN based appearance edge and access region mechanism. Multimed Tool Appl :1–19
Cheng DS, Cristani M, Stoppa M, Bazzani L, Murino V (2011) Custom pictorial structures for re-identification. In: Proceedings of the British machine vision conference. Citeseer, pp 1–11
Chung D, Tahboub K, Delp EJ (2017) A two stream siamese convolutional neural network for person re-identification. In: Proceedings of the IEEE international conference on computer vision , pp 1983–1991
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of the advances in neural information processing systems, pp 379–387
Dehghan A, Modiri Assari S, Shah M (2015) GMMCP tracker: globally optimal generalized maximum multi clique problem for multiple object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4091–4099
Fan D-P, Wang W, Cheng M-M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition , pp 8554–8564
Farenzena M, Bazzani L, Perina A, Murino V, Cristani M (2010) Person re-identification by symmetry-driven accumulation of local features. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2360–2367
Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller P (2019) Deep neural network ensembles for time series classification. arXiv:1903.06602
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Fu J, Zheng H, Mei T (2017) Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4438–4446
Gao C, Wang J, Liu L, Yu J-G, Sang N (2016) Temporally aligned pooling representation for video-based person re-identification. In: Proceedings of the IEEE international conference on image processing, IEEE, pp 4284–4288
Gray D, Tao H (2008) Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the European conference on computer vision, Springer, pp 262–275
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hirzer M, Beleznai C, Roth PM, Bischof H (2011) Person re-identification by descriptive and discriminative classification. In: Proceedings of the scandinavian conference on image analysis. Springer , pp 91–102
Karanam S, Li Y, Radke RJ (2015) Sparse re-id: block sparsity for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops , pp 33–40
Klaser A, Marszałek M, Schmid C (2008) A spatio-temporal descriptor based on 3D-gradients. In: Proceedings of the 19th british machine vision conference, pp 275:1–10
Koestinger M, Hirzer M, Wohlhart P, Roth PM, Bischof H (2012) Large scale metric learning from equivalence constraints. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 2288–2295
Kviatkovsky I, Adam A, Rivlin E (2012) Color invariants for person reidentification. IEEE Trans Pattern Anal Mach Intell 35(7):1622–1634
Li W, Zhao R, Xiao T, Wang X (2014) DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 152–159
Li M, Zhu X, Gong S (2018) Unsupervised person re-identification by deep learning tracklet association. In: Proceedings of the European conference on computer vision, pp 737–753
Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2197–2206
Liu H, Jie Z, Jayashree K, Qi M, Jiang J, Yan S, Feng J (2017) Video-based person re-identification with accumulative motion context. IEEE Trans Circuits Syst Video Technol 28(10):2788–2802
Liu K, Ma B, Zhang W, Huang R (2015) A spatio-temporal appearance representation for video-based pedestrian re-identification. In: Proceedings of the IEEE international conference on computer vision, pp 3810–3818
Liu Y, Yan J, Ouyang W (2017) Quality aware network for set to set recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5790–5799
Lu X, Ma C, Ni B, Yang X (2019) Adaptive region proposal with channel regularization for robust object tracking. IEEE Trans Circuits Syst Video Technol
Lu X, Wang W, Ma C, Shen J, Shao L, Porikli F (2019) See more, know more: unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3623–3632
Ma M (2019) Infrared pedestrian detection algorithm based on multimedia image recombination and matrix restoration. Multimed Tools Appl :1–16
Ma B, Su Y, Jurie F (2012) Local descriptors encoded by fisher vectors for person re-identification. In: Proceedings of the European conference on computer vision. Springer, pp 413–422
Ma B, Su Y, Jurie F (2014) Covariance descriptor based on bio-inspired features for person re-identification and face verification. Image Vis Comput 32 (6-7):379–390
McLaughlin N, Martinez del Rincon J, Miller P (2016) Recurrent convolutional network for video-based person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1325–1334
Minetto R, Segundo MP, Sarkar S (2019) Hydra: An ensemble of convolutional neural networks for geospatial land Classification. IEEE Trans Geosci Remote Sens
Muñoz DU, Ruiz-Aguilar JJ, González-Enrique J, Domínguez I J T (2019) A deep ensemble neural network approach to improve predictions of container inspection volume. In: Proceedings of the international work-conference on artificial neural networks. Springer, pp 806–817
Nguyen HD, Na IS, Kim SH, Lee GS, Yang HJ, Choi JH (2019) Multiple human tracking in drone image. Multimed Tools Appl 78(4):4563–4577
Prosser BJ, Zheng W-S, Gong S, Xiang T, Mary Q (2010) Person re-identification by support vector ranking. In: Proceedings of the British machine vision conference, pp 1–11
Pytorch: Models. https://pytorch.org/docs/stable/torchvision/models.html. Accessed: 2020-01-16
Tagore NK, Singh SK (2019) Crowd counting in a highly congested scene using deep augmentation based convolutional network. Available at SSRN 3392307
Wang T, Gong S, Zhu X, Wang S (2016) Person re-identification by discriminative selection in video ranking. IEEE Trans Pattern Anal Mach Intell 38(12):2501–2514
Wang Z, Zou C, Cai W (2020) Small sample classification of hyperspectral remote sensing images based on sequential joint deeping learning model. IEEE Access 8:71353–71363
Wang Z, Zou C, Cai W (2020) Small sample classification of hyperspectral remote sensing images based on sequential joint deeping learning model. IEEE Access 8:71353–71363
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(Feb):207–244
Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recogn 90:119–133
Xiong F, Gou M, Camps O, Sznaier M (2014) Person re-identification using kernel-based metric learning methods. In: Proceedings of the European conference on computer vision, Springer, pp 1–16
Xu S, Cheng Y, Gu K, Yang Y, Chang S, Zhou P (2017) Jointly attentive spatial-temporal pooling networks for video-based person re-identification. In: Proceedings of the IEEE international conference on computer vision, pp 4733–4742
Yan Y, Ni B, Song Z, Ma C, Yan Y, Yang X (2016) Person re-identification via recurrent feature aggregation. In: Proceedings of the European conference on computer vision, Springer, pp 701–716
Yang X, Chen P (2019) Person re-identification based on multi-scale convolutional network. Multimed Tools Appl :1–15
Ye M, Li J, Ma AJ, Zheng L, Yuen PC (2019) Dynamic graph co-matching for unsupervised video-based person re-identification. IEEE Trans Image Process 28(6):2976–2990
You H, Tian S, Yu L, Lv Y (2019) Pixel-level remote sensing image recognition based on bidirectional word vectors. IEEE Trans Geosci Remote Sens 58(2):1281–1293
You H, Tian S, Yu L, Lv Y (2019) Pixel-level remote sensing image recognition based on bidirectional word vectors. IEEE Trans Geosci Remote Sens 58(2):1281–1293
You J, Wu A, Li X, Zheng W-S (2016) Top-push video-based person re-identification. In: Proceedings of the ieee conference on computer vision and pattern recognition, pp 1345–1353
Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE international conference on computer vision, pp 5209–5217
Zheng W-S, Gong S, Xiang T (2012) Reidentification by relative distance comparison. IEEE Trans Patt Anal Mach Intell 35(3):653–668
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision, pp 1116–1124
Zheng L, Zhang H, Sun S, Chandraker M, Yang Y, Tian Q (2017) Person re-identification in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1367–1376
Zhou Z, Huang Y, Wang W, Wang L, Tan T (2017) See the forest for the trees: joint spatial and temporal recurrent neural networks for video-based person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4747–4756
Zhu X, Jing X-Y, You X, Zhang X, Zhang T (2018) Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics. IEEE Trans Image Process 27(11):5683–5695
Acknowledgements
The authors would like to acknowledge NVIDIA for supporting their research with a TITAN Xp Graphics processing unit.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Tagore, N.K., Chattopadhyay, P. & Wang, L. T-MAN: a neural ensemble approach for person re-identification using spatio-temporal information. Multimed Tools Appl 79, 28393–28409 (2020). https://doi.org/10.1007/s11042-020-09398-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-09398-0