{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T21:19:18Z","timestamp":1730323158153,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":45,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413927","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T12:26:25Z","timestamp":1602505585000},"page":"1469-1477","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Exploiting Better Feature Aggregation for Video Object Detection"],"prefix":"10.1145","author":[{"given":"Liang","family":"Han","sequence":"first","affiliation":[{"name":"Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Pichao","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Bellevue, WA, USA"}]},{"given":"Zhaozheng","family":"Yin","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Fan","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Sunnyvale, CA, USA"}]},{"given":"Hao","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_21"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7814--7823","year":"2018","author":"Chen Kai","key":"e_1_3_2_2_2_1","unstructured":"Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, and Dahua Lin. 2018. Optimizing video object detection via a scale-time lattice. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7814--7823."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.440"},{"volume-title":"R-fcn: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems. 379--387.","year":"2016","author":"Dai Jifeng","key":"e_1_3_2_2_4_1","unstructured":"Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. 2016. R-fcn: Object detection via region-based fully convolutional networks. In Advances in neural information processing systems. 379--387."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00678"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00712"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206532"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.330"},{"volume-title":"Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659","year":"2017","author":"Fu Cheng-Yang","key":"e_1_3_2_2_10_1","unstructured":"Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C Berg. 2017. Dssd: Deconvolutional single shot detector. 
arXiv preprint arXiv:1701.06659 (2017)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00033"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00401"},{"volume-title":"Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, and Thomas S Huang.","year":"2016","author":"Han Wei","key":"e_1_3_2_2_15_1","unstructured":"Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, and Thomas S Huang. 2016. Seq-nms for video object detection. arXiv preprint arXiv:1602.08465 (2016)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"volume-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","year":"2015","author":"He Kaiming","key":"e_1_3_2_2_17_1","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence 37, 9 (2015), 1904--1916."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. 
arXiv preprint arXiv:1704.04861","year":"2017","author":"Howard Andrew G","key":"e_1_3_2_2_19_1","unstructured":"Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00378"},{"key":"e_1_3_2_2_21_1","first-page":"10","article-title":"T-cnn: Tubelets with convolutional neural networks for object detection from videos","volume":"28","author":"Kang Kai","year":"2017","unstructured":"Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, et al. 2017. T-cnn: Tubelets with convolutional neural networks for object detection from videos. IEEE Transactions on Circuits and Systems for Video Technology 28, 10 (2017), 2896--2907.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.95"},{"key":"e_1_3_2_2_23_1","unstructured":"Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig, et al. 2018. 
The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982 (2018)."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5686--5695","year":"2018","author":"Liu Mason","key":"e_1_3_2_2_26_1","unstructured":"Mason Liu and Menglong Zhu. 2018. Mobile video object detection with temporally-aware feature maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5686--5695."},{"volume-title":"Looking Fast and Slow: Memory-Guided Mobile Video Object Detection. arXiv preprint arXiv:1903.10172","year":"2019","author":"Liu Mason","key":"e_1_3_2_2_27_1","unstructured":"Mason Liu, Menglong Zhu, Marie White, Yinxiao Li, and Dmitry Kalenichenko. 2019. Looking Fast and Slow: Memory-Guided Mobile Video Object Detection. 
arXiv preprint arXiv:1903.10172 (2019)."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.119"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.690"},{"volume-title":"Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767","year":"2018","author":"Redmon Joseph","key":"e_1_3_2_2_32_1","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)."},{"key":"e_1_3_2_2_33_1","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems. 91--99."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. 
International journal of computer vision 115, 3 (2015), 211--252.","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00985"},{"key":"e_1_3_2_2_37_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_33"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7794--7803","year":"2018","author":"Wang Xiaolong","key":"e_1_3_2_2_39_1","unstructured":"Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7794--7803."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 9217--9225","year":"2019","author":"Wu Haiping","key":"e_1_3_2_2_40_1","unstructured":"Haiping Wu, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. 2019. Sequence Level Semantics Aggregation for Video Object Detection. 
In Proceedings of the IEEE International Conference on Computer Vision. 9217--9225."},{"volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 485--501","year":"2018","author":"Xiao Fanyi","key":"e_1_3_2_2_41_1","unstructured":"Fanyi Xiao and Yong Jae Lee. 2018. Video object detection with an aligned spatial-temporal memory. In Proceedings of the European Conference on Computer Vision (ECCV). 485--501."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00753"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.52"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.441"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413927","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T20:33:20Z","timestamp":1672950800000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413927"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":45,"alternative-id":["10.1145\/3394171.3413927","10.1145\/3394171"],"URL":"http:\/\/dx.doi.org\/10.1145\/3394171.3413927","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}