Du Tran

Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran. Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision. CVPR Workshops, pp. 2693-2703, 2024. https://doi.org/10.1109/CVPRW63382.2024.00275
Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran. FLAVR: flow-free architecture for fast video frame interpolation. Mach. Vis. Appl. 34(5): 83, September 2023. https://doi.org/10.1007/s00138-023-01433-y
Xitong Yang, Fu-Jen Chu, Matt Feiszli, Raghav Goyal, Lorenzo Torresani, Du Tran. Relational Space-Time Query in Long-Form Videos. CVPR, pp. 6398-6408, 2023. https://doi.org/10.1109/CVPR52729.2023.00619
Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran. FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation. WACV, pp. 2070-2081, 2023. https://doi.org/10.1109/WACV56688.2023.00211
Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran. MINOTAUR: Multi-task Video Grounding From Multimodal Queries. CoRR abs/2302.08063, 2023. https://doi.org/10.48550/arXiv.2302.08063
Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran. Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision. CoRR abs/2303.05503, 2023. https://doi.org/10.48550/arXiv.2303.05503
Du Tran, Jitendra Malik. Learning Space-Time Semantic Correspondences. CoRR abs/2306.10208, 2023. https://doi.org/10.48550/arXiv.2306.10208
Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran. Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity. CVPR, pp. 4412-4422, 2022. https://doi.org/10.1109/CVPR52688.2022.00438
Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani. Long-Short Temporal Contrastive Learning of Video Transformers. CVPR, pp. 13990-14000, 2022. https://doi.org/10.1109/CVPR52688.2022.01362
Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran. Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity. CoRR abs/2204.06107, 2022. https://doi.org/10.48550/arXiv.2204.06107
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran. Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation. ICCV, pp. 10756-10765, 2021. https://doi.org/10.1109/ICCV48922.2021.01060
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran. Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation. CoRR abs/2104.04691, 2021. https://arxiv.org/abs/2104.04691
Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani. Long-Short Temporal Contrastive Learning of Video Transformers. CoRR abs/2106.09212, 2021. https://arxiv.org/abs/2106.09212
Linchao Zhu, Du Tran, Laura Sevilla-Lara, Yi Yang, Matt Feiszli, Heng Wang. FASTER Recurrent Networks for Efficient Video Classification. AAAI, pp. 13098-13105, 2020. https://doi.org/10.1609/aaai.v34i07.7012
Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli. Video Modeling With Correlation Networks. CVPR, pp. 349-358, 2020. https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_Video_Modeling_With_Correlation_Networks_CVPR_2020_paper.html https://doi.org/10.1109/CVPR42600.2020.00043
Weiyao Wang, Du Tran, Matt Feiszli. What Makes Training Multi-Modal Classification Networks Hard? CVPR, pp. 12692-12702, 2020. https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_What_Makes_Training_Multi-Modal_Classification_Networks_Hard_CVPR_2020_paper.html https://doi.org/10.1109/CVPR42600.2020.01271
Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran. Self-Supervised Learning by Cross-Modal Audio-Video Clustering. NeurIPS, 2020. https://proceedings.neurips.cc/paper/2020/hash/6f2268bd1d3d3ebaabb04d6b5d099425-Abstract.html
Tarun Kalluri, Deepak Pathak, Manmohan Chandraker, Du Tran. FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation. CoRR abs/2012.08512, 2020. https://arxiv.org/abs/2012.08512
Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran. Leveraging the Present to Anticipate the Future in Videos. CVPR Workshops, pp. 2915-2922, 2019. http://openaccess.thecvf.com/content_CVPRW_2019/html/Precognition/Miech_Leveraging_the_Present_to_Anticipate_the_Future_in_Videos_CVPRW_2019_paper.html https://doi.org/10.1109/CVPRW.2019.00351
Deepti Ghadiyaram, Du Tran, Dhruv Mahajan. Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition. CVPR, pp. 12046-12055, 2019. http://openaccess.thecvf.com/content_CVPR_2019/html/Ghadiyaram_Large-Scale_Weakly-Supervised_Pre-Training_for_Video_Action_Recognition_CVPR_2019_paper.html https://doi.org/10.1109/CVPR.2019.01232
Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan. DistInit: Learning Video Representations Without a Single Labeled Video. ICCV, pp. 852-861, 2019. https://doi.org/10.1109/ICCV.2019.00094
Du Tran, Heng Wang, Matt Feiszli, Lorenzo Torresani. Video Classification With Channel-Separated Convolutional Networks. ICCV, pp. 5551-5560, 2019. https://doi.org/10.1109/ICCV.2019.00565
Bruno Korbar, Du Tran, Lorenzo Torresani. SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition. ICCV, pp. 6231-6241, 2019. https://doi.org/10.1109/ICCV.2019.00633
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani. Learning Temporal Pose Estimation from Sparsely-Labeled Videos. NeurIPS, pp. 3021-3032, 2019. https://proceedings.neurips.cc/paper/2019/hash/7137debd45ae4d0ab9aa953017286b20-Abstract.html http://papers.nips.cc/paper/8567-learning-temporal-pose-estimation-from-sparsely-labeled-videos
Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan. DistInit: Learning Video Representations without a Single Labeled Video. CoRR abs/1901.09244, 2019. http://arxiv.org/abs/1901.09244
Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli. Video Classification with Channel-Separated Convolutional Networks. CoRR abs/1904.02811, 2019. http://arxiv.org/abs/1904.02811
Bruno Korbar, Du Tran, Lorenzo Torresani. SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. CoRR abs/1904.04289, 2019. http://arxiv.org/abs/1904.04289
Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CoRR abs/1905.00561, 2019. http://arxiv.org/abs/1905.00561
Weiyao Wang, Du Tran, Matt Feiszli. What Makes Training Multi-Modal Networks Hard? CoRR abs/1905.12681, 2019. http://arxiv.org/abs/1905.12681
Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli. Video Modeling with Correlation Networks. CoRR abs/1906.03349, 2019. http://arxiv.org/abs/1906.03349
Yufei Wang, Du Tran, Lorenzo Torresani. UniDual: A Unified Model for Image and Video Understanding. CoRR abs/1906.03857, 2019. http://arxiv.org/abs/1906.03857
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani. Learning Temporal Pose Estimation from Sparsely-Labeled Videos. CoRR abs/1906.04016, 2019. http://arxiv.org/abs/1906.04016
Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang. FASTER Recurrent Networks for Video Classification. CoRR abs/1906.04226, 2019. http://arxiv.org/abs/1906.04226
Humam Alwassel, Dhruv Mahajan, Lorenzo Torresani, Bernard Ghanem, Du Tran. Self-Supervised Learning by Cross-Modal Audio-Video Clustering. CoRR abs/1911.12667, 2019. http://arxiv.org/abs/1911.12667
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. CVPR, pp. 350-359, 2018. http://openaccess.thecvf.com/content_cvpr_2018/html/Girdhar_Detect-and-Track_Efficient_Pose_CVPR_2018_paper.html https://doi.org/10.1109/CVPR.2018.00044 https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00044
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR, pp. 6450-6459, 2018. http://openaccess.thecvf.com/content_cvpr_2018/html/Tran_A_Closer_Look_CVPR_2018_paper.html https://doi.org/10.1109/CVPR.2018.00675 https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00675
Jamie Ray, Heng Wang, Du Tran, Yufei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri. Scenes-Objects-Actions: A Multi-task, Multi-label Video Dataset. ECCV (14), pp. 660-676, 2018. https://doi.org/10.1007/978-3-030-01264-9_39
Bruno Korbar, Du Tran, Lorenzo Torresani. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization. NeurIPS, pp. 7774-7785, 2018. https://proceedings.neurips.cc/paper/2018/hash/c4616f5a24a66668f11ca4fa80525dc4-Abstract.html http://papers.nips.cc/paper/8002-cooperative-learning-of-audio-and-video-models-from-self-supervised-synchronization
Bruno Korbar, Du Tran, Lorenzo Torresani. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization. CoRR abs/1807.00230, 2018. http://arxiv.org/abs/1807.00230
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani. Learning Discriminative Motion Features Through Detection. CoRR abs/1812.04172, 2018. http://arxiv.org/abs/1812.04172
Shruti Agarwal, Du Tran, Lorenzo Torresani, Hany Farid. Deciphering Severely Degraded License Plates. Media Watermarking, Security, and Forensics, pp. 138-143, 2017. https://doi.org/10.2352/ISSN.2470-1173.2017.7.MWSF-337
Joost R. van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala. Transformation-Based Models of Video Sequences. CoRR abs/1701.08435, 2017. http://arxiv.org/abs/1701.08435
Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri. ConvNet Architecture Search for Spatiotemporal Feature Learning. CoRR abs/1708.05038, 2017. http://arxiv.org/abs/1708.05038
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CoRR abs/1711.11248, 2017. http://arxiv.org/abs/1711.11248
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. CoRR abs/1712.09184, 2017. http://arxiv.org/abs/1712.09184
Du Tran. Representations and Models for Large-Scale Video Understanding. Dartmouth College, USA, 2016. https://digitalcommons.dartmouth.edu/dissertations/53
Du Tran, Lorenzo Torresani. EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis. Int. J. Comput. Vis. 119(3): 239-253, 2016. https://doi.org/10.1007/s11263-016-0905-6
Du Tran, Lubomir D. Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. Deep End2End Voxel2Voxel Prediction. CVPR Workshops, pp. 402-409, 2016. https://doi.org/10.1109/CVPRW.2016.57 https://doi.ieeecomputersociety.org/10.1109/CVPRW.2016.57
Du Tran, Manohar Paluri, Lorenzo Torresani. ViCom: Benchmark and Methods for Video Comprehension. CoRR abs/1606.07373, 2016. http://arxiv.org/abs/1606.07373
Du Tran, Lubomir D. Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV, pp. 4489-4497, 2015. https://doi.org/10.1109/ICCV.2015.510 https://doi.ieeecomputersociety.org/10.1109/ICCV.2015.510
Du Tran, Lubomir D. Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. Deep End2End Voxel2Voxel Prediction. CoRR abs/1511.06681, 2015. http://arxiv.org/abs/1511.06681
Du Tran, Junsong Yuan, David A. Forsyth. Video Event Detection: From Subvolume Localization to Spatiotemporal Path Search. IEEE Trans. Pattern Anal. Mach. Intell. 36(2): 404-416, 2014. https://doi.org/10.1109/TPAMI.2013.137 http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.137 https://www.wikidata.org/entity/Q46225083
Du Tran, Lorenzo Torresani. EXMOVES: Classifier-based Features for Scalable Action Recognition. ICLR (Poster), 2014. http://arxiv.org/abs/1312.5785
Du Tran, Lubomir D. Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. C3D: Generic Features for Video Analysis. CoRR abs/1412.0767, 2014. http://arxiv.org/abs/1412.0767
Du Tran, Junsong Yuan. Max-Margin Structured Output Regression for Spatio-Temporal Action Localization. NIPS, pp. 359-367, 2012. https://proceedings.neurips.cc/paper/2012/hash/9872ed9fc22fc182d371c3e9ed316094-Abstract.html http://papers.nips.cc/paper/4794-max-margin-structured-output-regression-for-spatio-temporal-action-localization
Du Tran, Junsong Yuan. Optimal spatio-temporal path discovery for video event detection. CVPR, pp. 3321-3328, 2011. https://doi.org/10.1109/CVPR.2011.5995416 https://doi.ieeecomputersociety.org/10.1109/CVPR.2011.5995416
Du Tran, Alexander Sorokin. Human Activity Recognition with Metric Learning. ECCV (1), pp. 548-561, 2008. https://doi.org/10.1007/978-3-540-88682-2_42

Coauthors: Shruti Agarwal, Humam Alwassel, Joost R. van Amersfoort, Gedas Bertasius, Lubomir D. Bourdev, Manmohan Krishna Chandraker, Manmohan Chandraker, Shih-Fu Chang, Soumith Chintala, Fu-Jen Chu, Hany Farid, Christoph Feichtenhofer, Matt Feiszli, Rob Fergus, David A. Forsyth, Deepti Ghadiyaram, Bernard Ghanem, Rohit Girdhar, Georgia Gkioxari, Raghav Goyal, Tarun Kalluri, Anitha Kannan, Bruno Korbar, Ivan Laptev, Yann LeCun, Dhruv Mahajan, Jitendra Malik, Effrosyni Mavroudi, Antoine Miech, Manohar Paluri, Deepak Pathak, Deva Ramanan, Marc'Aurelio Ranzato, Jamie Ray, Laura Sevilla-Lara, Jianbo Shi, Zheng Shou, Leonid Sigal, Josef Sivic, Alexander Sorokin, Sainbayar Sukhbaatar, Arthur Szlam, Lorenzo Torresani, Heng Wang, Jue Wang, Weiyao Wang, Yufei Wang, Xueting Yan, Xitong Yang, Yi Yang, Junsong Yuan, Linchao Zhu