ICCV 2023: Paris, France
- IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023. IEEE 2023, ISBN 979-8-3503-0718-4
- Xinyang Liu, Yijin Li, Yanbin Teng, Hujun Bao, Guofeng Zhang, Yinda Zhang, Zhaopeng Cui:
Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor. 1-11 - Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, Angela Dai:
ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes. 12-22 - Jiachen Lu, Hongyang Li, Renyuan Peng, Feng Wen, Xinyue Cai, Wei Zhang, Hang Xu, Li Zhang:
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach. 23-33 - Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely:
Doppelgangers: Learning to Disambiguate Images of Similar Structures. 34-44 - Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem:
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries. 45-57 - Wenqiang Xu, Wenxin Du, Han Xue, Yutong Li, Ruolin Ye, Yan-Feng Wang, Cewu Lu:
ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose via An Indirect Recording Solution. 58-68 - Zijie Jiang, Masatoshi Okutomi:
EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity. 69-78 - Ruofan Liang, Huiting Chen, Chunlin Li, Fan Chen, Selvakumar Panneer, Nandita Vijaykumar:
ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting. 79-89 - Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu:
Robust Mixture-of-Expert Training for Convolutional Neural Networks. 90-101 - Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng:
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. 102-111 - Hritik Bansal, Fan Yin, Nishad Singhi, Aditya Grover, Yu Yang, Kai-Wei Chang:
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning. 112-123 - Md Farhamdur Reza, Ali Rahmati, Tianfu Wu, Huaiyu Dai:
CGBA: Curvature-aware Geometric Black-box Attack. 124-133 - Minjong Lee, Dongwoo Kim:
Robust Evaluation of Diffusion-Based Adversarial Purification. 134-144 - Yao Ge, Yun Li, Keji Han, Junyi Zhu, Xianzhong Long:
Advancing Example Exploitation Can Alleviate Critical Challenges in Adversarial Training. 145-154 - Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing:
The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data. 155-164 - Indranil Sur, Karan Sikka, Matthew Walmer, Kaushik Koneripalli, Anirban Roy, Xiao Lin, Ajay Divakaran, Susmit Jha:
TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models. 165-175 - Yangru Huang, Peixi Peng, Yifan Zhao, Yunpeng Zhai, Haoran Xu, Yonghong Tian:
Simoun: Synergizing Interactive Motion-appearance Understanding for Vision-based Reinforcement Learning. 176-185 - Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Juefei-Xu, Chen Feng:
Among Us: Adversarially Robust Collaborative Perception by Consensus. 186-195 - Cristiano Saltori, Aljosa Osep, Elisa Ricci, Laura Leal-Taixé:
Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation. 196-206 - Yunpeng Zhai, Peixi Peng, Yifan Zhao, Yangru Huang, Yonghong Tian:
Stabilizing Visual Reinforcement Learning via Asymmetric Interactive Cooperation. 207-216 - Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang:
MAAL: Multimodality-Aware Autoencoder-based Affordance Learning for 3D Articulated Objects. 217-227 - Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu:
Rethinking Range View Representation for LiDAR Segmentation. 228-240 - Haitao Lin, Yanwei Fu, Xiangyang Xue:
PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring. 241-251 - Arthur Moreau, Nathan Piasco, Moussâb Bennehar, Dzmitry Tsishkou, Bogdan Stanciulescu, Arnaud de La Fortelle:
CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation. 252-262 - Hyesong Choi, Hunsang Lee, Seongwon Jeong, Dongbo Min:
Environment Agnostic Representation for Visual Reinforcement learning. 263-273 - Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Weiqing Li, Bin Li, Hongwei Yi, Haofan Wang:
Test-time Personalizable Forecasting of 3D Human Poses. 274-283 - Hao Xiang, Runsheng Xu, Jiaqi Ma:
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer. 284-295 - Antoine Mercier, Ruan Erasmus, Yashesh Savani, Manik Dhingra, Fatih Porikli, Guillaume Berger:
Efficient neural supersampling on a novel gaming dataset. 296-306 - Hong-Wing Pang, Binh-Son Hua, Sai-Kit Yeung:
Locally Stylized Neural Radiance Fields. 307-316 - Dongqing Wang, Tong Zhang, Sabine Süsstrunk:
NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects. 317-327 - Xiaoyang Kang, Tao Yang, Wenqi Ouyang, Peiran Ren, Lingzhi Li, Xuansong Xie:
DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders. 328-338 - Weicai Ye, Shuo Chen, Chong Bao, Hujun Bao, Marc Pollefeys, Zhaopeng Cui, Guofeng Zhang:
IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis. 339-351 - Jiayi Liu, Ali Mahdavi-Amiri, Manolis Savva:
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects. 352-363 - Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu:
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model. 364-373 - Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang:
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion. 374-384 - Yi-Ling Qiao, Alexander Gao, Yiran Xu, Yue Feng, Jia-Bin Huang, Ming C. Lin:
Dynamic Mesh-Aware Radiance Fields. 385-396 - Wenzhang Sun, Yunlong Che, Yandong Guo, Han Huang:
Neural Reconstruction of Relightable Human Model from Monocular Video. 397-407 - Alexander Mai, Dor Verbin, Falko Kuester, Sara Fridovich-Keil:
Neural Microfacet Fields for Inverse Rendering. 408-418 - Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi:
A Theory of Topological Derivatives for Inverse Rendering of Geometry. 419-429 - Etai Sella, Gal Fiebelman, Peter Hedman, Hadar Averbuch-Elor:
Vox-E: Text-guided Voxel Editing of 3D Objects. 430-440 - Chenxin Li, Brandon Y. Feng, Zhiwen Fan, Panwang Pan, Zhangyang Wang:
StegaNeRF: Embedding Invisible Information within Neural Radiance Fields. 441-453 - Liu He, Daniel G. Aliaga:
GlobalMapper: Arbitrary-Shaped Urban Layout Generation. 454-464 - Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang:
Urban Radiance Field Representation with Deformable Neural Mesh Primitives. 465-476 - Barbara Roessle, Matthias Nießner:
End2End Multi-View Feature Matching with Differentiable Pose Optimization. 477-487 - Chen Geng, Hong-Xing Yu, Sharon Zhang, Maneesh Agrawala, Jiajun Wu:
Tree-Structured Shading Decomposition. 488-498 - Dominique Piché-Meunier, Yannick Hold-Geoffroy, Jianming Zhang, Jean-François Lalonde:
Lens Parameter Estimation for Realistic Depth of Field Modeling. 499-508 - Chongyang Zhong, Lei Hu, Zihao Zhang, Shihong Xia:
AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism. 509-519 - Manuel Ladron de Guevara, Yannick Hold-Geoffroy, Jose Echevarria, Cameron Smith, Yijun Li, Daichi Ito:
Cross-modal Latent Space Alignment for Image to Avatar Translation. 520-529 - Yibo Yang, Stephan Mandt:
Computationally-Efficient Neural Image Compression with Shallow Decoders. 530-540 - Salwa K. Al Khatib, Mohamed El Amine Boudjoghra, Jean Lahoud, Fahad Shahbaz Khan:
3D Instance Segmentation via Enhanced Spatial and Semantic Supervision. 541-550 - Zhijie Deng, Yucen Luo:
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation. 551-561 - Weiguang Zhao, Yuyao Yan, Chaolong Yang, Jianan Ye, Xi Yang, Kaizhu Huang:
Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization. 562-571 - Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang:
Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport. 572-581 - Sina Gholamian, Ali Vahdat:
Handwritten and Printed Text Segmentation: A Signature Case Study. 582-592 - Sihyeon Kim, Juyeon Ko, Minseok Joo, Juhan Cha, Jaewon Lee, Hyunwoo J. Kim:
Semantic-Aware Implicit Template Learning via Part Deformation Consistency. 593-603 - Yunze Liu, Junyu Chen, Zekai Zhang, Jingwei Huang, Li Yi:
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding. 604-613 - Sanghyun Jo, In-Jae Yu, Kyungsu Kim:
MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation. 614-623 - Zelin Peng, Guanchun Wang, Lingxi Xie, Dongsheng Jiang, Wei Shen, Qi Tian:
USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised Semantic Segmentation. 624-634 - Maksym Bekuzarov, Ariana Bermudez, Joon-Young Lee, Hao Li:
XMem++: Production-level Video Segmentation From Few Annotated Frames. 635-644 - Maolin Gao, Paul Roetzer, Marvin Eisenberger, Zorah Lähner, Michael Möller, Daniel Cremers, Florian Bernard:
ΣIGMA: Scale-Invariant Global Sparse Shape Matching. 645-654 - Qianxiong Xu, Wenting Zhao, Guosheng Lin, Cheng Long:
Self-Calibrated Cross Attention Network for Few-Shot Segmentation. 655-665 - Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen:
Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation. 666-676 - Sunghwan Kim, Dae-Hwan Kim, Hoseong Kim:
Texture Learning Domain Randomization for Domain Generalized Segmentation. 677-687 - Tiankang Su, Huihui Song, Dong Liu, Bo Liu, Qingshan Liu:
Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning. 688-698 - Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny:
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only. 699-710 - Nazir Nayal, Misra Yavuz, João F. Henriques, Fatma Güney:
RbA: Segmenting Unknown Regions Rejected by All. 711-722 - Sriram Ravindran, Debraj Basu:
Sempart: Self-supervised Multi-resolution Partitioning of Image Semantics. 723-733 - Sadra Safadoust, Fatma Güney:
Multi-Object Discovery by Low-Dimensional Object Motion. 734-744 - Enxu Li, Sergio Casas, Raquel Urtasun:
MemorySeg: Online LiDAR Semantic Segmentation with a Latent Memory. 745-754 - Changwei Wang, Rongtao Xu, Shibiao Xu, Weiliang Meng, Xiaopeng Zhang:
Treating Pseudo-labels Generation as Image Matting for Weakly Supervised Semantic Segmentation. 755-765 - Rui Yang, Lin Song, Yixiao Ge, Xiu Li:
BoxSnake: Polygonal Instance Segmentation with Box Supervision. 766-776 - Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu:
Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation. 777-786 - Yichen Liu, Benran Hu, Junkai Huang, Yu-Wing Tai, Chi-Keung Tang:
Instance Neural Radiance Field. 787-796 - Kunyang Han, Yong Liu, Jun Hao Liew, Henghui Ding, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao, Yunchao Wei:
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation. 797-807 - Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu:
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation. 808-820 - Yuhe Liu, Chuanjian Liu, Kai Han, Quan Tang, Zengchang Qin:
Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings. 821-831 - Hala Lamdouar, Weidi Xie, Andrew Zisserman:
The Making and Breaking of Camouflage. 832-842 - Zekang Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei:
CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation. 843-853 - Xueyi Liu, Bin Wang, He Wang, Li Yi:
Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation. 854-864 - Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang:
HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling. 865-875 - Tianyi Shi, Xiaohuan Ding, Liang Zhang, Xin Yang:
FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation. 876-886 - Xin Xu, Tianyi Xiong, Zheng Ding, Zhuowen Tu:
MasQCLIP for Open-Vocabulary Universal Image Segmentation. 887-898 - Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen:
CTVIS: Consistent Training for Online Video Instance Segmentation. 899-908 - Ting Chen, Lala Li, Saurabh Saxena, Geoffrey E. Hinton, David J. Fleet:
A Generalist Framework for Panoptic Segmentation of Images and Videos. 909-919 - Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian:
Spectrum-guided Multi-granularity Referring Video Object Segmentation. 920-930 - Changqi Wang, Haoyu Xie, Yuhui Yuan, Chong Fu, Xiangyu Yue:
Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation. 931-942 - Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok:
Adaptive Superpixel for Active Learning in Semantic Segmentation. 943-953 - Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai:
Multimodal Variational Auto-encoder based Audio-Visual Segmentation. 954-965 - Yichen Yuan, Yifan Wang, Lijun Wang, Xiaoqi Zhao, Huchuan Lu, Yu Wang, Weibo Su, Lei Zhang:
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation. 966-976 - Cheng-Kun Yang, Min-Hung Chen, Yung-Yu Chuang, Yen-Yu Lin:
2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision. 977-987 - Mischa Dombrowski, Hadrien Reynaud, Matthew Baugh, Bernhard Kainz:
Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models. 988-998 - Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen:
SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning. 999-1008 - Boyang Li, Yingqian Wang, Longguang Wang, Fei Zhang, Ting Liu, Zaiping Lin, Wei An, Yulan Guo:
Monte Carlo Linear Clustering with Single-Point Supervision is Enough for Infrared Small Target Detection. 1009-1019 - Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang:
A Simple Framework for Open-Vocabulary Segmentation and Detection. 1020-1031 - Zongwei Wu, Danda Pani Paudel, Deng-Ping Fan, Jingjing Wang, Shuo Wang, Cédric Demonceaux, Radu Timofte, Luc Van Gool:
Source-free Depth for Object Pop-out. 1032-1042 - Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe:
DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer. 1043-1052 - Junzhang Chen, Xiangzhi Bai:
Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD. 1053-1063 - Yuxi Wang, Jian Liang, Jun Xiao, Shuqi Mei, Yuran Yang, Zhaoxiang Zhang:
Informative Data Mining for One-shot Cross-Domain Semantic Segmentation. 1064-1074 - Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li:
Homography Guided Temporal Fusion for Road Line and Marking Segmentation. 1075-1085 - Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma:
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network. 1086-1096 - Junlong Li, Bingyao Yu, Yongming Rao, Jie Zhou, Jiwen Lu:
TCOVIS: Temporally Consistent Online Video Instance Segmentation. 1097-1107 - Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang:
FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation. 1108-1118 - Lukas Zbinden, Lars Doorenbos, Theodoros Pissas, Adrian Thomas Huber, Raphael Sznitman, Pablo Márquez-Neila:
Stochastic Segmentation with Conditional Categorical Diffusion Models. 1119-1129 - Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang:
SegGPT: Towards Segmenting Everything In Context. 1130-1140 - Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao:
Open-vocabulary Panoptic Segmentation with Embedding Modulation. 1141-1150 - Yuyuan Liu, Choubo Ding, Yu Tian, Guansong Pang, Vasileios Belagiannis, Ian D. Reid, Gustavo Carneiro:
Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation. 1151-1161 - Pitchaporn Rewatbowornwong, Nattanat Chatthee, Ekapol Chuangsuwanich, Supasorn Suwajanakorn:
Zero-guidance Segmentation Using Zero Segment Labels. 1162-1172 - Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes:
Model Calibration in Dense Classification with Adaptive Label Perturbation. 1173-1184 - Jie Ma, Chuan Wang, Yang Liu, Liang Lin, Guanbin Li:
Enhanced Soft Label for Semi-Supervised Semantic Segmentation. 1185-1195 - Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang:
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation. 1196-1205 - Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen:
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models. 1206-1217 - Rui Sun, Yuan Wang, Huayu Mai, Tianzhu Zhang, Feng Wu:
Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation. 1218-1228 - Peixia Li, Pulak Purkait, Thalaiyasingam Ajanthan, Majid Abdolshah, Ravi Garg, Hisham Husain, Chenchen Xu, Stephen Gould, Wanli Ouyang, Anton van den Hengel:
Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups. 1229-1238 - Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury:
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets. 1239-1249 - Yu-Hsing Hsieh, Guan-Sheng Chen, Shun-Xian Cai, Ting-Yun Wei, Huei-Fang Yang, Chu-Song Chen:
Class-incremental Continual Learning for Instance Segmentation with Image-level Weak Supervision. 1250-1261 - Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu:
Coarse-to-Fine Amodal Segmentation with Shape Prior. 1262-1271 - Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu:
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation. 1272-1281 - Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan:
DVIS: Decoupled Video Instance Segmentation Framework. 1282-1291 - Ayça Takmaz, Jonas Schult, Irem Kaftan, Mertcan Akçay, Bastian Leibe, Robert W. Sumner, Francis Engelmann, Siyu Tang:
3D Segmentation of Humans in Point Clouds with Synthetic Data. 1292-1304 - Shijie Lian, Hua Li, Runmin Cong, Suqi Li, Wei Zhang, Sam Kwong:
WaterMask: Instance Segmentation for Underwater Imagery. 1305-1315 - Ho Kei Cheng, Seoung Wug Oh, Brian L. Price, Alexander G. Schwing, Joon-Young Lee:
Tracking Anything with Decoupled Video Segmentation. 1316-1326 - Chenming Li, Daoan Zhang, Wenjian Huang, Jianguo Zhang:
Cross Contrasting Feature Perturbation for Domain Generalization. 1327-1337 - Lei Fan, Bo Liu, Haoxiang Li, Ying Wu, Gang Hua:
Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance. 1338-1347 - Rabab Abdelfattah, Qing Guo, Xiaoguang Li, Xiaofeng Wang, Song Wang:
CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification. 1348-1357 - Jongyoun Noh, Hyekang Park, Junghyup Lee, Bumsub Ham:
RankMixup: Ranking-Based Mixup Training for Network Calibration. 1358-1368 - Yang Lu, Yiliang Zhang, Bo Han, Yiu-Ming Cheung, Hanzi Wang:
Label-Noise Learning with Intrinsically Long-Tailed Data. 1369-1378 - Xingyu Liu, Sanping Zhou, Le Wang, Gang Hua:
Parallel Attention Interaction Network for Few-Shot Skeleton-based Action Recognition. 1379-1388 - Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang:
Rethinking Mobile Block for Efficient Attention-based Models. 1389-1400 - Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyeong Choi, Sanghyeok Lee, Hyunwoo J. Kim:
Read-only Prompt Optimization for Vision-Language Few-shot Learning. 1401-1411 - Zhongzhan Huang, Mingfu Liang, Jinghui Qin, Shanshan Zhong, Liang Lin:
Understanding Self-attention Mechanism via Dynamical System Perspective. 1412-1422 - Wenqiao Zhang, Changshuo Liu, Lingze Zeng, Beng Chin Ooi, Siliang Tang, Yueting Zhuang:
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels. 1423-1432 - Shunxin Wang, Raymond N. J. Veldhuis, Christoph Brune, Nicola Strisciuglio:
What do neural networks learn in image classification? A frequency shortcut perspective. 1433-1442 - Tong Liang, Jim Davis:
Inducing Neural Collapse to a Fixed Hierarchy-Aware Frame for Reducing Mistake Severity. 1443-1452 - Reza Averly, Wei-Lun Chao:
Unified Out-Of-Distribution Detection: A Model-Specific Perspective. 1453-1463 - Myeongho Jeon, Myungjoo Kang, Joonseok Lee:
A Unified Framework for Robustness on Diverse Sampling Errors. 1464-1472 - Xuelin Zhu, Jian Liu, Weijia Liu, Jiawei Ge, Bo Liu, Jiuxin Cao:
Scene-Aware Label Graph Learning for Multi-Label Image Classification. 1473-1482 - Xiaobo Xia, Jiankang Deng, Wei Bao, Yuxuan Du, Bo Han, Shiguang Shan, Tongliang Liu:
Holistic Label Correction for Noisy Multi-Label Classification. 1483-1493 - Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, Yaowei Wang, Jianguo Zhang:
Strip-MLP: Efficient Token Interaction for Vision MLP. 1494-1504 - Ke Xu, Lei Han, Ye Tian, Shangshang Yang, Xingyi Zhang:
EQ-Net: Elastic Quantization Neural Networks. 1505-1514 - Renrong Shao, Wei Zhang, Jianhua Yin, Jun Wang:
Data-free Knowledge Distillation for Fine-grained Visual Categorization. 1515-1525 - Xilin He, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Feng Liu, Linlin Shen:
Shift from Texture-bias to Shape-bias: Edge Deformation-based Augmentation for Robust Object Recognition. 1526-1535 - Isack Lee, Eungi Lee, Seok Bong Yoo:
Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition. 1536-1546 - Nan Zhou, Jiaxin Chen, Di Huang:
DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration. 1547-1556 - Jaewoo Park, Jacky Chen Long Chai, Jaeho Yoon, Andrew Beng Jin Teoh:
Understanding the Feature Norm for Out-of-Distribution Detection. 1557-1567 - Ruoyi Du, Wenqing Yu, Heqing Wang, Ting-En Lin, Dongliang Chang, Zhanyu Ma:
Multi-View Active Fine-Grained Visual Recognition. 1568-1578 - Ruiyuan Gao, Chenchen Zhao, Lanqing Hong, Qiang Xu:
DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models. 1579-1589 - Yurong Guo, Ruoyi Du, Yuan Dong, Timothy M. Hospedales, Yi-Zhe Song, Zhanyu Ma:
Task-aware Adaptive Learning for Cross-domain Few-shot Learning. 1590-1599 - Qidong Huang, Xiaoyi Dong, Dongdong Chen, Yinpeng Chen, Lu Yuan, Gang Hua, Weiming Zhang, Nenghai Yu:
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting. 1600-1610 - Shouwen Wang, Qian Wan, Xiang Xiang, Zhigang Zeng:
Saliency Regularization for Self-Training with Partial Annotations. 1611-1620 - Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu:
Learning Gabor Texture Features for Fine-Grained Recognition. 1621-1631 - Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, Yu Qiao:
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding. 1632-1643 - Ziyi Zhang, Weikai Chen, Chaowei Fang, Zhen Li, Lechao Chen, Liang Lin, Guanbin Li:
RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels. 1644-1654 - Yanan Wu, Zhixiang Chi, Yang Wang, Songhe Feng:
MetaGCD: Learning to Continually Learn in Generalized Category Discovery. 1655-1665 - Zhiqiang Shen:
FerKD: Surgical Label Adaptation for Efficient Distillation. 1666-1675 - Chengxin Liu, Hao Lu, Zhiguo Cao, Tongliang Liu:
Point-Query Quadtree for Crowd Counting, Localization, and More. 1676-1685 - Jaewoo Park, Yoon Gyo Jung, Andrew Beng Jin Teoh:
Nearest Neighbor Guidance for Out-of-Distribution Detection. 1686-1695 - HyunJae Lee, Heon Song, Hyeonsoo Lee, Gihyeon Lee, Suyeong Park, Donggeun Yoo:
Bayesian Optimization Meets Self-Distillation. 1696-1705 - Yu-Ming Tang, Yi-Xing Peng, Wei-Shi Zheng:
When Prompt-based Incremental Learning Does Not Meet Strong Pretraining. 1706-1716 - Chengkai Hou, Jieyu Zhang, Tianyi Zhou:
When to Learn What: Model-Adaptive Data Augmentation Curriculum. 1717-1728 - Florent Chiaroni, Jose Dolz, Imtiaz Masud Ziko, Amar Mitiche, Ismail Ben Ayed:
Parametric Information Maximization for Generalized Category Discovery. 1729-1739 - Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong Liu:
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching. 1740-1750 - Liang Chen, Yong Zhang, Yibing Song, Anton van den Hengel, Lingqiao Liu:
Domain Generalization via Rationale Invariance. 1751-1760 - Ziqing Wang, Yuetong Fang, Jiahang Cao, Qiang Zhang, Zhongrui Wang, Renjing Xu:
Masked Spiking Transformer. 1761-1771 - Wuxuan Shi, Mang Ye:
Prototype Reminiscence and Augmented Asymmetric Knowledge Aggregation for Non-Exemplar Class-Incremental Learning. 1772-1781 - Yun Li, Zhe Liu, Saurav Jha, Lina Yao:
Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning. 1782-1791 - Shuo He, Guowu Yang, Lei Feng:
Candidate-aware Selective Disambiguation Based On Normalized Entropy for Instance-dependent Partial-label Learning. 1792-1801 - Hualiang Wang, Yi Li, Huifeng Yao, Xiaomeng Li:
CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No. 1802-1812 - Benzhi Wang, Yang Yang, Jinlin Wu, Guo-Jun Qi, Zhen Lei:
Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search. 1813-1822 - Chanho Ahn, Kikyung Kim, Ji-Won Baek, Jongin Lim, Seungju Han:
Sample-wise Label Confidence Incorporation for Learning with Noisy Labels. 1823-1832 - Xiaobo Xia, Bo Han, Yibing Zhan, Jun Yu, Mingming Gong, Chen Gong, Tongliang Liu:
Combating Noisy Labels with Sample Selection by Mining High-Discrepancy Examples. 1833-1843 - Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha:
Spatial-Aware Token for Weakly Supervised Object Localization. 1844-1854 - Sriram Balasubramanian, Soheil Feizi:
Towards Improved Input Masking for Convolutional Neural Networks. 1855-1865 - Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cássio Fraga Dantas, Dino Ienco, Zeynep Akata, Diego Marcos:
PDiscoNet: Semantically consistent part discovery for fine-grained recognition. 1866-1876 - Divyansh Srivastava, Tuomas P. Oikarinen, Tsui-Wei Weng:
Corrupting Neuron Explanations of Deep Visual Features. 1877-1886 - Dawid Rymarczyk, Joost van de Weijer, Bartosz Zielinski, Bartlomiej Twardowski:
ICICLE: Interpretable Class Incremental Continual Learning. 1887-1898 - Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata:
ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models. 1899-1910 - Julia Hornauer, Adrian Holzbock, Vasileios Belagiannis:
Out-of-Distribution Detection for Monocular Depth Estimation. 1911-1921 - Sukrut Rao, Moritz Böhle, Amin Parchami-Araghi, Bernt Schiele:
Studying How to Efficiently and Effectively Guide Models with Explanations. 1922-1933 - Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher:
Rosetta Neurons: Mining the Common Units in a Model Zoo. 1934-1943 - Nanne van Noord:
Prototype-based Dataset Comparison. 1944-1954 - Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber:
Learning to Identify Critical States for Reinforcement Learning from Videos. 1955-1965 - Alexandros Stergiou, Nikos Deligiannis:
Leaping Into Memories: Space-Time Deep Feature Synthesis. 1966-1976 - Yifei Zhang, Siyi Gu, Yuyang Gao, Bo Pan, Xiaofeng Yang, Liang Zhao:
MAGI: Multi-Annotated Explanation-Guided Learning. 1977-1987 - Wei Huang, Xingyu Zhao, Gaojie Jin, Xiaowei Huang:
SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability. 1988-1998 - Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp:
Do DALL-E and Flamingo Understand Each Other? 1999-2010 - Qihan Huang, Mengqi Xue, Wenqi Huang, Haofei Zhang, Jie Song, Yongcheng Jing, Mingli Song:
Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks. 2011-2020 - Jingwei Zhang, Farzan Farnia:
MoreauGrad: Sparse and Robust Interpretation of Neural Networks via Moreau Envelope. 2021-2030 - Kelu Yao, Jin Wang, Boyu Diao, Chao Li:
Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View. 2031-2041 - Xue Wang, Zhibo Wang, Haiqin Weng, Hengchang Guo, Zhifei Zhang, Lu Jin, Tao Wei, Kui Ren:
Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks. 2042-2051 - Giyoung Jeon, Haedong Jeong, Jaesik Choi:
Beyond Single Path Integrated Gradients for Reliable Input Attribution via Randomized Path Sampling. 2052-2061 - Chong Wang, Yuyuan Liu, Yuanhong Chen, Fengbei Liu, Yu Tian, Davis J. McCarthy, Helen Frazer, Gustavo Carneiro:
Learning Support and Trivial Prototypes for Interpretable Image Classification. 2062-2072 - Oren Barkan, Yehonatan Elisha, Yuval Asher, Amit Eshel, Noam Koenigstein:
Visual Explanations via Iterated Integrated Attributions. 2073-2084 - Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba:
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models. 2085-2095 - Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li:
Human Preference Score: Better Aligning Text-to-image Models with Human Preference. 2096-2105 - Elad Levi, Eli Brosh, Mykola Mykhailych, Meir Perez:
DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer. 2106-2115 - Thanh Van Le, Hao Phung, Thuan Hoang Nguyen, Quan Dao, Ngoc N. Tran, Anh Tuan Tran:
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis. 2116-2127 - Michal J. Tyszkiewicz, Pascal Fua, Eduard Trulls:
GECCO: Geometrically-Conditioned Point Diffusion Models. 2128-2138 - Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein:
DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models. 2139-2150 - Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, Siyu Tang:
Guided Motion Diffusion for Controllable Human Motion Synthesis. 2151-2162 - Yanzhao Zheng, Yunzhou Shi, Yuhao Cui, Zhongzhou Zhao, Zhiling Luo, Wei Zhou:
COOP: Decoupling and Coupling of Whole-Body Grasping Pose Generation. 2163-2173 - Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek:
Zero-shot spatial layout conditioning for text-to-image diffusion models. 2174-2183 - Aibek Alanov, Vadim Titov, Maksim Nakhodnov, Dmitry P. Vetrov:
StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation. 2184-2194 - Jianfeng Xiang, Jiaolong Yang, Yu Deng, Xin Tong:
GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds. 2195-2205 - Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak:
Your Diffusion Model is Secretly a Zero-Shot Classifier. 2206-2217 - Jiali Cui, Ying Nian Wu, Tian Han:
Learning Hierarchical Features with Joint Latent Space Energy-Based Prior. 2218-2227 - Liang Xu, Ziyang Song, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xiaokang Yang, Wenjun Zeng, Wei Wu:
ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation. 2228-2238 - Ruoshi Liu, Chengzhi Mao, Purva Tendulkar, Hao Wang, Carl Vondrick:
Landscape Learning for Neural Network Inversion. 2239-2250 - Martin Nicolas Everaert, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, Radhakrishna Achanta:
Diffusion in Style. 2251-2261 - Gene Chou, Yuval Bahat, Felix Heide:
Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions. 2262-2272 - Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng:
GETAvatar: Generative Textured Meshes for Animatable Human Avatars. 2273-2282 - Aishwarya Agarwal, Srikrishna Karanam, K. J. Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan:
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis. 2283-2293 - Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong:
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition. 2294-2305 - Yijun Qian, Jack Urbanek, Alexander G. Hauptmann, Jungdam Won:
Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions. 2306-2316 - Germán Barquero, Sergio Escalera, Cristina Palmero:
BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction. 2317-2327 - Amir Hertz, Kfir Aberman, Daniel Cohen-Or:
Delta Denoising Score. 2328-2337 - Xingyu Chen, Yu Deng, Baoyuan Wang:
Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation. 2338-2348 - Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan T. Barron, Yuanzhen Li, Varun Jampani:
DreamBooth3D: Subject-Driven Text-to-3D Generation. 2349-2359 - Shuang Song, Yuanbang Liang, Jing Wu, Yu-Kun Lai, Yipeng Qin:
Feature Proliferation - the "Cancer" in StyleGAN and its Treatments. 2360-2370 - Berkay Kicanaoglu, Pablo Garrido, Gaurav Bharaj:
Unsupervised Facial Performance Editing via Vector-Quantized StyleGAN Representations. 2371-2382 - Jianfeng Xiang, Jiaolong Yang, Binbin Huang, Xin Tong:
3D-aware Image Generation using 2D Diffusion Models. 2383-2393 - Ganghun Lee, Minji Kim, Yunsu Lee, Minsu Lee, Byoung-Tak Zhang:
Neural Collage Transfer: Artistic Reconstruction via Material Manipulation. 2394-2405 - Teng Hu, Jiangning Zhang, Liang Liu, Ran Yi, Siqi Kou, Haokun Zhu, Xu Chen, Yabiao Wang, Chengjie Wang, Lizhuang Ma:
Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption. 2406-2415 - Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su:
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction. 2416-2425 - Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau:
Erasing Concepts from Diffusion Models. 2426-2436 - Ziyang Yuan, Yiming Zhu, Yu Li, Hongyu Liu, Chun Yuan:
Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding. 2437-2447 - Seunggyu Chang, Gihoon Kim, Hayeon Kim:
HairNeRF: Geometry-Aware Image Synthesis for Hairstyle Transfer. 2448-2458 - Yuanze Lin, Chen Wei, Huiyu Wang, Alan L. Yuille, Cihang Xie:
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training. 2459-2469 - Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen:
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model. 2470-2481 - Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin:
Explore and Tell: Embodied Visual Captioning in 3D Environments. 2482-2491 - Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su:
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. 2492-2503 - Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang:
Learning Trajectory-Word Alignments for Video-Language Tasks. 2504-2514 - Dizhan Xue, Shengsheng Qian, Changsheng Xu:
Variational Causal Inference Network for Explanatory Visual Question Answering. 2515-2525 - Moon Ye-Bin, Jisoo Kim, Hongyeob Kim, Kilho Son, Tae-Hyun Oh:
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation. 2526-2537 - Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo:
Segment Every Reference Object in Spatial and Temporal Spaces. 2538-2550 - Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang:
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models. 2551-2562 - Bumsoo Kim, Yeonsik Jo, Jinhyung Kim, Seung-Hwan Kim:
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining. 2563-2572 - Yifeng Zhang, Shi Chen, Qi Zhao:
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge. 2573-2583 - Junyu Bi, Daixuan Cheng, Ping Yao, Bochen Pang, Yuefeng Zhan, Chuanguang Yang, Yujing Wang, Hao Sun, Weiwei Deng, Qi Zhang:
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching. 2584-2593 - Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui:
Moment Detection in Long Tutorial Videos. 2594-2604 - Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao:
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement. 2605-2615 - Nitzan Bitton Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, Roy Schwartz:
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images. 2616-2627 - Yixuan Wu, Zhao Zhang, Chi Xie, Feng Zhu, Rui Zhao:
Advancing Referring Expression Segmentation Beyond Single Image. 2628-2638 - Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao:
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning. 2639-2650 - Weizhen He, Weijie Chen, Binbin Chen, Shicai Yang, Di Xie, Luojun Lin, Donglian Qi, Yueting Zhuang:
Unsupervised Prompt Tuning for Text-Driven Object Detection. 2651-2661 - Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao:
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding. 2662-2671 - Sophia Gu, Christopher Clark, Aniruddha Kembhavi:
I can't believe there's no images! : Learning Visual Tasks Using Only Language Supervision. 2672-2683 - Guanghui Li, Mingqi Gao, Heng Liu, Xiantong Zhen, Feng Zheng:
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples. 2684-2693 - Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Chen Change Loy:
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions. 2694-2703 - Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, Wangmeng Zuo:
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning. 2704-2714 - Xi Tian, Yong-Liang Yang, Qi Wu:
ShapeScaffolder: Structure-Aware 3D Shape Generation from Text. 2715-2724 - Vishaal Udandarao, Ankush Gupta, Samuel Albanie:
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models. 2725-2736 - Yiwei Ma, Haowei Wang, Xiaoqing Zhang, Guannan Jiang, Xiaoshuai Sun, Weilin Zhuang, Jiayi Ji, Rongrong Ji:
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. 2737-2748 - Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen:
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation. 2749-2758 - Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang:
Attentive Mask CLIP. 2759-2769 - Jiangtong Li, Li Niu, Liqing Zhang:
Knowledge Proxy Intervention for Deconfounded Video Question Answering. 2770-2781 - Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou:
UniVTG: Towards Unified Video-Language Temporal Grounding. 2782-2792 - Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang:
Self-supervised Cross-view Representation Reconstruction for Change Captioning. 2793-2803 - Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal:
Unified Coarse-to-Fine Alignment for Video-Text Retrieval. 2804-2815 - Yang Liu, Jiahua Zhang, Qingchao Chen, Yuxin Peng:
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding. 2816-2826 - Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan:
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions. 2827-2838 - Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogério Feris, Hilde Kuehne, Horst Bischof:
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge. 2839-2850 - Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou:
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation. 2851-2862 - Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, Donglai Wei:
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation. 2863-2874 - Morris Alper, Hadar Averbuch-Elor:
Learning Human-Human Interactions in Images from Weak Textual Supervision. 2875-2887 - Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang:
BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization. 2888-2898 - Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li:
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment. 2899-2909 - Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu:
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption. 2910-2919 - Cheng Shi, Sibei Yang:
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models. 2920-2929 - Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh:
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning. 2930-2940 - Zi Qian, Xin Wang, Xuguang Duan, Pengda Qin, Yuhong Li, Wenwu Zhu:
Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering. 2941-2950 - Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo:
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3. 2951-2963 - Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He:
Grounded Image Text Matching with Mismatched Relation Reasoning. 2964-2975 - Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly:
GePSAn: Generative Procedure Step Anticipation in Cooking Videos. 2976-2985 - Chan Hee Song, Brian M. Sadler, Jiaman Wu, Wei-Lun Chao, Clayton Washington, Yu Su:
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. 2986-2997 - Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang:
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control. 2998-3008 - Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara:
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. 3009-3019 - Jaemin Cho, Abhay Zala, Mohit Bansal:
DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models. 3020-3031 - Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan:
Learning Navigational Visual Representations with Semantic Map Supervision. 3032-3044 - Jiajin Tang, Ge Zheng, Jingyi Yu, Sibei Yang:
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection. 3045-3055 - Nan Xi, Jingjing Meng, Junsong Yuan:
Open Set Video HOI detection from Action-centric Chain-of-Look Prompting. 3056-3066 - An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Yang Wang, Jingbo Shang, Julian J. McAuley:
Learning Concise and Descriptive Attributes for Visual Recognition. 3067-3077 - Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim:
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models. 3078-3089 - Thomas Mensink, Jasper R. R. Uijlings, Lluís Castrejón, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araújo, Vittorio Ferrari:
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories. 3090-3101 - Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi:
Story Visualization by Online Text Augmentation with Context Memory. 3102-3112 - Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng:
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning. 3113-3123 - Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou:
Too Large; Data Reduction for Vision-Language Pre-Training. 3124-3134 - Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun:
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation. 3135-3146 - Roni Paiss, Ariel Ephrat, Omer Tov, Shiran Zada, Inbar Mosseri, Michal Irani, Tali Dekel:
Teaching CLIP to Count to Ten. 3147-3157 - Junsheng Zhou, Baorui Ma, Shujuan Li, Yu-Shen Liu, Zhizhong Han:
Learning a More Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection. 3158-3169 - Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, Mukund Varma T., Yi Wang, Zhangyang Wang:
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts. 3170-3181 - Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, Bo Dai:
MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond. 3182-3192 - Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu:
R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras. 3193-3203 - Yuan Li, Zhi-Hao Lin, David A. Forsyth, Jia-Bin Huang, Shenlong Wang:
ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field. 3204-3215 - Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, Li Fei-Fei:
Rendering Humans from Object-Occluded Monocular Videos. 3216-3227 - Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Bo Dai, Dahua Lin:
AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation. 3228-3238 - Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Aqi Gao, Tiancai Wang, Xiangyu Zhang:
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images. 3239-3249 - Takuhiro Kaneko:
MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields. 3250-3260 - Zelin Gao, Weichen Dai, Yu Zhang:
Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields. 3261-3271 - Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, Lingjie Liu:
NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction. 3272-3283 - Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng:
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition. 3284-3294 - Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal:
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching. 3295-3304 - Martin Bråtelund, Felix Rydell:
Compatibility of Fundamental Matrices for Complete Viewing Graphs. 3305-3313 - Pin Tang, Hai-Ming Xu, Chao Ma:
ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation. 3314-3324 - Jinqing Zhang, Yanan Zhang, Qingjie Liu, Yunhong Wang:
SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection. 3325-3334 - Ziying Song, Haiyue Wei, Lin Bai, Lei Yang, Caiyan Jia:
GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection. 3335-3346 - Mikhail Terekhov, Viktor Larsson:
Tangent Sampson Error: Fast Approximate Two-view Reprojection Error for Central Camera Models. 3347-3355 - Gilles Puy, Alexandre Boulch, Renaud Marlet:
Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation. 3356-3366 - Levente Hajder, Lajos Lóczi, Daniel Barath:
Fast Globally Optimal Surface Normal from an Affine Correspondence. 3367-3378 - Marcel C. Bühler, Kripasindhu Sarkar, Tanmay Shah, Gengyan Li, Daoye Wang, Leonhard Helminger, Sergio Orts-Escolano, Dmitry Lagun, Otmar Hilliges, Thabo Beeler, Abhimitra Meka:
Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis. 3379-3390 - Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma:
Canonical Factors for Hybrid Neural Fields. 3391-3403 - Haobo Jiang, Zheng Dang, Shuo Gu, Jin Xie, Mathieu Salzmann, Jian Yang:
Center-Based Decoupled Point Cloud Registration for 6D Object Pose Estimation. 3404-3414 - Annika Hagemann, Moritz Knorr, Christoph Stiller:
Deep geometry-aware camera self-calibration from video. 3415-3425 - Nathaniel Burgdorfer, Philippos Mordohai:
V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints. 3426-3435 - Yuxiang Cai, Yifan Zhu, Haiwei Zhang, Bo Ren:
Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera. 3436-3445 - Sungwon Hwang, Junha Hyung, Daejin Kim, Min-Jung Kim, Jaegul Choo:
FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields. 3446-3456 - Xiufeng Xie, Riccardo Gherardi, Zhihong Pan, Stephen Huang:
HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation. 3457-3467 - Jae-Hyeok Lee, Dae-Shik Kim:
ICE-NeRF: Interactive Color Editing of NeRFs via Decomposition-Aware Weight Optimization. 3468-3478 - Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang:
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration. 3479-3488 - Aarrushi Shandilya, Benjamin Attal, Christian Richardt, James Tompkin, Matthew O'Toole:
Neural Fields for Structured Lighting. 3489-3499 - Tao Xie, Ke Wang, Siyi Lu, Yukun Zhang, Kun Dai, Xiaoyu Li, Jie Xu, Li Wang, Lijun Zhao, Xinyu Zhang, Ruifeng Li:
CO-Net: Learning Multiple Point Cloud Tasks at Once with A Cohesive Network. 3500-3510 - Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Kunhao Liu, Rongliang Wu, Xiaoqin Zhang, Ling Shao, Shijian Lu:
Pose-Free Neural Radiance Fields via Implicit Pose Regularization. 3511-3520 - Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang:
TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering. 3521-3532 - Haoyu Wu, Alexandros Graikos, Dimitris Samaras:
S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces. 3533-3545 - Chaoran Tian, Weihong Pan, Zimo Wang, Mao Mao, Guofeng Zhang, Hujun Bao, Ping Tan, Zhaopeng Cui:
DPS-Net: Deep Polarimetric Stereo Depth Estimation. 3546-3556 - Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu:
3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection. 3557-3566 - Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool:
Deformable Neural Radiance Fields using RGB and Event Cameras. 3567-3577 - Jingyang Zhang, Yao Yao, Shiwei Li, Jingbo Liu, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan:
NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation. 3578-3587 - Chunlin Ren, Qingshan Xu, Shikun Zhang, Jiaqi Yang:
Hierarchical Prior Mining for Non-local Multi-View Stereo. 3588-3597 - Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang:
Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. 3598-3608 - Sara Rojas, Jesus Zarzar, Juan C. Pérez, Artsiom Sanakoyeu, Ali K. Thabet, Albert Pumarola, Bernard Ghanem:
Re-ReND: Real-time Rendering of NeRFs across Devices. 3609-3618 - Xiaoyang Huang, Yi Zhang, Kai Chen, Teng Li, Wenjun Zhang, Bingbing Ni:
Learning Shape Primitives via Implicit Convexity Regularization. 3619-3628 - Ruihong Yin, Sezer Karaoglu, Theo Gevers:
Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction. 3629-3638 - Zhiwei Zhang, Zhizhong Zhang, Qian Yu, Ran Yi, Yuan Xie, Lizhuang Ma:
LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment. 3639-3648 - Wenjie Ding, Limeng Qiao, Xi Qiu, Chi Zhang:
PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction. 3649-3659 - Ming Qian, Jincheng Xiong, Gui-Song Xia, Nan Xue:
Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs. 3660-3669 - Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia:
Mask-Attention-Free Transformer for 3D Instance Segmentation. 3670-3680 - Xiaoyong Lu, Yaping Yan, Tong Wei, Songlin Du:
Scene-Aware Feature Matching. 3681-3690 - Zhuoxiao Chen, Yadan Luo, Zheng Wang, Mahsa Baktashmotlagh, Zi Huang:
Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling. 3691-3703 - Youmin Zhang, Fabio Tosi, Stefano Mattoccia, Matteo Poggi:
GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction. 3704-3714 - Valter Piedade, Pedro Miraldo:
BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus. 3715-3724 - Felix Rydell, Elima Shehu, Angélica Torres:
Theoretical and Numerical Analysis of 3D Reconstruction Using Point and Line Incidences. 3725-3734 - Haozhe Lin, Zequn Chen, Jinzhi Zhang, Bing Bai, Yu Wang, Ruqi Huang, Lu Fang:
RealGraph: A Multiview Dataset for 4D Real-world Context Graph Generation. 3735-3745 - Kaiqiang Xiong, Rui Peng, Zhe Zhang, Tianxing Feng, Jianbo Jiao, Feng Gao, Ronggang Wang:
CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning. 3746-3757 - Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu:
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction. 3758-3767 - Zitian Wang, Zehao Huang, Jiahui Fu, Naiyan Wang, Si Liu:
Object as Query: Lifting any 2D Object Detector to 3D Detection. 3768-3777 - Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang:
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection. 3778-3790 - Chuxin Wang, Wenfei Yang, Tianzhu Zhang:
Not Every Side Is Equal: Localization Uncertainty Estimation for Semi-Supervised 3D Object Detection. 3791-3801