VMamba: Visual State Space Model

Liu, Yue; Tian, Yunjie; Zhao, Yuzhong; Yu, Hongtian; Xie, Lingxi; Wang, Yaowei; Ye, Qixiang; Liu, Yunfan


arXiv:2401.10166 (cs)

[Submitted on 18 Jan 2024 (v1), last revised 26 May 2024 (this version, v3)]

Abstract: Designing computationally efficient network architectures remains an ongoing necessity in computer vision. In this paper, we adapt Mamba, a state-space language model, into VMamba, a vision backbone that runs with linear time complexity. At the core of VMamba lies a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module. By traversing along four scanning routes, SS2D helps bridge the gap between the ordered nature of 1D selective scan and the non-sequential structure of 2D vision data, which facilitates the gathering of contextual information from various sources and perspectives. Based on the VSS blocks, we develop a family of VMamba architectures and accelerate them through a succession of architectural and implementation enhancements. Extensive experiments demonstrate VMamba's promising performance across diverse visual perception tasks and highlight its advantages in input scaling efficiency compared to existing benchmark models. Source code is available at this https URL.
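To make the idea of "four scanning routes" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the four routes are row-major order, column-major order, and the reverse of each; the abstract does not spell the routes out. The function names `four_scan_routes` and `cross_merge` are hypothetical, and the 1D selective scan that would process each sequence is left out (treated as the identity).

```python
from typing import List


def four_scan_routes(grid: List[List[float]]) -> List[List[float]]:
    """Flatten a 2D grid into four 1D sequences.

    Assumed routes (illustrative only): row-major, column-major,
    and the reverse of each.
    """
    h, w = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(h) for c in range(w)]
    col_major = [grid[r][c] for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def cross_merge(seqs: List[List[float]], h: int, w: int) -> List[List[float]]:
    """Map the four sequences back onto the grid and sum them per position."""
    row, col, row_rev, col_rev = seqs
    # Undo the reversal so indices line up with the forward routes.
    row_back = row_rev[::-1]
    col_back = col_rev[::-1]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            i_row = r * w + c  # position of (r, c) in row-major order
            i_col = c * h + r  # position of (r, c) in column-major order
            out[r][c] = row[i_row] + col[i_col] + row_back[i_row] + col_back[i_col]
    return out


grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
routes = four_scan_routes(grid)
# A real model would run a 1D selective scan on each route here.
merged = cross_merge(routes, h=2, w=3)
```

Because each route visits every position exactly once, the cost of building and merging the four sequences grows linearly with the number of pixels or patches, which is consistent with the linear-complexity claim in the abstract. With the identity in place of the scan, every merged entry is simply four times the input value.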

Comments:	25 pages, 14 figures, 15 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.10166 [cs.CV]
	(or arXiv:2401.10166v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.10166

Submission history

From: Yunjie Tian
[v1] Thu, 18 Jan 2024 17:55:39 UTC (3,458 KB)
[v2] Wed, 10 Apr 2024 14:25:12 UTC (5,993 KB)
[v3] Sun, 26 May 2024 08:31:28 UTC (8,100 KB)
