This repository is the official implementation of V2M: Visual 2-Dimensional Mamba for Image Representation Learning.
V2M: Visual 2-Dimensional Mamba for Image Representation Learning
Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu
Previous vision Mambas process image tokens with a 1D SSM. We instead extend the SSM to a 2D form better suited to image representation learning, introducing the prior that adjacent regions are the most relevant context for modeling.
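As a rough sketch (our simplified notation, not necessarily the paper's exact formulation), a 2D SSM propagates a hidden state along both spatial axes, so each position directly aggregates its top and left neighbors:

$$
h_{i,j} = A_1 h_{i-1,j} + A_2 h_{i,j-1} + B\,x_{i,j}, \qquad y_{i,j} = C\,h_{i,j}
$$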
- Python 3.10.13

  `conda create -n your_env_name python=3.10.13`

  Then activate the new environment with `conda activate your_env_name`.
- torch 2.1.1 + cu118

  `pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118`
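A quick sanity check (not part of the official instructions) that the CUDA build of PyTorch was installed correctly:

```python
import torch

# Expect "2.1.1+cu118" and True on a machine with a working CUDA 11.8 setup.
print(torch.__version__, torch.cuda.is_available())
```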
- Requirements: v2m_requirements.txt

  `pip install -r v2m/v2m_requirements.txt`
- Install `causal_conv1d` and `mamba`

  `pip install -e causal_conv1d>=1.1.0`

  `pip install -e mamba-1p1p1`
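Assuming the two packages expose their usual module names (`causal_conv1d` and `mamba_ssm`), a quick import test catches build problems early:

```python
# If either CUDA extension failed to build, these imports raise ImportError.
import causal_conv1d
import mamba_ssm
```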
- Train V2M-Tiny: `bash v2m/scripts/tiny.sh`
- Train V2M-Small: `bash v2m/scripts/small.sh`
The scripts above train V2M based on Vim. Applying V2M to other vision Mambas only requires transferring the SSM computation to those frameworks.
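To make that transfer concrete, below is a minimal, unoptimized sketch of what such a 2D scan could look like in PyTorch. All names, shapes, and the specific recurrence here are illustrative assumptions rather than the repository's implementation, which uses a much more efficient scan:

```python
import torch

def ssm_scan_2d(x, A1, A2, B, C):
    """Naive 2D SSM scan over an (H, W, D) feature map.

    x:  (H, W, D) input features
    A1: (N, N) state transition along the height axis (assumed)
    A2: (N, N) state transition along the width axis (assumed)
    B:  (N, D) input projection
    C:  (D, N) output projection
    """
    H, W, D = x.shape
    N = A1.shape[0]
    h = torch.zeros(H + 1, W + 1, N)  # zero-padded hidden states
    y = torch.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Each state aggregates its top and left neighbors, encoding the
            # prior that adjacent regions are the most relevant context.
            h[i + 1, j + 1] = A1 @ h[i, j + 1] + A2 @ h[i + 1, j] + B @ x[i, j]
            y[i, j] = C @ h[i + 1, j + 1]
    return y

# Toy usage: a 14x14 grid of 192-dim tokens with a 16-dim state.
y = ssm_scan_2d(torch.randn(14, 14, 192),
                torch.randn(16, 16) * 0.1, torch.randn(16, 16) * 0.1,
                torch.randn(16, 192) * 0.1, torch.randn(192, 16) * 0.1)
```

In practice, one would swap the 1D selective-scan call inside another vision Mamba block for a hardware-efficient version of this 2D recurrence.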
This project is based on Vision Mamba (code), Mamba (code), Causal-Conv1d (code), and DeiT (code). Thanks for their wonderful work.
If you find this project helpful, please consider citing the following paper:
@article{wang2024V2M,
title={V2M: Visual 2-Dimensional Mamba for Image Representation Learning},
author={Chengkun Wang and Wenzhao Zheng and Yuanhui Huang and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2410.10382},
year={2024}
}