This repository is the official implementation of V2M: Visual 2-Dimensional Mamba for Image Representation Learning.
V2M: Visual 2-Dimensional Mamba for Image Representation Learning
Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu
Previous vision Mambas process image tokens with a 1D SSM. We instead extend the SSM to a 2D form better suited to image representation learning, introducing the prior that adjacent regions are the most relevant context for modeling.
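As a rough sketch (our simplified notation, not necessarily the paper's exact formulation), a 2D SSM propagates a hidden state along both spatial axes, so each position directly aggregates its top and left neighbors:

$$
h_{i,j} = A_1 h_{i-1,j} + A_2 h_{i,j-1} + B\,x_{i,j}, \qquad y_{i,j} = C\,h_{i,j}
$$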
- Python 3.10.13

  `conda create -n your_env_name python=3.10.13`

  Then activate the new environment with `conda activate your_env_name`.
- torch 2.1.1 + cu118

  `pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118`
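A quick sanity check (not part of the official instructions) that the CUDA build of PyTorch was installed correctly:

```python
import torch

# Expect "2.1.1+cu118" and True on a machine with a working CUDA 11.8 setup.
print(torch.__version__, torch.cuda.is_available())
```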
- Requirements: v2m_requirements.txt

  `pip install -r v2m/v2m_requirements.txt`
- Install `causal_conv1d` and `mamba`

  `pip install -e causal_conv1d>=1.1.0`

  `pip install -e mamba-1p1p1`
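Assuming the two packages expose their usual module names (`causal_conv1d` and `mamba_ssm`), a quick import test catches build problems early:

```python
# If either CUDA extension failed to build, these imports raise ImportError.
import causal_conv1d
import mamba_ssm
```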
- Train V2M-Tiny: `bash v2m/scripts/tiny.sh`
- Train V2M-Small: `bash v2m/scripts/small.sh`
The scripts above train V2M based on Vim. Applying V2M to other vision Mambas only requires transferring the SSM computation to those frameworks.
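To make that transfer concrete, below is a minimal, unoptimized sketch of what such a 2D scan could look like in PyTorch. All names, shapes, and the specific recurrence here are illustrative assumptions rather than the repository's implementation, which uses a much more efficient scan:

```python
import torch

def ssm_scan_2d(x, A1, A2, B, C):
    """Naive 2D SSM scan over an (H, W, D) feature map.

    x:  (H, W, D) input features
    A1: (N, N) state transition along the height axis (assumed)
    A2: (N, N) state transition along the width axis (assumed)
    B:  (N, D) input projection
    C:  (D, N) output projection
    """
    H, W, D = x.shape
    N = A1.shape[0]
    h = torch.zeros(H + 1, W + 1, N)  # zero-padded hidden states
    y = torch.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Each state aggregates its top and left neighbors, encoding the
            # prior that adjacent regions are the most relevant context.
            h[i + 1, j + 1] = A1 @ h[i, j + 1] + A2 @ h[i + 1, j] + B @ x[i, j]
            y[i, j] = C @ h[i + 1, j + 1]
    return y

# Toy usage: a 14x14 grid of 192-dim tokens with a 16-dim state.
y = ssm_scan_2d(torch.randn(14, 14, 192),
                torch.randn(16, 16) * 0.1, torch.randn(16, 16) * 0.1,
                torch.randn(16, 192) * 0.1, torch.randn(192, 16) * 0.1)
```

In practice, one would swap the 1D selective-scan call inside another vision Mamba block for a hardware-efficient version of this 2D recurrence.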
This project is based on Vision Mamba (code), Mamba (code), Causal-Conv1d (code), and DeiT (code). Thanks for their wonderful work.
If you find this project helpful, please consider citing the following paper:
@article{wang2024V2M,
title={V2M: Visual 2-Dimensional Mamba for Image Representation Learning},
author={Chengkun Wang and Wenzhao Zheng and Yuanhui Huang and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2410.10382},
year={2024}
}