
[ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs" [Paper]

The LLaVA-SP implementation changes are located in llava_arch.py, clip_encoder.py, llava_trainer.py, and train.py.

Install

Please follow the installation instructions at https://github.com/haotian-liu/LLaVA/

LLaVA-SP Weights

Please check out https://huggingface.co/Levideus/models for all public LLaVA-SP checkpoints.

Quick Start

python llava/eval/run_llava.py \
    --model_path /path/llava-sp-cropping-lora \
    --model_base /path/vicuna-1.5-7b
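For scripted use, the same invocation can be wrapped in Python. A minimal sketch, assuming the placeholder checkpoint and base-model paths from the command above (replace them with your local paths); it only assembles the CLI call shown in the Quick Start rather than calling any LLaVA API directly:

```python
import shlex
import subprocess

# Placeholder paths from the quick-start example; replace with your local checkpoints.
MODEL_PATH = "/path/llava-sp-cropping-lora"
MODEL_BASE = "/path/vicuna-1.5-7b"

def build_command(model_path: str, model_base: str) -> list[str]:
    """Assemble the run_llava.py invocation as an argument list."""
    return [
        "python", "llava/eval/run_llava.py",
        "--model_path", model_path,
        "--model_base", model_base,
    ]

if __name__ == "__main__":
    cmd = build_command(MODEL_PATH, MODEL_BASE)
    print(shlex.join(cmd))
    # To actually run inference (requires the LLaVA environment and weights):
    # subprocess.run(cmd, check=True)
```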

Citation

If you find LLaVA-SP useful for your research and applications, please cite using this BibTeX:

@misc{lou2025llavasp,
    title={LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs},
    author={Lou, Haoran and Fan, Chunxiao and Liu, Ziyan and Wu, Yuexin and Wang, Xinliang},
    eprint={2507.00505},
    archivePrefix={arXiv},
    year={2025}
}
