Jianwei Yang jwyang

1.9k followers · 32 following

Stars

microsoft / Magma

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,887 152 Updated Oct 4, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 25,783 2,406 Updated Nov 24, 2025

LatentActionPretraining / LAPA

[ICLR 2025] LAPA: Latent Action Pretraining from Videos

Python 431 30 Updated Jan 22, 2025

mu-cai / TemporalBench

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Python 37 1 Updated Nov 10, 2024

henry123-boy / SpaTracker

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Python 1,033 39 Updated Aug 8, 2025

mu-cai / matryoshka-mm

Matryoshka Multimodal Models

Python 121 9 Updated Jan 22, 2025

MengLcool / DeepStack-VL

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 76 3 Updated Jun 17, 2024

zzxslp / SoM-LLaVA

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 145 4 Updated Aug 23, 2024

myshell-ai / JetMoE

Reaching LLaMA2 Performance with 0.1M Dollars

Python 988 77 Updated Jul 23, 2024

jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 751 51 Updated Sep 27, 2024

FoundationVision / GLEE

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,167 74 Updated Oct 21, 2024

roboflow / maestro

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,651 221 Updated Dec 22, 2025

UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 523 24 Updated Apr 8, 2024

ishan0102 / vimGPT

Browse the web with GPT-4V and Vimium

Python 2,672 200 Updated Sep 25, 2024

roboflow / awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,681 133 Updated Jan 14, 2025

ddupont808 / GPT-4V-Act

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,062 100 Updated Dec 9, 2024

microsoft / SoM

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,499 112 Updated Aug 19, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,725 12,337 Updated Jan 3, 2026

microsoft / X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,338 160 Updated Oct 5, 2023

TalalWasim / Video-FocalNets

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 101 19 Updated Apr 30, 2024

UX-Decoder / Semantic-SAM

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,793 143 Updated Jul 10, 2025

Zhendong-Wang / Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 413 11 Updated Mar 25, 2024

UX-Decoder / Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,761 456 Updated Aug 19, 2024

google-research / arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,633 381 Updated Jun 2, 2025

IDEA-Research / OpenSeeD

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 745 47 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,363 85 Updated Jan 23, 2024

zjc062 / mind-vis

Code base for MinD-Vis

Python 785 105 Updated May 24, 2023
