[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,887 152 Updated Oct 4, 2025

Fully open reproduction of DeepSeek-R1

Python 25,783 2,406 Updated Nov 24, 2025

[ICLR 2025] LAPA: Latent Action Pretraining from Videos

Python 431 30 Updated Jan 22, 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Python 37 1 Updated Nov 10, 2024

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Python 1,033 39 Updated Aug 8, 2025

Matryoshka Multimodal Models

Python 121 9 Updated Jan 22, 2025

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 76 3 Updated Jun 17, 2024

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 145 4 Updated Aug 23, 2024

Reaching LLaMA2 Performance with 0.1M Dollars

Python 988 77 Updated Jul 23, 2024

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 751 51 Updated Sep 27, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,167 74 Updated Oct 21, 2024
4 Updated Sep 30, 2024
Python 636 34 Updated Feb 15, 2024
Python 422 16 Updated Jul 29, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,651 221 Updated Dec 22, 2025

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 523 24 Updated Apr 8, 2024

Browse the web with GPT-4V and Vimium

Python 2,672 200 Updated Sep 25, 2024

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,681 133 Updated Jan 14, 2025

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,062 100 Updated Dec 9, 2024

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,499 112 Updated Aug 19, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,725 12,337 Updated Jan 3, 2026

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,338 160 Updated Oct 5, 2023

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 101 19 Updated Apr 30, 2024

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,793 143 Updated Jul 10, 2025

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 413 11 Updated Mar 25, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,761 456 Updated Aug 19, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,633 381 Updated Jun 2, 2025

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 745 47 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,363 85 Updated Jan 23, 2024

Code base for MinD-Vis

Python 785 105 Updated May 24, 2023