Starred repositories
OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
A biologically interpretable graph neural network for multistain computational pathology - CVPR 2025
No fortress, purely open ground. OpenManus is Coming.
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
Latest Advances on System-2 Reasoning
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?
An official PyTorch implementation of "MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
Unifying 3D Mesh Generation with Language Models
[CVPR'25] DepthSplat: Connecting Gaussian Splatting and Depth
Official Code for MotionCtrl [SIGGRAPH 2024]
HaMeR: Reconstructing Hands in 3D with Transformers
Research code of ICCV 2021 paper "Mesh Graphormer"
Code for AAAI 2024 paper: Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects
Depth Any Video with Scalable Synthetic Data (ICLR 2025)
Official PyTorch implementation of "Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild", CVPR 2023
[ACM MM 2024] Official Code for "HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting"
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
PyTorch code and models for the DINOv2 self-supervised learning method.