- Tsinghua University
- Beijing
- https://zrp21.notion.site
Stars
Generative Models by Stability AI
The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning."
[Nature Machine Intelligence 2025] Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception
Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"
verl: Volcano Engine Reinforcement Learning for LLMs
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
SC-Depth (V1, V2, and V3) for Unsupervised Monocular Depth Estimation. Webpage: https://jiawangbian.github.io/sc_depth_pl/
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
(ICLR 2025 spotlight) "Poison-splat: Computation Cost Attack on 3D Gaussian Splatting"
CODA: Repurposing Continuous VAEs for Discrete Tokenization
A curated list of recent diffusion models for video generation, editing, and various other applications.
Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Official implementation of “4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models” (CVPR 2025)
[CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
Official inference repo for FLUX.1 models
[TPAMI 2023] SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Official implementation of Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion