- Ant Group
- Hangzhou, China
- UTC +08:00
- https://zengyh1900.github.io/
- @zengyh1900
Starred repositories
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
📷 Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!
[ICLR'24] GTA: A Geometry-Aware Attention Mechanism for Multi-view Transformers
The official implementation of InfiniteVGGT
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
[ICLR'25] SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
PyTorch code and models for VJEPA2 self-supervised learning from video.
A list of works on video generation towards world model
Post-training with Tinker
Official code of Motus: A Unified Latent Action World Model
Train transformer language models with reinforcement learning.
Benchmarking Knowledge Transfer in Lifelong Robot Learning
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
AGENTS.md — a simple, open format for guiding coding agents
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
[ICML 2025] Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
[CVPR 2025 Highlight] InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Official Implementations for Paper - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various r…
SGLang is a high-performance serving framework for large language models and multimodal models.
LongLive: Real-time Interactive Long Video Generation