Stars
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
ROCm / flash-attention
Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention.
[ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Voyager is an interactive RGBD video generation model conditioned on camera input that supports real-time 3D reconstruction.
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
PixiEditor is a universal editor for all your 2D needs
Generative Models by Stability AI
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
[ICCV2025] "Di[M]O: Distilling Masked Diffusion Models into One-step Generator", Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton
This repo provides AMD's working re-implementation of Latent Adversarial Diffusion Distillation
MoDM is a cache-aware, hybrid serving system that accelerates image generation by dynamically combining small and large diffusion models for efficient, high-quality output.
The open-source CapCut alternative
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
[ICCV2025] LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
Janus-Series: Unified Multimodal Understanding and Generation Models
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching".
Official PyTorch implementation for "Effective and Efficient Masked Image Generation Models"
The official repo of continuous speculative decoding
12 Lessons to Get Started Building AI Agents
[NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.