Stars
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
Wan: Open and Advanced Large-Scale Video Generative Models
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
GenZI: Zero-Shot 3D Human-Scene Interaction Generation (CVPR 2024)
MikuDance: Animating Character Art with Mixed Motion Dynamics
Wan: Open and Advanced Large-Scale Video Generative Models
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."
A generative world for general-purpose robotics & embodied AI learning.
Pandora: Towards General World Model with Natural Language Actions and Video States
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
A Collection of Awesome Large Weather Models (LWMs) | AI for Earth (AI4Earth) | AI for Science (AI4Science)
[CVPR 2025] A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation
AlphaFold 3 inference pipeline.
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. [CVPR 2024] Official PyTorch implementation
High-resolution models for human tasks.
Course: Diffusion Generative AI for Computer Vision and Science
[AAAI 2025] Dynamic Protein Data Bank
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone