Stars
[SIGGRAPH Asia 2025] Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization
PantoMatrix: Generating Face and Body Animation from Speech
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
🤗A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
MiroThinker is an open-source search agent model, built for tool-augmented reasoning and real-world information seeking, aiming to match the deep research experience of OpenAI Deep Research and Gem…
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
[SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
VideoCoF: Unified Video Editing with Temporal Reasoner
[CVPR'24 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
[AAAI 2026] Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
The Best Agent Harness. Meet Sisyphus: The Batteries-Included Agent that codes like you.
This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation'
RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing
HY-Motion model for 3D character animation generation.
Red Ink (红墨) - A one-stop Xiaohongshu image-and-text generator based on 🍌Nano Banana Pro🍌, "One Sentence, One Image: Generate Xiaohongshu Text …
MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B
Framework for building conversational agents using a Finite State Machine (FSM) and LLMs
Official code for StoryMem: Multi-shot Long Video Storytelling with Memory
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
🦋 An Infographic Generation and Rendering Framework that brings words to life with AI!
AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.