- Sushi Land
- https://www.tobyc.graphics
Stars
Official PyTorch Implementation of "Latent Diffusion Model Without Variational Autoencoder".
[ICCV 2025] What we need is explicit controllability: Training 3D gaze estimator using only facial images.
[ECCV 2024 Oral 🔥] Arc2Face: A Foundation Model for ID-Consistent Human Faces; [ICCVW 2025] ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Official code of "UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models"
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
A complete head tracking pipeline from videos to NeRF/3DGS-ready datasets.
SPG: Style-Prompting Guidance for Style-Specific Content Creation
Official repo for FaceShot: Bring Any Character into Life
[ICCV 2025 Oral] DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
We introduce AI-Face, the first million-scale AI-generated face dataset with demographic annotations, and conduct a comprehensive fairness benchmark. Our work has been accepted at CVPR 2025.
Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
[ACM MM24] Official implementation of ACM MM 2024 paper: "ZePo: Zero-Shot Portrait Stylization with Faster Sampling"
[CVPR-2025] The official code of HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
A SOTA open-source image editing model that aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
[SIGGRAPH 2025] One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
[ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation
The source code of the paper "RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos"
[IJCV 2025] Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
A multi-platform GUI library for Python based on Dear ImGui with a lot of customization possibilities.
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.
[Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models