Stars
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Controllable video and image generation: SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
Character Animation (AnimateAnyone, Face Reenactment)
Unofficial Implementation of Animate Anyone
This is the official implementation of our paper: "MiniMax-Remover: Taming Bad Noise Helps Video Object Removal"
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A VAE modified from the Descript Audio Codec, with the RVQ replaced by a VAE
🎬IMAGEdit🎬: Let any subject transform. A training-free, plug-and-play framework that aligns prompts and retargets masks to enable any-subject video editing.
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Kandinsky 5.0: A family of diffusion models for Video & Image generation
Official repository for the paper “Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models”
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
LongLive: Real-time Interactive Long Video Generation
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
This project is based on the diffusers implementation of the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm, optimized and accelerated for multi-GPU inference using the [xDiT](https://github.c…
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding