
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 23,204 3,038 Updated Aug 15, 2024

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,626 81 Updated Sep 25, 2024

Character Animation (AnimateAnyone, Face Reenactment)

Python 3,469 285 Updated May 31, 2024

Unofficial Implementation of Animate Anyone

Python 2,935 242 Updated Jul 9, 2024

This is the official implementation of our paper: "MiniMax-Remover: Taming Bad Noise Helps Video Object Removal"

Python 505 48 Updated Jul 27, 2025

HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency

Python 755 47 Updated Dec 24, 2025

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 2,195 132 Updated Dec 26, 2025

Test-time Scaling for VAR models

Python 28 3 Updated Sep 19, 2025

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 18,342 2,044 Updated Dec 23, 2025

[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Python 540 43 Updated Oct 29, 2025

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 795 99 Updated Dec 17, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,026 239 Updated Nov 30, 2025

A VAE modified from the Descript Audio Codec, replacing its RVQ with a VAE.

Python 87 8 Updated Apr 2, 2024

🎬IMAGEdit🎬: Let Any Subject Transform. A training-free, plug-and-play framework that aligns prompts and retargets masks to enable any-subject video editing.

Python 80 3 Updated Oct 14, 2025

[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Python 96 3 Updated Dec 7, 2025

Open-source unified multimodal model

Python 5,512 481 Updated Oct 27, 2025

Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,858 1,087 Updated Dec 26, 2025

Light Video Generation Inference Framework

Python 1,510 100 Updated Dec 26, 2025

Kandinsky 5.0: A family of diffusion models for Video & Image generation

Python 643 42 Updated Dec 22, 2025

Official repository for the paper “Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models”

Python 24 3 Updated Nov 5, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,833 2,038 Updated Dec 21, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,465 180 Updated Mar 28, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 66,856 9,546 Updated Dec 23, 2025
Python 1,473 155 Updated Nov 15, 2025

Open-Source Frontier Voice AI

Python 19,058 2,107 Updated Dec 17, 2025
Python 34 3 Updated Sep 1, 2025

LongLive: Real-time Interactive Long Video Generation

Python 926 63 Updated Dec 4, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,167 194 Updated Oct 9, 2025

This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm in diffusers, optimized and accelerated for multi-GPU inference using the [xDiT](https://github.c…

Python 11 3 Updated Dec 31, 2024

[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

Python 63 2 Updated Jul 10, 2025