Stars
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
VoiceBench: Benchmarking LLM-Based Voice Assistants
A toolkit for speaker diarization.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Authenticator generates 2-Step Verification codes in your browser.
"Your Fully-Automated Personal AI Assistant"
Toolkit for linearizing PDFs for LLM datasets/training
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
🔊 Text-Prompted Generative Audio Model
[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
JerryWu-code / TinyZero
Forked from Jiayi-Pan/TinyZero
A tiny version of DeepSeek R1 Zero, the author's own reproduction on two A100s.