Stars
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast voice synthesis.🐙
High-performance Image Tokenizers for VAR and AR
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Advanced GRAG implementation for ComfyUI with beginner-friendly and expert modes
https://little-misfit.github.io/GRAG-Image-Editing/
Official implementation of "Continuous Autoregressive Language Models"
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
ARTalk generates realistic 3D head motions (lip sync, blinking, expressions, head poses) from audio in ⚡ real-time ⚡.
A collection of awesome text-to-image generation studies.
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
This is a repo to track the latest autoregressive visual generation papers.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[ICCV 25] SpectralAR: Spectral Autoregressive Visual Generation
[CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization
Implementation of "Hyperspherical Latents Improve Continuous-Token Autoregressive"
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Meaningful titles for tabs and PDF downloads! Also supports tab search.
😎 Awesome lists about all kinds of interesting topics
A curated list of reinforcement learning with human feedback resources (continually updated)
A curated list of awesome autoregressive papers in Generative AI
Foundational Models for State-of-the-Art Speech and Text Translation
Foundation Models and Data for Human-Human and Human-AI interactions.
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Official PyTorch implementation of BigVGAN (ICLR 2023)
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"