Starred repositories
This is the code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs". Demos, technical insights and experimental results are presented on
Paper list for Efficient Reasoning.
AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when …
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
Solve Visual Understanding with Reinforced VLMs
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
A method that directly addresses the modality gap by aligning speech tokens with the corresponding text transcription during the tokenization stage.
[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel long2short methods for large reasoning models. It contains papers, code, datasets, evaluations, and analyses.
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Witness the aha moment of VLM with less than $3.
A fork to add multimodal model training to open-r1