WhiteFu


Starred repositories

Python · 327 stars · 18 forks · Updated May 31, 2025

UTokyo-SaruLab MOS Prediction System

Python · 265 stars · 27 forks · Updated Oct 12, 2025

This is the code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs". Demos, technical insights and experimental results are presented on

Python · 82 stars · 5 forks · Updated Sep 19, 2025

Paper list for Efficient Reasoning.

736 stars · 27 forks · Updated Nov 20, 2025
Python · 6 stars · Updated Aug 30, 2025

AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when …

Python · 42 stars · 3 forks · Updated Oct 14, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python · 1,039 stars · 91 forks · Updated Nov 4, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python · 724 stars · 94 forks · Updated Nov 12, 2025

SOTA search-powered LLM

Python · 3,726 stars · 343 forks · Updated Apr 4, 2025

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python · 306 stars · 29 forks · Updated May 14, 2025

[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

699 stars · 34 forks · Updated Oct 20, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python · 4,368 stars · 317 forks · Updated Jun 21, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploratory journey into RL-based reasoning MLLMs!

1,279 stars · 58 forks · Updated Nov 16, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 4,144 stars · 311 forks · Updated Nov 27, 2025

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision and video, and of performing real-time speech generation.

Jupyter Notebook · 3,820 stars · 301 forks · Updated Jun 12, 2025

Search-R1: An Efficient, Scalable RL Training Framework, based on veRL, for LLMs that interleave reasoning with search engine calls

Python · 3,570 stars · 302 forks · Updated Nov 13, 2025

Solve Visual Understanding with Reinforced VLMs

Python · 5,721 stars · 371 forks · Updated Oct 21, 2025
Python · 6,035 stars · 465 forks · Updated Aug 29, 2025

This is the first paper to explore how to effectively use R1-like RL for MLLMs; it introduces Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…

Python · 730 stars · 19 forks · Updated Sep 10, 2025

A method that directly addresses the modality gap by aligning speech tokens with the corresponding text transcription during the tokenization stage.

Python · 99 stars · 11 forks · Updated Sep 3, 2025

[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)

9,226 stars · 616 forks · Updated Sep 22, 2025

Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel and exciting long2short methods for large reasoning models. It contains papers, code, datasets, evaluations and analyses.

255 stars · 9 forks · Updated Aug 13, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python · 974 stars · 66 forks · Updated Nov 25, 2025

Witness the aha moment of VLM with less than $3.

Python · 3,993 stars · 291 forks · Updated May 19, 2025

A fork to add multimodal model training to open-r1

Python · 1,423 stars · 69 forks · Updated Feb 8, 2025

Reproduce R1-Zero on logic puzzles

Python · 2,415 stars · 162 forks · Updated Mar 20, 2025

s1: Simple test-time scaling

Python · 6,607 stars · 763 forks · Updated Jun 25, 2025