Computer of Science and Technology Beijing
Lists (3)
Stars
High-Resolution Image Synthesis with Latent Diffusion Models
verl: Volcano Engine Reinforcement Learning for LLMs
[NeurIPS '25] A benchmark for evaluating TTS models on complex prosodic, expressive, and linguistic challenges.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Compute WER and SER for speech recognition evaluation
MiMo-Audio: Audio Language Models are Few-Shot Learners
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
[TMLR 2025🔥] A survey of autoregressive models in vision.
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
Text-audio foundation model from Boson AI
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A native-PyTorch library for large-scale M-LLM (text/audio) training with tensor, context, and data parallelism (tp/cp/dp).
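The Bert-VITS2 project is buggy and its tutorials are unfriendly. This project fixes as many of Bert-VITS2's bugs as possible and supports one-click training. With only 50 utterances from the target speaker, you get a stable, fast TTS model.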
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Mddct / maxtext
Forked from AI-Hypercomputer/maxtext
DH long context env in jax