Zth9730
🥬 Ataraxy
  • Computer Science and Technology, Beijing


High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook · 13,606 stars · 1,697 forks · Updated Feb 29, 2024

dLLM: Simple Diffusion Language Modeling

Python · 1,042 stars · 114 forks · Updated Nov 28, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 16,849 stars · 2,681 forks · Updated Nov 29, 2025

[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressive, and linguistic challenges.

Python · 179 stars · 12 forks · Updated Oct 16, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python · 1,598 stars · 45 forks · Updated Nov 15, 2025

Fast and memory-efficient exact k-means

Python · 128 stars · 8 forks · Updated Nov 11, 2025

Compute WER and SER for speech recognition evaluation

Python · 15 stars · 1 fork · Updated Nov 12, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python · 866 stars · 86 forks · Updated Sep 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 2,991 stars · 177 forks · Updated Oct 9, 2025

Building an MoE large model on your own: a complete hands-on walkthrough from pretraining to DPO

Python · 1,874 stars · 144 forks · Updated Nov 5, 2025

[TMLR 2025🔥] A survey of autoregressive models in vision.

753 stars · 21 forks · Updated Nov 8, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python · 215 stars · 20 forks · Updated Nov 11, 2025

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python · 128 stars · 15 forks · Updated Jun 3, 2025

Python · 288 stars · 36 forks · Updated Jul 22, 2025

Text-audio foundation model from Boson AI

Python · 7,672 stars · 568 forks · Updated Sep 15, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python · 1,239 stars · 91 forks · Updated Sep 22, 2025

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python · 195 stars · 8 forks · Updated Jun 18, 2025

Updates ASR papers every day

Python · 382 stars · 18 forks · Updated Nov 29, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 8,865 stars · 1,551 forks · Updated Nov 28, 2025

A native-PyTorch library for large-scale M-LLM (text/audio) training with tensor, context, and data parallelism (TP/CP/DP).

Python · 223 stars · 29 forks · Updated Aug 6, 2025

Python · 33 stars · 4 forks · Updated Sep 6, 2025

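The Bert-VITS2 project has many bugs and its tutorials are not user-friendly. This project fixes as many of Bert-VITS2's bugs as possible and lets you start training with a single command. With only 50 utterances from the target speaker, you get a stable, fast TTS model.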

Python · 65 stars · 9 forks · Updated Aug 19, 2025

An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs

Python · 167 stars · 9 forks · Updated Nov 27, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python · 3,777 stars · 263 forks · Updated Sep 25, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 63,256 stars · 7,649 forks · Updated Nov 27, 2025

(WIP) Long-form speech generation

Python · 31 stars · 4 forks · Updated Apr 2, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 20,494 stars · 3,558 forks · Updated Nov 29, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 64,200 stars · 11,628 forks · Updated Nov 29, 2025

DH long-context environment in JAX

Python · 5 stars · Updated Nov 13, 2024