Training, inference, and testing of the SAC speech codec model.

Python 18 2 Updated Oct 21, 2025

🚀🚀 [Large Model] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 30,961 3,563 Updated Oct 21, 2025

LongCat Audio Tokenizer and Detokenizer

Python 166 10 Updated Oct 20, 2025

The best ChatGPT that $100 can buy.

Python 29,626 3,086 Updated Oct 21, 2025

PyTorch media decoding and encoding

Python 754 66 Updated Oct 21, 2025

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

4 Updated Oct 7, 2025

Python - 100 Days from Novice to Master

Jupyter Notebook 173,545 54,745 Updated Mar 28, 2025
Python 147 20 Updated Oct 1, 2025

We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 64 6 Updated Oct 15, 2025
Python 116 23 Updated Apr 24, 2023

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

TypeScript 64,044 5,090 Updated Oct 21, 2025

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 67 Updated Oct 17, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,704 145 Updated Oct 9, 2025

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 1,599 67 Updated Jun 5, 2025

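🥢 Cook like Laoxiangji 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.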

JavaScript 21,544 2,144 Updated Oct 17, 2025
Python 4 Updated Jul 11, 2025

Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"

Python 127 21 Updated Sep 14, 2025

A complete computer science study plan to become a software engineer.

331,599 80,995 Updated Aug 28, 2025

HighRateMOS is the first non-intrusive MOS prediction model that explicitly models sampling rates, achieving first place in five out of eight metrics in AudioMOS Challenge 2025 Track3.

9 Updated Sep 15, 2025
Python 19 Updated Sep 15, 2025

Official baseline for ICASSP 2026 URGENT Challenge Track 2 (Speech Quality Assessment)

Python 18 2 Updated Sep 20, 2025

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 329 15 Updated Sep 15, 2025

This repository provides a benchmark for prompt injection attacks and defenses

Python 306 43 Updated Oct 16, 2025

Model analyzer in PyTorch

Python 90 12 Updated Aug 31, 2025

(ICASSP 2025, official code) FlowSE: Flow Matching-based Speech Enhancement

Python 68 3 Updated Jul 23, 2025

Extract phoneme-level timestamps from speech audio.

Python 81 8 Updated Oct 17, 2025

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 92 4 Updated Sep 21, 2025

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python 92 7 Updated Aug 1, 2025

Official implementation of the paper "Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition"

Python 6 Updated Feb 23, 2024

Make text LLMs listen and speak

Python 918 164 Updated Oct 16, 2025