  • Columbia University
  • New York, US


A Conversational Speech Generation Model

Python 14,490 1,460 Updated May 27, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 673 48 Updated Jun 5, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 654 52 Updated Jan 21, 2026

Large Concept Models: Language modeling in a sentence representation space

Python 2,333 208 Updated Jan 29, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 94 4 Updated Dec 3, 2024

Awesome-LLM: a curated list of Large Language Models

26,258 2,293 Updated Jul 31, 2025

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python 107 8 Updated Aug 1, 2025

SOTA Open Source TTS

Python 24,906 2,070 Updated Feb 2, 2026

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,282 149 Updated Feb 18, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,089 2,078 Updated Feb 16, 2026

SALMONN family: A suite of advanced multi-modal LLMs

1,391 112 Updated Feb 3, 2026

An Open-Sourced LLM-empowered Foundation TTS System

Python 901 83 Updated Sep 28, 2025
Python 154 8 Updated Nov 22, 2024

LLM101n: Let's build a Storyteller

36,335 1,973 Updated Aug 1, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 328 25 Updated Dec 17, 2025

Encode and decode audio samples to/from compressed latent representations!

Python 246 25 Updated Sep 19, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 213 17 Updated Sep 19, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,638 881 Updated Feb 12, 2026

The open source code for SimpleSpeech series

Python 145 11 Updated Oct 8, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,123 222 Updated May 19, 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 147 17 Updated Jan 1, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,487 182 Updated Mar 28, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 93 11 Updated Mar 12, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 291 22 Updated Oct 12, 2025

Audio Large Language Models

Python 872 44 Updated Jul 5, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,053 160 Updated Apr 21, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,875 88 Updated Jan 8, 2026

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

44 2 Updated Oct 28, 2024