Stars
Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Public repository of the Micro QuickJS JavaScript Engine
Public repository of the QuickJS JavaScript Engine
SoFlow: Solution Flow Models for One-Step Generative Modeling
The official implementation of HierSpeech++
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[Paper][AAAI 2025] (MyGO) Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation
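✨ Agentic IM ChatBot Infrastructure: chat agent infrastructure ✨ Integration with multiple messaging platforms (QQ / Telegram / WeCom / Feishu / DingTalk, etc.), a powerful and easy-to-use plugin system, and support for OpenAI / Gemini / Anthropic / Dify / Coze / Alibaba Cloud Bailian / knowledge bases / AI agents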
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM
A meta-language for Go that adds Result types, error propagation (?), and pattern matching while maintaining 100% Go ecosystem compatibility
ylzz1997 / MyGO
Forked from golang/go. A custom Go programming language for scientific/mathematical computing!
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Multimodal speech quality QA system that turns objective assessment into a natural-language task using audio encoders (AST/Whisper) and a LLaMA-based “quality expert” to predict MOS, dimension-wise…
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
An Open-Ended Embodied Agent with Large Language Models