Stars
Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.
Text-to-text alignment algorithm for speech recognition error analysis.
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
yuekaizhang / Fun-ASR-vllm
Forked from FunAudioLLM/Fun-ASR. Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
[WIP] Official PyTorch code for CLSP: Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
Build resilient language agents as graphs.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
ESC-50: Dataset for Environmental Sound Classification
The First Systematic Vibe Coding Open-Source Tutorial | From Zero to Full-Stack, Empowering Everyone to Build Products with AI | Live at: www.vibevibe.cn
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Audio text dataset for pytorch training based on webdataset.
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json.
lonelygo / CosyVoice
Forked from FunAudioLLM/CosyVoice. Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
The official Fun-ASR-Nano-2512 repository contains a lot of material and has quite a few deployment pitfalls; this project provides a simplified deployment approach.
A tool for obtaining database keys and image keys for WeChat versions 4.0 and above
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Supercharge Your LLM with the Fastest KV Cache Layer
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Train your own speech AI model from scratch