- Alibaba Group
- Shanghai
- (UTC +08:00)
- https://www.meta-speech.com
- https://scholar.google.com/citations?view_op=list_works&hl=zh-CN&hl=zh-CN&user=5gW9zlMAAAAJ
Stars
HunyuanVideo-1.5: A leading lightweight video generation model
A high-throughput and memory-efficient inference and serving engine for LLMs
A list of publicly available audio data that anyone can download for ASR or other speech activities
Build resilient language agents as graphs.
ASR online decoding using Kaldi NNet3 GrammarFST
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Semantic Voice Activity Detection adds a lightweight LLM prediction model to continuously evaluate whether a user has really finished speaking.
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Official PyTorch implementation of BigVGAN (ICLR 2023)
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Official electron build of draw.io
zzhdbw / Spark-TTS
Forked from SparkAudio/Spark-TTS
Spark-TTS Inference Code
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech), reproduced demo: https://lifeiteng.github.io/valle/index.html
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
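pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.
A comprehensive platform for Chinese text error correction that integrates academic research, model training, model evaluation, and inference deployment, covering two core directions: spelling correction and grammatical error correction.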
Code for Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…