Stars
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
A generative speech model for daily dialogue.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation
[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. TPAMI, 2024.
Align Anything: Training All-modality Model with Feedback
Official [AAAI] Code Repository for "Continual Learning with Scaled Gradient Projection".
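OCR software, free, open-source, and offline. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual libraries.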
Calculating the actual value of your job beyond just salary
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
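Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, natural language processing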
A python package to analyze and compare voices with deep learning
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
A high-throughput and memory-efficient inference and serving engine for LLMs
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
This may be the simplest implementation of DDPM. You can directly run Main.py to train the UNet on the CIFAR-10 dataset and see the amazing process of denoising.
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.