
Text-to-Audio/Music Generation

Python 2,554 202 Updated Sep 29, 2024

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,396 337 Updated Jan 5, 2026

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

Python 68 10 Updated Dec 13, 2021

UTokyo-SaruLab MOS Prediction System

Python 280 28 Updated Dec 18, 2025

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python 141 16 Updated Jun 3, 2025

Qualifying Exam Preparation

16 Updated May 7, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,473 2,087 Updated Oct 21, 2025

This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".

201 10 Updated Jan 5, 2026

A generative speech model for daily dialogue.

Python 38,485 4,186 Updated Dec 3, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,957 291 Updated Jan 5, 2026

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Python 254 11 Updated Nov 30, 2025

[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

Python 89 11 Updated Jul 23, 2025

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

Python 4,375 278 Updated Jan 1, 2026

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. TPAMI, 2024.

345 18 Updated Jan 2, 2026

Align Anything: Training All-modality Model with Feedback

Python 4,616 507 Updated Nov 27, 2025

Official [AAAI] Code Repository for "Continual Learning with Scaled Gradient Projection".

Python 15 1 Updated Jun 28, 2023

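Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multilingual libraries.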

Python 41,217 4,087 Updated Nov 20, 2025

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,380 1,344 Updated Jul 9, 2025

Calculating the actual value of your job beyond just salary

TypeScript 2,986 188 Updated Dec 8, 2025

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Python 162 8 Updated Jun 8, 2024

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 283 21 Updated Oct 12, 2025

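Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing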

Python 36,055 10,900 Updated Nov 15, 2025

A python package to analyze and compare voices with deep learning

Python 3,202 476 Updated Oct 12, 2023

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 493 66 Updated Dec 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 67,159 12,483 Updated Jan 9, 2026

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 23,263 3,056 Updated Aug 15, 2024

This may be the simplest implementation of DDPM. You can directly run Main.py to train the UNet on the CIFAR-10 dataset and watch the denoising process.

Python 2,127 221 Updated Apr 24, 2023

https://hf.co/hexgrad/Kokoro-82M

JavaScript 5,279 601 Updated Aug 6, 2025

End-to-End Speech Processing Toolkit

Python 9,682 2,370 Updated Dec 16, 2025

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.

Python 6,331 1,162 Updated Jan 6, 2026