danjuan-77
🌴 On vacation

🎮 Agent System

Agents.
11 repositories

🧪 AI4X

Collections for AI for Science/Finance/Research...
1 repository

🤟 ASR

Collections for Automatic Speech Recognition
1 repository

📀 Audio&Sound&Music Generation

2 repositories

😎 Awesome Series

Collect Some Awesome Projects.
32 repositories

👓 CV

Collections for Computer Vision.
1 repository

📊 Dataset and Benchmark

Some collections for datasets and benchmarks.
38 repositories

🌈 Diffusion

Collections for diffusion methods & models.
2 repositories

Starred repositories

🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org

Python 14,928 1,644 Updated Nov 30, 2025

A prompt optimizer that helps you write high-quality prompts.

TypeScript 17,417 2,161 Updated Oct 31, 2025

Latest Advances on Long Chain-of-Thought Reasoning

563 26 Updated Jul 18, 2025

"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"

Python 287 40 Updated Oct 17, 2025

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 4,725 437 Updated Nov 29, 2025

Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios — and actively seeking partners to co-build English & multilingual …

72 Updated Nov 8, 2025

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 95 4 Updated Oct 26, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,318 189 Updated Nov 19, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 425 23 Updated Nov 25, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 867 86 Updated Sep 20, 2025

Open-source framework for conversational voice AI agents

C 8,782 1,023 Updated Nov 30, 2025

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Python 104 4 Updated Oct 30, 2025
Python 33 1 Updated Nov 4, 2025

Official Repository of UltraVoice

JavaScript 49 1 Updated Oct 28, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,399 289 Updated Nov 27, 2025

My research experience

7,969 457 Updated Aug 12, 2025

Contexts Optical Compression

Python 21,023 1,857 Updated Oct 25, 2025

Training, inference, and testing of the SAC speech codec model.

Python 84 6 Updated Nov 1, 2025

Automatic Video Generation from Scientific Papers

Python 1,787 246 Updated Oct 20, 2025
Python 126 7 Updated Oct 13, 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 395 28 Updated Nov 27, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,994 177 Updated Oct 9, 2025

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 92 Updated Oct 17, 2025

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Python 2,947 163 Updated Jul 9, 2025

Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…

JavaScript 7,800 801 Updated Sep 8, 2025

Traceable TTS: Toward Watermark-Free TTS with Strong Traceability

Python 11 3 Updated Sep 4, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,091 118 Updated Nov 9, 2025

Frontier Open-Source Text-to-Speech

10,088 1,300 Updated Sep 5, 2025

Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …

160 19 Updated Aug 26, 2025