SJTU X-LANCE & BIGAI NLCo
China
https://danjuan-77.github.io/
Lists (28)
🎮 Agent System: Agents.
🧪 AI4X: Collections For AI For Science\Financial\Research...
🤟 ASR: Collections for Automatic Speech Recognition
📀 Audio&Sound&Music Generation
😎 Awesome Series: Collect Some Awesome Projects.
👓 CV: Collections for Computer Vision.
📊 Dataset and Benchmark: Some collections for datasets and benchmarks.
🌈 Diffusion: Collections for diffusion methods & models.
🧱✨ Great Framework: Collections for some great frameworks.
🎥 Image&Video Generation: Collections for Image&Video Generation.
👷🏻♂️ Infra&Accelerate: Infrastructure for LLM serving, training, and inference.
🔥 LALM: Collections of Large Language Audio Models.
👩🚀 LLM: Collections for LLMs.
🌷 MultiModal Generation: Collections for Multi-Modal Generation, eg, video and audio generation together.
🤖 Omni&Multi Modality: Omni&Multi Modality Projects
🗃️ RAG: Collections for RAG.
🤔 Reasoning: Reasoning models and projects.
💥 RL: Collections for RL.
💬 SLM: Speech Language Models eg, slam-omni.
🔊 Spatial Audio: Collections for Spatial Audio.
👄 Speech2Speech Translation: Collections for speech-to-speech translations.
📂 Survey&Papers: Some surveys and paper collections.
🧩 Tokenizer&Codec: Some tokenizer modeling and audio codecs.
🪀 Toys&Tools: Some good Tools or Funny Toys.
🎧 TTS: TTS Projects
📚 Tutorial: Some Open Source Tutorials.
🌍 Understanding The World: Collections about how LLMs understand the world and improve their ability to understand the world.
🥂 Video-Audio Gen
Starred repositories
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
Latest Advances on Long Chain-of-Thought Reasoning
"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios — and actively seeking partners to co-build English & multilingual …
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
MiMo-Audio: Audio Language Models are Few-Shot Learners
Open-source framework for conversational voice AI agents
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Training, inference, and testing of the SAC speech codec model.
Automatic Video Generation from Scientific Papers
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Data Pipeline, Models, and Benchmark for Omni-Captioner.
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana), a state-of-the-art image generation and editing model. Explore AI generated visuals created with…
Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
A Survey of Reinforcement Learning for Large Reasoning Models
Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …