CUHK-Shenzhen - Shenzhen (UTC +08:00) - https://ajyy.github.io
Starred repositories
2026 AI/ML internship & new graduate job list updated daily
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
Wan: Open and Advanced Large-Scale Video Generative Models
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
A collection of full-time roles in SWE, Quant, and PM for new grads.
RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
A bibliography and survey of the papers surrounding o1
A Python module to repair invalid JSON from LLMs
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
Unified automatic quality assessment for speech, music, and sound.
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
A Conversational Speech Generation Model
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
🧑🚀 Summary of the world's best LLM resources (speech and video generation, agents, coding assistance, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).
Fully open reproduction of DeepSeek-R1
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)