- Shanghai Jiao Tong University
- Shanghai
- https://scholar.google.com/citations?user=6aARLhMAAAAJ&hl=zh-CN
Starred repositories
🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets
🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code execution & editing
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Reference PyTorch implementation and models for DINOv3
Build memory-native AI agents with Memory OS — an open-source framework for long-term memory, retrieval, and adaptive learning in large language models. Agent Memory | Memory System | Memory Manage…
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
A Python toolkit of machine learning (ML) practices for combinatorial optimization (CO).
Official implementation of ICLR 2025 paper: "Unify ML4TSP: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search".
MoBA: Mixture of Block Attention for Long-Context LLMs
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
DeepEP: an efficient expert-parallel communication library
Fully open reproduction of DeepSeek-R1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Ongoing research training transformer models at scale
Example models using DeepSpeed
Official implementation for "TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables" (NeurIPS 2024)
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[TMLR 2025🔥] A survey of autoregressive models in vision.