Starred repositories
A series of technical reports on Slow Thinking with LLMs
Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.
PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons to help you sort out the algorithm theory, straighten out the code logic, and master practical decision-AI applications)
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]
Secrets of RLHF in Large Language Models Part I: PPO
Latest Advances on System-2 Reasoning
Exploring Applications of GRPO
Fully open reproduction of DeepSeek-R1
Integrate the DeepSeek API into popular software
A collection of industry classics and cutting-edge papers in the fields of recommendation, advertising, and search.
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
A collection of research and application papers about mechanism and strategy in computational (Internet) advertising.
The official implementation of Self-Play Fine-Tuning (SPIN)
A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, DataFunTalk, and tech WeChat official accounts)
All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
https://acl2023-retrieval-lm.github.io/
Official Code for Stable Cascade
Empower Large Language Models (LLMs) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge-intensive tasks
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨✨Latest Advances on Multimodal Large Language Models
Official repo for consistency models.