Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Post-training with Tinker
Understanding R1-Zero-Like Training: A Critical Perspective
🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Minimal reproduction of DeepSeek R1-Zero
Official implementation of On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
Scalable RL solution for advanced reasoning of language models
Mainly a collection of knowledge and interview questions for large language model (LLM) algorithm (application) engineers.
AnchorAttention: Improved attention for LLMs long-context training
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
Benchmarking LLMs with Challenging Tasks from Real Users
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal dialogue models
neural-cognitive-models-for-human-decision-making
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
Work by the Oxen.ai Community to reproduce the Self-Rewarding Language Model paper from MetaAI.
Dromedary: towards helpful, ethical, and reliable LLMs.
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI.