Stars
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enabling layer-wise analysis of hidden states and predictions.
A library for mechanistic interpretability of GPT-style language models
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
[NeurIPS 2025] Reasoning Models Better Express Their Confidence
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
🦜🔗 The platform for reliable agents.
Code released for our ICML 2020 paper "Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation"
Recipes to scale inference-time compute of open models
Collection of awesome test-time (domain/batch/instance) adaptation methods
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
An algebraic word problem dataset, with multiple choice questions annotated with rationales.
A framework for few-shot evaluation of language models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension
A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset
An extremely fast Python package and project manager, written in Rust.