Stars
Data and code behind the articles and graphics at FiveThirtyEight
The “Agentic Cookbook for Generative AI Agent usage” is a comprehensive guide designed to empower users with the knowledge and tools to effectively implement and utilize Generative AI Agents within…
A Python script that automatically checks in to your Southwest flight 24 hours beforehand.
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
get things from one computer to another, safely
LLM (Large Language Model) Fine-Tuning
Inspect: A framework for large language model evaluations
Improving Alignment and Robustness with Circuit Breakers
Representation Engineering: A Top-Down Approach to AI Transparency
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining …
Universal Notation for Tensor Operations in Python.