@batznerjan (NYC)

Stars
Inference service for the Qwen2.5-VL-7B model
A sandbox environment for studying how AI agents shop online, featuring controllable e-commerce experiments with vision-language models
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-turn RL environment with support for agent, user, user feedback…
[2025-TMLR] A Survey on the Honesty of Large Language Models
Slides and Materials for a workshop on NLP, Transformers, LLMs and Agents. Taught at the University of Hildesheim, February 2025
This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation"
ICLR 2025 Workshop & CHI 2025 SIG: "Bidirectional Human-AI Alignment"
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"
Code and data that can be used to replicate results for the ACL 2024 findings paper "Evaluating Large Language Model Biases in Persona-Steered Generation".
General-Sum variant of the game Diplomacy for evaluating AIs.
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Train transformer language models with reinforcement learning.
Adala: Autonomous DAta (Labeling) Agent framework
Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
Scripts for generating synthetic finetuning data for reducing sycophancy.
Code implementation of the paper "Evaluating Biased Attitude Associations of Language Models in an Intersectional Context"
A benchmark to evaluate language models on questions I've previously asked them to solve.
A large-scale complex question answering evaluation of ChatGPT and similar large language models
Evaluating the Moral Beliefs Encoded in LLMs
A list of awesome resources for Computational Social Science
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…