@batznerjan (NYC)

Stars
Inference service for the Qwen2.5-VL-7B model
A sandbox environment for studying how AI agents shop online, featuring controllable e-commerce experiments with vision-language models
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-turn RL environment with support for agent, user, user feedback…
[2025-TMLR] A Survey on the Honesty of Large Language Models
Slides and Materials for a workshop on NLP, Transformers, LLMs and Agents. Taught at the University of Hildesheim, February 2025
This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation"
ICLR 2025 Workshop & CHI 2025 SIG: "Bidirectional Human-AI Alignment"
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"
Code and data that can be used to replicate results for the ACL 2024 findings paper "Evaluating Large Language Model Biases in Persona-Steered Generation".
General-Sum variant of the game Diplomacy for evaluating AIs.
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Train transformer language models with reinforcement learning.
Adala: Autonomous DAta (Labeling) Agent framework
Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
Scripts for generating synthetic finetuning data for reducing sycophancy.
Code implementation of the paper "Evaluating Biased Attitude Associations of Language Models in an Intersectional Context"
A benchmark to evaluate language models on questions I've previously asked them to solve.
A large-scale complex question answering evaluation of ChatGPT and similar large language models
Evaluating the Moral Beliefs Encoded in LLMs
A list of awesome resources for Computational Social Science
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…