janbatzner

Inference service for the Qwen2.5-VL-7B model

Python 204 80 Updated Mar 24, 2025
Python 3 2 Updated Nov 9, 2025

A sandbox environment for studying how AI agents shop online, featuring controllable e-commerce experiments with vision-language models

Python 9 2 Updated Sep 26, 2025

Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)

Python 26 1 Updated Jun 27, 2024

Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-turn RL environment with support for agent, user, user feedback…

Python 22 4 Updated Dec 3, 2024

[2025-TMLR] A Survey on the Honesty of Large Language Models

62 2 Updated Dec 8, 2024
Python 4 Updated May 11, 2024

Slides and Materials for a workshop on NLP, Transformers, LLMs and Agents. Taught at the University of Hildesheim, February 2025

Jupyter Notebook 6 Updated Feb 28, 2025

This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation"

Jupyter Notebook 22 2 Updated Oct 26, 2023

ICLR 2025 Workshop & CHI 2025 SIG: "Bidirectional Human-AI Alignment"

44 Updated Aug 6, 2024

Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]

Python 31 3 Updated Jan 23, 2025

Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"

Python 106 14 Updated Nov 10, 2023

Code and data that can be used to replicate results for the ACL 2024 findings paper "Evaluating Large Language Model Biases in Persona-Steered Generation".

Jupyter Notebook 5 1 Updated May 31, 2024

General-Sum variant of the game Diplomacy for evaluating AIs.

Python 31 7 Updated Apr 2, 2024

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

2,895 131 Updated Oct 28, 2025

Train transformer language models with reinforcement learning.

Python 16,258 2,287 Updated Nov 11, 2025

Must-read Papers on LLM Agents.

2,762 161 Updated Oct 24, 2025

Adala: Autonomous DAta (Labeling) Agent framework

Python 1,289 110 Updated Nov 10, 2025

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI

Jupyter Notebook 12,006 3,501 Updated Nov 11, 2025

Scripts for generating synthetic finetuning data for reducing sycophancy.

Python 117 16 Updated Aug 16, 2023

Code implementation of the paper "Evaluating Biased Attitude Associations of Language Models in an Intersectional Context"

Python 3 Updated Jun 21, 2023

A benchmark to evaluate language models on questions I've previously asked them to solve.

Python 1,034 77 Updated Apr 27, 2025

A large-scale complex question answering evaluation of ChatGPT and similar large-language models

Python 40 3 Updated Apr 23, 2024

Evaluating the Moral Beliefs Encoded in LLMs

Python 31 6 Updated Dec 17, 2024

A list of awesome resources for Computational Social Science

R 780 91 Updated Nov 9, 2025

Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…

Python 2,537 340 Updated Nov 11, 2025
Jupyter Notebook 116 17 Updated May 2, 2024