Stars
dInfer: An Efficient Inference Framework for Diffusion Language Models
Gemma open-weight LLM library, from Google DeepMind
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
Curated resources, research, and tools for securing AI systems
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Renderer for the harmony response format to be used with gpt-oss
Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".
Code for the paper "Defeating Prompt Injections by Design"
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
Open-source implementation of AlphaEvolve
Official PyTorch implementation for "Large Language Diffusion Models"
Dataset and code for "JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift"
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs; empirical tricks for LLM jailbreaking (NeurIPS 2024)
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models
A data augmentation library for audio, image, text, and video.
Fast near-duplicate matching: quickly finds near-duplicate spans in a document using the Rabin-Karp rolling-hash algorithm (see the sketch after this list).
The Security Toolkit for LLM Interactions
Every practical and proposed defense against prompt injection.
Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
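A minimal sketch of the rolling-hash idea behind the fast near-duplicate matching entry above, assuming whitespace tokenization and a fixed token window. The names (window_hashes, near_duplicate_spans, BASE, MOD, NGRAM) are illustrative, not the repository's actual API.

```python
# Hypothetical sketch of Rabin-Karp near-duplicate span matching,
# not the repo's actual implementation.
BASE = 257
MOD = (1 << 61) - 1   # large Mersenne prime keeps hash collisions rare
NGRAM = 8             # window size in tokens (assumed, tunable)

def window_hashes(tokens):
    """Yield (start_offset, rolling_hash) for every NGRAM-token window."""
    ids = [hash(t) % MOD for t in tokens]  # stable within one process
    top = pow(BASE, NGRAM - 1, MOD)
    h = 0
    for i, tid in enumerate(ids):
        if i >= NGRAM:
            h = (h - ids[i - NGRAM] * top) % MOD  # slide: drop the oldest token
        h = (h * BASE + tid) % MOD                # push the newest token
        if i >= NGRAM - 1:
            yield i - NGRAM + 1, h

def near_duplicate_spans(doc, reference):
    """Return token offsets in doc whose NGRAM window also occurs in reference."""
    ref_tokens = reference.split()
    ref_index = {}  # hash -> set of real n-grams, to reject hash collisions
    for i, h in window_hashes(ref_tokens):
        ref_index.setdefault(h, set()).add(tuple(ref_tokens[i:i + NGRAM]))
    doc_tokens = doc.split()
    return [i for i, h in window_hashes(doc_tokens)
            if tuple(doc_tokens[i:i + NGRAM]) in ref_index.get(h, ())]
```

Hits at consecutive offsets can then be merged into maximal duplicate spans; the repository's actual tokenization, window size, and collision handling may differ.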