Starred repositories
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning
💫 Toolkit to help you get started with Spec-Driven Development
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Fully autonomous AI hacker to find actual exploits in your web apps. Shannon has achieved a 96.15% success rate on the hint-free, source-aware XBOW Benchmark.
SWE-bench: Can Language Models Resolve Real-world Github Issues?
The official repository of "A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications".
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
A Claude Skill to give your agent the ability to use a web browser
The better Playwright MCP: works as a browser extension. No context bloat. More capable.
Build resilient language agents as graphs.
A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation
A security scanner for your LLM agentic workflows
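A **Model Context Protocol (MCP)** server for AI-driven penetration testing competitions. The tool provides a complete API interface that enables LLMs to participate autonomously in CTF challenges.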
A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.
Postgres Foreign Data Wrapper development framework in Rust.
Collection of specialized AI subagents for Claude Code for personal use (full-stack development).
Actions for running CodeQL analysis
Kode is one unit agent for every human & computer task
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
A research prototype of a human-centered web agent
A lightweight, powerful framework for multi-agent workflows