Highlights

  • Pro

Organizations

@Generative-Program-Analysis

Contexts Optical Compression

Python 20,569 1,770 Updated Oct 25, 2025

AI-Driven Research For Systems (ADRS)

Jupyter Notebook 62 8 Updated Nov 13, 2025

Checkpoint/Restore tool

C 3,494 681 Updated Nov 16, 2025

Benchmark and evaluate generative research synthesis

Python 65 6 Updated Nov 16, 2025
Python 109 50 Updated Nov 17, 2025

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python 187 36 Updated Jul 13, 2025

End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Python 322 19 Updated Sep 22, 2025

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook 1,577 115 Updated Nov 16, 2025

Open-source implementation of AlphaEvolve

Python 4,556 677 Updated Nov 12, 2025

Recovery-Bench is a benchmark for evaluating the capability of LLM agents to recover from mistakes

Python 9 3 Updated Sep 11, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Python 42,651 2,869 Updated Nov 17, 2025

Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.

445 15 Updated Apr 18, 2024

NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.

C++ 24 4 Updated Nov 8, 2025

MCP server integrating GEPA (Genetic-Evolutionary Prompt Architecture) for automatic prompt optimization with Claude Desktop

Python 43 3 Updated Nov 10, 2025

Test Generation for Prompts

TeX 143 20 Updated Nov 16, 2025

Renderer for the harmony response format to be used with gpt-oss

Rust 4,012 225 Updated Nov 5, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,494 263 Updated Nov 17, 2025

Trajectories for running OpenHands on Terminal Bench

3 Updated Jul 25, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,410 232 Updated Nov 2, 2025

[NeurIPS '25] Challenging Software Optimization Tasks for Evaluating SWE-Agents

Python 55 3 Updated Nov 3, 2025
HTML 1 Updated Jun 1, 2025

A benchmark for LLMs on complicated tasks in the terminal

Python 1,076 384 Updated Nov 13, 2025

Sky-T1: Train your own O1 preview model within $450

Python 3,357 340 Updated Jul 12, 2025

Agentic testing for agentic codebases

TypeScript 638 40 Updated Nov 15, 2025

SECOM: On Memory Construction and Retrieval for Personalized Conversational Agents, ICLR 2025

Jupyter Notebook 46 3 Updated Mar 1, 2025
Python 232 35 Updated Jun 25, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)

Python 663 87 Updated Nov 17, 2025

Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.

Python 19,187 2,002 Updated Nov 14, 2025

Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.

Python 189 22 Updated Mar 7, 2025

AWM: Agent Workflow Memory

Python 353 30 Updated Jan 31, 2025