The University of Hong Kong
Hong Kong, SAR
tianbaoxie.com
@TianbaoX
Stars
This is the official implementation for **"AUTOPR: LET'S AUTOMATE YOUR ACADEMIC PROMOTION!"**.
All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
A virtual environment for developing and evaluating automated scientific discovery agents.
My learning notes and code for ML SYS.
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Renderer for the harmony response format to be used with gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Qwen Code is a coding agent that lives in the digital world.
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >70% on SWE-bench verified!
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
Kimi K2 is the large language model series developed by Moonshot AI team
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Easy Data Preparation with the latest LLM-based Operators and Pipelines.
OpenCUA: Open Foundations for Computer-Use Agents
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [NeurIPS 2025 Spotlight]
A benchmark for LLMs on complicated tasks in the terminal
A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories.