

All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.

Python 1,304 119 Updated Nov 10, 2025

A benchmark for LLMs on complicated tasks in the terminal

Python 1,045 379 Updated Nov 7, 2025

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

Python 89 6 Updated Nov 1, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,487 58 Updated Jun 14, 2025

Benchmark environment for evaluating vision-language models (VLMs) on popular video games!

Python 310 33 Updated May 30, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,281 93 Updated Nov 8, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,437 110 Updated Aug 5, 2025

The Abstraction and Reasoning Corpus

JavaScript 4,618 697 Updated Apr 4, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 500 105 Updated Oct 27, 2025

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Python 1,215 193 Updated Oct 3, 2025

VisualWebArena is a benchmark for multimodal agents.

Python 400 66 Updated Nov 9, 2024

[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents

Python 47 3 Updated Feb 27, 2025

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

TypeScript 19,451 1,846 Updated Nov 10, 2025
Python 8,166 573 Updated Nov 5, 2025

GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 133 7 Updated Jul 17, 2024

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Python 284 12 Updated Jul 18, 2025

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

Python 783 83 Updated Apr 30, 2025

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 2,066 122 Updated Jun 1, 2023

[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jupyter Notebook 133 10 Updated Aug 26, 2024

Final project of COMP 7409 Machine Learning in Trading and Finance – Group 7.

Python 5 2 Updated Nov 13, 2023

UniMem: Towards a Unified View of Long-Context Large Language Models (COLM 2024)

Python 9 1 Updated Aug 14, 2024

Repository of GUI Action Narrator

JavaScript 11 Updated Apr 8, 2025

(ICLR 2025) The Official Code Repository for GUI-World.

Python 67 3 Updated Dec 18, 2024

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 48,461 4,001 Updated Nov 10, 2025

Graduation Project HKUCS

Python 2 Updated Jul 17, 2024

Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

Jupyter Notebook 60 3 Updated Aug 19, 2024

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…

Python 131 8 Updated Aug 4, 2025
Python 4,377 418 Updated Sep 14, 2025

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 71,149 8,817 Updated Oct 21, 2025

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go 155,687 13,584 Updated Nov 8, 2025