- Bytedance
- Beijing
- https://pooruss.github.io/-lshwebsite/
Stars
All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
A benchmark for LLMs on complicated tasks in the terminal
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Benchmark environment for evaluating vision-language models (VLMs) on popular video games!
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
A very simple GRPO implementation for reproducing R1-like LLM thinking.
AndroidWorld is an environment and benchmark for autonomous agents
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
VisualWebArena is a benchmark for multimodal agents.
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
GUICourse: From General Vision Language Models to Versatile GUI Agents
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
800,000 step-level correctness labels on LLM solutions to MATH problems
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Final project of COMP 7409 Machine Learning in Trading and Finance – Group 7.
UniMem: Towards a Unified View of Long-Context Large Language Models (COLM 2024)
(ICLR 2025) The Official Code Repository for GUI-World.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments
[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app…
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.