Beijing Normal University
- Beijing
- fangweizhong.xyz
Stars
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A VLM assistant that helps an embodied visual tracking agent recover from failure. (Implementation of the IROS 2025 paper "VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Visual-Language Models")
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Lightweight WebRTC SDK for UnrealEngine's PixelStreaming
🖥️ Open-source Computer-USE for Windows
Agent S: an open agentic framework that uses computers like a human
Reference PyTorch implementation and models for DINOv3
[ICLR 2023] FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
Code used to test metaphor strategies in the game UNDERCOVER.
A library for generative social simulation
Simulators and baselines for ATEC 2025 software algorithm track (online competition)
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv
[ICCV 2025 Highlight] Large-scale photo-realistic virtual worlds for embodied AI
[Lumina Embodied AI] Embodied-AI-Guide: A Technical Guide to Embodied AI
A generative world for general-purpose robotics & embodied AI learning.
[ICLR 2025] Simulating Human-like Daily Activities with Desire-driven Autonomy
An awesome & curated list for Artificial General Intelligence, an emerging interdisciplinary field that combines artificial intelligence and computational cognitive science.
OpenEQA: Embodied Question Answering in the Era of Foundation Models
The official implementation of the ECCV 2024 paper "Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL"
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
Run Segment Anything Model 2 on a live video stream
Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.
A small tutorial repository on capturing images with semantic annotation from UnrealEngine to disk.