zongmianli


This repository provides code for the EgoMAN model and its dataset creation scripts.

Python 10 Updated Dec 31, 2025

Simulation of manipulation tasks using Galaxea robots

Python 27 Updated Aug 18, 2025

F1: A Vision Language Action Model Bridging Understanding and Generation to Actions

Python 153 10 Updated Jan 2, 2026

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

Python 330 19 Updated Jan 4, 2026

Building General-Purpose Robots Based on an Embodied Foundation Model

Python 652 46 Updated Dec 10, 2025

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)

Jupyter Notebook 256 43 Updated Jun 23, 2025

[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"

Python 218 7 Updated Dec 16, 2025

A Survey of Image Editing

458 12 Updated Aug 24, 2025

ViPE: Video Pose Engine for Geometric 3D Perception

Python 1,602 125 Updated Jan 1, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,065 2,256 Updated Dec 15, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 6,877 396 Updated Dec 31, 2025

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 742 63 Updated Dec 16, 2025

Unified Vision-Language-Action Model

Python 257 20 Updated Oct 15, 2025

Practicalli customisations to the Doom Emacs configuration

Emacs Lisp 18 6 Updated Apr 12, 2025

[ICML 2025] OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

Python 114 9 Updated Apr 14, 2025

Official implementation of the paper: Task Reconstruction and Extrapolation for $\pi_0$ using Text Latent (https://arxiv.org/pdf/2505.03500)

Jupyter Notebook 98 2 Updated Aug 3, 2025

Code for RSS 2025 paper "Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies"

Python 27 5 Updated Jun 18, 2025

A Modular Toolkit for Robot Kinematic Optimization

Python 1,301 135 Updated Jan 6, 2026

The official implementation of the paper "Human Motion Diffusion as a Generative Prior"

Python 508 26 Updated Jan 25, 2025

[CoRL 2025] UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Python 72 2 Updated Dec 18, 2025

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 925 54 Updated Nov 19, 2025

NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.

Jupyter Notebook 5,816 922 Updated Dec 18, 2025

[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"

Python 224 11 Updated Nov 6, 2025

Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world

2,088 130 Updated Oct 27, 2025

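"A Practical Guide to Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment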

Jupyter Notebook 27,237 2,722 Updated Jan 7, 2026

CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making

Jupyter Notebook 681 65 Updated Apr 20, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,152 1,287 Updated Oct 11, 2025

🎁 A collection of utilities for LeRobot.

Python 778 66 Updated Jan 5, 2026

RoboDual: Dual-System for Robotic Manipulation

Python 103 6 Updated Jul 2, 2025

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of performing real-time speech generation.

Jupyter Notebook 3,866 308 Updated Jun 12, 2025