Stars
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
SAPIEN Manipulation Skill Framework, an open-source, GPU-parallelized robotics simulator and benchmark, led by Hillbot, Inc.
Open-source implementation of AlphaEvolve
Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning.
Dexbotic: Open-Source Vision-Language-Action Toolbox
Reference PyTorch implementation and models for DINOv3
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Building General-Purpose Robots Based on Embodied Foundation Model
✔ (Completed) Super-comprehensive deep learning notes [Tudui PyTorch] [Li Mu, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, Large-Model Agents]
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations https://video-prediction-policy.github.io
ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.
An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
Turn detection for full-duplex dialogue communication
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
SALMONN family: A suite of advanced multi-modal LLMs
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat