Starred repositories
[CVPR 2025] DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
PyTorch code and models for VJEPA2 self-supervised learning from video.
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv
[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
[ICLR 2025 Spotlight] Official implementation for "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes"
[ECCV 2024] Official PyTorch implementation of Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
[CVPR 2024] The official implementation for "SemCity: Semantic Scene Generation with Triplane Diffusion"
[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Implementation of papers in 100 lines of code.
Official implementation of paper "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation"
Official implementation of "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction"
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (AAAI-25)
This is the official implementation of UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
RynnVLA-002: A Unified Vision-Language-Action and World Model
Truevision Designer is a 3D tool to design roads, intersections and environments for testing and validating autonomous vehicles and robots.
3D Gaussian Rendering PlayGround: an open-source autonomous driving closed-loop simulator demo using 3D Gaussian Splatting tech
A generative speech model for daily dialogue.
EmotiVoice: a Multi-Voice and Prompt-Controlled TTS Engine
[NeurIPS 2025] Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
FLARE: Fast Large-scale Autonomous Exploration Guided by Unknown Regions
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"