Stars
This repository contains the implementation of all methods evaluated in the paper "Learning a Thousand Tasks in a Day". We provide model architectures, training scripts, and deployment examples.
A toolbox for real-to-sim reconstruction and robotic simulation
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
CVPR 2025 (Highlight) DexGraspAnything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Control AI robots. Community-driven UI middleware for controlling robots, recording datasets, training action models. Compatible with SO-100 and SO-101
A minimal implementation of DeepMind's Genie world model
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
CLIP+MLP Aesthetic Score Predictor
VisionOS App + Python Library to stream hand tracking data from Vision Pro, video/audio stream to Vision Pro.
PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024).
Official PyTorch implementation of the paper "RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution"
Taming Transformers for High-Resolution Image Synthesis
A Collection of Variational Autoencoders (VAE) in PyTorch.
[NeurIPS'25] DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
Awesome World Model for Robotics Papers
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
An autoregressive character-level language model for making more things
Master programming by recreating your favorite technologies from scratch.
TrOMR: Transformer-based Polyphonic Optical Music Recognition
homr is an Optical Music Recognition (OMR) software designed to transform camera pictures of sheet music into machine-readable MusicXML format.
BreezeWhite / oemer
Forked from meteo-team/oemer
End-to-end Optical Music Recognition (OMR) system. Transcribe phone-taken music sheet image into MusicXML, which can be edited and converted to MIDI.
A collection of papers on diffusion models for 3D generation.
An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.