JSK Lab, UTokyo
Starred repositories
VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
RDMA and SHARP plugins for nccl library
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
A curated list for awesome discrete diffusion models resources.
[NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO.
The Best Agent Harness. Meet Sisyphus: The Batteries-Included Agent that codes like you.
[NeurIPS 2025] Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Isaac Lab - Arena is a robotics simulation framework that enhances NVIDIA Isaac Lab by providing a composable, scalable system for creating diverse simulation environments and evaluating robot lear…
The official implementation of InfiniteVGGT
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
Implementation of the paper 'Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance' (EMNLP 2025)
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
Zxy-MLlab / LIBERO-PRO
Forked from Lifelong-Robot-Learning/LIBERO
LIBERO-PRO is the official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark
Code and documentation to train Stanford's Alpaca models, and generate the data.
Fully open reproduction of DeepSeek-R1
End-to-end pipeline converting generative videos (Veo, Sora) to humanoid robot motions
EfficientSAM3 compresses SAM3 into lightweight, edge-friendly models via progressive knowledge distillation for fast promptable concept segmentation and tracking.