Starred repositories
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Official implementation of "Understanding multi-view transformers" (ICCV 2025 E2E3D Workshop)
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration (ICCV 2025)
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[CVPR2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
PanSt3R: Multi-view Consistent Panoptic Segmentation (official code)
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
[NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video
A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045)
Official repository for "AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos" (CVPR 2025)
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A unified inference and post-training framework for accelerated video generation.
[ICLR'25] SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation.
OctoTools: An agentic framework with extensible tools for complex reasoning
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Not another coding agent: Kode is an agent CLI for everything
[ICCV2025] Extrapolated Urban View Synthesis Benchmark
XLeRobot: Practical Dual-Arm Mobile Home Robot for $660