Stars
[CVPR 2025] UniScene: Unified Occupancy-centric Driving Scene Generation
[IROS 2024] ODTFormer: Efficient Obstacle Detection and Tracking with Stereo Cameras Based on Transformer
Slam Toolbox for lifelong mapping and localization in potentially massive maps with ROS
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[CVPR 2024] Code for "SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation".
Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM
Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
Training VLM agents with multi-turn reinforcement learning
Paper Survey for Transformer-based SLAM
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
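"Build a Large Language Model (From Scratch)" is an e-book that explores in depth the principles and implementation of large language models, suited to learners who want a deeper understanding of the architecture, training process, and application development of large models such as GPT. To make this valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it as open source on GitHub.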
Tensors and Dynamic neural networks in Python with strong GPU acceleration
OpenMMLab Pose Estimation Toolbox and Benchmark.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Stereo camera, camera calibration, stereo matching, SGBM, disparity map generation, depth map generation, point cloud data generation.
A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
Pangolin is a lightweight portable rapid development library for managing OpenGL display / interaction and abstracting video input.
This code contains an algorithm to compute stereo visual SLAM by using both point and line segment features.
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities