Stars
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
One-shot and Few-shot 3D Editing without Per-Scene Optimization
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Code of π^3: Permutation-Equivariant Visual Geometry Learning
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
A curated list of foundation models for vision and language tasks
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
[Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
A comprehensive collection of IQA papers
This is an unofficial implementation of the paper “PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization”.
A virtual camera built using only OpenCV and NumPy. It simulates a camera whose intrinsic and extrinsic parameters can all be controlled, to give a better understanding of how each component in the…
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving (ICCV 2025)
[CVPR 2025] PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
A 3DGS framework for omni urban scene reconstruction and simulation.
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
[TPAMI 2025] Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving
Official code repo of ICLR'25 paper: MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
[CVPR 2025] UniScene: Unified Occupancy-centric Driving Scene Generation
Awesome papers about Multi-Camera 3D Object Detection and Segmentation in Bird's-Eye-View, such as DETR3D, BEVDet, BEVFormer, BEVDepth, UniAD
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Visualizations for machine learning datasets
An open source implementation of CLIP.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
A curated list of papers, datasets and resources pertaining to open-vocabulary object detection.