- Stanford
- Stanford, CA
- UTC -08:00
- sayands.github.io
- @debsarkar_sayan
- @sayandsarkar.bsky.social
Stars
[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
Code and data for UniEgoMotion (ICCV 2025)
A simple, unified training engine for multimodal models. Lean, flexible, and built for hacking at scale.
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Awesome 3D Scene Graphs: a curated list of 3D scene graph generation and related resources!
[NeurIPS 2025] GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
[CVPR 2025, Highlight] CrossOver: 3D Scene Cross-Modal Alignment
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
🚀 Lightning-fast computer vision models. Fine-tune SOTA models with just a few lines of code. Ready for cloud ☁️ and edge 📱 deployment.
[NeurIPS 2025, Spotlight] Rectified Point Flow: Generic Point Cloud Pose Estimation
Code for "ReSpace: Text-Driven 3D Indoor Scene Synthesis and Editing with Preference Alignment"
[RSS 2025] ROMAN: a view-invariant global localization method that matches objects from different robot views for reliable pose estimation even when a scene is observed from opposite views
A collection of onboarding diagrams for different projects online
Official repo for the paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
[ICCV 2023] SGAligner: 3D Scene Alignment with Scene Graphs
Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurIPS'24]
[ICCV 2025 Oral] SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
PyTorchGeoNodes is a PyTorch module for differentiable shape programs / procedural models in the form of graphs. It can automatically translate Blender geometry node models into PyTorch code. Original…
A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
[CVPR 2025] WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Spurfies: Sparse Surface Reconstruction using Local Geometry Priors
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[ICCV 2025] HouseTour: A Virtual Real Estate A(I)gent
🟣 Computer Vision interview questions and answers to help you prepare for your next machine learning and data science interview in 2026.