Stars
InteriorGS: 3D Gaussian Splatting Dataset of Semantically Labeled Indoor Scenes
Repository of the paper "AnyUp: Universal Feature Upsampling".
Paper2Agent is a multi-agent AI system that automatically transforms research papers into interactive AI agents with minimal human input.
Official implementation of AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature.
NEO Series: Native Vision-Language Models from First Principles
Python implementation of "Efficient Graph-Based Image Segmentation" paper
A dependency-free, header-only, fast supervoxel segmentation library for 3D point clouds
💫 Industrial-strength Natural Language Processing (NLP) in Python
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
Code for paper: Reinforced Vision Perception with Tools
[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021 Oral)
cvg / Mask3D
Forked from JonasSchult/Mask3D
Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS and STPLS3D.
[ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models"
Official implementation of Splat Feature Solver: https://arxiv.org/abs/2508.12216
[IEEE TPAMI] Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning
A universal foundation model for grounded biomedical image interpretation
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Reference PyTorch implementation and models for DINOv3
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
[ICLR 2024] This is the official code of the paper "V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection"