Peking University
- Peking
- https://miyandoris.github.io/
Stars
StereoVLA is powered by stereo vision and supports flexible deployment with high tolerance to camera pose variations.
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
[SIGGRAPH Asia 2025] OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
A curated list of awesome 3D scene generation papers. (arXiv 2505.05474)
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[RSS 2025] Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy
[CoRL 2025] GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
A collection of advanced tools for preparing large-scale, high-quality mesh data
ManifoldPlus: A Robust and Scalable Watertight Manifold Surface Generation Method for Triangle Soups
[SIGGRAPH 2022] Approximate Convex Decomposition for 3D Meshes with Collision-Aware Concavity and Tree Search
Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS and STPLS3D.
[CVPR 2024] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
A collection of papers on diffusion models for 3D generation.
This is the PyTorch implementation of our paper "RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model"
Official code release for ConceptGraphs
A state-of-the-art open visual language model | multimodal pre-trained model
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation
CoTracker is a model for tracking any point (pixel) on a video.
SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.