- Zhejiang University
- Hangzhou, China
- https://hxy-123.github.io/
- @XingyiHe1
Stars
The repository provides code for the EgoMAN model and dataset creation scripts.
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
Code for "InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields"
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Any4D: Unified Feed-Forward Metric 4D Reconstruction
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
[NeurIPS 2025] "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling"
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Official PyTorch Implementation for "Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising"
Scaling Novel View Synthesis for Static and Dynamic Scenes
Fast and Universal 3D reconstruction model for versatile tasks
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Automatically claims free games and DLCs on the Epic Games Store, Amazon Prime Gaming and GOG.
Native Multimodal Models are World Learners
Python wrapper for the NVIDIA cuSFM library
Official code for the paper "InstantSfM: Fully Sparse and Parallel Structure-from-Motion"
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A Modular Toolkit for Robot Kinematic Optimization
[NeurIPS 2025 (Spotlight)] The implementation for the paper "4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos"