  • Huazhong University of Science and Technology


[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Python 401 14 Updated Nov 10, 2025

"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

Python 9,983 1,353 Updated Nov 10, 2025

[NeurIPS 2025] NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding

Python 204 19 Updated Nov 6, 2025

[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Python 1,539 128 Updated Oct 7, 2025

Official code repository of Shuffle-R1

Python 31 1 Updated Aug 27, 2025

[NeurIPS 2025] More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models

Python 166 12 Updated Oct 31, 2025

Official Repository for MolmoAct

Python 248 25 Updated Oct 26, 2025

Contexts Optical Compression

Python 20,099 1,490 Updated Oct 25, 2025

[ICCV 2025] ACE-G is an architecture and pre-training scheme to improve generalization for scene coordinate regression-based visual relocalization.

Python 66 1 Updated Nov 5, 2025

Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Python 118 4 Updated Nov 2, 2025

[NeurIPS 2025] Pixel-Perfect Depth

Python 620 24 Updated Oct 13, 2025

UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation

Python 110 2 Updated Nov 6, 2025

[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Python 60 1 Updated Jul 22, 2025

[NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Python 218 7 Updated Sep 18, 2025

Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Python 72 Updated Oct 21, 2025

Python 58 3 Updated Nov 5, 2025

Official implementation for "JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation"

Python 243 7 Updated Nov 6, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 514 20 Updated Jan 4, 2025

Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Python 380 11 Updated Jun 22, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,180 1,290 Updated Nov 10, 2025

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 66 3 Updated Jun 17, 2024

Accepted work for CVPR 2025

Python 15 Updated Aug 23, 2025

Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"

Python 359 15 Updated Sep 15, 2025

Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.

Python 301 10 Updated Oct 16, 2025

Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".

Python 134 5 Updated Sep 21, 2025

Benchmarking Knowledge Transfer in Lifelong Robot Learning

Jupyter Notebook 1,107 221 Updated Mar 15, 2025

[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'

Python 290 16 Updated Apr 20, 2025

Python 206 9 Updated Aug 6, 2025

[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

Python 1,943 121 Updated Nov 2, 2025

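🏷️ Course sharing and study guides for the electronic information major at the School of Electronic Information and Communications, Huazhong University of Science and Technology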

Python 256 48 Updated Jul 10, 2023