ttxskk
  • Hong Kong

Starred repositories

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

37 1 Updated Nov 22, 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

Python 103 6 Updated Nov 21, 2025

Audio-driven Digital Human Generation Model

32 Updated Sep 14, 2025

[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Python 62 1 Updated Jul 22, 2025

Official codebase for ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"

18 Updated Oct 1, 2025

Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model

Python 2,450 207 Updated Oct 22, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 11,931 1,348 Updated Nov 14, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 16,547 1,364 Updated Nov 10, 2025

Official implementation of VGGT-Long

Python 648 35 Updated Nov 20, 2025

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 800 24 Updated Oct 23, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,941 9 Updated Sep 30, 2025

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Python 79 2 Updated Jul 27, 2025

[CVPR25 Oral (Top 3.3%)] Official code for paper "Reconstructing Humans with a Biomechanically Accurate Skeleton".

Python 566 46 Updated Aug 17, 2025

Open-source unified multimodal model

Python 5,326 462 Updated Oct 27, 2025

[CVPR 2024 Highlight] XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

Python 514 35 Updated Jun 30, 2025

An AI Hedge Fund Team

Python 42,401 7,516 Updated Nov 13, 2025

This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…

Python 728 19 Updated Sep 10, 2025

[NeurIPS 2025] MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Python 115 Updated Nov 6, 2025

CUDA Python: Performance meets Productivity

Python 3,044 223 Updated Nov 22, 2025

SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

Python 768 105 Updated Mar 17, 2025

[CVPR 2025 Oral & Best Paper Finalist] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Python 919 68 Updated Jun 28, 2025

WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild

Python 401 31 Updated Aug 1, 2025

[ICLR 2025] Track-On: Transformer-based Online Point Tracking with Memory, and [arXiv 2025] Track-On2: Enhancing Online Point Tracking with Memory

Python 81 5 Updated Oct 17, 2025

Official implementation of Continuous 3D Perception Model with Persistent State

Python 1,204 66 Updated Aug 27, 2025

Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [SIGGRAPH Asia 2025]

Python 438 23 Updated Sep 21, 2025

TAPIP3D: Tracking Any Point in Persistent 3D Geometry

Python 326 22 Updated Sep 27, 2025

Let's make video diffusion practical!

Python 16,219 1,562 Updated Oct 16, 2025

Universal Monocular Metric Depth Estimation

Python 1,073 100 Updated May 18, 2025

Code for the paper MultiPhys: Multi-Person Physics-aware 3D Motion Estimation (CVPR 2024)

Python 77 4 Updated Mar 24, 2025