
Official implementation of IROS 2025 paper Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline

Python 41 4 Updated Aug 11, 2025

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

Python 157 10 Updated Oct 8, 2025
10 Updated Nov 27, 2025

Audio-video joint generation

29 Updated Nov 27, 2025

G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

Python 144 1 Updated Nov 27, 2025

Thinking in 360°: Humanoid Visual Search in the Wild

Python 57 Updated Nov 26, 2025

GigaWorld-0: World Models as Data Engine to Empower Embodied AI

Python 160 14 Updated Nov 26, 2025
Python 51 Updated Nov 26, 2025

WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving

95 5 Updated Nov 1, 2025

iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

Python 60 Updated Nov 26, 2025

Official PyTorch Implementation of "Flow Map Distillation Without Data"

Python 72 6 Updated Nov 25, 2025

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python 1,081 61 Updated Nov 25, 2025
Python 68 3 Updated Nov 27, 2025

Official repository for “DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation”

Python 77 2 Updated Nov 26, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 900 65 Updated Nov 29, 2025

Action-Guided Knowledge Distillation for VLA Models

Python 11 Updated Nov 25, 2025

Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training

Jupyter Notebook 9 Updated Nov 27, 2025

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

Python 54 1 Updated Nov 27, 2025

Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model

Python 896 58 Updated Nov 26, 2025

ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation

436 6 Updated Oct 11, 2025

MuM's a pretty good feature extractor for 3D tasks, probably the best.

Python 47 Updated Nov 24, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,354 109 Updated Nov 29, 2025

MiMo-Embodied

Python 259 7 Updated Nov 21, 2025

Official Repository of POMA-3D: The Point Map Way to 3D Scene Understanding.

12 1 Updated Nov 9, 2025

NaTex: Seamless Texture Generation as Latent Color Diffusion

87 1 Updated Nov 26, 2025

Official PyTorch Implementation for "Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising"

Python 220 17 Updated Nov 27, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,123 826 Updated Nov 20, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,989 153 Updated Apr 21, 2025
Python 250 26 Updated May 19, 2025
Python 1,365 134 Updated Nov 15, 2025