angzong
Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 408 9 Updated Dec 16, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,944 241 Updated Jan 5, 2026
Python 8,810 524 Updated Jan 7, 2026

Official code of Motus: A Unified Latent Action World Model

Python 547 9 Updated Jan 5, 2026

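Interview questions (with answers) for large-model algorithm positions: common questions and concept explanations. Keywords: "large model interview questions", "algorithm position interviews", "common interview questions", "large model algorithm interviews", "large model application fundamentals"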

Jupyter Notebook 1,526 111 Updated Jan 7, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,660 1,500 Updated Jan 4, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,516 58 Updated Jun 14, 2025
Python 1,542 163 Updated Nov 15, 2025

[TPAMI 2025] Official code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python 229 22 Updated Nov 3, 2025

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 697 73 Updated Nov 24, 2025

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,762 471 Updated Dec 18, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 3,004 114 Updated Jan 2, 2026

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,810 310 Updated Aug 14, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,887 1,375 Updated Jan 8, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,340 7,941 Updated Jan 9, 2026

[ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 379 12 Updated Jan 19, 2025

Official project page of MTVCrafter, a new paradigm for animating arbitrary characters with 4D motion tokens.

Python 275 34 Updated Nov 13, 2025

[ECCV 2024] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Python 435 18 Updated Feb 24, 2025

[ICCV 2025] Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Python 554 26 Updated Dec 26, 2025

The official SpeakerVid-5M data curation code.

Python 64 4 Updated Jul 23, 2025

A feature-rich command-line audio/video downloader

Python 141,208 11,419 Updated Jan 6, 2026

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook 32,221 3,883 Updated Jul 23, 2024

Contrastive Language-Audio Pretraining

Python 1,980 203 Updated May 15, 2025

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Python 1,887 343 Updated Nov 26, 2024

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Python 1,197 106 Updated Oct 15, 2025

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 609 20 Updated Oct 29, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,549 80 Updated Nov 16, 2025