Towards Scalable Pre-training of Visual Tokenizers for Generation

Python · 420 stars · 10 forks · Updated Dec 16, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python · 3,116 stars · 258 forks · Updated Jan 5, 2026
Python · 9,115 stars · 559 forks · Updated Jan 7, 2026

Official code for Motus: A Unified Latent Action World Model

Python · 573 stars · 9 forks · Updated Jan 5, 2026

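Interview questions (with answers) for large-model algorithm roles: common questions and concepts explained. "Large model interview questions", "algorithm role interviews", "common interview questions", "large model algorithm interviews", "large model application fundamentals"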

Jupyter Notebook · 1,548 stars · 113 forks · Updated Jan 12, 2026

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook · 17,845 stars · 1,530 forks · Updated Jan 4, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook · 1,521 stars · 60 forks · Updated Jun 14, 2025
Python · 1,558 stars · 167 forks · Updated Nov 15, 2025

[TPAMI 2025] Official code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python · 232 stars · 24 forks · Updated Nov 3, 2025

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python · 711 stars · 76 forks · Updated Nov 24, 2025

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python · 2,777 stars · 472 forks · Updated Dec 18, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python · 3,478 stars · 117 forks · Updated Jan 2, 2026

An AI-powered speech processing toolkit with open-source SOTA pretrained models, supporting speech enhancement, separation, target speaker extraction, and more.

Python · 3,833 stars · 313 forks · Updated Aug 14, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python · 17,971 stars · 1,379 forks · Updated Jan 12, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 66,015 stars · 8,020 forks · Updated Jan 17, 2026

[ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python · 380 stars · 12 forks · Updated Jan 19, 2025

Official project page of MTVCrafter, a new paradigm for animating arbitrary characters with 4D motion tokens.

Python · 276 stars · 35 forks · Updated Nov 13, 2025

[ECCV 2024] MotionLCM: the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Python · 438 stars · 18 forks · Updated Feb 24, 2025

[ICCV 2025] Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Python · 556 stars · 27 forks · Updated Jan 14, 2026

The official SpeakerVid-5M data curation code.

Python · 65 stars · 4 forks · Updated Jul 23, 2025

A feature-rich command-line audio/video downloader

Python · 142,461 stars · 11,506 forks · Updated Jan 18, 2026

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook · 32,304 stars · 3,884 forks · Updated Jul 23, 2024

Contrastive Language-Audio Pretraining

Python · 2,000 stars · 201 forks · Updated May 15, 2025

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Python · 1,888 stars · 343 forks · Updated Nov 26, 2024

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Python · 1,200 stars · 106 forks · Updated Oct 15, 2025

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python · 619 stars · 21 forks · Updated Oct 29, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python · 1,557 stars · 82 forks · Updated Nov 16, 2025