Youliang Zhang angzong

Wuhan University / major in computer science and technology

2 followers · 5 following

Wuhan University
Wuhan, China
https://www.whu.edu.cn/

Stars

MiniMax-AI / VTP

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 408 9 Updated Dec 16, 2025

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,944 241 Updated Jan 5, 2026

Tongyi-MAI / Z-Image

Python 8,810 524 Updated Jan 7, 2026

thu-ml / Motus

Official code of Motus: A Unified Latent Action World Model

Python 547 9 Updated Jan 5, 2026

meituan-longcat / LongCat-Video

Python 1,871 262 Updated Dec 20, 2025

aceliuchanghong / FAQ_Of_LLM_Interview

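Interview questions (with answers) for LLM algorithm positions: common questions and concept explanations. "LLM interview questions", "algorithm position interviews", "common interview questions", "LLM algorithm interviews", "LLM application fundamentals"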

Jupyter Notebook 1,526 111 Updated Jan 7, 2026

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,660 1,500 Updated Jan 4, 2026

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,516 58 Updated Jun 14, 2025

character-ai / Ovi

Python 1,542 163 Updated Nov 15, 2025

SMPLCap / SMPLest-X

[TPAMI 2025] Official code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python 229 22 Updated Nov 3, 2025

antgroup / echomimic_v3

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 697 73 Updated Nov 24, 2025

Omni-Avatar / OmniAvatar

Python 1,769 161 Updated Aug 6, 2025

MeiGen-AI / MultiTalk

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,762 471 Updated Dec 18, 2025

Tencent-Hunyuan / HunyuanVideo-1.5

HunyuanVideo-1.5: A leading lightweight video generation model

Python 3,004 114 Updated Jan 2, 2026

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,810 310 Updated Aug 14, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,887 1,375 Updated Jan 8, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,340 7,941 Updated Jan 9, 2026

VideoVerses / VideoVAEPlus

[ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 379 12 Updated Jan 19, 2025

DINGYANB / MTVCrafter

Official project page of MTVCrafter, a new paradigm for animating arbitrary characters with 4D motion tokens.

Python 275 34 Updated Nov 13, 2025

Dai-Wenxun / MotionLCM

[ECCV 2024] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Python 435 18 Updated Feb 24, 2025

zju3dv / Diffuman4D

[ICCV 2025] Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Python 554 26 Updated Dec 26, 2025

Dorniwang / SpeakerVid-5M-Code

The official SpeakerVid-5M data curation code.

Python 64 4 Updated Jul 23, 2025

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 141,208 11,419 Updated Jan 6, 2026

openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 32,221 3,883 Updated Jul 23, 2024

LAION-AI / CLAP

Contrastive Language-Audio Pretraining

Python 1,980 203 Updated May 15, 2025

sicxu / Deep3DFaceRecon_pytorch

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Python 1,887 343 Updated Nov 26, 2024

Tencent-Hunyuan / HunyuanVideo-Avatar

Python 1,981 320 Updated Dec 16, 2025

Tencent-Hunyuan / HunyuanCustom

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Python 1,197 106 Updated Oct 15, 2025

baaivision / NOVA

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 609 20 Updated Oct 29, 2025

Gen-Verse / MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,549 80 Updated Nov 16, 2025
