Youliang Zhang angzong

Wuhan University / majoring in computer science and technology

2 followers · 5 following

Wuhan University
Wuhan, China
https://www.whu.edu.cn/

Stars

MiniMax-AI / VTP

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 420 10 Updated Dec 16, 2025

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,116 258 Updated Jan 5, 2026

Tongyi-MAI / Z-Image

Python 9,115 559 Updated Jan 7, 2026

thu-ml / Motus

Official code of Motus: A Unified Latent Action World Model

Python 573 9 Updated Jan 5, 2026

meituan-longcat / LongCat-Video

Python 1,929 269 Updated Dec 20, 2025

aceliuchanghong / FAQ_Of_LLM_Interview

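Interview questions for LLM algorithm positions (with answers): common questions and concept explanations. "LLM interview questions", "algorithm position interviews", "common interview questions", "LLM algorithm interviews", "LLM application fundamentals"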

Jupyter Notebook 1,548 113 Updated Jan 12, 2026

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,845 1,530 Updated Jan 4, 2026

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,521 60 Updated Jun 14, 2025

character-ai / Ovi

Python 1,558 167 Updated Nov 15, 2025

SMPLCap / SMPLest-X

[TPAMI 2025] Official code for "SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation"

Python 232 24 Updated Nov 3, 2025

antgroup / echomimic_v3

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 711 76 Updated Nov 24, 2025

Omni-Avatar / OmniAvatar

Python 1,778 165 Updated Aug 6, 2025

MeiGen-AI / MultiTalk

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,777 472 Updated Dec 18, 2025

Tencent-Hunyuan / HunyuanVideo-1.5

HunyuanVideo-1.5: A leading lightweight video generation model

Python 3,478 117 Updated Jan 2, 2026

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,833 313 Updated Aug 14, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,971 1,379 Updated Jan 12, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 66,015 8,020 Updated Jan 17, 2026

VideoVerses / VideoVAEPlus

[ICCV 2025] VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 380 12 Updated Jan 19, 2025

DINGYANB / MTVCrafter

Official project page of MTVCrafter, a new paradigm for animating arbitrary characters with 4D motion tokens.

Python 276 35 Updated Nov 13, 2025

Dai-Wenxun / MotionLCM

[ECCV 2024] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Python 438 18 Updated Feb 24, 2025

zju3dv / Diffuman4D

[ICCV 2025] Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Python 556 27 Updated Jan 14, 2026

Dorniwang / SpeakerVid-5M-Code

The official SpeakerVid-5M data curation code.

Python 65 4 Updated Jul 23, 2025

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 142,461 11,506 Updated Jan 18, 2026

openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 32,304 3,884 Updated Jul 23, 2024

LAION-AI / CLAP

Contrastive Language-Audio Pretraining

Python 2,000 201 Updated May 15, 2025

sicxu / Deep3DFaceRecon_pytorch

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Python 1,888 343 Updated Nov 26, 2024

Tencent-Hunyuan / HunyuanVideo-Avatar

Python 2,003 323 Updated Dec 16, 2025

Tencent-Hunyuan / HunyuanCustom

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Python 1,200 106 Updated Oct 15, 2025

baaivision / NOVA

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 619 21 Updated Oct 29, 2025

Gen-Verse / MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,557 82 Updated Nov 16, 2025
