Stars
HaMeR: Reconstructing Hands in 3D with Transformers
A simple yet powerful agent framework that delivers with open-source models
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
[CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
[CVPR 2025] Official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/
(ICCV 2025) This repository is the official implementation of AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
[ICLR'25] The first benchmark aiming to evaluate whether LMMs can assist oracle bone inscription processing tasks
Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models"
Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[ICML 2025] Official Implementation of Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
MAGI-1: Autoregressive Video Generation at Scale
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
[ECCV 2024] Official repo for UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
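📚 AIGC job interview experiences, essential fundamentals, prompt engineering, ChatGPT, Stable Diffusion, Prompt, Embedding, Fine-tuning, and everything else you need to know for an AIGC job search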
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Deciphering Oracle Bone Language with Diffusion Models (ACL 2024 Best Paper)