Stars
(NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
[ACMMM 2025] "Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts" (Official Implementation)
A Collection of Papers and Codes for CVPR2025/ICCV2025/CVPR2024/ECCV2024 AIGC
Calligrapher: Freestyle Text Image Customization
Unified layout planning and image generation, ICCV2025
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV2025)
A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
[NeurIPS 2025] IEAP: Image Editing As Programs with Diffusion Models
Layout Conditioned Image Generation, NeurIPS2024
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Official inference repo for FLUX.1 models
[CVPRW 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Train transformer language models with reinforcement learning.
[AAAI-2025] The official code of Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICCV 2025] Official PyTorch implementation of "FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors"
ComfyUI nodes to use segment-anything-2
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!
RepText: Rendering Visual Text via Replicating 🔥