Stars
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
Official PyTorch Implementation of "Latent Diffusion Model Without Variational Autoencoder".
(NeurIPS 2025, SOTA) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
Official Code for ICLR 2024 Paper: On the Role of Discrete Tokenization in Visual Representation Learning
[ICCV'25] Official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
[NeurIPS 2022] code for "Visual Concepts Tokenization"
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
[CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization
Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"
FQGAN: Factorized Visual Tokenization and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
[ICLR'23 Oral] Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
LaTeX Thesis Template for the University of Chinese Academy of Sciences
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Progressive Growing of GANs for Improved Quality, Stability, and Variation
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Your code is powerful, unleash it! The extension made popular by Code in the Dark has finally made its way to VS Code.
Native Multimodal Models are World Learners
[ICCV'23] VQD-SR: Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"