Stars
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433
[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[NeurIPS 2025] This is the official repository for VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
[NeurIPS 2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
The official implementation of the paper SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder
[ICML 2025] Unlearning in Diffusion Models using Sparse Autoencoders
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
[ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
[NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM".
[ACL 2025] Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
This repository contains the code for the experiments in the paper "An Information-theoretic Metric of Transferability for Task Transfer Learning"
Training Sparse Autoencoders on Language Models
A framework that lets you apply Sparse AutoEncoders to any model
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
Code for a project on assessing the faithfulness of LLMs
Code and data accompanying our arXiv paper "Faithful Chain-of-Thought Reasoning".
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
The first paper to explore how to effectively use R1-like RL for MLLMs; it introduces Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
R1-onevision, a visual language model capable of deep CoT reasoning.