- University of Michigan, Ann Arbor
- Ann Arbor, United States
- sihanxu.github.io
Stars
[CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
PyTorch implementation of SimSiam https://arxiv.org/abs/2011.10566
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Collect every awesome work about r1!
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
Official Jax Implementation of MaskGIT
Minimal reproduction of DeepSeek R1-Zero
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821