- Senior Researcher @ Microsoft Research
- Beijing
- https://www.microsoft.com/en-us/research/people/yuyua/
- @RainbowYuhui
Starred repositories
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
aider is AI pair programming in your terminal
Official implementation of Inductive Moment Matching
Muon is an optimizer for hidden layers in neural networks
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image?
Kimi K2 is the large language model series developed by Moonshot AI team
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
SkyReels-V2: Infinite-length Film Generative model
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
The world's first open-source multimodal creative assistant. This is a substitute for Canva and Manus that prioritizes privacy and is usable locally.
Official implementation for "SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion" https://arxiv.org/abs/2412.10437
Official DINO-X Model Context Protocol (MCP) server that empowers LLMs with real-world visual perception through image object detection, localization, and captioning APIs.
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
OmniGen2: Exploration to Advanced Multimodal Generation.
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
Roblox Foundation Model for 3D Intelligence
[NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting
This is the official PyTorch implementation of "Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic …
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
Mobile-Agent: The Powerful GUI Agent Family
[ECCV 2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"