- Tongji University
- Hangzhou, China
- UTC -12:00
- www.Jerrisk.com
- https://orcid.org/0000-0003-3668-5964
Highlights
- Pro
Starred repositories
Recipes to train reward models for RLHF.
Parse LaTeX math expressions
OpenCUA: Open Foundations for Computer-Use Agents
Awesome list of GUI agents (browser and computer use)
Agent S: an open agentic framework that uses computers like a human
ScaleCUA provides open-source computer-use agents that can operate across platforms (Windows, macOS, Ubuntu, Android).
Refine high-quality datasets and visual AI models
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Official inference repo for FLUX.2 models
MoCha: End-to-End Video Character Replacement without Structural Guidance
A modern chat interface for AI agents built with Next.js, Tailwind CSS, and TypeScript.
A research prototype of a human-centered web agent
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
The simplest, fastest repository for training/finetuning small-sized VLMs.
Solve Visual Understanding with Reinforced VLMs
[NeurIPS 2025 D&B Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing"
[NeurIPS 2025] Evaluation code for the paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"
[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
Human-taught computer-use agent designed for real Windows and macOS desktops.
Official repository for paper: OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agents
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
Official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"
FIBO is a state-of-the-art, open-source, JSON-native text-to-image model (described by its authors as the first of its kind) built for controllable, predictable, and legally safe image generation.
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
Let's make video diffusion practical!
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)