- Yale University
- New Haven, CT, US
- https://xypb.github.io/
- @yuexi_du
Stars
Biomedical Visual Instruction Tuning with Clinician Preference Alignment
SGLang is a fast serving framework for large language models and vision language models.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
A Python module to repair invalid JSON from LLMs (a brief usage sketch follows this list).
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
The ground truth segmentation for MIAS, INbreast and CBIS-DDSM subset datasets.
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
Official inference repo for FLUX.1 models
TorchCFM: a Conditional Flow Matching library
A Deep Learning Python Toolkit for Healthcare Applications.
Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
LLM Frontend for Power Users.
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
MedResearcher-R1 is a deep research agent for medical scenarios, built on a knowledge-informed trajectory synthesis framework.
Train transformer language models with reinforcement learning.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
[ACL 2024 Findings] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning https://arxiv.org/abs/2311.10537
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Reference PyTorch implementation and models for DINOv3
📖 A repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
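Among the starred tools above, json_repair is the kind of utility that is easiest to show in a few lines. The sketch below is a minimal, hedged example of recovering a parseable object from malformed LLM output; the string `raw` is an invented illustration, and the exact function name and return type should be confirmed against the library's documentation.

```python
# Minimal sketch (assumed API): repair a malformed JSON string emitted by an
# LLM, then parse it with the standard json module.
import json

from json_repair import repair_json

# Hypothetical malformed output: unquoted key, trailing comma, no closing brace.
raw = '{"diagnosis": "benign", confidence: 0.87,'

fixed = repair_json(raw)   # returns a repaired JSON string
data = json.loads(fixed)   # now parses cleanly
print(data["confidence"])
```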