Renmin University of China
- Beijing
- https://kimokcheon.github.io/
Stars
A list of VLMs tailored for medical report generation (RG) and VQA, and a list of medical vision-language datasets
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
XiaomiMiMo / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
A comprehensive collection of process reward models.
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Automated translation solution for visual novels (galgames) supporting large language models such as GPT-4/Claude/Deepseek/Sakura
Developing VLMs for expert-level performance in specific medical specialties
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
🌟 100+ original LLM/RL principle diagrams 📚, presented by the author of 《大模型算法》! 💥 (100+ LLM/RL Algorithm Maps)
Devcontainer for using LaTeX in VS Code with auto-formatting and one-click arXiv export and link check.
EH-Benchmark: Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow
Towards a Unified View of Large Language Model Post-Training
UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Is the medical segmentation problem solved? A survey
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning"
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
🔥🔥 First-ever hour-scale video understanding models