- Yonsei University
- Seoul, Republic of Korea
- https://jerife.org
Stars
A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Official code for "TallyQA: Answering Complex Counting Questions", published at AAAI 2019
TallyQA: Answering Complex Counting Questions dataset
verl: Volcano Engine Reinforcement Learning for LLMs
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"
A simple Jekyll + GitHub Pages powered resume template.
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023
A technical report / research paper repository for tool-integrated reasoning.
Code for "Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models"
maze datasets for investigating OOD behavior of ML systems
DeepPrivacy2 - A Toolbox for Realistic Image Anonymization
DeepPrivacy: A Generative Adversarial Network for Face Anonymization
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)
[NeurIPS 2024] Repository for the "Visualization-of-Thought" dataset, construction code, and evaluation.
[ACL 2025] VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search