💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

951 53 Updated Aug 17, 2025

[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models

Python 65 8 Updated Feb 16, 2025

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3,997 581 Updated Apr 24, 2024

Official Code for "TallyQA: Answering Complex Counting Questions" published at AAAI 2018

Python 9 3 Updated Dec 2, 2021

TallyQA: Answering Complex Counting Questions dataset

27 1 Updated Feb 19, 2024
Python 8,077 568 Updated Oct 30, 2025
Python 16 2 Updated Jun 12, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,017 2,407 Updated Nov 1, 2025
Python 1,327 118 Updated Sep 12, 2025

Referring Expression Datasets API

Jupyter Notebook 544 84 Updated Aug 27, 2024

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,292 531 Updated Oct 31, 2025
Python 1 Updated Oct 31, 2025
Python 1 Updated Oct 31, 2025

[EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"

Python 12 Updated Aug 26, 2025

📄💼🎩 A simple Jekyll + GitHub Pages powered resume template.

HTML 1,937 1,892 Updated Nov 27, 2024

[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy

Python 74 4 Updated Oct 9, 2025

✨ agents in use

84 14 Updated Aug 3, 2025

🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023

Python 67 5 Updated Sep 17, 2025

A technical report / research paper repository for tool integrated reasoning.

8 Updated Jun 20, 2025

A programming framework for agentic AI

Python 51,326 7,820 Updated Oct 8, 2025

Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Jupyter Notebook 264 15 Updated Aug 5, 2025

maze datasets for investigating OOD behavior of ML systems

Jupyter Notebook 64 8 Updated Oct 20, 2025

DeepPrivacy2 - A Toolbox for Realistic Image Anonymization

Python 355 45 Updated Jan 28, 2024

DeepPrivacy: A Generative Adversarial Network for Face Anonymization

Python 1,301 174 Updated Nov 19, 2023

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,076 37 Updated Oct 4, 2025

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)

Python 58 3 Updated Apr 12, 2025

[NeurIPS 2024] Repo for the "Visualization-of-Thought" dataset, construction code, and evaluation.

Python 32 3 Updated Oct 23, 2024

[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

Python 30 3 Updated Jul 24, 2025