Official DINO-X Model Context Protocol (MCP) server that empowers LLMs with real-world visual perception through image object detection, localization, and captioning APIs.

TypeScript 105 10 Updated Oct 28, 2025

IDEA-Research / Rex-Thinker

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

Python 127 6 Updated Jun 30, 2025

VectorSpaceLab / OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,949 9 Updated Sep 30, 2025

Ephemeral182 / PosterCraft

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Python 501 30 Updated Sep 23, 2025

apple / ml-flextok

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Jupyter Notebook 274 14 Updated Jun 2, 2025

si0wang / ViCrit

Python 24 1 Updated Jun 18, 2025

Roblox / cube

Roblox Foundation Model for 3D Intelligence

Jupyter Notebook 863 79 Updated Jul 22, 2025

keshik6 / grafting

[NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting

Jupyter Notebook 64 2 Updated Jun 18, 2025

Linfeng-Tang / PSFusion

This is the official PyTorch implementation of "Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic …

Python 203 10 Updated Apr 28, 2025

nvidia-cosmos / cosmos-predict2

Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.

Python 675 90 Updated Oct 29, 2025

MiniMax-AI / MiniMax-M1

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.

Python 2,997 264 Updated Jul 7, 2025

MMMGBench / MMMG

MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]

Python 21 Updated Oct 9, 2025

redredsheep / PrismLayers

PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

Jupyter Notebook 22 1 Updated Aug 11, 2025

X-PLUG / MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Python 6,428 651 Updated Nov 26, 2025

xiuqhou / Relation-DETR

[ECCV2024 Oral] Official implementation of the paper "Relation DETR: Exploring Explicit Position Relation Prior for Object Detection"

Python 245 19 Updated Nov 24, 2024

table-structure-recognition

Researcher. YuanYuhui PkuRainBow

Starred repositories

lane-lines-detection