Stars
A multi-platform proxy client based on ClashMeta, simple and easy to use, open-source and ad-free.
NekoBox for Android / sing-box / universal proxy toolchain for Android
[DEIMv2] Real Time Object Detection Meets DINOv3
[CVPR'22] Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"
The official code for Locality-Aware Zero-Shot Human-Object Interaction Detection, CVPR2025
Disentangled Pre-training for Human-Object Interaction Detection
Code of ICCV paper: https://arxiv.org/abs/2011.10881
[ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction"
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
An extremely fast Python package and project manager, written in Rust.
[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
No fortress, purely open ground. OpenManus is Coming.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Solve Visual Understanding with Reinforced VLMs
The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" | [AAAI2025]
Fully open reproduction of DeepSeek-R1
EVA Series: Visual Representation Fantasies from BAAI
[CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposal
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
A 2D Unity simulation in which cars learn to navigate themselves through different courses. The cars are steered by a feedforward neural network. The weights of the network are trained using a modi…
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy