Stars
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
An easy way to apply LoRA to CLIP. Implementation of the paper "Low-Rank Few-Shot Adaptation of Vision-Language Models" (CLIP-LoRA) [CVPRW 2024].
Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)
The official implementation of 3DDFA_V3 in CVPR2024 (Highlight).
OpenFace 3.0 – open-source toolkit for facial landmark detection, action unit detection, eye-gaze estimation, and emotion recognition.
Fully Open Framework for Democratized Multimodal Training
Official PyTorch implementation of the paper: UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training.
This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo]
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight)
🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
🔥🔥 First-ever hour-scale video understanding models
An open-source framework for training large multimodal models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
A curated list of awesome temporal action segmentation resources.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Code for our IJCV 2023 paper "CLIP-guided Prototype Modulating for Few-shot Action Recognition".
An arbitrary face-swapping framework for images and videos using a single trained model!
Industry leading face manipulation platform
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[CVPR2023] The official repo for OC-SORT: Observation-Centric SORT for video multi-object tracking. OC-SORT is simple, online, and robust to occlusion and non-linear motion.
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
Frontier Multimodal Foundation Models for Image and Video Understanding
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2