Starred repositories
RTMPose series (RTMPose, DWPose, RTMO, RTMW) without mmcv, mmpose, mmdet etc.
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
SimOn: A Simple Framework for Online Temporal Action Localization
Advanced multimodal analysis tool for reviewing video annotations with pose detection, emotion recognition, audio analysis, and interactive timeline visualization.
A curated list of temporal action localization/detection and related areas (e.g. temporal action proposal) resources.
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…
Multi-agent framework, runtime and control plane. Built for speed, privacy, and scale.
Effective Python: Third Edition — Source Code and Errata for the Book
A simple, easy-to-hack GraphRAG implementation
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
In-browser Postgres sandbox with AI assistance (formerly postgres.new)
This project contains a step-by-step guide on how to design an advanced agentic memory for your LLM-based applications.
LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.
Gemini Rust Suite 🦀: A powerful, modular Rust toolkit for interacting with Google Gemini. Features a feature-rich CLI, persistent semantic memory (LanceDB), and extensible tool integration via th…
[IJCNN 2024] Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
🐝 AI-powered browser assistant ("Cline for web browsing")
Temporal Action Detection & Weakly Supervised Temporal Action Detection & Temporal Action Proposal Generation
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs