Stars
Autoware - the world's leading open-source software project for autonomous driving
Navigator, our self-driving vehicle software stack
A curated list of world models for autonomous driving. Keep updated.
A driving dataset for the development and validation of fused pose estimators and mapping algorithms
Add n-gram and large language model (LLM) support to Whisper models.
openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.
🚀🎬 ShortGPT - Experimental AI framework for youtube shorts / tiktok channel automation
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Jockey is a conversational video agent.
Sample code and workshop materials for the demonstration of multimodal video understanding with TwelveLabs
[EMNLP-2024] Build multimodal language agents for fast prototype and production
ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
An optimized, production-ready implementation of active speaker detection
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)
Frame is an AI-powered, open-source vibe video editor, offering a Professional VIDEO cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delive…
🎥 Python and OpenCV-based scene cut/transition detection program & library.
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Faster Whisper transcription with CTranslate2
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
An open-source, real-time streaming Automatic Speech Recognition (ASR) model for Thai, optimized for low-latency CPU deployment.