Tomtunn
🦍
Feeling like a Gorilla
  • Mahidol University


Focus on prompting and generating

Python 47,455 7,735 Updated Dec 1, 2025

Autoware - the world's leading open-source software project for autonomous driving

Dockerfile 10,963 3,469 Updated Jan 7, 2026

Navigator, our self-driving vehicle software stack

Jupyter Notebook 40 15 Updated Sep 26, 2025

A curated list of world models for autonomous driving. Keep updated.

471 21 Updated Dec 23, 2025

A driving dataset for the development and validation of fused pose estimators and mapping algorithms

Jupyter Notebook 634 124 Updated Jul 20, 2025

Add n-gram and large language model (LLM) support to Whisper models.

Jupyter Notebook 40 4 Updated May 6, 2025

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

Python 59,572 10,544 Updated Jan 7, 2026

🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation

Python 6,995 981 Updated Feb 10, 2025

An open-source OpusClip alternative

Python 97 41 Updated Oct 11, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 8,048 702 Updated Feb 10, 2025

Jockey is a conversational video agent.

TypeScript 94 15 Updated May 27, 2025

Jupyter Notebook 32 2 Updated Jun 10, 2024

Sample code and workshop materials for the demonstration of multimodal video understanding with TwelveLabs

Jupyter Notebook 6 1 Updated Dec 1, 2025

[EMNLP-2024] Build multimodal language agents for fast prototype and production

Python 2,619 287 Updated Mar 19, 2025

Self-Adapting Language Models

Python 1,637 293 Updated Aug 1, 2025

"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"

Python 1,771 315 Updated Dec 15, 2025

The agentic video editing framework

Python 203 24 Updated Feb 10, 2025

Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Python 15 1 Updated Sep 3, 2024

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)

Python 68 9 Updated Oct 29, 2023

An optimized, production-ready implementation of active speaker detection

Python 76 19 Updated May 29, 2024

Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset

Python 71 7 Updated Jan 18, 2022

The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)

Python 83 19 Updated Mar 23, 2025

Frame is an AI-powered, open-source vibe video editor, offering a professional video cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delive…

96 14 Updated May 6, 2025

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Python 4,453 474 Updated Dec 11, 2025

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Python 24 1 Updated Apr 2, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 8,931 989 Updated Dec 13, 2025

Faster Whisper transcription with CTranslate2

Python 20,225 1,689 Updated Nov 19, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,439 3,266 Updated Jan 7, 2026

An open-source, real-time streaming Automatic Speech Recognition (ASR) model for Thai, optimized for low-latency CPU deployment.

Python 36 7 Updated Nov 28, 2025