Tomtunn

🦍

Feeling like a Gorilla

Tomtun Tomtunn

Deep learning engineer and researcher

2 followers · 7 following

Mahidol University

Stars

lllyasviel / Fooocus

Focus on prompting and generating

Python 47,455 7,735 Updated Dec 1, 2025

autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving

Dockerfile 10,963 3,469 Updated Jan 7, 2026

Nova-UTD / navigator

Navigator, our self-driving vehicle software stack

Jupyter Notebook 40 15 Updated Sep 26, 2025

HaoranZhuExplorer / World-Models-Autonomous-Driving-Latest-Survey

A curated list of world models for autonomous driving. Keep updated.

471 21 Updated Dec 23, 2025

commaai / comma2k19

A driving dataset for the development and validation of fused pose estimators and mapping algorithms

Jupyter Notebook 634 124 Updated Jul 20, 2025

hitz-zentroa / whisper-lm

Add n-gram and large language model (LLM) support to Whisper models.

Jupyter Notebook 40 4 Updated May 6, 2025

commaai / openpilot

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

Python 59,572 10,544 Updated Jan 7, 2026

RayVentura / ShortGPT

🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation

Python 6,995 981 Updated Feb 10, 2025

FujiwaraChoki / supoclip

An open-source OpusClip alternative

Python 97 41 Updated Oct 11, 2025

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 8,048 702 Updated Feb 10, 2025

twelvelabs-io / tl-jockey

Jockey is a conversational video agent.

TypeScript 94 15 Updated May 27, 2025

prateekchhikara / sports-highlights

Jupyter Notebook 32 2 Updated Jun 10, 2024

twelvelabs-io / tl-solutions-samples

Sample code and workshop materials for the demonstration of multimodal video understanding with TwelveLabs

Jupyter Notebook 6 1 Updated Dec 1, 2025

om-ai-lab / OmAgent

[EMNLP-2024] Build multimodal language agents for fast prototype and production

Python 2,619 287 Updated Mar 19, 2025

Continual-Intelligence / SEAL

Self-Adapting Language Models

Python 1,637 293 Updated Aug 1, 2025

HKUDS / ViMax

"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"

Python 1,771 315 Updated Dec 15, 2025

diffusionstudio / agent

The agentic video editing framework

Python 203 24 Updated Feb 10, 2025

andrespimartin / weighted-x-entropy-asr

Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Python 15 1 Updated Sep 3, 2024

SRA2 / SPELL

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)

Python 68 9 Updated Oct 29, 2023

sieve-community / fast-asd

An optimized, production-ready implementation of active speaker detection

Python 76 19 Updated May 29, 2024

Andreaswt / ai-podcast-clipper-saas

TypeScript 169 99 Updated May 22, 2025

okankop / ASDNet

Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset

Python 71 7 Updated Jan 18, 2022

Junhua-Liao / LR-ASD

The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)

Python 83 19 Updated Mar 23, 2025

aregrid / frame

Frame is an AI-powered, open-source vibe video editor, offering a professional video cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delive…

96 14 Updated May 6, 2025

Breakthrough / PySceneDetect

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Python 4,453 474 Updated Dec 11, 2025

alperensumeroglu / ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Python 24 1 Updated Apr 2, 2025

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 8,931 989 Updated Dec 13, 2025

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Python 20,225 1,689 Updated Nov 19, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,439 3,266 Updated Jan 7, 2026

scb-10x / typhoon-asr

An open-source, real-time streaming Automatic Speech Recognition (ASR) model for Thai, optimized for low-latency CPU deployment.

Python 36 7 Updated Nov 28, 2025