Repositories listed on songkq's GitHub profile

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python · 7,777 stars · 711 forks · Updated Dec 30, 2025

Official implementation of ICML 2025 Oral 🏆 paper "Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection".

Python · 170 stars · 16 forks · Updated Jul 14, 2025

[NeurIPS 2025 Spotlight] "Detecting Generated Images by Fitting Natural Image Distributions"

Python · 8 stars · Updated Dec 18, 2025

The official code of Yume

Python · 464 stars · 25 forks · Updated Dec 30, 2025

This repository is the official implementation of NeurIPS 2025 Paper "Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable".

Python · 56 stars · 2 forks · Updated Dec 26, 2025

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

Python · 27 stars · Updated Dec 22, 2025

(No description)

Python · 20 stars · Updated Dec 11, 2025

A quick way to gather all the metadata about a video, playlist, or channel from the YouTube API.

JavaScript · 456 stars · 59 forks · Updated Mar 2, 2025

Fun-ASR is a large end-to-end speech recognition model launched by Tongyi Lab.

Python · 603 stars · 40 forks · Updated Dec 31, 2025

the subtitle editor :)

C# · 11,832 stars · 1,146 forks · Updated Dec 28, 2025

GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters

Python · 644 stars · 58 forks · Updated Dec 30, 2025

The Swiss army knife of lossless video/audio editing

TypeScript · 36,675 stars · 1,771 forks · Updated Dec 19, 2025

Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI subtitle team for reposting videos

Python · 15,524 stars · 1,597 forks · Updated May 18, 2025

(No description)

Python · 17 stars · 13 forks · Updated Dec 8, 2025

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python · 12,469 stars · 1,949 forks · Updated Oct 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 3,191 stars · 196 forks · Updated Oct 9, 2025

UniSpeech - Large Scale Self-Supervised Learning for Speech

Python · 474 stars · 74 forks · Updated Apr 5, 2024

Command line utility for forced alignment using Kaldi

Python · 1,706 stars · 276 forks · Updated Nov 15, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 8,913 stars · 985 forks · Updated Dec 13, 2025

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python · 900 stars · 147 forks · Updated Dec 1, 2024

Voice Activity Detector (VAD): low-latency, high-performance and lightweight

C · 1,856 stars · 146 forks · Updated Dec 23, 2025

Audio Normalization for Python/ffmpeg

HTML · 1,460 stars · 125 forks · Updated Dec 28, 2025

Unified automatic quality assessment for speech, music, and sound.

Python · 653 stars · 48 forks · Updated Jun 5, 2025

[CVPR 2023 Workshop] The code reproduces the results of our solutions on both tracks of the Meta AI Video Similarity Challenge (CVPR 2023 Workshop). Our solutions took first place on both tracks, in…

Python · 54 stars · 12 forks · Updated May 30, 2023

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python · 101 stars · 9 forks · Updated Sep 15, 2025

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.

Python · 1,293 stars · 98 forks · Updated Sep 28, 2025