Stars
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Official implementation of ICML 2025 Oral 🏆 paper "Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection".
tmlr-group / ConV
Forked from junz-debug/ConV
[NeurIPS 2025 Spotlight] "Detecting Generated Images by Fitting Natural Image Distributions"
This repository is the official implementation of NeurIPS 2025 Paper "Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable".
Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
A quick way to gather all the metadata about a video, playlist, or channel from the YouTube API.
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
The Swiss Army knife of lossless video/audio editing
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
UniSpeech - Large Scale Self-Supervised Learning for Speech
Command line utility for forced alignment using Kaldi
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Voice Activity Detector (VAD): low-latency, high-performance and lightweight
Unified automatic quality assessment for speech, music, and sound.
[CVPR 2023 Workshop] Code to reproduce the results of our solutions on both tracks of the Meta AI Video Similarity Challenge (CVPR 2023 Workshop). Our solutions took first place on both tracks, in…
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.