🐁
On vacation

Organizations

@RhapsodyAILab


A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

Python 144 14 Updated Aug 28, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,302 94 Updated Sep 22, 2025

A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.

Python 140 19 Updated Oct 7, 2025

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,789 1,383 Updated Dec 6, 2023

Cook up amazing multimodal AI applications effortlessly with MiniCPM-o

Python 237 25 Updated Dec 10, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,262 204 Updated Jan 8, 2026

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 319 19 Updated Dec 11, 2025

A toolkit for speaker diarization.

Jupyter Notebook 373 40 Updated Dec 9, 2025

Contrastive Language-Audio Pretraining

Python 1,992 202 Updated May 15, 2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Python 109 12 Updated Jun 14, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,853 82 Updated Jan 8, 2026

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,275 747 Updated May 31, 2024

Authenticator generates 2-Step Verification codes in your browser.

TypeScript 4,287 1,055 Updated Nov 26, 2025

Interface for OuteTTS models.

Python 1,419 114 Updated Jun 21, 2025

Spark-TTS Inference Code

Python 10,896 1,169 Updated Apr 9, 2025

"Your Fully-Automated Personal AI Assistant"

Python 1,348 191 Updated Oct 16, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 16,740 1,327 Updated Jan 13, 2026

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python 20,924 3,501 Updated Jan 15, 2026

LeKiwi - Low-Cost Mobile Manipulator

1,144 126 Updated Jul 15, 2025

A tool for slacking off in Tencent Meeting

C# 1,180 106 Updated Dec 21, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,432 338 Updated Jan 5, 2026

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 38,899 4,682 Updated Aug 19, 2024

[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Python 182 5 Updated Nov 22, 2024

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 343 48 Updated Jul 21, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.

Python 50,717 4,188 Updated Jan 14, 2026

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 741 41 Updated Nov 19, 2024

A tiny reproduction of DeepSeek R1-Zero on two A100s.

Python 82 28 Updated Feb 1, 2025