
Organizations

@RhapsodyAILab


A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

Python 127 10 Updated Aug 28, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,205 87 Updated Sep 22, 2025

A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.

Python 113 14 Updated Oct 7, 2025

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,730 1,383 Updated Dec 6, 2023

Cook up amazing multimodal AI applications effortlessly with MiniCPM-o

Python 222 19 Updated Nov 6, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,851 160 Updated Oct 9, 2025

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 298 17 Updated Aug 22, 2025

A toolkit for speaker diarization.

Jupyter Notebook 321 33 Updated Oct 8, 2025

Contrastive Language-Audio Pretraining

Python 1,889 191 Updated May 15, 2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

Python 92 10 Updated Jun 14, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,768 76 Updated Oct 22, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,006 718 Updated May 31, 2024

Authenticator generates 2-Step Verification codes in your browser.

TypeScript 4,180 1,005 Updated Oct 24, 2025

Interface for OuteTTS models.

Python 1,397 114 Updated Jun 21, 2025

Spark-TTS Inference Code

Python 10,690 1,142 Updated Apr 9, 2025

"Your Fully-Automated Personal AI Assistant"

Python 1,273 179 Updated Oct 16, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 15,901 1,208 Updated Nov 7, 2025

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python 19,241 2,982 Updated Nov 10, 2025

LeKiwi - Low-Cost Mobile Manipulator

1,022 112 Updated Jul 15, 2025

A slacking-off tool for Tencent Meeting

C# 1,108 99 Updated Mar 16, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,022 300 Updated Nov 3, 2025

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 38,690 4,659 Updated Aug 19, 2024

[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Python 179 5 Updated Nov 22, 2024

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 323 46 Updated Jul 21, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,120 3,944 Updated Nov 10, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 706 39 Updated Nov 19, 2024

A tiny reproduction of DeepSeek-R1-Zero on two A100s.

Python 73 27 Updated Feb 1, 2025