Stars
Fast and local neural text-to-speech engine
Wan: Open and Advanced Large-Scale Video Generative Models
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
A multi-agent framework written in Rust that enables you to build, deploy, and coordinate multiple intelligent agents
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Wan: Open and Advanced Large-Scale Video Generative Models
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
A TTS model capable of generating ultra-realistic dialogue in one pass.
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Automatic 3D Character animation using Pose Estimation and Landmark Generation techniques
High-resolution models for human tasks.
OpenMMLab Pose Estimation Toolbox and Benchmark.
A Conversational Speech Generation Model
HunyuanVideo: A Systematic Framework For Large Video Generation Model
📦 The official Nextcloud installation method. Provides easy deployment and maintenance with most features included in this one Nextcloud instance.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
The best OSS video generation models, created by Genmo
Select a portrait, click to move the head around (please use your own space / GPU!)
flow-pilot is an openpilot-based driver assistance system that runs on Linux, Windows, and Android-powered machines.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.