Starred repositories
Open-source Autonomous 3D Characters on the Web
Official implementation of "PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning" (ICCV 2025)
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
[CVPR 2025 Highlight] SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Training, validation, and inference code for various self-supervised learning (SSL) approaches and architectures.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
(NeurIPS 2025) OpenOmni: official implementation of "Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis"
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer (a hedged sampling sketch appears after this list)
A Self-adaptation Framework🐙 that adapts LLMs to unseen tasks in real time!
《开源大模型食用指南》("A Guide to 'Eating' Open-Source LLMs"): tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying open-source LLMs and multimodal LLMs (MLLMs), both Chinese and international, in a Linux environment
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning (a minimal LoRA usage sketch appears after this list).
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
✨✨Latest Advances on Multimodal Large Language Models
Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud that understands text, audio, vision, and video and performs real-time speech generation.
Foundational Models for State-of-the-Art Speech and Text Translation
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Implementation of Autoregressive Diffusion in PyTorch
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations and practical examples for both text and image modalities (a generic training sketch appears after this list).
Code to reproduce the results for our SIGGRAPH 2023 paper "Listen, Denoise, Action!"
Official dataset toolbox for the papers "[CVPR 2023] NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions" and "[CVPR 2024] HOI-M3: Capture Multiple Humans and Objects Interact…
Virtual Community: An Open World for Humans, Robots, and Society
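
A few hedged sketches for the starred libraries above. First, SANA: a minimal text-to-image sampling sketch through its 🤗 diffusers integration (`SanaPipeline` landed in diffusers around v0.32). The checkpoint id below is an assumption; check the model card for the current name.

```python
# Text-to-image sampling with SANA via diffusers; the checkpoint id is assumed.
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(prompt="a watercolor fox in a snowy forest",
             height=1024, width=1024).images[0]
image.save("sana_sample.png")
```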
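Next, PEFT: a minimal LoRA setup, as referenced in the entry above. The base model and hyperparameters are illustrative assumptions, not recommendations.

```python
# Wrap a causal LM with LoRA adapters using 🤗 PEFT; only the low-rank
# adapter weights remain trainable.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed small base model

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # low-rank dimension
    lora_alpha=16,    # LoRA scaling factor
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # prints the trainable-parameter fraction
```

The wrapped model then trains with any standard Trainer or custom loop; gradients flow only into the adapter weights.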
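Finally, flow matching: a generic conditional flow-matching training loop in plain PyTorch that illustrates the technique (linear interpolation path, constant target velocity). This is a sketch of the idea, not the facebookresearch/flow_matching API; the network and data are toy placeholders.

```python
# Generic conditional flow matching on 2-D toy data: regress a velocity
# field v_theta(x_t, t) onto the constant target velocity (x1 - x0)
# along the linear path x_t = (1 - t) * x0 + t * x1.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity field v_theta(x_t, t) for 2-D data."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # Concatenate the time scalar onto each sample.
        return self.net(torch.cat([x, t[:, None]], dim=-1))

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0          # stand-in for real data samples
    x0 = torch.randn_like(x1)                      # noise source distribution
    t = torch.rand(x1.size(0))                     # uniform time in [0, 1]
    xt = (1 - t[:, None]) * x0 + t[:, None] * x1   # point on the linear path
    target = x1 - x0                               # target velocity along the path
    loss = ((model(xt, t) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling then integrates the learned velocity field from t=0 to t=1 with any ODE solver (e.g. a simple Euler loop).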