Stars
[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)
Explore the Multimodal “Aha Moment” on 2B Model
Wan: Open and Advanced Large-Scale Video Generative Models
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Fully open reproduction of DeepSeek-R1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Official Code for MotionCtrl [SIGGRAPH 2024]
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Pandora: Towards General World Model with Natural Language Actions and Video States
Lumina-T2X is a unified framework for Text to Any Modality Generation
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Production-ready platform for agentic workflow development.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
The evaluation framework for the InfiCoder-Eval benchmark.