Stars
LLaMA 2 implemented from scratch in PyTorch
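A from-scratch LLaMA 2 implementation typically builds a few small components by hand; as a hedged illustration (my own minimal PyTorch sketch, not code from the repo), here is RMSNorm, the normalization layer LLaMA uses in place of LayerNorm:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm (no mean-centering, no bias),
    as used in LLaMA-family models. Minimal illustrative sketch."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize by the RMS over the last dimension
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, which makes it slightly cheaper while working just as well in practice for large transformers.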
GPT-SoVITS ONNX Inference Engine & Model Converter
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint), a bilingual multimodal large model series built on the CPM base models
A real-time 500M-parameter VLM that runs on CPU, surpassing Moondream2 and SmolVLM. Trains from scratch with ease.
A third-party MNN server supporting external calls, embedding models, TTS models, and ASR models.
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, covering base models, vertical-domain fine-tuning and applications, datasets, and tutorials.
qnguyen3 / nanoLLaVA
Forked from BAAI-DCAI/Bunny
World's Smallest Vision-Language Model
Everything about the SmolLM and SmolVLM family of models
Official Devkit for the KITTI Depth Prediction/Completion Benchmark 2017
From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Official implementation of "DepthLab: From Partial to Complete"
[under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"
Handwritten interview questions (not LeetCode) for LLMs (the main focus) plus search, advertising, and recommendation AI algorithms, e.g. Self-Attention and AUC; compared with LeetCode these better test overall ability and stay closer to real business problems and fundamentals.
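Two of the questions named above, Self-Attention and AUC, can be sketched from scratch with NumPy; this is my own illustrative version of the kind of handwritten answer such interviews expect, not code taken from the repo:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scale by sqrt(d_k)
    return softmax(scores) @ v               # (seq_len, d_k)

def auc(labels, scores):
    """AUC via the rank-sum formula: the probability that a random
    positive is scored above a random negative (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # ranks start at 1
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The rank-sum AUC avoids the O(n²) pairwise comparison, which is exactly the kind of follow-up such interviews probe for.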
High-performance Image Tokenizers for VAR and AR
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Efficient vision foundation models for high-resolution generation and perception.
Janus-Series: Unified Multimodal Understanding and Generation Models
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
High-Resolution Image Synthesis with Latent Diffusion Models
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
🚀 Train a 26M-parameter visual multimodal VLM completely from scratch in just 1 hour!
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours!