This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,084 525 Updated May 5, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,519 1,536 Updated Apr 24, 2025

s1: Simple test-time scaling

Python 6,619 764 Updated Jun 25, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,854 1,087 Updated Dec 26, 2025

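AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks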

Jupyter Notebook 15,939 2,279 Updated Sep 3, 2025

Learning paths and knowledge summaries for machine learning and deep learning

Jupyter Notebook 2,276 370 Updated Jan 26, 2025
Python 4,470 434 Updated Sep 14, 2025

An Extensible Deep Learning Library

Python 2,304 392 Updated Dec 11, 2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,261 83 Updated Jan 23, 2025

Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)

Python 1,939 139 Updated Oct 23, 2025

HPT - Open Multimodal LLMs from HyperGAI

Python 315 22 Updated Jun 6, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 637 71 Updated Dec 10, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,706 311 Updated Nov 28, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,216 2,690 Updated Aug 12, 2024

An Autonomous LLM Agent for Complex Task Solving

Python 8,483 892 Updated Aug 12, 2024

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Python 27,924 3,522 Updated Sep 23, 2025

🚀 Power Your World with AI - Explore, Extend, Empower.

JavaScript 8,192 650 Updated Sep 15, 2025

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 3,353 332 Updated Feb 27, 2025

A family of lightweight multimodal models.

Python 1,049 75 Updated Nov 18, 2024

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks

Jupyter Notebook 8,479 527 Updated Oct 8, 2025

Official Code for DragGAN (SIGGRAPH 2023)

Python 36,007 3,449 Updated May 18, 2024

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Python 7,928 600 Updated Jul 17, 2024

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

2,984 135 Updated Dec 20, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,424 249 Updated Dec 3, 2024
Python 3,890 255 Updated Mar 15, 2024

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python 626 43 Updated Dec 30, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,447 479 Updated Aug 7, 2024

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"

1,350 59 Updated Mar 14, 2024

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Python 3,989 334 Updated Jun 12, 2024

Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

Python 714 67 Updated Sep 19, 2024