View lhcwork's full-sized avatar

Official repo for 'Large Multimodal Models Evaluation: A Survey'

1 Updated Dec 12, 2025

DepictQA: Depicted Image Quality Assessment with Vision Language Models

Python 192 7 Updated Nov 28, 2025

A curated list of recent diffusion models for video generation, editing, and various other applications.

5,369 333 Updated Dec 15, 2025

🚀 Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!

Python 6,009 646 Updated Dec 27, 2025

Chinese NLP solutions (large models, data, models, training, inference)

Jupyter Notebook 3,754 446 Updated Aug 5, 2025

A Framework of Small-scale Large Multimodal Models

Python 950 95 Updated Apr 26, 2025

A Simple Framework of Small-scale LMMs for Video Understanding

Python 108 6 Updated Jun 11, 2025

A minimal PyTorch re-implementation of Qwen3 VL with a fancy CLI

Python 304 17 Updated Dec 2, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,526 443 Updated Oct 27, 2025

Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜

Jupyter Notebook 1,855 146 Updated Jan 9, 2026

A unified inference and post-training framework for accelerated video generation.

Python 2,944 237 Updated Jan 13, 2026

Collect super-resolution related papers, data, repositories

2,981 365 Updated Dec 31, 2025

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,416 205 Updated Nov 12, 2025

🔥[Information Fusion 2024, Official Code] for paper "Prompt-guided image color aesthetics assessment: Models, datasets and benchmarks". Official Weights and Demos provided. The first multi-factor color aesthetics assessment dataset, algorithm, and benchm…

Python 68 2 Updated Jul 29, 2025

Teaching LMMs for Image Quality Scoring and Interpreting

Python 96 2 Updated Mar 25, 2025
Python 4,505 438 Updated Sep 14, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,582 192 Updated Jan 10, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,764 1,515 Updated Jan 4, 2026
Python 8,677 518 Updated Oct 9, 2024

Implementing Vi(sion)T(ransformer)

448 62 Updated Mar 19, 2023

A PyTorch implementation of EfficientNet

Python 8,209 1,539 Updated Apr 8, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Python 24,861 3,482 Updated Jan 8, 2026

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,479 480 Updated Aug 7, 2024

[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Python 192 6 Updated Sep 18, 2025

The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".

34 1 Updated Sep 16, 2023

The official code for the ACL 2025 paper "Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings"

Python 8 Updated Sep 15, 2025
Python 1 Updated Jun 11, 2025

[ACL 2025] Towards Text-Image Interleaved Retrieval

Python 16 2 Updated Sep 3, 2025

Cool Papers - Immersive Paper Discovery

JavaScript 687 16 Updated Aug 25, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,289 841 Updated Jan 8, 2026