🔥[Information Fusion 2024, Official Code] for paper "Prompt-guided image color aesthetics assessment: Models, datasets and benchmarks". Official Weights and Demos provided. The first multi-factor color aesthetics assessment dataset, algorithm, and benchm…

Python 68 2 Updated Jul 29, 2025

zzc-1998 / Q-SiT

Teaching LMMs for Image Quality Scoring and Interpreting

Python 96 2 Updated Mar 25, 2025

LLaVA-VL / LLaVA-NeXT

Python 4,505 438 Updated Sep 14, 2025

2U1 / Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,582 192 Updated Jan 10, 2026

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,764 1,515 Updated Jan 4, 2026

apple / ml-ferret

Python 8,677 518 Updated Oct 9, 2024

FrancescoSaverioZuppichini / ViT

Implementing Vi(sion)T(transformer)

448 62 Updated Mar 19, 2023

lukemelas / EfficientNet-PyTorch

A PyTorch implementation of EfficientNet

Python 8,209 1,539 Updated Apr 8, 2022

lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Python 24,861 3,482 Updated Jan 8, 2026

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,479 480 Updated Aug 7, 2024

csuhan / Tar

[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Python 192 6 Updated Sep 18, 2025

Yangyi-Chen / CoTConsistency

The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".

34 1 Updated Sep 16, 2023

tanghme0w / ACL25-CoPE

The official code for ACL 2025 Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings

Python 8 Updated Sep 15, 2025

Yukyin / VQAGuider

Python 1 Updated Jun 11, 2025

vec-ai / wikiHow-TIIR

[ACL 2025] Towards Text-Image Interleaved Retrieval

Python 16 2 Updated Sep 3, 2025

bojone / papers.cool

Cool Papers - Immersive Paper Discovery

JavaScript 687 16 Updated Aug 25, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,289 841 Updated Jan 8, 2026

lhcwork

Stars

Q-Future / LMM-Evaluation-Survey

XPixelGroup / DepictQA

showlab / Awesome-Video-Diffusion

jingyaogong / minimind-v

yuanzhoulvpi2017 / zero_nlp

TinyLLaVA / TinyLLaVA_Factory

ZhangXJ199 / TinyLLaVA-Video

Emericen / tiny-qwen

huggingface / nanoVLM

merveenoyan / smol-vision

hao-ai-lab / FastVideo

ChaofWang / Awesome-Super-Resolution

Yutong-Zhou-cv / Awesome-Text-to-Image

woshidandan / Prompt-DeT