Stars
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
This is the first paper to explore how to effectively use R1-like RL for MLLMs; it introduces Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.
CYaRon: Yet Another Random Olympic-iNformatics test data generator
[NeurIPS 2025 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
Scalable RL solution for advanced reasoning of language models
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
Train transformer language models with reinforcement learning.
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
[ICCV 2025] Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
[CVPR 2025] Official implementation of ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
Code for the paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
[ICCV 2025] Official implementation of Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks