shikiw

Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'

Python · 57 stars · 2 forks · Updated Jun 25, 2025

[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Python · 82 stars · 1 fork · Updated Sep 18, 2025

This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…

Python · 727 stars · 19 forks · Updated Sep 10, 2025

CYaRon: Yet Another Random Olympic-iNformatics test data generator

Python · 1,587 stars · 177 forks · Updated Oct 26, 2025

[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS

Python · 1,224 stars · 110 forks · Updated Sep 19, 2025

[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following

Python · 113 stars · Updated Sep 16, 2025

Scalable RL solution for advanced reasoning of language models

Python · 1,770 stars · 99 forks · Updated Mar 18, 2025

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python · 760 stars · 29 forks · Updated Sep 7, 2025

Train transformer language models with reinforcement learning.

Python · 16,329 stars · 2,297 forks · Updated Nov 18, 2025

This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-based reasoning MLLMs!

1,271 stars · 58 forks · Updated Nov 16, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook · 2,258 stars · 102 forks · Updated Oct 29, 2025

[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Python · 284 stars · 25 forks · Updated Nov 5, 2025

[ICCV 2025] Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Python · 481 stars · 30 forks · Updated Oct 25, 2025

[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++

Python · 204 stars · 4 forks · Updated Jul 28, 2025

Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"

Python · 36 stars · 3 forks · Updated Jan 21, 2025

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

899 stars · 38 forks · Updated Sep 27, 2025

Next-Token Prediction is All You Need

Python · 2,253 stars · 89 forks · Updated Mar 17, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python · 502 stars · 35 forks · Updated Feb 10, 2025

Open-source code for the paper: Retrieval Head Mechanistically Explains Long-Context Factuality

Python · 219 stars · 23 forks · Updated Aug 2, 2024

Official implementation of MIA-DPO

Python · 67 stars · 3 forks · Updated Jan 23, 2025

(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Python · 132 stars · 2 forks · Updated Mar 6, 2025

[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

Python · 46 stars · 3 forks · Updated Jan 14, 2025

[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Python · 29 stars · 2 forks · Updated May 22, 2025

[CVPR 2025] Official implementation of ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

46 stars · Updated Oct 10, 2025

Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"

Python · 239 stars · 3 forks · Updated May 24, 2024

[ICCV-2025] Official implementation of Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Python · 91 stars · 3 forks · Updated Jul 26, 2025

[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".

Python · 106 stars · 2 forks · Updated Jul 9, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python · 3,386 stars · 556 forks · Updated Nov 17, 2025