The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curatio…

Python 2,309 228 Updated Nov 7, 2024

rhymes-ai / Aria

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 1,074 85 Updated Jan 22, 2025

InternLM / xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 4,949 378 Updated Oct 24, 2025

hila-chefer / Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…

Jupyter Notebook 870 113 Updated Aug 24, 2023

zhaoyue-zephyrus / AVION

[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"

Python 135 11 Updated Aug 23, 2025

bdaiinstitute / theia

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Python 255 11 Updated Apr 3, 2025

zhangyikaii / Proto-CAT

The code repository for "Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation"

Python 11 Updated Feb 10, 2023

zhangyikaii / LAMDA-ZhiJian

ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

Python 48 2 Updated Sep 2, 2023

FlagOpen / FlagScale

FlagScale is a large model toolkit based on open-sourced projects.

Python 364 112 Updated Oct 23, 2025

yuezih / SMILE

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation (NeurIPS 2023)

Jupyter Notebook 22 1 Updated Oct 1, 2023

ylwhxht / SRKD-DRET

AAAI2024 - Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

Python 39 3 Updated Jul 2, 2024

yuezih / less-is-more

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)

Python 54 Updated Oct 28, 2024

yuezih / Movie101

Narrative movie understanding benchmark

Python 76 Updated Jun 11, 2025

showlab / Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

880 39 Updated Sep 27, 2025

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 2,252 129 Updated May 30, 2025

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,309 268 Updated Jan 18, 2025

EricLee8 / MPD_EMVI

Official implementation of our paper at ACL 2023: Pre-training Multi-party Dialogue Models with Latent Discourse Inference

Python 10 Updated Jul 10, 2023

ccfddl / ccf-deadlines

⏰ Collaboratively track worldwide conference deadlines (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Rust 8,075 543 Updated Oct 25, 2025

YixunLiang / ReTR

Official code of ReTR (NeurIPS 2023)

Python 48 Updated Nov 9, 2023

yaolinli / CapEnrich

Python 5 Updated Feb 7, 2023

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,833 2,647 Updated Aug 12, 2024

voxel51 / fiftyone

Refine high-quality datasets and visual AI models

Python 9,979 678 Updated Oct 26, 2025

ML-GSAI / DPT

Official PyTorch implementation for "Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels"

Python 95 4 Updated Jan 17, 2024

Zihao Yue yuezih

Stars

XiaomiMiMo / MiMo-VL

dvlab-research / VisionReasoner

XiaomiMiMo / MiMo

www-Ye / Time-R1

dvlab-research / Seg-Zero

EvolvingLMMs-Lab / open-r1-multimodal

dvlab-research / Lyra

BAAI-Agents / Cradle