MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.

Python 1,149 79 Updated Nov 25, 2025
Jupyter Notebook 82 1 Updated Nov 8, 2025

[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Python 408 20 Updated Sep 14, 2025

Fully Open Framework for Democratized Multimodal Training

Python 629 44 Updated Nov 28, 2025

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction

Python 57 1 Updated Sep 3, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,328 442 Updated Nov 28, 2025

Official PyTorch implementation of FlowMo.

Jupyter Notebook 103 6 Updated Apr 7, 2025
Python 4,420 426 Updated Sep 14, 2025

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 3,247 204 Updated May 19, 2025

A summary of existing representative LLM text datasets.

1,389 139 Updated Oct 11, 2025

Curated list of datasets and tools for post-training.

4,023 331 Updated Nov 10, 2025

A quick guide (especially) for trending instruction finetuning datasets

3,316 223 Updated Nov 28, 2023

Awesome LLM pre-training resources, including data, frameworks, and methods.

288 19 Updated Apr 29, 2025

AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 49 2 Updated Oct 12, 2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Jupyter Notebook 274 14 Updated Jun 2, 2025

GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset

Python 237 5 Updated Aug 15, 2025

Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).

Python 392 10 Updated Aug 26, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,779 263 Updated Sep 25, 2025

[NeurIPS 2025] Efficient Reasoning Vision Language Models

Python 419 28 Updated Sep 18, 2025

LaTeX template for doctoral dissertations at Huazhong University of Science and Technology

TeX 222 46 Updated Jul 24, 2025

[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,504 75 Updated Nov 16, 2025

(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python 631 33 Updated Nov 10, 2025

This is a repo to track the latest autoregressive visual generation papers.

412 5 Updated Jun 25, 2025

[ICCV 2025] SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python 307 8 Updated Dec 29, 2024

Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation.

Python 133 Updated Nov 24, 2025

[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python 180 5 Updated Jun 12, 2024

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 1,032 62 Updated Nov 4, 2025

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 13,606 1,697 Updated Feb 29, 2024