
Native Multimodal Models are World Learners

Python 1,295 46 Updated Nov 28, 2025

Official implementation of "OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes".

Python 80 2 Updated Nov 3, 2025

Official repository for "AM-RADIO: Reduce All Domains Into One"

Python 1,403 51 Updated Nov 27, 2025

Official code of RDT 2

Python 590 26 Updated Oct 11, 2025

Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.

Python 83 4 Updated Nov 26, 2025

Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

Python 41 3 Updated Nov 4, 2025

[SIGGRAPH Asia 2025] OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

Python 150 8 Updated Nov 6, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,590 615 Updated Nov 20, 2025

Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"

Jupyter Notebook 124 2 Updated Oct 17, 2025

Implementation of "Hyperspherical Latents Improve Continuous-Token Autoregressive"

Python 80 6 Updated Nov 15, 2025

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

167 7 Updated Oct 5, 2025

Lumos Project: Frontier video unified model research by Alibaba DAMO Academy.

Python 144 3 Updated Jul 17, 2025

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

Python 317 8 Updated Jul 9, 2024

🧑‍🚀 Summary of the world's best LLM resources (speech and video generation, agents, coding assistance, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).

6,839 650 Updated Nov 29, 2025

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)

Python 555 31 Updated Nov 24, 2025

GPT chat with emotional expressions.

Python 781 58 Updated Nov 27, 2025

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 897 50 Updated Jul 10, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,776 1,372 Updated Nov 28, 2025

T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Jupyter Notebook 32 3 Updated Sep 16, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Python 1,074 53 Updated Nov 3, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,463 117 Updated Nov 21, 2025

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

Python 25 2 Updated May 27, 2025

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,185 1,219 Updated Nov 4, 2025

Official PyTorch implementation of One-Minute Video Generation with Test-Time Training

Python 2,307 191 Updated Jun 5, 2025

Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"

Python 319 17 Updated Nov 2, 2025
Python 79 2 Updated Jun 23, 2025

[ICCV 2025] Official implementation of the paper "DreamCube: 3D Panorama Generation via Multi-plane Synchronization".

Python 158 11 Updated Nov 5, 2025
Python 18 1 Updated Jun 5, 2025