The University of Hong Kong
- https://silentview.github.io/
- in/tianwei-xiong-633a70266
Stars
Native Multimodal Models are World Learners
Official implementation of "OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes".
Official repository for "AM-RADIO: Reduce All Domains Into One"
Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
[SIGGRAPH Asia 2025] OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
Reference PyTorch implementation and models for DINOv3
Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"
Implementation of "Hyperspherical Latents Improve Continuous-Token Autoregressive"
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Lumos Project: Frontier video unified model research by Alibaba DAMO Academy.
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
🧑‍🚀 Summary of the world's best LLM resources (speech and video generation, agents, coding assistance, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
A very simple GRPO implementation for reproducing R1-like LLM thinking.
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
[ICCV 2025] Official implementation of the paper "DreamCube: 3D Panorama Generation via Multi-plane Synchronization".