- The University of Hong Kong
- Hong Kong
- ttengwang.com
Stars
Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Hierarchical Reasoning Model Official Release
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI-generated visuals created with ChatGPT and Sora, showcasing OpenAI's advanced image generation capabilities.
Streamlining Cartoon Production with Generative Post-Keyframing
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
Environments for LLM Reinforcement Learning
Structured Video Comprehension of Real-World Shorts
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, developed by the Department of Electronic Engineering at Tsinghua University.
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
PyTorch implementation of MeanFlow on ImageNet and CIFAR-10
Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)