- University of California, Berkeley
- Berkeley, CA
- https://wilson1yan.github.io/
- @wilson1yan
Stars
Wan: Open and Advanced Large-Scale Video Generative Models
Monitor memory usage of Python code
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
ElasticTok: Adaptive Tokenization for Image and Video
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
Large World Model -- Modeling Text and Video with Million-Length Context
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
The official repository of "Video assistant towards large language model makes everything easy"
A framework for few-shot evaluation of language models.
Youtube-8m Videos, Frames and Ids Generator. Extract videos from youtube-8m. Extract frames from youtube-8m.
Video-P2P: Video Editing with Cross-attention Control
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
jax-triton contains integrations between JAX and OpenAI Triton