- Microsoft; University of Washington
- Seattle, WA
- https://sites.google.com/site/kevinlin311tw/
Stars
Web-Bench is a benchmark designed to evaluate the performance of LLMs in actual Web development.
[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
[NeurIPS 2024] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
Transparent Image Layer Diffusion using Latent Transparency
Generative models for conditional audio generation
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
GPT-4V in Wonderland: LMMs as Smartphone Agents
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Examples and guides for using the OpenAI API
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
[ACM MM 2023] Official implementation of paper "Language-guided Human Motion Synthesis with Atomic Actions".
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
🎥 Python and OpenCV-based scene cut/transition detection program & library.
An unofficial Python package that returns responses from Google Bard using a cookie value.
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
A PyTorch implementation of EmpiricalMVM
PyTorch implementation of OpenPose, including hand and body pose estimation.
[NeurIPS 2023] Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"
[CVPR2024] DisCo: Referring Human Dance Generation in Real World
PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Universal LLM Deployment Engine with ML Compilation