- NVIDIA
- Santa Clara, California, USA
- sbyebss.github.io
Stars
RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
[CVPR 2025] Code for Segment Any Motion in Videos
[ECCV 2024 - Oral, Best Paper Award Candidate] SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
GLM-4 series: Open Multilingual Multimodal Chat LMs
Reference PyTorch implementation and models for DINOv3
Wan: Open and Advanced Large-Scale Video Generative Models
Code of π^3: Permutation-Equivariant Visual Geometry Learning
MAGI-1: Autoregressive Video Generation at Scale
A beautiful, simple, clean, and responsive Jekyll theme for academics
An open-source AI agent that brings the power of Gemini directly into your terminal.
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"
🦜🔗 The platform for reliable agents.
Wan: Open and Advanced Large-Scale Video Generative Models
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Enjoy the magic of Diffusion models!
Perceptual video quality assessment based on multi-method fusion.
This repo contains the code for 1D tokenizer and generator
ElasticTok: Adaptive Tokenization for Image and Video