NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.
Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
| | Predict | Transfer | Reason |
|---|---|---|---|
| Type | World Generation | Multi-ControlNet | Reasoning VLM |
| Function | Predict novel future frames given initial frames | Transfer existing control frames into photoreal frames within a video clip | Reason against frames within a video clip |
| Use Cases | Data Generation & Policy Evaluation | Data Augmentation | Data Curation |
| Inputs | Text, Image, Video | Multiple video modalities such as RGB, depth, segmentation, and more | Video & Text |
| Outputs | Video | Video | Text |
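
As a rough illustration of how the table above can guide model selection, the following sketch encodes each row as a small Python record and looks up model types by use case or output modality. This is not part of the Cosmos API: `WorldModelType`, `MODEL_TYPES`, `models_for_use_case`, and `models_producing` are hypothetical names introduced here, and the field values are simply transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldModelType:
    """One column of the comparison table (hypothetical helper, not a Cosmos class)."""
    name: str
    kind: str
    inputs: tuple[str, ...]
    output: str
    use_cases: tuple[str, ...]


# Values transcribed from the table; lowercased for simple matching.
MODEL_TYPES = (
    WorldModelType(
        name="cosmos-predict",
        kind="World Generation",
        inputs=("text", "image", "video"),
        output="video",
        use_cases=("data generation", "policy evaluation"),
    ),
    WorldModelType(
        name="cosmos-transfer",
        kind="Multi-ControlNet",
        inputs=("rgb video", "depth video", "segmentation video"),
        output="video",
        use_cases=("data augmentation",),
    ),
    WorldModelType(
        name="cosmos-reason",
        kind="Reasoning VLM",
        inputs=("video", "text"),
        output="text",
        use_cases=("data curation",),
    ),
)


def models_for_use_case(use_case: str) -> list[str]:
    """Return the names of model types whose table row lists this use case."""
    wanted = use_case.strip().lower()
    return [m.name for m in MODEL_TYPES if wanted in m.use_cases]


def models_producing(output: str) -> list[str]:
    """Return the names of model types that emit the given output modality."""
    wanted = output.strip().lower()
    return [m.name for m in MODEL_TYPES if m.output == wanted]


if __name__ == "__main__":
    print(models_for_use_case("Data Curation"))
    print(models_producing("video"))
```

A lookup like this only mirrors the overview table; the actual inference and post-training entry points live in each model's own repository.
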
Our world foundation models are purpose-built to accelerate performance improvements in downstream model tasks at various stages, as illustrated here in the flywheel.