Vertex AI
Generative AI on Vertex AI