Generative AI on Vertex AI
Discover
Overview of Generative AI on Vertex AI
Generative AI beginner's guide
Glossary
Get started
Get an API key
Configure application default credentials
API quickstart
Vertex AI Studio quickstart
Migrate from Google AI Studio to Vertex AI
Deploy your Vertex AI Studio prompt as a web application
Vertex AI Studio capabilities
Generate an image and verify its watermark using Imagen
Google GenAI libraries
Compatibility with OpenAI library
Vertex AI in express mode
Overview
Console tutorial
API tutorial
Select models
Model Garden
Overview of Model Garden
Use models in Model Garden
Test model capabilities
Supported models
Google models
Overview
Gemini
Gemini 2.5 Flash
Gemini 2.5 Pro
Gemini 2.5 Flash-Lite
Gemini 2.0 Flash
Gemini 2.0 Flash-Lite
Vertex AI Model Optimizer
Migrate to the latest Gemini models
SDKs
Imagen
Imagen 3.0 Generate 002
Imagen 3.0 Generate 001
Imagen 3.0 Fast Generate 001
Imagen 3.0 Capability 001
Imagen 4.0 Generate
Imagen 4.0 Fast Generate
Imagen 4.0 Ultra Generate
Virtual Try-On Preview 08-04
Imagen product recontext preview 06-30
Migrate to Imagen 3
Veo
Veo 2
Veo 2 Experimental
Veo 3
Veo 3 Fast
Veo 3 preview
Veo 3 Fast preview
Model versions
Managed models
Model as a Service (MaaS) overview
Partner models
Claude
Overview
Request predictions
Batch predictions
Prompt caching
Count tokens
Model details
Claude Opus 4.1
Claude Opus 4
Claude Sonnet 4
Claude 3.7 Sonnet
Claude 3.5 Haiku
Claude 3 Haiku
Mistral AI
Overview
Model details
Mistral OCR (25.05)
Mistral Small 3.1 (25.03)
Mistral Large (24.11)
Codestral (25.01)
Open models
DeepSeek
Overview
DeepSeek-R1-0528
DeepSeek-V3.1
OpenAI
Overview
OpenAI gpt-oss-120b
OpenAI gpt-oss-20b
Qwen
Overview
Qwen 3 Coder
Qwen 3 235B
Llama
Overview
Request predictions
Batch predictions
Model details
Llama 4 Maverick
Llama 4 Scout
Llama 3.3
Llama 3.2
Llama 3.1 405B
Llama 3.1 70B
Llama 3.1 8B
Model capabilities
Function calling
Structured output
Model deprecations (MaaS)
Self-deployed models
Overview
Deploy models with custom weights
Google Gemma
Use Gemma
Tutorial: Deploy and inference Gemma (GPU)
Tutorial: Deploy and inference Gemma (TPU)
Llama
Use Hugging Face Models
Hex-LLM
Comprehensive guide to vLLM for text and multimodal LLM serving (GPU)
xDiT
Tutorial: Deploy Llama 3 models with Spot VMs and reservations
Model Garden notebooks
Tutorial: Optimize model performance with advanced features in Model Garden
Build
Agents
Overview
Agent Development Kit
Overview
Quickstart
Deploy to Agent Engine
Agent Engine
Overview
Runtime
Quickstart
Set up the environment
Develop an agent
Overview
Agent Development Kit
LangChain
LangGraph
AG2
LlamaIndex
Custom
Deploy an agent
Use an agent
Overview
Agent Development Kit
LangChain
LangGraph
AG2
LlamaIndex
Manage deployed agents
Overview
Access control
Tracing
Logging
Monitoring
Use a Private Service Connect interface
Evaluate an agent
Sessions
Sessions overview
Manage sessions using Agent Development Kit
Manage sessions using API calls
Memory Bank
Overview
Set up Memory Bank
Quickstarts
Quickstart with Agent Engine SDK
Quickstart with Agent Development Kit
Generate memories
Fetch memories
Example Store
Example Store overview
Example Store quickstart
Create or reuse an Example Store instance
Upload examples
Retrieve examples
Getting help
Troubleshoot setting up the environment
Troubleshoot developing an agent
Troubleshoot deploying an agent
Troubleshoot using an agent
Troubleshoot managing deployed agents
Get support
Agent2Agent (A2A) Protocol
Overview
A2A Python SDK
A2A JavaScript SDK
A2A Java SDK
A2A C#/.NET SDK
A2A samples
Agent Tools
Built-in tools
Google Cloud tools
Model Context Protocol (MCP) tools
MCP Toolbox for Databases
Ecosystem tools
Prompt design
Introduction to prompting
Prompting strategies
Overview
Give clear and specific instructions
Use system instructions
Include few-shot examples
Add contextual information
Structure prompts
Compare prompts
Instruct the model to explain its reasoning
Break down complex tasks
Experiment with parameter values
Prompt iteration strategies
Task-specific prompt guidance
Design multimodal prompts
Design chat prompts
Design medical text prompts
Capabilities
Safety
Overview
Responsible AI
System instructions for safety
Configure content filters
Gemini for safety filtering and content moderation
Abuse monitoring
Process blocked responses
Text and code generation
Text generation
System instructions
Function calling
Structured output
Content generation parameters
Code execution
Medical text
Image generation
Gemini
Generate images with Gemini
Edit images with Gemini
Imagen
Imagen overview
Generate images using text prompts
Verify an image watermark
Configure Imagen parameters
Configure Responsible AI safety settings
Use prompt rewriter
Set text prompt language
Configure aspect ratio
Set output resolution
Omit content using a negative prompt
Generate deterministic images
Generate images for retail and e-commerce
Generate Virtual Try-On images
Recontextualize product images
Edit images
Overview
Insert objects into an image using inpaint
Remove objects from an image using inpaint
Expand the content of an image using outpaint
Replace the background of an image
Edit using Personalization
Edit images using text prompts
Customize images
Subject customization
Style customization
Controlled customization
Instruct customization
Upscale an image
Prompt and image attribute guide
Base64 encode and decode files
Responsible AI and usage guidelines for Imagen
Legacy features
Migrate to Imagen 3
Get image descriptions using visual captioning
Use Visual Question Answering
Get video descriptions using Imagen
Video generation
Introduction to Veo
Generate Veo videos from text prompts
Generate Veo videos from an image
Generate Veo videos using first and last video frames
Direct Veo video generation using a reference image
Extend Veo videos
Veo prompt guide
Turn off Veo's prompt rewriter
Responsible AI for Veo
Music generation
Generate music using Lyria
Lyria prompt guide
Media analysis
Image understanding
Video understanding
Audio understanding
Document understanding
Bounding box detection
Grounding
Overview
Grounding with Google Search
Grounding with Google Maps
Grounding with Vertex AI Search
Grounding with your search API
Grounding responses using RAG
Grounding with Elasticsearch
Web Grounding for Enterprise
URL context
Thinking
Live API
Live API overview
Interactive conversations
Built-in tools
Embeddings
Overview
Text embeddings
Get text embeddings
Choose an embeddings task type
Get multimodal embeddings
Get batch embeddings predictions
Translation
Generate speech from text
Transcribe speech
Development tools
Use AI-powered prompt writing tools
Overview
Optimize prompts
Overview
Zero-shot optimizer
Data-driven optimizer
Use prompt templates
RAG Engine
RAG overview
RAG quickstart for Python
RAG Engine billing
Understanding RagManagedDb
Data ingestion
Supported models
Generative models
Embedding models
Document parsing
Supported documents
Fine-tune RAG transformations
Use Document AI layout parser
Use the LLM parser
Vector database choices in RAG
Overview of vector database choices
Use RagManagedDb with RAG
Use Vertex AI Vector Search with RAG
Use Feature Store with RAG
Use Weaviate with RAG
Use Pinecone with RAG
Use Vertex AI Search with RAG
Reranking for RAG
Manage your RAG corpus
Use CMEK with RAG
RAG quotas
Use RAG in Gemini Live API
Tokenizer
List and count tokens
Use the Count Tokens API
Multimodal datasets
Use Vertex AI Search
Model tuning
Introduction to tuning
Gemini models
About supervised fine-tuning
Prepare your data
Use supervised fine-tuning
Use tuning checkpoints
Supported modalities
Text tuning
Document tuning
Image tuning
Audio tuning
Video tuning
Tune function calling
Open models
Embeddings models
Tune text embeddings models
Imagen models
Tune a subject model
Create a custom style model
Translation models
About supervised fine-tuning
Prepare your data
Use supervised fine-tuning
Tuning recommendations with LoRA and QLoRA
Migrate
Call Vertex AI models using OpenAI libraries
Overview
Authenticate
Examples
Evaluate
Overview
Tutorial: Perform evaluation using the console
Perform evaluation using the GenAI Client in Vertex AI SDK
Tutorial: Evaluate models using the GenAI Client in Vertex AI SDK
Define your evaluation metrics
Define your evaluation metrics
Details for managed rubric-based metrics
Prepare your evaluation dataset
Run an evaluation
View and interpret evaluation results
Alternative evaluation methods
Evaluate using the evaluation module in Vertex AI SDK
Tutorial: Perform evaluation using the evaluation module in Vertex AI SDK
Define your evaluation metrics
Prepare your evaluation dataset
Run an evaluation
Interpret evaluation results
Templates for model-based metrics
Evaluate agents
Evaluate a judge model
Configure a judge model
Run AutoSxS pipeline
Run a computation-based evaluation pipeline
Deploy
Overview
Optimize cost, latency, and performance
Deployment best practices
Cache reused prompt context
Overview
Create a context cache
Use a context cache
Get context cache information
Update a context cache
Delete a context cache
Context cache for fine-tuned Gemini models
Batch prediction
Overview
Create batch job from Cloud Storage
Create batch job from BigQuery
Provisioned Throughput
Provisioned Throughput overview
Supported models
Calculate Provisioned Throughput requirements
Provisioned Throughput for Live API
Single Zone Provisioned Throughput
Purchase Provisioned Throughput
Use Provisioned Throughput
Troubleshooting error code 429
Pay-as-you-go
Quotas and system limits
Dynamic shared quota
Administer
Access control
Networking
Security controls
Control access to Model Garden models
Enable Data Access audit logs
Monitor models
Monitor cost using custom metadata labels
Request-response logging
Secure a gen AI app by using IAP
Overview
Set up your project and source repository
Create a Cloud Run service
Create a load balancer
Configure IAP
Test your IAP-secured app
Clean up your project