Google Cloud Documentation
Vertex AI > Generative AI on Vertex AI
  • Discover
    • Overview of Generative AI on Vertex AI
    • Generative AI beginner's guide
    • Glossary
  • Get started
    • Get an API key
    • Configure application default credentials
    • API quickstart
    • Vertex AI Studio quickstart
    • Migrate from Google AI Studio to Vertex AI
    • Deploy your Vertex AI Studio prompt as a web application
    • Vertex AI Studio capabilities
    • Get started with Gemini 3
    • Generate an image and verify its watermark using Imagen
    • Google GenAI libraries
    • Compatibility with OpenAI library
    • Vertex AI in express mode
    • Overview
    • Console tutorial
    • API tutorial
  • Select models
    • Model Garden
    • Overview of Model Garden
    • Use models in Model Garden
    • Test model capabilities
    • Supported models
    • Google Models
    • Overview
    • Gemini
      • Migrate to the latest Gemini models
      • Pro
      • Gemini 3 Pro
      • Gemini 3 Pro Image
      • Gemini 2.5 Pro
      • Flash
      • Gemini 2.5 Flash
      • Gemini 2.5 Flash Image
      • Gemini 2.5 Flash Live API
      • Gemini 2.0 Flash
      • Flash-Lite
      • Gemini 2.5 Flash-Lite
      • Gemini 2.0 Flash-Lite
      • Other Gemini models
      • Vertex AI Model Optimizer
    • Imagen
      • Imagen 3
      • Imagen 4
      • Imagen 4.0 upscale Preview
      • Virtual Try-On Preview 08-04
      • Imagen product recontext preview 06-30
    • Veo
      • Veo 2
      • Veo 3
      • Veo 3.1
    • Lyria
      • Lyria 2
    • Model versions
    • Managed models
    • Model as a Service (MaaS) overview
    • Partner models
      • Overview
      • Claude
        • Overview
        • Request predictions
        • Batch predictions
        • Prompt caching
        • Count tokens
        • Web search
        • Safety classifiers
        • Model details
        • Claude Opus 4.5
        • Claude Sonnet 4.5
        • Claude Opus 4.1
        • Claude Haiku 4.5
        • Claude Opus 4
        • Claude Sonnet 4
        • Claude 3.5 Haiku
        • Claude 3 Haiku
      • Mistral AI
        • Overview
        • Model details
        • Mistral Medium 3
        • Mistral OCR (25.05)
        • Mistral Small 3.1 (25.03)
        • Codestral 2
    • Open models
      • Overview
      • Use open models via Model as a Service (MaaS)
      • Grant access to open models
      • Models
      • DeepSeek
        • Overview
        • DeepSeek-V3.2
        • DeepSeek-V3.1
        • DeepSeek-R1-0528
        • DeepSeek-OCR
      • OpenAI
        • Overview
        • OpenAI gpt-oss-120b
        • OpenAI gpt-oss-20b
      • Qwen
        • Overview
        • Qwen 3 Next Instruct 80B
        • Qwen 3 Next Thinking 80B
        • Qwen 3 Coder
        • Qwen 3 235B
      • MiniMax
        • Overview
        • MiniMax M2
      • Kimi
        • Overview
        • Kimi K2 Thinking
      • Embedding (e5)
        • Multilingual E5 Small
        • Multilingual E5 Large
      • Llama
        • Overview
        • Request predictions
        • Model details
        • Llama 4 Maverick
        • Llama 4 Scout
        • Llama 3.3
        • Llama 3.2
        • Llama 3.1 405b
        • Llama 3.1 70b
        • Llama 3.1 8b
      • API
      • Call MaaS APIs for open models
      • Function calling
      • Thinking
      • Structured output
      • Batch prediction
    • Model deprecations (MaaS)
    • Self-deployed models
    • Overview
    • Choose an open model serving option
    • Deploy open models
      • Deploy open models from Model Garden
      • Deploy open models with prebuilt containers
      • Deploy open models with a custom vLLM container
      • Deploy models with custom weights
    • Deploy partner models from Model Garden
    • Google Gemma
      • Use Gemma
      • Tutorial: Deploy and inference Gemma (GPU)
      • Tutorial: Deploy and inference Gemma (TPU)
    • Llama
    • Use Hugging Face Models
    • Hex-LLM
    • Comprehensive guide to vLLM for Text and Multimodal LLM Serving (GPU)
    • vLLM TPU
    • xDiT
    • Tutorial: Deploy Llama 3 models with SpotVM and Reservations
    • Model Garden notebooks
      • Tutorial: Optimize model performance with advanced features in Model Garden
  • Build
    • Agents
    • Vertex AI Agent Builder documentation
    • Prompt design
    • Introduction to prompting
    • Prompting strategies
      • Overview
      • Give clear and specific instructions
      • Use system instructions
      • Include few-shot examples
      • Add contextual information
      • Structure prompts
      • Compare prompts
      • Instruct the model to explain its reasoning
      • Break down complex tasks
      • Experiment with parameter values
      • Prompt iteration strategies
    • Task-specific prompt guidance
      • Design multimodal prompts
      • Design chat prompts
      • Design medical text prompts
    • Capabilities
    • Safety
      • Overview
      • Responsible AI
      • System instructions for safety
      • Configure content filters
      • Gemini for safety filtering and content moderation
      • Abuse monitoring
      • Process blocked responses
      • Content Credentials
    • Text and code generation
      • Text generation
      • System instructions
      • Function calling
      • Structured output
      • Content generation parameters
      • Code execution
      • Medical text
    • Image generation
      • Overview
      • Generate and edit images with Gemini
      • Generate images using text prompts with Imagen
      • Edit images with Imagen
      • Verify an image watermark
      • Configure Imagen parameters
        • Configure Responsible AI safety settings
        • Use prompt rewriter
        • Set text prompt language
        • Configure aspect ratio
        • Set output resolution
        • Omit content using a negative prompt
        • Generate deterministic images
      • Generate images for retail and e-commerce
        • Generate Virtual Try-On images
        • Recontextualize product images
      • Edit images
        • Overview
        • Insert objects into an image using inpaint
        • Remove objects from an image using inpaint
        • Expand the content of an image using outpaint
        • Replace the background of an image
      • Customize images
        • Subject customization
        • Style customization
        • Controlled Customization
        • Instruct Customization
      • Upscale images
      • Prompt and image attribute guide
      • Base64 encode and decode files
      • Responsible AI and usage guidelines for Imagen
    • Video generation
      • Introduction to Veo
      • Generate Veo videos from text prompts
      • Generate Veo videos from an image
      • Generate Veo videos using first and last video frames
      • Extend Veo videos
      • Direct Veo video generation using a reference image
      • Insert objects into Veo videos
      • Remove objects from Veo videos
      • Veo prompt guide
      • Veo best practices
      • Turn off Veo's prompt rewriter
      • Responsible AI for Veo
    • Music generation
      • Generate music using Lyria
      • Lyria prompt guide
    • Media analysis
      • Image understanding
      • Video understanding
      • Audio understanding
      • Document understanding
      • Bounding box detection
    • Grounding
      • Overview
      • Grounding with Google Search
      • Grounding with Google Maps
      • Grounding with Vertex AI Search
      • Grounding with your search API
      • Grounding responses using RAG
      • Grounding with Elasticsearch
      • Web Grounding for Enterprise
    • URL context
    • Thinking
      • Overview
      • Thought signatures
    • Computer Use
    • Live API
      • Overview
      • Get started
        • Get started using the Gen AI SDK
        • Get started using WebSockets
        • Get started using ADK
      • Start and manage live sessions
      • Send audio and video streams
      • Configure language and voice
      • Configure Gemini capabilities
      • Speech-to-speech translation
      • Best practices with Live API
      • Demo apps and resources
    • Embeddings
      • Overview
      • Text embeddings
        • Get text embeddings
        • Choose an embeddings task type
      • Get multimodal embeddings
      • Get batch embeddings inferences
    • Translation
    • Generate speech from text
    • Transcribe speech
    • Development tools
    • Use AI-powered prompt writing tools
      • Overview
      • Optimize prompts
        • Overview
        • Zero-shot optimizer
        • Data-driven optimizer
      • Use prompt templates
    • RAG Engine
      • RAG overview
      • RAG quickstart
      • RAG Engine billing
      • Understanding RagManagedDb
      • Data ingestion
      • Supported models
        • Generative models
        • Embedding models
      • Document parsing
        • Supported documents
        • Fine-tune RAG transformations
        • Use Document AI layout parser
        • Use the LLM parser
      • Vector database choices in RAG
        • Overview of vector database choices
        • Use RagManagedDb with RAG
        • Use Vertex AI Vector Search with RAG
        • Use Feature Store with RAG
        • Use Weaviate with RAG
        • Use Pinecone with RAG
      • Use Vertex AI Search with RAG
      • Reranking for RAG
      • Manage your RAG corpus
      • Use CMEK with RAG
      • RAG quotas
      • Use RAG in Gemini Live API
    • Tokenizer
      • List and count tokens
      • Use the Count Tokens API
    • Multimodal datasets
    • Use Vertex AI Search
    • Model tuning
    • Introduction to tuning
    • Tuning Gemini models
      • Supervised fine-tuning
        • About supervised fine-tuning
        • Prepare your data
        • Use supervised fine-tuning
        • Supported modalities
          • Text tuning
          • Document tuning
          • Image tuning
          • Audio tuning
          • Video tuning
          • Tune function calling
      • Preference tuning
        • About preference tuning