Google Cloud Documentation
  • Vertex AI
  • Generative AI on Vertex AI