Google Cloud Documentation
Vertex AI > Generative AI on Vertex AI
  • Discover
    • Overview of Generative AI on Vertex AI
    • Generative AI beginner's guide
    • Glossary
  • Get started
    • Get an API key
    • Configure application default credentials
    • API quickstart
    • Vertex AI Studio quickstart
    • Migrate from Google AI Studio to Vertex AI
    • Deploy your Vertex AI Studio prompt as a web application
    • Vertex AI Studio capabilities
    • Get started with Gemini 3
    • Generate an image and verify its watermark using Imagen
    • Google GenAI libraries
    • Compatibility with OpenAI library
    • Vertex AI in express mode
    • Overview
    • Console tutorial
    • API tutorial
  • Select models
    • Model Garden
    • Overview of Model Garden
    • Use models in Model Garden
    • Test model capabilities
    • Supported models
    • Google Models
    • Overview
    • Gemini
      • Migrate to the latest Gemini models
      • Pro
      • Gemini 3 Pro
      • Gemini 3 Pro Image
      • Gemini 2.5 Pro
      • Flash
      • Gemini 2.5 Flash
      • Gemini 2.5 Flash Image
      • Gemini 2.5 Flash Live API
      • Gemini 2.0 Flash
      • Flash-Lite
      • Gemini 2.5 Flash-Lite
      • Gemini 2.0 Flash-Lite
      • Other Gemini models
      • Vertex AI Model Optimizer
    • Imagen
      • Imagen 3
      • Imagen 4
      • Imagen 4.0 upscale Preview
      • Virtual Try-On Preview 08-04
      • Imagen product recontext preview 06-30
    • Veo
      • Veo 2
      • Veo 3
      • Veo 3.1
    • Lyria
      • Lyria 2
    • Model versions
    • Managed models
    • Model as a Service (MaaS) overview
    • Partner models
      • Overview
      • Claude
        • Overview
        • Request predictions
        • Batch predictions
        • Prompt caching
        • Count tokens
        • Web search
        • Safety classifiers
        • Model details
        • Claude Opus 4.5
        • Claude Sonnet 4.5
        • Claude Opus 4.1
        • Claude Haiku 4.5
        • Claude Opus 4
        • Claude Sonnet 4
        • Claude 3.5 Haiku
        • Claude 3 Haiku
      • Mistral AI
        • Overview
        • Model details
        • Mistral Medium 3
        • Mistral OCR (25.05)
        • Mistral Small 3.1 (25.03)
        • Codestral 2
    • Open models
      • Overview
      • Use open models via Model as a Service (MaaS)
      • Grant access to open models
      • Models
      • DeepSeek
        • Overview
        • DeepSeek-V3.2
        • DeepSeek-V3.1
        • DeepSeek-R1-0528
        • DeepSeek-OCR
      • OpenAI
        • Overview
        • OpenAI gpt-oss-120b
        • OpenAI gpt-oss-20b
      • Qwen
        • Overview
        • Qwen 3 Next Instruct 80B
        • Qwen 3 Next Thinking 80B
        • Qwen 3 Coder
        • Qwen 3 235B
      • MiniMax
        • Overview
        • MiniMax M2
      • Kimi
        • Overview
        • Kimi K2 Thinking
      • Embedding (e5)
        • Multilingual E5 Small
        • Multilingual E5 Large
      • Llama
        • Overview
        • Request predictions
        • Model details
        • Llama 4 Maverick
        • Llama 4 Scout
        • Llama 3.3
        • Llama 3.2
        • Llama 3.1 405b
        • Llama 3.1 70b
        • Llama 3.1 8b
      • API
      • Call MaaS APIs for open models
      • Function calling
      • Thinking
      • Structured output
      • Batch prediction
    • Model deprecations (MaaS)
    • Self-deployed models
    • Overview
    • Choose an open model serving option
    • Deploy open models
      • Deploy open models from Model Garden
      • Deploy open models with prebuilt containers
      • Deploy open models with a custom vLLM container
      • Deploy models with custom weights
    • Deploy partner models from Model Garden
    • Google Gemma
      • Use Gemma
      • Tutorial: Deploy and inference Gemma (GPU)
      • Tutorial: Deploy and inference Gemma (TPU)
    • Llama
    • Use Hugging Face Models
    • Hex-LLM
    • Comprehensive guide to vLLM for Text and Multimodal LLM Serving (GPU)
    • vLLM TPU
    • xDiT
    • Tutorial: Deploy Llama 3 models with SpotVM and Reservations
    • Model Garden notebooks
      • Tutorial: Optimize model performance with advanced features in Model Garden
  • Build
    • Agents
    • Vertex AI Agent Builder documentation
    • Prompt design
    • Introduction to prompting
    • Prompting strategies
      • Overview
      • Give clear and specific instructions
      • Use system instructions
      • Include few-shot examples
      • Add contextual information
      • Structure prompts
      • Compare prompts
      • Instruct the model to explain its reasoning
      • Break down complex tasks
      • Experiment with parameter values
      • Prompt iteration strategies
    • Task-specific prompt guidance
      • Design multimodal prompts
      • Design chat prompts
      • Design medical text prompts
    • Capabilities
    • Safety
      • Overview
      • Responsible AI
      • System instructions for safety
      • Configure content filters
      • Gemini for safety filtering and content moderation
      • Abuse monitoring
      • Process blocked responses
      • Content Credentials
    • Text and code generation
      • Text generation
      • System instructions
      • Function calling
      • Structured output
      • Content generation parameters
      • Code execution
      • Medical text
    • Image generation
      • Overview
      • Generate and edit images with Gemini
      • Generate images using text prompts with Imagen
      • Edit images with Imagen
      • Verify an image watermark
      • Configure Imagen parameters
        • Configure Responsible AI safety settings
        • Use prompt rewriter
        • Set text prompt language
        • Configure aspect ratio
        • Set output resolution
        • Omit content using a negative prompt
        • Generate deterministic images
      • Generate images for retail and e-commerce
        • Generate Virtual Try-On images
        • Recontextualize product images
      • Edit images
        • Overview
        • Insert objects into an image using inpaint
        • Remove objects from an image using inpaint
        • Expand the content of an image using outpaint
        • Replace the background of an image
      • Customize images
        • Subject customization
        • Style customization
        • Controlled Customization
        • Instruct Customization
      • Upscale images
      • Prompt and image attribute guide
      • Base64 encode and decode files
      • Responsible AI and usage guidelines for Imagen
    • Video generation
      • Introduction to Veo
      • Generate Veo videos from text prompts
      • Generate Veo videos from an image
      • Generate Veo videos using first and last video frames
      • Extend Veo videos
      • Direct Veo video generation using a reference image
      • Insert objects into Veo videos
      • Remove objects from Veo videos
      • Veo prompt guide
      • Veo best practices
      • Turn off Veo's prompt rewriter
      • Responsible AI for Veo
    • Music generation
      • Generate music using Lyria
      • Lyria prompt guide
    • Media analysis
      • Image understanding
      • Video understanding
      • Audio understanding
      • Document understanding
      • Bounding box detection
    • Grounding
      • Overview
      • Grounding with Google Search
      • Grounding with Google Maps
      • Grounding with Vertex AI Search
      • Grounding with your search API
      • Grounding responses using RAG
      • Grounding with Elasticsearch
      • Web Grounding for Enterprise
    • URL context
    • Thinking
      • Overview
      • Thought signatures
    • Computer Use
    • Live API
      • Overview
      • Get started
        • Get started using the Gen AI SDK
        • Get started using WebSockets
        • Get started using ADK
      • Start and manage live sessions
      • Send audio and video streams
      • Configure language and voice
      • Configure Gemini capabilities
      • Speech-to-speech translation
      • Best practices with Live API
      • Demo apps and resources
    • Embeddings
      • Overview
      • Text embeddings
        • Get text embeddings
        • Choose an embeddings task type
      • Get multimodal embeddings
      • Get batch embeddings inferences
    • Translation
    • Generate speech from text
    • Transcribe speech
    • Development tools
    • Use AI-powered prompt writing tools
      • Overview
      • Optimize prompts
        • Overview
        • Zero-shot optimizer
        • Data-driven optimizer
      • Use prompt templates
    • RAG Engine
      • RAG overview
      • RAG quickstart
      • RAG Engine billing
      • Understanding RagManagedDb
      • Data ingestion
      • Supported models
        • Generative models
        • Embedding models
      • Document parsing
        • Supported documents
        • Fine-tune RAG transformations
        • Use Document AI layout parser
        • Use the LLM parser
      • Vector database choices in RAG
        • Overview of vector database choices
        • Use RagManagedDb with RAG
        • Use Vertex AI Vector Search with RAG
        • Use Feature Store with RAG
        • Use Weaviate with RAG
        • Use Pinecone with RAG
      • Use Vertex AI Search with RAG
      • Reranking for RAG
      • Manage your RAG corpus
      • Use CMEK with RAG
      • RAG quotas
      • Use RAG in Gemini Live API
    • Tokenizer
      • List and count tokens
      • Use the Count Tokens API
    • Multimodal datasets
    • Use Vertex AI Search
    • Model tuning
    • Introduction to tuning
    • Tuning Gemini models
      • Supervised fine-tuning
        • About supervised fine-tuning
        • Prepare your data
        • Use supervised fine-tuning
        • Supported modalities
          • Text tuning
          • Document tuning
          • Image tuning
          • Audio tuning
          • Video tuning
          • Tune function calling
      • Preference tuning
        • About preference tuning