Google Kubernetes Engine (GKE)
  • Discover
  • Introducing GKE
  • Explore GKE documentation
  • Use GKE or Cloud Run?
  • Try it
    • Create a cluster in the console
    • Create a cluster with Terraform
    • Explore your cluster
  • Fine-tune GKE services with Gemini assistance
  • Learn fundamentals
  • Start learning about GKE
  • Learn Kubernetes fundamentals
    • Start learning about Kubernetes
    • Introducing containers
    • Kubernetes comic
    • Kubernetes.io
    • Video playlist: Learn Kubernetes with Google
  • Learn GKE essentials
    • GKE modes of operation
    • Video playlist: GKE Essentials
  • Common GKE user roles and tasks
  • Get started
  • Cluster lifecycle
  • Cluster administration overview
  • Cluster configuration
  • Deploying workloads
  • GKE cluster architecture
  • Workflows and tools
    • gcloud CLI overview
    • GKE in the Google Cloud console
    • Provision GKE resources with Terraform
    • Install kubectl and configure cluster access
    • Simplify deployment using your IDE
  • Learning path: Containerize your app
    • Overview
    • Understand the monolith
    • Modularize the monolith
    • Prepare for containerization
    • Containerize the modular app
    • Deploy the app to a cluster
  • Learning path: Scalable apps
    • Overview
    • Create a cluster
    • Monitor with Prometheus
    • Scale workloads
    • Simulate failure
    • Centralize changes
    • Production considerations
  • Design and plan
  • Code samples
  • Architectures and best practices
    • Develop and deliver apps with Cloud Code, Cloud Build, and Google Cloud Deploy
    • Address continuous delivery challenges
  • Set up GKE clusters
  • Plan clusters for running your workloads
    • Compare features in GKE Autopilot and Standard
    • About regional clusters
    • About feature gates
    • About alpha clusters
  • Set up Autopilot clusters
    • About GKE Autopilot
    • Create Autopilot clusters
    • Extend the run time of Autopilot Pods
  • Set up Standard clusters
    • Create a zonal cluster
    • Create a regional cluster
    • Create an alpha cluster
    • Create a cluster using Windows node pools
  • Prepare to use clusters
    • Use labels to organize clusters
    • Manage GKE resources using Tags
  • Configure node pools
    • About node pools
    • Add and manage node pools
    • About node images
    • About containerd images
    • Specify a node image
    • About Arm workloads on GKE
    • Create Standard clusters and node pools with Arm nodes
    • Plan GKE Standard node sizes
    • About Spot VMs
    • About Windows Server containers
    • Auto-repair nodes
    • Automatically bootstrap GKE nodes with DaemonSets
    • Update Kubernetes node labels and taints for node pools
  • Set up clusters for multi-tenancy
    • About cluster multi-tenancy
    • Plan a multi-tenant environment
    • Prepare GKE clusters for third-party tenants
    • Set up multi-tenant logging
  • Use fleets to simplify multi-cluster management
    • About fleets
    • Create fleets
  • Set up service mesh
    • Provision Cloud Service Mesh in an Autopilot cluster
  • Enhance scalability for clusters
    • About GKE scalability
    • Plan for scalability
    • Plan for large GKE clusters
    • Plan for large workloads
    • Provision extra compute capacity for rapid Pod scaling
    • Consume reserved zonal resources
    • About quicker workload startup with fast-starting nodes
  • Reduce and optimize costs
  • Plan for cost-optimization
  • View GKE costs
    • View cluster costs breakdown
    • View cost-related optimization metrics
  • Optimize GKE costs
    • Right-size your GKE workloads at scale
    • Reduce costs by scaling down GKE clusters during off-peak hours
    • Identify underprovisioned and overprovisioned GKE clusters
    • Identify idle GKE clusters
  • Configure autoscaling for infrastructure
    • About cluster autoscaling
    • Configure cluster autoscaling
    • About node auto-provisioning
    • Configure node auto-provisioning
    • View cluster autoscaling events
  • Configure autoscaling for workloads
    • Scaling deployed applications
    • About autoscaling workloads based on metrics
    • Optimize Pod autoscaling based on metrics
    • About horizontal Pod autoscaling
    • Autoscale deployments using horizontal Pod autoscaling
    • Configure autoscaling for LLM workloads on GPUs
    • Configure autoscaling for LLM workloads on TPUs
    • View horizontal Pod autoscaler events
    • Scale to zero using KEDA
    • About vertical Pod autoscaling
    • Configure multidimensional Pod autoscaling
    • Scale container resource requests and limits
  • Provision storage
  • About storage for GKE clusters
  • Use Kubernetes features, primitives, and abstractions for storage
    • Use persistent volumes and dynamic provisioning
    • Use StatefulSets
    • About volume snapshots
    • Use volume expansion
    • Populate volumes with data from Cloud Storage
      • About the GKE Volume Populator
      • Automate data transfer to Parallelstore
      • Automate data transfer to Hyperdisk ML
  • Block storage
    • Provision and use Persistent Disks
      • Using the Compute Engine Persistent Disk CSI driver
      • Persistent volume attach limits
      • Using pre-existing persistent disks
      • Manually install a CSI driver
      • Using persistent disks with multiple readers (ReadOnlyMany)
      • Persistent disks backed by SSD
      • Regional persistent disks
      • Increase stateful app availability with Stateful HA Operator
    • Provision and use Hyperdisk
      • About Hyperdisk
      • Scale your storage performance using Hyperdisk
      • Optimize storage performance and cost with Hyperdisk Storage Pools
      • Accelerate AI/ML data loading using Hyperdisk ML
    • Provision and use GKE Data Cache
      • Accelerate read performance of stateful workloads with GKE Data Cache
    • Manage your persistent storage
      • Configure a boot disk for node file systems
      • Clone persistent disks
      • Back up and restore Persistent Disk storage using volume snapshots
    • Optimize disk performance
      • About optimizing disk performance
      • Monitor disk performance
  • Local SSD and ephemeral storage
    • About Local SSD storage for GKE
    • Provision Local SSD-backed ephemeral storage
    • Provision Local SSD-backed raw block storage
    • Create a Deployment using an emptyDir volume
    • Use dedicated Persistent Disks as ephemeral volumes
  • File storage
    • Provision and use Filestore
      • About Filestore support for GKE
      • Access Filestore instances
      • Deploy a stateful workload with Filestore
      • About Filestore multishares for GKE
      • Optimize multishares for GKE
      • Back up and restore Filestore storage using volume snapshots
    • Provision and use Parallelstore volumes
      • About Parallelstore for GKE
      • Create and use a volume backed by Parallelstore
      • Access existing Parallelstore instances
    • Provision and use Lustre volumes
      • About Lustre for GKE
      • Create and use a volume backed by Lustre
      • Access existing Lustre instances
  • Object storage
    • Quickstart: Cloud Storage FUSE CSI driver for GKE
    • About the Cloud Storage FUSE CSI driver for GKE
    • Set up the Cloud Storage FUSE CSI driver
    • Mount Cloud Storage buckets as ephemeral volumes
    • Mount Cloud Storage buckets as persistent volumes
    • Configure the Cloud Storage FUSE CSI driver sidecar container
    • Optimize Cloud Storage FUSE CSI driver performance
  • Deploy and manage workloads
  • Deploy Autopilot workloads
    • Plan resource requests for Autopilot workloads
    • About Autopilot workloads in GKE Standard
    • Run Autopilot workloads in Standard clusters
  • Configure node attributes with ComputeClasses
    • About GKE ComputeClasses
    • About built-in ComputeClasses in GKE
    • About custom ComputeClasses
    • Control autoscaled node attributes with custom compute classes
    • Apply compute classes to Pods by default
    • About Balanced and Scale-Out compute classes in Autopilot clusters
    • Choose predefined compute classes for Autopilot Pods
  • Deploy workloads on optimized hardware
    • Minimum CPU platforms for compute-intensive workloads
    • Configure Pod bursting in GKE
    • Analyze CPU performance using the PMU
  • Deploy workloads that have special security requirements
    • GKE Autopilot partners
    • Run privileged workloads from GKE Autopilot partners
    • Run privileged open source workloads on GKE Autopilot
  • Deploy workloads that require specialized devices
    • About dynamic resource allocation (DRA) in GKE
    • Prepare your GKE infrastructure for DRA
    • Deploy DRA workloads
  • Migrate workloads
    • Identify Standard clusters to migrate to Autopilot
    • Prepare to migrate to Autopilot clusters from Standard clusters
  • Manage workloads
    • Place GKE Pods in specific zones
    • Simulate zone failure
    • Improve workload efficiency using NCCL Fast Socket
    • About container image digests
    • Using container image digests in Kubernetes manifests
    • Improve workload initialization speed
      • Use streaming container images
      • Use secondary boot disks to preload data or container images
    • Isolate your workloads using namespaces
  • Continuous integration and delivery
    • Plan for continuous integration and delivery
    • Create a CI/CD pipeline with Azure Pipelines
    • GitOps-style continuous delivery with Cloud Build
    • Modern CI/CD with GKE
      • A software delivery framework
      • Build a CI/CD system
      • Apply the developer workflow
  • Deploy databases, caches, and data streaming workloads
  • Data on GKE
  • Plan your database deployments on GKE
  • Managed databases
    • Deploy an app using GKE Autopilot and Spanner
    • Deploy WordPress on GKE with Persistent Disk and Cloud SQL
    • Analyze data on GKE using BigQuery, Cloud Run, and Gemma
  • Kafka
    • Deploy Apache Kafka to GKE using Strimzi
    • Deploy Apache Kafka to GKE using Confluent
    • Deploy a highly available Kafka cluster on GKE
  • Redis
    • Create a multi-tier web application with Redis and PHP
    • Deploy a Redis cluster on GKE
    • Deploy Redis to GKE using Redis Enterprise
  • MySQL
    • Deploy a stateful MySQL cluster
    • Migrate your MySQL data from Persistent Disk to Hyperdisk
  • PostgreSQL
    • Deploy a highly available PostgreSQL database
    • Deploy PostgreSQL to GKE using Zalando
    • Deploy PostgreSQL to GKE using CloudNativePG
  • SQL Server
    • Deploy single-instance SQL Server 2017 on GKE
  • Memcached
    • Deploy Memcached on GKE
  • Vector databases
    • Build a RAG chatbot using GKE and Cloud Storage
    • Deploy a Qdrant database on GKE
    • Deploy an Elasticsearch database on GKE
    • Deploy a PostgreSQL vector database on GKE
    • Deploy a Weaviate vector database on GKE
  • Deploy AI/ML workloads
  • AI/ML orchestration on GKE
  • Run ML and AI workloads
    • About accelerator consumption options for AI workloads
    • GPUs
      • About GPUs in GKE
      • Deploy GPU workloads in GKE Autopilot
      • Deploy GPU workloads in GKE Standard
      • Encrypt GPU workload data in-use
      • Manage the GPU Stack with the NVIDIA GPU Operator
      • GPU sharing
        • About GPU sharing strategies in GKE
        • Use multi-instance GPU
        • Use GPU time-sharing
        • Use NVIDIA MPS
      • Best practices for autoscaling LLM inference workloads on GPUs
      • Best practices for optimizing LLM inference performance on GPUs
    • TPUs in GKE
      • About TPUs in GKE
      • Plan TPUs in GKE
      • Request TPUs
        • About accelerator consumption options for AI workloads
        • Request TPUs with future reservations in calendar mode
        • Run a small batch workload with flex-start provisioning mode for TPUs
      • Deploy TPU workloads in GKE Autopilot
      • Deploy TPU workloads in GKE Standard
      • Deploy TPU Multislices in GKE
      • Orchestrate TPU Multislice workloads using JobSet and Kueue
      • Best practices for autoscaling LLM inference workloads on TPUs
    • Manage GKE node disruption for GPUs and TPUs
    • CPU-based workloads
      • Optimize Autopilot Pod performance by choosing a machine series
    • Optimize GPU and TPU provisioning
      • About GPU and TPU provisioning with flex-start
      • Run a large-scale workload with flex-start with queued provisioning
      • Run a small batch workload with GPUs and flex-start provisioning mode
      • Run a small batch workload with TPUs and flex-start provisioning mode
  • Training
    • Train a model with GPUs on GKE Standard mode
    • Train a model with GPUs on GKE Autopilot mode
    • Train Llama2 with Megatron-LM on A3 Mega VMs
    • Train large-scale ML models with Multi-Tier Checkpointing
  • Inference
    • About AI/ML model inference on GKE
    • Analyze model inference performance and costs with Inference Quickstart
    • Choose a load balancing strategy for AI/ML model inference on GKE
    • Try inference examples on GPUs
      • Serve a model with a single GPU
      • Serve an LLM with multiple GPUs
      • Serve LLMs like DeepSeek-R1 671B or Llama 3.1 405B
      • Serve an LLM on L4 GPUs with Ray
      • Serve scalable LLMs using TorchServe
      • Serve Gemma on GPUs with Hugging Face TGI
      • Serve Gemma on GPUs with vLLM
      • Serve Llama models using GPUs on GKE with vLLM
      • Serve Gemma on GPUs with TensorRT-LLM
      • Serve an LLM with GKE Inference Gateway
      • Fine-tune Gemma open models using multiple GPUs
      • Serve LLMs with a cost-optimized and high-availability GPU provisioning strategy
      • Serve open LLMs on GKE with a pre-configured architecture
    • Try inference examples on TPUs
      • Serve open source models using TPUs with Optimum TPU
      • Serve Gemma on TPUs with JetStream
      • Serve an LLM on TPUs with JetStream and PyTorch
      • Serve an LLM on multi-host TPUs with JetStream and Pathways
      • Serve an LLM on TPUs with vLLM
      • Serve an LLM using TPUs with KubeRay
      • Serve SDXL using TPUs on GKE with MaxDiffusion
      • Perform multihost inference using Pathways
  • Batch
    • Best practices for running batch workloads on GKE
    • Deploy a batch system using Kueue
    • Obtain GPUs with Dynamic Workload Scheduler
      • About GPU obtainability with flex-start
      • Run a large-scale workload with flex-start with queued provisioning
      • Run a small batch workload with flex-start provisioning mode
    • Implement a Job queuing system with quota sharing between namespaces
    • Optimize resource utilization for mixed training and inference workloads using Kueue
  • Agentic AI
    • Deploy an agentic AI application on GKE with the ADK and Vertex AI
  • Use Ray on GKE
  • Deploy workloads by application type
  • Web servers and applications
    • Plan for serving websites
    • Deploy a stateful app
    • Ensure workloads are disruption-ready
    • Deploy a stateless app
    • Allow direct connections to Autopilot Pods using hostPort
    • Run Django
    • Deploy an application from Cloud Marketplace
    • Run full-stack workloads at scale on GKE
    • Deploy a containerized web server app
  • Gaming
    • Get support for Agones issues
    • Isolate the Agones controller in your GKE cluster
  • Deploy Arm workloads
    • Prepare an Arm workload for deployment to Standard clusters
    • Build multi-arch images for Arm workloads
    • Deploy Autopilot workloads on Arm architecture
    • Migrate an x86 application on GKE to multi-arch with Arm
  • Microsoft Windows
    • Deploy a Windows Server application
    • Build Windows Server multi-arch images