Google Kubernetes Engine (GKE)
Documentation
Discover
Introducing GKE
Explore GKE documentation
Use GKE or Cloud Run?
Try it
Create a cluster in the console
Create a cluster with Terraform
Explore your cluster
Fine-tune GKE services with Gemini assistance
Learn fundamentals
Start learning about GKE
Learn Kubernetes fundamentals
Start learning about Kubernetes
Introducing containers
Kubernetes comic
Kubernetes.io
Video playlist: Learn Kubernetes with Google
Learn GKE essentials
GKE modes of operation
Video playlist: GKE Essentials
Common GKE user roles and tasks
Get started
Cluster lifecycle
Cluster administration overview
Cluster configuration
Deploying workloads
GKE cluster architecture
Workflows and tools
gcloud CLI overview
GKE in the Google Cloud console
Provision GKE resources with Terraform
Install kubectl and configure cluster access
Simplify deployment using your IDE
Learning path: Containerize your app
Overview
Understand the monolith
Modularize the monolith
Prepare for containerization
Containerize the modular app
Deploy the app to a cluster
Learning path: Scalable apps
Overview
Create a cluster
Monitor with Prometheus
Scale workloads
Simulate failure
Centralize changes
Production considerations
Design and plan
Code samples
Architectures and best practices
Develop and deliver apps with Cloud Code, Cloud Build, and Google Cloud Deploy
Address continuous delivery challenges
Set up GKE clusters
Plan clusters for running your workloads
Compare features in GKE Autopilot and Standard
About regional clusters
About feature gates
About alpha clusters
Set up Autopilot clusters
About GKE Autopilot
Create Autopilot clusters
Extend the run time of Autopilot Pods
Set up Standard clusters
Create a zonal cluster
Create a regional cluster
Create an alpha cluster
Create a cluster using Windows node pools
Prepare to use clusters
Use labels to organize clusters
Manage GKE resources using Tags
Configure node pools
About node pools
Add and manage node pools
About node images
About containerd images
Specify a node image
About Arm workloads on GKE
Create Standard clusters and node pools with Arm nodes
Plan GKE Standard node sizes
About Spot VMs
About Windows Server containers
Auto-repair nodes
Automatically bootstrap GKE nodes with DaemonSets
Update Kubernetes node labels and taints for node pools
Set up clusters for multi-tenancy
About cluster multi-tenancy
Plan a multi-tenant environment
Prepare GKE clusters for third-party tenants
Set up multi-tenant logging
Use fleets to simplify multi-cluster management
About fleets
Create fleets
Set up service mesh
Provision Cloud Service Mesh in an Autopilot cluster
Enhance scalability for clusters
About GKE scalability
Plan for scalability
Plan for large GKE clusters
Plan for large workloads
Provision extra compute capacity for rapid Pod scaling
Consume reserved zonal resources
About quicker workload startup with fast-starting nodes
Reduce and optimize costs
Plan for cost-optimization
View GKE costs
View cluster costs breakdown
View cost-related optimization metrics
Optimize GKE costs
Right-size your GKE workloads at scale
Reduce costs by scaling down GKE clusters during off-peak hours
Identify underprovisioned and overprovisioned GKE clusters
Identify idle GKE clusters
Configure autoscaling for infrastructure
About cluster autoscaling
Configure cluster autoscaling
About node auto-provisioning
Configure node auto-provisioning
View cluster autoscaling events
Configure autoscaling for workloads
Scaling deployed applications
About autoscaling workloads based on metrics
Optimize Pod autoscaling based on metrics
About horizontal Pod autoscaling
Autoscale deployments using horizontal Pod autoscaling
Configure autoscaling for LLM workloads on GPUs
Configure autoscaling for LLM workloads on TPUs
View horizontal Pod autoscaler events
Scale to zero using KEDA
About vertical Pod autoscaling
Configure multidimensional Pod autoscaling
Scale container resource requests and limits
Provision storage
About storage for GKE clusters
Use Kubernetes features, primitives, and abstractions for storage
Use persistent volumes and dynamic provisioning
Use StatefulSets
About volume snapshots
Use volume expansion
Populate volumes with data from Cloud Storage
About the GKE Volume Populator
Automate data transfer to Parallelstore
Automate data transfer to Hyperdisk ML
Block storage
Provision and use Persistent Disks
Using the Compute Engine Persistent Disk CSI driver
Persistent volume attach limits
Using pre-existing persistent disks
Manually install a CSI driver
Using persistent disks with multiple readers (ReadOnlyMany)
Persistent disks backed by SSD
Regional persistent disks
Increase stateful app availability with Stateful HA Operator
Provision and use Hyperdisk
About Hyperdisk
Scale your storage performance using Hyperdisk
Optimize storage performance and cost with Hyperdisk Storage Pools
Accelerate AI/ML data loading using Hyperdisk ML
Provision and use GKE Data Cache
Accelerate read performance of stateful workloads with GKE Data Cache
Manage your persistent storage
Configure a boot disk for node file systems
Clone persistent disks
Back up and restore Persistent Disk storage using volume snapshots
Optimize disk performance
About optimizing disk performance
Monitor disk performance
Local SSD and ephemeral storage
About Local SSD storage for GKE
Provision Local SSD-backed ephemeral storage
Provision Local SSD-backed raw block storage
Create a Deployment using an EmptyDir Volume
Use dedicated Persistent Disks as ephemeral volumes
File storage
Provision and use Filestore
About Filestore support for GKE
Access Filestore instances
Deploy a stateful workload with Filestore
About Filestore multishares for GKE
Optimize multishares for GKE
Back up and restore Filestore storage using volume snapshots
Provision and use Parallelstore volumes
About Parallelstore for GKE
Create and use a volume backed by Parallelstore
Access existing Parallelstore instances
Provision and use Lustre volumes
About Lustre for GKE
Create and use a volume backed by Lustre
Access existing Lustre instances
Object storage
Quickstart: Cloud Storage FUSE CSI driver for GKE
About the Cloud Storage FUSE CSI driver for GKE
Set up the Cloud Storage FUSE CSI driver
Mount Cloud Storage buckets as ephemeral volumes
Mount Cloud Storage buckets as persistent volumes
Configure the Cloud Storage FUSE CSI driver sidecar container
Optimize Cloud Storage FUSE CSI driver performance
Deploy and manage workloads
Deploy Autopilot workloads
Plan resource requests for Autopilot workloads
About Autopilot workloads in GKE Standard
Run Autopilot workloads in Standard clusters
Configure node attributes with ComputeClasses
About GKE ComputeClasses
About built-in ComputeClasses in GKE
About custom ComputeClasses
Control autoscaled node attributes with custom compute classes
Apply compute classes to Pods by default
About Balanced and Scale-Out compute classes in Autopilot clusters
Choose predefined compute classes for Autopilot Pods
Deploy workloads on optimized hardware
Minimum CPU platforms for compute-intensive workloads
Configure Pod bursting in GKE
Analyze CPU performance using the PMU
Deploy workloads that have special security requirements
GKE Autopilot partners
Run privileged workloads from GKE Autopilot partners
Run privileged open source workloads on GKE Autopilot
Deploy workloads that require specialized devices
About dynamic resource allocation (DRA) in GKE
Prepare your GKE infrastructure for DRA
Deploy DRA workloads
Migrate workloads
Identify Standard clusters to migrate to Autopilot
Prepare to migrate to Autopilot clusters from Standard clusters
Manage workloads
Place GKE Pods in specific zones
Simulate zone failure
Improve workload efficiency using NCCL Fast Socket
About container image digests
Using container image digests in Kubernetes manifests
Improve workload initialization speed
Use streaming container images
Use secondary boot disks to preload data or container images
Isolate your workloads using namespaces
Continuous integration and delivery
Plan for continuous integration and delivery
Create a CI/CD pipeline with Azure Pipelines
GitOps-style continuous delivery with Cloud Build
Modern CI/CD with GKE
A software delivery framework
Build a CI/CD system
Apply the developer workflow
Deploy databases, caches, and data streaming workloads
Data on GKE
Plan your database deployments on GKE
Managed databases
Deploy an app using GKE Autopilot and Spanner
Deploy WordPress on GKE with Persistent Disk and Cloud SQL
Analyze data on GKE using BigQuery, Cloud Run, and Gemma
Kafka
Deploy Apache Kafka to GKE using Strimzi
Deploy Apache Kafka to GKE using Confluent
Deploy a highly available Kafka cluster on GKE
Redis
Create a multi-tier web application with Redis and PHP
Deploy a Redis cluster on GKE
Deploy Redis to GKE using Redis Enterprise
MySQL
Deploy a stateful MySQL cluster
Migrate your MySQL data from Persistent Disk to Hyperdisk
PostgreSQL
Deploy a highly available PostgreSQL database
Deploy PostgreSQL to GKE using Zalando
Deploy PostgreSQL to GKE using CloudNativePG
SQL Server
Deploy single-instance SQL Server 2017 on GKE
Memcached
Deploy Memcached on GKE
Vector databases
Build a RAG chatbot using GKE and Cloud Storage
Deploy a Qdrant database on GKE
Deploy an Elasticsearch database on GKE
Deploy a PostgreSQL vector database on GKE
Deploy a Weaviate vector database on GKE
Deploy AI/ML workloads
AI/ML orchestration on GKE
Run ML and AI workloads
About accelerator consumption options for AI workloads
GPUs
About GPUs in GKE
Deploy GPU workloads in GKE Autopilot
Deploy GPU workloads in GKE Standard
Encrypt GPU workload data in-use
Manage the GPU Stack with the NVIDIA GPU Operator
GPU Sharing
About GPU sharing strategies in GKE
Use multi-instance GPU
Use GPU time-sharing
Use NVIDIA MPS
Best practices for autoscaling LLM inference workloads on GPUs
Best practices for optimizing LLM inference performance on GPUs
TPUs in GKE
About TPUs in GKE
Plan TPUs in GKE
Request TPUs
About accelerator consumption options for AI workloads
Request TPUs with future reservations in calendar mode
Run a small batch workload with flex-start provisioning mode for TPUs
Deploy TPU workloads in GKE Autopilot
Deploy TPU workloads in GKE Standard
Deploy TPU Multislices in GKE
Orchestrate TPU Multislice workloads using JobSet and Kueue
Best practices for autoscaling LLM inference workloads on TPUs
Manage GKE node disruption for GPUs and TPUs
CPU-based workloads
Optimize Autopilot Pod performance by choosing a machine series
Optimize GPU and TPU provisioning
About GPU and TPU provisioning with flex-start
Run a large-scale workload with flex-start with queued provisioning
Run a small batch workload with GPUs and flex-start provisioning mode
Run a small batch workload with TPUs and flex-start provisioning mode
Training
Train a model with GPUs on GKE Standard mode
Train a model with GPUs on GKE Autopilot mode
Train Llama2 with Megatron-LM on A3 Mega VMs
Train large-scale ML models with Multi-Tier Checkpointing
Inference
About AI/ML model inference on GKE
Analyze model inference performance and costs with Inference Quickstart
Choose a load balancing strategy for AI/ML model inference on GKE
Try inference examples on GPUs
Serve a model with a single GPU
Serve an LLM with multiple GPUs
Serve LLMs like Deepseek-R1 671B or Llama 3.1 405B
Serve an LLM on L4 GPUs with Ray
Serve scalable LLMs using TorchServe
Serve Gemma on GPUs with Hugging Face TGI
Serve Gemma on GPUs with vLLM
Serve Llama models using GPUs on GKE with vLLM
Serve Gemma on GPUs with TensorRT-LLM
Serve an LLM with GKE Inference Gateway
Fine-tune Gemma open models using multiple GPUs
Serve LLMs with a cost-optimized and high-availability GPU provisioning strategy
Serve open LLMs on GKE with a pre-configured architecture
Try inference examples on TPUs
Serve open source models using TPUs with Optimum TPU
Serve Gemma on TPUs with JetStream
Serve an LLM on TPUs with JetStream and PyTorch
Serve an LLM on multi-host TPUs with JetStream and Pathways
Serve an LLM on TPUs with vLLM
Serve an LLM using TPUs with KubeRay
Serve SDXL using TPUs on GKE with MaxDiffusion
Perform multihost inference using Pathways
Batch
Best practices for running batch workloads on GKE
Deploy a batch system using Kueue
Obtain GPUs with Dynamic Workload Scheduler
About GPU obtainability with flex-start
Run a large-scale workload with flex-start with queued provisioning
Run a small batch workload with flex-start provisioning mode
Implement a Job queuing system with quota sharing between namespaces
Optimize resource utilization for mixed training and inference workloads using Kueue
Agentic AI
Deploy an agentic AI application on GKE with the ADK and Vertex AI
Use Ray on GKE
Deploy workloads by application type
Web servers and applications
Plan for serving websites
Deploy a stateful app
Ensure workloads are disruption-ready
Deploy a stateless app
Allow direct connections to Autopilot Pods using hostPort
Run Django
Deploy an application from Cloud Marketplace
Run full-stack workloads at scale on GKE
Deploy a containerized web server app
Game Servers
Get support for Agones and Game Servers issues
Isolate the Agones controller in your GKE cluster
Deploy Arm workloads
Prepare an Arm workload for deployment to Standard clusters
Build multi-arch images for Arm workloads
Deploy Autopilot workloads on Arm architecture
Migrate an x86 application on GKE to multi-arch with Arm
Microsoft Windows
Deploy a Windows Server application
Build Windows Server multi-arch images
Deploy ASP.NET apps with Windows Authentication in GKE Windows containers
Run fault-tolerant workloads at lower costs
Use Spot Pods on Autopilot clusters
Use Spot VMs to run workloads on GKE Standard clusters
Use preemptible VMs to run workloads
Manage and optimize clusters
Manage cluster lifecycle changes to minimize disruption
Optimize your usage of GKE with insights and recommendations
Manage a GKE cluster
Configure a cluster and workload for staging
Upgrade clusters and node pools
About GKE cluster upgrades
Plan for cluster upgrades
About release channels
Use release channels
About Autopilot cluster upgrades
About Standard cluster upgrades
Auto-upgrade nodes
Manually upgrade a cluster or node pool
About node upgrade strategies
Configure node upgrade strategies
About maintenance windows and exclusions
Configure maintenance windows and exclusions
About cluster upgrades with rollout sequencing
Sequence the rollout of cluster upgrades
Get notifications for cluster events
About cluster notifications
Receive cluster notifications through Pub/Sub
Configure cluster to receive email notifications
Configure cluster notifications for third-party services
Get visibility into cluster upgrades
Manage nodes
Ensure resources for node upgrades
Resize clusters by adding or removing nodes
Define compact placement for nodes
Migrate nodes to a different machine type
Migrate from Docker to containerd node images
Migrate nodes to Linux cgroupv2
Customize containerd configuration
Customize node system configuration
Configure Windows Server nodes to join a domain
Simultaneous multi-threading (SMT) for high-performance compute
Delete clusters
Use Kubernetes beta APIs with GKE clusters
Ensure control plane stability when using webhooks
Use Backup for GKE
Troubleshoot application-layer Secrets
Troubleshoot CRDs with an invalid CA bundle
Monitor
Observability for GKE
Set up Google Cloud Managed Service for Prometheus
Monitor clusters and workloads
Configure metrics collection
Configure automatic application monitoring for workloads
View observability metrics
Collect and view observability metrics
Collect and view control plane metrics
Collect and view kube state metrics
Collect and view cAdvisor/Kubelet metrics
Collect and view DCGM metrics
Use application performance metrics
Monitor startup latency metrics
Understand cluster usage profiles with GKE usage metering
Application observability with Prometheus on GKE
Set up Elastic Stack on GKE
View and process logs
About GKE logs
View GKE logs
Control log ingestion
Adjust log throughput
Set up multi-tenant logging
Troubleshooting
Overview
Introduction to troubleshooting
Cluster setup
Cluster creation
Autopilot clusters
Kubectl command-line tool
Standard node pools
Node registration
Container runtime
Autoscaling
Cluster autoscaler not scaling down
Cluster autoscaler not scaling up
Storage
Storage in GKE
GKE Volume Populator
Cluster security
Networking
Workloads
Deployed workloads
Image pulls
CrashLoopBackOff events
OOM events
Arm workloads
TPUs
GPUs
Privileged workloads on Autopilot
Cluster management
Upgrades
Concurrent operations
Webhooks
Namespace stuck in the Terminating state
Scalability
Monitoring
System metrics
Monitoring dashboards
Logging
4xx errors
Known issues
Deprecations
Feature and API deprecations
View deprecation insights and recommendations
Configure exec probe timeouts before upgrading to GKE version 1.35
Posture management feature deprecations
Transition from Container Registry to Artifact Registry in GKE
Migrate nodes to containerd 2
Workload vulnerability scanning removal in GKE standard edition
Deprecated authentication plugin for Kubernetes clients
PodSecurityPolicy deprecation
About the Docker node image deprecation
Ensure compatibility of TLS certificates before upgrading to GKE 1.29
Ensure compatibility of webhook certificates before upgrading to v1.23
Serve Gemma on TPUs with Saxml
Serve an LLM using multi-host TPUs with Saxml
Kubernetes API deprecations
Kubernetes 1.32 deprecated APIs
Kubernetes 1.29 deprecated APIs
Kubernetes 1.27 deprecated APIs
Kubernetes 1.26 deprecated APIs
Kubernetes 1.25 deprecated APIs
Kubernetes Ingress Beta APIs removed in GKE 1.23
Kubernetes 1.22 deprecated APIs
Archive