Google Kubernetes Engine (GKE)
Documentation
Discover
Introducing GKE
Explore GKE documentation
Use GKE or Cloud Run?
Try it
Create a cluster in the console
Create a cluster with Terraform
Explore your cluster
Fine-tune GKE services with Gemini assistance
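
For the cluster quickstarts above, a minimal sketch of creating an Autopilot cluster with the gcloud CLI; the cluster name and region are placeholder values:

    # Create an Autopilot cluster; GKE provisions and manages the nodes.
    gcloud container clusters create-auto example-cluster \
        --region=us-central1
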
Learn fundamentals
Start learning about GKE
Learn Kubernetes fundamentals
Start learning about Kubernetes
Introducing containers
Kubernetes comic
Kubernetes.io
Video playlist: Learn Kubernetes with Google
Learn GKE essentials
GKE modes of operation
Video playlist: GKE Essentials
Common GKE user roles and tasks
Get started
Cluster lifecycle
Cluster administration overview
Cluster configuration
Deploying workloads
GKE cluster architecture
Workflows and tools
gcloud CLI overview
GKE in the Google Cloud console
Provision GKE resources with Terraform
Install kubectl and configure cluster access
Simplify deployment using your IDE
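
As a rough sketch of the kubectl setup covered in the workflow topics above (the cluster name and region are placeholders):

    # Install kubectl and the GKE auth plugin through the gcloud CLI.
    gcloud components install kubectl gke-gcloud-auth-plugin

    # Fetch cluster credentials so kubectl commands target the cluster.
    gcloud container clusters get-credentials example-cluster \
        --region=us-central1
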
Learning path: Containerize your app
Overview
Understand the monolith
Modularize the monolith
Prepare for containerization
Containerize the modular app
Deploy the app to a cluster
Learning path: Scalable apps
Overview
Create a cluster
Monitor with Prometheus
Scale workloads
Simulate failure
Centralize changes
Production considerations
Design and plan
Code samples
Architectures and best practices
Develop and deliver apps with Cloud Code, Cloud Build, and Google Cloud Deploy
Address continuous delivery challenges
Set up GKE clusters
Plan clusters for running your workloads
Compare features in GKE Autopilot and Standard
About regional clusters
About feature gates
About alpha clusters
Set up Autopilot clusters
About GKE Autopilot
Create Autopilot clusters
Extend the run time of Autopilot Pods
Set up Standard clusters
Create a zonal cluster
Create a regional cluster
Create an alpha cluster
Create a cluster using Windows node pools
Prepare to use clusters
Use labels to organize clusters
Manage GKE resources using Tags
Configure node pools
About node pools
Add and manage node pools
About node images
About containerd images
Specify a node image
About Arm workloads on GKE
Create Standard clusters and node pools with Arm nodes
Plan GKE Standard node sizes
About Spot VMs
About Windows Server containers
Auto-repair nodes
Automatically bootstrap GKE nodes with DaemonSets
Update Kubernetes node labels and taints for node pools
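
To illustrate the node pool topics above, a hedged sketch of adding a node pool to an existing Standard cluster; the pool name, cluster name, region, and machine type are placeholder values:

    # Add a node pool with an explicit machine type and node count.
    gcloud container node-pools create example-pool \
        --cluster=example-cluster \
        --region=us-central1 \
        --machine-type=e2-standard-4 \
        --num-nodes=3
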
Set up clusters for multi-tenancy
About cluster multi-tenancy
Plan a multi-tenant environment
Prepare GKE clusters for third-party tenants
Set up multi-tenant logging
Use fleets to simplify multi-cluster management
About fleets
Create fleets
Set up service mesh
Provision Cloud Service Mesh in an Autopilot cluster
Enhance scalability for clusters
About GKE scalability
Plan for scalability
Plan for large GKE clusters
Plan for large workloads
Provision extra compute capacity for rapid Pod scaling
Consume reserved zonal resources
About quicker workload startup with fast-starting nodes
Reduce and optimize costs
Plan for cost-optimization
View GKE costs
View cluster costs breakdown
View cost-related optimization metrics
Optimize GKE costs
Right-size your GKE workloads at scale
Reduce costs by scaling down GKE clusters during off-peak hours
Identify underprovisioned and overprovisioned GKE clusters
Identify idle GKE clusters
Configure autoscaling for infrastructure
About cluster autoscaling
Configure cluster autoscaling
About node auto-provisioning
Configure node auto-provisioning
View cluster autoscaling events
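
A minimal sketch of enabling the cluster autoscaler on an existing node pool, as covered above; the cluster name, pool name, region, and size bounds are placeholder values:

    # Let GKE add or remove nodes in this pool between the given bounds.
    gcloud container clusters update example-cluster \
        --enable-autoscaling \
        --node-pool=example-pool \
        --min-nodes=1 \
        --max-nodes=5 \
        --region=us-central1
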
Configure autoscaling for workloads
Scaling deployed applications
About autoscaling workloads based on metrics
Optimize Pod autoscaling based on metrics
About horizontal Pod autoscaling
Autoscale deployments using horizontal Pod autoscaling
Configure autoscaling for LLM workloads on GPUs
Configure autoscaling for LLM workloads on TPUs
View horizontal Pod autoscaler events
Scale to zero using KEDA
About vertical Pod autoscaling
Configure multidimensional Pod autoscaling
Scale container resource requests and limits
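
For the workload autoscaling topics above, a minimal horizontal Pod autoscaling sketch; the Deployment name and thresholds are placeholder values:

    # Create a HorizontalPodAutoscaler targeting 80% average CPU utilization.
    kubectl autoscale deployment example-app --cpu-percent=80 --min=2 --max=10

    # Inspect the autoscaler and its recent scaling decisions.
    kubectl describe hpa example-app
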
Provision storage
About storage for GKE clusters
Use Kubernetes features, primitives, and abstractions for storage
Use persistent volumes and dynamic provisioning
Use StatefulSets
About volume snapshots
Use volume expansion
Populate volumes with data from Cloud Storage
About the GKE Volume Populator
Automate data transfer to Parallelstore
Automate data transfer to Hyperdisk ML
Block storage
Provision and use Persistent Disks
Using the Compute Engine Persistent Disk CSI driver
Persistent volume attach limits
Using pre-existing persistent disks
Manually install a CSI driver
Using persistent disks with multiple readers (ReadOnlyMany)
Persistent disks backed by SSD
Regional persistent disks
Increase stateful app availability with Stateful HA Operator
Provision and use Hyperdisk
About Hyperdisk
Scale your storage performance using Hyperdisk
Optimize storage performance and cost with Hyperdisk Storage Pools
Accelerate AI/ML data loading using Hyperdisk ML
Provision and use GKE Data Cache
Accelerate read performance of stateful workloads with GKE Data Cache
Manage your persistent storage
Configure a boot disk for node file systems
Clone persistent disks
Back up and restore Persistent Disk storage using volume snapshots
Optimize disk performance
About optimizing disk performance
Monitor disk performance
Local SSD and ephemeral storage
About Local SSD storage for GKE
Provision Local SSD-backed ephemeral storage
Provision Local SSD-backed raw block storage
Create a Deployment using an emptyDir volume
Use dedicated Persistent Disks as ephemeral volumes
File storage
Provision and use Filestore
About Filestore support for GKE
Access Filestore instances
Deploy a stateful workload with Filestore
About Filestore multishares for GKE
Optimize multishares for GKE
Back up and restore Filestore storage using volume snapshots
Provision and use Parallelstore volumes
About Parallelstore for GKE
Create and use a volume backed by Parallelstore
Access existing Parallelstore instances
Provision and use Lustre volumes
About Lustre for GKE
Create and use a volume backed by Lustre
Access existing Lustre instances
Object storage
Quickstart: Cloud Storage FUSE CSI driver for GKE
About the Cloud Storage FUSE CSI driver for GKE
Set up the Cloud Storage FUSE CSI driver
Mount Cloud Storage buckets as ephemeral volumes
Mount Cloud Storage buckets as persistent volumes
Configure the Cloud Storage FUSE CSI driver sidecar container
Optimize Cloud Storage FUSE CSI driver performance
Deploy and manage workloads
Deploy Autopilot workloads
Plan resource requests for Autopilot workloads
About Autopilot workloads in GKE Standard
Run Autopilot workloads in Standard clusters
Configure node attributes with ComputeClasses
About GKE ComputeClasses
About built-in ComputeClasses in GKE
About custom ComputeClasses
Control autoscaled node attributes with custom compute classes
Apply compute classes to Pods by default
About Balanced and Scale-Out compute classes in Autopilot clusters
Choose predefined compute classes for Autopilot Pods
Deploy workloads on optimized hardware
Minimum CPU platforms for compute-intensive workloads
Configure Pod bursting in GKE
Analyze CPU performance using the PMU
Deploy workloads that have special security requirements
GKE Autopilot partners
Run privileged workloads from GKE Autopilot partners
Run privileged open source workloads on GKE Autopilot
Deploy workloads that require specialized devices
About dynamic resource allocation (DRA) in GKE
Prepare your GKE infrastructure for DRA
Deploy DRA workloads
Migrate workloads
Identify Standard clusters to migrate to Autopilot
Prepare to migrate to Autopilot clusters from Standard clusters
Manage workloads
Place GKE Pods in specific zones
Simulate zone failure
Improve workload efficiency using NCCL Fast Socket
About container image digests
Using container image digests in Kubernetes manifests
Improve workload initialization speed
Use streaming container images
Use secondary boot disks to preload data or container images
Isolate your workloads using namespaces
Continuous integration and delivery
Plan for continuous integration and delivery
Create a CI/CD pipeline with Azure Pipelines
GitOps-style continuous delivery with Cloud Build
Modern CI/CD with GKE
A software delivery framework
Build a CI/CD system
Apply the developer workflow
Deploy databases, caches, and data streaming workloads
Data on GKE
Plan your database deployments on GKE
Managed databases
Deploy an app using GKE Autopilot and Spanner
Deploy WordPress on GKE with Persistent Disk and Cloud SQL
Analyze data on GKE using BigQuery, Cloud Run, and Gemma
Kafka
Deploy Apache Kafka to GKE using Strimzi
Deploy Apache Kafka to GKE using Confluent
Deploy a highly available Kafka cluster on GKE
Redis
Create a multi-tier web application with Redis and PHP
Deploy a Redis cluster on GKE
Deploy Redis to GKE using Redis Enterprise
MySQL
Deploy a stateful MySQL cluster
Migrate your MySQL data from Persistent Disk to Hyperdisk
PostgreSQL
Deploy a highly available PostgreSQL database
Deploy PostgreSQL to GKE using Zalando
Deploy PostgreSQL to GKE using CloudNativePG
SQL Server
Deploy single-instance SQL Server 2017 on GKE
Memcached
Deploy Memcached on GKE
Vector databases
Build a RAG chatbot using GKE and Cloud Storage
Deploy a Qdrant database on GKE
Deploy an Elasticsearch database on GKE
Deploy a PostgreSQL vector database on GKE
Deploy a Weaviate vector database on GKE
Deploy AI/ML workloads
AI/ML orchestration on GKE
Run ML and AI workloads
About accelerator consumption options for AI workloads
GPUs
About GPUs in GKE
Deploy GPU workloads in GKE Autopilot
Deploy GPU workloads in GKE Standard
Encrypt GPU workload data in-use
Manage the GPU Stack with the NVIDIA GPU Operator
GPU sharing
About GPU sharing strategies in GKE
Use multi-instance GPU
Use GPU time-sharing
Use NVIDIA MPS
Best practices for autoscaling LLM inference workloads on GPUs
Best practices for optimizing LLM inference performance on GPUs
TPUs in GKE
About TPUs in GKE
Plan TPUs in GKE
Request TPUs
About accelerator consumption options for AI workloads
Request TPUs with future reservations in calendar mode
Run a small batch workload with flex-start provisioning mode for TPUs
Deploy TPU workloads in GKE Autopilot
Deploy TPU workloads in GKE Standard
Deploy TPU Multislices in GKE
Orchestrate TPU Multislice workloads using JobSet and Kueue
Best practices for autoscaling LLM inference workloads on TPUs
Manage GKE node disruption for GPUs and TPUs
CPU-based workloads
Optimize Autopilot Pod performance by choosing a machine series
Optimize GPU and TPU provisioning
About GPU and TPU provisioning with flex-start
Run a large-scale workload with flex-start with queued provisioning
Run a small batch workload with GPUs and flex-start provisioning mode
Run a small batch workload with TPUs and flex-start provisioning mode
Training
Train a model with GPUs on GKE Standard mode
Train a model with GPUs on GKE Autopilot mode
Train Llama2 with Megatron-LM on A3 Mega VMs
Train large-scale ML models with Multi-Tier Checkpointing
Inference
About AI/ML model inference on GKE
Analyze model inference performance and costs with Inference Quickstart
Choose a load balancing strategy for AI/ML model inference on GKE
Try inference examples on GPUs
Serve a model with a single GPU
Serve an LLM with multiple GPUs
Serve LLMs like DeepSeek-R1 671B or Llama 3.1 405B
Serve an LLM on L4 GPUs with Ray
Serve scalable LLMs using TorchServe
Serve Gemma on GPUs with Hugging Face TGI
Serve Gemma on GPUs with vLLM
Serve Llama models using GPUs on GKE with vLLM
Serve Gemma on GPUs with TensorRT-LLM
Serve an LLM with GKE Inference Gateway
Fine-tune Gemma open models using multiple GPUs
Serve LLMs with a cost-optimized and high-availability GPU provisioning strategy
Serve open LLMs on GKE with a pre-configured architecture
Try inference examples on TPUs
Serve open source models using TPUs with Optimum TPU
Serve Gemma on TPUs with JetStream
Serve an LLM on TPUs with JetStream and PyTorch
Serve an LLM on multi-host TPUs with JetStream and Pathways
Serve an LLM on TPUs with vLLM
Serve an LLM using TPUs with KubeRay
Serve SDXL using TPUs on GKE with MaxDiffusion
Perform multihost inference using Pathways
Batch
Best practices for running batch workloads on GKE
Deploy a batch system using Kueue
Obtain GPUs with Dynamic Workload Scheduler
About GPU obtainability with flex-start
Run a large-scale workload with flex-start with queued provisioning