This tutorial provides recommended practices for creating a stateful application and upgrading the Google Kubernetes Engine (GKE) cluster that's running the application. This tutorial uses Redis as an example for deploying a stateful application, but the same concepts are applicable to other types of stateful applications deployed on GKE.
Objectives
This tutorial covers the following steps:
- Create a GKE cluster enrolled in a release channel.
- Create a Redis Cluster on GKE.
- Deploy the Redis client application to GKE.
- Apply these best practices for node pool upgrades:
  - Set up the Pod Disruption Budget (PDB).
  - Set up the maintenance window and exclusions.
  - Set up the node upgrade strategy to either surge upgrade or blue-green upgrade.
- Test the application.
- Upgrade the cluster.
- Test workload disruption.
The following diagram shows you a high-level view of the cluster architecture for this tutorial:
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
Set up your project
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.
  Roles required to create a project: you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the GKE API.
  Roles required to enable APIs: you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Set defaults for the Google Cloud CLI
- In the Google Cloud console, start a Cloud Shell instance by clicking Open Cloud Shell.
- Download the source code for this sample app:
   git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
   cd kubernetes-engine-samples/quickstarts/hello-app-redis/manifests
- Set the default environment variables:
   gcloud config set project PROJECT_ID
   gcloud config set compute/zone COMPUTE_ZONE
  Replace the following values:
  - PROJECT_ID: your Google Cloud project ID.
  - COMPUTE_ZONE: the Compute Engine zone.
Create a GKE cluster enrolled in a release channel
To create your GKE cluster, complete the following steps:
- Create a cluster named redis-test with three nodes:
   gcloud container clusters create redis-test \
   --location CONTROL_PLANE_LOCATION \
   --num-nodes=3 \
   --release-channel regular
  Replace CONTROL_PLANE_LOCATION with the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters. Note that --num-nodes sets the number of nodes per zone, so a regional cluster gets three nodes in each of its zones.
  Once the cluster is created, you should see output similar to the following example:
   NAME: redis-test
   LOCATION: us-central1-c
   MASTER_VERSION: 1.22.10-gke.600
   MASTER_IP: 34.69.67.7
   MACHINE_TYPE: e2-medium
   NODE_VERSION: 1.22.10-gke.600
   NUM_NODES: 3
   STATUS: RUNNING
- Configure kubectl to communicate with the cluster:
   gcloud container clusters get-credentials redis-test \
   --location CONTROL_PLANE_LOCATION
Create a Redis Cluster on GKE
In this section, you add a Redis Cluster on top of the GKE cluster you previously created by deploying a ConfigMap, StatefulSet, and headless Service.
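Of these three objects, the headless Service is the simplest. As a reference point, the following sketch writes a minimal headless Service manifest to a local file. The Service name redis-cluster-example, the app: redis label, and the port names are assumptions for illustration, and redis-service.yaml in the sample repository may differ. A StatefulSet refers to its governing headless Service through its spec.serviceName field.

```shell
# Sketch only: writes an example headless Service manifest to a local file.
# The name "redis-cluster-example", the label app=redis, and the port names
# are assumptions; compare with redis-service.yaml before reusing any of it.
cat <<'EOF' > redis-service-example.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-example
spec:
  clusterIP: None          # "None" is what makes the Service headless
  selector:
    app: redis             # must match the labels on the StatefulSet's Pods
  ports:
    - name: client
      port: 6379           # Redis client port
    - name: gossip
      port: 16379          # Redis Cluster bus port (client port + 10000)
EOF
```

You don't need to apply this file, because the tutorial's own redis-service.yaml creates the Service. With clusterIP: None, Kubernetes allocates no virtual IP, and the Service's DNS name resolves to the individual Pod IPs. Redis Cluster nodes rely on this to address each other.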
To create a Redis cluster, complete these steps:
- Review the ConfigMap file (redis-configmap.yaml), which stores the Redis configuration, including the readiness probe and liveness probe scripts.
  The readiness.sh and liveness.sh scripts use redis-cli ping to check whether the Redis server is running. If the command returns PONG, the Redis server is up and running. The StatefulSet in redis-cluster.yaml uses these scripts as its probes.
  To learn more about the Redis parameters in this ConfigMap, see the Redis Cluster configuration parameters section in the Redis Cluster tutorial.
- Deploy the ConfigMap:
   kubectl apply -f redis-configmap.yaml
- Review the StatefulSet (redis-cluster.yaml), which uses the readiness probe and the liveness probe.
  To learn how to configure probes in Kubernetes, see Configure Probes.
  We strongly recommend that you use readiness and liveness probes when upgrading node pools. Readiness probes keep traffic away from Pods that are not ready yet, and liveness probes restart containers that stop responding, so your Pods stay ready to serve during an upgrade.
- Deploy the StatefulSet:
   kubectl apply -f redis-cluster.yaml
- The headless Service defined in redis-service.yaml provides the connection between the Redis nodes. Its clusterIP field is set to None, which is what makes the Service headless.
  Deploy the Service:
   kubectl apply -f redis-service.yaml
- Wait approximately two minutes and verify all the Pods are running by using the following command:
   kubectl get pods
  You should see output similar to the following example:
   NAME      READY   STATUS    RESTARTS   AGE
   redis-0   1/1     Running   0          2m29s
   redis-1   1/1     Running   0          2m8s
   redis-2   1/1     Running   0          107s
   redis-3   1/1     Running   0          85s
   redis-4   1/1     Running   0          54s
   redis-5   1/1     Running   0          23s
- Verify the persistent volumes were created by running the following command:
   kubectl get pv
  You should see output similar to the following example:
   NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-5   standard                75s
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-1   standard                2m59s
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-3   standard                2m16s
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-2   standard                2m38s
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-0   standard                3m20s
   pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-4   standard                104s
  In this output, HASH represents a hash that is attached to each persistent volume name.
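The readiness.sh and liveness.sh scripts live in the ConfigMap and are not reproduced in this tutorial. The following minimal sketch shows the kind of check they perform. It assumes only that redis-cli is on the PATH inside the container. The check_redis function and the REDIS_HOST and REDIS_PORT variables are hypothetical names, and the real scripts may differ.

```shell
#!/bin/sh
# Sketch of a readiness/liveness check in the style of readiness.sh and
# liveness.sh. REDIS_HOST and REDIS_PORT are hypothetical variables that
# default to the local Redis server.

# check_redis returns 0 only when `redis-cli ping` answers PONG, so a probe
# that calls it reports success only while the server is responding.
check_redis() {
  response=$(redis-cli -h "${REDIS_HOST:-localhost}" -p "${REDIS_PORT:-6379}" ping 2>&1)
  if [ "$response" = "PONG" ]; then
    return 0
  fi
  echo "Redis is not ready: $response" >&2
  return 1
}

# In a standalone probe script, the last line would call the check so that
# its exit status becomes the probe result:
#   check_redis
```

Kubernetes treats a non-zero exit status from an exec probe as a failure. A failing readiness probe removes the Pod from Service endpoints, and a failing liveness probe restarts the container.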
Assign roles to your Redis Cluster
Once the configuration is complete, assign roles to the Redis Cluster.
The roles.sh script obtains the Pod IP addresses, then assigns the leader and follower roles by passing each of the Pod IP addresses into the command.
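The contents of roles.sh are not shown in this tutorial. To illustrate the approach it describes, the following sketch collects the Pod IPs with kubectl and passes them to redis-cli --cluster create. Several details are assumptions, not confirmed from the sample repository: the app=redis label, running redis-cli inside redis-0, and --cluster-replicas 1 (which splits six Pods into three leaders and three followers).

```shell
#!/usr/bin/env bash
# Sketch of a role-assignment script in the spirit of roles.sh.
# Assumptions: Pods are labeled app=redis, Redis listens on 6379, and the
# cluster is created from inside redis-0 with one follower per leader.

# build_node_list PORT IP... prints space-separated "IP:PORT" pairs, the
# address format that `redis-cli --cluster create` expects.
build_node_list() {
  local port=$1
  shift
  local nodes=()
  local ip
  for ip in "$@"; do
    nodes+=("${ip}:${port}")
  done
  echo "${nodes[*]}"
}

# assign_roles looks up every Redis Pod IP and asks redis-cli to form the
# cluster. redis-cli prints the proposed layout and waits for "yes".
assign_roles() {
  local ips
  ips=$(kubectl get pods -l app=redis \
    -o jsonpath='{range .items[*]}{.status.podIP}{" "}{end}')
  # Word splitting of $ips and of the node list is intentional here.
  # shellcheck disable=SC2046,SC2086
  kubectl exec -it redis-0 -- redis-cli --cluster create \
    $(build_node_list 6379 $ips) --cluster-replicas 1
}
```

To try it, source the file and run assign_roles. redis-cli prints the leader and follower layout it intends to use and asks you to type yes before applying it. You can confirm the actual Pod labels with kubectl get pods --show-labels.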
To assign roles to your Redis cluster, complete these steps:
- Run the script:
   chmod +x ./roles.sh
   ./roles.sh
- Type yes when prompted.
- Connect to a Redis node to check its role. For example, to verify that redis-0 has a leader role, run the following command:
   kubectl exec -it redis-0 -- redis-cli role
  You should see output similar to the following example:
   1) "master"
   2) (integer) 574
   3) 1) 1) "10.28.2.3"
         2) "6379"
         3) "574"
Deploy the Redis client application
To deploy your application to the GKE cluster you created, define a Deployment for your application.
The file named app-deployment.yaml contains the deployment definition for the application.
To learn more about the probes and Pod affinity rules used in this Deployment, see GKE best practices: Designing and building highly available clusters.
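The contents of app-deployment.yaml are not shown here. To make the probe and affinity discussion concrete, the following sketch writes a Deployment manifest with readiness and liveness probes and a preferred Pod anti-affinity rule to a local file. IMAGE is a placeholder. The name hello-web-example (chosen to avoid colliding with the tutorial's hello-web), the labels, the replica count, the TCP probes, and the timings are all assumptions. The sketch also omits any settings the application needs to reach Redis.

```shell
# Sketch only: writes an example Deployment manifest to a local file.
# IMAGE is a placeholder; labels, replicas, and probe timings are assumptions.
cat <<'EOF' > app-deployment-example.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web-example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-web-example
  template:
    metadata:
      labels:
        app: hello-web-example
    spec:
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across nodes so that draining one node
          # during an upgrade does not take down every replica at once.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: hello-web-example
      containers:
        - name: hello-app
          image: IMAGE
          ports:
            - containerPort: 8080   # the port targeted by `kubectl expose`
          readinessProbe:           # hold back traffic until the port accepts TCP
            tcpSocket:
              port: 8080
            periodSeconds: 5
          livenessProbe:            # restart the container if it stops listening
            tcpSocket:
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
EOF
```

A preferred (soft) anti-affinity rule spreads replicas across nodes when it can. It still lets them schedule when fewer nodes are available than replicas, which can happen while nodes are being drained.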
To create the Deployment, complete the following steps:
- Apply the Deployment:
   kubectl apply -f app-deployment.yaml
- Expose the application through a load balancer:
   kubectl expose deployment hello-web \
   --type=LoadBalancer \
   --port 80 \
   --target-port 8080
- Wait approximately one minute and retrieve the application's external IP address by running the following command:
   kubectl get service
  From the output, copy the value listed in the EXTERNAL-IP column for hello-web:
   NAME        TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
   hello-web   LoadBalancer   10.13.10.55   EXTERNAL_IP   80:30703/TCP   166m
- Verify the application is working by pasting the EXTERNAL_IP into your web browser. You should see output similar to the following example:
   I have been hit [1] times since deployment!
  Take note of the visit number. You need it in the Test workload disruption section.
- Set a variable for the EXTERNAL_IP you just copied. You use this value when you run the test scripts in the Test the application section:
   export IP=EXTERNAL_IP
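Instead of waiting a fixed minute, you can poll the Service until the load balancer reports an address. The following sketch reads .status.loadBalancer.ingress[0].ip through kubectl's JSONPath output. The wait_for_external_ip function, the 60 attempts, and the 5-second interval are illustrative choices, not part of the tutorial.

```shell
#!/usr/bin/env bash
# Sketch: poll a LoadBalancer Service until it has an external IP.
# The attempt count and interval are arbitrary illustrative values.
wait_for_external_ip() {
  local service=${1:-hello-web}
  local attempt ip
  for attempt in $(seq 1 60); do
    ip=$(kubectl get service "$service" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep 5   # the address is empty until the load balancer is provisioned
  done
  echo "timed out waiting for an external IP on $service" >&2
  return 1
}
```

Usage: export IP=$(wait_for_external_ip hello-web)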
Configure best practices for node pool upgrades
Apply these best practices to stateful applications to improve availability during node pool upgrades.
Set up the Pod Disruption Budget (PDB)
Create a Pod Disruption Budget to limit the number of replicated Pods that are down simultaneously during a voluntary disruption. This is useful for stateful applications that need a quorum of replicas to remain available during an upgrade.
In a PDB definition:
- The app label in the selector specifies which application this PDB applies to.
- minAvailable sets the minimum number of Pods that must be available during a disruption. It can be an absolute number or a percentage (for example, 30%).
- maxUnavailable sets the maximum number of Pods that can be unavailable during a disruption. It can also be an absolute number or a percentage.
A single PDB can set either minAvailable or maxUnavailable, but not both.
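The contents of pdb-minavailable.yaml are not shown in this tutorial. For illustration, the following sketch writes a PDB that uses minAvailable to a local file. The name redis-pdb-example, the app: redis label, and the value 3 (half of the six Redis Pods) are assumptions, and the sample file may use different values.

```shell
# Sketch only: writes an example PodDisruptionBudget manifest to a local file.
# The name, label, and minAvailable value are illustrative assumptions.
cat <<'EOF' > pdb-example.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb-example
spec:
  minAvailable: 3        # keep at least 3 Redis Pods up during voluntary disruptions
  selector:
    matchLabels:
      app: redis         # must match the labels on the Redis Pods
EOF
```

Because a PDB accepts only one of the two fields, the sketch sets minAvailable alone. After you apply a PDB, the ALLOWED DISRUPTIONS column of kubectl get pdb shows how many Pods can currently be evicted.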
To set up the PDB, complete these steps:
- Deploy the PDB:
   kubectl apply -f pdb-minavailable.yaml
- Verify that the PDB has been created:
   kubectl get pdb
Set up the maintenance windows and exclusions
Node auto-upgrades streamline the upgrade process and keep the nodes in the cluster up-to-date when the control plane is upgraded on your behalf. This feature is enabled by default. To learn more, see Auto-upgrading nodes.
Use maintenance windows and maintenance exclusions to set up time frames and control when maintenance can and cannot occur on GKE clusters:
- Set up a maintenance window that starts at 2:00 AM UTC on August 19, 2022, and finishes four hours later. This maintenance window recurs daily. During this time, automatic maintenance is permitted.
   gcloud container clusters update redis-test \
   --location CONTROL_PLANE_LOCATION \
   --maintenance-window-start 2022-08-19T02:00:00Z \
   --maintenance-window-end 2022-08-19T06:00:00Z \
   --maintenance-window-recurrence FREQ=DAILY
- Set up an exclusion window that prevents maintenance during the New Year holiday. This maintenance exclusion uses the no_upgrades scope. During this time, no automatic maintenance of any kind is permitted. To learn more, see Scope of maintenance to exclude.
   gcloud container clusters update redis-test \
   --location CONTROL_PLANE_LOCATION \
   --add-maintenance-exclusion-name new-year \
   --add-maintenance-exclusion-start 2022-12-26T00:00:00Z \
   --add-maintenance-exclusion-end 2023-01-02T02:00:00Z \
   --add-maintenance-exclusion-scope no_upgrades
- Verify that the maintenance window and exclusions are applied. Look under maintenancePolicy:
   gcloud container clusters describe redis-test \
   --location CONTROL_PLANE_LOCATION
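If you adapt the window to a different start time or length, the end timestamp is the start plus the duration. The following small sketch derives the --maintenance-window-end value with GNU date, which is available in Cloud Shell. The window_end function is a hypothetical helper, not part of gcloud.

```shell
#!/usr/bin/env bash
# Sketch: compute an RFC 3339 end time from a start time and a duration in
# hours, using GNU date. window_end is a hypothetical helper name.
window_end() {
  local start=$1 hours=$2
  local start_epoch
  start_epoch=$(date -u -d "$start" +%s) || return 1
  date -u -d "@$((start_epoch + hours * 3600))" +%Y-%m-%dT%H:%M:%SZ
}

window_end 2022-08-19T02:00:00Z 4   # prints 2022-08-19T06:00:00Z
```

Working in epoch seconds means that windows crossing midnight, or crossing into a new year, come out correctly without any special handling.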
To learn more, see Configure maintenance windows and exclusions.
Configure a node upgrade strategy
There are two node upgrade strategies you can use for the node pools in your GKE cluster: Blue-green upgrades and surge upgrades. To learn more, see Node upgrade strategies.
Blue-green upgrades
Choose blue-green upgrades if the workloads are less tolerant of disruptions, and a temporary cost increase due to higher resource usage is acceptable.
Run the following command to switch the default-pool node pool to the blue-green upgrade strategy:
gcloud container node-pools update default-pool \
--cluster=redis-test \
--enable-blue-green-upgrade \
--location CONTROL_PLANE_LOCATION \
--node-pool-soak-duration=120s
Node pool soak duration is set to two minutes to save time during the soak node pool phase for the purpose of this tutorial. This phase is used to verify the workload's health after the blue pool nodes have been drained. We recommend setting the node pool soak duration to one hour (3600 seconds) or a duration that best suits the application.
For more information about managing pod allocation, see Deploy a Pod to a specific node pool and Deploying Services to specific node pools.
For more information about configuring blue-green upgrades, see Configure blue-green upgrades.
Surge upgrades
Choose surge upgrades if cost optimization is important and if workloads can tolerate a graceful shutdown in less than 60 minutes (GKE respects PDBs for up to 60 minutes).
Run the following command to switch the default-pool node pool to the surge upgrade strategy:
gcloud container node-pools update default-pool \
--max-surge-upgrade=1 \
--max-unavailable-upgrade=0 \
--cluster=redis-test \
--location CONTROL_PLANE_LOCATION
With this configuration (maxSurge=1 and maxUnavailable=0), only one surge node can be added to the node pool during an upgrade, so only one node can be upgraded at a time. Because the surge node is available before an old node is drained, evicted Pods can be rescheduled right away. As a result, this setting speeds up Pod restarts during upgrades while progressing conservatively.
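To estimate how many steps a surge upgrade takes to roll through a pool, note that GKE upgrades up to maxSurge + maxUnavailable nodes at a time. The following rough sketch turns that into a count of upgrade rounds. The upgrade_rounds function is a hypothetical helper, and the estimate ignores GKE's own limits on parallelism and the time each node takes to drain.

```shell
#!/usr/bin/env bash
# Rough estimate: with parallelism P = maxSurge + maxUnavailable, a pool of
# N nodes needs ceil(N / P) upgrade rounds. upgrade_rounds is hypothetical.
upgrade_rounds() {
  local nodes=$1 max_surge=$2 max_unavailable=$3
  local parallel=$((max_surge + max_unavailable))
  if [ "$parallel" -le 0 ]; then
    echo "maxSurge + maxUnavailable must be at least 1" >&2
    return 1
  fi
  # Integer ceiling division: (N + P - 1) / P
  echo $(( (nodes + parallel - 1) / parallel ))
}

upgrade_rounds 3 1 0   # this tutorial's three-node pool: prints 3
```

Raising max-surge-upgrade shortens the upgrade at the cost of more temporary nodes. Raising max-unavailable-upgrade shortens it at the cost of reduced capacity while nodes are drained.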
For more information about configuring surge upgrades, see Configure surge upgrades.
Check the current node pool configuration:
   gcloud container node-pools describe default-pool \
   --cluster redis-test \
   --location CONTROL_PLANE_LOCATION
For more information on viewing node pools, see View node pools in a cluster.
Test the application
In this section, you use two scripts: one that sends requests to your application, and one that measures the success rate of those requests. You use these scripts to measure what happens when you upgrade your cluster.
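The following sketch approximates what a load generator like generate_load.sh does. It is written from the description in this tutorial, not copied from the sample repository. Once per second, it fires QPS requests in the background and appends each HTTP status code to ./output. When a request fails at the network level (a refused connection or a timeout), curl reports the status as 000. The optional duration argument and the 5-second timeout are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Approximation of a QPS load generator in the style of generate_load.sh.
# generate_load IP QPS [DURATION_SECONDS]; a duration of 0 (the default)
# runs until interrupted with Ctrl+C.
generate_load() {
  local ip=$1 qps=$2 duration=${3:-0}
  local elapsed=0 i
  while [ "$duration" -eq 0 ] || [ "$elapsed" -lt "$duration" ]; do
    for ((i = 0; i < qps; i++)); do
      # -w prints the status code even on failure (000 for network errors).
      curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "http://${ip}/" >> output &
    done
    sleep 1
    elapsed=$((elapsed + 1))
  done
  wait   # let in-flight requests finish before returning
}
```

Usage: generate_load "$IP" "$QPS". Each curl process appends one short line to output, so concurrent writers do not interleave their lines in practice.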
To set up and run the scripts:
- From the manifests directory you have been working in, change to the directory containing the scripts:
   cd ../scripts
- Review the script named generate_load.sh, which sends requests to your application at a set number of queries per second (QPS). The script saves each HTTP response code to a file named output in the current directory. The script described in the next step reads this file.
- Review the script named print_error_rate.sh, which calculates the success rate based on the output generated by generate_load.sh.
- Give yourself permission to run the scripts:
   chmod u+x generate_load.sh print_error_rate.sh
- Set a variable for the number of QPS. The generate_load.sh script uses this value, along with the IP variable you set for the EXTERNAL_IP. We recommend that you set a value of 40.
   export QPS=40
- Run the generate_load.sh script to start sending requests:
   ./generate_load.sh $IP $QPS 2>&1
- Leave the generate_load.sh script running and open a new terminal. In the new terminal, run the print_error_rate.sh script to check the error rate:
   cd kubernetes-engine-samples/quickstarts/hello-app-redis/scripts
   watch ./print_error_rate.sh
  You should see a 100% success rate and a 0% error rate as the requests are made.
- Leave both scripts running and open a third terminal in preparation for the next section.
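The following sketch approximates the calculation that print_error_rate.sh performs. It is written from the description above, and the real script's output format may differ. It classifies each status code in ./output: 200 counts as a success, 000 (curl could not complete the request) counts as an app network error, and any other code counts as another error.

```shell
#!/usr/bin/env bash
# Approximation of an error-rate report in the style of print_error_rate.sh.
# print_error_rate [FILE] reads one HTTP status code per line (default: output).
print_error_rate() {
  local file=${1:-output}
  awk '
    { total++ }
    $1 == "200" { ok++ }
    $1 == "000" { net++ }
    END {
      if (total == 0) { print "No requests recorded yet"; exit }
      printf "Success rate: %.2f%% (%d/%d)\n", 100 * ok / total, ok, total
      printf "App network error rate: %.2f%%\n", 100 * net / total
      printf "Other error rate: %.2f%%\n", 100 * (total - ok - net) / total
    }' "$file"
}
```

For example, a file containing three 200 lines and one 000 line reports a 75.00% success rate (3/4) and a 25.00% app network error rate.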
Upgrade the cluster
To upgrade the cluster, complete these steps:
- Determine which GKE version the redis-test cluster is using:
   V=$(gcloud container clusters describe redis-test \
   --location CONTROL_PLANE_LOCATION \
   --format="value(currentMasterVersion)")
   echo $V
  You should see output similar to the following example:
   1.22.9-gke.2000
- Retrieve a list of available Kubernetes versions:
   gcloud container get-server-config
- In the list of versions, locate the validMasterVersions: section and look for the redis-test version you retrieved in the previous step. To avoid violating the GKE version skew policy by picking a version incompatible with the nodes, copy the version that is listed immediately before the redis-test version.
- Upgrade the cluster's control plane to the version you selected and type y when prompted:
   gcloud container clusters upgrade redis-test \
   --location CONTROL_PLANE_LOCATION \
   --master \
   --cluster-version VERSION
  Replace VERSION with the version you selected from the list in the previous step.
  The control plane upgrade takes several minutes.
- Upgrade the cluster's nodes to the version you selected and type y when prompted:
   gcloud container clusters upgrade redis-test \
   --location CONTROL_PLANE_LOCATION \
   --cluster-version=VERSION \
   --node-pool=default-pool
  Replace VERSION with the version you selected from the list.
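The version-selection step above can be expressed as a small function. This sketch assumes the version list is in the order gcloud prints it (newest first), so the entry immediately before the current version is the next newer one. The pick_next_version function is a hypothetical helper. The commented usage also assumes that gcloud's value() format joins list entries with semicolons.

```shell
#!/usr/bin/env bash
# pick_next_version CURRENT VERSION... prints the entry that appears just
# before CURRENT in a newest-first list, or fails if there is none.
pick_next_version() {
  local current=$1
  shift
  local prev="" v
  for v in "$@"; do
    if [ "$v" = "$current" ]; then
      if [ -z "$prev" ]; then
        echo "no newer version than $current in the list" >&2
        return 1
      fi
      echo "$prev"
      return 0
    fi
    prev=$v
  done
  echo "version $current not found in the list" >&2
  return 1
}

# Possible usage (assumes value() separates list entries with ";"):
#   VERSIONS=$(gcloud container get-server-config \
#     --format="value(validMasterVersions)")
#   pick_next_version "$V" ${VERSIONS//;/ }
```

Failing when the current version is first in the list or absent from it is deliberate. In either case, choosing a version needs a human decision rather than a guess.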
Test workload disruption
In this section, you test your application's status and observe workload disruption.
- Return to the terminal window running ./print_error_rate.sh and observe how the success rate changed during the upgrade. You should notice a slight decrease in the success rate and a slight increase in the app network error rate as the nodes are taken down to be upgraded.
  The Success rate field shows how many visits were successfully made to the website. Take note of this value.
- Stop both scripts by pressing Ctrl+C in the relevant terminals.
- Return to the website for your application by entering its IP address (this is the EXTERNAL_IP you copied during the Deploy the Redis client application section) into your browser. 
- Observe the visit number for your application. The number you see should equal:
   ORIGINAL_VISIT_NUMBER + SUCCESSFUL_VISIT_NUMBER
  where ORIGINAL_VISIT_NUMBER is the visit number you noted in the Deploy the Redis client application section, and SUCCESSFUL_VISIT_NUMBER is the value you noted in the first step of this section.
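As a worked check of the formula above, with made-up numbers: suppose the page showed 1 visit before the test, and the error-rate report counted 2,400 successful requests. The counter should then read 2,401. The expected_visits function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# expected_visits ORIGINAL SUCCESSFUL prints ORIGINAL + SUCCESSFUL.
expected_visits() {
  local original=$1 successful=$2
  echo $((original + successful))
}

expected_visits 1 2400   # prints 2401
```

Some requests may reach the application and update the counter even though the client recorded them as failures (for example, after a client-side timeout). The observed number can therefore end up slightly higher than this estimate.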
Clean up
After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the cluster
To delete the cluster you created for this tutorial, run the following command:
gcloud container clusters delete redis-test \
--location CONTROL_PLANE_LOCATION