This document describes how to scale existing stateless workloads running in a Google Distributed Cloud (GDC) air-gapped Kubernetes cluster. As your container workload requirements evolve, you must scale the pods running in your stateless workloads to match demand.
This document is for developers within the application operator group who are responsible for managing application workloads for their organization. For more information, see Audiences for GDC air-gapped documentation.
Before you begin
To complete the tasks in this document, you must have the following resources and roles:
To run commands against a Kubernetes cluster, make sure you have the following resources:
Locate the Kubernetes cluster name, or ask a member of the platform administrator group what the cluster name is.
Sign in and generate the kubeconfig file for the Kubernetes cluster if you don't have one.
Use the kubeconfig path of the Kubernetes cluster to replace KUBERNETES_CLUSTER_KUBECONFIG in these instructions.
To get the required permissions to scale stateless workloads in a shared cluster, ask your Organization IAM Admin to grant you the Namespace Admin role (namespace-admin) in your project namespace.
To get the required permissions to scale stateless workloads in a standard cluster, ask your Organization IAM Admin to grant you the Cluster Developer role (cluster-developer).
Scale a deployment
Use the scaling functionality of Kubernetes to appropriately scale the number of pods running in your deployment.
Autoscale the pods of a deployment
Kubernetes offers autoscaling, which removes the need to manually update your deployment as demand changes. Complete the following steps to autoscale the pods of your deployment:
To ensure the horizontal pod autoscaler can appropriately measure the CPU percentage, set the CPU resource request on your deployment.
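As an illustration, the following minimal deployment sketch sets a CPU request on its container. The labels, image, and request value shown here are placeholders; adapt them to your workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: DEPLOYMENT_NAME
spec:
  replicas: 1
  selector:
    matchLabels:
      app: DEPLOYMENT_NAME
  template:
    metadata:
      labels:
        app: DEPLOYMENT_NAME
    spec:
      containers:
      - name: app
        image: IMAGE
        resources:
          requests:
            # The horizontal pod autoscaler computes CPU utilization
            # as a percentage of this requested value.
            cpu: 250m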
Set the horizontal pod autoscaler in your deployment:
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    autoscale deployment DEPLOYMENT_NAME \
    --cpu-percent=CPU_PERCENT \
    --min=MIN_NUMBER_REPLICAS \
    --max=MAX_NUMBER_REPLICAS

Replace the following:

KUBERNETES_CLUSTER_KUBECONFIG: the kubeconfig file for the cluster.
NAMESPACE: the namespace. For shared clusters, this must be a project namespace. For standard clusters, it can be any namespace.
DEPLOYMENT_NAME: the name of the deployment to autoscale.
CPU_PERCENT: the target average CPU utilization, as a percentage of the requested CPU, across all the pods.
MIN_NUMBER_REPLICAS: the lower limit for the number of pods the autoscaler can provision.
MAX_NUMBER_REPLICAS: the upper limit for the number of pods the autoscaler can provision.
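The autoscale command creates a HorizontalPodAutoscaler object. If you prefer a declarative workflow, a roughly equivalent manifest looks like the following sketch, assuming your cluster serves the standard autoscaling/v2 API; the placeholders match the command flags above. Apply it with kubectl apply:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: DEPLOYMENT_NAME
  namespace: NAMESPACE
spec:
  # The deployment whose replica count the autoscaler manages.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: DEPLOYMENT_NAME
  minReplicas: MIN_NUMBER_REPLICAS
  maxReplicas: MAX_NUMBER_REPLICAS
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        # Scale to keep average CPU utilization, as a percentage of
        # the requested CPU, near this value.
        type: Utilization
        averageUtilization: CPU_PERCENT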
Check the current status of the horizontal pod autoscaler:
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    get hpa

The output is similar to the following:

NAME              REFERENCE                          TARGET     MINPODS   MAXPODS   REPLICAS   AGE
DEPLOYMENT_NAME   Deployment/DEPLOYMENT_NAME/scale   0% / 50%   1         10        1          18s

The TARGET column shows the current average CPU utilization against the configured target. For example, 0% / 50% means the pods are currently idle relative to a 50% utilization target.
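To observe the autoscaler update the replica count as load changes, you can add the standard --watch flag to the same command:

kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    get hpa --watch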
Manually scale the pods of a deployment
If you prefer to manually scale a deployment, run:
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    scale deployment DEPLOYMENT_NAME \
    --replicas NUMBER_OF_REPLICAS
Replace the following:
KUBERNETES_CLUSTER_KUBECONFIG: the kubeconfig file for the cluster.
NAMESPACE: the namespace. For shared clusters, this must be a project namespace. For standard clusters, it can be any namespace.
DEPLOYMENT_NAME: the name of the deployment to scale.
NUMBER_OF_REPLICAS: the number of replicated Pod objects in the deployment.
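For example, the following command scales a hypothetical deployment named web-frontend in the my-project namespace to three replicas; the deployment name, namespace, and kubeconfig path are placeholders:

kubectl --kubeconfig /path/to/kubeconfig \
    -n my-project \
    scale deployment web-frontend \
    --replicas 3

Note that manual scaling sets a fixed replica count. If a horizontal pod autoscaler also targets the same deployment, the autoscaler can later adjust the count again based on observed utilization.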