Identify underprovisioned and overprovisioned workloads

This document explains how to identify underprovisioned and overprovisioned workloads that run on Google Kubernetes Engine (GKE) clusters by using insights and recommendations. After you verify that the identified workloads would benefit from the recommendation to scale up or down, you can make the recommended change to save costs or increase the reliability of your workload. If possible, the recommendation includes projected monthly savings or cost. For more information, see Understand cost or savings estimates.

GKE provides these insights about workloads running on both Autopilot and Standard clusters. GKE also provides similar recommendations for entire clusters. For more information, see Identify underprovisioned and overprovisioned GKE clusters.

GKE monitors your clusters and delivers guidance to optimize your usage through Active Assist, a service that provides recommenders that generate insights and recommendations for using resources on Google Cloud. For more information about how to manage insights and recommendations, see Optimize your usage of GKE with insights and recommendations.

Get insights and recommendations for underprovisioned and overprovisioned workloads

GKE surfaces these insights and recommendations in the Google Cloud console after it observes the behavior described in How GKE identifies underprovisioned and overprovisioned workloads.

On the Workloads page, the recommendations have the following titles:

  • Overprovisioned workloads: "Decrease resource requests to reduce costs"
  • Underprovisioned workloads: "Increase resource requests to improve reliability"

You can also retrieve all types of insights and recommendations through the Google Cloud CLI or the Recommender API. To find these types specifically, follow the instructions to view insights and recommendations and filter by the WORKLOAD_UNDERPROVISIONED and WORKLOAD_OVERPROVISIONED subtypes.
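
For example, the following gcloud sketch lists overprovisioned-workload insights for one location. The insight type shown (google.container.DiagnosisInsight) and the location value are assumptions, so confirm the exact names in the instructions linked above.

```
# A minimal sketch, assuming the GKE insight type is google.container.DiagnosisInsight
# and the cluster's location is us-central1; replace PROJECT_ID with your project.
gcloud recommender insights list \
    --project=PROJECT_ID \
    --location=us-central1 \
    --insight-type=google.container.DiagnosisInsight \
    --filter="insightSubtype=WORKLOAD_OVERPROVISIONED" \
    --format="table(name.basename(), insightSubtype, severity)"
```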

After you identify underprovisioned or overprovisioned workloads, see the considerations when rightsizing workloads.

How GKE identifies underprovisioned and overprovisioned workloads

The following list describes the signals that GKE uses to identify underprovisioned and overprovisioned workloads that can be scaled up or down, and the threshold for each signal. It also shows the action that we recommend you take in each scenario.

  • WORKLOAD_UNDERPROVISIONED
    • Signal: CPU or memory usage is high
    • Observation period: Last 15 days
    • Details: A workload is underprovisioned when CPU or memory utilization is greater than 150% for at least 10% of the time over the last 15 days.
    • Recommendation: Scale up your workload to increase reliability.
  • WORKLOAD_OVERPROVISIONED
    • Signal: CPU or memory usage is low
    • Observation period: Last 15 days
    • Details: A workload is overprovisioned when CPU or memory utilization is less than 50% for at least 90% of the time over the last 15 days.
    • Recommendation: Scale down your workload to save costs.
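
One way to relate these thresholds to a running workload is to compare observed usage against the Pod's requests, assuming utilization here means usage relative to the requested values. The following sketch uses a hypothetical Pod named my-app-7d9c and assumes kubectl top is available through metrics-server.

```
# A minimal sketch, assuming a hypothetical Pod named my-app-7d9c and that
# kubectl top works in the cluster (metrics-server is available).
kubectl get pod my-app-7d9c \
    -o jsonpath='{range .spec.containers[*]}{.name}{": requests="}{.resources.requests}{"\n"}{end}'
kubectl top pod my-app-7d9c --containers
# Example reading: a container that requests 250m CPU but shows 400m of usage
# is at 160% CPU utilization, which is above the 150% threshold.
```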

GKE also uses the following guidelines to determine when to provide insights and recommendations:

  • GKE doesn't generate recommendations for a resource that is the target metric of horizontal Pod autoscaling (HPA), because changing requests for that resource could interfere with autoscaling.
  • If vertical Pod autoscaling (VPA) is enabled, the request values are automatically managed and GKE doesn't need to generate a recommendation.
  • GKE might wait up to three days before generating recommendations for new workloads.

Understand cost or savings estimates

If possible, GKE's recommendation includes an estimate of the projected monthly cost or savings from rightsizing the workload. This estimate is derived from the workload's costs, based on the weighted average of the request values combined with the workload's CPU and memory costs over the past 30 days.

Any estimated costs or savings are projections based on previous spending, and are not a guarantee of future cost or savings.
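
As a purely illustrative sketch of what such a projection represents, and not the formula that GKE uses, you can relate a request reduction to a monthly amount using your region's per-unit prices. All values below are placeholders; take real prices from the GKE pricing page.

```
# Illustrative only; this is not GKE's internal calculation. All values are
# placeholders; substitute your region's prices from the GKE pricing page.
CURRENT_CPU=1.0           # vCPU currently requested (hypothetical)
RECOMMENDED_CPU=0.5       # vCPU suggested by the recommendation (hypothetical)
CPU_PRICE_PER_HOUR=0.03   # placeholder price per vCPU-hour
HOURS_PER_MONTH=730
awk -v c="$CURRENT_CPU" -v r="$RECOMMENDED_CPU" \
    -v p="$CPU_PRICE_PER_HOUR" -v h="$HOURS_PER_MONTH" \
    'BEGIN { printf "Rough projected monthly CPU savings: %.2f\n", (c - r) * p * h }'
```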

To see these estimates, ensure that the following conditions are met:

  • You have the required billing.accounts.getSpendingInformation permission to get spending information. For more information, see Cloud Billing access.
  • GKE cost allocation is enabled for the cluster. For more information, see Enable GKE cost allocation.
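
For the second prerequisite, the following sketch shows how GKE cost allocation can be enabled on an existing cluster with gcloud; the cluster name and location are placeholders, and the Enable GKE cost allocation page linked above has the full details.

```
# A minimal sketch: enable GKE cost allocation on an existing cluster.
# CLUSTER_NAME and the location are placeholders for your own values.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1 \
    --enable-cost-allocation
```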

For more information about the cost of all of your GKE clusters, including a more granular breakdown based on namespaces and workloads, see Get key spending insights for your GKE resource allocation and cluster costs.

For more information about the costs of running a GKE cluster, see GKE pricing.

Considerations when rightsizing workloads

Before you follow a recommendation to scale up or down a workload, consider the following:

  • Review the resource utilization of the workload to see how it's performing, and if it's using more or less CPU and memory than expected. For instructions, see Analyze resource requests.
  • Batch processing workloads might intentionally maintain high utilization for cost efficiency. If the allocated resources are sufficient for the batch jobs, you don't need to scale up a highly utilized workload even if it was identified as underprovisioned.
  • GKE has limited visibility into the actual memory usage of Java Virtual Machine (JVM)-based workloads. Apply extra scrutiny before acting on recommendations for these types of workloads.

Implement the recommendation to rightsize a workload

You can adjust the size of a workload to better match the workload's resource utilization by doing either of the following:

  • Enable vertical Pod autoscaling for the workload. For more information, see Set Pod resource requests automatically. A sample VerticalPodAutoscaler manifest follows this list.
  • Change the requests and limits manually according to the recommendation:

    • Underprovisioned workload: increase the resource requests and limits for the workload. This helps ensure that your workload remains reliable because it has the appropriate amount of resources for its applications.
    • Overprovisioned workload: decrease the resource requests and limits for the workload, and adjust cluster CPU and memory allocations to match your workload needs. This helps ensure that you use only the resources that you need to run your workload. A sketch of a manual change with kubectl set resources follows this list.
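
For the first option, the following is a minimal sketch of a VerticalPodAutoscaler that manages requests for a Deployment; the names my-app and my-app-vpa are placeholders, and Set Pod resource requests automatically covers the full configuration.

```
# A minimal sketch: create a VerticalPodAutoscaler for a hypothetical
# Deployment named my-app so that GKE manages its resource requests.
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
EOF
```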
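
For the second option, you can update the resources section of the workload manifest, or use kubectl set resources as in the following sketch; the Deployment name, container name, and values are placeholders, so use the values from the recommendation instead.

```
# A minimal sketch: manually set new requests and limits on a hypothetical
# Deployment named my-app. Replace the values with those from the recommendation.
kubectl set resources deployment my-app \
    --containers=my-container \
    --requests=cpu=500m,memory=512Mi \
    --limits=cpu=1,memory=1Gi
```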

What's next