This tutorial uses Kueue to show you how to implement a Job queueing system, configure workload resource and quota sharing between different namespaces on Google Kubernetes Engine (GKE), and maximize the utilization of your cluster.
Background
As an infrastructure engineer or cluster administrator, you need to maximize resource utilization across namespaces. A batch of Jobs in one namespace might not use the full quota assigned to that namespace, while another namespace might have multiple pending Jobs. To use cluster resources efficiently among Jobs in different namespaces and to make quota management more flexible, you can configure cohorts in Kueue. A cohort is a group of ClusterQueues that can borrow unused quota from one another. A ClusterQueue governs a pool of resources such as CPU, memory, and hardware accelerators.
You can find a more detailed definition of all these concepts in the Kueue documentation.
Create the ResourceFlavors
A ResourceFlavor represents resource variations in your cluster nodes, such as different VMs (for example, Spot versus on-demand), architectures (for example, x86 versus ARM CPUs), and brands and models (for example, Nvidia A100 versus T4 GPUs).
ResourceFlavors use node labels and taints to match with a set of nodes in the cluster.
In this manifest:
- The ResourceFlavor on-demand has its label set to cloud.google.com/gke-provisioning: standard.
- The ResourceFlavor spot has its label set to cloud.google.com/gke-provisioning: spot.
When a workload is assigned a ResourceFlavor, Kueue assigns the Pods of the workload to nodes that match the node labels defined for the ResourceFlavor.
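The flavors.yaml manifest is not reproduced in this section. As a rough sketch, assuming the kueue.x-k8s.io/v1beta1 API version, two ResourceFlavors with the labels described above could be declared as follows. The file in the sample repository may differ in detail.

```yaml
# Sketch only: the API version is an assumption; names and labels come from the text above.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: on-demand
spec:
  nodeLabels:
    # Matches nodes that are not Spot VMs.
    cloud.google.com/gke-provisioning: standard
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot
spec:
  nodeLabels:
    # Matches nodes backed by Spot VMs.
    cloud.google.com/gke-provisioning: spot
```

ResourceFlavors are cluster-scoped, so neither object carries a namespace.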
Deploy the ResourceFlavor:
kubectl apply -f flavors.yaml
Create the ClusterQueue and LocalQueue
Create two ClusterQueues, cq-team-a and cq-team-b, and their corresponding LocalQueues, lq-team-a and lq-team-b, namespaced to team-a and team-b respectively.
ClusterQueues are cluster-scoped objects that govern a pool of resources such as CPU, memory, and hardware accelerators. Batch administrators can restrict the visibility of these objects to batch users.
LocalQueues are namespaced objects that batch users can list. They point to ClusterQueues, from which resources are allocated to run the LocalQueue workloads.
ClusterQueues allow resources to have multiple flavors. In this case, both ClusterQueues have two flavors, on-demand and spot, each providing cpu resources. The quota of the ResourceFlavor spot is set to 0, so it is not used for now.
Both ClusterQueues share the same cohort called all-teams, defined in .spec.cohort. When two or more ClusterQueues share the same cohort, they can borrow unused quota from each other.
You can learn more about how cohorts work and the borrowing semantics in the Kueue documentation.
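The cq-team-a.yaml manifest is not shown in this section. The following is a minimal sketch of what it could contain, assuming the kueue.x-k8s.io/v1beta1 API version. The quota values (10 nominal plus 5 borrowable) are inferred from the kubectl describe output later in this tutorial, and the namespaceSelector is an assumption; the file in the sample repository may differ.

```yaml
# Sketch only: quota values and the namespace selector are assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-team-a
spec:
  cohort: all-teams            # shared with cq-team-b, which enables borrowing
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a   # only team-a submits to this queue
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: on-demand
      resources:
      - name: "cpu"
        nominalQuota: 10       # guaranteed quota
        borrowingLimit: 5      # maximum borrowed from the cohort
      - name: "memory"
        nominalQuota: 10Gi
        borrowingLimit: 5Gi
    - name: spot               # no quota of its own for now
      resources:
      - name: "cpu"
        nominalQuota: 0
      - name: "memory"
        nominalQuota: 0
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: lq-team-a
spec:
  clusterQueue: cq-team-a      # the ClusterQueue this LocalQueue draws from
```

A cq-team-b.yaml sketch would be identical apart from the team-b names.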
Deploy the ClusterQueues and LocalQueues:
kubectl apply -f cq-team-a.yaml
kubectl apply -f cq-team-b.yaml
(Optional) Monitor Workloads using kube-prometheus
You can use Prometheus to monitor your active and pending Kueue workloads.
To monitor the workloads being brought up and observe the load on each
ClusterQueue, deploy kube-prometheus to the 
cluster under the namespace monitoring:
- Download the source code for Prometheus operator:

      cd
      git clone https://github.com/prometheus-operator/kube-prometheus.git

- Create the CustomResourceDefinitions (CRDs):

      kubectl create -f kube-prometheus/manifests/setup

- Create the monitoring components:

      kubectl create -f kube-prometheus/manifests

- Allow the prometheus-operator to scrape metrics from Kueue components:

      kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml

- Change to the working directory:

      cd kubernetes-engine-samples/batch/kueue-cohort

- Set up port forwarding to the Prometheus service running in your GKE cluster:

      kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
- Open the Prometheus web UI on localhost:9090 in the browser. In Cloud Shell:
  - Click Web Preview.
  - Click Change port and set the port number to 9090.
  - Click Change and Preview.

  The Prometheus web UI appears.
- In the Expression query box, enter the following query to create the first panel, which monitors the active workloads for the cq-team-a ClusterQueue:

      kueue_pending_workloads{cluster_queue="cq-team-a", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-a"}

- Click Add panel.
- In the Expression query box, enter the following query to create another panel, which monitors the active workloads for the cq-team-b ClusterQueue:

      kueue_pending_workloads{cluster_queue="cq-team-b", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-b"}

- Click Add panel.
- In the Expression query box, enter the following query to create a panel that monitors the number of nodes in the cluster:

      count(kube_node_info)
(Optional) Monitor Workloads using Google Cloud Managed Service for Prometheus
You can use Google Cloud Managed Service for Prometheus to monitor your active and pending Kueue workloads. A full list of metrics can be found in the Kueue documentation.
- Set up identity and RBAC for metrics access:

  The following configuration creates four Kubernetes resources that give the Google Cloud Managed Service for Prometheus collectors access to the Kueue metrics:

  - A ServiceAccount named kueue-metrics-reader in the kueue-system namespace, used to authenticate when accessing the Kueue metrics.
  - A Secret associated with the kueue-metrics-reader service account. It stores the authentication token that the collector uses to authenticate with the metrics endpoint exposed by the Kueue deployment.
  - A Role named kueue-secret-reader in the kueue-system namespace, which allows reading the Secret that contains the service account token.
  - A ClusterRoleBinding that grants the kueue-metrics-reader service account the kueue-metrics-reader ClusterRole.

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: kueue-metrics-reader
        namespace: kueue-system
      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: kueue-metrics-reader-token
        namespace: kueue-system
        annotations:
          kubernetes.io/service-account.name: kueue-metrics-reader
      type: kubernetes.io/service-account-token
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: kueue-secret-reader
        namespace: kueue-system
      rules:
      - resources:
        - secrets
        apiGroups: [""]
        verbs: ["get", "list", "watch"]
        resourceNames: ["kueue-metrics-reader-token"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: kueue-metrics-reader
      subjects:
      - kind: ServiceAccount
        name: kueue-metrics-reader
        namespace: kueue-system
      roleRef:
        kind: ClusterRole
        name: kueue-metrics-reader
        apiGroup: rbac.authorization.k8s.io
- Configure the RoleBinding for Google Cloud Managed Service for Prometheus:

  Depending on whether you use an Autopilot or a Standard cluster, the RoleBinding must reference the collector service account in either the gke-gmp-system or the gmp-system namespace. This resource allows the collector service account to access the kueue-metrics-reader-token Secret to authenticate and scrape the Kueue metrics.

  Autopilot:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: gmp-system:collector:kueue-secret-reader
        namespace: kueue-system
      roleRef:
        name: kueue-secret-reader
        kind: Role
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - name: collector
        namespace: gke-gmp-system
        kind: ServiceAccount

  Standard:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: gmp-system:collector:kueue-secret-reader
        namespace: kueue-system
      roleRef:
        name: kueue-secret-reader
        kind: Role
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - name: collector
        namespace: gmp-system
        kind: ServiceAccount
- Configure the PodMonitoring resource:

  The following resource configures monitoring for the Kueue deployment. It specifies that metrics are exposed on the /metrics path over HTTPS, and it uses the kueue-metrics-reader-token Secret for authentication when scraping the metrics.

      apiVersion: monitoring.googleapis.com/v1
      kind: PodMonitoring
      metadata:
        name: kueue
        namespace: kueue-system
      spec:
        selector:
          matchLabels:
            control-plane: controller-manager
        endpoints:
        - port: 8443
          interval: 30s
          path: /metrics
          scheme: https
          tls:
            insecureSkipVerify: true
          authorization:
            type: Bearer
            credentials:
              secret:
                name: kueue-metrics-reader-token
                key: token
Query exported metrics
Sample PromQL queries for monitoring Kueue-based systems
These PromQL queries allow you to monitor key Kueue metrics such as job throughput, resource utilization by queue, and workload wait times to understand system performance and identify potential bottlenecks.
Job Throughput
This calculates the per-second rate of admitted workloads over 5 minutes for each cluster_queue. Breaking the rate down by queue helps pinpoint bottlenecks, and summing it across queues provides the overall system throughput.
Query:
sum(rate(kueue_admitted_workloads_total[5m])) by (cluster_queue)
Resource Utilization
This assumes metrics.enableClusterQueueResources is enabled. It calculates the
ratio of current CPU usage to the nominal CPU quota for each queue. A value
close to 1 indicates high utilization. You can adapt this for memory or other
resources by changing the resource label.
To install a custom-configured released version of Kueue in your cluster, follow the Kueue documentation.
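As a hedged illustration of the setting mentioned above, the relevant part of the Kueue controller configuration might look like the following fragment. It assumes the config.kueue.x-k8s.io/v1beta1 Configuration kind; check the Kueue documentation for the version you install before relying on it.

```yaml
# Sketch only: a fragment of the Kueue controller configuration, not a complete file.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
metrics:
  # Exposes per-ClusterQueue resource usage and quota metrics,
  # which the utilization query in this section relies on.
  enableClusterQueueResources: true
```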
Query:
sum(kueue_cluster_queue_resource_usage{resource="cpu"}) by (cluster_queue) / sum(kueue_cluster_queue_nominal_quota{resource="cpu"}) by (cluster_queue)
Queue Wait Times
This provides the 90th percentile wait time for workloads in a specific queue. You can modify the quantile value (for example, 0.5 for the median or 0.99 for the 99th percentile) to understand the wait time distribution.
Query:
histogram_quantile(0.9, kueue_admission_wait_time_seconds_bucket{cluster_queue="QUEUE_NAME"})
Create Jobs and observe the admitted workloads
In this section, you create Kubernetes Jobs under the namespace team-a and team-b. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.
Generate Jobs for both ClusterQueues. Each Job runs three Pods in parallel that sleep for 10 seconds, and it completes after three successful completions. Each Job is cleaned up 60 seconds after it finishes.
job-team-a.yaml creates Jobs under the namespace team-a and points to
the LocalQueue lq-team-a and the ClusterQueue cq-team-a.
Similarly, job-team-b.yaml creates Jobs under team-b namespace, and points
to the LocalQueue lq-team-b and the ClusterQueue cq-team-b.
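The Job manifests are not shown in this section. The following sketch illustrates what job-team-a.yaml could look like for the behavior described above. The container image, command, resource requests, and the generateName prefix are assumptions made for illustration; the file in the sample repository may differ.

```yaml
# Sketch only: image, command, requests, and name prefix are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a
  generateName: sample-job-team-a-          # hypothetical prefix; a unique suffix is appended
  labels:
    kueue.x-k8s.io/queue-name: lq-team-a    # routes the Job to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60   # clean up 60 seconds after the Job finishes
  parallelism: 3                # three Pods run at the same time
  completions: 3                # the Job is done after three successful Pods
  suspend: true                 # Kueue unsuspends the Job when it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sleeper
        image: busybox          # assumed image
        command: ["sleep", "10"]
        resources:
          requests:             # assumed values, counted against the queue's quota
            cpu: "500m"
            memory: "512Mi"
```

Because the sketch uses generateName, it would have to be submitted with kubectl create -f rather than kubectl apply -f.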
- Start a new terminal and run this script to generate a Job every second:

      ./create_jobs.sh job-team-a.yaml 1

- Start another terminal and create Jobs for the team-b namespace:

      ./create_jobs.sh job-team-b.yaml 1

- Observe the Jobs being queued up in Prometheus, or with this command:

      watch -n 2 kubectl get clusterqueues -o wide

The output should be similar to the following:
    NAME        COHORT      STRATEGY         PENDING WORKLOADS   ADMITTED WORKLOADS
    cq-team-a   all-teams   BestEffortFIFO   0                   5
    cq-team-b   all-teams   BestEffortFIFO   0                   4
Borrow unused quota with cohorts
ClusterQueues might not be at full capacity at all times. Quota usage is not maximized when workloads are not spread evenly among ClusterQueues. If ClusterQueues share the same cohort, they can borrow quota from one another to maximize quota utilization.
- Once there are Jobs queued up for both ClusterQueues cq-team-a and cq-team-b, stop the script for the team-b namespace by pressing CTRL+c in the corresponding terminal.
- Once all the pending Jobs from the namespace team-b are processed, the Jobs from the namespace team-a can borrow the available resources in cq-team-b:

      kubectl describe clusterqueue cq-team-a

  Because cq-team-a and cq-team-b share the same cohort called all-teams, these ClusterQueues can share resources that are not being used.

      Flavors Usage:
        Name:  on-demand
        Resources:
          Borrowed:  5
          Name:      cpu
          Total:     15
          Borrowed:  5Gi
          Name:      memory
          Total:     15Gi

- Resume the script for the team-b namespace:

      ./create_jobs.sh job-team-b.yaml 3

  Observe how the resources borrowed by cq-team-a go back to 0, while the resources from cq-team-b are used for its own workloads:

      kubectl describe clusterqueue cq-team-a

      Flavors Usage:
        Name:  on-demand
        Resources:
          Borrowed:  0
          Name:      cpu
          Total:     9
          Borrowed:  0
          Name:      memory
          Total:     9Gi
Increase quota with Spot VMs
When quota needs to be temporarily increased, for example to meet high demand in pending workloads, you can configure Kueue to accommodate the demand by adding more ClusterQueues to the cohort. ClusterQueues with unused resources can share those resources with other ClusterQueues that belong to the same cohort.
At the beginning of the tutorial, you created a node pool named spot using Spot VMs and a ResourceFlavor named spot with the label set to cloud.google.com/gke-provisioning: spot. Create a ClusterQueue to use this node pool and the ResourceFlavor that represents it:
- Create a new ClusterQueue called cq-spot with cohort set to all-teams:

  Because this ClusterQueue shares the same cohort with cq-team-a and cq-team-b, both cq-team-a and cq-team-b can borrow resources up to 15 CPU requests and 15 Gi of memory.

      kubectl apply -f cq-spot.yaml

- In Prometheus, observe how the admitted workloads spike for both cq-team-a and cq-team-b thanks to the quota added by cq-spot, which shares the same cohort. Or use this command:

      watch -n 2 kubectl get clusterqueues -o wide

- In Prometheus, observe the number of nodes in the cluster. Or use this command:

      watch -n 2 kubectl get nodes -o wide

- Stop both scripts by pressing CTRL+c in the terminals for the team-a and team-b namespaces.
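For reference, the cq-spot.yaml applied in the first step of this section is not reproduced here. A minimal sketch of such a ClusterQueue, assuming the kueue.x-k8s.io/v1beta1 API version, might look like the following. The quota values are placeholders chosen to match the 15 CPU and 15 Gi borrowing figure mentioned above, not values taken from the sample repository.

```yaml
# Sketch only: quota values are illustrative assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-spot
spec:
  cohort: all-teams       # same cohort as cq-team-a and cq-team-b
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: spot          # backed by the Spot VM node pool
      resources:
      - name: "cpu"
        nominalQuota: 15      # assumed; lendable to the other queues in the cohort
      - name: "memory"
        nominalQuota: 15Gi    # assumed
```

In this sketch no LocalQueue points at cq-spot, so its quota is only consumed through borrowing by the other ClusterQueues in the cohort.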