
Implement a Job queueing system with quota sharing between namespaces on GKE

Standard

This tutorial shows you how to use Kueue to implement a Job queueing system, configure workload resources and quota sharing between different namespaces on Google Kubernetes Engine (GKE), and maximize the utilization of your cluster.

Background

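For an infrastructure engineer or cluster administrator, maximizing utilization across namespaces is very important. A batch of Jobs in one namespace might not fully use the quota assigned to that namespace, while another namespace has several pending Jobs. To use cluster resources efficiently among Jobs in different namespaces and to manage quota flexibly, you can configure cohorts in Kueue. A cohort is a group of ClusterQueues that can borrow unused quota from each other. A ClusterQueue governs a pool of resources such as CPU, memory, and hardware accelerators.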

You can find more detailed definitions of all of these concepts in the Kueue documentation.

Objectives

This tutorial is for infrastructure engineers or cluster administrators who want to implement a Job queueing system on Kubernetes using Kueue with quota sharing.

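This tutorial uses two teams in two different namespaces as an example. Each team has its own dedicated resources, but the teams can borrow resources from each other. A third set of resources can be used as spillover when Jobs accumulate.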

You can use the Prometheus operator to monitor Jobs and resource allocation across the different namespaces.

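This tutorial covers the following steps:

- Create a GKE cluster
- Create the ResourceFlavors
- For each team, create a ClusterQueue and a LocalQueue
- Create Jobs and observe the admitted workloads
- Borrow unused quota with cohorts
- Add a spillover ClusterQueue that governs Spot VMs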

Costs

This tutorial uses the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

Set up your project

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how Google products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

Roles required to create a project

To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the GKE API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API


Set defaults for the Google Cloud CLI

In the Google Cloud console, start a Cloud Shell instance:
Open Cloud Shell

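Download the source code for this sample app and change to the working directory that contains the manifests used in this tutorial:

```
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/batch/kueue-cohort
```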

Set the default environment variables:
```
gcloud config set project PROJECT_ID
gcloud config set compute/region COMPUTE_REGION
```
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- COMPUTE_REGION: the Compute Engine region.

Create a GKE cluster

Create a GKE cluster named kueue-cohort:

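You create a cluster with six nodes (two per zone) in the default pool and no autoscaling. At first, these are all the resources available to the teams, so the teams have to compete for them.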

Later, you will see how Kueue manages the workloads that both teams send to their respective queues.
```
  gcloud container clusters create kueue-cohort --location COMPUTE_REGION \
  --release-channel rapid --machine-type e2-standard-4 --num-nodes 2
```
Note: This step can take up to five minutes to complete.

After the cluster is created, the output is similar to the following:
```
  kubeconfig entry generated for kueue-cohort.
  NAME: kueue-cohort
  LOCATION: us-central1
  MASTER_VERSION: 1.26.2-gke.1000
  MASTER_IP: 35.224.108.58
  MACHINE_TYPE: e2-standard-4
  NODE_VERSION: 1.26.2-gke.1000
  NUM_NODES: 6
  STATUS: RUNNING
```
Here, the STATUS of kueue-cohort is RUNNING.
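As a rough check on the NUM_NODES value, assume that GKE spreads the regional default pool across its default of three zones (for a regional cluster, --num-nodes is a per-zone count) and that an e2-standard-4 machine has 4 vCPUs and 16 GB of memory. The raw capacity of the default pool then works out as follows; allocatable capacity is somewhat lower, because GKE reserves part of each node for system components.

$$
N_{\text{nodes}} = 3\ \text{zones} \times 2\ \frac{\text{nodes}}{\text{zone}} = 6,\qquad
\text{vCPU} = 6 \times 4 = 24,\qquad
\text{memory} = 6 \times 16\ \text{GB} = 96\ \text{GB}
$$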
Create a node pool named spot:

This node pool uses Spot VMs and has autoscaling enabled. It starts with zero nodes, but later you make it available to the teams as spillover capacity.
```
gcloud container node-pools create spot --cluster=kueue-cohort --location COMPUTE_REGION  \
--spot --enable-autoscaling --max-nodes 20 --num-nodes 0 \
--machine-type e2-standard-4
```
Install a Kueue release on the cluster:
```
VERSION=VERSION
kubectl apply -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```
Replace VERSION with the latest Kueue version, including the leading letter v, for example v0.4.0. For more information about Kueue versions, see Kueue releases.

Note: The samples in the repository require Kueue v0.3.0 or later.

Wait until the Kueue controller is ready:
```
watch kubectl -n kueue-system get pods
```
Before you continue, verify that the output is similar to the following:
```
NAME                                        READY   STATUS    RESTARTS   AGE
kueue-controller-manager-6cfcbb5dc5-rsf8k   2/2     Running   0          3m
```
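Instead of watching the Pod list by hand, a script can block until the controller is ready by using kubectl wait. This is a minimal sketch, not part of the sample repository: the function name wait_for_kueue is hypothetical, and it assumes the controller Deployment is named kueue-controller-manager, which matches the Pod name prefix in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the Kueue controller Deployment reports
# the Available condition, or fail after the timeout (default 300s).
wait_for_kueue() {
  local timeout="${1:-300s}"   # how long kubectl waits before giving up
  kubectl --namespace kueue-system wait \
    --for=condition=Available \
    deployment/kueue-controller-manager \
    --timeout="$timeout"
}
```

For example, run wait_for_kueue 5m right after applying the Kueue manifests; it returns a non-zero exit status if the controller is not available in time, which makes it easier to chain with the following steps than an interactive watch.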
Create two new namespaces named team-a and team-b:
```
kubectl create namespace team-a
kubectl create namespace team-b
```
Later, Jobs are created in each of these namespaces.

Create the ResourceFlavors

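A ResourceFlavor represents resource variations in your cluster nodes, such as different VM provisioning models (for example, Spot versus on-demand), architectures (for example, x86 versus ARM CPUs), and brands and models (for example, NVIDIA A100 versus T4 GPUs).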

ResourceFlavors use node labels and taints to match a set of nodes in the cluster.

```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: on-demand # This ResourceFlavor will be used for the CPU resource
spec:
  nodeLabels:
    cloud.google.com/gke-provisioning: standard # This label was applied automatically by GKE
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot # This ResourceFlavor will be used as added resource for the CPU resource
spec:
  nodeLabels:
    cloud.google.com/gke-provisioning: spot # This label was applied automatically by GKE
```

In this manifest:

- The ResourceFlavor on-demand has its label set to cloud.google.com/gke-provisioning: standard.
- The ResourceFlavor spot has its label set to cloud.google.com/gke-provisioning: spot.

When a workload is assigned a ResourceFlavor, Kueue assigns the Pods of the workload to nodes that match the node labels defined for that ResourceFlavor.
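To see which nodes a flavor would target, you can select nodes by the same label that the flavor declares. The helper below is a hypothetical sketch, not part of the sample repository; it assumes the cloud.google.com/gke-provisioning values standard and spot used in flavors.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the nodes that carry the
# cloud.google.com/gke-provisioning label value used by a ResourceFlavor.
# Pass "standard" for the on-demand flavor or "spot" for the spot flavor.
nodes_for_flavor() {
  local provisioning="$1"
  kubectl get nodes \
    --selector="cloud.google.com/gke-provisioning=${provisioning}" \
    --output=name
}
```

At this point, nodes_for_flavor standard should list the default pool nodes, while nodes_for_flavor spot returns nothing, because the spot node pool starts with zero nodes and only grows when the autoscaler adds capacity.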

Deploy the ResourceFlavors:

```
kubectl apply -f flavors.yaml
```

Create the ClusterQueue and LocalQueue

Create two ClusterQueues, cq-team-a and cq-team-b, and their corresponding LocalQueues, lq-team-a and lq-team-b, namespaced to team-a and team-b respectively.

ClusterQueues are cluster-scoped objects that govern a pool of resources such as CPU, memory, and hardware accelerators. Batch administrators can restrict the visibility of these objects to batch users.

LocalQueues are namespaced objects that batch users can list. Each one points to a ClusterQueue, from which resources are allocated to run the LocalQueue's workloads.

```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-team-a
spec:
  cohort: all-teams # cq-team-a and cq-team-b share the same cohort
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a # Only team-a can submit jobs directly to this queue, but will be able to share it through the cohort
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: on-demand
      resources:
      - name: "cpu"
        nominalQuota: 10
        borrowingLimit: 5
      - name: "memory"
        nominalQuota: 10Gi
        borrowingLimit: 15Gi
    - name: spot # This ClusterQueue doesn't have nominalQuota for spot, but it can borrow from others
      resources:
      - name: "cpu"
        nominalQuota: 0
      - name: "memory"
        nominalQuota: 0
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a # LocalQueue under team-a namespace
  name: lq-team-a
spec:
  clusterQueue: cq-team-a # Point to the ClusterQueue cq-team-a
```

ClusterQueues allow a resource to have multiple flavors. In this case, both ClusterQueues have two flavors, on-demand and spot, each providing cpu and memory resources. The quota of the spot ResourceFlavor is set to 0 and is not used for now.

Both ClusterQueues share the same cohort, all-teams, defined in .spec.cohort. When two or more ClusterQueues share the same cohort, they can borrow unused quota from each other.
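Applied to the cq-team-a manifest, and assuming that borrowingLimit caps how much a ClusterQueue can borrow from its cohort on top of its own nominalQuota, the most on-demand capacity cq-team-a can use is sketched below. The borrowed part is only available while other ClusterQueues in all-teams leave that quota unused.

$$
\text{CPU}_{\max} = \underbrace{10}_{\text{nominalQuota}} + \underbrace{5}_{\text{borrowingLimit}} = 15,\qquad
\text{memory}_{\max} = 10\,\text{Gi} + 15\,\text{Gi} = 25\,\text{Gi}
$$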

To learn more about how cohorts work and the borrowing semantics, see the Kueue documentation.

Deploy the ClusterQueues and LocalQueues:

```
kubectl apply -f cq-team-a.yaml
kubectl apply -f cq-team-b.yaml
```
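To confirm that each LocalQueue forwards to the intended ClusterQueue, you can read the spec.clusterQueue field back from the cluster. The two functions below are hypothetical helpers, not part of the sample repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ClusterQueue that a LocalQueue points to,
# by reading .spec.clusterQueue from the LocalQueue object.
lq_target() {
  local namespace="$1" localqueue="$2"
  kubectl --namespace "$namespace" get localqueue "$localqueue" \
    --output=jsonpath='{.spec.clusterQueue}'
}

# Hypothetical helper: fail unless the LocalQueue points at the expected
# ClusterQueue, and report the mismatch on stderr.
check_lq_target() {
  local namespace="$1" localqueue="$2" expected="$3"
  local actual
  actual="$(lq_target "$namespace" "$localqueue")" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "$namespace/$localqueue points to '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For example: check_lq_target team-a lq-team-a cq-team-a && check_lq_target team-b lq-team-b cq-team-b.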

(Optional) Monitor workloads using kube-prometheus

You can use Prometheus to monitor your active and pending Kueue workloads. To monitor the workloads as they are brought up and observe the load on each ClusterQueue, deploy kube-prometheus to the cluster under the namespace monitoring:

Download the source code for the Prometheus operator:

```
cd
git clone https://github.com/prometheus-operator/kube-prometheus.git
```

Create the CustomResourceDefinitions (CRDs):

```
kubectl create -f kube-prometheus/manifests/setup
```

Create the monitoring components:

```
kubectl create -f kube-prometheus/manifests
```

Allow the prometheus-operator to scrape metrics from the Kueue components:

```
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml
```

Change to the working directory:

```
cd kubernetes-engine-samples/batch/kueue-cohort
```

Set up port forwarding to the Prometheus service running in your GKE cluster:
```
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
```
Open the Prometheus web UI at localhost:9090 in your browser.

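If you are using Cloud Shell, do the following instead:
1. Click Web Preview.
2. Click Change port and set the port number to 9090.
3. Click Change and Preview.
The Prometheus web UI appears.
In the Expression query box, enter the following query to create the first panel, which monitors the active workloads for the cq-team-a ClusterQueue: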
```
kueue_pending_workloads{cluster_queue="cq-team-a", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-a"}
```
Click Add panel.
In the Expression query box, enter the following query to create another panel, which monitors the active workloads for the cq-team-b ClusterQueue:
```
kueue_pending_workloads{cluster_queue="cq-team-b", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-b"}
```
Click Add panel.
In the Expression query box, enter the following query to create a panel that monitors the number of nodes in the cluster:
```
count(kube_node_info)
```
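The same queries can also be run from a script through the Prometheus HTTP API (the /api/v1/query endpoint), which returns JSON instead of a chart. This is a minimal sketch: it assumes the port-forward above is still running, so Prometheus is reachable at localhost:9090, and the PROM_URL variable and prom_query function are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query against the Prometheus
# HTTP API. PROM_URL defaults to the local port-forward address.
PROM_URL="${PROM_URL:-http://localhost:9090}"

prom_query() {
  local promql="$1"
  # --get sends the URL-encoded data as query-string parameters, so the
  # PromQL expression does not need manual escaping.
  curl --silent --show-error --fail --get \
    "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=${promql}"
}
```

For example, prom_query 'count(kube_node_info)' returns the current node count in the data.result field of the JSON response.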

(Optional) Monitor workloads using Google Cloud Managed Service for Prometheus

You can use Google Cloud Managed Service for Prometheus to monitor your active and pending Kueue workloads. The full list of metrics is available in the Kueue documentation.

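Set up identity and RBAC for metrics access. Save each manifest in this section to a file and apply it to the cluster with kubectl apply -f.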

The following configuration creates four Kubernetes resources that give the Google Cloud Managed Service for Prometheus collectors access to the metrics:

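- A ServiceAccount named kueue-metrics-reader in the kueue-system namespace, used to authenticate when accessing the Kueue metrics.
- A Secret associated with the kueue-metrics-reader service account, which stores the authentication token that the collector uses to authenticate with the metrics endpoint exposed by the Kueue deployment.
- A Role named kueue-secret-reader in the kueue-system namespace, which allows reading the Secret that contains the service account token.
- A ClusterRoleBinding that grants the kueue-metrics-reader ClusterRole to the kueue-metrics-reader service account.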

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kueue-metrics-reader
  namespace: kueue-system
---
apiVersion: v1
kind: Secret
metadata:
  name: kueue-metrics-reader-token
  namespace: kueue-system
  annotations:
    kubernetes.io/service-account.name: kueue-metrics-reader
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kueue-secret-reader
  namespace: kueue-system
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
  resourceNames: ["kueue-metrics-reader-token"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kueue-metrics-reader
subjects:
- kind: ServiceAccount
  name: kueue-metrics-reader
  namespace: kueue-system
roleRef:
  kind: ClusterRole
  name: kueue-metrics-reader
  apiGroup: rbac.authorization.k8s.io
```

Configure the RoleBinding for Google Cloud Managed Service for Prometheus.

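Depending on whether you use an Autopilot or a Standard cluster, the collector service account lives in the gke-gmp-system or the gmp-system namespace. Create a RoleBinding in the kueue-system namespace that binds the kueue-secret-reader Role to that collector service account. This lets the collector service account access the kueue-metrics-reader-token Secret so that it can authenticate and scrape the Kueue metrics.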

Autopilot

```
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gmp-system:collector:kueue-secret-reader
  namespace: kueue-system
roleRef:
  name: kueue-secret-reader
  kind: Role
  apiGroup: rbac.authorization.k8s.io
subjects:
- name: collector
  namespace: gke-gmp-system
  kind: ServiceAccount
```

Standard

```
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gmp-system:collector:kueue-secret-reader
  namespace: kueue-system
roleRef:
  name: kueue-secret-reader
  kind: Role
  apiGroup: rbac.authorization.k8s.io
subjects:
- name: collector
  namespace: gmp-system
  kind: ServiceAccount
```
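To check that the binding works, you can ask the API server whether the collector service account may read the token Secret, by impersonating it with kubectl auth can-i. This is a hypothetical helper, not part of the tutorial's manifests; it assumes your own account is allowed to impersonate service accounts, which is typical for cluster administrators.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask whether the Managed Service for Prometheus
# collector may get the kueue-metrics-reader-token Secret.
# Pass gke-gmp-system for Autopilot or gmp-system (default) for Standard.
collector_can_read_token() {
  local collector_namespace="${1:-gmp-system}"
  kubectl auth can-i get secrets/kueue-metrics-reader-token \
    --namespace kueue-system \
    --as="system:serviceaccount:${collector_namespace}:collector"
}
```

kubectl auth can-i prints yes or no and exits with status 0 or 1 respectively, so collector_can_read_token gke-gmp-system on Autopilot, or collector_can_read_token on Standard, can be used directly in a script condition.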

Configure the PodMonitoring resource.

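The following resource configures monitoring for the Kueue deployment. It specifies that metrics are exposed on the /metrics path over HTTPS, and it uses the kueue-metrics-reader-token Secret for authentication when scraping the metrics.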

```
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kueue
  namespace: kueue-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
  - port: 8443
    interval: 30s
    path: /metrics
    scheme: https
    tls:
      insecureSkipVerify: true
    authorization:
      type: Bearer
      credentials:
        secret:
          name: kueue-metrics-reader-token
          key: token
```

Query the exported metrics

Sample PromQL queries for monitoring Kueue-based systems

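These PromQL queries let you monitor key Kueue metrics, such as Job throughput, resource utilization by queue, and workload wait times, so that you can understand system performance and identify potential bottlenecks.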

Job throughput

This query calculates the per-second rate of admitted workloads over 5 minutes for each cluster_queue. Breaking the rate down by queue helps pinpoint bottlenecks, and summing it gives the overall system throughput.

Query:

```
sum(rate(kueue_admitted_workloads_total[5m])) by (cluster_queue)
```

Resource utilization

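This query assumes that metrics.enableClusterQueueResources is enabled. It calculates the ratio of current CPU usage to the nominal CPU quota for each queue. A value close to 1 indicates high utilization. You can adapt it for memory or other resources by changing the resource label.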

To install a Kueue release with a custom configuration in your cluster (for example, with this option enabled), follow the Kueue documentation.

Query:

```
sum(kueue_cluster_queue_resource_usage{resource="cpu"}) by (cluster_queue) / sum(kueue_cluster_queue_nominal_quota{resource="cpu"}) by (cluster_queue)
```
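Because the usage metric counts everything a ClusterQueue is using, including borrowed capacity, while the nominal-quota metric does not include borrowing, this ratio can rise above 1 while a queue borrows from its cohort. As an illustration, assuming the metrics report the same values as the kubectl describe output in the borrowing step of this tutorial (cq-team-a using 15 CPUs, with a nominal quota of 10 on the on-demand flavor and 0 on spot):

$$
\frac{\text{usage}_{\text{cpu}}}{\text{nominalQuota}_{\text{cpu}}} = \frac{15}{10 + 0} = 1.5
$$

A value above 1 therefore points to borrowing rather than a misconfiguration.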

Wait times

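This query provides the 90th percentile wait time for workloads in a specific queue. You can change the quantile value (for example, 0.5 for the median or 0.99 for the 99th percentile) to understand the wait time distribution.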

Query:

```
histogram_quantile(0.9, kueue_admission_wait_time_seconds_bucket{cluster_queue="QUEUE_NAME"})
```
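The query above applies histogram_quantile to the cumulative bucket counters, so it reflects every wait time since the counters started. A common PromQL pattern, swapped in here rather than taken from the Kueue documentation, is to take the rate of the buckets over a recent window and aggregate by le first, so the quantile tracks recent behavior. The hypothetical helper below only builds that query string, for example to pass to a Prometheus client.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a windowed wait-time quantile query.
#   $1: quantile (for example 0.9)
#   $2: ClusterQueue name
#   $3: range window (default 5m)
wait_time_query() {
  local quantile="$1" queue="$2" window="${3:-5m}"
  printf 'histogram_quantile(%s, sum by (le) (rate(kueue_admission_wait_time_seconds_bucket{cluster_queue="%s"}[%s])))\n' \
    "$quantile" "$queue" "$window"
}
```

For example, wait_time_query 0.99 cq-team-b 10m produces the 99th percentile over the last 10 minutes for cq-team-b.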

Create Jobs and observe the admitted workloads

In this section, you create Kubernetes Jobs in the team-a and team-b namespaces. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.

Generate Jobs for both ClusterQueues. Each Job runs three Pods in parallel that sleep for 10 seconds, is marked complete when all three Pods finish, and is deleted 60 seconds after it finishes.

```
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a # Job under team-a namespace
  generateName: sample-job-team-a-
  labels:
    kueue.x-k8s.io/queue-name: lq-team-a # Point to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
  parallelism: 3 # This Job will have 3 replicas running at the same time
  completions: 3 # This Job requires 3 completions
  suspend: true # Set to true to allow Kueue to control the Job when it starts
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
      restartPolicy: Never
```

job-team-a.yaml creates Jobs in the team-a namespace and points to the LocalQueue lq-team-a and, through it, the ClusterQueue cq-team-a.

Similarly, job-team-b.yaml creates Jobs in the team-b namespace and points to the LocalQueue lq-team-b and the ClusterQueue cq-team-b.
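To see how these requests interact with the quotas, note that each Job asks for three Pods of 500m CPU and 512Mi of memory. Assuming Kueue admits each Job as a whole (all three Pods together), cq-team-a fits six Jobs within its nominal 10 CPUs, and ten Jobs when it borrows up to its borrowingLimit of 5 CPUs. These figures line up with the Total values of 9 and 15 CPUs in the kubectl describe outputs later in this tutorial.

$$
\text{per Job: } 3 \times 500\,\text{m} = 1.5\ \text{CPU},\qquad 3 \times 512\,\text{Mi} = 1.5\,\text{Gi}
$$

$$
\left\lfloor \frac{10}{1.5} \right\rfloor = 6\ \text{Jobs} \;\Rightarrow\; 6 \times 1.5 = 9\ \text{CPU},\qquad
\left\lfloor \frac{15}{1.5} \right\rfloor = 10\ \text{Jobs} \;\Rightarrow\; 10 \times 1.5 = 15\ \text{CPU}
$$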

Start a new terminal and run this script to generate a Job every second:
```
./create_jobs.sh job-team-a.yaml 1
```
Start another terminal and do the same for the team-b namespace:
```
./create_jobs.sh job-team-b.yaml 1
```
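The create_jobs.sh script itself is not shown in this tutorial. A minimal sketch of an equivalent loop, assuming the script takes a manifest path and an interval in seconds (as the calls above suggest), is shown below; the script in the repository may differ. It uses kubectl create rather than kubectl apply because the Job manifests use generateName, which kubectl apply does not support. The function name and the optional third argument (a maximum number of Jobs, so the loop can stop on its own) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical equivalent of create_jobs.sh: submit one Job from a manifest
# every INTERVAL seconds. MAX_JOBS=0 (the default) loops until CTRL+c.
create_jobs_loop() {
  local manifest="$1" interval="$2" max_jobs="${3:-0}"
  local created=0
  while [ "$max_jobs" -eq 0 ] || [ "$created" -lt "$max_jobs" ]; do
    # generateName gives every submission a unique Job name; this requires
    # `kubectl create`, because `kubectl apply` needs a fixed metadata.name.
    kubectl create -f "$manifest"
    created=$((created + 1))
    sleep "$interval"
  done
}
```

For example, create_jobs_loop job-team-a.yaml 1 behaves like the first command above, and create_jobs_loop job-team-b.yaml 1 20 stops after submitting 20 Jobs.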
Observe the Jobs being queued in Prometheus, or use the following command:
```
watch -n 2 kubectl get clusterqueues -o wide
```

The output is similar to the following:

```
NAME        COHORT      STRATEGY         PENDING WORKLOADS   ADMITTED WORKLOADS
cq-team-a   all-teams   BestEffortFIFO   0                   5
cq-team-b   all-teams   BestEffortFIFO   0                   4
```

Borrow unused quota with cohorts

ClusterQueues are not always running at full capacity. When workloads are not evenly spread among ClusterQueues, quota usage is not maximized. If ClusterQueues share the same cohort, a ClusterQueue can borrow quota from the other ClusterQueues to maximize quota utilization.

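Once there are Jobs queued for both ClusterQueues cq-team-a and cq-team-b, stop the script for the team-b namespace by pressing CTRL+c in its terminal.
Once all the pending Jobs from the team-b namespace are processed, the Jobs from the team-a namespace can borrow the available resources in cq-team-b: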
```
kubectl describe clusterqueue cq-team-a
```
Because cq-team-a and cq-team-b share the same cohort, all-teams, these ClusterQueues can share resources that are not being used.
```
  Flavors Usage:
    Name:  on-demand
    Resources:
      Borrowed:  5
      Name:      cpu
      Total:     15
      Borrowed:  5Gi
      Name:      memory
      Total:     15Gi
```

Resume the script for the team-b namespace:

```
./create_jobs.sh job-team-b.yaml 3
```

Observe how the resources borrowed by cq-team-a go back to 0 while cq-team-b uses its resources for its own workloads:

```
kubectl describe clusterqueue cq-team-a
```

The output is similar to the following:

```
  Flavors Usage:
    Name:  on-demand
    Resources:
      Borrowed:  0
      Name:      cpu
      Total:     9
      Borrowed:  0
      Name:      memory
      Total:     9Gi
```

Increase quota with Spot VMs

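When quota needs to be increased temporarily, for example to meet high demand from pending workloads, you can configure Kueue to accommodate the demand by adding more ClusterQueues to the cohort. ClusterQueues with unused resources can share those resources with the other ClusterQueues in the same cohort.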

At the beginning of the tutorial, you created a node pool named spot that uses Spot VMs, and a ResourceFlavor named spot with its label set to cloud.google.com/gke-provisioning: spot. Now create a ClusterQueue that uses this node pool and the ResourceFlavor that represents it.

Create a new ClusterQueue named spot-cq, defined in cq-spot.yaml, with its cohort set to all-teams:

```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: spot-cq
spec:
  cohort: all-teams # Same cohort as cq-team-a and cq-team-b
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: spot
      resources:
      - name: "cpu"
        nominalQuota: 40
      - name: "memory"
        nominalQuota: 144Gi
```

Because this ClusterQueue shares the same cohort as cq-team-a and cq-team-b, both cq-team-a and cq-team-b can borrow resources from it, up to 15 CPU requests and 15 Gi of memory.

```
kubectl apply -f cq-spot.yaml
```

In Prometheus, observe how the admitted workloads spike for both cq-team-a and cq-team-b thanks to the quota added by spot-cq, which shares the same cohort. Alternatively, use the following command:
```
watch -n 2 kubectl get clusterqueues -o wide
```
In Prometheus, observe the number of nodes in the cluster, or use the following command:
```
watch -n 2 kubectl get nodes -o wide
```
Stop both scripts, for the team-a and team-b namespaces, by pressing CTRL+c in each terminal.

Clean up

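To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.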

Delete the project

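In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.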

Delete the individual resources

Delete the Kueue quota system:

```
kubectl delete -n team-a localqueue lq-team-a
kubectl delete -n team-b localqueue lq-team-b
kubectl delete clusterqueue cq-team-a
kubectl delete clusterqueue cq-team-b
kubectl delete clusterqueue spot-cq
kubectl delete resourceflavor on-demand
kubectl delete resourceflavor spot
```
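The deletions above can be wrapped in a single function. This is a hypothetical sketch, not part of the sample repository. It uses the ClusterQueue name spot-cq from cq-spot.yaml and passes kubectl's --ignore-not-found flag, so that rerunning it, or running it after a partial setup, skips objects that don't exist instead of reporting errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete this tutorial's Kueue objects, LocalQueues
# first, then ClusterQueues, then ResourceFlavors. Missing objects are skipped.
delete_kueue_objects() {
  kubectl delete localqueue lq-team-a --namespace team-a --ignore-not-found
  kubectl delete localqueue lq-team-b --namespace team-b --ignore-not-found
  kubectl delete clusterqueue cq-team-a cq-team-b spot-cq --ignore-not-found
  kubectl delete resourceflavor on-demand spot --ignore-not-found
}
```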

Delete the Kueue manifests:

```
VERSION=VERSION
kubectl delete -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```

Delete the cluster:

```
gcloud container clusters delete kueue-cohort --location=COMPUTE_REGION
```