Observe your application on GKE with Prometheus

This tutorial shows you how to use open source Prometheus to set up liveness probes for application microservices deployed to Google Kubernetes Engine (GKE).

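This tutorial uses open source Prometheus. However, every GKE Autopilot cluster automatically deploys Managed Service for Prometheus, Google Cloud's fully managed, multi-cloud, cross-project solution for Prometheus metrics. Managed Service for Prometheus lets you monitor and alert on your deployments worldwide by using Prometheus, without having to manually manage and operate Prometheus at scale.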

You can also use open source tools such as Grafana to visualize the metrics that Prometheus collects.

Prepare the environment

In this tutorial, you use Cloud Shell to manage resources hosted on Google Cloud.

  1. Set the default project and region for the gcloud CLI:

    gcloud config set project PROJECT_ID
    gcloud config set compute/region CONTROL_PLANE_LOCATION
    

    Replace the following:

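    • PROJECT_ID: your Google Cloud project ID.
    • CONTROL_PLANE_LOCATION: the Compute Engine region of your cluster's control plane. This tutorial uses the us-central1 region. Typically, you choose a region close to your location.
  2. Clone the sample repository used in this tutorial: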

    git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
    cd bank-of-anthos/
    
  3. Create a cluster:

    gcloud container clusters create-auto CLUSTER_NAME \
        --release-channel=CHANNEL_NAME \
        --location=CONTROL_PLANE_LOCATION
    

    Replace the following:

    • CLUSTER_NAME: a name for your new cluster.
    • CHANNEL_NAME: the name of a release channel.

Deploy Prometheus

Install Prometheus by using the example Helm chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install tutorial bitnami/kube-prometheus \
    --version 8.2.2 \
    --values extras/prometheus/oss/values.yaml \
    --wait

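This command installs Prometheus and the following components:

  • Prometheus Operator: a popular way to deploy and configure open source Prometheus.
  • Alertmanager: handles alerts sent by the Prometheus server and routes them to applications such as Slack.
  • Blackbox exporter: lets Prometheus probe endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.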
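
The blackbox exporter answers requests on its /probe endpoint: it probes the host given in the target query parameter by using the named module, then returns the result as Prometheus metrics, including probe_success (1 when the probe succeeds, 0 when it fails). The following Python sketch illustrates that exchange by using only the standard library. The exporter address reuses the Service name and port from the Probe manifests later in this tutorial; the sample response is a hypothetical, abbreviated body, not captured output.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Build the blackbox exporter URL that Prometheus scrapes for one target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


def probe_success(metrics_text: str) -> bool:
    """Return True if the exposition text reports probe_success equal to 1."""
    for line in metrics_text.splitlines():
        # Skip blank lines and # HELP / # TYPE comments.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split()
        # parts[0] is the metric name, parts[1] its value (an optional timestamp may follow).
        if parts[0] == "probe_success":
            return float(parts[1]) == 1.0
    raise ValueError("probe_success metric not found")


# Hypothetical, abbreviated response body for a failed probe.
SAMPLE = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
"""

if __name__ == "__main__":
    print(probe_url("tutorial-kube-prometheus-blackbox-exporter:19115", "contacts:8080/ready"))
    print(probe_success(SAMPLE))
```

In the cluster, Prometheus issues this request itself on every scrape interval; the sketch only makes the request shape and the meaning of probe_success concrete.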

Deploy Bank of Anthos

Deploy the Bank of Anthos sample application:

kubectl apply -f extras/jwt/jwt-secret.yaml
kubectl apply -f kubernetes-manifests

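Slack notifications

To set up Slack notifications, you must create a Slack application, activate Incoming Webhooks for the application, and install the application to a Slack workspace.

Create the Slack application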

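  1. Join a Slack workspace, either by registering with your email or by using an invitation sent by a workspace admin.

  2. Sign in to Slack by using your workspace name and your Slack account credentials.

  3. Create a new Slack app:

    1. In the "Create an app" dialog, click "From scratch".
    2. Specify an "App Name" and choose your Slack workspace.
    3. Click "Create App".
    4. Under "Add features and functionality", click "Incoming Webhooks".
    5. Click the "Activate Incoming Webhooks" toggle.
    6. In the "Webhook URLs for Your Workspace" section, click "Add New Webhook to Workspace".
    7. On the authorization page that opens, select the channel that should receive notifications.
    8. Click "Allow".
    9. The webhook for your Slack app appears in the "Webhook URLs for Your Workspace" section. Save the URL for later.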
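
Before wiring the webhook into Alertmanager, you might want to confirm that it posts to the channel you selected. Slack incoming webhooks accept an HTTP POST whose JSON body contains a text field. The following Python sketch builds and sends such a request by using only the standard library; SLACK_WEBHOOK_URL is a placeholder for the URL you saved, and the message text is arbitrary.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request that carries a plain-text Slack message as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (Slack replies 200 on success)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, send(build_webhook_request("SLACK_WEBHOOK_URL", "Test from the Bank of Anthos tutorial")) should post the message to your channel and return 200. The request is built separately from sending it so that you can inspect the payload without touching the network.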

Configure Alertmanager

Create a Kubernetes Secret to store the webhook URL, and then apply the Alertmanager configuration:

kubectl create secret generic alertmanager-slack-webhook --from-literal webhookURL=SLACK_WEBHOOK_URL
kubectl apply -f extras/prometheus/oss/alertmanagerconfig.yaml

Replace SLACK_WEBHOOK_URL with the webhook URL from the previous section.
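
kubectl create secret generic with --from-literal stores each value base64-encoded under the Secret's data field, in a Secret of type Opaque. The following Python sketch builds the equivalent Secret object and decodes it again, which can help when checking what ended up in the cluster; the webhook URL in the example is a made-up placeholder.

```python
import base64
import json


def slack_webhook_secret(webhook_url: str, name: str = "alertmanager-slack-webhook") -> dict:
    """Return a Secret equivalent to `kubectl create secret generic NAME --from-literal webhookURL=...`."""
    encoded = base64.b64encode(webhook_url.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name},
        "data": {"webhookURL": encoded},
    }


def decode_webhook(secret: dict) -> str:
    """Recover the plain-text URL, as `base64 --decode` would."""
    return base64.b64decode(secret["data"]["webhookURL"]).decode("utf-8")


if __name__ == "__main__":
    secret = slack_webhook_secret("https://hooks.slack.com/services/EXAMPLE")
    # kubectl also accepts JSON manifests, so this could be piped to `kubectl apply -f -`.
    print(json.dumps(secret, indent=2))
```

Base64 is an encoding, not encryption: anyone who can read the Secret can recover the webhook URL, so treat it as a credential.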

Configure Prometheus

  1. Review the following manifest:

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: frontend-probe
    spec:
      jobName: frontend
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - frontend:80
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: userservice-probe
    spec:
      jobName: userservice
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - userservice:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: balancereader-probe
    spec:
      jobName: balancereader
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - balancereader:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: contacts-probe
    spec:
      jobName: contacts
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - contacts:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: ledgerwriter-probe
    spec:
      jobName: ledgerwriter
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - ledgerwriter:8080/ready
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: transactionhistory-probe
    spec:
      jobName: transactionhistory
      prober:
        url: tutorial-kube-prometheus-blackbox-exporter:19115
        path: /probe
      module: http_2xx
      interval: 60s
      scrapeTimeout: 30s
      targets:
        staticConfig:
          labels:
            app: bank-of-anthos
          static:
            - transactionhistory:8080/ready
    

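    This manifest describes the Prometheus liveness probes and includes the following fields:

    • spec.jobName: the job name assigned to the scraped metrics.
    • spec.prober.url: the Service URL of the blackbox exporter, including the blackbox exporter's default port as defined in the Helm chart.
    • spec.prober.path: the metrics collection path.
    • spec.module: the blackbox exporter module used for the probe; http_2xx treats a 2xx HTTP response as success.
    • spec.targets.staticConfig.labels: the labels assigned to all metrics scraped from the targets.
    • spec.targets.staticConfig.static: the list of hosts to probe.
  2. Apply the manifest to your cluster: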

    kubectl apply -f extras/prometheus/oss/probes.yaml
    
  3. Review the following manifest:

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: uptime-rule
    spec:
      groups:
      - name: Micro services uptime
        interval: 60s
        rules:
        - alert: BalancereaderUnavaiable
          expr: probe_success{app="bank-of-anthos",job="balancereader"} == 0
          for: 1m
          annotations:
            summary: Balance Reader Service is unavailable
            description: Check Balance Reader pods and its logs
          labels:
            severity: 'critical'
        - alert: ContactsUnavaiable
          expr: probe_success{app="bank-of-anthos",job="contacts"} == 0
          for: 1m
          annotations:
            summary: Contacts Service is unavailable
            description: Check Contacts pods and its logs
          labels:
            severity: 'warning'
        - alert: FrontendUnavaiable
          expr: probe_success{app="bank-of-anthos",job="frontend"} == 0
          for: 1m
          annotations:
            summary: Frontend Service is unavailable
            description: Check Frontend pods and its logs
          labels:
            severity: 'critical'
        - alert: LedgerwriterUnavaiable
          expr: probe_success{app="bank-of-anthos",job="ledgerwriter"} == 0
          for: 1m
          annotations:
            summary: Ledger Writer Service is unavailable
            description: Check Ledger Writer pods and its logs
          labels:
            severity: 'critical'
        - alert: TransactionhistoryUnavaiable
          expr: probe_success{app="bank-of-anthos",job="transactionhistory"} == 0
          for: 1m
          annotations:
            summary: Transaction History Service is unavailable
            description: Check Transaction History pods and its logs
          labels:
            severity: 'critical'
        - alert: UserserviceUnavaiable
          expr: probe_success{app="bank-of-anthos",job="userservice"} == 0
          for: 1m
          annotations:
            summary: User Service is unavailable
            description: Check User Service pods and its logs
          labels:
            severity: 'critical'
    

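    This manifest describes a PrometheusRule and includes the following fields:

    • spec.groups[*].name: the name of the rule group.
    • spec.groups[*].interval: how often the rules in the group are evaluated.
    • spec.groups[*].rules[*].alert: the name of the alert.
    • spec.groups[*].rules[*].expr: the PromQL expression to evaluate.
    • spec.groups[*].rules[*].for: how long the expression must keep returning results before the alert is considered firing.
    • spec.groups[*].rules[*].annotations: a list of annotations to add to each alert. This setting is only valid for alerting rules.
    • spec.groups[*].rules[*].labels: the labels to add or overwrite.
  4. Apply the manifest to your cluster: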

    kubectl apply -f extras/prometheus/oss/rules.yaml
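
The six Probe resources in probes.yaml differ only in their name, job name, and target, so you might prefer to generate them rather than maintain them by hand. The following Python sketch is one alternative, not how the sample repository builds probes.yaml: it produces the same Probe objects as a single JSON List, which kubectl also accepts. The names, targets, and exporter address are copied from the Probe manifest reviewed in step 1.

```python
import json

# Blackbox exporter Service and port, as used in probes.yaml.
EXPORTER = "tutorial-kube-prometheus-blackbox-exporter:19115"

# Service name -> probe target, copied from probes.yaml.
TARGETS = {
    "frontend": "frontend:80",
    "userservice": "userservice:8080/ready",
    "balancereader": "balancereader:8080/ready",
    "contacts": "contacts:8080/ready",
    "ledgerwriter": "ledgerwriter:8080/ready",
    "transactionhistory": "transactionhistory:8080/ready",
}


def make_probe(service: str, target: str) -> dict:
    """Return one Probe object with the same settings as the manifest in probes.yaml."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Probe",
        "metadata": {"name": f"{service}-probe"},
        "spec": {
            "jobName": service,
            "prober": {"url": EXPORTER, "path": "/probe"},
            "module": "http_2xx",
            "interval": "60s",
            "scrapeTimeout": "30s",
            "targets": {
                "staticConfig": {
                    "labels": {"app": "bank-of-anthos"},
                    "static": [target],
                }
            },
        },
    }


def make_list(targets: dict) -> dict:
    """Wrap all probes in a v1 List so a single JSON document can be applied."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [make_probe(service, target) for service, target in targets.items()],
    }


if __name__ == "__main__":
    print(json.dumps(make_list(TARGETS), indent=2))
```

If you save the sketch as a script, for example gen_probes.py (a hypothetical name), you could apply its output with python3 gen_probes.py | kubectl apply -f -. Adding a microservice then becomes a one-line change to TARGETS.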
    

Simulate an outage

  1. Simulate an outage by scaling the contacts Deployment down to zero:

    kubectl scale deployment contacts --replicas 0
    

    You should see a notification message in your Slack workspace channel. GKE might take up to 5 minutes to scale the Deployment.

  2. Restore the contacts Deployment:

    kubectl scale deployment contacts --replicas 1
    

    You should see an alert resolution notification message in your Slack workspace channel. GKE might take up to 5 minutes to scale up the Deployment.
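
The time before a notification appears also depends on how the alert rule is evaluated. The probes run every 60 seconds, the rule group is evaluated every 60 seconds, and for: 1m holds an alert in the pending state until probe_success == 0 has been true for at least one minute; Alertmanager may then add its own grouping delay before notifying Slack. The following Python sketch models only the pending and firing transitions for a single rule, under simplified assumptions (evaluations exactly 60 seconds apart, no scrape or Alertmanager delays). It is an illustration, not Prometheus's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    state: str = "inactive"          # "inactive", "pending", or "firing"
    active_at: Optional[int] = None  # evaluation time (seconds) when the expression first became true


def evaluate(alert: AlertState, expr_true: bool, now: int, hold_seconds: int) -> AlertState:
    """Apply one rule evaluation, mimicking the pending -> firing transition of `for:`."""
    if not expr_true:
        # The expression no longer returns results, so the alert resolves.
        return AlertState()
    if alert.state == "inactive":
        # With no hold duration the alert fires immediately; otherwise it waits in pending.
        return AlertState("firing" if hold_seconds == 0 else "pending", now)
    if alert.state == "pending" and now - alert.active_at >= hold_seconds:
        return AlertState("firing", alert.active_at)
    return alert


def timeline(probe_success: List[int], interval: int = 60, hold: int = 60) -> List[str]:
    """Return the alert state after each evaluation of `probe_success == 0`."""
    alert = AlertState()
    states = []
    for i, value in enumerate(probe_success):
        alert = evaluate(alert, value == 0, i * interval, hold)
        states.append(alert.state)
    return states


if __name__ == "__main__":
    # contacts is scaled to zero before the second evaluation and restored before the fifth.
    print(timeline([1, 0, 0, 0, 1]))
```

In this model the alert becomes pending at the first failed evaluation and fires one evaluation later, so the earliest possible notification is roughly a minute after Prometheus first observes probe_success == 0.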