Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide provides an example of how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) by using Ray Serve and the Ray Operator add-on.

About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray that you can use to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.
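
A Ray Serve application is ordinary Python code. The following is a minimal, self-contained sketch of the Serve programming model (it is illustrative only and not part of the sample in this guide); the Stable Diffusion application that you deploy later follows the same pattern, with a GPU-backed model deployment behind an HTTP ingress.

  from ray import serve

  # A deployment wraps a Python class (or function) that handles requests.
  @serve.deployment
  class Hello:
      def __call__(self, request):
          return "Hello from Ray Serve"

  # Binding produces an application that `serve run` (or serve.run) can deploy
  # onto a running Ray cluster.
  app = Hello.bind()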

You can deploy Ray Serve applications by using either a RayCluster or a RayService resource. You should use the RayService resource in production for the following reasons:

  • In-place updates for RayService applications
  • Zero-downtime upgrades for RayCluster resources
  • Highly available Ray Serve applications

Objectives

This guide is for Generative AI customers, new or existing users of GKE, ML engineers, MLOps (DevOps) engineers, or platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models with Ray. This guide covers the following steps:

  • Create a GKE cluster with a GPU node pool.
  • Create a Ray cluster by using the RayCluster custom resource.
  • Run a Ray Serve application.
  • Deploy a RayService custom resource.

Costs

In this document, you use the following billable components of Google Cloud:

You can use the Pricing Calculator to generate a cost estimate based on your projected usage.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Cloud Shell is preinstalled with the software you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Verify that billing is enabled for your Google Cloud project.

  7. Enable the GKE API:

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    gcloud services enable container.googleapis.com
  8. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

    Replace the following:

    • PROJECT_ID: Your project ID.
    • USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
    • ROLE: The IAM role that you grant to your user account.
  9. Prepare your environment

    To prepare your environment, follow these steps:

    1. In the Google Cloud console, start a Cloud Shell session by clicking Activate Cloud Shell. The session starts in the bottom pane of the Google Cloud console.

    2. Set environment variables:

      export PROJECT_ID=PROJECT_ID
      export CLUSTER_NAME=rayserve-cluster
      export COMPUTE_REGION=us-central1
      export COMPUTE_ZONE=us-central1-c
      export CLUSTER_VERSION=CLUSTER_VERSION
      export TUTORIAL_HOME=`pwd`
      

      Replace the following:

      • PROJECT_ID: your Google Cloud project ID.
      • CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
    3. Clone the GitHub repository:

      git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
      
    4. Change to the working directory:

      cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
      
    5. Create a Python virtual environment:

      venv

      python -m venv myenv && \
      source myenv/bin/activate
      

      Conda

      1. Install Conda.

      2. Run the following command:

        conda create -c conda-forge python=3.9.19 -n myenv && \
        conda activate myenv
        

      When you deploy the Serve application with serve run, Ray requires that the local client's Python version matches the version used by the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you're running a different client version, select the appropriate Ray image. (A quick local version check is sketched after the dependency installation step below.)

    6. Install the dependencies required to run the Serve application:

      pip install ray[serve]==2.37.0
      pip install torch
      pip install requests
      
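
      Before deploying, you can confirm that the local interpreter and Ray package match what the rayproject/ray:2.37.0 cluster image provides. This is a minimal sketch, assuming the virtual environment you created earlier is active:

      import sys
      import ray

      # The rayproject/ray:2.37.0 image ships Python 3.9 and Ray 2.37.0; the
      # local client should report the same major.minor Python version.
      print("Python:", sys.version_info[:2])
      print("Ray:", ray.__version__)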

    Create a cluster with a GPU node pool

    Create an Autopilot or Standard GKE cluster with a GPU node pool:

    Autopilot

    Create an Autopilot cluster:

    gcloud container clusters create-auto ${CLUSTER_NAME}  \
        --enable-ray-operator \
        --cluster-version=${CLUSTER_VERSION} \
        --location=${COMPUTE_REGION}
    

    Standard

    1. Create a Standard cluster:

      gcloud container clusters create ${CLUSTER_NAME} \
          --addons=RayOperator \
          --cluster-version=${CLUSTER_VERSION}  \
          --machine-type=c3d-standard-8 \
          --location=${COMPUTE_ZONE} \
          --num-nodes=1
      
    2. Create a GPU node pool:

      gcloud container node-pools create gpu-pool \
          --cluster=${CLUSTER_NAME} \
          --machine-type=g2-standard-8 \
          --location=${COMPUTE_ZONE} \
          --num-nodes=1 \
          --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest
      

    Deploy a RayCluster resource

    To deploy the RayCluster resource:

    1. Review the following manifest:

      apiVersion: ray.io/v1
      kind: RayCluster
      metadata:
        name: stable-diffusion-cluster
      spec:
        rayVersion: '2.37.0'
        headGroupSpec:
          rayStartParams:
            dashboard-host: '0.0.0.0'
          template:
            metadata:
            spec:
              containers:
              - name: ray-head
                image: rayproject/ray:2.37.0
                ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
                resources:
                  limits:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
                  requests:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
              nodeSelector:
                cloud.google.com/machine-family: c3d
        workerGroupSpecs:
        - replicas: 1
          minReplicas: 1
          maxReplicas: 4
          groupName: gpu-group
          rayStartParams: {}
          template:
            spec:
              containers:
              - name: ray-worker
                image: rayproject/ray:2.37.0-gpu
                resources:
                  limits:
                    cpu: 4
                    memory: "16Gi"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "16Gi"
                    nvidia.com/gpu: 1
              nodeSelector:
                cloud.google.com/gke-accelerator: nvidia-l4

      This manifest describes a RayCluster resource.

    2. Apply the manifest to your cluster:

      kubectl apply -f ray-cluster.yaml
      
    3. Verify that the RayCluster resource is ready:

      kubectl get raycluster
      

      The output is similar to the following:

      NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
      stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
      

      In this output, ready in the STATUS column indicates that the RayCluster resource is ready.

    Connect to the RayCluster resource

    To connect to the RayCluster resource:

    1. Verify that GKE created the RayCluster service:

      kubectl get svc stable-diffusion-cluster-head-svc
      

      The output is similar to the following:

      NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
      stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s
      
    2. Create port-forwarding sessions to the Ray head node:

      kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
      kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
      
    3. Verify that the Ray client can connect to the Ray cluster by using localhost:

      ray list nodes --address http://localhost:8265
      

      The output is similar to the following:

      ======== List: 2024-06-19 15:15:15.707336 ========
      Stats:
      ------------------------------
      Total: 3
      
      Table:
      ------------------------------
          NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
      0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
      # Several lines of output omitted
      
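
    Beyond the Ray CLI, you can also connect to the cluster interactively through the forwarded Ray client port. The following is a minimal sketch, assuming the port-forwarding sessions above are still running and that your local Python and Ray versions match the cluster image:

      import ray

      # Connect to the Ray head node through the forwarded client port (10001).
      ray.init(address="ray://localhost:10001")

      # Print the aggregate resources (CPUs, GPUs, memory) reported by the cluster.
      print(ray.cluster_resources())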

    Run the Ray Serve application

    To run the Ray Serve application, follow these steps:

    1. Run the Stable Diffusion Ray Serve application:

      serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001
      
      

      The output is similar to the following:

      2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
      2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
      2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
      
    2. Create a port-forwarding session to the Ray Serve port (8000):

      kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
      
    3. Run the Python script:

      python generate_image.py
      

      The script generates an image and saves it to a file named output.png, similar to the following image:

      [Image: a beach at sunset, generated by Stable Diffusion]
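
      The generate_image.py script is provided in the sample repository. As a rough illustration of what such a client does, the following hypothetical sketch sends a prompt to the forwarded Serve port and saves the response; the actual route and parameters are defined by the APIIngress deployment in stable_diffusion.py and may differ:

      import requests

      # Hypothetical client sketch; the real generate_image.py in the sample
      # repository may use a different route or parameters.
      prompt = "a beach at sunset"
      resp = requests.get(
          "http://localhost:8000/imagine",  # assumed route exposed by APIIngress
          params={"prompt": prompt},
          timeout=600,
      )
      resp.raise_for_status()
      with open("output.png", "wb") as f:
          f.write(resp.content)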

    Deploy a RayService

    A RayService custom resource manages the lifecycle of a RayCluster resource and of Ray Serve applications.

    For more information about RayService, see Deploy Ray Serve Applications and the Production Guide in the Ray documentation.

    To deploy the RayService resource, follow these steps:

    1. Review the following manifest:

      apiVersion: ray.io/v1
      kind: RayService
      metadata:
        name: stable-diffusion
      spec:
        serveConfigV2: |
          applications:
            - name: stable_diffusion
              import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
              runtime_env:
                working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
                pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
        rayClusterConfig:
          rayVersion: '2.37.0'
          headGroupSpec:
            rayStartParams:
              dashboard-host: '0.0.0.0'
            template:
              spec:
                containers:
                - name: ray-head
                  image:  rayproject/ray:2.37.0
                  ports:
                  - containerPort: 6379
                    name: gcs
                  - containerPort: 8265
                    name: dashboard
                  - containerPort: 10001
                    name: client
                  - containerPort: 8000
                    name: serve
                  resources:
                    limits:
                      cpu: "2"
                      ephemeral-storage: "15Gi"
                      memory: "8Gi"
                    requests:
                      cpu: "2"
                      ephemeral-storage: "15Gi"
                      memory: "8Gi"
                nodeSelector:
                  cloud.google.com/machine-family: c3d
          workerGroupSpecs:
          - replicas: 1
            minReplicas: 1
            maxReplicas: 4
            groupName: gpu-group
            rayStartParams: {}
            template:
              spec:
                containers:
                - name: ray-worker
                  image: rayproject/ray:2.37.0-gpu
                  resources:
                    limits:
                      cpu: 4
                      memory: "16Gi"
                      nvidia.com/gpu: 1
                    requests:
                      cpu: 3
                      memory: "16Gi"
                      nvidia.com/gpu: 1
                nodeSelector:
                  cloud.google.com/gke-accelerator: nvidia-l4

      This manifest describes a RayService custom resource. The serveConfigV2 field defines the Serve application: runtime_env.working_dir downloads the sample code from the GitHub repository archive, and import_path points to the stable_diffusion entrypoint within that archive.

    2. Apply the manifest to your cluster:

      kubectl apply -f ray-service.yaml
      
    3. Verify that the Service is ready:

      kubectl get svc stable-diffusion-serve-svc
      

      The output is similar to the following:

      NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
      stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m
      
    4. Set up port forwarding to the Ray Serve service:

      kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &
      
    5. Run the Python script from the previous section:

      python generate_image.py
      

      The script generates an image similar to the one from the previous section.
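
      Port forwarding is only for local testing. Workloads running inside the cluster can reach the Serve application directly through the Service's DNS name instead. A minimal sketch, again assuming the same hypothetical /imagine route as above:

      import requests

      # From a Pod in the same cluster and namespace, the Serve application is
      # reachable through the stable-diffusion-serve-svc Service on port 8000.
      resp = requests.get(
          "http://stable-diffusion-serve-svc:8000/imagine",  # assumed route
          params={"prompt": "a beach at sunset"},
          timeout=600,
      )
      resp.raise_for_status()
      print("received", len(resp.content), "bytes")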

    Observe Ray workloads

    To view details about RayJobs, go to the Kubernetes Engine > AI/ML > Jobs section of the Google Cloud console.

    View RayJobs in the Google Cloud console

    Clean up

    Delete the project

      Delete a Google Cloud project:

      gcloud projects delete PROJECT_ID

    Delete individual resources

    To delete the cluster, enter:

    gcloud container clusters delete ${CLUSTER_NAME}
    

    What's next