Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide provides an example of how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) by using Ray Serve and the Ray Operator add-on.

About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray that you can use to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.
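
A Ray Serve application is ordinary Python code. The following is a minimal, self-contained sketch of the Serve programming model (it is illustrative only and not part of the sample in this guide); the Stable Diffusion application that you deploy later follows the same pattern, with a GPU-backed model deployment behind an HTTP ingress.

  from ray import serve

  # A deployment wraps a Python class (or function) that handles requests.
  @serve.deployment
  class Hello:
      def __call__(self, request):
          return "Hello from Ray Serve"

  # Binding produces an application that `serve run` (or serve.run) can deploy
  # onto a running Ray cluster.
  app = Hello.bind()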

You can deploy Ray Serve applications by using either a RayCluster or a RayService resource. You should use the RayService resource in production for the following reasons:

  • In-place updates for RayService applications
  • Zero-downtime upgrades for RayCluster resources
  • Highly available Ray Serve applications

Objectives

This guide is for Generative AI customers, new or existing users of GKE, ML engineers, MLOps (DevOps) engineers, or platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models with Ray. This guide covers the following steps:

  • Create a GKE cluster with a GPU node pool.
  • Create a Ray cluster by using the RayCluster custom resource.
  • Run a Ray Serve application.
  • Deploy a RayService custom resource.

Costs

In this document, you use the following billable components of Google Cloud:

You can use the Pricing Calculator to generate a cost estimate based on your projected usage.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Cloud Shell is preinstalled with the software you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Verify that billing is enabled for your Google Cloud project.

  7. Enable the GKE API:

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    gcloud services enable container.googleapis.com
  8. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

    Replace the following:

    • PROJECT_ID: Your project ID.
    • USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
    • ROLE: The IAM role that you grant to your user account.
  9. Prepare your environment

    To prepare your environment, follow these steps:

    1. In the Google Cloud console, start a Cloud Shell session by clicking Activate Cloud Shell. The session starts in the bottom pane of the Google Cloud console.

    2. Set environment variables:

      export PROJECT_ID=PROJECT_ID
      export CLUSTER_NAME=rayserve-cluster
      export COMPUTE_REGION=us-central1
      export COMPUTE_ZONE=us-central1-c
      export CLUSTER_VERSION=CLUSTER_VERSION
      export TUTORIAL_HOME=`pwd`
      

      Replace the following:

      • PROJECT_ID: your Google Cloud project ID.
      • CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
    3. Clone the GitHub repository:

      git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
      
    4. Change to the working directory:

      cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
      
    5. Create a Python virtual environment:

      venv

      python -m venv myenv && \
      source myenv/bin/activate
      

      Conda

      1. Install Conda.

      2. Run the following command:

        conda create -c conda-forge python=3.9.19 -n myenv && \
        conda activate myenv
        

      When you deploy the Serve application with serve run, Ray requires that the local client's Python version matches the version used by the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you're running a different client version, select the appropriate Ray image. (A quick local version check is sketched after the dependency installation step below.)

    6. Install the dependencies required to run the Serve application:

      pip install ray[serve]==2.37.0
      pip install torch
      pip install requests
      
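
      Before deploying, you can confirm that the local interpreter and Ray package match what the rayproject/ray:2.37.0 cluster image provides. This is a minimal sketch, assuming the virtual environment you created earlier is active:

      import sys
      import ray

      # The rayproject/ray:2.37.0 image ships Python 3.9 and Ray 2.37.0; the
      # local client should report the same major.minor Python version.
      print("Python:", sys.version_info[:2])
      print("Ray:", ray.__version__)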

    Create a cluster with a GPU node pool

    Create an Autopilot or Standard GKE cluster with a GPU node pool:

    Autopilot

    Create an Autopilot cluster:

    gcloud container clusters create-auto ${CLUSTER_NAME}  \
        --enable-ray-operator \
        --cluster-version=${CLUSTER_VERSION} \
        --location=${COMPUTE_REGION}
    

    Standard

    1. Create a Standard cluster:

      gcloud container clusters create ${CLUSTER_NAME} \
          --addons=RayOperator \
          --cluster-version=${CLUSTER_VERSION}  \
          --machine-type=c3d-standard-8 \
          --location=${COMPUTE_ZONE} \
          --num-nodes=1
      
    2. Create a GPU node pool:

      gcloud container node-pools create gpu-pool \
          --cluster=${CLUSTER_NAME} \
          --machine-type=g2-standard-8 \
          --location=${COMPUTE_ZONE} \
          --num-nodes=1 \
          --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest
      

    Deploy a RayCluster resource

    To deploy the RayCluster resource:

    1. Review the following manifest:

      apiVersion: ray.io/v1
      kind: RayCluster
      metadata:
        name: stable-diffusion-cluster
      spec:
        rayVersion: '2.37.0'
        headGroupSpec:
          rayStartParams:
            dashboard-host: '0.0.0.0'
          template:
            metadata:
            spec:
              containers:
              - name: ray-head
                image: rayproject/ray:2.37.0
                ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
                resources:
                  limits:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
                  requests:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
              nodeSelector:
                cloud.google.com/machine-family: c3d
        workerGroupSpecs:
        - replicas: 1
          minReplicas: 1
          maxReplicas: 4
          groupName: gpu-group
          rayStartParams: {}
          template:
            spec:
              containers:
              - name: ray-worker
                image: rayproject/ray:2.37.0-gpu
                resources:
                  limits:
                    cpu: 4
                    memory: "16Gi"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "16Gi"
                    nvidia.com/gpu: 1
              nodeSelector:
                cloud.google.com/gke-accelerator: nvidia-l4

      This manifest describes a RayCluster resource.

    2. Apply the manifest to your cluster:

      kubectl apply -f ray-cluster.yaml
      
    3. Verify that the RayCluster resource is ready:

      kubectl get raycluster
      

      The output is similar to the following:

      NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
      stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
      

      In this output, ready in the STATUS column indicates that the RayCluster resource is ready.

    Connect to the RayCluster resource

    To connect to the RayCluster resource:

    1. Verify that GKE created the RayCluster service:

      kubectl get svc stable-diffusion-cluster-head-svc
      

      The output is similar to the following:

      NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
      stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s
      
    2. Create port-forwarding sessions to the Ray head node:

      kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
      kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
      
    3. Verify that the Ray client can connect to the Ray cluster by using localhost:

      ray list nodes --address http://localhost:8265
      

      The output is similar to the following:

      ======== List: 2024-06-19 15:15:15.707336 ========
      Stats:
      ------------------------------
      Total: 3
      
      Table:
      ------------------------------
          NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
      0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
      # Several lines of output omitted
      
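
    Beyond the Ray CLI, you can also connect to the cluster interactively through the forwarded Ray client port. The following is a minimal sketch, assuming the port-forwarding sessions above are still running and that your local Python and Ray versions match the cluster image:

      import ray

      # Connect to the Ray head node through the forwarded client port (10001).
      ray.init(address="ray://localhost:10001")

      # Print the aggregate resources (CPUs, GPUs, memory) reported by the cluster.
      print(ray.cluster_resources())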

    Run the Ray Serve application

    To run the Ray Serve application, follow these steps:

    1. Run the Stable Diffusion Ray Serve application:

      serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001
      
      

      The output is similar to the following:

      2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
      2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
      2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
      2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
      2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
      
    2. Create a port-forwarding session to the Ray Serve port (8000):

      kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
      
    3. Run the Python script:

      python generate_image.py
      

      The script generates an image and saves it to a file named output.png, similar to the following image:

      [Image: a beach at sunset, generated by Stable Diffusion]
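
      The generate_image.py script is provided in the sample repository. As a rough illustration of what such a client does, the following hypothetical sketch sends a prompt to the forwarded Serve port and saves the response; the actual route and parameters are defined by the APIIngress deployment in stable_diffusion.py and may differ:

      import requests

      # Hypothetical client sketch; the real generate_image.py in the sample
      # repository may use a different route or parameters.
      prompt = "a beach at sunset"
      resp = requests.get(
          "http://localhost:8000/imagine",  # assumed route exposed by APIIngress
          params={"prompt": prompt},
          timeout=600,
      )
      resp.raise_for_status()
      with open("output.png", "wb") as f:
          f.write(resp.content)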

    Deploy a RayService

    A RayService custom resource manages the lifecycle of a RayCluster resource and of Ray Serve applications.

    For more information about RayService, see Deploy Ray Serve Applications and the Production Guide in the Ray documentation.

    To deploy the RayService resource, follow these steps:

    1. Review the following manifest:

      apiVersion: ray.io/v1
      kind: RayService
      metadata:
        name: stable-diffusion
      spec:
        serveConfigV2: |
          applications:
            - name: stable_diffusion
              import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
              runtime_env:
                working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
                pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
        rayClusterConfig:
          rayVersion: '2.37.0'
          headGroupSpec:
            rayStartParams:
              dashboard-host: '0.0.0.0'
            template:
              spec:
                containers:
                - name: ray-head
                  image:  rayproject/ray:2.37.0
                  ports:
                  - containerPort: 6379
                    name: gcs
                  - containerPort: 8265
                    name: dashboard
                  - containerPort: 10001
                    name: client
                  - containerPort: 8000
                    name: serve
                  resources:
                    limits:
                      cpu: "2"
                      ephemeral-storage: "15Gi"
                      memory: "8Gi"
                    requests:
                      cpu: "2"
                      ephemeral-storage: "15Gi"
                      memory: "8Gi"
                nodeSelector:
                  cloud.google.com/machine-family: c3d
          workerGroupSpecs:
          - replicas: 1
            minReplicas: 1
            maxReplicas: 4
            groupName: gpu-group
            rayStartParams: {}
            template:
              spec:
                containers:
                - name: ray-worker
                  image: rayproject/ray:2.37.0-gpu
                  resources:
                    limits:
                      cpu: 4
                      memory: "16Gi"
                      nvidia.com/gpu: 1
                    requests:
                      cpu: 3
                      memory: "16Gi"
                      nvidia.com/gpu: 1
                nodeSelector:
                  cloud.google.com/gke-accelerator: nvidia-l4

      This manifest describes a RayService custom resource. The serveConfigV2 field defines the Serve application: runtime_env.working_dir downloads the sample code from the GitHub repository archive, and import_path points to the stable_diffusion entrypoint within that archive.

    2. Apply the manifest to your cluster:

      kubectl apply -f ray-service.yaml
      
    3. Verify that the Service is ready:

      kubectl get svc stable-diffusion-serve-svc
      

      The output is similar to the following:

      NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
      stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m
      
    4. Set up port forwarding to the Ray Serve service:

      kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &
      
    5. Run the Python script from the previous section:

      python generate_image.py
      

      The script generates an image similar to the one from the previous section.
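
      Port forwarding is only for local testing. Workloads running inside the cluster can reach the Serve application directly through the Service's DNS name instead. A minimal sketch, again assuming the same hypothetical /imagine route as above:

      import requests

      # From a Pod in the same cluster and namespace, the Serve application is
      # reachable through the stable-diffusion-serve-svc Service on port 8000.
      resp = requests.get(
          "http://stable-diffusion-serve-svc:8000/imagine",  # assumed route
          params={"prompt": "a beach at sunset"},
          timeout=600,
      )
      resp.raise_for_status()
      print("received", len(resp.content), "bytes")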

    Observe Ray workloads

    To view details about RayJobs, go to the Kubernetes Engine > AI/ML > Jobs section of the Google Cloud console.

    View RayJobs in the Google Cloud console

    Clean up

    Delete the project

      Delete a Google Cloud project:

      gcloud projects delete PROJECT_ID

    Delete individual resources

    To delete the cluster, enter:

    gcloud container clusters delete ${CLUSTER_NAME}
    

    What's next