Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

Autopilot Standard

This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE), using Ray Serve and the Ray Operator add-on as an example implementation.

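About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is Ray's model serving library, used to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.

You can deploy Ray Serve applications with either a RayCluster or a RayService resource. Use a RayService resource in production for the following reasons:

- In-place updates for RayService applications
- Zero-downtime upgrades for RayCluster resources
- Highly available Ray Serve applications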

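Objectives

This guide is intended for generative AI customers, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who are interested in using Kubernetes container orchestration to serve models with Ray.

- Create a GKE cluster with a GPU node pool.
- Create a Ray cluster using the RayCluster custom resource.
- Run a Ray Serve application.
- Deploy a RayService custom resource.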

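Costs

In this document, you use billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Cloud Shell comes preinstalled with the software you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.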

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

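Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: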

gcloud init

Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the GKE API:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

gcloud services enable container.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
Replace the following:
- PROJECT_ID: Your project ID.
- USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
- ROLE: The IAM role that you grant to your user account.
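
Because the command has to be run once per role, it can help to generate the invocations rather than edit them by hand. The following is a minimal sketch that only builds the command lines and executes nothing. The helper name build_binding_commands is hypothetical, and the project ID and email address in the usage lines are placeholders.

```python
import shlex
from typing import List

# The two roles named in the step above.
ROLES = ["roles/container.clusterAdmin", "roles/container.admin"]


def build_binding_commands(project_id: str, user: str, roles: List[str]) -> List[List[str]]:
    """Return one gcloud argument list per role. Nothing is executed here."""
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{user}",
            f"--role={role}",
        ]
        for role in roles
    ]


# Print the commands so they can be reviewed before being pasted into a shell.
for argv in build_binding_commands("PROJECT_ID", "myemail@example.com", ROLES):
    print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run anywhere; whether to pipe the output into a shell is left to you.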

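Prepare your environment

To prepare your environment, follow these steps:

Launch a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. This opens a session in the bottom pane of the Google Cloud console.

Set environment variables: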

export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`

Replace the following:

- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
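
The version requirement can be checked before the cluster is created. The following is a minimal sketch that compares only the leading major.minor.patch numbers of a version string. The function names are hypothetical, and the sample strings in the usage note are illustrative rather than a list of available GKE releases.

```python
import re
from typing import Tuple

# Minimum GKE version stated in this tutorial.
MIN_GKE_VERSION = (1, 30, 1)


def parse_gke_version(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a string such as '1.30.1-gke.1000'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: Tuple[int, ...] = MIN_GKE_VERSION) -> bool:
    """Return True when the version is at least the tutorial's minimum."""
    return parse_gke_version(version) >= minimum
```

For example, meets_minimum("1.29.8-gke.1") is False and meets_minimum("1.30.1-gke.1000") is True. Any suffix after the patch number is ignored.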

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Create a Python virtual environment:
venv
```
python -m venv myenv && \
source myenv/bin/activate
```
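Conda
1. Install Conda.
2. Run the following commands:
  conda create -c conda-forge python=3.9.19 -n myenv && \
  conda activate myenv
When you deploy a Serve application with serve run, Ray expects the Python version of the local client to match the version used in the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you run a different client version, select the corresponding Ray image.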
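
A quick check of the local interpreter can catch a version mismatch before serve run is attempted. The following is a minimal sketch that compares only the major and minor numbers. How strict the match has to be is an assumption here, and versions_match is a hypothetical helper.

```python
import sys
from typing import Sequence

# Python version of the rayproject/ray:2.37.0 image, per the note above.
CLUSTER_PYTHON = (3, 9)


def versions_match(local: Sequence[int], cluster: Sequence[int] = CLUSTER_PYTHON) -> bool:
    """Compare major.minor only; patch differences are ignored (an assumption)."""
    return tuple(local[:2]) == tuple(cluster[:2])


# Report on the interpreter that runs this script.
local = tuple(sys.version_info[:3])
if versions_match(local):
    print(f"Local Python {local} matches the cluster image.")
else:
    print(f"Local Python {local} differs from {CLUSTER_PYTHON}; pick a matching Ray image.")
```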

Install the dependencies required to run the Serve application:

pip install ray[serve]==2.37.0
pip install torch
pip install requests

Create a cluster with a GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool:

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Create a Standard cluster:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION} \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1

Create a GPU node pool:

gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Deploy a RayCluster resource

To deploy a RayCluster resource, follow these steps:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.37.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.37.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
            requests:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
        nodeSelector:
          cloud.google.com/machine-family: c3d
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.37.0-gpu
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayCluster resource.
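
To get a feel for what the manifest asks of the node pools, the resource requests can be totalled for the minimum and maximum worker counts. The following is a back-of-the-envelope sketch using the requests values from the manifest. It ignores ephemeral storage and system overhead, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequests:
    cpu: int
    memory_gib: int
    gpus: int = 0


# Values copied from the `requests` sections of the manifest above.
HEAD = PodRequests(cpu=2, memory_gib=8)
WORKER = PodRequests(cpu=3, memory_gib=16, gpus=1)


def cluster_requests(worker_replicas: int) -> PodRequests:
    """Sum the head pod and `worker_replicas` worker pods."""
    return PodRequests(
        cpu=HEAD.cpu + WORKER.cpu * worker_replicas,
        memory_gib=HEAD.memory_gib + WORKER.memory_gib * worker_replicas,
        gpus=HEAD.gpus + WORKER.gpus * worker_replicas,
    )


# minReplicas is 1 and maxReplicas is 4 in the worker group.
print(cluster_requests(1))
print(cluster_requests(4))
```

With one worker the pods request 5 CPUs, 24 GiB of memory, and 1 GPU; at four workers the totals are 14 CPUs, 72 GiB, and 4 GPUs.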

Apply the manifest to your cluster:
```
kubectl apply -f ray-cluster.yaml
```

Verify that the RayCluster resource is ready:

kubectl get raycluster

The output is similar to the following:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
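
The same readiness check can be scripted. The following is a minimal sketch that reads the resource as JSON through kubectl and inspects status.state. It assumes the STATUS column is backed by that field, which may differ between KubeRay versions, and both function names are hypothetical.

```python
import json
import subprocess


def is_ready(raycluster: dict) -> bool:
    """Return True when status.state is 'ready' (assumed source of the STATUS column)."""
    return raycluster.get("status", {}).get("state") == "ready"


def fetch_raycluster(name: str) -> dict:
    """Read the RayCluster with kubectl; requires credentials for the GKE cluster."""
    result = subprocess.run(
        ["kubectl", "get", "raycluster", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

A call such as is_ready(fetch_raycluster("stable-diffusion-cluster")) could be placed in a retry loop; the parsing is kept separate from the kubectl call so it can be exercised without a cluster.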

Connect to the RayCluster resource

To connect to the RayCluster resource, follow these steps:

Verify that GKE created the RayCluster service:

kubectl get svc stable-diffusion-cluster-head-svc

The output is similar to the following:

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Establish port-forwarding sessions to the Ray head:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
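
Because the port-forwarding sessions run in the background with their output discarded, a failed forward is easy to miss. The following is a minimal sketch that checks whether a local TCP port accepts connections. It only shows that the local listener is up, not that the Ray head behind it is healthy, and port_open is a hypothetical helper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the sessions above, port_open("localhost", 8265) and port_open("localhost", 10001) should both return True once the forwards are established.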

Verify that the Ray client can connect to the Ray cluster using localhost:

ray list nodes --address http://localhost:8265

The output is similar to the following:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted

Run a Ray Serve application

To run a Ray Serve application, follow these steps:

Run the Stable Diffusion Ray Serve application:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001

The output is similar to the following:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
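
The serve run command in this section embeds the runtime environment as a JSON string inside a shell argument, which is easy to misquote when edited by hand. The following is a minimal sketch that builds the same command from a Python dictionary. The helper serve_run_command is hypothetical; the package list and the excludes entry are copied from the command above, where excludes keeps the local myenv virtual environment out of the uploaded working directory.

```python
import json
import shlex

# Same runtime environment as in the serve run command above.
RUNTIME_ENV = {
    "pip": [
        "torch", "torchvision", "diffusers==0.12.1",
        "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0",
    ],
    "excludes": ["myenv"],
}


def serve_run_command(import_path: str, address: str, runtime_env: dict) -> str:
    """Return a shell-safe serve run command line. Nothing is executed here."""
    argv = [
        "serve", "run", import_path,
        "--working-dir=.",
        f"--runtime-env-json={json.dumps(runtime_env)}",
        "--address", address,
    ]
    return shlex.join(argv)


print(serve_run_command("stable_diffusion:entrypoint", "ray://localhost:10001", RUNTIME_ENV))
```

Letting json.dumps and shlex.join handle the quoting means that adding or pinning a package is a one-line change to the dictionary.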

Establish a port-forwarding session to the Ray Serve port (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Run the Python script:
```
python generate_image.py
```
The script generates an image and writes it to a file named output.png.
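
This guide does not show the contents of generate_image.py. As a rough illustration of what such a client might do, the following sketch sends a prompt to the forwarded Serve port and saves the response body to a file. The /imagine route and the prompt query parameter are assumptions, not confirmed by this guide, so check the script in the sample repository for the real request. The sketch uses only the standard library, and both function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_url(base_url: str, prompt: str, route: str = "/imagine") -> str:
    """Build the request URL. `route` and the `prompt` parameter are assumed names."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return f"{base_url.rstrip('/')}{route}?{query}"


def save_image(url: str, out_path: str = "output.png", timeout: float = 300.0) -> int:
    """Fetch the URL, write the response body to `out_path`, and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

With the port-forwarding session on port 8000 active, a call would look like save_image(build_url("http://localhost:8000", "a cat on the grass")). The long timeout allows for the first request, which can be slow while the model loads.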

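Deploy a RayService

The RayService custom resource manages the lifecycle of both a RayCluster resource and a Ray Serve application.

For more information about RayService, see Deploy Ray Serve Applications and the Production Guide in the Ray documentation.

To deploy a RayService resource, follow these steps:

Review the following manifest: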

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
  rayClusterConfig:
    rayVersion: '2.37.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image:  rayproject/ray:2.37.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
              requests:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
          nodeSelector:
            cloud.google.com/machine-family: c3d
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.37.0-gpu
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayService custom resource.
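
The import_path in serveConfigV2 mirrors the directory layout of the repository archive named in working_dir. The following is a minimal sketch of that mapping, assuming Ray uses the root of the unpacked archive as the working directory. The helper serve_import_path is hypothetical.

```python
from pathlib import PurePosixPath


def serve_import_path(module_file: str, attribute: str) -> str:
    """Turn 'pkg/sub/module.py' plus an attribute into 'pkg.sub.module:attribute'."""
    path = PurePosixPath(module_file)
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got {module_file!r}")
    return ".".join(path.with_suffix("").parts) + ":" + attribute


# Repository-relative path of the module used earlier with serve run.
print(serve_import_path(
    "ai-ml/gke-ray/rayserve/stable-diffusion/stable_diffusion.py",
    "entrypoint",
))
```

The printed value matches the import_path in the manifest, which is why the same stable_diffusion:entrypoint application can be loaded either from the local directory or from the archive.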

Apply the manifest to your cluster:
```
kubectl apply -f ray-service.yaml
```

Verify that the Service is ready:

kubectl get svc stable-diffusion-serve-svc

The output is similar to the following:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m

Configure port-forwarding to the Ray Serve Service:

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &

Run the Python script from the previous section:
```
python generate_image.py
```
The script generates an image similar to the one generated in the previous section.

Observe Ray workloads

To view the details of a RayJob, go to the Kubernetes Engine > AI/ML > Jobs section in the Google Cloud console.


Clean up

Delete the project

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Delete individual resources

To delete the cluster, run the following command:

gcloud container clusters delete ${CLUSTER_NAME}

What's next

Explore reference architectures, diagrams, and best practices for Google Cloud in the Cloud Architecture Center.
