Serve Stable Diffusion XL (SDXL) using TPUs on GKE with MaxDiffusion


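This tutorial shows you how to serve the SDXL image generation model using Tensor Processing Units (TPUs) on Google Kubernetes Engine (GKE) with MaxDiffusion. In this tutorial, you download the model from Hugging Face and deploy it on an Autopilot or Standard cluster using a container that runs MaxDiffusion.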

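This guide is a good starting point if you need the granular control, customization, scalability, resilience, portability, and cost-effectiveness of managed Kubernetes when you deploy and serve your AI/ML workloads. If you need a unified managed AI platform to rapidly build and serve ML models cost-effectively, we recommend that you try the Vertex AI deployment solution.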

Background

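By serving SDXL with MaxDiffusion on TPUs in GKE, you can build a robust, production-ready serving solution with all the benefits of managed Kubernetes, including cost-efficiency, scalability, and high availability. This section describes the key technologies used in this tutorial.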

Stable Diffusion XL (SDXL)

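Stable Diffusion XL (SDXL) is a latent diffusion model (LDM) that MaxDiffusion supports for inference. In generative AI, you can use LDMs to generate high-quality images from text descriptions. LDMs are useful for applications such as image search and image captioning.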

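SDXL supports single-host or multi-host inference with sharding annotations. This lets SDXL be trained and run across multiple machines, which improves efficiency.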
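Sharding along the batch dimension divides one batch of work evenly across devices, and each device then processes its own slice. The plain-Python sketch below shows that data-parallel pattern for a batch of prompts. The helper `shard_batch` is hypothetical, and so is its even-divisibility rule. This is not MaxDiffusion's API, which expresses sharding through JAX annotations rather than Python lists.

```python
from typing import List, Sequence


def shard_batch(prompts: Sequence[str], num_devices: int) -> List[List[str]]:
    """Split a batch of prompts into equal, contiguous shards, one per device.

    Hypothetical helper: it only illustrates data-parallel batch sharding.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if len(prompts) % num_devices != 0:
        # Assumption: like a batch dimension sharded across a device mesh,
        # the batch size must divide evenly by the number of devices.
        raise ValueError("batch size must be divisible by num_devices")
    per_device = len(prompts) // num_devices
    return [
        list(prompts[i * per_device:(i + 1) * per_device])
        for i in range(num_devices)
    ]


if __name__ == "__main__":
    batch = ["a cat", "a dog", "a fox", "an owl"]
    for device_id, shard in enumerate(shard_batch(batch, 2)):
        print(device_id, shard)
```

On a single-host 1x1 topology such as the one in this tutorial, `num_devices` is 1 and the whole batch stays on one chip.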

For more information, see the Generative Models by Stability AI repository and the SDXL documentation.

TPU

TPUs are Google's custom-developed application-specific integrated circuits (ASICs) that accelerate machine learning and AI models built with frameworks such as TensorFlow, PyTorch, and JAX.

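Before you use TPUs in GKE, we recommend that you complete the following learning path:

Learn about current TPU version availability in the Cloud TPU system architecture.
Learn about TPUs in GKE.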

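This tutorial covers serving the SDXL model. GKE deploys the model on single-host TPU v5e nodes, with the TPU topology configured to match the model's requirements so that prompts are served with low latency. In this guide, the model uses a TPU v5e chip with a 1x1 topology.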
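The topology determines how many TPU chips a node exposes. The chip count is the product of the topology dimensions, so `1x1` means one chip. The minimal Python sketch below derives the scheduling fields a Pod would need for this configuration. It assumes the GKE node labels `cloud.google.com/gke-tpu-accelerator` (with `tpu-v5-lite-podslice` as the TPU v5e value) and `cloud.google.com/gke-tpu-topology`, plus the `google.com/tpu` extended resource. The helper name is hypothetical.

```python
import json
from math import prod
from typing import Any, Dict


def tpu_scheduling_fields(accelerator: str, topology: str) -> Dict[str, Any]:
    """Build the nodeSelector and TPU resource settings for a single-host TPU Pod.

    Hypothetical helper; label keys and the google.com/tpu resource name are
    assumed to follow GKE's TPU conventions. nodeSelector belongs in the Pod
    spec, and resources belongs in the container spec.
    """
    # Chip count is the product of the topology dimensions, e.g. "2x4" -> 8.
    dims = [int(d) for d in topology.lower().split("x")]
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid topology: {topology!r}")
    chips = str(prod(dims))
    return {
        "nodeSelector": {
            "cloud.google.com/gke-tpu-accelerator": accelerator,
            "cloud.google.com/gke-tpu-topology": topology,
        },
        "resources": {
            "requests": {"google.com/tpu": chips},
            "limits": {"google.com/tpu": chips},
        },
    }


if __name__ == "__main__":
    # This tutorial's setup: one TPU v5e chip in a 1x1 topology.
    print(json.dumps(tpu_scheduling_fields("tpu-v5-lite-podslice", "1x1"), indent=2))
```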

MaxDiffusion

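MaxDiffusion is a collection of reference implementations of various latent diffusion models, written in Python and JAX, that run on XLA devices, including TPUs and GPUs. MaxDiffusion is a starting point for diffusion projects in both research and production.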

For more information, see the MaxDiffusion repository.

Objectives

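This tutorial is intended for generative AI customers who use JAX, new or existing users of SDXL, and any ML engineers, MLOps (DevOps) engineers, or platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models.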

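This tutorial covers the following steps:

Create a GKE Autopilot or Standard cluster with the recommended TPU topology, based on the model characteristics.
Build an SDXL inference container image.
Deploy the SDXL inference server on GKE.
Serve and interact with the model through a web app.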

Architecture

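This section describes the GKE architecture used in this tutorial. The architecture consists of a GKE Autopilot or Standard cluster that provisions TPUs and hosts the MaxDiffusion components. GKE uses these components to deploy and serve the model.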

The following diagram shows the components of this architecture:

Example architecture for serving MaxDiffusion with TPU v5e on GKE.

This architecture includes the following components:

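A GKE Autopilot or Standard regional cluster.
One single-host TPU slice node pool that hosts the SDXL model in the MaxDiffusion Deployment.
A Service of type ClusterIP, which distributes incoming traffic across all MaxDiffusion HTTP replicas.
A WebApp HTTP server with an external LoadBalancer Service, which distributes incoming traffic and redirects model-serving traffic to the ClusterIP Service.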
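The two Services in this architecture differ mainly in their `type`. A ClusterIP Service is reachable only inside the cluster. A LoadBalancer Service gets an external IP address. The minimal Python sketch below builds both manifests as dictionaries and prints them as a JSON `List`, which `kubectl apply -f` also accepts. The names `max-diffusion-server` and `web-app`, the `app` labels, the `default` namespace, and the port numbers are all hypothetical placeholders, not the values from this tutorial's manifests.

```python
import json
from typing import Any, Dict


def service_manifest(name: str, app_label: str, service_type: str,
                     port: int, target_port: int) -> Dict[str, Any]:
    """Return a Kubernetes Service manifest as a dict (hypothetical values)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Traffic is routed to Pods whose "app" label matches.
            "selector": {"app": app_label},
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


# Internal Service: balances requests across all MaxDiffusion HTTP replicas.
model_service = service_manifest("max-diffusion-server", "max-diffusion-server",
                                 "ClusterIP", 8000, 8000)
# External Service: exposes the web app outside the cluster.
web_service = service_manifest("web-app", "web-app", "LoadBalancer", 80, 8080)

# In-cluster DNS name the web app could use to reach the model Service,
# assuming both run in the "default" namespace.
MODEL_URL = "http://max-diffusion-server.default.svc.cluster.local:8000"

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [model_service, web_service]}, indent=2))
```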

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API


Make sure that you have the following role or roles on the project: roles/container.admin, roles/iam.serviceAccountAdmin, roles/artifactregistry.admin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
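You can also run this check against the project's IAM policy exported with `gcloud projects get-iam-policy PROJECT_ID --format=json`. The minimal Python sketch below reports which required roles are not bound directly to a given principal. It assumes the policy's standard `bindings` list of `role` and `members` entries. It does not expand group memberships, so roles you hold only through a group still appear as missing. It also ignores binding conditions. The function name and the sample policy are hypothetical.

```python
import json
from typing import Iterable, List

REQUIRED_ROLES = [
    "roles/container.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/artifactregistry.admin",
]


def missing_roles(policy_json: str, member: str,
                  required: Iterable[str] = REQUIRED_ROLES) -> List[str]:
    """Return required roles not bound directly to member, e.g. "user:alice@example.com".

    Group memberships are not expanded, and conditional bindings are treated
    as unconditional.
    """
    policy = json.loads(policy_json)
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return [role for role in required if role not in granted]


if __name__ == "__main__":
    # Hypothetical policy excerpt, shaped like `gcloud ... --format=json` output.
    sample = json.dumps({"bindings": [
        {"role": "roles/container.admin", "members": ["user:alice@example.com"]},
        {"role": "roles/viewer", "members": ["group:team@example.com"]},
    ]})
    print(missing_roles(sample, "user:alice@example.com"))
```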
Grant the roles
1. In the Google Cloud console, go to the IAM page.
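  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. Click Select a role, then search for the role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.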
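The console steps above have a command-line equivalent, `gcloud projects add-iam-policy-binding`, which you run once per role. The minimal Python sketch below only builds those commands so you can review them. It does not execute them. `my-project` and `alice@example.com` are placeholders that you must replace with your own project ID and account.

```python
import shlex
from typing import List, Sequence

REQUIRED_ROLES = (
    "roles/container.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/artifactregistry.admin",
)


def grant_commands(project_id: str, user_email: str,
                   roles: Sequence[str] = REQUIRED_ROLES) -> List[str]:
    """Build one shell-quoted gcloud command per role (printed, not run)."""
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{user_email}",
            f"--role={role}",
        ])
        for role in roles
    ]


if __name__ == "__main__":
    for command in grant_commands("my-project", "alice@example.com"):
        print(command)
```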
