本頁面由 Cloud Translation API 翻譯而成。

使用 Agent Development Kit (ADK) 和自行託管的 LLM，在 GKE 上部署代理式 AI 應用程式

自動駕駛標準

本教學課程示範如何使用 Google Kubernetes Engine (GKE)，部署及管理容器化代理程式 AI/ML 應用程式。結合 Google 代理程式開發套件 (ADK) 和自行代管的大型語言模型 (LLM)，例如由 vLLM 提供的 Llama 3.1，您就能有效率地大規模運作 AI 代理程式，同時全面控管模型堆疊。本教學課程將逐步說明完整流程，包括如何將以 Python 為基礎的代理程式從開發階段，部署至具有 GPU 加速功能的 GKE Autopilot 叢集，以供正式環境使用。

本教學課程的適用對象為機器學習 (ML) 工程師、開發人員和雲端架構師，他們有興趣使用 Kubernetes 容器自動化調度管理功能，提供代理程式 AI/ML 應用程式。如要進一步瞭解內容中提及的常見角色和範例工作，請參閱「常見的 GKE Enterprise 使用者角色和工作」。 Google Cloud

開始之前，請務必詳閱下列事項：

背景

本節說明本教學課程中使用的主要技術。

Agent Development Kit (ADK)

Agent Development Kit (ADK) 是一個彈性十足的模組化框架，可用於開發及部署 AI 代理。雖然 ADK 已針對 Gemini 和 Google 生態系統最佳化，但您不必使用特定模型或部署作業，而且 ADK 可與其他架構相容。ADK 的設計宗旨是讓代理程式開發作業更貼近軟體開發，方便開發人員建立、部署及協調代理程式架構，處理從基本工作到複雜工作流程的各種作業。

詳情請參閱 ADK 說明文件。

GKE 代管 Kubernetes 服務

Google Cloud 提供一系列服務，包括 GKE，非常適合部署及管理 AI/機器學習工作負載。GKE 是代管 Kubernetes 服務，可簡化容器化應用程式的部署、擴充及管理作業。GKE 提供必要的基礎架構，包括可擴充的資源、分散式運算和高效能網路，可處理 LLM 的運算需求。

如要進一步瞭解 Kubernetes 的重要概念，請參閱「開始瞭解 Kubernetes」。如要進一步瞭解 GKE，以及如何運用 GKE 調度資源、自動執行作業及管理 Kubernetes，請參閱 GKE 總覽。

vLLM

vLLM 是經過高度最佳化的開放原始碼 LLM 服務架構，可提升 GPU 的服務處理量，並提供下列功能：

使用 PagedAttention 實作最佳化轉換器。
持續批次處理，提升整體放送輸送量。
在多個 GPU 上進行張量平行處理和分散式服務。

詳情請參閱 vLLM 說明文件。

目標

本教學課程說明如何執行下列操作：

設定 Google Cloud 環境。
佈建啟用 GPU 的 GKE 叢集。
使用 vLLM 推論伺服器部署 Llama 3.1 模型。
為 ADK 型代理程式建構容器映像檔。
將代理程式部署到 GKE 叢集，並連線至自行代管的 LLM。
測試已部署的代理程式。

架構

本教學課程介紹可擴充的架構，說明如何在 GKE 部署代理式 AI 應用程式。ADK 代理程式應用程式會在標準 CPU 節點集區上執行，而自架 LLM (vLLM 上的 Llama 3.1) 則會在啟用 GPU 的節點集區上執行，兩者都位於同一個 GKE 叢集內。這種架構會將代理程式的應用程式邏輯與 LLM 推論工作負載分開，讓每個元件都能獨立擴充及管理。

這個架構有兩個核心元件，分別位於各自的 GKE 部署作業中：

ADK 代理程式應用程式：代理程式的自訂建構業務邏輯和工具 (例如 get_weather) 位於容器映像檔中。這個映像檔會在標準 CPU 節點集區上執行，並透過內部 Kubernetes 服務與 LLM 通訊。
自行代管的 LLM (vLLM 上的 Llama 3.1)：Llama 3.1 模型會在啟用 GPU 的節點集區上，透過專屬的 vLLM 伺服器執行。這項部署作業會使用公開容器映像檔 (vllm/vllm-openai:v0.8.5)，該映像檔已設定為在容器啟動時，從 Hugging Face 下載並提供指定的模型。代理程式會透過 vllm-llama3-service Kubernetes 服務公開的 REST API 與這個伺服器通訊。

ADK 代理程式和 vLLM 部署作業都會在同一個 GKE 叢集上執行。在單一叢集中共置可簡化網路、管理和部署作業，同時仍允許為應用程式元件指派專用硬體。

費用

本教學課程使用下列 Google Cloud計費元件：

請查看各項服務的價格，瞭解可能需要支付的費用。

事前準備

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: roles/container.admin, roles/iam.serviceAccountAdmin, roles/artifactregistry.admin, roles/cloudbuild.builds.editor, roles/resourcemanager.projectIamAdmin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  前往 IAM
2. 選取所需專案。
3. 按一下「Grant access」(授予存取權)。
4. 在「New principals」(新增主體) 欄位中，輸入您的使用者 ID。這通常是指 Google 帳戶的電子郵件地址。
5. 在「Select a role」(選取角色) 清單中，選取角色。
6. 如要授予其他角色，請按一下「Add another role」(新增其他角色)，然後新增其他角色。
7. 按一下「Save」(儲存)。
8. 從 Hugging Face 取得讀取存取權杖，下載 Llama 模型。此外，您也需要要求存取 Llama 3.1 模型。

使用 Agent Development Kit (ADK) 和自行託管的 LLM，在 GKE 上部署代理式 AI 應用程式

背景

Agent Development Kit (ADK)

GKE 代管 Kubernetes 服務

vLLM

目標

架構

費用

事前準備

Check for the roles

Grant the roles

準備環境

複製範例專案

建立及設定 Google Cloud 資源

gcloud

Autopilot

標準

Terraform

設定 `kubectl` 以與叢集通訊

建構代理程式映像檔

部署模型

部署代理程式應用程式

測試已部署的代理程式

清除所用資源

刪除已部署的資源

gcloud

Terraform

後續步驟

使用 Agent Development Kit (ADK) 和自行託管的 LLM，在 GKE 上部署代理式 AI 應用程式 透過集合功能整理內容 你可以依據偏好儲存及分類內容。

背景

Agent Development Kit (ADK)

GKE 代管 Kubernetes 服務

vLLM

目標

架構

費用

事前準備

Check for the roles

Grant the roles

準備環境

複製範例專案

建立及設定 Google Cloud 資源

gcloud

Autopilot

標準

Terraform

設定 kubectl 以與叢集通訊

建構代理程式映像檔

部署模型

部署代理程式應用程式

測試已部署的代理程式

清除所用資源

刪除已部署的資源

gcloud

Terraform

後續步驟

使用 Agent Development Kit (ADK) 和自行託管的 LLM，在 GKE 上部署代理式 AI 應用程式

設定 `kubectl` 以與叢集通訊