使用智能体开发套件 (ADK) 和自托管 LLM 在 GKE 上部署智能体 AI 应用

Autopilot Standard

本教程演示了如何使用 Google Kubernetes Engine (GKE) 部署和管理容器化的智能体 AI/机器学习应用。通过将 Google 智能体开发套件 (ADK) 与自行托管的大语言模型 (LLM)（例如由 vLLM 提供的 Llama 3.1）相结合，您可以高效且大规模地将 AI 智能体投入实际应用，同时保持对模型堆栈的完全控制。本教程将指导您完成基于 Python 的智能体从开发到在具有 GPU 加速功能的 GKE Autopilot 集群上进行生产部署的端到端流程。

本教程面向有兴趣使用 Kubernetes 容器编排功能来部署智能体 AI/机器学习应用的机器学习 (ML) 工程师、开发者和云架构师。如需详细了解我们在 Google Cloud内容中提及的常见角色和示例任务，请参阅常见的 GKE Enterprise 用户角色和任务。

在开始之前，请确保您熟悉以下内容：

背景

本部分介绍本教程中使用的关键技术。

智能体开发套件 (ADK)

智能体开发套件 (ADK) 是一个灵活的模块化框架，用于开发和部署 AI 智能体。尽管 ADK 针对 Gemini 和 Google 生态系统进行了优化，但它并不要求您使用特定模型或部署，并且可与其他框架兼容。ADK 的设计旨在让智能体开发更像软件开发，从而让开发者更轻松地创建、部署和编排从基本任务到复杂工作流的智能体架构。

如需了解详情，请参阅 ADK 文档。

GKE 托管式 Kubernetes 服务

Google Cloud 提供各种各样的服务，包括 GKE，该服务非常适合用于部署和管理 AI/机器学习工作负载。GKE 是一项托管式 Kubernetes 服务，可简化容器化应用的部署、扩缩和管理。GKE 提供必要的基础设施（包括可扩缩资源、分布式计算和高效网络），以满足 LLM 的计算需求。

如需详细了解关键 Kubernetes 概念，请参阅开始了解 Kubernetes。如需详细了解 GKE 以及它如何帮助您扩缩、自动执行和管理 Kubernetes，请参阅 GKE 概览。

vLLM

vLLM 是一个经过高度优化的开源 LLM 服务框架，可提高 GPU 上的服务吞吐量，具有如下功能：

具有 PagedAttention 且经过优化的 Transformer 实现
连续批处理，可提高整体服务吞吐量。
多个 GPU 上的张量并行处理和分布式服务。

如需了解详情，请参阅 vLLM 文档。

目标

本教程介绍了如何执行以下操作：

设置 Google Cloud 环境。
预配支持 GPU 的 GKE 集群。
使用 vLLM 推理服务器部署 Llama 3.1 模型。
为基于 ADK 的代理构建容器映像。
将代理部署到 GKE 集群，并将其连接到自托管 LLM。
测试已部署的智能体。

架构

本教程介绍了一种可伸缩的架构，用于在 GKE 上部署智能体 AI 应用。ADK 代理应用在标准 CPU 节点池运行，而自托管 LLM（基于 vLLM 的 Llama 3.1）在支持 GPU 的节点池池上运行，两者都在同一 GKE 集群内。此架构将代理的应用逻辑与 LLM 推理工作负载分离，从而允许独立扩展和管理每个组件。

该架构有两个核心组件，每个组件都位于自己的 GKE Deployment 中：

ADK 代理应用：代理的自定义构建的业务逻辑和工具（例如 get_weather）位于容器映像中。该映像在标准 CPU 节点池运行，并使用内部 Kubernetes 服务与 LLM 进行通信。
自行托管的 LLM（基于 vLLM 的 Llama 3.1）：Llama 3.1 模型在支持 GPU 的节点池中的专用 vLLM 服务器上运行。此部署使用公共容器映像 (vllm/vllm-openai:v0.8.5)，该映像配置为在容器启动时从 Hugging Face 下载并提供指定的模型。代理通过 vllm-llama3-service Kubernetes 服务公开的 REST API 与此服务器通信。

ADK 代理和 vLLM 部署在同一 GKE 集群上运行。这种在单个集群内的并置简化了网络、管理和部署，同时仍允许为应用的组件分配专用硬件。

费用

本教程使用 Google Cloud的以下收费组件：

查看各项服务的价格，了解可能的费用。

准备工作

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: roles/container.admin, roles/iam.serviceAccountAdmin, roles/artifactregistry.admin, roles/cloudbuild.builds.editor, roles/resourcemanager.projectIamAdmin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  前往 IAM
2. 选择项目。
3. 点击 授予访问权限。
4. 在新的主账号字段中，输入您的用户标识符。这通常是 Google 账号的电子邮件地址。
5. 在选择角色列表中，选择一个角色。
6. 如需授予其他角色，请点击 添加其他角色，然后添加其他各个角色。
7. 点击 Save（保存）。
8. 从 Hugging Face 获取读取访问令牌，以便下载 Llama 模型。您还需要申请访问 Llama 3.1 模型。

使用智能体开发套件 (ADK) 和自托管 LLM 在 GKE 上部署智能体 AI 应用

背景

智能体开发套件 (ADK)

GKE 托管式 Kubernetes 服务

vLLM

目标

架构

费用

准备工作

Check for the roles

Grant the roles

准备环境

克隆示例项目

创建和配置 Google Cloud 资源

gcloud

Autopilot

Standard

Terraform

配置 `kubectl` 以与您的集群通信

构建代理映像

部署模型

部署代理应用

测试已部署的智能体

清理

删除已部署的资源

gcloud

Terraform

后续步骤

使用智能体开发套件 (ADK) 和自托管 LLM 在 GKE 上部署智能体 AI 应用 使用集合让一切井井有条 根据您的偏好保存内容并对其进行分类。

背景

智能体开发套件 (ADK)

GKE 托管式 Kubernetes 服务

vLLM

目标

架构

费用

准备工作

Check for the roles

Grant the roles

准备环境

克隆示例项目

创建和配置 Google Cloud 资源

gcloud

Autopilot

Standard

Terraform

配置 kubectl 以与您的集群通信

构建代理映像

部署模型

部署代理应用

测试已部署的智能体

清理

删除已部署的资源

gcloud

Terraform

后续步骤

使用智能体开发套件 (ADK) 和自托管 LLM 在 GKE 上部署智能体 AI 应用

配置 `kubectl` 以与您的集群通信