Serve open LLMs on GKE with TPUs using a pre-configured architecture

Autopilot Standard

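This page shows you how to deploy and serve popular open source large language models (LLMs) on GKE with TPUs for inference, using a pre-configured, production-ready GKE inference reference architecture. The approach uses Infrastructure as Code (IaC): Terraform, wrapped in CLI scripts, creates a standardized, secure, and scalable GKE environment designed for AI inference workloads.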

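In this guide, you deploy and serve LLMs on single-host TPU nodes on GKE with the vLLM serving framework. The guide provides instructions and configurations for deploying the following open models:

Gemma 3 1B-it
Gemma 3 4B-it
Gemma 3 27B-it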

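This guide is intended for machine learning (ML) engineers and Data and AI specialists who are interested in using Kubernetes container orchestration capabilities to serve open models for inference. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.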

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: roles/artifactregistry.admin, roles/browser, roles/compute.networkAdmin, roles/container.clusterAdmin, roles/iam.serviceAccountAdmin, roles/resourcemanager.projectIamAdmin, and roles/serviceusage.serviceUsageAdmin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
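
The console check above can also be approximated offline. The following is a minimal sketch, assuming you have exported the project's IAM policy as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and loaded it into a dictionary. The function name `missing_roles` and the sample policy are hypothetical. The sketch only looks at direct project-level bindings: it does not account for roles inherited from a folder or organization, for conditional bindings, or for broader roles that happen to include the same permissions.

```python
# Required roles, copied from the list in this section.
REQUIRED_ROLES = [
    "roles/artifactregistry.admin",
    "roles/browser",
    "roles/compute.networkAdmin",
    "roles/container.clusterAdmin",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin",
    "roles/serviceusage.serviceUsageAdmin",
]


def missing_roles(policy, members):
    """Return the required roles that none of the given members holds.

    policy  -- IAM policy as a dict with a "bindings" list, where each
               binding has a "role" string and a "members" list.
    members -- principals that identify you, such as "user:you@example.com"
               or "group:team@example.com" (hypothetical values).
    """
    wanted = set(members)
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if wanted & set(binding.get("members", []))
    }
    return [role for role in REQUIRED_ROLES if role not in granted]


# Hypothetical sample policy, for illustration only.
sample_policy = {
    "bindings": [
        {"role": "roles/browser", "members": ["user:you@example.com"]},
        {"role": "roles/container.clusterAdmin",
         "members": ["group:team@example.com"]},
    ]
}

for role in missing_roles(sample_policy, ["user:you@example.com"]):
    print("missing:", role)
```

Pass every principal that identifies you, including groups you belong to, so that group-granted roles are counted.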
Grant the roles
1. In the Google Cloud console, go to the IAM page.
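  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. In the Select a role list, select a role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.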
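
If you prefer the command line to the console steps above, the same grants can be made with `gcloud projects add-iam-policy-binding`, one call per role. The following is a minimal sketch that only builds and prints the commands so that you can review them before running anything. The helper name `grant_commands` and the placeholders `PROJECT_ID` and `USER_EMAIL` are assumptions for illustration and are not part of the original guide.

```python
import shlex

# Required roles, copied from the list in this section.
REQUIRED_ROLES = [
    "roles/artifactregistry.admin",
    "roles/browser",
    "roles/compute.networkAdmin",
    "roles/container.clusterAdmin",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin",
    "roles/serviceusage.serviceUsageAdmin",
]


def grant_commands(project_id, member, roles=REQUIRED_ROLES):
    """Build one gcloud command string per role, without executing anything."""
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ])
        for role in roles
    ]


# Replace the placeholders with your own project ID and account.
for command in grant_commands("PROJECT_ID", "user:USER_EMAIL"):
    print(command)
```

Printing the commands instead of running them keeps the sketch free of side effects. Granting roles still requires that your own account is allowed to modify the project's IAM policy.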

Before you begin

Check for the roles

Grant the roles

Get access to the model

Provision the GKE inference environment

Launch Cloud Shell

Deploy the base architecture

Autopilot

Standard

Deploy an open model

Select a model

Gemma 3 1B-it

Gemma 3 4B-it

Gemma 3 27B-it

Download the model

Deploy the model

Test your deployment

Clean up

Autopilot

Standard

What's next

Deploy the base architecture