Deploy open LLMs on GKE with TPUs and a pre-configured architecture

Autopilot Standard

This page shows you how to quickly deploy and serve popular open large language models (LLMs) on GKE with TPUs for inference by using a pre-configured, production-ready GKE inference reference architecture. This approach uses Infrastructure as Code (IaC), with Terraform wrapped in CLI scripts, to create a standardized, secure, and scalable GKE environment for AI inference workloads.

In this guide, you deploy and serve LLMs on single-host TPU nodes in GKE with the vLLM serving framework. The guide provides instructions and configurations for deploying the following open models: Gemma 3 1B-it, Gemma 3 4B-it, and Gemma 3 27B-it.

This guide is intended for machine learning (ML) engineers and data and AI specialists who want to use Kubernetes container orchestration capabilities to serve open models for inference. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following roles on the project: roles/artifactregistry.admin, roles/browser, roles/compute.networkAdmin, roles/container.clusterAdmin, roles/iam.serviceAccountAdmin, roles/resourcemanager.projectIamAdmin, and roles/serviceusage.serviceUsageAdmin.
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
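The same check can be done programmatically. The following is a minimal sketch, not part of the official procedure. It assumes you have exported the project's IAM policy as JSON in the standard policy shape (a `bindings` list of `role` and `members` entries). The helper name `missing_roles` and the example principals are hypothetical. The comparison is a literal match on role names, so it does not account for roles inherited from a folder or organization, or for broader roles that happen to include the same permissions.

```python
from __future__ import annotations

# Roles this guide lists as required on the project.
REQUIRED_ROLES = frozenset({
    "roles/artifactregistry.admin",
    "roles/browser",
    "roles/compute.networkAdmin",
    "roles/container.clusterAdmin",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin",
    "roles/serviceusage.serviceUsageAdmin",
})


def missing_roles(policy: dict, members: set[str]) -> list[str]:
    """Return the required roles that none of `members` holds in `policy`.

    `members` should contain your own principal plus any groups you
    belong to, for example {"user:...", "group:..."}.
    """
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if members & set(binding.get("members", []))
    }
    return sorted(REQUIRED_ROLES - granted)


# Hypothetical policy excerpt and principals, for illustration only.
example_policy = {
    "bindings": [
        {"role": "roles/browser", "members": ["user:alex@example.com"]},
        {"role": "roles/container.clusterAdmin", "members": ["group:ml-team@example.com"]},
    ]
}
example_members = {"user:alex@example.com", "group:ml-team@example.com"}

for role in missing_roles(example_policy, example_members):
    print("missing:", role)
```

Any role the sketch reports as missing still needs to be granted, either by you (if you hold a role that allows it) or by your administrator.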
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. Click Select a role, then search for the role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.
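As an alternative to the console, roles can also be granted from the command line with `gcloud projects add-iam-policy-binding`, one role per invocation. The sketch below only builds the command strings so that you can review them before running anything. The helper name `grant_commands`, the project ID `my-project`, and the principal `user:alex@example.com` are placeholders, not values from this guide.

```python
from __future__ import annotations

import shlex


def grant_commands(project_id: str, member: str, roles: list[str]) -> list[str]:
    """Build one `gcloud projects add-iam-policy-binding` command per role.

    The commands are returned as strings and are not executed here.
    """
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ])
        for role in roles
    ]


# Placeholder project, principal, and a subset of the required roles.
commands = grant_commands(
    "my-project",
    "user:alex@example.com",
    ["roles/browser", "roles/container.clusterAdmin"],
)
for command in commands:
    print(command)
```

Building the commands with `shlex.join` keeps each argument safely quoted if a project ID or principal ever contains characters that the shell would otherwise interpret.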
