Serve open LLMs on GKE using TPUs with a preconfigured architecture

Autopilot Standard

This page shows how to quickly deploy and serve popular open source large language models (LLMs) for inference on GKE with TPUs, using a preconfigured, production-ready GKE inference reference architecture. This approach uses Infrastructure as Code (IaC), with Terraform wrapped in CLI scripts, to create a standardized, secure, and scalable GKE environment designed for AI inference workloads.

In this guide, you deploy and serve LLMs using single-host TPU nodes on GKE with the vLLM serving framework. This guide provides instructions and configurations for deploying the following open models:

Gemma 3 1B-it
Gemma 3 4B-it
Gemma 3 27B-it

This guide is intended for ML engineers and data and AI specialists who want to explore Kubernetes container orchestration capabilities for serving open models for inference. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.
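If you prefer to check billing from Cloud Shell, the command gcloud billing projects describe PROJECT_ID --format=json prints the project's billing information, including a billingEnabled field. The following is a minimal Python sketch, assuming you capture that JSON output as a string; the function name billing_enabled is hypothetical and not part of any Google Cloud library.

```python
import json


def billing_enabled(describe_output: str) -> bool:
    """Return True if the JSON from `gcloud billing projects describe` reports billing on.

    Assumes the output of:
        gcloud billing projects describe PROJECT_ID --format=json
    A missing billingEnabled field is treated as "not enabled".
    """
    info = json.loads(describe_output)
    return bool(info.get("billingEnabled", False))
```

Treating a missing field as disabled errs on the side of asking you to check the console rather than assuming billing is active.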

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs
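The APIs can also be enabled from Cloud Shell with gcloud services enable, which accepts several service names in one call. This step does not name the individual APIs, so the sketch below takes the service names as input rather than guessing them; build_enable_command is a hypothetical helper that only assembles the command and does not run it.

```python
def build_enable_command(project_id: str, services: list[str]) -> list[str]:
    """Assemble a `gcloud services enable` argv for the given project.

    `services` are API service names such as "container.googleapis.com";
    which ones this guide needs is an assumption left to the caller.
    """
    if not services:
        raise ValueError("at least one service name is required")
    # All services are enabled in a single gcloud call.
    return ["gcloud", "services", "enable", *services, f"--project={project_id}"]
```

For example, build_enable_command("my-project", ["container.googleapis.com"]) produces the argument list for enabling the Kubernetes Engine API; substitute the full set of APIs that this guide requires before running it with subprocess or pasting it into Cloud Shell.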

Make sure that you have the following roles on the project: roles/artifactregistry.admin, roles/browser, roles/compute.networkAdmin, roles/container.clusterAdmin, roles/iam.serviceAccountAdmin, roles/resourcemanager.projectIamAdmin, and roles/serviceusage.serviceUsageAdmin.
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
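The console check above can also be approximated from Cloud Shell: gcloud projects get-iam-policy PROJECT_ID --format=json prints the project's policy as a list of bindings, each with a role and its members. The sketch below, assuming that JSON shape, reports which of the roles listed earlier are not bound to any of the principals you pass in (your user plus any groups you belong to). It is a simplification: it only sees bindings on the project itself, not roles inherited from a folder or organization, and it ignores IAM conditions. The names missing_roles and REQUIRED_ROLES are illustrative.

```python
# Roles listed in this guide's prerequisites.
REQUIRED_ROLES = [
    "roles/artifactregistry.admin",
    "roles/browser",
    "roles/compute.networkAdmin",
    "roles/container.clusterAdmin",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin",
    "roles/serviceusage.serviceUsageAdmin",
]


def missing_roles(policy: dict, principals: set[str], required: list[str]) -> list[str]:
    """Return the required roles not bound directly to any of `principals`.

    `policy` is the parsed JSON from `gcloud projects get-iam-policy`.
    `principals` holds member strings such as "user:alice@example.com"
    or "group:ml-team@example.com".
    """
    granted = set()
    for binding in policy.get("bindings", []):
        # A role counts as granted if any of your principals appears in its members.
        if principals & set(binding.get("members", [])):
            granted.add(binding["role"])
    # Preserve the order of `required` so the report matches the guide's list.
    return [role for role in required if role not in granted]
```

An empty result means every listed role is bound on the project to you or to one of the groups you passed in.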
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. Click Select a role, then search for the role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.
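The same grants can be made from Cloud Shell with gcloud projects add-iam-policy-binding, one role per call. This is a minimal sketch that only builds and quotes the commands so you can review them before running anything; build_grant_commands and as_shell_lines are hypothetical helpers, and the member string is assumed to follow IAM principal syntax such as user:EMAIL.

```python
import shlex


def build_grant_commands(project_id: str, member: str, roles: list[str]) -> list[list[str]]:
    """Build one `gcloud projects add-iam-policy-binding` argv per role.

    `member` uses IAM principal syntax, for example "user:alice@example.com".
    """
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in roles
    ]


def as_shell_lines(commands: list[list[str]]) -> list[str]:
    """Quote each argv so it can be reviewed or pasted into Cloud Shell."""
    return [shlex.join(cmd) for cmd in commands]
```

Passing the seven roles listed earlier and your own user:EMAIL principal yields seven commands, one per role. Keeping command construction separate from execution makes it easy to inspect the exact grants before applying them.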

Before you begin
Check for the roles
Grant the roles
Get access to the model
Provision the GKE inference environment
Launch Cloud Shell
Deploy the base architecture (Autopilot or Standard)
Deploy an open model
Select a model (Gemma 3 1B-it, Gemma 3 4B-it, or Gemma 3 27B-it)
Download the model
Deploy the model
Test the deployment
Clean up (Autopilot or Standard)
What's next
