
Deploy and run inference on Gemma using Model Garden and Vertex AI TPU-backed endpoints

In this tutorial, you use Model Garden to deploy the Gemma 2B open model to a TPU-backed Vertex AI endpoint. You must deploy a model to an endpoint before that model can be used to serve online predictions. Deploying a model associates physical resources with the model so that it can serve online predictions with low latency.

After you deploy the Gemma 2B model, you run inference on the trained model by using the PredictionServiceClient to get online predictions. Online predictions are synchronous requests made to a model that is deployed to an endpoint.
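Under the hood, an online prediction is a single request/response call. As an illustration only, the minimal sketch below builds (but does not send) the equivalent HTTPS request to the Vertex AI REST method `projects.locations.endpoints.predict`. The helper name `build_predict_request` and the token value are hypothetical placeholders; in practice the access token would come from Application Default Credentials or `gcloud auth print-access-token`.

```python
import json
import urllib.request


def build_predict_request(project_id: str, region: str, endpoint_id: str,
                          instances: list, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a synchronous Vertex AI online prediction request."""
    # Regional API host: the same host the client libraries are pointed at.
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    # The request body carries the list of instances to run inference on.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values only; nothing is sent over the network here.
req = build_predict_request(
    "your-project-id", "us-central1", "1234567891234567891",
    [{"prompt": "Why is the sky blue?"}], "ACCESS_TOKEN_PLACEHOLDER",
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would block until the model returns its prediction, which is what "synchronous" means here.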

Objectives

This tutorial shows you how to perform the following tasks:

Deploy the Gemma 2B open model to a TPU-backed endpoint by using Model Garden
Use the PredictionServiceClient to get online predictions

Costs

In this document, you use the following billable components of Google Cloud:

A ct5lp-hightpu-1t machine type with one TPU_V5 accelerator
Vertex AI prediction and explanation

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

This tutorial requires you to:

Set up a Google Cloud project and enable the Vertex AI API
On your local machine:
- Install, initialize, and authenticate with the Google Cloud CLI
- Install the SDK for your language

Set up a Google Cloud project

Set up your Google Cloud project and enable the Vertex AI API.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API


Set up the Google Cloud CLI

On your local machine, set up the Google Cloud CLI.

Install and initialize the Google Cloud CLI.
If you previously installed the gcloud CLI, ensure your gcloud components are updated by running this command.
```
gcloud components update
```
To authenticate with the gcloud CLI, generate a local Application Default Credentials (ADC) file by running this command. The web flow launched by the command is used to provide your user credentials.
```
gcloud auth application-default login
```
For more information, see gcloud CLI authentication configuration and ADC configuration.
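To confirm that the login step left a credentials file where the client libraries will look for it, you can resolve the ADC path yourself. This is a minimal sketch of the documented lookup order (the `GOOGLE_APPLICATION_CREDENTIALS` environment variable first, then the file written by `gcloud auth application-default login`); it deliberately ignores the `CLOUDSDK_CONFIG` override and the metadata-server fallback, and the function name `adc_file_path` is hypothetical.

```python
import os
from pathlib import Path


def adc_file_path(env=None, home=None, is_windows=None) -> Path:
    """Return the file that Application Default Credentials would be read from."""
    env = os.environ if env is None else env
    is_windows = (os.name == "nt") if is_windows is None else is_windows
    # 1. An explicit key or credentials file always wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # 2. Otherwise, the well-known file written by `gcloud auth application-default login`.
    if is_windows:
        return Path(env["APPDATA"]) / "gcloud" / "application_default_credentials.json"
    home = Path.home() if home is None else Path(home)
    return home / ".config" / "gcloud" / "application_default_credentials.json"


if __name__ == "__main__":
    path = adc_file_path()
    print(path, "exists" if path.exists() else "is missing")
```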

Set up the SDK for your programming language

To set up the environment used in this tutorial, you install the Vertex AI SDK for your language and the Protocol Buffers library. The code samples use functions from the Protocol Buffers library to convert the input dictionary to the JSON format that the API expects.
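To make that conversion concrete: the Python sample calls `json_format.ParseDict(d, Value())`, which stores the dictionary as a `google.protobuf.Value`, a message whose `kind` oneof has only six cases. The stdlib-only sketch below mimics that mapping with plain dictionaries so you can see which Python types survive. It illustrates the structure of the `Value` message and is not the protobuf library's implementation; the function name `to_value` is hypothetical. One practical consequence is that every number, including an integer such as `max_tokens`, is carried as a double.

```python
def to_value(obj):
    """Mirror how a Python object maps onto the google.protobuf.Value oneof (illustration only)."""
    if obj is None:
        return {"null_value": 0}  # NullValue.NULL_VALUE is enum value 0
    if isinstance(obj, bool):  # check bool before int: bool is a subclass of int
        return {"bool_value": obj}
    if isinstance(obj, (int, float)):
        return {"number_value": float(obj)}  # Value stores every number as a double
    if isinstance(obj, str):
        return {"string_value": obj}
    if isinstance(obj, dict):
        if not all(isinstance(key, str) for key in obj):
            raise TypeError("Struct keys must be strings")
        return {"struct_value": {"fields": {k: to_value(v) for k, v in obj.items()}}}
    if isinstance(obj, (list, tuple)):
        return {"list_value": {"values": [to_value(v) for v in obj]}}
    raise TypeError(f"{type(obj).__name__} cannot be represented as a protobuf Value")


instance = {"prompt": "Why is the sky blue?", "max_tokens": 1024, "temperature": 0.9}
print(to_value(instance))
```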

On your local machine, click one of the following tabs to install the SDK for your programming language.

Python

Install and update the Vertex AI SDK for Python by running this command.
```
pip3 install --upgrade "google-cloud-aiplatform>=1.64"
```
Install the Protocol Buffers library for Python by running this command.
```
pip3 install --upgrade "protobuf>=5.28"
```
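The `>=` pins above say what pip should install, not what a given environment already has. The following is a small sketch, assuming Python 3.8+ for `importlib.metadata`, that reports whether the installed distributions meet the minimums used in these commands. The version parsing is deliberately simplified (it ignores pre-release tags) and is not a substitute for the `packaging` library; the names `parse_release` and `check_installed` are hypothetical.

```python
from importlib import metadata

# Minimum versions taken from the pip commands above.
MINIMUMS = {"google-cloud-aiplatform": (1, 64), "protobuf": (5, 28)}


def parse_release(version: str) -> tuple:
    """Turn '1.64.0' into (1, 64, 0), stopping at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '0rc1': keep 0, ignore the suffix
            break
    return tuple(parts)


def check_installed(minimums=MINIMUMS) -> dict:
    """Return {package: (installed_version or None, meets_minimum)}."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, parse_release(installed) >= minimum)
    return report


if __name__ == "__main__":
    for package, (version, ok) in check_installed().items():
        print(f"{package}: {version or 'not installed'} -> {'OK' if ok else 'upgrade needed'}")
```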

Node.js

Install or update the aiplatform SDK for Node.js by running the following command.
```
npm install @google-cloud/aiplatform
```

Java

To add google-cloud-aiplatform as a dependency, add the appropriate code for your environment.

Maven with BOM

Add the following XML to your pom.xml:
```
<dependencyManagement>
  <dependencies>
    <dependency>
      <artifactId>libraries-bom</artifactId>
      <groupId>com.google.cloud</groupId>
      <scope>import</scope>
      <type>pom</type>
      <version>26.34.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-aiplatform</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java-util</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
  </dependency>
</dependencies>
```

Maven without BOM

Add the following to your pom.xml:
```
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-aiplatform</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java-util</artifactId>
  <version>5.28</version>
</dependency>
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.11.0</version>
</dependency>
```

Gradle without BOM

Add the following to your build.gradle:
```
implementation 'com.google.cloud:google-cloud-aiplatform:1.1.0'
```

Go

Install these Go packages by running the following commands.
```
go get cloud.google.com/go/aiplatform
go get google.golang.org/protobuf
go get github.com/googleapis/gax-go/v2
```

Deploy Gemma using Model Garden

You deploy the Gemma 2B model to a ct5lp-hightpu-1t Compute Engine machine type that is optimized for small to medium scale training. This machine has one TPU v5e accelerator. For more information on training models using TPUs, see Cloud TPU v5e training.

In this tutorial, you deploy the instruction-tuned Gemma 2B open model by using the model card in Model Garden. The specific model version is gemma2-2b-it; the -it suffix stands for instruction-tuned.

The Gemma 2B model has a smaller parameter count, which means lower resource requirements and more deployment flexibility.
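A back-of-the-envelope calculation shows why the small model suits a single-chip machine. The sketch below rests on two stated assumptions that are not from this tutorial: roughly 2.6 billion parameters for Gemma 2 2B stored in bfloat16 (2 bytes each), and 16 GiB of HBM on the one TPU v5e chip. The estimate covers the weights only; the KV cache, activations, and serving-runtime overhead also consume accelerator memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~2.6e9 parameters, bfloat16 weights (2 bytes per parameter).
weights_gib = weight_memory_gib(2.6e9, 2)
# Assumption: 16 GiB of HBM on the single TPU v5e chip of ct5lp-hightpu-1t.
HBM_GIB = 16
print(f"weights ~{weights_gib:.1f} GiB; ~{HBM_GIB - weights_gib:.1f} GiB left for cache and activations")
```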

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Click the Gemma 2 model card.

Go to Gemma 2
Click Deploy to open the Deploy model pane.
In the Deploy model pane, specify these details.
1. For Deployment environment, click Vertex AI.
2. In the Deploy model section:
  1. For Resource ID, choose gemma2-2b-it.
  2. For Model name and Endpoint name, accept the default values. For example:
    - Model name: gemma2-2b-it-1234567891234
    - Endpoint name: gemma2-2b-it-mg-one-click-deploy
    Make a note of the endpoint name. You'll need it to find the endpoint ID used in the code samples.
3. In the Deployment settings section:
  1. Accept the default option for Basic settings.
  2. For Region, accept the default value or choose a region from the list. Make a note of the region. You'll need it for the code samples.
  3. For Machine spec, choose the TPU-backed instance: ct5lp-hightpu-1t (1 TPU_V5_LITEPOD; ct5lp-hightpu-1t).
Click Deploy. When the deployment is finished, you receive an email that contains details about your new endpoint. You can also view the endpoint details by clicking Online prediction > Endpoints and selecting your region.

Go to Endpoints

Run inference on Gemma 2B with the PredictionServiceClient

After you deploy Gemma 2B, you use the PredictionServiceClient to get online predictions for the prompt: "Why is the sky blue?"

Code parameters

The PredictionServiceClient code samples require you to update the following.

PROJECT_ID: To find your project ID, follow these steps.
1. Go to the Welcome page in the Google Cloud console.
  Go to Welcome
2. From the project picker at the top of the page, select your project.

  The project name, project number, and project ID appear after the Welcome heading.
ENDPOINT_REGION: This is the region where you deployed the endpoint.
ENDPOINT_ID: To find your endpoint ID, view it in the console or run the gcloud ai endpoints list command. You'll need the endpoint name and region from the Deploy model pane.
Console
You can view the endpoint details by clicking Online prediction > Endpoints and selecting your region. Note the number that's displayed in the ID column.

Go to Endpoints
gcloud
You can view the endpoint details by running the gcloud ai endpoints list command. Replace ENDPOINT_REGION and ENDPOINT_NAME with the region and endpoint name that you noted.
```
gcloud ai endpoints list \
  --region=ENDPOINT_REGION \
  --filter=display_name=ENDPOINT_NAME
```
The output looks like this.
```
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
ENDPOINT_ID: 1234567891234567891
DISPLAY_NAME: gemma2-2b-it-mg-one-click-deploy
```
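If you script these steps, you can pull the ID out of the command output shown above. The following is a minimal sketch that assumes the default `KEY: value` text layout in that example output; for anything sturdier, requesting structured output with gcloud's global `--format=json` flag is a better basis. The helper name `parse_endpoint_ids` is hypothetical.

```python
SAMPLE_OUTPUT = """Using endpoint [https://us-central1-aiplatform.googleapis.com/]
ENDPOINT_ID: 1234567891234567891
DISPLAY_NAME: gemma2-2b-it-mg-one-click-deploy"""


def parse_endpoint_ids(gcloud_output: str) -> dict:
    """Map DISPLAY_NAME -> ENDPOINT_ID from `gcloud ai endpoints list` text output."""
    ids = {}
    current_id = None
    for line in gcloud_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "ENDPOINT_ID":
            current_id = value
        elif key == "DISPLAY_NAME" and current_id is not None:
            # Each record lists its ID before its display name.
            ids[value] = current_id
            current_id = None
    return ids


print(parse_endpoint_ids(SAMPLE_OUTPUT)["gemma2-2b-it-mg-one-click-deploy"])
```

Lines that are not `ENDPOINT_ID` or `DISPLAY_NAME` records, such as the `Using endpoint` banner, are skipped.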

Sample code

In the sample code for your language, update PROJECT_ID, ENDPOINT_REGION, and ENDPOINT_ID. Then run your code.
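Instead of editing constants in each sample, you might read the three values from the environment. This is a sketch that assumes you export variables named `PROJECT_ID`, `ENDPOINT_REGION`, and `ENDPOINT_ID` (hypothetical names chosen to match the placeholders). It fails early with a clear message when any variable is missing, and it builds the endpoint resource name in the same `projects/PROJECT_ID/locations/ENDPOINT_REGION/endpoints/ENDPOINT_ID` form that the samples use. The helpers `load_settings` and `endpoint_resource_name` are hypothetical.

```python
import os

# Hypothetical environment variable names that mirror the sample placeholders.
REQUIRED = ("PROJECT_ID", "ENDPOINT_REGION", "ENDPOINT_ID")


def load_settings(env=None) -> dict:
    """Read the sample parameters from the environment, failing early if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"set these environment variables first: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED}


def endpoint_resource_name(settings: dict) -> str:
    """Build the full endpoint resource name passed to PredictionServiceClient.predict."""
    return "projects/{PROJECT_ID}/locations/{ENDPOINT_REGION}/endpoints/{ENDPOINT_ID}".format(**settings)


if __name__ == "__main__":
    try:
        print(endpoint_resource_name(load_settings()))
    except RuntimeError as err:
        print(err)
```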

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

```
"""
Sample to run inference on a Gemma2 model deployed to a Vertex AI endpoint with TPU accelerators.
"""

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# TODO(developer): Update & uncomment lines below
# PROJECT_ID = "your-project-id"
# ENDPOINT_REGION = "your-vertex-endpoint-region"
# ENDPOINT_ID = "your-vertex-endpoint-id"

# Default configuration
config = {"max_tokens": 1024, "temperature": 0.9, "top_p": 1.0, "top_k": 1}

# Prompt used in the prediction
prompt = "Why is the sky blue?"

# Encapsulate the prompt in a correct format for TPUs
# Example format: [{'prompt': 'Why is the sky blue?', 'temperature': 0.9}]
instance = {"prompt": prompt}  # named "instance" to avoid shadowing the built-in input()
instance.update(config)

# Convert the instance dictionary to a list of GAPIC instances (protobuf Values) for model input
instances = [json_format.ParseDict(instance, Value())]

# Create a client that targets the regional Vertex AI API endpoint
api_endpoint = f"{ENDPOINT_REGION}-aiplatform.googleapis.com"
client = aiplatform.gapic.PredictionServiceClient(
    client_options={"api_endpoint": api_endpoint}
)

# Call the Gemma2 endpoint
gemma2_end_point = (
    f"projects/{PROJECT_ID}/locations/{ENDPOINT_REGION}/endpoints/{ENDPOINT_ID}"
)
response = client.predict(
    endpoint=gemma2_end_point,
    instances=instances,
)
text_responses = response.predictions
print(text_responses[0])
```
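The `config` dictionary above sets `temperature`, `top_p`, and `top_k`. Because `top_k` is 1, only the single most likely token survives at each step in the usual top-k scheme, so decoding is effectively greedy and the `temperature` and `top_p` values have no visible effect. The toy sampler below illustrates that interaction. It is a simplified, self-contained sketch of the common top-k, temperature, and top-p scheme, not the serving runtime's actual implementation; the function name and the example logits are made up.

```python
import math
import random


def sample_next_token(logits, temperature, top_k, top_p, rng):
    """Pick one token from a {token: logit} dict using top-k, temperature, then top-p."""
    if temperature <= 0 or top_k < 1:
        raise ValueError("temperature must be > 0 and top_k must be >= 1")
    # Top-k: keep only the k highest-scoring candidates.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature-scaled softmax over the survivors (subtract the max for numerical stability).
    scaled = [logit / temperature for _, logit in ranked]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for (token, _), p in zip(ranked, probs):
        kept_tokens.append(token)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]


# Made-up logits for the word that follows "The sky is".
logits = {"blue": 2.0, "gray": 1.0, "red": 0.5}
rng = random.Random(0)
# The tutorial's settings (top_k=1): always the top token, whatever the temperature.
print({sample_next_token(logits, 0.9, 1, 1.0, rng) for _ in range(20)})
# A wider top_k lets temperature and top_p influence the choice.
print({sample_next_token(logits, 0.9, 3, 1.0, rng) for _ in range(200)})
```

To get more varied answers, raise `top_k` in `config`; to make answers more focused, lower `temperature` or `top_p` while `top_k` is larger than 1.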

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

```
// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient, helpers} = require('@google-cloud/aiplatform');

/**
 * TODO(developer): Update these variables before running the sample.
 */
const projectId = 'your-project-id';
const endpointRegion = 'your-vertex-endpoint-region';
const endpointId = 'your-vertex-endpoint-id';

// Prompt used in the prediction
const prompt = 'Why is the sky blue?';

// Encapsulate the prompt in a correct format for TPUs
// Example format: [{prompt: 'Why is the sky blue?', temperature: 0.9}]
const input = {
  prompt,
  // Parameters for default configuration
  maxOutputTokens: 1024,
  temperature: 0.9,
  topP: 1.0,
  topK: 1,
};

// Convert input message to a list of GAPIC instances for model input
const instances = [helpers.toValue(input)];

// `await` is only valid inside an async function in a CommonJS module
async function predictGemma2() {
  // Create a client that targets the regional Vertex AI API endpoint
  const apiEndpoint = `${endpointRegion}-aiplatform.googleapis.com`;
  const predictionServiceClient = new PredictionServiceClient({apiEndpoint});

  // Call the Gemma2 endpoint
  const gemma2Endpoint = `projects/${projectId}/locations/${endpointRegion}/endpoints/${endpointId}`;

  const [response] = await predictionServiceClient.predict({
    endpoint: gemma2Endpoint,
    instances,
  });

  const predictions = response.predictions;
  const text = predictions[0].stringValue;

  console.log('Predictions:', text);
}

predictGemma2().catch(console.error);
```

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


```
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.gson.Gson;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Gemma2PredictTpu {
  private final PredictionServiceClient predictionServiceClient;

  // Constructor to inject the PredictionServiceClient
  public Gemma2PredictTpu(PredictionServiceClient predictionServiceClient) {
    this.predictionServiceClient = predictionServiceClient;
  }

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String endpointRegion = "us-west1";
    String endpointId = "YOUR_ENDPOINT_ID";

    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint(String.format("%s-aiplatform.googleapis.com:443", endpointRegion))
            .build();
    // try-with-resources closes the client (and its gRPC channel) when the call finishes
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      Gemma2PredictTpu creator = new Gemma2PredictTpu(predictionServiceClient);
      creator.gemma2PredictTpu(projectId, endpointRegion, endpointId);
    }
  }

  // Demonstrates how to run inference on a Gemma2 model
  // deployed to a Vertex AI endpoint with TPU accelerators.
  public String gemma2PredictTpu(String projectId, String region,
           String endpointId) throws IOException {
    // Generation parameters are sent separately from the instances
    Map<String, Object> paramsMap = new HashMap<>();
    paramsMap.put("temperature", 0.9);
    paramsMap.put("maxOutputTokens", 1024);
    paramsMap.put("topP", 1.0);
    paramsMap.put("topK", 1);
    Value parameters = mapToValue(paramsMap);
    // Prompt used in the prediction
    String instance = "{ \"prompt\": \"Why is the sky blue?\"}";
    Value.Builder instanceValue = Value.newBuilder();
    JsonFormat.parser().merge(instance, instanceValue);
    // Encapsulate the prompt in a correct format for TPUs
    // Example format: [{'prompt': 'Why is the sky blue?'}]
    List<Value> instances = new ArrayList<>();
    instances.add(instanceValue.build());

    EndpointName endpointName = EndpointName.of(projectId, region, endpointId);

    PredictResponse predictResponse = this.predictionServiceClient
        .predict(endpointName, instances, parameters);
    String textResponse = predictResponse.getPredictions(0).getStringValue();
    System.out.println(textResponse);
    return textResponse;
  }

  // Converts a Java map to a protobuf Value by round-tripping through JSON
  private static Value mapToValue(Map<String, Object> map) throws InvalidProtocolBufferException {
    Gson gson = new Gson();
    String json = gson.toJson(map);
    Value.Builder builder = Value.newBuilder();
    JsonFormat.parser().merge(json, builder);
    return builder.build();
  }
}
```

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

```
import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/aiplatform/apiv1/aiplatformpb"
	"github.com/googleapis/gax-go/v2"
	"google.golang.org/protobuf/types/known/structpb"
)

// PredictionsClient is the subset of *aiplatform.PredictionClient methods used by this sample,
// so that a fake client can be substituted when needed.
type PredictionsClient interface {
	Close() error
	Predict(ctx context.Context, req *aiplatformpb.PredictRequest, opts ...gax.CallOption) (*aiplatformpb.PredictResponse, error)
}

// predictTPU demonstrates how to run inference on a Gemma2 model deployed to a Vertex AI endpoint with TPU accelerators.
func predictTPU(w io.Writer, client PredictionsClient, projectID, location, endpointID string) error {
	ctx := context.Background()

	// Note: client can be initialized in the following way:
	// apiEndpoint := fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
	// client, err := aiplatform.NewPredictionClient(ctx, option.WithEndpoint(apiEndpoint))
	// if err != nil {
	// 	return fmt.Errorf("unable to create prediction client: %v", err)
	// }
	// defer client.Close()

	gemma2Endpoint := fmt.Sprintf("projects/%s/locations/%s/endpoints/%s", projectID, location, endpointID)
	prompt := "Why is the sky blue?"
	parameters := map[string]interface{}{
		"temperature":     0.9,
		"maxOutputTokens": 1024,
		"topP":            1.0,
		"topK":            1,
	}

	// Encapsulate the prompt in a correct format for TPUs.
	// Example format: [{'prompt': 'Why is the sky blue?', 'parameters': {'temperature': 0.9}}]
	promptValue, err := structpb.NewValue(map[string]interface{}{
		"prompt":     prompt,
		"parameters": parameters,
	})
	if err != nil {
		return fmt.Errorf("unable to convert prompt to Value: %w", err)
	}

	req := &aiplatformpb.PredictRequest{
		Endpoint:  gemma2Endpoint,
		Instances: []*structpb.Value{promptValue},
	}

	resp, err := client.Predict(ctx, req)
	if err != nil {
		return fmt.Errorf("prediction request failed: %w", err)
	}

	prediction := resp.GetPredictions()
	if len(prediction) == 0 {
		return fmt.Errorf("endpoint %s returned no predictions", gemma2Endpoint)
	}
	value := prediction[0].GetStringValue()
	fmt.Fprintf(w, "%v", value)

	return nil
}
```

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

If you're keeping your project, delete the resources used in this tutorial:

Undeploy the model and delete the endpoint
Delete the model from Model Registry

Undeploy the model and delete the endpoint

Use one of the following methods to undeploy the model and delete the endpoint.

Console

In the Google Cloud console, click Online prediction, and then click Endpoints.

Go to the Endpoints page
In the Region drop-down list, choose the region where you deployed your endpoint.
Click the endpoint name to open the details page. For example: gemma2-2b-it-mg-one-click-deploy
On the row for the Gemma 2 (Version 1) model, click Actions, and then click Undeploy model from endpoint.
In the Undeploy model from endpoint dialog, click Undeploy.
Click the Back button to return to the Endpoints page.

Go to the Endpoints page
At the end of the gemma2-2b-it-mg-one-click-deploy row, click Actions, and then select Delete endpoint.
In the confirmation prompt, click Confirm.

gcloud

To undeploy the model and delete the endpoint by using the Google Cloud CLI, follow these steps.

In these commands, replace:

PROJECT_ID with your project ID
LOCATION_ID with the region where you deployed the model and endpoint
ENDPOINT_ID with the endpoint ID
DEPLOYED_MODEL_NAME with the model's display name
DEPLOYED_MODEL_ID with the deployed model ID

Get the endpoint ID by running the gcloud ai endpoints list command. This command lists the endpoint IDs for all endpoints in the specified region of your project. Make a note of the ID of the endpoint used in this tutorial.
```
gcloud ai endpoints list \
    --project=PROJECT_ID \
    --region=LOCATION_ID
```
The output looks like this. In the output, the ID is labeled ENDPOINT_ID.
```
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
ENDPOINT_ID: 1234567891234567891
DISPLAY_NAME: gemma2-2b-it-mg-one-click-deploy
```

Get the model ID by running the gcloud ai models describe command. Make a note of the ID of the model that you deployed in this tutorial.
```
gcloud ai models describe DEPLOYED_MODEL_NAME \
    --project=PROJECT_ID \
    --region=LOCATION_ID
```
The abbreviated output looks like this. In the output, the ID is labeled deployedModelId.
```
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
artifactUri: [URI removed]
baseModelSource:
  modelGardenSource:
    publicModelName: publishers/google/models/gemma2
...
deployedModels:
- deployedModelId: '1234567891234567891'
  endpoint: projects/12345678912/locations/us-central1/endpoints/12345678912345
displayName: gemma2-2b-it-12345678912345
etag: [ETag removed]
modelSourceInfo:
  sourceType: MODEL_GARDEN
name: projects/123456789123/locations/us-central1/models/gemma2-2b-it-12345678912345
...
```
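If you automate cleanup, a regular expression is enough to pick the deployed model ID out of the abbreviated output above. This sketch assumes the YAML layout shown in that example (deployed model IDs are numeric); parsing `--format=json` output would be more robust. The helper name `deployed_model_ids` is hypothetical.

```python
import re

SAMPLE_DESCRIBE = """deployedModels:
- deployedModelId: '1234567891234567891'
  endpoint: projects/12345678912/locations/us-central1/endpoints/12345678912345
displayName: gemma2-2b-it-12345678912345"""

# Matches lines such as "- deployedModelId: '1234567891234567891'" (quotes optional).
DEPLOYED_ID = re.compile(r"deployedModelId:\s*'?(\d+)'?")


def deployed_model_ids(describe_output: str) -> list:
    """Collect every deployedModelId value from `gcloud ai models describe` output."""
    return DEPLOYED_ID.findall(describe_output)


print(deployed_model_ids(SAMPLE_DESCRIBE))
```

A model deployed to several endpoints yields several IDs; match each one with the `endpoint:` line that follows it before undeploying.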

Undeploy the model from the endpoint. You'll need the endpoint ID and the deployed model ID from the previous commands.
```
gcloud ai endpoints undeploy-model ENDPOINT_ID \
    --project=PROJECT_ID \
    --region=LOCATION_ID \
    --deployed-model-id=DEPLOYED_MODEL_ID
```
This command produces no output.

Run the gcloud ai endpoints delete command to delete the endpoint.
```
gcloud ai endpoints delete ENDPOINT_ID \
    --project=PROJECT_ID \
    --region=LOCATION_ID
```
If prompted, type y to confirm. This command produces no output.
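The cleanup order matters: this tutorial undeploys the model before deleting the endpoint, and deletes the model from Model Registry last. The following is a small sketch that assembles the same gcloud commands as argument lists in that order, so a script can run them one after another (for example with `subprocess.run(argv, check=True)`). It adds gcloud's global `--quiet` flag to the delete commands to skip the confirmation prompt, which is an assumption about how you want to run them. The third command is the `gcloud ai models delete` command from the Delete the model section. The helper name `cleanup_commands` is hypothetical.

```python
import shlex


def cleanup_commands(project_id, region, endpoint_id, deployed_model_id, model):
    """Return this tutorial's cleanup commands as argv lists, in the order they should run."""
    common = [f"--project={project_id}", f"--region={region}"]
    return [
        # 1. Undeploy the model so the endpoint no longer serves it.
        ["gcloud", "ai", "endpoints", "undeploy-model", endpoint_id, *common,
         f"--deployed-model-id={deployed_model_id}"],
        # 2. Delete the now-empty endpoint (--quiet answers the y/N prompt).
        ["gcloud", "ai", "endpoints", "delete", endpoint_id, *common, "--quiet"],
        # 3. Delete the model from Model Registry.
        ["gcloud", "ai", "models", "delete", model, *common, "--quiet"],
    ]


# Placeholder values; print the commands instead of running them.
for argv in cleanup_commands("my-project", "us-central1", "1234567891234567891",
                             "1234567891234567891", "gemma2-2b-it-1234567891234"):
    print(shlex.join(argv))
```

Building argument lists rather than one shell string avoids quoting problems if an ID or name ever contains unusual characters.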

Delete the model

Console

Go to the Model Registry page from the Vertex AI section in the Google Cloud console.

Go to the Model Registry page
In the Region drop-down list, choose the region where you deployed your model.
At the end of the gemma2-2b-it-1234567891234 row, click Actions.
Select Delete model.

When you delete the model, all associated model versions and evaluations are deleted from your Google Cloud project.
In the confirmation prompt, click Delete.

gcloud

To delete the model by using the Google Cloud CLI, provide the model's display name and region to the gcloud ai models delete command.
```
gcloud ai models delete DEPLOYED_MODEL_NAME \
    --project=PROJECT_ID \
    --region=LOCATION_ID
```
Replace DEPLOYED_MODEL_NAME with the model's display name, PROJECT_ID with your project ID, and LOCATION_ID with the region where you deployed the model.

What's next

Learn more about Gemma open models.
Read the Gemma Terms of Use.
Learn more about open models.
Learn how to deploy a tuned model.
Learn how to deploy Gemma 2 to Google Kubernetes Engine using Hugging Face Text Generation Inference (TGI).
Learn more about the PredictionServiceClient in your preferred language: Python, Node.js, Java, or Go.
