In this tutorial, you use Model Garden to deploy the Gemma 2B open model to a TPU-backed Vertex AI endpoint. You must deploy a model to an endpoint before that model can be used to serve online predictions. Deploying a model associates physical resources with it so that it can serve online predictions with low latency.

After you deploy the Gemma 2B model, you send requests to the deployed model by using the PredictionServiceClient to get online predictions. Online predictions are synchronous requests made to a model that is deployed to an endpoint.
Deploy Gemma using Model Garden
You deploy the Gemma 2B model to a ct5lp-hightpu-1t Compute Engine machine type that is optimized for small-to-medium-scale training. This machine has one TPU v5e accelerator. For more information about training models by using TPUs, see Cloud TPU v5e training.
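For reference, the same machine type can be requested when deploying programmatically with the Vertex AI SDK for Python. The following is a minimal sketch, assuming you already have an uploaded Model resource (the project, region, and resource name are placeholders); this tutorial uses the Model Garden console flow instead:

from google.cloud import aiplatform

# Placeholder project and region; use your own values.
aiplatform.init(project="PROJECT_ID", location="us-central1")

# Placeholder: an existing Model resource in your project.
model = aiplatform.Model(
    "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID"
)

# Deploy onto a single-host TPU v5e machine (one v5e chip).
endpoint = model.deploy(
    machine_type="ct5lp-hightpu-1t",
    min_replica_count=1,
    max_replica_count=1,
)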
In this tutorial, you deploy the instruction-tuned Gemma 2B open model by using the model card in Model Garden. The specific model version is gemma2-2b-it; the -it suffix stands for instruction-tuned. The Gemma 2B model has a lower parameter count, which means lower resource requirements and more deployment flexibility.
1. In the Google Cloud console, go to the Model Garden page.
2. Click the Gemma 2 model card.
3. Click Deploy to open the Deploy model pane.
4. In the Deploy model pane, specify these details:
   - For Deployment environment, click Vertex AI.
   - In the Deploy model section:
     - For Resource ID, choose gemma2-2b-it.
     - For Model name and Endpoint name, accept the default values. For example:
       - Model name: gemma2-2b-it-1234567891234
       - Endpoint name: gemma2-2b-it-mg-one-click-deploy

       Make a note of the endpoint name. You'll need it to find the endpoint ID used in the code samples.
   - In the Deployment settings section:
     - Accept the default option for Basic settings.
     - For Region, accept the default value or choose a region from the list. Make a note of the region. You'll need it for the code samples.
     - For Machine spec, choose the TPU-backed instance: ct5lp-hightpu-1t (1 TPU_V5_LITEPOD).
5. Click Deploy. When the deployment is finished, you receive an email that contains details about your new endpoint. You can also view the endpoint details by clicking Online prediction > Endpoints and selecting your region.
Run inference on Gemma 2B with the PredictionServiceClient
After you deploy Gemma 2B, you use the PredictionServiceClient to get online predictions for the prompt: "Why is the sky blue?"
Code parameters
The PredictionServiceClient code samples require you to update the following:
PROJECT_ID: To find your project ID, follow these steps:
  1. Go to the Welcome page in the Google Cloud console.
  2. From the project picker at the top of the page, select your project.
  3. The project name, project number, and project ID appear after the Welcome heading.
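Alternatively, if you have the gcloud CLI installed, you can print the active project ID from your shell:

gcloud config get-value project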
ENDPOINT_REGION: This is the region where you deployed the endpoint.

ENDPOINT_ID: To find your endpoint ID, view it in the console or run the gcloud ai endpoints list command. You'll need the endpoint name and region from the Deploy model pane.

Console

You can view the endpoint details by clicking Online prediction > Endpoints and selecting your region. Note the number that appears in the ID column.

gcloud

You can view the endpoint details by running the gcloud ai endpoints list command:

gcloud ai endpoints list \
  --region=ENDPOINT_REGION \
  --filter=display_name=ENDPOINT_NAME

The output looks like the following:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
ENDPOINT_ID: 1234567891234567891
DISPLAY_NAME: gemma2-2b-it-mg-one-click-deploy
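If you're scripting, you can capture just the numeric ID in a shell variable. This sketch assumes the endpoint's full resource name ends in the numeric endpoint ID, which the basename() format transform extracts:

ENDPOINT_ID=$(gcloud ai endpoints list \
  --region=ENDPOINT_REGION \
  --filter=display_name=ENDPOINT_NAME \
  --format="value(name.basename())")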
Sample code
In the sample code for your language, update the PROJECT_ID, ENDPOINT_REGION, and ENDPOINT_ID values. Then run your code.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
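The following is a minimal sketch of the request using the low-level aiplatform_v1 client. The instance fields (prompt, max_tokens, temperature) are assumptions based on common Gemma serving containers, not a confirmed schema; adjust them to match your deployment if the request is rejected.

from google.cloud import aiplatform_v1
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# Replace these placeholder values with your own.
PROJECT_ID = "PROJECT_ID"
ENDPOINT_REGION = "ENDPOINT_REGION"  # for example, "us-central1"
ENDPOINT_ID = "ENDPOINT_ID"

# The client must target the regional API endpoint that matches the
# region where you deployed the model.
client = aiplatform_v1.PredictionServiceClient(
    client_options={
        "api_endpoint": f"{ENDPOINT_REGION}-aiplatform.googleapis.com"
    }
)

endpoint = client.endpoint_path(
    project=PROJECT_ID, location=ENDPOINT_REGION, endpoint=ENDPOINT_ID
)

# Each instance is a protobuf Value. The field names below are assumed
# from common Gemma serving containers and may differ for yours.
instance = json_format.ParseDict(
    {"prompt": "Why is the sky blue?", "max_tokens": 128, "temperature": 1.0},
    Value(),
)

response = client.predict(endpoint=endpoint, instances=[instance])
for prediction in response.predictions:
    print(prediction)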
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Go API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.