Generate text by using a Gemma open model and the ML.GENERATE_TEXT function
This tutorial shows you how to create a
remote model
that's based on the
Gemma model,
and then how to use that model with the
ML.GENERATE_TEXT
function
to extract keywords and perform sentiment analysis on movie reviews from
the bigquery-public-data.imdb.reviews
public table.
Required permissions
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, and models:
BigQuery Admin (
roles/bigquery.admin
). - Grant permissions to the connection's service account: Project IAM Admin
(
roles/resourcemanager.projectIamAdmin
). - Deploy and undeploy models in Vertex AI: Vertex AI Administrator
(
roles/aiplatform.admin
).
These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
- Create a dataset:
bigquery.datasets.create
- Create, delegate, and use a connection:
bigquery.connections.*
- Set the default connection:
bigquery.config.*
- Set service account permissions:
resourcemanager.projects.getIamPolicy
andresourcemanager.projects.setIamPolicy
- Deploy and undeploy a Vertex AI model:
aiplatform.endpoints.deploy
aiplatform.endpoints.undeploy
- Create a model and run inference:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.models.updateMetadata
You might also be able to get these permissions with custom roles or other predefined roles.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery ML: You incur costs for the data that you process in BigQuery.
- Vertex AI: You incur costs for calls to the Vertex AI model that's represented by the remote model.
To generate a cost estimate based on your projected usage,
use the pricing calculator.
For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.
Open models that you deploy to Vertex AI are charged per machine-hour. This means billing starts as soon as the endpoint is fully set up, and continues until you un-deploy it. For more information about Vertex AI pricing, see the Vertex AI pricing page.
Before you begin
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Deploy a Gemma model on Vertex AI
Deploy the
gemma-2-27b-it
model to Vertex AI, following the instructions in
Deploy Model Garden models.
During deployment, you must select Public (shared endpoint) as the value
for the Endpoint access field in the deployment workflow.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click
View actions > Create dataset.On the Create dataset page, do the following:
For Dataset ID, enter
bqml_tutorial
.For Location type, select Multi-region, and then select US (multiple regions in United States).
Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the
bq mk
command
with the --location
flag. For a full list of possible parameters, see the
bq mk --dataset
command
reference.
Create a dataset named
bqml_tutorial
with the data location set toUS
and a description ofBigQuery ML tutorial dataset
:bq --location=US mk -d \ --description "BigQuery ML tutorial dataset." \ bqml_tutorial
Instead of using the
--dataset
flag, the command uses the-d
shortcut. If you omit-d
and--dataset
, the command defaults to creating a dataset.Confirm that the dataset was created:
bq ls
API
Call the datasets.insert
method with a defined dataset resource.
{ "datasetReference": { "datasetId": "bqml_tutorial" } }
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Create the remote model
Create a remote model that represents a hosted Vertex AI model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following statement:
CREATE OR REPLACE MODEL `bqml_tutorial.gemma_model` REMOTE WITH CONNECTION DEFAULT OPTIONS (ENDPOINT = 'https://ENDPOINT_REGION-aiplatform.googleapis.com/v1/projects/ENDPOINT_PROJECT_ID/locations/ENDPOINT_REGION/endpoints/ENDPOINT_ID');
Replace the following:
ENDPOINT_REGION
: the region in which the open model is deployed.ENDPOINT_PROJECT_ID
: the project in which the open model is deployed.ENDPOINT_ID
: the ID of the HTTPS endpoint used by the open model. You can get the endpoint ID by locating the open model on the Online prediction page and copying the value in the ID field.
The following example shows the format of a valid HTTP endpoint:
https://us-central1-aiplatform.googleapis.com/v1/projects/myproject/locations/us-central1/endpoints/1234
.
The query takes several seconds to complete, after which the gemma_model
model appears in the bqml_tutorial
dataset in the Explorer pane.
Because the query uses a CREATE MODEL
statement to create a model, there
are no query results.
Perform keyword extraction
Perform keyword extraction on IMDB movie reviews by
using the remote model and the ML.GENERATE_TEXT
function:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement to perform keyword extraction on five movie reviews:
SELECT * FROM ML.GENERATE_TEXT( MODEL `bqml_tutorial.gemma_model`, ( SELECT CONCAT('Extract the key words from the movie review below: ', review) AS prompt, * FROM `bigquery-public-data.imdb.reviews` LIMIT 10 ), STRUCT( 0.2 AS temperature, 100 AS max_output_tokens, TRUE AS flatten_json_output));
The output is similar to the following, with non-generated columns omitted for clarity:
+----------------------------------------------+-------------------------+-----------------------------+-----+ | generated_text | ml_generate_text_status | prompt | ... | +----------------------------------------------+-------------------------------------------------------+-----+ | Here are some key words from the | | Extract the key words from | | | movie review: * **Romance:** | | the movie review below: | | | "romantic tryst," "elope" * **Comedy:** | | Linda Arvidson (as Jennie) | | | "Contrived Comedy" * **Burglary:** | | and Harry Solter (as Frank) | | | "burglar," "rob," "booty" * **Chase:** | | are enjoying a romantic | | | "chases," "escape" * **Director:** "D.W. | | tryst, when in walks her | | | Griffith" * **Actors:** "Linda Arvidson,"... | | father Charles Inslee;... | | +----------------------------------------------+-------------------------+-----------------------------+-----+ | Here are some key words from the | | Extract the key words from | | | movie review: * **Elderbush Gilch:** The | | the movie review below: | | | name of the movie being reviewed. * | | This is the second addition | | | **Disappointment:** The reviewer's | | to Frank Baum's personally | | | overall feeling about the film. * | | produced trilogy of Oz | | | **Dim-witted:** Describes the story | | films. It's essentially the | | | line negatively. * **Moronic, sadistic,... | | same childishness as the... | | +----------------------------------------------+-------------------------+-----------------------------+-----+
The results include the following columns:
generated_text
: the generated text.ml_generate_text_status
: the API response status for the corresponding row. If the operation was successful, this value is empty.prompt
: the prompt that is used for the sentiment analysis.- All of the columns from the
bigquery-public-data.imdb.reviews
table.
Perform sentiment analysis
Perform sentiment analysis on IMDB movie reviews by
using the remote model and the ML.GENERATE_TEXT
function:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following statement to perform sentiment analysis on five movie reviews:
SELECT * FROM ML.GENERATE_TEXT( MODEL `bqml_tutorial.gemma_model`, ( SELECT CONCAT('Analyze the sentiment of the following movie review and classify it as either POSITIVE or NEGATIVE. \nMovie Review: ', review) AS prompt, * FROM `bigquery-public-data.imdb.reviews` LIMIT 10 ), STRUCT( 0.2 AS temperature, 128 AS max_output_tokens, TRUE AS flatten_json_output));
The output is similar to the following, with non-generated columns omitted for clarity:
+----------------------------------------------+-------------------------+-----------------------------+-----+ | generated_text | ml_generate_text_status | prompt | ... | +----------------------------------------------+-------------------------------------------------------+-----+ | **Sentiment:** NEGATIVE **Justification:** | | Analyze the sentiment of | | | * **Negative Language:** The reviewer uses | | movie review and classify | | | phrases like "don't quite make it," "come to | | it as either POSITIVE or | | | mind," "quite disappointing," and "not many | | NEGATIVE. Movie Review: | | | laughs." * **Specific Criticisms:** The | | Although Charlie Chaplin | | | reviewer points out specific flaws in the | | made some great short | | | plot and humor, stating that the manager... | | comedies in the late... | | +----------------------------------------------+-------------------------+-----------------------------+-----+ | **Sentiment:** NEGATIVE **Reasoning:** | | Analyze the sentiment of | | | * **Negative Language:** The reviewer uses | | movie review and classify | | | phrases like "poor writing," "static camera- | | it as either POSITIVE or | | | work," "chews the scenery," "all surface and | | NEGATIVE. Movie Review: | | | no depth," "sterile spectacles," which all | | Opulent sets and sumptuous | | | carry negative connotations. * **Comparison | | costumes well photographed | | | to a More Successful Film:**... | | by Theodor Sparkuhl, and... | | +----------------------------------------------+-------------------------+-----------------------------+-----+
The results include the same columns documented for Perform keyword extraction.
Undeploy model
If you choose not to delete your project as recommended, make sure to undeploy the Gemma model in Vertex AI to avoid continued billing for it.
Clean up
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.