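Generate text by using a Gemma open model and the ML.GENERATE_TEXT function
This tutorial shows you how to create a remote model that's based on a Gemma model, and then how to use that model with the ML.GENERATE_TEXT function to extract keywords from, and perform sentiment analysis on, movie reviews in the bigquery-public-data.imdb.reviews public table.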
Required roles
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, and models: BigQuery Admin (roles/bigquery.admin).
- Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin).
- Deploy and undeploy models in Vertex AI: Vertex AI Administrator (roles/aiplatform.admin).
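These predefined roles contain the permissions required to perform the tasks in this document. The exact permissions that are required are as follows:
Required permissions
- Create a dataset: bigquery.datasets.create
- Create, delegate, and use a connection: bigquery.connections.*
- Set the default connection: bigquery.config.*
- Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
- Deploy and undeploy a Vertex AI model: aiplatform.endpoints.deploy and aiplatform.endpoints.undeploy
- Create a model and run inference: bigquery.jobs.create, bigquery.models.create, bigquery.models.getData, bigquery.models.updateData, and bigquery.models.updateMetadata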
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery ML: You incur costs for the data that you process in BigQuery.
- Vertex AI: You incur costs for calls to the Vertex AI model that's represented by the remote model.
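To generate a cost estimate based on your projected usage, use the pricing calculator.
For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.
Open models that you deploy to Vertex AI are charged per machine-hour. This means that billing starts as soon as the endpoint is fully set up and continues until you undeploy it. For more information about Vertex AI pricing, see the Vertex AI pricing page.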
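Because billing is by machine-hour, a rough estimate of the deployment charge is the hourly machine rate multiplied by the number of hours the endpoint stays deployed, including idle time before it is undeployed. The following Python sketch illustrates that arithmetic only. The function name and the rate of 1.00 per hour are hypothetical placeholders, not real prices; take the actual rate for your machine type from the Vertex AI pricing page.

```python
def estimate_deployment_cost(hours_deployed: float, hourly_rate: float) -> float:
    """Estimate the charge for keeping an endpoint deployed.

    hourly_rate is a value that you supply from the pricing page;
    this sketch doesn't know any real prices.
    """
    if hours_deployed < 0 or hourly_rate < 0:
        raise ValueError("hours_deployed and hourly_rate must be non-negative")
    return hours_deployed * hourly_rate


# Hypothetical scenario: 30 minutes of queries, followed by the default
# 6.5-hour idle period before BigQuery undeploys the model automatically.
HYPOTHETICAL_RATE = 1.00  # made-up rate, for illustration only
print(estimate_deployment_cost(0.5 + 6.5, HYPOTHETICAL_RATE))  # prints 7.0
```

The scenario shows why undeploying the model immediately after you finish can matter: in this example, most of the billed time is idle time.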
Before you begin
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
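Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click View actions > Create dataset.
On the Create dataset page, do the following:
For Dataset ID, enter bqml_tutorial.
For Location type, select Multi-region, and then select US (multiple regions in United States).
Leave the remaining default settings as they are, and click Create dataset.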
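bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.
Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:
bq --location=US mk -d \
  --description "BigQuery ML tutorial dataset." \
  bqml_tutorial
Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.
Confirm that the dataset was created:
bq ls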
API
Call the datasets.insert method with a defined dataset resource.
{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}
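BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials (ADC). For more information, see Set up ADC for a local development environment.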
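A minimal Python sketch of this step follows, assuming that the google-cloud-bigquery client library is installed and that ADC is configured. The helper name create_tutorial_dataset is hypothetical; it only wraps the library's Client.create_dataset method, and it accepts any client object so that the logic can be exercised without a network connection.

```python
def create_tutorial_dataset(client, dataset_id="bqml_tutorial"):
    """Create the tutorial dataset if it doesn't exist yet and return it.

    `client` is expected to be a google.cloud.bigquery.Client. With
    exists_ok=True, the call doesn't fail when the dataset already exists.
    """
    return client.create_dataset(dataset_id, exists_ok=True)


# Usage (requires the google-cloud-bigquery package and ADC):
#
#   from google.cloud import bigquery
#
#   # The client's default location is applied to the new dataset.
#   client = bigquery.Client(location="US")
#   dataset = create_tutorial_dataset(client)
#   print(dataset.dataset_id)
```

Passing exists_ok=True makes the step safe to rerun, which is convenient when you repeat the tutorial in the same project.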
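Create the remote model
Create a remote model that represents a hosted Vertex AI model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following statement:
CREATE OR REPLACE MODEL `bqml_tutorial.gemma_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (
    MODEL_GARDEN_MODEL_NAME = 'publishers/google/models/gemma3@gemma-3-270m',
    MACHINE_TYPE = 'g2-standard-12'
  );
The query takes up to 20 minutes to complete, after which the gemma_model model appears in the bqml_tutorial dataset in the Explorer pane. Because the query uses a CREATE MODEL statement to create a model, there are no query results.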
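Perform keyword extraction
Perform keyword extraction on IMDB movie reviews by using the remote model and the ML.GENERATE_TEXT function:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement to perform keyword extraction on ten movie reviews:
SELECT
  *
FROM
  ML.GENERATE_TEXT(
    MODEL `bqml_tutorial.gemma_model`,
    (
      SELECT
        CONCAT('Extract the key words from the movie review below: ', review) AS prompt,
        *
      FROM
        `bigquery-public-data.imdb.reviews`
      LIMIT 10
    ),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens,
      TRUE AS flatten_json_output));
The output is similar to the following, with non-generated columns omitted for clarity: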
+----------------------------------------------+-------------------------+-----------------------------+-----+
| generated_text                               | ml_generate_text_status | prompt                      | ... |
+----------------------------------------------+-------------------------+-----------------------------+-----+
| Here are some key words from the             |                         | Extract the key words from  |     |
| movie review: * **Romance:**                 |                         | the movie review below:     |     |
| "romantic tryst," "elope" * **Comedy:**      |                         | Linda Arvidson (as Jennie)  |     |
| "Contrived Comedy" * **Burglary:**           |                         | and Harry Solter (as Frank) |     |
| "burglar," "rob," "booty" * **Chase:**       |                         | are enjoying a romantic     |     |
| "chases," "escape" * **Director:** "D.W.     |                         | tryst, when in walks her    |     |
| Griffith" * **Actors:** "Linda Arvidson,"... |                         | father Charles Inslee;...   |     |
+----------------------------------------------+-------------------------+-----------------------------+-----+
| Here are some key words from the             |                         | Extract the key words from  |     |
| movie review: * **Elderbush Gilch:** The     |                         | the movie review below:     |     |
| name of the movie being reviewed. *          |                         | This is the second addition |     |
| **Disappointment:** The reviewer's           |                         | to Frank Baum's personally  |     |
| overall feeling about the film. *            |                         | produced trilogy of Oz      |     |
| **Dim-witted:** Describes the story          |                         | films. It's essentially the |     |
| line negatively. * **Moronic, sadistic,...   |                         | same childishness as the... |     |
+----------------------------------------------+-------------------------+-----------------------------+-----+
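The results include the following columns:
- generated_text: the generated text.
- ml_generate_text_status: the API response status for the corresponding row. If the operation was successful, this value is empty.
- prompt: the prompt that is used for the analysis.
- All of the columns from the bigquery-public-data.imdb.reviews table.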
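Because ml_generate_text_status is empty only for rows that succeeded, it can be useful to check that column before relying on generated_text. The following Python sketch separates successful rows from failed ones. It assumes that the query results have already been loaded into a list of dictionaries keyed by column name; the function name and the sample rows, including the error message, are hypothetical.

```python
def split_by_status(rows):
    """Split result rows into (succeeded, failed) lists.

    A row counts as succeeded when its ml_generate_text_status value is
    empty (an empty string or None), as described for the result columns.
    """
    succeeded, failed = [], []
    for row in rows:
        if row.get("ml_generate_text_status"):
            failed.append(row)
        else:
            succeeded.append(row)
    return succeeded, failed


# Hypothetical rows for illustration; real rows also carry the prompt and
# all of the columns from the reviews table.
sample_rows = [
    {"generated_text": "Here are some key words ...", "ml_generate_text_status": ""},
    {"generated_text": None, "ml_generate_text_status": "made-up error message"},
]
ok_rows, failed_rows = split_by_status(sample_rows)
print(len(ok_rows), len(failed_rows))  # prints 1 1
```

Rows in the failed list can then be inspected or retried separately instead of being mixed into downstream analysis.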
Perform sentiment analysis
Perform sentiment analysis on IMDB movie reviews by using the remote model and the ML.GENERATE_TEXT function:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following statement to perform sentiment analysis on ten movie reviews:
SELECT
  *
FROM
  ML.GENERATE_TEXT(
    MODEL `bqml_tutorial.gemma_model`,
    (
      SELECT
        CONCAT('Analyze the sentiment of the following movie review and classify it as either POSITIVE or NEGATIVE. \nMovie Review: ', review) AS prompt,
        *
      FROM
        `bigquery-public-data.imdb.reviews`
      LIMIT 10
    ),
    STRUCT(
      0.2 AS temperature,
      128 AS max_output_tokens,
      TRUE AS flatten_json_output));
The output is similar to the following, with non-generated columns omitted for clarity:
+----------------------------------------------+-------------------------+-----------------------------+-----+
| generated_text                               | ml_generate_text_status | prompt                      | ... |
+----------------------------------------------+-------------------------+-----------------------------+-----+
| **Sentiment:** NEGATIVE **Justification:**   |                         | Analyze the sentiment of    |     |
| * **Negative Language:** The reviewer uses   |                         | movie review and classify   |     |
| phrases like "don't quite make it," "come to |                         | it as either POSITIVE or    |     |
| mind," "quite disappointing," and "not many  |                         | NEGATIVE. Movie Review:     |     |
| laughs." * **Specific Criticisms:** The      |                         | Although Charlie Chaplin    |     |
| reviewer points out specific flaws in the    |                         | made some great short       |     |
| plot and humor, stating that the manager...  |                         | comedies in the late...     |     |
+----------------------------------------------+-------------------------+-----------------------------+-----+
| **Sentiment:** NEGATIVE **Reasoning:**       |                         | Analyze the sentiment of    |     |
| * **Negative Language:** The reviewer uses   |                         | movie review and classify   |     |
| phrases like "poor writing," "static camera- |                         | it as either POSITIVE or    |     |
| work," "chews the scenery," "all surface and |                         | NEGATIVE. Movie Review:     |     |
| no depth," "sterile spectacles," which all   |                         | Opulent sets and sumptuous  |     |
| carry negative connotations. * **Comparison  |                         | costumes well photographed  |     |
| to a More Successful Film:**...              |                         | by Theodor Sparkuhl, and... |     |
+----------------------------------------------+-------------------------+-----------------------------+-----+
The results include the same columns that are documented for Perform keyword extraction.
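The model returns free-form text rather than a bare label, so downstream code usually needs to pull the POSITIVE or NEGATIVE label out of generated_text. The following Python sketch does this with a regular expression. It assumes that the label appears in uppercase, as requested in the prompt; the function name is hypothetical, and the approach is a simple heuristic rather than a robust parser.

```python
import re

# Match the uppercase labels requested in the prompt, as whole words only.
_LABEL_PATTERN = re.compile(r"\b(POSITIVE|NEGATIVE)\b")


def extract_sentiment(generated_text):
    """Return the first 'POSITIVE' or 'NEGATIVE' label found, or None."""
    match = _LABEL_PATTERN.search(generated_text or "")
    return match.group(1) if match else None


sample = "**Sentiment:** NEGATIVE **Justification:** * **Negative Language:** ..."
print(extract_sentiment(sample))  # prints NEGATIVE
```

The match is case-sensitive on purpose, so headings such as "Negative Language" in the explanation aren't mistaken for the label. A response that mentions both labels, or that negates one (for example "not POSITIVE"), would still need manual review.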
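Undeploy the model
If you choose not to delete your project as recommended, make sure to undeploy the Gemma model in Vertex AI to avoid continued billing for it. BigQuery automatically undeploys the model after a specified period of idleness (6.5 hours by default). Alternatively, you can undeploy the model immediately by using the ALTER MODEL statement, as shown in the following example:
ALTER MODEL `bqml_tutorial.gemma_model` SET OPTIONS (deploy_model = false);
For more information, see Automatic or immediate open model undeployment.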
Clean up
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.