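Search graphs

This tutorial shows how to use autonomous embedding generation and the AI.SEARCH function to perform semantic search on graph data.

Objectives

This tutorial covers the following tasks:

- Create tables that store information about people, financial accounts, account ownership, and account transfers.
- Use autonomous embedding generation to simplify the embedding maintenance workflow.
- Create a graph that defines the relationships between the data stored in the tables.
- Use the AI.SEARCH function on graph nodes to perform semantic search over account descriptions.
- Use the AI.SEARCH function on graph edges to perform semantic search over account transfer notes.

Costs

In this document, you use the following billable components of Google Cloud: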

BigQuery: You incur costs for the data that you process in BigQuery.

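To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Console

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.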

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the BigQuery API.

Roles required to enable APIs

To enable APIs, you need the serviceusage.services.enable permission. If you created the project, then you likely already have this permission through the Owner role (roles/owner). Otherwise, you can get this permission through the Service Usage Admin role (roles/serviceusage.serviceUsageAdmin). Learn how to grant roles.

Enable the API

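Make sure that you have the following roles on the project: BigQuery Data Editor, Project IAM Admin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. Click Select a role, then search for the role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.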

gcloud

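Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: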

gcloud init

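Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project: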
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project that you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
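Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the BigQuery API:

Roles required to enable APIs

To enable APIs, you need the serviceusage.services.enable permission. If you created the project, then you likely already have this permission through the Owner role (roles/owner). Otherwise, you can get this permission through the Service Usage Admin role (roles/serviceusage.serviceUsageAdmin). Learn how to grant roles.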

gcloud services enable bigquery.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/bigquery.dataEditor, roles/resourcemanager.projectIamAdmin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
Replace the following:
- PROJECT_ID: your project ID.
- USER_IDENTIFIER: the identifier for your user account. For example, myemail@example.com.
- ROLE: the IAM role that you grant to your user account.

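Create tables

To store the tables and graph that you create in the following examples, create a dataset. The following query creates a dataset named graph_search:

CREATE SCHEMA IF NOT EXISTS graph_search;

The following tables contain information about people and accounts, and the relationships between those entities:

Person: information about people.
Account: information about bank accounts.
PersonOwnAccount: information about who owns which accounts.
AccountTransferAccount: information about transfers between accounts.

To create these tables, run the following CREATE TABLE statements: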

CREATE OR REPLACE TABLE graph_search.Person (
  id               INT64,
  name             STRING,
  PRIMARY KEY (id) NOT ENFORCED
);

CREATE OR REPLACE TABLE graph_search.Account (
  id                    INT64,
  create_time           TIMESTAMP,
  is_blocked            BOOL,
  description           STRING,
  description_embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
                          GENERATED ALWAYS AS (
                          AI.EMBED(description, model => 'embeddinggemma-300m')
                          ) STORED OPTIONS( asynchronous = TRUE ),
  PRIMARY KEY (id) NOT ENFORCED
);

CREATE OR REPLACE TABLE graph_search.PersonOwnAccount (
  id               INT64 NOT NULL,
  account_id       INT64 NOT NULL,
  create_time      TIMESTAMP,
  PRIMARY KEY (id, account_id) NOT ENFORCED,
  FOREIGN KEY (id) REFERENCES graph_search.Person(id) NOT ENFORCED,
  FOREIGN KEY (account_id) REFERENCES graph_search.Account(id) NOT ENFORCED
);

CREATE OR REPLACE TABLE graph_search.AccountTransferAccount (
  id               INT64 NOT NULL,
  to_id            INT64 NOT NULL,
  amount           FLOAT64,
  create_time      TIMESTAMP NOT NULL,
  order_number     STRING,
  notes            STRING,
  notes_embedding  STRUCT<result ARRAY<FLOAT64>, status STRING>
                     GENERATED ALWAYS AS (
                     AI.EMBED(notes, model => 'embeddinggemma-300m')
                     ) STORED OPTIONS( asynchronous = TRUE ),
  PRIMARY KEY (id, to_id, create_time) NOT ENFORCED,
  FOREIGN KEY (id) REFERENCES graph_search.Account(id) NOT ENFORCED,
  FOREIGN KEY (to_id) REFERENCES graph_search.Account(id) NOT ENFORCED
);

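The Account and AccountTransferAccount tables use autonomous embedding generation to maintain embeddings for their description and notes columns.

This tutorial uses the embeddinggemma-300m model because it runs in BigQuery and works well for short strings. For longer strings of more than 128 tokens, choose a different embedding model, such as text-embedding-005. For more information, see Choose an embedding model.

Insert data

The following queries insert some sample data into your tables. The `INSERT` statements omit the embedding columns, which BigQuery populates automatically.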

INSERT INTO graph_search.Account
  (id, create_time, is_blocked, description)
VALUES
  (7,"2020-01-10 06:22:20.222",false,"Fund for a refreshing tropical vacation"),
  (16,"2020-01-27 17:55:09.206",true,"Fund for a rainy day!"),
  (20,"2020-02-18 05:44:20.655",false,"Saving up for travel");

INSERT INTO graph_search.Person
  (id, name)
VALUES
  (1,"Alex"),
  (2,"Dana"),
  (3,"Lee");

INSERT INTO graph_search.AccountTransferAccount
  (id, to_id, amount, create_time, order_number, notes)
VALUES
  (7,16,300,"2020-08-29 15:28:58.647","304330008004315", "wedding present"),
  (7,16,100,"2020-10-04 16:55:05.342","304120005529714", "birthday gift"),
  (16,20,300,"2020-09-25 02:36:14.926","103650009791820", "for shared cost of dinner"),
  (20,7,500,"2020-10-04 16:55:05.342","304120005529714", "fees for tuition"),
  (20,16,200,"2020-10-17 03:59:40.247","302290001484851", "loved the lunch");

INSERT INTO graph_search.PersonOwnAccount
  (id, account_id, create_time)
VALUES
  (1,7,"2020-01-10 06:22:20.222"),
  (2,20,"2020-01-27 17:55:09.206"),
  (3,16,"2020-02-18 05:44:20.655");

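Create a graph

The following query uses the CREATE PROPERTY GRAPH statement to create a graph named FinGraph in the graph_search dataset. The Account and Person tables are node tables. The AccountTransferAccount and PersonOwnAccount tables are edge tables, which represent relationships between the node tables.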

CREATE OR REPLACE PROPERTY GRAPH graph_search.FinGraph
NODE TABLES (graph_search.Account, graph_search.Person)
EDGE TABLES (
  graph_search.PersonOwnAccount
    SOURCE KEY (id) REFERENCES Person (id)
    DESTINATION KEY (account_id) REFERENCES Account (id)
    LABEL Owns,
  graph_search.AccountTransferAccount
    SOURCE KEY (id) REFERENCES Account (id)
    DESTINATION KEY (to_id) REFERENCES Account (id)
    LABEL Transfers
);

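Search nodes

The following queries show who owns accounts for leisure travel and vacation. The first statement uses the DECLARE statement to create a variable named similar_account. The variable is initialized in the DEFAULT clause with a call to AI.SEARCH, which finds the accounts whose descriptions are most semantically similar to "accounts for leisure travel and vacation", and it holds the IDs of those accounts. The call to AI.SEARCH sets the top_k argument to 2 to limit the number of results. The second query is a graph query that returns the name of each account owner along with the account description.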

DECLARE similar_account DEFAULT ((
SELECT ARRAY_AGG(base.id)
FROM
  AI.SEARCH(
    (SELECT * FROM graph_search.Account WHERE description_embedding IS NOT NULL),
    'description',
    'accounts for leisure travel and vacation',
    top_k => 2)
));

GRAPH graph_search.FinGraph
MATCH (p:Person)-[:Owns]->(a:Account)
WHERE a.id IN UNNEST(similar_account)
RETURN p.name, a.description;

The result is similar to the following:

+------+-----------------------------------------+
| name | description                             |
+------+-----------------------------------------+
| Dana | Saving up for travel                    |
| Alex | Fund for a refreshing tropical vacation |
+------+-----------------------------------------+
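
The two-step pattern above (rank by semantic similarity, then traverse the graph) can be sketched in memory. The following Python sketch is only an illustration under stated assumptions: the three-dimensional vectors are made-up placeholders rather than real embeddinggemma-300m output, cosine similarity is assumed as the ranking measure, and `cosine_similarity` and `top_k_ids` are hypothetical helpers, not BigQuery APIs.

```python
import math

# Hypothetical toy embeddings keyed by account id. Real embeddings have many
# more dimensions and are produced by the embedding model.
ACCOUNT_EMBEDDINGS = {
    7: [0.9, 0.1, 0.0],   # "Fund for a refreshing tropical vacation"
    16: [0.1, 0.9, 0.1],  # "Fund for a rainy day!"
    20: [0.8, 0.2, 0.1],  # "Saving up for travel"
}
DESCRIPTIONS = {
    7: "Fund for a refreshing tropical vacation",
    16: "Fund for a rainy day!",
    20: "Saving up for travel",
}
# Account id -> owner name, taken from the tutorial's sample rows.
OWNERS = {7: "Alex", 20: "Dana", 16: "Lee"}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_ids(query_vec, embeddings, k):
    """Return the k ids whose vectors are most similar to query_vec."""
    ranked = sorted(
        embeddings,
        key=lambda i: cosine_similarity(query_vec, embeddings[i]),
        reverse=True,
    )
    return ranked[:k]


# Placeholder for the embedding of the search text (an assumption).
QUERY = [1.0, 0.0, 0.0]

# Step 1: semantic search, analogous to the DECLARE statement.
similar_account = top_k_ids(QUERY, ACCOUNT_EMBEDDINGS, k=2)

# Step 2: follow the Owns relationship, analogous to the graph query.
owners = [(OWNERS[a], DESCRIPTIONS[a]) for a in similar_account]
print(owners)
```

With these placeholder vectors, the rainy-day account ranks last and is cut off by k=2, which mirrors how top_k limits the rows that reach the graph query.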

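Search edges

The following queries show who made account transfers related to payments for food. The first statement uses the AI.SEARCH function to populate a variable named food_transfers. This variable holds the order numbers of the transfers whose notes are most semantically similar to "food". The call to AI.SEARCH sets the top_k argument to 2 to limit the number of results. The second query is a graph query that returns the name of the owner of the account that sent each transfer, along with the transfer notes.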

DECLARE food_transfers DEFAULT ((
SELECT ARRAY_AGG(base.order_number)
FROM
  AI.SEARCH(
    (SELECT * FROM graph_search.AccountTransferAccount WHERE notes_embedding IS NOT NULL),
    'notes',
    'food',
    top_k => 2)
));

GRAPH graph_search.FinGraph
MATCH (p:Person)-[:Owns]->(:Account)-[t:Transfers]->(:Account)
WHERE t.order_number IN UNNEST(food_transfers)
RETURN p.name, t.notes;

The result is similar to the following:

+------+---------------------------+
| name | notes                     |
+------+---------------------------+
| Dana | loved the lunch           |
| Lee  | for shared cost of dinner |
+------+---------------------------+
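
To show what the graph pattern in the edge query does once the order numbers are known, here is a minimal Python sketch that walks the same sample rows in memory. The function `senders_for_orders` is a hypothetical helper, and the `food_transfers` list is hard-coded on the assumption that AI.SEARCH returned the two order numbers shown in the result above.

```python
# Rows copied from the tutorial's sample data.
PERSON = {1: "Alex", 2: "Dana", 3: "Lee"}
OWNS = [(1, 7), (2, 20), (3, 16)]  # (person id, account id)
TRANSFERS = [  # (from account, to account, order_number, notes)
    (7, 16, "304330008004315", "wedding present"),
    (7, 16, "304120005529714", "birthday gift"),
    (16, 20, "103650009791820", "for shared cost of dinner"),
    (20, 7, "304120005529714", "fees for tuition"),
    (20, 16, "302290001484851", "loved the lunch"),
]


def senders_for_orders(order_numbers):
    """Mimic (p:Person)-[:Owns]->(:Account)-[t:Transfers]->(:Account),
    keeping only transfers whose order_number is in order_numbers."""
    wanted = set(order_numbers)
    results = []
    for person_id, account_id in OWNS:
        for src, _dst, order_number, notes in TRANSFERS:
            # The owned account must be the source of the transfer.
            if src == account_id and order_number in wanted:
                results.append((PERSON[person_id], notes))
    return results


# Assumed output of the AI.SEARCH step for the search text 'food'.
food_transfers = ["302290001484851", "103650009791820"]
rows = senders_for_orders(food_transfers)
print(rows)
```

Note that the sample data reuses order number 304120005529714 for two different transfers, so a filter on order_number alone can match more than one edge per search hit. In this sketch, `senders_for_orders(["304120005529714"])` returns both the "birthday gift" and the "fees for tuition" transfers.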

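Create a vector index

Vector indexes reduce search latency and compute cost. The tables in this tutorial are too small to use a vector index. Vector indexes are useful when tables are large, typically containing millions of rows. BigQuery offers two types of indexes: IVF and TreeAH. For more information about how to create indexes and choose a type, see Manage vector indexes.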
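
As rough intuition for why an index can reduce the work per search, the following Python sketch shows the general inverted-file (IVF) idea: group vectors by their nearest centroid, then scan only the group nearest to the query. This is a toy illustration, not BigQuery's implementation. The vectors, the centroids, and the helpers `build_ivf` and `search_ivf` are all hypothetical, and the centroids are assumed to be precomputed (for example, by k-means).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, candidates):
    """Return the index of the candidate vector closest to point."""
    return min(
        range(len(candidates)),
        key=lambda i: squared_distance(point, candidates[i]),
    )


def build_ivf(vectors, centroids):
    """Group vector ids into one list per centroid (the inverted file)."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        lists[nearest(vec, centroids)].append(vec_id)
    return lists


def search_ivf(query, vectors, centroids, lists, k):
    """Scan only the list that belongs to the centroid nearest the query."""
    candidates = lists[nearest(query, centroids)]
    ranked = sorted(candidates, key=lambda i: squared_distance(query, vectors[i]))
    return ranked[:k]


# Hypothetical two-dimensional vectors forming two well-separated groups.
VECTORS = {
    1: [0.0, 0.1],
    2: [0.2, 0.0],
    3: [5.0, 5.1],
    4: [5.2, 4.9],
    5: [0.1, 0.3],
}
CENTROIDS = [[0.0, 0.0], [5.0, 5.0]]  # assumed to be precomputed

index = build_ivf(VECTORS, CENTROIDS)
hits = search_ivf([0.1, 0.1], VECTORS, CENTROIDS, index, k=2)
print(index, hits)
```

The search compares the query against three of the five vectors instead of all of them. The trade-off is that a true nearest neighbor assigned to a different centroid would be missed, which is why index-based search is approximate.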

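Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Delete a Google Cloud project: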

gcloud projects delete PROJECT_ID

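What's next

Learn more about BigQuery Graph.
Learn how to create and query graphs.
Learn more about how to create vector indexes and perform semantic search and RAG.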