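Authored context is guidance that data agent owners can provide to shape a data agent's behavior and refine the API's responses. Effective authored context gives your Conversational Analytics API data agents useful context for answering questions about your data sources.
This page describes how to provide authored context for BigQuery data sources. For BigQuery data sources, you provide authored context through a combination of structured context and system instructions. Whenever possible, provide context through structured context fields. You can then use the system_instruction parameter to provide supplemental guidance that the structured fields don't cover.
Although both structured context fields and system instructions are optional, providing robust context lets the agent give more accurate and relevant responses.
After you define the structured fields and system instructions that make up your authored context, you can provide that context to the API in either of the following calls: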
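- Creating a persistent data agent: Include the authored context in the published_context object of the request body to configure agent behavior that persists across multiple conversations. For more information, see Create a data agent (HTTP) or Set up context for stateful or stateless chat (Python SDK).
- Sending a stateless request: Provide the authored context in the inline_context object of a chat request to define the agent's behavior for the duration of that API call. For more information, see Create a stateless multi-turn conversation (HTTP) or Send a stateless chat request with inline context (Python SDK).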
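The choice between the two calls comes down to which key carries the context. The following is a minimal sketch, not a complete request: wrap_authored_context is a hypothetical helper, the context content is illustrative, and every other request field (agent name, messages, and so on) is omitted. See the linked pages for the full request bodies.

```python
import json


def wrap_authored_context(context: dict, persistent: bool) -> dict:
    """Place the same authored context under the key that matches the call type."""
    key = "published_context" if persistent else "inline_context"
    return {key: context}


# Illustrative authored context; a real one would also carry structured context.
authored_context = {
    "system_instruction": "- system_instruction: You are a sales analyst."
}

# Fragment for creating a persistent data agent.
agent_fragment = wrap_authored_context(authored_context, persistent=True)
# Fragment for a stateless chat request.
chat_fragment = wrap_authored_context(authored_context, persistent=False)

print(json.dumps(agent_fragment, indent=2))
print(json.dumps(chat_fragment, indent=2))
```

Because the context object itself is identical in both cases, you can define it once and reuse it for either call.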
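Define structured context fields
This section describes how to provide context to a data agent by using structured context fields. You can provide the following information to an agent as structured context:
- Table-level structured context
- Column-level structured context
- Example queries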
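Table-level structured context
Use the tableReferences key to give the agent details about the specific tables that are available for answering questions. For each table reference, you can use the following structured context fields to define the table's schema:
- description: A summary of the table's content and purpose
- synonyms: A list of alternative terms that can be used to refer to the table
- tags: A list of keywords or tags that are associated with the table
The following examples show how to provide these properties as structured context in a direct HTTP request and with the Python SDK.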
HTTP
In a direct HTTP request, you provide these table-level properties in the schema object of the relevant table reference. For a complete example of how to structure the full request payload, see Connect to BigQuery data.
"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
      "description": "Data for orders in The Look, a fictitious ecommerce store.",
      "synonyms": ["sales"],
      "tags": ["sale", "order", "sales_order"]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
      "description": "Data for users in The Look, a fictitious ecommerce store.",
      "synonyms": ["customers"],
      "tags": ["user", "customer", "buyer"]
    }
  }
]
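If you assemble the HTTP payload in code, a small helper keeps every table reference in the same shape. The following sketch uses only the standard library; make_table_reference is a hypothetical helper name, and the key names are the ones shown in the JSON above.

```python
import json


def make_table_reference(project_id, dataset_id, table_id,
                         description, synonyms=(), tags=()):
    """Build one tableReferences entry with table-level structured context."""
    return {
        "projectId": project_id,
        "datasetId": dataset_id,
        "tableId": table_id,
        "schema": {
            "description": description,
            "synonyms": list(synonyms),
            "tags": list(tags),
        },
    }


table_references = [
    make_table_reference(
        "bigquery-public-data", "thelook_ecommerce", "orders",
        "Data for orders in The Look, a fictitious ecommerce store.",
        synonyms=["sales"],
        tags=["sale", "order", "sales_order"],
    ),
    make_table_reference(
        "bigquery-public-data", "thelook_ecommerce", "users",
        "Data for users in The Look, a fictitious ecommerce store.",
        synonyms=["customers"],
        tags=["user", "customer", "buyer"],
    ),
]

# Fragment of the request payload, ready to be serialized as JSON.
payload_fragment = {"tableReferences": table_references}
print(json.dumps(payload_fragment, indent=2))
```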
Python SDK
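When you use the Python SDK, you can define these table-level properties in the schema property of a BigQueryTableReference object. The following example shows how to create table reference objects that provide context for the orders and users tables. For a complete example of how to build and use table reference objects, see Connect to BigQuery data.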
from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"
bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]
# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"
bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]
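Column-level structured context
The fields key, which is nested in the schema object of a table reference, takes a list of field objects that describe individual columns. Not every field needs additional context; however, adding details for commonly used fields can improve the agent's performance.
For each field object, you can use the following structured context fields to define the column's basic properties:
- description: A brief description of the column's content and purpose
- synonyms: A list of alternative terms that can be used to refer to the column
- tags: A list of keywords or tags that are associated with the column
The following examples show how to provide these properties as structured context for the status field in the orders table and the first_name field in the users table, in a direct HTTP request and with the Python SDK.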
HTTP
In a direct HTTP request, you can define these column-level properties by providing a list of field objects under the fields key in the table reference's schema object.
"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
      "fields": [{
        "name": "status",
        "description": "The current status of the order."
      }]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
      "fields": [{
        "name": "first_name",
        "description": "The first name of the user.",
        "tags": ["person"]
      }]
    }
  }
]
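Because synonyms and tags are lists, a bare string such as "tags": "person" is an easy mistake to make in hand-written JSON. The following sketch shows one way to catch it before you send a request. check_field_context is a hypothetical helper, and the rules it applies are inferred from the field descriptions on this page, not from an official schema.

```python
def check_field_context(field: dict) -> list[str]:
    """Return a list of problems found in one column-level context entry."""
    problems = []
    name = field.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    if "description" in field and not isinstance(field["description"], str):
        problems.append("description must be a string")
    for key in ("synonyms", "tags"):
        value = field.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    return problems


good = {
    "name": "first_name",
    "description": "The first name of the user.",
    "tags": ["person"],
}
bad = {"name": "first_name", "tags": "person"}  # bare string instead of a list

print(check_field_context(good))
print(check_field_context(bad))
```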
Python SDK
When you use the Python SDK, you can define these column-level properties by assigning a list of Field objects to the fields property of the table's schema property.
# Define column context for the 'orders' table
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    )
]
# Define column context for the 'users' table
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    )
]
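Example queries
The example_queries key takes a list of example_query objects that define natural language queries, which help the agent give more accurate and relevant responses to common or important questions. By providing the agent with both a natural language question and its corresponding SQL query, you can guide the agent toward higher-quality and more consistent results.
For each example_query object, you can provide the following fields to define a natural language question and its corresponding SQL query:
- natural_language_question: The natural language question that a user might ask
- sql_query: The SQL query that corresponds to the natural language question
The following examples show how to provide example queries for the orders table in a direct HTTP request and with the Python SDK.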
HTTP
In a direct HTTP request, provide a list of example_query objects in the example_queries field. Each object must contain a naturalLanguageQuestion key and a corresponding sqlQuery key.
"example_queries": [
  {
    "naturalLanguageQuestion": "How many orders are there?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
  },
  {
    "naturalLanguageQuestion": "How many orders were shipped?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
  }
]
Python SDK
When you use the Python SDK, you can provide a list of ExampleQuery objects. For each object, provide values for the natural_language_question and sql_query parameters.
example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders are there?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders were shipped?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
    )
]
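If you maintain example queries as simple question and SQL pairs, you can convert them into the HTTP shape in one place. The following is a sketch that uses only the standard library; to_example_queries and the ORDERS constant are hypothetical names introduced for illustration.

```python
import json

ORDERS = "bigquery-public-data.thelook_ecommerce.orders"


def to_example_queries(pairs):
    """Convert (question, SQL) pairs into the HTTP example_queries shape."""
    examples = []
    for question, sql in pairs:
        # Each example needs both halves to be useful to the agent.
        if not question.strip() or not sql.strip():
            raise ValueError("each example needs both a question and a SQL query")
        examples.append({"naturalLanguageQuestion": question, "sqlQuery": sql})
    return examples


pairs = [
    ("How many orders are there?",
     f"SELECT COUNT(*) FROM `{ORDERS}`"),
    ("How many orders were shipped?",
     f"SELECT COUNT(*) FROM `{ORDERS}` WHERE status = 'shipped'"),
]

example_queries = to_example_queries(pairs)
print(json.dumps({"example_queries": example_queries}, indent=2))
```

Keeping the table name in a single constant avoids typos in the fully qualified name across several queries.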
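Define additional context in system instructions
You can use the system_instruction parameter to provide supplemental guidance for context that the structured context fields don't support. This additional guidance helps the agent better understand the context of your data and use case.
System instructions consist of a series of key components and objects that give the data agent details about the data source, along with guidance about the agent's role when answering questions. You provide system instructions to the data agent in the system_instruction parameter as a YAML-formatted string.
The following template shows a suggested YAML structure for the string that you provide in the system_instruction parameter for a BigQuery data source, including the available keys and expected data types. Although this template provides a suggested structure with important components for defining system instructions, it doesn't cover every possible system instruction format.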
- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
    - table: # Details about a single table that is relevant for the agent.
        - name: str # The name of the table.
        - fields: # Details about columns (fields) within the table.
            - field: # Details about a single column within the current table.
                - name: str # The name of the column.
                - aggregations: list[str] # Commonly used or default aggregations for the column.
- relationships: # A list of join relationships between tables.
    - relationship: # Details about a single join relationship.
        - name: str # The name of this join relationship.
        - description: str # A description of the relationship.
        - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
        - join_type: str # The join type: inner, outer, left, right, or full.
        - left_table: str # The name of the left table in the join.
        - right_table: str # The name of the right table in the join.
        - relationship_columns: # A list of columns that are used for the join.
            - left_column: str # The join column from the left table.
            - right_column: str # The join column from the right table.
- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
    - glossary: # The definition for a single glossary item.
        - term: str # The term, phrase, or abbreviation to define.
        - description: str # A description or definition of the term.
        - synonyms: list[str] # Alternative terms for the glossary entry.
- additional_descriptions: # A list of any other general instructions or content.
    - text: str # Any additional general instructions or context not covered elsewhere.
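Because the system instructions are passed as a single YAML-formatted string, one convenient way to write them in Python is as a dedented multi-line string. The following is a minimal sketch: the instruction text is illustrative, the YAML is not parsed or validated here, and the surrounding context dict is hypothetical apart from the system_instruction key.

```python
import textwrap

# textwrap.dedent strips the common leading indentation so that the YAML
# nesting inside the string is preserved exactly as written.
system_instruction = textwrap.dedent("""\
    - system_instruction: >-
        You are an expert sales analyst for a fictitious ecommerce store.
    - additional_descriptions:
        - text: All the sales data pertains to The Look, a fictitious ecommerce store.
""")

# The value stays a plain string; it is not converted to a nested object.
context = {"system_instruction": system_instruction}
print(context["system_instruction"])
```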
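The following sections contain examples of the key components of system instructions:
system_instruction
Use the system_instruction key to define the agent's role and persona. This initial instruction sets the tone and style for the API's responses and helps the agent understand its core purpose.
For example, you can define an agent as a sales analyst for a fictitious ecommerce store as follows: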
- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
tables
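Although you define a table's basic properties (such as its description and synonyms) as structured context, you can also use the tables key in system instructions to provide supplemental business logic. For BigQuery data sources, this includes using the fields key to define default aggregations for specific columns.
The following sample YAML code block shows how to use the tables key in system instructions to nest fields that provide supplemental guidance for the table bigquery-public-data.thelook_ecommerce.orders: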
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: num_of_items
                - aggregations: 'sum, avg'
relationships
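The relationships key in system instructions contains a list of join relationships between tables. Defining join relationships helps the agent understand how to join data from multiple tables when answering questions.
For example, you can define an orders_to_user relationship between the bigquery-public-data.thelook_ecommerce.orders table and the bigquery-public-data.thelook_ecommerce.users table as follows: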
- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id
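To check that a relationship entry says what you intend, it can help to spell out the join it describes. The following sketch renders a relationship as a SQL FROM clause. It is only a reading aid under stated assumptions: join_clause and JOIN_KEYWORDS are hypothetical, the mapping from join_type values to SQL keywords is an assumption, and nothing here describes how the agent itself generates SQL.

```python
# Assumed mapping from join_type values to SQL keywords (illustrative only).
JOIN_KEYWORDS = {
    "inner": "INNER JOIN",
    "left": "LEFT JOIN",
    "right": "RIGHT JOIN",
    "full": "FULL JOIN",
}


def join_clause(rel: dict) -> str:
    """Render the SQL join that a relationship entry describes."""
    keyword = JOIN_KEYWORDS[rel["join_type"]]
    conditions = " AND ".join(
        f"l.{cols['left_column']} = r.{cols['right_column']}"
        for cols in rel["relationship_columns"]
    )
    return (
        f"FROM `{rel['left_table']}` AS l "
        f"{keyword} `{rel['right_table']}` AS r ON {conditions}"
    )


orders_to_user = {
    "name": "orders_to_user",
    "relationship_type": "many-to-one",
    "join_type": "left",
    "left_table": "bigquery-public-data.thelook_ecommerce.orders",
    "right_table": "bigquery-public-data.thelook_ecommerce.users",
    "relationship_columns": [{"left_column": "user_id", "right_column": "id"}],
}

print(join_clause(orders_to_user))
```

Reading the rendered clause makes it easier to spot a swapped left_table and right_table or a mismatched column pair.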
glossaries
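The glossaries key in system instructions lists definitions for business terms, jargon, and abbreviations that are relevant to your data and use case. Providing glossary definitions helps the agent accurately interpret and answer questions that use specific business language.
For example, you can define common business statuses and terms such as "OMPF" according to your specific business context as follows: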
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: shipped
        - description: Represents an order status where the order has been shipped to the customer.
    - glossary:
        - term: returned
        - description: Represents an order status where the customer has returned the order.
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
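If your glossary lives in code or in a spreadsheet export, you can generate the glossaries block instead of writing it by hand. The following sketch is a deliberately naive renderer: render_glossaries is a hypothetical helper, it mirrors the layout used in the example above, and it does no YAML escaping, so it assumes that terms and descriptions contain no characters that need quoting (such as a colon followed by a space).

```python
def render_glossaries(entries, indent="    "):
    """Render glossary entries as YAML text in the nested-list layout shown above."""
    lines = ["- glossaries:"]
    for entry in entries:
        lines.append(f"{indent}- glossary:")
        lines.append(f"{indent * 2}- term: {entry['term']}")
        lines.append(f"{indent * 2}- description: {entry['description']}")
        synonyms = entry.get("synonyms")
        if synonyms:
            # Synonyms are emitted as one quoted, comma-separated string.
            joined = ", ".join(synonyms)
            lines.append(f"{indent * 2}- synonyms: '{joined}'")
    return "\n".join(lines)


glossary_entries = [
    {
        "term": "complete",
        "description": "Represents an order status where the order has been completed.",
        "synonyms": ["finish", "done", "fulfilled"],
    },
    {
        "term": "OMPF",
        "description": "Order Management and Product Fulfillment",
    },
]

rendered = render_glossaries(glossary_entries)
print(rendered)
```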
additional_descriptions
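Use the additional_descriptions key to provide any general instructions or context that doesn't fit into other structured context or system instruction fields. Providing additional descriptions in system instructions helps the agent better understand the context of your data and use case.
For example, you can use the additional_descriptions key to provide information about your organization as follows: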
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
    - text: 'Orders can be of three categories: food, clothes, and electronics.'
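Example: Authored context for a sales agent
The following example shows how to provide authored context for a fictitious sales analyst agent by using a combination of structured context and system instructions.
Example: Structured context
You can provide structured context with details about tables, columns, and example queries to guide the agent, as shown in the following HTTP and Python SDK examples.
HTTP
The following example shows how to define structured context in an HTTP request: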
{
  "bq": {
    "tableReferences": [
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "orders",
        "schema": {
          "description": "Data for orders in The Look, a fictitious ecommerce store.",
          "synonyms": ["sales"],
          "tags": ["sale", "order", "sales_order"],
          "fields": [
            {
              "name": "status",
              "description": "The current status of the order."
            },
            {
              "name": "num_of_items",
              "description": "The number of items in the order."
            }
          ]
        }
      },
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "users",
        "schema": {
          "description": "Data for users in The Look, a fictitious ecommerce store.",
          "synonyms": ["customers"],
          "tags": ["user", "customer", "buyer"],
          "fields": [
            {
              "name": "first_name",
              "description": "The first name of the user.",
              "tags": ["person"]
            },
            {
              "name": "last_name",
              "description": "The last name of the user.",
              "tags": ["person"]
            },
            {
              "name": "age_group",
              "description": "The age demographic group of the user."
            },
            {
              "name": "email",
              "description": "The email address of the user.",
              "tags": ["contact"]
            }
          ]
        }
      }
    ]
  },
  "example_queries": [
    {
      "naturalLanguageQuestion": "How many orders are there?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
    },
    {
      "naturalLanguageQuestion": "How many orders were shipped?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
    },
    {
      "naturalLanguageQuestion": "How many unique customers are there?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"
    },
    {
      "naturalLanguageQuestion": "How many users in the 25-34 age group have a cymbalgroup email address?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'"
    }
  ]
}
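When structured context grows, example queries can drift out of sync with the declared tables. The following sketch cross-checks the two. undeclared_tables is a hypothetical helper, and it assumes that every table in an example query is written as a backtick-quoted, fully qualified name, as in the queries above.

```python
import re


def undeclared_tables(table_references, example_queries):
    """Return tables used in example SQL that have no matching table reference."""
    declared = {
        f"{t['projectId']}.{t['datasetId']}.{t['tableId']}"
        for t in table_references
    }
    used = set()
    for example in example_queries:
        # Collect every backtick-quoted identifier in the SQL text.
        used.update(re.findall(r"`([^`]+)`", example["sqlQuery"]))
    return sorted(used - declared)


table_references = [
    {"projectId": "bigquery-public-data",
     "datasetId": "thelook_ecommerce",
     "tableId": "orders"},
]
example_queries = [
    {"naturalLanguageQuestion": "How many orders are there?",
     "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"},
    {"naturalLanguageQuestion": "How many unique customers are there?",
     "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"},
]

# Only the orders table is declared here, so the users table is reported.
missing = undeclared_tables(table_references, example_queries)
print(missing)
```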
Python SDK
The following example shows how to define structured context by using the Python SDK:
from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"
bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    ),
    geminidataanalytics.Field(
        name="num_of_items",
        description="The number of items in the order.",
    )
]
# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"
bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="last_name",
        description="The last name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="age_group",
        description="The age demographic group of the user.",
    ),
    geminidataanalytics.Field(
        name="email",
        description="The email address of the user.",
        tags=["contact"],
    )
]
# Define example queries
example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders are there?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders were shipped?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many unique customers are there?",
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many users in the 25-34 age group have a cymbalgroup email address?",
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'",
    )
]
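Example: System instructions
The following system instructions supplement the structured context by defining the agent's persona and providing guidance that structured fields don't support, such as relationship definitions, glossary terms, additional descriptions, and supplemental details for the orders table. In this example, because the users table is fully defined through structured context, it doesn't need to be redefined in the system instructions.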
- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: num_of_items
                - aggregations: 'sum, avg'
- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
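The two halves of the example come together in a single context object. The following sketch combines a trimmed-down version of the structured context with a shortened system instruction string. The datasource_references key used to nest the data source is an assumption that this page doesn't confirm, so check the API reference for the exact field names before you rely on this shape.

```python
import json
import textwrap

# Trimmed structured context, following the HTTP example on this page.
structured_context = {
    "bq": {
        "tableReferences": [
            {
                "projectId": "bigquery-public-data",
                "datasetId": "thelook_ecommerce",
                "tableId": "orders",
                "schema": {
                    "description": "Data for orders in The Look, a fictitious ecommerce store."
                },
            }
        ]
    }
}

example_queries = [
    {
        "naturalLanguageQuestion": "How many orders are there?",
        "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    }
]

# Shortened system instructions, kept as a YAML-formatted string.
system_instruction = textwrap.dedent("""\
    - system_instruction: >-
        You are an expert sales analyst for a fictitious ecommerce store.
    - glossaries:
        - glossary:
            - term: OMPF
            - description: Order Management and Product Fulfillment
""")

# Assumption: the data source is nested under "datasource_references".
authored_context = {
    "system_instruction": system_instruction,
    "datasource_references": structured_context,
    "example_queries": example_queries,
}

print(json.dumps(authored_context, indent=2))
```

You can then pass the resulting object as either the published_context or the inline_context, depending on the call.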