Define data agent context for BigQuery data sources

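Authored context is guidance that data agent owners can provide to shape the behavior of a data agent and to refine the API's responses. Effective authored context gives your Conversational Analytics API data agents useful context for answering questions about your data sources.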

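This page describes how to provide authored context for BigQuery data sources. For BigQuery data sources, you can provide authored context through a combination of structured context and system instructions. Whenever possible, provide context through structured context fields. You can then use the system_instruction parameter for supplemental guidance that the structured fields don't cover.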

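Although both structured context fields and system instructions are optional, providing robust context enables the agent to give more accurate and relevant responses.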

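After you define the structured fields and system instructions that make up your authored context, you can provide that context to the API in either of the following calls: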

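Creating a persistent data agent: Include the authored context in the published_context object of the request body to configure agent behavior that persists across multiple conversations. For more information, see Create a data agent (HTTP) or Set up context for stateful or stateless chat (Python SDK).
Sending a stateless request: Provide the authored context in the inline_context object of a chat request to define the agent's behavior for that specific API call. For more information, see Create a stateless multi-turn conversation (HTTP) or Send a stateless chat request with inline context (Python SDK).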

Define structured context fields

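This section describes how to provide context to a data agent by using structured context fields. You can provide the following information to an agent as structured context: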

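Table-level structured context, including a description, synonyms, and tags for a table
Column-level structured context, including a description, synonyms, tags, and sample values for a table's columns
Example queries, which pair natural language questions with their corresponding SQL queries to guide the agent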

Table-level structured context

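Use the tableReferences key to give an agent details about the specific tables that are available for answering questions. For each table reference, you can use the following structured context fields to define the table's schema: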

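description: A summary of the table's content and purpose
synonyms: A list of alternative terms that can be used to refer to the table
tags: A list of keywords or tags that are associated with the table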

The following examples show how to provide these properties as structured context in direct HTTP requests and with the Python SDK.

HTTP

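In a direct HTTP request, you provide these table-level properties in the schema object of the relevant table reference. For a complete example of how to structure the full request payload, see Connect to BigQuery data.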

"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
        "description": "Data for orders in The Look, a fictitious ecommerce store.",
        "synonyms": ["sales"],
        "tags": ["sale", "order", "sales_order"]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
        "description": "Data for users in The Look, a fictitious ecommerce store.",
        "synonyms": ["customers"],
        "tags": ["user", "customer", "buyer"]
    }
  }
]
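
If you assemble the request body in code rather than by hand, a small helper can keep the table-level properties consistent. The following is a minimal sketch that uses only the Python standard library. The `table_reference` helper is a hypothetical name introduced for illustration and is not part of the API; the keys it emits mirror the fragment above.

```python
import json


def table_reference(project_id, dataset_id, table_id,
                    description=None, synonyms=None, tags=None):
    """Build one tableReferences entry with optional table-level context."""
    schema = {}
    if description:
        schema["description"] = description
    if synonyms:
        schema["synonyms"] = list(synonyms)
    if tags:
        schema["tags"] = list(tags)
    ref = {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}
    # Only attach a schema object when some context was actually provided.
    if schema:
        ref["schema"] = schema
    return ref


table_references = [
    table_reference(
        "bigquery-public-data", "thelook_ecommerce", "orders",
        description="Data for orders in The Look, a fictitious ecommerce store.",
        synonyms=["sales"],
        tags=["sale", "order", "sales_order"],
    ),
    table_reference(
        "bigquery-public-data", "thelook_ecommerce", "users",
        description="Data for users in The Look, a fictitious ecommerce store.",
        synonyms=["customers"],
        tags=["user", "customer", "buyer"],
    ),
]

# Serialize the fragment; json.dumps guarantees syntactically valid JSON.
fragment = json.dumps({"tableReferences": table_references}, indent=2)
print(fragment)
```

Because all structured context fields are optional, the helper omits the schema object entirely when no description, synonyms, or tags are passed.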

Python SDK

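When you use the Python SDK, you define these table-level properties on the schema property of a BigQueryTableReference object. The following example shows how to create table reference objects that provide context for the orders and users tables. For a complete example of how to build and use table reference objects, see Connect to BigQuery data.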

from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]

Column-level structured context

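The fields key, which is nested in a table reference's schema object, takes a list of field objects that describe individual columns. Not every field needs additional context; however, for commonly used fields, adding extra detail can help improve the agent's performance.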

For each field object, you can use the following structured context fields to define the column's basic properties:

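description: A brief description of the column's content and purpose
synonyms: A list of alternative terms that can be used to refer to the column
tags: A list of keywords or tags that are associated with the column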

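The following examples show how to provide these properties as structured context for the status field in the orders table and the first_name field in the users table, in direct HTTP requests and with the Python SDK.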

HTTP

In a direct HTTP request, you define these column-level properties by providing a list of field objects in the schema object of a table reference.

"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
      "fields": [{
          "name": "status",
          "description": "The current status of the order."
      }]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
      "fields": [{
          "name": "first_name",
          "description": "The first name of the user.",
          "tags": ["person"]
      }]
    }
  }
]
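
JSON does not permit trailing commas, and synonyms and tags are described above as lists, so a small pre-flight check can catch both kinds of mistake before a request is sent. The following is a minimal sketch that uses only the Python standard library. The `check_fields` function is a hypothetical helper, not part of the API, and it expects the fragment wrapped in braces so that it parses as a complete JSON object.

```python
import json

LIST_KEYS = ("synonyms", "tags")


def check_fields(fragment: str) -> list[str]:
    """Return a list of problems found in a JSON object with tableReferences."""
    try:
        payload = json.loads(fragment)
    except json.JSONDecodeError as err:
        # Trailing commas and other syntax errors end up here.
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]

    problems = []
    for ref in payload.get("tableReferences", []):
        for field in ref.get("schema", {}).get("fields", []):
            if "name" not in field:
                problems.append(f"{ref.get('tableId')}: field without a name")
            for key in LIST_KEYS:
                # synonyms and tags must be lists, not bare strings.
                if key in field and not isinstance(field[key], list):
                    problems.append(
                        f"{ref.get('tableId')}.{field.get('name')}: {key} must be a list"
                    )
    return problems


good = ('{"tableReferences": [{"tableId": "users", "schema": {"fields": '
        '[{"name": "first_name", "tags": ["person"]}]}}]}')
bad_type = good.replace('["person"]', '"person"')
trailing_comma = good.replace('["person"]', '["person"],')

print(check_fields(good))            # no problems reported
print(check_fields(bad_type))        # tags given as a bare string
print(check_fields(trailing_comma))  # syntax error reported by the parser
```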

Python SDK

When you use the Python SDK, you define these column-level properties by assigning a list of Field objects to the fields property of the table's schema property.

# Define column context for the 'orders' table
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    )
]

# Define column context for the 'users' table
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    )
]

Example queries

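The example_queries key takes a list of example_query objects that define natural language queries, which help the agent give more accurate and relevant responses to common or important questions. By providing the agent with both a natural language question and its corresponding SQL query, you can guide the agent toward higher-quality and more consistent results.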

For each example_query object, you can provide the following fields to define a natural language question and its corresponding SQL query:

natural_language_question: The natural language question that a user might ask
sql_query: The SQL query that corresponds to the natural language question

The following examples show how to provide example queries for the orders table in direct HTTP requests and with the Python SDK.

HTTP

In a direct HTTP request, provide a list of example_query objects in the example_queries field. Each object must contain a naturalLanguageQuestion key and a corresponding sqlQuery key.

"example_queries": [
  {
    "naturalLanguageQuestion": "How many orders are there?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
  },
  {
    "naturalLanguageQuestion": "How many orders were shipped?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
  }
]

Python SDK

When you use the Python SDK, you provide a list of ExampleQuery objects. For each object, supply values for the natural_language_question and sql_query parameters.

example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders are there?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders were shipped?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
    )
]
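
The HTTP and Python SDK examples spell the same two fields differently: naturalLanguageQuestion and sqlQuery in JSON, natural_language_question and sql_query in Python. If you keep example queries in one place and need both spellings, a key-renaming step is enough. The following is a minimal sketch; `to_camel` and `example_query_to_http` are hypothetical helpers introduced for illustration and are not part of the SDK.

```python
def to_camel(name: str) -> str:
    """Convert snake_case to lowerCamelCase, e.g. sql_query -> sqlQuery."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def example_query_to_http(example: dict) -> dict:
    """Rename the keys of one example query for use in an HTTP payload."""
    return {to_camel(key): value for key, value in example.items()}


# One example query written with the Python-style field names.
sdk_style = {
    "natural_language_question": "How many orders were shipped?",
    "sql_query": (
        "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` "
        "WHERE status = 'shipped'"
    ),
}

http_style = example_query_to_http(sdk_style)
print(http_style)
```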

Define additional context in system instructions

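You can use the system_instruction parameter to provide supplemental guidance for context that structured context fields don't support. This additional guidance can help the agent better understand the context of your data and use case.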

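System instructions consist of a series of key components and objects that give the data agent details about the data source and guidance about the agent's role when it answers questions. You provide system instructions to the data agent in the system_instruction parameter as a YAML-formatted string.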

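The following template shows a suggested YAML structure for the string that you provide to the system_instruction parameter for a BigQuery data source, including the available keys and expected data types. Although this template suggests a structure with the important components for defining system instructions, it doesn't cover every possible system instruction format.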

- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
  - table: # Details about a single table that is relevant for the agent.
    - name: str # The name of the table.
    - fields: # Details about columns (fields) within the table.
      - field: # Details about a single column within the current table.
        - name: str # The name of the column.
        - aggregations: list[str] # Commonly used or default aggregations for the column.
- relationships: # A list of join relationships between tables.
  - relationship: # Details about a single join relationship.
    - name: str # The name of this join relationship.
    - description: str # A description of the relationship.
    - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
    - join_type: str # The join type: inner, outer, left, right, or full.
    - left_table: str # The name of the left table in the join.
    - right_table: str # The name of the right table in the join.
    - relationship_columns: # A list of columns that are used for the join.
      - left_column: str # The join column from the left table.
      - right_column: str # The join column from the right table.
- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
  - glossary: # The definition for a single glossary item.
    - term: str # The term, phrase, or abbreviation to define.
    - description: str # A description or definition of the term.
    - synonyms: list[str] # Alternative terms for the glossary entry.
- additional_descriptions: # A list of any other general instructions or content.
  - text: str # Any additional general instructions or context not covered elsewhere.

The following sections contain examples of the key components of system instructions:

system_instruction
tables
relationships
glossaries
additional_descriptions

`system_instruction`

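Use the system_instruction key to define the agent's role and persona. This initial instruction sets the tone and style of the API's responses and helps the agent understand its core purpose.

For example, you can define an agent as a sales analyst for a fictitious ecommerce store as follows: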

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.

`tables`

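Although you can define a table's basic properties (such as its description and synonyms) as structured context, you can also use the tables key in system instructions to provide supplemental business logic. For BigQuery data sources, this includes using the fields key to define default aggregations for specific columns.

The following sample YAML code block shows how to use the tables key in your system instructions to nest fields that provide supplemental guidance for the table bigquery-public-data.thelook_ecommerce.orders: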

- tables:
  - table:
    - name: bigquery-public-data.thelook_ecommerce.orders
    - fields:
      - field:
        - name: num_of_items
        - aggregations: 'sum, avg'

`relationships`

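The relationships key in your system instructions contains a list of join relationships between tables. Defining join relationships helps the agent understand how to join data from multiple tables when it answers questions.

For example, you can define an orders_to_user relationship between the bigquery-public-data.thelook_ecommerce.orders table and the bigquery-public-data.thelook_ecommerce.users table as follows: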

- relationships:
  - relationship:
    - name: orders_to_user
    - description: >-
        Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
    - relationship_type: many-to-one
    - join_type: left
    - left_table: bigquery-public-data.thelook_ecommerce.orders
    - right_table: bigquery-public-data.thelook_ecommerce.users
    - relationship_columns:
      - left_column: user_id
      - right_column: id

`glossaries`

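The glossaries key in your system instructions lists definitions for business terms, jargon, and abbreviations that are relevant to your data and use case. Providing glossary definitions helps the agent accurately interpret and answer questions that use specific business language.

For example, you can define common business statuses and terms such as "OMPF" according to your specific business context as follows: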

- glossaries:
  - glossary:
    - term: complete
    - description: Represents an order status where the order has been completed.
    - synonyms: 'finish, done, fulfilled'
  - glossary:
    - term: shipped
    - description: Represents an order status where the order has been shipped to the customer.
  - glossary:
    - term: returned
    - description: Represents an order status where the customer has returned the order.
  - glossary:
    - term: OMPF
    - description: Order Management and Product Fulfillment

`additional_descriptions`

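Use the additional_descriptions key to provide any general instructions or context that doesn't fit in other structured context or system instruction fields. Additional descriptions in your system instructions can help the agent better understand the context of your data and use case.

For example, you can use the additional_descriptions key to provide information about your organization as follows: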

- additional_descriptions:
  - text: All the sales data pertains to The Look, a fictitious ecommerce store.
  - text: 'Orders can be of three categories: food, clothes, and electronics.'

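Example: Authored context for a sales agent

The following example shows how to provide authored context for a fictitious sales analyst agent by combining structured context and system instructions.

Example: Structured context

You can provide structured context with details about tables, columns, and example queries to guide the agent, as shown in the following HTTP and Python SDK examples.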

HTTP

The following example shows how to define structured context in an HTTP request:

{
  "bq": {
    "tableReferences": [
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "orders",
        "schema": {
          "description": "Data for orders in The Look, a fictitious ecommerce store.",
          "synonyms": ["sales"],
          "tags": ["sale", "order", "sales_order"],
          "fields": [
            {
              "name": "status",
              "description": "The current status of the order."
            },
            {
              "name": "num_of_items",
              "description": "The number of items in the order."
            }
          ]
        }
      },
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "users",
        "schema": {
          "description": "Data for users in The Look, a fictitious ecommerce store.",
          "synonyms": ["customers"],
          "tags": ["user", "customer", "buyer"],
          "fields": [
            {
              "name": "first_name",
              "description": "The first name of the user.",
              "tags": ["person"]
            },
            {
              "name": "last_name",
              "description": "The last name of the user.",
              "tags": ["person"]
            },
            {
              "name": "age_group",
              "description": "The age demographic group of the user."
            },
            {
              "name": "email",
              "description": "The email address of the user.",
              "tags": ["contact"]
            }
          ]
        }
      }
    ]
  },
  "example_queries": [
    {
      "naturalLanguageQuestion": "How many orders are there?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
    },
    {
      "naturalLanguageQuestion": "How many orders were shipped?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
    },
    {
      "naturalLanguageQuestion": "How many unique customers are there?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"
    },
    {
      "naturalLanguageQuestion": "How many users in the 25-34 age group have a cymbalgroup email address?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'"
    }
  ]
}

Python SDK

The following example shows how to define structured context with the Python SDK:

from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    ),
    geminidataanalytics.Field(
        name="num_of_items",
        description="The number of items in the order."
    )
]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="last_name",
        description="The last name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="age_group",
        description="The age demographic group of the user.",
    ),
    geminidataanalytics.Field(
        name="email",
        description="The email address of the user.",
        tags=["contact"],
    )
]

# Define example queries
example_queries = [
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many orders are there?",
      sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many orders were shipped?",
      sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many unique customers are there?",
      sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many users in the 25-34 age group have a cymbalgroup email address?",
      sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'",
  )
]

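Example: System instructions

The following system instructions supplement the structured context by defining the agent's persona and by providing guidance that structured fields don't support, such as relationship definitions, glossary terms, additional descriptions, and supplemental details for the orders table. In this example, because the users table is fully defined through structured context, it doesn't need to be redefined in the system instructions.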

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: num_of_items
                - aggregations: 'sum, avg'
- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
