為 BigQuery 資料來源定義資料代理程式環境

撰寫的背景資訊是資料代理擁有者提供的指引,可塑造資料代理的行為,並改善 API 的回應。撰寫得當的背景資訊可為對話式數據分析 API 資料代理程式提供實用背景資訊,協助回答資料來源相關問題。

本頁面說明如何為 BigQuery 資料來源提供撰寫的內容。如果是 BigQuery 資料來源,您可以透過結構化內容和系統指令的組合,提供撰寫的內容。請盡可能透過結構化內容欄位提供背景資訊。接著,您可以使用 system_instruction 參數,提供結構化欄位未涵蓋的補充指引。

結構化情境欄位和系統指令都是選用項目,但提供豐富的情境資訊,有助於服務專員給出更準確且相關的回覆。

定義構成撰寫脈絡的結構化欄位和系統指令後,您可以在下列其中一個呼叫中向 API 提供該脈絡:

定義結構化情境欄位

本節說明如何使用結構化內容欄位,為資料代理程式提供內容。您可以向代理程式提供下列資訊,做為結構化脈絡:

資料表層級的結構化背景資訊

使用 tableReferences 鍵向代理提供詳細資料,說明可回答問題的特定資料表。針對每個表格參照,您可以使用下列結構化內容欄位定義表格的結構定義:

  • description:資料表內容和用途的摘要
  • synonyms:可用來指稱資料表的替代字詞清單
  • tags:與資料表相關聯的關鍵字或標記清單

下列範例說明如何透過直接 HTTP 要求和 Python SDK,以結構化內容的形式提供這些屬性。

HTTP

在直接 HTTP 要求中,您會在相關表格參照的 schema 物件中提供這些表格層級屬性。如需如何建構完整要求酬載的完整範例,請參閱「連結至 BigQuery 資料」。

"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
        "description": "Data for orders in The Look, a fictitious ecommerce store.",
        "synonyms": ["sales"],
        "tags": ["sale", "order", "sales_order"]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
        "description": "Data for users in The Look, a fictitious ecommerce store.",
        "synonyms": ["customers"],
        "tags": ["user", "customer", "buyer"]
    }
  }
]

Python SDK

使用 Python SDK 時,您可以在 BigQueryTableReference 物件的 schema 屬性中定義這些資料表層級屬性。以下範例說明如何建立資料表參照物件,為 ordersusers 資料表提供背景資訊。如需瞭解如何建構及使用資料表參照物件的完整範例,請參閱「連結至 BigQuery 資料」。

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]

資料欄層級的結構化脈絡

fields 鍵會巢狀內嵌在表格參照的 schema 物件中,並採用 field 物件清單來描述個別資料欄。並非所有欄位都需要額外背景資訊,但如果提供常用欄位的詳細資料,有助於提升服務專員的成效。

針對每個 field 物件,您可以使用下列結構化內容欄位定義資料欄的基本屬性:

  • description:簡要說明資料欄的內容和用途
  • synonyms:可用來指稱資料欄的替代字詞清單
  • tags:與資料欄相關聯的關鍵字或標記清單

以下範例說明如何透過直接 HTTP 要求和 Python SDK,為 orders 表格中的 status 欄位和 users 表格中的 first_name 欄位,以結構化內容的形式提供這些屬性。

HTTP

在直接 HTTP 要求中,您可以在表格參照的 schema 物件中提供 fields 物件清單,定義這些資料欄層級的屬性。

"tableReferences": [
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "orders",
    "schema": {
      "fields": [{
          "name": "status",
          "description": "The current status of the order.",
      }]
    }
  },
  {
    "projectId": "bigquery-public-data",
    "datasetId": "thelook_ecommerce",
    "tableId": "users",
    "schema": {
      "fields": [{
          "name": "first_name",
          "description": "The first name of the user.",
          "tags": "person",
      }]
    }
  }
]

Python SDK

使用 Python SDK 時,您可以將 Field 物件清單指派給資料表 schema 屬性的 fields 屬性,定義這些資料欄層級屬性。

# Define column context for the 'orders' table
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    )
]

# Define column context for the 'users' table
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    )
]

查詢範例

example_queries 鍵會採用 example_query 物件清單,定義自然語言查詢,協助代理程式針對常見或重要問題提供更準確且相關的回覆。提供自然語言問題和對應的 SQL 查詢給代理程式,有助於引導代理程式提供品質更高、更一致的結果。

您可以為每個 example_query 物件提供下列欄位,定義自然語言問題和對應的 SQL 查詢:

  • natural_language_question:使用者可能會提出的自然語言問題
  • sql_query:對應自然語言問題的 SQL 查詢

下列範例說明如何透過直接 HTTP 要求和 Python SDK,為 orders 資料表提供查詢範例。

HTTP

直接發出 HTTP 要求時,請在 example_queries 欄位中提供 example_query 物件清單。每個物件都必須包含 naturalLanguageQuestion 鍵和對應的 sqlQuery 鍵。

"example_queries": [
  {
    "naturalLanguageQuestion": "How many orders are there?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
  },
  {
    "naturalLanguageQuestion": "How many orders were shipped?",
    "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
  }
]

Python SDK

使用 Python SDK 時,您可以提供 ExampleQuery 物件清單。為每個物件提供 natural_language_questionsql_query 參數的值。

example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders are there?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders were shipped?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
    )
]

在系統指令中定義其他脈絡

您可以使用 system_instruction 參數,為結構化內容欄位不支援的內容提供補充指引。提供這類額外指引,有助於專員進一步瞭解資料和用途的脈絡。

系統指令包含一系列重要元件和物件,可向資料代理程式提供資料來源的詳細資料,以及代理程式在回答問題時的角色指引。您可以在 system_instruction 參數中以 YAML 格式的字串,向資料代理程式提供系統指令。

下列範本顯示字串的建議 YAML 結構,您可以提供給 BigQuery 資料來源的 system_instruction 參數,包括可用的鍵和預期的資料類型。這個範本提供建議的結構,以及定義系統指令的重要元件,但不包含所有可能的系統指令格式。

- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
  - table: # Details about a single table that is relevant for the agent.
    - name: str # The name of the table.
    - fields: # Details about columns (fields) within the table.
      - field: # Details about a single column within the current table.
        - name: str # The name of the column.
        - aggregations: list[str] # Commonly used or default aggregations for the column.
  - relationships: # A list of join relationships between tables.
    - relationship: # Details about a single join relationship.
      - name: str # The name of this join relationship.
      - description: str # A description of the relationship.
      - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
      - join_type: str # The join type: inner, outer, left, right, or full.
      - left_table: str # The name of the left table in the join.
      - right_table: str # The name of the right table in the join.
      - relationship_columns: # A list of columns that are used for the join.
        - left_column: str # The join column from the left table.
        - right_column: str # The join column from the right table.
- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
  - glossary: # The definition for a single glossary item.
    - term: str # The term, phrase, or abbreviation to define.
    - description: str # A description or definition of the term.
    - synonyms: list[str] # Alternative terms for the glossary entry.
- additional_descriptions: # A list of any other general instructions or content.
  - text: str # Any additional general instructions or context not covered elsewhere.

以下各節提供系統指令主要元件的範例:

system_instruction

使用 system_instruction 鍵定義代理程式的角色和員工角色。這項初始指令會為 API 回覆設定語氣和風格,並協助代理程式瞭解核心用途。

舉例來說,您可以為虛構的電子商務商店定義代理程式,如下所示:

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.

tables

定義資料表的基本屬性 (例如說明和同義字) 時,您可以使用結構化內容,也可以在系統指令中使用 tables 鍵提供補充業務邏輯。如果是 BigQuery 資料來源,則包括使用 fields 鍵,為特定資料欄定義預設 aggregations

下列 YAML 程式碼區塊範例說明如何在系統指令中使用 tables 鍵,巢狀結構化提供表格 bigquery-public-data.thelook_ecommerce.orders 補充指引的欄位:

- tables:
  - table:
    - name: bigquery-public-data.thelook_ecommerce.orders
    - fields:
      - field:
        - name: num_of_items
        - aggregations: 'sum, avg'

relationships

系統指令中的 relationships 鍵包含資料表之間的彙整關係清單。定義聯結關係後,代理程式就能瞭解如何在回答問題時聯結多個資料表的資料。

舉例來說,您可以定義 bigquery-public-data.thelook_ecommerce.orders 資料表和 bigquery-public-data.thelook_ecommerce.users 資料表之間的 orders_to_user 關係,如下所示:

- relationships:
  - relationship:
    - name: orders_to_user
    - description: >-
        Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
    - relationship_type: many-to-one
    - join_type: left
    - left_table: bigquery-public-data.thelook_ecommerce.orders
    - right_table: bigquery-public-data.thelook_ecommerce.users
    - relationship_columns:
      - left_column: user_id
      - right_column: id

glossaries

系統指令中的 glossaries 鍵會列出與資料和用途相關的業務用語、術語和縮寫定義。提供字彙表定義,有助於服務專員準確解讀並回答使用特定業務用語的問題。

舉例來說,您可以根據特定業務情境,定義常見業務狀態和「OMPF」等字詞,如下所示:

- glossaries:
  - glossary:
    - term: complete
    - description: Represents an order status where the order has been completed.
    - synonyms: 'finish, done, fulfilled'
  - glossary:
    - term: shipped
    - description: Represents an order status where the order has been shipped to the customer.
  - glossary:
    - term: returned
    - description: Represents an order status where the customer has returned the order.
  - glossary:
    - term: OMPF
    - description: Order Management and Product Fulfillment

additional_descriptions

使用 additional_descriptions 鍵提供任何不適合其他結構化脈絡或系統指令欄位的一般指示或脈絡。在系統指令中提供額外說明,有助於專員進一步瞭解資料和用途的脈絡。

舉例來說,您可以使用 additional_descriptions 鍵提供機構組織的相關資訊,如下所示:

- additional_descriptions:
  - text: All the sales data pertains to The Look, a fictitious ecommerce store.
  - text: 'Orders can be of three categories: food, clothes, and electronics.'

示例:銷售專員的撰寫背景資訊

以下是虛構的銷售分析師代理程式範例,說明如何結合結構化背景資訊和系統指令,提供撰寫的背景資訊。

範例:結構化背景資訊

您可以提供結構化內容,詳細說明資料表、資料欄和查詢範例,引導代理程式,如下列 HTTP 和 Python SDK 範例所示。

HTTP

以下範例說明如何在 HTTP 要求中定義結構化內容:

{
  "bq": {
    "tableReferences": [
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "orders",
        "schema": {
          "description": "Data for orders in The Look, a fictitious ecommerce store.",
          "synonyms": ["sales"],
          "tags": ["sale", "order", "sales_order"],
          "fields": [
            {
              "name": "status",
              "description": "The current status of the order."
            },
            {
              "name": "num_of_items",
              "description": "The number of items in the order."
            }
          ]
        }
      },
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "users",
        "schema": {
          "description": "Data for users in The Look, a fictitious ecommerce store.",
          "synonyms": ["customers"],
          "tags": ["user", "customer", "buyer"],
          "fields": [
            {
              "name": "first_name",
              "description": "The first name of the user.",
              "tags": ["person"]
            },
            {
              "name": "last_name",
              "description": "The last name of the user.",
              "tags": ["person"]
            },
            {
              "name": "age_group",
              "description": "The age demographic group of the user."
            },
            {
              "name": "email",
              "description": "The email address of the user.",
              "tags": ["contact"]
            }
          ]
        }
      }
    ]
  },
  "example_queries": [
    {
      "naturalLanguageQuestion": "How many orders are there?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
    },
    {
      "naturalLanguageQuestion": "How many orders were shipped?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
    },
    {
      "naturalLanguageQuestion": "How many unique customers are there?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"
    },
    {
      "naturalLanguageQuestion": "How many users in the 25-34 age group have a cymbalgroup email address?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'"
    }
  ]
}

Python SDK

以下範例說明如何使用 Python SDK 定義結構化環境:

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = "Data for orders in The Look, a fictitious ecommerce store."
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    ),
    geminidataanalytics.Field(
        name="num_of_items",
        description="The number of items in the order."
    )
]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = "Data for users in The Look, a fictitious ecommerce store."
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="last_name",
        description="The last name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="age_group",
        description="The age demographic group of the user.",
    ),
    geminidataanalytics.Field(
        name="email",
        description="The email address of the user.",
        tags=["contact"],
    )
]

# Define example queries
example_queries = [
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many orders are there?",
      sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many orders were shipped?",
      sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many unique customers are there?",
      sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
  ),
  geminidataanalytics.ExampleQuery(
      natural_language_question="How many users in the 25-34 age group have a cymbalgroup email address?",
      sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@cymbalgroup.com'",
  )
]

範例:系統指示

下列系統指令會定義代理程式的角色,並提供結構化欄位不支援的指引,例如關係定義、詞彙表字詞、額外說明和補充orders表格詳細資料,藉此補充結構化背景資訊。在本範例中,由於 users 表格已透過結構化內容完整定義,因此不需要在系統指令中重新定義。

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: num_of_items
                - aggregations: 'sum, avg'
- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.