Define data agent context for BigQuery data sources

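Authored context is guidance that data agent owners can provide to shape the behavior of a data agent and to refine the API's responses. Well-written authored context gives your Conversational Analytics API data agents useful context for answering questions about your data sources.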

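This page describes how to provide authored context for BigQuery data sources. For BigQuery data sources, you can provide authored context through a combination of structured context and system instructions. Whenever possible, provide context through structured context fields. Then use the system_instruction parameter for supplemental guidance that the structured fields don't cover. System instructions are authored context that data agent owners can provide to tell the agent about its role, tone, and overall behavior. System instructions are usually more free-form than structured context.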

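Structured context fields and system instructions are both optional, but providing rich context helps the agent give more accurate and relevant responses. The data agent incorporates authored context when it is created and when it runs, which helps ensure that its actions, queries, and responses are accurate, compliant, and aware of your business. The system ingests and indexes this context, and then uses the combination of system instructions and structured context to shape the agent's behavior.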

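After you define the structured fields and system instructions that make up your authored context, you can provide that context to the API in one of the following calls: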

Define structured context fields

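This section describes how to provide context to a data agent by using structured context fields. You can provide the following information to an agent as structured context: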

Table-level structured context

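Use the tableReferences key to give the agent details about the specific tables that are available for answering questions. For each table reference, you can use the following structured context fields to define the table's schema: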

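  • description: A summary of the table's content and purpose
  • synonyms: A list of alternative terms that can be used to refer to the table
  • tags: A list of keywords or tags associated with the table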

The following examples show how to provide these properties as structured context in a direct HTTP request and with the Python SDK.

HTTP

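In a direct HTTP request, you provide these table-level properties within the schema object of the relevant table reference. For a complete example of how to structure the full request payload, see Connect to BigQuery data.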

{
  "bq": {
    "tableReferences": [
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "orders",
        "schema": {
          "description": "Data for orders in The Look, a fictitious ecommerce store.",
          "synonyms": [
            "sales"
          ],
          "tags": [
            "sale",
            "order",
            "sales_order"
          ]
        }
      },
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "users",
        "schema": {
          "description": "Data for users in The Look, a fictitious ecommerce store.",
          "synonyms": [
            "customers"
          ],
          "tags": [
            "user",
            "customer",
            "buyer"
          ]
        }
      }
    ]
  }
}
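If you assemble the HTTP request body in code rather than by hand, a small helper keeps the table-level keys consistent across tables. The following is a minimal sketch using only the Python standard library; the helper name table_reference is hypothetical and is not part of the API or the Python SDK. It emits only the keys shown in the preceding JSON and leaves out optional keys that have no value.

```python
import json


def table_reference(project_id, dataset_id, table_id,
                    description=None, synonyms=None, tags=None):
    """Build one entry for the bq.tableReferences list (hypothetical helper)."""
    schema = {}
    if description is not None:
        schema["description"] = description
    if synonyms:
        schema["synonyms"] = list(synonyms)
    if tags:
        schema["tags"] = list(tags)
    ref = {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}
    # Only attach a schema object when at least one property was provided.
    if schema:
        ref["schema"] = schema
    return ref


payload = {
    "bq": {
        "tableReferences": [
            table_reference(
                "bigquery-public-data", "thelook_ecommerce", "orders",
                description="Data for orders in The Look, a fictitious ecommerce store.",
                synonyms=["sales"],
                tags=["sale", "order", "sales_order"],
            ),
            table_reference(
                "bigquery-public-data", "thelook_ecommerce", "users",
                description="Data for users in The Look, a fictitious ecommerce store.",
                synonyms=["customers"],
                tags=["user", "customer", "buyer"],
            ),
        ]
    }
}

print(json.dumps(payload, indent=2))
```

The printed JSON has the same shape as the HTTP example; adding a table is one more table_reference call instead of another hand-edited object.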

Python SDK

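When you use the Python SDK, you define these table-level properties in the schema property of a BigQueryTableReference object. The following example shows how to create table reference objects that provide context for the orders and users tables. For a complete example of how to construct and use table reference objects, see Connect to BigQuery data.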

from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = (
    "Data for orders in The Look, a fictitious ecommerce store."
)
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = (
    "Data for users in The Look, a fictitious ecommerce store."
)
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]

Column-level structured context

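The fields key, which is nested within a table reference's schema object, takes a list of field objects that describe individual columns. Not every column needs additional context, but providing details for commonly used columns can help improve the agent's performance.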

For each field object, you can use the following structured context fields to define the column's basic properties:

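  • description: A brief description of the column's content and purpose
  • synonyms: A list of alternative terms that can be used to refer to the column
  • tags: A list of keywords or tags associated with the column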

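The following examples show how to provide these properties as structured context for the status field in the orders table and the first_name field in the users table, in a direct HTTP request and with the Python SDK.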

HTTP

In a direct HTTP request, you define these column-level properties by providing a list of field objects in the fields key of a table reference's schema object.

{
  "bq": {
    "tableReferences": [
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "orders",
        "schema": {
          "fields": [{
            "name": "status",
            "description": "The current status of the order."
          }]
        }
      },
      {
        "projectId": "bigquery-public-data",
        "datasetId": "thelook_ecommerce",
        "tableId": "users",
        "schema": {
          "fields": [{
            "name": "first_name",
            "description": "The first name of the user.",
            "tags": ["person"]
          }]
        }
      }
    ]
  }
}
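Both the Python SDK example and the full example later on this page pass synonyms and tags as lists of strings, so a bare string where an array is expected, such as "tags": "person" instead of "tags": ["person"], is an easy mistake in hand-written JSON. The following minimal sketch flags such values before you send a request. It assumes that column-level synonyms and tags are always arrays; the function name find_non_list_values is hypothetical.

```python
def find_non_list_values(table_references, list_keys=("synonyms", "tags")):
    """Return (tableId, field name, key) for each list-valued key that is not a list.

    Assumes synonyms and tags are arrays of strings at column level.
    """
    problems = []
    for ref in table_references:
        for field in ref.get("schema", {}).get("fields", []):
            for key in list_keys:
                if key in field and not isinstance(field[key], list):
                    problems.append((ref["tableId"], field["name"], key))
    return problems


refs = [
    {"tableId": "users",
     "schema": {"fields": [{"name": "first_name", "tags": "person"}]}},
    {"tableId": "orders",
     "schema": {"fields": [{"name": "status"}]}},
]
print(find_non_list_values(refs))  # [('users', 'first_name', 'tags')]
```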

Python SDK

When you use the Python SDK, you define these column-level properties by assigning a list of Field objects to the fields attribute of the table's schema property.

# Define column context for the 'orders' table
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    ),
]

# Define column context for the 'users' table
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    ),
]

Example queries

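The example_queries key takes a list of example query objects that define natural language questions, which helps the agent provide more accurate and relevant responses to common or important questions. Providing both a natural language question and its corresponding SQL query guides the agent toward higher-quality, more consistent results.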

For each example query object, you can provide the following fields to define a natural language question and its corresponding SQL query:

  • natural_language_question: A natural language question that a user might ask
  • sql_query: The SQL query that corresponds to the natural language question

The following examples show how to provide example queries for the users table in a direct HTTP request and with the Python SDK.

HTTP

In a direct HTTP request, provide a list of example query objects in the example_queries key. Each object must include a naturalLanguageQuestion key and a corresponding sqlQuery key.

  "example_queries": [
    {
      "naturalLanguageQuestion": "How many unique customers are there?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"
    },
    {
      "naturalLanguageQuestion": "How many users in the 25-34 age group have an example.com email address?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@example.com'"
    }
  ]
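The HTTP snippets on this page mix key styles: example_queries and glossary_terms are snake_case, while tableReferences, naturalLanguageQuestion, and sqlQuery are lowerCamelCase. Under the standard proto3 JSON mapping, parsers accept both a field's original snake_case name and its lowerCamelCase form, so, assuming this API follows that mapping, either spelling should be accepted. If you prefer a single style, the following standard-library sketch converts snake_case keys to lowerCamelCase. The helper names to_lower_camel and camelize_keys are hypothetical.

```python
def to_lower_camel(name):
    """Convert a snake_case proto field name to its lowerCamelCase JSON name."""
    first, *rest = name.split("_")
    # Upper-case only the first letter of each later part, as the proto3 JSON
    # name conversion does; the remaining letters are left unchanged.
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize_keys(obj):
    """Recursively rename dict keys to lowerCamelCase; values are left unchanged."""
    if isinstance(obj, dict):
        return {to_lower_camel(key): camelize_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [camelize_keys(item) for item in obj]
    return obj


body = {
    "example_queries": [
        {
            "natural_language_question": "How many unique customers are there?",
            "sql_query": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
        }
    ]
}
print(camelize_keys(body))
```

Keys that are already lowerCamelCase, such as naturalLanguageQuestion, contain no underscore and pass through unchanged.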

Python SDK

When you use the Python SDK, you provide a list of ExampleQuery objects. For each object, provide values for the natural_language_question and sql_query parameters.

example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many unique customers are there?",
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question=(
            "How many users in the 25-34 age group have an example.com email address?"
        ),
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@example.com'",
    ),
]

Glossary terms

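Use the glossary_terms key to give the agent details about approved terms that are relevant to your data or organization, so that the agent can give more accurate and consistent answers. For each term, provide a description or synonyms.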

You can define glossary terms by using the following structured context fields:

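  • display_name: The domain-specific term, phrase, or abbreviation to define
  • description: An optional description or definition of the term
  • labels: Optional synonyms or alternative terms that users might use in conversation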

The following examples show how to provide glossary term definitions for the orders table in a direct HTTP request and with the Python SDK.

HTTP

In a direct HTTP request, you provide each glossary term object in the glossary_terms key. For each object, provide a value for display_name and, optionally, a description and any labels (synonyms).

  "glossary_terms": [
    {
        "display_name": "Complete",
        "description": "Represents an order status where the order has been completed.",
        "labels": ["finish", "done", "fulfilled"]
    }
  ]
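Because only display_name is required, a builder that omits unset optional fields keeps the payload free of empty description or labels values. The following is a minimal standard-library sketch; the helper name glossary_term is hypothetical, and the second term reuses the OMPF definition from the full example on this page.

```python
def glossary_term(display_name, description=None, labels=None):
    """Build one glossary_terms entry (hypothetical helper).

    display_name is required; description and labels are added only when set.
    """
    if not display_name:
        raise ValueError("display_name is required")
    term = {"display_name": display_name}
    if description:
        term["description"] = description
    if labels:
        term["labels"] = list(labels)
    return term


glossary_terms = [
    glossary_term(
        "Complete",
        "Represents an order status where the order has been completed.",
        ["finish", "done", "fulfilled"],
    ),
    glossary_term("OMPF", "Order Management and Product Fulfillment"),
]
print(glossary_terms)
```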

Python SDK

When you use the Python SDK, you provide a list of GlossaryTerm objects. For each object, provide values for display_name and description, plus the labels (synonyms) parameter as needed.

glossary_terms = [
    geminidataanalytics.GlossaryTerm(
        display_name="Complete",
        description=(
            "Represents an order status where the order has been completed."
        ),
        labels=["finish", "done", "fulfilled"],
    ),
]

Define additional context in system instructions

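You can use the system_instruction parameter to provide supplemental guidance for context that structured context fields don't support. This additional guidance helps the agent better understand your data and use case.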

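System instructions consist of a series of key components and objects that give the data agent details about the data source and guidance about the agent's role when it answers questions. You provide system instructions to the data agent as a YAML-formatted string in the system_instruction parameter.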

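The following template shows the recommended YAML structure for the string that you can provide to the system_instruction parameter for BigQuery data sources, including the available keys and their expected data types. The template provides a recommended structure and the key components for defining system instructions, but it doesn't cover every possible system instruction format.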

- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
  - table: # Details about a single table that is relevant for the agent.
    - name: str # The name of the table.
    - fields: # Details about columns (fields) within the table.
      - field: # Details about a single column within the current table.
        - name: str # The name of the column.
        - aggregations: list[str] # Commonly used or default aggregations for the column.
- relationships: # A list of join relationships between tables.
  - relationship: # Details about a single join relationship.
    - name: str # The name of this join relationship.
    - description: str # A description of the relationship.
    - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
    - join_type: str # The join type: inner, outer, left, right, or full.
    - left_table: str # The name of the left table in the join.
    - right_table: str # The name of the right table in the join.
    - relationship_columns: # A list of columns that are used for the join.
      - left_column: str # The join column from the left table.
      - right_column: str # The join column from the right table.
- additional_descriptions: # A list of any other general instructions or content.
  - text: str # Any additional general instructions or context not covered elsewhere.
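The template is a list of single-key mappings, where each value is either a scalar or another such list. If you generate the system_instruction string in code, a tiny emitter for that shape avoids hand-indentation mistakes. The following sketch uses only the standard library (no YAML package); the function names are hypothetical, and the quoting rule covers common cases only (for example a ": " inside text), not the full YAML specification.

```python
def yaml_scalar(value):
    """Render a scalar, single-quoting strings that YAML could misread.

    Covers common cases only; this is not a complete YAML quoting implementation.
    """
    text = str(value)
    needs_quotes = (
        text == ""
        or ": " in text
        or " #" in text
        or text.endswith(":")
        or text.startswith(("'", '"', "-", "#", "[", "]", "{", "}", "&", "*",
                            "!", "|", ">", "%", "@", "`", "?", ","))
    )
    if needs_quotes:
        # Inside single quotes, YAML escapes a quote by doubling it.
        return "'" + text.replace("'", "''") + "'"
    return text


def to_yaml_lines(items, indent=0):
    """Emit a list of single-key dicts in the '- key: value' style of the template."""
    pad = " " * indent
    lines = []
    for item in items:
        ((key, value),) = item.items()  # Each item must have exactly one key.
        if isinstance(value, list):
            lines.append(f"{pad}- {key}:")
            lines.extend(to_yaml_lines(value, indent + 2))
        else:
            lines.append(f"{pad}- {key}: {yaml_scalar(value)}")
    return lines


system_instruction = "\n".join(to_yaml_lines([
    {"system_instruction": "You are an expert sales analyst for a fictitious ecommerce store."},
    {"tables": [
        {"table": [
            {"name": "bigquery-public-data.thelook_ecommerce.orders"},
            {"fields": [
                {"field": [
                    {"name": "num_of_items"},
                    {"aggregations": "sum, avg"},
                ]},
            ]},
        ]},
    ]},
    {"additional_descriptions": [
        {"text": "Orders can be of three categories: food, clothes, and electronics."},
    ]},
]))
print(system_instruction)
```

The resulting string uses the same two-space nesting as the template, and the text containing a colon is single-quoted automatically.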

The following sections provide examples of the key components of system instructions:

system_instruction

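Use the system_instruction key to define the agent's role and persona. This initial instruction sets the tone and style of the API's responses and helps the agent understand its core purpose.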

For example, you can define the agent as a sales analyst for a fictitious ecommerce store as follows:

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.

tables

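Although you define a table's basic properties (such as its description and synonyms) through structured context, you can also use the tables key in system instructions to provide supplemental business logic. For BigQuery data sources, this includes using the fields key to define default aggregations for specific columns.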

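The following example YAML code block shows how to use the tables key in system instructions to nest fields that provide supplemental guidance for the table bigquery-public-data.thelook_ecommerce.orders: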

- tables:
  - table:
    - name: bigquery-public-data.thelook_ecommerce.orders
    - fields:
      - field:
        - name: num_of_items
        - aggregations: 'sum, avg'
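The aggregations entry hints which aggregate functions are usually applied to a column. To make that hint concrete, the following minimal sketch expands it into the SQL select expressions it corresponds to. The helper aggregation_expressions and its alias naming scheme are hypothetical; this page doesn't document how the agent itself consumes the hint. Because the template declares aggregations as list[str] while the example passes a comma-separated string, the sketch accepts both forms.

```python
def aggregation_expressions(column, aggregations):
    """Expand an aggregations hint such as 'sum, avg' into SQL select expressions.

    Accepts a comma-separated string or a list of function names.
    """
    if isinstance(aggregations, str):
        aggregations = aggregations.split(",")
    names = [a.strip().upper() for a in aggregations if a.strip()]
    return [f"{name}({column}) AS {name.lower()}_{column}" for name in names]


print(aggregation_expressions("num_of_items", "sum, avg"))
# ['SUM(num_of_items) AS sum_num_of_items', 'AVG(num_of_items) AS avg_num_of_items']
```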

relationships

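The relationships key in system instructions contains a list of join relationships between tables. Defining join relationships helps the agent understand how to join data from multiple tables when it answers questions.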

For example, you can define an orders_to_user relationship between the bigquery-public-data.thelook_ecommerce.orders table and the bigquery-public-data.thelook_ecommerce.users table as follows:

- relationships:
  - relationship:
    - name: orders_to_user
    - description: >-
        Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
    - relationship_type: many-to-one
    - join_type: left
    - left_table: bigquery-public-data.thelook_ecommerce.orders
    - right_table: bigquery-public-data.thelook_ecommerce.users
    - relationship_columns:
      - left_column: user_id
      - right_column: id
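A relationship entry carries the same information as a SQL join clause. The following minimal sketch renders a relationship, represented here as a Python dict with one dict per column pair (our own representation, not the exact YAML shape), as the GoogleSQL FROM ... JOIN ... ON clause it describes. The helper join_clause and the table aliases l and r are hypothetical. Mapping the template's generic outer join type to FULL OUTER JOIN is an assumption. The relationship_type (cardinality) doesn't change the join syntax, so the sketch ignores it.

```python
# Assumption: "outer" is treated as FULL OUTER JOIN; the other values map directly.
JOIN_KEYWORDS = {
    "inner": "INNER JOIN",
    "left": "LEFT JOIN",
    "right": "RIGHT JOIN",
    "full": "FULL OUTER JOIN",
    "outer": "FULL OUTER JOIN",
}


def join_clause(rel):
    """Render a relationship dict as a GoogleSQL FROM ... JOIN ... ON clause."""
    try:
        keyword = JOIN_KEYWORDS[rel["join_type"].lower()]
    except KeyError:
        raise ValueError(f"unsupported join_type: {rel['join_type']}") from None
    # Several column pairs become a conjunction of equality conditions.
    conditions = " AND ".join(
        f"l.{pair['left_column']} = r.{pair['right_column']}"
        for pair in rel["relationship_columns"]
    )
    return (
        f"FROM `{rel['left_table']}` AS l "
        f"{keyword} `{rel['right_table']}` AS r ON {conditions}"
    )


orders_to_user = {
    "name": "orders_to_user",
    "relationship_type": "many-to-one",
    "join_type": "left",
    "left_table": "bigquery-public-data.thelook_ecommerce.orders",
    "right_table": "bigquery-public-data.thelook_ecommerce.users",
    "relationship_columns": [{"left_column": "user_id", "right_column": "id"}],
}
print(join_clause(orders_to_user))
```

For orders_to_user, this prints a LEFT JOIN from orders to users on orders.user_id = users.id, which matches the relationship's description.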

additional_descriptions

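Use the additional_descriptions key to provide any general instructions or context that don't fit in other structured context or system instruction fields. Providing additional descriptions in system instructions helps the agent better understand the context of your data and use case.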

For example, you can use the additional_descriptions key to provide information about your organization as follows:

- additional_descriptions:
  - text: All the sales data pertains to The Look, a fictitious ecommerce store.
  - text: 'Orders can be of three categories: food, clothes, and electronics.'

Example: Authored context for a sales agent

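The following example of a fictitious sales analyst agent shows how to provide authored context by combining structured context and system instructions.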

Example: Structured context

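You can guide the agent by providing structured context that describes tables, columns, example queries, and glossary terms, as shown in the following HTTP and Python SDK examples.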

HTTP

The following example shows how to define structured context in an HTTP request:

{
  "bigquery_data_sources": {
    "bq": {
      "tableReferences": [
        {
          "projectId": "bigquery-public-data",
          "datasetId": "thelook_ecommerce",
          "tableId": "orders",
          "schema": {
            "description": "Data for orders in The Look, a fictitious ecommerce store.",
            "synonyms": ["sales"],
            "tags": ["sale", "order", "sales_order"],
            "fields": [
              {
                "name": "status",
                "description": "The current status of the order."
              },
              {
                "name": "num_of_items",
                "description": "The number of items in the order."
              }
            ]
          }
        },
        {
          "projectId": "bigquery-public-data",
          "datasetId": "thelook_ecommerce",
          "tableId": "users",
          "schema": {
            "description": "Data for users in The Look, a fictitious ecommerce store.",
            "synonyms": ["customers"],
            "tags": ["user", "customer", "buyer"],
            "fields": [
              {
                "name": "first_name",
                "description": "The first name of the user.",
                "tags": ["person"]
              },
              {
                "name": "last_name",
                "description": "The last name of the user.",
                "tags": ["person"]
              },
              {
                "name": "age_group",
                "description": "The age demographic group of the user."
              },
              {
                "name": "email",
                "description": "The email address of the user.",
                "tags": ["contact"]
              }
            ]
          }
        }
      ]
    }
  },
  "example_queries": [
    {
      "naturalLanguageQuestion": "How many orders are there?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"
    },
    {
      "naturalLanguageQuestion": "How many orders were shipped?",
      "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'"
    },
    {
      "naturalLanguageQuestion": "How many unique customers are there?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"
    },
    {
      "naturalLanguageQuestion": "How many users in the 25-34 age group have an example.com email address?",
      "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@example.com'"
    }
  ],
  "glossary_terms": [
    {
        "display_name": "Complete",
        "description": "Represents an order status where the order has been completed.",
        "labels": ["finish", "done", "fulfilled"]
    },
    {
        "display_name": "OMPF",
        "description": "Order Management and Product Fulfillment"
    }
  ]
}
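Example queries are most useful when they target tables that you also describe in tableReferences. One way to keep the two consistent is to check the request body before you send it. The following minimal sketch parses a trimmed-down version of the preceding JSON and lists any table used in an example query that has no table reference. The function name unknown_tables is hypothetical, and the pattern only recognizes backtick-quoted three-part table paths with simple names.

```python
import json
import re

# Matches backtick-quoted three-part table paths such as
# `bigquery-public-data.thelook_ecommerce.users`. Unquoted paths and table IDs
# with characters other than letters, digits, and underscores are not matched.
TABLE_PATH = re.compile(r"`([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)`")


def unknown_tables(context):
    """List table paths used in example SQL that have no entry in tableReferences."""
    refs = context["bigquery_data_sources"]["bq"]["tableReferences"]
    known = {(r["projectId"], r["datasetId"], r["tableId"]) for r in refs}
    missing = []
    for example in context.get("example_queries", []):
        # Accept either key spelling for the SQL text.
        sql = example.get("sqlQuery", example.get("sql_query", ""))
        for path in TABLE_PATH.findall(sql):
            name = ".".join(path)
            if path not in known and name not in missing:
                missing.append(name)
    return missing


context = json.loads("""
{
  "bigquery_data_sources": {"bq": {"tableReferences": [
    {"projectId": "bigquery-public-data", "datasetId": "thelook_ecommerce", "tableId": "orders"}
  ]}},
  "example_queries": [
    {"naturalLanguageQuestion": "How many orders are there?",
     "sqlQuery": "SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`"},
    {"naturalLanguageQuestion": "How many unique customers are there?",
     "sqlQuery": "SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`"}
  ]
}
""")
print(unknown_tables(context))  # ['bigquery-public-data.thelook_ecommerce.users']
```

Running the same check against the full example above, which includes a users table reference, would return an empty list.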

Python SDK

The following example shows how to define structured context with the Python SDK:

from google.cloud import geminidataanalytics

# Define context for the 'orders' table
bigquery_table_reference_1 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_1.project_id = "bigquery-public-data"
bigquery_table_reference_1.dataset_id = "thelook_ecommerce"
bigquery_table_reference_1.table_id = "orders"

bigquery_table_reference_1.schema = geminidataanalytics.Schema()
bigquery_table_reference_1.schema.description = (
    "Data for orders in The Look, a fictitious ecommerce store."
)
bigquery_table_reference_1.schema.synonyms = ["sales"]
bigquery_table_reference_1.schema.tags = ["sale", "order", "sales_order"]
bigquery_table_reference_1.schema.fields = [
    geminidataanalytics.Field(
        name="status",
        description="The current status of the order.",
    ),
    geminidataanalytics.Field(
        name="num_of_items",
        description="The number of items in the order.",
    ),
]

# Define context for the 'users' table
bigquery_table_reference_2 = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference_2.project_id = "bigquery-public-data"
bigquery_table_reference_2.dataset_id = "thelook_ecommerce"
bigquery_table_reference_2.table_id = "users"

bigquery_table_reference_2.schema = geminidataanalytics.Schema()
bigquery_table_reference_2.schema.description = (
    "Data for users in The Look, a fictitious ecommerce store."
)
bigquery_table_reference_2.schema.synonyms = ["customers"]
bigquery_table_reference_2.schema.tags = ["user", "customer", "buyer"]
bigquery_table_reference_2.schema.fields = [
    geminidataanalytics.Field(
        name="first_name",
        description="The first name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="last_name",
        description="The last name of the user.",
        tags=["person"],
    ),
    geminidataanalytics.Field(
        name="age_group",
        description="The age demographic group of the user.",
    ),
    geminidataanalytics.Field(
        name="email",
        description="The email address of the user.",
        tags=["contact"],
    ),
]

# Define example queries
example_queries = [
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders are there?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many orders were shipped?",
        sql_query="SELECT COUNT(*) FROM `bigquery-public-data.thelook_ecommerce.orders` WHERE status = 'shipped'",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question="How many unique customers are there?",
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users`",
    ),
    geminidataanalytics.ExampleQuery(
        natural_language_question=(
            "How many users in the 25-34 age group have an example.com email "
            "address?"
        ),
        sql_query="SELECT COUNT(DISTINCT id) FROM `bigquery-public-data.thelook_ecommerce.users` WHERE users.age_group = '25-34' AND users.email LIKE '%@example.com'",
    ),
]

# Define glossary terms
glossary_terms = [
    geminidataanalytics.GlossaryTerm(
        display_name="Complete",
        description=(
            "Represents an order status where the order has been completed."
        ),
        labels=["finish", "done", "fulfilled"],
    ),
    geminidataanalytics.GlossaryTerm(
        display_name="OMPF",
        description="Order Management and Product Fulfillment",
    ),
]

Example: System instructions

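The following system instructions supplement the structured context by defining the agent's role and providing guidance that structured fields don't support, such as relationship definitions, additional descriptions, and supplemental details for the orders table. In this example, the users table is already fully defined through structured context, so it doesn't need to be redefined in the system instructions.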

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: num_of_items
                - aggregations: 'sum, avg'
- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.