內容集總覽

內容集是資料庫專屬資訊的集合,可讓 QueryData 等工具產生高準確度的查詢。內容集包含範本、層面和值搜尋,可協助 QueryData 瞭解資料庫結構定義和應用程式的商業邏輯。

系統支援下列資料庫:

  • AlloyDB for PostgreSQL
  • MySQL 適用的 Cloud SQL
  • PostgreSQL 適用的 Cloud SQL
  • Spanner

什麼是內容集

如要建構有效的代理應用程式,QueryData 等工具必須瞭解您的資料機構和商業邏輯。您以內容集的形式提供這項資訊。

您可以在檔案中定義內容,這些檔案包含各內容類型的 JSON 物件。您可以使用 Gemini CLI 撰寫這些脈絡檔案。接著,將結構定義檔案上傳至您在 Google Cloud控制台中建立的脈絡資料集。這個程序可讓 QueryData 等工具瞭解資料庫的特定結構定義和應用程式的商業邏輯。

結構定義檔案類似下列內容:

{
  "templates": [
    {
      "nl_query": "Count Prague loan accounts",
      "sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" ='Prague'",
      "intent": "How many accounts associated with loans are located in the Prague region?",
      "manifest": "How many accounts associated with loans are located in a given city?",
      "parameterized": {
        "parameterized_intent": "How many accounts associated with loans are located in $1",
        "parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
      }
    }
  ],
  "facets": [
    {
      "sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
      "intent": "Average salary between 6000 and 10000",
      "manifest": "Average salary between a given number and a given number",
      "parameterized": {
         "parameterized_intent": "Average salary between $1 and $2",
         "parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
      }
    }
  ],
  "value_searches": [
    {
      "query": "SELECT $value as value, 'accounts.account_type' as columns, 'Account Type' as concept_type, 0 as distance, '{}'::text as context FROM \"accounts\" T WHERE T.\"account_type\" = $value",
      "concept_type": "Account Type",
      "description": "Exact match for account types"
    }
   ]
}

脈絡資料集檔案格式

內容集檔案包含 JSON 格式的精選範本和層面集,可引導 QueryData 等工具將自然語言問題翻譯成特定資料庫的查詢。定義情境可確保系統針對常見查詢模式產生高準確度的 SQL。

請確保設定的內容準確且涵蓋預期的應用程式查詢,盡可能提高準確度。

您可以在 us-central1us-east1europe-west4asia-southeast1 地區建立內容集。

查詢範本

查詢範本是一組精選的代表性自然語言問題,以及對應的 SQL 查詢。此外,這些範例也包含說明,可為自然語言轉 SQL 程式碼生成功能提供聲明式基本原理。

查詢範本物件類似於下列項目:

{
  "templates": [
    {
      "nl_query": "Count prague loan accounts",
      "sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = 'Prague'",
      "intent": "How many accounts associated with loans are located in the Prague region?",
      "manifest": "How many accounts associated with loans are located in a given city?",
      "parameterized": {
        "parameterized_intent": "How many accounts associated with loans are located in $1",
        "parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
      }
    }
  ]
},
...

查詢範本 JSON 物件的主要元件如下:

  • nl_query:自然語言查詢範例,例如 QueryData 等工具可處理這類查詢。
  • sql:自然語言查詢的 SQL 查詢。
  • intent:自然語言查詢的目標或用途。如未設定,這個值預設為自然語言查詢。
  • manifest:意圖的自動產生一般形式。
  • parameterized_intent:意圖的範本形式,系統會自動產生,並將實體值替換為參數。
  • parameterized_sql:對應於參數化意圖的 SQL 查詢範本,系統會自動產生此範本。

查詢 facet

查詢構面是一組精選的代表性自然語言條件,以及對應的 SQL 述詞。多面向可管理篩選條件,讓查詢範本執行多面向搜尋。

查詢構面物件類似於下列項目:

{
...
"facets": [
    {
      "sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
      "intent": "Average salary between 6000 and 10000",
      "manifest": "Average salary between a given number and a given number",
      "parameterized": {
         "parameterized_intent": "Average salary between $1 and $2",
         "parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
      }
    }
  ]
}

分面 JSON 物件的主要元件如下:

  • sql_snippet:SQL 片段。
  • intent:SQL 述詞的說明。
  • manifest:意圖的自動產生一般形式。
  • parameterized_intent:意圖的範本形式,系統會自動產生,並將實體值替換為參數。
  • parameterized_sql_snippet:對應參數化意圖的範本化自動產生 sql_snippet。

價值搜尋查詢

[值搜尋查詢](/spanner/docs/build-context-gemini-cli#generate-value-searches) 是開發人員定義的查詢,會使用比對函式在資料庫中尋找值及其內容。值連結功能會使用這些查詢的結果,找出含有相符值的資料表和資料欄、瞭解值的概念類型,並修正錯別字。

QueryData API 會使用值連結,更準確地將自然語言轉換為 SQL。使用值搜尋查詢時,API 可以根據資料庫值修正錯別字並解析值類型,進而提高轉換準確率。

連結值可提高自然語言到 SQL 轉換的準確度。舉例來說,如果使用者詢問「希斯洛機場有任何航班嗎?」,資料庫可能會將機場名稱儲存為「倫敦希斯洛機場」。如果沒有值連結,產生的 SQL 可能會依 WHERE name = 'Heathrow' 篩選,並傳回任何結果。值搜尋查詢會引導代理程式將「希斯洛」對應至正確的資料庫值「倫敦希斯洛」及其結構定義位置 (airports.name),確保生成的 SQL 準確無誤。

值搜尋查詢範例如下:

{
  ...
  "value_searches": [
    {
      "query": "SELECT $value as value, 'airports.iata' as columns, 'Airport IATA Code' as concept_type, 0 as distance, '{}'::text as context FROM \"airports\" T WHERE T.\"iata\" = $value",
      "concept_type": "Airport IATA Code",
      "description": "Exact match (Standard SQL) for 3-letter airport codes"
    },
    {
      "query": "WITH TrigramMetrics AS ( SELECT T.\"name\" AS original_value, (T.\"name\" <-> $value::text) AS normalized_dist FROM \"airports\" T WHERE T.\"name\" % $value::text ) SELECT original_value AS value, 'airports.name' AS columns, 'Airport Name' AS concept_type, normalized_dist AS distance, '{}'::text AS context FROM TrigramMetrics",
      "concept_type": "Airport Name",
      "description": "Fuzzy match using standard trigram for partial airport names"
    }
  ]
}

值搜尋 JSON 物件的主要元件如下:

  • query:參數化 SQL 陳述式,用於定義比對值詞組與資料庫表格資料欄中儲存值的邏輯。結果集通常會投射相符值、結構定義位置、概念類型,以及標準化距離指標 (介於 0 和 1 之間)。
  • concept_type:指派給值的語意標籤,例如 districtloan_status。這個標籤有助於連結值,並最終讓 QueryData API 等工具瞭解值詞組在結構定義中的角色。這項功能也有助於產生以值概念為目標的 SQL 陳述式,以及值片語出現的資料表和資料欄。
  • description:搜尋邏輯的說明。

限制

背景資訊集有下列限制:

  • 資料庫的內容集僅支援範本、構面和值搜尋。
  • 資料庫的內容集只會用於對話式數據分析 API 中的 QueryData 端點。

後續步驟