上下文集是特定于数据库的信息集合,可让 QueryData 等工具生成准确率高的查询。上下文集包含模板、分面和值搜索,可帮助 QueryData 了解您的数据库架构和应用的业务逻辑。
支持以下数据库:
- AlloyDB for PostgreSQL
- Cloud SQL for MySQL
- Cloud SQL for PostgreSQL
- Spanner
什么是上下文集
如需构建有效的代理应用,QueryData 等工具必须了解您的数据组织和业务逻辑。您需要以上下文集的形式提供此信息。
您可以在文件中定义上下文,这些文件包含每种上下文类型的 JSON 对象。您可以在 Gemini CLI 的帮助下编写这些上下文文件。然后,您可以将上下文的描述文件上传到您在控制台中创建的上下文集。 Google Cloud通过此过程,QueryData 等工具可以了解数据库的特定架构和应用的业务逻辑。
上下文的描述文件类似于以下内容:
{
"templates": [
{
"nl_query": "Count Prague loan accounts",
"sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" ='Prague'",
"intent": "How many accounts associated with loans are located in the Prague region?",
"manifest": "How many accounts associated with loans are located in a given city?",
"parameterized": {
"parameterized_intent": "How many accounts associated with loans are located in $1",
"parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
}
}
],
"facets": [
{
"sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
"intent": "Average salary between 6000 and 10000",
"manifest": "Average salary between a given number and a given number",
"parameterized": {
"parameterized_intent": "Average salary between $1 and $2",
"parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
}
}
],
"value_searches": [
{
"query": "SELECT $value as value, 'accounts.account_type' as columns, 'Account Type' as concept_type, 0 as distance, '{}'::text as context FROM \"accounts\" T WHERE T.\"account_type\" = $value",
"concept_type": "Account Type",
"description": "Exact match for account types"
}
]
}
上下文集文件格式
上下文集文件由一组精选的 JSON 格式的模板和分面组成,这些模板和分面可指导 QueryData 等工具将自然语言问题转换为特定数据库的查询。定义上下文可确保针对常见查询模式生成准确率高的 SQL。
请确保上下文集准确无误,并全面涵盖预期的应用查询,以最大限度地提高准确率。
上下文集可以在 us-central1、us-east1、europe-west4 和 asia-southeast1 区域中创建。
查询模板
查询模板是一组精选的代表性自然语言问题,其中包含相应的 SQL 查询。它们还包含说明,以便为自然语言到 SQL 的生成提供声明性依据。
查询模板对象类似于以下内容:
{
"templates": [
{
"nl_query": "Count prague loan accounts",
"sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = 'Prague'",
"intent": "How many accounts associated with loans are located in the Prague region?",
"manifest": "How many accounts associated with loans are located in a given city?",
"parameterized": {
"parameterized_intent": "How many accounts associated with loans are located in $1",
"parameterized_sql": "SELECT COUNT(T1.account_id) FROM bird_dev_financial.account AS T1 INNER JOIN bird_dev_financial.loan AS T2 ON T1.account_id = T2.account_id INNER JOIN bird_dev_financial.district AS T3 ON T1.district_id = T3.district_id WHERE T3.\"A3\" = $1"
}
}
]
},
...
查询模板 JSON 对象的主要组成部分如下:
nl_query:QueryData 等工具处理的自然语言查询示例。sql:自然语言查询的 SQL 查询。intent:自然语言查询的目标或用途。如果未设置,此值默认为自然语言查询。manifest:意图的通用自动生成形式。parameterized_intent:意图的模板化自动生成形式,实体值替换为参数。parameterized_sql:与参数化意图对应的 SQL 查询的模板化自动生成形式。
查询分面
查询分面是一组精选的代表性自然语言条件,其中包含相应的 SQL 谓词。分面管理过滤条件和条件,使查询模板能够执行分面搜索。
查询分面对象类似于以下内容:
{
...
"facets": [
{
"sql_snippet": "T.\"A11\" BETWEEN 6000 AND 10000",
"intent": "Average salary between 6000 and 10000",
"manifest": "Average salary between a given number and a given number",
"parameterized": {
"parameterized_intent": "Average salary between $1 and $2",
"parameterized_sql_snippet": "T.\"A11\" BETWEEN $1 AND $2"
}
}
]
}
分面 JSON 对象的主要组成部分如下:
sql_snippet:SQL 代码段。intent:SQL 谓词的说明。manifest:意图的通用自动生成形式。parameterized_intent:意图的模板化自动生成形式,实体值替换为参数。parameterized_sql_snippet:与参数化意图对应的 sql_snippet 的模板化自动生成形式。
值搜索查询
[值搜索查询](/spanner/docs/build-context-gemini-cli#generate-value-searches)是开发者定义的查询,使用匹配函数在数据库中查找值及其上下文。值关联使用这些查询的结果来识别哪些表和列包含匹配的值,了解值的概念类型,并纠正拼写错误。
QueryData API 使用值关联将自然语言更准确地转换为 SQL。通过使用值搜索查询,API 可以根据数据库值纠正拼写错误并解析值类型,从而提高转换准确率。
值关联可提高自然语言到 SQL 的转换准确率。例如,如果用户问“有没有从希思罗机场起飞的航班?”,数据库可能会将机场名称存储为“London
Heathrow”。如果没有值关联,生成的 SQL 可能会按 WHERE name = 'Heathrow' 进行过滤,并且不返回任何结果。值搜索查询会引导代理将“Heathrow”映射到正确的数据库值“London
Heathrow”及其架构位置 (airports.name),确保生成的 SQL 准确无误。
值搜索查询示例:
{
...
"value_searches": [
{
"query": "SELECT $value as value, 'airports.iata' as columns, 'Airport IATA Code' as concept_type, 0 as distance, '{}'::text as context FROM \"airports\" T WHERE T.\"iata\" = $value",
"concept_type": "Airport IATA Code",
"description": "Exact match (Standard SQL) for 3-letter airport codes"
},
{
"query": "WITH TrigramMetrics AS ( SELECT T.\"name\" AS original_value, (T.\"name\" <-> $value::text) AS normalized_dist FROM \"airports\" T WHERE T.\"name\" % $value::text ) SELECT original_value AS value, 'airports.name' AS columns, 'Airport Name' AS concept_type, normalized_dist AS distance, '{}'::text AS context FROM TrigramMetrics",
"concept_type": "Airport Name",
"description": "Fuzzy match using standard trigram for partial airport names"
}
]
}
值搜索 JSON 对象的主要组成部分如下:
query:参数化 SQL 语句,用于定义将值短语与数据库中表的列中存储的值进行匹配的逻辑。结果集通常会投影匹配的值、架构位置、概念类型和归一化距离指标(介于 0 和 1 之间)。concept_type:分配给值的语义标签,例如district或loan_status。此标签有助于值关联,并最终帮助 QueryData API 等工具了解值短语在架构中的作用。它还有助于生成针对值的概念的 SQL 语句,以及值短语在其中显示的表和列。description:搜索逻辑的说明。
限制
上下文集具有以下限制:
- 数据库的上下文集仅支持模板、分面和值搜索。
- 数据库的上下文集仅由 Conversational Analytics API 中的
QueryData端点使用。
后续步骤
- 了解如何在 Spanner Studio 中创建或删除上下文集。