本页面介绍了如何使用分面进行全文搜索。作为交互式过滤的一部分,分面是可能的过滤条件值,以及该过滤条件对应的匹配项数量。在具有此功能的网站上,当您输入搜索词组时,导航菜单中会显示结果列表,并且还会显示可用于缩小搜索结果范围的分面列表,以及符合每个类别的结果数量。
例如,如果您使用查询“foo”搜索专辑,可能会获得数百个结果。如果存在流派分面,您可以按流派进行选择,例如“摇滚 (250)”“R&B (50)”或“流行 (150)”。
在 Spanner 全文搜索中,您可以使用标准 SQL 表达式和全文搜索函数进行过滤和过滤计数。 您无需使用特殊语法即可使用分面。
添加用于全文搜索的分面
以下示例会创建一个专辑表,并为每张专辑的标题进行标记化。该表用于本页面上的示例。
GoogleSQL
CREATE TABLE Albums (
AlbumId INT64 NOT NULL,
Title STRING(MAX),
Rating INT64,
Genres ARRAY<STRING(MAX)>,
Likes INT64 NOT NULL,
Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
) PRIMARY KEY(AlbumId);
PostgreSQL
CREATE TABLE albums (
albumid bigint NOT NULL,
title text,
rating bigint,
genres text[],
likes bigint NOT NULL,
title_tokens spanner.TOKENLIST GENERATED ALWAYS AS (spanner.TOKENIZE_FULLTEXT(Title)) VIRTUAL HIDDEN,
PRIMARY KEY(albumid));
在 Title_Tokens 上创建搜索索引。(可选)您可以将 Title、Genres 和 Rating 存储在搜索索引中,以避免在计算分面时回联接到基表。
GoogleSQL
CREATE SEARCH INDEX AlbumsIndex
ON Albums (Title_Tokens)
STORING (Title, Genres, Rating)
ORDER BY Likes DESC;
PostgreSQL
CREATE SEARCH INDEX albumsindex
ON albums (title_tokens)
INCLUDE (title, genres, rating)
ORDER BY likes DESC
在此示例中,请将以下数据插入到表中。
GoogleSQL
INSERT INTO Albums (AlbumId, Title, Rating, Genres, Likes) VALUES
(1, "The Foo Strike Again", 5, ["Rock", "Alternative"], 600),
(2, "Who are the Who?", 5, ["Progressive", "Indie"], 200),
(3, "No Foo For You", 4, ["Metal", "Alternative"], 50)
PostgreSQL
INSERT INTO albums (albumid, title, rating, genres, likes) VALUES
(1, 'The Foo Strike Again', 5,'{"Rock", "Alternative"}', 600),
(2, 'Who are the Who?', 5,'{"Progressive", "Indie"}', 200),
(3, 'No Foo For You', 4,'{"Metal", "Alternative"}', 50)
检索单个分面的计数值
此示例展示了如何对“评分”分面执行分面计数。它在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。
GoogleSQL
SELECT Rating, COUNT(*) AS result_count
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
GROUP BY Rating
ORDER BY Rating DESC
| Rating | result_count |
|--------|--------------|
| 5 | 1 |
| 4 | 1 |
PostgreSQL
SELECT rating, COUNT(*) AS result_count
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
GROUP BY rating
ORDER BY rating DESC;
| rating | result_count |
|--------|--------------|
| 5 | 1 |
| 4 | 1 |
检索多个分面的计数值
此示例展示了对多个分面执行分面计数的步骤。 它会执行以下操作:
- 检索初始搜索结果:在
Albums表的Title_Tokens列中执行“foo”的文本搜索。 - 计算分面数量:然后计算
Rating和Genres分面的数量。
GoogleSQL
WITH search_results AS (
SELECT AlbumId, Title, Genres, Rating, Likes
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
ORDER BY Likes DESC, AlbumId
LIMIT 10000
)
SELECT
-- Result set #1: First page of search results
ARRAY(
SELECT AS STRUCT *
FROM search_results
ORDER BY Likes DESC, AlbumId
LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
SELECT AS STRUCT Rating, COUNT(*) as result_count
FROM search_results
GROUP BY Rating
ORDER BY result_count DESC, Rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
SELECT AS STRUCT genre, COUNT(*) as result_count
FROM search_results
JOIN UNNEST(Genres) genre
GROUP BY genre
ORDER BY result_count DESC, genre
LIMIT 5
) as genres_counts
PostgreSQL
WITH search_results AS (
SELECT albumid, title, genres, rating, likes
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
ORDER BY likes DESC, albumid
LIMIT 10000
)
-- The pattern ARRAY(SELECT TO_JSONB ...) enables returning multiple nested
-- result sets in the same query.
SELECT
-- Result set #1: First page of search results
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'albumid', albumid,
'title', title,
'genres', genres,
'rating', rating,
'likes', likes
)
FROM search_results
ORDER BY likes DESC, albumid
LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'rating', rating,
'result_count', COUNT(*)
)
FROM search_results
GROUP BY rating
ORDER BY COUNT(*) DESC, rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'genre', genre,
'result_count', COUNT(*)
)
FROM
search_results,
UNNEST(genres) AS genre
GROUP BY genre
ORDER BY COUNT(*) DESC, genre
LIMIT 5
) as genres_counts
具体而言,此示例会执行以下操作:
WITH search_results AS (...)会收集大量初始搜索结果,用于第一页结果和分面计算。SEARCH(Title_Tokens, "foo")是主要搜索查询。LIMIT 10000通过将结果集减少到 10,000 来限制搜索费用。对于可能会返回数百万个结果的非常广泛的搜索,计算整个数据集的精确分面数量可能非常耗费资源。通过限制搜索结果,查询可以快速提供近似(下限)分面计数。这意味着,结果数量至少达到该数值,但可能还有更多匹配的结果,超出了 10,000 的限制。result_page子查询会生成向用户显示的第一页搜索结果。它仅从search_results中选择前 50 个记录,并按Likes和AlbumId排序。这是用户最初看到的内容。rating_counts子查询用于计算Rating的分面数量。该子查询按search_results中的所有记录的Rating对其进行分组,并统计每个评分类别中的结果数量。genres_counts子查询用于计算Genres的分面数量。由于该子查询是一个数组,因此请使用UNNEST(Genres)进行联接,以便将该数组中的每个流派视为单独的行进行计数。
检索后续页面
在初始分面查询之后查询连续页面时,您可以重复使用从第一页返回的分面计数。
如需详细了解如何进行分页,请参阅使用基于键的分页。