本页介绍了如何使用方面进行全文搜索。作为互动式过滤的一部分,分面是潜在的过滤条件值,以及该过滤条件的匹配次数。在具有此功能的网站上,当您输入搜索短语时,导航菜单中会显示结果列表,并且还会显示可用于缩小搜索结果范围的各个方面,以及符合每个类别的结果数量。
例如,如果您使用查询“foo”搜索专辑,可能会获得数百条结果。如果存在流派这一方面,您可以按流派进行选择,例如“摇滚 (250)”“R&B (50)”或“流行 (150)”。
在 Spanner 全文搜索中,您可以使用标准 SQL 表达式和全文搜索函数进行过滤和过滤计数。您无需使用特殊语法即可使用分面。
添加用于全文搜索的构面
以下示例会创建一个专辑表,并为每张专辑的标题进行分词。本页中的示例使用了此表。
GoogleSQL
CREATE TABLE Albums (
AlbumId INT64 NOT NULL,
Title STRING(MAX),
Rating INT64,
Genres ARRAY<STRING(MAX)>,
Likes INT64 NOT NULL,
Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
) PRIMARY KEY(AlbumId);
PostgreSQL
CREATE TABLE albums (
albumid bigint NOT NULL,
title text,
rating bigint,
genres text[],
likes bigint NOT NULL,
title_tokens spanner.TOKENLIST GENERATED ALWAYS AS (spanner.TOKENIZE_FULLTEXT(Title)) VIRTUAL HIDDEN,
PRIMARY KEY(albumid));
在 Title_Tokens
上创建搜索索引。您可以选择将 Title
、Genres
和 Rating
存储在搜索索引中,以避免在计算分面时回联到基表。
GoogleSQL
CREATE SEARCH INDEX AlbumsIndex
ON Albums (Title_Tokens)
STORING (Title, Genres, Rating)
ORDER BY Likes DESC;
PostgreSQL
CREATE SEARCH INDEX albumsindex
ON albums (title_tokens)
INCLUDE (title, genres, rating)
ORDER BY likes DESC
在此示例中,请将以下数据插入到表中。
GoogleSQL
INSERT INTO Albums (AlbumId, Title, Rating, Genres, Likes) VALUES
(1, "The Foo Strike Again", 5, ["Rock", "Alternative"], 600),
(2, "Who are the Who?", 5, ["Progressive", "Indie"], 200),
(3, "No Foo For You", 4, ["Metal", "Alternative"], 50)
PostgreSQL
INSERT INTO albums (albumid, title, rating, genres, likes) VALUES
(1, 'The Foo Strike Again', 5,'{"Rock", "Alternative"}', 600),
(2, 'Who are the Who?', 5,'{"Progressive", "Indie"}', 200),
(3, 'No Foo For You', 4,'{"Metal", "Alternative"}', 50)
检索单个方面的统计值
此示例展示了如何对“评分”分面执行分面计数。它在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。
GoogleSQL
SELECT Rating, COUNT(*) AS result_count
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
GROUP BY Rating
ORDER BY Rating DESC
| Rating | result_count |
|--------|--------------|
| 5 | 1 |
| 4 | 1 |
PostgreSQL
SELECT rating, COUNT(*) AS result_count
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
GROUP BY rating
ORDER BY rating DESC;
| rating | result_count |
|--------|--------------|
| 5 | 1 |
| 4 | 1 |
检索多个分面的数量值
此示例展示了对多个方面执行分面计数的步骤。 它会执行以下操作:
- 检索初始搜索结果:在
Albums
表的Title_Tokens
列中执行“foo”的文本搜索。 - 计算分面计数:然后计算
Rating
和Genres
分面的计数。
GoogleSQL
WITH search_results AS (
SELECT AlbumId, Title, Genres, Rating, Likes
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
ORDER BY Likes DESC, AlbumId
LIMIT 10000
)
SELECT
-- Result set #1: First page of search results
ARRAY(
SELECT AS STRUCT *
FROM search_results
ORDER BY Likes DESC, AlbumId
LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
SELECT AS STRUCT Rating, COUNT(*) as result_count
FROM search_results
GROUP BY Rating
ORDER BY result_count DESC, Rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
SELECT AS STRUCT genre, COUNT(*) as result_count
FROM search_results
JOIN UNNEST(Genres) genre
GROUP BY genre
ORDER BY result_count DESC, genre
LIMIT 5
) as genres_counts
PostgreSQL
WITH search_results AS (
SELECT albumid, title, genres, rating, likes
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
ORDER BY likes DESC, albumid
LIMIT 10000
)
-- The pattern ARRAY(SELECT TO_JSONB ...) enables returning multiple nested
-- result sets in the same query.
SELECT
-- Result set #1: First page of search results
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'albumid', albumid,
'title', title,
'genres', genres,
'rating', rating,
'likes', likes
)
FROM search_results
ORDER BY likes DESC, albumid
LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'rating', rating,
'result_count', COUNT(*)
)
FROM search_results
GROUP BY rating
ORDER BY COUNT(*) DESC, rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
SELECT JSONB_BUILD_OBJECT(
'genre', genre,
'result_count', COUNT(*)
)
FROM
search_results,
UNNEST(genres) AS genre
GROUP BY genre
ORDER BY COUNT(*) DESC, genre
LIMIT 5
) as genres_counts
具体而言,此示例会执行以下操作:
WITH search_results AS (...)
会收集大量初始搜索结果,用于生成第一页结果和计算分面。SEARCH(Title_Tokens, "foo")
是主要搜索查询。LIMIT 10000
通过将结果集减少到 10,000 来限制搜索费用。对于可能会返回数百万条结果的非常广泛的搜索,计算整个数据集的精确分面计数可能非常耗费资源。通过限制搜索结果,查询可以快速提供近似(下限)分面计数。这意味着,结果数量至少达到该数值,但可能还有更多匹配的结果,只是超出了 1 万条的上限。result_page
子查询会生成向用户显示的第一页搜索结果。它仅从search_results
中选择前 50 条记录,并按Likes
和AlbumId
排序。这是用户最初看到的内容。rating_counts
子查询会计算Rating
的分面计数。它会按Rating
对search_results
中的所有记录进行分组,并统计每个评分类别中的结果数量。genres_counts
子查询用于计算Genres
的分面计数。由于它是数组,因此请使用UNNEST(Genres)
进行联接,以便将数组中的每个流派视为单独的行进行统计。
检索后续页面
在初始商品详情查询之后查询后续页面时,您可以重复使用第一页返回的商品详情计数。
如需详细了解如何进行分页,请参阅使用基于键的分页。