使用分面执行搜索

本页介绍了如何使用方面进行全文搜索。作为互动式过滤的一部分,分面是潜在的过滤条件值,以及该过滤条件的匹配次数。在具有此功能的网站上,当您输入搜索短语时,导航菜单中会显示结果列表,并且还会显示可用于缩小搜索结果范围的各个方面,以及符合每个类别的结果数量。

例如,如果您使用查询“foo”搜索专辑,可能会获得数百条结果。如果存在流派这一方面,您可以按流派进行选择,例如“摇滚 (250)”“R&B (50)”或“流行 (150)”。

在 Spanner 全文搜索中,您可以使用标准 SQL 表达式和全文搜索函数进行过滤和过滤计数。您无需使用特殊语法即可使用分面。

添加用于全文搜索的构面

以下示例会创建一个专辑表,并为每张专辑的标题进行分词。本页中的示例使用了此表。

GoogleSQL

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Title STRING(MAX),
  Rating INT64,
  Genres ARRAY<STRING(MAX)>,
  Likes INT64 NOT NULL,
  Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
) PRIMARY KEY(AlbumId);

PostgreSQL

CREATE TABLE albums (
  albumid bigint NOT NULL,
  title text,
  rating bigint,
  genres text[],
  likes bigint NOT NULL,
  title_tokens spanner.TOKENLIST GENERATED ALWAYS AS (spanner.TOKENIZE_FULLTEXT(Title)) VIRTUAL HIDDEN,
PRIMARY KEY(albumid));

Title_Tokens 上创建搜索索引。您可以选择将 TitleGenresRating 存储在搜索索引中,以避免在计算分面时回联到基表。

GoogleSQL

CREATE SEARCH INDEX AlbumsIndex
ON Albums (Title_Tokens)
STORING (Title, Genres, Rating)
ORDER BY Likes DESC;

PostgreSQL

CREATE SEARCH INDEX albumsindex
ON albums (title_tokens)
INCLUDE (title, genres, rating)
ORDER BY likes DESC

在此示例中,请将以下数据插入到表中。

GoogleSQL

INSERT INTO Albums (AlbumId, Title, Rating, Genres, Likes) VALUES
(1, "The Foo Strike Again", 5, ["Rock", "Alternative"], 600),
(2, "Who are the Who?", 5, ["Progressive", "Indie"], 200),
(3, "No Foo For You", 4, ["Metal", "Alternative"], 50)

PostgreSQL

INSERT INTO albums (albumid, title, rating, genres, likes) VALUES
(1, 'The Foo Strike Again', 5,'{"Rock", "Alternative"}', 600),
(2, 'Who are the Who?', 5,'{"Progressive", "Indie"}', 200),
(3, 'No Foo For You', 4,'{"Metal", "Alternative"}', 50)

检索单个方面的统计值

此示例展示了如何对“评分”分面执行分面计数。它在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。

GoogleSQL

SELECT Rating, COUNT(*) AS result_count
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
GROUP BY Rating
ORDER BY Rating DESC

| Rating | result_count |
|--------|--------------|
| 5      | 1            |
| 4      | 1            |

PostgreSQL

SELECT rating, COUNT(*) AS result_count
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
GROUP BY rating
ORDER BY rating DESC;

| rating | result_count |
|--------|--------------|
| 5      | 1            |
| 4      | 1            |

检索多个分面的数量值

此示例展示了对多个方面执行分面计数的步骤。 它会执行以下操作:

  1. 检索初始搜索结果:在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。
  2. 计算分面计数:然后计算 RatingGenres 分面的计数。

GoogleSQL

WITH search_results AS (
  SELECT AlbumId, Title, Genres, Rating, Likes
  FROM Albums
  WHERE SEARCH(Title_Tokens, "foo")
  ORDER BY Likes DESC, AlbumId
  LIMIT 10000
)

SELECT
-- Result set #1: First page of search results
ARRAY(
  SELECT AS STRUCT *
  FROM search_results
  ORDER BY Likes DESC, AlbumId
  LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
  SELECT AS STRUCT Rating, COUNT(*) as result_count
  FROM search_results
  GROUP BY Rating
  ORDER BY result_count DESC, Rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
  SELECT AS STRUCT genre, COUNT(*) as result_count
  FROM search_results
  JOIN UNNEST(Genres) genre
  GROUP BY genre
  ORDER BY result_count DESC, genre
  LIMIT 5
) as genres_counts

PostgreSQL

WITH search_results AS (
  SELECT albumid, title, genres, rating, likes
  FROM albums
  WHERE spanner.SEARCH(title_tokens, 'foo')
  ORDER BY likes DESC, albumid
  LIMIT 10000
)

-- The pattern ARRAY(SELECT TO_JSONB ...) enables returning multiple nested
-- result sets in the same query.
SELECT
  -- Result set #1: First page of search results
  ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'albumid', albumid,
    'title', title,
    'genres', genres,
    'rating', rating,
    'likes', likes
    )
  FROM search_results
  ORDER BY likes DESC, albumid
  LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'rating', rating,
    'result_count', COUNT(*)
    )
  FROM search_results
  GROUP BY rating
  ORDER BY COUNT(*) DESC, rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'genre', genre,
    'result_count', COUNT(*)
    )
  FROM
    search_results,
    UNNEST(genres) AS genre
  GROUP BY genre
  ORDER BY COUNT(*) DESC, genre
  LIMIT 5
) as genres_counts

具体而言,此示例会执行以下操作:

  • WITH search_results AS (...) 会收集大量初始搜索结果,用于生成第一页结果和计算分面。
  • SEARCH(Title_Tokens, "foo") 是主要搜索查询。
  • LIMIT 10000 通过将结果集减少到 10,000 来限制搜索费用。对于可能会返回数百万条结果的非常广泛的搜索,计算整个数据集的精确分面计数可能非常耗费资源。通过限制搜索结果,查询可以快速提供近似(下限)分面计数。这意味着,结果数量至少达到该数值,但可能还有更多匹配的结果,只是超出了 1 万条的上限。
  • result_page 子查询会生成向用户显示的第一页搜索结果。它仅从 search_results 中选择前 50 条记录,并按 LikesAlbumId 排序。这是用户最初看到的内容。
  • rating_counts 子查询会计算 Rating 的分面计数。它会按 Ratingsearch_results 中的所有记录进行分组,并统计每个评分类别中的结果数量。
  • genres_counts 子查询用于计算 Genres 的分面计数。由于它是数组,因此请使用 UNNEST(Genres) 进行联接,以便将数组中的每个流派视为单独的行进行统计。

检索后续页面

在初始商品详情查询之后查询后续页面时,您可以重复使用第一页返回的商品详情计数。

如需详细了解如何进行分页,请参阅使用基于键的分页