使用分面执行搜索

本页面介绍了如何使用分面进行全文搜索。作为交互式过滤的一部分,分面是可能的过滤条件值,以及该过滤条件对应的匹配项数量。在具有此功能的网站上,当您输入搜索词组时,导航菜单中会显示结果列表,并且还会显示可用于缩小搜索结果范围的分面列表,以及符合每个类别的结果数量。

例如,如果您使用查询“foo”搜索专辑,可能会获得数百个结果。如果存在流派分面,您可以按流派进行选择,例如“摇滚 (250)”“R&B (50)”或“流行 (150)”。

在 Spanner 全文搜索中,您可以使用标准 SQL 表达式和全文搜索函数进行过滤和过滤计数。 您无需使用特殊语法即可使用分面。

添加用于全文搜索的分面

以下示例会创建一个专辑表,并为每张专辑的标题进行标记化。该表用于本页面上的示例。

GoogleSQL

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Title STRING(MAX),
  Rating INT64,
  Genres ARRAY<STRING(MAX)>,
  Likes INT64 NOT NULL,
  Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
) PRIMARY KEY(AlbumId);

PostgreSQL

CREATE TABLE albums (
  albumid bigint NOT NULL,
  title text,
  rating bigint,
  genres text[],
  likes bigint NOT NULL,
  title_tokens spanner.TOKENLIST GENERATED ALWAYS AS (spanner.TOKENIZE_FULLTEXT(Title)) VIRTUAL HIDDEN,
PRIMARY KEY(albumid));

Title_Tokens 上创建搜索索引。(可选)您可以将 TitleGenresRating 存储在搜索索引中,以避免在计算分面时回联接到基表。

GoogleSQL

CREATE SEARCH INDEX AlbumsIndex
ON Albums (Title_Tokens)
STORING (Title, Genres, Rating)
ORDER BY Likes DESC;

PostgreSQL

CREATE SEARCH INDEX albumsindex
ON albums (title_tokens)
INCLUDE (title, genres, rating)
ORDER BY likes DESC

在此示例中,请将以下数据插入到表中。

GoogleSQL

INSERT INTO Albums (AlbumId, Title, Rating, Genres, Likes) VALUES
(1, "The Foo Strike Again", 5, ["Rock", "Alternative"], 600),
(2, "Who are the Who?", 5, ["Progressive", "Indie"], 200),
(3, "No Foo For You", 4, ["Metal", "Alternative"], 50)

PostgreSQL

INSERT INTO albums (albumid, title, rating, genres, likes) VALUES
(1, 'The Foo Strike Again', 5,'{"Rock", "Alternative"}', 600),
(2, 'Who are the Who?', 5,'{"Progressive", "Indie"}', 200),
(3, 'No Foo For You', 4,'{"Metal", "Alternative"}', 50)

检索单个分面的计数值

此示例展示了如何对“评分”分面执行分面计数。它在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。

GoogleSQL

SELECT Rating, COUNT(*) AS result_count
FROM Albums
WHERE SEARCH(Title_Tokens, "foo")
GROUP BY Rating
ORDER BY Rating DESC

| Rating | result_count |
|--------|--------------|
| 5      | 1            |
| 4      | 1            |

PostgreSQL

SELECT rating, COUNT(*) AS result_count
FROM albums
WHERE spanner.SEARCH(title_tokens, 'foo')
GROUP BY rating
ORDER BY rating DESC;

| rating | result_count |
|--------|--------------|
| 5      | 1            |
| 4      | 1            |

检索多个分面的计数值

此示例展示了对多个分面执行分面计数的步骤。 它会执行以下操作:

  1. 检索初始搜索结果:在 Albums 表的 Title_Tokens 列中执行“foo”的文本搜索。
  2. 计算分面数量:然后计算 RatingGenres 分面的数量。

GoogleSQL

WITH search_results AS (
  SELECT AlbumId, Title, Genres, Rating, Likes
  FROM Albums
  WHERE SEARCH(Title_Tokens, "foo")
  ORDER BY Likes DESC, AlbumId
  LIMIT 10000
)

SELECT
-- Result set #1: First page of search results
ARRAY(
  SELECT AS STRUCT *
  FROM search_results
  ORDER BY Likes DESC, AlbumId
  LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
  SELECT AS STRUCT Rating, COUNT(*) as result_count
  FROM search_results
  GROUP BY Rating
  ORDER BY result_count DESC, Rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
  SELECT AS STRUCT genre, COUNT(*) as result_count
  FROM search_results
  JOIN UNNEST(Genres) genre
  GROUP BY genre
  ORDER BY result_count DESC, genre
  LIMIT 5
) as genres_counts

PostgreSQL

WITH search_results AS (
  SELECT albumid, title, genres, rating, likes
  FROM albums
  WHERE spanner.SEARCH(title_tokens, 'foo')
  ORDER BY likes DESC, albumid
  LIMIT 10000
)

-- The pattern ARRAY(SELECT TO_JSONB ...) enables returning multiple nested
-- result sets in the same query.
SELECT
  -- Result set #1: First page of search results
  ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'albumid', albumid,
    'title', title,
    'genres', genres,
    'rating', rating,
    'likes', likes
    )
  FROM search_results
  ORDER BY likes DESC, albumid
  LIMIT 50
) as result_page,
-- Result set #2: Number of results by rating
ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'rating', rating,
    'result_count', COUNT(*)
    )
  FROM search_results
  GROUP BY rating
  ORDER BY COUNT(*) DESC, rating DESC
) as rating_counts,
-- Result set #3: Number of results for top 5 genres
ARRAY(
  SELECT JSONB_BUILD_OBJECT(
    'genre', genre,
    'result_count', COUNT(*)
    )
  FROM
    search_results,
    UNNEST(genres) AS genre
  GROUP BY genre
  ORDER BY COUNT(*) DESC, genre
  LIMIT 5
) as genres_counts

具体而言,此示例会执行以下操作:

  • WITH search_results AS (...) 会收集大量初始搜索结果,用于第一页结果和分面计算。
  • SEARCH(Title_Tokens, "foo") 是主要搜索查询。
  • LIMIT 10000 通过将结果集减少到 10,000 来限制搜索费用。对于可能会返回数百万个结果的非常广泛的搜索,计算整个数据集的精确分面数量可能非常耗费资源。通过限制搜索结果,查询可以快速提供近似(下限)分面计数。这意味着,结果数量至少达到该数值,但可能还有更多匹配的结果,超出了 10,000 的限制。
  • result_page 子查询会生成向用户显示的第一页搜索结果。它仅从 search_results 中选择前 50 个记录,并按 LikesAlbumId 排序。这是用户最初看到的内容。
  • rating_counts 子查询用于计算 Rating 的分面数量。该子查询按 search_results 中的所有记录的 Rating 对其进行分组,并统计每个评分类别中的结果数量。
  • genres_counts 子查询用于计算 Genres 的分面数量。由于该子查询是一个数组,因此请使用 UNNEST(Genres) 进行联接,以便将该数组中的每个流派视为单独的行进行计数。

检索后续页面

在初始分面查询之后查询连续页面时,您可以重复使用从第一页返回的分面计数。

如需详细了解如何进行分页,请参阅使用基于键的分页