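Create and manage RUM indexes

This document describes how to create the RUM extension and RUM indexes to optimize full-text search in AlloyDB for PostgreSQL. It also provides examples for common use cases, including ranking, phrase search, and ordering by timestamp.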

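Before you begin

To create the RUM extension, you must have the alloydbsuperuser database role.

The AlloyDB Admin (roles/alloydb.admin) IAM role grants full control of AlloyDB resources, but it doesn't grant the alloydbsuperuser database role. To create extensions, an administrator must explicitly grant you the alloydbsuperuser database role.

For more information about granting roles, see "Add an IAM user or service account to a cluster".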

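Create the RUM extension

You must create the RUM extension once for each database.

Connect to your AlloyDB database using psql or another client. For more information, see "Connect to a cluster instance".
Run the following SQL command to create the extension: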
```
CREATE EXTENSION IF NOT EXISTS rum;
```
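
To confirm that the extension is present in the current database, one option is to query the standard PostgreSQL pg_extension catalog. This is a minimal sketch; the version reported depends on your AlloyDB release.

```sql
-- Returns one row if the rum extension is installed in this database,
-- and no rows otherwise.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'rum';
```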

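Create a RUM index

To optimize full-text search queries, create a RUM index on your data. RUM provides several operator classes for different use cases.

RUM operator class types

The following table summarizes the RUM operator classes and their primary use cases.

Operator class	Primary use case	Limitations
`rum_tsvector_ops`	Standard full-text search with ranking and phrase search.	None
`rum_tsvector_hash_ops`	Full-text search with a smaller index and faster updates.	Doesn't support prefix search.
`rum_tsvector_addon_ops`	Full-text search with results ordered by an additional column.	None
`rum_anyarray_ops`	Search within array columns.	None
`rum_<TYPE>_ops`	Index scalar types for distance queries.	None
`rum_tsvector_hash_addon_ops`	Hash-based full-text search with results ordered by an additional column.	Doesn't support prefix matching.
`rum_tsquery_ops`	Index stored `tsquery` values for reverse search.	None
`rum_anyarray_addon_ops`	Array search with results ordered by an additional column.	None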

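Index for basic full-text search

For standard text search that needs fast ranking and phrase search, use the rum_tsvector_ops operator class. This operator class stores the position of each lexeme in the index. The following example creates a table named documents that includes a content column.

Create a table named documents:

```
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  published_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

Populate the documents table with sample data:

```
INSERT INTO documents (title, content) VALUES
  ('Title', 'This search engine is working as intended');
```

Add a generated tsvector column to the table. This column automatically stores the processed text and improves query performance:

```
ALTER TABLE documents
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
```

Create a RUM index on the new search_vector column:

```
CREATE INDEX idx_docs_rum
ON documents
USING rum (search_vector rum_tsvector_ops);
```

Query the table using the index. The <=> operator computes the distance between the document and the query directly from the index, which makes ordering fast. A smaller distance indicates a more relevant document:

```
SELECT title, content
FROM documents
WHERE search_vector @@ to_tsquery('english', 'search <-> engine')
ORDER BY search_vector <=> to_tsquery('english', 'search <-> engine');
```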
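
To inspect the ranking, the same distance expression can also be returned as a column. The following is a sketch against the documents table above; the distance alias and the LIMIT value are illustrative choices, not part of the original example.

```sql
-- Return the distance alongside each match so the ordering is visible.
-- The ORDER BY repeats the expression so the sort key is the same
-- index-computed distance.
SELECT title,
       search_vector <=> to_tsquery('english', 'search <-> engine') AS distance
FROM documents
WHERE search_vector @@ to_tsquery('english', 'search <-> engine')
ORDER BY search_vector <=> to_tsquery('english', 'search <-> engine')
LIMIT 10;
```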

Add more data to the documents table:

```
INSERT INTO documents (title, content) VALUES ('Title1', 'English is my primary language.');
INSERT INTO documents (title, content) VALUES ('Title2', 'Google has a great engineering culture');
```

Run a prefix search query. The query finds documents that contain words starting with eng, such as engineer or english:
```
SELECT title, content
FROM documents
WHERE search_vector @@ to_tsquery('english', 'eng:*');
```

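Index optimized with hashing

Use the rum_tsvector_hash_ops operator class to reduce index size and speed up updates. This class stores a hash of each lexeme instead of the full lexeme. The resulting index is smaller, but it doesn't support prefix search. The following example assumes that you have a table named documents with a search_vector column.

Create a RUM index using the hash operator class:

```
CREATE INDEX idx_docs_rum_hash
ON documents
USING rum (search_vector rum_tsvector_hash_ops);
```

Add more data to the documents table:

```
INSERT INTO documents (title, content) VALUES ('Title3', 'That person was driving incredibly fast, however the routing was not very efficient');
```

Run a standard match query:

```
SELECT * FROM documents WHERE search_vector @@ to_tsquery('english', 'fast & efficient');
```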
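
One way to check the size difference on your own data is to compare the two indexes with the standard PostgreSQL size functions. This sketch assumes that both idx_docs_rum and idx_docs_rum_hash from the earlier examples exist. With only a few rows, the two sizes may be identical, so the comparison is meaningful only on a realistically sized table.

```sql
-- Compare the on-disk size of the full-lexeme index and the hashed index.
SELECT pg_size_pretty(pg_relation_size('idx_docs_rum'))      AS full_lexeme_index,
       pg_size_pretty(pg_relation_size('idx_docs_rum_hash')) AS hashed_lexeme_index;
```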

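Index for search ordered by timestamp

Use the rum_tsvector_addon_ops operator class to optimize queries that filter by text and order by another field, such as a timestamp. This pattern stores the value of the additional field directly in the index, which avoids a slow sort after the search. The following example assumes that you have a table named documents with a search_vector column and a published_at column.

Create an index that includes the published_at timestamp:

```
CREATE INDEX idx_docs_rum_timestamp
ON documents
USING rum (search_vector rum_tsvector_addon_ops, published_at)
WITH (attach = 'published_at', to = 'search_vector');
```

Run a query that finds documents containing the word "engine" and orders them by publication date. The index can handle both the search and the ordering efficiently:

```
SELECT title, published_at
FROM documents
WHERE search_vector @@ to_tsquery('english', 'engine')
ORDER BY published_at DESC;
```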
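
The upstream RUM extension also documents distance operators on the attached column, which order matches by how close they are to a reference value. The following sketch assumes that the <=> operator for timestamp values behaves the same way in AlloyDB. The reference timestamp and the LIMIT value are arbitrary illustrative choices.

```sql
-- Find matching documents published closest to a reference point in time.
-- The reference timestamp below is a placeholder value.
SELECT title, published_at
FROM documents
WHERE search_vector @@ to_tsquery('english', 'engine')
ORDER BY published_at <=> TIMESTAMPTZ '2025-01-01 00:00:00+00'
LIMIT 10;
```

Use EXPLAIN on your own data to check whether the planner chooses the RUM index for a given ordering clause.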

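Index for array search

Use the rum_anyarray_ops operator class to index array columns, such as lists of tags. You can then efficiently query for arrays that overlap (&&), contain (@>), or are contained by (<@) another array. The following example adds a tags column to the documents table.

Add the tags column and populate it with data:

```
ALTER TABLE documents
ADD COLUMN tags TEXT[];

INSERT INTO documents (title, content, tags) VALUES ('Title4', 'Sample Text', ARRAY['ai', 'ml']);
```

Create a RUM index on the TEXT[] column named tags:

```
CREATE INDEX idx_tags_rum
ON documents
USING rum (tags rum_anyarray_ops);
```

Run a query that finds documents whose tags include ai or ml:

```
SELECT * FROM documents WHERE tags && '{"ai", "ml"}';
```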
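
The other two array operators mentioned above follow the same pattern. This sketch reuses the documents table and its tags column. The search tag in the second query is an illustrative value that doesn't appear in the sample data.

```sql
-- Contains (@>): documents tagged with both 'ai' and 'ml'.
SELECT title, tags
FROM documents
WHERE tags @> ARRAY['ai', 'ml'];

-- Contained by (<@): documents whose tags are all drawn from the given set.
SELECT title, tags
FROM documents
WHERE tags <@ ARRAY['ai', 'ml', 'search'];
```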

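Index for scalar types

Use the rum_<TYPE>_ops operator classes to index columns that contain continuous values, such as integers, timestamps, or floating-point numbers. These operator classes let you use the <=> operator to efficiently compute the distance between values. The following example assumes that you have a table named documents.

Add a plain integer column, such as rating, to the documents table:

```
ALTER TABLE documents
ADD COLUMN rating INT;

UPDATE documents
SET rating = floor(random() * 5 + 1);
```

Create a RUM index on the rating column:

```
CREATE INDEX idx_rating_rum
ON documents
USING rum (rating rum_int4_ops);
```

Run a query that finds the documents whose rating is closest to 5:

```
SELECT title, rating
FROM documents
ORDER BY rating <=> 5;
```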
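
To see the computed distance, it can be returned as a column. This is a sketch; the target value 3, the distance alias, and the LIMIT value are illustrative choices. Because the sample ratings are random, the rows returned differ from run to run.

```sql
-- Show how far each rating is from the target value 3, nearest first.
SELECT title,
       rating,
       rating <=> 3 AS distance
FROM documents
ORDER BY rating <=> 3
LIMIT 5;
```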

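Index optimized with hashing and ordered by timestamp

Use the rum_tsvector_hash_addon_ops operator class to combine the benefits of a hash index with the ordering capability of an addon index. This class stores a hash of each lexeme together with the value of the additional column. This configuration supports efficient ordering by the additional column, but it doesn't support prefix matching. The following example assumes that you have a table named documents with a search_vector column and a published_at timestamp column.

Create a RUM index that uses the hash operator class and includes the published_at timestamp:

```
CREATE INDEX idx_docs_rum_hash_timestamp
ON documents
USING rum (search_vector rum_tsvector_hash_addon_ops, published_at)
WITH (attach = 'published_at', to = 'search_vector');
```

Run a query that finds documents containing engine and orders them by publication date:

```
SELECT title, published_at
FROM documents
WHERE search_vector @@ to_tsquery('english', 'engine')
ORDER BY published_at DESC;
```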

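Index for stored queries

Use the rum_tsquery_ops operator class to index tsquery values. This lets you run a "reverse search" that finds the stored queries matching a given input document. The following example creates a table named queries.

Create a table to store queries and add a sample query:

```
CREATE TABLE queries (
  query_text tsquery
);
INSERT INTO queries (query_text) VALUES (plainto_tsquery('english', 'AlloyDB is fast!'));
```

Create a RUM index on the query_text column:

```
CREATE INDEX idx_queries_rum
ON queries
USING rum (query_text rum_tsquery_ops);
```

Run a query that finds the stored queries matching a document:

```
SELECT *
FROM queries
WHERE to_tsvector('english', 'AlloyDB is fast') @@ query_text;
```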
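
Reverse search is easier to see with more than one stored query. The following sketch adds a second, hypothetical stored query that the sample document doesn't satisfy, so only the original AlloyDB query is expected to match.

```sql
-- Add a stored query that requires both 'slow' and 'database'.
INSERT INTO queries (query_text)
VALUES (to_tsquery('english', 'slow & database'));

-- The input document contains neither term, so this stored query
-- is filtered out and the earlier 'AlloyDB is fast!' query remains.
SELECT query_text
FROM queries
WHERE to_tsvector('english', 'AlloyDB is fast') @@ query_text;
```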

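Index for array search ordered by timestamp

Use the rum_anyarray_addon_ops operator class to index an array column together with an additional column used for ordering. The following example assumes that you have a table named documents with a tags column and a published_at timestamp column.

Create a RUM index on the tags column that includes the published_at timestamp:

```
CREATE INDEX idx_tags_rum_timestamp
ON documents
USING rum (tags rum_anyarray_addon_ops, published_at)
WITH (attach = 'published_at', to = 'tags');
```

Run a query that finds documents with the ai tag and orders them by publication date:

```
SELECT title, published_at
FROM documents
WHERE tags @> '{"ai"}'
ORDER BY published_at DESC;
```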

What's next

Learn about full-text search.
Learn how to run hybrid vector similarity searches.