Accelerate pattern-matching expressions

Spanner search indexes can accelerate pattern-matching expressions such as LIKE, STARTS_WITH, and ENDS_WITH, as well as the regular expression predicate REGEXP_CONTAINS. This page describes how to create and configure a search index that uses TOKENIZE_NGRAMS to accelerate these predicates.

Configure an n-gram `TOKENLIST` for pattern-matching acceleration

To enable acceleration of pattern-matching expressions, tokenize a lowercased STRING column with TOKENIZE_NGRAMS, and store the original STRING column in the search index using the STORING clause in GoogleSQL or the INCLUDE clause in PostgreSQL.

GoogleSQL

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  AlbumTitle STRING(MAX),
  AlbumTitle_Ngram_Tokens TOKENLIST AS (
    TOKENIZE_NGRAMS(LOWER(AlbumTitle), ngram_size_min=>3, ngram_size_max=>4)) HIDDEN,
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex
ON Albums(AlbumTitle_Ngram_Tokens) STORING (AlbumTitle);

PostgreSQL

CREATE TABLE albums (
  albumid bigint NOT NULL,
  album_title varchar,
  album_title_ngrams_tokens spanner.tokenlist GENERATED ALWAYS AS (
    spanner.tokenize_ngrams(
      lower(album_title),
      ngram_size_min => 3,
      ngram_size_max => 4
    )
  ) VIRTUAL HIDDEN,
  PRIMARY KEY(albumid));

CREATE SEARCH INDEX albumsidx ON
albums(album_title_ngrams_tokens) INCLUDE (album_title);
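
To make the effect of ngram_size_min and ngram_size_max more concrete, the following Python sketch lists the character n-grams of a lowercased title. It is a conceptual model only: the helper name char_ngrams is hypothetical, and Spanner's actual TOKENIZE_NGRAMS implementation may normalize or encode tokens differently.

```python
def char_ngrams(text: str, size_min: int = 3, size_max: int = 4) -> set[str]:
    """Return every character n-gram of `text` with size_min <= n <= size_max.

    Hypothetical helper for illustration; not Spanner's implementation.
    """
    lowered = text.lower()  # mirrors LOWER(AlbumTitle) in the schema
    grams = set()
    for n in range(size_min, size_max + 1):
        # Slide a window of width n across the string.
        for start in range(len(lowered) - n + 1):
            grams.add(lowered[start:start + n])
    return grams


print(sorted(char_ngrams("Apple")))
# ['app', 'appl', 'ple', 'ppl', 'pple']
```

In this model, a string shorter than ngram_size_min produces no n-grams at all, which is why very short patterns cannot use the index.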

Automatic acceleration of queries with pattern-matching predicates

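The query optimizer might choose to accelerate the following queries using the search index on the n-gram token column (AlbumsIndex with AlbumTitle_Ngram_Tokens in the GoogleSQL examples). To force the optimizer to use the index, add the @{FORCE_INDEX=AlbumsIndex} hint in GoogleSQL or the /*@ FORCE_INDEX = albumsidx */ hint in PostgreSQL.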

GoogleSQL

In GoogleSQL, Spanner can accelerate LIKE, STARTS_WITH, ENDS_WITH, and REGEXP_CONTAINS.

LIKE predicate:

SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE AlbumTitle LIKE "%999%";

STARTS_WITH predicate:

SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE STARTS_WITH(AlbumTitle, "apple");

ENDS_WITH predicate:

SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE ENDS_WITH(AlbumTitle, "apple");

REGEXP_CONTAINS predicate:

SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE REGEXP_CONTAINS(AlbumTitle, r"(good|great)[ ]+morning");

PostgreSQL

In PostgreSQL, Spanner can accelerate LIKE and STARTS_WITH.

LIKE predicate:

SELECT albumid
FROM albums /*@ FORCE_INDEX = albumsidx */
WHERE album_title LIKE '%999%';

STARTS_WITH predicate:

SELECT albumid
FROM albums /*@ FORCE_INDEX = albumsidx */
WHERE starts_with(album_title, 'apple');
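
One way to picture how an n-gram index can serve queries like these is as a two-step process: look up the pattern's n-grams to collect candidate rows, then post-filter the candidates against the stored column value. The Python sketch below models this under simplifying assumptions: it uses a single n-gram size of 3, plain substring matching, and hypothetical helper names (build_index, contains_search). It is not a description of Spanner's internal execution.

```python
from collections import defaultdict


def char_ngrams(text, size):
    """Lowercased character n-grams of a fixed size (simplified model)."""
    lowered = text.lower()
    return {lowered[i:i + size] for i in range(len(lowered) - size + 1)}


def build_index(titles, size=3):
    """Map each n-gram to the set of album IDs whose title contains it."""
    index = defaultdict(set)
    for album_id, title in titles.items():
        for gram in char_ngrams(title, size):
            index[gram].add(album_id)
    return index


def contains_search(titles, index, needle, size=3):
    """Return (candidate IDs, matching IDs) for a case-sensitive substring search."""
    grams = char_ngrams(needle, size)
    if not grams:
        raise ValueError("needle is shorter than the n-gram size")
    # Step 1: a candidate must contain every n-gram of the needle.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Step 2: post-filter against the stored title. Tokens are lowercased,
    # so the index can return rows that differ from the needle in case.
    matches = sorted(i for i in candidates if needle in titles[i])
    return sorted(candidates), matches


titles = {1: "Apple Pie", 2: "APPLE pie", 3: "Pineapple", 4: "Plea app"}
index = build_index(titles)
print(contains_search(titles, index, "apple"))
# ([1, 2, 3], [3])
```

In this model the index returns three candidates but only one survives the post-filter, which illustrates why having the original column value available alongside the index entry is useful.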

Prerequisites on acceleration

For Spanner to enable this acceleration, the following conditions must be met:

The index must store the STRING column using the STORING clause in GoogleSQL, or INCLUDE clause in PostgreSQL. This prevents costly back-joins to the base table during post-filtering, which is critical for performance when the search over-retrieves documents.
The STRING column must be tokenized using TOKENIZE_NGRAMS.
The tokenization must apply to LOWER(column_name) rather than column_name.
The LIKE pattern, STARTS_WITH prefix, ENDS_WITH suffix, or REGEXP_CONTAINS regular expression must be specified as a constant literal. Query parameters are not supported, which avoids attempting acceleration on patterns that are too short.
The LIKE pattern, STARTS_WITH prefix, ENDS_WITH suffix, or REGEXP_CONTAINS regular expression must contain enough text to form at least one n-gram. For example, r".*" doesn't qualify because it contains no literal sequence of characters to match. Similarly, if the minimum n-gram size is set to 3, the LIKE predicate "%ab%" doesn't qualify because "ab" (size 2) is too short.
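
The last rule can be approximated with a small check: split a LIKE pattern on its wildcards and test whether any literal run is at least as long as the minimum n-gram size. The following Python sketch is a simplified illustration with hypothetical function names. It ignores LIKE escape sequences, and the exact test that Spanner applies may differ.

```python
import re


def like_literal_runs(pattern: str) -> list[str]:
    """Split a LIKE pattern on the wildcards % and _ and keep the literal runs."""
    return [run for run in re.split(r"[%_]", pattern) if run]


def like_pattern_qualifies(pattern: str, ngram_size_min: int = 3) -> bool:
    """True if at least one literal run can form an n-gram of the minimum size."""
    return any(len(run) >= ngram_size_min for run in like_literal_runs(pattern))


print(like_pattern_qualifies("%ab%"))   # False: "ab" is shorter than 3
print(like_pattern_qualifies("%999%"))  # True: "999" forms one 3-gram
```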

What's next

Learn about finding approximate matches with fuzzy search.
