Spanner Omni vector search overview

Vector search in Spanner Omni is a high-performance, built-in feature that enables semantic search and similarity matching on high-dimensional vector data. By storing and indexing vector embeddings directly within your transactional database, Spanner Omni eliminates separate vector databases and complex extract, transform, load (ETL) pipelines.

The topics in this document apply to Spanner Omni in the same way they apply to Spanner.

Vector search overview

Vector search lets you find semantically similar items by representing data as numerical vectors (embeddings). Spanner Omni supports two primary search methods:

K-nearest neighbors (KNN): Performs an exact search by calculating the distance between the query and every vector in the dataset. It provides the highest recall but can be computationally expensive for large datasets.
Approximate nearest neighbors (ANN): Uses a vector index to find matches fast across large datasets. It trades a small amount of accuracy (recall) for gains in speed and scalability.

Vector search is especially powerful when combined with other features:

Combination	Benefit
Vector search with SQL filtering	Efficiently combine vector search with filters (for example, "Find similar images where category = 'shoes' and price < 100").
Vector search + full-text search	Combine semantic similarity with keyword precision using reciprocal rank fusion (RRF) for improved search relevance.
Vector + graph	Use vector search to find relevant entry points (nodes) in a property graph and then traverse complex relationships.

For more information, see the Spanner vector search overview in the Spanner documentation.

Perform K-nearest neighbors search

Spanner Omni supports K-nearest neighbors (KNN) search using built-in distance functions. You can provide a vector embedding as an input parameter to find the nearest vectors in N-dimensional space.

The following distance functions are available:

COSINE_DISTANCE(): Measures the cosine of the angle between two vectors
EUCLIDEAN_DISTANCE(): Measures the shortest straight-line distance between two vectors
DOT_PRODUCT(): Calculates the cosine of the angle multiplied by the product of vector magnitudes (ideal for normalized data)

For more information, see Perform vector similarity search by finding the K-nearest neighbors in the Spanner documentation.

Choose the best vector distance function

Selecting the appropriate distance function depends on your data and the model used to generate embeddings.

Function	Description	Relationship to increasing similarity
Dot product	Calculates the cosine of angle multiplied by the product of corresponding vector magnitudes.	Increases
Cosine distance	Measures the cosine of the angle between two vectors (1 - cosine similarity).	Decreases
Euclidean distance	Measures the straight line distance between two vectors.	Decreases

If your embeddings are normalized (magnitude = 1.0), DOT_PRODUCT() is typically an efficient choice. For non-normalized data, experiment with COSINE_DISTANCE() or EUCLIDEAN_DISTANCE() to determine which produces better results for your use case.

For more information, see Choose among vector distance functions in the Spanner documentation.

Approximate nearest neighbors (ANN)

ANN search is designed for very large datasets where exact KNN search becomes too slow or expensive. It uses a vector index to provide fast results with a small tradeoff in recall.

Approximate nearest neighbor (ANN) search in Spanner Omni supports datasets of up to 1 million vectors for vectors up to 128 dimensions in length. If your vectors have more dimensions, then the supported number of vectors decreases proportionately.

Perform ANN search with vector indexes

To perform an ANN search, you use approximate distance functions such as APPROX_COSINE_DISTANCE(), APPROX_EUCLIDEAN_DISTANCE(), or APPROX_DOT_PRODUCT(). These functions require:

An existing vector index on the embedding column.
An ORDER BY clause using the approximate distance function.
A LIMIT clause to specify the number of results.

For more information, see Find approximate nearest neighbors (ANN) and query vector embeddings in the Spanner documentation.

Create and manage vector indexes

When creating a vector index, you must specify the vector_length of your embedding column and can use the STORING clause to include additional columns for faster filtering.

The following is an example of how to create a vector index:

CREATE VECTOR INDEX INDEX_NAME
  ON TABLE_NAME(EMBEDDING_COLUMN)
  OPTIONS (distance_type = 'DISTANCE_TYPE', tree_depth = 2, num_leaves = 1000);

For more information, see Create and manage vector indexes in the Spanner documentation.

Vector indexing best practices

To maintain high search performance and recall:

Tune index options: Adjust num_leaves and num_leaves_to_search based on your data size and performance requirements.
Rebuild periodically: Rebuild your index if the distribution of your vectors changes significantly over time.
Use filtering effectively: Store frequently filtered columns in the index to improve search efficiency.

For more information, see Vector indexing best practices in the Spanner documentation.