Perform vector search in AlloyDB Omni

Select a documentation version:

This tutorial describes how to set up and perform a vector search in AlloyDB Omni. You run queries using exact nearest neighbor search (KNN) and approximate nearest neighbor (ANN)-based ScaNN index. You also learn how to run filtered vector search queries efficiently using the ScaNN index.

Objectives

  • Install AlloyDB AI in AlloyDB Omni.
  • Connect to your database and install required extensions.
  • Create a product and product inventory table.
  • Insert data to the product and product inventory tables and perform a basic vector search.
  • Create a ScaNN index on the products table.
  • Perform a vector search.
  • Perform a complex vector search with a filter and a join.

Costs

In this document, you use the following billable component of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Prerequisites

Complete the following prerequisites before performing a vector search.

Install AlloyDB AI in AlloyDB Omni based on your computing environment

Based on the computing environment that you're using, complete the instructions in Install AlloyDB AI in AlloyDB Omni to install AlloyDB Omni.

Connect to your database using psql

Connect to your database using psql by selecting the computing environment that you're using:

Docker

  docker exec -it my-omni-1 psql -U postgres

Podman

  podman exec -it my-omni-1 psql -U postgres

Install required extensions

Run the following query to install the vector, alloydb_scann, and the google_ml_integration extensions:

  CREATE EXTENSION IF NOT EXISTS vector;
  CREATE EXTENSION IF NOT EXISTS alloydb_scann;
  CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE;

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Insert product and product inventory data and perform a basic vector search

The product_inventory and product tables are used in this tutorial to run complex vector search queries.

  1. Run the following statement to create a product table that does the following:

    • Stores basic product information.
    • Includes an embedding vector column that computes and stores an embedding vector for a product description of each product.
      CREATE TABLE product (
        id INT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        description TEXT,
        category VARCHAR(255),
        color VARCHAR(255),
        embedding vector(768) GENERATED ALWAYS AS (embedding('text-embedding-004', description)) STORED
      );
    
  2. Run the following query to create a product_inventory table that stores information about available inventory and corresponding prices.

    CREATE TABLE product_inventory (
      id INT PRIMARY KEY,
      product_id INT REFERENCES product(id),
      inventory INT,
      price DECIMAL(10,2)
    );
    
  3. Run the following query to insert product data into the product table:

    INSERT INTO product (id, name, description,category, color) VALUES
    (1, 'Stuffed Elephant', 'Soft plush elephant with floppy ears.', 'Plush Toys', 'Gray'),
    (2, 'Remote Control Airplane', 'Easy-to-fly remote control airplane.', 'Vehicles', 'Red'),
    (3, 'Wooden Train Set', 'Classic wooden train set with tracks and trains.', 'Vehicles', 'Multicolor'),
    (4, 'Kids Tool Set', 'Toy tool set with realistic tools.', 'Pretend Play', 'Multicolor'),
    (5, 'Play Food Set', 'Set of realistic play food items.', 'Pretend Play', 'Multicolor'),
    (6, 'Magnetic Tiles', 'Set of colorful magnetic tiles for building.', 'Construction Toys', 'Multicolor'),
    (7, 'Kids Microscope', 'Microscope for kids with different magnification levels.', 'Educational Toys', 'White'),
    (8, 'Telescope for Kids', 'Telescope designed for kids to explore the night sky.', 'Educational Toys', 'Blue'),
    (9, 'Coding Robot', 'Robot that teaches kids basic coding concepts.', 'Educational Toys', 'White'),
    (10, 'Kids Camera', 'Durable camera for kids to take pictures and videos.', 'Electronics', 'Pink'),
    (11, 'Walkie Talkies', 'Set of walkie talkies for kids to communicate.', 'Electronics', 'Blue'),
    (12, 'Karaoke Machine', 'Karaoke machine with built-in microphone and speaker.', 'Electronics', 'Black'),
    (13, 'Kids Drum Set', 'Drum set designed for kids with adjustable height.', 'Musical Instruments', 'Blue'),
    (14, 'Kids Guitar', 'Acoustic guitar for kids with nylon strings.', 'Musical Instruments', 'Brown'),
    (15, 'Kids Keyboard', 'Electronic keyboard with different instrument sounds.', 'Musical Instruments', 'Black'),
    (16, 'Art Easel', 'Double-sided art easel with chalkboard and whiteboard.', 'Arts & Crafts', 'White'),
    (17, 'Finger Paints', 'Set of non-toxic finger paints for kids.', 'Arts & Crafts', 'Multicolor'),
    (18, 'Modeling Clay', 'Set of colorful modeling clay.', 'Arts & Crafts', 'Multicolor'),
    (19, 'Watercolor Paint Set', 'Watercolor paint set with brushes and palette.', 'Arts & Crafts', 'Multicolor'),
    (20, 'Beading Kit', 'Kit for making bracelets and necklaces with beads.', 'Arts & Crafts', 'Multicolor'),
    (21, '3D Puzzle', '3D puzzle of a famous landmark.', 'Puzzles', 'Multicolor'),
    (22, 'Race Car Track Set', 'Race car track set with cars and accessories.', 'Vehicles', 'Multicolor'),
    (23, 'RC Monster Truck', 'Remote control monster truck with oversized tires.', 'Vehicles', 'Green'),
    (24, 'Train Track Expansion Set', 'Expansion set for wooden train tracks.', 'Vehicles', 'Multicolor');
    
  4. Optional: To verify that the data is inserted in the product table, run the following query:

    SELECT * FROM product;
    
  5. Run the following query to insert inventory data into the product_inventory table:

    INSERT INTO product_inventory (id, product_id, inventory, price) VALUES
    (1, 1, 9, 13.09),
    (2, 2, 40, 79.82),
    (3, 3, 34, 52.49),
    (4, 4, 9, 12.03),
    (5, 5, 36, 71.29),
    (6, 6, 10, 51.49),
    (7, 7, 7, 37.35),
    (8, 8, 6, 10.87),
    (9, 9, 7, 42.47),
    (10, 10, 3, 24.35),
    (11, 11, 4, 10.20),
    (12, 12, 47, 74.57),
    (13, 13, 5, 28.54),
    (14, 14, 11, 25.58),
    (15, 15, 21, 69.84),
    (16, 16, 6, 47.73),
    (17, 17, 26, 81.00),
    (18, 18, 11, 91.60),
    (19, 19, 8, 78.53),
    (20, 20, 43, 84.33),
    (21, 21, 46, 90.01),
    (22, 22, 6, 49.82),
    (23, 23, 37, 50.20),
    (24, 24, 27, 99.27);
    
  6. Run the following vector search query to search for products that are similar to the word music. Even though the word music isn't explicitly mentioned in the product description, the result shows products that are relevant to the query:

    SELECT * FROM product
    ORDER BY embedding <=> embedding('text-embedding-005', 'music')::vector
    LIMIT 3;
    

    If you perform a basic vector search without creating an index, AlloyDB AI uses KNN, which provides efficient recall; however, at scale, using KNN might impact performance. For better query performance, we recommend that you use the ScaNN index for ANN search, which provides high recall with low latencies.

    If you don't create an index, then by default AlloyDB Omni uses KNN.

Create a ScaNN index on the products table

Run the following query to create a product_index ScaNN index on the product table:

  CREATE INDEX product_index ON product
  USING scann (embedding cosine)
  WITH (num_leaves=5);

The num_leaves parameter indicates the number of leaf nodes that the tree-based index builds the index with. For more information about how to tune this parameter, see Tune vector query performance.

Run the following vector search query that tries to find products that are similar to the natural language query music. Even though the word music isn't included in the product description, the result shows products that are relevant to the query:

SET LOCAL scann.num_leaves_to_search = 2;

SELECT * FROM product
ORDER BY embedding <=> embedding('text-embedding-005', 'music')::vector
  LIMIT 3;

The scann.num_leaves_to_search query parameter controls the number of leaf nodes that are searched during a similarity search. The num_leaves and scann.num_leaves_to_search parameter values help to balance query performance and recall.

Run the following complex vector search query, which returns relevant results that satisfy the query conditions, even with filters:

SET LOCAL scann.num_leaves_to_search = 2;

SELECT * FROM product p
JOIN product_inventory pi ON p.id = pi.product_id
WHERE pi.price < 80.00
ORDER BY embedding <=> embedding('text-embedding-005', 'music')::vector
LIMIT 3;

Clean up

Docker

  docker container stop my-omni-1
  docker container rm my-omni-1

Podman

  podman container stop my-omni-1
  podman container rm my-omni-1

What's next