You can generate and manage vector embeddings for an entire table column, which gives you a scalable solution for large-scale embedding generation. This solution is particularly useful for facilitating semantic search and Retrieval Augmented Generation (RAG) on text content. Common scenarios include the following:
- Initial embedding generation after table creation
- Generating embeddings after a large data import
- Refreshing embeddings after significant data changes
- Maintaining embeddings incrementally
Before you begin
Before you can generate and manage vector embeddings for large tables, do the following:
- Finish the initial setup described in Generate text embeddings.
- Verify that AUTOCOMMIT is set to ON in the PostgreSQL client that you use.
- Confirm that the embedding model that you use has sufficient quota for auto embedding generation. Insufficient quota can result in a slow or failed auto embedding operation. For example, here are the limits for Vertex AI embedding models:
- Text embedding limits: each request can have up to 250 input texts, generating one embedding per input text, and 20,000 tokens per request. Only the first 2,048 tokens in each input text are used to compute the embeddings.
- Requests per minute (RPM) limits by base model:
  - base_model: text-embedding: 1500
  - base_model: gemini-embedding: 100000
- Gemini Embedding model token limit: unlike other embedding models, which are primarily limited by RPM quotas, the Gemini Embedding model series is limited to 5,000,000 tokens per minute per project.
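If you use psql, you can confirm the autocommit setting with the client's built-in AUTOCOMMIT variable. This check is psql-specific; other PostgreSQL clients expose the setting differently:

```sql
-- psql only: show the current AUTOCOMMIT setting
\echo :AUTOCOMMIT

-- psql only: enable autocommit if it is off
\set AUTOCOMMIT on
```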
Initialize embeddings for a table
The functions for managing auto embeddings are available in the ai schema. This schema provides an interface for the latest AI features in AlloyDB.
Use the ai.initialize_embeddings() SQL function to generate embeddings for a table's content column. This is a blocking call:
- If the function returns success, the embedding generation is complete.
- The function automatically attempts to recover from transient issues like model quota errors. It returns a failure only if these recovery attempts are unsuccessful. For persistent problems, such as a misconfigured batch_size that causes the request to exceed size limits, or if the operation was canceled manually, you must manually reissue the call.
This function supports Google-provided models—for example, Vertex AI's text-embedding-005—as well as custom models that you registered.
Before you generate embeddings, create a column content_embeddings in your table. This column typically has a vector(DIMENSION) type and a DEFAULT NULL value.
ALTER TABLE user_reviews ADD COLUMN IF NOT EXISTS content_embeddings vector(768) DEFAULT NULL;
Perform batch generation
By default, AlloyDB uses batching to generate embeddings for multiple text inputs in a single request, which improves efficiency. If you don't provide a specific batch size, AlloyDB applies an automatically determined default value.
Hint batch size
The batch_size parameter in ai.initialize_embeddings lets you guide AlloyDB's query optimizer by suggesting a preferred batch size for directly supported models. AlloyDB might dynamically reduce this size based on model limits or quotas, but your hint helps influence the query execution plan.
CALL ai.initialize_embeddings(
model_id => 'text-embedding-005',
table_name => 'user_reviews',
content_column => 'content',
embedding_column => 'content_embeddings',
batch_size => 50
);
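After the embeddings are populated, you can run a semantic search over them. The following sketch assumes the embedding() function from the initial model endpoint setup and the pgvector cosine distance operator (<=>); adjust the function name and model ID to match your setup:

```sql
-- Find the five reviews most semantically similar to the query text.
-- Assumes embedding() is available from the initial setup and
-- content_embeddings is a pgvector column.
SELECT id, content
FROM user_reviews
ORDER BY content_embeddings <=> embedding('text-embedding-005', 'warranty claim experience')::vector
LIMIT 5;
```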
Use a custom embedding model with batch support
If you want to use a custom or externally supported model that supports batching, define the batch transform functions and specify them as model_batch_in_transform_fn and model_batch_out_transform_fn when you create a model. You can also specify a batch_size in the initialize_embeddings call. For models that support batching, we recommend that you use a batch_size greater than 1 for better performance.
Define your custom model's input, output, and batch transform functions.
-- Scalar transform functions
CREATE OR REPLACE FUNCTION acme_text_input_transform(model_id TEXT, input TEXT) RETURNS JSON;
CREATE OR REPLACE FUNCTION acme_text_output_transform(model_id TEXT, model_output JSON) RETURNS real[];
CREATE OR REPLACE FUNCTION acme_generate_headers(model_id TEXT, input TEXT) RETURNS JSON;

-- Batch transform functions
CREATE OR REPLACE FUNCTION acme_text_batch_input_transform(model_id TEXT, input TEXT[]) RETURNS JSON;
CREATE OR REPLACE FUNCTION acme_text_batch_output_transform(model_id TEXT, model_output JSON) RETURNS real[][];

To create your model, specify the batch transform functions.
CALL ai.create_model(
    model_id => 'custom-embedding-model',
    model_request_url => 'https://acme.com/models/text/embeddings/v1',
    model_type => 'text_embedding',
    model_in_transform_fn => 'acme_text_input_transform',
    model_out_transform_fn => 'acme_text_output_transform',
    generate_headers_fn => 'acme_generate_headers',
    model_batch_in_transform_fn => 'acme_text_batch_input_transform',
    model_batch_out_transform_fn => 'acme_text_batch_output_transform'
);

Invoke the embedding generation for your custom model.
CALL ai.initialize_embeddings(
    model_id => 'custom-embedding-model',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 10
);
You can also use the auto embedding feature with custom models that don't natively support batching. To do this, you must still define the batch transform functions model_batch_in_transform_fn and model_batch_out_transform_fn. For a non-batching model, define these functions to process a single input at a time from the input array. When you call ai.initialize_embeddings for this model, set the batch_size to 1.
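For example, a non-batching model can reuse its scalar transforms inside the batch wrappers. The function bodies below are a hypothetical sketch (the acme_* names and the single-element handling are assumptions, not a documented pattern):

```sql
-- Hypothetical batch wrappers for a model without native batch support.
-- With batch_size => 1, the input array always contains exactly one element.
CREATE OR REPLACE FUNCTION acme_single_batch_input_transform(model_id TEXT, input TEXT[])
RETURNS JSON AS $$
  SELECT acme_text_input_transform(model_id, input[1]);
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION acme_single_batch_output_transform(model_id TEXT, model_output JSON)
RETURNS real[][] AS $$
  SELECT ARRAY[acme_text_output_transform(model_id, model_output)];
$$ LANGUAGE sql;

-- After registering the model with these wrappers through ai.create_model,
-- initialize embeddings with a batch size of 1.
CALL ai.initialize_embeddings(
    model_id => 'custom-embedding-model',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 1
);
```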
Refresh embeddings incrementally
When you refresh an embedding, it is regenerated based on the latest value in the input content column.
To give you control over consistency and performance, AlloyDB supports various modes for incremental embedding refreshes. You can select a mode using the incremental_refresh_mode enum argument in ai.initialize_embeddings(). The following is a list of possible modes:
- transactional: embeddings refresh as part of the transaction that updates the content column. This process, which often uses a mechanism similar to a database trigger to automatically generate embeddings when the content column is updated, can introduce overhead and slow down update operations. The introduced overhead is a trade-off for maintaining transactional semantics and ensuring that embeddings are in sync with the content. This mode relies on the scalar transform functions of your model, so you must define model_in_transform_fn and model_out_transform_fn when you create the model. To use the transactional mode, you must have the owner role on the table.

  CALL ai.initialize_embeddings(
      model_id => 'text-embedding-005',
      table_name => 'user_reviews',
      content_column => 'content',
      embedding_column => 'content_embeddings',
      batch_size => 10,
      incremental_refresh_mode => 'transactional'
  );

- none: This is the default mode. In this mode, AlloyDB doesn't update embeddings automatically. There is no tracking for incremental changes, so calling the ai.refresh_embeddings() function regenerates embeddings for the entire table. This mode gives you full control over when embeddings are refreshed.
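In the default none mode, you pair the initial generation with full refreshes that you schedule yourself. A minimal sketch:

```sql
-- Explicitly select the default mode (equivalent to omitting the argument).
CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    incremental_refresh_mode => 'none'
);

-- Later, regenerate embeddings for the entire table on your own schedule.
CALL ai.refresh_embeddings(
    table_name => 'user_reviews',
    embedding_column => 'content_embeddings'
);
```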
Refresh all embeddings for a table
After you successfully run ai.initialize_embeddings() for a table, you can use the ai.refresh_embeddings() function to regenerate the embeddings. You can use a refresh operation to update embeddings for rows that are modified concurrently during the initial initialize_embeddings call or to perform a periodic full refresh.
The refresh function reuses the settings from the initial call, so you only need to specify the table and embedding column. You can also provide an optional batch_size to override the default value.
CALL ai.refresh_embeddings(
table_name => 'user_reviews',
embedding_column => 'content_embeddings',
batch_size => 50 -- Optional override
);
Modify table data during embedding generation
While ai.initialize_embeddings() is a blocking call for the session that it runs in, other connections can continue to work with the table. The embedding generation process updates rows in batches, which involves standard row-level locking. This means that concurrent DML operations such as UPDATE or DELETE are only briefly blocked if they attempt to modify the same rows that the embedding process is updating. Queries from other connections are not blocked. Note that initialize_embeddings skips rows that are concurrently modified. If incremental_refresh_mode is none, the embeddings for these modified rows aren't updated until a subsequent refresh is called.
-- connection1 (starts embedding generation)
CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings'
);
-- connection2 (performs DMLs/queries without blocking)
INSERT INTO user_reviews(id, review_time, is_edited, content)
VALUES (48290, now(), false, 'I really liked the product functionality, but wish it came in orange color');
UPDATE user_reviews
SET is_edited = TRUE, content = 'Changing to 5 star. My issue is resolved by the support'
WHERE id = 700;
If you cancel initialize_embeddings using pg_cancel_backend(), or if initialize_embeddings fails due to an internal error, it returns a failure status. Any partial embedding generation that was completed isn't rolled back. To recover from the failure and complete the embedding generation, first clean up the configuration using the ai.drop_embedding_config() function, and then reissue the ai.initialize_embeddings() call.
To refresh embeddings for concurrently modified rows, call ai.refresh_embeddings after the ai.initialize_embeddings call completes. This refresh call regenerates the embeddings for the entire table.
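To see how many rows still lack embeddings, for example rows skipped because of concurrent modification, you can count the NULL values in the embedding column. This check assumes the column was created with DEFAULT NULL, as shown earlier:

```sql
-- Rows whose embeddings haven't been generated or refreshed yet.
SELECT count(*) AS missing_embeddings
FROM user_reviews
WHERE content_embeddings IS NULL;
```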
Delete embedding generation settings
If you need to remove the embedding generation settings for a specific table and embedding column combination, use the ai.drop_embedding_config() function. This function can be useful for cleanup or when you reconfigure embedding management for a column.
CALL ai.drop_embedding_config(
    table_name => 'user_reviews',
    embedding_column => 'content_embeddings'
);
Examples of auto embedding generation
This section provides examples of generating auto embeddings using registered model endpoints.
OpenAI embedding model
To generate embeddings using the registered text-embedding-3-small model endpoint provided by OpenAI, run the following statement:
CALL ai.initialize_embeddings(
model_id => 'text-embedding-3-small',
table_name => 'user_reviews',
content_column => 'content',
embedding_column => 'content_embeddings'
);
Custom embedding models
For your own or externally supported models, you must define input and output transform functions and register them with ai.create_model. If you plan to use the auto embedding feature, you must specify both scalar transform functions—for example, acme_text_input_transform, acme_text_output_transform—and batch transform functions—for example, acme_text_batch_input_transform, acme_text_batch_output_transform.
What's next
- Run vector similarity searches.
- Learn how to build a smart shopping assistant with AlloyDB, pgvector, and model endpoint management.
- Create indexes and query vectors.
- Learn an example embedding workflow.