You can generate and manage vector embeddings for an entire table column, which gives you a scalable solution for large-scale embedding generation. This solution is particularly useful for facilitating semantic search and Retrieval Augmented Generation (RAG) on text content. Common scenarios include the following:
- Initial embedding generation after table creation
- Generating embeddings after a large data import
- Refreshing embeddings after significant data changes
- Maintaining embeddings incrementally
Before you begin
Before you can generate and manage vector embeddings for large tables, do the following:
- Finish the initial setup described in Generate text embeddings.
- Verify that AUTOCOMMIT is set to ON in the PostgreSQL client that you use.
- Confirm that the embedding model that you use has sufficient quota for auto embedding generation. Insufficient quota can result in a slow or failed auto embedding operation. For example, here are the limits for Vertex AI embedding models:
- Text embedding limits: each request can have up to 250 input texts, generating one embedding per input text, and 20,000 tokens per request. Only the first 2,048 tokens in each input text are used to compute the embeddings.
- Requests per minute (RPM) limits by base model:
  - base_model: text-embedding: 1500
  - base_model: gemini-embedding: 100000
- Gemini Embedding model token limit: unlike other embedding models, which are primarily limited by RPM quotas, the Gemini Embedding model series is limited to 5,000,000 tokens per minute per project.
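If you use psql, you can confirm the autocommit setting with the client's built-in AUTOCOMMIT variable. This check is psql-specific; other PostgreSQL clients expose the setting differently:

```sql
-- psql only: show the current AUTOCOMMIT setting
\echo :AUTOCOMMIT

-- psql only: enable autocommit if it is off
\set AUTOCOMMIT on
```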
Initialize embeddings for a table
The functions for managing auto embeddings are available in the ai schema. This schema provides an interface for the latest AI features in AlloyDB.
Use the ai.initialize_embeddings() SQL function to generate embeddings for a table's content column. This is a blocking call:
- If the function returns success, the embedding generation is complete.
- The function automatically attempts to recover from transient issues like model quota errors. It returns a failure only if these recovery attempts are unsuccessful. For persistent problems, such as a misconfigured batch_size that causes the request to exceed size limits, or if the operation was canceled manually, you must manually reissue the call.
This function supports Google-provided models—for example, Vertex AI's text-embedding-005—as well as custom models that you registered.
Before you generate embeddings, create a column content_embeddings in your table. This column typically has a vector(DIMENSION) type and a DEFAULT NULL value.
ALTER TABLE user_reviews ADD COLUMN IF NOT EXISTS content_embeddings vector(768) DEFAULT NULL;
Perform batch generation
By default, AlloyDB uses batching to generate embeddings for multiple text inputs in a single request, which improves efficiency. If you don't provide a specific batch size, AlloyDB applies an automatically determined default value.
Hint batch size
The batch_size parameter in ai.initialize_embeddings lets you guide AlloyDB's query optimizer by suggesting a preferred batch size for directly supported models. AlloyDB might dynamically reduce this size based on model limits or quotas, but your hint helps influence the query execution plan.
CALL ai.initialize_embeddings(
model_id => 'text-embedding-005',
table_name => 'user_reviews',
content_column => 'content',
embedding_column => 'content_embeddings',
batch_size => 50
);
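After the embeddings are populated, you can run a semantic search over them. The following sketch assumes the embedding() function from the initial model endpoint setup and the pgvector cosine distance operator (<=>); adjust the function name and model ID to match your setup:

```sql
-- Find the five reviews most semantically similar to the query text.
-- Assumes embedding() is available from the initial setup and
-- content_embeddings is a pgvector column.
SELECT id, content
FROM user_reviews
ORDER BY content_embeddings <=> embedding('text-embedding-005', 'warranty claim experience')::vector
LIMIT 5;
```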
Use a custom embedding model with batch support
If you want to use a custom or externally supported model that supports batching, define the batch transform functions and specify them as model_batch_in_transform_fn and model_batch_out_transform_fn when you create a model. You can also specify a batch_size in the initialize_embeddings call. For models that support batching, we recommend that you use a batch_size greater than 1 for better performance.
Define your custom model's input, output, and batch transform functions.
-- Scalar transform functions
CREATE OR REPLACE FUNCTION acme_text_input_transform(model_id TEXT, input TEXT) RETURNS JSON;
CREATE OR REPLACE FUNCTION acme_text_output_transform(model_id TEXT, model_output JSON) RETURNS real[];
CREATE OR REPLACE FUNCTION acme_generate_headers(model_id TEXT, input TEXT) RETURNS JSON;

-- Batch transform functions
CREATE OR REPLACE FUNCTION acme_text_batch_input_transform(model_id TEXT, input TEXT[]) RETURNS JSON;
CREATE OR REPLACE FUNCTION acme_text_batch_output_transform(model_id TEXT, model_output JSON) RETURNS real[][];

To create your model, specify the batch transform functions.
CALL ai.create_model(
    model_id => 'custom-embedding-model',
    model_request_url => 'https://acme.com/models/text/embeddings/v1',
    model_type => 'text_embedding',
    model_in_transform_fn => 'acme_text_input_transform',
    model_out_transform_fn => 'acme_text_output_transform',
    generate_headers_fn => 'acme_generate_headers',
    model_batch_in_transform_fn => 'acme_text_batch_input_transform',
    model_batch_out_transform_fn => 'acme_text_batch_output_transform'
);

Invoke the embedding generation for your custom model.
CALL ai.initialize_embeddings(
    model_id => 'custom-embedding-model',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 10
);
You can also use the auto embedding feature with custom models that don't natively support batching. To do this, you must still define the batch transform functions model_batch_in_transform_fn and model_batch_out_transform_fn. For a non-batching model, define these functions to process a single input at a time from the input array. When you call ai.initialize_embeddings for this model, set the batch_size to 1.
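For example, a non-batching model can reuse its scalar transforms inside the batch wrappers. The function bodies below are a hypothetical sketch (the acme_* names and the single-element handling are assumptions, not a documented pattern):

```sql
-- Hypothetical batch wrappers for a model without native batch support.
-- With batch_size => 1, the input array always contains exactly one element.
CREATE OR REPLACE FUNCTION acme_single_batch_input_transform(model_id TEXT, input TEXT[])
RETURNS JSON AS $$
  SELECT acme_text_input_transform(model_id, input[1]);
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION acme_single_batch_output_transform(model_id TEXT, model_output JSON)
RETURNS real[][] AS $$
  SELECT ARRAY[acme_text_output_transform(model_id, model_output)];
$$ LANGUAGE sql;

-- After registering the model with these wrappers through ai.create_model,
-- initialize embeddings with a batch size of 1.
CALL ai.initialize_embeddings(
    model_id => 'custom-embedding-model',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    batch_size => 1
);
```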
Refresh embeddings incrementally
When you refresh an embedding, it is regenerated based on the latest value in the input content column.
To give you control over consistency and performance, AlloyDB supports various modes for incremental embedding refreshes. You can select a mode using the incremental_refresh_mode enum argument in ai.initialize_embeddings(). The following is a list of possible modes:
- transactional: embeddings refresh as part of the transaction that updates the content column. This process, which often uses a mechanism similar to a database trigger to automatically generate embeddings when the content column is updated, can introduce overhead and slow down update operations. The introduced overhead is a trade-off for maintaining transactional semantics and ensuring that embeddings are in sync with the content. This mode relies on the scalar transform functions of your model, so you must define model_in_transform_fn and model_out_transform_fn when you create the model. To use the transactional mode, you must have the owner role on the table.

  CALL ai.initialize_embeddings(
      model_id => 'text-embedding-005',
      table_name => 'user_reviews',
      content_column => 'content',
      embedding_column => 'content_embeddings',
      batch_size => 10,
      incremental_refresh_mode => 'transactional'
  );

- none: This is the default mode. In this mode, AlloyDB doesn't update embeddings automatically. There is no tracking for incremental changes, so calling the ai.refresh_embeddings() function regenerates embeddings for the entire table. This mode gives you full control over when embeddings are refreshed.
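In the default none mode, you pair the initial generation with full refreshes that you schedule yourself. A minimal sketch:

```sql
-- Explicitly select the default mode (equivalent to omitting the argument).
CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings',
    incremental_refresh_mode => 'none'
);

-- Later, regenerate embeddings for the entire table on your own schedule.
CALL ai.refresh_embeddings(
    table_name => 'user_reviews',
    embedding_column => 'content_embeddings'
);
```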
Refresh all embeddings for a table
After you successfully run ai.initialize_embeddings() for a table, you can use the ai.refresh_embeddings() function to regenerate the embeddings. You can use a refresh operation to update embeddings for rows that are modified concurrently during the initial initialize_embeddings call or to perform a periodic full refresh.
The refresh function reuses the settings from the initial call, so you only need to specify the table and embedding column. You can also provide an optional batch_size to override the default value.
CALL ai.refresh_embeddings(
table_name => 'user_reviews',
embedding_column => 'content_embeddings',
batch_size => 50 -- Optional override
);
Modify table data during embedding generation
While ai.initialize_embeddings() is a blocking call for the session that it runs in, other connections can continue to work with the table. The embedding generation process updates rows in batches, which involves standard row-level locking. This means that concurrent DML operations such as UPDATE or DELETE are only briefly blocked if they attempt to modify the same rows that the embedding process is updating. Queries from other connections are not blocked. Note that initialize_embeddings skips rows that are concurrently modified. If incremental_refresh_mode is none, the embeddings for these modified rows aren't updated until a subsequent refresh is called.
-- connection1 (starts embedding generation)
CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'user_reviews',
    content_column => 'content',
    embedding_column => 'content_embeddings'
);
-- connection2 (performs DMLs/queries without blocking)
INSERT INTO user_reviews(id, review_time, is_edited, content)
VALUES (48290, now(), false, 'I really liked the product functionality, but wish it came in orange color');
UPDATE user_reviews
SET is_edited = TRUE, content = 'Changing to 5 star. My issue is resolved by the support'
WHERE id = 700;
If you cancel initialize_embeddings using pg_cancel_backend(), or if initialize_embeddings fails due to an internal error, it returns a failure status. Any partial embedding generation that was completed isn't rolled back. To recover from the failure and complete the embedding generation, first clean up the configuration using the ai.drop_embedding_config() function, and then reissue the ai.initialize_embeddings() call.
To refresh embeddings for concurrently modified rows, call ai.refresh_embeddings after the ai.initialize_embeddings call completes. This refresh call regenerates the embeddings for the entire table.
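To see how many rows still lack embeddings, for example rows skipped because of concurrent modification, you can count the NULL values in the embedding column. This check assumes the column was created with DEFAULT NULL, as shown earlier:

```sql
-- Rows whose embeddings haven't been generated or refreshed yet.
SELECT count(*) AS missing_embeddings
FROM user_reviews
WHERE content_embeddings IS NULL;
```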
Delete embedding generation settings
If you need to remove the embedding generation settings for a specific table and embedding column combination, use the ai.drop_embedding_config() function. This function can be useful for cleanup or when you reconfigure embedding management for a column.
CALL ai.drop_embedding_config(
    table_name => 'user_reviews',
    embedding_column => 'content_embeddings'
);
Examples of auto embedding generation
This section provides examples of generating auto embeddings using registered model endpoints.
OpenAI embedding model
To generate embeddings using the registered text-embedding-3-small model endpoint provided by OpenAI, run the following statement:
CALL ai.initialize_embeddings(
model_id => 'text-embedding-3-small',
table_name => 'user_reviews',
content_column => 'content',
embedding_column => 'content_embeddings'
);
Custom embedding models
For your own or externally supported models, you must define input and output transform functions and register them with ai.create_model. If you plan to use the auto embedding feature, you must specify both scalar transform functions—for example, acme_text_input_transform, acme_text_output_transform—and batch transform functions—for example, acme_text_batch_input_transform, acme_text_batch_output_transform.
What's next
- Run vector similarity searches.
- Learn how to build a smart shopping assistant with AlloyDB, pgvector, and model endpoint management.
- Create indexes and query vectors.
- Learn an example embedding workflow.