This document shows you how to tune your indexes to achieve faster query performance and better recall.
Tune a ScaNN index
The ScaNN index uses tree-quantization based indexing. In tree-quantization techniques, indexes learn a search tree together with a quantization (or hashing) function. When you run a query, the search tree is used to prune the search space, while quantization is used to compress the index size. This pruning speeds up the scoring of the similarity (that is, distance) between the query vector and the database vectors.
To achieve both a high query-per-second rate (QPS)
and a high recall with your nearest-neighbor queries, you must partition
the tree of your ScaNN index in a way that is most appropriate to your data
and your queries.
Before you build a ScaNN index, complete the following:
- Make sure that a table with your data is already created.
- Make sure that the values you set for the maintenance_work_mem and shared_buffers flags are less than the total machine memory, to avoid issues while generating the index.
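As a quick check before you build the index, you can inspect the current values of both flags from a SQL session. This is a minimal sketch that uses only the standard PostgreSQL SHOW command; comparing the results against your machine's total memory is left to you.

```sql
-- Memory available to maintenance operations such as CREATE INDEX.
SHOW maintenance_work_mem;

-- Size of the shared buffer cache.
SHOW shared_buffers;
```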
Tuning parameters
The following index parameters and database flags are used together to find the right balance of recall and QPS. All the parameters apply to both ScaNN index types.
| Tuning parameter | Description | Parameter type |
|---|---|---|
| num_leaves | The number of partitions to apply to this index. The number of partitions that you apply when creating an index affects index performance. Increasing the number of partitions for a set number of vectors creates a more fine-grained index, which improves recall and query performance, at the cost of longer index creation times. Because three-level trees build faster than two-level trees, you can increase the num_leaves value when creating a three-level tree index to achieve better performance. | Index creation |
| quantizer | The type of quantizer to use for the K-means tree. The default value is SQ8, which gives better query performance. Set it to FLAT for better recall. | Index creation |
| enable_pca | Enables Principal Component Analysis (PCA), a dimension reduction technique that automatically reduces the size of the embedding when possible. This option is enabled by default. Set it to false if you observe deterioration in recall. | Index creation |
| scann.num_leaves_to_search | Database flag that controls the trade-off between recall and QPS. The default value is 1% of the value set in num_leaves. A higher value gives better recall but lower QPS, and a lower value gives the reverse. | Query runtime |
| scann.max_top_neighbors_buffer_size | Database flag that specifies the size of the cache used to improve performance for filtered queries by scoring or ranking the scanned candidate neighbors in memory instead of on disk. The default value is 20000. A higher value gives better QPS for filtered queries but uses more memory, and a lower value gives the reverse. | Query runtime |
| scann.pre_reordering_num_neighbors | Database flag that, when set, specifies the number of candidate neighbors to consider during the reordering stage, after the initial search identifies a set of candidates. Set this to a value higher than the number of neighbors that you want the query to return. Higher values give better recall but lower QPS. | Query runtime |
| max_num_levels | The maximum number of levels of the K-means clustering tree. Use 1 for a two-level tree and 2 for a three-level tree. | Index creation |
Tune a ScaNN index
Consider the following examples for two-level and three-level ScaNN indexes that show how tuning parameters are set:
Two-level index
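-- Query-time flags. SET LOCAL takes effect only inside a transaction block.
SET LOCAL scann.num_leaves_to_search = 1;
SET LOCAL scann.pre_reordering_num_neighbors = 50;
-- num_leaves = power(1000000, 1/2) = 1000 for a table with 1,000,000 rows.
CREATE INDEX my_scann_index ON my_table
USING scann (vector_column cosine)
WITH (num_leaves = 1000);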
Three-level index
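-- Query-time flags. SET LOCAL takes effect only inside a transaction block.
SET LOCAL scann.num_leaves_to_search = 10;
SET LOCAL scann.pre_reordering_num_neighbors = 50;
-- num_leaves = power(1000000, 2/3) = 10000 for a table with 1,000,000 rows.
CREATE INDEX my_scann_index ON my_table
USING scann (vector_column cosine)
WITH (num_leaves = 10000, max_num_levels = 2);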
Any insert or update operation on a table where a ScaNN index is already
generated impacts how the learned tree optimizes the index. If
your table is prone to frequent updates or insertions, then we recommend
periodically reindexing the existing ScaNN index to improve the recall accuracy.
You can monitor index metrics to determine the number of mutations made since the index was built, and then reindex accordingly. For more information about metrics, see Vector index metrics.
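As a rough sketch of that workflow, you could review the metrics view and then rebuild the index. The pg_stat_ann_indexes view is the one described later in this document, REINDEX INDEX is the standard PostgreSQL command, and my_scann_index is a hypothetical index name. Which metric value should trigger a rebuild isn't specified here, so that decision is left to you.

```sql
-- Review index metrics, including mutations made since the index was built.
SELECT * FROM pg_stat_ann_indexes;

-- Rebuild the index from the current table contents.
-- Note: a plain REINDEX blocks writes to the table while it runs.
REINDEX INDEX my_scann_index;
```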
Best practices for tuning
Based on the type of ScaNN index you plan to use, the recommendations for tuning your index vary. This section provides recommendations about how to tune index parameters for optimal balance between recall and QPS.
Two-level tree index
To find suitable values of num_leaves and scann.num_leaves_to_search for your dataset, follow these steps:
1. Create the ScaNN index with num_leaves set to the square root of the indexed table's row count.
2. Run your test queries, increasing the value of scann.num_leaves_to_search until you achieve your target recall range, for example, 95%. For more information about analyzing your queries, see Analyze your queries.
3. Take note of the ratio between scann.num_leaves_to_search and num_leaves, which is used in subsequent steps. This ratio provides an approximation for your dataset that helps you achieve your target recall.
   If you are working with high-dimensional vectors (500 dimensions or higher) and want to improve recall, then try tuning the value of scann.pre_reordering_num_neighbors. As a starting point, set the value to 100 * sqrt(K), where K is the limit that you set in your query.
4. If your QPS is too low after your queries achieve the target recall, then follow these steps:
   1. Recreate the index, increasing the value of num_leaves and scann.num_leaves_to_search according to the following guidance:
      - Set num_leaves to a larger factor of the square root of your row count. For example, if the index has num_leaves set to the square root of your row count, try setting it to double the square root. If the value is already double, then try setting it to triple the square root.
      - Increase scann.num_leaves_to_search as needed to maintain its ratio with num_leaves, which you noted in Step 3.
      - Set num_leaves to a value less than or equal to the row count divided by 100.
   2. Run the test queries again. While you're running the test queries, experiment with reducing scann.num_leaves_to_search, finding a value that increases QPS while keeping your recall high. Try different values of scann.num_leaves_to_search without rebuilding the index.
5. Repeat Step 4 until both the QPS and the recall range have reached acceptable values.
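The starting values from these steps can be computed directly in SQL. The following is a minimal sketch that uses only standard PostgreSQL functions. The table name my_table is hypothetical, K = 10 is an assumed query limit, and the figures in the second query (target recall reached at 20 leaves searched out of 1000) are made-up values for illustration.

```sql
-- Starting points derived from the table's row count.
SELECT
  count(*)                     AS row_count,
  ceil(sqrt(count(*)))::bigint AS num_leaves_start,        -- Step 1: sqrt(rows)
  count(*) / 100               AS num_leaves_upper_bound,  -- Step 4: at most rows / 100
  ceil(100 * sqrt(10))::int    AS pre_reordering_start     -- 100 * sqrt(K), assuming K = 10
FROM my_table;

-- Step 4: after doubling num_leaves from 1000 to 2000, keep the ratio noted in Step 3.
SELECT round(2000 * (20.0 / 1000))::int AS num_leaves_to_search;  -- 40

-- Try the new search width in the current session without rebuilding the index.
SET scann.num_leaves_to_search = 40;
```

Because scann.num_leaves_to_search is a query-time flag, only a change to num_leaves requires recreating the index.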
Three-level tree index
In addition to the recommendations for the two-level tree ScaNN index, use the following guidance and the steps to tune the index:
- Increasing max_num_levels from 1 for a two-level tree to 2 for a three-level tree significantly reduces the time to create an index, but at the expense of recall accuracy. Set max_num_levels using the following recommendations:
  - Set the value to 2 if the number of vector rows exceeds 100 million.
  - Set the value to 1 if the number of vector rows is less than 10 million.
  - Set the value to either 1 or 2 if the number of vector rows is between 10 million and 100 million, based on the balance between index creation time and the recall accuracy that you need.
To find the optimal values of the num_leaves and max_num_levels index parameters, follow these steps:
1. Create the ScaNN index with one of the following num_leaves and max_num_levels combinations, based on your dataset:
   - More than 100 million vector rows: Set max_num_levels to 2 and num_leaves to power(rows, 2/3).
   - Fewer than 10 million vector rows: Set max_num_levels to 1 and num_leaves to sqrt(rows).
   - Between 10 million and 100 million vector rows: Start by setting max_num_levels to 1 and num_leaves to sqrt(rows).
2. Run your test queries. For more information about analyzing queries, see Analyze your queries.
3. If the index creation time is satisfactory, then retain the max_num_levels value, and experiment with the num_leaves value for optimal recall accuracy.
4. If you aren't satisfied with the index creation time, then do the following:
   - If the max_num_levels value is 1, then drop the index. Rebuild the index with max_num_levels set to 2. Run the queries and tune the num_leaves value for optimal recall accuracy.
   - If the max_num_levels value is 2, then drop the index. Rebuild the index with the same max_num_levels value and tune the num_leaves value for optimal recall accuracy.
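The first step can be sketched as a single query that suggests starting values from the row count. This is an illustration under assumptions, not a definitive rule: my_table is a hypothetical table name, the query uses only standard PostgreSQL functions, and for the range between 10 million and 100 million rows it picks the two-level starting point described above.

```sql
-- Suggest starting values for max_num_levels and num_leaves from the row count.
WITH stats AS (
  SELECT count(*)::float8 AS n FROM my_table
)
SELECT
  n::bigint AS row_count,
  CASE WHEN n > 100000000 THEN 2 ELSE 1 END AS max_num_levels,
  (CASE
     WHEN n > 100000000 THEN round(power(n, (2.0 / 3.0)::float8))  -- three-level: rows^(2/3)
     ELSE round(sqrt(n))                                           -- two-level: sqrt(rows)
   END)::bigint AS num_leaves
FROM stats;
```

Treat the result as a first guess only; the later steps still apply if index creation time or recall is unsatisfactory.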
Tune an IVF index
Tuning the values you set for the lists, ivf.probes, and the quantizer parameters might
help optimize your application's performance:
| Tuning parameter | Description | Parameter type |
|---|---|---|
| lists | The number of lists created during index building. As a starting point, set this value to (rows)/1000 for up to one million rows, and to sqrt(rows) for more than one million rows. | Index creation |
| quantizer | The type of quantizer to use for the K-means tree. The default value is SQ8, which gives better query performance. Set it to FLAT for better recall. | Index creation |
| ivf.probes | The number of nearest lists to explore during search. As a starting point, set this value to sqrt(lists). | Query runtime |
Consider the following example that shows an IVF index with the tuning parameters set:
-- Query-time flag. SET LOCAL takes effect only inside a transaction block.
SET LOCAL ivf.probes = 10;
CREATE INDEX my_ivf_index ON my_table
USING ivf (vector_column cosine)
WITH (lists = 100, quantizer = 'SQ8');
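The starting points from the table can be derived from the row count. The following is a minimal sketch that uses standard PostgreSQL functions; my_table is a hypothetical table name, and the lower bound of 1 is an added guard for very small tables rather than part of the guidance above.

```sql
-- Suggest starting values for lists and the probes flag.
WITH stats AS (
  SELECT count(*)::float8 AS n FROM my_table
),
suggestion AS (
  SELECT CASE
           WHEN n <= 1000000 THEN greatest(round(n / 1000), 1)  -- rows / 1000
           ELSE round(sqrt(n))                                  -- sqrt(rows)
         END AS lists
  FROM stats
)
SELECT
  lists::int                               AS lists,
  greatest(round(sqrt(lists)), 1)::int     AS probes  -- sqrt(lists)
FROM suggestion;
```

For a table with 100,000 rows, this yields lists = 100 and probes = 10, which matches the values used in the preceding example.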
Tune an IVFFlat index
Tuning the values you set for the lists and the ivfflat.probes parameters can
help optimize application performance:
| Tuning parameter | Description | Parameter type |
|---|---|---|
| lists | The number of lists created during index building. As a starting point, set this value to (rows)/1000 for up to one million rows, and to sqrt(rows) for more than one million rows. | Index creation |
| ivfflat.probes | The number of nearest lists to explore during search. As a starting point, set this value to sqrt(lists). | Query runtime |
Before you build an IVFFlat index, make sure that your database's
max_parallel_maintenance_workers flag is set to a value sufficient to expedite
the index creation on large tables.
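As an illustration, you can check the flag and raise it for the current session with standard PostgreSQL commands. The value 4 is an arbitrary example, not a recommendation, and the effective parallelism is still capped by the server-wide max_worker_processes and max_parallel_workers settings.

```sql
-- Check how many parallel workers a single maintenance command may use.
SHOW max_parallel_maintenance_workers;

-- Raise the limit for the current session before running CREATE INDEX.
SET max_parallel_maintenance_workers = 4;
```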
Consider the following example that shows an IVFFlat index with the tuning parameters set:
-- Query-time flag. SET LOCAL takes effect only inside a transaction block.
SET LOCAL ivfflat.probes = 10;
CREATE INDEX my_ivfflat_index ON my_table
USING ivfflat (vector_column cosine)
WITH (lists = 100);
Tune an HNSW index
Tuning the values you set for the m, ef_construction, and the hnsw.ef_search parameters can
help optimize application performance.
| Tuning parameter | Description | Parameter type |
|---|---|---|
| m | The maximum number of connections per node in the graph. You can start with the default value of 16 and experiment with higher values based on the size of your dataset. | Index creation |
| ef_construction | The size of the dynamic candidate list maintained during graph construction, which constantly updates the current best candidates for the nearest neighbors of a node. Set this to a value higher than twice the m value, for example, 64 (the default). | Index creation |
| hnsw.ef_search | The size of the dynamic candidate list used during search. You can start by setting this value to either m or ef_construction, and then change it while observing the recall. The default value is 40. | Query runtime |
Consider the following example that shows an HNSW index with the tuning parameters set:
-- Query-time flag. SET LOCAL takes effect only inside a transaction block.
SET LOCAL hnsw.ef_search = 40;
CREATE INDEX my_hnsw_index ON my_table
USING hnsw (vector_column cosine)
WITH (m = 16, ef_construction = 200);
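Because hnsw.ef_search is a query-time flag, you can try different values without rebuilding the index. The following sketch scopes one trial value to a single transaction. The table my_table, the id column, the three-dimensional query vector, and the value 100 are all hypothetical; `<=>` is the pgvector cosine distance operator, which is assumed here to correspond to the cosine index shown above.

```sql
BEGIN;
-- Applies only until the end of this transaction.
SET LOCAL hnsw.ef_search = 100;

-- Rerun a test query and compare its recall and latency with the previous setting.
SELECT id
FROM my_table
ORDER BY vector_column <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
COMMIT;
```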
Analyze your queries
Use the EXPLAIN ANALYZE command to analyze your queries, as shown in the following example SQL query.
EXPLAIN ANALYZE SELECT result_column FROM my_table
ORDER BY EMBEDDING_COLUMN ::vector
USING INDEX my_scann_index
<-> embedding('textembedding-gecko@003', 'What is a database?')
LIMIT 1;
The example response QUERY PLAN includes information such as the time taken, the number of rows scanned or returned, and the resources used.
Limit (cost=0.42..15.27 rows=1 width=32) (actual time=0.106..0.132 rows=1 loops=1)
-> Index Scan using my_scann_index on my_table (cost=0.42..858027.93 rows=100000 width=32) (actual time=0.105..0.129 rows=1 loops=1)
Order By: (embedding_column <-> embedding('textembedding-gecko@003', 'What is a database?')::vector(768))
Limit value: 1
Planning Time: 0.354 ms
Execution Time: 0.141 ms
View vector index metrics
You can use vector index metrics to review the performance of your vector index, identify areas for improvement, and tune the index if needed.
To view all vector index metrics, run the following SQL query, which uses the
pg_stat_ann_indexes view:
SELECT * FROM pg_stat_ann_indexes;
For more information about the complete list of metrics, see Vector index metrics.