本页面介绍了如何监控列式引擎的利用率。
使用 EXPLAIN 验证列式引擎的使用情况
您可以使用 EXPLAIN 语句观察查询生成的查询计划中显示的新列式运算符,从而验证列式引擎的使用情况。
psql 客户端
EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT l_returnflag, l_linestatus, l_quantity, l_extendedprice,
l_discount, l_tax
FROM lineitem
WHERE l_shipdate <= date '1992-08-06'
;
QUERY PLAN
-----------------------------------------------------------------------------
Append (actual rows=3941797 loops=1)
Buffers: shared hit=9
-> Custom Scan (columnar scan) on lineitem (actual rows=3941797 loops=1)
Filter: (l_shipdate <= '1992-08-06'::date)
Rows Removed by Columnar Filter: 56054083
Columnar cache search mode: columnar filter only
Buffers: shared hit=9
-> Seq Scan on lineitem (never executed)
Filter: (l_shipdate <= '1992-08-06'::date)
Custom Scan (columnar scan)表示查询计划中包含列式引擎扫描。Rows Removed by Columnar Filter列出列式向量化执行过滤掉的行数。Columnar cache search mode可以是columnar filter only、native或row store scan。规划工具会根据成本和下推式评估功能自动选择搜索模式。
当规划工具选择 native 模式时,它会将一些列式运算符向下推送到扫描:
Rows Aggregated by Columnar Scan列出汇总的行数。Rows Sorted by Columnar Scan列出排序的行数。Rows Limited by Columnar Scan列出已扫描的有限行数。
使用联接时,列式扫描运算符也可以使用 Late Materialization 模式。
psql 客户端
EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT l_shipmode, o_orderpriority
FROM orders, lineitem
WHERE o_orderkey = l_orderkey
AND l_shipmode in ('AIR', 'FOB')
AND l_receiptdate >= date '1995-01-01'
;
QUERY PLAN
-----------------------------------------------------------------------------
Hash Join (actual rows=9865288 loops=1)
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
Buffers: temp read=127738 written=127738
-> Append (actual rows=9865288 loops=1)
-> Custom Scan (columnar scan) on lineitem (actual rows=9865288 loops=1)
Filter: ((l_shipmode = ANY ('{AIR,FOB}'::bpchar[])) AND
(l_receiptdate >= '1995-01-01'::date))
Rows Removed by Columnar Filter: 50130592
Columnar cache search mode: native
-> Index Scan using idx_lineitem_orderkey_fkidx on lineitem
(never executed)
Filter: ((l_shipmode = ANY ('{AIR,FOB}'::bpchar[])) AND
(l_receiptdate >= '1995-01-01'::date))
-> Hash (actual rows=15000000 loops=1)
Buckets: 1048576 Batches: 32 Memory Usage: 37006kB
Buffers: temp written=83357
-> Append (actual rows=15000000 loops=1)
-> Custom Scan (columnar scan) on orders (actual rows=15000000
loops=1)
Rows Removed by Columnar Filter: 0
Columnar projection mode: late materialization
Columnar cache search mode: native
-> Seq Scan on orders (never executed)
Columnar projection mode 可以是 late materialization。当规划工具通过推迟某些列值的物化来优化投影时,列式运算符会自动选择此模式。
查看列式引擎向量化联接使用情况
您可以使用 EXPLAIN 语句验证新的向量化联接运算符,以观察查询生成的查询计划中显示的新 Vectorized Hash Join 运算符。
EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT l_quantity, l_extendedprice, l_discount, l_tax
FROM lineitem, orders
WHERE l_shipdate <= date '2022-08-06'
AND l_orderkey = o_orderkey
AND o_orderdate <= date '2025-03-07';
QUERY PLAN
---------------------------------------------------------------------------------------
Vectorized Hash Join (actual rows=3934686 loops=1)
Vectorized partitioning, Partitions: 16 (Disk usage: 208MB)
(Peak Memory Usage: 27MB, Threads: 1)
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
Rows Removed by Bloom Filter: 0
Buffers: temp read=26728 written=26728
-> Append (actual rows=3934686 loops=1)
-> Custom Scan (columnar scan) on lineitem (actual rows=3934686 loops=1)
Filter: (l_shipdate <= '2022-08-06'::date)
Rows Removed by Columnar Filter: 56051366
Columnar cache search mode: native
-> Seq Scan on lineitem (never executed)
Filter: (l_shipdate <= '2022-08-06'::date)
-> Vectorized Hash (actual rows=7245824 loops=1)
Build bloom filter, Memory Usage: 1024kB
-> Append (actual rows=7245824 loops=1)
-> Custom Scan (columnar scan) on orders (actual rows=7245824 loops=1)
Filter: (o_orderdate <= '2025-03-07'::date)
Rows Removed by Columnar Filter: 7754176
Columnar cache search mode: native
-> Seq Scan on orders (never executed)
Filter: (o_orderdate <= '2025-03-07'::date)
Vectorized Hash Join表示两个关系之间的联接使用了向量化哈希联接。Vectorized partitioning, Partitions:列出数据被分割成的分区数。
将 COLUMNAR_ENGINE 选项用于 EXPLAIN
EXPLAIN 命令支持 COLUMNAR_ENGINE 选项。如果指定了此选项,该命令会在 EXPLAIN 计划中输出 Columnar Check 消息,以显示可能导致规划器或执行器为该计划做出特定列式引擎相关选择的因素。该命令还会输出特定于列式引擎的其他详细信息。该消息与非列式扫描节点相关联。它通常表示阻止选择列式扫描来用于扫描的第一个原因。一些示例原因包括 the table is too small、a needed column of the table is not in the CE store 或 a needed column has a CE unsupported data type。
以下是该命令的输出示例,其中会输出 Columnar Check 消息:
psql 客户端
EXPLAIN (COLUMNAR_ENGINE, COSTS OFF)
SELECT * FROM sample_small_table
WHERE col1 > 10000;
QUERY PLAN
--------------------------------------
Seq Scan on sample_small_table
Filter: (col1 > 10000)
Columnar Check: table is too small
根据消息,用户可以执行以下操作之一:
- 解决具体原因,然后重新运行该命令以确认列式计划。
- 迭代执行同一流程。可能存在多个原因,但该子句仅尝试列出第一个原因。
以下示例显示了所输出的特定于列式引擎的其他详细信息:
psql 客户端
EXPLAIN (ANALYZE, COLUMNAR_ENGINE, COSTS OFF, TIMING OFF, SUMMARY OFF, VERBOSE) SELECT SUM(c1) FROM counter_table WHERE c2 > 0 and c2 < 80000;
QUERY PLAN
---------------------------------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
Output: sum(c1)
-> Append (actual rows=79999 loops=1)
-> Custom Scan (columnar scan) on public.counter_table (actual rows=79999 loops=1)
Output: c1
Filter: ((counter_table.c2 > 0) AND (counter_table.c2 < 80000))
Rows Removed by Columnar Filter: 1
Rows Aggregated by Columnar Scan: 79999
Bytes fetched from storage cache: 1392655
Columnar cache search mode: native
-> Seq Scan on public.counter_table (never executed)
Output: c1
Filter: ((counter_table.c2 > 0) AND (counter_table.c2 < 80000))
(13 rows)
查看将列置于列存储区中的表的相关信息
您可以通过查询 g_columnar_relations 视图,查看将列置于列存储区中的表或物化视图的相关信息。
psql 客户端
SELECT * FROM g_columnar_relations; -[ RECORD 1 ]------------------+--------------------- database_name | postgres schema_name | public relation_name | uuid_mod_test_stored status | Usable size | 700672237 invalid_block_count | 0 block_count_in_cc | 123907 total_block_count | 123907 auto_refresh_trigger_count | 0 auto_refresh_failure_count | 0 auto_refresh_trigger_timestamp | NULL auto_refresh_start_timestamp | NULL auto_refresh_end_timestamp | NULL auto_refresh_recent_status | NONE YET
查看列存储区中列的相关信息
您可以通过查询 g_columnar_columns 视图来查看列存储区中列的相关信息,包括这些列的大小和上次访问时间。
psql 客户端
SELECT database_name, schema_name, relation_name, column_name, size_in_bytes, last_accessed_time FROM g_columnar_columns;
查看近期查询的列式引擎执行统计信息
您可以使用 g_columnar_stat_statements 视图查看近期查询的列式引擎执行统计信息。此视图提供特定于列式引擎的统计信息,需要启用 pg_stat_statements 扩展程序。
psql 客户端
如需查看这些统计信息,请按以下步骤操作:
- 启用
pg_stat_statements扩展程序:CREATE EXTENSION pg_stat_statements;
- 运行要查看其统计信息的查询。您可以手动执行此操作,也可以让应用运行一段时间以生成数据。
- 查询
g_columnar_stat_statements视图以查看收集的统计信息。SELECT * FROM g_columnar_stat_statements WHERE page_read > 0;
输出类似于以下内容:
-[ RECORD 1 ]--------------------------+-------------------- user_id | 16391 db_id | 1008530 query_id | 6672321263128921833 columnar_unit_read | 32 page_read | 128003 rows_filtered | 0 columnar_scan_time | 0 vectorized_join_time | 0 vectorized_aggregation_time | 0 vectorized_aggregation_vj_time | 0 -[ RECORD 2 ]--------------------------+-------------------- user_id | 16391 db_id | 1008530 query_id | -816808861243589285 columnar_unit_read | 30 page_read | 119811 rows_filtered | 0 columnar_scan_time | 0 vectorized_join_time | 0 vectorized_aggregation_time | 0 vectorized_aggregation_vj_time | 0
查看列存储区内存用量
如需查看列式引擎可使用而未使用的 RAM 量,您可以查询 google_columnar_engine_memory_available() 函数。生成的整数显示可用内存(以兆字节 [MB] 为单位)。
SELECT google_columnar_engine_memory_available();
查看列存储区中索引的相关信息
您可以通过查询 g_columnar_indexes 视图,查看有关列式索引的状态、大小和各种性能相关指标的信息。
psql 客户端
SELECT
database_name, schema_name, index_name, status, size,
total_partition_count, non_leaf_partition_count, leaf_partition_count,
invalid_block_count, block_count_in_cc, total_block_count,
creation_time, auto_refresh_trigger_count,
auto_refresh_failure_count, auto_refresh_trigger_timestamp,
auto_refresh_start_timestamp, auto_refresh_end_timestamp,
auto_refresh_recent_status
FROM g_columnar_indexes;
下表列出了 g_columnar_indexes 视图的各个字段。
| 字段 | 数据类型 | 说明 |
|---|---|---|
| database_name | NAME | |
| schema_name | NAME | 索引所属模式的名称 |
| index_name | NAME | 索引的名称 |
| status | NAME | 列式引擎中索引的状态 |
| size | BIGINT | 列式引擎中索引的大小 |
| total_partition_count | BIGINT | 列式引擎中索引的分区总数 |
| non_leaf_partition_count | BIGINT | 列式引擎中索引的非叶分区总数 |
| leaf_partition_count | BIGINT | 列式引擎中索引的叶分区总数 |
| invalid_block_count | BIGINT | 列式引擎中索引的无效块总数 |
| block_count_in_cc | BIGINT | 列式引擎中索引的块总数 |
| total_block_count | BIGINT | 索引的块总数 |
| creation_time | BIGINT | 列式引擎中索引的创建时间 |
| auto_refresh_trigger_count | BIGINT | 自 postgres 启动以来触发的自动刷新总次数。 |
| auto_refresh_failure_count | BIGINT | 自 postgres 启动以来自动刷新失败的总次数 |
| auto_refresh_trigger_timestamp | BIGINT | 触发的最近一次自动刷新的时间戳 |
| auto_refresh_start_timestamp | BIGINT | 启动的最近一次自动刷新的时间戳 |
| auto_refresh_end_timestamp | BIGINT | 完成的最近一次自动刷新的时间戳。 |
| auto_refresh_recent_status | TEXT | 最近一次自动刷新的状态 |