Table and index bloat occurs when the vacuum operation doesn't keep up with the rate of Data Manipulation Language (DML) against a table. Some table and index bloat is unavoidable and generally acceptable because transactions can reuse that space. Severe bloat causes performance issues, such as slower query times, increased I/O, less efficient shared buffer and cache usage, and increased storage costs.
The best way to estimate bloat is to use the
pgstattuple
module. However, using pgstattuple is less efficient when the extension must
read all the pages of a relation. If you can't use pgstattuple, you can use
other scripts to estimate bloat, although this approach provides less accuracy.
Read pgstattuple output
When you use pgstattuple to report the physical storage characteristics of a
table or index, the output is in the following format:
> SELECT * FROM pgstattuple('large_table');
-[ RECORD 1 ]------+-----------
table_len | 7431888896
tuple_count | 66666666
tuple_len | 2796296271
tuple_percent | 37.63
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 3873153652
free_percent | 52.12
The following describes the output:
table_len(7431888896)- The total size of the table in bytes. It includes all live tuples, dead
tuples, and free space. In this case,
large_tableoccupies approximately 7,088 GB. tuple_count(66666666)- Represents the number of live tuples—visible and active rows—that are in the table. The table contains approximately 66 million active rows.
tuple_len(2796296271)- The total length in bytes of all the live tuples combined, which is approximately 2.6 GB.
tuple_percent(37.63)The percentage of the
table_lenthat live tuples occupy. You calculate it as follows:(
tuple_len/table_len) ∗ 100 = (2796296271/7431888896) ∗ 100 ≈37.63percentThis output indicates that active data uses less than 40% of the table's disk space.
dead_tuple_count(0)The number of dead tuples—rows that vacuum operations haven't yet removed after updates or deletions. A value of
0means that the table is clean and has no dead tuples waiting to be reclaimed. This often suggests thatVACUUM, or autovacuum, runs efficiently on the table.dead_tuple_len(0)The total length in bytes of all the dead tuples. Consistent with
dead_tuple_count, a value of0means that dead rows don't consume any space.dead_tuple_percent(0)The percentage of the
table_lenthat dead tuples occupy. You calculate it as follows:(
dead_tuple_len/table_len) ∗ 100 = (0/7431888896) ∗ 100 =0percentfree_space(3873153652)This value represents the amount of free space in the table's data pages. You can reuse this space for new inserts or updates without requiring table expansion. This indicates a significant amount of free space, approximately 3.7 GB.
free_percent(52.12)This value represents the percentage of the
table_lenthat's free space. You calculate it as follows:(
free_space/table_len) ∗ 100 = (3873153652/7431888896) ∗ 100 ≈52.12percent
Estimate bloat using queries
You can estimate bloat using queries against system catalog tables, though this
method is less accurate than using extensions such as pgstattuple. If you
use scripts like
PostgreSQL table bloat check,
your table statistics must be up to date, otherwise the estimate is inaccurate.
When you compare the output of a query with the pgstattuple module, the
estimates are comparable, but not exact measurements of bloat. The module is
still the more accurate gauge of bloat because the SQL query relies on the
cumulative statistics engine of PostgreSQL, whereas the module reads the actual
data pages.
The following example compares the output of two different methods for
analyzing table bloat in PostgreSQL for a table named large_table.
> SELECT * FROM pgstattuple('large_table');
-[ RECORD 1 ]------+-----------
table_len | 2475892736
tuple_count | 22000000
tuple_len | 922925931
tuple_percent | 37.28
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 1293696900
free_percent | 52.25
Vs.
-[ RECORD 1 ]+------------
databasename | postgres
schemaname | public
tablename | large_table
can_estimate | t
est_rows | 21965400
pct_bloat | 54
mb_bloat | 1269.24
table_mb | 2361.195
Address bloat
You can manage bloat in the following ways, each of which has advantages and disadvantages, which require careful planning:
- Perform a full vacuum. This operation requires a full table lock for the duration of the vacuum, and requires downtime.
- Move data to new table. To use this method, you must quiesce table changes to make sure that changes aren't transferred to the new relation.
- Use
pg_squeeze. This extension mitigates bloat online. The extension creates a shadow table, inserts all the data, and then maintains table changes using a replication slot until the last step, which is switching out the bloated table with the newly created un-bloated table. Then, it requests a brief table lock to move the non-bloated table to replace the bloated table. This method requires twice the object's space for the duration of the operation, but it's the least intrusive to normal operation.