At a high level, vacuuming frees up dead tuples in a table so the space can be reused. It also helps you avoid transaction ID wraparound and keeps the planner statistics up to date for better query optimization.
VACUUM:
- It is a manual command.
- Vacuum removes dead tuples from both the table and its indexes – it does not return the disk space to the OS, but it makes the space reusable for new rows.
- Avoid running it during peak transaction hours, because it causes higher CPU and I/O usage on the system.
- If you run it only once a day or week (typically at night, when there are fewer transactions), you will probably accumulate more dead tuples in between.
- VACUUM FULL reclaims the space and returns it to the OS.
- It first acquires an ACCESS EXCLUSIVE lock on the table, blocking all operations (including SELECT statements).
- It then rewrites the table into a new file, temporarily doubling the disk space needed, so it is not very practical when you are already running out of disk space.
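The difference can be sketched in psql (using `k1`, the example table from later in this post; run these only on a test system, since VACUUM FULL blocks all access):

```sql
-- Plain VACUUM: removes dead tuples from the table and its indexes,
-- keeps the space for reuse, and does not block reads or writes.
VACUUM VERBOSE k1;

-- VACUUM FULL: rewrites the table into a new file and returns the
-- freed space to the OS, but holds an ACCESS EXCLUSIVE lock, blocking
-- even SELECTs for the duration.
VACUUM FULL k1;

-- Compare the on-disk size before and after:
SELECT pg_size_pretty(pg_total_relation_size('k1'));
```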
- Autovacuum runs more often during busy periods, and less often when the database is mostly idle.
- It should not consume too many resources (CPU and disk I/O).
- By default, autovacuum triggers a vacuum when dead tuples reach 20% of the table.
- Autovacuum not only clears dead tuples; it also updates the statistics used by the query planner (via ANALYZE) when changed tuples reach 10% of the table by default.
Run the monitoring query below repeatedly at short intervals, changing the database name and environment variables to match your PostgreSQL server.
There were 4 parameters in our postgresql.conf that were set to the default values:

autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50

These defaults did not suit the number of transactions our database was processing. For example, the nijam table in this database was never autovacuumed even though tuples had been updated and deleted over the last few days.
select n_dead_tup, last_vacuum, last_analyze, n_tup_upd, n_tup_del,
       n_tup_hot_upd, relname, seq_scan, idx_scan
from pg_stat_all_tables
where relname = 'k1';

 n_dead_tup | last_vacuum | last_analyze | n_tup_upd | n_tup_del | n_tup_hot_upd | relname | seq_scan | idx_scan
------------+-------------+--------------+-----------+-----------+---------------+---------+----------+----------
        200 |             |              |       200 |         0 |             0 | k1      |       17 |
(1 row)
SELECT reltuples::numeric FROM pg_class WHERE relname = 'k1';

 reltuples
-----------
     10000
(1 row)

This is how we reach the numbers above:
vacuum trigger  = autovacuum_vacuum_threshold + pg_class.reltuples * autovacuum_vacuum_scale_factor
analyze trigger = autovacuum_analyze_threshold + pg_class.reltuples * autovacuum_analyze_scale_factor
PostgreSQL would trigger a vacuum when dead tuples reach 2050; right now we have only 200:
50 + 10000 * 0.2 = 2050
PostgreSQL would trigger an analyze when changed tuples reach 1050:
50 + 10000 * 0.1 = 1050
We have a table with 10,000 rows, 200 of which have changed:
- autovacuum_analyze_threshold tells us that we're over the default base of 50;
- the fraction based on autovacuum_analyze_scale_factor (which defaults to 0.1) gives us another 1,000 rows;
- the total calculated threshold is therefore 1,050;
- as 200 is less than 1,050, ANALYZE is not started (we wait for more changes).
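The same arithmetic can be done directly in SQL. The query below is a sketch that instantiates the formula for every user table from the current settings, so you can compare each table's dead-tuple count with its trigger point (note that for analyze PostgreSQL actually counts changed tuples, not dead ones, so the analyze column is only an approximation here):

```sql
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float
           * c.reltuples AS vacuum_trigger_at,
       current_setting('autovacuum_analyze_threshold')::int
         + current_setting('autovacuum_analyze_scale_factor')::float
           * c.reltuples AS analyze_trigger_at
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```

For the k1 table above this yields a vacuum trigger of 2050 and an analyze trigger of 1050, matching the hand calculation.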
Autovacuum parameter options:
Option 1: Defaults – 20% for vacuum and 10% for analyze

autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1

When this threshold is crossed, the table is considered in need of cleanup. The formula basically says that up to 20% of a table may be dead tuples before it gets cleaned up (the threshold of 50 rows is there to prevent very frequent cleanups of tiny tables).
The default scale factor works fine for small and medium-sized tables, but not so much for very large tables – on a 20GB table this is roughly 4GB of dead tuples before a vacuum and 2GB before an analyze, while a 10TB table needs to reach 2TB for vacuum and 1TB for analyze.
This is an example of accumulating a lot of dead tuples, and processing all of it at once, which is going to hurt. And per the rule mentioned before, the solution is to do it more often by significantly decreasing the scale factor, perhaps even like this:
Option 2: 1% for vacuum and 1% for analyze

autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.01
AT BIG TABLE (10TB):
PostgreSQL would trigger a vacuum and analyze when dead tuples reach 1%, i.e. 100GB.
AT MEDIUM TABLE (2TB):
PostgreSQL would trigger a vacuum and analyze when dead tuples reach 1%, i.e. 20GB.
AT SMALL TABLE (10GB):
PostgreSQL would trigger a vacuum and analyze when dead tuples reach 1%, i.e. 100MB.
This works well for small and medium tables, but at terabyte scale even 1% still means accumulating a huge volume of dead tuples before cleanup.
This decreases the limit to only 1% of the table. An alternative solution is to abandon the scale factor entirely and rely solely on the threshold:
Option 3: Based on thresholds only

autovacuum_vacuum_scale_factor = 0
autovacuum_analyze_scale_factor = 0
autovacuum_vacuum_threshold = 15000
autovacuum_analyze_threshold = 10000

This triggers the cleanup after 15,000 dead tuples have accumulated, and an analyze after 10,000 tuples have changed.
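As a sketch, these settings could be applied cluster-wide without hand-editing postgresql.conf by using ALTER SYSTEM (which writes to postgresql.auto.conf and requires superuser privileges):

```sql
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0;
ALTER SYSTEM SET autovacuum_vacuum_threshold = 15000;
ALTER SYSTEM SET autovacuum_analyze_threshold = 10000;

-- These parameters take effect on reload; no restart is needed.
SELECT pg_reload_conf();
```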
One problem is that these changes in postgresql.conf affect all tables (the whole cluster, in fact), and they may undesirably change the cleanup behavior of small tables, including for example the system catalogs.
When the small tables are cleaned up more frequently, the easiest solution is to simply ignore the problem entirely. Cleanup of small tables is going to be fairly cheap, and the improvement on large tables is usually so significant that even if you ignore small inefficiency on small tables, the overall effect is still very positive.
But if you decide to change the configuration in a way that would significantly delay cleanup on small tables (as for example with setting scale_factor = 0 and threshold = 4000), it's better to apply those changes only to particular tables using ALTER TABLE:
ALTER TABLE small_table SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE small_table SET (autovacuum_analyze_scale_factor = 0);
ALTER TABLE small_table SET (autovacuum_vacuum_threshold = 5000);
ALTER TABLE small_table SET (autovacuum_analyze_threshold = 2500);
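Per-table overrides are stored in pg_class.reloptions, so you can check which tables have been customized (a quick sketch, assuming overrides like the ones above have been applied):

```sql
-- List every table with custom storage/autovacuum options:
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;

-- To remove an override later and fall back to the global setting:
ALTER TABLE small_table RESET (autovacuum_vacuum_scale_factor);
```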
Try to keep the configuration as simple as possible, and override the parameters for as few tables as possible. It's also a good idea to include this in your internal documentation, including the reasoning for the particular values.

How to Increase the Performance of Autovacuum
- autovacuum_max_workers (integer): default 3. Raising it into the 3–8 range allows more tables to be vacuumed in parallel.
- log_autovacuum_min_duration (integer): logs every autovacuum run that takes longer than this; useful for spotting slow cleanups.
- autovacuum_naptime (integer): default 1min. Lowering it makes the launcher check for tables needing cleanup more often.
- autovacuum_vacuum_cost_delay (integer): lowering it from the default 20ms to 10ms makes each worker sleep less, so vacuum finishes faster.
- autovacuum_vacuum_cost_limit (integer): default -1, which falls back to vacuum_cost_limit. Raising it lets a worker do more work between cost-based sleeps.
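As a sketch, a postgresql.conf fragment tuned along these lines might look like this (the exact values are workload-dependent assumptions, not recommendations):

```
autovacuum_max_workers = 6           # default 3: more tables vacuumed in parallel
autovacuum_naptime = 15s             # default 1min: check for work more often
autovacuum_vacuum_cost_delay = 10ms  # default 20ms: workers sleep less per cost cycle
autovacuum_vacuum_cost_limit = 1000  # default -1 (falls back to vacuum_cost_limit)
log_autovacuum_min_duration = 0      # log every autovacuum run with its timing
```

Note that the cost limit is shared among all active workers, so adding workers without raising the limit makes each individual worker slower.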