84
Enterprise DB MVCC SHANKARNAG

mvcc

Embed Size (px)

DESCRIPTION

mvcc

Citation preview

Slide 1

EnterpriseDBMVCC

SHANKARNAG

Why Unmask MVCC?

Predict concurrent query behaviorManageMVCCperformance effectsUnderstand storage space reuseMVCC Unmasked4Outline

1. Introduction to MVCC

2. MVCC Implementation Details

3. MVCC Cleanup Requirements and BehaviorMVCC Unmasked5What is MVCC?

Multiversion Concurrency Control (MVCC) allows Postgres to offer highconcurrency even during signicant database read/write activity. MVCCspecically offers behavior where "readers never block writers, andwriters never block readers".

This presentation explains how MVCC is implemented in Postgres, andhighlights optimizations which minimize the downsides of MVCC.MVCC Unmasked6

Which Other Database Systems Support MVCC?

Oracle

DB2 (partial)

MySQL with InnoDB

Informix

Firebird

MSSQL (optional, disabled by default)MVCC Unmasked7MVCC BehaviorINSERT

DELETE

old (delete)

UPDATE

new (insert)Cre 40Exp

Cre 40Exp 47

Cre 64Exp 78

Cre 78Exp

MVCC Unmasked8Implementation Details

All queries were generated on an unmodied version of Postgres. Thecontrib module pageinspect was installed to show internal heap pageinformation and pg_freespacemap was installed to show free space mapinformation.MVCC Unmasked12Setup

CREATE TABLE mvcc_demo (val INTEGER);CREATE TABLEDROP VIEW IF EXISTS mvcc_demo_page0;DROP VIEWCREATE VIEW mvcc_demo_page0 ASSELECT (0, || lp || ) AS ctid,CASE lp_flagsWHEN 0 THEN UnusedWHEN 1 THEN NormalWHEN 2 THEN Redirect to || lp_offWHEN 3 THEN DeadEND,t_xmin::text::int8 AS xmin,t_xmax::text::int8 AS xmax,t_ctidFROM heap_page_items(get_raw_page(mvcc_demo, 0))ORDER BY lp;CREATE VIEWMVCC Unmasked13INSERT Using Xmin

DELETE FROM mvcc_demo;DELETE 0INSERT INTO mvcc_demo VALUES (1);INSERT 0 1SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5409 |0 |1(1 row)

MVCC Unmasked14Just Like INSERTINSERT

DELETE

old (delete)

UPDATE

new (insert)Cre 40Exp

Cre 40Exp 47

Cre 64Exp 78

Cre 78Exp

MVCC Unmasked15DELETE Using Xmax

DELETE FROM mvcc_demo;DELETE 1INSERT INTO mvcc_demo VALUES (1);INSERT 0 1SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5411 |0 |1(1 row)

BEGIN WORK;BEGINDELETE FROM mvcc_demo;DELETE 1MVCC Unmasked16DELETE Using Xmax

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----(0 rows)

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----15411 | 5412 |(1 row)

SELECT txid_current();txid_current--------------5412(1 row)

COMMIT WORK;COMMITMVCC Unmasked17Just Like DELETEINSERT

DELETE

old (delete)

UPDATE

new (insert)Cre 40Exp

Cre 40Exp 47

Cre 64Exp 78

Cre 78Exp

MVCC Unmasked18UPDATE Using Xmin and Xmax

DELETE FROM mvcc_demo;DELETE 0INSERT INTO mvcc_demo VALUES (1);INSERT 0 1SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5413 |0 |1(1 row)

BEGIN WORK;BEGINUPDATE mvcc_demo SET val = 2;UPDATE 1MVCC Unmasked19UPDATE Using Xmin and Xmax

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5414 |0 |2(1 row)

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----15413 | 5414 |(1 row)

COMMIT WORK;COMMIT

MVCC Unmasked20Just Like UPDATEINSERT

DELETE

old (delete)

UPDATE

new (insert)Cre 40Exp

Cre 40Exp 47

Cre 64Exp 78

Cre 78Exp

MVCC Unmasked21Aborted Transaction IDs Remain

DELETE FROM mvcc_demo;DELETE 1INSERT INTO mvcc_demo VALUES (1);INSERT 0 1

BEGIN WORK;BEGINDELETE FROM mvcc_demo;DELETE 1ROLLBACK WORK;ROLLBACK

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----15415 | 5416 |(1 row)

MVCC Unmasked22

0001001010100000101001000000001000010110101000101010000010010010undo Unmasked transaction.the00000 In ProgressStatus flagsAborted IDs Can Remain BecauseTransaction Status Is Recorded Centrally

pg_clogXID

028

024

020

016

012

008

004Creation XID: 15

xminExpiration XID: 27

xmaxTuple01 Aborted10 CommittedTransaction Id (XID)

Transaction roll back marks the transaction ID as aborted. All sessionswill ignore such transactions; it is not ncessary to revisit each row toMVCC23Row Locks Using Xmax

DELETE FROM mvcc_demo;DELETE 1INSERT INTO mvcc_demo VALUES (1);INSERT 0 1

BEGIN WORK;BEGINSELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5416 |0 |1(1 row)SELECT xmin, xmax, * FROM mvcc_demo FOR UPDATE;xmin | xmax | val------+------+-----0 |15416 |(1 row)

MVCC Unmasked24Row Locks Using Xmax

SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5416 | 5417 |1(1 row)COMMIT WORK;COMMIT

The tuple bit HEAP_XMAX_EXCL_LOCK is used to indicate that xmax is alocking xid rather than an expiration xid.MVCC Unmasked25Multi-Statement Transactions

Multi-statement transactions require extra tracking because eachstatement has its own visibility rules. For example, a cursors contentsmust remain unchanged even if later statements in the same transactionmodify rows. Such tracking is implemented using system command idcolumns cmin/cmax, which is internally actually is a single column.MVCC Unmasked26INSERT Using Cmin

DELETE FROM mvcc_demo;DELETE 1

BEGIN WORK;BEGININSERT INTO mvcc_demo VALUES (1);INSERT 0 1INSERT INTO mvcc_demo VALUES (2);INSERT 0 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1MVCC Unmasked27INSERT Using Cmin

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5419 |5419 |5419 |0 |1 |2 |0 |0 |0 |123(3 rows)COMMIT WORK;COMMITMVCC Unmasked28DELETE Using Cmin

DELETE FROM mvcc_demo;DELETE 3

BEGIN WORK;BEGININSERT INTO mvcc_demo VALUES (1);INSERT 0 1INSERT INTO mvcc_demo VALUES (2);INSERT 0 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1MVCC Unmasked29DELETE Using Cmin

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5421 |5421 |5421 |0 |1 |2 |0 |0 |0 |123(3 rows)

DECLARE c_mvcc_demo CURSOR FORSELECT xmin, xmax, cmax, * FROM mvcc_demo;DECLARE CURSORMVCC Unmasked30this transaction and therefore never visible outside this transaction.DELETE Using Cmin

DELETE FROM mvcc_demo;DELETE 3SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----(0 rows)

FETCH ALL FROM c_mvcc_demo;xmin | xmax | cmax | val------+------+------+-----5421 | 5421 |5421 | 5421 |5421 | 5421 |0 |1 |2 |123(3 rows)COMMIT WORK;COMMIT

A cursor had to be used because the rows were created and deleted inMVCC Unmasked31UPDATE Using Cmin

DELETE FROM mvcc_demo;DELETE 0BEGIN WORK;BEGININSERT INTO mvcc_demo VALUES (1);INSERT 0 1INSERT INTO mvcc_demo VALUES (2);INSERT 0 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5422 |5422 |5422 |0 |1 |2 |0 |0 |0 |123(3 rows)DECLARE c_mvcc_demo CURSOR FORSELECT xmin, xmax, cmax, * FROM mvcc_demo;MVCC Unmasked32UPDATE Using Cmin

UPDATE mvcc_demo SET val = val * 10;UPDATE 3SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5422 |5422 |5422 |3 |3 |3 |0 | 100 | 200 | 30(3 rows)

FETCH ALL FROM c_mvcc_demo;xmin | xmax | cmax | val------+------+------+-----0 |1 |2 |1235422 | 5422 |5422 | 5422 |5422 | 5422 |(3 rows)COMMIT WORK;COMMITMVCC Unmasked33Modifying Rows From Different Transactions

DELETE FROM mvcc_demo;DELETE 3INSERT INTO mvcc_demo VALUES (1);INSERT 0 1SELECT xmin, xmax, * FROM mvcc_demo;xmin | xmax | val------+------+-----5424 |0 |1(1 row)

BEGIN WORK;BEGININSERT INTO mvcc_demo VALUES (2);INSERT 0 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1INSERT INTO mvcc_demo VALUES (4);INSERT 0 1MVCC Unmasked34Modifying Rows From Different Transactions

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5424 |5425 |5425 |5425 |0 |0 |1 |2 |0 |0 |0 |0 |1234(4 rows)UPDATE mvcc_demo SET val = val * 10;UPDATE 4MVCC Unmasked35Modifying Rows From Different Transactions

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5425 |5425 |5425 |5425 |3 |3 |3 |3 |0 | 100 | 200 | 300 | 40(4 rows)

SELECT xmin, xmax, cmax, * FROM mvcc_demo;xmin | xmax | cmax | val------+------+------+-----3 |15424 | 5425 |(1 row)

COMMIT WORK;COMMIT

MVCC Unmasked36Combo Command Id

Because cmin and cmax are internally a single system column, it isimpossible to simply record the status of a row that is created andexpired in the same multi-statement transaction. For that reason, aspecial combo command id is created that references a local memoryhash that contains the actual cmin and cmax values.MVCC Unmasked37UPDATE Using Combo Command Ids

-- use TRUNCATE to remove even invisible rowsTRUNCATE mvcc_demo;TRUNCATE TABLE

BEGIN WORK;BEGINmvcc_demo;

mvcc_demo;

mvcc_demo;

mvcc_demo VALUES (1);

mvcc_demo VALUES (2);

mvcc_demo VALUES (3);DELETE FROMDELETE 0DELETE FROMDELETE 0DELETE FROMDELETE 0INSERT INTOINSERT 0 1INSERT INTOINSERT 0 1INSERT INTOINSERT 0 1

MVCC Unmasked38UPDATE Using Combo Command Ids

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5427 |5427 |5427 |3 |4 |5 |0 |0 |0 |123(3 rows)

DECLARE c_mvcc_demo CURSOR FORSELECT xmin, xmax, cmax, * FROM mvcc_demo;DECLARE CURSORUPDATE mvcc_demo SET val = val * 10;UPDATE 3MVCC Unmasked39UPDATE Using Combo Command Ids

SELECT xmin, cmin, xmax, * FROM mvcc_demo;xmin | cmin | xmax | val------+------+------+-----5427 |5427 |5427 |6 |6 |6 |0 | 100 | 200 | 30(3 rows)

FETCH ALL FROM c_mvcc_demo;xmin | xmax | cmax | val------+------+------+-----0 |1 |2 |1235427 | 5427 |5427 | 5427 |5427 | 5427 |(3 rows)

MVCC Unmasked40UPDATE Using Combo Command Ids

SELECT t_xmin AS xmin,t_xmax::text::int8 AS xmax,t_field3::text::int8 AS cmin_cmax,(t_infomask::integer & X0020::integer)::bool AS is_combocidFROM heap_page_items(get_raw_page(mvcc_demo, 0))ORDER BY 2 DESC, 3;xmin | xmax | cmin_cmax | is_combocid------+------+-----------+-------------5427 | 5427 |5427 | 5427 |5427 | 5427 |0 | t1 | t2 | t5427 |5427 |5427 |0 |0 |0 |6 | f6 | f6 | f(6 rows)COMMIT WORK;COMMIT

The last query uses /contrib/pageinspect, which allows visibility ofinternal heap page structures and all stored rows, including those notvisible in the current snapshot. (Bit 0x0020 is internally calledHEAP_COMBOCID.)MVCC Unmasked41MVCC Implementation Summaryxmin: creation transaction number, set byINSERTand UPDATExmax: expire transaction number, set byUPDATEandDELETE;also usedfor explicit row locks

cmin/cmax: used to identify the command number that created orexpired the tuple; also used to store combo command ids when thetuple is created and expired in the same transaction, and for explicitrow locksMVCC Unmasked42

Traditional Cleanup Requirements

Traditional single-row-version (non-MVCC) database systems requirestorage space cleanup:

deleted rows

rows created by aborted transactionsMVCC Unmasked43

MVCC Cleanup Requirements

MVCC has additional cleanup requirements:

The creation of a new row during UPDATE (rather than replacing theexisting row); the storage space taken by the old row must eventuallybe recycled.

The delayed cleanup of deleted rows (cleanup cannot occur until thereare no transactions for which the row is visible)MVCC-speciccleanupPostgres handles both traditional andrequirements.

MVCC Unmasked44

Cleanup Behavior

Fortunately, Postgres cleanup happens automatically:

On-demand cleanup of a single heap page during row access,specically when a page is accessed by SELECT, UPDATE, and DELETE

In bulk by an autovacuum processes that runs in the backgroundVACUUM.Cleanup can also be initiated manually by

MVCC Unmasked45

Aspects of Cleanup

Cleanup involves recycling space taken by several entities:

heap tuples/rows (the largest)

heap item pointers (the smallest)

index entriesMVCC Unmasked46

Internal Heap PagePage HeaderItemItemItemTupleTupleSpecial8K

TupleMVCC Unmasked47

PageHeaderItemItemItemTuple3Tuple2Tuple1SpecialIndexes Point to Items, Not Tuples

Indexes

8KMVCC Unmasked48

Page HeaderItemSpecial8KHeap Tuple Space Recycling

IndexesItem

DeadItem

DeadTuple3

Indexes prevent item pointers from being recycled.

MVCC Unmasked49

unused.Page HeaderItemSpecial8KVACUUM Later Recycle Items

IndexesItem

UnusedItem

UnusedTuple3VACUUM performs index cleanup, then can mark dead items asMVCC Unmasked50Cleanup of Deleted Rows

TRUNCATE mvcc_demo;TRUNCATE TABLE

-- force page to < 10% emptyINSERT INTO mvcc_demo SELECT 0 FROM generate_series(1, 240);INSERT 0 240

-- compute free space percentageSELECT (100 * (upper - lower) / pagesize::float8)::integer AS free_pctFROM page_header(get_raw_page(mvcc_demo, 0));free_pct----------6(1 row)INSERT INTO mvcc_demo VALUES (1);INSERT 0 1MVCC Unmasked51Cleanup of Deleted Rows

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Normal | 5430 |0 | (0,241)(1 row)DELETE FROM mvcc_demo WHERE val > 0;DELETE 1INSERT INTO mvcc_demo VALUES (2);INSERT 0 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Normal | 5430 | 5431 | (0,241)0 | (0,242)(0,242) | Normal | 5432 |(2 rows)

MVCC Unmasked52Cleanup of Deleted Rows

DELETE FROM mvcc_demo WHERE val > 0;DELETE 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Dead|||(0,242) | Normal | 5432 | 5433 | (0,242)(0,243) | Normal | 5434 |0 | (0,243)(3 rows)

In normal, multi-user usage, cleanup might have been delayed becauseother open transactions in the same database might still need to view theexpired rows. However, the behavior would be the same, just delayed.MVCC Unmasked53Cleanup of Deleted Rows

-- force single-page cleanup via SELECTSELECT * FROM mvcc_demoOFFSET 1000;val-----(0 rows)

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Dead(0,242) | Dead||||||0 | (0,243)(0,243) | Normal | 5434 |(3 rows)

MVCC Unmasked54

Page HeaderItemSpecial8KSame as this Slide

IndexesItem

DeadItem

DeadTuple3MVCC Unmasked55Cleanup of Deleted Rows

SELECT pg_freespace(mvcc_demo);pg_freespace--------------(0,0)(1 row)

VACUUM mvcc_demo;VACUUM

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Unused |(0,242) | Unused |||||0 | (0,243)(0,243) | Normal | 5434 |(3 rows)

MVCC Unmasked56

Page HeaderItemSpecial8KSame as this Slide

IndexesItem

UnusedItem

UnusedTuple3MVCC Unmasked57Free Space Map (FSM)

SELECT pg_freespace(mvcc_demo);pg_freespace--------------(0,416)(1 row)

VACUUM also updates the free space map (FSM), which records pagescontaining signicant free space. This information is used to providetarget pages for INSERTs and some UPDATEs (those crossing pageboundaries). Single-page vacuum does not update the free space map.MVCC Unmasked58Another Free Space Map Example

TRUNCATE mvcc_demo;TRUNCATE TABLE

VACUUM mvcc_demo;VACUUM

SELECT pg_freespace(mvcc_demo);pg_freespace--------------(0 rows)MVCC Unmasked59Another Free Space Map Example

INSERT INTO mvcc_demo VALUES (1);INSERT 0 1VACUUM mvcc_demo;VACUUMSELECT pg_freespace(mvcc_demo);pg_freespace--------------(0,8128)(1 row)INSERT INTO mvcc_demo VALUES (2);INSERT 0 1VACUUM mvcc_demo;VACUUMSELECT pg_freespace(mvcc_demo);pg_freespace--------------(0,8096)(1 row)MVCC Unmasked60Another Free Space Map Example

DELETE FROM mvcc_demo WHERE val = 2;DELETE 1VACUUM mvcc_demo;VACUUM

SELECT pg_freespace(mvcc_demo);pg_freespace--------------(0,8128)(1 row)MVCC Unmasked61VACUUM Also Removes End-of-File Pages

DELETE FROM mvcc_demo WHERE val = 1;DELETE 1VACUUM mvcc_demo;VACUUM

SELECT pg_freespace(mvcc_demo);pg_freespace--------------(0 rows)SELECT pg_relation_size(mvcc_demo);pg_relation_size------------------0(1 row)

VACUUM FULL shrinks the table le to its minimum size, but requires anexclusive table lock.MVCC Unmasked62Optimized Single-Page Cleanup of Old UPDATE Rows

The storage space taken by old UPDATE tuples can be reclaimed just likedeleted rows. However, certain UPDATE rows can even have their itemsreclaimed, i.e. it is possible to reuse certain old UPDATE items, ratherthan marking them as dead and requiring VACUUM to reclaim themafter removing referencing index entries.

Specically, such item reuse is possible with special HOT update(heap-only tuple) chains, where the chain is on a single heap page andall indexed values in the chain are identical.MVCC Unmasked63Single-Page Cleanup of HOT UPDATE Rows

HOT update items can be freed (marked unused) if they are in themiddle of the chain, i.e. not at the beginning or end of the chain. At thehead of the chain is a special Redirect item pointers that are referencedby indexes; this is possible because all indexed values are identical in aHOT/redirect chain.

Index creation with HOT chains is complex because the chains mightcontain inconsistent values for the newly indexed columns. This ishandled by indexing just the end of the HOT chain and allowing theindex to be used only by transactions that start after the index has beencreated. (Specically, post-index-creation transactions cannot see theinconsistent HOT chain values due to MVCC visibility rules; they only seethe end of the chain.)MVCC Unmasked64

Page HeaderItemSpecial8KInitial Single-Row State

IndexesItem

UnusedItem

UnusedTuple, v1MVCC Unmasked65

PageHeaderItemItemItemUnusedTuple,v2Tuple,v1Specialchain.UPDATE Adds a New Row

Indexes

8K

No index entry added because indexes only point to the head of the HOTMVCC Unmasked66

PageHeaderItemItemItemRedirectTuple,v2Tuple,v3SpecialRedirect Allows Indexes To Remain Valid

Indexes

8KMVCC Unmasked67

PageHeaderItemItemItemRedirectTuple,v4Tuple,v3SpecialUPDATE Replaces Another Old Row

Indexes

8KMVCC Unmasked68

Page HeaderItemItemItemTuple, v4Special8KAll Old UPDATE Row Versions Eventually Removed

IndexesRedirectUnusedThis cleanup was performed by another operation on the same page.MVCC Unmasked69Cleanup of Old Updated Rows

TRUNCATE mvcc_demo;TRUNCATE TABLEINSERT INTO mvcc_demo SELECT 0 FROM generate_series(1, 240);INSERT 0 240INSERT INTO mvcc_demo VALUES (1);INSERT 0 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------0 | (0,241)(0,241) | Normal | 5437 |(1 row)

MVCC Unmasked70Cleanup of Old Updated Rows

UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Normal | 5437 | 5438 | (0,242)0 | (0,242)(0,242) | Normal | 5438 |(2 rows)

MVCC Unmasked71Cleanup of Old Updated Rows

UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1SELECT * FROM mvcc_demo_page0OFFSET 240;ctid|case| xmin | xmax | t_ctid---------+-----------------+------+------+---------(0,241) | Redirect to 242 |||(0,242) | Normal| 5438 | 5439 | (0,243)(0,243) | Normal| 5439 |0 | (0,243)(3 rows)

No index entry added because indexes only point to the head of the HOTchain.MVCC Unmasked72

PageHeaderItemItemItemRedirectTuple,v2Tuple,v3SpecialSame as this Slide

Indexes

8KMVCC Unmasked73Cleanup of Old Updated Rows

UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid|case| xmin | xmax | t_ctid---------+-----------------+------+------+---------(0,241) | Redirect to 243 |||(0,242) | Normal| 5440 |0 | (0,242)| 5439 | 5440 | (0,242)(0,243) | Normal(3 rows)

MVCC Unmasked74

PageHeaderItemItemItemRedirectTuple,v4Tuple,v3SpecialSame as this Slide

Indexes

8KMVCC Unmasked75Cleanup of Old Updated Rows

-- transaction now committed, HOT chain allows tid to be marked as UnusedSELECT * FROM mvcc_demoOFFSET 1000;val-----(0 rows)

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid|case| xmin | xmax | t_ctid---------+-----------------+------+------+---------(0,241) | Redirect to 242 |||(0,242) | Normal| 5440 |0 | (0,242)(0,243) | Unused|||(3 rows)MVCC Unmasked76

Page HeaderItemItemItemTuple, v4Special8KSame as this Slide

IndexesRedirectUnusedMVCC Unmasked77VACUUM Does Not Remove the Redirect

VACUUM mvcc_demo;VACUUM

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid|case| xmin | xmax | t_ctid---------+-----------------+------+------+---------(0,241) | Redirect to 242 |||(0,242) | Normal| 5440 |0 | (0,242)(0,243) | Unused|||(3 rows)MVCC Unmasked78Cleanup Using Manual VACUUM

TRUNCATE mvcc_demo;TRUNCATE TABLEINSERT INTO mvcc_demo VALUES (1);INSERT 0 1INSERT INTO mvcc_demo VALUES (2);INSERT 0 1INSERT INTO mvcc_demo VALUES (3);INSERT 0 1

SELECT ctid, xmin, xmaxFROM mvcc_demo_page0;ctid | xmin | xmax-------+------+------(0,1) | 5442 |(0,2) | 5443 |(0,3) | 5444 |000(3 rows)DELETE FROM mvcc_demo;DELETE 3MVCC Unmasked79Cleanup Using Manual VACUUM

SELECT ctid, xmin, xmaxFROM mvcc_demo_page0;ctid | xmin | xmax-------+------+------(0,1) | 5442 | 5445(0,2) | 5443 | 5445(0,3) | 5444 | 5445(3 rows)

-- too small to trigger autovacuumVACUUM mvcc_demo;VACUUM

SELECT pg_relation_size(mvcc_demo);pg_relation_size------------------0(1 row)MVCC Unmasked80The Indexed UPDATE Problem

The updating of any indexed columns prevents the use of redirectitems because the chain must be usable by all indexes, i.e. a redirect/HOTUPDATE cannot require additional index entries due to an indexed valuechange.

In such cases, item pointers can only be marked as dead, like DELETEdoes.UPDATEqueries modied indexed columns.No previously shown

MVCC Unmasked81Index mvcc_demo Column

CREATE INDEX i_mvcc_demo_val on mvcc_demo (val);CREATE INDEXMVCC Unmasked82UPDATE of an Indexed Column

TRUNCATE mvcc_demo;TRUNCATE TABLEINSERT INTO mvcc_demo SELECT 0 FROM generate_series(1, 240);INSERT 0 240INSERT INTO mvcc_demo VALUES (1);INSERT 0 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------0 | (0,241)(0,241) | Normal | 5449 |(1 row)

MVCC Unmasked83UPDATE of an Indexed Column

UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Normal | 5449 | 5450 | (0,242)(0,242) | Normal | 5450 |0 | (0,242)(2 rows)UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Dead|||(0,242) | Normal | 5450 | 5451 | (0,243)0 | (0,243)(0,243) | Normal | 5451 |(3 rows)

MVCC Unmasked84UPDATE of an Indexed Column

UPDATE mvcc_demo SET val = val + 1 WHERE val > 0;UPDATE 1

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Dead(0,242) | Dead||||||(0,243) | Normal | 5451 | 5452 | (0,244)0 | (0,244)(0,244) | Normal | 5452 |(4 rows)

MVCC Unmasked85UPDATE of an Indexed Column

SELECT * FROM mvcc_demoOFFSET 1000;val-----(0 rows)

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Dead(0,242) | Dead(0,243) | Dead|||||||||0 | (0,244)(0,244) | Normal | 5452 |(4 rows)

MVCC Unmasked86UPDATE of an Indexed Column

VACUUM mvcc_demo;VACUUM

SELECT * FROM mvcc_demo_page0OFFSET 240;ctid| case | xmin | xmax | t_ctid---------+--------+------+------+---------(0,241) | Unused |(0,242) | Unused |(0,243) | Unused |||||||0 | (0,244)(0,244) | Normal | 5452 |(4 rows)

MVCC Unmasked87

Cleanup SummaryReuseNon-HOTHOTCleanupHeapItemItemCleanUpdateMethod

Single-PageTriggered By

SELECT, UPDATE,Scope

single heapTuples?

yesState

deadState

unusedIndexes?

noFSM

noDELETEpageVACUUMautovacuumall potentialyesunusedunusedyesyesor manuallyheap pagesCleanup is possible only when there are no active transactions for which the tuples arevisible.

HOT items are UPDATE chains that span a single page and contain identical indexedcolumn values.

In normal usage, single-page cleanup performs the majority of the cleanup work, whileVACUUM reclaims dead item pointers, removes unnecessary index entries, andupdates the free space map (FSM).MVCC Unmasked88

Conclusion