Inside PostgreSQL Shared Memory

BRUCE MOMJIAN

POSTGRESQL is an open-source, full-featured relational database.

This presentation gives an overview of the shared memory structures

used by Postgres.

Creative Commons Attribution License http://momjian.us/presentations

Last updated: February, 2019

1 / 25


Outline

1. File storage format

2. Shared memory creation

3. Shared buffers

4. Row value access

5. Locking

6. Other structures

2 / 25


File System /data

[Diagram: three Postgres processes above the /data directory]

3 / 25


File System /data/base

[Diagram: three Postgres processes above the /data directory and its subdirectories]

/data
    /pg_clog
    /pg_multixact
    /pg_subtrans
    /pg_tblspc
    /pg_wal
    /global
    /pg_twophase
    /base

4 / 25


File System /data/base/db

[Diagram: three Postgres processes above the per-database directories under /data/base]

/data/base
    /16385 (production)
    /1 (template1)
    /17982 (devel)
    /16821 (test)
    /21452 (marketing)

5 / 25


File System /data/base/db/table

[Diagram: three Postgres processes above the per-table files under /data/base/16385]

/data/base/16385
    /24692 (customer)
    /27214 (order)
    /25932 (product)
    /25952 (employee)
    /27839 (part)

6 / 25


File System Data Pages

[Diagram: three Postgres processes above the table file /data/base/16385/24692, which is divided into 8k pages]
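Because the table file is a sequence of 8k pages, locating a page on disk is simple arithmetic. The sketch below assumes the default build settings (8192-byte blocks and 1 GB segment files, with later segments named "24692.1", "24692.2", and so on); the function and macro names are hypothetical and are not PostgreSQL's own.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ      8192u    /* page size shown on the slide */
#define SKETCH_RELSEG_SIZE 131072u  /* assumed: 8k blocks per 1 GB segment file */

/* Which segment file holds this block (0 = "24692", 1 = "24692.1", ...)? */
static uint32_t sketch_segment_of(uint32_t blockno)
{
    return blockno / SKETCH_RELSEG_SIZE;
}

/* Byte offset of the block inside its segment file. */
static uint64_t sketch_offset_in_segment(uint32_t blockno)
{
    return (uint64_t)(blockno % SKETCH_RELSEG_SIZE) * SKETCH_BLCKSZ;
}
```

For example, block 3 of the table starts at byte 24576 of the first segment file.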

7 / 25


Data Pages

[Diagram: each 8K page of /data/base/16385/24692 holds a page header, items, tuples, and a special area]
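One way to picture the page layout is as two regions growing toward each other: item pointers are appended after the page header, while tuple data is placed from the end of the page backward. The sketch below is a simplified model, not PostgreSQL's page code; the 24-byte header, 4-byte item size, and 8-byte tuple alignment are assumptions, and every name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE   8192
#define SKETCH_HEADER_SIZE 24   /* assumed page header size */
#define SKETCH_ITEM_SIZE   4    /* assumed size of one item pointer */

typedef struct {
    uint16_t lower;   /* end of the item array, grows upward */
    uint16_t upper;   /* start of tuple data, grows downward */
    uint16_t nitems;  /* number of items allocated so far */
} SketchPage;

static void sketch_page_init(SketchPage *p)
{
    p->lower = SKETCH_HEADER_SIZE;
    p->upper = SKETCH_PAGE_SIZE;  /* assumes an empty special area */
    p->nitems = 0;
}

/* Free space is the gap between the item array and the tuple data. */
static int sketch_page_free(const SketchPage *p)
{
    return p->upper - p->lower;
}

/* Reserve room for one tuple; returns its 1-based item number, or 0 if full. */
static int sketch_page_add(SketchPage *p, uint16_t tuple_len)
{
    uint16_t aligned = (uint16_t)((tuple_len + 7u) & ~7u);  /* round up to 8 */

    if (sketch_page_free(p) < aligned + SKETCH_ITEM_SIZE)
        return 0;
    p->lower += SKETCH_ITEM_SIZE;
    p->upper -= aligned;
    return ++p->nitems;
}
```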

8 / 25


File System Block Tuple

[Diagram: one tuple singled out from an 8K page (page header, items, tuples, special area) of /data/base/16385/24692]

9 / 25


File System Tuple

hoff − length of tuple header

infomask − tuple flags

natts − number of attributes

ctid − tuple id (page / item)

cmax − destruction command id

xmin − creation transaction id

xmax − destruction transaction id

cmin − creation command id

bits − bit map representing NULLs

OID − object id of tuple (optional)

[Diagram: a tuple drawn as a header followed by a series of values, annotated with int4in('9241'), textout(), and 'Martin']

10 / 25


Tuple Header C Structures

typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */

    union
    {
        CommandId     t_cid;    /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* VACUUM FULL xact ID */
    } t_field3;
} HeapTupleFields;

typedef struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields  t_heap;
        DatumTupleFields t_datum;
    } t_choice;

    ItemPointerData t_ctid;     /* current TID of this or newer tuple */

    /* Fields below here must match MinimalTupleData! */

    uint16 t_infomask2;         /* number of attributes + various flags */

    uint16 t_infomask;          /* various flag bits, see below */

    uint8 t_hoff;               /* sizeof header incl. bitmap, padding */

    /* ^ - 23 bytes - ^ */

    bits8 t_bits[1];            /* bitmap of NULLs -- VARIABLE LENGTH */

    /* MORE DATA FOLLOWS AT END OF STRUCT */
} HeapTupleHeaderData;
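The "23 bytes" comment can be checked with a stand-in structure. The sketch below replaces the PostgreSQL typedefs with fixed-width integers (32-bit transaction and command IDs, a 6-byte item pointer); those widths, the 11-bit attribute-count mask, and all names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs; the widths are assumptions. */
typedef struct { uint16_t bi_hi, bi_lo; } SketchBlockId;
typedef struct { SketchBlockId ip_blkid; uint16_t ip_posid; } SketchItemPointer;

typedef struct {
    uint32_t t_xmin;
    uint32_t t_xmax;
    uint32_t t_field3;          /* t_cid or t_xvac */
    SketchItemPointer t_ctid;
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  t_hoff;
    uint8_t  t_bits[1];         /* NULL bitmap starts here */
} SketchTupleHeader;

#define SKETCH_NATTS_MASK 0x07FF  /* assumed: low 11 bits hold the attribute count */

/* Size of the fixed part of the header, up to the NULL bitmap. */
static size_t sketch_fixed_header_size(void)
{
    return offsetof(SketchTupleHeader, t_bits);
}

/* Extract the number of attributes from t_infomask2. */
static int sketch_natts(uint16_t infomask2)
{
    return infomask2 & SKETCH_NATTS_MASK;
}
```

On common ABIs the fixed part adds up to 12 + 6 + 2 + 2 + 1 = 23 bytes, matching the comment in the structure.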

11 / 25


Shared Memory Creation

[Diagram: the postmaster process (program text, data, shared memory, stack) calls fork() to create two postgres processes; each has its own program text, data, and stack, and the shared memory appears in all three]

12 / 25


Shared Memory

Shared Buffers

Proc Array

PROC

Multi−XACT Buffers

Two−Phase Structs

Subtrans Buffers

CLOG Buffers

XLOG Buffers

Shared Invalidation

Lightweight Locks

Lock Hashes

Auto Vacuum

Btree Vacuum

Buffer Descriptors

Background Writer

Synchronized Scan

Semaphores

Statistics

LOCK

PROCLOCK

13 / 25


Shared Buffers

[Diagram: 8k pages of /data/base/16385/24692 are copied into the shared buffers with read() and written back with write(); each 8K page holds a page header, items, tuples, and a special area]

Buffer Descriptors

LWLock - for page changes

Pin Count - prevent page replacement
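The pin count noted on this slide can be pictured as a per-buffer counter that must be zero before the page may be replaced. The sketch below is a single-threaded illustration with hypothetical names; a real buffer manager would update these fields atomically or under a lock.

```c
#include <stdbool.h>

typedef struct {
    int  block;      /* which 8k disk block this buffer holds, -1 if none */
    int  pin_count;  /* number of users currently holding the page */
    bool dirty;      /* page modified since it was read in */
} SketchBufferDesc;

/* A user pins the buffer before looking at the page. */
static void sketch_pin(SketchBufferDesc *d)
{
    d->pin_count++;
}

/* ...and unpins it when done. */
static void sketch_unpin(SketchBufferDesc *d)
{
    if (d->pin_count > 0)
        d->pin_count--;
}

/* Only unpinned buffers are candidates for replacement. */
static bool sketch_can_replace(const SketchBufferDesc *d)
{
    return d->pin_count == 0;
}
```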

14 / 25


HeapTuples

[Diagram: a Postgres process holds a HeapTuple, a C pointer into a page in the shared buffers]

hoff − length of tuple header

infomask − tuple flags

natts − number of attributes

ctid − tuple id (page / item)

cmax − destruction command id

xmin − creation transaction id

xmax − destruction transaction id

cmin − creation command id

bits − bit map representing NULLs

OID − object id of tuple (optional)

[Diagram: the tuple (a header followed by a series of values, annotated with int4in('9241'), textout(), and 'Martin') sits inside an 8K page (page header, items, tuples, special area) in the shared buffers]

15 / 25


Finding A Tuple Value in C

Datum
nocachegetattr(HeapTuple tuple,
               int attnum,
               TupleDesc tupleDesc,
               bool *isnull)
{
    HeapTupleHeader tup = tuple->t_data;
    Form_pg_attribute *att = tupleDesc->attrs;

    {
        int i;

        /*
         * Note - This loop is a little tricky. For each non-null attribute,
         * we have to first account for alignment padding before the attr,
         * then advance over the attr based on its length. Nulls have no
         * storage and no alignment padding either. We can use/set
         * attcacheoff until we reach either a null or a var-width attribute.
         */
        off = 0;
        for (i = 0;; i++)       /* loop exit is at "break" */
        {
            if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
                continue;       /* this cannot be the target att */

            if (att[i]->attlen == -1)
                off = att_align_pointer(off, att[i]->attalign, -1, tp + off);
            else
                /* not varlena, so safe to use att_align_nominal */
                off = att_align_nominal(off, att[i]->attalign);

            if (i == attnum)
                break;

            off = att_addlength_pointer(off, att[i]->attlen, tp + off);
        }
    }

    return fetchatt(att[attnum], tp + off);
}
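The align-then-advance loop is easier to follow in a reduced form. The sketch below handles only fixed-width, non-null attributes (no varlena and no NULL bitmap, both simplifying assumptions), and its names are hypothetical rather than PostgreSQL's.

```c
#include <stddef.h>

/* Round off up to the next multiple of align (align must be a power of 2). */
static size_t sketch_align(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Offset of attribute attnum (0-based) within the tuple data, where
 * attlen[i] and attalign[i] give the length and alignment of attribute i.
 */
static size_t sketch_attr_offset(const size_t *attlen,
                                 const size_t *attalign,
                                 int attnum)
{
    size_t off = 0;

    for (int i = 0;; i++) {
        off = sketch_align(off, attalign[i]);  /* padding before the attr */
        if (i == attnum)
            break;                             /* off now points at the target */
        off += attlen[i];                      /* advance over the attr */
    }
    return off;
}
```

With attributes of width 1, 4, 2, and 8 bytes, each aligned to its own width, the offsets come out as 0, 4, 8, and 16.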

16 / 25


Value Access in C

#define fetch_att(T,attbyval,attlen) \
( \
    (attbyval) ? \
    ( \
        (attlen) == (int) sizeof(int32) ? \
            Int32GetDatum(*((int32 *)(T))) \
        : \
        ( \
            (attlen) == (int) sizeof(int16) ? \
                Int16GetDatum(*((int16 *)(T))) \
            : \
            ( \
                AssertMacro((attlen) == 1), \
                CharGetDatum(*((char *)(T))) \
            ) \
        ) \
    ) \
    : \
    PointerGetDatum((char *) (T)) \
)
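The by-value branch of the macro amounts to reading 4, 2, or 1 bytes at the computed address and widening the result. A minimal sketch of that idea follows; it uses memcpy instead of pointer casts and returns a plain intptr_t rather than a Datum, both of which are choices made for this illustration, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a pass-by-value attribute of attlen bytes at p and widen it. */
static intptr_t sketch_fetch_byval(const void *p, int attlen)
{
    if (attlen == 4) {
        int32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }
    if (attlen == 2) {
        int16_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }
    /* Remaining case in this sketch: a single byte. */
    int8_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Pass-by-reference attributes need no such read: the macro simply returns the pointer itself.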

17 / 25


Test And Set Lock

Can Succeed Or Fail

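[Diagram: a 1 is exchanged with the lock value (0/1). Success: the value was 0 on exchange, and the lock now holds 1. Failure: the value was 1 on exchange (lock already taken), and the lock still holds 1.]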

18 / 25


Test And Set Lock

x86 Assembler

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /*
     * Use a non-locking test before asserting the bus lock. Note that the
     * extra test appears to be a small loss on some x86 platforms and a small
     * win on others; it's by no means clear that we should keep it.
     */
    __asm__ __volatile__(
        "   cmpb    $0,%1   \n"
        "   jne     1f      \n"
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        "1: \n"
:       "+q"(_res), "+m"(*lock)
:
:       "memory", "cc");
    return (int) _res;
}
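The same test-and-set can be expressed portably with C11 atomics. The sketch below is an illustration of the technique, not PostgreSQL's implementation; the type and function names are hypothetical.

```c
#include <stdatomic.h>

typedef atomic_uchar sketch_slock_t;

/* Returns 0 if we took the lock, nonzero if it was already taken. */
static int sketch_tas(sketch_slock_t *lock)
{
    /* Cheap non-locking test first, mirroring the cmpb/jne pair. */
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;

    /* Atomically store 1 and return the previous value, like lock xchgb. */
    return atomic_exchange(lock, 1);
}

/* Release the lock by storing 0. */
static void sketch_unlock(sketch_slock_t *lock)
{
    atomic_store(lock, 0);
}
```

A return value of 0 corresponds to "was 0 on exchange" (success), and a nonzero value to "was 1 on exchange" (failure).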

19 / 25


Spin Lock

Always Succeeds

[Diagram: a 1 is exchanged with the lock value (0/1). Success: the value was 0 on exchange. Failure: the value was 1 on exchange (lock already taken); retry after a sleep of increasing duration.]

Spinlocks are designed for short-lived locking operations, like access to control structures. They should not be used to protect code that makes kernel calls or other heavy operations.

20 / 25
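A spinlock that "always succeeds" is a test-and-set retried until it wins, sleeping longer after each failure. The sketch below shows that loop with C11 atomics and nanosleep(); the 1 ms starting delay, the doubling, and the 1 s cap are assumptions chosen for illustration, and the names are hypothetical.

```c
#include <stdatomic.h>
#include <time.h>

#define SKETCH_MIN_DELAY_US 1000L     /* assumed first sleep: 1 ms */
#define SKETCH_MAX_DELAY_US 1000000L  /* assumed cap: 1 s */

/* Acquire the lock, sleeping between attempts; returns the number of sleeps. */
static int sketch_spin_acquire(atomic_uchar *lock)
{
    long delay_us = SKETCH_MIN_DELAY_US;
    int  sleeps = 0;

    while (atomic_exchange(lock, 1) != 0) {   /* was 1: lock already taken */
        struct timespec ts;

        ts.tv_sec = delay_us / 1000000L;
        ts.tv_nsec = (delay_us % 1000000L) * 1000L;
        nanosleep(&ts, NULL);
        sleeps++;

        delay_us *= 2;                        /* sleep of increasing duration */
        if (delay_us > SKETCH_MAX_DELAY_US)
            delay_us = SKETCH_MAX_DELAY_US;
    }
    return sleeps;                            /* was 0: lock is now ours */
}

/* Release the lock by storing 0. */
static void sketch_spin_release(atomic_uchar *lock)
{
    atomic_store(lock, 0);
}
```

Since a waiter burns time until the holder releases, the protected section should stay as short as the slide's advice suggests.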


Light Weight Locks

Shared Buffers

Proc Array

PROC

Multi−XACT Buffers

Two−Phase Structs

Subtrans Buffers

CLOG Buffers

XLOG Buffers

Shared Invalidation

Lightweight Locks

Auto Vacuum

Btree Vacuum

Background Writer

Synchronized Scan

Semaphores

Statistics

LOCK

PROCLOCK

Lock Hashes

Buffer Descriptors

Sleep On Lock

Lightweight locks attempt to acquire the lock and go to sleep on a semaphore if the lock request fails. Spinlocks control access to the lightweight lock control structure.

21 / 25
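The arrangement described here, a spinlock guarding a small control structure plus a semaphore to sleep on, can be sketched as an exclusive-only lock. This is a simplified illustration using POSIX unnamed semaphores and C11 atomics; it has no shared mode and makes no fairness guarantee, and every name in it is hypothetical rather than taken from PostgreSQL.

```c
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool mutex;    /* spinlock guarding the fields below */
    bool        held;     /* is the lightweight lock taken? */
    int         waiters;  /* sleepers waiting for a wakeup */
    sem_t       wakeup;   /* semaphore the waiters sleep on */
} SketchLWLock;

static void sketch_spin_lock(atomic_bool *m)   { while (atomic_exchange(m, true)) { } }
static void sketch_spin_unlock(atomic_bool *m) { atomic_store(m, false); }

static void sketch_lwlock_init(SketchLWLock *l)
{
    atomic_init(&l->mutex, false);
    l->held = false;
    l->waiters = 0;
    sem_init(&l->wakeup, 0, 0);  /* pshared = 0: threads of one process only */
}

/* One attempt, without sleeping; returns true if the lock was acquired. */
static bool sketch_lwlock_try(SketchLWLock *l)
{
    sketch_spin_lock(&l->mutex);
    bool got = !l->held;
    if (got)
        l->held = true;
    sketch_spin_unlock(&l->mutex);
    return got;
}

/* Acquire, sleeping on the semaphore whenever the attempt fails. */
static void sketch_lwlock_acquire(SketchLWLock *l)
{
    for (;;) {
        sketch_spin_lock(&l->mutex);
        if (!l->held) {
            l->held = true;
            sketch_spin_unlock(&l->mutex);
            return;
        }
        l->waiters++;                        /* register before sleeping */
        sketch_spin_unlock(&l->mutex);
        while (sem_wait(&l->wakeup) != 0) { } /* retry if interrupted */
    }
}

/* Release and wake one sleeper, if any; the sleeper then retries. */
static void sketch_lwlock_release(SketchLWLock *l)
{
    sketch_spin_lock(&l->mutex);
    l->held = false;
    bool wake = l->waiters > 0;
    if (wake)
        l->waiters--;
    sketch_spin_unlock(&l->mutex);
    if (wake)
        sem_post(&l->wakeup);
}
```

The spinlock is held only for the few instructions that inspect or update the control fields, while the potentially long wait happens on the semaphore.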


Database Object Locks

PROCLOCK PROC LOCK

Lock Hashes

22 / 25


Proc

Proc Array

used used used empty empty empty

PROC

23 / 25


Other Shared Memory Structures

Shared Buffers

Proc Array

PROC

Multi−XACT Buffers

Two−Phase Structs

Subtrans Buffers

CLOG Buffers

XLOG Buffers

Shared Invalidation

Lightweight Locks

Lock Hashes

Auto Vacuum

Btree Vacuum

Buffer Descriptors

Background Writer

Synchronized Scan

Semaphores

Statistics

LOCK

PROCLOCK

24 / 25


Conclusion

http://momjian.us/presentations https://www.flickr.com/photos/john_getchel/

25 / 25