
Updates in MonetDB/XQuery
Database Techniek: XML Lecture (Part 2)

Peter Boncz (CWI)

Sjoerd Mullender: update actions
Jens Teubner: XQUF parsing
Niels Nes: logging
Stefan Manegold: the rest

everything you always wanted to know about Updates in MonetDB/XQuery but were afraid to ask


Overview

• XQuery Update Facility (XQUF)
  • semantics & the update tape
• Updatable XML storage in BATs
  • maintaining order in an array without O(N) cost
• Snapshot Isolation
  • why we want it, how we got it
• Concurrency Control
  • optimistic, with "abort convoys"
• Durability
  • physical logging
• Conclusion & Future Challenges


XQuery Update Facility (XQUF)

January 2006: first proposal

Internal primitives:
• upd:insertBefore
• upd:insertAfter
• upd:insertInto
• upd:insertIntoAsLast
• upd:insertAttributes
• upd:delete
• upd:replaceValue
• upd:rename

Pending update list concept

upd:applyUpdates


Example

insert
  <item id="{id}">
    <location>Brazil</location>
    <quantity>200</quantity>
    <name>XML in a nutshell</name>
    <payment>Credit Card, Personal check</payment>
    <shipping>Will ship internationally</shipping>
    <incategory category="category1"/>
  </item>
as last into
  fn:doc("xmark.xml")/site/regions/samerica


Semantics

let $root := doc("foo.xml")
for $i in (1,2,3)
return
  (do insert <x>{$i}</x> as first into $root,
   do insert <y>{$i}</y> as first into $root)

We need to
• define an execution order, and
• enforce it
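
To see why the order has to be pinned down, here is a minimal Python sketch (a toy model, not Pathfinder or XQUF code). It treats the root as a plain list of children and applies the same six "as first" inserts in two different orders. The helper names and the second, "grouped" order are assumptions made up for illustration.

```python
def insert_as_first(children, new_child):
    """Return the child list after inserting new_child as the first child."""
    return [new_child] + children


def apply_in_order(inserts):
    """Apply a list of 'as first' inserts to an initially empty root, one by one."""
    children = []
    for new_child in inserts:
        children = insert_as_first(children, new_child)
    return children


# Order 1: follow the query text (per iteration: x, then y).
query_order = [f"<{tag}>{i}</{tag}>" for i in (1, 2, 3) for tag in ("x", "y")]
# Order 2: a hypothetical engine that runs all x-inserts before all y-inserts.
grouped_order = [f"<{tag}>{i}</{tag}>" for tag in ("x", "y") for i in (1, 2, 3)]

result_query_order = apply_in_order(query_order)
result_grouped_order = apply_in_order(grouped_order)
print(result_query_order)
print(result_grouped_order)
```

The two runs insert the same elements but leave the children in a different document order, so an engine that is free to reorder evaluation must still apply the updates in one agreed order.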


The Update Tape

update = sequence (int, node, node/str, node/str)

fn:delete()         (DELETE, node, nil, nil)
fn:insert_*()       (INSERT, tgt-node, tgt-level, expr-node)
fn:set-attr()       (ATTR, node, qn, val)
fn:unset-attr()     (ATTR, node, qn, nil)
fn:set-text()       (TEXT, node, val, nil)
fn:set-pi()         (PI, node, ins-val, arg-val)
fn:set-comment()    (COMMENT, node, val, nil)

( element construction ), which combines updates, enforces the correct order of the update tape.

The Pathfinder compiler automatically inserts a call to fn:update(item*) on the result of every update query.
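
The tape idea can be sketched in Python as follows. This is an illustrative model only: the numeric action codes, the dict-based document, and the Python function names are assumptions, and only three of the tape actions are modelled. Each update function returns a short sequence of 4-tuples, sequences are concatenated in query order, and a single `fn_update` call replays the tape from front to back.

```python
DELETE, ATTR, TEXT = 0, 1, 2  # hypothetical action codes


def fn_delete(node):
    return [(DELETE, node, None, None)]


def fn_set_attr(node, qn, val):
    return [(ATTR, node, qn, val)]


def fn_unset_attr(node, qn):
    return [(ATTR, node, qn, None)]  # nil value means "remove the attribute"


def fn_set_text(node, val):
    return [(TEXT, node, val, None)]


def fn_update(tape, doc):
    """Replay the tape in order against doc (node id -> {'attrs': ..., 'text': ...})."""
    for kind, node, arg1, arg2 in tape:
        if kind == DELETE:
            doc.pop(node, None)
        elif kind == ATTR:
            if arg2 is None:
                doc[node]["attrs"].pop(arg1, None)
            else:
                doc[node]["attrs"][arg1] = arg2
        elif kind == TEXT:
            doc[node]["text"] = arg1
    return doc


doc = {"n1": {"attrs": {}, "text": ""}, "n2": {"attrs": {}, "text": "old"}}
tape = (fn_set_attr("n1", "id", "7") + fn_set_text("n1", "hi")
        + fn_unset_attr("n1", "id") + fn_delete("n2"))
result = fn_update(tape, doc)
print(result)
```

Because the set-attr entry precedes the unset-attr entry on the tape, the attribute ends up absent; swapping the two entries would leave it present.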


XPath Accelerator [SIGMOD02]

Node-based relational encoding of XQuery's data model

<a>
  <b>
    <c> <d/> <e/> </c>
  </b>
  <f>
    <g/>
    <h> <i/> <j/> </h>
  </f>
</a>

node  pre  post
a     0    9
b     1    3
c     2    2
d     3    0
e     4    1
f     5    8
g     6    4
h     7    7
i     8    5
j     9    6

[Figure: the pre/post plane around a context node, partitioned into its descendant, ancestor, following and preceding regions]
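
The encoding can be reproduced with a short Python sketch. It assigns pre and post ranks in one depth-first walk and then answers the four axes with plain rank comparisons. The function names are illustrative assumptions, and keying the result by tag name only works because the tags in this example document are unique.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><c><d/><e/></c></b><f><g/><h><i/><j/></h></f></a>"


def pre_post(xml_text):
    """Return {tag: (pre, post)} for a document whose tags are unique."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # rank when the node is entered
        counters["pre"] += 1
        for child in node:
            visit(child)
        ranks[node.tag] = (pre, counters["post"])  # rank when the node is left
        counters["post"] += 1

    visit(ET.fromstring(xml_text))
    return ranks


def axis(ranks, context, name):
    """Evaluate one of the four major axes with rank comparisons only."""
    cpre, cpost = ranks[context]
    tests = {
        "descendant": lambda pre, post: pre > cpre and post < cpost,
        "ancestor": lambda pre, post: pre < cpre and post > cpost,
        "following": lambda pre, post: pre > cpre and post > cpost,
        "preceding": lambda pre, post: pre < cpre and post < cpost,
    }
    return sorted(tag for tag, (pre, post) in ranks.items() if tests[name](pre, post))


ranks = pre_post(DOC)
print(ranks)
print(axis(ranks, "f", "descendant"), axis(ranks, "f", "preceding"))
```

With context node f, the descendant test selects g, h, i, j and the preceding test selects b, c, d, e, which matches the four regions of the plane.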


XML Storage Revisited

node  pre  size  level  post
a     0    9     0      9
b     1    3     1      3
c     2    2     2      2
d     3    0     3      0
e     4    0     3      1
f     5    4     1      8
g     6    0     2      4
h     7    2     2      7
i     8    0     3      5
j     9    0     3      6

post = pre + size - level
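
The identity holds because a node's post rank counts the nodes that close before it: the pre(v) nodes that open before v, minus its level(v) ancestors (which close later), plus its size(v) descendants (which close earlier). A small Python check over the rows of the table, with hypothetical names:

```python
# (node, pre, size, level, post), copied from the table on the slide
ROWS = [
    ("a", 0, 9, 0, 9), ("b", 1, 3, 1, 3), ("c", 2, 2, 2, 2), ("d", 3, 0, 3, 0),
    ("e", 4, 0, 3, 1), ("f", 5, 4, 1, 8), ("g", 6, 0, 2, 4), ("h", 7, 2, 2, 7),
    ("i", 8, 0, 3, 5), ("j", 9, 0, 3, 6),
]


def post_rank(pre, size, level):
    """Derive the post rank from the pre/size/level encoding."""
    return pre + size - level


mismatches = [node for node, pre, size, level, post in ROWS
              if post_rank(pre, size, level) != post]
print(mismatches)
```

An empty mismatch list means the post column never has to be stored; it can be derived whenever an axis test needs it.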


Updates: Mission Impossible?

INSERT SUBTREE I (|I| nodes):
• ancestor nodes: SIZE + |I|
• following nodes: PRE + |I|

size(following) = O(N): killer (?)

[Figure: the pre/post plane with its descendant, ancestor, following and preceding regions, annotated with INSERT SUBTREE]
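
A naive array implementation makes the cost visible. The Python sketch below is an assumption-laden toy: the table is a list of (size, level) pairs indexed by pre, and the subtree is inserted as the last child of a given node. It returns how many ancestor sizes were bumped and how many following tuples had to shift.

```python
# (size, level) indexed by pre, for nodes a..j of the example document
TABLE = [(9, 0), (3, 1), (2, 2), (0, 3), (0, 3), (4, 1), (0, 2), (2, 2), (0, 3), (0, 3)]


def insert_subtree(table, parent_pre, subtree):
    """Insert subtree (list of (size, relative level)) as last child of parent_pre.

    Returns (number of ancestor-or-self sizes updated, number of tuples shifted).
    """
    count = len(subtree)
    position = parent_pre + table[parent_pre][0] + 1   # first pre after the parent's subtree
    child_level = table[parent_pre][1] + 1
    updated = 0
    for pre in range(parent_pre + 1):
        size, level = table[pre]
        if pre + size >= parent_pre:                   # pre is parent_pre or one of its ancestors
            table[pre] = (size + count, level)         # SIZE + |I|
            updated += 1
    shifted = len(table) - position                    # every following tuple gets PRE + |I|
    table[position:position] = [(s, child_level + l) for s, l in subtree]
    return updated, shifted


table = list(TABLE)
# Insert a two-node subtree <k><l/></k> as last child of c (pre 2).
updated, shifted = insert_subtree(table, 2, [(1, 0), (0, 1)])
print(updated, shifted, table)
```

Only three sizes change (c, b and a), but all five tuples of f's subtree move. The size updates are bounded by the tree depth, while the shift grows with the document.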


XML Storage Revisited

Allow holes
Define logical pages

pre view (with holes):

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

rid table:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9


XML Storage Revisited

Allow holes
Define logical pages

page map (logical page → physical page):

logical  physical
0        0
1        2
2        1

rid = pre.swizzle( )

rid table (physical order):

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5
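
The swizzle step can be sketched in Python as a page-number substitution. The page size of 4 tuples is an assumption inferred from the 12-tuple, 3-page example, and the function signature is illustrative rather than the MIL one. Reading the physical rid table through the page map restores document order.

```python
PAGE_SIZE = 4          # assumption: inferred from the 12-tuple, 3-page example
PAGE_MAP = [0, 2, 1]   # logical page -> physical page, as in the page map above

# nid column of the rid table in physical order (None marks an unused tuple)
RID_NID = ["N0", "N1", None, None, "N6", "N7", "N8", "N9", "N2", "N3", "N4", "N5"]


def swizzle(pre, page_map=PAGE_MAP, page_size=PAGE_SIZE):
    """Translate a logical position (pre) into a physical position (rid)."""
    logical_page, offset = divmod(pre, page_size)
    return page_map[logical_page] * page_size + offset


document_order = [RID_NID[swizzle(pre)] for pre in range(len(RID_NID))]
print(document_order)
```

Inserting a new logical page then means appending a physical page to the rid table and splicing one entry into the page map, instead of shifting every following tuple.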


XML Storage Revisited

Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column

MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join

Not stored → no need to update it either


XML Storage Revisited

Updatable document collection:
- pf:add-doc(URI, docname, perc>0)
- pf:add-doc(URI, docname, collname, perc>0)

pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid)

Read-only document collection:
- pf:add-doc(URI, docname, 0)
- pf:add-doc(URI, docname, collname, 0)

NID = RID = PRE, so
pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid) = FREE!!

pre  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9
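
The nid-to-pre lookup chain can be sketched in Python under a few assumptions. `leftfetchjoin` is modelled as a positional array lookup, `map_pid` is taken to map physical pages back to logical pages, and the page size is 4. The `NID_RID` values are derived from the permuted rid table shown earlier in the deck.

```python
PAGE_SIZE = 4                                  # assumption, as in the earlier example
NID_RID = [0, 1, 8, 9, 10, 11, 4, 5, 6, 7]     # position n holds the rid of node Nn
MAP_PID = [0, 2, 1]                            # physical page -> logical page (assumed direction)


def nid_to_pre(nid, nid_rid=NID_RID, map_pid=MAP_PID):
    """Model of nid.leftfetchjoin(nid_rid).swizzle(map_pid) for one node id."""
    rid = nid_rid[nid]                                   # positional fetch: nid -> rid
    physical_page, offset = divmod(rid, PAGE_SIZE)
    return map_pid[physical_page] * PAGE_SIZE + offset   # rid -> pre


updatable = [nid_to_pre(n) for n in range(10)]

# Read-only collection: both mappings are the identity, so the lookup is a no-op.
read_only = [nid_to_pre(n, nid_rid=list(range(12)), map_pid=[0, 1, 2]) for n in range(12)]
print(updatable, read_only)
```

In the read-only case every lookup returns its input, which is why the slide calls the translation free. In this particular example the map `[0, 2, 1]` happens to be its own inverse, so the same list serves both directions.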


Snapshot Isolation

Versus 2-phase locking (2PL) == full serializability

Why not 2PL for XML:
• lock semantics much more complex than in the relational case (order matters!!)
• node-level locking in staircase join?? (now 10 cycles/node…)

Why Snapshot Isolation:
• great for read-queries, great for ll_scj (runs unmodified)
• quite strong. Better than repeatable read. Oracle/Postgres do it.

Problem with Snapshot Isolation:
• in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))


Snapshot Isolation

Read Query 1, Read Query 2 and Update Query each map their own copy of the same table:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9

Isolation By Shadow Paging (copy-on-write mmap)
• rid/pre delete/insert + attr-replace
  • Touch one byte per physical page: *addr = *addr;
  • MMU traps, OS replaces page by a copy
• we would like to replace the master copy once, not all client copies
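
The copy-on-write behaviour can be tried from Python with the standard `mmap` module, whose `ACCESS_COPY` mode gives a private mapping: writes change memory but not the underlying file. This is a sketch under assumptions, not MonetDB code. The temporary file stands in for a master table, and the self-assignment mirrors the `*addr = *addr` touch.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def cow_demo():
    """Return what a reader, the updater, and the file see after an isolated write."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * (2 * PAGE))                      # the "master" table
        reader = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)   # a read query
        updater = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)  # private, copy-on-write
        try:
            updater[PAGE:PAGE + 1] = updater[PAGE:PAGE + 1]  # touch: like *addr = *addr
            updater[PAGE:PAGE + 1] = b"B"                    # the actual update
            seen = (reader[PAGE:PAGE + 1], updater[PAGE:PAGE + 1])
        finally:
            reader.close()
            updater.close()
        os.lseek(fd, PAGE, os.SEEK_SET)
        on_disk = os.read(fd, 1)                             # master is untouched
        return seen + (on_disk,)
    finally:
        os.close(fd)
        os.remove(path)


reader_sees, updater_sees, file_has = cow_demo()
print(reader_sees, updater_sees, file_has)
```

The updater sees its own change while the reader and the file keep the old byte, which is the isolation property the slide relies on. Propagating the change to the master is a separate, explicit step.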

[Animation steps shown on the Snapshot Isolation slide: first "Isolate-page", then "Master-update"]


Durability

Masters become dirty
• no time to flush them during the query
• log all changes to a WAL
  = log all tuples that changed = entire pages

Recovery:
• after a crash, we do not know whether dirty pages got saved
• solution: overwrite tables with values from the WAL

Checkpointing Thread:
• every 5 minutes, if 'many' changes occurred, checkpoint
• memory-mapped BATs are sync()-ed; only dirty pages get written
• checkpoint locks the collection, halts query processing
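
The logging and recovery rules can be sketched in Python as a toy in-memory model. The `Store` class, its field names, and the page size of 4 are assumptions, and nothing here touches a real disk. A commit logs whole page images before changing the master, recovery blindly overwrites pages from the log, and a checkpoint flushes the master and empties the log.

```python
PAGE = 4  # tuples per page (assumption for the example)


class Store:
    def __init__(self, values):
        self.disk = list(values)     # what is found on disk after a crash
        self.master = list(values)   # in-memory master copy, may be dirty
        self.wal = []                # durable log of (page number, page image)

    def commit(self, changes):
        """changes: {position: new value}. Log entire changed pages, then update."""
        images = {}
        for pos, val in changes.items():
            page = pos // PAGE
            image = images.setdefault(page, self.master[page * PAGE:(page + 1) * PAGE])
            image[pos % PAGE] = val
        for page, image in sorted(images.items()):
            self.wal.append((page, list(image)))          # write-ahead: log first
        for page, image in images.items():
            self.master[page * PAGE:(page + 1) * PAGE] = image

    def checkpoint(self):
        self.disk = list(self.master)                     # sync dirty pages
        self.wal.clear()

    def recover(self):
        """After a crash: reload from disk, then overwrite with the WAL images."""
        self.master = list(self.disk)
        for page, image in self.wal:
            self.master[page * PAGE:(page + 1) * PAGE] = image


store = Store(range(8))
store.commit({5: 50})
logged = list(store.wal)
store.disk[5] = 999          # crash: we cannot know what state the dirty page was in
store.recover()
recovered = list(store.master)
print(logged, recovered)
```

Because recovery overwrites whole pages, it does not matter whether the dirty page reached the disk partially, fully, or not at all.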


The Update Sequence

Execute Query
• build update tape
• queries get isolated copies of a document (VM copy-on-write mmap)

Prepare Intensional Updates
• execute update tape
• does not modify masters (except append-only tables)

Commit Phase (locked phase, per doc-collection)
• precommit
  • detect conflicts (not the size-ancestors)
• write WAL (globally locked)
  • read master-size-ancestors, use delta, log result
• update master tables
  • isolate first! Only then update masters.
• update index structures
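
The "read master-size-ancestors, use delta, log result" step can be sketched in Python. The function and the dict-based tables are illustrative assumptions. Each transaction carries size deltas for the ancestors it affected, and at commit time those deltas are added to the current master values rather than overwriting them with values from the transaction's snapshot.

```python
def commit_sizes(master_size, deltas, wal):
    """Apply per-node size deltas to the master and log the resulting values."""
    for node, delta in deltas.items():
        new_size = master_size[node] + delta   # read master, use delta
        wal.append((node, new_size))           # log the result (an absolute value)
        master_size[node] = new_size


master_size = {"a": 9, "b": 3, "f": 4}
wal = []

# Two transactions that started from the same snapshot (a=9):
tx1 = {"a": +2, "b": +2}   # inserted 2 nodes below b
tx2 = {"a": +1, "f": +1}   # inserted 1 node below f

commit_sizes(master_size, tx1, wal)
commit_sizes(master_size, tx2, wal)
print(master_size, wal)
```

Both transactions change the root's size, yet neither loses the other's contribution, which is why the size column of ancestors can be left out of conflict detection.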


Many more Issues Solved

Conflicting Updates
• detect conflicting queries:
  • look at RID page numbers and attr-IDs
• reacting to conflicts:
  • abort query + automatic restart
  • run CONVOY of 5 next update queries serially
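
A minimal Python sketch of the optimistic scheme, with hypothetical names throughout: a transaction's write set records the RID page numbers and attribute ids it touched, a conflict is any overlap with a write set committed while it ran, and a conflicting query is simply restarted. The convoy mechanism is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    pages: set = field(default_factory=set)      # RID page numbers touched
    attr_ids: set = field(default_factory=set)   # attribute ids touched


def conflicts(mine, theirs):
    """Two write sets conflict if they share a page number or an attribute id."""
    return bool(mine.pages & theirs.pages or mine.attr_ids & theirs.attr_ids)


def run_with_restart(attempts):
    """attempts: one (my write set, concurrently committed write sets) pair per run.

    Returns the number of the run that committed.
    """
    for run_number, (mine, committed) in enumerate(attempts, start=1):
        if not any(conflicts(mine, other) for other in committed):
            return run_number                    # commit
        # otherwise: abort and restart automatically with the next attempt
    raise RuntimeError("query kept conflicting")


first = WriteSet(pages={3})
second = WriteSet(pages={4}, attr_ids={17})
third = WriteSet(pages={4})

committed_on_run = run_with_restart([(third, [second]), (third, [])])
print(conflicts(first, second), conflicts(second, third), committed_on_run)
```

Comparing page numbers is coarser than comparing individual nodes, so it can report false conflicts, but it keeps the check cheap.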

Indexing and Updates
• Runtime QN → NID mapping, with hash table
  • read-only: not a hash, but keep sorted & persistent
  • keep INS + DEL deltas to commit without changing the hash table
• Runtime NID → ATTR hash table
  • isolation loses you MonetDB dynamic hash table reuse
  • share an old copy, exploit append-mostly

ACID properties on the Meta Level
• Shredding a new doc into a collection ↔ Query
• Shredding a new doc into a collection ↔ Update
• Using a collection ↔ Deleting/adding documents
• Meta Querying ↔ Deleting/adding documents

Concurrency
• Updates ↔ Checkpoint
• Shredding ↔ Query
• Shredding ↔ Updates

Allocating New Pages and NIDs
• Offload shredding interference with freelist
• Unlocked access to private pages


Snapshot Isolation

2PL (++): 375 transactions / 5 minutes ≈ 1.25 transactions/sec


Conclusions

It works! Reasonable/good performance!
• transaction mgmt as a module extension outside a kernel works
• identified VM primitives that databases really need

Future work:
• Test on XML update benchmark TPoX (DB2: 700 trans/second)
• Packed Memory Arrays: alternative for page remapping?
  • page remapping is technically O(N)
• Engineering:
  • support for value-indexing (does PF support it already?)
  • asynchronous WAL writing to boost throughput
  • port MIL to C primitives; port C primitives to Monet5