Sedna: A Native XML DBMS


Andrey Fomichev

Maxim Grinev

Sergey Kuznetsov

Institute for System Programming of RAS

SOFSEM 2006

23 January

Agenda

Sedna overview and goals

Data organization

Memory management

Query evaluation

Conclusion

Challenges

Fernandez, M.F., Siméon, J.: Growing XQuery. ECOOP 2003

Extending XQuery with data update facilities

Growing XQuery into a programming language

A physical layer that supports these aspects is required. The layer is primarily based on

Data structures

Memory management

Sedna Overview

Full-featured database system (external and main memory management, query and update facilities, concurrency etc.)

Native XML database

Based on the XQuery language and the XQuery/XPath data model

XUpdate language

Implemented in Scheme and C/C++

Supported platforms are Windows and Linux

Data Organization

A descriptive-schema-driven storage strategy is used: nodes of an XML document are clustered according to their position in the descriptive schema

Direct pointers are used to represent relationships between nodes of an XML document, such as parent, child, and sibling relationships

Descriptive Schema (Data Guide)

<library>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  . . .
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  . . .
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>

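Figure: descriptive schema tree for the document above, with node labels library; book, paper; title, author, issue; publisher, year; title, book.

Figure: the path /child::library/child::book/child::title selects the schema nodes library, book, title.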
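As an illustration of what a descriptive schema contains, the following minimal sketch collects every distinct root-to-element label path of a document exactly once. It uses Python's standard `xml.etree.ElementTree` parser on an abbreviated copy of the library document; the function name `schema_paths` is hypothetical and this is not Sedna's implementation.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the library document (assumption: three entries suffice).
DOC = """
<library>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
</library>
""".strip()


def schema_paths(root):
    """Return each distinct label path of the document once, in first-seen order."""
    paths = []

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        if path not in paths:      # every path appears in the schema only once
            paths.append(path)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return paths


paths = schema_paths(ET.fromstring(DOC))
for path in paths:
    print("/" + "/".join(path))
```

However many books the document holds, the schema keeps a single node for /library/book/author, which is what makes it usable as a clustering key.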

Data Structures

Figure: blocks of node descriptors for the schema node title. Labeled parts: node handle, indirection table, children “by descriptive schema”, next-in-block, prev-in-block, left-sibling, right-sibling, parent, label.
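The fields named in the figure can be sketched as a record. The sketch below is an assumption-laden illustration, not Sedna's actual layout: the `name` field, the `first_child` dictionary (one pointer per child schema node), the dotted label strings, and the helpers `link_children` and `children_named` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class NodeDescriptor:
    name: str                                   # simplification: schema node name kept in the descriptor
    label: str                                  # numbering-scheme label (hypothetical format)
    handle: int                                 # this node's slot in the indirection table
    parent: Optional[int] = None                # parent is referenced by handle, not by address
    left_sibling: Optional["NodeDescriptor"] = None
    right_sibling: Optional["NodeDescriptor"] = None
    next_in_block: Optional["NodeDescriptor"] = None   # order of descriptors inside a block
    prev_in_block: Optional["NodeDescriptor"] = None
    # Children "by descriptive schema": assumed to be one pointer per child schema node.
    first_child: Dict[str, "NodeDescriptor"] = field(default_factory=dict)


def link_children(parent, children):
    """Wire up parent, sibling, and first-child pointers for children in document order."""
    prev = None
    for child in children:
        child.parent = parent.handle
        child.left_sibling = prev
        if prev is not None:
            prev.right_sibling = child
        parent.first_child.setdefault(child.name, child)
        prev = child


def children_named(parent, name):
    """Follow the first-child pointer, then right-sibling pointers."""
    node = parent.first_child.get(name)
    found = []
    while node is not None:
        if node.name == name:
            found.append(node)
        node = node.right_sibling
    return found


book = NodeDescriptor("book", "1", 0)
link_children(book, [
    NodeDescriptor("title", "1.1", 1),
    NodeDescriptor("author", "1.2", 2),
    NodeDescriptor("author", "1.3", 3),
    NodeDescriptor("author", "1.4", 4),
])
author_labels = [n.label for n in children_named(book, "author")]
print(author_labels)
```

Under this assumption the number of child pointers depends on the schema rather than on the number of children, which keeps the descriptor size fixed.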

Structural query efficiency

When we answer structural queries like

/child::library/child::book/child::title

We read only the blocks that contain the necessary information and do not read other blocks

Every block that is read contains only nodes that belong to the answer
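The effect of clustering on a structural query can be sketched with a toy store that keeps one chain of blocks per schema path. The class `Storage`, the constant `BLOCK_CAPACITY`, and the sample values are hypothetical; the point is only that answering the query touches the blocks of one schema node and nothing else.

```python
from collections import defaultdict

BLOCK_CAPACITY = 2  # hypothetical: descriptors per block


class Storage:
    def __init__(self):
        self.blocks = defaultdict(list)   # schema path -> chain of blocks
        self.blocks_read = 0

    def insert(self, path, value):
        chain = self.blocks[path]
        if not chain or len(chain[-1]) == BLOCK_CAPACITY:
            chain.append([])              # start a new block for this schema node
        chain[-1].append(value)

    def select(self, path):
        """Answer a structural query by scanning only the blocks of one schema node."""
        result = []
        for block in self.blocks.get(path, []):
            self.blocks_read += 1
            result.extend(block)
        return result


store = Storage()
for i in range(3):
    store.insert(("library", "book", "title"), f"book title {i}")
for i in range(4):
    store.insert(("library", "book", "author"), f"author {i}")
for i in range(2):
    store.insert(("library", "paper", "title"), f"paper title {i}")

titles = store.select(("library", "book", "title"))
total_blocks = sum(len(chain) for chain in store.blocks.values())
print(titles, store.blocks_read, total_blocks)
```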

Node update efficiency

Node descriptors have a fixed size within a block

Node descriptors are partially ordered

Immutable numbering scheme

Indirection table for parents

Figure: a node with its left-sibling and right-sibling pointers and its child nodes; the parent reference goes through the indirection table.
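Why an indirection table helps with updates can be shown with a small sketch. The class `IndirectionTable`, its method names, and the address tuples are hypothetical. Children store the parent's handle, so relocating the parent descriptor (for example, when a block is split) changes one table entry instead of every child.

```python
class IndirectionTable:
    def __init__(self):
        self._slots = []

    def register(self, address):
        """Store a descriptor address and return its stable handle."""
        self._slots.append(address)
        return len(self._slots) - 1

    def resolve(self, handle):
        return self._slots[handle]

    def relocate(self, handle, new_address):
        """The descriptor moved: update the single table entry."""
        self._slots[handle] = new_address


table = IndirectionTable()
parent_handle = table.register(("block 7", 0))      # hypothetical (block, offset) address

# Each child keeps the parent's handle, not its physical address.
children = [{"label": f"1.{i}", "parent": parent_handle} for i in range(1000)]

table.relocate(parent_handle, ("block 9", 3))        # one write, no child is touched

resolved = {table.resolve(child["parent"]) for child in children}
print(resolved)
```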

Memory Management

Pointers are used to represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation should be efficient

The database address space should be large enough to represent large volumes of data

OS memory management restrictions

Restriction on the size of the address space caused by the 32-bit architecture that prevails nowadays

We cannot control the page replacement (swapping) procedure

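Layered Address Space (LAS)

Figure: Layered Address Space architecture. Components: Layered Address Space, OS virtual process address space, transaction process, buffer manager, buffer memory, external memory (disk). A LAS pointer (layer, addr) corresponds to addr in the process address space and to layer * LAYER_SIZE + addr in external memory. Mapping calls: MapViewOfFile (Windows), mmap (Linux). Buffer memory calls: VirtualLock (Windows), mlock (Linux).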
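The address arithmetic from the figure can be written out directly. In this sketch the value of `LAYER_SIZE` and the function names `to_external` and `to_layered` are assumptions chosen for illustration; only the formula layer * LAYER_SIZE + addr comes from the slide.

```python
LAYER_SIZE = 2 ** 30  # hypothetical layer size in bytes


def to_external(layer, addr):
    """Map a LAS pointer (layer, addr) to an offset in external memory."""
    if not 0 <= addr < LAYER_SIZE:
        raise ValueError("addr must lie inside one layer")
    return layer * LAYER_SIZE + addr


def to_layered(offset):
    """Inverse mapping: recover (layer, addr) from an external-memory offset."""
    return divmod(offset, LAYER_SIZE)


offset = to_external(3, 0x1000)
print(offset, to_layered(offset))

# Five layers already exceed what a single 32-bit address can reach.
print(to_external(5, 0) > 2 ** 32)
```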

Sedna Memory Management Benefits

Emulating a 64-bit virtual address space on the standard 32-bit architecture removes restrictions on the size of the database

Pointer dereferencing in LAS is comparable to dereferencing an ordinary pointer in a low-level programming language, because we map the layer to the process virtual address space on an equality basis

The same pointer representation is used in main and secondary memory, which avoids costly pointer swizzling
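The following simulation sketches how dereferencing might behave under these properties. Everything here is a simplified assumption: the tiny `PAGE_SIZE`, the dictionaries standing in for external memory and for the mapped region, and the class `LayeredAddressSpace` are hypothetical. A pointer is stored as the same 8-byte (layer, addr) pair on disk and in memory, and a read is a plain lookup at addr unless another layer currently occupies that address.

```python
import struct

PAGE_SIZE = 16                      # hypothetical, tiny for readability
POINTER = struct.Struct("<II")      # (layer, addr): identical on disk and in memory


class LayeredAddressSpace:
    def __init__(self, disk):
        self.disk = disk            # {(layer, page_no): bytes}, stands in for external memory
        self.resident = {}          # page_no -> (layer, bytes), stands in for the mapped region
        self.faults = 0

    def read(self, layer, addr, size):
        page_no, offset = divmod(addr, PAGE_SIZE)
        slot = self.resident.get(page_no)
        if slot is None or slot[0] != layer:
            # Slow path: the wanted layer is not mapped at this address yet.
            self.faults += 1
            slot = (layer, self.disk[(layer, page_no)])
            self.resident[page_no] = slot
        # Fast path: addr is used as is, on an equality basis.
        return slot[1][offset:offset + size]

    def read_pointer(self, layer, addr):
        """Decode a stored pointer; no conversion (swizzling) step is needed."""
        return POINTER.unpack(self.read(layer, addr, POINTER.size))


disk = {
    (0, 0): POINTER.pack(5, 16) + bytes(8),   # page holding a pointer to layer 5, addr 16
    (5, 1): b"Codd" + bytes(12),              # page holding the target data
}
las = LayeredAddressSpace(disk)
pointer = las.read_pointer(0, 0)
value = las.read(*pointer, 4)
las.read(*pointer, 4)                          # second access hits the fast path
print(pointer, value, las.faults)
```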

Query Evaluation Aspects

Suspended element constructors

Different strategies for XPath query evaluation

Combining Lazy and Strict Semantics

Element constructors

XML element construction requires a deep copy of the element's content, so the operation is expensive

Suspended element constructors: the copy is performed on demand, when some operation navigates into the constructed element
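A suspended constructor can be sketched as an object that keeps a reference to its source content and copies it only on first access. The class `SuspendedElement` and its `copies_made` counter are hypothetical and exist only to make the deferred copy observable.

```python
import copy


class SuspendedElement:
    def __init__(self, name, content):
        self.name = name
        self._source = content      # reference only: construction copies nothing
        self._copy = None
        self.copies_made = 0

    def children(self):
        """Navigating into the element triggers the deep copy, at most once."""
        if self._copy is None:
            self._copy = copy.deepcopy(self._source)
            self.copies_made += 1
        return self._copy


stored_authors = [{"author": "Abiteboul"}, {"author": "Hull"}, {"author": "Vianu"}]
elem = SuspendedElement("authors", stored_authors)
copies_before = elem.copies_made     # still 0: nothing has looked inside yet

kids = elem.children()               # first navigation performs the copy
elem.children()                      # later navigation reuses it

stored_authors[0]["author"] = "changed"   # the copy is independent of the source
print(copies_before, elem.copies_made, kids[0]["author"])
```

If the constructed element is never inspected, the copy is never paid for.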

Different strategies for XPath query evaluation

Figure: descriptive schema tree (node labels library; book, paper; title, author, issue; publisher, year; title, book) with the schema nodes year and book marked.

/library/book[issue/year=2004]

/library/book/issue/year[.=2004]/../..
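The two formulations /library/book[issue/year=2004] and /library/book/issue/year[.=2004]/../.. return the same books but suggest different evaluation orders. The sketch below contrasts them using Python's standard `xml.etree.ElementTree`; the function names `top_down` and `bottom_up` are hypothetical, the sample data is abbreviated, and the comparison with 2004 is simplified to a string comparison.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book><title>An Introduction to Database Systems</title>
        <issue><year>2004</year></issue></book>
  <book><title>Foundations of Databases</title>
        <issue><year>1995</year></issue></book>
  <book><title>Untitled</title></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title></paper>
</library>"""

root = ET.fromstring(DOC)   # the library element


def top_down(root):
    """/library/book[issue/year=2004]: visit each book, then test its years."""
    return [book for book in root.findall("book")
            if any(year.text == "2004" for year in book.findall("issue/year"))]


def bottom_up(root):
    """/library/book/issue/year[.=2004]/../..: start from year nodes, climb twice."""
    parent = {child: node for node in root.iter() for child in node}
    books = []
    for year in root.findall("book/issue/year"):
        if year.text == "2004":
            book = parent[parent[year]]
            if book not in books:        # a book may own several matching years
                books.append(book)
    return books


titles_top_down = [b.findtext("title") for b in top_down(root)]
titles_bottom_up = [b.findtext("title") for b in bottom_up(root)]
print(titles_top_down, titles_bottom_up)
```

With schema-clustered storage, the bottom-up form can start from the year blocks alone, which may be cheaper when few years match.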

Combining Lazy and Strict Semantics (1)

Iterative result computation (open; next; close)

Iterative result computation in a functional programming language gives lazy evaluation

On the other hand, strict semantics is more efficient than lazy semantics

So, we combine strict and lazy semantics for XQuery
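The open/next/close protocol can be sketched with two small operators. The classes `Scan` and `Select` and the sentinel `END` are hypothetical names. The example shows the lazy behaviour that the protocol gives: the source produces only as many items as the consumer actually asks for.

```python
END = object()   # end-of-sequence marker (hypothetical convention)


class Scan:
    """Leaf operator: produces items of a sequence one at a time."""

    def __init__(self, items):
        self.items = items
        self.produced = 0

    def open(self):
        self._it = iter(self.items)

    def next(self):
        item = next(self._it, END)
        if item is not END:
            self.produced += 1
        return item

    def close(self):
        self._it = None


class Select:
    """Passes through only the items of its child that satisfy a predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            item = self.child.next()
            if item is END or self.predicate(item):
                return item

    def close(self):
        self.child.close()


scan = Scan(range(1_000_000))
plan = Select(scan, lambda n: n % 2 == 0)

plan.open()
first_two = [plan.next(), plan.next()]
plan.close()
print(first_two, scan.produced)
```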

Combining Lazy and Strict Semantics (2)

Query evaluation starts in lazy mode

Every function call is a reason to switch to strict mode if the sizes of the arguments are relatively small

A large input sequence for any physical operation in strict mode is a reason to switch to lazy mode
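One way to picture the switching rule is a helper that inspects an argument sequence up to a threshold. This is only a sketch of the idea: the threshold `STRICT_LIMIT` and the function `prepare_argument` are hypothetical, and the slides do not say how Sedna measures argument size.

```python
from itertools import chain, count, islice

STRICT_LIMIT = 4   # hypothetical threshold for a "relatively small" argument


def prepare_argument(seq, limit=STRICT_LIMIT):
    """Materialize small sequences (strict); keep large ones as iterators (lazy)."""
    it = iter(seq)
    head = list(islice(it, limit + 1))     # look at no more than limit + 1 items
    if len(head) <= limit:
        return "strict", head              # fully evaluated list
    return "lazy", chain(head, it)         # items already read, then the rest on demand


mode_small, small = prepare_argument(range(3))
mode_large, large = prepare_argument(count())    # an unbounded input stays lazy
first_six = list(islice(large, 6))

print(mode_small, small)
print(mode_large, first_six)
```

Because the helper reads at most `limit + 1` items before deciding, it works even on unbounded inputs.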

Conclusion

Efficient evaluation of structural XPath queries

Local node-level updates

Efficient processing of XML data in main memory, comparable to a general-purpose programming language

Thank you for your attention

You can find more about Sedna at

http://modis.ispras.ru/Development/sedna.htm
