Sedna: A Native XML DBMS

Andrey Fomichev, Maxim Grinev, Sergey Kuznetsov
Institute for System Programming of RAS
SOFSEM 2006, 23 January

Page 1: Sedna: A Native XML DBMS

Andrey Fomichev

Maxim Grinev

Sergey Kuznetsov

Institute for System Programming of RAS

SOFSEM 2006

23 January

Page 2: Agenda

• Sedna overview and goals
• Data organization
• Memory management
• Query evaluation
• Conclusion

Page 3: Challenges

Fernandez, M.F., Siméon, J.: Growing XQuery. ECOOP 2003

• Extending XQuery with data update facilities
• Growing XQuery into a programming language

A physical layer that supports these aspects is required. This layer is primarily based on:

• Data structures
• Memory management

Page 4: Sedna Overview

• Full-featured database system (external and main memory management, query and update facilities, concurrency, etc.)
• Native XML database
• Based on the XQuery language and the XQuery/XPath data model
• XUpdate language
• Implemented in Scheme and C/C++
• Supported platforms: Windows and Linux

Page 6: Data Organization

• A descriptive-schema-driven storage strategy is used: the nodes of an XML document are clustered according to their position in the descriptive schema
• Direct pointers represent the relationships between nodes of an XML document, such as parent, child, and sibling relationships
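The clustering idea can be illustrated with a minimal Python sketch. It assumes a toy in-memory model: the classes Node and SchemaNode and the function load are hypothetical names, not Sedna's API. Each schema node keeps the list of document nodes that share its path, in document order, and each document node keeps direct references to its parent, children, and siblings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """A stored element with direct pointers to its relatives (hypothetical model)."""
    tag: str
    text: Optional[str]
    parent: Optional["Node"] = None
    left_sibling: Optional["Node"] = None
    right_sibling: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


@dataclass(eq=False)
class SchemaNode:
    """A descriptive-schema node; `cluster` holds every document node on its path."""
    tag: str
    children: Dict[str, "SchemaNode"] = field(default_factory=dict)
    cluster: List[Node] = field(default_factory=list)


def load(xml_text: str) -> SchemaNode:
    """Parse a document, building its descriptive schema and node clusters."""
    root = ET.fromstring(xml_text)
    schema_root = SchemaNode(root.tag)

    def visit(elem: ET.Element, schema: SchemaNode, parent: Optional[Node]) -> Node:
        node = Node(elem.tag, (elem.text or "").strip() or None, parent)
        schema.cluster.append(node)  # clusters stay in document order
        previous: Optional[Node] = None
        for child_elem in elem:
            if child_elem.tag not in schema.children:
                schema.children[child_elem.tag] = SchemaNode(child_elem.tag)
            child = visit(child_elem, schema.children[child_elem.tag], node)
            node.children.append(child)
            if previous is not None:  # link siblings with direct pointers
                previous.right_sibling = child
                child.left_sibling = previous
            previous = child
        return node

    visit(root, schema_root, None)
    return schema_root


DOC = """<library>
  <book><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author></book>
  <book><title>An Introduction to Database Systems</title><author>Date</author>
    <issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author></paper>
</library>"""

schema = load(DOC)
book_titles = schema.children["book"].children["title"].cluster
print([n.text for n in book_titles])
```

Because all title children of book elements land in one cluster, a query over that path can ignore every other cluster, including the titles of paper elements.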

Page 7: Descriptive Schema (Data Guide)

Example document:

<library>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  . . .
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  . . .
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>

Descriptive schema:

library
  book
    title
    author
    issue
      publisher
      year
  paper
    title
    author

The structural query /child::library/child::book/child::title corresponds to the schema path library → book → title.
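Page 8: Data Structures

Figure: a block holding the node descriptors of the schema node title. Each node descriptor contains:

• label
• parent
• left-sibling and right-sibling
• prev-in-block and next-in-block
• children “by descriptive schema”

The figure also shows the node handle, which points into the indirection table.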
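To illustrate why descriptors can be fixed-size, the following Python sketch packs a descriptor with the fields listed above into bytes. The field widths (an 8-byte label, 64-bit pointers, 0 as null) and the helper names are assumptions for illustration, not Sedna's on-disk format. The sketch also assumes one “children by descriptive schema” pointer per child of the schema node, so all descriptors of one schema node, and hence of one block, have the same size.

```python
import struct
from typing import List, Tuple

# Hypothetical layout: field order follows the slide; sizes are assumptions.
LABEL_BYTES = 8  # fixed-width label prefix
POINTER = "Q"    # unsigned 64-bit pointer; 0 stands for "null"
NULL = 0


def descriptor_format(schema_children: int) -> str:
    """struct format for a node whose schema node has `schema_children` children."""
    # label | parent | left-sibling | right-sibling | prev-in-block | next-in-block
    # followed by one "children by descriptive schema" pointer per schema child
    return f"<{LABEL_BYTES}s" + POINTER * (5 + schema_children)


def pack_descriptor(label: bytes, parent: int, left: int, right: int,
                    prev_in_block: int, next_in_block: int,
                    children: List[int]) -> bytes:
    fmt = descriptor_format(len(children))
    return struct.pack(fmt, label, parent, left, right,
                       prev_in_block, next_in_block, *children)


def unpack_descriptor(raw: bytes, schema_children: int) -> Tuple:
    return struct.unpack(descriptor_format(schema_children), raw)


# A "book" schema node has three schema children: title, author, issue.
raw = pack_descriptor(b"book", parent=0x10, left=NULL, right=0x2A0,
                      prev_in_block=NULL, next_in_block=0x2A0,
                      children=[0x300, 0x340, NULL])
print(len(raw), struct.calcsize(descriptor_format(0)))
```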

Page 9: Structural Query Efficiency

When we answer structural queries such as /child::library/child::book/child::title:

• we read only the blocks containing the necessary information and no other blocks
• every block that is read contains only nodes that belong to the answer
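The effect can be simulated with a small Python sketch. The block capacity of 4 and the helper names are hypothetical; the sketch compares the number of blocks read when nodes are clustered by schema path against a layout that stores them in plain document order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_CAPACITY = 4  # hypothetical: node descriptors per block


def cluster_into_blocks(nodes: List[Tuple[str, str]]) -> Dict[str, List[List[str]]]:
    """Store (schema path, value) pairs so each block holds one schema node's nodes."""
    blocks: Dict[str, List[List[str]]] = defaultdict(list)
    for path, value in nodes:
        path_blocks = blocks[path]
        if not path_blocks or len(path_blocks[-1]) == BLOCK_CAPACITY:
            path_blocks.append([])  # open a new block for this schema node
        path_blocks[-1].append(value)
    return blocks


def answer_child_path(blocks: Dict[str, List[List[str]]], path: str) -> Tuple[List[str], int]:
    """Return the answer and the number of blocks read."""
    to_read = blocks.get(path, [])
    return [value for block in to_read for value in block], len(to_read)


def blocks_touched_in_document_order(nodes: List[Tuple[str, str]], path: str) -> int:
    """Blocks that would be read if nodes were stored in plain document order."""
    return len({i // BLOCK_CAPACITY for i, (p, _) in enumerate(nodes) if p == path})


# Ten books, each with a title and three authors, in document order.
nodes = [("/library", "library")]
for i in range(10):
    nodes.append(("/library/book", f"book{i}"))
    nodes.append(("/library/book/title", f"title{i}"))
    nodes += [("/library/book/author", f"author{i}.{j}") for j in range(3)]

blocks = cluster_into_blocks(nodes)
titles, read = answer_child_path(blocks, "/library/book/title")
print(len(titles), read, blocks_touched_in_document_order(nodes, "/library/book/title"))
```

Here the clustered layout reads 3 blocks for 10 titles, while plain document order touches 10 blocks, each of which also holds book and author nodes that are not in the answer.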

Page 10: Node Update Efficiency

• Node descriptors have a fixed size within a block
• Node descriptors are partly ordered
• Immutable numbering scheme
• Indirection table for parents

Figure: a node with left-sibling and right-sibling pointers; its children reference the parent through the indirection table.
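A minimal Python sketch of an indirection table for parents follows; the class names and the (block, slot) address format are assumptions. Children store their parent's handle rather than its physical address, so moving the parent's descriptor (for example, when an update splits a block) costs one table write instead of one write per child.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[int, int]  # hypothetical physical address: (block number, slot)


class IndirectionTable:
    """Maps stable node handles to the current address of a node descriptor."""

    def __init__(self) -> None:
        self._entries: Dict[int, Address] = {}
        self._next_handle = 0

    def register(self, address: Address) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._entries[handle] = address
        return handle

    def resolve(self, handle: int) -> Address:
        return self._entries[handle]

    def relocate(self, handle: int, new_address: Address) -> None:
        # One write, no matter how many children refer to this node.
        self._entries[handle] = new_address


@dataclass
class Descriptor:
    label: str
    parent_handle: Optional[int]  # parent reached via the indirection table


table = IndirectionTable()
parent_handle = table.register((0, 0))
children = [Descriptor(f"child{i}", parent_handle) for i in range(1000)]

# An update splits the parent's block and moves its descriptor elsewhere.
table.relocate(parent_handle, (7, 3))
print(table.resolve(children[0].parent_handle))
```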

Page 12: Memory Management

• Pointers are used to represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so dereferencing must be efficient
• The database address space should be large enough to represent large volumes of data
• OS memory management imposes restrictions:
  - the size of the address space is limited by the 32-bit architecture that prevails nowadays
  - we cannot control the page replacement (swapping) procedure

Page 13: Layered Address Space (LAS)

Figure: the transaction process addresses data in the layered address space with pairs (layer, addr). Buffer memory, managed by the buffer manager, is mapped into the OS virtual process address space at the same addr with MapViewOfFile (Windows) or mmap (Linux) and locked in RAM with VirtualLock (Windows) or mlock (Linux); pages are exchanged with external memory (disk). The pair (layer, addr) denotes the address layer * LAYER_SIZE + addr.
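The address translation can be sketched in Python as a toy simulation. LAYER_SIZE, PAGE_SIZE, and the BufferManager class are hypothetical; pages live in dictionaries instead of mapped memory, and eviction write-back is not modeled. The property shown is that a page at offset addr in any layer is always placed at the same addr in the process address space, so a fault only changes which layer occupies that slot.

```python
from typing import Dict, Tuple

LAYER_SIZE = 1 << 32  # hypothetical: a layer spans a 32-bit process address space
PAGE_SIZE = 1 << 16   # hypothetical page size


def to_linear(layer: int, addr: int) -> int:
    """Position of (layer, addr) in the emulated large address space."""
    return layer * LAYER_SIZE + addr


def from_linear(position: int) -> Tuple[int, int]:
    return divmod(position, LAYER_SIZE)


class BufferManager:
    """Places a page of any layer at the same addr in the process address space."""

    def __init__(self) -> None:
        self.disk: Dict[Tuple[int, int], bytearray] = {}    # (layer, page base) -> page
        self.mapped: Dict[int, Tuple[int, bytearray]] = {}  # page base -> (layer, frame)
        self.faults = 0

    def _frame(self, layer: int, addr: int) -> Tuple[bytearray, int]:
        base = addr - addr % PAGE_SIZE
        slot = self.mapped.get(base)
        if slot is None or slot[0] != layer:
            # The slot at `base` is empty or holds another layer: map the wanted page.
            # (Frames are the disk pages themselves here, so no write-back is needed.)
            self.faults += 1
            page = self.disk.setdefault((layer, base), bytearray(PAGE_SIZE))
            self.mapped[base] = (layer, page)
        return self.mapped[base][1], addr - base

    def read(self, layer: int, addr: int) -> int:
        frame, offset = self._frame(layer, addr)
        return frame[offset]

    def write(self, layer: int, addr: int, value: int) -> None:
        frame, offset = self._frame(layer, addr)
        frame[offset] = value


bm = BufferManager()
bm.write(0, 0x1234, 7)
bm.write(5, 0x1234, 9)  # same addr in another layer: replaces the mapping
print(bm.read(0, 0x1234), bm.read(5, 0x1234), bm.faults)
```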

Page 14: Sedna Memory Management Benefits

• Emulating a 64-bit virtual address space on the standard 32-bit architecture removes restrictions on the size of the database
• Pointer dereferencing in LAS is comparable to dereferencing an ordinary pointer in a low-level programming language, because a layer is mapped onto the process virtual address space on an equality basis (addr in the layer equals addr in the process)
• The same pointer representation is used in main and secondary memory, which avoids costly pointer swizzling
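The last point can be illustrated with Python's struct module. The 8-byte encoding (a 32-bit layer number followed by a 32-bit addr, which together span 2^32 × 2^32 = 2^64 addresses) and the helper names are assumptions for illustration. A pointer read back from a page image is used as-is, with no translation table between its disk and memory forms.

```python
import struct

# Hypothetical on-page encoding of a LAS pointer: 32-bit layer, 32-bit addr.
XPTR = struct.Struct("<II")


def store_xptr(page: bytearray, offset: int, layer: int, addr: int) -> None:
    XPTR.pack_into(page, offset, layer, addr)


def load_xptr(page: bytes, offset: int) -> tuple:
    return XPTR.unpack_from(page, offset)


page = bytearray(64)
store_xptr(page, 8, layer=7, addr=0x00ABCDEF)
on_disk = bytes(page)                 # what would be written to external memory
reloaded = bytearray(on_disk)         # the same bytes read back into a buffer
layer, addr = load_xptr(reloaded, 8)  # usable directly: no swizzling step
print(layer, hex(addr))
```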

Page 16: Query Evaluation Aspects

• Suspended element constructors
• Different strategies for XPath query evaluation
• Combining lazy and strict semantics

Page 17: Element Constructors

• XML element construction requires a deep copy of the element's content, so the operation is expensive
• Suspended element constructors: the copy is performed on demand, when some operation accesses the constructed element
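A suspended constructor can be sketched in Python with xml.etree.ElementTree. The SuspendedElement class is hypothetical: it keeps references to its content, serializes without copying, and performs the deep copy only when an operation navigates into the constructed element. The sketch ignores updates to the source nodes between construction and copy, which a real DBMS would have to handle.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Optional


class SuspendedElement:
    """Result of an element constructor whose deep copy is deferred."""

    def __init__(self, tag: str, content: List[ET.Element]) -> None:
        self.tag = tag
        self._content = content  # references to existing nodes, nothing copied yet
        self._element: Optional[ET.Element] = None
        self.copies_made = 0

    def _materialize(self) -> ET.Element:
        if self._element is None:
            element = ET.Element(self.tag)
            for node in self._content:
                element.append(copy.deepcopy(node))  # the expensive deep copy
                self.copies_made += 1
            self._element = element
        return self._element

    def serialize(self) -> str:
        """Output needs no copy: stream the referenced content directly.
        (Assumes the content nodes carry no tail text.)"""
        if self._element is not None:
            return ET.tostring(self._element, encoding="unicode")
        inner = "".join(ET.tostring(node, encoding="unicode") for node in self._content)
        return f"<{self.tag}>{inner}</{self.tag}>"

    def children(self) -> List[ET.Element]:
        """Navigating into the new element forces the copy (only once)."""
        return list(self._materialize())


source = ET.fromstring("<book><title>T</title><author>A</author></book>")
result = SuspendedElement("result", list(source))
print(result.serialize(), result.copies_made)
```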

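Page 18: Different Strategies for XPath Query Evaluation

The query /library/book[issue/year=2004] can be evaluated as the equivalent query /library/book/issue/year[.=2004]/../.., which starts from the year nodes, filters them by value, and navigates up to their book ancestors instead of visiting every book and testing the predicate.

Figure: the descriptive schema (library; book, paper; title, author, issue; publisher, year), with the year and book nodes highlighted.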
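Both strategies can be compared on a small document with a Python sketch using xml.etree.ElementTree. The functions are hypothetical illustrations, not Sedna's evaluator; year values are compared as strings rather than with XPath's numeric general comparison, and because ElementTree elements have no parent pointers, the bottom-up version builds a parent map first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

LIBRARY = ET.fromstring(
    "<library>"
    "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
    "<book><title>An Introduction to Database Systems</title><author>Date</author>"
    "<issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>"
    "<paper><title>A Relational Model for Large Shared Data Banks</title>"
    "<author>Codd</author></paper>"
    "</library>"
)


def is_2004(year: ET.Element) -> bool:
    # Simplification: string comparison instead of XPath numeric comparison.
    return (year.text or "").strip() == "2004"


def top_down(library: ET.Element) -> List[ET.Element]:
    """/library/book[issue/year=2004]: visit every book, then test the predicate."""
    return [book for book in library.findall("book")
            if any(is_2004(year) for year in book.findall("issue/year"))]


def bottom_up(library: ET.Element) -> List[ET.Element]:
    """/library/book/issue/year[.=2004]/../.. : start at year nodes, walk up."""
    parent: Dict[ET.Element, ET.Element] = {
        child: p for p in library.iter() for child in p}
    result: List[ET.Element] = []
    for year in library.findall("book/issue/year"):
        if is_2004(year):
            book = parent[parent[year]]
            if book not in result:  # path steps return distinct nodes in document order
                result.append(book)
    return result


print([b.find("title").text for b in bottom_up(LIBRARY)])
```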

Page 19: Combining Lazy and Strict Semantics (1)

• Iterative result computation (open; next; close)
• Iterative result computation in a functional programming language gives lazy evaluation
• On the other hand, strict semantics is more efficient than lazy semantics
• So we combine strict and lazy semantics for XQuery
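The open/next/close protocol can be sketched in Python as follows; the operator classes are hypothetical. Pulling one result from the plan consumes only as much input as that result needs, which is the lazy behavior described above. The sketch uses None as the end-of-stream marker, so it assumes that items are never None.

```python
from typing import Any, Callable, List, Optional


class Operator:
    """Physical operator with the open/next/close protocol."""

    def open(self) -> None:
        raise NotImplementedError

    def next(self) -> Optional[Any]:  # returns None when exhausted
        raise NotImplementedError

    def close(self) -> None:
        raise NotImplementedError


class Scan(Operator):
    def __init__(self, items: List[Any]) -> None:
        self._items = items
        self._pos = 0
        self.pulled = 0  # how many items were actually produced

    def open(self) -> None:
        self._pos = 0

    def next(self) -> Optional[Any]:
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        self.pulled += 1
        return item

    def close(self) -> None:
        pass


class Select(Operator):
    def __init__(self, child: Operator, predicate: Callable[[Any], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Any]:
        while (item := self._child.next()) is not None:
            if self._predicate(item):
                return item
        return None

    def close(self) -> None:
        self._child.close()


scan = Scan(list(range(1, 1001)))
plan = Select(scan, lambda x: x % 3 == 0)
plan.open()
first = plan.next()  # consumes input only up to the first match
plan.close()
print(first, scan.pulled)
```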

Page 20: Combining Lazy and Strict Semantics (2)

• Query evaluation starts in lazy mode
• Every function call is an opportunity to switch to strict mode if the arguments are relatively small
• A large input sequence to any physical operation in strict mode triggers a switch to lazy mode
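One way to realize the switching rule is sketched below in Python. The threshold SMALL_ARGUMENT and the function evaluate_argument are hypothetical: at a function call, the argument is materialized (strict mode) unless it exceeds the threshold, in which case evaluation continues lazily from the items already pulled, so nothing is recomputed.

```python
from itertools import chain, count, islice
from typing import Iterable, Iterator, List, Tuple, Union

SMALL_ARGUMENT = 1000  # hypothetical threshold for "relatively small"


def evaluate_argument(seq: Iterable, threshold: int = SMALL_ARGUMENT
                      ) -> Tuple[str, Union[List, Iterator]]:
    """Try strict evaluation of a function argument; fall back to lazy if it is large."""
    it = iter(seq)
    buffered = list(islice(it, threshold + 1))  # pull at most threshold + 1 items
    if len(buffered) <= threshold:
        return "strict", buffered               # small: fully materialized
    return "lazy", chain(buffered, it)          # large: keep streaming


mode, value = evaluate_argument(range(10))
print(mode, value)
mode, value = evaluate_argument(count())  # an unbounded sequence stays lazy
print(mode, list(islice(value, 3)))
```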

Page 21: Conclusion

• Efficient evaluation of structural XPath queries
• Local node-level updates
• Effective processing of XML data in main memory, comparable to a general-purpose programming language

Page 22: Thank you for your attention

You can find more about Sedna at

http://modis.ispras.ru/Development/sedna.htm