Sedna: A Native XML DBMS. Andrey Fomichev, Maxim Grinev, Sergey Kuznetsov. Institute for System Programming of RAS. SOFSEM 2006, 23 January.
Sedna: A Native XML DBMS
Andrey Fomichev
Maxim Grinev
Sergey Kuznetsov
Institute for System Programming of RAS
SOFSEM 2006
23 January
Agenda
Sedna overview and goals
Data organization
Memory management
Query evaluation
Conclusion
Challenges
Extending XQuery with data update facilities
Growing XQuery into a programming language (Fernandez, M.F., Simeon, J.: Growing XQuery. ECOOP 2003)
A physical layer that supports these aspects is required. The layer is primarily based on:
Data structures
Memory management
Sedna Overview
Full-featured database system (external and main memory management, query and update facilities, concurrency, etc.)
Native XML database
Based on the XQuery language and the XQuery/XPath data model
XUpdate language
Implemented in Scheme and C/C++
Supported platforms are Windows and Linux
Data Organization
A storage strategy driven by the descriptive schema is used: the nodes of an XML document are clustered according to their position in the descriptive schema
Direct pointers are used to represent relationships between the nodes of an XML document, such as parent, child and sibling relationships
Descriptive Schema (Data Guide)
<library>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  . . .
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  . . .
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>
[Figure: descriptive schema tree of the document above. library has the children book and paper; book has the children title, author and issue; issue has the children publisher and year; paper has the children title and author. The path /child::library/child::book/child::title is highlighted as the schema nodes library, book, title.]
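The idea of a descriptive schema can be sketched in a few lines of Python: every distinct root-to-node path of element names occurs in the schema exactly once, however many document nodes share it. This is only an illustration using the standard `xml.etree.ElementTree` module; the function name `descriptive_schema` and the shortened sample document are assumptions of this sketch, not part of Sedna.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample library document (assumption of this sketch).
DOC = """<library>
  <book><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author></book>
  <book><title>An Introduction to Database Systems</title><author>Date</author>
    <issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author></paper>
</library>"""


def descriptive_schema(root):
    """Return the set of distinct root-to-node paths of element names."""
    paths = set()

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        paths.add(path)  # repeated paths collapse into one schema node
        for child in elem:
            visit(child, path)

    visit(root, ())
    return paths


schema = descriptive_schema(ET.fromstring(DOC))
for path in sorted(schema):
    print("/".join(path))
```

The three author elements of the first book contribute a single schema node, library/book/author, which is what makes the schema a concise summary of the document structure.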
Data Structures
[Figure: blocks of node descriptors attached to the schema node title. Labels in the figure: node handle, indirection table, children "by descriptive schema", next-in-block, prev-in-block, left-sibling, right-sibling, parent, label.]
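As a rough illustration of the fields named in the figure, a node descriptor could be modeled as below. This is a sketch under assumptions: the field types, the reading of the node handle as an index into the indirection table, and the helper names `link_in_block` and `scan_block` are hypothetical and are not taken from the Sedna sources.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class NodeDescriptor:
    label: str                        # numbering-scheme label (a plain string here)
    handle: int                       # assumed: index of this node's indirection table entry
    parent: Optional[int] = None      # assumed: handle of the parent node
    left_sibling: Optional["NodeDescriptor"] = None
    right_sibling: Optional["NodeDescriptor"] = None
    next_in_block: Optional["NodeDescriptor"] = None
    prev_in_block: Optional["NodeDescriptor"] = None
    # children "by descriptive schema": assumed to be one pointer per child schema node
    children: Dict[str, "NodeDescriptor"] = field(default_factory=dict)


def link_in_block(descriptors):
    """Chain the descriptors of one block with next-in-block / prev-in-block."""
    for prev, nxt in zip(descriptors, descriptors[1:]):
        prev.next_in_block = nxt
        nxt.prev_in_block = prev


def scan_block(first):
    """Walk a block by following next-in-block pointers."""
    node = first
    while node is not None:
        yield node
        node = node.next_in_block


titles = [NodeDescriptor(label="1.1.1", handle=3),
          NodeDescriptor(label="1.2.1", handle=8),
          NodeDescriptor(label="1.3.1", handle=12)]
link_in_block(titles)
labels = [node.label for node in scan_block(titles[0])]
```

Because every descriptor in a block belongs to the same schema node, a scan along next-in-block visits nodes of one kind only.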
Structural query efficiency
When we answer structural queries like
/child::library/child::book/child::title
we read only the blocks that contain the necessary information and do not read other blocks
every block that is read contains only nodes that belong to the answer
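A minimal Python sketch of why clustering helps: if nodes are grouped by their schema path, a query made only of child steps reduces to one lookup, and nothing outside the matching group is touched. The in-memory dictionary stands in for on-disk blocks, and the names `cluster_by_schema_path` and `eval_child_path` are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened sample document (assumption of this sketch).
DOC = """<library>
  <book><title>Foundations of Databases</title><author>Abiteboul</author></book>
  <book><title>An Introduction to Database Systems</title><author>Date</author></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title></paper>
</library>"""


def cluster_by_schema_path(root):
    """Group element nodes by schema path, keeping document order in each group."""
    clusters = defaultdict(list)

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        clusters[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return clusters


def eval_child_path(clusters, xpath):
    """Answer a query of the form /child::a/child::b/... with a single lookup."""
    steps = tuple(step.replace("child::", "") for step in xpath.strip("/").split("/"))
    return clusters.get(steps, [])


clusters = cluster_by_schema_path(ET.fromstring(DOC))
answer = eval_child_path(clusters, "/child::library/child::book/child::title")
titles = [elem.text for elem in answer]
```

The paper title is stored under a different schema path, so it is never examined when book titles are requested.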
Node updates efficiency
Node descriptors have a fixed size inside the block
Node descriptors are partially ordered
Immutable numbering scheme
Indirection table for parents
[Figure: a node with left-sibling and right-sibling pointers; the child nodes refer to their parent through the indirection table.]
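The benefit of an indirection table for parent pointers can be shown with a toy model: children store a stable handle, so moving the parent (for example, when a block is split) updates one table entry instead of every child. The class and the position tuples below are hypothetical and serve only to illustrate the technique.

```python
class IndirectionTable:
    """Maps stable handles to the current physical position of a node."""

    def __init__(self):
        self._entries = []

    def register(self, position):
        """Store a position and return its handle, which never changes."""
        self._entries.append(position)
        return len(self._entries) - 1

    def move(self, handle, new_position):
        """Record that the node now lives elsewhere: one write, no child is touched."""
        self._entries[handle] = new_position

    def resolve(self, handle):
        return self._entries[handle]


table = IndirectionTable()
book = table.register(("block-7", 0))         # hypothetical (block, slot) position
children_parent_handles = [book, book, book]  # three children share one parent

table.move(book, ("block-9", 4))              # the parent descriptor is relocated
resolved = {table.resolve(h) for h in children_parent_handles}
```

All three children see the new position although none of their descriptors was rewritten.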
Memory Management
Pointers are used to represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation must be efficient
The database address space should be big enough to represent large volumes of data
OS memory management restrictions:
The size of the address space is restricted by the 32-bit architecture that prevails nowadays
We can't control the page replacement (swapping) procedure
Layered Address Space (LAS)
[Figure: Layered Address Space. A transaction process addresses data in the layered address space by (layer, addr) pairs; within the OS virtual process address space only addr is used, with pages mapped by MapViewOfFile (Windows) or mmap (Linux). The buffer manager keeps pages in buffer memory, locked with VirtualLock (Windows) or mlock (Linux). In external memory (disk) the address is layer * LAYER_SIZE + addr.]
Sedna Memory Management Benefits
Emulating a 64-bit virtual address space on the standard 32-bit architecture removes restrictions on the size of the database
Pointer dereferencing in LAS is comparable to dereferencing an ordinary pointer in a low-level programming language, because a layer is mapped to the process virtual address space on an equality basis
The same pointer representation is used in main and secondary memory, which avoids costly pointer swizzling
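The address arithmetic of the layered address space can be sketched in Python. The formula layer * LAYER_SIZE + addr comes from the slides; the concrete values of `LAYER_SIZE` and `PAGE_SIZE`, the `ProcessView` class, and the `load_page` hook are assumptions made for illustration. The toy model uses addr unchanged as the in-process address and only checks which layer currently occupies the page slot.

```python
LAYER_SIZE = 2 ** 32  # assumed: one layer spans a full 32-bit address range
PAGE_SIZE = 2 ** 16   # assumed page size, for illustration only


def to_disk_address(layer, addr):
    """Flatten a (layer, addr) pointer as layer * LAYER_SIZE + addr."""
    if not 0 <= addr < LAYER_SIZE:
        raise ValueError("addr must fit inside one layer")
    return layer * LAYER_SIZE + addr


def from_disk_address(disk_addr):
    """Inverse of to_disk_address: recover (layer, addr)."""
    return divmod(disk_addr, LAYER_SIZE)


class ProcessView:
    """Toy stand-in for the process address space: addr is used as-is,
    and each page slot remembers which layer currently occupies it."""

    def __init__(self, load_page):
        self._load_page = load_page  # hypothetical hook standing in for the buffer manager
        self._slots = {}             # page start address -> (layer, page contents)
        self.faults = 0

    def deref(self, layer, addr):
        page_start = addr - addr % PAGE_SIZE
        slot = self._slots.get(page_start)
        if slot is None or slot[0] != layer:
            # Missing page or a page of another layer: request the right one.
            self.faults += 1
            slot = (layer, self._load_page(layer, page_start))
            self._slots[page_start] = slot
        return slot[1][addr - page_start]


# Fake storage: every byte of a page holds its layer number.
view = ProcessView(lambda layer, page_start: bytes([layer]) * PAGE_SIZE)
first = view.deref(1, 70000)   # fault: the slot is empty
second = view.deref(1, 70001)  # hit: same page, same layer
third = view.deref(2, 70000)   # fault: the slot holds a page of layer 1
```

In the common case (the second call) dereferencing costs one layer comparison plus an ordinary memory access, and the (layer, addr) pair is stored identically on disk and in memory, so no pointer translation pass is needed when a page is loaded.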
Query Evaluation Aspects
Suspended element constructors
Different strategies for XPath query evaluation
Combining lazy and strict semantics
Element constructors
XML element construction requires a deep copy of the element's content, so the operation is expensive
Suspended element constructors: the copy is performed on demand, when some operation accesses the content of the constructed element
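A suspended constructor can be illustrated with a small Python class that keeps references to the source nodes and performs the deep copy only when the content is first accessed. The class name `SuspendedElement` and its methods are hypothetical; the sketch shows the on-demand idea only, not Sedna's actual implementation.

```python
import copy
import xml.etree.ElementTree as ET


class SuspendedElement:
    """Element constructor whose deep copy is postponed until first access."""

    def __init__(self, tag, content):
        self.tag = tag
        self._content = list(content)  # references to existing nodes, no copy yet
        self._materialized = None

    @property
    def is_materialized(self):
        return self._materialized is not None

    def children(self):
        """Navigating into the element forces the deep copy, once."""
        if self._materialized is None:
            elem = ET.Element(self.tag)
            elem.extend([copy.deepcopy(node) for node in self._content])
            self._materialized = elem
        return list(self._materialized)


source = ET.fromstring("<book><title>T</title></book>")
result = SuspendedElement("result", [source])
before = result.is_materialized  # constructing the element copied nothing
kids = result.children()         # the first access triggers the copy
after = result.is_materialized
```

If the constructed element is never navigated, the copy is never paid for.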
Different strategies for XPath query evaluation
[Figure: the descriptive schema tree (library; book, paper; title, author, issue; publisher, year) with the schema nodes book and year labeled.]
The same query can be formulated in two ways:
/library/book[issue/year=2004]
/library/book/issue/year[.=2004]/../..
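The two strategies, /library/book[issue/year=2004] and /library/book/issue/year[.=2004]/../.., can be compared on a small in-memory document. This Python sketch uses `xml.etree.ElementTree`, which has no parent pointers, so a parent map is built explicitly; the function names and the string comparison with "2004" are simplifications assumed here.

```python
import xml.etree.ElementTree as ET

# Shortened sample document (assumption of this sketch).
DOC = """<library>
  <book><title>Foundations of Databases</title><author>Abiteboul</author></book>
  <book><title>An Introduction to Database Systems</title><author>Date</author>
    <issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title></paper>
</library>"""

root = ET.fromstring(DOC)


def top_down(root):
    """/library/book[issue/year=2004]: visit every book, then test the predicate."""
    return [book for book in root.findall("book")
            if any(year.text == "2004" for year in book.findall("issue/year"))]


def bottom_up(root):
    """/library/book/issue/year[.=2004]/../..: start at year nodes, climb two levels."""
    parent = {child: elem for elem in root.iter() for child in elem}
    books = []
    for year in root.findall("book/issue/year"):
        if year.text == "2004":
            book = parent[parent[year]]
            if book not in books:  # a book with several matching years appears once
                books.append(book)
    return books


top = top_down(root)
bottom = bottom_up(root)
found = [book.find("title").text for book in top]
```

Which form is cheaper depends on the data: the first touches every book, the second touches every year, so the better choice is the one that starts from the smaller set.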
Combining Lazy and Strict Semantics (1)
Iterative result computation (open; next; close)
Iterative result computation combined with a functional programming language gives lazy evaluation
On the other hand, strict semantics can be evaluated more efficiently than lazy semantics
So, we combine strict and lazy semantics for XQuery
Combining Lazy and Strict Semantics (2)
Query evaluation starts in lazy mode
Every function call is a reason to switch to strict mode if the arguments are relatively small
A large input sequence for any physical operation in strict mode is a reason to switch to lazy mode
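The switching rule can be approximated in Python with iterators: an argument is probed up to a threshold, small sequences are materialized (strict), and larger ones stay as iterators (lazy). The threshold `STRICT_LIMIT` and the function `as_argument` are assumptions of this sketch; the slides do not say how "relatively small" is decided.

```python
from itertools import chain, count, islice

STRICT_LIMIT = 4  # assumed threshold for "relatively small"


def as_argument(seq, limit=STRICT_LIMIT):
    """Return ("strict", list) for short sequences, ("lazy", iterator) otherwise."""
    it = iter(seq)
    head = list(islice(it, limit + 1))  # probe at most limit + 1 items
    if len(head) <= limit:
        return "strict", head           # small: fully materialized
    return "lazy", chain(head, it)      # large: the rest stays unevaluated


small_mode, small_value = as_argument(range(3))
large_mode, large_value = as_argument(count())  # an infinite sequence
prefix = list(islice(large_value, 6))
```

The probe consumes only limit + 1 items, so even an unbounded input is classified without being evaluated in full.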
Conclusion
Efficient evaluation of structural XPath queries
Local node-level updates
Effective processing of XML data in main memory, comparable to a general-purpose programming language
Thank you for your attention
You can find more about Sedna at
http://modis.ispras.ru/Development/sedna.htm