28
Lore: A Database Management System for Semi-structured Data Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom Stanford University Mustafa Emre ERKOÇ 20.11.2002

Lore: A Database Management System for Semi-structured Data

  • Upload
    prem

  • View
    47

  • Download
    0

Embed Size (px)

DESCRIPTION

Lore: A Database Management System for Semi-structured Data. Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom Stanford University. Mustafa Emre ERKO Ç 20.11.2002. What is Lore (Lightweight Object Repository): - PowerPoint PPT Presentation

Citation preview

Page 1: Lore: A Database Management System for  Semi-structured  Data

Lore: A Database Management System for

Semi-structured Data

Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom

Stanford University

Mustafa Emre ERKOÇ20.11.2002

Page 2: Lore: A Database Management System for  Semi-structured  Data

What is Lore (Lightweight Object Repository):A DBMS designed specifically for managing semi structured data.

In the paper: Storage manager Indexing Query Processing and Optimization User Interfaces is described.

Needs for such a DBMS: 1)The data can be irregular and not conform to a rigid schema.

2)There can be changes in the structure of the schema such as:data elements can change types, data not conforming previous schema can be edit and other schema modifications

What Lore Does: since it is designed for managing semi structured data, there is no existence of schema. Lore data model is very simple, self describing and nested object model called OEM(Object Exchange Model)

Page 3: Lore: A Database Management System for  Semi-structured  Data

Outline of Presentation :

Representing and Querying Semi Structured Model

Data Model

Lorel Query Language

System Architecture

Query and Update Processing

Query Operators and Query Plan

Query Indexing

Lindex and Vindex

Index Query Plan

Update Query Plan

Physical Storage

Novel Features

External Data Manager

Data Guides

Interfaces

API

Web

System Status and Future Work

Page 4: Lore: A Database Management System for  Semi-structured  Data

What has done so far?

The Lorel query language for semi structured data. 11/96

Enabling query formulation and optimization in semi structured database. 08/97

Representing and querying changes in semi structured data. 02/97

Querying semi structured heterogeneous information.12/96

Query optimization for semi structured data.Integrating dynamically-fetched external information into a dbms for semi structured data. 05/97

Page 5: Lore: A Database Management System for  Semi-structured  Data
Page 6: Lore: A Database Management System for  Semi-structured  Data

Representing and querying The object exchange model

Oem is designed for semistructured data Data can be thought as labeled directed graphThe vertices are objects and names are special labels serve as aliases for objects and as entry points into database Every object has a unique object identifier (oid) Example : &5Atomic Object: object that has no outgoing edge. Contain a value from one of the basic atomic types such as integer, real,string, gif etc

Example: Object &7 with value ClarkComplex Objects: all other objects that may have subobjects.

Example: &3 and subobjects &8 &9 &10 and &11

Any object that cannot be accessed by a path from some name is considered to be deleted.

1) Members have zero, one or more offices2) An office is sometimes string and sometimes integer and sometimes complex object3) Room may be a string and integer

Page 7: Lore: A Database Management System for  Semi-structured  Data

Representing and querying semistuctured dataLorel query language

Lorel is an extension of OQL.Basic building block of lorel is the simple path expression which is a name followed by a sequence of labels. Ex : DBGruop.Member.Office

Ex: select DBGruop.Member.OfficeWhere DBGruop.Member.Age>30

Result is office “Gates 252”Office

Building ”CIS”Room “411”

This query will not yield a run-time error if an Age object has a string value or is complex or They are singled-value, set-value or even empty for some group members.The OQL style of this query is:

Select OFrom DBGruop.Member M, M.Office OWhere exits A in M.Age : A>30

As will be seen later the first step of query processing is to rewriting the query into OQL style

Page 8: Lore: A Database Management System for  Semi-structured  Data

Representing and querying semistuctured datalorel query language

One can specify pattern for path pattern for labels

pattern for atomic values.Example:Select DBGroup.Member.NameWhere DBGroup.Member.Office(.Room%|.Cubicle)?

like”%252”Result Name “Jones” Name “ Smith”The following query illustrates subqueries and constructed results.

Select N.Name,(select M.Project.Title where M.Project.Title !=“lore”)

From DBGroup.Member MWhere M.Project.Title = “lore”

Result: Name “ Jones” Title “ Tsimmis”

Page 9: Lore: A Database Management System for  Semi-structured  Data

System Architecture

•Access to Lore is through a variety of applications or directly from applicationProgram Interface (API )•There is a simple textual interface generally used by developers.•The graphical interface is the primary interface for the end users. It provides powerfultools for query results, formulating simple queries and mechanism for viewing themultimedia atomic types

The query compilation layer is consist of -The parser -Preprocessor- Query plan generator -Query optimizer.Query optimizer, in addition to doing some transformations on the query plan, decides whether the use of indexes is feasible.

The Data Engine Layer houses - OEM object manager - Query operators-External data manager - Various utilities. the object manager functions as the translation layer between OEM and the low level file constructs.It supports : Fetching an object, comparing two objects, performing simple coercions, cache of frequently accessed objects

Page 10: Lore: A Database Management System for  Semi-structured  Data

System Architecture

Page 11: Lore: A Database Management System for  Semi-structured  Data

Query and Update Processing in Lore

Query processing in Lorel is fairly conventional with some exceptions:•Becuase of the flexibility of Lorel, the preprocessing of the parse tree to produce OQLlike query is complex.•Although the Lore engine is built around standard operators, some take an original flavorEx: scan operator.•A unique feature of Lore is its automatic coercion of atomic values. Coercion has an impact on the implementation of comparators.

The result of a Lorel query is always a set of OEM object which become subobject of a newly created result object.

Page 12: Lore: A Database Management System for  Semi-structured  Data

Query and Update Processing in LoreQuery processing

Query execution strategy is based on familiar database operators. A recursive iteratorapproach is used in query processing.With the iterators execution begins at the top of the query plan,with each node in theplan requesting a tuple from its children and performing some operations on the tuplesAfter a nodes complete its operation, it passes a resulting tuple up to its parent.The tuples on which the operation takes places is Object Assignments. OA is a simpleData structure containing slots corresponding to range variables in the query

If OA1holds the oid for member “Smith” then OA2 and Oa3 can hold the oid’s For one of the Smith’s Office subobject and one of the Smith’s Age subobject respectively

Page 13: Lore: A Database Management System for  Semi-structured  Data

Query and Update Processing in LoreQuery operators

Join, project, select operators are nearly identical to their corresponding relational operators.Aggregation operator calls its child exhaustively, storing the result temporarily or computing the aggregate incrementally. When the child can produce no more valid OA’s, a new object is created whose value the final aggregation. Other operators : SetOp, ArithOp, CreateSet, and Gruopby.

SetOp handles the set operations; Union, intersect and except.

ArithOp handles arithmatic operators as; addition, multiplication etc.

CreateSet is used to package the result of an arbitrary subquery before proceeding.

Gruopby handles queries that include a groupby expression.

Page 14: Lore: A Database Management System for  Semi-structured  Data

Example: select M.Name, count (M.Publication) from DBGroup.Member M Where M/Dept = ‘CS’

OQL like translation is select (select N from M.Name N), count (select P

from M.Publication) from DB.Group.Member M where exist D in M.Dept : D = ‘CS’

Page 15: Lore: A Database Management System for  Semi-structured  Data

Query and update processing in LoreQuery Optimization & Indexing

Query processor currently implement only a few simple heuristic optimization techniqueE.g Selection operators are pushed down the query tree, and in some cases redundant operator are eliminated or combined

Lore explored query plans that use indexes when feasible. In a relational DBMS, an indexis created on an attribute in order to locate tuples with particular attribute values quickly.In Lore, such a value index, alone is not sufficient since path to an object is as important as the value of object. So there are two types of indexes:

Link (Edge) Index :also called Lindex, takes an oid and a label , and returns the oid’s of All parents via specified label. It essentially provides “parent pointers”

Value Index: also called Vindex, takes a label, operator and value. It returns all atomic objects having an incoming edge with specified label and a value satisfying the specified operator and value.

Page 16: Lore: A Database Management System for  Semi-structured  Data

Indexing Value Indexing

Value indexing in Lore requires some novel features due to its non-strict typing system. Indexing system deals with coercion involving integers, reals, and stringsonly. But the situation is simplified by always coercing integers to reals as shownin the table.

Three types of Vindexes are maintained:String vindexReal vindexString coerced to real vindex

When using a Vindex for comparison there are two cases:1- If the comparison type string then: i-do a look up in the string vindex ii- if theValue can be coerced to a real then look up fir the coerced value in the real vindex2- If the value is real or iteger: i-do a look up in the real vindex; ii- also look up in the String coerced to real vindex

Page 17: Lore: A Database Management System for  Semi-structured  Data

Indexing Index query plan

If the query contains a comparison between a path expression and an integer, real or string and if the appropriate indexes are exist then a query plan that uses indexes will be created

Ex:Select OFrom DBGruop.Member M, M.Office OWhere exits A in M.Age : A>30

Query plans using indexes are different in the shape from those based on Scan operatorsindex plans traverse the database bottom-up , while scan based plans perform a top-downtraversal.

Page 18: Lore: A Database Management System for  Semi-structured  Data

IndexingIndex query plan

The new operators are:Vindex: iteratively finds all atomic objects with the value bigger than 30 and an incomingedge labeled Age, placing their oids in slot OA2Lindex: iteratively places into OA1 all parents of the object in OA2 via an Age edge.Once: the object could potentially have several partners via Age, however; since Age is existentially quantified in the query, we only want to consider each parent once, even if the it has several Age subobject; this is why we use once here.Name_Obj: in the second lindex, operator finds all parents of OA1 object via member edgeand place them in OA0. The objects in AO0 should be named as DBGroup and the purpose of Name_Obj operator is to do this.

Page 19: Lore: A Database Management System for  Semi-structured  Data

Query and Update processing in LoreUpdate query plan

Update is done by adding an Update operator to the query execution engine.Ex: update P.Member += (select DBGroup.Member where DBGroup.Member.Name=‘Clark’) from DBGroup.Project p where P.Title = ‘Lore’ or P.Title = ‘Tsimmis’This update operator add an edge between the projects whose titles are Lore andTsimmis, and members called Clark.

Here we use Create_edge operation. The other valid operations are: Destroy_edge,modify_atomic

Page 20: Lore: A Database Management System for  Semi-structured  Data

Query and Update Processing in Lore Bulk Loading and Physical Storage

Bulk LoadingData can be added to a Lore database in two ways:1) By using update statement 2) By load file

In the second case, a textual description of an OEM database is accepted by a loas utilitywhich includes useful features such as symbolic references for shared subobject, as wellas the ability to incorporate new data into an existing database.

Physical StructureLore arranges objects in a physical disk pages; each page has a number of slots with a single object in each slot.Object are variable length and so Lore places object according to a first-fit algorithm. Loreuse an object forwarding mechanism to handle object that grow too large objects that may span page.In addition, Lore supports for large objects that span many pages.

Page 21: Lore: A Database Management System for  Semi-structured  Data

Novel Features External Data

External Data Manager enables dynamic retrieval of information from other data sourcesbased on the queries issued to Lore.During the query evaluation, externally obtained data is combined with resident data and so The distinction between two type of data is invisible to end user.During the query processing, when the execution engine discover an external object information is fetched from external source to answer query. Also the fetched information is stored in database until it becomes stale.

Page 22: Lore: A Database Management System for  Semi-structured  Data

Novel FeaturesExternal Data

Specifications for an external objects:i-The location of wrapper program that fetches the external data and translate it into OEMii-A quantum that indicates that indicates the time interval until the fetched information become stale. iii-A set of arguments that are used to limit the information fetched in a call to the external source

Arguments sent to the external source can come from 3 places:i-the query being processed (query defined)ii-values of other objects in the local database(data defined)iii-constant values tied to external object(hard-coded)

Page 23: Lore: A Database Management System for  Semi-structured  Data

Novel FeaturesExternal Data

Many calls to an external sources quickly dominate query processing time.

To limit the number of calls some mechanism is used.

One of them is to track the argument sets used by previous queries and determinewhen previously fetched information or entirely subsumes information required by the current argument set

Page 24: Lore: A Database Management System for  Semi-structured  Data

Novel Features Data Guides

Lore does not have an explicit schema so query formulation and optimization are particularly challenging.A dataguide is a concise and accurate structure of an OEM database, stored as an OEM object. Each possible path expression of a database is encoded exactly once in the dataguides.

Considering the role of missing schema, theDataguides can guide the query processorIn relational or O2 systems the schema is

explicitly created before any data is loaded, However, dataguides are dynamically generatedor maintained.

Page 25: Lore: A Database Management System for  Semi-structured  Data

Interfaces to Lore Application Programming Interface

API provides a gateway between Lore and any user interface or client application. For instance, it is used by the system’s textual interface, which passes user commands to Lore and present result of queries in a hierarchical display.

API is composed of a small collection of C++ classes. At the highest level it is used :-as a client program to connect to a Lore database.-To submit queries-To process query result.

LoreConnection class is to connect a specific databaseSubmit function is to submit Lorel queries. It is also used for other Lore command suchas index creation and updates. When submit is called with a query, it returns query resultsas a LoreOem object. LoreOem only contains oid and the actual value is fetched from database on demand.

Page 26: Lore: A Database Management System for  Semi-structured  Data

User Interfaces to LoreWeb interface

By visiting an url and choosing a database, a user can connect to web interface. The user may submit a textual Lorel query or select a sample prewritten query.Result are displayed in an HTML, hierarchical format.

Page 27: Lore: A Database Management System for  Semi-structured  Data

System status and Future work

Lore is consist of approximately 60.000 lines of C++ codes.Some language features such as external predicates and functions are still under implementation. General path expression are not implemented in their full generality, although a substantial and very useful is.

It is planned to explore how Lorel could be translated to SQL 3 and thus implementedon top of an object relational dbms.Performance IssuesThere is a little performance analysis are done for Lore. Comparing the performance of lore against the implemented lorel on top of O2

There is a significant additional research to do in query optimization, including query writing, operation ordering, selecting the best use of indexes in query plans. Currently all expansions of path expressions in query path are done at run-time. It is planned to explore the compile-time approach and compare its performance against theRun-time approach.

Page 28: Lore: A Database Management System for  Semi-structured  Data

System Status and Future Work

It is also considering to investigate more sophisticated techniques for customizing the presentation of OEM object in a Web environment.

Finally, it is planned to incorporate a special text type along with a full-text indexing System because many applications appropriate for a DMBS such as Lore, include a Significant amount of text data.