33
OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla [email protected]

OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla [email protected]

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

OEM and LORE Query Language

Sanjay Madria

Department of Computer Science

University of Missouri-Rolla

[email protected]

Page 2: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

• Source : http://WWW-DB.Stanford.EDU/lore/

Page 3: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Semistructured Data (SSD)• No explicit schema; Irregular and incomplete data• Schema may be hidden or mixed with data• Examples:

– Semi-structured data arises mainly from the integration of heterogeneous data sources; both structured and non-rigid structured

• Information sources change, or new sources added.• semantic discrepancies among heterogeneous data

sources

– Data from the web• Overall site structure may change often.

– Biological data

Page 4: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Characteristics of SSD

• Missing or additional attributes

• Multiple attributes

• Different types in different objects

• Heterogeneous collections

• Self-describing

• Irregular, no priori structure

Page 5: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Object Exchange Model (OEM)

Motivation • Self-describing data model• information exchange and extraction• Handle incomplete and irregular data• Why a new data model? … it not a new model.

Page 6: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

LORE

• Lore : Lightweight Object Repository– Lightweight because

• Object Model supported is lightweight

• No multiuser or heavyweight DBMS features

Page 7: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lore - motivation

• Relational data model has null values, and OO models have inheritance and complex objects. Both have difficulties in designing schemas to incorporate irregular data.

• To manage semi-structured data, as in such environment :– Difficult to decide in advance on a single, correct schema

as

• structure of the data may evolve rapidly, or

• data elements may change types, or

• data not conforming to previous structure may be added

Page 8: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

• Thus:– Need for management of semi-structured data!

• Data managed by Lore is not confined to a schema and it may be irregular or incomplete.

• OEM is the Lore’s data model.

• Lorel is Lore’s query language.

Page 9: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Object Exchange Model (OEM)• Data in this model can be thought of as a labeled

directed graph.– Schema-less and self-describing.– nodes are objects and – edges are labeled with attribute names and,– leaf nodes have atomic values

• Object nesting.• Vertices in graph are objects.

– Each object has a unique object identifier (oid), such as &5.

– Atomic objects have no outgoing edges and are of types such as int, real, string, gif, java, etc.

– All other objects that have outgoing edges are called complex objects.

Page 10: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

OEM (Cont.)• Examples:

– Object &3 is complex, and its subobjects are &8, &9, &10, and &11.

– Object &7 is atomic and has value “Clark”.

• DBGroup is a name that denotes object &1.(Names are entry points into the database).

• Type and structure heterogeneity : Observe that members may have 0, 1 or more offices, office is a string and sometimes, also complex object, a room may be string and integer

• DbGroup.Member denotes all member-labeled subobjects

Page 11: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

DBGroup

&1&1

&2&2 &3&3 &4&4 &5&5 &6&6

&11&11&8&8 &9&9 &10&10 &12&12 &13&13 &14&14&7&7 &15&15 &16&16

&17&17 &18&18 &19&19 &20&20

Member

Member

MemberMember

OfficeAge

Name Project

ProjectProject

Project

Name

Name AgeOffice

Office

RoomBuilding Building Room

“Clark” “Smith”

46 “Gates 252” “Lore” “Tsimmis”

“Jones” 28

An OEM Database

“CIS” “411” “CIS” 252

Page 12: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Object Exchange Model - OEM

•Each value exchanged is given an explicit label.

Object temp-in-Fahrenheit, integer, 80 -“temp-in-Fahrenheit” is the label. -Each object is self-describing, with a label, type and value.

set-of-temps, set, {cmpnt1, cmpnt2} cmpnt1 is temp-in-Fahrenheit, integer, 80 cmpnt2 is temp-in-Celsius, integer, 20

Page 13: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Labels• Lables Play two roles

– identifying an object (component)– identifying the meaning of an object (component)

• Person-name both identifies cmpnt1 and coveys its meaning.

person-record, set, {cmpnt1, cmpnt2, cmpnt3} cmpnt1 is person-name, string, ``Fred’’ cmpnt2 is office-num-in-bldg-5, integer, 333 cmpnt3 is department, string, ``toy’’

• In relational data this corresponds to ….

Page 14: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Labels - Issues

•Labels are relative (more specific) to the source of the data object.

•Similar labels from different sources need to be resolved.

• Labels provide the flexibility in representing

object structure

Page 15: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

OEM - Specification• Each object in OEM has the following structure:

– Label: A variable character string describing what the object represents.

– Type: The data type of the object’s value. Each is either an atom type, or type set.

– Value: A variable-length value of the object.– Object-ID: A unique variable-length identifier

for the object or null.

Label Type Value Object-ID

Page 16: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

OEM - Summary• OEM is an information exchange model. It does not specify how

objects are stored at source.

• OEM does specify how objects are received at a client, but after objects are received they can be stored in any way the client likes.

• Each source has a distinguished object with lexical identifier ``root’’.

Page 17: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Example• <biblio,set,{doc1,doc2,…,docn}>• doc1 is <doc, set, {auths1, topic1, call-no1}>• auths1 is <auth-set,set {auth11}>• auth11 is <auth-ln, string, ``Ullman’’>• topic1 is <topic, string,``Databases’’>• call-no1 is <internal-call-no, integer, 25>• doc2 is <doc, set, {auths2, topic2, call-no2}>• auths2 is <auth-set,set {auth21, auth22,

auth23}>• auth21 is <auth-ln, string, ``Aho’’> • auth22 is <auth-ln, string, ``Hopcroft’’> • auth23 is <auth-ln, string, ``Ullman’’> •

• topic2 is <topic, string,``Algorithms’’>

• call-no1 is <dewey-decimal, string, ``BR273’’>

• docn is <doc, set, {authsn, topicn, call-non}>

• authsn is <auth,string,

``Crichton’’>• topic1 is <topic,

string,``Dinosaurs’’>• call-no1 is <fictional-

call-no, integer, 95>• biblio is the root object.

Page 18: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

OEM - QLSELECT Fetch-expression

FROM Object

WHERE Condition

• The result of this query is itself an object, with special label ``answer’’:

answer, set, {obj1, obj2, …, objn}

• Each returned obji is a component of object specified in the From clause of the query, where the component is located by the Fetch-expression and satisfies the Condition.

Page 19: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Path• The notion of path is used in both Fetch-Expression in

the Select clause and the condition in the Where clause.

• Path describes traversals through an object using subobject structure and labels.

• Example: ``biblio.doc.auth’’

• Paths are used in Fetch-Expression to specify which components are are returned in the answer object.

• Paths are used in the condition to qualify the fetched objects or other (related) components in the same object structure.

Page 20: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - Simple• Retrieve the topic of each document for which

``Ullman’’ is one of the authors:SELECT biblio.doc.topic

FROM root

WHERE biblio.doc.auth-set.auth-ln = ``Ullman’’

• Intuitively, the query’s where clause finds all paths through subobject structure with the sequence of labels [biblio,doc,auth-set,auth-ln] such that the object at the end of the path has value ``Ullman.’’

<answer, set, {obj1, obj2}>

obj1 is <topic, string, ``Databases’’>

obj2 is <topic, string, “Algorithms”>

Page 21: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - ``wild-cards’’• Retrieve all documents with internal call number:

SELECT biblio.?.topic

FROM root

WHERE biblio.?.internal-call-no

• ``?’’ label matches any label. For this query, the doc labels can be replaced by any other strings and query would produce the same result. By convention, two occurrences of ? In the same query must match the same label unless variables are used.

<answer, set, {obj1}>

obj1 is <topic, string, ``Databases’’>

Page 22: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - ``wild-paths’’• Retrieve all documents with internal call number:

SELECT *.topic

FROM root

WHERE *.internal-call-no

• Symbol ``*’’ matches any path of length one or more. The use of * followed by a single label is a convenient and common way to locate objects with a certain label in complex structure. Similar to ?, two occurrences of * in the same query must match the same sequence of labels, unless variables are used.

<answer, set, {obj1}>

obj1 is <topic, string, ``Databases’’>

Page 23: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - variables• Retrieve each document for which both ``Hopcroft’’

and ``Aho’’ are co-authors:SELECT biblio.doc

FROM root

WHERE biblio.doc.auth-set.auth-ln=``Aho’’ and

biblio.doc.auth-set.auth-ln=``Hopcroft’’

• Here, the query finds all the paths with structure [biblio, doc, auth-set], and with two distinct path completions with label auth with values ``Aho’’ and ``Hopcroft’’

<answer, set, {obj1}>

obj1 is the complete doc2

Page 24: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel Query Language

• Need query language that supports path expressions for traversing graph data and handling of ‘typeless’ data.

• A simple path expression is a name followed by a sequence of labels.– DBGroup.Member.Office.– Set of objects that can be reached starting with

the DBGroup object, following edges labeled as member and then office.

Page 25: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel (cont.)

• Example:– select DBGroup.Member.Office

where DBGroup.Member.Age < 30

• Result:– Office “Gates 252”– Office

Building “CIS”Room “411”

Page 26: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel Query Rewrite

• Previous query rewritten to (OQL style)– select O

from DBGroup.Member M, M.Office Owhere exists y in M.Age : y < 30

Page 27: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel Query Features • Explicitly handle coercion.

• Automatic type coercion 0.5 < “0.9” should return true

• Comparison on age transformed to existential condition.

• Since all properties are set-valued in OEM.• A user can ask DBGroup.Member.Age < 30

regardless of whether Age is single valued, set valued, or unknown.

Page 28: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

• Path expression queries -specification for a set of possible paths through the graph– Example - * is a path expression that matches

any number of labels

• Use of Data guides – Structural summary of the database

Page 29: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel (cont.)

• General path expressions are loosely specified patterns for labels in the database.(‘|’ disjunction, ‘?’ label pattern optional)

• Example:– select DBGroup.Member.Name

where DBGroup.Member.Office(.Room%|.Cubicle)?like “%252”

• Result:– Name “Jones”

Name “Smith”

Page 30: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lorel Queries - Simple Path Expression

•Retrieve the offices of members with age greater than 30 years:

Query SELECT DBGroup.Member.OfficeWHERE DBGroup.Member.Age > 30

Result Office “Gates 252”

Office

Building “CIS”

Room “411”

Page 31: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - General Path Expression

Query SELECT DBGroup.Member.NameWHERE DBGroup.Member.Office(.Room

%|.Cubicle)?

Like “%252”

Result Name “Jones”

Name “Smith”

•Room% matches all labels starting from Room, like Room68. “|” stands for disjunction. “?” indicates that the label pattern is optional. “like %252” specifies that the data value should end with string “252”.

Page 32: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Queries - SubQueriesRetrieve Lore project members who work on other

projects

Query SELECT M.Name,

( SELECT M.Project.Title

WHERE M.Project.Title != “Lore”)

FROM DBGroup.Member M

WHERE M.Project.Title = “Lore”

Result Member

Name “Jones”

Title “Tsimmis”

Page 33: OEM and LORE Query Language Sanjay Madria Department of Computer Science University of Missouri-Rolla madrias@umr.edu

Lore - Summary• Lore does facilitate query and updates on semi-

structural databases

• There has been more work done on optimization using: data guides (vldb97).

• How is this related to WWW?

• XML-QL and related work provides the answer.