A syntax for Data by Jose Carlos Cabrera Zuniga

Page 1: A syntax for Data

A syntax for Data

by Jose Carlos Cabrera Zuniga

Page 2: A syntax for Data

Preface

This presentation introduces the relationship between semistructured data and XML. It first presents the concept of semistructured data and then shows how XML can be used to represent this kind of data.

Page 3: A syntax for Data

Semistructured Data

Semistructured data is often described as schemaless or self-describing, terms that indicate that there is no separate description of the type or structure of the data.

Page 4: A syntax for Data

{name: “Alan”, tel: 2157786, email: “[email protected]”}

Here name, tel, and email are labels; “Alan”, 2157786, and “[email protected]” are data.

Page 5: A syntax for Data

{

name: {first: “Alan”, last: “Black”},

tel: 2157786,

email: “[email protected]”

}

Page 6: A syntax for Data

{ name: {first: “Alan”, last: “Black”}, tel: 2157786, email: “[email protected]”}

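[Figure: the expression drawn as a tree. The root has edges labeled name, tel, and email; tel leads to 2157786, email leads to [email protected], and name leads to a node with edges first and last, which lead to “Alan” and “Black”.]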

Page 7: A syntax for Data

{ person: {name: “Alan”, tel: 2157786, email: “[email protected]”},
  person: {name: “Sara”, tel: 2136877, email: “[email protected]”},
  person: {name: “Fred”, tel: 2157786, email: “[email protected]”} }

Page 8: A syntax for Data

One of the main strengths of semistructured data is its ability to accommodate variations in structure:

{ person: {name: “Alan”, tel: 2157786, email: “[email protected]”},
  person: {name: {first: “Sara”, last: “Green”}, tel: 2136877, email: “[email protected]”},
  person: {name: “Fred”, tel: 2157786, Height: 183} }

Page 9: A syntax for Data

In semistructured data, we make the conscious choice of forgetting any type the data might have had, and we serialize it by annotating each data item explicitly with its description (such as name, tel, etc.). Such data is called self-describing.

Page 10: A syntax for Data

Base Types:

• Numbers start with a digit.

• Strings start with a quotation mark (“).

• There are many other types, with defined textual encodings, such as date, time, wav, that we would like to include. For each one it would be necessary to develop a notation (in many cases it is not necessary to re-invent a notation).

Page 11: A syntax for Data

REPRESENTING RELATIONAL DATABASES

A relational database is normally described by a schema such as

r1(a,b,c) r2(c,d)

where r1 and r2 are the names of the relations, and a, b, c and c, d are the column names of the two relations.

Page 12: A syntax for Data

{ r1: { row: {a: a1, b: b1, c: c1},
        row: {a: a2, b: b2, c: c2} },
  r2: { row: {c: c2, d: d2},
        row: {c: c3, d: d3},
        row: {c: c4, d: d4} } }

r1(a,b,c)

a   b   c
a1  b1  c1
a2  b2  c2

r2(c,d)

c   d
c2  d2
c3  d3
c4  d4
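The encoding of a relation as a set of row entries can be sketched in a few lines of Python. This is only an illustration: the helper names relation_to_ssd and render are hypothetical, and a list of (label, value) pairs is used instead of a dictionary because the label row repeats.

```python
# Hypothetical helpers; names and the list-of-pairs encoding are assumptions
# made for this sketch, not something defined in the slides.
def relation_to_ssd(columns, rows):
    """Encode one relation as a list of ("row", tuple-encoding) pairs."""
    return [("row", list(zip(columns, row))) for row in rows]


def render(value):
    """Render nested (label, value) pairs in the brace syntax of the slides."""
    if isinstance(value, list):
        inner = ", ".join(f"{label}: {render(v)}" for label, v in value)
        return "{" + inner + "}"
    return str(value)


database = [
    ("r1", relation_to_ssd(["a", "b", "c"],
                           [("a1", "b1", "c1"), ("a2", "b2", "c2")])),
    ("r2", relation_to_ssd(["c", "d"],
                           [("c2", "d2"), ("c3", "d3"), ("c4", "d4")])),
]

print(render(database))
```

Each tuple becomes one row entry whose labels are the column names, so the column names are repeated in every row, which is what makes the data self-describing.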

Page 13: A syntax for Data

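{ r1: { row: {a: a1, b: b1, c: c1}, row: {a: a2, b: b2, c: c2} }, r2: { row: {c: c2, d: d2}, row: {c: c3, d: d3}, row: {c: c4, d: d4} } }

[Figure: one representation of a relational database. The expression is drawn as a tree: the root has edges r1 and r2; the r1 node has two row edges and the r2 node has three; each row node has one edge per column name (a, b, c or c, d) leading to the corresponding value.]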

Page 14: A syntax for Data

{ r1: { row: {a: a1, b: b1, c: c1}, row: {a: a2, b: b2, c: c2} }, r2: { row: {c: c2, d: d2}, row: {c: c3, d: d3}, row: {c: c4, d: d4} } }

[Figure: another representation of a relational database. The same tuples are drawn as a tree using the labels r1, r2, row, and the column names; the exact arrangement of the edges is not recoverable from this transcript.]

Page 15: A syntax for Data

Representing Object Databases

Modern database applications handle objects, either through an object-relational or an object database. Such data can be represented as semistructured data, too.

Page 16: A syntax for Data

Example. Three persons: Mary, who has two children, John and Jane.

{ person: &o1 { name: “Mary”, age: 45, child: &o2, child: &o3 },
  person: &o2 { name: “John”, age: 17, relatives: { mother: &o1, sister: &o3 } },
  person: &o3 { name: “Jane”, country: “Canada”, mother: &o1 } }

Page 17: A syntax for Data

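[Figure: the three person objects drawn as a graph. Three person edges lead from the root to the nodes &o1, &o2, and &o3. The edges name, age, and country lead to the values “Mary”, 45, “John”, 17, “Jane”, and “Canada”; the child, mother, and sister edges (the latter two partly under relatives) point to other person nodes.]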

Page 18: A syntax for Data

The presence of a node label such as &o1 before a structure binds &o1 to the identity of that structure.

The names &o1, &o2, &o3 are called object identities, or oids.

At this point, the data is no longer a tree but a graph, in which each node has a unique oid.

An oid can be used to access a collection of data both logically and physically.

Page 20: A syntax for Data

In our simple syntax for semistructured data, we allow both nodes with explicit oids and nodes without oids: when the data is parsed, the system automatically assigns a unique oid to every node that does not have one. Thus {a:&o1{b:&o2 5}} and {a:{b:5}} denote isomorphic graphs, as does {a:&o1 {b:5}}.

What could happen with:

{a: {b:3}, a: {b:3} } ?

Page 21: A syntax for Data

SPECIFICATION OF SYNTAX

We call any semistructured data expression an ssd-expression.

<ssd-expr> ::= <value> | oid <value> | oid
<value> ::= atomicvalue | <complexvalue>
<complexvalue> ::= {label: <ssd-expr>, … , label: <ssd-expr>}

atomicvalue: any number or string of characters
oid: an identifier such as &123
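The grammar is small enough to be read by a short recursive-descent parser. The following Python sketch makes several assumptions that the slides do not state: strings use straight double quotes, numbers are non-negative integers, labels are identifiers, and a trailing comma before a closing brace is tolerated. A parsed expression is a pair (oid, value), where value is None for a bare oid, an atomic value, or a list of (label, expression) pairs.

```python
import re

# One token per match: oid, number, string, label, or punctuation.
TOKEN = re.compile(r'\s*(?:(&\w+)|(\d+)|"([^"]*)"|([A-Za-z_][\w-]*)|([{}:,]))')


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        oid, number, string, label, punct = m.groups()
        if oid:
            tokens.append(("oid", oid))
        elif number:
            tokens.append(("atom", int(number)))
        elif string is not None:
            tokens.append(("atom", string))
        elif label:
            tokens.append(("label", label))
        else:
            tokens.append((punct, punct))
        pos = m.end()
    return tokens


def kind_at(tokens, i):
    return tokens[i][0] if i < len(tokens) else "end"


def parse_expr(tokens, i=0):
    """Parse <ssd-expr> at tokens[i]; return ((oid, value), next_index)."""
    oid = None
    if kind_at(tokens, i) == "oid":
        oid = tokens[i][1]
        i += 1
        if kind_at(tokens, i) in (",", "}", "end"):
            return (oid, None), i            # bare oid: a reference
    if kind_at(tokens, i) == "atom":
        return (oid, tokens[i][1]), i + 1    # atomic value
    if kind_at(tokens, i) != "{":
        raise SyntaxError(f"expected a value at token {i}")
    i += 1
    fields = []
    while kind_at(tokens, i) != "}":
        if kind_at(tokens, i) != "label" or kind_at(tokens, i + 1) != ":":
            raise SyntaxError(f"expected 'label:' at token {i}")
        label = tokens[i][1]
        child, i = parse_expr(tokens, i + 2)
        fields.append((label, child))
        if kind_at(tokens, i) == ",":
            i += 1
        elif kind_at(tokens, i) != "}":
            raise SyntaxError(f"expected ',' or '}}' at token {i}")
    return (oid, fields), i + 1


def parse(text):
    tokens = tokenize(text)
    expr, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing input after expression")
    return expr


print(parse('{a: &o1 {b: &o2 5}}'))
```

The three alternatives of the ssd-expr rule correspond to the three return paths of parse_expr: a value without oid, an oid followed by a value, and an oid alone.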

Page 22: A syntax for Data

Definition. We say that an object identifier o is defined in an ssd-expression s if either s is of the form o v for some value v or s is of the form {l1:e1, … , ln:en} and o is defined in one of the e1, … , en. If it occurs in any other way in s, we say it is used in s.

Definition. (Consistency) For an ssd-expression s to be consistent it must satisfy the following properties:

• Any object identifier is defined at most once in s.

• If an object identifier o is used in s, it must be defined in s.

Note. This definition must be extended if it is necessary to consider external resources and external oids.
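The two consistency conditions can be checked mechanically. The Python sketch below assumes a hypothetical in-memory form of an ssd-expression: a pair (oid, value), where oid may be None, and value is None for a bare oid, an atomic value, or a list of (label, expression) pairs. External oids are not handled.

```python
def defined_and_used(expr):
    """Return (list of defined oids, set of used oids) for an (oid, value) pair."""
    oid, value = expr
    defined, used = [], set()
    if oid is not None:
        if value is None:
            used.add(oid)          # bare oid: a use
        else:
            defined.append(oid)    # oid followed by a value: a definition
    if isinstance(value, list):
        for _label, child in value:
            child_defined, child_used = defined_and_used(child)
            defined.extend(child_defined)
            used |= child_used
    return defined, used


def is_consistent(expr):
    defined, used = defined_and_used(expr)
    defined_once = len(defined) == len(set(defined))
    uses_resolved = used <= set(defined)
    return defined_once and uses_resolved


family = (None, [
    ("person", ("&o1", [("name", (None, "Mary")), ("child", ("&o2", None))])),
    ("person", ("&o2", [("name", (None, "John")), ("mother", ("&o1", None))])),
])
dangling = (None, [("a", ("&o9", None))])
duplicate = (None, [("a", ("&o1", 1)), ("b", ("&o1", 2))])

print(is_consistent(family), is_consistent(dangling), is_consistent(duplicate))
```

The list of defined oids keeps duplicates on purpose, so that a second definition of the same oid can be detected.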

Page 23: A syntax for Data

THE OBJECT EXCHANGE MODEL (OEM)

An OEM object is a quadruple

(label, oid, type, value)

where label is a character string, oid is the object’s identifier, and type is either complex or some identifier denoting an atomic type (like integer, string, gif-image, etc.). When type is complex, the object is called a complex object, and value is a set (or list) of oids. Otherwise the object is an atomic object, and value is an atomic value of that type.
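As a rough illustration, the quadruple maps directly onto a small record type. In the Python sketch below the class name OEMObject, the helper children, and the oids &o4 and &o5 are made up for the example; only the four fields come from the definition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OEMObject:
    label: str
    oid: str
    type: str                       # "complex" or the name of an atomic type
    value: Union[tuple, int, str]   # tuple of child oids when type == "complex"

    @property
    def is_complex(self):
        return self.type == "complex"


# A tiny object store indexed by oid (the oids here are illustrative).
store = {obj.oid: obj for obj in [
    OEMObject("person", "&o1", "complex", ("&o4", "&o5")),
    OEMObject("name", "&o4", "string", "Mary"),
    OEMObject("age", "&o5", "integer", 45),
]}


def children(obj, store):
    """Follow the oids of a complex object; atomic objects have no children."""
    return [store[oid] for oid in obj.value] if obj.is_complex else []


print([child.label for child in children(store["&o1"], store)])
```

Note that the label is stored on the object itself, which is the sense in which OEM attaches labels to nodes.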

Page 24: A syntax for Data
Page 25: A syntax for Data

Thus OEM data is essentially a graph, like the semistructured data described in this section, but in which labels are attached to nodes rather than edges.

Page 26: A syntax for Data

Definition. A graph (N, E) consists of a set N of nodes and a set E of edges. Associated with each edge e in E there is an (ordered) pair of nodes, the source node s(e) and the target node t(e).

[Figure: an edge e drawn as an arrow from its source node s(e) to its target node t(e).]

Page 27: A syntax for Data

Definition. A path is a sequence e_1, … , e_k of edges such that t(e_i) = s(e_{i+1}) for 1 ≤ i ≤ k − 1. Such a path is called a path from the source s(e_1) of e_1 to the target t(e_k) of e_k. The number of edges in this path, k, is its length.

[Figure: a path drawn as a chain of edges from s(e_1) through t(e_1), t(e_2), … and s(e_k) to t(e_k).]

Page 28: A syntax for Data

Definition. A node r is a root for a graph (N, E) if there is a path from r to n for every n in N, n ≠ r.

Definition. A cycle in a graph is a path between a node and itself. A graph with no cycles is called acyclic.

Definition. A graph with root r is a tree if there is a unique path from r to n for every n in N, n ≠ r.

Definition. A node is a terminal node, or leaf, if it is not the source of any edge in E.
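These definitions translate almost literally into code. The Python sketch below represents E as a list of (source, target) pairs, which is an assumption of the example. The is_tree function does not count paths; it uses an in-degree test (the root has no incoming edge and every other node has exactly one), which should agree with the definition for graphs with more than one node.

```python
def reachable(edges, start):
    """Nodes that can be reached from start by a path of one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def is_root(nodes, edges, r):
    reached = reachable(edges, r)
    return all(n in reached for n in nodes if n != r)


def is_acyclic(nodes, edges):
    # A cycle is a path from a node back to itself.
    return all(n not in reachable(edges, n) for n in nodes)


def is_tree(nodes, edges, r):
    indegree = {n: 0 for n in nodes}
    for _source, target in edges:
        indegree[target] += 1
    return (is_root(nodes, edges, r)
            and indegree[r] == 0
            and all(indegree[n] == 1 for n in nodes if n != r))


def leaves(nodes, edges):
    sources = {source for source, _target in edges}
    return [n for n in nodes if n not in sources]


nodes = ["root", "person", "name", "tel"]
edges = [("root", "person"), ("person", "name"), ("person", "tel")]

print(is_root(nodes, edges, "root"), is_tree(nodes, edges, "root"))
print(leaves(nodes, edges))
```

Adding the edge ("name", "tel") keeps the graph acyclic but gives tel two incoming edges, so there are two paths from the root to tel and the graph is no longer a tree.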

Page 29: A syntax for Data

The model of semistructured data followed here is that of an edge-labeled graph.

Page 30: A syntax for Data

XML and Semistructured Data

{ person: {name: “Alan”, tel: 2157786, email: “[email protected]”} }

<person>
  <name> Alan </name>
  <tel> 2157786 </tel>
  <email> [email protected] </email>
</person>

Page 31: A syntax for Data

For trees, let T be a translation function such that

T(atomicvalue) = atomicvalue
T({l1: v1, … , ln: vn}) = <l1> T(v1) </l1> … <ln> T(vn) </ln>

[Figure: two tree drawings of the person example, each with the children name, tel, and email leading to Alan, 2157786, and [email protected].]
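A minimal Python sketch of the translation for trees is shown here. It assumes that a complex value is given as a list of (label, value) pairs and that anything else is atomic; escaping of atomic values with xml.sax.saxutils.escape is an addition of the sketch and is not part of the definition.

```python
from xml.sax.saxutils import escape


def T(value):
    """Translate a tree-shaped ssd value into XML text."""
    if isinstance(value, list):                      # complex value
        return "".join(f"<{label}>{T(v)}</{label}>" for label, v in value)
    return escape(str(value))                        # atomic value


person = [("person", [("name", "Alan"), ("tel", 2157786)])]
print(T(person))
```

Each label becomes a pair of matching tags and the recursion bottoms out at the atomic values. The sketch only covers trees; shared nodes and oids need the reference mechanism discussed for graphs.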

Page 32: A syntax for Data

For graphs:

<state id="s2">
  <scode> NE </scode>
  <sname> Nevada </sname>
</state>

<city id="c2">
  <ccode> CCN </ccode>
  <cname> Carson City </cname>
  <state-of idref="s2"/>
</city>

Observe that <state-of> is an empty element; its only purpose is for reference.
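To illustrate how such a reference turns the tree back into a graph, the following Python sketch resolves the idref by hand with the standard xml.etree.ElementTree module. The wrapping geography element is hypothetical and is added only because an XML document needs a single root; without a DTD the parser treats id and idref as ordinary attributes, so the lookup table is built explicitly.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<geography>
  <state id="s2"><scode>NE</scode><sname>Nevada</sname></state>
  <city id="c2"><ccode>CCN</ccode><cname>Carson City</cname>
    <state-of idref="s2"/></city>
</geography>
""")

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

city = by_id["c2"]
state = by_id[city.find("state-of").get("idref")]   # follow the reference edge

print(city.find("cname").text, "is in", state.find("sname").text)
```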

Page 33: A syntax for Data

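[Figure: a graph whose root has two edges labeled a. One a node has an edge labeled b and the other has an edge labeled c; both edges lead to the same node, some string.]

The ssd-expressions for this graph are:

a: { b: some string }

a: { c: some string }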

Page 34: A syntax for Data

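[Figure: the same graph, with two a edges whose b and c edges share the node some string.]

If the attribute c is a reference attribute, the graph can be written as:

<a> <b id="&o123"> some string </b> </a>
<a c="&o123"/>

Alternatively, assuming that b is the reference attribute:

<a b="&o123"/>
<a> <c id="&o123"> some string </c> </a>

In both encodings, the a element that carries the reference attribute is an empty element.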

Page 35: A syntax for Data

ORDER

The semistructured data model described is based on unordered collections, while XML is ordered. For example, the following two pieces of semistructured data are equivalent:

person: {firstname: “John”, lastname: “Smith”}
person: {lastname: “Smith”, firstname: “John”}

Page 36: A syntax for Data

The following two XML documents, however, are not equivalent:

<person>
  <firstname> John </firstname>
  <lastname> Smith </lastname>
</person>

<person>
  <lastname> Smith </lastname>
  <firstname> John </firstname>
</person>

Page 37: A syntax for Data

To complicate matters, attributes are not ordered in XML. For example, the following two elements are equivalent:

<person firstname="john" lastname="Smith"/>

<person lastname="Smith" firstname="john"/>

Applications that use XML for data exchange are likely to ignore order…
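The difference between ordered subelements and unordered attributes can be observed with Python's standard xml.etree.ElementTree module. This is a small sketch; the helper child_tags is introduced only for the example.

```python
import xml.etree.ElementTree as ET


def child_tags(xml_text):
    """Tags of the direct subelements, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


a = "<person><firstname>John</firstname><lastname>Smith</lastname></person>"
b = "<person><lastname>Smith</lastname><firstname>John</firstname></person>"

same_order = child_tags(a) == child_tags(b)                      # order matters
same_as_sets = sorted(child_tags(a)) == sorted(child_tags(b))    # order ignored

c = ET.fromstring('<person firstname="john" lastname="Smith"/>')
d = ET.fromstring('<person lastname="Smith" firstname="john"/>')
same_attributes = c.attrib == d.attrib                           # unordered

print(same_order, same_as_sets, same_attributes)
```

An application that treats XML as semistructured data would compare subelements the way same_as_sets does, ignoring document order.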

Page 38: A syntax for Data

MIXING ELEMENTS AND TEXT

XML allows us to mix PCDATA and subelements within an element:

<Person>
  This is my best friend
  <Name> Alessandreia </Name>
  <Age> 25 </Age>
  I am not too sure of the following email
  <Email> [email protected] </Email>
</Person>

In order to translate XML back into the syntax of ssd-expressions, it is necessary to wrap the PCDATA in some standard surrounding tag.
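One way to do this is sketched below in Python with the standard xml.etree.ElementTree module. The label text used for the wrapped PCDATA is a hypothetical choice, and the result is a list of (label, value) pairs rather than a printed ssd-expression. The sample input is shortened to the first part of the example.

```python
import xml.etree.ElementTree as ET


def to_ssd(element, text_label="text"):
    """Return the content of element as a list of (label, value) pairs,
    wrapping free-standing PCDATA under text_label (an assumed label)."""
    pairs = []

    def add_text(s):
        if s and s.strip():
            pairs.append((text_label, s.strip()))

    add_text(element.text)                 # PCDATA before the first subelement
    for child in element:
        if len(child) == 0:                # no subelements: atomic value
            value = (child.text or "").strip()
        else:
            value = to_ssd(child, text_label)
        pairs.append((child.tag, value))
        add_text(child.tail)               # PCDATA following this subelement
    return pairs


person = ET.fromstring(
    "<Person> This is my best friend "
    "<Name> Alessandreia </Name> <Age> 25 </Age> </Person>"
)
print(to_ssd(person))
```

ElementTree stores the text before the first subelement in text and the text after each subelement in that subelement's tail, which is why both are inspected.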

Page 39: A syntax for Data

XML END