What Are Real DTDs Like. Group Members: Xijie Zeng, Peiyu Cai. Presenter: Xijie Zeng.

What Are Real DTDs Like

Page 1: What Are Real DTDs Like

What Are Real DTDs Like

Group Members: Xijie Zeng, Peiyu Cai

Presenter: Xijie Zeng

Page 2: What Are Real DTDs Like

Outline

Overview
Introduction
Local properties
Global properties

Page 3: What Are Real DTDs Like

Overview

XML is widely used in a variety of areas

DTDs with different structures define XML with different usages

This talk covers a survey of a number of real-world DTDs

Page 4: What Are Real DTDs Like

Introduction

The DTDs are from the XML.org DTD repository.

Three DTD categories:

app: describe objects interchanged between programs/applications

data: describe data stored in databases

meta: describe the structure of document markup

60 DTDs: 7 are app, 13 are data, 40 are meta

Page 5: What Are Real DTDs Like

Introduction (cont.)

A DTD can be described as a collection of element declarations of the form e → α, where e is the element name and α is the content model.

The content model: α ::= ε | pcdata | e | α “,” α | α “|” α | α* | α+ | α?

Page 6: What Are Real DTDs Like

Introduction (cont.)

Email DTD

<!ELEMENT email (head, body)>
<!ELEMENT head (from, to+, cc*, subject)>
<!ELEMENT from EMPTY>
<!ATTLIST from name CDATA #IMPLIED
               address CDATA #REQUIRED>
<!ELEMENT to EMPTY>
<!ATTLIST to name CDATA #IMPLIED
             address CDATA #REQUIRED>
<!ELEMENT cc EMPTY>
<!ATTLIST cc name CDATA #IMPLIED
             address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (text, attachment*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT attachment EMPTY>
<!ATTLIST attachment encoding (mime|binhex) "mime"
                     file CDATA #REQUIRED>

The element declarations of the Email DTD, written as e → α:

email → (head, body)
head → (from, to+, cc*, subject)
from → ε
to → ε
cc → ε
subject → pcdata
body → (text, attachment*)
text → pcdata
attachment → ε

Page 7: What Are Real DTDs Like

Introduction (cont.)

Local properties: describe content models in individual element declarations

Global properties: describe the graph-theoretic structure of the whole DTD

Page 8: What Are Real DTDs Like

Local properties

Content model classification

(1) pcdata, e.g. text → pcdata
(2) ε
(3) any: no restriction on subelements
(4) Mixed content, e.g. body1 → (pcdata, attachment*)
(5) “|” only, but not mixed content
(6) “,” only, e.g. body → (text, attachment*)
(7) Complex content: contains both “|” and “,”, e.g. directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
(8) List: α* or α+
(9) Single: α?

Page 9: What Are Real DTDs Like

Local properties (cont.)

Content model classification

Page 10: What Are Real DTDs Like

Local properties (cont.)

Syntactic complexity

depth(ε) = 0
depth(e) = depth(pcdata) = 1
depth(α*) = depth(α+) = depth(α?) = depth(α) + 1
depth(α1, α2, …, αn) = depth(α1 | α2 | … | αn) = max(depth(αi)) + 1

Page 11: What Are Real DTDs Like

Local properties (cont.)

An example:

head → (from, to+, cc*, subject)

depth(from, to+, cc*, subject)
= max(depth(from), depth(to+), depth(cc*), depth(subject)) + 1
= depth(cc*) + 1
= depth(cc) + 1 + 1
= 1 + 1 + 1 = 3
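The depth rules can be tried out with a minimal sketch like the following. The nested-tuple encoding of content models (`"eps"`, `"pcdata"`, `("elem", name)`, `("seq", [...])` and so on) is an assumption made for illustration and does not come from the slides.

```python
# Hypothetical encoding of a content model (not from the slides):
#   "eps", "pcdata", ("elem", name),
#   ("seq", [m1, ...]), ("alt", [m1, ...]),
#   ("star", m), ("plus", m), ("opt", m)

def depth(model):
    """Syntactic depth of a content model, following the rules on the slide."""
    if model == "eps":
        return 0
    if model == "pcdata":
        return 1
    kind = model[0]
    if kind == "elem":
        return 1
    if kind in ("star", "plus", "opt"):
        # depth(a*) = depth(a+) = depth(a?) = depth(a) + 1
        return depth(model[1]) + 1
    if kind in ("seq", "alt"):
        # depth(a1, ..., an) = depth(a1 | ... | an) = max(depth(ai)) + 1
        return max(depth(m) for m in model[1]) + 1
    raise ValueError(f"unknown content model: {model!r}")


# head -> (from, to+, cc*, subject)
head = ("seq", [
    ("elem", "from"),
    ("plus", ("elem", "to")),
    ("star", ("elem", "cc")),
    ("elem", "subject"),
])

print(depth(head))  # 3, matching the worked example
```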

Page 12: What Are Real DTDs Like

Local properties (cont.)

Determinism

If a content model does not require lookahead when parsing, it is a deterministic content model.

Non-deterministic content model: (a, b) | (a, c)

Deterministic content model: a, (b | c)

Result: the survey detects 5 non-deterministic content models in 4 DTDs.
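One common way to test determinism is the position-based (Glushkov) construction used for one-unambiguous regular expressions: number every symbol occurrence, compute which positions can come first and which can follow each position, and reject the model if two different positions carrying the same name compete. The slides do not say how the survey performed its check, so the sketch below is an assumed technique, and the tuple encoding and function names are hypothetical.

```python
from itertools import count

# Hypothetical encoding: ("eps",), ("sym", name), ("seq", a, b), ("alt", a, b),
# ("star", a), ("plus", a), ("opt", a). "seq" and "alt" are binary here.

def analyse(model, ids, follow, symbol):
    """Return (nullable, first, last) of `model`, filling `follow` and `symbol`."""
    kind = model[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "sym":
        p = next(ids)                      # fresh position for this occurrence
        symbol[p], follow[p] = model[1], set()
        return False, {p}, {p}
    if kind in ("seq", "alt"):
        n1, f1, l1 = analyse(model[1], ids, follow, symbol)
        n2, f2, l2 = analyse(model[2], ids, follow, symbol)
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                       # end of part 1 is followed by start of part 2
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind in ("star", "plus", "opt"):
        n, f, l = analyse(model[1], ids, follow, symbol)
        if kind != "opt":                  # repetition loops from the end back to the start
            for p in l:
                follow[p] |= f
        return (n if kind == "plus" else True), f, l
    raise ValueError(f"unknown content model: {model!r}")


def is_deterministic(model):
    """True if no first/follow set holds two positions with the same name."""
    follow, symbol = {}, {}
    _, first, _ = analyse(model, count(), follow, symbol)
    for positions in [first, *follow.values()]:
        names = [symbol[p] for p in positions]
        if len(names) != len(set(names)):
            return False
    return True


def sym(name):
    return ("sym", name)

nondet = ("alt", ("seq", sym("a"), sym("b")), ("seq", sym("a"), sym("c")))  # (a, b) | (a, c)
det = ("seq", sym("a"), ("alt", sym("b"), sym("c")))                         # a, (b | c)

print(is_deterministic(nondet), is_deterministic(det))  # False True
```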

Page 13: What Are Real DTDs Like

Local properties (cont.)

Ambiguity

Definition: An expression R is ambiguous if and only if there exists some string s matched by R that can be parsed in more than one way.

An example (name occurs both as name? and as name*):

partner → (name?, onetime?, partnrid?, partnrtype?, syncind?, name*, parentid?, partnridx?, partnrratg*)

Result: the survey detects 2 ambiguous content models.

Page 14: What Are Real DTDs Like

Global properties

Reachability

Definition: An element name e′ is reachable from e, denoted by e ⇒ e′, if either e → α and e′ occurs in α, or e ⇒ e″ and e″ ⇒ e′.

An example:

email → (head, body)
head → (from, to+, cc*, subject)

Hence email ⇒ head, head ⇒ subject, and therefore email ⇒ subject.

Definition: An element name e is reachable if r ⇒ e, where r is the name of the root element. Otherwise element name e is called unreachable or useless.
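Reachability is a plain graph search over the "occurs in the content model of" relation. The sketch below assumes a DTD is given as a dictionary from each element name to the set of element names in its content model. The `signature` element is a hypothetical addition to the Email DTD, included only so that the example has something unreachable.

```python
from collections import deque

def reachable_from(dtd, start):
    """Element names reachable from `start` in one or more steps."""
    seen = set()
    queue = deque(dtd.get(start, ()))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(dtd.get(name, ()))
    return seen


def unreachable(dtd, root):
    """Declared element names that the root cannot reach (the root itself is kept)."""
    return set(dtd) - reachable_from(dtd, root) - {root}


email_dtd = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
    "from": set(), "to": set(), "cc": set(),
    "subject": set(), "text": set(), "attachment": set(),
    "signature": set(),   # hypothetical element, never referenced
}

print("subject" in reachable_from(email_dtd, "email"))  # True
print(unreachable(email_dtd, "email"))                  # {'signature'}
```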

Page 15: What Are Real DTDs Like

Global properties (cont.) Reachability

Unreachable element names in DTDs

Page 16: What Are Real DTDs Like

Global properties (cont.)

Recursions

Definition: A content model α is derivable from an element name e, denoted by e ⇒* α, if either e → α, or e ⇒* α′, e′ → α″, and α = α′[e′/α″], where α′[e′/α″] denotes the content model obtained by substituting α″ for all occurrences of e′ in α′.

An example:

email → (head, body)
head → (from, to+, cc*, subject)

Substituting the content model of head for head gives email ⇒* (from, to+, cc*, subject, body).

Definition: A DTD is recursive if and only if it has an element name e such that e ⇒ e (e is reachable from itself) and e is reachable.
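Under the same assumed dictionary encoding (element name to the set of names in its content model), the recursion test reduces to looking for a reachable element that can reach itself. This is a sketch of that reading of the definition, with hypothetical function names. It does not distinguish linear from non-linear recursion.

```python
def reachable_from(dtd, start):
    """Element names reachable from `start` in one or more steps."""
    seen, stack = set(), list(dtd.get(start, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dtd.get(name, ()))
    return seen


def is_recursive(dtd, root):
    """True if some element reachable from the root can reach itself."""
    from_root = reachable_from(dtd, root)
    return any(e in reachable_from(dtd, e) for e in from_root)


# directory -> (dirname, dirinfo?, dirdesc?, (file | directory)*)
directory_dtd = {
    "directory": {"dirname", "dirinfo", "dirdesc", "file", "directory"},
    "dirname": set(), "dirinfo": set(), "dirdesc": set(), "file": set(),
}

email_dtd = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
}

print(is_recursive(directory_dtd, "directory"))  # True
print(is_recursive(email_dtd, "email"))          # False
```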

Page 17: What Are Real DTDs Like

Global properties (cont.)

Recursions

Definition: A DTD is linear recursive if and only if it is recursive and, for any reachable element name e and any α such that e ⇒* α, e occurs at most once in α and the occurrence is not enclosed in “*” or “+”. A DTD is said to be non-linear recursive if it is recursive but is not linear recursive.

An example of non-linear recursive:

directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)

An example of linear recursive:

e → (pcdata | e)

Result: No linear recursive DTD is found in the sample DTDs. There are 7, 2 and 26 non-linear recursive DTDs in the app, data and meta categories respectively.

Page 18: What Are Real DTDs Like

Global properties (cont.)

Chain of stars

An example:

entity → (name*, contact*, location*, phone*, fax*)
location → (city*, otherinfo?)

There is a chain of 2 stars (location* in entity, then city* in location).

Page 19: What Are Real DTDs Like

Global properties (cont.) Chain of stars

Page 20: What Are Real DTDs Like

Global properties (cont.)

Hubs

Definition: The fan-in of an element name e is the cardinality of the set {e′ | e′ → α and e occurs in α}. An element name with a large fan-in value is called a hub.

An example:

email → (head, body)
head → (from, to+, cc*, subject)
from → ε
to → ε
cc → ε
subject → pcdata
body → (text, attachment*)
text → pcdata
attachment → ε

The fan-in value of the email element is 0, and the fan-in value of every other element in this DTD is 1.
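Fan-in can be computed by counting, for each element name, how many declarations mention it. The sketch assumes the dictionary encoding of a DTD (element name to the set of names in its content model). Using a set means several occurrences inside one content model count once, in line with the set-based definition.

```python
def fan_in(dtd):
    """Map each element name to the number of declarations that mention it."""
    counts = {name: 0 for name in dtd}
    for parent, children in dtd.items():
        for child in children:
            counts[child] = counts.get(child, 0) + 1
    return counts


email_dtd = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
    "from": set(), "to": set(), "cc": set(),
    "subject": set(), "text": set(), "attachment": set(),
}

for name, value in sorted(fan_in(email_dtd).items()):
    print(name, value)   # email has fan-in 0, every other element 1
```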

Page 21: What Are Real DTDs Like

Global properties (cont.)

Result:

[Figures: fan-in of elements in data DTDs; fan-in of elements in meta DTDs]

Page 22: What Are Real DTDs Like

Summary

Local properties: content model classification, syntactic complexity, determinism, ambiguity

Global properties: reachability, recursions, chain of stars, hubs

One drawback of this survey: it does not study any properties of attributes.