17
1 eXtensible Markup Language Reda Bendraou [email protected] http://pagesperso-systeme.lip6.fr/Reda.Bendraou XML 2 Agenda Part I : The XML Standard Goals Why XML ? XML document Structure Well-formed XML document Part II : XML: DTD and Schema DTD XML Schema 3 What is XML? XML stands for EXtensible Markup Language XML is a markup language much like HTML XML tags are not predefined. You must define/invent your own tags XML was designed to describe data XML is a W3C Recommendation 4 The Main Difference Between XML and HTML XML was designed to store, carry, and exchange data. XML was not designed to display data. XML is not a replacement for HTML. XML and HTML were designed with different goals: XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information.

chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

1

eXtensible M

arkup Language

Reda Bendraou Reda.Bendraou@

lip6.fr

http://pagesperso-systeme.lip6.fr/R

eda.Bendraou

XML

2

Agenda

Part I : The X

ML S

tandard

Goals

W

hy XM

L ?

XM

L document S

tructure

Well-form

ed XM

L document

Part II : XM

L: DTD

and Schem

a

DTD

XML Schem

a

3

What is XM

L?

XM

L stands for EXtensible M

arkup Language

X

ML is a m

arkup language much like H

TML

XM

L tags are not predefined. You m

ust define/invent your ow

n tags

XM

L was designed to describe data

XM

L is a W3C

Recom

mendation

4

The Main Difference Betw

een XML and HTM

L

XML w

as designed to store, carry, and exchange data. XM

L was not designed to display data.

XM

L is not a replacement for HTM

L. X

ML and H

TML w

ere designed with different goals:

X

ML w

as designed to describe data and to focus on what

data is.

HTM

L was designed to display data and to focus on how

data looks.

HTML is about displaying inform

ation, while XM

L is about describing inform

ation.

Page 2: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

5

How can XM

L be Used?

To describe and to structure data

To Separate Data from

HTML

You need to parse your XM

L document to display data

To parse: CSS, XSLT, DO

M API, etc.

To Exchange Data

To Store Data

6

XM

L Goals

A

need for representing data:

E

asily readable :

That fits to the W

EB technology (to be easily integrated within W

EB servers)

W

ith a clear separation between :

In a standardized w

ay - By humains

- By machines

- presentation aspects (format,

colors, etc..) - Data (sem

antics)

7

XM

L: Origines

Som

e well know

n Formats :

HTM

L = Hyper Text Markup Language (only presentation)

SG

ML = Standard G

eneralized Markup Language (to structure the

document

Tag based Language O

ther notations :

ASN.1= Abstract S

yntax Notation (ITU

-T)

CD

R, X

DR

= Com

mon/eX

tenal Data R

epresentation etc…

..

8

HTM

L Draw

backs

Sim

ple, readable !

WEB Com

patibility! BUT

Not extensible ! (a fixed set of standardized tags and attributes)

M

ixing between the Form

and the Content ! (i.e. presentation tag with a data : <H

1> Intensive Care </H

1>)

Brow

sers / Versions Incompatibility

No way to check the docum

ent: - structure (Tags ordre ), - data (type, value), - sem

antic

Page 3: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

9

SG

ML D

rawbacks

pow

erful, extensible, standard (ISO

8879-1986)!

M

eta-language for documenting huge and com

plex

specifications (i.e.  automobile,  avionic,  etc…

)

…Nevertheless

Too com

plex ! -> Too hard to implem

ent, too hard to use !

Not necessarily W

EB com

patible!

10

A D

efinition:

XM

L :

A

n extensible and configurable language

A hierarchical representation of data

http://w

ww

.w3.org/X

ML/

XM

L

- An HTML variant!

(WEB com

patibility, readable, HTML-like syntax)

- A subset of SGM

L !

(flexibility, rigor)

11

XM

L document structure

header :

Equivalent to H

TML <H

EA

D>,

M

eta–data:

body :

Equivalent to H

TML <B

OD

Y>

Structured data :

- Processing instructions - com

ments

(no interpretable by the parser)

- Enclosing tags - Attributes (e.g. <gangster nam

e='Ocean')>

- Data within tags (e.g. <title> O

cean's 12 </title>) (a tree structure)

12

<letter> <location> S

omew

here in space</ location >

<sender> ObiW

an Kenobi </ sender >

<receiver> Luke Skyw

alker </ receiver > <introduction> D

ear padawan, </introduction>

<body_Letter>  …May  the  force  be  w

ith  you  </ body_Letter > <signature/>

</letter>

<?xml version = "1.0" standalone="yes" encoding="IS

O-8859-1"?>

XM

L Exam

ple : A letter

A stand alone document

Processing instruction

A single Tag (empty, no data)

Header

Body

Character set used (Latin)

document XM

L

Start tag

End tag

Data

Page 4: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

13

XML Prologue

(Example)

<?xml version="1.0" standalone = "no" encoding="IS

O-8859-1" ?>

<!DOCTYPE

liste_CD

SYSTEM "C

Ds.dtd">

A key word ! (Define the docum

ent Type)

This is a non-standalone XML docum

ent (uses an external docum

ent)

In conformity with an external definition

(specified in "CDs.dtd")

XML 1.0 docum

ent

14

XML docum

ent body (Example)

<CD_list> <CD

> <artist type="individual">Frank S

inatra</artist> <title no_pistes="4">In The W

ee Sm

all Hours</title>

<tracks> <track>In The W

ee Sm

all Hours</track>

<track>Mood Indigo</track>

</tracks> <price m

oney="euro" payment="C

B">12.99</price>

<to_buy/> </CD

> <CD

>  …..    </CD

> …

.. </ CD_list >

6

8

1

3 5

3 4

2

7 9

15

XM

L document body

(a tree view of the exam

ple)

liste_CD

piste

titre

pistes

artiste

CD

en_vente

piste

CD

prix

racine du document

Fra

nk S

ina

traIn T

he

We

e S

ma

ll Ho

urs

Mo

od

Ind

igo

In T

he

We

e S

ma

ll Ho

urs

12

,99

CD_list

/CD_list

Document Root

tracks

title

track

track

price

to_buy

16

XM

L document body

(som

e explanations)

A

tree-like structure (see slide 12)

The body's root is unique (1)(2).

Tags are either :

The content betw

een a pair of tags (3) is either :

S

ome tags m

ay have attributes (5)(8),

- by pairs : Start (1) ,and End (2), - unique (4).

- A simple value : a string (6), a real (7), etc.,

- A tree structure of other tags (9). - A m

ix of both (not shown in the example).

Page 5: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

17

Element Nam

ing N

ames should be short and sim

ple (e.g. <Cd_Title>)

XML elem

ents must follow

these naming rules:

Nam

es can contain letters, numbers, and other characters

N

ames m

ust not start with a num

ber or punctuation characters

Avoid (':', '-', '.', '!', etc)

Nam

es must not start w

ith the letters xml (or XM

L, or Xml, etc)

N

ames cannot contain spaces

18

XML Attributes

X

ML elem

ents can have attributes in the start tag, just like HTM

L

A

ttributes are used to provide additional information about

elements.

N

o order!

S

yntax : name=‘value’ or nam

e=“value”

Forbidden Characters : ^, %

et &

P

redefined attributes :

<book tongue=“FR”  date=“09/2000”  id="IS

BN

-123"/>

xml:lang=“fr”

xml:id="a_unique_tag_indentifier"

xml:idref ="a_reference_tow

ards_a_tag"

example :

19

X

ML w

ith correct syntax is Well Form

ed :

X

ML docum

ents must have a root elem

ent

XM

L elements m

ust have a closing tag

XM

L tags are case sensitive

XM

L attribute values must alw

ays be quoted

X

ML elem

ents must be properly nested

Well Form

ed XM

L Docum

ent

20

Well Form

ed XM

L Docum

ent

A W

ell Formed X

ML

A

link with a style sheet is rendered possible

C

an be reused by syntactic parser/analyzer (i.e. browse the XM

L tree and transform it)

C

andidate to be a valid XM

L document

<?xm

l version = "1.0" standalone="yes"?> <!– unw

ell formed X

ML D

ocument! -->

<artist> <nam

e> Picasso

<surname>

</name> P

ablo </surnam

e> </artist>

Improperly

Nested tags

counterexample :

Page 6: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

21

A V

alid XM

L Docum

ent

A "Valid" XML docum

ent is a "Well Form

ed" XML docum

ent, which also

conforms to the rules of a Docum

ent Type Definition (DTD) or an XML

Schema (.xsd)

D

efinition :

Conditions :

Docum

ent => Well Form

ed (syntactically correct),

Docum

ent structure conforms to the definition given in its D

TD (cf. D

TD),

R

eferences on document elem

ents can be resolved

Then

The XML docum

ent can be exchanged ! (standardized format)

-Internal: within the X

ML docum

ent not recom

mended

(using the DOCTYPE tag)

-External reuse, exchange

(a reference to a file containing the definition within the DOCTYPE tag)

22

Exercise

G

ive a Well Form

ed XM

L document w

hich represents a set of bibliographic references

If you w

ant to process these bibliographic references, w

hat would you need ?

Your first feedbacks upon X

ML ?

23

The document does not specify :

Tags : Attributes : Tag contents : - D

ata Types (i.e. String, enumeration, etc.)

First reflections on XM

L (1)

- Attribute nam

es (for each tag) - A

ttribute types (i.e. String, enumeration, etc.)

- Attribute values (i.e. Range, form

at etc.) - Tag nam

es

- constraints upon : - The order, - M

ultiplicity (no. of occurrences), - Com

position (the hierarchy). 24

First reflections on XM

L(2)

Questions : W

hen do we have to use Tags and w

hen do we have to use attributes?

How

do we indicate w

hat should be displayed/printed and how ?

D

oes Attributes order has any im

portance? N

o

Tags entities

Attributes properties

style C

SS

transformations

XSLT,  DOM,  XPath  …

..

Page 7: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

25

XM

L (Part II)

XM

L Docum

ent Definitions:

DTD, XM

L Schema

26

A w

ell formed and V

alid XM

L Docum

ent

W

ell Formed D

ocument

E

lements properly nested, Syntactically correct, etc.

N

ot necessarily conforms to a D

TD or X

ML S

chema

A V

alid Docum

ent

Well Form

ed + conforms to a D

TD (or a S

chema)

27

DTD

(Docum

ent Type Definition)

A DTD defines the legal elem

ents of an XM

L document

D

efines the «vocabulary» and structure of the docum

ent

A

Gram

mar w

hich phrases (instances) are X

ML docum

ents

M

ay be Internal to the document or E

xternal (referenced w

ithin an XM

L document)

28

Why use a DTD?

W

ith DTD

, each of your XM

L files can carry a description of its ow

n format w

ith it.

W

ith a DTD

, independent groups of people can agree to use a com

mon D

TD for interchanging data.

Your application can use a standard D

TD to verify

that the data you receive from the outside w

orld is valid.

Y

ou can also use a DTD

to verify your own data.

Page 8: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

29

DTD

Contents: E

lement and A

ttribute

Elem

ents are the main building blocks of both X

ML docum

ents.

Syntax:

<!ELE

ME

NT tag (content) > O

r

<!ELE

ME

NT elem

ent-name category> (i.e. E

MP

TY, A

NY

, #PC

DA

TA)

D

efines a Tag

E.g.. : <!E

LEM

EN

T book (author, editor)>

Attributes provide extra inform

ation about elements.

Syntax

<!ATTLIS

T element-nam

e [attribute-name, attribute type #m

ode [default value]]*>

Defines the list of attributes for a given tag

E

.g. : <!ATTLIS

T author gender C

DA

TA #R

EQ

UIR

ED

city C

DA

TA #IM

PLIE

D>

<!ATTLIS

T editor city C

DA

TA #FIXE

D "P

aris">

30

Elem

ents Ordering/ S

tructuring

Elem

ent contents structuring:

(a, b) sequence e.g. (name, surnam

e, street, city)

(a | b) either / or e.g. (yes | no)

a? optional elem

ent [0,1] e.g. (name, surnam

e?, street, city)

a* zero or m

ore occurrences [0,N] e.g. (product*, custom

er)

a+ one or m

ore occurrences [1,N] e.g. (product*, seller+)

31

Data Types

C

DA

TA (C

haracter Data)

O

nly text inside a CD

ATA

section will be ignored by the parser.

P

CD

ATA

(Parsed C

haracter Data)

Text found betw

een the start tag and the end tag of an XML elem

ent.

is text that will be parsed by a parser

A

n Elem

ent with no descendants, no attributes, only text

E

numeration

A

list of values separated by « | »

ID et ID

RE

F

Key and R

eference for attributes

AN

Y

A

ny combination of parsable data

E

MP

TY

To declare E

mpty elem

ents

32

Exam

ple of an External D

TD (m

essage.dtd) <?xm

l version="1.0"?> <!D

OC

TYP

E m

essage SY

STE

M "m

essage.dtd"> <m

essage> <to>D

ave</to> <from

>Susan</from

> <subject>R

eminder</subject>

<text>Don't forget to buy m

ilk on the way hom

e.</text> </m

essage> In a separate file, "m

essage.dtd" you define your DTD

as follows:

<!ELE

ME

NT m

essage (to, from, subject, text)>

<!ELEM

EN

T to (#PC

DA

TA)>

<!ELEM

EN

T from (#P

CD

ATA)>

<!ELEM

EN

T subject (#PC

DA

TA)>

<!ELEM

EN

T text (#PC

DA

TA)>

Page 9: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

33

Exam

ple of an Internal DTD

<?xm

l version="1.0"?> <!D

OC

TYPE

message [

<!ELE

ME

NT m

essage (to,from,subject,text)>

<!ELE

ME

NT to (#P

CD

ATA)>

<!ELEM

EN

T from (#P

CD

ATA)> <!E

LEM

EN

T subject (#PC

DATA)>

<!ELE

ME

NT text (#P

CD

ATA)> ]> <m

essage> <to>Dave</to> <from

>Susan</from>

<subject>Reminder</subject>

<text>Don't forget to buy milk on the w

ay home.</text>

</message>

34

Why encourage the use of an external D

TD?

To P

romote reusability

To share tags and structures

The D

efinition may be local or distant

<!D

OC

TYPE doc S

YSTEM

"doc.dtd">

<!DO

CTYP

E doc PU

BLIC "w

ww

.e-xmlm

edia.com/doc.dtd">

A

clear separation between a definition and its

instances

35

DTD

Entity D

eclaration

Definition: E

ntities are variables that represent other values. The value of the entity is substituted for the entity w

hen the XML docum

ent is parsed.

Entities can be defined internally or externally to your D

TD.

Syntax

Internal declaration: <!E

NTITY entity-nam

e entity-value>

E

xternal declaration: <!E

NTITY entity-nam

e SYS

TEM

"entity-UR

L">

To use it &

entity-name

Exam

ple (internal): <!E

NTITY w

ebsite "http://ww

w.TheS

carms.com

">

Exam

ple (external): <!E

NTITY w

ebsite SYS

TEM

"http://ww

w.TheS

carms.com

/entity.xml">

The above entity make this line of XM

L valid.

XML line:

<url>&w

ebsite</url> E

valuates to: <url>http://w

ww

.TheScarm

s.com</url>

36

DTD

: Sum

mary

D

efines the document structure w

ith a list of legal elements

D

TD Building blocks : E

LEME

NT, ATTLIST, E

NTITY, P

CD

ATA, C

DA

TA;

E

LEME

NT

Sim

ple :

•C

omposition :

•O

ccurrence Multiplicity :

- Sequence of elements ordered List

(a, b, c) - Alternatives choices

(a | b | c) - M

ix

(a, (b | c), d)

? (zero or one) ), * (zero or m

ore), + (one or m

ore)

- empty (EM

PTY) - N

o constraints on type (ANY), - text (#PCDATA

)

Page 10: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

37

Exercise

P

ropose a DTD

that allows the definition of XM

L docum

ents representing a Catalog of Products

(products, description, price, etc).

G

ive an example of an X

ML D

ocuments that

conforms to the D

TD you propose

C

ould you provide some advantages, draw

backs of using D

TD.

38

The Catalog D

TD

<!DO

CTYP

E C

ATA

LOG

[ <!E

LEM

EN

T CA

TALO

G (P

RO

DU

CT+)>

<!ELE

ME

NT P

RO

DU

CT (S

PE

CIFIC

ATIO

NS

+, OP

TION

S?, P

RIC

E+, N

OTE

S?)>

<!ELE

ME

NT S

PE

CIFIC

ATIO

NS

(#PC

DA

TA)>

<!ELEM

EN

T OPTIO

NS

(#PCD

ATA

)> <!E

LEM

EN

T PR

ICE

(#PC

DA

TA)>

<!ELE

ME

NT N

OTE

S (#P

CD

ATA

)> <!A

TTLIST P

RO

DU

CT N

AM

E C

DA

TA #IM

PLIE

D

C

ATE

GO

RY (H

andTool | Table | Shop-P

rofessional) "HandTool"

P

AR

TNU

M C

DA

TA #IM

PLIE

D

PLAN

T  (Pittsburgh  |  M

ilwaukee  |  C

hicago)  "Chicago“

IN

VE

NTO

RY (InS

tock | Backordered | D

iscontinued) "InStock">

<!ATTLIS

T SP

EC

IFICA

TION

S W

EIG

HT C

DA

TA #IM

PLIE

D

PO

WE

R C

DA

TA #IM

PLIE

D>

<!ATTLIS

T OP

TION

S FIN

ISH

(Metal | P

olished | Matte) "M

atte"

AD

AP

TER

(Included | Optional | N

otApplicable) "Included"

C

AS

E (HardS

hell | Soft | N

otApplicable) "H

ardShell">

<!ATTLIS

T PR

ICE

MS

RP

CD

ATA

#IMP

LIED

W

HO

LES

ALE

CD

ATA

#IMP

LIED

S

TRE

ET C

DA

TA #IM

PLIE

D

SH

IPP

ING

CD

ATA

#IMP

LIED

> <!E

NTITY A

UTH

OR

"John Doe">

<!EN

TITY CO

MP

AN

Y "JD P

ower Tools, Inc.">

<!EN

TITY EM

AIL "jd@

jd-tools.com"> ]>

39

DTD

: Draw

backs

W

eak Data Typing

B

asically one data type: Text (String)

A

re not Object O

riented (No Inheritance )

Specific language: N

o XM

L-Based

H

ard to parse/interpret

Proposition to overcom

e these limitations:

W

3C's XML-Schem

a

40

XM

L Schem

a

XM

L Schem

a is an XM

L-based alternative to DTD

.

The XM

L Schem

a language is also referred to as XM

L Schem

a D

efinition (XSD

).

A

n XM

L Schem

a describes the structure of an XM

L document :

D

ocument's possible elem

ents

Element attributes and their type

XM

L Schem

as use XM

L Syntax

N

o new language as for the D

TD

XM

L Schem

as support many D

ata Types

M

ore advantages

Data Structures and rich data typing

extensibility thanks to inheritance

analyzable by standard X

ML parsers

Page 11: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

41

The root element of an X

ML S

chema

The <schem

a> element is the root elem

ent of every XM

L S

chema:

<?xm

l version="1.0"?> <xs:schem

a> //schem

a body... //...

</xs:schema>

The <schem

a> element m

ay contain some attributes. A

schema declaration often looks som

ething like this: <?xm

l version="1.0"?> <xs:schem

a xmlns:xs="http://w

ww

.w3.org/2001/XM

LSchema">

//...

//...

</xs:schema>

42

XS

D, H

ow to?

R

eference a Schem

a in an XM

L Docum

ent: <?xm

l version="1.0"?> <note xm

lns:xsi="http://ww

w.w

3.org/2001/XM

LSchema-

instance" xsi:schemaLocation="path_to_your_file.xsd">

<to>Tove</to> <from

>Jani</from>

<heading>Rem

inder</heading> <body>D

on't forget me this w

eekend!</body> </note>

Your docum

ent Root tag

43

XS

D, H

ow to?

D

efine a Simple Elem

ent

A

n XML elem

ent that can contain only text (typed data).

It cannot contain any other elements or attributes

Syntax: <xs:elem

ent name="xxx" type="yyy"/>

E.g.

An X

ML D

ocument (a chunk):

<lastnam

e>Refsnes</lastnam

e> <age>36</age> <dateborn>1970-03-27</dateborn>

The C

orresponding simple elem

ent definition in the XM

L Schem

a <xs:elem

ent name="lastnam

e" type="xs:string"/> <xs:elem

ent name="age" type="xs:integer"/>

<xs:element nam

e="dateborn" type="xs:date"/>

Also:

<xs:elem

ent name="color" type="xs:string" default="red"/> (if no value)

Or

<xs:elem

ent name="color" type="xs:string" fixed="red"/> (can't be m

odified)

44

XS

D, H

ow to?

D

efine XSD

Attributes

All attributes are declared as sim

ple types

If an element has attributes, it is considered as a com

plex type

Syntax: <xs:attribute nam

e="xxx" type="yyy"/> E

.g. <xs:attribute nam

e="language" type="xs:string"/>

Also:

<xs:attribute name="lang" type="xs:string" default="E

N"/> (if no value,

use default)

<xs:attribute name="lang" type="xs:string" fixed="E

N"/> (can't be

modified)

A

ttributes are optional by default. To specify that the attribute is required, use the "use" property:

<xs:attribute nam

e="lang" type="xs:string" use="required"/>

Page 12: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

45

Basic types (1) Type

Description

string represents a String

boolean A Boolean value: true or false.

decimal

represents a decimal

float represents a float.

double represents a double.

duration represents a duration

dateTime

represents a value: date/time.

time

(format : hh:m

m:ss.sss ).

date represents a date (form

at : CCYY-M

M-D

D).

gYearMonth

represents Gregorian year and m

onth (format : CCYY-M

M)

46

Basic types (2)

Type Description

gYear represents a year (form

at : CCYY).

gMonthD

ay represents m

onth's day (format : M

M-D

D)

gDay

represents a day (format : D

D).

gMonth

represents a month (form

at : MM

).

hexBinary represents a binary hexadecim

al content.

base64Binary represents a 64 base binary content

anyUR

I represents U

RI address (ex.: http://www.site.com).

NO

TATION

represents a qualified nam

e.

47

Basic types (3)

Type Description

token represents a string w

ithout line feeds, carriage returns, tabs, leading and trailing spaces, and m

ultiple spaces language

represents a string that contains a valid language id N

MTO

KEN

a string that represents the NMTO

KEN attribute in XM

L (only used with schem

a attributes) id

a string that represents the ID attribute in XM

L (only used with schem

a attributes) ID

REF, ID

REFS

represents attributes IDREF, ID

REFS type

ENTITY, EN

TITIES represents the types: ENTITY, ENTITIES

integer represents an integer value (signed, arbitrary length)

nonPositiveInteger an integer containing only non-positive values (.., -2, -1, 0)

negativeInteger An integer containing only negative values ( .., -2, -1.)

48

Basic types (4)

Type Description

long Long integer between {-9223372036854775808 - 223372036854775807}

int A signed 32-bit integer between {-2147483648 - 2147483647}

short Short Integer {-32768 - 32767}

byte Between {-128 - 127}

nonNegativeInteger

An integer containing only non-negative values (0, 1, 2, ..) unsignedLong

An unsigned 64-bit integer long

A signed 64-bit integer {0 - 18446744073709551615} unsignedInt

An unsigned 32-bit integer {0 - 4294967295}

unsignedShort An unsigned 16-bit integer {0 - 65535}

unsignedByte An unsigned 8-bit integer {0 - 255}

positiveInteger An integer containing only positive values (1, 2, ..)

Page 13: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

49

XS

D Com

plex Elements

A

complex elem

ent contains other elements and/or

attributes

four kinds of com

plex elements:

em

pty elements

elem

ents that contain only other elements

elem

ents that contain attributes and only text

elements that contain both other elem

ents and text

Note: E

ach of these elements m

ay contain attributes as w

ell!

50

XS

D, H

ow to?

Define a Com

plex Element in XM

L Schema

tw

o different ways :

E

.g.: somew

here in a XM

L document you have for instance:

<em

ployee> // a com

plex Element that contains only other elem

ents

<firstname>John</firstnam

e>

<lastname>Sm

ith</lastname>

</em

ployee> First possible XM

L Schema:

<xs:elem

ent name="em

ployee">

<xs:com

plexType>

<xs:sequence>

<xs:element nam

e="firstname" type="xs:string"/>

<xs:element nam

e="lastname" type="xs:string"/>

</xs:sequence>

</xs:complexType>

</xs:element>

51

XS

D, H

ow to?

Define a Com

plex Element in XM

L Schema

E

.g.: somew

here in a XM

L document you have for instance:

<em

ployee>

<firstname>John</firstnam

e>

<lastname>S

mith</lastnam

e>

</employee>

Second possible XML Schem

a: <xs:elem

ent name="em

ployee" type="personinfo"/> //…

//……

<xs:com

plexType name="personinfo">

<xs:sequence>

<xs:element nam

e="firstname" type="xs:string"/>

<xs:element nam

e="lastname" type="xs:string"/>

</xs:sequence> </xs:com

plexType>

If you use the second w

ay (above), several elements can refer to the sam

e complex type as

being their typed

E.g.

<xs:element nam

e="employee" type="personinfo"/>

<xs:element nam

e="student" type="personinfo"/> <xs:elem

ent name="m

ember" type="personinfo"/>

52

XS

D, H

ow to?

Define Com

plex Text-Only Elem

ents

contains only simple content (text and attributes),

we add a sim

pleContent elem

ent around the content.

E.g.

<shoesize country="france">35</shoesize>

The corresponding XML Schem

a: <xs:elem

ent name="shoesize">

<xs:complexType>

<xs:simpleContent> // to indicate that it does not contain other elem

ents <xs:extension base="xs:integer"> // to indicate the text type <xs:attribute nam

e="country" type="xs:string" /> // the attribute type </xs:extension> </xs:sim

pleContent> </xs:com

plexType> </xs:elem

ent>

Page 14: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

53

XS

D, H

ow to?

Define Com

plex Types with M

ixed Content

A m

ixed complex type elem

ent can contain attributes, elements, and text.

E

.g. <letter> D

ear Mr.

<nam

e>John Sm

ith</name>. Your order <orderid>1032</orderid> w

ill be shipped on <shipdate>2001-07-13</shipdate>

</letter> The corresponding XM

L Schema:

<xs:element nam

e="letter"> <xs:com

plexType mixed="true">

<xs:sequence> <xs:elem

ent name="nam

e" type="xs:string"/> <xs:elem

ent name="orderid" type="xs:positiveInteger"/>

<xs:element nam

e="shipdate" type="xs:date"/> </xs:sequence> </xs:com

plexType> </xs:elem

ent>

54

XS

D Indicators

Indicators control HO

W elem

ents have to be used in docum

ents

Order indicators:

All

specifies that the child elements can appear in any order

each child elem

ent must occur only once

E

.g.: <xs:elem

ent name="person">

<xs:complexType>

<xs:all> <xs:elem

ent name="firstnam

e" type="xs:string"/> <xs:elem

ent name="lastnam

e" type="xs:string"/> </xs:all> </xs:com

plexType> </xs:elem

ent>

55

XS

D Indicators

O

rder indicators:

Sequence

specifies that the child elements m

ust appear in a specific order

E.g.:

<xs:element nam

e="person"> <xs:com

plexType> <xs:sequence> <xs:elem

ent name="firstnam

e" type="xs:string"/> <xs:elem

ent name="lastnam

e" type="xs:string"/> </xs:sequence> </xs:com

plexType> </xs:elem

ent>

56

XS

D Indicators

O

rder indicators:

Choice

specifies that either one child element or another can occur

E

.g.: <xs:elem

ent name="person">

<xs:complexType>

<xs:choice> <xs:elem

ent name="em

ployee" type="employee"/>

<xs:element nam

e="mem

ber" type="mem

ber"/> </xs:choice> </xs:com

plexType> </xs:elem

ent>

Page 15: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

57

XS

D Indicators

O

ccurrence indicators:

Occurrence indicators are used to define how

often an element can

occur

maxO

ccurs: the maxim

um num

ber of times an elem

ent can occur:

minO

ccurs : the minim

um num

ber of times an elem

ent can occur

E.g.:

<xs:element nam

e="person"> <xs:com

plexType> <xs:sequence>

<xs:element nam

e="full_name" type="xs:string"/>

<xs:elem

ent name="child_nam

e" type="xs:string"

m

axOccurs="10" m

inOccurs="0"/>

</xs:sequence>

</xs:complexType>

</xs:element>

58

XML Schem

a extension Element

The extension elem

ent extends an existing simpleType

or complexType elem

ent.

Parent enclosing elements: sim

pleContent O

r complexContent

<xs:complexType nam

e="fullpersoninfo"> <xs:com

plexContent>

<xs:extension base="personinfo"> <xs:sequence> <xs:elem

ent name="address" type="xs:string"/>

<xs:element nam

e="city" type="xs:string"/> <xs:elem

ent name="country" type="xs:string"/>

</xs:sequence> </xs:extension> </xs:com

plexContent>

</xs:complexType>

</xs:schema>

<xs:complexType nam

e="personinfo"> <xs:sequence> <xs:elem

ent name="firstnam

e" type="xs:string"/> <xs:elem

ent name="lastnam

e" type="xs:string"/> </xs:sequence> </xs:com

plexType>

59

XML Schem

a Restrictions R

estrictions are used to define acceptable values for X

ML elem

ents or attribute .

Parent enclosing elements: sim

pleType E

.g.: <xs:elem

ent name="car">

<xs:simpleType>

<xs:restriction base="xs:string">

<xs:enumeration value="A

udi"/>

<xs:enum

eration value="Golf"/>

<xs:enumeration value="B

MW

"/>

</xs:restriction> </xs:sim

pleType> </xs:elem

ent>

60

XML Schem

a Restrictions C

onstraint D

escription enum

eration D

efines a list of acceptable values fractionD

igits Specifies the m

aximum

number of decim

al places allowed. M

ust be equal to or greater than zero length

Specifies the exact number of characters or list item

s allowed. M

ust be equal to or greater than zero m

axExclusive Specifies the upper bounds for num

eric values (the value must be less than this value)

maxInclusive

Specifies the upper bounds for numeric values (the value m

ust be less than or equal to this value) m

axLength Specifies the m

aximum

number of characters or list item

s allowed. M

ust be equal to or greater than zero m

inExclusive Specifies the low

er bounds for numeric values (the value m

ust be greater than this value) m

inInclusive Specifies the low

er bounds for numeric values (the value m

ust be greater than or equal to this value) m

inLength Specifies the m

inimum

number of characters or list item

s allowed. M

ust be equal to or greater than zero

pattern D

efines the exact sequence of characters that are acceptable totalD

igits Specifies the exact num

ber of digits allowed. M

ust be greater than zero w

hiteSpace Specifies how

white space (line feeds, tabs, spaces, and carriage returns) is handled

Restrictions for D

ata types

Page 16: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

61

XM

L Schem

a Patterns

C

onstraints on values of basic types

With use of R

egular Expressions

Exam

ple <xs:elem

ent name="letter">

<xs:simpleType>

<xs:restriction base="xs:string"> <xs:pattern value="([a-z])*"/> </xs:restriction> </xs:sim

pleType> </xs:elem

ent>

/* The acceptable value is zero or m

ore occurrences of low

ercase letters from a to z */

62

XM

L Schem

a : example (1)

<xsd:schema xm

lns:xsd="http://ww

w.w

3.org/1999/XM

LSchema">

<xsd:element nam

e="purchaseOrder" type="PurchaseO

rderType"/> <xsd:elem

ent name="com

ment" type="xsd:string"/>

<xsd:complexType nam

e="PurchaseOrderType">

<xsd:sequence>

<xsd:element nam

e="shipTo" type="USA

ddress"/>

<xsd:elem

ent name="billTo" type="U

SAddress"/>

<xsd:element ref="com

ment" m

inOccurs="0"/>

<xsd:element nam

e="items" type="Item

s"/>

</xsd:sequence>

<xsd:attribute name="orderD

ate" type="xsd:date"/> </xsd:com

plexType>

63

XM

L Schem

a : example (2)

<xsd:complexType nam

e="Items">

<xsd:sequence>

<xsd:elem

ent name="item

" minO

ccurs="0" maxO

ccurs="unbounded">

<xsd:complexType>

<xsd:sequence>

<xsd:element nam

e="productNam

e" type="xsd:string"/>

<xsd:element nam

e="quantity">

<xsd:sim

pleType>

<xsd:restriction base="xsd:positiveInteger">

<xsd:maxExclusive value="100"/>

</xsd:restriction>

</xsd:sim

pleType>

</xsd:element>

<xsd:elem

ent name="U

SPrice" type="xsd:decimal"/>

<xsd:elem

ent ref="comm

ent" minO

ccurs="0"/>

<xsd:element nam

e="shipDate" type="xsd:date" m

inOccurs="0"/>

</xsd:sequence>

<xsd:attribute nam

e="partNum

" type="SKU

" use="required"/>

</xsd:complexType>

</xsd:elem

ent>

</xsd:sequence> </xsd:com

plexType> </xsd:schem

a>

64

Som

e XM

L Tools Editor

Tool Support

Tibco (Extensibility)

XM

L Authority 2.0

DTD

S

chema

Altova

XM

L Spy (2007

available:1 month trial)

DTD

S

chema

Dasan

Tagfree 2000 DTD

E

ditor D

TD

Data Junction

XM

L Junction S

chema

Insight Soft.

XM

LMate 2.0

DTD

S

chema

Microstar Soft.

Near &

Far Designer

DTD

Page 17: chema The Agenda umentlaser.cs.umass.edu/courses/cs520-620.Spring15/lectures/xml_schema_4pages.pdf1 e X M arkup L anguage aou r erso-u XML 2 Agenda : The rd Goals ? e Well-ument ema

65

Exercise

G

ive the XM

L schema of w

hat could an address book

G

ive an example of an X

ML docum

ent that contains a list of contacts and that conform

s to the schem

a you proposed

Your feedbacks ?

66

XM

L Schem

a, summ

ary

Expressive and extensible

R

ich data typing

More and m

ore used

Model exchange: X

MI 2.0

W

eb Services: S

OA

P, WS

DL

A

bit too complex!

67

XM

L : Wrap-up (P

art I, II)

M

eta-language ! Infinite number of :

A

tree-like structure !

S

eparation :

Form

alization :

A

standard and bunch of tools available for XM

L developers

- tags and, - Their associated attributes

- Data

(.xml)

- Logical syntax (.dtd or .xsd)

- Well-form

ed documents (correct XM

L syntax) - valid (well form

ed and conforms to a DTD or XSD)