
1

Information Management

Lecture 9 - XML:

eXtensible Markup Language

J. Michael Moshell

University of Central Florida

Original image* by Moshell et al.

Imagery is from Wikimedia except where marked with *. Licensing is listed.

-2 -

Purposes of XML:

• Make data more easily used

• Make data last longer (across generations of technology)

Strategy of XML:

• Provide a basis for creating 'dialects' for special purposes

  - Thus, XML is a meta-language

• Provide tools you can use, rather than re-inventing them

Structure of XML:

• Inject <tags> into text files

-3 -

But first, a word about the Competitive Analysis Talks:

• I have not yet received email with a presentation from:

  * Pixelators

  * Hive Mind

Please get these to me TODAY so that I can have grades back for you on Thursday.

-4 -

XML Syntax:

Declaration:

<?xml version="1.0" encoding="UTF-8"?>

Nested elements:

<student>
  <person>
    <last-name>Wilson</last-name>
    <first-name>Henry</first-name>
    <address>122 Smith Road</address>
  </person>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
    <course semester="Fall 06">DIG 4526</course>
  </transcript>
</student>

-5 -

XML Syntax: the parts of an element

• Content: the text between a start tag and its matching end tag, e.g. Digital Media in <major>Digital Media</major>.

• Attribute: a name="value" pair inside a start tag, e.g. semester="Fall 06" in <course semester="Fall 06">. Here semester is the attribute's name and "Fall 06" is its value.
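To see the same terms from a program's point of view, here is a minimal sketch using Python's standard xml.etree.ElementTree module (our choice of tool; the lecture does not prescribe one). It parses a trimmed copy of the <student> example, then reads one element's content (.text) and one attribute by its name (.get).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the lecture's <student> example.
STUDENT = """<student>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
  </transcript>
</student>"""

student = ET.fromstring(STUDENT)
major = student.find("major")
course = student.find("transcript/course")

print(major.text)              # content of <major>: Digital Media
print(course.text)             # content of <course>: DIG 4921c
print(course.get("semester"))  # value of the attribute named semester: Fall 06
print(course.attrib)           # all attributes as a name -> value dict
```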

-9 -

Real World Example:

E-commerce (Euro processing) in a PHP application

function sendResponse($status, $statusmessage, $neworderid, $batchid)
{
    // Escape each value so that characters such as < and & cannot break the XML.
    $esc = function ($s) { return htmlspecialchars($s, ENT_XML1, 'UTF-8'); };
    echo '<?xml version="1.0" encoding="utf-8"?>';
    echo "<responsemessage>";
    echo "<status>" . $esc($status) . "</status>";
    echo "<statusmessage>" . $esc($statusmessage) . "</statusmessage>";
    echo "<neworderid>" . $esc($neworderid) . "</neworderid>";
    echo "<batchid>" . $esc($batchid) . "</batchid>";
    echo "</responsemessage>";
}
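For comparison, a sketch of the same response built with an XML library instead of string concatenation. It is written in Python (our choice; the original application is PHP), and the name send_response and the sample values are illustrative assumptions. The library escapes <, > and & in text content automatically, so the output stays well-formed whatever the status message contains.

```python
import xml.etree.ElementTree as ET

def send_response(status, statusmessage, neworderid, batchid):
    """Return the <responsemessage> document as a string; ElementTree escapes text itself."""
    root = ET.Element("responsemessage")
    for tag, value in (("status", status), ("statusmessage", statusmessage),
                       ("neworderid", neworderid), ("batchid", batchid)):
        ET.SubElement(root, tag).text = str(value)
    # xml_declaration=True (Python 3.8+) prepends an <?xml ... encoding='utf-8'?> line.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8")

print(send_response("OK", "Paid & shipped", 42, "B7"))
```

In this call the status message is written out as Paid &amp; shipped, which a parser reads back as the original text.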

-10 -

Consider this structure

Nested elements:

<student>
  <person>
    <last-name>Wilson</last-name>
    <first-name>Henry</first-name>
    <address>122 Smith Road</address>
  </person>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
    <course semester="Fall 06">DIG 4526</course>
  </transcript>
</student>

It makes sense: the student's information is GROUPED by the <student> tag.
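Because everything about Henry Wilson sits inside one <student> element, a program can treat that element as a single record. A minimal sketch in Python with xml.etree.ElementTree; the helper name summarize and its dictionary keys are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

STUDENT = """<student>
  <person>
    <last-name>Wilson</last-name>
    <first-name>Henry</first-name>
    <address>122 Smith Road</address>
  </person>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
    <course semester="Fall 06">DIG 4526</course>
  </transcript>
</student>"""

def summarize(student):
    """Collect one student's data; every lookup starts from the <student> element."""
    return {
        "name": student.findtext("person/first-name") + " " + student.findtext("person/last-name"),
        "major": student.findtext("major"),
        "courses": [course.text for course in student.iter("course")],
    }

print(summarize(ET.fromstring(STUDENT)))
```

In the <class> layout on the next slide, <major> and <transcript> sit outside <student>, so the same lookups starting from <student> would no longer find them.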

-11 -

Consider this structure

Nested elements:

<class>
  <student>
    <last-name>Wilson</last-name>
    <first-name>Henry</first-name>
    <address>122 Smith Road</address>
  </student>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
    <course semester="Fall 06">DIG 4526</course>
  </transcript>
</class>

It is dumb. The <class> element does not wrap a class's worth of information, and the student's major and transcript are no longer inside <student>.

-12 -

This raises a question:

Looking at the nested <student> example, how does one represent the 'grammar' of an element? E.g., "a transcript will consist of zero or more courses."

-13 -

Two kinds of "grammaticality"

Analogy (loose):

Well-formed: conforms to basic syntax rules, but may be meaningless.

Like this: Colorless green ideas sleep furiously.

Valid: matches a specified set of rules for UNDERSTANDING it.

Like this: Most tree leaves contain chlorophyll, which captures solar energy and stores it in chemical form.

1. Well-formedness (standard XML)

2. Validity (based on a schema)

-14 -

Two kinds of "grammaticality"

Well-formed:

• Exactly one ROOT ELEMENT per document, e.g. <student> ... </student>

• All non-empty elements are delimited with start and end tags.

• Empty elements are delimited properly, in either of two forms:

  - intentionally empty placemarkers: <thisway />

  - temporarily empty placemarkers: <likethis></likethis>

  (A parser treats both forms identically; the intentional/temporary distinction is only a convention.)

• All attribute values are quoted.

• Tags do not overlap: elements must be properly nested.

• The document complies with its declared character encoding.
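These rules are what a non-validating parser enforces. A minimal sketch in Python: the helper is_well_formed is a hypothetical name, and it relies on the standard library's xml.etree.ElementTree, which raises ParseError on a well-formedness violation. It says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "one root, nested properly": "<student><major>Digital Media</major></student>",
    "both empty-element forms": "<p><thisway /><likethis></likethis></p>",
    "two root elements": "<student/><student/>",
    "overlapping tags": "<a><b></a></b>",
    "unquoted attribute value": "<course semester=Fall>DIG 4921c</course>",
    "missing end tag": "<student><major>Digital Media</student>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```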


-15 -

This raises a question:

Nested elements:

<student>
  <person>
    <last-name>Wilson</last-name>
    <first-name>Henry</first-name>
    <address>122 Smith Road</address>
  </person>
  <major>Digital Media</major>
  <transcript>
    <course semester="Fall 06">DIG 4921c</course>
    <course semester="Fall 06">DIG 4526</course>
    <gradepoint>3.62</gradepoint>
  </transcript>
</student>

How does one represent the 'grammar' of an element? E.g., "a transcript will consist of zero or more courses."

This will be done via a SCHEMA.

-16 -

The oldest schema type: DTD (Document Type Definition)

DTDs contain these types of declarations:

• element type declarations

• attribute list declarations

• entity declarations

• notation declarations

Let's explore each in turn.

-17 -

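DTD: Element Type Declarations

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The parenthesized part of each <!ELEMENT> declaration, e.g. (to, from, heading, body), is that element's content model: here <note> must contain exactly one <to>, <from>, <heading> and <body>, in that order.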
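A DTD only helps if the parser actually validates. Python's standard xml.etree.ElementTree parser (built on expat) reads the internal DTD above but does not validate against it, so a note that is missing its <heading> still parses, because it is well-formed. A minimal sketch; parse_note is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NOTE_DTD = """<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>"""

def parse_note(note_xml):
    """Parse a <note> preceded by the lecture's internal DTD (read, but not validated)."""
    doc = '<?xml version="1.0" encoding="ISO-8859-1"?>\n' + NOTE_DTD + "\n" + note_xml
    return ET.fromstring(doc.encode("iso-8859-1"))

complete = parse_note("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>"
                      "<body>Don't forget me this weekend!</body></note>")
# No <heading>: invalid according to the DTD, but still well-formed, so it parses.
missing_heading = parse_note("<note><to>Tove</to><from>Jani</from>"
                             "<body>Don't forget me this weekend!</body></note>")
print([child.tag for child in complete])         # ['to', 'from', 'heading', 'body']
print([child.tag for child in missing_heading])  # ['to', 'from', 'body']
```

Catching the missing <heading> requires a validating parser, which the standard library does not provide.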

-18 -

DTD: Element Type Declarations

Aside: Cardinality in content models. From our previous example:

<transcript>
  <course semester="Fall 06">DIG 4921c</course>
  <course semester="Fall 06">DIG 4526</course>
  <gradepoint>3.62</gradepoint>
</transcript>

<!ELEMENT transcript (course*, gradepoint?)>

• * : zero or more

• ? : zero or one

• + : one or more

• (no symbol) : exactly one
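One way to see what these symbols mean: a content model works like a regular expression over the sequence of child element names. The sketch below is a deliberately simplified, hypothetical checker in Python (the function names are ours). It handles only comma-separated names with ?, * or + and rejects choices (|), nested groups and #PCDATA, so it is a teaching aid, not a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate a simple DTD sequence such as "(course*, gradepoint?)" into a regex.

    Simplifying assumption: only comma-separated element names, each with an
    optional ?, * or + suffix. Choices (|), nested groups and #PCDATA are rejected.
    """
    pieces = []
    for item in model.strip().strip("()").split(","):
        match = re.fullmatch(r"\s*([\w.-]+)([?*+]?)\s*", item)
        if match is None:
            raise ValueError(f"unsupported content model item: {item!r}")
        name, cardinality = match.groups()
        # Each child name is followed by one space, so "course" never matches "courses".
        pieces.append(f"(?:{re.escape(name)} ){cardinality}")
    return re.compile("".join(pieces))

def matches_model(element, model):
    """True if the sequence of the element's child tags fits the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None

transcript = ET.fromstring(
    '<transcript><course semester="Fall 06">DIG 4921c</course>'
    '<course semester="Fall 06">DIG 4526</course><gradepoint>3.62</gradepoint></transcript>'
)
print(matches_model(transcript, "(course*, gradepoint?)"))  # True
print(matches_model(transcript, "(gradepoint?, course*)"))  # False: wrong order
```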

-20 -

DTD: Element Type Declarations

Note: DTDs are NOT written in XML. Declarations such as <!ELEMENT to (#PCDATA)> look like tags, but they are NOT paired with end tags.

-21 -

DTD: Element Type Declarations

PCDATA = "parsed character data": text that the parser scans for markup and entity references. In the note example, <!ELEMENT to (#PCDATA)> says that <to> contains only such text and no child elements.

-22 -

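DTD: Attribute List Declarations

<!ATTLIST element-name attribute-name attribute-type default-value>

Example:

DTD example:

<!ATTLIST payment type CDATA "check">

XML example: <payment type="check" />

(Plain-text attribute values use the type CDATA; #PCDATA appears only in element declarations. Here "check" is the default value: if a <payment> element omits the type attribute, a parser that processes the DTD supplies type="check".)

from http://www.w3schools.com/dtd/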


-26 -

DTD: Attribute List Declarations

Because <payment> has no content, the same XML can also be written with separate start and end tags:

<payment type="check"></payment>

from http://www.w3schools.com/dtd/
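A quick way to confirm that the two spellings are interchangeable: parse both with Python's xml.etree.ElementTree and serialize them again. A minimal sketch.

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring('<payment type="check" />')
start_end = ET.fromstring('<payment type="check"></payment>')

# Same tag, same attributes, and no text in either case.
print(self_closing.tag, self_closing.attrib, self_closing.text)  # payment {'type': 'check'} None
print(start_end.tag, start_end.attrib, start_end.text)           # payment {'type': 'check'} None

# Serializing both gives identical output.
print(ET.tostring(self_closing) == ET.tostring(start_end))       # True
```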

-27 -

So, why Attributes?

What's the difference between an Attribute and the contents of an Element?

<!DOCTYPE DeathStory [
  <!ELEMENT DeathStory (Murderer, Victim)>
  <!ELEMENT Murderer (#PCDATA)>
  <!ATTLIST Murderer Trustworthiness CDATA "">
  <!ELEMENT Victim (#PCDATA)>
]>
<DeathStory>
  <Murderer Trustworthiness="not very">Dirk Dugan</Murderer>
  <Victim>Tess Truhart</Victim>
</DeathStory>

-28 -

So, why Attributes?

What's the difference between an Attribute and the contents of an Element?

1. Elements are hierarchical (they can contain other elements), but attributes are just strings or lists of strings.

2. Attributes are unordered and unrepeatable: the order in which attributes appear in a start tag carries no meaning, and an element may not carry the same attribute name twice.
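Both differences can be observed with Python's xml.etree.ElementTree. A minimal sketch reusing names from the DeathStory and transcript examples; the extra id attribute is a hypothetical addition, there only so that two attributes can be reordered.

```python
import xml.etree.ElementTree as ET

# 1. Attribute order carries no meaning (id is a hypothetical extra attribute).
first = ET.fromstring('<Murderer Trustworthiness="not very" id="m1">Dirk Dugan</Murderer>')
second = ET.fromstring('<Murderer id="m1" Trustworthiness="not very">Dirk Dugan</Murderer>')
print(first.attrib == second.attrib)  # True

# 2. An attribute name may appear only once per start tag; repeating it is a parse error.
try:
    ET.fromstring('<Murderer Trustworthiness="No" Trustworthiness="Yes">Dirk Dugan</Murderer>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# By contrast, child elements may repeat, and their order is kept.
transcript = ET.fromstring('<transcript><course semester="Fall 06">DIG 4921c</course>'
                           '<course semester="Fall 06">DIG 4526</course></transcript>')
print([course.text for course in transcript.findall("course")])  # ['DIG 4921c', 'DIG 4526']
```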

-30 -

Enumerated Attributes

Here is how to create a short list of allowed options.

<!DOCTYPE DeathStory [
  <!ELEMENT DeathStory (Murderer, Victim)>
  <!ELEMENT Murderer (#PCDATA)>
  <!ATTLIST Murderer Trustworthiness (NotVery | No | Yes) #REQUIRED>
  <!ELEMENT Victim (#PCDATA)>
]>
<DeathStory>
  <Murderer Trustworthiness="NotVery">Dirk Dugan</Murderer>
  <Victim>Tess Truhart</Victim>
</DeathStory>

• The options must be tokens (no spaces), which is why "not very" became NotVery.

• In the document, the chosen value must still be in quotes.

• #REQUIRED means every <Murderer> element must have a Trustworthiness attribute.
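Since Python's standard parser does not validate, here is a hypothetical hand-written check for this one rule (the function name bad_enumerated and its arguments are illustrative, not a library API). It reports every element whose attribute is missing, which #REQUIRED forbids, or whose value is outside the enumerated list.

```python
import xml.etree.ElementTree as ET

def bad_enumerated(root, tag, attr, allowed):
    """Return (text, value) for each <tag> whose attr is missing or not in allowed."""
    problems = []
    for elem in root.iter(tag):
        value = elem.get(attr)  # None when the attribute is absent
        if value not in allowed:
            problems.append((elem.text, value))
    return problems

ALLOWED = {"NotVery", "No", "Yes"}
good = ET.fromstring('<DeathStory><Murderer Trustworthiness="NotVery">Dirk Dugan</Murderer>'
                     '<Victim>Tess Truhart</Victim></DeathStory>')
bad = ET.fromstring('<DeathStory><Murderer Trustworthiness="not very">Dirk Dugan</Murderer>'
                    '<Victim>Tess Truhart</Victim></DeathStory>')
print(bad_enumerated(good, "Murderer", "Trustworthiness", ALLOWED))  # []
print(bad_enumerated(bad, "Murderer", "Trustworthiness", ALLOWED))   # [('Dirk Dugan', 'not very')]
```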

-31 -

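The opposite of parsed character data (PCDATA) is CDATA: character data that the parser does NOT scan for markup.

Parsed data must not contain raw characters like < and & (they would have to be written as &lt; and &amp;). But some data, like JavaScript, contains many of these characters, so XML lets you wrap it in a CDATA section, as in this example:

<script><![CDATA[
function between($a, $b, $c) {
  if ($a < $b && $b < $c) return 1;
  else return 0;
}
]]></script>

Everything between <![CDATA[ and ]]> is passed through as plain text; only ]]> ends the section.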
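A minimal Python sketch (xml.etree.ElementTree again) of the difference: the same comparison parses inside a CDATA section, fails as ordinary parsed content, and is escaped with &lt; and &amp; when the library writes it back out.

```python
import xml.etree.ElementTree as ET

CODE = "if ($a<$b && $b<$c) return 1;"

# Inside a CDATA section, < and & are just text.
script = ET.fromstring("<script><![CDATA[" + CODE + "]]></script>")
print(script.text == CODE)  # True

# As ordinary parsed content, the same text is not well-formed.
try:
    ET.fromstring("<script>" + CODE + "</script>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False
print(plain_ok)  # False

# When writing, ElementTree escapes the special characters instead of emitting CDATA.
print(ET.tostring(script, encoding="unicode"))
```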

-32 -

Other types of Schemas for XML

The DTD is like the "Latin" of XML Schemas: it's the oldest, and the "background" for other schemas.

A popular 'modern' schema system is called (confusingly) XML Schema

(oh no…)

or (better) XSD.

Unlike a DTD, an XSD is written in XML. That's why I didn't want to confuse you with it…

-33 -

Creating some XML

Let's define our own DTD and example XML.

• Objective: Represent a garage sale.

DTD:

<!DOCTYPE GarageSale [
  <!ELEMENT GarageSale (Date, Place, Item+)>
  <!ELEMENT Date (#PCDATA)>
  <!ELEMENT Place (#PCDATA)>
  <!ELEMENT Item (Name, Price)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
  <!ATTLIST Price Negotiable (Yes | No) #REQUIRED>
]>

XML:

<GarageSale>
  <Date>10 Feb 06</Date>
  <Place>Here</Place>
  <Item>
    <Name>hammer</Name>
    <Price Negotiable="Yes">10</Price>
  </Item>
</GarageSale>
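Once the garage sale is in XML, a program can read it back without any custom parsing. A minimal Python sketch with xml.etree.ElementTree; the list-of-tuples output is an illustrative choice, and the price stays a string because the DTD declares Price as #PCDATA (DTDs have no numeric types).

```python
import xml.etree.ElementTree as ET

GARAGE_SALE = """<!DOCTYPE GarageSale [
  <!ELEMENT GarageSale (Date, Place, Item+)>
  <!ELEMENT Date (#PCDATA)>
  <!ELEMENT Place (#PCDATA)>
  <!ELEMENT Item (Name, Price)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
  <!ATTLIST Price Negotiable (Yes | No) #REQUIRED>
]>
<GarageSale>
  <Date>10 Feb 06</Date>
  <Place>Here</Place>
  <Item>
    <Name>hammer</Name>
    <Price Negotiable="Yes">10</Price>
  </Item>
</GarageSale>"""

sale = ET.fromstring(GARAGE_SALE)
print(sale.findtext("Date"), "at", sale.findtext("Place"))  # 10 Feb 06 at Here

# One (name, price, negotiable) tuple per <Item>.
items = [(item.findtext("Name"), item.find("Price").text, item.find("Price").get("Negotiable"))
         for item in sale.findall("Item")]
print(items)  # [('hammer', '10', 'Yes')]
```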

-34 -

Creating some XML for practice

Step 1: Extend the Garage Sale

• Objective: Extend the garage sale in these ways.

  - Work with a friend.
  - Write the XML, then the DTD (it's easier this way).
  - Write it on paper. I will come see!

1. Add a contact phone number. (Just use PCDATA.)
2. Add a sales item, e.g. 'nail', whose price is not negotiable.
3. Add a text element to Item, which is 'Description'.
4. Add an attribute to 'Description' that specifies whether the description is in French or English.

-35 -

Creating some XML for practice

Step 2: Create your own Document

• First: Write a verbal description of a simple thing to model. Be creative! But not too complicated. (If you can't think of a topic, how about a photo album, with annotations (meta-data) associated with each image? Or how about your term project?)

Make a simple 'prototype' to show how it would look, like this:

Place Taken: Nice, France
Date Taken: 15 Jan 07
Time of Day: 2 PM
Camera: Fuji Digital
Location: Baie des Anges
People: Carole Mann
Release form signed? (yes) (no)

-36 -

Creating some XML

Step 2: Create your own Document

• Second: Develop your example XML, then a DTD. Work in pairs: two heads are thicker than one!

I will put up the Garage Sale example so you have a model to look at.

-37 -

A template for your use in the in-class design exercise: the Garage Sale DTD and XML from slide 33.

-38 -

What else about XML?

a) Yes, it will be on the midterm!

b) No, we have not yet discussed XML Namespaces, so that topic will NOT be on the midterm, but it will be introduced later.