1 XML eXtensible Markup Language

1 XML eXtensible Markup Language. 2 Introduction and Motivation Dr. Praveen Madiraju Modified from Dr.Sagiv’s slides



XML

eXtensible Markup Language


Introduction and Motivation

Dr. Praveen Madiraju

Modified from Dr.Sagiv’s slides


XML vs. HTML

• HTML is a HyperText Markup Language
  – Designed for a specific application, namely, presenting and linking hypertext documents
• XML describes structure and content (“semantics”)
  – The presentation is defined separately from the structure and the content


An Address Book as an XML document

<addresses>
  <person>
    <name> Donald Duck</name>
    <tel> 414-222-1234 </tel>
    <email> [email protected] </email>
  </person>
  <person>
    <name> Miki Mouse</name>
    <tel> 123-456-7890 </tel>
    <email>[email protected]</email>
  </person>
</addresses>


Main Features of XML

• No fixed set of tags
  – New tags can be added for new applications
• An agreed upon set of tags can be used in many applications
  – Namespaces facilitate uniform and coherent descriptions of data
    • For example, a namespace for address books determines whether to use <tel> or <phone>
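The role of a namespace can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration: the namespace URI `http://example.org/addressbook`, the prefix `ab`, and the helper `telephone_numbers` are assumptions made for the example, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for an address-book vocabulary (an assumption).
AB = "http://example.org/addressbook"

DOC = f"""
<ab:addresses xmlns:ab="{AB}">
  <ab:person>
    <ab:name>Donald Duck</ab:name>
    <ab:tel>414-222-1234</ab:tel>
  </ab:person>
</ab:addresses>
"""

def telephone_numbers(xml_text):
    """Return the text of every <tel> element in the address-book namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as {namespace-uri}local-name.
    return [tel.text.strip() for tel in root.iter(f"{{{AB}}}tel")]

print(telephone_numbers(DOC))
```

An application that agrees on this namespace looks for `tel` and never has to guess whether a document says `phone` instead.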


Main Features of XML (cont’d)

• XML has the concept of a schema
  – DTD and the more expressive XML Schema
• XML is a data model
  – Similar to the semistructured data model

• XML supports internationalization (Unicode) and platform independence (an XML file is just a character file)


XML is the Standard for Data Exchange

• Web services (e.g., e-commerce) require exchanging data between various applications that run on different platforms

• XML (augmented with namespaces) is the preferred syntax for data exchange on the Web


XML is not Alone

• XML Schemas strengthen the data-modeling capabilities of XML (in comparison to XML with only DTDs)
• XPath is a language for accessing parts of XML documents
• XLink and XPointer support cross-references
• XSLT is a language for transforming XML documents into other XML documents (including XHTML, for displaying XML files)
  – Limited styling of XML can be done with CSS alone
• XQuery is a language for querying XML documents
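To give a feel for XPath-style access, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The helper name `select_text` is an assumption made for the example; the sample data is a small book record.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book year="1994">
  <title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher>
  <price>65.95</price>
</book>
"""

def select_text(xml_text, path):
    """Return the text of every element matched by a simple path expression."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]

# A child path relative to the root element.
print(select_text(BOOK, "./author/last"))
# A descendant search at any depth.
print(select_text(BOOK, ".//first"))
# Attributes are read from the element itself.
print(ET.fromstring(BOOK).get("year"))
```

Full XPath (and XQuery) engines offer far more than this subset, such as axes, functions and predicates over values.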


The Two Facets of XML

• Some XML files are just text documents with tags that denote their structure and include some metadata (e.g., an attribute that gives the name of the person who did the proofreading)
  – See an example on the next slide
  – XML is a subset of SGML (Standard Generalized Markup Language)

• Other XML documents are similar to database files (e.g., an address book)


XML can Describe the Structure of a Document

<book year="1994">
  <title>TCP/IP Illustrated</title>
  <author>
    <last>Stevens</last>
    <first>W.</first>
  </author>
  <publisher>Addison-Wesley</publisher>
  <price>65.95</price>
</book>


The Structure of XML

• XML consists of tags and text
• Tags come in pairs <date> ... </date>
• They must be properly nested
  – good <date> ... <day> ... </day> ... </date>
  – bad <date> ... <day> ... </date> ... </day>

(You can’t do <i> ... <b> ... </i> ... </b> in HTML)
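The nesting rule can be checked mechanically. The sketch below assumes Python's standard `xml.etree.ElementTree` parser; the helper name `parses_cleanly` and the sample contents ("2024", "1") are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

def parses_cleanly(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

GOOD = "<date> 2024 <day> 1 </day> </date>"   # <day> closes before <date>
BAD = "<date> 2024 <day> 1 </date> </day>"    # closing tags are crossed

print(parses_cleanly(GOOD))
print(parses_cleanly(BAD))
```

A conforming XML parser has to reject the second string rather than try to repair it.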


A Useful Abbreviation

Abbreviating elements with empty contents:
• <br/> for <br></br>
• <hr width="10"/> for <hr width="10"></hr>

For example:

<family>
  <person id="lisa">
    <name> Lisa Simpson </name>
    <mother idref="marge"/>
    <father idref="homer"/>
  </person>
  ...
</family>

Note that a tag may have a set of attributes, each consisting of a name and a value
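That the abbreviated and the long form denote the same element can be seen with a small sketch, again assuming Python's standard `xml.etree.ElementTree`. The helper name `same_element` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def same_element(first_text, second_text):
    """Compare tag, attributes, text and number of children of two elements."""
    a, b = ET.fromstring(first_text), ET.fromstring(second_text)
    return (a.tag, a.attrib, a.text, len(a)) == (b.tag, b.attrib, b.text, len(b))

print(same_element("<br/>", "<br></br>"))
print(same_element('<hr width="10"/>', '<hr width="10"></hr>'))

# Attributes are name/value pairs attached to the start tag.
mother = ET.fromstring('<mother idref="marge"/>')
print(mother.attrib)
```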


XML Text

XML has only one “basic” type – text

It is bounded by tags, e.g., <title> The Big Sleep </title> <year> 1935 </year> – 1935 is still text

• XML text is called PCDATA (for parsed character data)
• It uses the Unicode character set; any character can be written as a numeric character reference, e.g., &#x05DE; for the Hebrew letter Mem
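A short sketch of what "only text" means in practice, assuming Python's standard `xml.etree.ElementTree`. The wrapper element `film`, the element `letter`, and the helper `read_year` are invented for the example; U+05DE is the Unicode code point of the Hebrew letter Mem.

```python
import xml.etree.ElementTree as ET

FILM = (
    "<film><title> The Big Sleep </title>"
    "<year> 1935 </year><letter>&#x05DE;</letter></film>"
)

def read_year(xml_text):
    """Return the year as the parser delivers it, and as an integer."""
    raw = ET.fromstring(xml_text).findtext("year")
    # The parser hands back a string; turning it into a number is up to us.
    return raw, int(raw)

raw_year, year = read_year(FILM)
print(repr(raw_year), year)

# A numeric character reference is replaced by the character it names.
letter = ET.fromstring(FILM).findtext("letter")
print(letter == "\u05de")
```

Typing the content (for instance, declaring that `year` holds an integer) is the job of a schema language, not of XML itself.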


XML Structure

• Nesting tags can be used to express various structures, e.g., a tuple (record):

<person>
  <name> Lisa Simpson</name>
  <tel> 02-828-1234 </tel>
  <tel> 054-470-777 </tel>
  <email> [email protected] </email>
</person>


XML Structure (cont’d)

• We can represent a list by using the same tag repeatedly:

<addresses>
  <person> … </person>
  <person> … </person>
  <person> … </person>
  <person> … </person>
  …
</addresses>


XML Structure (cont’d)

<addresses>
  <person>
    <name> Donald Duck</name>
    <tel> 04-828-1345 </tel>
    <email> [email protected] </email>
  </person>
  <person>
    <name> Miki Mouse</name>
    <tel> 03-426-1142 </tel>
    <email>[email protected]</email>
  </person>
</addresses>
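The record and list structures map naturally onto dictionaries and lists. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `read_people` is an assumption, and the `email` elements are left out of the sample because their values are not legible in the source.

```python
import xml.etree.ElementTree as ET

ADDRESSES = """
<addresses>
  <person>
    <name> Donald Duck</name>
    <tel> 04-828-1345 </tel>
  </person>
  <person>
    <name> Miki Mouse</name>
    <tel> 03-426-1142 </tel>
  </person>
</addresses>
"""

def read_people(xml_text):
    """Turn each <person> record into a dict; repeated <tel> tags become a list."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("person"):
        people.append({
            "name": person.findtext("name").strip(),
            "tel": [tel.text.strip() for tel in person.findall("tel")],
        })
    return people

for entry in read_people(ADDRESSES):
    print(entry)
```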


Terminology

The segment of an XML document between an opening and a corresponding closing tag is called an element

<person>
  <name> Bart Simpson </name>
  <tel> 02 – 444 7777 </tel>
  <tel> 051 – 011 022 </tel>
  <email> [email protected] </email>
</person>

[Figure callouts on the example above: “element”, “element, a sub-element of”, “not an element”]


An XML Document is a Tree

person
  name: Bart Simpson
  tel: 02 – 444 7777
  tel: 051 – 011 022
  email: [email protected]

Leaves are either empty or contain PCDATA
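The tree view can be reproduced with a short recursive walk. This is a sketch under the assumption that Python's standard `xml.etree.ElementTree` is used; the helper name `tree_lines` and the shortened sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

def tree_lines(element, depth=0):
    """Return one indented line per element; leaves also show their PCDATA."""
    text = (element.text or "").strip()
    label = element.tag + (f": {text}" if text else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines

PERSON = "<person><name> Bart Simpson </name><tel> 02 444 7777 </tel><fax/></person>"

print("\n".join(tree_lines(ET.fromstring(PERSON))))
```

The empty `fax` element (an addition for illustration) shows a leaf without PCDATA.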


Mixed Content

An element may contain a mixture of sub-elements and PCDATA

<airline>
  <name> British Airways </name>
  <motto> World’s <dubious> favorite</dubious> airline </motto>
</airline>
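How a parser exposes mixed content depends on the API. As one illustration, Python's standard `xml.etree.ElementTree` stores the text before the first child in `text` and the text after each child in that child's `tail`. The helper name `mixed_content` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

MOTTO = "<motto>World's <dubious>favorite</dubious> airline</motto>"

def mixed_content(element):
    """List the pieces of an element's content in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:
            # Text that follows a child is kept on the child, not the parent.
            parts.append(("text", child.tail))
    return parts

motto = ET.fromstring(MOTTO)
print(mixed_content(motto))
print("".join(motto.itertext()))
```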


The Header Tag

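• <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  – The attributes must appear in this order: version, then encoding, then standalone; the value of standalone is either "yes" or "no"
  – standalone="no" means that the document may depend on an external DTD
  – You can leave out the encoding attribute and the processor will use the UTF-8 default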


Processing Instructions

<?xml version="1.0"?>
<?xml-stylesheet href="doc.xsl" type="text/xsl"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>Hello, world!<!-- Comment 1 --></doc>
<?pi-without-data?>
<!-- Comment 2 -->
<!-- Comment 3 -->
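Processing instructions and comments may appear outside the root element, and a DOM parser reports them as separate nodes. The sketch below assumes Python's standard `xml.dom.minidom`; the helper name `top_level_nodes` is invented, and the DOCTYPE line is deliberately left out of the sample to keep it independent of any DTD file.

```python
from xml.dom import minidom

DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="doc.xsl" type="text/xsl"?>'
    '<doc>Hello, world!<!-- Comment 1 --></doc>'
    '<?pi-without-data?>'
    '<!-- Comment 2 -->'
)

def top_level_nodes(xml_text):
    """Describe the children of the document node in document order."""
    document = minidom.parseString(xml_text)
    described = []
    for node in document.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            described.append(("pi", node.target, node.data))
        elif node.nodeType == node.COMMENT_NODE:
            described.append(("comment", node.data.strip()))
        elif node.nodeType == node.ELEMENT_NODE:
            described.append(("element", node.tagName))
    return described

for entry in top_level_nodes(DOC):
    print(entry)
```

Note that the XML declaration on the first line looks like a processing instruction but is not reported as one.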


Using CDATA

<HEAD1>
Entering a Kennel Club Member
</HEAD1>
<DESCRIPTION>
Enter the member by the name on his or her papers. Use the NAME tag. The NAME tag has two attributes. Common (all in lowercase, please!) is the dog's call name. Breed (also in all lowercase) is the dog's breed. Please see the breed reference guide for acceptable breeds. Your entry should look something like this:
</DESCRIPTION>
<EXAMPLE>
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]>
</EXAMPLE>

We want to see the text as is, even though it includes tags
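The effect of a CDATA section can be observed with a parser: the markup inside it arrives as plain text, not as a child element. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `example_text` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<EXAMPLE><![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]></EXAMPLE>"""

def example_text(xml_text):
    """Return the element's text and its number of child elements."""
    element = ET.fromstring(xml_text)
    return element.text, len(element)

text, child_count = example_text(EXAMPLE)
print(text)
print(child_count)

# Escaping each "<" and ">" by hand yields the same text as a CDATA section.
with_cdata = ET.fromstring("<E><![CDATA[<b>x</b>]]></E>").text
escaped = ET.fromstring("<E>&lt;b&gt;x&lt;/b&gt;</E>").text
print(with_cdata == escaped)
```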


Well-Formed XML Documents

• An XML document (with or without a DTD) is well-formed if
  – Tags are syntactically correct
  – Every tag has an end tag
  – Tags are properly nested
  – There is a root tag
  – A start tag does not have two occurrences of the same attribute

An XML document must be well formed
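Several of these rules can be exercised with any conforming parser, since it must reject text that is not well-formed. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name `well_formedness_error` and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(xml_text):
    """Return None if the text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None

SAMPLES = {
    "well-formed": "<a><b/></a>",
    "missing end tag": "<a><b></a>",
    "two root tags": "<a/><b/>",
    "no root tag": "",
    "duplicate attribute": '<a x="1" x="2"/>',
}

for label, sample in SAMPLES.items():
    print(label, "->", well_formedness_error(sample))
```

Well-formedness is only a syntactic check; whether the document also obeys a DTD or schema (validity) is a separate question.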


Representing relational databases

A relational database for school:

student:
id    name   gpa
001   Joe    3.0
002   Mary   4.0
…     …      …

course:
cno   title  credit
331   DB     3.0
350   Web    3.0
…     …      …

enroll:
id    cno
001   331
001   350
002   331
…     …


XML representation

<school>
  <student id="001">
    <name> Joe </name> <gpa> 3.0 </gpa>
  </student>
  <student id="002">
    <name> Mary </name> <gpa> 4.0 </gpa>
  </student>
  <course cno="331">
    <title> DB </title> <credit> 3.0 </credit>
  </course>
  <course cno="350">
    <title> Web </title> <credit> 3.0 </credit>
  </course>


XML representation (cont’d)

  <enroll>
    <id> 001 </id> <cno> 331 </cno>
  </enroll>
  <enroll>
    <id> 001 </id> <cno> 350 </cno>
  </enroll>
  <enroll>
    <id> 002 </id> <cno> 331 </cno>
  </enroll>
</school>
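Once the three relations live in one XML document, a relational join becomes a lookup across elements. The following is a minimal sketch with Python's standard `xml.etree.ElementTree`; the helper name `enrollments` is an assumption, and the document is the school example written compactly.

```python
import xml.etree.ElementTree as ET

SCHOOL = """
<school>
  <student id="001"><name> Joe </name><gpa> 3.0 </gpa></student>
  <student id="002"><name> Mary </name><gpa> 4.0 </gpa></student>
  <course cno="331"><title> DB </title><credit> 3.0 </credit></course>
  <course cno="350"><title> Web </title><credit> 3.0 </credit></course>
  <enroll><id> 001 </id><cno> 331 </cno></enroll>
  <enroll><id> 001 </id><cno> 350 </cno></enroll>
  <enroll><id> 002 </id><cno> 331 </cno></enroll>
</school>
"""

def enrollments(xml_text):
    """Join enroll with student and course; return (name, title) pairs."""
    root = ET.fromstring(xml_text)
    # Index students and courses by their key attribute.
    names = {s.get("id"): s.findtext("name").strip() for s in root.findall("student")}
    titles = {c.get("cno"): c.findtext("title").strip() for c in root.findall("course")}
    return [
        (names[e.findtext("id").strip()], titles[e.findtext("cno").strip()])
        for e in root.findall("enroll")
    ]

for name, title in enrollments(SCHOOL):
    print(name, "takes", title)
```

Nothing in the XML itself says that `id` inside `enroll` refers to a `student`; that knowledge lives in the program here, and a schema with key constraints would be needed to state it in the data.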