XML- Extensible Markup Language

HTML to XML

• HTML documents

• Emerging Web Standards - XML

• XML is good for enterprise-wide data interchange across platforms

• Conversion from HTML to XML - IBM, Microsoft

XML - Motivation

• In HTML, both the tag set and the tag semantics are fixed. Tags have a limited and strict interpretation.

• HTML has been widely successful in disseminating documents across the internet.

• Though data can be disseminated through HTML, extracting it is painful and laborious.

• EDI has been a predominant mode of exchanging data among businesses, but it has a very rigid format that requires highly customized applications.

XML - Introduction

• XML aims to combine the ease of authoring HTML documents with the ease of data exchange that is possible with EDI.

• Tags are used to mark up documents.

• XML is a meta-language for describing markup languages.

• XML provides a facility to define tags and the structural relationships between them.

• No predefined tag set implies no preconceived semantics; the semantics of an XML document are defined by the applications that process it.

XML - Goals

• Straightforward to use over the internet

• Support a wide variety of applications: authoring, browsing, content analysis, etc.

• Easy to write programs that process XML documents and validate them.

• XML documents must be human-legible and reasonably clear.

• The design of XML shall be formal and concise - expressed in EBNF (Extended Backus-Naur Form) - amenable to modern compiler tools and techniques.

XML - Features

• Some structure - not rigid

• Extensibility - user-defined tags

• Nested elements

• Validation - documents may specify their own grammar

• DTD (Document Type Definition) - schema exists with data as tag names

• Application - EDI - extraction, conversion, transformation, integration

• Can be modeled using DOM

More terminology

• RDF - Resource Description Framework - a method to describe metadata for XML documents

• XSL - Extensible Stylesheet Language - language for transforming and formatting XML.

• Transformation language - XSLT; related addressing and linking standards - XPath, XPointer, XLink

Example - HTML

• Print - Sanjay Madria
  Web Warehouse Tutorial, ADBIS’99

HTML

<H2> Sanjay Madria </H2>

<I> Web Warehouse Tutorial, ADBIS’99</I>

Hard to process automatically: the structure is hidden and the markup describes only appearance.

XML

<Ref>
  <Speaker>
    <Firstname> Sanjay</Firstname>
    <Lastname> Madria</Lastname>
  </Speaker>
  <Title> Web Warehouse Tutorial</Title>
  <Conference> ADBIS’99</Conference>
</Ref>

Another format:

<Firstname Value="Sanjay"/>
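Because the XML version names each piece of data, a program can look fields up by tag instead of guessing from appearance. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `describe_reference` is made up for illustration, and the document string is a cleaned-up copy of the slide's example.

```python
import xml.etree.ElementTree as ET

# The reference from the slide, with matching start- and end-tags.
REF_XML = """
<Ref>
  <Speaker>
    <Firstname>Sanjay</Firstname>
    <Lastname>Madria</Lastname>
  </Speaker>
  <Title>Web Warehouse Tutorial</Title>
  <Conference>ADBIS'99</Conference>
</Ref>
"""

def describe_reference(xml_text):
    """Return (speaker, title, conference), found by element name."""
    ref = ET.fromstring(xml_text)
    first = ref.findtext("Speaker/Firstname").strip()
    last = ref.findtext("Speaker/Lastname").strip()
    title = ref.findtext("Title").strip()
    conference = ref.findtext("Conference").strip()
    return f"{first} {last}", title, conference

speaker, title, conference = describe_reference(REF_XML)
print(speaker, "|", title, "|", conference)
```

The same lookup against the HTML version would have to rely on the fact that the speaker happens to be in an H2 and the title in italics.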

• XML can separate data from HTML

• XML is used to exchange data

• XML can be used to share data

• XML can be used to store data

• XML can be used to create new languages (e.g., WML)

XML

• <Person> - a start-tag

• </Person> - an end-tag

• Tags are also called markup.

• Tags must be balanced: they close in the inverse order of their opening

• Tags are defined by users; there are no predefined tags
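The balancing rule is what a parser checks first. As a rough illustration, the sketch below asks Python's standard `xml.etree.ElementTree` parser whether three small strings are well-formed; the helper name `is_well_formed` and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Tags close in the inverse order of their opening.
balanced = "<Person><Name>Alan</Name></Person>"
# The end-tags are crossed: Person closes before Name does.
crossed = "<Person><Name>Alan</Person></Name>"
# The Person start-tag is never closed.
unclosed = "<Person><Name>Alan</Name>"

for sample in (balanced, crossed, unclosed):
    print(is_well_formed(sample), sample)
```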

<person>
  <name> Alan </name>
  <age> 42 </age>
  <email> [email protected] </email>
</person>

Element - <person> ... </person>

Subelement - <age>

• XML elements must follow these naming rules:

• Names can contain letters, digits, and other characters such as hyphens, underscores, and periods

• Names must not start with a digit or a punctuation character such as "-" or "."; a leading "_" (underscore) is allowed

• Names must not start with the letters xml (or XML or Xml ..), which are reserved

• Names cannot contain spaces
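These rules can be approximated with a regular expression. The sketch below is a deliberately simplified, ASCII-only check (the real XML name grammar also admits many Unicode letters and the colon used for namespaces), and `is_valid_name` is a hypothetical helper, not part of any library. Note that the XML specification permits a name to start with an underscore, so the pattern allows it.

```python
import re

# Simplified, ASCII-only approximation of an XML name:
# first character is a letter or underscore, the rest may also be
# digits, hyphens, underscores, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_name(name):
    """Apply the naming rules from the slide to a candidate element name."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False  # bad first character, or a space or other illegal character
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")

for candidate in ("note", "_note", "note-1", "1note", "my note", "XmlNote"):
    print(candidate, is_valid_name(candidate))
```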

<table>
  <description> People on the fourth floor </description>
  <people>
    <person>
      <name> Alan </name><age> 42 </age><email> [email protected] </email>
    </person>
    <person>
      <name> Patsy </name><age> 36 </age><email> [email protected] </email>
    </person>
    <person>
      <name> Ryan </name><age> 58 </age><email> [email protected] </email>
    </person>
  </people>
</table>

<married></married>

Can be abbreviated to

<married/>
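One way to convince yourself that the two spellings are interchangeable is to parse both and compare the results. This is a small sketch with Python's standard `xml.etree.ElementTree`; the `summary` helper is introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<married></married>")
short_form = ET.fromstring("<married/>")

def summary(element):
    """Tag name, text content, and number of children of an element."""
    return (element.tag, element.text, len(element))

# Both spellings produce an element with the same name, no text, no children.
print(summary(long_form))
print(summary(short_form))
```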

XML Attributes

(Name, value) pairs

<product>
  <name language="French"> trompette six trous </name>
  <price currency="Euro"> 420.12 </price>
  <address format="XLB56" language="French">
    <street>31 rue Croix-Bosset</street>
    <zip>92310</zip><city>Sevres</city>
    <country>France</country>
  </address>
</product>

Here language, currency, and format are attributes.

• Attribute values are always strings, written in quotes ("..")

• A given attribute may occur only once within a tag, while subelements with the same tag may be repeated
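Both points can be observed with a parser. In this sketch (standard-library `xml.etree.ElementTree`; the helper `parses` and the sample strings are assumptions for the example), an attribute value comes back as a string that the application must convert itself, a repeated attribute is rejected, and a repeated subelement is accepted.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

price = ET.fromstring('<price currency="Euro"> 420.12 </price>')
currency = price.get("currency")   # attribute values arrive as strings
amount = float(price.text)         # any numeric reading is up to the application

# The same attribute twice in one tag is not well-formed.
repeated_attribute = '<name language="French" language="English">trompette</name>'
# The same subelement twice inside one parent is fine.
repeated_subelement = "<product><name>trompette</name><name>trumpet</name></product>"

print(currency, amount)
print(parses(repeated_attribute), parses(repeated_subelement))
```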

• XML tags are case sensitive

• With XML, white space is preserved

• <b><i>This text is bold and italic</b></i>

• OK in HTML, but not in XML, where elements must be properly nested:

• <b><i>This text is bold and italic</i></b>
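The three rules on this slide can each be checked in a line or two. This is a minimal sketch with Python's standard `xml.etree.ElementTree`; `is_well_formed` and the sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Case sensitivity: <Message> and </message> are different names.
mismatched_case = "<Message>This is incorrect</message>"
matching_case = "<Message>This is correct</Message>"

# Nesting: the HTML-style crossed tags are rejected, the nested form is not.
crossed = "<b><i>This text is bold and italic</b></i>"
nested = "<b><i>This text is bold and italic</i></b>"

# White space inside an element is handed to the application unchanged.
padded = ET.fromstring("<name>  Alan  </name>")

print(is_well_formed(mismatched_case), is_well_formed(matching_case))
print(is_well_formed(crossed), is_well_formed(nested))
print(repr(padded.text))
```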

• XML elements are extensible

• An application can extract the to, from, and body elements of a note to produce:

MESSAGE
To: Tove
From: Jani

Don't forget me this weekend!

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<note>
  <date>1999-08-01</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

• No problem: the application can still find the to, from, and body elements, so the added date element does not break it
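A sketch of why the extra element is harmless: an application that looks elements up by name simply never asks for the new one. The code below uses Python's standard `xml.etree.ElementTree`; `render_message` is a hypothetical stand-in for the extracting application.

```python
import xml.etree.ElementTree as ET

NOTE = (
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)
# The same note, extended with a date element the application does not know.
NOTE_WITH_DATE = NOTE.replace("<note>", "<note><date>1999-08-01</date>")

def render_message(xml_text):
    """Build the MESSAGE text from the to, from, and body elements only."""
    note = ET.fromstring(xml_text)
    return "MESSAGE\nTo: {}\nFrom: {}\n\n{}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body")
    )

print(render_message(NOTE_WITH_DATE))
```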

• Book Title: My First XML

• Chapter 1: Introduction to XML

• What is HTML

• What is XML

• Chapter 2: XML Syntax

• Elements must have a closing tag

• Elements must be correctly nested

<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

Bad Design

<note day="12" month="11" year="99" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>

<note date="12/11/99">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<note>
  <date>12/11/99</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<note>
  <date> <day>12</day> <month>11</month> <year>99</year> </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

• PCDATA

• XML parsers treat all text as parsed character data (PCDATA).

• When an XML element is parsed, the text between the XML tags is also parsed.

• CDATA

• Everything inside a CDATA section is ignored by the parser; it is not interpreted as markup.

• A CDATA section starts with "<![CDATA[" and ends with "]]>".

<person>
  <name> Alan </name>
  <age> 42 </age>
  <email> [email protected] </email>
</person>

or

<person name="Alan" age="42" email="[email protected]"/>

or

<person age="42">
  <name> Alan </name>
  <email> [email protected] </email>
</person>

[Figure: tree representations of the person data, each with a person node and name, age, and email children holding the values Alan, 42, and [email protected]]

• XML can associate a unique identifier with an element, as the value of a designated attribute, here called id

• Other elements refer to that element using idref

<messages>
  <note ID="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note ID="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>

<state id="s2">
  <scode>NE</scode><sname>Nevada</sname>
</state>
<city id="c2">
  <ccode>CCN</ccode><cname>Carson City</cname><state-of idref="s2"/>
</city>

[Figure: graph with nodes labeled a, b, and c]

36

<a><b id=“&o123”> some string </b></a>

<a c=“&o123”/>

Assume c as reference attribute

<a b=“&o123”/>

<a><c id=“&o123”> some string </b></a>

Assume b as reference attribute

<geography>
  <states>
    <state id="s1">
      <scode>ID</scode><sname>Idaho</sname>
      <capital idref="c1"/><cities-in idref="c1"/><cities-in idref="c3"/>…
    </state>
    <state id="s2">
      <scode>NE</scode><sname>Nevada</sname>
      <capital idref="c2"/><cities-in idref="c2"/>…
    </state>
    …
  </states>
  <cities>
    <city id="c1">
      <ccode>BOI</ccode><cname>Boise</cname><state-of idref="s1"/>
    </city>
    <city id="c2">
      <ccode>CCN</ccode><cname>Carson City</cname><state-of idref="s2"/>
    </city>
    <city id="c3">
      <ccode>MOC</ccode><cname>Moscow</cname><state-of idref="s1"/>
    </city>
    …
  </cities>
</geography>
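The id and idref attributes turn the tree into a graph, but following a reference is left to the application. The sketch below, using Python's standard `xml.etree.ElementTree`, builds a lookup table from id values to elements and then resolves each state's capital and cities. The document string is a trimmed copy of the geography example with the elided parts left out; `by_id`, `name_of`, `capitals`, and `cities_in` are names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

GEOGRAPHY = """
<geography>
  <states>
    <state id="s1"><scode>ID</scode><sname>Idaho</sname>
      <capital idref="c1"/><cities-in idref="c1"/><cities-in idref="c3"/></state>
    <state id="s2"><scode>NE</scode><sname>Nevada</sname>
      <capital idref="c2"/><cities-in idref="c2"/></state>
  </states>
  <cities>
    <city id="c1"><ccode>BOI</ccode><cname>Boise</cname><state-of idref="s1"/></city>
    <city id="c2"><ccode>CCN</ccode><cname>Carson City</cname><state-of idref="s2"/></city>
    <city id="c3"><ccode>MOC</ccode><cname>Moscow</cname><state-of idref="s1"/></city>
  </cities>
</geography>
"""

root = ET.fromstring(GEOGRAPHY)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

def name_of(reference):
    """Follow an element's idref attribute and return the target city's name."""
    return by_id[reference.get("idref")].findtext("cname")

capitals = {s.findtext("sname"): name_of(s.find("capital")) for s in root.iter("state")}
cities_in = {
    s.findtext("sname"): [name_of(c) for c in s.findall("cities-in")]
    for s in root.iter("state")
}
print(capitals)
print(cities_in)
```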

Ordering

person:{firstname: "John", lastname: "Smith"}

person:{lastname: "Smith", firstname: "John"}

As semistructured data (SSD), both are the same

These two are not the same as XML documents, because subelements are ordered

<person><firstname>John</firstname>
<lastname>Smith </lastname></person>

<person><lastname>Smith </lastname> <firstname>John</firstname></person>

The following two are equivalent, as attributes are not ordered

<person firstname="John" lastname="Smith"/>

<person lastname="Smith" firstname="John"/>
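The difference shows up directly in a parsed document: children come back as an ordered sequence, attributes as a name-to-value mapping. This is a small sketch with Python's standard `xml.etree.ElementTree`; `child_tags` and the variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def child_tags(xml_text):
    """Tags of the root's children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

first_then_last = "<person><firstname>John</firstname><lastname>Smith</lastname></person>"
last_then_first = "<person><lastname>Smith</lastname><firstname>John</firstname></person>"

# Attributes are exposed as a dict, and dict comparison ignores order.
attrs_a = ET.fromstring('<person firstname="John" lastname="Smith"/>').attrib
attrs_b = ET.fromstring('<person lastname="Smith" firstname="John"/>').attrib

print(child_tags(first_then_last), child_tags(last_then_first))
print(attrs_a == attrs_b)
```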

Mixing elements and text

<Person>
  This is my best friend
  <Name> Alan </Name>
  <Age> 42 </Age>
  I am not too sure of the following email
  <Email> [email protected] </Email>
</Person>
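Mixed content means the text runs between subelements are data too, and an application has to collect them explicitly. In Python's standard `xml.etree.ElementTree`, text before the first child is stored in the parent's `text`, and text after a child is stored in that child's `tail`. The sketch below gathers both; `text_runs` is a made-up helper and the email address is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

PERSON = """<Person>
This is my best friend
<Name> Alan </Name>
<Age> 42 </Age>
I am not too sure of the following email
<Email> alan@example.org </Email>
</Person>"""  # the address is a placeholder, not taken from the slides

def text_runs(element):
    """Free text directly inside an element, skipping whitespace-only runs."""
    runs = [element.text] + [child.tail for child in element]
    return [run.strip() for run in runs if run and run.strip()]

person = ET.fromstring(PERSON)
print(text_runs(person))
print([child.tag for child in person])
```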

<!-- this is a comment --> - Comments are allowed anywhere except inside markup and are part of the document.

<?xml-stylesheet href="book.css" type="text/css"?> - Processing instruction (PI) for applications

<?xml version="1.0"?> - The XML declaration; this is not a PI and is not passed to the application.

<![CDATA[<start>this is an incorrect element </end>]]>

<!DOCTYPE name [markupdeclarations]>

<?xml … ?>
<!DOCTYPE name [markupdeclarations]>
<name>…</name>
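To see what the CDATA section above buys, compare parsing the same characters with and without it. This sketch uses Python's standard `xml.etree.ElementTree`; the surrounding `<example>` element and the helper `text_or_error` are assumptions added so the fragment can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# Inside CDATA the mismatched <start> ... </end> is plain character data.
WITH_CDATA = "<example><![CDATA[<start>this is an incorrect element </end>]]></example>"
# Without CDATA the parser sees real tags, and they do not match.
WITHOUT_CDATA = "<example><start>this is an incorrect element </end></example>"

def text_or_error(xml_text):
    """Return the root element's text, or a marker if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return "not well-formed"

print(text_or_error(WITH_CDATA))
print(text_or_error(WITHOUT_CDATA))
```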

<db>
  <person>
    <name> Alan </name>
    <age> 42 </age>
    <email> [email protected] </email>
  </person>
  <person>… </person>
</db>

<!DOCTYPE db [
  <!ELEMENT db (person*)>
  <!ELEMENT person (name,age,email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
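A DTD content model is essentially a regular expression over child element names. Python's standard library has no validating parser, so the sketch below is not a DTD validator: it hand-translates the five declarations above into regular expressions and checks a parsed tree against them. `CONTENT_MODELS`, `conforms`, and the sample documents (including the placeholder email address) are assumptions for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written translation of the DTD above. Each child name is followed
# by a comma so that names cannot run together when matched.
CONTENT_MODELS = {
    "db": r"(person,)*",          # <!ELEMENT db (person*)>
    "person": r"name,age,email,", # <!ELEMENT person (name,age,email)>
    "name": r"",                  # (#PCDATA): no child elements
    "age": r"",
    "email": r"",
}

def conforms(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    children = "".join(child.tag + "," for child in element)
    if model is None or re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)

VALID = "<db><person><name>Alan</name><age>42</age><email>alan@example.org</email></person></db>"
MISSING_AGE = "<db><person><name>Alan</name><email>alan@example.org</email></person></db>"
WRONG_ORDER = "<db><person><age>42</age><name>Alan</name><email>alan@example.org</email></person></db>"

for document in (VALID, MISSING_AGE, WRONG_ORDER):
    print(conforms(ET.fromstring(document)))
```

The wrong-order case fails because a DTD sequence such as (name,age,email) fixes the order of subelements.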

Recursion

<!ELEMENT node (leaf | (node,node))>
<!ELEMENT leaf (#PCDATA)>

An example of such an XML document is

<node>
  <node>
    <node> <leaf> 1 </leaf> </node>
    <node> <leaf> 2 </leaf> </node>
  </node>
  <node>
    <leaf> 3 </leaf>
  </node>
</node>
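A recursive declaration is naturally processed by a recursive function. The sketch below walks a document of this shape with Python's standard `xml.etree.ElementTree`, collecting leaf values and rejecting any node that is neither a single leaf nor a pair of nodes. The function name `leaves` and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

TREE = (
    "<node>"
    "<node><node><leaf> 1 </leaf></node><node><leaf> 2 </leaf></node></node>"
    "<node><leaf> 3 </leaf></node>"
    "</node>"
)

def leaves(node):
    """Return leaf values left to right, following (leaf | (node,node))."""
    children = list(node)
    tags = [child.tag for child in children]
    if tags == ["leaf"]:
        return [children[0].text.strip()]
    if tags == ["node", "node"]:
        return leaves(children[0]) + leaves(children[1])
    raise ValueError(f"node has children {tags}, which the declaration does not allow")

print(leaves(ET.fromstring(TREE)))
```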

<db>
  <r1><a> a1 </a><b> b1 </b><c> c1 </c></r1>
  <r1><a> a2 </a><b> b2 </b><c> c2 </c></r1>
  <r2><c> c2 </c><d> d2 </d></r2>
  <r2><c> c3 </c><d> d3 </d></r2>
  <r2><c> c4 </c><d> d4 </d></r2>
</db>

<!DOCTYPE db [

<!ELEMENT db (r1*,r2*)>

<!ELEMENT r1 (a,b,c)>

<!ELEMENT r2 (c,d)>

<!ELEMENT a (#PCDATA)>

<!ELEMENT b (#PCDATA)>

<!ELEMENT c (#PCDATA)>

<!ELEMENT d (#PCDATA)>

]>
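The db document and its DTD encode two relations, r1(a,b,c) and r2(c,d), as XML. As a rough sketch of going back the other way, the code below reads such a document with Python's standard `xml.etree.ElementTree` and groups rows by relation name; the `tables` variable and the document string are assumptions based on the example data.

```python
import xml.etree.ElementTree as ET

DB = """
<db>
  <r1><a> a1 </a><b> b1 </b><c> c1 </c></r1>
  <r1><a> a2 </a><b> b2 </b><c> c2 </c></r1>
  <r2><c> c2 </c><d> d2 </d></r2>
  <r2><c> c3 </c><d> d3 </d></r2>
  <r2><c> c4 </c><d> d4 </d></r2>
</db>
"""

# Each child of db is one tuple; its tag names the relation it belongs to.
tables = {}
for row in ET.fromstring(DB):
    record = {column.tag: column.text.strip() for column in row}
    tables.setdefault(row.tag, []).append(record)

for relation, rows in tables.items():
    print(relation, rows)
```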

<!ELEMENT r2 ((c,d) | (d,c))>

<!ELEMENT db ((r1|r2)*)>

<!ELEMENT r1 (a,b?,c+)>

<!DOCTYPE db [<!ELEMENT …>…]>

<!DOCTYPE db SYSTEM "schema.dtd">

<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

<product>
  <name language="French" department="music"> trompette six trous </name>
  <price currency="Euro"> 420.12 </price>
</product>

<!ATTLIST name language CDATA #REQUIRED
               department CDATA #IMPLIED>
<!ATTLIST price currency CDATA #IMPLIED>

IDREF - the attribute's value is some other element's identifier

IDREFS - the attribute's value is a list of identifiers, separated by spaces

<!DOCTYPE family [
  <!ELEMENT family (person*)>
  <!ELEMENT person (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST person id ID #REQUIRED
                   mother IDREF #IMPLIED
                   father IDREF #IMPLIED
                   children IDREFS #IMPLIED>
]>

<family>
  <person id="jane" mother="mary" father="john"><name> Jane Doe </name></person>
  <person id="john" children="jane jack"><name> John Doe </name></person>
  <person id="mary" children="jane jack"><name> Mary Smith </name></person>
  <person id="jack" mother="mary" father="john"><name> Jack Smith </name></person>
</family>
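A validating parser checks that every IDREF and IDREFS value names an ID that exists in the document. The sketch below performs that one check by hand with Python's standard `xml.etree.ElementTree`, on a small family document modeled on the one above; `dangling_references` is a hypothetical helper, and the second document deliberately contains a reference (mother="smith") that matches no id.

```python
import xml.etree.ElementTree as ET

FAMILY = """<family>
<person id="jane" mother="mary" father="john"><name>Jane Doe</name></person>
<person id="john" children="jane jack"><name>John Doe</name></person>
<person id="mary" children="jane jack"><name>Mary Smith</name></person>
<person id="jack" mother="mary" father="john"><name>Jack Smith</name></person>
</family>"""

# A broken variant: no person has the id "smith".
BROKEN_FAMILY = FAMILY.replace('id="jack" mother="mary"', 'id="jack" mother="smith"')

def dangling_references(xml_text):
    """Return (person id, missing target) pairs for references to unknown ids."""
    root = ET.fromstring(xml_text)
    known_ids = {person.get("id") for person in root.iter("person")}
    missing = []
    for person in root.iter("person"):
        # mother and father hold one id each; children holds a space-separated list.
        targets = [person.get("mother"), person.get("father")]
        targets += (person.get("children") or "").split()
        for target in targets:
            if target is not None and target not in known_ids:
                missing.append((person.get("id"), target))
    return missing

print(dangling_references(FAMILY))
print(dangling_references(BROKEN_FAMILY))
```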

<!DOCTYPE geography [
  <!ELEMENT geography (state | city)*>
  <!ELEMENT state (scode,sname,capital,cities-in*)>
    <!ATTLIST state id ID #REQUIRED>
  <!ELEMENT scode (#PCDATA)>
  <!ELEMENT sname (#PCDATA)>
  <!ELEMENT capital EMPTY>
    <!ATTLIST capital idref IDREF #REQUIRED>
  <!ELEMENT cities-in EMPTY>
    <!ATTLIST cities-in idref IDREF #REQUIRED>
  <!ELEMENT city (ccode,cname,state-of)>
    <!ATTLIST city id ID #REQUIRED>
  <!ELEMENT ccode (#PCDATA)>
  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT state-of EMPTY>
    <!ATTLIST state-of idref IDREF #REQUIRED>
]>

<capital idref="…"/>

<!DOCTYPE geography [
  …
  <!ELEMENT state (scode,sname,capital,cities-in)>
  …
  <!ELEMENT cities-in EMPTY>
    <!ATTLIST cities-in idrefs IDREFS #REQUIRED>
  …
]>

<!ATTLIST state id ID #REQUIRED
                capital IDREF #REQUIRED
                cities-in IDREFS #REQUIRED>

<?xml version="1.0"?>
<!DOCTYPE report [
  <!ENTITY abstract SYSTEM "/u/abitebou/LEBOOK/abstract">
  <!ENTITY content SYSTEM "/u/suciu/LEBOOK/lebook">
]>
<report>
  <meta keywords="xml,www,web,semistructured"
        author="Abiteboul,Buneman,Suciu"
        date="25.12.98"/>
  <title>Data on the web </title>
  &abstract;
  &content;
</report>

Limitations of DTD

• Imposes an order on subelements

• No notion of atomic type; for example, "age" should be an integer, but in a DTD it can only be PCDATA

• No constraints

• Does not constrain the type of IDREFs; state-of must be an identifier of a state element, while cities-in must refer to city elements

• Element names are global: a name tag may have to serve for both a class name and a student name
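Since a DTD can only say that age is PCDATA, any type check has to live in the application. As a minimal sketch of such a check (standard-library `xml.etree.ElementTree`; the function `non_integer_ages` and the sample data are assumptions for the example), the code below lists age values that are not plain non-negative integers.

```python
import xml.etree.ElementTree as ET

DB = """
<db>
  <person><name>Alan</name><age> 42 </age></person>
  <person><name>Patsy</name><age>thirty-six</age></person>
  <person><name>Ryan</name><age>58</age></person>
</db>
"""

def non_integer_ages(xml_text):
    """Return the age values a DTD would accept but that are not integers."""
    root = ET.fromstring(xml_text)
    values = [(age.text or "").strip() for age in root.iter("age")]
    return [value for value in values if not value.isdecimal()]

print(non_integer_ages(DB))
```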

<xsl:template match="/">
  <HTML>
    <HEAD>
      <TITLE>Bibliography Entries</TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>

<xsl:template match="title">
  <TD><xsl:value-of select="."/></TD>
</xsl:template>

<xsl:template match="author">
  <TD><xsl:value-of select="."/></TD>
</xsl:template>

<xsl:template match="book">
  <TR>
    <xsl:apply-templates select="title"/>
    <xsl:apply-templates select="author"/>
  </TR>
</xsl:template>

<xsl:template match="bib">
  <TABLE>
    <TBODY>
      <xsl:apply-templates/>
    </TBODY>
  </TABLE>
</xsl:template>

<HTML>
  <HEAD>
    <TITLE>Bibliography Entries</TITLE>
  </HEAD>
  <BODY>
    <TABLE>
      <TBODY>
        <TR><TD>t1</TD><TD>a1</TD><TD>a2</TD></TR>
        <TR><TD>t2</TD><TD>a3</TD><TD>a4</TD></TR>
        <TR><TD>t3</TD><TD>a5</TD><TD>a6</TD><TD>a7</TD></TR>
      </TBODY>
    </TABLE>
  </BODY>
</HTML>
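The slides show the stylesheet and its HTML output but not the input document. The sketch below assumes a bib input with three book elements whose titles and authors match the table cells, and imitates each template with a plain Python function (the standard library has no XSLT processor, so this is an imitation of the rules, not an XSLT run). `BIB`, `transform_book`, and `transform_bib` are names invented for the illustration, and no HTML escaping is done.

```python
import xml.etree.ElementTree as ET

# Assumed input: the slides do not show it, only the resulting table.
BIB = """
<bib>
  <book><title>t1</title><author>a1</author><author>a2</author></book>
  <book><title>t2</title><author>a3</author><author>a4</author></book>
  <book><title>t3</title><author>a5</author><author>a6</author><author>a7</author></book>
</bib>
"""

def transform_book(book):
    """Like the book template: one TR holding title cells, then author cells."""
    cells = book.findall("title") + book.findall("author")
    return "<TR>" + "".join(f"<TD>{cell.text.strip()}</TD>" for cell in cells) + "</TR>"

def transform_bib(bib):
    """Like the root and bib templates: wrap all book rows in an HTML table."""
    rows = "".join(transform_book(book) for book in bib.findall("book"))
    return (
        "<HTML><HEAD><TITLE>Bibliography Entries</TITLE></HEAD><BODY>"
        f"<TABLE><TBODY>{rows}</TBODY></TABLE></BODY></HTML>"
    )

html = transform_bib(ET.fromstring(BIB))
print(html)
```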