Upload
nola
View
63
Download
0
Embed Size (px)
DESCRIPTION
XML- Extensible Markup Language. HTML to XML. HTML documents Emerging Web Standards - XML XML good for data interchange across platforms enterprise wide conversion HTML to XML - IBM, Microsoft. XML - Motivation. - PowerPoint PPT Presentation
Citation preview
1
• XML- Extensible Markup Language
2
HTML to XML
• HTML documents
• Emerging Web Standards - XML
• XML good for data interchange across platforms enterprise wide
• conversion HTML to XML - IBM, Microsoft
3
XML - Motivation• In HTML, both the tag semantics and tags are
fixed. There is limited and strict interpretation of tags.
• HTML is widely successful in disseminating documents across internet.
• Though data can be disseminated through HTML, its extraction is painful, and laborious.
• EDI has been a predominate mode of exchanging data among businesses. But it has very rigid format that requires highly customized applications.
4
XML - Introduction• XML aims to provide ease of authoring HTML
documents with ease of data exchange that is possible with EDI.
• Tags are used to markup documents.
• XML is a meta-language for describing markup languages.
• XML provides a facility to define tags and structural relationships between them.
• No pre-defined tag set implied no preconceived semantics, semantics of XML document is defined by applications that process them
5
XML - Goals• Straightforward to use over internet
• Support wide variety of applications, authoring, browsing, content analysis, etc.
• Easy to write programs that process XML documents and validate them.
• XML documents must be human-legible and reasonably clear.
• Design of XML shall be formal and concise - expressed as EBNF (extended Backus Naur Form) - amenable to modern compiler tools and techniques.
6
XML-features• Some structure - not rigid• Extensibility - User defined tags• nested elements• validation - documents may specify their
own grammar• DTD (Document Type Descriptor) - schema
exists with data as tag names• Application -EDI - extraction, conversion, ,
transformation, integration • can be modeled using DOM
7
More terminology• RDF - Resource Description Framework - a
method to describe metdata for XML documents
• XSL - Extensible Stylesheet Language - language for transforming and formatting XML.
• Transformation Language - XSLT, XPath, Xpointer, Xlink
8
Example-HTML• Print - Sanjay Madria
Web Warehouse Tutorial, ADBIS’99
HTML
<H2> Sanjay Madria </H2>
<I> Web Warehouse Tutorial, ADBIS’99</I>
Very difficult to understand, structure is hidden, describes only appearance
9
XML• <Ref>
<Speaker> <Firstname> Sanjay</firstname>
<Lastname> Madria</lastnaame>
</Speaker>
<Title > Web Warehouse Tutorial</Title>
<Conference> ADBIS’99</Conference>
</empty>
</Ref>
another format:
<Firstname Value “Sanjay”/>
10
• XML can Separate Data from HTML
• XML is used to Exchange Data
• XML can be used to Share Data
• XML can be used to Store Data
• XML can be used to Create new Languages (WML)
11
XML
• <Person> - a start-tag
• </Person> - a end tag
• Tags are also called markups.
• Tags must be balanced; close in inverse order of their opening
• Tags are defined by users, no predefined tags
12
<person>
<name> Alan </name>
<age> 42 </age>
<email> [email protected] </ email >
</person>
Element - <Person>…..</Person>
Subelement – Age
13
• XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must not start with a number or "_" (underscore)
• Names must not start with the letters xml (or XML or Xml ..)
• Names can not contain spaces
14
<table><description> People on the fourth floor </description>
<people><person>
<name> Alan </name><age> 42 </age><email> [email protected] </ email >
</person><person>
<name> Patsy </name><age> 36 </age><email> [email protected] </ email >
</person><person>
<name> Ryan </name><age> 58 </age><email> [email protected] </ email >
</person>
</people></table>
15
<married></married>
Can be abbreviated to
<married/>
16
XML Attributes(Name, value) pair
<product>
<name language=“French”> trompette six trous </name>
<price currency=“Euro”> 420.12 </price>
<address format=“XLB56” language=“French”>
<street>31 rue Croix-Bosset</ street>
<zip>92310</zip><city>Sevres</city>
<country>France</country>
</address>
</product>
Att.
17
• Attributes takes always string values (“..”)
• A given attribute may occur only once within a tag, while subelements within same tag can repeat attributes
18
• XML tags are case sensitive
• With XML, White Space is Preserved
• <b><i>This text is bold and italic</b></i>
• Ok in HTML• <b><i>This text is bold and
italic</i></b>
19
• XML Elements are Extensible
• Extract to
• MESSAGE To: ToveFrom: Jani
• Don't forget me this weekend!
20
<?xml version="1.0" ?> - <note> <to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
21
• <note> <date>1999-08-01</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
• No problem
22
• Book Title: My First XML
• Chapter 1: Introduction to XML
• What is HTML
• What is XML
• Chapter 2: XML Syntax
• Elements must have a closing tag
• Elements must be correctly nested
23
• <book> • <title>My First XML</title> • <prod id="33-657"
media="paper"></prod>• <chapter>Introduction to XML • <para>What is HTML</para> • <para>What is XML</para> • </chapter> • <chapter>XML Syntax <para>Elements
must have a closing tag</para> <para>Elements must be properly nested</para> </chapter>
• </book>
24
• <person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname>
• <person> <sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname> </person>
25
Bad Design
• <note day="12" month="11" year="99" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>
26
• <note date="12/11/99"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
27
• <note> <date>12/11/99</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
28
• <note> <date> <day>12</day> <month>11</month> <year>99</year> </date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
29
• PCDATA• XML parsers treat all text as Parsable
Characters (PCDATA).• When an XML element is parsed, the text
between the XML tags is also parsed:• CDATA• Everything inside a CDATA section is
ignored by the parser.• Starts with "<![CDATA[" and ends with
"]]>":
30
<person> <name> Alan </name>
<age> 42 </age>
<email> [email protected] </ email >
</person>
or
<person name=“Alan” age = “42” email = “[email protected]” />
or
<person age = “42” >
<name> Alan </name>
<email> [email protected] </ email >
</person>
32
• XML can associates unique identifier to elements, as the value of certain attribute Called id
• Refer that element using idref
33
• <messages> • <note ID="501"> • <to>Tove</to> • <from>Jani</from>
<heading>Reminder</heading> <body>Don't forget me this weekend!</body>
• </note> • <note ID="502"> <to>Jani</to>
<from>Tove</from> <heading>Re: Reminder</heading> <body>I will not!</body> </note>
• </messages>
34
<state id=“s2”><scode>NE</scode><sname>Nevada</sname>
</state><city id=“c2”>
<ccode>CCN</ccode><cname>Carson City</cname><state-of idref = “s2”/>
</city>
35
a
b c
a
36
<a><b id=“&o123”> some string </b></a>
<a c=“&o123”/>
Assume c as reference attribute
<a b=“&o123”/>
<a><c id=“&o123”> some string </b></a>
Assume b as reference attribute
37
<geography><states>
<state id=“s1”><scode>ID</scode><sname>Idaho</sname><capital idref=“c1”/><cities-in idref=“c1”/><cities-in idref=“c3”/>……
</state><state id=“s2”>
<scode>NE</scode><sname>Nevada</sname><capital idref=“c2”/><cities-in idref=“c2”/>…….
</state>….
</states>
38
<cities><city id=“c1”>
<ccode>BOI</ccode><cname>Boise</cname><state-of idref = “s1”/>
</city><city id=“c2”>
<ccode>CCN</ccode><cname>Carson City</cname><state-of idref = “s2”/>
</city><city id=“c3”>
<ccode>MOC</ccode><cname>Moscow</cname><state-of idref = “s1”/>
</city>…
</cities>
</geography>
39
Ordering
person:{firstname: “John”, lastname:“Smith”}
person:{lastname: “Smith”,firstname: “John”}
As SSD, both are same
40
These two are not same as XML documents
<person><firstname>John</firstname>
<lastname>Smith </lastname></person>
<person><lastname>Smith </lastname> <firstname>John</firstname></person>
The following two are equivalent as attributes are not ordered
<person firstname=“John”lastname=“Smith”/>
<person lastname=“Smith” firstname=“John”/>
41
Mixing elements and Text
<Person>
This is my best friend
<Name> Alan </Name>
<Age> 42 </Age>
I am not too sure of the following email
<Email> [email protected] </Email >
</Person>
42
<!- - this is a comment - -> - Comments are allowed anywhere except inside markup and is a part of the document.
<?xml-stylesheet href=“book.css” type=“text/css”?> - Processing instructions for applications
<?xml version=“1.0”?> This is not PI, not passed to application.
<![CDATA[<start>this is an incorrect element </end>]]>
<!DOCTYPE name [markupdeclarations]><?xml….?><!DOCTYPE name [markupdeclarations]><name>…</name>
43
<db><person> <name> Alan </name>
<age> 42 </age>
<email> [email protected] </ email >
</person>
<person>… </person>
…
</db>
<!DOCTYPE db [
<!ELEMENT db (person*)>
<!ELEMENT person (name,age,email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
44
Recursion
<!ELEMENT node (leaf | (node,node))> <!ELEMENT leaf (#PCDATA)>
An example of such XML document is<node>
<node><node> <leaf> 1 </leaf> </node><node> <leaf> 2 </leaf> </node>
</node><node>
<leaf> 3 </leaf></node>
</node>
45
<db>
<r1><a> a1 </a><b> b1 </b><c> c1 </c></r1>
<r1><a> a2 </a><b> b2 </b><c> c2 </c></r1>
<r2><c> c2 </c><d> d2 </d></r2><r2><c> c3 </c><d> d3 </d></r2> <r2><c> c4 </c><d> d4 </d></r2>
<db>
46
<!DOCTYPE db [
<!ELEMENT db (r1*,r2*)>
<!ELEMENT r1 (a,b,c)>
<!ELEMENT r2 (c,d)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
]>
47
<!ELEMENT r2 ((c,d) | (d,c))>
<!ELEMENT db ((r1|r2)*)>
<!ELEMENT r1 (a,b?,c+)>
<!DOCTYPE db [<!ELEMENT …>…]>
<!DOCTYPE db SYSTEM “schema.dtd”>
<!DOCTYPE db SYSTEM “http://www.schemaauthority.com/schema.dtd”>
48
<product><name language=“French” department = “music”>
trompette six trous </name><price currency=“Euro”> 420.12 </price>
</product>
<!ATTLIS name language CDATA #REQUIRED
department CDATA #IMPLIED><!ATTLIS price currency CDATA #IMPLIED>
49
IDREF – attribute’s value is some other element’s identifier
iDREFS – attribute’s value is a list of identifiers, separated by spaces
<!DOCTYPE family [
<!ELEMENT family (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIS person id ID #REQUIRED
mother IDREF #IMPLIED
father IDREF #IMPLIED
children IDREFS #IMPLIED>]>
50
<family><person id=“jane” mother=“mary” father=“john”><name> Jane Doe </name></person><person id=“john” children =“jane jack” ><name> John Doe </name></person><person id=“mary” children =“jane jack” ><name> Mary Smith </name></person><person id=“jack” mother=“smith” father=“john”><name> Jack Smith </name></person>
</family>
51
<!DOCTYPE geography [<!ELEMENT geography (state | city)*><!ELEMENT state (scode,sname,capital,cities-in*)>
<!ATTLIST state id ID #REQUIRED><!ELEMENT scode (#PCDATA)><!ELEMENT sname (#PCDATA)><!ELEMENT capital EMPTY>
<!ATTLIST capital idref IDREF #REQUIRED><!ELEMENT cities-in EMPTY>
<!ATTLIST cities-in idref IDREF #REQUIRED><!ELEMENT city (ccode,cname,state-of)>
<!ATTLIST city id ID><!ELEMENT ccode (#PCDATA)><!ELEMENT cname (#PCDATA)><!ELEMENT state-of EMPTY>
<!ATTLIST state-of idref IDREF #REQUIRED>]>
52
<capital idref=“…”/><!DOCTYPE geography [
…<!ELEMENT state (scode,sname,capital,cities-in)>…<!ELEMENT cities-in EMPTY>
<!ATTLIST cities-in idrefs IDREFS #REQUIRED>…
]>
<!ATTLIST state id ID #REQUIREDcapital IDREF #REQUIREDcities-in IDREFS #REQUIRED >
53
<?xml version=“1.0”?><!DOCTYPE report [<!ENTITY %abstract SYSTEM “/u/abitebou/LEBOOK/abstract”><!ENTITY %content SYSTEM “/u/suciu/LEBOOK/lebook”>]><report>
<meta keywords=“xml,www,web,semistructured”author=“Abiteboul,Buneman,Suciu”date=“25.12.98”/>
<title>Data on the web </title>%abstract;%content;
</report>
54
Limitations of DTD
• Impose Order• No notion of atomic type, for example “age” can
be integer, but in DTD, it will be PCDATA• No constraints• Do not constrain the type of IDREFs; state-of
must be an identifier of a state element, while cities-in must be of type city
• Name tag may corresponds to classname and student name both
55
<A xml:Links=“simple” HREF=“http:www-rocq.inria.fr/verso”>
Verso Group </A>
<AHREF=mailto:[email protected]>Serge</A>
<node xml:link=“simple” HREF=
“report.xml#root().child(2,section).child(3,subsection)”>
56
<payment-choice xml:link=“extended”inline=“true”role=“form”title=“choose form of payment”><target xml:link=“locator”
href=http://foo.bar.fr/visa-form.xml”role=“form”title=“visa”show=“embed”actuate=“user”bahavior=“type;field=1;no-scroll” />
<target xml:link=“locator”href=http://foo.bar.fr/klee-form.xml”role=“form”title=“kleebox”show=“embed”actuate=“user”bahavior=“type;field=1;no-scroll” />
</payment-choice>
57
Where<book>
<publisher><name>Morgan Kaufmann</name></publisher>
<title> $T</title>
<author> $A</author>
</book> in “www.a.b.b/bib.xml”
Construct $A
The pattern above corresponds to
{book: { publisher: {name : “Morgan Kaufmann”},
title: T,
author A}}
58
Where <book><publisher><name>Morgan Kaufmann</></><title> $T</><author> $A</></> in “www.a.b.b/bib.xml”<book year=“1991”>
<!- - A good introductory text--><title> An Introduction to Parallel Algorithms and Architectures </title> <author> <lastname> Leighton </lastname></author><publisher><name> Morgan Kaufmann </name> </publisher>
</book> <book year=“1995”>
<title> Active database systems </title> <author> <lastname> Ceri </lastname></author><author> <lastname> Widom </lastname></author><publisher><name> Morgan Kaufmann </name> </publisher>
</book>
59
Construct <result>
<author> $A</>
<title> $T</>
</>
<publisher><name> Morgan Kaufmann </name> </publisher>
<result> <author> …</author>
<title> …</title> </ result >
60
<answer>Where <book>
<publisher><name>Morgan Kaufmann</></><title> $T</><author> $A</></> in “www.a.b.b/bib.xml”
<result><author> <lastname> Leighton </lastname></author><title> An Introduction to Parallel Algorithms and Architectures </title>
</result><result>
<author> <lastname> Ceri </lastname></author><title> Active database systems </title>
</result><result>
<author> <lastname> Widom </lastname></author><title> Active database systems </title>
</result>Construct <result>
<author> $A</><title> $T</>
</></answer>
61
Where <book> <title> $T </title><price> $p</price></book> in “www.a.b.c/bib.xml”’,
Construct <result> <booktitle> $T </booktitle><bookprice> $P </bookprice>
</result>The above query will ignore books having no price attribute!
Where <book> $B</book> in “www.a.b.c/bib.xml”’,<title> $T </title> in $B
Construct <result> <booktitle> $T </booktitle>where <price> $P </price> in $Bconstruct <bookprice> $P </bookprice>
</result>
62
<result> <booktitle> …</booktitle> </result><result> <booktitle> …</booktitle>
<bookprice> … </bookprice></result><result> <booktitle> …</booktitle>
<bookprice> … </bookprice></result><result> <booktitle> …</booktitle> </result><result> <booktitle> …</booktitle>
<bookprice> … </bookprice></result>…
63
Where <book> <author> $A </></> in “www.a.b.c/bib.xml”’,
Construct <result>
<author> $A </>
Where <book> <author> $A </author>
<title> $T </title>
</book> in “www.a.b.c/bib.xml”’,
Construct <title> $T</>
</>
64
Where <book><publisher><name>Morgan Kaufmann</name></publisher>
</book>element_as $B in “abc.xml”
Construct $BWhere <book>$T</ book> in “abc.xml”
<publisher><name>Morgan Kaufmann</name></publisher> in $T
Construct<book>$T</book>
Where <book><publisher><name>Morgan Kaufmann</name></publisher>
</book>content_as $C in “abc.xml”Construct <result> $C</result>Where <book>$C</ book> in “abc.xml”
<publisher><name>Morgan Kaufmann</name></publisher> in $CConstruct <result> $C</result>
65
Where <book language=“French”><title></> element_as $T </> in “abc.xml”
Construct $T
Where <book language=“$L”></> in “abc.xml”Construct <result>$L</result>
Where <book><author> $A </author> </book>content_as $B1 in “abc.xml”,
<book><author> $A </author> </book>content_as $B2 in “abc.xml”,
B1 != B2Construct <result>$A</result>
66
Where <$P> <title> $T </title>
<year> 1995</>
<$E>Smith</>
</> in “www.a.b.c/bib.xml”
$E in {author, editor}
Construct <$P> <title> $T </title>
<$E> Smith </>
</>
67
<!ELEMENT part (name, brand, part*)>
<!ELEMENT name (PCDATA)>
<!ELEMENT brand (PCDATA)>
Where <part*><name> $R </> <brand> Ford</></>
in “www.a.b.c/bib.xml”
Construct <result> $R </>
68
<name> $R </> <brand> Ford</><part><name> $R </> <brand> Ford</></><part><part><name> $R </> <brand> Ford</></></><part><part><part><name> $R </> <brand> Ford</></></></>…Where <$*><name> $r </> <brand> Ford</></>
in “www.a.b.c/bib.xml”Construct <result> $r </>
<*,brand> Ford</>
Where <part+.(subpart|component.piece)>$r </>in “www.a.b.c/parts.xml”
Construct <result> $r </>
69
<a> <b> $X </b><c> $Y</c></a><a> <b> 1 </b><c> 2</c></a><a> <c> 2 </c><b> 1</b></a>
<a> <b> 1 </b><c> 2</c> <b> 3 </b><c> 4</c>
</a>
<a> <c> $Y </c><b> $X</b></a>
70
<a[$I]>…</>
<$X[$J]>…</>
Where <person> $P </> in “www.a.b.c/people.xml”,
<firstname[$I]> $X </> in $P,
<lastname[$J] >$Y </> in $P,
$J < $I
Construct <person> $P </>
71
Where <pub> $P </> in “www.a.b.c/bib.xml”,
<title> $T </> in $P,
<year> $Y </> in $P
<month> $Z </> in $P
Order-by $Y,$Z
Construct <result> $T </>
72
Where <pub> $P </> in “www.a.b.c/bib.xml”,
Construct <pub> where <author[$I]> $A </> in $POrder-by $I descending
Construct <author> $A </>
where <$E> $V </> in $P
$E != “author”
construct <$E> $V </>
</pub>
73
<bib> <book><title>t1</title><author>a1</author><author>a2</author>
</book> <paper><title>t2</title>
<author>a3</author><author>a4</author>
</paper><book><title>t3</title>
<author>a5</author><author>a6</author><author>a7</author>
</book></bib>
74
<xs1:template>
<xs1:apply-templates/>
</xs1:template>
<xs1:template match=“/bib/*/title”>
<result>
<xs1:value-of/>
</result>
</xs1:template>
75
<result>t1</result>
<result>t2</result>
<result>t3</result>
Where <bib> <book> <title> $T </> </> </> in input
Construct <result> $T </result>
76
bib matches a bib element
* matches any element
/ matches the root
/bib matches a bib element immediately after root
bib/paper matches a paper following a bib
bib//paper matches a paper following a bib at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@language matches a language attribute
bib/book/@language the language attribute of a book
77
<!- - this is comment 1- - ><!- - this is comment 2- - ><bib>…<bib><!- - this is comment 3- - >
bib/paper[year and publisher/name and @language]
<xs1:template match=“pattern”>template
<xs1:template>
78
<xs1:template match=“A”> <xs1:element name=“B”>
<xs1:value-of/> <xs1:element/>
</xs1:template>
<xs1:template match=“A”><B><xs1:value-of/></B>
</xs1:template>
<xs1:template match=“*”> <xs1:element name=“name()”>
<xs1:value-of/> </xs1:element>
</xs1:template>
79
<xs1:template match=“/”><HTML>
<HEAD><TITLE>Bibliography Entries</TITLE>
</HEAD><BODY>
<xs1:apply-templates/> </BODY>
</HTML></xs1:template><xs1:template match=“title”>
<TD><xs1:value-of/>
</TD></xs1:template><xs1:template match=“author”>
<TD><xs1:value-of/>
</TD></xs1:template>
80
<xs1:template match=“book”><TR>
<xs1:apply-templates select=“title”/> <xs1:apply-templates select=“author”/>
</TR></xs1:template><xs1:template match=“bib”>
<TABLE><TBODY>
<xs1:apply-templates/> </TBODY>
</TABLE></xs1:template>
81
<HTML><HEAD>
<TITLE>Bibliography Entries</TITLE></HEAD><BODY>
<TABLE><TBODY>
<TR><TD>t1</TD><TD>a1</TD><TD>a2</TD></TR><TR><TD>t2</TD><TD>a3</TD><TD>a4</TD></TR><TR><TD>t3</TD><TD>a5</TD><TD>a6 </TD><TD>
a7</TD></TR> </TBODY>
</TABLE> </BODY>
</HTML>