1
• XML - Extensible Markup Language
2
HTML to XML
• HTML documents
• Emerging Web Standards - XML
• XML is good for enterprise-wide data interchange across platforms
• Conversion from HTML to XML - IBM, Microsoft
3
XML - Motivation
• In HTML, both the tag set and the tag semantics are fixed; tags have a limited, strict interpretation.
• HTML has been widely successful in disseminating documents across the internet.
• Although data can be disseminated through HTML, extracting it is painful and laborious.
• EDI has been a predominant mode of exchanging data among businesses, but its very rigid format requires highly customized applications.
4
XML - Introduction
• XML aims to combine the ease of authoring HTML documents with the ease of data exchange that is possible with EDI.
• Tags are used to mark up documents.
• XML is a meta-language for describing markup languages.
• XML provides a facility to define tags and the structural relationships between them.
• No predefined tag set implies no preconceived semantics; the semantics of an XML document are defined by the applications that process it.
5
XML - Goals
• Straightforward to use over the internet
• Support a wide variety of applications: authoring, browsing, content analysis, etc.
• Easy to write programs that process XML documents and validate them.
• XML documents should be human-legible and reasonably clear.
• The design of XML shall be formal and concise - expressed in EBNF (Extended Backus-Naur Form) - amenable to modern compiler tools and techniques.
6
XML - Features
• Some structure - not rigid
• Extensibility - user-defined tags
• Nested elements
• Validation - documents may specify their own grammar
• DTD (Document Type Definition) - the schema travels with the data, as tag names
• Applications - EDI, extraction, conversion, transformation, integration
• Can be modeled using DOM
7
More terminology
• RDF - Resource Description Framework - a method to describe metadata for XML documents
• XSL - Extensible Stylesheet Language - a language for transforming and formatting XML.
• Transformation and linking languages - XSLT, XPath, XPointer, XLink
8
Example - HTML
• Print: Sanjay Madria, Web Warehouse Tutorial, ADBIS’99
HTML
<H2> Sanjay Madria </H2>
<I> Web Warehouse Tutorial, ADBIS’99</I>
Hard for a program to interpret: the structure is hidden, and the markup describes only appearance
9
XML
<Ref>
<Speaker> <Firstname> Sanjay</Firstname>
<Lastname> Madria</Lastname>
</Speaker>
<Title> Web Warehouse Tutorial</Title>
<Conference> ADBIS’99</Conference>
</Ref>
Another format:
<Firstname Value="Sanjay"/>
10
• XML can separate data from HTML
• XML is used to exchange data
• XML can be used to share data
• XML can be used to store data
• XML can be used to create new languages (WML)
11
XML
• <Person> - a start-tag
• </Person> - an end-tag
• Tags are also called markup.
• Tags must be balanced: they close in the reverse order of their opening
• Tags are defined by users; there are no predefined tags
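The balancing rule can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

balanced = "<Person><Name>Alan</Name></Person>"
crossed = "<Person><Name>Alan</Person></Name>"  # closed in the wrong order
unclosed = "<Person><Name>Alan</Name>"          # end-tag for Person is missing

for sample in (balanced, crossed, unclosed):
    print(sample, "->", is_well_formed(sample))
```

Only the first sample is accepted; the other two break the rule that tags close in the reverse order of their opening.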
12
<person>
<name> Alan </name>
<age> 42 </age>
<email> [email protected] </email>
</person>
Element - <person>…</person>
Subelement - age
13
• XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must not start with a number or a punctuation character (a leading "_" underscore is allowed)
• Names must not start with the letters xml (or XML or Xml ..)
• Names cannot contain spaces
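These naming rules can be approximated with a regular expression. The sketch below is an ASCII-only simplification: the real XML `Name` production also admits many Unicode letters and the colon (used for namespaces). Note that the XML specification permits a leading underscore. The function name `is_valid_name` is hypothetical.

```python
import re

# ASCII-only approximation of an XML name: a letter or underscore,
# followed by letters, digits, underscores, hyphens, or periods.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name: str) -> bool:
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (in any case) are reserved
    return NAME_RE.fullmatch(name) is not None

for candidate in ("first_name", "_id", "2ndname", "first name", "XmlData"):
    print(candidate, "->", is_valid_name(candidate))
```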
14
<table>
<description> People on the fourth floor </description>
<people>
<person>
<name> Alan </name><age> 42 </age><email> [email protected] </email>
</person>
<person>
<name> Patsy </name><age> 36 </age><email> [email protected] </email>
</person>
<person>
<name> Ryan </name><age> 58 </age><email> [email protected] </email>
</person>
</people>
</table>
15
<married></married>
Can be abbreviated to
<married/>
16
XML Attributes
• (name, value) pairs
<product>
<name language="French"> trompette six trous </name>
<price currency="Euro"> 420.12 </price>
<address format="XLB56" language="French">
<street>31 rue Croix-Bosset</street>
<zip>92310</zip><city>Sevres</city>
<country>France</country>
</address>
</product>
17
• Attribute values are always quoted strings ("..")
• A given attribute may occur only once within a tag, whereas subelements with the same tag name may be repeated
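The difference between attributes and subelements can be observed with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`; the `phone` and `amount` names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Repeated subelements with the same tag are fine.
doc = ET.fromstring("<person><phone>111</phone><phone>222</phone></person>")
phones = [p.text for p in doc.findall("phone")]

# Repeating an attribute inside one tag is a well-formedness error.
try:
    ET.fromstring('<person phone="111" phone="222"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

# Attribute values always come back as strings, even when they look numeric.
amount = ET.fromstring('<price amount="420.12"/>').get("amount")

print(phones, duplicate_rejected, repr(amount))
```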
18
• XML tags are case sensitive
• With XML, white space is preserved
• <b><i>This text is bold and italic</b></i>
• Improperly nested: tolerated in HTML, but not well-formed XML
• <b><i>This text is bold and italic</i></b>
• Properly nested, as XML requires
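Case sensitivity and white-space preservation are both easy to confirm. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the element names are illustrative.

```python
import xml.etree.ElementTree as ET

# White space inside an element is kept as part of its text.
text = ET.fromstring("<name>  Alan  </name>").text

# Tag names are case sensitive: <Name> is not closed by </name>.
try:
    ET.fromstring("<Name>Alan</name>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(repr(text), mismatch_rejected)
```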
19
• XML elements are extensible
• An application that extracts the to, from, and body elements of a note produces:
• MESSAGE To: Tove From: Jani
• Don't forget me this weekend!
20
<?xml version="1.0" ?>
<note> <to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
21
• <note> <date>1999-08-01</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
• No problem: the application still finds to, from, and body after date is added
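Why the added date element causes no problem can be sketched in a few lines: an application that looks elements up by name, not by position, is unaffected by new siblings. The function name `render_message` is hypothetical; the note content is taken from the slides.

```python
import xml.etree.ElementTree as ET

def render_message(xml_text: str) -> str:
    note = ET.fromstring(xml_text)
    # Look fields up by tag name, not by position, so extra elements are ignored.
    to, sender, body = (note.findtext(tag) for tag in ("to", "from", "body"))
    return f"MESSAGE To: {to} From: {sender}\n{body}"

original = ("<note><to>Tove</to><from>Jani</from>"
            "<heading>Reminder</heading>"
            "<body>Don't forget me this weekend!</body></note>")
# The extended note adds a <date> element in front of the others.
extended = original.replace("<note>", "<note><date>1999-08-01</date>")

print(render_message(original))
print(render_message(extended))
```

Both calls print the same message, which is the sense in which XML elements are extensible.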
22
• Book Title: My First XML
• Chapter 1: Introduction to XML
• What is HTML
• What is XML
• Chapter 2: XML Syntax
• Elements must have a closing tag
• Elements must be correctly nested
23
<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
24
• <person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname> </person>
• <person> <sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname> </person>
25
Bad Design
• <note day="12" month="11" year="99" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"> </note>
26
• <note date="12/11/99"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
27
• <note> <date>12/11/99</date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
28
• <note> <date> <day>12</day> <month>11</month> <year>99</year> </date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
29
• PCDATA
• XML parsers treat text as parsed character data (PCDATA).
• When an XML element is parsed, the text between the XML tags is also parsed.
• CDATA
• Everything inside a CDATA section is left unparsed: it is passed through as literal text, not treated as markup.
• Starts with "<![CDATA[" and ends with "]]>".
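The effect of a CDATA section can be seen by parsing the same text with and without it. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `script` element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<" and "&" are ordinary characters, not markup.
cdata_doc = "<script><![CDATA[if (a < b && b < c) { swap(); }]]></script>"
text = ET.fromstring(cdata_doc).text

# The same text outside CDATA is rejected, because "<" and "&" are parsed.
try:
    ET.fromstring("<script>if (a < b && b < c) { swap(); }</script>")
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

print(repr(text), parsed_ok)
```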
30
<person>
<name> Alan </name>
<age> 42 </age>
<email> [email protected] </email>
</person>
or
<person name="Alan" age="42" email="[email protected]" />
or
<person age="42">
<name> Alan </name>
<email> [email protected] </email>
</person>
32
• XML can associate a unique identifier with an element, as the value of an attribute of type ID
• Other elements refer to that element through an attribute of type IDREF
33
<messages>
<note ID="501">
<to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body>
</note>
<note ID="502">
<to>Jani</to> <from>Tove</from> <heading>Re: Reminder</heading> <body>I will not!</body>
</note>
</messages>
34
<state id="s2">
<scode>NE</scode><sname>Nevada</sname>
</state>
<city id="c2">
<ccode>CCN</ccode><cname>Carson City</cname><state-of idref="s2"/>
</city>
35
[Figure: graph diagram with labels a, b, and c]
36
<a><b id="&o123"> some string </b></a>
<a c="&o123"/>
Assume c is a reference attribute
<a b="&o123"/>
<a><c id="&o123"> some string </c></a>
Assume b is a reference attribute
37
<geography>
<states>
<state id="s1">
<scode>ID</scode><sname>Idaho</sname><capital idref="c1"/><cities-in idref="c1"/><cities-in idref="c3"/>……
</state>
<state id="s2">
<scode>NE</scode><sname>Nevada</sname><capital idref="c2"/><cities-in idref="c2"/>…….
</state>
….
</states>
38
<cities>
<city id="c1">
<ccode>BOI</ccode><cname>Boise</cname><state-of idref="s1"/>
</city>
<city id="c2">
<ccode>CCN</ccode><cname>Carson City</cname><state-of idref="s2"/>
</city>
<city id="c3">
<ccode>MOC</ccode><cname>Moscow</cname><state-of idref="s1"/>
</city>
…
</cities>
</geography>
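An XML parser does not follow id/idref links by itself; the application resolves them. The sketch below shows one way to do that in Python, on a cut-down document modeled on the geography example. The attribute names `id` and `idref` follow the slides; the dictionary `by_id` and the helper `deref` are illustrative.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<geography>
  <state id="s1"><sname>Idaho</sname><capital idref="c1"/></state>
  <city id="c1"><cname>Boise</cname><state-of idref="s1"/></city>
  <city id="c3"><cname>Moscow</cname><state-of idref="s1"/></city>
</geography>
""")

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

def deref(ref_elem):
    """Follow an idref attribute to the element it names."""
    return by_id[ref_elem.get("idref")]

capital_name = deref(by_id["s1"].find("capital")).findtext("cname")
cities_in_idaho = [c.findtext("cname") for c in doc.findall("city")
                   if deref(c.find("state-of")).get("id") == "s1"]

print(capital_name, cities_in_idaho)
```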
39
Ordering
person:{firstname: "John", lastname: "Smith"}
person:{lastname: "Smith", firstname: "John"}
As semistructured data (SSD), both are the same
40
These two are not the same as XML documents, because element order is significant
<person><firstname>John</firstname>
<lastname>Smith </lastname></person>
<person><lastname>Smith </lastname> <firstname>John</firstname></person>
The following two are equivalent, as attributes are not ordered
<person firstname="John" lastname="Smith"/>
<person lastname="Smith" firstname="John"/>
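The ordering distinction shows up directly in a parsed tree: child elements come back as an ordered sequence, attributes as an unordered mapping. A minimal sketch with Python's standard `xml.etree.ElementTree`; the helper `child_tags` is illustrative.

```python
import xml.etree.ElementTree as ET

def child_tags(xml_text):
    """Return the tags of the root's children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

elems_a = "<person><firstname>John</firstname><lastname>Smith</lastname></person>"
elems_b = "<person><lastname>Smith</lastname><firstname>John</firstname></person>"

attrs_a = ET.fromstring('<person firstname="John" lastname="Smith"/>').attrib
attrs_b = ET.fromstring('<person lastname="Smith" firstname="John"/>').attrib

print(child_tags(elems_a) == child_tags(elems_b))  # element order differs
print(attrs_a == attrs_b)                          # attribute order is irrelevant
```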
41
Mixing elements and Text
<Person>
This is my best friend
<Name> Alan </Name>
<Age> 42 </Age>
I am not too sure of the following email
<Email> [email protected] </Email >
</Person>
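Mixed content is visible to applications as text attached to the surrounding elements. The sketch below uses Python's `xml.etree.ElementTree`, which stores text before the first child in `.text` and text following a child in that child's `.tail`. The e-mail address `alan@example.org` is a placeholder, since the address is not shown in the slides.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<Person>This is my best friend"
    "<Name>Alan</Name><Age>42</Age>"
    "I am not too sure of the following email"
    "<Email>alan@example.org</Email></Person>"  # placeholder address
)

leading_text = person.text            # text before the first child
after_age = person.find("Age").tail   # text that follows </Age>
all_text = "|".join(person.itertext())

print(leading_text)
print(after_age)
print(all_text)
```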
42
<!-- this is a comment --> - Comments are allowed anywhere except inside markup, and are part of the document.
<?xml-stylesheet href="book.css" type="text/css"?> - Processing instruction (PI) for applications
<?xml version="1.0"?> - This is not a PI; it is not passed to the application.
<![CDATA[<start>this is an incorrect element </end>]]> - CDATA section; its content is not parsed as markup
<!DOCTYPE name [markupdeclarations]> - Document type declaration
Overall document layout:
<?xml….?>
<!DOCTYPE name [markupdeclarations]>
<name>…</name>
43
<db>
<person>
<name> Alan </name>
<age> 42 </age>
<email> [email protected] </email>
</person>
<person>… </person>
…
</db>
<!DOCTYPE db [
<!ELEMENT db (person*)>
<!ELEMENT person (name,age,email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
44
Recursion
<!ELEMENT node (leaf | (node,node))>
<!ELEMENT leaf (#PCDATA)>
An example of such an XML document is
<node>
<node>
<node> <leaf> 1 </leaf> </node>
<node> <leaf> 2 </leaf> </node>
</node>
<node>
<leaf> 3 </leaf>
</node>
</node>
45
<db>
<r1><a> a1 </a><b> b1 </b><c> c1 </c></r1>
<r1><a> a2 </a><b> b2 </b><c> c2 </c></r1>
<r2><c> c2 </c><d> d2 </d></r2>
<r2><c> c3 </c><d> d3 </d></r2>
<r2><c> c4 </c><d> d4 </d></r2>
</db>
46
<!DOCTYPE db [
<!ELEMENT db (r1*,r2*)>
<!ELEMENT r1 (a,b,c)>
<!ELEMENT r2 (c,d)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
]>
47
<!ELEMENT r2 ((c,d) | (d,c))>
<!ELEMENT db ((r1|r2)*)>
<!ELEMENT r1 (a,b?,c+)>
<!DOCTYPE db [<!ELEMENT …>…]>
<!DOCTYPE db SYSTEM "schema.dtd">
<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">
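A DTD content model such as (a,b?,c+) is a regular expression over child element names, so it can be checked with an ordinary regex engine. The sketch below is not a DTD validator: it handles only element-content models built from names, ",", "|", "?", "*", "+" and parentheses, and ignores #PCDATA, EMPTY and ANY. Python's standard library has no DTD validation, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str):
    """Translate a DTD element-content model into a regex over 'tag,' tokens."""
    def convert(match):
        token = match.group(0)
        if token == ",":
            return ""                              # a sequence is concatenation
        if token == "(":
            return "(?:"
        return "(?:" + re.escape(token) + ",)"     # an element name
    compact = "".join(model.split())               # drop white space
    return re.compile(re.sub(r"[\w.\-]+|,|\(", convert, compact))

def matches(model: str, xml_text: str) -> bool:
    """Check the root element's children against the content model."""
    children = "".join(child.tag + "," for child in ET.fromstring(xml_text))
    return model_to_regex(model).fullmatch(children) is not None

print(matches("(a,b?,c+)", "<r1><a/><c/><c/></r1>"))       # b is optional
print(matches("(a,b?,c+)", "<r1><a/><b/></r1>"))           # at least one c needed
print(matches("((c,d)|(d,c))", "<r2><d/><c/></r2>"))       # either order
print(matches("(r1*,r2*)", "<db><r2/><r1/></db>"))         # r1 must precede r2
print(matches("((r1|r2)*)", "<db><r2/><r1/><r1/></db>"))   # any interleaving
```

Each child tag is followed by a comma in the encoded string, so that a name such as `r1` cannot accidentally match a prefix of a longer name.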
48
<product>
<name language="French" department="music"> trompette six trous </name>
<price currency="Euro"> 420.12 </price>
</product>
<!ATTLIST name language CDATA #REQUIRED
department CDATA #IMPLIED>
<!ATTLIST price currency CDATA #IMPLIED>
49
IDREF – attribute’s value is some other element’s identifier
IDREFS – attribute’s value is a list of identifiers, separated by spaces
<!DOCTYPE family [
<!ELEMENT family (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person id ID #REQUIRED
mother IDREF #IMPLIED
father IDREF #IMPLIED
children IDREFS #IMPLIED>
]>
50
<family>
<person id="jane" mother="mary" father="john"><name> Jane Doe </name></person>
<person id="john" children="jane jack"><name> John Doe </name></person>
<person id="mary" children="jane jack"><name> Mary Smith </name></person>
<person id="jack" mother="mary" father="john"><name> Jack Smith </name></person>
</family>
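A validating parser checks that every IDREF or IDREFS value names an existing ID. That check can be sketched by hand in Python, whose standard library does not validate against DTDs. In the sample below the last person deliberately carries a dangling reference (mother="smith", which is a surname, not an ID) so that the check has something to report; the function name `dangling_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

family = ET.fromstring("""
<family>
  <person id="jane" mother="mary" father="john"><name>Jane Doe</name></person>
  <person id="john" children="jane jack"><name>John Doe</name></person>
  <person id="mary" children="jane jack"><name>Mary Smith</name></person>
  <person id="jack" mother="smith" father="john"><name>Jack Smith</name></person>
</family>
""")

ids = {p.get("id") for p in family.iter("person")}

def dangling_refs(root):
    """List (person id, attribute, value) for references that name no ID."""
    problems = []
    for p in root.iter("person"):
        for attr in ("mother", "father", "children"):
            # IDREFS is a space-separated list; an IDREF is a list of one.
            for ref in p.get(attr, "").split():
                if ref not in ids:
                    problems.append((p.get("id"), attr, ref))
    return problems

print(dangling_refs(family))
```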
51
<!DOCTYPE geography [
<!ELEMENT geography (state | city)*>
<!ELEMENT state (scode,sname,capital,cities-in*)>
<!ATTLIST state id ID #REQUIRED>
<!ELEMENT scode (#PCDATA)>
<!ELEMENT sname (#PCDATA)>
<!ELEMENT capital EMPTY>
<!ATTLIST capital idref IDREF #REQUIRED>
<!ELEMENT cities-in EMPTY>
<!ATTLIST cities-in idref IDREF #REQUIRED>
<!ELEMENT city (ccode,cname,state-of)>
<!ATTLIST city id ID #REQUIRED>
<!ELEMENT ccode (#PCDATA)>
<!ELEMENT cname (#PCDATA)>
<!ELEMENT state-of EMPTY>
<!ATTLIST state-of idref IDREF #REQUIRED>
]>
52
<capital idref="…"/>
<!DOCTYPE geography [
…
<!ELEMENT state (scode,sname,capital,cities-in)>
…
<!ELEMENT cities-in EMPTY>
<!ATTLIST cities-in idrefs IDREFS #REQUIRED>
…
]>
<!ATTLIST state id ID #REQUIRED
capital IDREF #REQUIRED
cities-in IDREFS #REQUIRED>
53
<?xml version="1.0"?>
<!DOCTYPE report [
<!ENTITY abstract SYSTEM "/u/abitebou/LEBOOK/abstract">
<!ENTITY content SYSTEM "/u/suciu/LEBOOK/lebook">
]>
<report>
<meta keywords="xml,www,web,semistructured" author="Abiteboul,Buneman,Suciu" date="25.12.98"/>
<title>Data on the web </title>
&abstract;
&content;
</report>
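Entity expansion can be tried out with an internal entity, which needs no external file. In this sketch the entity name `booktitle` is made up; a general entity declared in the DTD is referenced in content as `&name;`, and Python's standard `xml.etree.ElementTree` expands internal entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET

report_xml = """<?xml version="1.0"?>
<!DOCTYPE report [
<!ENTITY booktitle "Data on the web">
]>
<report><title>&booktitle;</title></report>"""

# The parser replaces &booktitle; with the declared replacement text.
title = ET.fromstring(report_xml).findtext("title")
print(title)
```

External entities, such as the SYSTEM entities in the report example, point at separate files and are typically not fetched by this parser.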
54
Limitations of DTD
• Imposes an order on subelements
• No notion of atomic types; for example, "age" could be an integer, but in a DTD it can only be PCDATA
• No constraints on values
• Does not constrain the type of IDREFs; state-of must be the identifier of a state element, while cities-in must refer to city elements
• Tag names are global: a name tag may correspond to both a class name and a student name
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Bibliography Entries</TITLE>
</HEAD>
<BODY>
<xsl:apply-templates/>
</BODY>
</HTML>
</xsl:template>
<xsl:template match="title">
<TD>
<xsl:value-of select="."/>
</TD>
</xsl:template>
<xsl:template match="author">
<TD>
<xsl:value-of select="."/>
</TD>
</xsl:template>
56
<xsl:template match="book">
<TR>
<xsl:apply-templates select="title"/>
<xsl:apply-templates select="author"/>
</TR>
</xsl:template>
<xsl:template match="bib">
<TABLE>
<TBODY>
<xsl:apply-templates/>
</TBODY>
</TABLE>
</xsl:template>
57
<HTML>
<HEAD>
<TITLE>Bibliography Entries</TITLE>
</HEAD>
<BODY>
<TABLE>
<TBODY>
<TR><TD>t1</TD><TD>a1</TD><TD>a2</TD></TR>
<TR><TD>t2</TD><TD>a3</TD><TD>a4</TD></TR>
<TR><TD>t3</TD><TD>a5</TD><TD>a6</TD><TD>a7</TD></TR>
</TBODY>
</TABLE>
</BODY>
</HTML>
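How the template rules produce this table can be imitated in plain Python. This is a hand-written emulation of the rule dispatch, not an XSLT processor (Python's standard library has none), and the input document is an assumption: the slides show only the output, so a `bib` element containing `book` elements with `title` and `author` children is inferred from the template names. Only the table part of the output is produced, for two books.

```python
import xml.etree.ElementTree as ET

def apply_templates(node):
    """Dispatch on the tag name, mimicking XSL template matching."""
    if node.tag == "bib":
        rows = "".join(apply_templates(child) for child in node)
        return f"<TABLE><TBODY>{rows}</TBODY></TABLE>"
    if node.tag == "book":
        # Titles first, then authors, as the book template selects them.
        cells = node.findall("title") + node.findall("author")
        return "<TR>" + "".join(apply_templates(c) for c in cells) + "</TR>"
    if node.tag in ("title", "author"):
        return f"<TD>{node.text}</TD>"
    return ""

# Assumed input: not shown in the slides.
bib = ET.fromstring(
    "<bib>"
    "<book><title>t1</title><author>a1</author><author>a2</author></book>"
    "<book><title>t2</title><author>a3</author><author>a4</author></book>"
    "</bib>"
)
html = apply_templates(bib)
print(html)
```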