56
XML eXtensible Markup Language Prof. Muhammad Saeed

XML eXtensible Markup Language Prof. Muhammad Saeed

Embed Size (px)

Citation preview

Page 1: XML eXtensible Markup Language Prof. Muhammad Saeed

XMLeXtensible Markup Language

Prof. Muhammad Saeed

Page 2: XML eXtensible Markup Language Prof. Muhammad Saeed

Truly Portable Data Easily readable by human users Very expressive (semantics near data) Very flexible and customizable (no finite tag set) Easy to use from programs (libs available) Easy to convert into other representations Many additional standards and tools Widely used and supported

Advantages

Page 3: XML eXtensible Markup Language Prof. Muhammad Saeed

XML Basics

Page 4: XML eXtensible Markup Language Prof. Muhammad Saeed

XML BasicsBasic Text

<?xml version = “1.0”?><!-- This is Student Data Xml File Student.xml --><student>

<Name><FirstName> Aaliya </FirstName><LastName> Shaheen </LastName>

</Name><Department> Computer Science </Department><Age> 18.5 </Age>

</student>Processing XML Document ( parsers, processor)Validating XML Document

Document Type Definition, DTD W3C XML Schema

Page 5: XML eXtensible Markup Language Prof. Muhammad Saeed

(Freely definable) tags: student, Name, FirstName,Age, .... with start tag: < student > etc. and end tag: </ student > etc.

Elements: < student > ... </ student > Elements have a name (student) and a content (...) Elements may be nested. Elements may be empty: <this_is_empty/> Element content is typically parsed character data (PCDATA), i.e., strings

with special characters, and/or nested elements (mixed content if both). Each XML document has exactly one root element and forms a tree. Elements with a common parent are ordered.

XML Basics(Tags and Elements)

Page 6: XML eXtensible Markup Language Prof. Muhammad Saeed

<CATALOG><CD>

<TITLE>Nayyara Sings Faiz</TITLE><ARTIST>Nayyara Noor</ARTIST><COUNTRY>Pakistan</COUNTRY><COMPANY>EMI</COMPANY><PRICE>250.00</PRICE><YEAR>1976</YEAR>

</CD><CD>

<TITLE>A Tribute To Faiz Ahmed Faiz</TITLE><ARTIST>Iqbal Bano</ARTIST><COUNTRY>Pakistan</COUNTRY><COMPANY>EMI</COMPANY><PRICE>300.00</PRICE><YEAR>1990</YEAR>

</CD></CATALOG>

XML Example(Elements)

Page 7: XML eXtensible Markup Language Prof. Muhammad Saeed

<?xml version="1.0" encoding="ISO-8859-1"?><note> <to>VC</to> <from>Chairperson</from> <heading>Reminder</heading> <body>Department Meeting on Nov. 11, 2013!</body></note>

<?xml version="1.0" encoding=“UTF-8"?>UTF is Universal character set Transformation Format

XML Another Example

Page 8: XML eXtensible Markup Language Prof. Muhammad Saeed

<person gender="female"> <firstname>Natasha</firstname> <lastname>Ahmed</lastname></person>

XML Attribute

Page 9: XML eXtensible Markup Language Prof. Muhammad Saeed

Elements may have attributes (in the start tag) that have a name and

a value, e.g. <section number=“1“>.

What is the difference between elements and attributes?• Only one attribute with a given name per element (but an arbitrary

number of subelements)• Attributes have no structure, simply strings (while elements can

have subelements)

As a rule of thumb:• Content into elements• Metadata into attributes

Example:<person born=“1912-06-23“ died=“1954-06-07“>

Alan Turing</person> proved that…

Page 10: XML eXtensible Markup Language Prof. Muhammad Saeed

Elements may have attributes (in the start tag) that have a name and

a value, e.g. <section number=“1“>.

What is the difference between elements and attributes?• Only one attribute with a given name per element (but an arbitrary

number of subelements)• Attributes have no structure, simply strings (while elements can

have subelements)

As a rule of thumb:• Content into elements Attributes• Metadata into attributes

Example:<person born=“1912-06-23“ died=“1954-06-07“>

Alan Turing</person> proved that…

Page 11: XML eXtensible Markup Language Prof. Muhammad Saeed

Common Errors in XML Files Placing whitespace character before the XML Declaration. Omitting the start tag or its end tag. Using different cases for start and end tags. Using a whitespace character in an XML element name. Nesting XML tags improperly.

Page 12: XML eXtensible Markup Language Prof. Muhammad Saeed

XML Namespaces<movie: SecondWorldWar> Allies </movie: SecondWorldWar><article: SecondWorldWar> Allies </article: SecondWorldWar>

<dbs:book xmlns:dbs=“http://www-dbs/dbs“>

Unique URI to identify the namespace

Signal that namespace definition happens

Prefix as abbrevation of URI

Page 13: XML eXtensible Markup Language Prof. Muhammad Saeed

<root xmlns:h="http://www.w3.org/TR/html5/"xmlns:f="http://www.w3schools.com/Furniture">

<h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr></h:table>

<f:table> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length></f:table>

</root>

Page 14: XML eXtensible Markup Language Prof. Muhammad Saeed

XML Default Namespaces<table xmlns="http://www.w3.org/TR/html4/"> <tr> <td>Apples</td> <td>Bananas</td> </tr></table>

<table xmlns="http://www.w3schools.com/furniture"> <name>African Coffee Table</name> <width>80</width> <length>120</length></table>

Page 15: XML eXtensible Markup Language Prof. Muhammad Saeed

<bookstore> <book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book></bookstore>

XML Tree Structure

Page 16: XML eXtensible Markup Language Prof. Muhammad Saeed

article

author title text

sectionabstract

The index

Areas

Can be traced …

title=“…“

number=“1“

The rooy cause of terrorism ....

Faiza

The Torrorist Phenomenon

Write XML for Tree Structure

Page 17: XML eXtensible Markup Language Prof. Muhammad Saeed

address

name email phone birthday

first last year month day

Page 18: XML eXtensible Markup Language Prof. Muhammad Saeed

Some special characters must be escaped using entities:< → &lt;& → &amp;(will be converted back when reading the XML doc)

Some other characters may be escaped, too:> → &gt;“ → &quot;‘ → &apos;

XML Syntax +

Page 19: XML eXtensible Markup Language Prof. Muhammad Saeed

ValidityDTDSchema

Page 20: XML eXtensible Markup Language Prof. Muhammad Saeed

A well-formed document has a tree structure and obeys all the XML rules.

A particular application may add more rules in either a DTD (document type definition) or in a schema.

Many specialized DTDs and schemas have been created to describe particular areas.

These range from disseminating news bulletins (RSS) to chemical formulas.

DTDs were developed first, so they are not as comprehensive as schema.

Validity

Page 21: XML eXtensible Markup Language Prof. Muhammad Saeed

Sometimes XML is too flexible:

• Most Programs can only process a subset of all possible XML applications

• For exchanging data, the format (i.e., elements, attributes and their semantics) must be fixed

Document Type Definitions (DTD) for establishing the vocabulary for one XML application (in some sense comparable to schemas in databases)

A document is valid with respect to a DTD if it conforms to the rules specified in that DTD.

Most XML parsers can be configured to validate.

Document Type Definitions

Page 22: XML eXtensible Markup Language Prof. Muhammad Saeed

<!ELEMENT article (title,author+,text)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT author (#PCDATA)>

<!ELEMENT text (abstract,section*,literature?)>

<!ELEMENT abstract (#PCDATA)>

<!ELEMENT section (#PCDATA|index)+>

<!ELEMENT literature (#PCDATA)>

<!ELEMENT index (#PCDATA)>

Content of the title element is parsed character data

Content of the article element is a title element, followed by one or more author elements,followed by a text element

Content of the text element may contain zero or more section elements in this position

DTD Example

Page 23: XML eXtensible Markup Language Prof. Muhammad Saeed

Element Declarations in DTDsOne element declaration for each element type:<!ELEMENT element_name content_specification>

where content_specification can be (#PCDATA) parsed character data (child) one child element (c1,…,cn) a sequence of child elements c1…cn (c1|…|cn)one of the elements c1…cn

For each component c, possible counts can be specified: c exactly one such element c+ one or more c* zero or more c? zero or one

Plus arbitrary combinations using parenthesis:<!ELEMENT f ((a|b)*,c+,(d|e))*>

Page 24: XML eXtensible Markup Language Prof. Muhammad Saeed

Elements with mixed content: <!ELEMENT text (#PCDATA|index|cite|glossary)*>

Elements with empty content: <!ELEMENT image EMPTY>

Elements with arbitrary content (this is nothing for production-level DTDs):

<!ELEMENT thesis ANY>

Element Declarations in DTDs

Page 25: XML eXtensible Markup Language Prof. Muhammad Saeed

Attributes are declared per element:<!ATTLIST section number CDATA #REQUIRED

title CDATA #REQUIRED>

element name

attribute name

attribute type

attribute default

Attribute Declarations in DTDs

Page 26: XML eXtensible Markup Language Prof. Muhammad Saeed

Attributes are declared per element:<!ATTLIST section number CDATA #REQUIRED

title CDATA #REQUIRED>

declares two required attributes for element section.

Possible attribute defaults: #REQUIRED is required in each element instance #IMPLIED is optional #FIXED default always has this default value default has this default value if the attribute is

omitted from the element instance

Attribute Declarations in DTDs

Page 27: XML eXtensible Markup Language Prof. Muhammad Saeed

CDATA string data (A1|…|An) enumeration of all possible values of the

attribute (each is XML name) ID unique XML name to identify the element IDREF refers to ID attribute of some other element

(„intra-document link“) IDREFS list of IDREF, separated by white space plus some more

Attribute Types in DTDs

Page 28: XML eXtensible Markup Language Prof. Muhammad Saeed

<ATTLIST publication type (journal|inproceedings) #REQUIRED

pubid ID #REQUIRED>

<ATTLIST cite cid IDREF #REQUIRED>

<ATTLIST citation ref IDREF #IMPLIED

cid ID #REQUIRED>

<publications>

<publication type=“journal“ pubid=“Weikum01“>

<author>Gerhard Weikum</author>

<text>In the Web of 2010, XML <cite cid=„12“/>...</text>

<citation cid=„12“ ref="XML98“/>

<citation cid=„15“>...</citation>

</publication>

<publication type=“inproceedings“ pubid=“XML98“>

<text>XML, the extended Markup Language, ...</text>

</publication>

</publications>

Attribute Example

Page 29: XML eXtensible Markup Language Prof. Muhammad Saeed

<ATTLIST publication type (journal|inproceedings) #REQUIRED

pubid ID #REQUIRED>

<ATTLIST cite cid IDREF #REQUIRED>

<ATTLIST citation ref IDREF #IMPLIED

cid ID #REQUIRED>

<publications>

<publication type=“journal“ pubid=“Weikum01“>

<author>Gerhard Weikum</author>

<text>In the Web of 2010, XML <cite cid=„12“/>...</text>

<citation cid=„12“ ref=„XML98“/>

<citation cid=„15“>...</citation>

</publication>

<publication type=“inproceedings“ pubid=“XML98“>

<text>XML, the extended Markup Language, ...</text>

</publication>

</publications>

Attribute Example

Page 30: XML eXtensible Markup Language Prof. Muhammad Saeed

Document Type Declaration in the XML document:<!DOCTYPE article SYSTEM “http://www-dbs/article.dtd“>

keywords

Root element URI for the DTD

Linking DTD and XML Docs

Page 31: XML eXtensible Markup Language Prof. Muhammad Saeed

Internal DTD:<?xml version=“1.0“?>

<!DOCTYPE article [

<!ELEMENT article (title,author+,text)>

...

<!ELEMENT index (#PCDATA)>

]>

<article>

...

</article>

Both ways can be mixed, internal DTD overwrites external entity information:<!DOCTYPE article SYSTEM „article.dtd“ [

<!ENTITY % pub_content (title+,author*,text)

]>

Linking DTD and XML Docs

Page 32: XML eXtensible Markup Language Prof. Muhammad Saeed

No support for basic data types like integers, doubles, dates, times, …

No structured, self-definable data types No type derivation id/idref links are quite loose (target is not specified)

XML Schema

Flaws of DTDs

Page 33: XML eXtensible Markup Language Prof. Muhammad Saeed

XML Schema is an XML application Provides simple types (string, integer, dateTime,

duration, language, …) Allows defining possible values for elements Allows defining types derived from existing types Allows defining complex types Allows posing constraints on the occurrence of

elements Allows forcing uniqueness and foreign keys

XML Schema Basics

Page 34: XML eXtensible Markup Language Prof. Muhammad Saeed

<xs:schema> <xs:element name=“article“> <xs:complexType> <xs:sequence> <xs:element name=“author“ type=“xs:string“/> <xs:element name=“title“ type=“xs:string“/> <xs:element name=“text“> <xs:complexType> <xs:sequence> <xs:element name=“abstract“ type=“xs:string“/>

<xs:element name=“section“ type=“xs:string“ minOccurs=“0“ maxOccurs=“unbounded“/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element></xs:schema>

Simplified XML Schema Example

Page 35: XML eXtensible Markup Language Prof. Muhammad Saeed

XML QueryXpathXQuery

Page 36: XML eXtensible Markup Language Prof. Muhammad Saeed

XPath and XQuery are query languages for XML data, bothstandardized by the W3C and supported by various database products.Their search capabilities include logical conditions over element and attribute content (first-order predicate logic a la SQL; simple conditions only in XPath) regular expressions for pattern matching of element names along paths or subtrees within XML data+ joins, grouping, aggregation, transformation, etc. (XQuery only)

In contrast to database query languages like SQL an XML querydoes not necessarily (need to) know a fixed structural schema for the underlying data.A query result is a set of qualifying nodes, paths, subtrees,or subgraphs from the underyling data graph,or a set of XML documents constructed from this raw result.

Querying XML with XPath and XQuery

Page 37: XML eXtensible Markup Language Prof. Muhammad Saeed

• XPath is a simple language to identify parts of the XML document (for further processing)

• XPath operates on the tree representation of the document

• Result of an XPath expression is a set of elements or attributes

• Discuss abbreviated version of XPath

XPath

Page 38: XML eXtensible Markup Language Prof. Muhammad Saeed

An XPath expression usually is a location path that consists of location steps, separated by /:/article/text/abstract: selects all abstract elements

A leading / always means the root element Each location step is evaluated in the context of a node

in the tree, the so-called context node Possible location steps:

child element x: select all child elements with name xAttribute @x: select all attributes with name xWildcards * (any child), @* (any attribute) Multiple matches, separated by |: x|y|z

Elements of XPath

Page 39: XML eXtensible Markup Language Prof. Muhammad Saeed

Standard: / (context node is the result of the preceding location step)article/text/abstract (all the abstract nodes of articles)

Select any descendant, not only children: //article//index (any index element in articles)

Select the parent element: .. Select the content node: .

The latter two are important when using predicates.

Combining Location Steps

Page 40: XML eXtensible Markup Language Prof. Muhammad Saeed

• Added with [] to the location step

• Used to restricts elements that qualify as result of a location step to those that fulfil the predicate:– a[b] elements a that have a subelement b– a[@d] elements a that have an attribute d

– Plus conditions on content/value:• a[b=„c“]• A[@d>7]• <, <=, >=, !=, …

Predicates in Location Steps

Page 41: XML eXtensible Markup Language Prof. Muhammad Saeed

/literature/book/author retrieves all book authors:starting with the root, traverses the tree, matches elementnames literature, book, author, and returns elements<author>Suciu, Dan</author>,<author>Abiteboul, Serge</author>, ...,<author><firstname>Jeff</firstname> <lastname>Ullman</lastname></author>

/literature/*/author authors of books, articles, essays, etc.

/literature//author authors that are descendants of literature

/literature//@year value of the year attribute of descendants of literature

/literature//author[firstname] authors that have a subelement firstname

/literature/(book|article)/author authors of books or articles

/literature/book[price < „50“]

/literature/book[author//country = „Germany“]

low priced books

books with German author

XPath by Example

Page 42: XML eXtensible Markup Language Prof. Muhammad Saeed

XQuery is an extremely powerful query language for XML data.A query has the form of a so-called FLWR(For-Let-Where-Order-Return) expression:

FOR $var1 IN expr1, $var2 IN expr2, ...LET $var3 := expr3, $var4 := expr4, ...WHERE conditionRETURN result-doc-construction

The FOR clause evaluates expressions (which may be XPath-stylepath expressions) and binds the resulting elements to variables.For a given binding each variable denotes exactly one element.

The LET clause binds entire sequences of elements to variables.

The WHERE clause evaluates a logical condition with each ofthe possible variable bindings and selects those bindings thatsatisfy the condition.

The RETURN clause constructs, from each of the variable bindings,an XML result tree. This may involve grouping and aggregationand even complete subqueries.

Xquery, Basic Concepts

Page 43: XML eXtensible Markup Language Prof. Muhammad Saeed

// find Web-related articles by Dan Suciu from the year 1998

<results> {FOR $a IN document(“literature.xml“)//article FOR $n IN $a//author, $t IN $a/title WHERE $a/@year = “1998“ AND contains($n, “Suciu“) AND contains($t, “Web“) RETURN <result> $n $t </result> } </results>

// find articles co-authored by authors who have jointly written a book after 1995

<results> {FOR $a IN document(“literature.xml“)//article FOR $a1 IN $a//author, $a2 IN $a//author WHERE SOME $b IN document(“literature.xml“)//book SATISFIES $b//author = $a1 AND $b//author = $a2 AND $b/@year>“1995“ RETURN <result> $a1 $a2 <wrote> $a </wrote> </result> }

</results>

XQuery Examples

Page 44: XML eXtensible Markup Language Prof. Muhammad Saeed

Transforming XML File

Page 45: XML eXtensible Markup Language Prof. Muhammad Saeed

XSLT is used to transform one xml document into another, often an html document.

The Transform classes are now part of Java 1.4. A program is used that takes as input one xml

document and produces as output another. If the resulting document is in html, it can be viewed

by a web browser. This is a good way to display xml data.

XSLT (Extensible Stylesheet Language Transformations)

Page 46: XML eXtensible Markup Language Prof. Muhammad Saeed

<?xml version=“1.0”/>

<address>

<name>Alice Lee</name>

<email>[email protected]</email>

<phone>212-346-1234</phone>

<birthday>1985-03-22</birthday>

</address>

<?xml version="1.0" encoding="ISO-8859-1"?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="address"> <html><head><title>Address Book</title></head> <body>

<xsl:value-of select="name"/><br/><xsl:value-of select="email"/><br/><xsl:value-of select="phone"/><br/><xsl:value-of select="birthday"/>

</body> </html> </xsl:template>

</xsl:stylesheet>

A Style Sheet to Transform address.xml

address.xml

ResultAlice Lee

[email protected]

212-346-1234

1985-03-22

Page 47: XML eXtensible Markup Language Prof. Muhammad Saeed

Parsers

Page 48: XML eXtensible Markup Language Prof. Muhammad Saeed

There are two principal models for parsers. SAX – Simple API for XML

Uses a call-back method Similar to javax listeners

DOM – Document Object Model Creates a parse tree Requires a tree traversal

Parsers

Page 49: XML eXtensible Markup Language Prof. Muhammad Saeed

XML in C#

Page 50: XML eXtensible Markup Language Prof. Muhammad Saeed

The .NET XML Classes The parent class for the nodes found in an XML data stream is

called XmlNode Depending upon the specific type of node the XmlNode class has six

derived classes. System.Xml.XmlLinkedNode System.Xml.XmlAttribute System.Xml.XmlDocument System.Xml.XmlDocumentFragment System.Xml.XmlEntity System.Xml.XmlNotation

Page 51: XML eXtensible Markup Language Prof. Muhammad Saeed

The .NET XML Classes

Page 52: XML eXtensible Markup Language Prof. Muhammad Saeed

XmlTextWriter xtw = new XmlTextWriter(@"D:\firstXml.xml",null); xtw.Formatting = Formatting.Indented; xtw.Indentation = 3; xtw.WriteStartDocument(); xtw.WriteComment("This is my XML File"); xtw.WriteEndDocument(); xtw.WriteStartElement("One"); xtw.WriteEndElement(); xtw.Flush(); xtw.Close();

XML Document Writer Class

Page 53: XML eXtensible Markup Language Prof. Muhammad Saeed

XmlTextReader xtr = new XmlTextReader(@"D:\MyXml.xml",null);xtr.WhitespaceHandling=WhitespaceHandling.All;if(xtr.Read()==true){

string s1 = xtr.NodeType.ToString();string s2= xtr.Name.ToString();String s3=xtr.Value.ToString();

} xtr.Close();

XML Document Reader Class

Page 54: XML eXtensible Markup Language Prof. Muhammad Saeed

Document Object Model(DOM)XmlDocument doc1=new Document();doc1.Load(@”D:\myFile.xml”);doc1.Save(@”d:\designdata.xml”);

Page 55: XML eXtensible Markup Language Prof. Muhammad Saeed

csc XMLFile.cs /doc:XMLFile.xml

Compilation of C# File to XML File

Page 56: XML eXtensible Markup Language Prof. Muhammad Saeed

THE END