Web Information Systems XML

Web Information Systems - 2015

XML

eXtensible Markup Language

W3C standard

Why?

Store and transport data

Easy data exchange

Create more languages

WSDL (Web Services Description Language)

RDF (Resource Description Framework)

RSS (Really Simple Syndication)

Self-describing data

Easy to learn

Must learn

3 Major Components

XML

XSL (eXtensible Stylesheet Language)

Style sheet language for XML documents

XSD (XML Schema Definition)

Describes the structure of an XML document

XML Document

<?xml version="1.0"?>

<!-- this is a sample -->

<note>

<to>Tove</to>

<from source="contacts">Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

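Parts of the sample:

XML declaration (written with processing-instruction syntax): <?xml version="1.0"?>

Comment: <!-- this is a sample -->

Element: <to>Tove</to>

Attribute: source="contacts"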
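As a rough illustration of how these parts appear to a program, the sketch below parses the sample note with Python's standard `xml.etree.ElementTree` module. The choice of Python and the variable names are assumptions made for illustration; the slides themselves do not prescribe a language here.

```python
import xml.etree.ElementTree as ET

# The sample note from the slide, with straight quotes around the attribute value.
NOTE_XML = """<?xml version="1.0"?>
<!-- this is a sample -->
<note>
  <to>Tove</to>
  <from source="contacts">Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# Each child element exposes its tag name, its text, and its attributes.
children = [(child.tag, child.text, dict(child.attrib)) for child in root]

for tag, text, attributes in children:
    print(tag, repr(text), attributes)
```

The default parser discards the comment and the XML declaration, so only the elements and their attributes are visible in the resulting tree.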

XML Documents

Well formed and Valid

Well formed

Must contain exactly one root element

Every start tag must have a corresponding end tag

Tags never overlap (wrong: <author><name> … </author></name>)

Attribute values must be quoted

Valid

Must be well formed and conform to the schema
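The well-formedness rules can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are hypothetical. It checks well-formedness only: validating against a schema needs a separate tool, which the Python standard library does not include.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One sample per rule on the slide (the first one obeys all of them).
SAMPLES = {
    "properly nested": "<author><name>Jani</name></author>",
    "overlapping tags": "<author><name>Jani</author></name>",
    "missing end tag": "<note><to>Tove</note>",
    "two root elements": "<note/><note/>",
    "unquoted attribute": "<from source=contacts>Jani</from>",
}

results = {label: is_well_formed(xml) for label, xml in SAMPLES.items()}

for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```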

XML Documents

Has tree structure

Tags are case sensitive

<name> is different from <Name>

Comments

<!-- this is a comment -->

XML Elements

Can contain

Other elements

Text

Attributes

Valid names

<name>, <first_name>, <first2names>

Invalid names

<2nd_name>, <$amount>, <first name>
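One quick way to test the names above is to let a parser judge them. This sketch wraps each candidate in an empty element and checks whether Python's `xml.etree.ElementTree` accepts it. The helper `is_valid_element_name` is hypothetical, and it is a shortcut rather than a full implementation of the XML name grammar (for example, a name containing a colon is rejected here because the prefix is not bound to a namespace).

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name: str) -> bool:
    """Return True if <name/> parses and the parsed tag equals the name."""
    try:
        element = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guards against input that parses as a shorter tag plus attributes.
    return element.tag == name


valid_names = ["name", "first_name", "first2names"]
invalid_names = ["2nd_name", "$amount", "first name"]

valid_results = [is_valid_element_name(n) for n in valid_names]
invalid_results = [is_valid_element_name(n) for n in invalid_names]

print(valid_results)
print(invalid_results)
```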

XML elements and Attributes

Data goes as elements

<person><name>john</name></person>

Metadata goes as attributes

<image type='gif'><name>graph.gif</name></image>
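To make the data versus metadata split concrete, the sketch below builds the image example programmatically with Python's `xml.etree.ElementTree`, putting the file type in an attribute and the file name in a child element. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Metadata (the image type) goes into an attribute of the element.
image = ET.Element("image", {"type": "gif"})

# Data (the file name) goes into a child element's text.
name = ET.SubElement(image, "name")
name.text = "graph.gif"

serialized = ET.tostring(image, encoding="unicode")
print(serialized)
```

Reading works the same way in reverse: `image.get("type")` returns the attribute value and `image.findtext("name")` returns the element text.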

1.0 vs 1.1

• 1.0 – everything not permitted is forbidden

• 1.1 – everything not forbidden is permitted

• 1.0 is compatible with 1.1, not vice versa

• Forward compatible

• Does not affect English-language documents

XML Namespaces

There can be common elements in multiple domains

File in hardware and office

<file>

<length>18</length>

<price>3.69</price>

</file>

<file>

<content>Employee data</content>

<numberOfPages>25</numberOfPages>

</file>

XML Namespaces

How to distinguish?

Solution: namespaces

<h:file xmlns:h="http://www.hardware.com/">

<h:length>18</h:length>

<h:price>3.69</h:price>

</h:file>

<o:file xmlns:o="http://www.office.com/people">

<o:content>Employee data</o:content>

<o:numberOfPages>25</o:numberOfPages>

</o:file>
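A short sketch of how a program tells the two `file` elements apart once they are namespaced, using Python's `xml.etree.ElementTree`. The wrapper element `items` is an assumption added only so that both fragments fit in one well-formed document; the prefixes and namespace URIs are the ones from the slide.

```python
import xml.etree.ElementTree as ET

# Both fragments from the slide, wrapped in a hypothetical <items> root.
DOC = """<items>
  <h:file xmlns:h="http://www.hardware.com/">
    <h:length>18</h:length>
    <h:price>3.69</h:price>
  </h:file>
  <o:file xmlns:o="http://www.office.com/people">
    <o:content>Employee data</o:content>
    <o:numberOfPages>25</o:numberOfPages>
  </o:file>
</items>"""

# Prefix-to-URI map used for lookups; the URI, not the prefix, is what matters.
NS = {
    "h": "http://www.hardware.com/",
    "o": "http://www.office.com/people",
}

root = ET.fromstring(DOC)

hardware_file = root.find("h:file", NS)
office_file = root.find("o:file", NS)

length = hardware_file.find("h:length", NS).text
pages = office_file.find("o:numberOfPages", NS).text

# Internally the parser stores each tag as "{namespace-uri}local-name".
print(hardware_file.tag, length)
print(office_file.tag, pages)
```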

XML Parsers

• A piece of software that reads the content of an XML document and presents it to the application

• Implementing an XML parser

• Java way

• SAX (Simple API for XML)

• DOM (Document Object Model)

• StAX (Streaming API for XML)
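The slides name the Java APIs. As a language-neutral illustration of the two oldest models, the sketch below uses Python's standard-library counterparts, `xml.sax` (push, event callbacks) and `xml.dom.minidom` (in-memory tree), not the Java classes themselves. The handler class `TagCollector` and the shortened note document are hypothetical.

```python
import xml.sax
from xml.dom import minidom

NOTE_XML = '<note><to>Tove</to><from source="contacts">Jani</from></note>'


class TagCollector(xml.sax.ContentHandler):
    """SAX handler: the parser pushes events to these callbacks."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Called once for every start tag, in document order.
        self.tags.append(name)


# SAX: stream through the document; nothing is kept unless the handler keeps it.
handler = TagCollector()
xml.sax.parseString(NOTE_XML.encode("utf-8"), handler)
sax_tags = handler.tags

# DOM: load the whole document into a tree, then navigate it freely.
dom = minidom.parseString(NOTE_XML)
sender = dom.getElementsByTagName("from")[0]
dom_source = sender.getAttribute("source")
dom_text = sender.firstChild.data

print(sax_tags)
print(dom_source, dom_text)
```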

XML Parsers

XML Parser Demos

XML Parsers

Feature                     StAX              SAX               DOM
API type                    Pull, streaming   Push, streaming   In-memory tree
Ease of use                 High              Medium            High
XPath capability            No                No                Yes
CPU and memory efficiency   Good              Good              Varies
Forward only                Yes               Yes               No
Read XML                    Yes               Yes               Yes
Write XML                   Yes               No                Yes
CRUD                        No                No                Yes
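The pull model in the StAX column means the application asks for the next event instead of receiving callbacks. StAX itself is a Java API; as a stand-in, this sketch uses Python's `xml.etree.ElementTree.XMLPullParser`, which follows the same pull idea. The two-chunk input is an assumption chosen to show that data can arrive incrementally.

```python
import xml.etree.ElementTree as ET

# Ask the parser to report both start and end events.
parser = ET.XMLPullParser(events=("start", "end"))

# Feed the document in two chunks, as if it were arriving from a stream.
parser.feed("<bookstore><book><title>Harry Potter</title>")
parser.feed("</book></bookstore>")
parser.close()  # signal end of input so all pending events are available

# The application pulls events when it is ready for them (forward only).
events = [(event, element.tag) for event, element in parser.read_events()]

for event, tag in events:
    print(event, tag)
```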

XSL

• XSL is a language for expressing stylesheets

• eXtensible Stylesheet Language

• XSLT (XSL Transformations)

• XPath

• XSL-FO (XSL Formatting Objects): an XML vocabulary for specifying formatting semantics

XPath

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>

<title lang="en">Harry Potter</title>

<price>29.99</price>

</book>

<book>

<title lang="en">Learning XML</title>

<price>39.95</price>

</book>

</bookstore>

/bookstore: selects the root element bookstore

/bookstore/book: selects every book element that is a child of bookstore

//book: selects every book element, wherever it appears in the document
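These path expressions can be tried out with Python's `xml.etree.ElementTree`, with one caveat: that module supports only a subset of XPath, and its paths are relative to the element they are called on, so `/bookstore/book` becomes `book` evaluated on the root and `//book` becomes `.//book`. The sketch below is written under that assumption and is not a full XPath engine.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

# /bookstore : the root element returned by the parser
root = ET.fromstring(BOOKSTORE_XML)

# /bookstore/book : book elements that are direct children of the root
child_books = root.findall("book")

# //book : book elements at any depth below the root
all_books = root.findall(".//book")

titles = [book.findtext("title") for book in child_books]
prices = [float(book.findtext("price")) for book in all_books]

print(root.tag, titles, prices)
```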

XPath

• A few examples

How to refer to the body element? /note/body (the leading '/' means the document root)

How to get the source attribute? /note/from/@source

How to get all elements with a source attribute? //*[@source]
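The three questions above can be answered in code as follows. This is a sketch with Python's `xml.etree.ElementTree`, whose XPath subset cannot return attribute nodes directly, so `/note/from/@source` is approximated by finding the element and then reading its attribute. The shortened note document is an assumption based on the earlier sample.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from source="contacts">Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# /note/body : the body child of the root element
body_text = root.find("body").text

# /note/from/@source : locate the element, then read the attribute
source = root.find("from").get("source")

# //*[@source] : every descendant element that carries a source attribute
with_source = [element.tag for element in root.findall(".//*[@source]")]

print(body_text)
print(source)
print(with_source)
```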

XSLT

• A language to convert XML documents to other formats

• W3C Recommendation

• Uses XPath
