XML – eXtensible Markup Language

The World Wide Web and What We Would Like to Do with It

• XML has a lot of hype surrounding it

• This week we discuss:

– Why XML is needed

– Basic technologies used together with XML

• In the next few weeks: challenges in using XML

XML in One Slide

• Basically, XML looks like HTML.

• However, in XML, you can use any tag names that you want

• Example:

<person>
  <name> Lisa Simpson</name>
  <tel> 02-828-1234 </tel>
  <tel> 054-470-777 </tel>
  <email> [email protected] </email>
</person>
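As a rough illustration of why free tag names are useful, the sketch below reads a record like the one above with Python's standard `xml.etree.ElementTree` module. The e-mail value is a placeholder (the address is not readable in the slide), and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Tag names such as <person> and <tel> are chosen by the author of the
# document; XML itself does not predefine them.
# The e-mail address is a placeholder, not a value taken from the slide.
doc = """
<person>
  <name> Lisa Simpson</name>
  <tel> 02-828-1234 </tel>
  <tel> 054-470-777 </tel>
  <email> lisa@example.org </email>
</person>
"""

person = ET.fromstring(doc)

# Because the tags name the data, a program can ask for it directly.
name = person.findtext("name").strip()
phones = [tel.text.strip() for tel in person.findall("tel")]

print(name)
print(phones)
```

The program never has to guess which piece of text is the name or a phone number; the tag names say so.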

Is that all? Big Deal?!

Motivation (1): The Semantic Web

Example 1: A Homepage on the Web

Tom Sawyer's Homepage

Tom's Friends

Tom's Hobbies:

• Boating on the Mississippi River

• Chewing Gum

• Painting the Fence

Web Pages are Written in HTML

• HTML is a markup language

• An HTML page consists of tags with attributes and data

• HTML describes the style of the page (e.g., color, font type, etc.)

<html> <body>

<h1>Tom Sawyer's Homepage</h1> <img src="tom.jpg">

Hi'ya all. Did you know that my best friend is <b>Huckleberry Finn</b>? Sometimes, I like <b>Becky Thatcher</b>?

<p> <font color = "red">

Here are some of my hobbies:

<ul>

<li> Boating on the Mississippi River

<li> Chewing gum

<li> Painting the fence

</ul>

</font>

If you want to discuss common interests, contact me at

<a href="mailto:[email protected]">[email protected]</a>

</body></html>

Automatically Using Information

• Tom Sawyer has a homepage. So do a lot of other people. It would be nice to be able to do the following things automatically (via a computer program):

– Querying the Page: Find Tom Sawyer's email address and the names of his friends

– Querying Similar Pages: Find people who have interests in common with Tom Sawyer

Automatically Using Information

• Site Personalization: Tom Sawyer's interests should be automatically recognized by sites

– When Tom Sawyer enters Amazon, he should get "book recommendations" that match his interests

– When Tom Sawyer enters a site that sells food, he should be told about sales on gum

– This should all happen without Tom having to tell every site about his interests

Can we Automatically use the Information?

• In order to perform the tasks described before, we have to:

– Find web pages that describe people

– Extract the relevant information

• Problems:

– How can we know if a page describes a person?

– How can we know what to extract? (Everyone has their own style for their homepage...)

– How can we "understand" the extracted information (What parts of the page describe which information?)

Example 2: Weather Forecasting

[Screenshot: National Weather Service: Weather Forecasting and Weather Alerts, showing "Flood Alerts in Mississippi"]

Wouldn't it be great if…

Wouldn't it be great if Tom could get automatic updates of weather problems in Mississippi? It is dangerous to go boating if there are floods…

Example 3: News Alerts

[Screenshot: Yahoo News, showing "Traffic Jam in the Mississippi River"]

Wouldn't it be great if…

Wouldn't it be great if Tom could get automatic updates of important news related to Mississippi? He might want to choose a different river to go boating…

Can these things be done?

• Once again, we need to FIND the relevant pages and EXTRACT the relevant data

• HTML pages are constantly changing

• How can we figure out what data is relevant and what the data is talking about automatically? (even when the page changes)

• HTML describes only style and not meaning (or semantics)

It is difficult (perhaps impossible) to perform these tasks

Two Basic Approaches

• If the information on the Web were neatly organized in a huge database, these problems could be solved.

But it's not – What should we do?

• AI, NLP Approach: Use smart techniques to recognize information, e.g., recognize patterns about how things are written

• DB Approach: Turn the Web into a “database” by writing it in XML

The Semantic Web

• The Semantic Web is a machine-understandable Web

• The meaning of data (i.e., the semantics of data) should be encoded together with the data

• Tim Berners-Lee, the inventor of the Web (by putting together the ideas of hyper-text, TCP/IP, DNS) is one of the main people behind the Semantic Web

Main Technologies Needed

• XML: The syntax for marking up text with meaning

• RDF: Defines objects and relationships between them

• OWL: Defines ontologies which connect different concepts (e.g., a car is an automobile, a car is a type of vehicle)

• Web Services: Allow services given online to be accessed programmatically

Here is a simplified version of how it could work:

<Person>
  <name>Thomas Sawyer</name>
  <gender>Male</gender>
  <mbox resource="mailto:[email protected]"/>
  <picture resource="http://www.cs.huji.ac.il/~sarina/tom.jpg"/>
  <speaks>English</speaks>
  <interest resource="Boating on the Mississippi"/>
  <interest resource="Chewing Gum" />
  <knows>
    <Person>
      <name>Huckleberry Finn</name>
      <mbox resource="mailto:[email protected]"/>
    </Person>
  </knows>
</Person>

Simplified version of the FOAF standard
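With markup like this, the two queries posed earlier (Tom's email address and the names of his friends) become short lookups. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `example.org` addresses are placeholders, and the element names follow the simplified example rather than the real FOAF vocabulary.

```python
import xml.etree.ElementTree as ET

# Simplified FOAF-like record; the mail addresses are placeholders.
foaf_like = """
<Person>
  <name>Thomas Sawyer</name>
  <mbox resource="mailto:tom@example.org"/>
  <interest resource="Boating on the Mississippi"/>
  <interest resource="Chewing Gum"/>
  <knows>
    <Person>
      <name>Huckleberry Finn</name>
      <mbox resource="mailto:huck@example.org"/>
    </Person>
  </knows>
</Person>
"""

tom = ET.fromstring(foaf_like)

# find() looks only at direct children, so this is Tom's own mailbox,
# not the mailbox of a friend nested under <knows>.
email = tom.find("mbox").get("resource")

# Friends are the <Person> elements nested inside <knows>.
friends = [p.findtext("name") for p in tom.findall("knows/Person")]

# Interests are stored in the resource attribute of each <interest>.
interests = [i.get("resource") for i in tom.findall("interest")]

print(email, friends, interests)
```

Finding people with common interests would then amount to comparing the `interests` lists extracted from different pages.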

Is there XML on the Web? (1)

• The weather forecasting site exports its forecasts as RSS (a standard for marking up news) - this data can easily be used by a program

Is there XML on the Web? (2)

• Yahoo News (seen before) exports its news as RSS - this data can easily be used by a program
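To make "can easily be used by a program" concrete, here is a small sketch that filters headlines from an RSS 2.0 document. The feed text is invented for illustration (it is not real output from either site), and `headlines_about` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; titles and links are illustrative only.
sample_feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Traffic Jam in the Mississippi River</title>
      <link>http://news.example.org/1</link>
    </item>
    <item>
      <title>Local Fence Painted</title>
      <link>http://news.example.org/2</link>
    </item>
  </channel>
</rss>
"""

def headlines_about(rss_text, keyword):
    """Return the titles of all <item> elements that mention keyword."""
    root = ET.fromstring(rss_text)
    titles = [item.findtext("title", default="") for item in root.iter("item")]
    return [t for t in titles if keyword.lower() in t.lower()]

print(headlines_about(sample_feed, "Mississippi"))
```

The same function keeps working when the site changes its visual design, because it depends only on the `<item>` and `<title>` tags.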

The Sky’s The Limit: Doctor’s appointment

“The Semantic Web”, Scientific American, May 2001

[Figure: software agents cooperating to schedule a doctor’s appointment. Labels: Mom, Physician’s Agent, Lucy’s Agent, required treatment, Schedule appointment, Insurance Co., Provider sites, Rating, in-plan?, close-by?, Specialist?, Pete’s Agent, Driving schedule]

Motivation (2): Data Exchange

Exchanging Data

• Problem: Many data sources, each of a different type (different vendor), with a different schema.

– How can the data be combined and used together?

– How can different companies collaborate on their data?

– What (proprietary?) format should be used to exchange the data?

Usage Scenario: Company Collaboration

• Several companies want to collaborate

• Need to share data

• Each company has a different type of database system with a different schema

• Solution: Agree on an XML schema for exchange. Import to and export from this schema
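The import/export idea can be sketched in a few lines. Everything below is hypothetical: the column names of "company A", the tuple layout of "company B", and the agreed `<customers>` format are invented for illustration.

```python
import xml.etree.ElementTree as ET

def export_company_a(rows):
    """Company A keeps dicts with its own column names (assumed here).
    Export them into the agreed <customers> exchange format."""
    root = ET.Element("customers")
    for row in rows:
        customer = ET.SubElement(root, "customer")
        ET.SubElement(customer, "name").text = row["cust_name"]
        ET.SubElement(customer, "phone").text = row["phone_no"]
    return ET.tostring(root, encoding="unicode")

def import_company_b(xml_text):
    """Company B stores (name, phone) tuples (assumed here).
    Import them from the agreed exchange format."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("name"), c.findtext("phone"))
            for c in root.findall("customer")]

shared = export_company_a([{"cust_name": "Donald Duck", "phone_no": "04-828-1345"}])
print(shared)
print(import_company_b(shared))
```

Neither side needs to know the other's internal schema; each only maps between its own representation and the shared XML.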

Motivation (3): Separating Content From Style

Web Site Development

• Web sites develop over time

• Important to separate style from data in order to allow changes to the site structure and appearance

• CSS separates style from data only in a limited way – HTML will still have tables, lists, etc

• Using XML, we can store data alone

• Using XSL, this data can be translated into HTML

• The data can be translated differently as the site develops

Write Once Use Everywhere

[Figure: XML Stock Data is translated by three XSL stylesheets into WML (hand-held devices), HTML (web browser), and TEXT (Excel)]
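The sketch below imitates the idea of the figure in plain Python. It is not XSL: two ordinary functions stand in for two stylesheets, only to show that one data document can feed several presentations. The stock symbols, prices, and element names are invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stock data, stored once, with no presentation markup.
stock_xml = """
<stocks>
  <stock symbol="ABC"><price>12.50</price></stock>
  <stock symbol="XYZ"><price>7.25</price></stock>
</stocks>
"""

def to_html(root):
    """Render the data as an HTML table (stand-in for an XSL stylesheet)."""
    rows = "".join(
        f"<tr><td>{escape(s.get('symbol'))}</td><td>{escape(s.findtext('price'))}</td></tr>"
        for s in root.findall("stock")
    )
    return f"<table>{rows}</table>"

def to_text(root):
    """Render the same data as comma-separated text, e.g. for a spreadsheet."""
    return "\n".join(
        f"{s.get('symbol')},{s.findtext('price')}" for s in root.findall("stock")
    )

stocks = ET.fromstring(stock_xml)
print(to_html(stocks))
print(to_text(stocks))
```

Changing the look of the site means changing `to_html` only; the stored data stays untouched.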

XML Syntax

HTML

• Used for publishing hypertext on the World-Wide Web

• Designed to describe how a Web browser should arrange text, images and push-buttons on a page

• Easy to learn, but does not convey structure

• Fixed tag set

HTML Example

<HTML><HEAD><TITLE>Welcome to the DBI course</TITLE></HEAD><BODY>

<H1>Introduction</H1><IMG SRC= "dragon.gif" WIDTH="200" HEIGHT="150" >

</BODY></HTML>

Opening tag

Closing tag

Text (PCDATA)

“Bachelor” tag

Attribute name

Attribute value

XML Vs. HTML

• XML and HTML are “brothers”. They are both special cases of SGML.

• HTML has specific tag and attribute names. These are associated with a specific meaning

• XML can have any tag and attribute name. These are not associated with any predefined meaning

• HTML is used to specify visual style

• XML is used to specify meaning

[Figure: SGML, with HTML and XML as its two descendants]

Terminology

The segment of an XML document between an opening and a corresponding closing tag is called an element

<person>
  <name> Bart Simpson </name>
  <tel> 02 – 444 7777 </tel>
  <tel> 051 – 011 022 </tel>
  <email> [email protected] </email>
</person>

element

element, a sub-element of

not an element

XML Document is a Tree

• XML documents are abstractly modeled as trees, as reflected by their nesting

• Sometimes, XML documents are graphs (by using IDs and IDREFs)

person
  name: Bart Simpson
  tel: 02 – 444 7777
  tel: 051 – 011 022
  email: [email protected]
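The tree view can be made explicit with a short recursive walk. This is a minimal sketch; `outline` is an illustrative helper, and the phone numbers are written with plain hyphens.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Return one line per tree node: element names, then their text leaves."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:
        # Text is a leaf hanging below the element that contains it.
        lines.append("  " * (depth + 1) + repr(text))
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

person = ET.fromstring(
    "<person>"
    "<name>Bart Simpson</name>"
    "<tel>02-444-7777</tel>"
    "<tel>051-011-022</tel>"
    "</person>"
)

print("\n".join(outline(person)))
```

Each level of indentation in the output corresponds to one level of nesting in the document.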

Example XML Fragment

<addresses>
  <person>
    <name> Donald Duck</name>
    <tel> 04-828-1345 </tel>
    <tel> 04-828-1374 </tel>
    <email> [email protected] </email>
  </person>
  <person>
    <name> Miki Mouse</name>
    <tel> 03-426-1142 </tel>
  </person>
</addresses>

Another Example

An element may contain a mixture of sub-elements and PCDATA

<airline>
  <name> British Airways </name>
  <motto> World’s <dubious> favorite</dubious> airline </motto>
</airline>
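Mixed content shows up in a parser as text interleaved with child elements. In Python's `xml.etree.ElementTree`, the text before the first child is stored in `.text` and the text following a child is stored in that child's `.tail`. A small sketch, using a plain apostrophe in the motto:

```python
import xml.etree.ElementTree as ET

airline = ET.fromstring(
    "<airline>"
    "<name>British Airways</name>"
    "<motto>World's <dubious>favorite</dubious> airline</motto>"
    "</airline>"
)

motto = airline.find("motto")
dubious = motto.find("dubious")

before = motto.text      # PCDATA before the sub-element
inside = dubious.text    # PCDATA inside the sub-element
after = dubious.tail     # PCDATA after the sub-element, still part of <motto>

# itertext() walks all text pieces in document order.
full_text = "".join(motto.itertext())

print(repr(before), repr(inside), repr(after))
print(full_text)
```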

A Complete XML Document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE addresses SYSTEM "http://www.addbook.com/addresses.dtd">
<addresses>
  <person>
    <name>Lisa Simpson</name>
    <tel> 02-828-1234 </tel>
    <tel> 054-470-777 </tel>
    <email> [email protected] </email>
  </person>
</addresses>

Required

Optional

Attributes

• An opening tag may contain attributes

• These are typically used to describe the contents of an element

<entry>
  <word language = "en"> cheese</word>
  <word language = "fr"> fromage</word>
  <word language = "ro"> branza </word>
  <meaning> A food made … </meaning>
</entry>
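Attributes are read separately from element content. The sketch below builds a language-to-word mapping from an entry like the one above; the `<meaning>` element is left out because its text is elided in the slide.

```python
import xml.etree.ElementTree as ET

entry = ET.fromstring("""
<entry>
  <word language="en"> cheese</word>
  <word language="fr"> fromage</word>
  <word language="ro"> branza </word>
</entry>
""")

# get() returns the attribute value; .text returns the element content.
words = {w.get("language"): w.text.strip() for w in entry.findall("word")}

print(words)
```

Here the attribute describes the content (which language the word is in), while the content is the word itself.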

When to Use Attributes

• It’s not always clear when to use attributes

<person ssno="123 4589">
  <name> L. Simpson </name>
  <email> [email protected] </email>
  ...
</person>

<person>
  <ssno> 123 4589 </ssno>
  <name> L. Simpson </name>
  <email> [email protected] </email>
  ...
</person>

General Rule:

Use an element if you need to nest data

Use an attribute for “IDs”, i.e., identifying data

More on this soon…

Rules for XML (1)

• XML is order sensitive, i.e., the following are different:

<entry>
  <word language = "en"> cheese</word>
  <word language = "fr"> fromage</word>
</entry>

<entry>
  <word language = "fr"> fromage</word>
  <word language = "en"> cheese</word>
</entry>

• XML is case-sensitive, i.e., the following are different: <person>, <Person>, <PERSON>
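Both rules can be observed with a parser. In this sketch, the order of child elements is preserved exactly as written, and a closing tag that differs only in case from its opening tag is rejected. `languages` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

def languages(entry):
    """Return the language attributes of the <word> children, in document order."""
    return [w.get("language") for w in entry.findall("word")]

first = ET.fromstring(
    '<entry><word language="en">cheese</word><word language="fr">fromage</word></entry>'
)
second = ET.fromstring(
    '<entry><word language="fr">fromage</word><word language="en">cheese</word></entry>'
)

# Same children, different order: the two documents are not the same.
print(languages(first), languages(second))

# <person> and <Person> are different names, so this pair does not match.
try:
    ET.fromstring("<person></Person>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True

print(case_mismatch_rejected)
```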

Rules for XML (2)

• Tags come in pairs <date> ...</date>

• They must be properly nested. Which of the following are good?

– <date> ... <day> ... </day> ... </date>

– <date> ... <day> ... </date>... </day>

– <date> ... <day> ... </day> </Date>

• There is a special shortcut for tags that have no text in between them (bachelor tags)

– <person fname="Sam" lname="Iam" />

– <person fname="Sam" lname="Iam" ></person>
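One way to check the three nesting candidates is to hand them to a parser. In the sketch below the `...` placeholders are replaced by a sample value, and `is_well_formed` is an illustrative helper. The last lines compare the two spellings of an empty element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

candidates = [
    "<date> <day>1</day> </date>",   # properly nested
    "<date> <day>1</date> </day>",   # closing tags in the wrong order
    "<date> <day>1</day> </Date>",   # </Date> does not match <date>
]
results = [is_well_formed(c) for c in candidates]
print(results)

# The shortcut form and the explicit open/close pair parse to the same element.
short_form = ET.fromstring('<person fname="Sam" lname="Iam" />')
long_form = ET.fromstring('<person fname="Sam" lname="Iam" ></person>')
print(short_form.attrib == long_form.attrib, len(short_form), len(long_form))
```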

Rules for XML (3)

• There should be exactly one top-level element. This element is also called the root element

• Which of the following is legal?

<?xml version="1.0"?>
<Question> Is this legal? </Question>

<?xml version="1.0"?>
<Question> Is this legal? </Question>
<Answer> You tell me. </Answer>

Well Formed Documents

• A document is well-formed if it

– obeys all the above rules, and in addition

– does not repeat an attribute within a tag, i.e., the following is illegal:

<a val='12' val='13'> … </a>
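A parser reports violations of these rules as errors. The sketch below returns the parser's message for a document, or `None` when the document is accepted. `first_error` is an illustrative helper, and the exact wording of the messages depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def first_error(text):
    """Return the parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

one_root = '<?xml version="1.0"?><Question> Is this legal? </Question>'
two_roots = (
    '<?xml version="1.0"?>'
    "<Question> Is this legal? </Question>"
    "<Answer> You tell me. </Answer>"
)
repeated_attribute = "<a val='12' val='13'> text </a>"

for doc in (one_root, two_roots, repeated_attribute):
    print(first_error(doc))
```

Only the first document is accepted; the second has two top-level elements and the third repeats the `val` attribute.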

Tables Versus XML

• Can you easily represent the contents of a table in XML?

– Example: Projects(title, budget, managedBy), Employees(name, age, ssn)

• Can you easily represent the contents of an XML document in a table?

– Example: Remember the phone book
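The two directions can be tried out in a few lines. The sketch below is one possible answer, not the only one: table rows map directly to elements, while the phone book needs two tables because a person may have several `<tel>` entries. The row values and the generated `person_id` key are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, columns, rows):
    """Table -> XML: one <row> per tuple, one sub-element per column."""
    root = ET.Element(table_name)
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)
    return root

def phone_book_to_tables(addresses):
    """XML -> tables: repeated <tel> elements force a second table,
    linked to the first by a generated person_id (an assumption here)."""
    people, phones = [], []
    for person_id, person in enumerate(addresses.findall("person"), start=1):
        people.append((person_id, person.findtext("name")))
        for tel in person.findall("tel"):
            phones.append((person_id, tel.text))
    return people, phones

projects = table_to_xml(
    "Projects", ["title", "budget", "managedBy"], [("Bridge", 1000, "123")]
)
print(ET.tostring(projects, encoding="unicode"))

book = ET.fromstring(
    "<addresses>"
    "<person><name>Donald Duck</name><tel>04-828-1345</tel><tel>04-828-1374</tel></person>"
    "<person><name>Miki Mouse</name><tel>03-426-1142</tel></person>"
    "</addresses>"
)
people, phones = phone_book_to_tables(book)
print(people)
print(phones)
```

The first direction is mechanical; the second requires design decisions (keys, extra tables) that the XML document itself does not dictate.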