XML – eXtensible Markup Language




  • XML eXtensible Markup Language

  • The World Wide Web and What We Would Like to Do with It

    XML has a lot of hype surrounding it.

    This week we discuss: why XML is needed, and the basic technologies used together with XML.

    In the next few weeks: challenges in using XML.

  • XML in One Slide

    Basically, XML looks like HTML. However, in XML, you can use any tag names that you want.

    Example:

    Lisa Simpson 02-828-1234 054-470-777 lisa@cs.huji.ac.il

    Is that all? Big Deal?!
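The example on this slide lost its tags in extraction. Below is a minimal sketch of what such a phone book entry might look like. The tag names (`phonebook`, `person`, `name`, `tel`, `email`) are assumptions, not the original markup; the point is only that XML lets you pick them freely. The sketch reads the data back with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are illustrative guesses,
# chosen freely, which is exactly what XML permits.
PHONEBOOK = """
<phonebook>
  <person>
    <name>Lisa Simpson</name>
    <tel>02-828-1234</tel>
    <tel>054-470-777</tel>
    <email>lisa@cs.huji.ac.il</email>
  </person>
</phonebook>
"""

root = ET.fromstring(PHONEBOOK)
person = root.find("person")

# Because each value is wrapped in a named tag, a program can ask for it by name.
name = person.findtext("name")
phones = [tel.text for tel in person.findall("tel")]
email = person.findtext("email")

print(name, phones, email)
```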

  • Motivation (1): The Semantic Web

  • Example 1: A Homepage on the Web

    Tom Sawyer's Homepage

    Tom's Friends

    Tom's Hobbies: Boating on the Mississippi River, Chewing Gum, Painting the Fence

  • Web Pages are Written in HTML

    HTML is a markup language. An HTML page consists of tags with attributes and data. HTML describes the style of the page (e.g., color, font type, etc.).

  • Tom Sawyer's Homepage

    Hi'ya all. Did you know that my best friend is Huckleberry Finn? Sometimes, I like Becky Thatcher? Here are some of my hobbies:

    Boating on the Mississippi River, chewing gum, painting the fence

    If you want to discuss common interests, contact me at tom@mark.twain

  • Automatically Using Information

    Tom Sawyer has a homepage. So do a lot of other people. It would be nice to be able to do the following things automatically (via a computer program):

    Querying the Page: Find Tom Sawyer's email address and the names of his friends.

    Querying Similar Pages: Find people who have interests in common with Tom Sawyer.

  • Automatically Using Information

    Site Personalization: Tom Sawyer's interests should be automatically recognized by sites.

    When Tom Sawyer enters Amazon, he should get "book recommendations" that match his interests.

    When Tom Sawyer enters a site that sells food, he should be told about sales on gum.

    This should all happen without Tom having to tell every site about his interests.

  • Can we Automatically use the Information?

    In order to perform the tasks described before, we have to find web pages that describe people and extract the relevant information.

    Problems:

    How can we know if a page describes a person?

    How can we know what to extract? (Everyone has their own style for their homepage...)

    How can we "understand" the extracted information? (What parts of the page describe which information?)

  • Example 2: Weather Forecasting

    National Weather Service: Weather Forecasting and Weather Alerts

    Flood Alerts in Mississippi

  • Wouldn't it be great if...

    Wouldn't it be great if Tom could get automatic updates of weather problems in Mississippi? It is dangerous to go boating if there are floods.

  • Example 3: News Alerts

    Yahoo News

    Traffic Jam in the Mississippi River

  • Wouldn't it be great if...

    Wouldn't it be great if Tom could get automatic updates of important news related to Mississippi? He might want to choose a different river to go boating.

  • Can these things be done?

    Once again, we need to FIND the relevant pages and EXTRACT the relevant data.

    HTML pages are constantly changing. How can we automatically figure out what data is relevant and what the data is talking about, even when the page changes?

    HTML describes only style and not meaning (or semantics).

  • Two Basic Approaches

    If the information on the Web were neatly organized in a huge database, these problems could be solved. But it's not. What should we do?

    AI, NLP Approach: Use smart techniques to recognize information, e.g., recognize patterns in how things are written.

    DB Approach: Turn the Web into a database, by writing it in XML.

  • The Semantic Web

    The Semantic Web is a machine-understandable Web. The meaning of data (i.e., the semantics of data) should be encoded together with the data.

    Tim Berners-Lee, the inventor of the Web (by putting together the ideas of hypertext, TCP/IP, DNS), is one of the main people behind the Semantic Web.

  • Main Technologies Needed

    XML: The syntax for marking up text with meaning.

    RDF: Defines objects and relationships between them.

    OWL: Defines ontologies which connect different concepts (e.g., a car is an automobile, a car is a type of locomotive).

    Web Services: Allow services given online to be accessed programmatically.

    Here is a simplified version of how it could work.

  • Thomas Sawyer Male English Huckleberry Finn

    Simplified version of the FOAF standard
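The markup for this record did not survive extraction. The sketch below shows how the "Querying the Page" task from earlier becomes straightforward once the data is tagged. The element names (`person`, `name`, `gender`, `language`, `knows`) are assumptions loosely inspired by FOAF; real FOAF is written in RDF with namespaced terms, which this simplified illustration leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free markup for the record on the slide.
PROFILE = """
<person>
  <name>Thomas Sawyer</name>
  <gender>Male</gender>
  <language>English</language>
  <knows>
    <person>
      <name>Huckleberry Finn</name>
    </person>
  </knows>
</person>
"""

root = ET.fromstring(PROFILE)

# "Find the names of his friends" is now a path lookup, not a guessing game.
owner = root.findtext("name")
friends = [p.findtext("name") for p in root.findall("knows/person")]

print(owner, "knows", friends)
```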

  • Is there XML on the Web? (1)

    The weather forecasting site exports its forecasts as RSS (a standard for marking up news). This data can easily be used by a program.

  • Is there XML on the Web? (2)

    Yahoo News (seen before) exports its news as RSS. This data can easily be used by a program.
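To make "easily used by a program" concrete, here is a minimal sketch that filters an RSS 2.0 feed for headlines mentioning a keyword. The feed text is a hand-written stand-in, not the output of either site; the channel title and the `example.org` links are hypothetical, and only the two headlines come from the earlier slides.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 sample; links and channel metadata are hypothetical.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Alerts</title>
    <link>http://example.org/alerts</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>Flood Alerts in Mississippi</title>
      <link>http://example.org/alerts/1</link>
    </item>
    <item>
      <title>Traffic Jam in the Mississippi River</title>
      <link>http://example.org/alerts/2</link>
    </item>
  </channel>
</rss>"""


def headlines_about(feed_xml, keyword):
    """Return the titles of all feed items whose title mentions the keyword."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        item.findtext("title")
        for item in channel.findall("item")
        if keyword.lower() in item.findtext("title", default="").lower()
    ]


matches = headlines_about(FEED, "mississippi")
print(matches)
```

The program never has to guess where a headline starts or ends, which is the contrast with scraping an HTML page.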

  • The Sky's the Limit: Doctor's Appointment

    Source: The Semantic Web, Scientific American, May 2001

  • Motivation (2): Data Exchange

  • Exchanging Data

    Problem: Many data sources, each of a different type (different vendor), with a different schema.

    How can the data be combined and used together?

    How can different companies collaborate on their data?

    What (proprietary?) format should be used to exchange the data?

  • Usage Scenario: Company Collaboration

    Several companies want to collaborate and need to share data. Each company has a different type of database system with a different schema.

    Solution: Agree on an XML schema for exchange. Import to and export from this schema.
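A minimal sketch of the import/export idea, assuming two companies that store the same kind of record in different shapes. Everything here is hypothetical: the record layouts, the sample rows, and the agreed exchange format (`customers` / `customer` / `name` / `email`).

```python
import xml.etree.ElementTree as ET

# Company A keeps tuples; company B keeps dicts with different field names.
company_a_rows = [("Tom Sawyer", "tom@mark.twain")]
company_b_rows = [{"full_name": "Huckleberry Finn", "mail": "huck@mark.twain"}]


def to_exchange(records):
    """Export (name, email) pairs into the agreed XML exchange format."""
    root = ET.Element("customers")
    for name, email in records:
        customer = ET.SubElement(root, "customer")
        ET.SubElement(customer, "name").text = name
        ET.SubElement(customer, "email").text = email
    return ET.tostring(root, encoding="unicode")


def from_exchange(xml_text):
    """Import records from the agreed XML exchange format."""
    return [
        (c.findtext("name"), c.findtext("email"))
        for c in ET.fromstring(xml_text).findall("customer")
    ]


# Each company maps its own schema to the shared one before exporting.
exported_a = to_exchange(company_a_rows)
exported_b = to_exchange((r["full_name"], r["mail"]) for r in company_b_rows)

combined = from_exchange(exported_a) + from_exchange(exported_b)
print(combined)
```

Each company only writes one mapping, to and from the shared format, instead of one mapping per partner.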

  • Motivation (3): Separating Content From Style

  • Web Site Development

    Web sites develop over time. It is important to separate style from data in order to allow changes to the site structure and appearance.

    CSS separates style from data only in a limited way: the HTML will still have tables, lists, etc.

    Using XML, we can store data alone. Using XSL, this data can be translated into HTML. The data can be translated differently as the site develops.

  • Write Once Use Everywhere

    XML Stock Data
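The sketch below illustrates the "write once, use everywhere" idea: one XML data source, rendered in two different ways. It uses plain Python functions as a stand-in for XSL stylesheets (the Python standard library has no XSLT processor), and the stock symbols and prices are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stock data stored as content only, with no styling.
STOCKS = """<stocks>
  <stock><symbol>ABC</symbol><price>12.50</price></stock>
  <stock><symbol>XYZ</symbol><price>7.25</price></stock>
</stocks>"""


def load(xml_text):
    """Read (symbol, price) pairs out of the XML data."""
    return [
        (s.findtext("symbol"), s.findtext("price"))
        for s in ET.fromstring(xml_text).findall("stock")
    ]


def as_html_table(rows):
    """One presentation: an HTML table for a browser."""
    cells = "".join(
        f"<tr><td>{escape(sym)}</td><td>{escape(price)}</td></tr>"
        for sym, price in rows
    )
    return f"<table>{cells}</table>"


def as_plain_text(rows):
    """Another presentation: plain text, e.g. for a text-only device."""
    return "\n".join(f"{sym}: {price}" for sym, price in rows)


rows = load(STOCKS)
html_view = as_html_table(rows)
text_view = as_plain_text(rows)
print(html_view)
print(text_view)
```

Changing the site's look means changing a renderer, while the data file stays untouched.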

  • XML Syntax

  • HTML

    Used for publishing hypertext on the World-Wide Web.

    Designed to describe how a Web browser should arrange text, images and push-buttons on a page.

    Easy to learn, but does not convey structure.

    Fixed tag set.

  • HTML Example

    Welcome to the DBI course


  • XML Vs. HTML

    XML and HTML are brothers. They are both special cases of SGML.

    HTML has specific tag and attribute names. These are associated with a specific meaning.

    XML can have any tag and attribute name. These are not associated with any meaning.

    HTML is used to specify visual style. XML is used to specify meaning.

    (Diagram labels: HTML, XML, SGML)

  • Terminology

    The segment of an XML document between an opening and a corresponding closing tag is called an element.

    Bart Simpson 02 444 7777 051 011 022 bart@tau.ac.il

  • XML Document is a Tree

    XML documents are abstractly modeled as trees, as reflected by their nesting. Sometimes, XML documents are graphs (by using IDs and IDREFs).

    (Tree diagram; leaf values: Bart Simpson, 02 444 7777, 051 011 022, bart@tau.ac.il)
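A minimal sketch of the tree view, assuming the Bart Simpson record was marked up with the hypothetical tags `person`, `name`, `tel` and `email` (the original tags were lost). Each element becomes a node, and nesting becomes the parent/child relation.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the record shown on the slide.
DOC = """<person>
  <name>Bart Simpson</name>
  <tel>02 444 7777</tel>
  <tel>051 011 022</tel>
  <email>bart@tau.ac.il</email>
</person>"""


def outline(element, depth=0):
    """Return (depth, tag, text) for the element and all its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(ET.fromstring(DOC))
for depth, tag, text in tree_rows:
    print("  " * depth + tag + (": " + text if text else ""))
```

The root `person` sits at depth 0 and its four children at depth 1, mirroring the nesting of the tags.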

  • Example XML Fragment

    Donald Duck 04-828-1345 04-828-1374 donald@cs.technion.ac.il Miki Mouse 03-426-1142

  • Another Example

    An element may contain a mixture of sub-elements and PCDATA.

    British Airways World's favorite airline
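A minimal sketch of mixed content, assuming markup in which one word of the motto is wrapped in its own sub-element. The tag names (`airline`, `name`, `motto`, `dubious`) are guesses, since the original tags were stripped. In `ElementTree`, text before the first child is stored in `.text`, and text following a child is stored in that child's `.tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <motto> mixes plain text (PCDATA) with a sub-element.
DOC = (
    "<airline><name>British Airways</name>"
    "<motto>World's <dubious>favorite</dubious> airline</motto></airline>"
)

motto = ET.fromstring(DOC).find("motto")
inner = motto.find("dubious")

leading_text = motto.text   # PCDATA before the sub-element
marked_text = inner.text    # text inside the sub-element
trailing_text = inner.tail  # PCDATA after the sub-element

# itertext() walks all text pieces in document order.
full_text = "".join(motto.itertext())
print(repr(leading_text), repr(marked_text), repr(trailing_text), repr(full_text))
```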

  • A Complete XML Document

    Lisa Simpson 02-828-1234 054-470-777 lisa@cs.huji.ac.il


  • Attributes

    An opening tag may contain attributes. These are typically used to describe the contents of an element.

    cheese fromage branza A food made
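A minimal sketch of attributes describing element contents, assuming each word in the dictionary entry carried a `language` attribute. The tag names, the attribute name and the language codes are assumptions, because the original markup was lost; the truncated definition text is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the attribute says which language each word is in.
DOC = """<entry>
  <word language="en">cheese</word>
  <word language="fr">fromage</word>
  <word language="ro">branza</word>
</entry>"""

entry = ET.fromstring(DOC)

# Element.get(name) reads an attribute; Element.text reads the content.
by_language = {w.get("language"): w.text for w in entry.findall("word")}
print(by_language)
```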

  • When to Use Attributes

    It's not always clear when to use attributes.

    L. Simpson lisa@cs.huji.ac.il ...

    123 4589 L. Simpson lisa@cs.huji.ac.il ...

    General Rule:

    Use an element if you need to nest data. Use an attribute for IDs, i.e., identifying data.

    More on this soon
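The two variants on this slide lost their markup. A plausible reading, sketched below, is that the same ID is stored once as an attribute and once as a sub-element. The tag and attribute names are assumptions. Both forms are valid XML and carry the same information, which is why the choice is a matter of convention.

```python
import xml.etree.ElementTree as ET

# Variant 1 (hypothetical): the ID is an attribute of <person>.
WITH_ATTRIBUTE = """<person id="123 4589">
  <name>L. Simpson</name>
  <email>lisa@cs.huji.ac.il</email>
</person>"""

# Variant 2 (hypothetical): the ID is a sub-element of <person>.
WITH_ELEMENT = """<person>
  <id>123 4589</id>
  <name>L. Simpson</name>
  <email>lisa@cs.huji.ac.il</email>
</person>"""

id_from_attribute = ET.fromstring(WITH_ATTRIBUTE).get("id")
id_from_element = ET.fromstring(WITH_ELEMENT).findtext("id")

print(id_from_attribute, id_from_element)
```

An attribute value is a flat string and cannot contain sub-elements, which is one reason the general rule reserves elements for data that needs nesting.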

  • Rules for XML (1)

    XML is order-sensitive, i.e., the following are different:

    cheese fromage

    fromage cheese

    XML is case-sensitive, i.e., tag names that differ only in letter case are different tags.
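Both rules can be checked with a parser. The sketch below uses hypothetical `entry` and `word` tags: swapping two children yields a different document, and a closing tag that differs from its opening tag only in case is rejected.

```python
import xml.etree.ElementTree as ET


def words(xml_text):
    """Return the text of each child element, in document order."""
    return [w.text for w in ET.fromstring(xml_text)]


# Order sensitivity: the same children in a different order.
first = words("<entry><word>cheese</word><word>fromage</word></entry>")
second = words("<entry><word>fromage</word><word>cheese</word></entry>")
order_matters = first != second

# Case sensitivity: <Word> and <word> are different names, so this
# closing tag does not match its opening tag and parsing fails.
try:
    ET.fromstring("<Word>cheese</word>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False

print(order_matters, case_mismatch_accepted)
```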

  • Rules for XML (2)

    Tags come in pairs: an opening tag and a matching closing tag. They must be properly nested.

    Which of the following are good? ... ... ... ... ... ... ... ...

    There is a special shortcut for tags that have no text in between them (bachelor tags).
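The candidate snippets on this slide were lost, so the sketch below substitutes its own hypothetical ones (`a`, `b`, `br`) and lets a parser decide which are properly paired and nested. It also shows that the shortcut form `<br/>` is equivalent to `<br></br>`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<a><b>text</b></a>")    # properly nested
overlapping = is_well_formed("<a><b>text</a></b>")  # tags cross each other
unclosed = is_well_formed("<a><b>text</b>")         # <a> is never closed

# The shortcut for an element with nothing inside it.
shortcut = ET.fromstring("<a><br/></a>")
explicit = ET.fromstring("<a><br></br></a>")
same_structure = [c.tag for c in shortcut] == [c.tag for c in explicit]

print(nested_ok, overlapping, unclosed, same_structure)
```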

  • Rules for XML (3)

    There should be exactly one top-level element. This element is also called the root element.

    Which of the following is legal?

    Is this legal?

    Is this legal? You tell me.

  • Well Formed Documents

    A document is well-formed if it obeys all the above rules, and in addition does not repeat an attribute within a tag, i.e., a tag in which the same attribute name appears twice is illegal.
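A minimal sketch that checks the single-root rule and the no-repeated-attribute rule with a parser. The snippets are hypothetical, since the slide's own examples were lost. The exact error messages depend on the underlying parser, so the sketch only distinguishes "accepted" from "rejected".

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return None if the text parses, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


one_root = parse_error("<a><b/><c/></a>")                # single root: accepted
two_roots = parse_error("<a/><b/>")                      # two top-level elements
repeated_attribute = parse_error('<person id="1" id="2"/>')  # same attribute twice

for label, error in [
    ("one root", one_root),
    ("two roots", two_roots),
    ("repeated attribute", repeated_attribute),
]:
    print(label, "->", "well-formed" if error is None else "rejected: " + error)
```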

  • Tables Versus XML

    Can you easily represent the contents of a table in XML? Example: Projects(title, budget, managedBy), Employees(name, age, ssn)

    Can you easily represent the contents of an XML document in a table? Example: Remember the phone book
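One way to explore both questions is sketched below, under stated assumptions. The employee row is made-up sample data for the `Employees(name, age, ssn)` table, and the phone book markup uses hypothetical tag names. Going from a table to XML is mechanical; going back is less direct, because one person can have several phone numbers.

```python
import xml.etree.ElementTree as ET

# Table -> XML: one element per row, one sub-element per column.
employees = [{"name": "Lisa Simpson", "age": "8", "ssn": "000-00-0000"}]  # sample data


def table_to_xml(table_name, row_name, rows):
    """Build an XML tree from a list of column -> value dicts."""
    root = ET.Element(table_name)
    for row in rows:
        element = ET.SubElement(root, row_name)
        for column, value in row.items():
            ET.SubElement(element, column).text = value
    return root


employees_xml = ET.tostring(
    table_to_xml("Employees", "Employee", employees), encoding="unicode"
)

# XML -> table: a person may have several <tel> children, so a flat table
# needs one row per (name, tel) pair, repeating the name in each row.
PHONEBOOK = """<phonebook>
  <person>
    <name>Lisa Simpson</name>
    <tel>02-828-1234</tel>
    <tel>054-470-777</tel>
  </person>
</phonebook>"""

phone_rows = [
    (person.findtext("name"), tel.text)
    for person in ET.fromstring(PHONEBOOK).findall("person")
    for tel in person.findall("tel")
]

print(employees_xml)
print(phone_rows)
```

The repeated name in `phone_rows` is the kind of redundancy a relational design would normally remove by splitting the data into two tables.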