Page 1: XML: It’s a Good Thing

ICS 123

XML: It’s a Good Thing

Richard N. Taylor & Eric M. Dashofy

ICS 123 S2002

Page 2: XML: It’s a Good Thing


Topic 10: XML

Motivation

• “I'll never go hungry again!” – Scarlett O’Hara
• “I’ll never write a parser again!” – Anonymous XML User
• Data encoding is a perpetual problem in computer applications
• Lots of time is wasted writing parsers, lexers, marshalers, unmarshalers, data bindings, even meta-languages!

Page 3: XML: It’s a Good Thing


Existing Problems

[Figure: File Exchange. Three applications (App1, App2, App3), each with its own file format (File Format 1, File Format 2, File Format 3), connected by import converters, export converters, and 3rd-party converters.]

Page 4: XML: It’s a Good Thing


Why is this a problem?

• Everybody has a proprietary format
• Converters must be maintained by various parties
  – This is an n² problem: n formats need on the order of n² pairwise converters
• Something is usually lost in the translation
• Note: Same problems with data exchange across networked apps

Page 5: XML: It’s a Good Thing


Another Problem

Defining a File or Data Format

[Figure: the pieces needed for a file or data format. Labels: Disk, Net, Parser, In-memory Representation, Serializer, Data Bindings (edits), and a Meta-Language that helps to generate the others.]

Page 6: XML: It’s a Good Thing


Why is this a problem?

• Parsers, serializers, data bindings all have to be developed
• This development takes time
• Conflicting tools for assistance
• How do you evolve the file format?

Page 7: XML: It’s a Good Thing


Potential Solution

• To too many file formats:
  – Intermediate format
    » Even better: Common format
  – An agreed-upon meta-language
  – Ability to extend language and ignore unknown constructs
• To tool-building:
  – Choose a suitable meta-language
  – Build tools surrounding that meta-language
  – Port those tools to different environments, but keep the APIs semi-standard
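The n² problem and the intermediate-format fix can be checked with a little arithmetic. A minimal sketch, assuming each of n formats needs a converter to every other format in the direct case, and one importer plus one exporter per format in the common-format case (the function names are illustrative, not from the slides):

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when every format converts directly to every other one."""
    return n * (n - 1)


def common_format_converters(n: int) -> int:
    """Converters needed when every format only converts to and from one common format."""
    return 2 * n


# Print the two counts side by side for a few values of n.
for n in (3, 10, 50):
    print(n, pairwise_converters(n), common_format_converters(n))
```

At n = 3 the two approaches tie at 6 converters; at n = 10 it is 90 versus 20, and the gap keeps widening.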

Page 8: XML: It’s a Good Thing


What is XML?

• Stolen from xml-computing.com:
  – eXtensible Markup Language
  – A way to represent structured data
  – a World Wide Web Consortium (W3C) standard
  – platform-independent
  – a way to create your own custom languages
  – license-free and well-supported
  – the future of computing?
• Buzzword-compliant!

Page 9: XML: It’s a Good Thing


Origins of XML

• From SGML
  – Standard Generalized Markup Language
• cf. HTML
• A document markup language
  – For annotating documents with metadata to make them easier to interpret

Hi! My name is <NAME><FIRST>Eric</FIRST> <LAST>Dashofy</LAST></NAME>.

You can email me at <EMAIL>[email protected]</EMAIL>.

Page 10: XML: It’s a Good Thing

The Times, They are a Changin’

• XML is arguably more useful to simply encode data, outside the strict context of a document

<PERSON>
  <NAME>
    <FIRST>Eric</FIRST>
    <LAST>Dashofy</LAST>
  </NAME>
  <DEPARTMENT>Information and Computer Science</DEPARTMENT>
  <EMAIL>[email protected]</EMAIL>
</PERSON>

Page 11: XML: It’s a Good Thing


Terminology

• Tag
  – The markup of the document, enclosed in angle brackets
    » <foo> is the start tag
    » </foo> is the end tag
  – Tags may be nested, but may not cross
    » <A>foo<B>bar</B>baz</A> -- OK!
    » <A>foo<B>bar</A>baz</B> -- NO!
  – Hierarchical data structure
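Because tags nest without crossing, a parsed document is a tree. A minimal sketch using Python's standard `xml.etree.ElementTree` module on the "OK" example above; the `outline` helper is a made-up name for illustration:

```python
import xml.etree.ElementTree as ET

# The properly nested example from the slide.
root = ET.fromstring("<A>foo<B>bar</B>baz</A>")


def outline(element, depth=0):
    """Return one indented line per element, mirroring the nesting hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one level of tag nesting.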

Page 12: XML: It’s a Good Thing


Terminology

• Element
  – Stuff in between a start and end tag
  – Includes the tags
  – May contain nested elements
  – Ex:
    » <a>foo</a>
    » <a>foo<b>bar</b></a> (nested)

Page 13: XML: It’s a Good Thing


Terminology

• Attribute
  – A way of annotating tags with additional info
  – Simple name-value pairs
  – Ex:
    » <name lang="English">Henry</name>
    » <name lang="Spanish">Enrique</name>
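A short sketch of reading attributes as name-value pairs with Python's standard `xml.etree.ElementTree`. The `<names>` wrapper element is an assumption added here so the two slide examples form a single document:

```python
import xml.etree.ElementTree as ET

# The two slide examples, wrapped in a hypothetical <names> root element.
doc = ET.fromstring(
    "<names>"
    '<name lang="English">Henry</name>'
    '<name lang="Spanish">Enrique</name>'
    "</names>"
)

# Element.get() looks up one attribute value by its name.
by_lang = {name.get("lang"): name.text for name in doc.findall("name")}
print(by_lang)
```

Note that attribute values need straight quotes (`"` or `'`); curly quotes pasted from a slide will not parse.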

Page 14: XML: It’s a Good Thing


Document

• A collection of elements, usually in a file
• One top-level element
  – Called the “root” element or “document” element
  – Some header stuff

<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
  <email>[email protected]</email>
</person>
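A minimal sketch of loading a document like this one and reaching in from the root element, using Python's standard `xml.etree.ElementTree`. The email element is left out of this copy; everything else follows the slide:

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
</person>"""

# fromstring() returns the single top-level ("root" or "document") element.
root = ET.fromstring(document)

children = [child.tag for child in root]    # direct children of the root
first_name = root.findtext("name/first")    # path lookup relative to the root

print(root.tag, children, first_name)
```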

Page 15: XML: It’s a Good Thing


Side-note:

• “If you don’t understand it, ignore it.”
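One way a reader can follow this rule is to look up only the elements it knows about and skip the rest. A sketch with Python's standard `xml.etree.ElementTree`; the `<nickname>` element and the `KNOWN` list are assumptions made up for this example:

```python
import xml.etree.ElementTree as ET

# <nickname> stands in for an extension this reader was never told about.
document = """<person>
  <name><first>Eric</first><last>Dashofy</last></name>
  <nickname>unknown to this reader</nickname>
  <department>Information and Computer Science</department>
</person>"""

# The only paths this (hypothetical) reader understands.
KNOWN = ("name/first", "name/last", "department")

root = ET.fromstring(document)

# Look up only the known paths; anything else in the document is never touched.
record = {path: root.findtext(path) for path in KNOWN}
print(record)
```

The extended document still loads, and an older reader keeps working without changes.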

Page 16: XML: It’s a Good Thing


Kinds of Documents

• “Well Formed”
  – Syntactically correct
  – All the start tags have end tags
  – All the start-quotes have end-quotes
  – etc.
• “Valid”
  – Well-formed, and conforms to some language specification
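Well-formedness is what any XML parser checks before it does anything else. A minimal sketch using Python's standard `xml.etree.ElementTree`; `is_well_formed` is a helper name invented here. It does not check validity: the parsers in Python's standard library are non-validating, so conformance to a DTD or schema needs a separate tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as syntactically correct XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a>foo<b>bar</b></a>"))   # properly nested
print(is_well_formed("<a>foo<b>bar</a></b>"))   # crossed tags
print(is_well_formed("<a>foo"))                 # start tag without end tag
print(is_well_formed('<a lang="en>foo</a>'))    # start-quote without end-quote
```

Only the first call is accepted; the other three fail for the reasons noted in the comments.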

Page 17: XML: It’s a Good Thing


Why a meta-language?

• To define what elements, sub-elements, attributes are allowed
• And in what order
• So different organizations can agree on a real data format
  – Well-formed documents don’t restrict how you encode the data, so they’re not very valuable

Page 18: XML: It’s a Good Thing


DTDs

• Document Type Definition
  – Part of XML 1.0
  – The original XML meta-language
  – Doesn’t look like XML
  – Like production rules

<!DOCTYPE FooDocument [
  <!ELEMENT Foo (Bar*,Baz?,Booyah+)>
  <!ELEMENT Bar (#PCDATA)>
  <!ELEMENT Baz (#PCDATA)>
  <!ELEMENT Booyah (#PCDATA)>
]>
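To see why a content model reads like a production rule, the rule for Foo can be rewritten by hand as a regular expression over the names of its children. This is only an illustration of the idea, not a DTD validator: the regular expression and the `foo_children_allowed` helper are assumptions made for this sketch, and the `#PCDATA` rules are not checked.

```python
import re
import xml.etree.ElementTree as ET

# (Bar*,Baz?,Booyah+) rewritten as a regular expression over child tag names,
# where each name is followed by a single space.
FOO_CONTENT = re.compile(r"(Bar )*(Baz )?(Booyah )+")


def foo_children_allowed(xml_text: str) -> bool:
    """Check that the root is Foo and its children appear in an allowed sequence."""
    foo = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in foo)
    return foo.tag == "Foo" and FOO_CONTENT.fullmatch(child_names) is not None


ok = foo_children_allowed("<Foo><Bar>1</Bar><Bar>2</Bar><Booyah>x</Booyah></Foo>")
wrong_order = foo_children_allowed("<Foo><Baz>1</Baz><Bar>2</Bar><Booyah>x</Booyah></Foo>")
missing = foo_children_allowed("<Foo><Bar>1</Bar></Foo>")
print(ok, wrong_order, missing)
```

The first document satisfies the rule; the second puts Baz before Bar, and the third has no Booyah, so both are rejected.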

Page 19: XML: It’s a Good Thing


Namespaces

• “You keep on using that word, I do not think it means what you think it means.” –Inigo Montoya

• How can you make a document that draws elements from multiple DTDs?

<usa:address xmlns:usa="http://www.dtds.com/usaddress.dtd">
  <usa:street>1600 Pennsylvania Ave</usa:street>
  <usa:city>Washington</usa:city>
  <usa:state>DC</usa:state>
  <usa:zip>20509</usa:zip>
</usa:address>

<uk:address xmlns:uk="http://www.dtds.com/ukaddress.dtd">
  <uk:street>23B Baker Street</uk:street>
  <uk:city>London, England</uk:city>
  <uk:postcode>N22</uk:postcode>
</uk:address>
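A namespace-aware parser keeps the two `address` vocabularies apart by the URI, not by the prefix. A sketch with Python's standard `xml.etree.ElementTree`, using the namespace URIs from the slide; the `<addresses>` wrapper and the shortened addresses are assumptions so that both fit in one small document:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document that mixes elements from both vocabularies.
document = """<addresses
    xmlns:usa="http://www.dtds.com/usaddress.dtd"
    xmlns:uk="http://www.dtds.com/ukaddress.dtd">
  <usa:address><usa:city>Washington</usa:city></usa:address>
  <uk:address><uk:city>London, England</uk:city></uk:address>
</addresses>"""

root = ET.fromstring(document)

# Prefix-to-URI mapping used only for the lookups below.
ns = {
    "usa": "http://www.dtds.com/usaddress.dtd",
    "uk": "http://www.dtds.com/ukaddress.dtd",
}

us_city = root.findtext("usa:address/usa:city", namespaces=ns)
uk_city = root.findtext("uk:address/uk:city", namespaces=ns)

# Internally each tag is stored as "{namespace-uri}local-name".
tags = [child.tag for child in root]
print(us_city, uk_city, tags)
```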

Page 20: XML: It’s a Good Thing


Why not DTDs?

• “Uhm, DTDs are bad, mmkay?” – Mr. Mackey
  – DTDs are lacking in some areas
    » Don’t look like XML
    » Can’t specify at a level below elements
      • i.e. can’t specify regular expressions on content
    » Difficult to extend/add things to existing element definitions
    » Difficult to implement modular languages

Page 21: XML: It’s a Good Thing


XML Schemas

• A DTD replacement from W3C
  – Look like XML / Easier to read
  – Contribute a type system to XML
  – Element, attribute definitions become types
    » Single-inheritance model in the type system
  – Better namespace management

Page 22: XML: It’s a Good Thing


Example

<complexType name="Address">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>

<complexType name="USAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

Page 23: XML: It’s a Good Thing


Example, cont.

<complexType name="UKAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="postcode" type="UKPostcode"/>
      </sequence>
      <attribute name="exportCode" type="positiveInteger" fixed="1"/>
    </extension>
  </complexContent>
</complexType>

Page 24: XML: It’s a Good Thing


What do you get?

• Lots of tools for free
  – Parsers
    » DOM and SAX
  – Serializers
  – Transformation
    » XSL(T)
  – A meta-language (two, actually)
  – Data Bindings
  – Syntax-directed editors

Page 25: XML: It’s a Good Thing


Spotlight: DOM & SAX

• APIs for accessing XML documents
  – SAX: Lightweight, callback based
    » “I saw an element! Ooh, I saw an attribute!”
  – DOM: Parses entire document into an object tree in memory

[Figure: XML Document → DOM Parser → In-memory Representation]
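The two styles can be compared side by side with Python's standard `xml.sax` and `xml.dom.minidom` modules. A minimal sketch; the sample document, its `id` attribute, and the `EventLogger` class are assumptions made for this example:

```python
import xml.dom.minidom
import xml.sax

DOCUMENT = b'<person id="1"><first>Eric</first><last>Dashofy</last></person>'


class EventLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as it reads through the document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # "I saw an element!" (and its attributes, as name-value pairs)
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(DOCUMENT, handler)

# DOM style: the whole document is parsed into an object tree first,
# and the program then walks that tree however it likes.
dom = xml.dom.minidom.parseString(DOCUMENT)
last_name = dom.getElementsByTagName("last")[0].firstChild.data

print(handler.events)
print(last_name)
```

SAX never holds the whole document in memory, which suits large inputs; DOM costs more memory but allows random access and in-place edits.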

Page 26: XML: It’s a Good Thing


Spotlight: Data Bindings

• DOM API is very, very generic
  – Example functions:
    » appendChild(Element n)
    » setAttribute(String name, String value)
  – No namespace management
• Data bindings are APIs guided by the language definition
  – Example functions:
    » addComponent(Component c);
    » setIdentifier(String id);
• Data bindings can be generated automatically
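A data binding can be pictured as a thin, language-specific wrapper over the generic DOM calls. A hand-written sketch on top of Python's standard `xml.dom.minidom`; the `Architecture` class and the `architecture`, `component`, `id`, and `name` names are all hypothetical, and the methods use Python naming rather than the `addComponent` / `setIdentifier` spelling on the slide. A generated binding would derive such a class from the language definition instead.

```python
import xml.dom.minidom


class Architecture:
    """Hypothetical binding for an <architecture> element."""

    def __init__(self):
        self._doc = xml.dom.minidom.Document()
        self._element = self._doc.createElement("architecture")
        self._doc.appendChild(self._element)

    def set_identifier(self, identifier: str) -> None:
        # Generic DOM call hidden behind a domain-specific name.
        self._element.setAttribute("id", identifier)

    def add_component(self, name: str) -> None:
        component = self._doc.createElement("component")
        component.setAttribute("name", name)
        self._element.appendChild(component)

    def to_xml(self) -> str:
        """Serialize the wrapped element back to XML text."""
        return self._element.toxml()


arch = Architecture()
arch.set_identifier("demo")
arch.add_component("parser")
xml_text = arch.to_xml()
print(xml_text)
```

Callers work with components and identifiers, and never see `appendChild` or `setAttribute` directly.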