CP3024 Lecture 9 XML: Extensible Markup Language

What is a markup language?

Textual (i.e. human-readable) language in which significant elements are indicated by markers
– <TITLE>XML</TITLE>

Examples include RTF, HTML, VRML and TeX

Easy to process and can be manipulated by a variety of application programs

What does the Web use?

HTML
– Hypertext Markup Language

Defined as the original Web language

Based on SGML (see later)

Suited for hypertext, multimedia, small simple documents

Currently at version 4.01 (the last?)

Why change? - 1

Change in Web usage
– no longer a mechanism for exchanging scientific papers
– presentational aspects are now seen as of greater importance
– extracting the meaning of a document using a program will be a new growth area

HTML can't grow much more!

Why change? - 2

Extensibility
– HTML does not allow users to specify their own tags

Structure
– HTML cannot represent database schemas or object-oriented hierarchies

Validation
– HTML does not allow applications to check that the structure of data is valid

What is SGML?

Standard Generalised Markup Language

ISO 8879

Can define any document format of any complexity

Enables extensibility, structure and validation

Too many optional features for the Web

What is XML?

Simplified subset of SGML designed for Web applications

Differs from HTML
– Can define new tags
– Structures may be nested to any level of complexity
– XML documents may define a grammar which enables structural validation of that document

Where has XML come from?

Emanates from the World Wide Web Consortium (W3C)

Developed by XML working group chaired by Jon Bosak (Sun Microsystems)

Group includes representatives from Microsoft, Netscape, HP, Adobe, etc.

Last bastion against proprietary markup and Web fragmentation

Design Goals for XML - 1

XML shall be straightforwardly usable over the Internet

XML shall support a wide variety of applications

XML shall be compatible with SGML

It shall be easy to write programs which process XML documents

The number of optional features is to be kept to the absolute minimum

Design Goals for XML - 2

XML documents should be human-legible

The XML design should be prepared quickly

The design of XML shall be formal and concise

XML documents shall be easy to create

Terseness in XML markup is of minimal importance

The XML View of a Document

Taken from an example given by Jon Bosak

Structured Publishing

Taken from an example given by Jon Bosak

XML Example

<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye </quote>, Sweep </harry>
  <sweep> <quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>
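
As a minimal sketch of how a program might read this document, the following uses Python's standard xml.etree.ElementTree module. The variable names are illustrative assumptions and are not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The sweepjoke document from the slide
SWEEPJOKE = """<?xml version="1.0"?>
<sweepjoke>
<harry>Say <quote>Bye Bye </quote>, Sweep </harry>
<sweep> <quote>Bye Bye, Sweep</quote></sweep>
<laughter/>
</sweepjoke>"""

root = ET.fromstring(SWEEPJOKE)

# Names of the elements directly inside <sweepjoke>, in document order
child_tags = [child.tag for child in root]

# Text of every <quote> element, however deeply it is nested
quotes = [quote.text for quote in root.iter("quote")]

print(root.tag, child_tags, quotes)
```

Because the tags describe the content rather than its appearance, the program can pick out the quotes without knowing how the joke will be displayed.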

XML Markup

Elements

Entity references

Comments

Processing Instructions

Marked sections

Document type declarations (DTD)

Elements

Commonest form of markup

Delimited by angle brackets (<, >)

May be empty but normally consist of a start tag and an end tag

Start tag may contain attributes

– <a href="www.scit.wlv.ac.uk">

Entity References

In XML (and HTML) certain characters are reserved e.g. <

Entity references are used to insert these into documents

Entity references begin with an ampersand (&) and end with a semicolon (;)

You can define your own entities

Can be used to insert Unicode characters
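
The effect of entity references can be sketched in Python with the standard library. The sample strings are assumptions made for illustration; `&#169;` is a numeric character reference for the copyright sign.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser replaces each reference with the character it stands for
element = ET.fromstring("<test>if p &lt; q &amp;&amp; r &#169;</test>")
decoded = element.text

# Writing: escape() replaces reserved characters with entity references
encoded = escape("if p < q && r")

print(decoded)
print(encoded)
```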

Comments

Begin with <!--

End with -->

Can contain any data except --

XML processors are not required to pass comments to an application

Processing Instructions (PIs)

Provide information to an application

XML processors are required to pass them on

Have the form <?name pidata?>

The name (PI target) identifies the PI

Data is optional and meaningful to an application that recognises the target
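
One way to see a processing instruction being handed to an application is Python's standard SAX interface, sketched below. The `Collector` class, the `app-hint` target and its data are hypothetical names invented for this illustration. A plain content handler like this one receives the processing instruction but is not given the comment.

```python
import xml.sax


class Collector(xml.sax.ContentHandler):
    """Hypothetical application that records what the parser passes on."""

    def __init__(self):
        super().__init__()
        self.instructions = []
        self.text = []

    def processingInstruction(self, target, data):
        # Called once per processing instruction (not for the XML declaration)
        self.instructions.append((target, data))

    def characters(self, content):
        # Called for character data inside elements
        self.text.append(content)


DOCUMENT = b"""<?xml version="1.0"?>
<?app-hint mode=fast?>
<note><!-- hidden remark -->visible text</note>"""

collector = Collector()
xml.sax.parseString(DOCUMENT, collector)

print(collector.instructions)
print("".join(collector.text))
```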

Marked Sections

Parsers do not interpret markup inside CDATA sections

<![CDATA[
<head>if p < &lt;</head>
]]>

The only character string not allowed is ]]>

Data is passed on to the application
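
A short Python sketch shows what an application receives from a CDATA section. The wrapping `<example>` element is an assumption added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The CDATA section from the slide, wrapped in a hypothetical root element
DOCUMENT = "<example><![CDATA[<head>if p < &lt;</head>]]></example>"

element = ET.fromstring(DOCUMENT)

# The content arrives as plain text: no <head> element is created
# and the &lt; reference is not expanded
content = element.text
child_count = len(element)

print(content, child_count)
```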

Document Type Declarations

Optional in XML (not in SGML)

Specify constraints on the sequence and nesting of tags

Communicate meta-information about content to the parser
– Sequence and nesting of tags, attribute values, external files, entities

Kinds of Declaration

Element type declarations

Attribute list declarations

Entity declarations

Notation declarations

Element Type Declaration

<!ELEMENT sweepjoke (harry+, sweep, laughter?)>

A sweepjoke consists of a harry element followed by a sweep element and a laughter element

The harry element may be repeated (+)
– + indicates one or more

The laughter element is optional (?)
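
The content model (harry+, sweep, laughter?) behaves much like a regular expression over the names of the child elements. The sketch below makes that analogy concrete in Python. It is not a validating parser: the regular expression is a hand-written translation of the declaration, and `matches_sweepjoke_model` is a hypothetical helper that checks only the order of the children.

```python
import re
import xml.etree.ElementTree as ET

# (harry+, sweep, laughter?) rewritten as a regex over space-terminated tag names
SWEEPJOKE_MODEL = re.compile(r"(harry )+sweep (laughter )?")


def matches_sweepjoke_model(xml_text: str) -> bool:
    """Return True if the children of <sweepjoke> follow the content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "sweepjoke":
        return False
    child_names = "".join(child.tag + " " for child in root)
    return SWEEPJOKE_MODEL.fullmatch(child_names) is not None


print(matches_sweepjoke_model(
    "<sweepjoke><harry>a</harry><sweep>b</sweep><laughter/></sweepjoke>"))
print(matches_sweepjoke_model("<sweepjoke><sweep>b</sweep></sweepjoke>"))
```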

Sweepjoke Declaration

<!ELEMENT sweepjoke (harry+, sweep, laughter?)>

<!ELEMENT harry (#PCDATA | quote)*>

<!ELEMENT sweep (#PCDATA | quote)*>

<!ELEMENT quote (#PCDATA)*>

<!ELEMENT laughter EMPTY>

PCDATA indicates parsed character data

| indicates 'or'

* indicates 'zero or more'

Attribute List Declaration

Identifies
– which elements may have attributes
– what attributes they may have
– what values are permitted for an attribute
– what value is the default

<!ATTLIST sweepjoke

name ID #REQUIRED

label CDATA #IMPLIED

status ( funny | notfunny ) 'funny'>
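
The default value can be observed from a program. The sketch below assumes that the attribute list is placed in an inline DTD and that the parser applies declared defaults, as Python's expat-based xml.etree.ElementTree is expected to do for inline declarations. The simplified element declaration and the sample document are assumptions. This parser does not validate, so #REQUIRED and the list of permitted values are not enforced here.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE sweepjoke [
<!ELEMENT sweepjoke (#PCDATA)>
<!ATTLIST sweepjoke
  name   ID                   #REQUIRED
  label  CDATA                #IMPLIED
  status ( funny | notfunny ) 'funny'>
]>
<sweepjoke name="joke1">Bye Bye</sweepjoke>"""

root = ET.fromstring(DOCUMENT)

# status was not written in the document, so its declared default is supplied;
# label is #IMPLIED, so it is simply absent
attributes = dict(root.attrib)

print(attributes)
```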

Entity Declarations

Allow a name to be associated with some other content

Internal entities associate a name with a string of literal text (e.g. &lt;)

External entities associate a name with the content of another file

Parameter entities enable text replacement within the DTD
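
An internal entity can be sketched as follows, using Python's standard xml.etree.ElementTree module. The entity name `catchphrase` and the sample document are assumptions made for illustration. External entities are not shown because parsers often decline to fetch them.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE sweepjoke [
<!ENTITY catchphrase "Bye Bye">
]>
<sweepjoke>Say &catchphrase;, Sweep</sweepjoke>"""

root = ET.fromstring(DOCUMENT)

# The parser replaces &catchphrase; with the declared literal text
expanded = root.text

print(expanded)
```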

Adding a DTD to an XML File

Inline

External
– <?xml version="1.0"?>
– <!DOCTYPE sweepjoke SYSTEM "sweep.dtd">

Links in XML

HTML anchors are a very limited form of hypertext

XML introduces
– XPointers
– XLinks

These standards are outside the scope of the XML standard

Presentation Issues

Use of a stylesheet is implicit

Possible standards:
– DSSSL Document Style Semantics and Specification Language (ISO/IEC 10179)
– CSS Cascading Style Sheets
– XSL Extensible Stylesheet Language (uses XML syntax)

XSL

XSL is an XML stylesheet language
– XSLT is a language for transforming XML documents
– XSL formatting objects specify formatting semantics

A set of rules to transform a document

XML can be transformed into HTML
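
The idea of transforming a document with a set of rules can be imitated in a few lines of Python. This is not XSLT (Python's standard library has no XSLT processor); it is only a sketch of the same principle. The rule table and the choice of HTML elements are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: source element name -> HTML element name
RULES = {
    "sweepjoke": "div",
    "harry": "p",
    "sweep": "p",
    "quote": "q",
    "laughter": "hr",
}


def transform(node):
    """Rebuild the tree, renaming each element according to RULES."""
    out = ET.Element(RULES.get(node.tag, "span"))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(transform(child))
    return out


def to_html(xml_text):
    """Parse XML text, apply the rules and serialise the result as HTML."""
    html_root = transform(ET.fromstring(xml_text))
    return ET.tostring(html_root, encoding="unicode", method="html")


print(to_html(
    "<sweepjoke><harry>Say <quote>Bye</quote>, Sweep</harry><laughter/></sweepjoke>"))
```

A real XSLT stylesheet expresses rules like these as templates written in XML syntax, which is what lets the same source document be given several different presentations.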

XML Application Areas

Mediation between heterogeneous databases on the Web

Client-centric web applications

Applications requiring different views of the same data

Information discovery tailored to the needs of differing individuals

Languages based on XML

MathML

SMIL

RDF

XHTML

CML

RDF

Resource Description Framework

Integrates a variety of web-based metadata activities

Provides interoperability between applications that exchange metadata

Allows machine-readable description of Web resources

RDF Example

<?xml version="1.0"?>
<?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix = "RDF" ?>
<?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?>
<RDF:RDF>
  <RDF:Description RDF:HREF = "http://uri-of-Document-1">
    <DC:Creator>John Smith</DC:Creator>
  </RDF:Description>
</RDF:RDF>

XHTML

New Web languages are defined using XML

HTML 4.0 cannot be defined using XML

XHTML is XML-compliant HTML

Major Changes

Documents must be well-formed

Elements and attributes must have lower case names

End tags required in non-empty elements

Attribute values must be in quotes

Empty tags must be terminated

Script contents are parsed as XML, so < and & must be escaped or placed in a CDATA section
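
Several of these rules are ordinary XML well-formedness rules, so any XML parser can check them. A minimal sketch in Python follows; `is_well_formed` is a hypothetical helper, and it tests well-formedness only, not validity against the XHTML DTD or the lower case naming rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>one<br/>two</p>"))       # empty tag terminated
print(is_well_formed("<p>one<br>two</p>"))        # <br> never closed
print(is_well_formed("<p class=note>text</p>"))   # attribute value not quoted
```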

XHTML Compatibility

Current browsers unlikely to understand all XHTML

E.g. <br/> may cause an error

Compatibility guidelines defined in XHTML standard

See http://www.w3.org/TR/xhtml1/ Appendix C

Summary

XML significantly expands what is possible on the Web

XML preserves the basic Web ideas

Using XML is an order of magnitude more difficult than writing HTML

Software is out there and more will soon follow

The opportunities are endless!