CP3024 Lecture 9
XML: Extensible Markup Language
What is a markup language?
Textual (i.e. person readable) language where significant elements are indicated by markers
– <TITLE>XML</TITLE>
Examples are RTF, HTML, VRML, TeX etc.
Easy to process and can be manipulated by a variety of application programs
What does the Web use?
HTML– Hypertext Markup Language
Defined as the original Web language
Based on SGML (see later)
Suited for hypertext, multimedia, small simple documents
Currently at version 4.01 (the last?)
Why change? - 1
Change in Web usage
– no longer a mechanism for exchanging scientific papers
– presentational aspects are now seen as of greater importance
– extracting the meaning of a document using a program will be a new growth area
HTML can't grow much more!
Why change? - 2
Extensibility
– HTML does not allow users to specify their own tags
Structure
– HTML cannot represent database schemas or object-oriented hierarchies
Validation
– HTML does not allow applications to check that the structure of data is valid
What is SGML?
Standard Generalised Markup Language
ISO 8879
Can define any document format of any complexity
Enables extensibility, structure and validation
Too many optional features for the Web
What is XML?
Simplified subset of SGML designed for Web applications
Differs from HTML
– Can define new tags
– Structures may be nested to any level of complexity
– XML documents may define a grammar which enables structural validation of that document
Where has XML come from?
Emanates from the World Wide Web Consortium (W3C)
Developed by XML working group chaired by Jon Bosak (Sun Microsystems)
Group includes representatives from Microsoft, Netscape, HP, Adobe, etc.
Last bastion against proprietary markup and Web fragmentation
Design Goals for XML - 1
XML shall be straightforwardly usable over the Internet
XML shall support a wide variety of applications
XML shall be compatible with SGML
It shall be easy to write programs which process XML documents
The number of optional features is to be kept to the absolute minimum
Design Goals for XML - 2
XML documents should be human-legible
The XML design should be prepared quickly
The design of XML shall be formal and concise
XML documents shall be easy to create
Terseness in XML markup is of minimum importance
The XML View of a Document
Taken from an example given by Jon Bosak
Structured Publishing
Taken from an example given by Jon Bosak
XML Example
<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye </quote>, Sweep </harry>
  <sweep> <quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>
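The nesting in this example can be inspected with a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `outline` is an illustrative assumption, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The sweepjoke document from the slide, laid out one element per line.
DOC = """<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye </quote>, Sweep </harry>
  <sweep> <quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The printed outline mirrors the element hierarchy: `sweepjoke` at the top, with `harry`, `sweep` and `laughter` as children and a `quote` nested inside each speaker.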
XML Markup
Elements
Entity references
Comments
Processing Instructions
Marked sections
Document type declarations (DTD)
Elements
Commonest form of markup
Delimited by angle brackets (<, >)
May be empty but normally consist of start tag and end tag
Start tag may contain attributes
– <a href="www.scit.wlv.ac.uk">
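As a rough illustration of these points, the sketch below reads an attribute from a start tag and inspects an empty element, again with Python's `xml.etree.ElementTree`. The link text `SCIT` is an assumption added so that the element has content.

```python
import xml.etree.ElementTree as ET

# A non-empty element: start tag (with an attribute), content, end tag.
# The link text "SCIT" is made up for this sketch.
link = ET.fromstring('<a href="www.scit.wlv.ac.uk">SCIT</a>')
href = link.get("href")

# An empty element has no text and no children.
empty = ET.fromstring("<laughter/>")
empty_has_content = bool(empty.text) or len(empty) > 0

print(href, link.text, empty_has_content)
```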
Entity References
In XML (and HTML) certain characters are reserved e.g. <
Entity references are used to insert these into documents
Entity references begin with an ampersand (&) and end with a semicolon (;)
You can define your own entities
Can be used to insert Unicode characters
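A short sketch of both uses, assuming Python's standard library: `xml.sax.saxutils.escape` replaces the reserved characters with the predefined entity references, and numeric character references insert Unicode characters. The sample strings and element names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reserved characters must be written as entity references in content.
raw = "if p < q & q > r"
escaped = escape(raw)  # replaces &, < and > with &amp;, &lt; and &gt;

# The parser turns the references back into the original characters.
element = ET.fromstring("<test>" + escaped + "</test>")

# Character references (decimal or hexadecimal) insert Unicode characters.
price = ET.fromstring("<price>&#163;5 or &#xA3;5</price>").text

print(escaped)
print(element.text)
print(price)
```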
Comments
Begin with <!--
End with -->
Can contain any data except --
XML processors are not required to pass comments to an application
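Python's `xml.etree.ElementTree` happens to illustrate the last point: by default it drops comments, and it only passes them on when asked. This is a sketch assuming Python 3.8 or later, where `TreeBuilder` accepts `insert_comments`; the comment text is invented.

```python
import xml.etree.ElementTree as ET

SOURCE = "<sweepjoke><!-- audience groans --><laughter/></sweepjoke>"

# Default behaviour: the comment never reaches the application.
plain = ET.fromstring(SOURCE)
plain_children = [child.tag for child in plain]

# Opt in: ask the tree builder to keep comments as nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(SOURCE, parser=parser)
comment_texts = [node.text for node in kept if node.tag is ET.Comment]

print(plain_children)
print(comment_texts)
```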
Processing Instructions (PIs)
Provide information to an application
XML processors required to pass them on
Have the form <?name pidata?>
The name (PI target) identifies the PI
Data is optional and meaningful to an application that recognises the target
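The target and data of a PI can be read back with a DOM parser. A minimal sketch with Python's `xml.dom.minidom`; the `xml-stylesheet` target is a real W3C convention, while the file name `sweep.css` is hypothetical.

```python
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="sweep.css" type="text/css"?>'
    "<sweepjoke/>"
)

document = minidom.parseString(SOURCE)

# Collect (target, data) for every processing instruction at the top level.
# The XML declaration on the first line is not reported as a PI.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```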
Marked Sections
Parsers do not interpret markup inside CDATA sections
<![CDATA[
<head>if p < <</head>
]]>
The only character string not allowed is ]]>
Data is passed on to the application
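A quick check of this behaviour, sketched with Python's `xml.etree.ElementTree`. The wrapping `example` element is an assumption; the CDATA content is the text from the slide.

```python
import xml.etree.ElementTree as ET

# The CDATA content contains "<" and tag-like text that would otherwise
# be treated as markup.
SOURCE = "<example><![CDATA[<head>if p < <</head>]]></example>"

element = ET.fromstring(SOURCE)

# The content arrives as plain character data, not as a child element.
cdata_text = element.text
child_count = len(element)

print(repr(cdata_text), child_count)
```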
Document Type Declarations
Optional in XML (not in SGML)
Specify constraints on the sequence and nesting of tags
Communicates meta-information to the parser about content
– sequence and nesting of tags, attribute values, external files, entities
Kinds of Declaration
Element type declarations
Attribute list declarations
Entity declarations
Notation declarations
Element Type Declaration
<!ELEMENT sweepjoke (harry+, sweep, laughter?)>
A sweepjoke consists of a harry element followed by a sweep element and a laughter element
The harry element may be repeated (+)
– + indicates one or more
The laughter element is optional (?)
Sweepjoke Declaration
<!ELEMENT sweepjoke (harry+, sweep, laughter?)>
<!ELEMENT harry (#PCDATA | quote)*>
<!ELEMENT sweep (#PCDATA | quote)*>
<!ELEMENT quote (#PCDATA)*>
<!ELEMENT laughter EMPTY>
#PCDATA indicates parsed character data
| indicates 'or'
* indicates 'zero or more'
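The occurrence indicators behave much like regular-expression operators. Python's standard library does not include a validating parser, so the sketch below hand-translates the sweepjoke content model `(harry+, sweep, laughter?)` into a regular expression over the child element names. The function name is hypothetical, and the check ignores the declarations for the child elements themselves.

```python
import re
import xml.etree.ElementTree as ET

# (harry+, sweep, laughter?) as a pattern over space-terminated child names:
# one or more harry, then exactly one sweep, then an optional laughter.
SWEEPJOKE_MODEL = re.compile(r"(harry )+sweep (laughter )?")


def matches_sweepjoke_model(xml_text):
    """Check only the order and count of the children of <sweepjoke>."""
    root = ET.fromstring(xml_text)
    if root.tag != "sweepjoke":
        return False
    child_names = "".join(child.tag + " " for child in root)
    return SWEEPJOKE_MODEL.fullmatch(child_names) is not None


print(matches_sweepjoke_model("<sweepjoke><harry/><sweep/><laughter/></sweepjoke>"))
print(matches_sweepjoke_model("<sweepjoke><sweep/></sweepjoke>"))
```

A real validating parser would derive this check from the DTD rather than from a hand-written pattern.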
Attribute List Declaration
Identifies
– which elements may have attributes
– what attributes they may have
– what values are permitted for an attribute
– what value is the default
<!ATTLIST sweepjoke
name ID #REQUIRED
label CDATA #IMPLIED
status ( funny | notfunny ) 'funny'>
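To make the three declarations concrete, here is a hand-written sketch of the checks they imply. It is not a DTD processor: the function name is hypothetical, and the ID rules (a valid XML name, unique within the document) are not checked.

```python
import xml.etree.ElementTree as ET

STATUS_VALUES = {"funny", "notfunny"}  # the enumerated type for status


def sweepjoke_attributes(xml_text):
    """Return the effective attributes of <sweepjoke>, or raise ValueError."""
    root = ET.fromstring(xml_text)
    attributes = dict(root.attrib)
    # name ID #REQUIRED: the attribute must be present.
    if "name" not in attributes:
        raise ValueError("missing required attribute: name")
    # status ( funny | notfunny ) 'funny': apply the default when absent.
    attributes.setdefault("status", "funny")
    if attributes["status"] not in STATUS_VALUES:
        raise ValueError("status must be funny or notfunny")
    # label CDATA #IMPLIED: optional, any string, no default.
    return attributes


print(sweepjoke_attributes('<sweepjoke name="j1"/>'))
```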
Entity Declarations
Allow a name to be associated with some other content
Internal entities associate a name with a string of literal text (e.g. &lt; stands for <)
External entities associate a name with the content of another file
Parameter entities enable text replacement within the DTD
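An internal entity can be declared in the document's own DTD subset and then referenced in the content. A minimal sketch with Python's `xml.etree.ElementTree`; the entity name `farewell` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The internal subset declares one internal entity, then the content uses it.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE sweepjoke [
  <!ENTITY farewell "Bye Bye">
]>
<sweepjoke><harry>Say &farewell;, Sweep</harry></sweepjoke>"""

# The parser replaces the reference with the declared text.
harry_text = ET.fromstring(SOURCE).find("harry").text

print(harry_text)
```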
Adding a DTD to an XML File
Inline
External
– <?xml version="1.0"?>
– <!DOCTYPE sweepjoke SYSTEM "sweep.dtd">
Links in XML
HTML anchors are a very limited form of hypertext
XML introduces
– XPointers
– XLinks
These standards are outside the scope of the XML standard
Presentation Issues
Use of a stylesheet is implicit
Possible standards:
– DSSSL Document Style Semantics and Specification Language (ISO/IEC 10179)
– CSS Cascading Style Sheets
– XSL Extensible Stylesheet Language (uses XML syntax)
XSL
XSL is an XML stylesheet language
– XSLT is a language for transforming XML documents
– XSL formatting objects specify formatting semantics
A set of rules to transform a document
XML can be transformed into HTML
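The idea of a rule set that maps XML elements to HTML can be sketched in plain Python. This is not XSLT, and Python's standard library has no XSLT processor; the rule table and the HTML chosen for each element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rules: each source element name maps to opening and
# closing HTML text that wraps the transformed content.
RULES = {
    "sweepjoke": ("<div>", "</div>"),
    "harry": ("<p>Harry: ", "</p>"),
    "sweep": ("<p>Sweep: ", "</p>"),
    "quote": ("<q>", "</q>"),
    "laughter": ("<p><em>(laughter)</em>", "</p>"),
}


def to_html(element):
    """Apply the rule for this element, then recurse into its children."""
    opening, closing = RULES[element.tag]
    parts = [opening, escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(closing)
    return "".join(parts)


source = "<sweepjoke><harry>Say <quote>Bye Bye</quote>, Sweep</harry><laughter/></sweepjoke>"
html = to_html(ET.fromstring(source))
print(html)
```

An XSLT stylesheet expresses the same kind of mapping declaratively, as template rules written in XML syntax.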
XML Application Areas
Mediation between heterogeneous databases on the Web
Client centric web applications
Applications requiring different views of the same data
Information discovery tailored to the needs of differing individuals
Languages based on XML
MathML
SMIL
RDF
XHTML
CML
RDF
Resource Description Framework
Integrates a variety of web-based metadata activities
Provides interoperability between applications that exchange metadata
Allows machine readable description of Web resources
RDF Example
<?xml version="1.0"?>
<?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix = "RDF" ?>
<?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?>
<RDF:RDF>
  <RDF:Description RDF:HREF = "http://uri-of-Document-1">
    <DC:Creator>John Smith</DC:Creator>
  </RDF:Description>
</RDF:RDF>
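The example above declares its namespaces with processing instructions, a draft-era syntax. As a sketch of how the same statement might look today, the code below swaps in `xmlns` attributes, the RDF and Dublin Core namespace URIs now in common use, and `rdf:about` in place of `RDF:HREF`. These substitutions are assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Namespace URIs substituted for the draft-era ones in the slide.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

SOURCE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://uri-of-Document-1">
    <dc:creator>John Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(SOURCE)

# ElementTree addresses namespaced names as "{namespace-uri}local-name".
description = root.find(f"{{{RDF_NS}}}Description")
resource = description.get(f"{{{RDF_NS}}}about")
creator = description.find(f"{{{DC_NS}}}creator").text

print(resource, "was created by", creator)
```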
XHTML
New Web languages are defined using XML
HTML 4.0 cannot be defined using XML
XHTML is XML compliant HTML
Major Changes
Documents must be well-formed
Elements and attributes must have lower case names
End tags required in non-empty elements
Attribute values must be in quotes
Empty tags must be terminated
Scripts will be processed by XHTML
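Several of these rules follow from well-formedness, which any XML parser can check. A minimal sketch with Python's `xml.etree.ElementTree`; the helper name and the sample fragments are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Terminated empty tag and quoted attribute value: accepted.
print(is_well_formed('<p class="intro">line one<br/>line two</p>'))
# HTML habits that XHTML disallows: rejected.
print(is_well_formed("<p>line one<br>line two</p>"))  # empty tag not terminated
print(is_well_formed("<p class=intro>text</p>"))      # attribute value not quoted
print(is_well_formed("<P>text</p>"))                  # case mismatch in tag names
```

This checks well-formedness only. The lower-case naming rule and the rest of the XHTML DTD need a validating parser.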
XHTML Compatibility
Current browsers unlikely to understand all XHTML
E.g. <br/> may cause an error
Compatibility guidelines defined in XHTML standard
See http://www.w3.org/TR/xhtml1/ Appendix C
Summary
XML significantly expands what is possible on the Web
XML preserves the basic Web ideas
Using XML is an order of magnitude more difficult than writing HTML
Software is out there and more will soon follow
The opportunities are endless!