XML What is XML? XML v.s. HTML XML Components Well-formed and Valid Document Type Definition (DTD) Extensible Style Language (XSL) SAX and DOM


XML

What is XML?

XML vs. HTML

XML Components

Well-formed and Valid

Document Type Definition (DTD)

Extensible Style Language (XSL)

SAX and DOM

What is XML?

Extensible Markup Language (XML) is a meta-language that describes the content of a document (self-describing data).

It derives from SGML and is interoperable with both HTML and SGML.

XML vs. HTML

Markup languages generally combine two distinct functions of representing text (a document): the ‘look’ and the ‘structure’.

HTML and XML have different goals. HTML was designed to display data and hence focuses on the ‘look’ of the data, while XML was designed to describe and carry data and hence focuses on ‘what the data is’.

XML vs. HTML

HTML is about displaying data and XML is about describing data.

HTML and XML are complementary to each other.

HTML explicitly defines a set of legal tags.

<TABLE>….</TABLE>

XML allows any tags to be used; you can create new tags.

<BOOK>….</BOOK>

XML Components

Prolog

Defines the XML version, entity definitions, and DOCTYPE

Components of the document

• Tags and attributes

• CDATA (character data)

• Entities

• Processing instructions

• Comments

XML Prolog

XML files always start with a prolog

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

The version attribute is required

The encoding attribute identifies the character set (default UTF-8)

The standalone attribute indicates whether an external document is referenced for the DTD or entity definitions

The prolog can contain entity and DTD definitions
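As a minimal sketch of how a parser reports the prolog attributes, the Python standard library's expat binding can be used. The helper name read_prolog and the sample <note> document are assumptions made for illustration only.

```python
import xml.parsers.expat

def read_prolog(xml_bytes):
    """Return the version, encoding and standalone values from the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no", -1 if omitted
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found

# Hypothetical sample document using the prolog shown on the slide
sample = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?><note>hi</note>'
info = read_prolog(sample)
print(info)
```

Note that expat does not fetch an external DTD here; it only reports what the declaration says.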

Prolog Example

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE authors [
<!ELEMENT authors (name)*>
<!ELEMENT name (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
]>
<authors>
<name><firstname>James</firstname><lastname>Gosling</lastname></name>
…
</authors>

XML DOCTYPE

Document Type Declarations

Specifies the location of the DTD defining the syntax and structure of elements in the document

Common forms:

<!DOCTYPE root [DTD]>
<!DOCTYPE root SYSTEM URL>
<!DOCTYPE root PUBLIC FPI-identifier URL>

The root identifies the starting element (root element) of the document

The DTD can be external to the XML document, referenced by a SYSTEM or PUBLIC URL

• PUBLIC URL refers to a DTD intended for public use

• SYSTEM URL refers to a private DTD (located on the local file system or an HTTP server)

DOCTYPE Examples

<!DOCTYPE book SYSTEM "book.dtd">

book must be the root element

DTD located in the same directory as the XML document

<!DOCTYPE book SYSTEM "http://vishnu.cs.lamar.edu/~jingw/book.dtd">

DTD located on the HTTP server vishnu.cs.lamar.edu

XML DOCTYPE

Specifying a PUBLIC DTD

<!DOCTYPE root PUBLIC FPI-identifier URL>

The Formal Public Identifier (FPI) has four parts:

1. Connection of the DTD to a formal standard

- if you are defining the DTD yourself

+ if a nonstandards body has approved the DTD

ISO if a formal standards committee has approved the DTD

2. Group responsible for the DTD

3. Description and type of document

4. Language used in the DTD

PUBLIC DOCTYPE Example

<!DOCTYPE Book
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE CWP
PUBLIC "-//Prentice Hall//DTD Core Series 1.0//EN"
"http://www.prenticehall.com/DTD/Core.dtd">
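A small sketch of how the four FPI parts can be read back from a parsed document, using Python's xml.dom.minidom. The identifier "-//Example//DTD Book 1.0//EN" and the URL under example.org are made up for illustration; minidom records the DOCTYPE but does not download the DTD.

```python
from xml.dom import minidom

# Hypothetical document with a PUBLIC DOCTYPE (identifier and URL are invented)
source = (
    '<!DOCTYPE book PUBLIC "-//Example//DTD Book 1.0//EN" '
    '"http://example.org/book.dtd">'
    '<book/>'
)

doctype = minidom.parseString(source).doctype

root_name = doctype.name        # the declared root element
public_id = doctype.publicId    # the FPI
system_id = doctype.systemId    # the URL of the DTD

# The four FPI parts are separated by "//"
standard, owner, description, language = public_id.split("//")
print(root_name, standard, owner, description, language, system_id)
```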

XML Root Element

Required for XML-aware applications to recognize the beginning and end of the document. It is the first element, and all other elements must be nested within it.

Example:

<?xml version="1.0"?>
<book>
<title>123</title>
…
</book>

XML Tags

Tag names:

• Case sensitive

• Start with a letter or underscore

• After the first character, numbers, - and . are allowed

• Cannot contain whitespace

• Avoid use of the colon except for indicating namespaces

Tags can have attributes

<message to="[email protected]" from="[email protected]">
<priority/>
<text> what did you do ?</text>
</message>

All XML elements must have close tags.

Document CDATA

CDATA (character data) sections are not parsed

<?xml version="1.0" encoding="UTF-8"?>
<server>
<port status="accept">
<![CDATA[8001 <= port < 9000]]>
</port>
</server>
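To illustrate that a CDATA section reaches the application as plain text, here is a short sketch with Python's xml.etree.ElementTree. The document is a compact variant of the server example; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Compact version of the server example, with the CDATA section closed by "]]>"
with_cdata = '<server><port status="accept"><![CDATA[8001 <= port < 9000]]></port></server>'
port = ET.fromstring(with_cdata).find("port")
port_text = port.text          # the raw characters inside the CDATA section
status = port.get("status")    # the attribute is read as usual

# Without CDATA the same text is not well-formed, because "<" starts markup
try:
    ET.fromstring("<port>8001 <= port < 9000</port>")
    unprotected_ok = True
except ET.ParseError:
    unprotected_ok = False

print(port_text, status, unprotected_ok)
```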

Document Entities

Entities refer to a data item, typically text

General entity references start with & and end with ;

The entity reference is replaced by its true value when parsed

The characters < > & " ' require entity references to avoid conflicts with the XML application

&lt; &gt; &amp; &quot; &apos;

Entities are user definable

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE book [
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
<!ENTITY copyright "2001, Prentice Hall">
]>
<book>
<title>web programming, &copyright;</title>
</book>
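The replacement of entity references can be observed with a short Python sketch. It assumes the standard-library ElementTree parser, which expands entities declared in the internal DTD subset, and uses xml.sax.saxutils.escape for the predefined entities. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The book example with a user-defined entity in the internal DTD subset
source = (
    '<?xml version="1.0" standalone="yes"?>'
    '<!DOCTYPE book ['
    '<!ELEMENT book (title)>'
    '<!ELEMENT title (#PCDATA)>'
    '<!ENTITY copyright "2001, Prentice Hall">'
    ']>'
    '<book><title>web programming, &copyright;</title></book>'
)

# The parser substitutes the entity's replacement text while parsing
title = ET.fromstring(source).findtext("title")

# escape() turns the reserved characters &, < and > into entity references
escaped = escape("1 < 2 & 3 > 2")

print(title)
print(escaped)
```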

Processing Instructions

Application-specific instructions to the XML processor

<?processor-instruction?>

Example

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="orders.xsl"?>
<orders>
<order>
<count>37</count>
<price>49.99</price>
<book>
<isbn>0130896789</isbn>
<author>Marty Hall</author>
</book>
</order>
</orders>
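A sketch of how an application can see processing instructions, here with Python's xml.dom.minidom. The orders document is shortened, and the list name instructions is only illustrative.

```python
from xml.dom import minidom, Node

# Shortened version of the orders example
source = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xml" href="orders.xsl"?>'
    '<orders><order><count>37</count></order></orders>'
)

document = minidom.parseString(source)

# Processing instructions before the root element are children of the document node
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

for target, data in instructions:
    print(target, "->", data)
```

The XML declaration itself is not reported as a processing instruction, so only xml-stylesheet appears in the list.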

XML Comments

Comments are the same as HTML comments

<!-- This is an xml and html comment -->

Well-formed versus Valid

An XML document is well-formed if it follows the basic syntax rules.

An XML document is valid if its structure matches a Document Type Definition (DTD) and it is well-formed.
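A minimal sketch of a well-formedness check, assuming Python's xml.etree.ElementTree. The function name is_well_formed is hypothetical. This parser does not validate against a DTD, so it can only answer the first of the two questions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

good = "<book><title>123</title></book>"
mismatched = "<book><title>123</tilte></book>"   # closing tag is misspelled
overlapping = "<a><b></a></b>"                   # elements are not properly nested

print(is_well_formed(good), is_well_formed(mismatched), is_well_formed(overlapping))
```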

Document Type Definition (DTD)

Defines Structure of the Document

• Allowable tags and their attributes

• Attribute value constraints

• Nesting of tags

• Number of occurrences for tags

• Entity definitions

DTD Example

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>

<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>

Defining Elements

<!ELEMENT name definition/type>

<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>

Types

ANY Any well-formed xml data

EMPTY Element cannot contain any text or child elements

PCDATA Character data only (should not contain markup)

Elements List of legal child elements (no character data)

Mixed May contain character data and/or child elements (cannot constrain order and number of child elements)

Defining Elements

Cardinality

[none] Default (one and only one instance)

? 0,1

* 0,1,…, n

+ 1,2,…, n

List Operators

, Sequence (in order)

<!ELEMENT book (title,price,author)>

| Choice (one of several)

<!ELEMENT classroom (teacher | student)>
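The cardinality and list operators behave much like a regular expression over the sequence of child element names. The Python sketch below mimics that idea with hand-translated patterns. It is not a DTD validator; the helper children_match and the pattern constants are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

def children_match(element, pattern):
    """Check the ordered child tag names against a regex over 'name ' tokens."""
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(pattern, sequence) is not None

# Content models translated by hand (hypothetical, not parsed from a DTD)
BOOK = r"title price author "                 # (title,price,author)
CLASSROOM = r"(teacher |student )"            # (teacher | student)
PROGRAMSLOT = r"TIME TITLE (DESCRIPTION )?"   # (TIME, TITLE, DESCRIPTION?)

ordered = ET.fromstring("<book><title/><price/><author/></book>")
shuffled = ET.fromstring("<book><price/><title/><author/></book>")
both = ET.fromstring("<classroom><teacher/><student/></classroom>")
slot = ET.fromstring("<PROGRAMSLOT><TIME/><TITLE/></PROGRAMSLOT>")

print(children_match(ordered, BOOK))       # sequence in the declared order
print(children_match(shuffled, BOOK))      # same children, wrong order
print(children_match(both, CLASSROOM))     # a choice allows only one alternative
print(children_match(slot, PROGRAMSLOT))   # the optional child may be absent
```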

Defining Attributes

<!ATTLIST element attrName type modifier>

Example

<!ELEMENT Customer (#PCDATA)>
<!ATTLIST Customer id CDATA #IMPLIED>

<!ELEMENT Product (#PCDATA)>
<!ATTLIST Product
cost CDATA #FIXED "200"
id CDATA #REQUIRED>

Attribute Type

CDATA

Essentially anything; simply unparsed data

<!ATTLIST Customer id CDATA #IMPLIED>

Enumeration

attribute (value1|value2|value3) [Modifier]

Eight other attribute types

ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION

Attribute Modifiers

#IMPLIED

Attribute is not required

<!ATTLIST Customer id CDATA #IMPLIED>

#REQUIRED

Attribute must be present

<!ATTLIST Customer id CDATA #REQUIRED>

#FIXED "value"

Attribute is present and always has this value

<!ATTLIST Product cost CDATA #FIXED "200">

Default value (applies to enumeration)

<!ATTLIST car color (red|white|blue) "white">
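The effect of the modifiers can be sketched as a small hand-written check in Python. The rule table PRODUCT_RULES mirrors the Product declarations shown earlier, and the function attribute_errors is hypothetical; a validating parser would derive such rules from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: attribute name -> (modifier, fixed value)
PRODUCT_RULES = {
    "id": ("#REQUIRED", None),
    "cost": ("#FIXED", "200"),
}

def attribute_errors(element, rules):
    """List the ways an element's attributes break the given modifier rules."""
    errors = []
    for name, (modifier, fixed_value) in rules.items():
        actual = element.get(name)
        if modifier == "#REQUIRED" and actual is None:
            errors.append(name + " is required")
        elif modifier == "#FIXED" and actual is not None and actual != fixed_value:
            # an omitted #FIXED attribute is fine: the processor supplies the value
            errors.append(name + " must be " + fixed_value)
    return errors

valid_product = ET.fromstring('<Product id="p1" cost="200">Book</Product>')
invalid_product = ET.fromstring('<Product cost="150">Book</Product>')

print(attribute_errors(valid_product, PRODUCT_RULES))
print(attribute_errors(invalid_product, PRODUCT_RULES))
```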

Defining Entities

Specify entity reference resolution in a DTD using the ENTITY keyword.

<!ENTITY name "replacement">

<!ENTITY copyright "Copyright 2001">

Limitations of DTDs

DTD itself is not in XML format – more work for parsers

Does not express data types (weak data typing)

No namespace support

Document can override external DTD definitions

No DOM support

XML Schema is intended to resolve these issues but … DTDs are going to be around for a while

Namespace

Namespaces identify collections of element type declarations so that they do not conflict with other element type declarations with the same name created by other programmers

The xml prefix is predefined; xsl is the conventional prefix for XSL but, like any other prefix, must be declared in the document.

You can create your own namespaces

Example:

<subject>English</subject>
<subject>Thrombosis</subject>

can be differentiated by using namespaces, as in

<school:subject>English</school:subject>
<medical:subject>Thrombosis</medical:subject>
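A sketch of how a namespace-aware parser tells the two subject elements apart, using Python's xml.etree.ElementTree. The wrapping catalog element and the namespace URIs under example.org are invented for illustration, since prefixes must be bound to a URI before they can be used.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element that declares both prefixes
source = (
    '<catalog xmlns:school="http://example.org/school" '
    'xmlns:medical="http://example.org/medical">'
    '<school:subject>English</school:subject>'
    '<medical:subject>Thrombosis</medical:subject>'
    '</catalog>'
)

root = ET.fromstring(source)

# Map prefixes to the same URIs so that searches can use the short form
prefixes = {
    "school": "http://example.org/school",
    "medical": "http://example.org/medical",
}

school_subjects = [e.text for e in root.findall("school:subject", prefixes)]
medical_subjects = [e.text for e in root.findall("medical:subject", prefixes)]

# Internally each tag is stored with its full namespace URI
first_tag = root[0].tag

print(school_subjects, medical_subjects, first_tag)
```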

XSL - Extensible Style Language

• Defines the layout of an XML document: an XSL style sheet provides the rules for displaying an XML document.

• XSLT is XSL Transformations.

• XML -> XSLT -> HTML

• In the XML document include:

<?xml-stylesheet type="text/xsl" href="myXSL.xsl"?>

XSL Example

<?xml version="1.0" encoding="big5"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
........ HTML....
</xsl:template>
</xsl:stylesheet>

What is the SAX?

SAX is the Simple API for XML, originally a Java-only API. SAX was the first widely adopted API for XML in Java, and is a “de facto” standard.

SAX is an event-based API. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.
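The event-based style can be sketched with Python's xml.sax module, which follows the SAX handler model. The class name EventLog and the tiny book document are illustrative assumptions.

```python
import xml.sax

class EventLog(xml.sax.ContentHandler):
    """Record each SAX callback as it arrives from the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # text may arrive in several chunks; whitespace-only chunks are skipped
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLog()
xml.sax.parseString(b"<book><title>123</title></book>", handler)

for event in handler.events:
    print(event)
```

The parser never builds a tree: the application keeps only what its handler chooses to record.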

What is the Document Object Model (DOM)?

DOM is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.

Provides APIs that let you create nodes, modify them, delete and rearrange them, so it is relatively easy to build a DOM tree.

Is a W3C-recommended tree-based API for XML and HTML documents.
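A short sketch of creating, modifying and deleting nodes through the DOM interface, here with Python's xml.dom.minidom. The sample book document and the variable names are assumptions for illustration.

```python
from xml.dom import minidom

document = minidom.parseString("<book><title>123</title></book>")
book = document.documentElement

# Create a new element with a text child and attach it to the tree
author = document.createElement("author")
author.appendChild(document.createTextNode("Marty Hall"))
book.appendChild(author)

# Modify the root element by adding an attribute
book.setAttribute("isbn", "0130896789")

# Delete an existing node
title = book.getElementsByTagName("title")[0]
book.removeChild(title)

# Serialize the in-memory tree back to markup
result = book.toxml()
print(result)
```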

DOM/SAX Processing

DOM is a standard. It yields a tree in memory.

SAX yields a sequence of events corresponding to XML input.

Both generally destroy attribute ordering, insignificant white space, insignificant namespace aspects, …

Verification of a signature based on DOM/SAX requires serialization to a byte stream of the DOM tree or the SAX event stream.

Summary

XML is a meta-language for self-describing data

DOCTYPE defines the root element and location of DTD

Document Type Definition(DTD) defines the grammar of the document

Required to validate the document

Constrains grouping and cardinality of elements

XSL is defined as a language for expressing stylesheets

Is a language for transforming XML documents

Is an XML vocabulary for specifying the formatting of XML documents

DOM and SAX are the two most common low-level APIs; both are standardized in some form (SAX as a de facto standard, DOM by the W3C)

XML Resources

•XML 1.0 Specification http://www.w3.org/TR/REC-xml

•WWW consortium’s Home Page on XML http://www.w3.org/XML/

•Sun Page on XML and Java http://java.sun.com/xml/

•Apache XML Project http://xml.apache.org/

•XML Resource Collection http://xml.coverpages.org/

•O’Reilly XML Resource Center http://www.xml.com/