Tutorial 1: XML

Creating an XML Document


Introducing XML

XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.

XML is used to create a wide variety of document types.

XML is a subset of SGML (Standard Generalized Markup Language).

XML sits between SGML and HTML in complexity.


The Limits of HTML

HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page.

Because HTML is not extensible, it cannot be modified to meet specific needs.

Different browsers require different standards, so the same page may not display consistently in each of them.


The XML Design Goals

XML must be easily usable over the Internet

– XML supports HTTP and MIME

XML must support a wide variety of applications

– XML can be used for other applications such as databases, financial transactions, and voice mail

XML must be compatible with SGML


It must be easy to write programs that process XML documents

The number of optional features in XML must be kept small

XML documents should be clear and easily understood

– XML documents are text files

– The contents follow a tree-like structure


The XML design should be prepared quickly

The design of XML must be exact and concise

XML documents must be easy to create

Terseness in XML markup is of minimal importance


DTDs and Schemas

Document type definitions (DTDs) and schemas contain rules for how the data in an XML document must be structured.

A well-formed document contains no syntax errors.

An XML document that satisfies the rules of a DTD or schema (in addition to being well-formed) is said to be a valid document.
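As a rough illustration of the well-formedness check, the sketch below uses Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample strings are hypothetical. Note that this parser only checks syntax; it does not validate a document against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without syntax errors."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<artist>Miles Davis</artist>"
bad = "<artist>Miles Davis</title>"  # closing tag does not match the opening tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first document is well-formed; the second is rejected because its opening and closing tags differ.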


Working with XML Applications

XML can be used to create new markup languages, called XML applications. Many have been developed to work with specific types of documents.

Each application uses a defined set of tag names called a vocabulary. This makes it easier to exchange information between different organizations and computer applications.

The Structure of an XML Document

XML documents consist of three parts

– The prolog is optional and provides information about the document itself

– The document body contains the document’s content in a hierarchical tree structure

– The epilog is optional and contains any final comments or processing instructions
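The three parts can be seen by listing the top-level nodes that a parser reports. This is a minimal sketch using Python's standard `xml.dom.minidom`; the sample document and the variable names are assumptions made for illustration. This parser does not report the XML declaration as a node, so only the comments and the root element appear.

```python
from xml.dom import minidom

# Hypothetical document: a prolog comment, the body, and an epilog comment.
document = """<?xml version="1.0" encoding="UTF-8"?>
<!-- prolog comment -->
<items>
  <item>Kind of Blue</item>
</items>
<!-- epilog comment -->"""

dom = minidom.parseString(document)

# Top-level children of the document, in document order.
top_level = [node.nodeName for node in dom.childNodes]
print(top_level)

# The document body is the single root element.
print(dom.documentElement.tagName)
```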


Creating the Prolog

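The prolog consists of four parts:

– XML declaration

– Miscellaneous statements or comments

– Processing instructions

– Document type declaration

The XML declaration, if present, must come first, or the parser will generate an error message. Comments and processing instructions may appear before or after the document type declaration.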

None of these four parts is required.
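One ordering rule that parsers do enforce is that the XML declaration, when present, must be the very first thing in the document. The sketch below checks this with Python's standard `xml.etree.ElementTree`; the helper `parses` and both sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Declaration first, then a comment, then the root element.
declaration_first = '<?xml version="1.0"?><!-- note --><items/>'

# Same parts, but the comment comes before the declaration.
comment_first = '<!-- note --><?xml version="1.0"?><items/>'

print(parses(declaration_first))
print(parses(comment_first))
```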


The XML Declaration

The XML declaration is the first line of code. It tells the processor that what follows is written using XML. It can also provide information about how the parser should interpret the code. The syntax is:

<?xml version="version number" encoding="encoding type" standalone="yes|no" ?>

An example:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


Inserting Comments

Comments may appear anywhere after the XML declaration, except inside a tag. The syntax is:

<!-- comment text -->


Working with Elements

The document body consists of the elements that are the basic building blocks of XML files.

XML supports two types of elements:

– closed elements, and

– empty elements


Empty Elements

An empty element or open element is an element that contains no content. The syntax is:

<element />

Empty elements can be used to mark sections of the document for the XML parser. They can also contain attributes that can be used to store information.


Working with Elements

A closed element has the following syntax:

<element>content</element>

An example:

<artist>Miles Davis</artist>

Element names are case sensitive.
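The difference between the two element types, and the effect of case sensitivity, can be checked with a short sketch using Python's standard `xml.etree.ElementTree`. The empty `track` element with a `length` attribute and the variable names are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# A closed element has a tag name and text content.
closed = ET.fromstring("<artist>Miles Davis</artist>")
print(closed.tag, closed.text)

# An empty element has no content, but it can still carry attributes.
empty = ET.fromstring('<track length="9:22" />')
print(empty.text, empty.attrib)

# Element names are case sensitive: <Artist> is not closed by </artist>.
try:
    ET.fromstring("<Artist>Miles Davis</artist>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)
```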


Nesting Elements

Elements can contain other elements:

<tracks>
   <track>So What (9:22)</track>
   <track>Blue in Green (5:37)</track>
</tracks>

Nested elements are called child elements. Elements must be nested correctly: child elements must be enclosed within their parent elements.

All of the elements in the body are contained within a single element called the document or root element.

<items>
   <item>
      <title>Kind of Blue</title>
      <artist>Miles Davis</artist>
      <tracks>
         <track>So What</track>
         <track>Freddie Freeloader</track>
         <track>Blue in Green</track>
         <track>All Blues</track>
         <track>Flamenco Sketches</track>
      </tracks>
   </item>
   <item>
      .
      .
   </item>
</items>

Music titles for the Jazz Warehouse monthly specials, using items as the root element of the document
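To show how a program might walk this tree, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It uses only the first item from the listing, since the second item is elided in the original; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

catalog = """
<items>
  <item>
    <title>Kind of Blue</title>
    <artist>Miles Davis</artist>
    <tracks>
      <track>So What</track>
      <track>Freddie Freeloader</track>
      <track>Blue in Green</track>
      <track>All Blues</track>
      <track>Flamenco Sketches</track>
    </tracks>
  </item>
</items>
"""

root = ET.fromstring(catalog)  # root is the <items> element

for item in root.findall("item"):
    title = item.findtext("title")
    # Each track is a child of tracks, which is a child of item.
    tracks = [track.text for track in item.findall("tracks/track")]
    print(title, len(tracks), tracks[0])
```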


Working with Attributes

An attribute is a feature or characteristic of an element.

Attributes are text strings and must be placed in single or double quotes.

The syntax is:

<element attribute="value"> … </element>

Include the length of each music track as an attribute of the track:

<track length="9:22">So What</track>


You may place the length information as a child element of the track element:

<track>
   <title>So What</title>
   <length>9:22</length>
</track>

If an attribute value is something you would want displayed, it should be placed in an element. If the attribute is not necessary to understanding the document content, you can keep it as an attribute.
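From a program's point of view, the two designs are read differently. The sketch below, using Python's standard `xml.etree.ElementTree`, reads the track length from each form; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Length stored as an attribute of the track element.
as_attribute = ET.fromstring('<track length="9:22">So What</track>')
print(as_attribute.get("length"), as_attribute.text)

# Length stored as a child element of the track element.
as_child = ET.fromstring(
    "<track><title>So What</title><length>9:22</length></track>"
)
print(as_child.findtext("length"), as_child.findtext("title"))
```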


Character and Entity References

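Special characters can be inserted into your XML document by using a character reference with the syntax:

&#nnn;

where nnn is a character number from the ISO/IEC 10646 character set.

Some symbols can also be identified using an entity reference with the syntax:

&entity;

where entity is the name assigned to the symbol.

Character references in XML work the same way as in HTML. XML itself predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;); other HTML entity names, such as &nbsp;, must be declared in a DTD before they can be used.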
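A short sketch of how a parser resolves these references, using Python's standard `xml.etree.ElementTree`. The sample text is made up. One point worth checking in practice is that names familiar from HTML, such as `nbsp`, are not built into XML, so a document that uses them without declaring them is rejected.

```python
import xml.etree.ElementTree as ET

# &#169; is a character reference; &amp;, &lt; and &gt; are entity references.
note = ET.fromstring(
    "<note>&#169; Jazz Warehouse, Blues &amp; Swing &lt;sale&gt;</note>"
)
print(note.text)

# nbsp is not declared in this document, so the parser reports an error.
try:
    ET.fromstring("<note>Jazz&nbsp;Warehouse</note>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False
print(nbsp_accepted)
```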


Parsed Character Data (pcdata)

Parsed character data (pcdata) consists of all those characters that XML treats as part of the code of an XML document:

– The XML declaration

– The opening and closing tags of an element

– Empty element tags

– Character or entity references

– Comments

The presence of pcdata characters such as < or & in a document's content can cause unexpected errors. To avoid this problem, replace them with character or entity references.
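The replacement step can be automated. As a sketch, Python's standard `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with entity references before the text is placed in a document; the sample title is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Cannonball & Coltrane <live>"

# Replace the markup characters with entity references.
escaped = escape(raw)
print(escaped)

# The escaped text can now be used safely as element content,
# and the parser returns the original characters.
element = ET.fromstring(f"<title>{escaped}</title>")
print(element.text)
```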


Character Data

Once you remove parsed character data, the symbols remaining constitute a document’s actual content, known as character data.

Character data is not processed, but is treated as pure data content.


White Space

White space refers to any space (from pressing the spacebar), new line character (from pressing the Enter key), or tab character in a document.

White space appearing between element tags is treated as part of XML content.


White space is ignored unless it is part of the document’s data:

– White space is ignored when it is the only character data between element tags

– White space is ignored within a document’s prolog and epilog and within any element tags

– White space within an attribute value is not ignored and is treated as part of the attribute value


Creating a CDATA Section

A CDATA section is a large block of text the XML processor will interpret only as text. The syntax is:

<![CDATA[

Text Block

]]>


In this example, a CDATA section stores several HTML tags within an element named htmlcode:

<htmlcode>
<![CDATA[
<h1>The Jazz Warehouse</h1>
<h2>Your Online Store for Jazz Music</h2>
]]>
</htmlcode>
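The effect of the CDATA section can be confirmed with a small sketch using Python's standard `xml.etree.ElementTree`: the HTML tags arrive as plain text, and `htmlcode` has no child elements. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

source = """<htmlcode>
<![CDATA[
<h1>The Jazz Warehouse</h1>
<h2>Your Online Store for Jazz Music</h2>
]]>
</htmlcode>"""

htmlcode = ET.fromstring(source)

# The tags inside the CDATA section were not parsed as elements.
print(len(htmlcode))

# They are part of the element's text instead.
print(htmlcode.text.strip())
```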

Parsing an XML document


XML Parsers

An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.

Microsoft’s parser is called MSXML and is built directly into Internet Explorer versions 5.0 and above.

Netscape includes a built-in XML parser, as does Firefox.


Displaying an XML Document in a Web Browser

XML documents can be opened in Internet Explorer or in Netscape Navigator.

If there are no syntax errors, Internet Explorer will display the document’s contents in an expandable/collapsible outline format, including all markup tags.

Netscape will display the contents but neither the tags nor the nested elements.


To display the Jazz.xml file in a Web browser:

– Start the browser and open the Jazz.xml file located in your Data Disk.

– Click the minus (-) symbols.

– Click the resulting plus (+) symbols.


Linking to a Style Sheet

The easiest way to turn an XML document into a formatted document is to link the document to a style sheet.

The XML document and the style sheet are combined by the XML processor to display a single formatted document.


Two main style sheet languages used with XML:

– Cascading Style Sheets (CSS) is supported by most browsers and is relatively easy to learn and use

– Extensible Stylesheet Language (XSL) is more powerful, but not as easy to use as CSS


Some benefits to using style sheets:

– By separating content from format, you can concentrate on the appearance of the document

– Different style sheets can be applied to the same XML document

– Any style sheet changes will be automatically reflected in any Web page based upon the style sheet


Applying a Style to an Element

To apply a style to an element, add a rule to the style sheet using the syntax:

selector {attribute1:value1; attribute2:value2; …}

– selector is an element (or set of elements) from the XML document.

– attribute and value are the style attributes and attribute values to be applied to the selected elements.

For example:

artist {color:red; font-weight:bold}


Inserting Processing Instructions

The link from the XML document to a style sheet is created using a processing instruction.

A processing instruction is a command that gives instructions to the XML parser. The syntax for linking to a CSS style sheet is:

<?xml-stylesheet type="text/css" href="url" ?>
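A program can read the style sheet link from a document's prolog. The sketch below uses Python's standard `xml.dom.minidom`; the file name `jazz.css` and the minimal document are assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical document linking to a style sheet named jazz.css.
source = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="jazz.css"?>
<items/>"""

dom = minidom.parseString(source)

# Collect the processing instructions found at the top level of the document.
instructions = [
    node for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

for pi in instructions:
    # target is the instruction name; data is everything after it.
    print(pi.target, pi.data)
```

The XML declaration looks similar but is not reported as a processing instruction, so only the style sheet link is listed.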

[Figure: element structure of the rare_items document, showing the elements rare_items, title, subtitle, item, message, artist, record, year, label, condition, priceus, and priceuk]