2
Introducing XML
XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.
XML is used to create a wide variety of document types.
XML is a subset of SGML.
XML sits between SGML and HTML.
3
The Limits of HTML
HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page.
Because HTML is not extensible, it cannot be modified to meet specific needs.
Different browsers require different standards
4
The XML Design Goals
XML supports HTTP and MIME
XML must support a wide variety of applications
– XML can be used for other applications such as databases, financial transactions, and voice mail
XML must be compatible with SGML
5
The XML Design Goals
It must be easy to write programs that process XML documents
The number of optional features in XML must be kept small
XML documents should be clear and easily understood
– XML documents are text files
– The contents follow a tree-like structure
6
The XML Design Goals
The XML design should be prepared quickly
The design of XML must be exact and concise
XML documents must be easy to create
Terseness in XML markup is of minimal importance
7
DTDs and Schemas
Document type definitions (DTDs) or schemas contain rules for how the data in an XML document must be structured.
A well-formed document contains no syntax errors.
An XML document that satisfies the rules of a DTD or schema (in addition to being well-formed) is said to be a valid document.
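A minimal sketch of a well-formedness check, assuming Python's standard `xml.etree.ElementTree` module is available. The helper name `is_well_formed` is hypothetical. This parser is non-validating: it reports syntax errors only, so checking validity against a DTD or schema would need a separate validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without syntax errors (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<artist>Miles Davis</artist>"
bad = "<artist>Miles Davis"  # the closing tag is missing

print(is_well_formed(good))
print(is_well_formed(bad))
```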
8
Working with XML Applications
XML has the ability to create XML applications. Many have been developed to work with specific types of documents.
Each application uses a defined set of tag names called a vocabulary. This makes it easier to exchange information between different organizations and computer applications.
10
The Structure of an XML Document
XML documents consist of three parts
– The prolog is optional and provides information about the document itself
– The document body contains the document’s content in a hierarchical tree structure
– The epilog is optional and contains any final comments or processing instructions
11
Creating the Prolog
The prolog consists of four parts:
– XML declaration
– Miscellaneous statements or comments
– Processing instructions
– Document type declaration
This order has to be followed or the parser will generate an error message.
None of these four parts is required.
12
The XML Declaration
The XML declaration is the first line of code. It tells the processor that what follows is written using XML. It can also provide information about how the parser should interpret the code. The syntax is:
<?xml version="version number" encoding="encoding type" standalone="yes|no" ?>
An example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
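The rule that the declaration comes first can be checked with a short sketch, assuming Python's standard `xml.etree.ElementTree` parser. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

declaration = b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>'
body = b"<artist>Miles Davis</artist>"

# Declaration on the very first line: the document parses normally.
root = ET.fromstring(declaration + b"\n" + body)
print(root.tag, root.text)

# Declaration preceded by a comment: the parser rejects the document.
try:
    ET.fromstring(b"<!-- too early -->\n" + declaration + b"\n" + body)
    late_declaration_rejected = False
except ET.ParseError as err:
    late_declaration_rejected = True
    print("parse error:", err)
```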
13
Inserting Comments
Comments may appear anywhere after the XML declaration. The syntax is:
<!-- comment text -->
14
Working with Elements
The document body consists of the elements that are the basic building blocks of XML files.
XML supports two types of elements:
– Closed elements
– Empty elements
15
Empty Elements
An empty element or open element is an element that contains no content. The syntax is:
<element />
Empty elements can be used to mark sections of the document for the XML parser. They can also contain attributes that can be used to store information.
16
Working with Elements
A closed element has the following syntax:
<element>Content</element>
An example:
<artist>Miles Davis</artist>
Element names are case sensitive.
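Case sensitivity is easy to see with a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`. Because `Artist` and `artist` are different names, the second document has an opening tag with no matching close.

```python
import xml.etree.ElementTree as ET

# Opening and closing names match exactly.
root = ET.fromstring("<artist>Miles Davis</artist>")
print(root.tag, "->", root.text)

# The names differ only in case, so the tags do not match.
try:
    ET.fromstring("<Artist>Miles Davis</artist>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False

print(case_mismatch_accepted)
```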
17
Nesting Elements
Elements can contain other elements:
<tracks>
  <track>So What (9:22)</track>
  <track>Blue in Green (5:37)</track>
</tracks>
Nested elements are called child elements. Elements must be nested correctly. Child elements must be enclosed within their parent elements. All elements in the body are descendants of a single element called the document or root element.
Music titles for the Jazz Warehouse monthly specials, using items as the root element of the document:
<items>
  <item>
    <title>Kind of Blue</title>
    <artist>Miles Davis</artist>
    <tracks>
      <track>So What</track>
      <track>Freddie Freeloader</track>
      <track>Blue in Green</track>
      <track>All Blues</track>
      <track>Flamenco Sketches</track>
    </tracks>
  </item>
  <item>
    .
    .
  </item>
</items>
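The parent/child structure can be walked programmatically. This sketch assumes Python's standard `xml.etree.ElementTree` and uses a single-item version of the Jazz Warehouse document. The second item that the slide elides is left out rather than guessed.

```python
import xml.etree.ElementTree as ET

doc = """
<items>
  <item>
    <title>Kind of Blue</title>
    <artist>Miles Davis</artist>
    <tracks>
      <track>So What</track>
      <track>Freddie Freeloader</track>
      <track>Blue in Green</track>
      <track>All Blues</track>
      <track>Flamenco Sketches</track>
    </tracks>
  </item>
</items>
"""

root = ET.fromstring(doc)          # root is the <items> element
item = root.find("item")           # first <item> child of the root
titles = [t.text for t in root.iter("track")]  # every <track> descendant

print(root.tag)
print(item.findtext("title"), "by", item.findtext("artist"))
for name in titles:
    print(" -", name)

# Overlapping tags are a nesting error and are rejected by the parser.
try:
    ET.fromstring("<tracks><track>So What</tracks></track>")
    bad_nesting_accepted = True
except ET.ParseError:
    bad_nesting_accepted = False
```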
19
Working with Attributes
An attribute is a feature or characteristic of an element.
Attributes are text strings and must be placed in single or double quotes.
The syntax is:
<element attribute="value"> … </element>
Include the length of each music track as an attribute of the track:
<track length="9:22">So What</track>
20
Working with Attributes
You may place the length information as a child element of the track element:
<track>
  <title>So What</title>
  <length>9:22</length>
</track>
If an attribute value is something you would want displayed, it should be placed in an element. If the attribute is not necessary to understanding the document content, you can keep it as an attribute.
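Both ways of storing the track length can be read back with a parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Length stored as an attribute of the track element.
as_attribute = ET.fromstring('<track length="9:22">So What</track>')
# Length stored as a child element of the track element.
as_child = ET.fromstring(
    "<track><title>So What</title><length>9:22</length></track>"
)

length_from_attribute = as_attribute.get("length")
length_from_child = as_child.findtext("length")
print(length_from_attribute, length_from_child)

# Single quotes are accepted as well as double quotes.
single_quoted = ET.fromstring("<track length='9:22'>So What</track>")

# An unquoted attribute value is a syntax error.
try:
    ET.fromstring("<track length=9:22>So What</track>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
```

Which form to use is a design decision: the attribute keeps the element's text to the title alone, while the child element makes the length part of the content tree.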
21
Character and Entity References
Special characters can be inserted into your XML document by using a character reference with the syntax:
&#nnn;
where nnn is a number from the ISO/IEC character set.
Some symbols can also be identified using an entity reference with the syntax:
&entity;
where entity is the name assigned to the symbol.
Character and entity references in XML use the same syntax as in HTML. XML itself predefines only five named entities: &lt; &gt; &amp; &apos; and &quot;.
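A short sketch of how references are resolved, assuming Python's standard `xml.etree.ElementTree`. The element name `note` is made up for illustration. A numeric character reference and XML's built-in named entities are replaced by the characters they stand for. A name that HTML defines, such as `copy`, is not known to a plain XML parser unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

# &#169; is a numeric character reference; &amp; &lt; &gt; are built-in entities.
note = ET.fromstring("<note>&#169; Jazz &amp; Blues &lt;store&gt;</note>")
print(note.text)

# &copy; is an HTML entity name, not declared in this XML document.
try:
    ET.fromstring("<note>&copy; Jazz Warehouse</note>")
    html_entity_accepted = True
except ET.ParseError:
    html_entity_accepted = False
```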
23
Parsed Character Data (pcdata)
pcdata consists of all those characters that XML treats as part of the code of an XML document:
– The XML declaration
– The opening and closing tags of an element
– Empty element tags
– Character or entity references
– Comments
The presence of pcdata can cause unexpected errors. To avoid this problem, replace such characters with character or entity references.
24
Character Data
Once you remove parsed character data, the symbols remaining constitute a document’s actual content, known as character data.
Character data is not processed, but is treated as pure data content.
25
White Space
White space refers to any space (from pressing the spacebar), new line character (from pressing the Enter key), or tab character in a document.
White space appearing between element tags is treated as part of XML content.
26
White Space
White space is ignored unless it is part of the document’s data
– White space is ignored when it is the only character data between element tags
– White space is ignored within a document’s prolog and epilog and within any element tags
– White space within an attribute value is not ignored and is treated as part of the attribute value.
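How a particular parser reports white space can be inspected directly. This is a sketch assuming Python's standard `xml.etree.ElementTree`, with a made-up `item` element. This parser hands white-space-only text between tags to the application, which is then free to discard it, so the "ignoring" step here is the application's `strip()` call rather than something the parser does.

```python
import xml.etree.ElementTree as ET

doc = '<item  title="Kind  of  Blue" >\n  <artist> Miles Davis </artist>\n</item >'
root = ET.fromstring(doc)

# Extra spaces inside the tags themselves do not affect the element name.
print(repr(root.tag))

# Spaces inside the attribute value are kept as part of the value.
title = root.get("title")
print(repr(title))

# Text between <item> and <artist> is white space only; strip() leaves nothing.
between_tags = root.text
print(repr(between_tags))

# White space that surrounds real character data is kept with that data.
artist_text = root.find("artist").text
print(repr(artist_text))
```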
27
Creating a CDATA Section
A CDATA section is a large block of text the XML processor will interpret only as text. The syntax is:
<![CDATA[
Text Block
]]>
28
Creating a CDATA Section
In this example, a CDATA section stores several HTML tags within an element named htmlcode:
<htmlcode>
<![CDATA[
<h1>The Jazz Warehouse</h1>
<h2>Your Online Store for Jazz Music</h2>
]]>
</htmlcode>
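The effect of the CDATA section can be confirmed with a parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`: the HTML tags inside the section come back as plain text, and `htmlcode` ends up with no child elements.

```python
import xml.etree.ElementTree as ET

doc = """<htmlcode>
<![CDATA[
<h1>The Jazz Warehouse</h1>
<h2>Your Online Store for Jazz Music</h2>
]]>
</htmlcode>"""

root = ET.fromstring(doc)

# The markup inside the CDATA section is returned as character data.
print(root.text)

# No <h1> or <h2> elements were created.
child_count = len(root)
print(child_count)
```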
30
XML Parsers
An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.
Microsoft’s parser is called MSXML and is built directly into IE versions 5.0 and above.
Netscape includes a built-in XML parser, as does Firefox.
31
Displaying an XML Document in a Web Browser
XML documents can be opened in Internet Explorer or in Netscape Navigator.
If there are no syntax errors, IE will display the document’s contents in an expandable/collapsible outline format including all markup tags.
Netscape will display the contents but neither the tags nor the nested elements.
32
Displaying an XML Document in a Web Browser
To display the Jazz.xml file in a Web browser:
– Start the browser and open the Jazz.xml file located in your Data Disk.
– Click the minus (-) symbols.
– Click the resulting plus (+) symbols.
33
Linking to a Style Sheet
The easiest way to turn an XML document into a formatted document is to link the document to a style sheet.
The XML document and the style sheet are combined by the XML processor to display a single formatted document.
34
Linking to a Style Sheet
Two main style sheet languages used with XML:
– Cascading Style Sheets (CSS) is supported by most browsers and is relatively easy to learn and use
– Extensible Style Sheets (XSL) is more powerful, but not as easy to use as CSS
35
Linking to a Style Sheet
Some benefits to using style sheets:
– By separating content from format, you can concentrate on the appearance of the document
– Different style sheets can be applied to the same XML document
– Any style sheet changes will be automatically reflected in any Web page based upon the style sheet
36
Applying a Style to an Element
To apply a style sheet to a document, use:
selector {attribute1:value1; attribute2:value2; …}
– selector is an element (or set of elements) from the XML document.
– attribute and value are the style attributes and attribute values to be applied to the document.
artist {color:red; font-weight: bold}
37
Inserting Processing Instructions
The link from the XML document to a style sheet is created using a processing instruction.
A processing instruction is a command that gives instructions to the XML parser.
<?xml-stylesheet type="text/css" href="url" ?>
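A processing instruction is visible to a parser as its own node in the prolog. The sketch below assumes Python's standard `xml.dom.minidom`. The style sheet name `jazz.css` is a hypothetical stand-in for the url.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="jazz.css"?>'  # hypothetical file name
    "<artist>Miles Davis</artist>"
)

# The first node of the document is the processing instruction, not the root.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

print(is_pi)
print(pi.target)  # name of the instruction
print(pi.data)    # everything after the name, passed on uninterpreted
print(doc.documentElement.tagName)
```

The parser does not interpret `type` or `href` itself. It passes the instruction's data on as a string, and the application (here, a browser) decides what to do with it.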