XML

What is XML?

• XML stands for eXtensible Markup Language.
• XML is a markup language much like HTML.
• XML was designed to carry data, not to display data.
• XML is a bridge for data exchange on the Web.
• XML tags are not predefined. You must define your own tags.
• XML is designed to be self-descriptive.
• XML is a W3C Recommendation.

What is XML?

• It is a protocol for containing and managing data.
• It is a family of technologies, covering everything from formatting documents to filtering data.
• It is a philosophy for handling information.
• It derives from SGML (Standard Generalized Markup Language), as defined by ISO 8879. SGML itself is not well suited for serving documents over the Web.

What is XML?

• XML is not a language itself. Rather, it is a metalanguage used for writing other languages, called XML vocabularies.
• XHTML is one of those vocabularies.
• XML is one of the most common tools for data transmission between all sorts of applications.

Understanding XML Syntax

XML languages use tags to mark up text:

<p>Here is an introduction to XML.</p>

The above line is XHTML, but it is also XML. XML allows you to construct your own tags, so you could rewrite the previous markup as:

<intro>Here is an introduction to XML.</intro>

In this example, the <intro> tag tells you the purpose of the text that it marks up.

One big advantage of XML is that tags can describe their content—that’s why XML languages are often called self-describing.

XML is flexible enough to allow for the creation of many different types of languages to describe data.

The only constraint on XML vocabularies is that they be well-formed.

Well-Formed Documents

XML documents are well-formed if they meet the following criteria:

• The document contains one or more elements.
• The document contains a single document element, which may contain other elements.
• Each element closes correctly, and elements are properly nested.
• Element names are case-sensitive, so opening and closing tags must match exactly.
• Attribute values are enclosed in quotation marks, and every attribute must be given a value.
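The criteria above can be tried out with any XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per well-formedness criterion.
samples = {
    "single document element": '<library><DVD id="1"/></library>',
    "two document elements": "<a/><b/>",
    "element not closed": "<library><DVD></library>",
    "case mismatch in tags": "<DVD></dvd>",
    "unquoted attribute value": "<DVD id=1/>",
    "attribute without a value": "<DVD checked/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample is accepted; each of the others breaks one of the listed criteria and is rejected by the parser.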

XML Document:

<?xml version="1.0"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
  <DVD id="2">
    <title>Contact</title>
    <format>Movie</format>
    <genre>Science fiction</genre>
  </DVD>
  <DVD id="3">
    <title>Little Britain</title>
    <format>TV Series</format>
    <genre>Comedy</genre>
  </DVD>
</library>

XML Document:

The document starts with an XML declaration:

<?xml version="1.0"?>

This declaration is optional and can contain a number of attributes (version, encoding, and standalone).

This XML document also includes a comment describing its purpose:

<!-- This XML document describes a DVD library -->

The document or root element is called <library>.

The document element contains a number of <DVD> elements, and each <DVD> element contains <title>, <format>, and <genre> elements. Each <DVD> element also carries an id attribute.
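To make the structure just described concrete, here is a small sketch that loads the DVD library and walks it. It assumes Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

library_xml = """<?xml version="1.0"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1"><title>Breakfast at Tiffany's</title><format>Movie</format><genre>Classic</genre></DVD>
  <DVD id="2"><title>Contact</title><format>Movie</format><genre>Science fiction</genre></DVD>
  <DVD id="3"><title>Little Britain</title><format>TV Series</format><genre>Comedy</genre></DVD>
</library>"""

root = ET.fromstring(library_xml)
print(root.tag)  # the document (root) element: library

# Each <DVD> child has an id attribute and three child elements.
dvds = [
    (dvd.get("id"), dvd.findtext("title"), dvd.findtext("format"), dvd.findtext("genre"))
    for dvd in root.findall("DVD")
]
for dvd_id, title, fmt, genre in dvds:
    print(dvd_id, title, fmt, genre)
```

The comment and the XML declaration are consumed by the parser; only the element tree is handed to the application.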

Encoding

Encoding is the process of converting Unicode characters into their equivalent binary representation. When the XML processor reads an XML document, it decodes the bytes according to the document's encoding. Hence, we need to specify the encoding in the XML declaration, for example <?xml version="1.0" encoding="UTF-8"?>.

Encoding Types

There are mainly two types of encoding:

• UTF-8
• UTF-16

UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the size in bits of the code unit used to build up a character: 8 (one byte) or 16 (two bytes). A single character may take more than one unit: one to four bytes in UTF-8, and two or four bytes in UTF-16. For documents without encoding information, UTF-8 is assumed by default.
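The difference between the encodings is easy to see by counting bytes. The sketch below assumes Python and its standard xml.etree.ElementTree module; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

text = "XML \u20ac"  # four ASCII characters followed by the euro sign

utf8_bytes = text.encode("utf-8")       # ASCII: 1 byte each, euro sign: 3 bytes
utf16_bytes = text.encode("utf-16-be")  # 2 bytes per character here, no byte order mark
print(len(utf8_bytes))   # 7
print(len(utf16_bytes))  # 10

# A document that declares its encoding is decoded accordingly.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\u00e9</word>'.encode("iso-8859-1")
latin1_word = ET.fromstring(latin1_doc).text

# A document without encoding information is read as UTF-8.
utf8_doc = "<word>caf\u00e9</word>".encode("utf-8")
utf8_word = ET.fromstring(utf8_doc).text

print(latin1_word == utf8_word)  # True: same characters, different bytes on disk
```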

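Naming Rules in XML:

• XML names must start with a letter or an underscore; they cannot start with a digit or other punctuation.
• XML names cannot include spaces.
• Don't include a colon in a name; colons are reserved for namespace prefixes.
• XML names are case-sensitive.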
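The naming rules can be approximated with a regular expression. This is a deliberately simplified sketch in Python: it only considers ASCII characters, whereas the XML specification also allows many non-ASCII letters, and the function name is_valid_name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of an XML name without a colon:
# a letter or underscore, then letters, digits, underscores, dots or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_name(name):
    """Return True if the name follows the simplified naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["title", "_id", "DVD-1.title", "2nd", "my tag", "h:table"]:
    print(candidate, is_valid_name(candidate))
```

The first three candidates pass; the last three fail because they start with a digit, contain a space, or contain a colon.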

Structure of an XML Document:

Each XML document is divided into two parts:

• Prolog: processing instructions, comments, DTD/Schema
• Document: elements, attributes, text, CDATA, entities, comments

Valid XML Documents:

A "Valid" XML document is a "Well Formed" XML document that also conforms to the rules of a Document Type Definition (DTD).

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Students</to>
  <from>faculty</from>
  <heading>Good Morning</heading>
  <body>Hello Dear how are you</body>
</note>

Note: The DOCTYPE declaration in the example above is a reference to an external DTD file.

Document Type Definition (DTD)

The Document Type Definition (DTD) describes a model of the structure of the content of an XML document. This model says what elements must be present, which ones are optional, what their attributes are, and how they can be structured in relation to each other.

The DOCTYPE statement is the document type declaration. The DOCTYPE statement and the statements between the opening bracket and the closing bracket make up the document type definition and define the rules of the document.

Internal DTD

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE CATALOGS [
  <!ELEMENT CATALOGS (BOOK)*>
  <!ELEMENT BOOK (BOOKNAME, AUTHORNAME, PUBLISHER, PAGES?, PRICE)>
  <!ELEMENT BOOKNAME (#PCDATA)>
  <!ELEMENT AUTHORNAME (#PCDATA)>
  <!ELEMENT PUBLISHER (#PCDATA)>
  <!ELEMENT PAGES (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
  <!ATTLIST PRICE CURRENCY CDATA #REQUIRED>
]>
<CATALOGS>
<BOOK>
<BOOKNAME>WEB TECHNOLOGY</BOOKNAME>
………………….

External DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT CATALOGS (BOOK)*>
<!ELEMENT BOOK (BOOKNAME, AUTHORNAME, PUBLISHER, PAGES?, PRICE)>
<!ELEMENT BOOKNAME (#PCDATA)>
<!ELEMENT AUTHORNAME (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT PAGES (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ATTLIST PRICE CURRENCY CDATA #REQUIRED>

Save the above code as catdtd.dtd.

<?xml version="1.0"?>
<!DOCTYPE CATALOGS SYSTEM "catdtd.dtd">
<CATALOGS>
<BOOK>
<BOOKNAME>…………. ….
<BOOK> ………

XML DTD Example:

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

The DTD above is interpreted like this:

• !DOCTYPE note defines that the root element of this document is note.
• !ELEMENT note defines that the note element contains four elements: "to,from,heading,body".
• !ELEMENT to defines the to element to be of type "#PCDATA".
• !ELEMENT from defines the from element to be of type "#PCDATA".
• !ELEMENT heading defines the heading element to be of type "#PCDATA".
• !ELEMENT body defines the body element to be of type "#PCDATA".

XML Parser

An XML parser (also called an XML processor) determines the content and structure of an XML document by combining the XML document and its DTD (if one is present).

XML document + XML DTD -> XML parser -> XML application

An XML parser is a software library or package that provides an interface for client applications to work with XML documents. It checks that the XML document is well-formed and may also validate it.
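Not every parser validates. Python's standard xml.etree.ElementTree, for instance, checks well-formedness but does not validate against a DTD. The sketch below therefore checks the content model of the note DTD shown earlier by hand; it is an illustration of what a validating parser enforces, not a real DTD validator, and the function name matches_note_model is hypothetical.

```python
import xml.etree.ElementTree as ET

# Content model taken from <!ELEMENT note (to,from,heading,body)>.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text):
    """Hand-written check that mirrors the note DTD's content model."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "note":
        return False
    # The children must appear exactly once each, in the declared order.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in root)


valid_note = ("<note><to>Students</to><from>faculty</from>"
              "<heading>Good Morning</heading><body>Hello Dear how are you</body></note>")
wrong_order = ("<note><from>faculty</from><to>Students</to>"
               "<heading>Good Morning</heading><body>Hello Dear how are you</body></note>")

print(matches_note_model(valid_note))   # True
print(matches_note_model(wrong_order))  # False: well-formed, but not valid
```

The second document is well-formed yet does not conform to the DTD, which is exactly the distinction between a well-formed and a valid document.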

XML Parser:

• All modern browsers have a built-in XML parser.
• An XML parser converts an XML document into an XML DOM object.
• A DOM (Document Object Model) defines a standard way for accessing and manipulating documents.

The XML DOM

• The XML DOM defines a standard way for accessing and manipulating XML documents.
• The XML DOM views an XML document as a tree structure.
• All elements can be accessed through the DOM tree.
• Their content (text and attributes) can be modified or deleted, and new elements can be created.
• The elements, their text, and their attributes are all known as nodes.
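The operations listed above map directly onto DOM calls. Here is a minimal sketch using Python's standard xml.dom.minidom module (browsers expose the same model through JavaScript); the sample document is a cut-down, hypothetical version of the DVD library.

```python
from xml.dom.minidom import parseString

doc = parseString('<library><DVD id="1"><title>Contact</title></DVD></library>')

# Access elements through the DOM tree.
dvd = doc.getElementsByTagName("DVD")[0]
title = dvd.getElementsByTagName("title")[0]

# The text of an element is itself a node: modify it in place.
title.firstChild.data = "Little Britain"

# Modify an attribute.
dvd.setAttribute("id", "3")

# Create a new element with a text node and attach it to the tree.
genre = doc.createElement("genre")
genre.appendChild(doc.createTextNode("Comedy"))
dvd.appendChild(genre)

result = doc.documentElement.toxml()
print(result)
# <library><DVD id="3"><title>Little Britain</title><genre>Comedy</genre></DVD></library>
```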

XML Namespaces

• XML Namespaces provide a method to avoid element name conflicts.
• When using prefixes in XML, a so-called namespace for the prefix must be defined.
• The namespace is defined by the xmlns attribute in the start tag of an element.
• The namespace declaration has the following syntax: xmlns:prefix="URL".
• The declaration starts with the keyword xmlns.
• The word prefix is the namespace prefix.
• The URL is the namespace identifier.

Namespaces can be declared in the elements where they are used or in the XML root element:

<root xmlns:h="http://www......">
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

……..
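The following sketch shows how a namespace keeps two elements with the same local name apart. It assumes Python's standard xml.etree.ElementTree module. The namespace URL http://example.com/html-table and the unprefixed furniture <table> element are hypothetical, chosen only for illustration, since the URL in the slide is elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URL; the prefix h is bound to it on the root element.
NS = {"h": "http://example.com/html-table"}

doc = """<root xmlns:h="http://example.com/html-table">
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <table>
    <name>Coffee table</name>
  </table>
</root>"""

root = ET.fromstring(doc)

# Prefixed lookups only match elements in the h namespace.
cells = [td.text for td in root.findall(".//h:td", NS)]
print(cells)  # ['Apples', 'Bananas']

# Internally the parser stores the full name as {namespace}localname.
html_table = root.find("h:table", NS)
print(html_table.tag)  # {http://example.com/html-table}table

# The unprefixed <table> is a different element and does not conflict.
furniture = root.find("table")
print(furniture.findtext("name"))  # Coffee table
```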

XML Schema

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and validate the structure and the content of XML data.

An XML schema defines the elements, attributes, and data types. The schema element supports namespaces. It is similar to a database schema that describes the data in a database.

A schema document starts with the xs:schema element, which binds the xs prefix to the XML Schema namespace:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

Attributes in XSD provide extra information within an element. Attributes have name and type properties, as shown below:

<xs:attribute name="x" type="y"/>

You can define XML schema elements in the following ways:

Simple Type - A simple type element contains only text. Some of the predefined simple types are xs:integer, xs:boolean, xs:string, and xs:date.

<xs:element name="phone_number" type="xs:int" />

Complex Type - A complex type is a container for other element definitions. This allows you to specify which child elements an element can contain and to provide some structure within your XML documents.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="company" type="xs:string" />
        <xs:element name="phone" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
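Because a schema is itself an XML document, it can be read with an ordinary parser. The sketch below, in Python with the standard xml.etree.ElementTree module, lists the declarations of the contact schema and then applies a rough hand-written check to an instance document. It assumes the standard XML Schema namespace http://www.w3.org/2001/XMLSchema, it is not a real XSD validator (the standard library has none), and the function name check_contact and the sample contacts are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

schema_text = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="company" type="xs:string" />
        <xs:element name="phone" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)
contact = schema.find("xs:element", XS)

# Child declarations of the complex type, in sequence order.
fields = {
    el.get("name"): el.get("type")
    for el in contact.findall("./xs:complexType/xs:sequence/xs:element", XS)
}
print(fields)


def check_contact(xml_text):
    """Rough check: root name, child order, and integer-looking xs:int values."""
    root = ET.fromstring(xml_text)
    if root.tag != contact.get("name"):
        return False
    if [child.tag for child in root] != list(fields):
        return False
    for child in root:
        if fields[child.tag] == "xs:int":
            try:
                int(child.text)  # approximation of xs:int; ignores its 32-bit range
            except (TypeError, ValueError):
                return False
    return True


good = "<contact><name>Ann</name><company>Acme</company><phone>5551234</phone></contact>"
bad = "<contact><name>Ann</name><company>Acme</company><phone>555-1234</phone></contact>"
print(check_contact(good), check_contact(bad))
```

The second contact fails because 555-1234 is not an integer, which also hints that xs:string is often a more practical type for phone numbers.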
