
XML Basics

White Paper on XML Basics

Document Control

Change Record

Date | Author | Version | Change Reference
04-Apr-09 | Anoosha Burlakanti, Preethi Phani Mummaleti | 1.0.0 | Initial Document

Reviewers

Name | Position
Krishna Mohan Adavi

Distribution

Copy No. | Name | Location
1 | Library Master | Project Library
2 | Project Manager |
3 | |
4 | |

Note: The copy numbers referenced above should be written into the Copy Number space on the cover of each distributed copy. If the document is not controlled, you can delete this table, the Note To Holders, and the Copy Number label from the cover page.

Note To Holders:

If you receive an electronic copy of this document and print it out, please write your name on the equivalent of the cover page, for document control purposes.

If you receive a hard copy of this document, please write your name on the front cover, for document control purposes.

Table of Contents

1. Introduction to XML
2. What is XML?
3. Why XML?
4. Difference Between XML and HTML
5. Characteristics of XML
6. XML Tree
7. XML Syntax Rules
8. XML Elements
9. XML Attributes
10. Well-formed XML versus Valid XML
11. XML Schema
12. References

1. Introduction to XML

Markup languages evolved from early private company and government forms into Standard Generalized Markup Language (SGML), then Hypertext Markup Language (HTML), and eventually XML. SGML can seem complex, and HTML (which was really just an element set) was not powerful enough to identify information. XML is designed as an easy-to-use and easy-to-extend markup language. The Extensible Markup Language (XML) came into existence as a result of an attempt to facilitate the sharing of information (data) across different information systems working on different technology platforms, via the internet. XML is a simplified subset of Standard Generalized Markup Language (SGML).

XML stands for Extensible Markup Language. XML is a markup language much like HTML. It was designed to carry data, not to display data. XML tags are not predefined; you must define your own tags. XML is designed to be self-descriptive, and it can use a DTD (Document Type Definition) or an XML Schema to formally describe the data.

XML is a complement to HTML. It is important to understand that XML is not a replacement for HTML. In the future development of the Web, it is most likely that XML will be used to structure and describe the Web data, while HTML will be used to format and display the same data.

XML was designed to transport and store data. HTML was designed to display data.

2. What is XML?

XML is a software- and hardware-independent tool for carrying information.

XML stands for Extensible Markup Language.

XML is a markup language much like HTML.

XML was designed to carry data, not to display data.

XML tags are not predefined. You must define your own tags.

XML is designed to be self-descriptive.

XML is a W3C Recommendation.

XML (Extensible Markup Language) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language because it allows the user to define the markup elements. You can create content and mark it up with delimiting tags, turning each word, phrase, or chunk into identifiable, sortable information.

XML is a Recommendation of the World Wide Web Consortium (W3C). It is a royalty-free open standard. The Recommendation specifies the lexical grammar of XML documents and the requirements for parsing them.

3. Why XML?

XML was created so that richly structured documents could be used over the web. The only viable alternatives, HTML and SGML, are not practical for this purpose.

HTML comes bound with a set of semantics and does not provide arbitrary structure.

SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full SGML systems solve large, complex problems that justify their expense. Viewing structured documents sent over the web rarely carries such justification.

This is not to say that XML can be expected to completely replace SGML. XML is being designed to deliver structured content over the web. However, some of the very features it lacks to make this practical make SGML a more satisfactory solution for creating and storing complex documents over the long term. In many organizations, filtering SGML to XML will be the standard procedure for web delivery.

XML was designed for ease of use with Standard Generalized Markup Language (SGML). Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

4. Difference between XML and HTML

XML is not a replacement for HTML. XML was designed to transport and store data, with a focus on what the data is.

HTML was designed to display data, with a focus on how the data looks.

HTML is about displaying information, while XML is about carrying information.

5. Characteristics of XML

1) XML was created to structure, store, and transport information.

The following example is a note to Anoosha from Preethi, stored as XML:

<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>

The example above is self-descriptive. It has sender and receiver information, and it also has a heading and a message body.

This XML document does not do anything. It is just pure information wrapped in tags.

2) XML is Just Plain Text

XML is nothing special. It is just plain text. Software that can handle plain text can also handle XML.

However, XML-aware applications can handle the XML tags specially. The functional meaning of the tags depends on the nature of the application.
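The difference between treating XML as plain text and treating it as XML can be shown with a minimal Python sketch. It uses the standard-library module xml.etree.ElementTree, which is our choice for illustration; the paper does not name a particular parser. The variable names are hypothetical. The same string is first handled with ordinary string operations, which see only characters. It is then handed to an XML-aware parser, which sees named elements.

```python
import xml.etree.ElementTree as ET

# The note from the example above, held as an ordinary Python string.
NOTE = """<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>"""

# 1) Plain-text handling: the tags are just characters in the string.
line_count = len(NOTE.splitlines())
mentions_anoosha = "Anoosha" in NOTE

# 2) XML-aware handling: the parser turns the tags into named elements,
#    so the application can look fields up by tag name.
root = ET.fromstring(NOTE)
fields = {child.tag: child.text for child in root}

print(line_count)                          # 6
print(mentions_anoosha)                    # True
print(root.tag)                            # note
print(fields["to"], "<-", fields["from"])  # Anoosha <- Preethi
```

The parser does not know what "to" or "body" mean. It only exposes them by name, and the application decides what to do with each field.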

3) With XML You Can Invent Your Own Tags

The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

That is because the XML language has no predefined tags.

The tags used in HTML (and the structure of HTML) are predefined. HTML documents can only use tags defined in the HTML standard (like <p>, <h1>, etc.).

XML allows the author to define their own tags and their own document structure.
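Because no tag vocabulary is predefined, a program can emit whatever element names the author chooses. The following minimal sketch assumes Python's standard-library xml.etree.ElementTree as the tool; the paper does not prescribe one. It builds the note from author-chosen tag names and serializes it. The output is well-formed XML without any schema or DTD.

```python
import xml.etree.ElementTree as ET

# Build the note using tag names we "invent" ourselves.
note = ET.Element("note")
for tag, text in [("to", "Anoosha"), ("from", "Preethi"),
                  ("heading", "Urgent"), ("body", "Please call me!")]:
    ET.SubElement(note, tag).text = text  # each invented tag becomes a child element

# Serialize to a str (no XML declaration is added with encoding="unicode").
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)
# <note><to>Anoosha</to><from>Preethi</from><heading>Urgent</heading><body>Please call me!</body></note>

# Round trip: the invented tags can be read back by name.
body_text = ET.fromstring(xml_text).find("body").text
```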

4) XML is Not a Replacement for HTML

It is important to understand that XML is not a replacement for HTML but a complement to it. In most web applications, XML is used to transport data, while HTML is used to format and display the data.

5) XML is a W3C Recommendation

XML became a W3C Recommendation on 10 February 1998.

6) XML is Everywhere

XML is now as important for the Web as HTML was to the foundation of the Web. It is a common format for transmitting data between all sorts of applications, and it is increasingly used for storing and describing information.

6. XML Tree

XML documents form a tree structure that starts at "the root" and branches to "the leaves".

An Example XML Document

Consider the following example:

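<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>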

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1/West European character set).

The next line describes the root element of the document:

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>

And finally, the last line defines the end of the root element:

</note>
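The line-by-line reading of the note example can be checked with a parser. The following minimal sketch uses Python's standard-library xml.etree.ElementTree, an illustrative choice. The document is held as bytes so that the parser honours the encoding named in the XML declaration. The declaration is consumed by the parser and is not returned as an element. The second document, with a "Café" body, is a hypothetical addition. It shows that a byte which is only meaningful in ISO-8859-1 is decoded as the declaration says.

```python
import xml.etree.ElementTree as ET

# The complete example document, including the XML declaration.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>"""

root = ET.fromstring(DOC)                   # returns the root element, <note>
child_tags = [child.tag for child in root]  # the four child elements, in order

print(root.tag)     # note
print(child_tags)   # ['to', 'from', 'heading', 'body']

# Hypothetical extra document: byte 0xE9 is "é" in ISO-8859-1 (Latin-1).
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><body>Caf\u00e9</body>'.encode("iso-8859-1")
latin1_text = ET.fromstring(latin1_doc).text
print(latin1_text)  # Café
```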

XML documents must contain a root element. This element is "the parent" of all other elements.

The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.

All elements can have sub elements (child elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements. Parent elements have children. Children on the same level are called siblings (brothers or sisters).

All elements can have text content and attributes (just like in HTML).

Example:

The image above represents one book in the XML below:

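<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book>
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>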

The root element in the example is <bookstore>. All <book> elements in the document are contained within <bookstore>.

The <book> element has 4 children: <title>, <author>, <year>, <price>.
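The parent, child, and sibling relationships described above can be walked programmatically. Below is a minimal sketch with Python's standard-library xml.etree.ElementTree. The root name bookstore and the child names title, year, and price are assumptions for illustration. ElementTree elements know their children but not their parent, so the sketch builds a child-to-parent map. This map is our own helper, not part of the library.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book>
    <title>Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book>
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Map every element to its parent by visiting each element's direct children.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_book = root[0]
children = [c.tag for c in first_book]             # children of one <book>
siblings = [c.tag for c in parent_of[first_book]]  # that <book> and its siblings
titles = [b.findtext("title") for b in root.findall("book")]

print(children)  # ['title', 'author', 'year', 'price']
print(siblings)  # ['book', 'book', 'book']
print(titles)    # ['Everyday Italian', 'Harry Potter', 'Learning XML']
```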

7. XML Syntax Rules

The syntax rules of XML are very simple and logical. The rules are easy to learn and easy to use.

1) All XML Elements Must Have a Closing Tag

In HTML, you will often see elements that don't have a closing tag:

<p>This is a paragraph
<p>This is another paragraph

In XML, all elements must have a closing tag:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not an element. It belongs to the document's prolog, which comes before the root element, so it has no closing tag.
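An XML parser enforces the closing-tag rule by rejecting the document. Here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The helper is_well_formed and the wrapper element <doc> are hypothetical. The wrapper is needed only because an XML document must have a single root element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if ElementTree parses text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML-style paragraphs without closing tags, wrapped in a root element.
html_style = "<doc><p>This is a paragraph<p>This is another paragraph</doc>"
# The same content with every element closed.
xml_style = "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>"

print(is_well_formed(html_style))  # False: </doc> arrives while <p> is still open
print(is_well_formed(xml_style))   # True
```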

2) XML Tags are Case Sensitive

XML elements are defined using XML tags. XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case:

<Message>This is incorrect</message>
<message>This is correct</message>

Note: "Opening and closing tags" are often referred to as "start and end tags". Use whichever term you prefer; they mean exactly the same thing.
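Case sensitivity affects both parsing and lookup. The following minimal sketch uses Python's standard-library xml.etree.ElementTree, with a hypothetical helper parses. A start tag closed by an end tag of different case is rejected. Elements whose names differ only in case are also treated as distinct elements. The <letters> wrapper is invented for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

wrong = "<Message>This is incorrect</message>"
right = "<message>This is correct</message>"

print(parses(wrong))  # False: <Message> is closed by </message>, a different name
print(parses(right))  # True

# <Letter> and <letter> are two different element names.
doc = ET.fromstring("<letters><Letter>A</Letter><letter>b</letter></letters>")
print(doc.findtext("Letter"), doc.findtext("letter"))  # A b
```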

3) XML Elements Must Be Properly Nested

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</b></i> (this is wrong)
<b><i>This text is bold and italic</i></b> (this is correct)

Because the <i> element is opened inside the <b> element, it must also be closed inside the <b> element.

4) XML Documents Must Have a Root Element

XML documents must contain one element that is the parent of all other elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
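Rules 3 and 4 are both checked at parse time. The following minimal sketch uses Python's standard-library xml.etree.ElementTree and a hypothetical helper parses. Improperly nested tags are rejected. A fragment with two top-level elements and no single root is also rejected, even though each element is complete on its own.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Rule 3: elements must close in the reverse order in which they were opened.
badly_nested = "<b><i>This text is bold and italic</b></i>"
well_nested = "<b><i>This text is bold and italic</i></b>"

# Rule 4: exactly one root element must enclose everything else.
two_roots = "<to>Anoosha</to><from>Preethi</from>"
one_root = "<note><to>Anoosha</to><from>Preethi</from></note>"

results = {name: parses(text) for name, text in [
    ("badly_nested", badly_nested), ("well_nested", well_nested),
    ("two_roots", two_roots), ("one_root", one_root)]}
print(results)
# {'badly_nested': False, 'well_nested': True, 'two_roots': False, 'one_root': True}
```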

5) XML Attribute Values Must Be Quoted

XML elements can have attributes in name/value pairs, just like in HTML.

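In XML, the attribute value must always be quoted. Consider the two XML documents below. The first is wrong because the date attribute in the note element is not quoted:

<note date=12/11/2007>
<to>Anoosha</to>
<from>Preethi</from>
</note>

The second is correct:

<note date="12/11/2007">
<to>Anoosha</to>
<from>Preethi</from>
</note>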
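The quoting rule can be checked in the same way. The following minimal sketch uses Python's standard-library xml.etree.ElementTree and a hypothetical helper parses. The date value 12/11/2007 is only an illustrative assumption. XML accepts either double or single quotes around an attribute value, but never no quotes at all.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

unquoted = "<note date=12/11/2007><to>Anoosha</to><from>Preethi</from></note>"
quoted = '<note date="12/11/2007"><to>Anoosha</to><from>Preethi</from></note>'
single_quoted = "<note date='12/11/2007'><to>Anoosha</to><from>Preethi</from></note>"

print(parses(unquoted))       # False
print(parses(quoted))         # True
print(parses(single_quoted))  # True: single quotes are also allowed

# Once parsed, the attribute is available by name.
date_value = ET.fromstring(quoted).get("date")
print(date_value)             # 12/11/2007
```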

6) Entity References

Some characters have a special meaning in XML.

If you place a character like "