Extensible Markup Language (XML)

  • What is XML?

    XML stands for EXtensible Markup Language.

    XML is mainly designed to carry (or transmit) data, not to display data.

    XML is a computer language for defining markup languages to create structured documents.

    XML tags are not predefined. You must define your own tags.

  • Difference between HTML and XML

    HTML is used to mark up text so it can be displayed to users on the screen.

    HTML describes both structure (e.g. <p>, <h2>, <ul>) and appearance (e.g. <b>, <i>, <font>).

    HTML uses a fixed, unchangeable set of tags

    HTML is not case-sensitive

    HTML is not strict about syntactical rules.

    Browsers ignore and/or correct as many HTML errors as they can.

    XML is used to carry data so it can be processed by computers.

    XML describes only content, or meaning.

    In XML, you can create your own tags

    XML is case-sensitive.

    XML follows strict syntactical rules.

    Browsers process XML documents only if they are syntactically correct.
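    The difference in strictness can be observed with any conforming XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the sample strings and the helper name `is_well_formed` are made up for illustration and are not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching case and proper nesting: accepted.
ok = is_well_formed("<note><to>Tove</to></note>")
# XML is case-sensitive, so <Note> is not closed by </note>: rejected.
wrong_case = is_well_formed("<Note><to>Tove</to></note>")
# An unclosed tag that an HTML browser would usually tolerate: rejected.
unclosed = is_well_formed("<note><to>Tove</note>")

print(ok, wrong_case, unclosed)  # True False False
```

    An HTML browser would typically render all three inputs; the XML parser refuses the last two outright.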

  • HTML and XML (similarity)

    HTML and XML look similar because both derive from SGML (Standard Generalized Markup Language).

    Both HTML and XML use elements enclosed in tags (Ex: <tag>This is an element</tag>)

    Both use tag attributes

    Both use entities (&lt;, &gt;, &amp;, &quot;, &apos;)

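  • XML document structure (Basic points to remember)

    Each XML document should start with an XML declaration that states the version of XML, for example <?xml version="1.0"?>.

    The declaration is followed by the root element of the document, which encloses all other content. Example content: a book record with the values Harry Potter, J K. Rowling, 2005 and 29.99.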

  • Format of root element of the document:

    <root> ..... </root>
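    As a rough illustration of the declaration-plus-root-element layout, the sketch below parses a small book document with Python's standard `xml.etree.ElementTree` module. Only the text values (Harry Potter, J K. Rowling, 2005, 29.99) come from the slides; the tag names `book`, `title`, `author`, `year` and `price` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are assumed, only the text
# values are taken from the slide.
DOC = """<?xml version="1.0"?>
<book>
  <title>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>"""

root = ET.fromstring(DOC)

print(root.tag)                       # book
print([child.tag for child in root])  # ['title', 'author', 'year', 'price']
print(root.findtext("title"))         # Harry Potter
```

    Note that the XML declaration has to be the very first thing in the document; even a leading blank line before it makes the parser reject the input.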

  • A simple example for XML document

  • XML Document

    Contains 2 auxiliary files:
    - One specifies the tag set and rules (DTD/XML schema)
    - The second specifies how content should be displayed (CSS/XSLT)

    An XML document consists of many entities:
    - Document entity (physically within the document)
    - Reference entity (separate files). Should have a name and a reference (a reference to an entity has the form: &entity_name;)
    - Binary entity (binary data, e.g. images, sound files etc.)

  • CDATA & PCDATA

    By default, all text inside an XML document is parsed, but text inside a CDATA section will be ignored by the parser.

    PCDATA - Parsed Character Data

    XML parsers normally parse all the text in an XML document. When an XML element is parsed, the text between the XML tags is also parsed:

    <message>This text is also parsed</message>

    The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):

    <name><first>Bill</first><last>Gates</last></name>

    and the parser will break it up into sub-elements like this:

    <name>
      <first>Bill</first>
      <last>Gates</last>
    </name>

    CDATA - (Unparsed) Character Data

    The term CDATA is used about text data that should not be parsed by the XML parser. Everything inside a CDATA section is ignored by the parser as markup and passed through as plain text. To avoid errors, script code can be defined as CDATA.

    A CDATA section starts with "<![CDATA[" and ends with "]]>".
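    The effect of a CDATA section can be sketched with Python's standard `xml.etree.ElementTree` module. The `<script>` element and the script text are invented for this example; the point is only that characters such as "<" and "&" are accepted inside CDATA but rejected in ordinary parsed text.

```python
import xml.etree.ElementTree as ET

# Hypothetical script code containing characters that are markup in XML.
CODE = "if (a < b && b < c) { return 1; }"

# Inside a CDATA section the text is passed through untouched.
with_cdata = ET.fromstring("<script><![CDATA[" + CODE + "]]></script>")
print(with_cdata.text)  # if (a < b && b < c) { return 1; }

# The same text as ordinary PCDATA is not well-formed XML.
try:
    ET.fromstring("<script>" + CODE + "</script>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)  # False
```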

  • DTD - XML Building Blocks

    The Building Blocks of XML Documents:
    - Elements
    - Attributes
    - Entities
    - PCDATA
    - CDATA

  • What is an XML Element?

    An XML element is everything from the element's start tag to the element's end tag.

    An element can contain:
    - other elements (element content)
    - text (text content)
    - attributes


  • Document Type Definitions (DTDs)

    A DTD is a set of structural rules called declarations, which specify a set of elements that can appear in the document as well as how and where these elements may appear.

    Purpose: to provide a standard form for a collection of XML documents.

    Not all XML documents have or need a DTD.

    Two types of DTDs:
    - Internal DTD (appears within an XML document)
    - External DTD (appears as an external file and can be used with more than one document)

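  • Declaring Elements within DTD

    A DTD contains declarations that define elements, attributes, etc.

    Syntax: <!ELEMENT element-name (element-content)>

    Ex: <!ELEMENT note (to,from,heading,body)>

    Empty Elements

    Empty elements are declared with the keyword EMPTY:

    <!ELEMENT element-name EMPTY>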


  • In many cases, it is necessary to specify the number of times that a child element may appear. This can be done by adding a DTD modifier to the child element specification.

    Modifier   Meaning
    +          one or more occurrences
    *          zero or more occurrences
    ?          zero or one occurrence
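    The three modifiers behave like the quantifiers of the same name in regular expressions. The sketch below uses that analogy to check a sequence of child element names against a content model. It is not how a real validating parser is implemented; the helper names `model_to_regex` and `children_match` and the content model `(to+, from, heading?, body*)` are hypothetical.

```python
import re


def model_to_regex(model):
    """Translate [(child_name, modifier), ...] into a compiled regex.

    modifier is "" (exactly once), "+", "*" or "?".
    Each child name is matched together with one trailing space.
    """
    parts = []
    for name, modifier in model:
        parts.append("(?:" + re.escape(name) + " )" + modifier)
    return re.compile("".join(parts))


def children_match(children, model):
    """Return True if the list of child names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


# Hypothetical content model: (to+, from, heading?, body*)
MODEL = [("to", "+"), ("from", ""), ("heading", "?"), ("body", "*")]

print(children_match(["to", "to", "from", "body"], MODEL))           # True
print(children_match(["from", "heading"], MODEL))                    # False
print(children_match(["to", "from", "heading", "heading"], MODEL))   # False
```

    The second call fails because "to" must occur at least once, the third because "heading" may occur at most once.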

  • Declaring attributes

    In a DTD, attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:

    <!ATTLIST element-name attribute-name attribute-type default-value>

    Attribute types: there are many possible, but we will consider only CDATA.

    DTD example: XML example:

  • The default-value can be one of the following:
    - value - The default value of the attribute.
    - #REQUIRED - The attribute is required.
    - #IMPLIED - The attribute is not required.
    - #FIXED value - The attribute value is fixed.

    DTD example: XML example: ...
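    A declared default value can be observed with Python's standard `xml.etree.ElementTree` module, whose underlying parser reads attribute defaults from an internal DTD. This is only a sketch: the element `payment`, the attribute `type` and the default "check" are made-up names. The parser is non-validating, so it applies defaults but should not be expected to enforce #REQUIRED or #FIXED.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration: element "payment" with attribute "type",
# type CDATA, default value "check".
DTD = """<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!ATTLIST payment type CDATA "check">
]>
"""

# Attribute omitted in the document: the declared default is supplied.
defaulted = ET.fromstring(DTD + "<payment/>")
print(defaulted.get("type"))  # check

# Attribute given explicitly: the document's value wins.
explicit = ET.fromstring(DTD + '<payment type="cash"/>')
print(explicit.get("type"))   # cash
```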

  • Declaring entities

    Entities are variables used to define shortcuts to standard text or special characters. Entities are normally used to specify large blocks of data that need to be repeated throughout the document.

    Entities can be declared in two ways: internal or external.

  • An Internal Entity Declaration


    Syntax: <!ENTITY entity-name "entity-value">

    DTD example: <!ENTITY writer "...">

    XML example: &writer;

    Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
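    Entity expansion can be tried out with Python's standard `xml.etree.ElementTree` module, which expands entities declared in an internal DTD. In this sketch the entity name `writer` matches the slide, while the `author` element and the replacement text "Jane Doe" are placeholders.

```python
import xml.etree.ElementTree as ET

# Internal entity declaration; the replacement text is a placeholder.
DOC = """<!DOCTYPE author [
  <!ENTITY writer "Jane Doe">
]>
<author>&writer;</author>"""

author = ET.fromstring(DOC)
print(author.text)  # Jane Doe

# The predefined entities need no declaration.
predefined = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(predefined.text)  # a < b && c > d
```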

  • An External Entity Declaration

    Syntax: <!ENTITY entity-name SYSTEM "URI/URL">


  • Internal and External DTDs

    Internal DTD Declaration

    If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:

    <!DOCTYPE root-element [element-declarations]>

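  • Example:

    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>John</to>
      <from>Robert</from>
      <heading>Reminder</heading>
      <body>Don't forget me</body>
    </note>

    !DOCTYPE note defines that the root element of this document is note.

    !ELEMENT note defines that the note element contains four elements: "to,from,heading,body".

    #PCDATA (Parsed Character Data) indicates that the parser (for example, a browser) should parse the content of the element.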

  • External DTD Declaration

    If the DTD is declared in an external file, the XML document should reference it in a DOCTYPE definition with the following syntax:

    <!DOCTYPE root-element SYSTEM "filename">

  • XML Namespaces

    XML Namespaces provide a method to avoid element name conflicts. To use XML Namespaces, elements are given qualified names.

    Name Conflicts

    In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.

    For example, this file carries HTML table information:

    <table>
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </table>

  • Whereas this XML file carries user defined tags:

    <table>
      <name>African Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>

    If these two files were added together, there would be a name conflict. Both contain a <table> element, but the elements have different content and meaning. An XML parser will not know how to handle these differences.

  • Solving the Name Conflict Using a Prefix:

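    Name conflicts in XML can easily be avoided using a name prefix. Example:

    <h:table>
      <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
      </h:tr>
    </h:table>

    <f:table>
      <f:name>African Coffee Table</f:name>
      <f:width>80</f:width>
      <f:length>120</f:length>
    </f:table>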

  • The xmlns Attribute

    When using prefixes in XML, a so-called namespace for the prefix must be defined. The namespace is defined by the xmlns attribute in the start tag of an element, using the syntax:

    xmlns:prefix="URI"

    Apples Bananas African Coffee Table 80 120

  • Namespaces can be declared in the elements where they are used or in the XML root element:

    Apples Bananas African Coffee Table 80 120
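    How a parser keeps the two kinds of table apart once namespaces are declared can be sketched with Python's standard `xml.etree.ElementTree` module. The prefixes `h` and `f`, the child tag names and both namespace URIs below are placeholders chosen for this sketch; only the text values come from the slides. Both namespaces are declared on the root element here.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; a namespace URI is only an identifier and
# is not fetched by the parser.
DOC = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Prefix-to-URI map used for lookups; it must use the same URIs as DOC.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture = root.find("f:table", NS)

# The parser stores the qualified name as {namespace-URI}local-name.
print(html_table.tag)  # {http://example.com/html}table
print(furniture.tag)   # {http://example.com/furniture}table

cells = [td.text for td in html_table.iterfind("h:tr/h:td", NS)]
print(cells)  # ['Apples', 'Bananas']
print(furniture.findtext("f:name", namespaces=NS))  # African Coffee Table
```

    The two `table` elements now have different qualified names, so code that looks for one never picks up the other.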

  • XML schemas

    XML Schema is an XML-based alternative to DTD. An XML schema describes the structure of an XML document. The XML Schema language is also referred to as XML Schema Definition (XSD).

    What is an XML Schema?

    The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

    Advantage of XML Schema:

    One of the greatest strengths of XML Schemas is the support for data types.

  • An XML Schema:
    - defines elements that can appear in a document
    - defines attributes that can appear in a document
    - defines which elements are child elements
    - defines the order of child elements
    - defines the number of child elements
    - defines whether an element is empty or can include text
    - defines data types for elements and attributes
    - defines default and fixed values for elements and attributes

  • Advantages of using data types:

    With support for data types:
    - It is easier to describe allowable document content
    - It is easier to validate the correctness of data
    - It is easier to work with data from a database
    - It is easier to define data facets (restrictions on data)
    - It is easier to define data patterns (data formats)
    - It is easier to convert data between different data types

  • Defining a schema

    The <schema> element may contain some attributes. A schema declaration often looks something like this (Ex: note.xsd):

    <xs:schema> ... ... </xs:schema>
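    Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below assumes one possible shape for note.xsd: the element names follow the note example used in the DTD section, while the `xs:string` types, the `targetNamespace` value and the `elementFormDefault` setting are assumptions. Python's standard `xml.etree.ElementTree` module does not validate documents against a schema; it is used here only to inspect the declarations.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical note.xsd; the types and schema attributes are assumed.
NOTE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(NOTE_XSD)
print(schema.get("targetNamespace"))  # http://www.w3schools.com

# Collect the (name, type) pair of every element declaration that has a type.
declared = [
    (el.get("name"), el.get("type"))
    for el in schema.iter("{%s}element" % XS)
    if el.get("type") is not None
]
print(declared)
```

    The `note` element itself carries no `type` attribute because its type is defined inline as a complex type, so only its four children are listed.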

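  • Meaning of attributes

  • Defining a schema instance

    An XML Schema is like a class, and XML documents that adhere to the schema are instances of it. This XML document has a reference to an XML Schema:

    <note xmlns="http://www.w3schools.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3schools.com note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>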

  • xmlns="http://www.w3schools.com" specifies the default namespace declaration (to be the one defined in its schema).

    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" makes the XML Schema instance namespace available, which indicates that this XML document is an instance of an XML Schema.

    xsi:schemaLocation="http://www.w3schools.com note.xsd" indicates where the schema for the default namespace can be found. This attribute has two values. The first value is the namespace to use. The second value is the file name of the schema.
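    The two-part structure of the schemaLocation value can be read back with Python's standard `xml.etree.ElementTree` module. This sketch assumes the note instance uses the element names to, from, heading and body, and that the attribute value is the namespace followed by the file name, separated by whitespace.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Assumed shape of the note instance document.
DOC = """<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(DOC)

# The attribute lives in the xsi namespace, so it is looked up by its
# qualified name {namespace-URI}schemaLocation.
namespace, schema_file = note.get("{%s}schemaLocation" % XSI).split()

print(namespace)    # http://www.w3schools.com
print(schema_file)  # note.xsd
print(note.tag)     # {http://www.w3schools.com}note
```

    The last line shows the effect of the default namespace: the unprefixed `note` element is placed in http://www.w3schools.com.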

  • Overview of Data Types

    There are 2 categories of user defined XML schema data types.

    Simple data type: cannot have attr