XML DTD Training Module

  • 7/31/2019 XML DTD Training Module

    1/30

    Training module for XML and DTD

    Prepared by Santanu Nayak

    1507

    2011

    XML/DTD Training Overview

    XML (Extensible Markup Language) has emerged as the leading standard for data interchange between applications and between organizations. In this XML training class, attendees learn the core fundamentals of XML and related technologies such as XSLT and DTD.

    XML/DTD Training Prerequisites

    Prior knowledge of HTML and/or relational databases is helpful but not necessary.

    XML/DTD Training Objectives

    To learn how XML and its related technologies, such as DTD and XSLT, function.
    To master the core syntax of XML and DTD.
    To learn the fundamentals of XSL.

    Training Outline

    1. Introduction

    1.1 What is XML?

    1.2 The Difference Between XML and HTML

    2. Writing XML

    2.1 XML Tree

    2.2 XML Syntax Rules

    2.3 Rules for writing XML

    2.4 Elements, attributes, and values

    2.4.1 Element

    2.4.2 Attribute

    2.4.3 Value

    2.5 Declaring the XML version

    2.5.1 Version declaration

    2.5.2 Encoding declarations

    2.6 Creating the root element

    2.7 Nesting Element

    2.8 Writing comments

    2.9 Writing symbols and special character

    3. DTD (Document Type Definition) fundamentals

    3.1 What is DTD and what is the role of DTDs?

    3.2 Internal DTD

    3.3 External DTD

    3.4 Type of declarations in DTD

    3.4.1 Element declarations

    3.4.2 Attribute List Declarations

    3.4.3 Entity declarations

    3.5 PCDATA

    3.6 CDATA

    3.7 Special Character (Entities)

    4. Relation between XML and DTD and Validation

    4.1 Validation (Parsing)

    4.1.1 Well Formed XML

    4.1.2 Valid XML Documents

    1. Introduction

    XML was designed to transport and store data, whereas HTML was designed to display data.

    1.1 What is XML?

    XML stands for Extensible Markup Language
    XML is a markup language much like HTML
    XML was designed to carry data, not to display data
    XML tags are not predefined. You must define your own tags
    XML is designed to be self-descriptive
    XML is a W3C Recommendation

    1.2 The Difference Between XML and HTML

    It is important to understand that XML is not a replacement for HTML. In most web applications, XML is used to transport data, while HTML is used to format and display the data.

    The best description of XML is this:

    XML is a software- and hardware-independent tool for carrying information.

    XML and HTML were designed with different goals:

    XML was designed to transport and store data, with focus on what data is
    HTML was designed to display data, with focus on how data looks

    HTML is about displaying information, while XML is about carrying information.

    2. Writing XML

    2.1 XML Tree

    XML documents form a tree structure that starts at "the root" and branches to "the leaves".

    XML Documents Form a Tree Structure

    XML documents must contain a root element. This element is "the parent" of all other elements.

    The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.

    All elements can have sub elements (child elements):

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

    The terms parent, child, and sibling are used to describe the relationships between elements. Parent elements have children. Children on the same level are called siblings (brothers or sisters).

    All elements can have text content and attributes (just like in HTML).
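    The tree terminology above can be tried out with a few lines of Python. This is a minimal sketch using the standard library's xml.etree.ElementTree module; the sample document and the helper name describe are illustrative assumptions, not part of the training material.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
doc = "<root><child><subchild>text</subchild></child><child/></root>"
root = ET.fromstring(doc)


def describe(element, depth=0):
    """Return one indented line per element, starting at the given element."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
siblings = [child.tag for child in root]  # children of the same parent

print("\n".join(tree_lines))
```

    Iterating over an element yields only its direct children, which is why the helper calls itself to reach the lower levels of the tree.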

    2.2 XML Syntax Rules

    The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to use.

    1. All XML Elements Must Have a Closing Tag (whereas in HTML, some elements do not have to have a closing tag)

    Example:

    <p>This is a paragraph</p>

    <p>This is another paragraph</p>

    2. XML Tags are Case Sensitive

    Example:

    <Message>This is incorrect</message>    Wrong

    <message>This is correct</message>      Correct

    3. XML Elements Must be Properly Nested

    Example:

    <b><i>This text is bold and italic</b></i>    Wrong

    <b><i>This text is bold and italic</i></b>    Correct

    4. XML Documents Must Have a Root Element

    Example:

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

    5. XML Attribute Values Must be Quoted

    Example:

    Wrong

    <note date=12/11/2007>
      <to>Tove</to>
      <from>Jani</from>
    </note>

    Correct

    <note date="12/11/2007">
      <to>Tove</to>
      <from>Jani</from>
    </note>

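    6. Entity References

    Some characters have a special meaning in XML. A "<" inside element content is read as the start of a new tag, so it must be written as the entity reference &lt;.

    Example:

    <message>if salary < 1000 then</message>       Wrong

    <message>if salary &lt; 1000 then</message>    Correct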

    7. Comments in XML

    The syntax for writing comments in XML is similar to that of HTML:

    <!-- This is a comment -->
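    The syntax rules in this section can be checked mechanically, because a conforming parser rejects any document that breaks them. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule discussed above.
samples = {
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2007><to>Tove</to></note>",
    "raw less-than": "<message>if salary < 1000 then</message>",
    "correct": '<note date="12/11/2007"><to>Tove</to></note>',
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

    Only the last sample passes; each of the others violates exactly one rule from the list.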

    2.3 Rules for writing XML

    There are nine basic rules for building good XML:

    1. All XML must have a root element.
    2. All tags must be closed.
    3. All tags must be properly nested.
    4. Tag names have strict limits.
    5. Tag names are case sensitive.
    6. Tag names cannot contain spaces.
    7. Attribute values must appear within quotes ("").
    8. White space is preserved.
    9. HTML tags should be avoided (optional).

    2.4 Elements, attributes, and values

    2.4.1 Element

    An element is just a generic name for a tag. An opening tag looks like <element>, while a closing tag has a slash placed before the element's name: </element>. All information that belongs to an element must be contained between its opening and closing tags.

    An element can contain:

    other elements
    text
    attributes
    or a mix of all of the above...

    Example:

    <bookstore>
      <book category="CHILDREN">
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title>Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>

    In the example above, <bookstore> and <book> have element contents, because they contain other elements. <book> also has an attribute (category="CHILDREN"). <title>, <author>, <year>, and <price> have text content because they contain text.
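    To see the difference between element content, attributes, and text content in practice, the bookstore document can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: the document is shortened, and the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore example (author and year left out).
bookstore = ET.fromstring("""
<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
""")

summary = [
    (
        book.get("category"),            # attribute value
        book.findtext("title"),          # text content of a child element
        float(book.findtext("price")),   # text content converted to a number
    )
    for book in bookstore.findall("book")  # element content: the book children
]

for category, title, price in summary:
    print(category, title, price)
```

    Attribute values and text content always arrive as strings, so numeric data such as the price has to be converted by the application.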

    XML Naming Rules for an Element

    XML elements must follow these naming rules:

    Names can contain letters, numbers, and other characters
    Names cannot start with a number or punctuation character (an underscore is allowed)
    Names cannot start with the letters xml (or XML, or Xml, etc)
    Names cannot contain spaces

    Any name can be used, no words are reserved.

    2.4.2 Attribute

    Attributes are used to specify additional information about the element. It may help to think of attributes as a means of specializing generic elements to fit your needs. An attribute for an element appears within the opening tag.

    If an attribute may take one of several values, the value must be specified explicitly. For example, if a tag had a color attribute, its value could be red, blue, green, etc. The syntax for including an attribute in an element is:

    <element attributeName="value">

    In this example we will be using a made-up XML element named "friend" that has an optional attribute age.

    Example:

    <friend age="23">Samantha</friend>

    2.4.3 Value

    Attributes may have a default value OR a fixed value specified.

    A default value is automatically assigned to the attribute when no other value is specified.

    Example:

    <note day="14" month="07" year="2011" to="Crest" from="Springer" heading="Book" body="Happy weekend!"></note>

    2.5 Declaring the XML version

    2.5.1 Version declaration

    The version declaration looks like a processing instruction and carries information for the application. XML documents start with an XML version declaration (XML declaration), which specifies the version of XML being used. Version 1.0 is the version in general use; XML 1.1 also exists but is rarely needed. Although the XML declaration is optional, the W3C specification recommends including it.

    It will look something like the following:

    <?xml version="1.0"?>

    2.5.2 Encoding declarations

    Encoding declarations inform the processor which character encoding the document uses (e.g. UTF-8, a Unicode encoding in which the ASCII characters keep their ASCII byte values). All XML parsers must support the UTF-8 and UTF-16 encodings of Unicode. However, XML parsers may support a larger set.

    It will look something like the following:

    <?xml version="1.0" encoding="UTF-8"?>
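    As an illustration of how the version and encoding declarations are produced in practice, the sketch below lets Python's standard xml.etree.ElementTree module write them. It assumes Python 3.8 or later (for the xml_declaration argument of tostring); the element name and text are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one non-ASCII character in its text.
root = ET.Element("HELP")
root.text = "caf\u00e9"

# Serialize to UTF-8 bytes and ask for the XML declaration to be written.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)

first_line = data.split(b"\n", 1)[0]   # the declaration is on the first line
round_trip = ET.fromstring(data).text  # the parser reads the encoding back

print(first_line.decode("ascii"))
print(round_trip)
```

    Because the declared encoding and the actual bytes agree, the accented character survives the round trip unchanged.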

    2.6 Creating the root element

    The goal of this step is to create a root element, which contains all other elements that you create. It is the most important element, as it contains the rest of the document and becomes synonymous with your document type. It cannot be repeated.

    Markup documents, whether HTML, XML or SGML, employ a root element, which contains all other elements. The root element usually describes the focus or function of the document. The HTML element in HTML is a good root element because it reveals the name of the markup language.

    Example:

    <?xml version="1.0" standalone="yes"?>

    <HELP>

    </HELP>

    2.7 Nesting Element

    To think of nesting in plain English, follow this rule: elements opened first must be closed last. That means that the root element, the first element in an XML document, must also be closed last. Nested elements, ones that occur in the middle of the document, must be closed before those that came before them.

    When an element appears within another element, it is said that the inner element is "nested". The term nested can be related directly to the word "nest". If an element is nested within another element, then it is surrounded, protected, or encapsulated by the outer element.

    Besides being such an easy term to understand, nesting also serves the purpose of keeping order in an XML document. Much like parentheses in a math problem, elements must be closed in the reverse of the order in which they were opened.

    This means that an element which is nested inside another element must end before the outer element ends. The example below has a small nesting problem: <name> is opened inside <number> but closed after it.

    Example:

    <phonebook>
      <number>
        <name>
      </number>
      </name>
    </phonebook>

    2.8 Writing comments

    Comments in XML are nearly identical to comments in HTML. Comments help you understand code you wrote years before, and help another developer review it.

    A comment has two parts, the part starting the comment and the part ending it. First, add the first part of the comment tag: <!--

    Write whatever comment you would like - just make sure you don't nest comments within other comments.

    Close the comment tag: -->

    Tips:

    Comments cannot come before the XML declaration. When a declaration is present, it must be the very first thing in the document.
    Comments may not be nested one inside another. You must close your first comment before you open a second.
    Comments cannot occur within tags.
    Never use two dashes (--) anywhere but at the beginning and end of your comments.
    Anything in comments is effectively invisible to the XML parser, so be very careful that what remains is still valid and well-formed.

    2.9 Writing symbols and special character

    Sometimes the text of an XML file needs to contain characters that are reserved by XML. For example, suppose you want to have a declaration like

    Balance < Investment

    in your XML file. Now "<" is a reserved character, which normally marks the start of a tag.

    To handle such situations, you can replace these characters with the following references, which are substituted automatically when the XML file is parsed.

    Character    Reference
    &            &amp;
    <            &lt;
    >            &gt;
    "            &quot;
    '            &apos;

    So you can write the above declaration in this format for it to be valid

    Balance &lt; Investment

    Character References:

    The above list is for predefined characters. You can also use the Unicode value to write any other character. For example you can write

    Balance &#8220; Investment

    A character reference like &#8220; contains a hash mark (#) followed by a number. The number is the Unicode value for a single character, such as 65 for the letter A, 8220 for the left curly quote, or 8221 for the right curly quote. (The values 147 and 148 are the curly quotes in the Windows-1252 code page, not in Unicode.)
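    The substitution of entity and character references can be observed directly. The following sketch relies on two standard-library pieces, xml.sax.saxutils.escape and xml.etree.ElementTree; the sample strings are assumptions chosen to match the Balance example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Balance < Investment & more"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(raw)

# The parser turns the references back into the original characters.
element = ET.fromstring(f"<message>{escaped}</message>")

# Numeric character references use the Unicode value of the character.
numeric = ET.fromstring("<m>&#65; &#8220;quoted&#8221;</m>").text

print(escaped)
print(element.text)
print(numeric)
```

    Escaping on output and letting the parser unescape on input means application code never has to handle the references by hand.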

    3. DTD (Document Type Definition) fundamentals

    3.1 What is DTD and what is the role of DTDs?

    A DTD is a set of rules that defines what tags appear in an XML document, what attributes the tags may have and what relationship the tags have with each other. When an XML document is processed, it is compared with the DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and it is performed by a tool called a parser.

    The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.

    A DTD can be declared inline in your XML document, or as an external reference.

    3.2 Internal DTD

    Internal DTDs (markup declarations) are inserted within the doctype declaration. DTDs inserted this way are used only in that specific document. This might be the approach to take for the use of a small number of tags in a single document, as in this example:

    <!DOCTYPE film [
      <!ENTITY COM "Comedy">
      <!ENTITY SF "Science Fiction">
      <!ELEMENT film (title+, genre, year)>
      <!ELEMENT title (#PCDATA)>
      <!ATTLIST title xml:lang NMTOKEN "EN"
                      id ID #IMPLIED>
      <!ELEMENT genre (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
    ]>

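    <film>
      <title>Tootsie</title>
      <genre>&COM;</genre>
      <year>1982</year>
    </film>

    <film>
      <title>Jurassic Park</title>
      <genre>&SF;</genre>
      <year>1993</year>
    </film>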
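    The effect of the entity declarations in an internal DTD can be tried with Python's standard parser. This is a minimal sketch under stated assumptions: the document below is a trimmed version of the film example (the ATTLIST and the SF entity are left out), and xml.etree.ElementTree only reads the internal subset to expand entities. It does not validate the document against the element declarations.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the film example with an internal DTD.
document = """<!DOCTYPE film [
  <!ENTITY COM "Comedy">
  <!ELEMENT film (title+, genre, year)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<film>
  <title>Tootsie</title>
  <genre>&COM;</genre>
  <year>1982</year>
</film>"""

film = ET.fromstring(document)
genre = film.findtext("genre")   # the &COM; reference has been expanded
title = film.findtext("title")

print(title, "-", genre)
```

    A non-validating parser like this one would also accept a film element with a missing year; catching that requires a validating parser.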

    3.3 External DTD

    DTDs can be very complex and creating a DTD requires a certain amount of work. DTDs are stored as ASCII text files with the extension '.dtd'. In the following example we

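    Wrapping

    If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE definition with the following syntax:

    <!DOCTYPE root-element [element-declarations]>

    example:

    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend</body>
    </note>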

    Declaring only one occurrence of the same element

    example

    <!ELEMENT note (message)>

    The example declaration above declares that the child element message can only occur one time inside the note element.

    Declaring minimum one occurrence of the same element

    example

    <!ELEMENT note (message+)>

    The + sign in the example above declares that the child element message must occur one or more times inside the note element.

    Declaring zero or more occurrences of the same element

    example

    <!ELEMENT note (message*)>

    The * sign in the example above declares that the child element message can occur zero or more times inside the note element.

    Declaring zero or one occurrences of the same element

    example

    <!ELEMENT note (message?)>

    The ? sign in the example above declares that the child element message can occur zero or one times inside the note element.

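    Declaring mixed content

    example

    <!ELEMENT note (to+, from, header, message*)>

    The example above declares that the element note must contain at least one to child element, exactly one from child element, exactly one header, and zero or more message elements. Parsed character data cannot be added to such a sequence: in a DTD, mixed content has to be declared as (#PCDATA | to | from | header | message)*, which allows text and the listed child elements in any order and any number.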
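    The occurrence indicators behave much like the same symbols in regular expressions. The sketch below exploits that similarity to check the children of a note element in Python. It is a hypothetical illustration, not how a real validating parser is implemented: each child is written as <name>, and the helper children_match and the MODELS table are invented for this example.

```python
import re
import xml.etree.ElementTree as ET


def children_match(element, model):
    """Check the sequence of child tags against a regex 'content model'.

    Each child is written as <name>, so the DTD model (message+)
    corresponds to the regex (<message>)+ in this sketch.
    """
    sequence = "".join(f"<{child.tag}>" for child in element)
    return re.fullmatch(model, sequence) is not None


# Regex counterparts of the four declarations for the message child.
MODELS = {
    "exactly one": r"<message>",
    "one or more (+)": r"(<message>)+",
    "zero or more (*)": r"(<message>)*",
    "zero or one (?)": r"(<message>)?",
}

empty = ET.fromstring("<note/>")
single = ET.fromstring("<note><message/></note>")
double = ET.fromstring("<note><message/><message/></note>")

# For each model: does it accept a note with 0, 1 and 2 message children?
table = {
    label: [children_match(doc, rx) for doc in (empty, single, double)]
    for label, rx in MODELS.items()
}

for label, row in table.items():
    print(f"{label:18} {row}")
```

    Reading the printed rows side by side shows which indicator tolerates a missing child and which tolerates repetition.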

    3.4.2 Attribute List Declarations

    In the DTD, XML element attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:

    <!ATTLIST element-name attribute-name attribute-type attribute-default-value>

    As you can see from the syntax above, the ATTLIST declaration defines the element which can have the attribute, the name of the attribute, the type of the attribute, and the default attribute value.

    The attribute-type can have the following values:

    Value             Explanation
    CDATA             The value is character data
    (eval|eval|..)    The value must be an enumerated value
    ID                The value is a unique id
    IDREF             The value is the id of another element
    IDREFS            The value is a list of other ids
    NMTOKEN           The value is a valid XML name
    NMTOKENS          The value is a list of valid XML names
    ENTITY            The value is an entity
    ENTITIES          The value is a list of entities
    NOTATION          The value is a name of a notation
    xml:              The value is predefined

    The attribute-default-value can have the following values:

    Value           Explanation
    value           The attribute has a default value (written as a quoted string, with no keyword)
    #REQUIRED       The attribute value must be included in the element
    #IMPLIED        The attribute does not have to be included
    #FIXED value    The attribute value is fixed

    Attribute declaration example

    DTD example:

    <!ELEMENT square EMPTY>
    <!ATTLIST square width CDATA "0">

    XML example:

    In the above example the element square is defined to be an empty element with the attribute width of type CDATA. The width attribute has a default value of 0.

    Default attribute value

    Syntax:

    <!ATTLIST element-name attribute-name attribute-type "default-value">

    DTD example:

    XML example:

    Specifying a default value for an attribute ensures that the attribute will get a value even if the author of the XML document didn't include it.

    Implied attribute

    Syntax:

    <!ATTLIST element-name attribute-name attribute-type #IMPLIED>

    DTD example:

    XML example:

    Use an implied attribute if you don't want to force the author to include an attribute and you don't have an option for a default value either.

    Required attribute

    Syntax:

    <!ATTLIST element-name attribute-name attribute-type #REQUIRED>

    DTD example:

    XML example:

    Use a required attribute if you don't have an option for a default value, but still want to force the attribute to be present.

    Fixed attribute value

    Syntax:

    <!ATTLIST element-name attribute-name attribute-type #FIXED "value">

    DTD example:

    XML example:

    Use a fixed attribute value when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, the XML parser will return an error.

    Enumerated attribute values

    Syntax:

    <!ATTLIST element-name attribute-name (eval|eval|..) default-value>

    DTD example:

    XML example:

    Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal values.
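    The different kinds of attribute defaults can be summarized as a small set of checks. The Python sketch below mimics what a validating parser does with default, #REQUIRED, #IMPLIED, #FIXED and enumerated attributes. Everything in it is hypothetical: the DECLARED table, the attribute names, and the function apply_declarations are invented for illustration and do not come from a real library.

```python
# Hypothetical declarations for one element: name -> (kind, value).
DECLARED = {
    "width": ("default", "0"),             # like: width CDATA "0"
    "id": ("required", None),              # like: id CDATA #REQUIRED
    "fax": ("implied", None),              # like: fax CDATA #IMPLIED
    "company": ("fixed", "Acme"),          # like: company CDATA #FIXED "Acme"
    "type": ("enum", ("check", "cash")),   # like: type (check|cash) #IMPLIED
}


def apply_declarations(attrs):
    """Return (resolved attributes, list of errors) for one element."""
    resolved, errors = dict(attrs), []
    for name in attrs:
        if name not in DECLARED:
            errors.append(f"undeclared attribute {name}")
    for name, (kind, value) in DECLARED.items():
        present = name in attrs
        if kind == "default" and not present:
            resolved[name] = value            # the default is filled in
        elif kind == "required" and not present:
            errors.append(f"missing required attribute {name}")
        elif kind == "fixed":
            if not present:
                resolved[name] = value        # the fixed value is filled in
            elif attrs[name] != value:
                errors.append(f"attribute {name} must be {value!r}")
        elif kind == "enum" and present and attrs[name] not in value:
            errors.append(f"attribute {name} must be one of {value}")
    return resolved, errors


good, good_errors = apply_declarations({"id": "a1"})
bad, bad_errors = apply_declarations({"company": "Other", "type": "card"})

print(good, good_errors)
print(bad_errors)
```

    An implied attribute needs no check at all: when it is absent, nothing is filled in and nothing is reported.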

    Note!

    No attribute name may appear more than once in the same start-tag or empty-element tag.

    The attribute must have been declared; the value must be of the type declared for it.

    No External Entity References. Attribute values cannot contain direct or indirect entity references to external entities.

    The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.

    3.7 Special Character (Entities)

    Character    Reference
    <            &lt;
    >            &gt;
    &            &amp;
    "            &quot;
    '            &apos;

    4. Relation between XML and DTD and Validation

    The purpose of having a DTD is this: when your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser.

    Remember, you don't need to have a DTD to create an XML document; you only need a DTD for a valid XML document.

    4.1 Validation (Parsing)

    An XML document written with correct syntax is called "Well Formed" XML. An XML document that has also been validated against a DTD is called "Valid" XML. Aim to always produce valid XML.

    4.1.1 Well Formed XML

    A "Well Formed" XML document has correct XML syntax.

    The syntax rules were described in the previous chapters:

    XML documents must have a root element
    XML elements must have a closing tag
    XML tags are case sensitive
    XML elements must be properly nested
    XML attribute values must be quoted

    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

    4.1.2 Valid XML Documents

    A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):

    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>