Session03 XML Validation DTD

  • 8/14/2019 Session03 XML Validation DTD

    2008 MindTree Consulting

    XML Validation

    DTD

    Sep-2009

    Slide 2

    Agenda

    Introduction to XML Validation

    DTD

    XML Schema

    XML Validation

    Slide 4

    An Introduction to XML Validation

    One of the important innovations of XML is the ability to place preconditions on the data that programs read, and to do this in a simple declarative way.

    XML allows you to say that every Order element must contain exactly one Customer element, that each Customer element must have an id attribute that contains an XML name token, that every ShipTo element must contain one or more Streets, one City, one State, and one Zip, and so forth.

    Checking an XML document against this list of conditions is called validation.

    Validation is an optional step but an important one.
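A minimal sketch, assuming Python's standard-library expat parser (xml.parsers.expat): the Order/Customer/ShipTo conditions above written as DTD declarations and read back from the document. Expat is non-validating, so it reports the declarations without enforcing them; actually checking a document against them needs a validating parser (for example, xmllint --valid). The content model chosen for Order and the sample data are assumptions.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical DTD: the element names come from the text above; the exact
# content model of Order and the sample data are assumptions.
ORDER_DOC = """<?xml version="1.0"?>
<!DOCTYPE Order [
  <!ELEMENT Order (Customer, ShipTo)>
  <!ELEMENT Customer (#PCDATA)>
  <!ATTLIST Customer id NMTOKEN #REQUIRED>
  <!ELEMENT ShipTo (Street+, City, State, Zip)>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT Zip (#PCDATA)>
]>
<Order>
  <Customer id="c42">Acme</Customer>
  <ShipTo><Street>1 Main St</Street><City>Springfield</City><State>IL</State><Zip>62701</Zip></ShipTo>
</Order>"""


def read_constraints(doc):
    """Return the element content models and attribute declarations found in the DTD."""
    elements, attributes = {}, []
    parser = expat.ParserCreate()

    def on_element(name, content_model):
        elements[name] = content_model  # (type, quantifier, name, children)

    def on_attribute(element, attribute, att_type, default, required):
        attributes.append((element, attribute, att_type, bool(required)))

    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attribute
    parser.Parse(doc, True)
    return elements, attributes


elements, attributes = read_constraints(ORDER_DOC)
ship_to = elements["ShipTo"]
print(ship_to[0] == model.XML_CTYPE_SEQ)   # True: ShipTo is a sequence
print([child[2] for child in ship_to[3]])  # ['Street', 'City', 'State', 'Zip']
print(attributes)                          # [('Customer', 'id', 'NMTOKEN', True)]
```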

    Slide 5

    Validation

    There are many reasons and opportunities to validate an XML document: when we receive one, before importing data into a legacy system, when we have produced or hand-edited one, or to test the output of an application.

    Validation as a firewall

    Validation can serve as an actual firewall when we receive documents from the outside world (as is commonly the case with Web Services and other XML communications), and it can provide checkpoints when we design processes as pipelines of transformations.

    Validation can take place at several levels.

    Structural validation

    Data validation

    Slide 6

    Schema Languages

    There is more than one language in which you can express such validation conditions. Generically, these are called schema languages, and the documents that list the constraints are called schemas.

    Different schema languages have different strengths and weaknesses.

    The document type definition (DTD) is the only schema language built into most XML parsers and endorsed as a standard part of XML.

    The W3C XML Schema Language (schemas for short, though it's hardly the only schema language) addresses several limitations of DTDs.

    Many other schema languages have been invented that can easily

    be integrated with your systems.

    Document Type Definition (DTD)

    Slide 8

    Document Type Definition (DTD)

    XML 1.0 included a set of tools for defining XML document structures, called Document Type Definitions (DTDs).

    A DTD focuses on the element structure of a document. It says what elements a document may contain, what each element may and must contain and in what order, and what attributes each element has. DTDs can also be used for:
      defining reusable content (entities),
      some kinds of metadata information (notations),
      providing default values for attributes.

    Document type definitions (DTDs) serve two general purposes:
      They provide the syntax for describing and constraining the logical structure of a document (element and attribute declarations are used for this).
      They provide the syntax for composing a logical document from physical entities (entity and notation declarations are used for this).

    Slide 9

    DTD Declarations

    DTDs contain several types of declarations: DOCTYPE, ENTITY, NOTATION, ELEMENT, and ATTLIST.
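A sketch, assuming Python's standard-library expat parser, showing one declaration of each kind in a hypothetical internal subset together with the expat callback that reports it. The notation, entity, and attribute names are illustrative only.

```python
from xml.parsers import expat

# Hypothetical DTD containing one declaration of each kind listed above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY company "MindTree">
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person id ID #REQUIRED>
]>
<person id="p1">Billy Bob</person>"""


def declaration_kinds(doc):
    """List (declaration kind, name) pairs in the order expat reports them."""
    seen = []
    p = expat.ParserCreate()
    p.StartDoctypeDeclHandler = (
        lambda name, sysid, pubid, internal: seen.append(("DOCTYPE", name)))
    p.NotationDeclHandler = (
        lambda name, base, sysid, pubid: seen.append(("NOTATION", name)))
    p.EntityDeclHandler = (
        lambda name, is_pe, value, base, sysid, pubid, notation: seen.append(("ENTITY", name)))
    p.ElementDeclHandler = (
        lambda name, content: seen.append(("ELEMENT", name)))
    p.AttlistDeclHandler = (
        lambda elem, attr, typ, default, required: seen.append(("ATTLIST", elem)))
    p.Parse(doc, True)
    return seen


for kind, name in declaration_kinds(DOC):
    print(kind, name)
```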

    Slide 10

    The DOCTYPE declaration is the container for all other DTD declarations.

    The document type declaration is placed in the instance document's prolog, after the XML declaration but before the root element start-tag, to associate the given document with a set of declarations.

    The name of the DOCTYPE must be the same as the name of the document's root element.

    Example:
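A minimal sketch of the naming rule, assuming Python's standard-library expat parser: it compares the DOCTYPE name with the root element's name. Expat does not enforce this rule itself (it is a validity constraint, not a well-formedness one), so the check is hand-written; the two documents and person.dtd are hypothetical.

```python
from xml.parsers import expat


def doctype_matches_root(doc):
    """Compare the DOCTYPE name with the name of the first (root) start-tag."""
    names = {}
    p = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        names["doctype"] = name

    def on_start(name, attrs):
        names.setdefault("root", name)  # only the first start-tag is the root

    p.StartDoctypeDeclHandler = on_doctype
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return "doctype" in names and names["doctype"] == names.get("root")


# Hypothetical documents; person.dtd is never opened because expat does not
# fetch the external subset unless asked to.
GOOD = '<?xml version="1.0"?>\n<!DOCTYPE person SYSTEM "person.dtd">\n<person/>'
BAD = '<?xml version="1.0"?>\n<!DOCTYPE people SYSTEM "person.dtd">\n<person/>'
print(doctype_matches_root(GOOD), doctype_matches_root(BAD))  # True False
```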

    Slide 11

    DOCTYPE Syntax

    DOCTYPE may contain internal declarations (referred to as the internal DTD subset), may refer to declarations in external files (referred to as the external DTD subset), or may use a combination of both techniques.

    Slide 12

    Internal Declarations

    The simplest way to define a DTD is through internal declarations. In this case, all declarations are simply placed between the open and close square brackets. The obvious downside to this approach is that you can't reuse the declarations across different XML document instances.

    ]>

    Billy Bob

    33
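A hypothetical complete internal-subset document (the element names person, name, and age are assumptions), parsed with Python's standard-library expat parser to show that the declarations arrive with the document itself and that no external subset is referenced.

```python
from xml.parsers import expat

# Hypothetical reconstruction: the element names person, name and age are assumptions.
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name, age)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>
<person><name>Billy Bob</name><age>33</age></person>"""


def internal_subset_summary(doc):
    """Report the DOCTYPE name, any external system id, and the declared elements."""
    summary = {"declared": []}
    p = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        summary.update(root=name, external=system_id, internal=bool(has_internal_subset))

    p.StartDoctypeDeclHandler = on_doctype
    p.ElementDeclHandler = lambda name, content: summary["declared"].append(name)
    p.Parse(doc, True)
    return summary


print(internal_subset_summary(DOC))
```

Because the declarations travel inside this one document, every other instance with the same structure would have to repeat them, which is the reuse problem described above.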

    Slide 13

    External Declarations

    DOCTYPE can also contain a reference to an external resource containing the declarations. This type of declaration is useful because it allows you to reuse the declarations in multiple document instances.

    The DOCTYPE declaration references the external resource through public and system identifiers.

    A system identifier is a URI that identifies the location of the resource; a public identifier is a location-independent identifier. Processors can use the public identifier to determine how to retrieve the physical resource if necessary. The PUBLIC token introduces a public identifier followed by a backup system identifier; the SYSTEM token introduces a system identifier on its own.

    Slide 14

    Using external declarations examples

    Using external declarations (public identifier)

    Billy Bob

    33

    Using external declarations (system identifier)

    Billy Bob

    33
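A sketch of the two forms, assuming Python's standard-library expat parser, which reports both identifiers from the DOCTYPE. The public identifier, the URL, and the LOCAL_CATALOG mapping are hypothetical; choose_location only illustrates the idea of preferring a location-independent public identifier and falling back to the system identifier, loosely in the spirit of an XML catalog, and is not a real catalog implementation.

```python
from xml.parsers import expat

# Hypothetical identifiers and catalog, for illustration only.
WITH_PUBLIC = """<?xml version="1.0"?>
<!DOCTYPE person PUBLIC "-//Example//DTD Person 1.0//EN" "http://example.com/person.dtd">
<person/>"""
WITH_SYSTEM = """<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person/>"""
LOCAL_CATALOG = {"-//Example//DTD Person 1.0//EN": "dtds/person.dtd"}


def external_ids(doc):
    """Return the (public, system) identifiers of the DOCTYPE declaration."""
    ids = {}
    p = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        ids["public"], ids["system"] = public_id, system_id

    p.StartDoctypeDeclHandler = on_doctype
    p.Parse(doc, True)
    return ids["public"], ids["system"]


def choose_location(public_id, system_id):
    """Prefer a local copy known by public identifier; fall back to the system identifier."""
    return LOCAL_CATALOG.get(public_id, system_id)


print(choose_location(*external_ids(WITH_PUBLIC)))  # dtds/person.dtd
print(choose_location(*external_ids(WITH_SYSTEM)))  # person.dtd
```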

    Slide 15

    Internal and external declarations

    A DOCTYPE declaration can also use both internal and external declarations. This is useful when you've decided to use external declarations but need to extend them further or override certain external declarations.

    Note: only ENTITY and ATTLIST declarations may be overridden.

    Example

    Billy Bob

    33
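A sketch of overriding, assuming Python's standard-library expat parser with parameter-entity parsing enabled so that it reads the external subset. XML 1.0 treats the internal subset as occurring before the external subset and makes the first declaration of an entity (or of a given attribute) binding, so the internal declaration of the hypothetical entity who wins. The external subset is served from memory; its content and the name person.dtd are assumptions.

```python
from xml.parsers import expat

# Hypothetical external subset, served from memory instead of from person.dtd.
EXTERNAL_SUBSET = b"""<!ELEMENT person (#PCDATA)>
<!ENTITY who "declared in the external subset">"""

BOTH = b"""<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd" [
  <!ENTITY who "declared in the internal subset">
]>
<person>&who;</person>"""

EXTERNAL_ONLY = b"""<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>&who;</person>"""


def expand_who(doc):
    """Return the text that &who; expands to in the root element."""
    text = []
    p = expat.ParserCreate()
    # Ask expat to read the external subset (it calls ExternalEntityRefHandler for it).
    p.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_external(context, base, system_id, public_id):
        child = p.ExternalEntityParserCreate(context)
        child.Parse(EXTERNAL_SUBSET, True)
        return 1  # non-zero: the reference was handled

    p.ExternalEntityRefHandler = load_external
    p.CharacterDataHandler = text.append
    p.Parse(doc, True)
    return "".join(text)


print(expand_who(BOTH))           # the internal declaration wins
print(expand_who(EXTERNAL_ONLY))  # only the external declaration exists
```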

    Slide 16

    An ELEMENT declaration defines an element of the specified name with the specified content model. The content model defines the element's allowed children.

    Content Model Basics

    Syntax                Description
    ANY                   Any child is allowed within the element.
    EMPTY                 No children are allowed within the element.
    (#PCDATA)             PCDATA stands for parsed character data and means the element can contain text.
    (child1,child2,...)   Only the specified children, in the order given, are allowed within the element.
    (child1|child2|...)   Only one of the specified children is allowed within the element.
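A sketch that reads each content model with Python's standard-library expat parser and renders it back into the syntax used in the table. The element names are hypothetical; render relies on expat's (type, quantifier, name, children) tuples and the constants in xml.parsers.expat.model.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Quantifier codes used by expat, mapped back to DTD syntax.
QUANT = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
         model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}


def render(content):
    """Turn expat's (type, quantifier, name, children) tuple back into DTD syntax."""
    ctype, quant, name, children = content
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == model.XML_CTYPE_MIXED:
        parts = ["#PCDATA"] + [render(child) for child in children]
        return "(" + "|".join(parts) + ")" + QUANT[quant]
    separator = "," if ctype == model.XML_CTYPE_SEQ else "|"  # SEQ or CHOICE
    return "(" + separator.join(render(child) for child in children) + ")" + QUANT[quant]


# Hypothetical element names, one declaration per row of the table above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc ANY>
  <!ELEMENT br EMPTY>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT person (name, age)>
  <!ELEMENT contact (email | phone)>
]>
<doc/>"""


def content_models(doc):
    """Map each declared element to its rendered content model."""
    models = {}
    p = expat.ParserCreate()
    p.ElementDeclHandler = lambda name, content: models.__setitem__(name, render(content))
    p.Parse(doc, True)
    return models


for element, rendered in content_models(DOC).items():
    print(element, rendered)
```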

    Slide 18

    Elements - Examples

    Element and text content models

    Billy

    Smith

    43

    0.1

    Jill

    J

    Smith

    21

    Mixed content model

    <!DOCTYPE p SYSTEM "p.dtd">

    This is an example of mixed

    content!
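A self-contained sketch of a mixed content model, assuming Python's standard-library parsers. The declarations are placed in an internal subset instead of p.dtd, and the child elements b and i are assumptions. Expat reports the model as mixed content repeated zero or more times, and ElementTree shows how text and child elements interleave.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical mixed-content DTD; the child elements b and i are assumptions.
DOC = """<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<p>This is an example of <i>mixed</i> content!</p>"""


def mixed_declaration(doc, element):
    """Return (content type, quantifier, allowed child names) for one element."""
    found = {}
    p = expat.ParserCreate()
    p.ElementDeclHandler = lambda name, content: found.__setitem__(name, content)
    p.Parse(doc, True)
    ctype, quant, _, children = found[element]
    return ctype, quant, [child[2] for child in children]


ctype, quant, allowed = mixed_declaration(DOC, "p")
print(ctype == model.XML_CTYPE_MIXED, quant == model.XML_CQUANT_REP, allowed)

# In the parsed tree, text and child elements interleave: text, <i>, then the tail text.
root = ET.fromstring(DOC)
print(repr(root.text), root[0].tag, repr(root[0].text), repr(root[0].tail))
```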

    Slide 19

    aName2 aType default ...>

    Default declarations: after the attribute type, you must specify either a default value for the attribute or a keyword that specifies whether it is required.

    Declaration        Description
    "value"            Default value for the attribute. If the attribute is not explicitly used on the given element, it will still exist in the logical document with the specified default value.
    #REQUIRED          Attribute is required on the given element.
    #IMPLIED           Attribute is optional on the given element.
    #FIXED "value"     Attribute always has the specified fixed value.

    Attribute types: attribute types make it possible to constrain the attribute value in different ways. See the following list of type identifiers for details.

    Type               Description
    CDATA              Arbitrary character data
    ID                 A name that is unique within the document
    IDREF              A reference to an ID value in the document
    IDREFS             A space-delimited list of IDREF values
    ENTITY             The name of an unparsed entity declared in the DTD
    ENTITIES           A space-delimited list of ENTITY values
    NMTOKEN            A name token: one or more name characters, essentially a word without spaces (unlike an XML name, it may start with a digit)
    NMTOKENS           A space-delimited list of NMTOKEN values
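A sketch of default declarations, assuming Python's standard-library expat parser. Expat reports each attribute declaration with its default value and a required flag, and by default it also adds defaulted and fixed values to the element's attributes. The element and attribute names are hypothetical, and because expat does not validate, it would not reject a missing #REQUIRED attribute.

```python
from xml.parsers import expat

# Hypothetical element and attributes, one per kind of default declaration.
DOC = """<?xml version="1.0"?>
<!DOCTYPE employee [
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee
            name    CDATA   #REQUIRED
            species NMTOKEN #FIXED "human"
            dept    CDATA   "sales"
            nick    CDATA   #IMPLIED>
]>
<employee name="Billy"/>"""


def default_kind(default, required):
    """Map expat's (default, required) pair back to the DTD keyword."""
    if required:
        return "#REQUIRED" if default is None else "#FIXED"
    return "#IMPLIED" if default is None else "default value"


def attribute_report(doc):
    declared, seen = {}, {}
    p = expat.ParserCreate()  # specified_attributes is False, so defaults are reported

    def on_attlist(element, attribute, att_type, default, required):
        declared[attribute] = (att_type, default_kind(default, required))

    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = lambda name, attrs: seen.update(attrs)
    p.Parse(doc, True)
    return declared, seen


declared, seen = attribute_report(DOC)
print(declared)
print(seen)  # includes the defaulted species and dept; nick is absent
```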

    Slide 20

    Attribute enumerations

    ...)>

    Example - Using attribute types

    name CDATA #REQUIRED

    species NMTOKEN #FIXED "human"

    id ID #REQUIRED

    mgr IDREF #IMPLIED

    manage IDREFS #IMPLIED>

    Example - Using attribute enumerations

    title (president|vice-pres|secretary|sales)

    #REQUIRED>

    format NOTATION (cs|lf) "cs">

    1927 N 52 E, Layton, UT, 84041

    It's also possible to define an attribute as an enumeration of tokens. The tokens may be of type NMTOKEN or NOTATION. In either case, the attribute value must be one of the specified enumerated values.
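Python's standard-library expat parser reads enumerated attribute types but does not enforce them, so this sketch adds a hand-rolled check on top. It handles only NMTOKEN enumerations; the staff and employee names are hypothetical, and the title enumeration is the one from the example above.

```python
from xml.parsers import expat


def enumeration_errors(doc):
    """Hand-rolled check of NMTOKEN enumerations on top of non-validating expat (sketch)."""
    allowed, errors = {}, []
    p = expat.ParserCreate()

    def on_attlist(element, attribute, att_type, default, required):
        # expat reports an enumerated type as a string such as "(a|b|c)".
        if att_type.startswith("("):
            allowed[(element, attribute)] = set(att_type[1:-1].split("|"))

    def on_start(element, attrs):
        for attribute, value in attrs.items():
            choices = allowed.get((element, attribute))
            if choices is not None and value not in choices:
                errors.append(f"{element}/@{attribute}={value!r} is not one of {sorted(choices)}")

    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return errors


# Hypothetical document; the enumeration is the title example above.
DTD = """<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee*)>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee title (president|vice-pres|secretary|sales) #REQUIRED>
]>
"""
print(enumeration_errors(DTD + '<staff><employee title="sales"/></staff>'))  # []
print(enumeration_errors(DTD + '<staff><employee title="janitor"/></staff>'))
```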

    Slide 21

    Entities are the most atomic unit of information in XML. Entities are used to construct logical XML documents (as well as DTDs) from physical resources. There are several types of entities, each of which is declared using an ENTITY declaration.

    A given entity is either general or parameter, internal or external, and parsed or unparsed.

    General: the entity may only be referenced in an XML document (not the DTD).
    Parameter: the entity may only be referenced in a DTD (not the XML document).
    Internal: the entity value is defined inline.
    External: the entity value is contained in an external resource.
    Parsed: the entity value is parsed by a processor as XML/DTD content.
    Unparsed: the entity value is not parsed by the XML processor.
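A sketch that classifies entity declarations along the three axes above, assuming Python's standard-library expat parser: an entity is treated as external when expat reports no inline value, and as unparsed when it carries a notation name. All entity names and system identifiers are hypothetical.

```python
from xml.parsers import expat

# Hypothetical declarations covering each combination described above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">
  <!ENTITY % decls SYSTEM "person-decls.dtd">
  <!ENTITY company "MindTree">
  <!ENTITY chapter SYSTEM "chapter1.xml">
  <!ENTITY photo SYSTEM "aaron.jpg" NDATA jpeg>
]>
<doc/>"""


def classify_entities(doc):
    """Map each declared entity to (general/parameter, internal/external, parsed/unparsed)."""
    kinds = {}
    p = expat.ParserCreate()

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        kinds[name] = (
            "parameter" if is_parameter else "general",
            "internal" if value is not None else "external",  # inline value means internal
            "unparsed" if notation else "parsed",              # NDATA notation means unparsed
        )

    p.EntityDeclHandler = on_entity
    p.Parse(doc, True)
    return kinds


for name, kind in classify_entities(DOC).items():
    print(name, *kind)
```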

    Slide 22

    Entity Syntax

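    Distinct entity types      Syntax
    Internal parameter         <!ENTITY % name "value">
    External parameter         <!ENTITY % name SYSTEM "systemId">
                               <!ENTITY % name PUBLIC "publicId" "systemId">
    Internal general           <!ENTITY name "value">
    External parsed general    <!ENTITY name SYSTEM "systemId">
                               <!ENTITY name PUBLIC "publicId" "systemId">
    Unparsed                   <!ENTITY name SYSTEM "systemId" NDATA nname>
                               <!ENTITY name PUBLIC "publicId" "systemId" NDATA nname>

    Entity references          Syntax
    General                    &name;
    Parameter                  %name;
    Unparsed                   name is used as the value of an attribute of type ENTITY or ENTITIES

    Note that unparsed entities are always general and external, whereas parameter and internal entities are always parsed.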

    Slide 23

    Internal parameter entities

    Internal parameter entities are always parsed. They are used to parameterize portions of the DTD, for example parts of ELEMENT, ATTLIST, NOTATION, and ENTITY declarations. A reference (%name;) is replaced with the parsed content.

    Example: Parameter entities in the internal subset

    %nameDecl;

    ]>

    Billy Bob

    It's common to override parameter entities defined in the external subset with declarations in the internal subset.

    In the internal subset, parameter entity references may appear only between declarations, not within them; in the external subset they may also appear within declarations.
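A sketch of an internal parameter entity in the internal subset, assuming Python's standard-library expat parser. The sketch turns on parameter-entity processing with SetParamEntityParsing so that %nameDecl; is expanded into the declaration it contains; the entity and element names are illustrative.

```python
from xml.parsers import expat

# Hypothetical internal subset: %nameDecl; sits between declarations, as the
# internal subset requires.
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">
  %nameDecl;
  <!ELEMENT person (name)>
]>
<person><name>Billy Bob</name></person>"""


def declared_elements(doc):
    """List element declarations, including those produced by parameter entities."""
    names = []
    p = expat.ParserCreate()
    # Enable parameter-entity processing so expat expands %nameDecl;.
    p.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    p.ElementDeclHandler = lambda name, content: names.append(name)
    p.Parse(doc, True)
    return names


print(declared_elements(DOC))
```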

    Slide 24

    External parameter entities

    Example

    %decls;

    ]>

    Billy Bob

    33

    External parameter entities are used to include declarations from external resources. External parameter entities are always parsed. A reference to an external parameter entity (%name;) is replaced with the parsed content.

    This example uses an external parameter entity (decls) to include the set of declarations contained in person-decls.dtd.
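A sketch of the person-decls.dtd example, assuming Python's standard-library expat parser. Instead of reading a file, a hypothetical resolver serves the entity's content from a dictionary through ExternalEntityRefHandler; the content shown for person-decls.dtd is an assumption.

```python
from xml.parsers import expat

# Hypothetical content of person-decls.dtd, kept in memory for the sketch.
RESOURCES = {
    "person-decls.dtd": b"""<!ELEMENT person (name, age)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>""",
}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE person [
  <!ENTITY % decls SYSTEM "person-decls.dtd">
  %decls;
]>
<person><name>Billy Bob</name><age>33</age></person>"""


def declarations_via_external_pe(doc, resources):
    """List element declarations pulled in through an external parameter entity."""
    declared = []
    p = expat.ParserCreate()
    # Parameter-entity processing must be on for %decls; to be resolved.
    p.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    p.ElementDeclHandler = lambda name, content: declared.append(name)

    def resolve(context, base, system_id, public_id):
        # The child parser inherits the handlers set on p, including ElementDeclHandler.
        child = p.ExternalEntityParserCreate(context)
        child.Parse(resources[system_id], True)
        return 1  # non-zero: the reference was handled

    p.ExternalEntityRefHandler = resolve
    p.Parse(doc, True)
    return declared


print(declarations_via_external_pe(DOC, RESOURCES))
```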

    Slide 25

    Internal general entities

    Internal general entities always contain parsed XML content. The parsed content is placed in the logical XML document everywhere it's referenced (&name;).

    Example: Using internal general entities

    "BillySmith">

    ]>

    &n;

    &a;

    The resulting logical document could be serialized as follows:

    Billy

    Smith

    33
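A sketch of the expansion, assuming Python's standard-library ElementTree, which parses with expat and expands internal entities while building the tree. The markup inside the entities n and a is an assumption chosen to match the Billy / Smith / 33 result.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the entities n and a; the element names are assumptions.
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ENTITY n "<fname>Billy</fname><lname>Smith</lname>">
  <!ENTITY a "<age>33</age>">
]>
<person>&n;&a;</person>"""

root = ET.fromstring(DOC)  # &n; and &a; are replaced by their parsed content
print(ET.tostring(root, encoding="unicode"))
# <person><fname>Billy</fname><lname>Smith</lname><age>33</age></person>
```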

    Slide 26

    External general parsed entities and Unparsed entities

    External general parsed entities

    External general parsed entities are used the same way as internal general entities, except that they aren't defined inline. They always contain parsed XML content that becomes part of the logical XML document wherever it's referenced (&name;).

    Example:

    ]>

    &n;

    &a;
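A sketch of external general parsed entities, assuming Python's standard-library expat parser. The files name.xml and age.xml are hypothetical and are served from a dictionary by the ExternalEntityRefHandler callback; a child parser created with ExternalEntityParserCreate parses each one in place.

```python
from xml.parsers import expat

# Hypothetical external files, kept in memory for the sketch.
RESOURCES = {
    "name.xml": b"<fname>Billy</fname><lname>Smith</lname>",
    "age.xml": b"<age>33</age>",
}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE person [
  <!ENTITY n SYSTEM "name.xml">
  <!ENTITY a SYSTEM "age.xml">
]>
<person>&n;&a;</person>"""


def element_names(doc, resources):
    """List start-tags in document order, including those from external entities."""
    names = []
    p = expat.ParserCreate()
    p.StartElementHandler = lambda name, attrs: names.append(name)

    def resolve(context, base, system_id, public_id):
        # Parse the referenced file in place; the child inherits StartElementHandler.
        child = p.ExternalEntityParserCreate(context)
        child.Parse(resources[system_id], True)
        return 1  # non-zero: the reference was handled

    p.ExternalEntityRefHandler = resolve
    p.Parse(doc, True)
    return names


print(element_names(DOC, RESOURCES))
```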

    Unparsed entities

    nname>

    Unparsed entities make it possible to attach arbitrary binary resources to an XML document. Unparsed entities are always general and external.

    Because unparsed entities can reference any binary resource, applications require additional information to determine the resource's type. The notation name (nname) provides exactly this type of information.

    Because unparsed entities don't contain XML content, they aren't referenced the same way as other general entities (&name;), but rather through an attribute of type ENTITY or ENTITIES.

    ]>

    Aaron
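A sketch of how an application might follow an ENTITY attribute to an unparsed entity and its notation, assuming Python's standard-library expat parser. The notation jpeg, the file aaron.jpg, and the attribute picture are hypothetical.

```python
from xml.parsers import expat

# Hypothetical notation, entity, and attribute names.
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY photo SYSTEM "aaron.jpg" NDATA jpeg>
  <!ATTLIST person picture ENTITY #IMPLIED>
]>
<person picture="photo">Aaron</person>"""


def unparsed_references(doc):
    """Return (system id, notation) for every unparsed entity named in an ENTITY/ENTITIES attribute."""
    unparsed, entity_attributes, found = {}, set(), []
    p = expat.ParserCreate()

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if notation:  # only unparsed entities carry a notation name
            unparsed[name] = (system_id, notation)

    def on_attlist(element, attribute, att_type, default, required):
        if att_type in ("ENTITY", "ENTITIES"):
            entity_attributes.add((element, attribute))

    def on_start(element, attrs):
        for attribute, value in attrs.items():
            if (element, attribute) in entity_attributes:
                # ENTITIES values are space-delimited lists of entity names.
                found.extend(unparsed[name] for name in value.split())

    p.EntityDeclHandler = on_entity
    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return found


print(unparsed_references(DOC))  # [('aaron.jpg', 'jpeg')]
```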

    Slide 27

    Questions

    Thank you

    XML Technology, Semester 4

    SICSR Executive MBA(IT) @ MindTree, Bangalore, India

    By Neeraj Singh (toneeraj(AT)gmail(DOT)com)
