Chapter 4: Document Type Definitions
Chapter 4 Objectives
Learn to create DTDs
Validate an XML document against a DTD
Use DTDs to create XML documents from multiple files
My First DTD
John Fitzgerald Johansen Doe
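A minimal sketch of a document with an internal DTD. The element names first, middle, and last are assumed from the speaker notes for this slide; the exact markup shown on the slide may have differed.

```xml
<?xml version="1.0"?>
<!-- Internal subset: the DTD sits between [ and ] inside the DOCTYPE. -->
<!DOCTYPE name [
  <!ELEMENT name (first, middle, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<name>
  <first>John</first>
  <middle>Fitzgerald Johansen</middle>
  <last>Doe</last>
</name>
```

Removing the middle element from this document would make it invalid, although it would still be well formed.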
Preparing the Ground
What's in a Name?
John Fitzgerald Johansen Doe
Try It Out: What's in a Name?
The Document Type Declaration
System Identifiers
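A sketch of three system identifiers, one per kind of location: a local file, a file on the internet, and a file next to the document. All paths and the example.com host are hypothetical.

```xml
<!-- Alternatives: a document carries only one DOCTYPE declaration. -->
<!DOCTYPE name SYSTEM "file:///C:/dtds/name.dtd">
<!DOCTYPE name SYSTEM "http://example.com/dtds/name.dtd">
<!DOCTYPE name SYSTEM "name.dtd">
```

A relative identifier such as name.dtd is resolved relative to the location of the document that contains the declaration.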
The Document Type Declaration
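A document type declaration can also carry a public identifier followed by a system identifier as a fallback. As an illustration (chosen here, not taken from the slide), this is the widely used declaration for XHTML 1.0 Strict:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```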
Try It Out: The External DTD
Using an External DTD
John Quincy Public
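A sketch of the external arrangement, assuming a hypothetical file name.dtd stored in the same directory as the document. The DTD file holds only declarations, with no DOCTYPE wrapper:

```xml
<!-- name.dtd (hypothetical file name) -->
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
```

The document then points at that file with the SYSTEM keyword:

```xml
<?xml version="1.0"?>
<!DOCTYPE name SYSTEM "name.dtd">
<name>
  <first>John</first>
  <middle>Quincy</middle>
  <last>Public</last>
</name>
```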
Anatomy of a DTD
Element declarations consist of three basic parts:
The ELEMENT declaration
The element name
The element content model
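The three parts can be labeled on a single declaration. This sketch reuses the name element assumed in the earlier examples; the extra spacing is only for alignment and is legal in a DTD.

```xml
<!-- keyword   element name   content model -->
<!ELEMENT      name           (first, middle, last)>
```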
John Fitzgerald Johansen Doe
Combining Sequences and Choice Using Groups
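A sketch of a choice group nested inside a sequence. The names address and GPS come from the speaker notes; contact and name are assumptions used to give the group a parent.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!-- Sequence (comma): name first, then exactly one of address or GPS. -->
  <!ELEMENT contact (name, (address | GPS))>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT GPS (#PCDATA)>
]>
<contact>
  <name>John Doe</name>
  <GPS>47.6062 N, 122.3321 W</GPS>
</contact>
```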
Indicator: Description
[none]: As we have seen in all of our content models thus far, when no cardinality indicator is used, it indicates that the element must appear once and only once. This is the default behavior for elements used in content models.
?: Indicates that the element may appear either once or not at all.
+: Indicates that the element may appear one or more times.
*: Indicates that the element may appear zero or more times.
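A sketch that uses each indicator once. The element names are assumptions loosely based on the contacts exercise.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>          <!-- zero or more contacts -->
  <!ELEMENT contact (name, phone+)>       <!-- one name, one or more phones -->
  <!ELEMENT name (first+, middle?, last)> <!-- middle is optional -->
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contacts>
  <contact>
    <name><first>John</first><first>Jack</first><last>Doe</last></name>
    <phone>555-0100</phone>
  </contact>
</contacts>
```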
Mixed Content
Jeff is a developer and author for Beginning XML 4th edition. Jeff loves XML!
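A sketch of a mixed content model that allows text interleaved with a few XHTML-style elements. The placement of the em, strong, and br tags in the sample text is an assumption, since the slide's markup is not available.

```xml
<?xml version="1.0"?>
<!DOCTYPE description [
  <!-- #PCDATA must come first, and the group must end with * -->
  <!ELEMENT description (#PCDATA | em | strong | br)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
  <!ELEMENT br EMPTY>
]>
<description>Jeff is a developer and author for Beginning XML <em>4th</em> edition.<br/><strong>Jeff loves XML!</strong></description>
```

Because of the trailing *, a description with no child elements at all, or even no text, is also valid.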
Empty and Any
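A short sketch of both content models. The element names note and br are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- ANY: text and any declared elements, in any order. -->
  <!ELEMENT note ANY>
  <!-- EMPTY: no text and no child elements are allowed. -->
  <!ELEMENT br EMPTY>
]>
<note>First line<br/>Second line<br></br></note>
```

Both spellings of the empty element, with a self-closing tag or with a start tag followed immediately by an end tag, satisfy the EMPTY model.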
Try It Out: Making Contact Part 1
Making Contact Part 2
How would you:
guarantee that a contact had at least one first name?
specify that a list of contacts could have many contacts on it?
Type: Description
CDATA: Indicates that the attribute value is character data.
ID: Indicates that the attribute value uniquely identifies the containing element.
IDREF: Indicates that the attribute value is a reference, by ID, to a uniquely identifiable element.
IDREFS: Indicates that the attribute value is a whitespace-separated list of IDREF values.
ENTITY: Indicates that the attribute value is a reference to an external unparsed entity (we will learn more about entities later). The unparsed entity might be an image file or some other external resource such as an MP3 or some other binary file.
ENTITIES: Indicates that the attribute value is a whitespace-separated list of ENTITY values.
NMTOKEN: Indicates that the attribute value is a name token. An NMTOKEN is a string of character data consisting of standard name characters.
NMTOKENS: Indicates that the attribute value is a whitespace-separated list of NMTOKEN values.
Enumerated List: Apart from using the default types, you can also declare an enumerated list of possible values for the attribute.
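A sketch that declares several attribute types on one element. The attribute names person, knows, tags, and kind and all values are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!ELEMENT contact (#PCDATA)>
  <!ATTLIST contact
    person ID                    #REQUIRED
    knows  IDREFS                #IMPLIED
    tags   NMTOKENS              #IMPLIED
    kind   (Personal | Business) "Personal">
]>
<contacts>
  <contact person="jeff" knows="dana" tags="author xml" kind="Business">Jeff</contact>
  <contact person="dana">Dana</contact>
</contacts>
```

ID values must be unique within the document and must be valid XML names, so a value that starts with a digit would be rejected. Every name listed in knows must match an ID that exists somewhere in the document.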
Attribute Value Declarations
Has a default value
Has a fixed value
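A sketch of the four kinds of attribute value declaration: a default value, #FIXED, #REQUIRED, and #IMPLIED. The element and attribute names are assumptions, apart from source CDATA #IMPLIED, which appears later in this chapter.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (phone*)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST contacts
    version CDATA #FIXED "1.0"
    source  CDATA #IMPLIED>
  <!ATTLIST phone
    kind  (Home | Work | Cell | Fax) "Home"
    owner CDATA                      #REQUIRED>
]>
<contacts source="Beginning XML">
  <!-- kind is omitted here, so a validating parser reports it as Home. -->
  <phone owner="Jeff">555-0100</phone>
  <phone owner="Jeff" kind="Work">555-0101</phone>
</contacts>
```

If version is written in the document, its value has to be 1.0; any other value makes the document invalid.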
Specifying Multiple Attributes
Try It Out: Making Contact Part 3
Jeff is a developer & author for Beginning XML 4th edition 2006 Wiley Publishing. Jeff loves XML!
* Built-in entities
* Character entities
* General entities
* Parameter entities
Built-in Entities
&amp;: the & character
&lt;: the < character
&gt;: the > character
&apos;: the ' character
&quot;: the " character
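A small sketch showing the five built-in entities in running text. The description element and the sentence are assumptions; built-in entities need no DTD declaration.

```xml
<?xml version="1.0"?>
<description>Jeff is a developer &amp; author. In markup, write &lt;first&gt;,
and quote with &quot; or &apos; as needed.</description>
```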
Jeff is a developer & author for Beginning XML 4th edition 2006 Wiley Publishing. Jeff loves XML!
Character Entities
the character
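A sketch of a character reference in decimal and hexadecimal form. The copyright sign is used as an assumed example; the character shown on the slide is not recoverable from the text.

```xml
<?xml version="1.0"?>
<!-- &#169; (decimal) and &#xA9; (hexadecimal) both refer to U+00A9, the copyright sign. -->
<description>Beginning XML 4th edition &#169; 2006 Wiley Publishing. Same character: &#xA9;</description>
```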
External: SYSTEM or PUBLIC
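A sketch of an internal and an external general entity. The names jeff-description and jeff.txt come from the speaker notes; the publisher entity and the description element are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE description [
  <!ELEMENT description (#PCDATA)>
  <!-- Internal: the replacement text is given in the declaration. -->
  <!ENTITY publisher "Wiley Publishing">
  <!-- External: the replacement text is read from another file. -->
  <!ENTITY jeff-description SYSTEM "jeff.txt">
]>
<description>&jeff-description; Published by &publisher;.</description>
```

For this document to stay valid, jeff.txt would have to contain only text, because description is declared as (#PCDATA).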
Developing DTDs
For example, the following is valid:
The following is not valid:
source CDATA #IMPLIED>
DTD Limitations
Some limitations of DTDs include:
Differences between DTD syntax and XML syntax
Poor support for XML namespaces
Poor data typing
Limited content model descriptions
Try It Out: Making Contact Part 3
Making Contact Part 4
Making Contact Part 5
Other Validating Tools
XSD
XML Schema Definitions (XSD) resolve some of the limitations of DTDs.
Uses XML syntax
More powerful than DTD files
Example: http://www.uniprot.org/docs/uniprot.xsd
RELAX NG
Simple and easy to learn
Supports XML Schema datatypes
Two syntaxes: XML and compact syntax
* A DTD fixes a schema for any XML document that adheres to it; it enforces a standard. For example, in the sample document, the DTD forces you to include the person's middle name.
* Note the syntactical elements of the DTD: the !DOCTYPE declaration is required. The [ bracket tells the parser that this is an internal subset declaration, and the ] ends the actual DTD.
* Forget codeplot; use http://www.validome.org/xml/validate/
* This is part of the students' first attempt to create a DTD. Have them acknowledge that this is an internal DTD. They will be doing an external one next. Perhaps demo this online for them and then cut them loose.
* Get this activity done here. Do not wait, because it's going to take students a while to get used to the editor.
* Internal subset declarations are not the usual case. It is more common to put the DTD into some kind of external file. This can be done in either of two ways.
The SYSTEM keyword tells the validator to look in the specified file for the DTD. The three examples on this slide point, respectively, to a file on your local computer, a file on the internet, and a file in the same directory as the document you are working on. These are all called external subset declarations.
Optionally, you can refer to both an external and an internal DTD. In the first two examples, the [..] is there to indicate that a second, internal DTD is present.
* As the keyword might indicate, public DTDs belong in some publicly available file. The syntax of the PUBLIC declaration is
<!DOCTYPE element-name PUBLIC "-//Owner//Class-description//Language//Version" "filename">
* If you have time, get students to do some searching here for DTDs that require specialization, such as chemical analysis or measures.
* Time to introduce regular expressions.
* Recall that elements can contain other elements. Above, the element name contains three other elements: this is its element content model. The element name must appear exactly as it does in the XML documents themselves, including namespace prefixes!
* Note that each element contained in an element's content model MUST be accompanied by its own declaration.
Note also that the element content that you declare determines the semantics of the corresponding XML documents.
* This is concatenation in a regular expression, i.e. "and". We might read this declaration as "A name must contain a first and a middle and a last, in that order."
* This is the same symbol as used in regular expressions. It means "or", as in "the element location can contain either an address or a GPS". Keep in mind that you must also declare the elements address and GPS.
* As in regular expressions, ands and ors can be mixed and matched. Is this a Boolean algebra?
* Note the Kleene closure operators.
* Technically speaking, an element that contains only character data, declared as (#PCDATA), is the simplest case of mixed content.
* This example embeds some XHTML in the document. It is also mixed content. Note the *. Notice that the declaration allows the desired text but doesn't force it.
* The EMPTY element model REQUIRES that the element be empty; it is used here to embed a break.
The ANY element model allows you to put any declared elements (or none) anywhere inside the given element. It basically does away with the concept of specifying a schema, so avoid it.
* Extensive exercise at the bottom of page 110. We'll do part 2 later.
* Do part 2.
* Talk about the rules for attribute names. Have them acknowledge the following passage in the book.
As far as DTDs are concerned, namespace declarations, such as xmlns:contacts="http://wiley.com/contacts", are also treated as attributes. Although the Namespace Recommendation insists that xmlns statements are declarations and not attributes, DTDs must declare them in an ATTLIST declaration if they are used. Again, this is because the W3C finalized the syntax for DTDs before the Namespace Recommendation was completed.
* You can read more about these in the book.
* Enumerated list, default value; fixed value; required. Note that for this declaration to be correct, unique IDs must have been assigned to contact elements.
* Page 128
* Entities are things like &amp; for ampersands. They
* The built-in entities are on the next slide.
* Can also use the hexadecimal representation for most of Unicode.
* These are macro substitutions; they work just like the built-in entities. The replacement text can be any well-formed XML fragment, i.e. the contents of jeff.txt replace &jeff-description; wherever it is found. They must be declared within the DTD. Point out the two ways they can declare these: in the declaration itself, or by referring to an external file using the SYSTEM keyword. Reference the entity in the same way you reference other entities.
* Parameter entities can contain other DTD text. Build DTDs from multiple files, i.e. modular design. Make sure that they know that the space between the % and NameDeclaration is intentional when defining the entity. Writing the % directly next to the name, as in %NameDeclaration;, references it.
* Pp. 128, 136, and 140
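A sketch of the modular design mentioned in the parameter-entity note, assuming hypothetical files contacts.dtd and name.dtd, where name.dtd declares the name element and its children.

```xml
<!-- contacts.dtd (hypothetical file name) -->
<!-- Definition: note the space between % and NameDeclaration. -->
<!ENTITY % NameDeclaration SYSTEM "name.dtd">
<!-- Reference: no space, and a closing semicolon. -->
%NameDeclaration;
<!ELEMENT contacts (contact*)>
<!ELEMENT contact (name)>
```

The reference pulls the declarations from name.dtd into contacts.dtd at that point, so the name element can be used in the content model of contact.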