XMLDTD
Transparency No. 1
XML Document Type Definitions (DTDs)
DTD - Table of Contents
Introduction to DTD: An introduction to the XML Document Type Definition.
DTD - XML Building Blocks: What XML building blocks are defined in a DTD.
DTD Elements: How to define the elements of an XML document using DTD.
DTD Attributes: How to define the legal attributes of XML elements using DTD.
DTD Entities: How to define XML entities using DTD.
Introduction to DTD
The purpose of a DTD is to define the legal building blocks of an XML document.
It defines the document structure with a list of legal elements.
A DTD can be declared inline in your XML document, or as an external reference.
Internal DTD
This is an XML document with a Document Type Definition:
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
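The DTD is interpreted like this: !DOCTYPE note declares that the root element of the document is note; !ELEMENT note defines the element note as having four child elements, to, from, heading and body, in that order; !ELEMENT to defines the element to as containing #PCDATA (text only); and so on.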
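To see what a parser actually reads from the internal subset, here is a minimal Python sketch (standard library only, Python 3.9+ assumed) using the expat binding in xml.parsers.expat. Expat is a non-validating parser, so it does not check the note element against its declaration; it only reports each <!ELEMENT> declaration through its ElementDeclHandler callback, which receives the content model as a nested tuple (type, quantifier, name, children). The helper name element_declarations is ours, not part of any library.

```python
import xml.parsers.expat

NOTE = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""


def element_declarations(doc: str) -> dict:
    """Map each declared element name to the element names listed directly in its content model."""
    decls = {}

    def on_element_decl(name, model):
        # expat passes the content model as a tuple: (type, quantifier, name, children).
        _ctype, _quant, _name, children = model
        # Each child is itself such a tuple; index 2 holds the element name (None for groups).
        decls[name] = [child[2] for child in children]

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(doc, True)  # True: this is the whole document
    return decls


print(element_declarations(NOTE))
# {'note': ['to', 'from', 'heading', 'body'], 'to': [], 'from': [], 'heading': [], 'body': []}
```

A (#PCDATA) model has no child particles, hence the empty lists; a nested group such as (a, (b | c)) would appear as a None entry because it is a group rather than a name.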
External DTD
This is the same XML document with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
note.dtd
This is a copy of the file "note.dtd" containing the Document Type Definition:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Why use a DTD?
XML provides an application independent way of sharing data.
With a DTD, independent groups of people can agree to use a common DTD for interchanging data.
Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.
2.8 Document Type Declaration (cont’d)
Document Type Definition
[28] doctypedecl ::= '<!DOCTYPE' S Name ( S ExternalID)?
S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'
[ VC: Root Element Type ]
[28a] DeclSep ::= PEReference | S
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
[ VC: Proper Declaration/PE Nesting ]
[ WFC: PEs in Internal Subset ]
WFC: PEs in Internal Subset
Well-formedness constraint: PEs in Internal Subset. In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations.
(This does not apply to references that occur in external parameter entities or to the external subset.)
Ex: the following is not well-formed:
<?xml version="1.0" ?>
<!DOCTYPE test SYSTEM "test.dtd" [
<!ENTITY WhatISaid "I said %YN;">
]>
<test/>
The parameter-entity reference %YN; cannot appear inside this entity declaration, because the declaration is in the internal subset; this holds even if YN is defined in test.dtd.
2.8 External Subset
Portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.
External Subset
[30] extSubset ::= TextDecl? extSubsetDecl
[31] extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep )*
2.8 Example XML documents
An example of an XML document with a document type declaration:
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
The system identifier "hello.dtd" gives the URI of a DTD for the document. The declarations can also be given locally, as in this example:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)> ]>
<greeting>Hello, world!</greeting>
2.9 Standalone Document Declaration
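Standalone Document Declaration
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
[ VC: Standalone Document Declaration ]
The value 'yes' indicates that there are no markup declarations external to the document entity that affect the information passed from the XML processor to the application.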
Example: <?xml version="1.0" standalone='yes'?>
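2.10 White Space and End-of-Line Handling
White space: the special attribute xml:space signals whether applications should preserve the white space in an element's content. Example declaration: <!ATTLIST poem xml:space (default | preserve) 'preserve'>
White space inside tags is not significant; these two tags are equivalent: <e1 v1="abc" v2 ="def" /> and <e1 v1="abc" v2 ="def"/>
End-of-line normalization: before parsing, each two-character sequence #xD #xA, and each #xD not followed by #xA, is translated to a single #xA.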
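The end-of-line rule is small enough to state as code. A minimal Python sketch (normalize_eol is a hypothetical helper name):

```python
def normalize_eol(text: str) -> str:
    """Translate CR LF pairs, then any remaining lone CR, into a single LF."""
    # Order matters: translating lone CRs first would turn CR LF into LF LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_eol("line1\r\nline2\rline3\n")))  # 'line1\nline2\nline3\n'
```

Python's own text-mode open() applies the same translation by default (universal newlines), which is one reason the XML parser rarely sees a #xD in practice.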
2.12 Language Identification
A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document.
In valid documents, this attribute, like any other, must be declared if it is used.
The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages".
Example: the attribute is commonly declared as xml:lang NMTOKEN #IMPLIED; a default value can be given instead:
<!ATTLIST poem xml:lang NMTOKEN 'fr'>
<!ATTLIST gloss xml:lang NMTOKEN 'en'>
<!ATTLIST note xml:lang NMTOKEN 'en'>
2.12 Language Identifications
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
<l>Habe nun, ach! Philosophie,</l>
<l>Juristerei, und Medizin</l>
<l>und leider auch Theologie</l>
<l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>
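xml:lang applies to the element that carries it and to everything inside it, unless a descendant overrides it. A minimal Python sketch of that inheritance using xml.etree.ElementTree; the sample document is a made-up mix of the examples above, and effective_langs is a hypothetical helper. ElementTree stores xml:lang under the key {http://www.w3.org/XML/1998/namespace}lang, because the xml: prefix is bound to that namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the namespace reserved for the xml: prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<sp who="Faust" desc='leise' xml:lang="de">
<l>Habe nun, ach! Philosophie,</l>
<p xml:lang="en">What colour is it?</p>
</sp>"""


def effective_langs(elem, inherited=None):
    """Yield (tag, language) for elem and its descendants; no xml:lang means inherit."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)


print(list(effective_langs(ET.fromstring(SAMPLE))))
# [('sp', 'de'), ('l', 'de'), ('p', 'en')]
```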
DTD - XML building blocks
The building blocks of XML documents: XML documents (and HTML documents) are made up of the following building blocks: elements, tags, attributes, entities, PCDATA, and CDATA sections.
This is a brief explanation of each of the building blocks:
Elements
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
Tags
Tags are used to mark up elements. A start tag like <element_name> marks up the beginning of an element, and an end tag like </element_name> marks up the end of an element. Examples:
A body element:
<body>body text in between</body>
A message element:
<message>some message in between</message>
Attributes
Attributes provide extra information about elements. Attributes are placed inside the start tag of an element and come in name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
Notes:
The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed by a " /".
PCDATA
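PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser.
Tags inside the text will be treated as markup and entities will be expanded, so a literal < or & must not appear in PCDATA unless it is meant as markup (write &lt; or &amp; instead).
Ex: given <!ELEMENT section (#PCDATA)>, the instance <section> abc <em> de </section> is an error: <em> is read as a start tag, which is never closed and is not allowed by the declaration.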
CDATA sections
CDATA also means character data. CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not be expanded.
Ex:
<section>abc <![CDATA[ &a; <em>def]]> g</section>
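The difference between PCDATA and a CDATA section can be observed with Python's xml.etree.ElementTree. This is a sketch: the parser only checks well-formedness, not validity against <!ELEMENT section (#PCDATA)>, and is_well_formed is a helper defined here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Inside the CDATA section, "&a;" and "<em>" are ordinary characters, not markup.
section = ET.fromstring("<section>abc <![CDATA[ &a; <em>def]]> g</section>")
print(repr(section.text))  # 'abc  &a; <em>def g'

# In plain PCDATA, "<em>" is parsed as a start tag that is never closed.
print(is_well_formed("<section> abc <em> de </section>"))  # False
```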
Entities
Entities are used to define common text, like macros. Entity references are references to entities. Most of you will know the HTML entity reference "&nbsp;" that is used to insert a non-breaking space in an HTML document. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity Reference   Character
&lt;               <
&gt;               >
&amp;              &
&quot;             "
&apos;             '
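When a program generates XML, the predefined entities are what its escaping routine emits. As one illustration, Python's standard xml.sax.saxutils module provides escape, quoteattr and unescape (a sketch; escape and unescape only handle &, < and > unless given an extra mapping):

```python
from xml.sax.saxutils import escape, quoteattr, unescape

raw = 'AT&T says "x < y"'

# escape() rewrites &, < and > as the predefined references &amp;, &lt; and &gt;.
print(escape(raw))     # AT&amp;T says "x &lt; y"

# quoteattr() also adds delimiters; raw contains " but no ', so it picks single quotes.
print(quoteattr(raw))  # 'AT&amp;T says "x &lt; y"'

# unescape() reverses &lt;, &gt; and &amp;; other references need an extra mapping.
print(unescape("&lt;p&gt; &amp; &quot;", {"&quot;": '"'}))  # <p> & "
```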
DTD - Elements
Declaring an Element
In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name element-content>
Types of element contents:
EMPTY: no content
ANY: no restriction on content
MIXED: character data only, or character data mixed with elements
Element-only: child elements only
EMPTY elements
Elements with empty content are declared with the keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example: <!ELEMENT img EMPTY>
Legal instances: <img/> <img></img>
Not legal: <img> </img> (an EMPTY element may not contain anything, not even white space)
ANY Elements
Elements that can contain any combination of (declared) elements and text data.
Declared with the ANY keyword: <!ELEMENT name ANY>
Example: <!ELEMENT E1 ANY>
Legal instances:
<E1> <E2/> e2 <E3> fff </E3> … </E1>
<E1> dddd </E1>
<E1/>
Elements with MIXED contents
Elements that can only contain text content:
<!ELEMENT name (#PCDATA)>
Elements allowing text as well as element content:
<!ELEMENT E0 (#PCDATA | E1 | E2 … )*>
Example:
<!ELEMENT note (#PCDATA)>
<!ELEMENT em EMPTY>
<!ELEMENT e1 (#PCDATA | note | em)*>
Instance:
<e1> ddd <em/> cd <note>ttt</note> <em/> </e1>
Elements that can contain element content only
Issue: how to declare the possible sequences in which child elements may occur.
Solution: regular expressions over element names.
Definition:
CP ::= (name | choice | seq) ('+' | '*' | '?')?
choice ::= a list of two or more CPs separated by '|' and enclosed by '(' and ')'.
seq ::= a list of one or more CPs separated by ',' and enclosed by '(' and ')'.
Element-only elements: <!ELEMENT name CP>, where the CP is not a bare name optionally followed by '+', '*' or '?'.
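Since a content model is a regular expression over element names, one way to check a list of child element names against it is to translate the model into a Python regex. The sketch below is hypothetical (model_to_regex and children_match are our names, Python 3.9+ assumed), handles only element-only models, and silently skips anything it does not recognize, such as #PCDATA:

```python
import re

# Tokens of a content model: element names, parentheses, separators and quantifiers.
_TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[(),|?*+]")


def model_to_regex(model: str) -> str:
    """Translate an element-only content model such as '(to, from, (h1 | h2), body)'
    into a regex over child names, each name followed by a single space."""
    if not model.strip().startswith("("):
        raise ValueError("an element-only content model must be parenthesized")
    parts = []
    separators = []  # the separator (',' or '|') used so far in each open group
    for tok in _TOKEN.findall(model):
        if tok == "(":
            separators.append(None)
            parts.append("(?:")
        elif tok == ")":
            separators.pop()
            parts.append(")")
        elif tok in (",", "|"):
            if separators[-1] not in (None, tok):
                raise ValueError("',' and '|' cannot be mixed in one group")
            separators[-1] = tok
            if tok == "|":
                parts.append("|")  # choice; a seq is plain concatenation
        elif tok in ("?", "*", "+"):
            parts.append(tok)  # same meaning in DTD content models and in regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return "".join(parts)


def children_match(model: str, child_names: list[str]) -> bool:
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(model_to_regex(model), text) is not None


print(children_match("(to, from, (heading1 | heading2), body)",
                     ["to", "from", "heading2", "body"]))  # True
print(children_match("(front, body, back?)", ["front", "back"]))  # False
```

Python's regex engine backtracks, so an ambiguous model such as ( (E1, E2) | (E1, E3, E2)) still matches correctly here; the requirement that DTD content models be deterministic is a rule for DTD authors that this sketch does not check.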
Recursive definition of CP, seq and choice (α and α1, …, αn are placeholders):
Basis: if α is a name, then α, α?, α+, α* are CPs (content particles).
Closure:
if α is a seq or a choice, then α, α?, α+, α* are CPs.
if α1, α2, …, αn (n > 1) are CPs, then (α1 | α2 | … | αn) is a choice.
if α1, α2, …, αn (n > 0) are CPs, then (α1, α2, …, αn) is a seq.
α is a children (content model) if α is a CP but is not a name optionally followed by +, ? or *.
Examples of children:
Illegal: <!ELEMENT e1 e2*>, <!ELEMENT e1 e2>
Legal: <!ELEMENT e1 (e2)>, <… (e2+)>, <… (e2)?>
More examples
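(O) = legal, (X) = illegal
<!ELEMENT note (to,from,heading,body)> (O)
<!ELEMENT note (to, from, heading1 | heading2, body)> (X): ',' and '|' cannot be mixed in one group
<!ELEMENT note (to, from, (heading1 | heading2), body)> (O)
<!ELEMENT E1 ((E1, E2) | (E1, E3, E2))> (X): 1-ambiguous; after reading E1 the parser cannot tell which branch applies without looking further ahead
Rewritten as … (E1, (E2 | (E3, E2)))> (O)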
3.2 (cont’d)
Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S
contentspec S? '>'
[ VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Examples:
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>
3.2.1 Element Content
An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S).
In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.
The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:
3.2.1 (cont’d)
Element-content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
where each Name is the type of an element which may appear as a child.
Examples:
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
Note: (X) <!ELEMENT spec body>   (O) <!ELEMENT spec (body)>
3.2.2 Mixed Content
Mixed-content Declaration
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
| '(' S? '#PCDATA' S? ')'
Examples:
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>
Attribute Definition
Attributes are declared for the elements they belong to:
<!ELEMENT book (preface, toc, chapter+, index?)>
<!ATTLIST book title CDATA #REQUIRED>
<!ATTLIST book isbn CDATA #IMPLIED>
Or, equivalently:
<!ATTLIST book title CDATA #REQUIRED
               isbn CDATA #IMPLIED>
Format: <!ATTLIST elm-name (attr-name attr-type attr-default-value)+ >
Attributes have a name, a type and a default value, and belong to an element.
Attribute types
Type          Explanation
CDATA         The value is character data
(v1|v2|..)    The value must be one from an enumerated list
ID            The value is a unique id
IDREF         The value is the id of another element
IDREFS        The value is a list of other ids
NMTOKEN       The value is a valid XML name token
NMTOKENS      The value is a list of valid XML name tokens
ENTITY        The value is an entity
ENTITIES      The value is a list of entities
NOTATION      The value is a name of a notation
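ID and IDREF(S) impose document-wide checks: every ID value must be unique, and every IDREF must name some ID. A Python sketch of those two checks with xml.etree.ElementTree; since ElementTree does not read <!ATTLIST> declarations, the caller states which attribute plays the ID or IDREF role for each element, and the names lib, book, review, id and of below are hypothetical. It does not check that the values are legal Names.

```python
import xml.etree.ElementTree as ET


def check_id_refs(root, id_attrs, idref_attrs):
    """Report duplicate ID values and IDREF/IDREFS values that match no ID.

    id_attrs / idref_attrs map an element name to the attribute the DTD would
    declare as ID / IDREF(S); they are supplied by hand in this sketch."""
    ids, errors = set(), []
    for elem in root.iter():
        attr = id_attrs.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    for elem in root.iter():
        attr = idref_attrs.get(elem.tag)
        if attr and attr in elem.attrib:
            for ref in elem.attrib[attr].split():  # IDREFS: space-separated list
                if ref not in ids:
                    errors.append(f"dangling IDREF {ref!r}")
    return errors


doc = ET.fromstring('<lib><book id="b1"/><book id="b2"/><review of="b1 b3"/></lib>')
print(check_id_refs(doc, {"book": "id"}, {"review": "of"}))  # ["dangling IDREF 'b3'"]
```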
Attribute-default value
Value            Explanation
"v1"             The attribute has a default value "v1"
#REQUIRED        The attribute value must be included in the element
#IMPLIED         The attribute does not have to be included
#FIXED "value"   The attribute value is fixed
Attribute Examples
DTD example: <!ELEMENT square EMPTY> <!ATTLIST square width CDATA "0">
XML example: <square width="100"></square>
Default attribute value
Syntax: <!ATTLIST elm-name attribute-name CDATA "default-value">
DTD example: <!ATTLIST payment type CDATA "check">
XML example: <payment type="check"> is equivalent to <payment>
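A parser that reads the internal subset fills in declared defaults for the application. Python's expat binding shows this directly: by default its start-tag callback also receives defaulted attributes, and setting specified_attributes limits it to the attributes actually written in the tag. A sketch, in which the order element and the helper attributes_seen are made up for the example:

```python
import xml.parsers.expat

DOC = """<!DOCTYPE order [
<!ELEMENT order (payment, sender)>
<!ELEMENT payment EMPTY>
<!ATTLIST payment type CDATA "check">
<!ELEMENT sender EMPTY>
<!ATTLIST sender company CDATA #FIXED "Microsoft">
]>
<order><payment/><sender/></order>"""


def attributes_seen(doc: str, only_specified: bool) -> dict:
    """Collect the attributes expat reports for each start tag."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    # True hides values that come only from <!ATTLIST> defaults.
    parser.specified_attributes = only_specified
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return seen


print(attributes_seen(DOC, only_specified=False))
# {'order': {}, 'payment': {'type': 'check'}, 'sender': {'company': 'Microsoft'}}
print(attributes_seen(DOC, only_specified=True))
# {'order': {}, 'payment': {}, 'sender': {}}
```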
Implied attribute
Syntax: <!ATTLIST elm-name attribute-name attribute-type #IMPLIED>
Example:
<!ATTLIST contact fax CDATA #IMPLIED>
Instances: <contact fax="555-667788"> or <contact> (the attribute may be omitted)
Required attribute
Syntax: <!ATTLIST elm-name attr-name attr-type #REQUIRED>
DTD example: <!ATTLIST person number CDATA #REQUIRED>
XML example: <person number="5677"> (legal)   <person> (X: the attribute is missing)
Fixed attribute value
Syntax: <!ATTLIST elm-name attr-name attr-type #FIXED "value">
DTD example: <!ATTLIST sender company CDATA #FIXED "Microsoft">
XML example: <sender company="Microsoft"> is equivalent to <sender> (any other value for company is an error)
Enumerated attribute values
Syntax: <!ATTLIST elm-name attr-name (v1|v2|..) def-value>
DTD example:
<!ATTLIST payment type (check|cash) "cash">
<!ATTLIST light color (red | green | yellow) #IMPLIED>
XML example: <payment type="check"> or <payment type="cash">; <light color='red'> or <light>
3.3 Attribute-List Declarations
Attribute-list Declaration
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types.
3.3.1 Attribute Types
Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
ID, IDREF and IDREFS: for cross references.
ENTITY and ENTITIES: for referring to external unparsed entities.
NMTOKEN and NMTOKENS: restrict the attribute value to a Nmtoken (or a list of Nmtokens).
3.3.1 (cont’d) Example of Entity type usage
<!DOCTYPE BookCategory [...
<!ATTLIST BOOK cover ENTITY #REQUIRED>
<!NOTATION PDF SYSTEM "http://www.adobe.com/…">
…
<!ENTITY cover1 SYSTEM
"http://www.host/thebookCover.pdf" NDATA PDF>
…
]> ...
<BookCategory>
…
<BOOK cover="cover1">
… </BOOK>
3.3.1 Enumerated Attribute Types
Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.
3.3.2 Attribute Defaults
An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.
Attribute Defaults
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
Examples:
<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST file format NOTATION (ps | pdf | word) #REQUIRED>
<!ATTLIST form method CDATA #FIXED "POST">
3.3.3 Attribute-value normalization
When: after end-of-line processing, but before the value is passed to the application.
0. End-of-line processing (#xD #xA and any lone #xD have already become #xA).
Steps: initially nv = "" (the normalized value).
1. Repeat until the end of the input:
character reference => append the referenced character to the normalized value
entity reference => recursively apply step 1 to the replacement text of the entity
white space character (#x20, #xD, #xA, #x9) => append a space character (#x20) to the normalized value
otherwise (any other character) => append the character to the normalized value
2. If the attribute type is not CDATA => remove leading and trailing spaces, and replace sequences of space (#x20) characters by a single space (#x20) character.
Notes: 1. Character references and entity references are not treated equally: a white-space character written as a character reference is kept as is, while white space in an entity's replacement text becomes a space.
2. Literal white-space characters are normalized to spaces.
Examples
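<!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;">
Attribute specification: a="(line break)(line break)xyz", i.e. the value contains two literal line breaks before xyz
  a is NMTOKENS: xyz
  a is CDATA: #x20 #x20 xyz
Attribute specification: a="&d;&d;A&a;&a;B&da;"
  a is NMTOKENS: A #x20 B
  a is CDATA: #x20 #x20 A #x20 #x20 B #x20 #x20
Attribute specification: a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"
  a is NMTOKENS: #xD #xD A #xA #xA B #xD #xA
  a is CDATA: #xD #xD A #xA #xA B #xD #xA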
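The normalization steps and the examples above can be reproduced with a small Python function. This is a sketch under stated assumptions: normalize_attr is a hypothetical name, the input is the attribute value (without its quotes) after end-of-line handling, and entities maps each declared entity name to its replacement text, so for the declarations above d is a carriage return, a a line feed and da both. An undeclared entity simply raises KeyError.

```python
import re

# One match per character reference, entity reference or ordinary character.
_PIECE = re.compile(r"&#x([0-9a-fA-F]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);|(.)", re.S)

# Replacement texts of the predefined entities (lt and amp are double-escaped).
PREDEFINED = {"lt": "&#60;", "gt": ">", "amp": "&#38;", "apos": "'", "quot": '"'}


def normalize_attr(raw: str, entities: dict, is_cdata: bool) -> str:
    table = {**PREDEFINED, **entities}

    def step1(text: str) -> str:
        out = []
        for m in _PIECE.finditer(text):
            hexref, decref, name, ch = m.groups()
            if hexref is not None:
                out.append(chr(int(hexref, 16)))  # char reference: append the character as is
            elif decref is not None:
                out.append(chr(int(decref)))
            elif name is not None:
                out.append(step1(table[name]))  # entity reference: recurse on replacement text
            elif ch in " \r\n\t":
                out.append(" ")  # literal white space becomes #x20
            else:
                out.append(ch)
        return "".join(out)

    value = step1(raw)
    if not is_cdata:
        # Step 2: drop leading/trailing #x20 and collapse runs of #x20 (other characters untouched).
        value = " ".join(part for part in value.split(" ") if part)
    return value


ents = {"d": "\r", "a": "\n", "da": "\r\n"}
print(repr(normalize_attr("&d;&d;A&a;&a;B&da;", ents, is_cdata=True)))   # '  A  B  '
print(repr(normalize_attr("&d;&d;A&a;&a;B&da;", ents, is_cdata=False)))  # 'A B'
```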
DTD-Entities
Entities are used to define shortcuts to common text, like macros in programming languages. Entity references are references to entities.
If name is the name of an entity, then &name; (for a general entity) or %name; (for a parameter entity) is a reference to it; a given entity is referenced in only one of the two forms.
Entities can be declared internal ( contents in the same doc as its declaration) or external (contents external to its declaration)
Internal Entity Declaration
Syntax: <!ENTITY entity-name "entity-value">
DTD example:
<!ENTITY p1 "Peter">
<!ENTITY birthday "2/12/2000">
XML example: <baby>&p1; &birthday;</baby>
is equivalent to <baby>Peter 2/12/2000</baby>
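Python's xml.etree.ElementTree (built on expat) expands internal entities declared in the internal subset, so the application only sees the expanded text. A minimal sketch of the baby example:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE baby [
<!ENTITY p1 "Peter">
<!ENTITY birthday "2/12/2000">
]>
<baby>&p1; &birthday;</baby>"""

baby = ET.fromstring(DOC)
print(baby.text)  # Peter 2/12/2000
```

Because this expansion happens inside the parser, documents from untrusted sources deserve care; the Python documentation warns that its XML modules are not secure against maliciously constructed data.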
External Entity Declaration
Syntax: <!ENTITY entity-name SYSTEM "URI/URL">
DTD example:
<!ENTITY writer SYSTEM "http://www.xml.com/entities/writer.xml">
<!ENTITY copyright SYSTEM "http://www.xml.com/entities/copyright.xml">
XML example: <author>&writer;&copyright;</author>
Structure of XML Documents
Logical structure: elements, character data
Physical structure: entities
  Document (unit): the document entity
  Sub-units: external parsed entities, external unparsed entities
4. Physical Structures
An XML document may consist of one or many storage units called entities; each entity has content and is identified by name.
Each XML document has one entity called the document entity, which is the starting point for the XML processor and may contain the whole document. It is the only kind of entity without a name.
Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.
Classification of entities
parsed v.s. unparsed entities
general v.s. parameter entities
external v.s. internal entities
Parsed entity and unparsed entity
An unparsed entity is a resource whose contents are not processed by the XML processor. It has an associated notation, identified by name. It must be an external entity (with a system identifier and, optionally, a public identifier), and it is referred to by its entity name (not by an entity reference), which may occur only in the value of ENTITY or ENTITIES attributes.
Parsed entities are entities whose contents are processed by the XML processor. They are referenced by entity references, and their contents are referred to as their replacement text.
Examples
external general parsed entity: <!ENTITY legal SYSTEM "http://www.example.com/legal.xml">
internal general parsed entity: <!ENTITY nccu "National Chengchi University">
internal parameter parsed entity: <!ENTITY % colorValues "(male | female)">
external general unparsed entity: <!NOTATION PDF SYSTEM "http://www.adobe.com/pdf"> <!ENTITY cover1 SYSTEM "http://www.host/book1cover.pdf" NDATA PDF>
Note: notations and unparsed entities are rarely used in practice.
General entity and parameter entity
Parameter entities are parsed entities for use within the DTD. They are referenced by the form %name;.
General entities are entities for use within the document content. They are sometimes simply called entities when this leads to no ambiguity, and are referenced by the form &name;.
Comparison: the two kinds use different syntax in the DTD for their declarations, and their references take different forms and are recognized in different contexts.
Examples
external general parsed entity: <!ENTITY legal SYSTEM "http://www.example.com/legal.xml">
internal general parsed entity: <!ENTITY nccu "National Chengchi University">
internal parameter parsed entity: <!ENTITY % colorValues "(male | female)">
external parameter parsed entity: <!ENTITY % html40 SYSTEM "http://www.w3c.org/html40.dtd">
Notes: all parameter entities are parsed entities. Parameter entities generally contain only grammar information.
4.1 Character and Entity References
A character reference refers to a specific character in the ISO/IEC 10646 character set.
Character Reference
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
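Production [66] translates directly into a regular expression. A Python sketch (expand_char_refs is a hypothetical helper) that replaces each character reference with the character it names; it does not check the Legal Character constraint, so a reference such as &#0; would not be rejected, and entity references like &amp; are left alone:

```python
import re

# Production [66]: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = re.compile(r"&#([0-9]+);|&#x([0-9a-fA-F]+);")


def expand_char_refs(text: str) -> str:
    """Replace every character reference with the ISO/IEC 10646 (Unicode) character it names."""
    def to_char(m):
        dec, hexa = m.groups()
        return chr(int(dec)) if dec is not None else chr(int(hexa, 16))
    return CHAR_REF.sub(to_char, text)


print(expand_char_refs("&#169; 1947 &#xC9;ditions"))  # © 1947 Éditions
```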
4.1 Character and Entity References (cont’d)
Entity Reference
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'
4.2 Entity Declarations
Entity Declaration
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue[9] | ( ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID
notes:
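1. General entities are referenced in the document instance; within the DTD, a general-entity reference may appear only inside an entity value or an attribute default value.
2. Parameter entities are referenced only within the DTD.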
Literals
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
4.2.1 Internal Entities
An entity defined by an EntityValue is called an internal entity: its content is given in the declaration, and there is no separate physical storage object. Some processing of entity and character references in the literal entity value may be required to produce the correct replacement text.
An internal entity is always a parsed entity.
Example of an internal entity declaration:
<!ENTITY Pub-Status "This is a pre-release of the
specification.">
4.2.2 External Entities
If the entity is not internal, it is an external entity.
External Entity Declaration
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [ VC: Notation Declared ]
If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.
[VC: Notation Declared]: The Name must match the declared name of a notation.
SystemLiteral is called the entity's system identifier, which is a URI.
PubidLiteral is called the entity’s public identifier, which the XML processor may use to produce an alternative URI.
Examples of external entity declaration
<!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY open-hatch PUBLIC
"-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif"
NDATA gif >
4.3 Parsed Entities 4.3.1 The Text Declaration
External parsed entities may each begin with a text declaration.
Text Declaration
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
Notes: a text declaration, if present, must appear at the beginning of an external parsed entity. The text declaration must be provided literally, not by reference to a parsed entity.
4.3.2 Well-formed Parsed Entities
The document entity is well-formed if it matches the production labeled document [1].
An external general parsed entity is well-formed if it matches the production labeled extParsedEnt [78].
All external parameter entities are well-formed by definition.
Well-Formed External Parsed Entity
[78] extParsedEnt ::= TextDecl? content
4.3.2 Well-Formed Parsed Entities (cont’d)
An internal general parsed entity is well-formed if its replacement text matches the production labeled content[43].
All internal parameter entities are well-formed by definition.
A consequence of well-formedness in entities: the logical and physical structures in an XML document are properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.
4.3.3 Character Encoding in Entities
External parsed entities may use different encodings for their characters. All XML processors must support UTF-8 and UTF-16. An entity stored in an encoding other than UTF-8 or UTF-16 must declare its encoding in a text declaration.
Encoding Declaration
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
Examples: <?xml encoding='UTF-8'?> <?xml encoding='Big5'?>
4.4 XML Processor Treatment of Entities and References
The contexts in which character references, entity references, and unparsed entity names might appear:
1. Reference in Content : as a reference in content. EX: <p>He said: &WhatHeSaid; </p>
2. Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue. ex: <A HREF='&home;/start.html'> ex: <!ATTLIST A HREF CDATA '&home;/index.html'>
3. Occurs as Attribute Value: as a Name, not a reference, appearing as the value of an attribute declared as type ENTITY, or ENTITIES.
4.4 Contexts in which entity or character references may occur (cont’d)
ex (for 3.): <!ENTITY Apicture SYSTEM "http://www.antarctica.net/mypic.gif" NDATA GIF> <!ATTLIST World src ENTITY #REQUIRED> … <World src='Apicture'>
4. Reference in Entity Value : as a reference in rule EntityValue. ex: <!ENTITY PLX "Perl &heart; XML!">
5. Reference in DTD : as a reference in internal or external subsets of the DTD, but outside of an EntityValue or AttValue. ex: <!ELEMENT %Para; (#PCDATA|%ParaBits;)*> %manyElements; <!ATTLIST … >
4.4 summary on entities
internal v.s. external:
  internal ==> content given in the declaration
  external ==> content obtained outside the declaration
  ex1: <!ENTITY Pub-Status "this is …">
  ex2: <!ENTITY % book-format SYSTEM "http://…/book.dtd">
  ex3: <!ENTITY book1 SYSTEM "bybook.doc" NDATA WORD>
general v.s. parameter entities:
  general ==> used in the document instance
  parameter ==> used in the document type declaration (DTD)
  ex: ex1, ex3 ==> general; ex2 ==> PE
parsed v.s. unparsed entities:
  parsed ==> the XML processor will parse it ==> ex1, ex2
  unparsed ==> the XML processor need not parse it ==> ex3
  note: unparsed entities must be general and external.
possible types of entities:
There are only 5 kinds of entities, since unparsed entities must be external and general:
Internal parsed general entities
Internal parsed parameter entities
external parsed general entities
external parsed parameter entities
external unparsed general entities
internal unparsed entities (X): all internal entities are parsed entities.
external unparsed parameter entities (X): all parameter entities are parsed entities.
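The counting argument can be checked mechanically: enumerate the 2 × 2 × 2 combinations of internal/external, parsed/unparsed and general/parameter, and keep the ones the two rules allow. A short Python sketch:

```python
from itertools import product

kinds = [
    (location, parsing, role)
    for location, parsing, role in product(("internal", "external"),
                                           ("parsed", "unparsed"),
                                           ("general", "parameter"))
    # Rule: an unparsed entity must be external and general; parsed ones are unrestricted.
    if parsing == "parsed" or (location == "external" and role == "general")
]

for kind in kinds:
    print(*kind)
print(len(kinds))  # 5
```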
4.5 Construction of Internal Entity Replacement Text
An internal entity's value has two forms:
literal entity value: the quoted string actually present in the entity declaration, corresponding to the nonterminal EntityValue.
replacement text: the content of the entity, after replacement of character references and parameter-entity references.
Notes: 1. General-entity references in the literal entity value are not expanded to produce the replacement text.
2. It is the replacement text of the entity that is substituted for every occurrence of its entity reference.
4.5 Example
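<!ENTITY % pub "&#xc9;ditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus,
&#xA9; 1947 %pub;. &rights;" >
=> Entity book has replacement text:
"La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;"
Note: no forward reference to a PE is permitted; hence the declaration of book could not be placed before the declaration of pub. (Because the literal contains the PE reference %pub;, these declarations must be in the external subset, per the PEs in Internal Subset constraint.)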
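Building the replacement text from a literal entity value can be sketched in Python as below. Assumptions: replacement_text is a hypothetical helper; a parameter entity's replacement text is substituted as final text without being rescanned (a simplification); and an undeclared or not-yet-declared PE raises KeyError, mirroring the no-forward-reference rule.

```python
import re

# In a literal entity value, character references and parameter-entity references
# are replaced; general-entity references such as &rights; are bypassed (kept as is).
_REF = re.compile(r"&#x([0-9a-fA-F]+);|&#([0-9]+);|%([A-Za-z_:][\w.:-]*);")


def replacement_text(literal: str, param_entities: dict) -> str:
    def expand(m):
        hexref, decref, pe_name = m.groups()
        if hexref is not None:
            return chr(int(hexref, 16))
        if decref is not None:
            return chr(int(decref))
        # The PE must already be declared (no forward references), else KeyError.
        return param_entities[pe_name]
    # re.sub does not rescan inserted text, so each reference is expanded exactly once.
    return _REF.sub(expand, literal)


pes = {"pub": replacement_text("&#xc9;ditions Gallimard", {})}
book = replacement_text("La Peste: Albert Camus,\n&#xA9; 1947 %pub;. &rights;", pes)
print(book)
# La Peste: Albert Camus,
# © 1947 Éditions Gallimard. &rights;
```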
4.4.2 Included
An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way.
ex: <!ENTITY AC "The &W3C; Advisory Council"> <!ENTITY W3C "WWW Consortium">
"&AC;" ==> "The &W3C; Advisory Council" ==> "The WWW Consortium Advisory Council".
4.4.5 Included in Literal
Same as Included except that a single or double quote character in the replacement text
is always treated as a normal data character and will not terminate the literal.
Example: this is well-formed: <!ENTITY % YN '"Yes"' > <!ENTITY WhatHeSaid "He said %YN;" >
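while this is not, because the quote in the replacement text of EndAttr does not terminate the attribute value, which is then never closed: <!ENTITY EndAttr "27'" > <element attribute='a-&EndAttr;>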
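4.4.8 Included as PE
Same as "included", except that the replacement text is enlarged by the attachment of one leading and one following space (#x20) character; the intent is that a parameter entity's replacement text always contains a whole number of grammatical tokens of the DTD.
ex: pe1 => [red | green | blue] <!ATTLIST light color (yellow%pe1;) "yellow" >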
4.4 XML Processor Treatment of Entities and References
Context | Parameter entity | Internal general | External parsed general | Unparsed | Character
Reference in content | Not recognized | Included | Included if validating | Forbidden | Included
Reference in attribute value | Not recognized | Included in literal | Forbidden | Forbidden | Included
Occurs as attribute value | Not recognized | Forbidden | Forbidden | Notify | Not recognized
Reference in entity value | Included in literal | Bypassed | Bypassed | Forbidden | Included
Reference in DTD | Included as PE | Forbidden | Forbidden | Forbidden | Forbidden
4.6 Predefined Entities
Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.
1. <!ENTITY lt   "&#38;#60;">  // <   double escaping required for
2. <!ENTITY amp  "&#38;#38;">  // &   well-formed replacement text
3. <!ENTITY gt   "&#62;">      // >   double escaping harmless but
4. <!ENTITY apos "&#39;">      // '   not needed
5. <!ENTITY quot "&#34;">      // "
ex: The string "AT&amp;T;" ==> "AT&#38;T;" ==> "AT&T;"; the & is now character data, so "&T;" is not recognized as an entity reference.
If 2. were defined as <!ENTITY amp "&#38;">, its replacement text would be a bare "&": "AT&amp;T;" ==> "AT&T;" ==> "&T;" is reparsed as a reference to an undeclared entity T ==> error.
4.7 Notation Declarations
Notations identify by name the format of unparsed entities e.g., GIF, JPEG, DOC,BMP,…
Notation Declarations
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID |
PublicID) S? '>'
[83] PublicID ::= 'PUBLIC' S PubidLiteral
4.8 Document Entity
The document entity serves as the root of the entity tree and a starting point for an XML processor. Unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.
6. Grammar Notation (EBNF)
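#xN : the character whose ISO/IEC 10646 code (hexadecimal) is N
[a-zA-Z], [#xN-#xN] : any character in the given range(s)
[acg] : any character in the enumerated set
[^a-z], [^abc] : any character outside the given range or set
"string", 'STRING' : a literal string
(expression) : the expression treated as a unit
A? : A or nothing (optional A)
A B : A followed by B
A | B : A or B
A - B : any string that matches A but does not match B
A+ : one or more occurrences of A
A* : zero or more occurrences of A
/* Comment */ : a comment
[ wfc: … ] : a well-formedness constraint
[ vc: … ] : a validity constraint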
Appendix D. Expansion of Entity and Character References
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
==> ENTITY example has value (replacement text):
<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>
A reference in the document to "&example;" causes the text to be reparsed; the p element then contains:
An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
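The two rounds of expansion can be observed with Python's xml.etree.ElementTree, which expands internal entities while parsing. A sketch; the test root element is added here only to hold the reference:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE test [
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
]>
<test>&example;</test>"""

p = ET.fromstring(DOC).find("p")
print(p.text)
# An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
```

Declaring the entity expands only the character references (first round); referencing it reparses the result as content, which expands what remains (second round).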
D. More complex example
1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
line 4 => xx has replacement text "%zz;" (the % is written as the character reference &#37; because a PE reference may not appear inside a declaration in the internal subset)
line 5 => zz has replacement text "<!ENTITY tricky "error-prone" >"
line 6 => %xx; => %zz; => <!ENTITY tricky "error-prone" > is declared
line 8 => element test has content "This sample shows a error-prone method."
3.4 Conditional Sections
Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.
Conditional Section
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
3.4 Conditional Sections
Example:
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>
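How a DTD processor treats these sections can be sketched in Python (resolve_conditionals is a hypothetical helper; Python 3.8+ assumed). The keyword is either INCLUDE/IGNORE or a parameter-entity reference whose replacement text supplies it; INCLUDE bodies are kept and searched for nested sections, while IGNORE bodies are dropped together with any sections nested inside them. The sketch does not understand comments or quoted literals that happen to contain <![ or ]]>.

```python
import re

# '<![' keyword '[' where the keyword is INCLUDE, IGNORE or a parameter-entity reference.
_OPEN = re.compile(r"<!\[\s*(?:%([A-Za-z_:][\w.:-]*);|(INCLUDE|IGNORE))\s*\[")


def resolve_conditionals(dtd: str, param_entities: dict) -> str:
    """Keep the bodies of INCLUDE sections and drop IGNORE sections (nested ones included)."""
    out, pos = [], 0
    while (m := _OPEN.search(dtd, pos)) is not None:
        out.append(dtd[pos:m.start()])
        keyword = param_entities[m.group(1)].strip() if m.group(1) else m.group(2)
        # Find the ']]>' that closes this section, skipping nested '<![ ... ]]>' pairs.
        depth, i = 1, m.end()
        while depth:
            nxt_open, nxt_close = dtd.find("<![", i), dtd.find("]]>", i)
            if nxt_close == -1:
                raise ValueError("unterminated conditional section")
            if nxt_open != -1 and nxt_open < nxt_close:
                depth, i = depth + 1, nxt_open + 3
            else:
                depth, i = depth - 1, nxt_close + 3
        if keyword == "INCLUDE":
            out.append(resolve_conditionals(dtd[m.end():i - 3], param_entities))
        elif keyword != "IGNORE":
            raise ValueError(f"unknown keyword {keyword!r}")
        pos = i
    out.append(dtd[pos:])
    return "".join(out)


DTD = """<![%draft;[<!ELEMENT book (comments*, title, body, supplements?)>]]>
<![%final;[<!ELEMENT book (title, body, supplements?)>]]>"""
print(resolve_conditionals(DTD, {"draft": "INCLUDE", "final": "IGNORE"}))
# <!ELEMENT book (comments*, title, body, supplements?)>
```

Switching a DTD between draft and final then only requires swapping the two parameter-entity declarations, which is the point of the construct.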