XML and friends Part 1 - XML and DTD ELAG 2001 workshop 8 Jan Erik Kofoed © BIBSYS Library Automation


XML and friends
Part 1 - XML and DTD

ELAG 2001, workshop 8

Jan Erik Kofoed
© BIBSYS Library Automation

XML © Jan Erik Kofoed 2001 2

XML – eXtensible Markup Language

• Describing content

• The grammar of the language is fixed, but any vocabulary (element and attribute names) is allowed

• Readable by both humans and machines

• Generally usable as a metalanguage

• Uses tags for describing content


SGML and XML

• SGML – Standard Generalized Markup Language, ISO 8879:1986(E)

• XML – Extensible Markup Language 1.0 2nd Ed., W3C Recommendation 06.10.2000

• XML is compatible with SGML

• SGML is considered by many to be difficult and expensive

• XML is designed to be easy to implement and to interoperate with SGML and HTML


Design goals for XML

1. XML shall be straightforwardly usable over the Internet.

2. XML shall support a wide variety of applications.

3. XML shall be compatible with SGML.

4. It shall be easy to write programs which process XML documents.

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

6. XML documents should be human-legible and reasonably clear.

7. The XML design should be prepared quickly.

8. The design of XML shall be formal and concise.

9. XML documents shall be easy to create.

10. Terseness in XML markup is of minimal importance.


HTML and XML

• HTML – HyperText Markup Language 4.01, W3C Recommendation 24.12.1999.
– Now replaced by:

• XHTML – Extensible HyperText Markup Language 1.0, W3C Recommendation 26.01.2000.

• Both XML and HTML are compatible with SGML.

• HTML describes presentation.

• XML describes content.

• XHTML is HTML with XML syntax.


HTML shows layout

<html>
  <head>
    <title>Book</title>
  </head>
  <body>
    <p>
      <b>Hamsun, Knut:</b> Markens grøde. <i>Oslo, Aschehoug, 1948</i>
    </p>
  </body>
</html>


XML marks up the content

<?xml version="1.0" encoding="ISO-8859-1"?>
<bok>
  <forfatter>Hamsun, Knut</forfatter>
  <tittel>Markens grøde</tittel>
  <utgitt>
    <sted>Oslo</sted>
    <forlag>Aschehoug</forlag>
    <år>1948</år>
  </utgitt>
</bok>
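Because the markup names the content rather than its layout, a program can pick out each field by tag name. A minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` module; the variable names are illustrative and not part of the original slides:

```python
import xml.etree.ElementTree as ET

# The record from the slide, encoded as ISO-8859-1 bytes to match its declaration.
record = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<bok>"
    "<forfatter>Hamsun, Knut</forfatter>"
    "<tittel>Markens grøde</tittel>"
    "<utgitt><sted>Oslo</sted><forlag>Aschehoug</forlag><år>1948</år></utgitt>"
    "</bok>"
).encode("iso-8859-1")

bok = ET.fromstring(record)

# Each piece of content is addressed by its tag name, not by its layout.
author = bok.findtext("forfatter")
title = bok.findtext("tittel")
place = bok.findtext("utgitt/sted")
year = bok.findtext("utgitt/år")

print(f"{author}: {title} ({place}, {year})")
```

The same lookup on the HTML version would have to guess which `<b>` or `<i>` element holds the author or the year.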


One simple document

<catalogue>
  <person>
    <name> Åse Østby </name>
    <address>
      <street>Furuveien 8</street>
      <postnumber>3721</postnumber>
      <city>Skogheim</city>
    </address>
  </person>
</catalogue>


Remember to add encoding!

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalogue>
  <person>
    <name> Åse Østby </name>
    <address>
      <street>Furuveien 8</street>
      <postnumber>3721</postnumber>
      <city>Skogheim</city>
    </address>
  </person>
</catalogue>
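Why the encoding matters can be shown with a small experiment. The sketch below assumes Python's standard-library `xml.etree.ElementTree` parser; the helper `try_parse` is a hypothetical name introduced only for this illustration:

```python
import xml.etree.ElementTree as ET

# The name from the slide, stored as ISO-8859-1 (Latin-1) bytes.
body = "<name>Åse Østby</name>".encode("iso-8859-1")
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

def try_parse(data: bytes) -> str:
    """Return the element text, or a short error message if parsing fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"parse error: {err}"

# Without a declaration the parser assumes UTF-8 and rejects the Latin-1 bytes.
without_declaration = try_parse(body)
# With the declaration the same bytes are decoded correctly.
with_declaration = try_parse(declaration + body)

print(without_declaration)
print(with_declaration)
```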


Architecture of XML-documents

• Processing instruction <? PI ?>
• Element <element>content</element>
– Empty element <element />
• Attribute <element attribute="value">
• Comment <!-- comment -->
• Entity &entity;
• CDATA <![CDATA[<this is not an element>]]>
• DTD <!DOCTYPE name .......... >


Example

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE book SYSTEM "book.dtd">
<!-- Here are books by Hamsun: -->
<book id="133" language="nor">
  <![CDATA[Element <book> is the root element. ]]>
  <author>Hamsun, Knut</author>
  <title>Markens grøde</title>
  <published>
    <place>Kristiania</place>
    <publisher>Gyldendal</publisher>
    <year>1917</year>
  </published>
  <annotation>&lt;Dedication &quot;Til Marie&quot; on the title page. &gt;</annotation>
</book>


Presented in MS IE

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!-- Here are books by Hamsun: -->
<book id="133" language="nor">
  <![CDATA[ Element <book> is the root element. ]]>
  <author>Hamsun, Knut</author>
  <title>Markens grøde</title>
  <published>
    <place>Kristiania</place>
    <publisher>Gyldendal</publisher>
    <year>1917</year>
  </published>
  <annotation><Dedication "Til Marie" on the title page.></annotation>
</book>
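A parser hands the same pieces to a program: attributes, CDATA text and expanded entity references. The following is a rough sketch using Python's standard-library `xml.etree.ElementTree`; the document is shortened and the DOCTYPE line is left out, which is an assumption made to keep the example self-contained:

```python
import xml.etree.ElementTree as ET

document = """<book id="133" language="nor">
  <![CDATA[Element <book> is the root element.]]>
  <annotation>&lt;Dedication &quot;Til Marie&quot; on the title page.&gt;</annotation>
</book>"""

book = ET.fromstring(document)

# Attributes arrive as a plain dictionary.
attributes = book.attrib
# The CDATA section is delivered as ordinary text, markup characters included.
cdata_text = book.text.strip()
# Entity references are replaced by the characters they stand for.
annotation = book.findtext("annotation")

print(attributes)
print(cdata_text)
print(annotation)
```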


Two types of XML documents

• Well formed XML
– Must follow certain rules
• Valid XML
– Must be well formed
– Must follow rules given in a DTD, Document Type Definition


Well formed XML

1. The document should begin with an XML declaration (optional in XML 1.0, but needed to declare an encoding other than UTF-8 or UTF-16).
2. All elements that contain data must have a start tag and an end tag.
3. Empty elements without an end tag must end with: />
4. One root element must span all other elements.
5. Elements may be nested, but may not overlap.
6. Attribute values must be inside quotes: " " (or ' ')
7. The characters < and & may only be used to start tags and entity references.
8. An element may not have two attributes with the same name.
9. Comments and processing instructions may not appear inside tags.
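Several of these rules can be tried out directly, since a conforming parser must reject a document that breaks them. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the function name `is_well_formed` and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "<a><b>text</b></a>": "properly nested",
    "<a><b>text</a></b>": "overlapping elements (rule 5)",
    "<a/><b/>": "more than one root element (rule 4)",
    "<a x=1/>": "attribute value without quotes (rule 6)",
    "<a>R&D</a>": "bare & in content (rule 7)",
    '<a x="1" x="2"/>': "duplicate attribute (rule 8)",
}

for text, description in samples.items():
    print(f"{is_well_formed(text)!s:5}  {description}")
```

Only the first sample is accepted. Note that this checks well-formedness only, not validity against a DTD.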


Rules for XML

• Names of elements and attributes
– must start with a letter or _
– can then contain letters, digits, -, . or _
– are case sensitive
– may not start with xml, XML, Xml, xMl ... (reserved)
– non-ASCII ("national") letters are allowed
• The default character encoding is UTF-8 (Unicode)
– use the encoding declaration for other encodings.
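The naming rules above can be approximated with a regular expression. This is a simplified sketch in Python, not the exact Name production of the XML specification (which is defined by Unicode character ranges and also permits a colon); `is_valid_name` and `NAME_PATTERN` are hypothetical names:

```python
import re

# Simplified pattern: a letter or "_" first, then letters, digits, "-", "." or "_".
NAME_PATTERN = re.compile(r"[^\W\d][\w.\-]*")

def is_valid_name(name: str) -> bool:
    """Approximate check of the naming rules listed above."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" in any case are reserved
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["tittel", "år", "_id", "post-number.1", "2nd", "-x", "XmlData", "my name"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```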


Reserved attributes (1)

• xml:lang
– language code. Defined in RFC 1766.
• xml:space
– preserve: preserve space, tab and carriage return
– default: the XML processor decides how spaces shall be processed.


Reserved attributes (2)

• xml:link
– simple: one-way pointer
– document: pointer to a member of a group
– extended: multiple and extended pointer
– group: group with pointers to documents
• xml:attribute
– old-attribute new-attribute: switches attributes


Entities in XML

• Entity references start with & and end with ;
• &amp; &
• &lt; <
• &gt; >
• &quot; "
• &apos; '
• &#xnnnn; character with Unicode value nnnn (hexadecimal)
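Programs rarely write these references by hand. As an illustration, the sketch below uses the `escape` and `unescape` helpers from Python's standard-library `xml.sax.saxutils`; the sample string and the `quotes` mapping are assumptions made for the example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = 'Dedication "Til Marie" <title page> & more'

# escape() always handles &, < and >; the extra mapping adds the two quote entities.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quotes)

# unescape() reverses the process when given the inverse mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})

# A numeric character reference names a character by its Unicode value.
letter = ET.fromstring("<c>&#xC5;</c>").text

print(escaped)
print(restored)
print(letter)
```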


Document type definition DTD

• Rules for the structure of XML documents

• Defines names of elements and attributes

• Defines succession (order)

• Defines occurrence

• Defines type of attributes

• Defines default values for attributes

• Defines required elements and attributes


DTD: entities (1)

• General entity
– <!ENTITY name "string">
– Ex.: <!ENTITY www "world wide web">
– <service>&www;</service>
• Parameter entity
– <!ENTITY % name "string">
– Ex.: <!ENTITY % language "lang CDATA #REQUIRED">
– <!ATTLIST book %language;>
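The general entity from the slide can be seen in action by declaring it in an internal DTD subset. A small sketch, assuming Python's standard-library `xml.etree.ElementTree`, which expands general entities declared inside the document itself:

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and referenced in the content.
document = """<!DOCTYPE service [
  <!ENTITY www "world wide web">
]>
<service>&www;</service>"""

service = ET.fromstring(document)

# The parser replaces the reference &www; with the declared string.
expanded = service.text
print(expanded)
```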


DTD: entities (2)

• External entities
– <!ENTITY name SYSTEM "uri">
– <!ENTITY addresses SYSTEM "http://www.bibsys.no/addressbook.xml">
– <addresslist>&addresses;</addresslist>
• Non-XML (unparsed) entity
– <!ENTITY name SYSTEM "uri" NDATA type>
– <!ENTITY picture SYSTEM "wolf.jpg" NDATA jpeg>
– <image src="picture" /> (src must be declared with type ENTITY; an unparsed entity is named in the attribute value, not referenced as &picture;)
• Notation
– <!NOTATION name SYSTEM "external-id">
– <!NOTATION jpeg SYSTEM "image/jpeg">


DTD: ELEMENT

• <!ELEMENT name rule>
– <!ELEMENT name ANY>
– <!ELEMENT name EMPTY>
– <!ELEMENT name (#PCDATA)>
– <!ELEMENT name (name, name, ...)>
– <!ELEMENT name (name | name | ...)>
– <!ELEMENT name ((name | name), name)>


DTD: Definition of occurrences

• (none) Must occur exactly once.

• ? May occur zero times or once.

• + Must occur once or more.

• * May occur zero or more times.
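The occurrence indicators behave like the operators of a regular expression over the sequence of child element names. The sketch below exploits that for element-only content models; it is an illustration and not a DTD validator (ANY, EMPTY and #PCDATA are not handled), and the function names are hypothetical:

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a regular expression.

    Every child name becomes a group matching that name plus a space; the
    DTD operators ?, +, * and | already mean the same thing in a regex, and
    the sequence operator "," becomes plain concatenation.
    """
    compact = re.sub(r"\s+", "", model)
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    return re.compile(with_names.replace(",", ""))

def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

published = "(place+, publisher*, year?)"
print(children_match(published, ["place", "publisher", "year"]))
print(children_match(published, ["place", "place"]))
print(children_match(published, ["publisher"]))
print(children_match(published, ["place", "year", "year"]))
```

With this model, a `published` element needs at least one `place`, may have any number of `publisher` children, and at most one `year`, in that order.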


DTD: attributes

• <!ATTLIST element attribute-name type default>
• Example:
– <!ATTLIST book language CDATA "nor">
– <!ATTLIST book id CDATA #REQUIRED>
– <!ATTLIST car color (red | blue) "red">
– <!ATTLIST person sex (female | male) #IMPLIED>
– <!ATTLIST record year CDATA #FIXED "2000">
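Default values declared in a DTD are supplied by the parser when the attribute is missing from the start tag. A small sketch, assuming Python's standard-library `xml.etree.ElementTree` (a non-validating parser that still reads declarations in the internal DTD subset); the `record` element with three attributes is made up for the illustration:

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE record [
  <!ELEMENT record EMPTY>
  <!ATTLIST record language CDATA "nor">
  <!ATTLIST record year CDATA #FIXED "2000">
  <!ATTLIST record id CDATA #IMPLIED>
]>
<record/>"""

record = ET.fromstring(document)

# The start tag carries no attributes; declared defaults are filled in,
# while the #IMPLIED attribute simply stays absent.
attributes = dict(record.attrib)
print(attributes)
```

Because the parser does not validate, a missing #REQUIRED attribute would not be reported here; that check needs a validating parser.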


DTD: Attribute types

– CDATA                 String
– (name | name | ...)   List of allowed values
– ENTITY                Defined (unparsed) entity
– ENTITIES              List of entities
– ID                    Unique identifier
– IDREF                 Reference to an ID
– IDREFS                List of ID references
– NMTOKEN               A word built from name characters
– NMTOKENS              List of nmtokens
– NOTATION              Name of a declared notation


DTD: Examples (1)

<!ELEMENT person EMPTY>
<!ATTLIST person
    name   CDATA   #REQUIRED
    number ID      #REQUIRED
    sex    (M | F) #IMPLIED>

<person name="Mary Hill"
        number="p12077137651"
        sex="F" />


DTD: Examples (2)

<!NOTATION mpeg SYSTEM "mpegplay.exe">
<!NOTATION avi SYSTEM "mediaplayer.exe">
<!ELEMENT video (#PCDATA)>
<!ATTLIST video player NOTATION (mpeg | avi) #REQUIRED>

<video player="avi">Gold fever</video>


DTD: Examples (3)

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (PERSON*)>
  <!ELEMENT PERSON (#PCDATA)>
  <!ATTLIST PERSON PNUMBER ID #REQUIRED>
  <!ATTLIST PERSON FATHER IDREF #IMPLIED>
  <!ATTLIST PERSON MOTHER IDREF #IMPLIED>
]>
<DOCUMENT>
  <PERSON PNUMBER="a1">Susan</PERSON>
  <PERSON PNUMBER="a2">Jack</PERSON>
  <PERSON PNUMBER="a3" MOTHER="a1" FATHER="a2">Chelsea</PERSON>
  <PERSON PNUMBER="a4" MOTHER="a1" FATHER="a2">David</PERSON>
</DOCUMENT>
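How an application might follow the IDREF attributes back to the elements they point at is sketched below. It assumes Python's standard-library `xml.etree.ElementTree`, which does not validate, so the uniqueness and reference checks that a validating parser would do are written out by hand; the helper `parents` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

document = """<DOCUMENT>
  <PERSON PNUMBER="a1">Susan</PERSON>
  <PERSON PNUMBER="a2">Jack</PERSON>
  <PERSON PNUMBER="a3" MOTHER="a1" FATHER="a2">Chelsea</PERSON>
  <PERSON PNUMBER="a4" MOTHER="a1" FATHER="a2">David</PERSON>
</DOCUMENT>"""

root = ET.fromstring(document)

# Index every PERSON by its ID attribute; an ID may be used only once.
people = {}
for person in root.iter("PERSON"):
    pnumber = person.get("PNUMBER")
    if pnumber in people:
        raise ValueError(f"duplicate ID: {pnumber}")
    people[pnumber] = person

def parents(pnumber: str) -> dict:
    """Follow the MOTHER and FATHER references of one person."""
    person = people[pnumber]
    result = {}
    for role in ("MOTHER", "FATHER"):
        ref = person.get(role)
        if ref is not None:
            # An IDREF must point at an existing ID.
            if ref not in people:
                raise ValueError(f"dangling IDREF: {ref}")
            result[role] = people[ref].text
    return result

print(parents("a3"))
```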


DTD: Examples (4)

<!ELEMENT picture EMPTY>
<!ATTLIST picture source ENTITY #REQUIRED>
<!NOTATION JPEG PUBLIC "ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN">
<!ENTITY Jane SYSTEM "jane003.jpg" NDATA JPEG>

<picture source="Jane" />


DTD: Examples (5a)

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!ELEMENT book (author*, title, published*, note*)>
<!ATTLIST book id CDATA #REQUIRED>
<!ATTLIST book language CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT published (place+, publisher*, year?)>
<!ELEMENT place (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT note (#PCDATA)>


DTD: Examples (5b)

<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<!DOCTYPE book SYSTEM "book.dtd">
<book id="133" language="nor">
  <author>Hamsun, Knut</author>
  <title>Markens grøde</title>
  <published>
    <place>Oslo</place>
    <publisher>Aschehoug</publisher>
    <year>1948</year>
  </published>
</book>