XML
Marc Nyssen, Vrije Universiteit Brussel
Medical Informatics
1st International Summer School: Applications of ICT in Biomedicine
August 5-10, 2002, Dubrovnik, CROATIA
XML: COURSE CONTENTS
1. Introduction: what is XML?
2. Syntax rules
3. XML Schema definitions
4. XML document formats
5. Extensible Stylesheet Language
6. Cascading Stylesheets
7. XSL Formatting Objects (XSL-FO)
8. XML data formatting
9. Document Object Model (DOM)
10. Simple API for XML (SAX)
11. The broader view: semantic web
12. Examples
13. References
1. Introduction: what is XML?
eXtensible Markup Language
subset of SGML (Standard Generalized Markup Language, 1986), ISO standard ISO 8879 (Charles F. Goldfarb)
GML (Generalized ...), dating from 1969
markup: tags contain meta-information
extensible: define your own tags
with XML you can define specific markup languages
Strict separation: structure / layout
1. Introduction: what is XML? (2)
Extensible Markup Language 1.0
Open solution: W3C Recommendation, Feb. 10th 1998
hardware independent (data exchange)
software independent
text based (human + computer readable)
strict syntax
international (Unicode character set)
1. Introduction: what is XML? (3)
Example:
<?xml version="1.0"?>
<patients>
  <patfile nr="952345">
    <name>Frank Doo</name>
    <bdate>
      <day>23</day><month>05</month><year>1958</year>
    </bdate>
    <diag>healthy</diag>
  </patfile>
</patients>
1. Introduction: what is XML? (4)
Tree representation:
patients
  patfile (nr="952345")
    name: Frank Doo
    bdate
      day: 23
      month: 05
      year: 1958
    diag: healthy
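The same tree can be produced by a program. A minimal sketch using Python's standard xml.etree.ElementTree module; the function name tree_lines is hypothetical, chosen only for this illustration:

```python
import xml.etree.ElementTree as ET

# The patients example from the slide (straight quotes, as XML requires).
DOC = """<?xml version="1.0"?>
<patients>
  <patfile nr="952345">
    <name>Frank Doo</name>
    <bdate><day>23</day><month>05</month><year>1958</year></bdate>
    <diag>healthy</diag>
  </patfile>
</patients>"""


def tree_lines(elem, depth=0):
    """Return one indented line per element: tag, attributes and text (if any)."""
    label = elem.tag
    if elem.attrib:
        label += " " + " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    text = (elem.text or "").strip()  # whitespace between tags is ignored here
    if text:
        label += ": " + text
    lines = ["  " * depth + label]
    for child in elem:  # children are visited in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(ET.fromstring(DOC))))
```

Each nesting level of the XML becomes one level of indentation, which is exactly the parent/child structure of the tree.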
1. Introduction: what is XML? (5)
XML is structure:
strict separation of structure and layout
self-describing data
a style sheet is required for presentation
XML: single source
1. Introduction: what is XML? (6)
XML is multimedia:
MathML: mathematics
VoiceXML: speech
XML medical applications:
data exchange
medical record storage
electronic prescriptions
summary records
1. Introduction: what is XML? (7)
XML comes with a number of companion technologies:
data rendering/formatting via style sheets: Cascading Style Sheets (CSS), eXtensible Stylesheet Language (XSL, XSLT)
data structuring and integrity via data descriptions
data processing via parsers (a large body of work is available)
XForms
some 93 related languages (MathML, XLink, XPath, ebXML, ...)
1. Introduction: what is XML? (8)
XML support:
web browsers differ greatly in the features they support; Mozilla (open source) does the best job
a great variety of free tools is available
Java-based parsers
active web sites (online validation)
2. Syntax rules
Well-formed XML documents comply with the syntax rules:
declaration / processing instruction line:
<?xml version="1.0"?> (other examples will follow)
element:
<name>Frank Zappa</name>
attributes:
<patient nr="99858201">
comments:
<!-- Any text ... ... -->
2. Syntax rules (2)
Using elements:
<name>Frank Zappa</name>
(start tag, data, end tag)
<empty></empty>   no data, but both tags are still required (strict!)
<empty/>   short notation for an empty element
2. Syntax rules (3)
every element MUST have a start tag and an end tag (or use the empty-element notation)
tags are case sensitive (Tag differs from tag)
elements must be nested cleanly: <a><b> data </b></a>
an XML document has a single root element
the order of the elements counts!
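A quick way to see these rules enforced is to feed small fragments to a parser. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample fragments are hypothetical, made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Each sample respects or breaks one of the rules listed above.
SAMPLES = {
    "clean nesting": "<a><b>data</b></a>",
    "missing end tag": "<a><b>data</b>",
    "case mismatch": "<Tag>data</tag>",
    "overlapping tags": "<a><b>data</a></b>",
    "two root elements": "<a/><b/>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        ok, msg = is_well_formed(text)
        print(f"{label:18} {'well formed' if ok else 'ERROR: ' + msg}")
```

The parser stops at the first violation and reports its position, which is what a browser's well-formedness check shows as well.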
2. Syntax rules (4)
Attributes add extra information:
<date lastcorrect="03Jan2002">Mon Oct 21 1999</date>
attributes have a name and a value:
lastcorrect   name
03Jan2002   value
attribute order is of no importance, except in the XML declaration, where version, encoding and standalone must appear in that order:
<?xml version="1.0" encoding="UTF-8"?>
2. Syntax rules (5)
Special characters via entity references (5 predefined):
<   (tag delimiter)   &lt;   (less than)
>   (tag delimiter)   &gt;   (greater than)
&   (ampersand)       &amp;
"   (double quote)    &quot;
'   (apostrophe)      &apos;
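Programs usually do this substitution with a library call rather than by hand. A minimal sketch with Python's standard xml.sax.saxutils helpers; the sample text is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; the extra mapping also covers both quotes.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Dose < 5 mg & "as needed"'
encoded = escape(raw, QUOTES)
print(encoded)  # Dose &lt; 5 mg &amp; &quot;as needed&quot;

# A parser turns the entity references back into the original characters.
assert ET.fromstring("<note>" + encoded + "</note>").text == raw

# unescape() handles &lt; &gt; &amp; by default; the quote entities are passed in.
assert unescape(encoded, {v: k for k, v in QUOTES.items()}) == raw
```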
2. Syntax rules (6)
CDATA section: between <![CDATA[ and ]]>
Can contain any character data except ]]>
Example:
<p>A sample XML code would be:
<![CDATA[
<?xml version="1.0" ?>
<patients> .... </patients>
]]>
</p>
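What an application receives from a CDATA section can be checked with a parser. A minimal sketch using Python's standard xml.etree.ElementTree module, with a shortened version of the example above:

```python
import xml.etree.ElementTree as ET

# Markup inside the CDATA section is not parsed: it arrives as plain text.
DOC = """<p>A sample XML code would be:
<![CDATA[<patients> .... </patients>]]></p>"""

p = ET.fromstring(DOC)
print(repr(p.text))  # 'A sample XML code would be:\n<patients> .... </patients>'
print(len(p))        # 0: no child elements were created

# The same text written without CDATA needs entity references instead.
q = ET.fromstring("<p>&lt;patients&gt; .... &lt;/patients&gt;</p>")
assert q.text == "<patients> .... </patients>"
```

ElementTree does not keep the CDATA wrapper: when p is serialized again, the text is written with &lt; and &gt;.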
2. Syntax rules (7)
The XML declaration: <?xml version="1.0" ... ?>
Useful but not absolutely required
If present: at the very start of the document (no spaces or blank lines in front)
version="1.0": in 2002 the only version (backward compatibility)
encoding="UTF-8": other encodings include ASCII, UTF-16, ISO-8859-x, ... (optional; default: UTF-8, or UTF-16 when the document starts with a byte-order mark)
standalone="yes": the document does not depend on external markup declarations; "no": the application may need to read an external DTD in another file (optional; "no" is assumed when external declarations are present)
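The "very start of the document" rule is enforced by parsers. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name parses is hypothetical:

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'


def parses(data):
    """True if the bytes form a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(DECL + b"<patients/>"))         # True: declaration at the very start
print(parses(b" " + DECL + b"<patients/>"))  # False: a space in front is an error
print(parses(b"<patients/>"))                # True: the declaration is optional
```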
2. Syntax rules (8)
Names of elements and attributes:
must start with a letter or _ (underscore)
must not start with "xml" (in any mix of upper and lower case: reserved)
then, any succession of:
letters, digits, _ (underscores), - (hyphens), . (full stops)
the : (colon) is reserved for namespaces
white space between attributes inside a tag is not significant; white space in element content is kept
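The naming rules can be probed the same way, by letting a parser judge candidate names. A minimal sketch using Python's standard xml.etree.ElementTree module; name_ok and the candidate names are hypothetical:

```python
import xml.etree.ElementTree as ET


def name_ok(name):
    """True if the parser accepts `name` as an element name."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


# "a:b" is rejected because the colon introduces a namespace prefix that was
# never declared (ElementTree's parser processes namespaces).
for candidate in ["patient", "_id", "pat-file.v2", "2patient", "-x", "a:b"]:
    print(candidate, name_ok(candidate))
```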
2. Syntax rules (9)
Well-formedness test:
Use an XML-capable browser (Netscape Navigator 6, Mozilla, IE5):
2. Syntax rules (10)
XPath:
an XML document is a tree structure (7 node types):
root node, element nodes, text nodes, attribute nodes, comment nodes, processing instruction nodes, namespace nodes
XPath does not recognize CDATA sections as separate nodes (their content is ordinary text)
2. Syntax rules (11)
XPath: syntax
A location path identifies a set of nodes in a document
/   document root
//   all descendants
..   parent element
/patient/name/first
*   wildcard
//patient[@born <= 1995 and @born >= 1990]
(used in XSL; inside an XSL attribute the < must be written as &lt;)
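Python's standard xml.etree.ElementTree module supports a limited subset of XPath, enough to try most of the path steps above. A minimal sketch on hypothetical sample data that reuses the slide's element and attribute names; note that ElementTree has no comparison operators in predicates, so the range test is done in Python (lxml or an XSLT processor would evaluate the full expression):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data using the element and attribute names from the slide.
DOC = """<patients>
  <patient born="1988"><name><first>Ann</first></name></patient>
  <patient born="1991"><name><first>Bob</first></name></patient>
  <patient born="1995"><name><first>Cleo</first></name></patient>
</patients>"""

root = ET.fromstring(DOC)  # paths below are evaluated relative to <patients>

# patient/name/first: a child path (like /patients/patient/name/first)
firsts = [e.text for e in root.findall("patient/name/first")]

# .//first selects all descendants named first; .. steps up to the parent
parents = [e.tag for e in root.findall(".//first/..")]

# * is the wildcard: every child of every patient
children = [e.tag for e in root.findall("patient/*")]

# No <= or >= in ElementTree predicates, so the range test of
# //patient[@born <= 1995 and @born >= 1990] is written in Python instead.
in_range = [p.find("name/first").text
            for p in root.iter("patient")
            if 1990 <= int(p.get("born")) <= 1995]

print(firsts, parents, children, in_range)
```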
2. Syntax rules (12)
XLink: syntax
a link can be simple, point-to-point (as in HTML): xlink:type="simple"
<course xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple"
        xlink:href="http://mnf.ac.be/course"> ...
several more types: extended, locator, arc, title, resource
several behaviour attributes: xlink:show, xlink:actuate
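A program recognizes XLink attributes by their namespace URI, not by the prefix. A minimal sketch of how an application might pick up a simple link, using Python's standard xml.etree.ElementTree module (the element is the slide's example, closed as an empty element):

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace URI, as on the slide

DOC = """<course xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple"
        xlink:href="http://mnf.ac.be/course"/>"""

course = ET.fromstring(DOC)

# ElementTree reports namespaced attributes as {namespace-URI}local-name,
# so the prefix "xlink" itself does not matter, only the URI it is bound to.
link_type = course.get("{%s}type" % XLINK)
href = course.get("{%s}href" % XLINK)

if link_type == "simple":
    print("simple link to", href)
```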
3. XML Schema
Structuring XML:
define names for elements and attributes
impose the order in which elements and their children appear
A schema defines:
elements
attributes
entities
3. XML Schema (2)
Syntax:
XML Schema (W3C): the XML Schema Definition language (XSD)
3. XML Schema (9)
Checking that an XML document conforms to its XSD: validation
XMLSpy and other validating parsers (e.g. xmllint)
http://www.stg.brown.edu/service/xmlvalid/
3. XML Schema (10)
Graphical representation
3. XML Schema (11)
Summary
well-formed XML document: complies with all syntax rules
valid XML document: well formed and complies with its XML Schema
use tools such as XML editors and validating parsers
3. XML Schema (12)
Namespaces
distinguish between elements/attributes sharing the same name
group the related elements/attributes of one XML application, so that processing software can recognize them
3. XML Schema (13)
Namespaces (2)
a namespace is identified by a Uniform Resource Identifier (URI)
it looks like a URL but is just an identifier!
suppose one application works with 2 XML documents:
<title>XML Course</title> and <title>Great Student</title>
to distinguish them: associate each with a different namespace
3. XML Schema (14)
Namespaces (3)
<crs:courselist xmlns:crs="http://www.docarch.be/crs" xmlns:stu="http://www.docarch.be/stu">
  <crs:title>XML</crs:title>
  <stu:name>John</stu:name>
  <stu:title>distinguished</stu:title>
</crs:courselist>
A prefix is not needed for one (default) namespace, declared with xmlns="..."
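How software tells the two title elements apart can be sketched with Python's standard xml.etree.ElementTree module, which expands every prefix to its namespace URI. The search prefixes "course" and "student" are hypothetical, chosen to show that only the URIs matter:

```python
import xml.etree.ElementTree as ET

DOC = """<crs:courselist xmlns:crs="http://www.docarch.be/crs"
                xmlns:stu="http://www.docarch.be/stu">
  <crs:title>XML</crs:title>
  <stu:name>John</stu:name>
  <stu:title>distinguished</stu:title>
</crs:courselist>"""

# Search prefixes are mapped to URIs here; they need not match the document's
# own prefixes, because only the namespace URIs are compared.
NS = {"course": "http://www.docarch.be/crs", "student": "http://www.docarch.be/stu"}

root = ET.fromstring(DOC)
course_title = root.find("course:title", NS).text
student_title = root.find("student:title", NS).text
print(root.tag)  # {http://www.docarch.be/crs}courselist

# With a default namespace (xmlns="..."), unprefixed elements still get the URI.
plain = ET.fromstring('<courselist xmlns="http://www.docarch.be/crs"><title>XML</title></courselist>')
print(plain[0].tag)  # {http://www.docarch.be/crs}title
```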
3. XML Schema (16)
XSD (XML Schema Definition):
<xs:element>
attributes: name, minOccurs (0 .. n), maxOccurs (0 .. unbounded)
<xs:simpleType>, <xs:complexType>, <xs:sequence>
3. XML Schema (17)
XSD: example
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="patientlist">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
</xs:schema>
3. XML Schema (18)
XSD user-defined data types:
<xs:simpleType name="date">
  <xs:restriction base="xs:date"/>
</xs:simpleType>
4. XML document formats
Different stylesheet methods:
CSS (Cascading Style Sheets): simple instructions determine layout, fonts, colors (CSS levels 1 and 2)
XSLT (eXtensible Stylesheet Language Transformations)
also XHTML: a strict, XML-conformant HTML, in three flavours:
strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "DTD/xhtml1-frameset.dtd">
4. XML document formats (2)
CSS: simplest, best supported by browsers
not written in XML
XSLT: more general, more complex, itself written in XML
4. XML document formats (3)
DocBook (http://www.oasis-open.org/docbook/):
an XML/SGML vocabulary for books and papers
maintained by an OASIS Technical Committee
DocBook schema:
Document Type Definition (DTD)
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V4.2.1//EN" "docbook/docbookx.dtd">
related activities
lets you write books in XML -> tools -> various output styles
4. XML document formats (4)
The Dublin Core (http://purl.org/dc/)
Minimal set of 15 publication metadata items:
Title, Author or Creator, Subject and Keywords, Description, Publisher, Contributor,
Type, Format, Resource Identifier, Date, Source, Language, Relation, Coverage, Rights Management
5. Extensible Stylesheet Language (XSL)
XSL is 'client' based
an XML technology (CSS, by contrast, is not XML)
Two parts:
XSL-T: Transformations
XSL-FO: Formatting Objects
5. Extensible Stylesheet Language(XSL)(2)
5. Extensible Stylesheet Language(XSL)(3)
XSL components:
XPath (XML Path Language): references nodes for processing
XSL-T: XSL Transformations
- to transform XML into XML
- to produce a data presentation document
XSL-FO: Formatting Objects, to produce formatted documents
6. Cascading Stylesheets (CSS)
Syntax (non-XML):
selector {property: value; ... }
* {font-size: large}   large font for all elements
patient[nr="12345"] {display: none}   select on attribute value
diag {display: block; text-align: center}   display as a text block
diag {display: list-item}   display as a list item (with or without bullet)
diag {display: table}   start a table; children: rows and cells
6. Cascading Stylesheets (CSS) (2)
/* Defaults for whole doc */
patients {font-family: "Times New Roman"; font-size: 18pt}
/* name as header */
name {display: block; text-align: center; font-size: 36pt; font-weight: bold}
/* bdate as list */
day {display: list-item; list-style-type: decimal}
month {display: list-item; list-style-type: decimal}
year {display: list-item; list-style-type: decimal}
/* diagnosis */
diag {display: block; text-align: center; font-size: 22pt}
6. Cascading Stylesheets (CSS) (3) Example: XML file
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="patient.css"?>
<patients>
<patfile nr="A952345">
<name>Frank Doo</name>
<bdate><day>23</day> <month>05</month> <year>1958</year></bdate>
<address>Long Street 15, Hightown</address>
<diag>Jan 2002: healthy</diag>
<diag>May 2002: flu</diag>
<diag>July 2002: pain in the back</diag>
</patfile>
</patients>
6. Cascading Stylesheets (CSS) (4) Example: patient.css
/* Defaults for whole document */
patients {font-family: "Times New Roman"; font-size: 22pt}
/* name as header */
name {display: block; text-align: center; font-size: 30pt; font-weight: bold}
/* bdate as block */
bdate {display: block}
day {color: blue}
month {color: green}
year {color: blue}
/* address */
address {display: block; font-style: italic}
/* diagnosis */
diag {display: list-item; text-indent: 2cm; list-style-position: inside; font-size: 18pt; color: red}
6. Cascading Stylesheets (CSS) (5)
Result: the XML file displayed with the patient.css stylesheet
7. XSL Formatting Objects (XSL-FO)
Documents consist of boxes (areas):
block areas, inline areas, line areas, glyph areas
page masters define page dimensions and margins
an XSLT-like syntax defines the 'format processing'
8. XML Data formatting
Data formatting options:
client-side: XSL-T
server-side:
DOM (Document Object Model)
SAX (Simple API for XML)
8. XML Data formatting (2)
General model for data processing:
9. Document Object Model (DOM)
W3C Recommendation: a model for holding hierarchical documents in memory
the whole document is in memory, which gives random access
ideal for document editing, data retrieval, navigation
disadvantage 1: speed
disadvantage 2: memory resource
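A minimal sketch of DOM-style random access and editing, using Python's standard xml.dom.minidom module (a W3C DOM implementation); the sample document is a shortened patient file and the edits are hypothetical:

```python
from xml.dom import minidom

DOC = """<patients>
  <patfile nr="A952345"><name>Frank Doo</name><diag>healthy</diag></patfile>
</patients>"""

dom = minidom.parseString(DOC)  # the whole tree now sits in memory

# Random access: go straight to any node, in any order.
patfile = dom.getElementsByTagName("patfile")[0]
name = patfile.getElementsByTagName("name")[0].firstChild.data

# Editing (hypothetical changes): update a text node, then append a new element.
patfile.getElementsByTagName("diag")[0].firstChild.data = "flu"
extra = dom.createElement("diag")
extra.appendChild(dom.createTextNode("pain in the back"))
patfile.appendChild(extra)

print(name, patfile.getAttribute("nr"))
print(dom.documentElement.toxml())
```

The convenience comes at the cost listed above: the parser must build the full tree before any of these calls can run, and the tree must fit in memory.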
9. Document Object Model (DOM)(2)
Document structure:
9. Document Object Model (DOM)(3)
DOM nodes:
document: parent of all nodes
elements: children are other elements and text nodes; elements also carry attributes
attributes
comments
CDATA sections: not parsed
processing instructions
document fragments
other types: entities, entity references, notations
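The node types of a parsed document can be listed by walking the DOM tree. A minimal sketch with Python's standard xml.dom.minidom module; the sample document and the helper walk are hypothetical:

```python
from collections import Counter
from xml.dom import Node, minidom

DOC = ('<?xml version="1.0"?>'
       '<?xml-stylesheet type="text/css" href="patient.css"?>'
       '<!-- patient list -->'
       '<patients><patfile nr="A952345"><name>Frank Doo</name></patfile>'
       '<!-- note --></patients>')

TYPE_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def walk(node):
    """Yield node and all its descendants (attributes are not child nodes)."""
    yield node
    for child in node.childNodes:
        yield from walk(child)


dom = minidom.parseString(DOC)
counts = Counter(TYPE_NAMES.get(n.nodeType, "other") for n in walk(dom))
print(counts)

# Attributes are reached through their element, not through childNodes.
nr = dom.getElementsByTagName("patfile")[0].getAttributeNode("nr")
print(nr.nodeType == Node.ATTRIBUTE_NODE, nr.value)
```

The XML declaration itself does not become a node; the xml-stylesheet line does, as a processing instruction child of the document.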
10. Simple API for XML (SAX)
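Event-based:
the parser reads the document sequentially; each item it meets (start tag, text, end tag, ...) triggers an event, and the matching handler performs the corresponding action
de facto standard (not an official W3C standard)
good speed
minimal memory requirements
platform independent (Java)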
10. Simple API for XML (SAX) (2)
Methodology:
write an appropriate event handler
get a SAX parser
link the event handler to the parser
parse and process as events are triggered
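The four steps above map directly onto Python's standard xml.sax package. A minimal sketch; the handler class DiagHandler and its task (collecting diagnoses) are hypothetical:

```python
import io
import xml.sax


class DiagHandler(xml.sax.ContentHandler):
    """Step 1: an event handler that collects the text of every <diag> element."""

    def __init__(self):
        super().__init__()
        self.diagnoses = []
        self._in_diag = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "diag":
            self._in_diag = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_diag:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "diag":
            self.diagnoses.append("".join(self._buffer).strip())
            self._in_diag = False


DOC = b"""<patients><patfile nr="A952345"><name>Frank Doo</name>
<diag>Jan 2002: healthy</diag><diag>May 2002: flu</diag></patfile></patients>"""

parser = xml.sax.make_parser()     # step 2: get a SAX parser
handler = DiagHandler()
parser.setContentHandler(handler)  # step 3: link the handler to the parser
parser.parse(io.BytesIO(DOC))      # step 4: parse; handler methods fire as events occur
print(handler.diagnoses)  # ['Jan 2002: healthy', 'May 2002: flu']
```

No tree is built: only the handler's own state (here a short list) stays in memory, which is where SAX's speed and small footprint come from.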
11. The Broader view: semantic web
11. The Broader view: semantic web (2)
The semantic web: extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation (W3C)
for searching, it is far better to have semantic data available:
keywords alone are too weak
HTML meta tags (<meta content="diagnosis">) are ad hoc and insufficient
RDF: Resource Description Framework
11. The Broader view: semantic web (3) RDF: Resource Description Framework
an XML encoding for describing resources
each Description element has an about attribute holding the resource's URI
its children are properties of the resource
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="http://mnf.ac.be/xml/">
    <author>Marc Nyssen</author>
    <coursetype>lecture</coursetype>
  </rdf:Description>
</rdf:RDF>
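Because RDF here is plain XML, an ordinary XML parser can already read the descriptions. A minimal sketch with Python's standard xml.etree.ElementTree module; the helper descriptions is hypothetical, and it is not a full RDF parser (an RDF toolkit would turn the same input into subject/property/value triples). The slide uses an unprefixed about attribute; later RDF/XML writes rdf:about, so the sketch accepts both:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's example, with straight quotes.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="http://mnf.ac.be/xml/">
    <author>Marc Nyssen</author>
    <coursetype>lecture</coursetype>
  </rdf:Description>
</rdf:RDF>"""


def descriptions(text):
    """Map each described resource URI to a dict of its property elements."""
    root = ET.fromstring(text)
    result = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        # Accept both the unprefixed 'about' and the rdf:about form.
        uri = desc.get("{%s}about" % RDF_NS) or desc.get("about")
        result[uri] = {prop.tag: (prop.text or "").strip() for prop in desc}
    return result


print(descriptions(DOC))
```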
12. Examples
patients+dtd.xml
<?xml version="1.0"?>
<!DOCTYPE patients [
  <!ELEMENT patients (patfile*)>
  <!ELEMENT patfile (name+, bdate+, diag+)>
  <!ATTLIST patfile nr ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT bdate (day+, month+, year+)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT diag (#PCDATA)>
]>
<patients>
  <patfile nr="A952345">
    <name>Frank Doo</name>
    <bdate>
      <day>23</day> <month>05</month> <year>1958</year>
    </bdate>
    <diag>healthy</diag>
  </patfile>
</patients>
12. Examples (2) patient.css
/* Defaults for whole doc */
patients {font-family: "Times New Roman"; font-size: 22pt}
/* name as header */
name {display: block; text-align: center; font-size: 30pt; font-weight: bold}
/* bdate as block */
bdate {display: block}
day {color: blue}
month {color: green}
year {color: blue}
/* address */
address {display: block; font-style: italic}
/* diagnosis */
diag {display: list-item; text-indent: 2cm; list-style-position: inside; font-size: 18pt; color: red}
12. Examples (3) pat-css.xml
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="patient.css"?>
<patients>
  <patfile nr="A952345">
    <name>Frank Doo</name>
    <bdate>
      <day>23</day> <month>05</month> <year>1958</year>
    </bdate>
    <address>Long Street 15, Hightown</address>
    <diag>Jan 2002: healthy</diag>
    <diag>May 2002: flu</diag>
    <diag>July 2002: pain in the back</diag>
  </patfile>
</patients>
12. Examples (4) patients-xslt.xml
Processed with: xsltproc patients-xslt.xml > patients-xslt.html
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="./patients.xsl" ?>
<patients>
  <patient> <number>0599123123</number> <name>John Doo</name> <diag>healthy</diag> </patient>
  <patient> <number>0479123123</number> <name>Jane Bee</name> <diag>flu</diag> </patient>
  <patient> <number>2469523729</number> <name>Louise Three</name> <diag>pregnant</diag> </patient>
</patients>
12. Examples (5) patients.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><h1>Patient list</h1><p></p><table border="3">
      <xsl:apply-templates select="//patients"/>
    </table></body></html>
  </xsl:template>
  <xsl:template match="patient">
    <tr>
      <td><B>Patient name: </B></td> <td><xsl:value-of select="name"/></td>
      <td><B>Number: </B></td> <td><xsl:value-of select="number"/></td>
      <td><B>Diagnosis: </B></td> <td><xsl:value-of select="diag"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
12. Examples (6) Resulting HTML output: patients-xslt.html
<html><body><h1>Patient list</h1><p></p><table border="3">
<tr><td><B>Patient name: </B></td><td>John Doo</td><td><B>Number: </B></td><td>0599123123</td><td><B>Diagnosis: </B></td><td>healthy</td></tr>
<tr><td><B>Patient name: </B></td><td>Jane Bee</td><td><B>Number: </B></td><td>0479123123</td><td><B>Diagnosis: </B></td><td>flu</td></tr>
<tr><td><B>Patient name: </B></td><td>Louise Three</td><td><B>Number: </B></td><td>2469523729</td><td><B>Diagnosis: </B></td><td>pregnant</td></tr>
</table></body></html>
13. References
XML in a Nutshell, Elliotte Rusty Harold, W. Scott Means, O'Reilly, Jan 2001, ISBN 0-596-00058-8
XML Specification Guide, Ian S. Graham, Liam Quin, Wiley, 1999, ISBN 0-471-32753-0
Learning XML (Creating Self-Describing Data), Erik T. Ray, O'Reilly, 2001, ISBN 0-596-00046-4
XML course ("XML Cursus", Technologisch Instituut KVIV, 2001-2002), Erik Duval, Bert Paepen (Department of Computer Science, KUL)
13. References (2)
The reference for XML: http://www.w3.org/, http://www.w3.org/XML/
XML validation form: http://www.stg.brown.edu/service/xmlvalid/
Namespaces FAQ: http://www.rpbourret.com/xml/NamespacesFAQ.htm
DocBook: http://www.docbook.org ... and others (http://www.oasis-open.org)
The Dublin Core: http://purl.org/dc/
Specialized XML sites: http://www.xml.org, http://www.oasis-open.org/cover
XML encryption: http://www.w3.org/Encryption/2001
XML signatures: http://www.w3.org/signature
XML tutorials: http://www.xml101.com/xml/default.asp
13. References (3)
Apache project: http://xml.apache.org/
The goals of the Apache XML Project are:
* to provide commercial-quality standards-based XML solutions that are developed in an open and cooperative fashion,
* to provide feedback to standards bodies (such as IETF and W3C) from an implementation perspective, and
* to be a focus for XML-related activities within Apache projects
The Apache XML Project currently consists of the following subprojects, each focused on a different aspect of XML:
* Xerces - XML parsers in Java, C++ (with Perl and COM bindings)
* Xalan - XSLT stylesheet processors, in Java and C++
* Cocoon - XML-based web publishing, in Java
* AxKit - XML-based web publishing, in mod_perl
* FOP - XSL formatting objects, in Java
* Xang - Rapid development of dynamic server pages, in JavaScript
* SOAP - Simple Object Access Protocol
* Batik - A Java based toolkit for Scalable Vector Graphics
* Crimson - A Java XML parser derived from the Sun Project X Parser