56
XML & XML Schema Semantic Web - Spring 2008 Computer Engineering Department Sharif University of Technology

XML & XML Schema

  • Upload
    ormand

  • View
    143

  • Download
    3

Embed Size (px)

DESCRIPTION

XML & XML Schema. Semantic Web - Spring 2008 Computer Engineering Department Sharif University of Technology. Outline. Markup Languages SGML, HTML, XML XML Building Blocks XML Applications Namespaces XML Schema. SGML(ISO 8879). S tandard G eneralized M arkup L anguage - PowerPoint PPT Presentation

Citation preview

Page 1: XML & XML Schema

XML & XML Schema

Semantic Web - Spring 2008Computer Engineering Department

Sharif University of Technology

Page 2: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 2

Outline

• Markup Languages– SGML, HTML, XML

• XML Building Blocks• XML Applications• Namespaces• XML Schema

Page 3: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 3

SGML(ISO 8879)

• Standard Generalized Markup Language• The international standard for defining descriptions of

structure and content in text documents • Interchangeable: device-independent, system-independent• tags are not predefined• Using DTD to validate the structure of the document• Large, powerful, and very complex• Heavily used in industrial and commercial usages for over

a decade

Page 4: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 4

HTML(RFC 1866)

• HyperText Markup Language

• A small SGML application used on web (a DTD and a set of processing conventions)

• Only uses a predefined set of tags

Page 5: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 5

What is XML?

• eXtensible Markup Language• A simplified version of SGML• Maintains the most useful parts of SGML• Designed so that SGML can be delivered over the

Web• More flexible and adaptable than HTML• XHTML: a reformulation of HTML 4 in XML 1.0

Page 6: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 6

HTML vs. XML

HTML is used to mark up text so it can be displayed to users.

XML is used to mark up data so it can be processed by computers.

HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)

XML describes only content, or “meaning”

HTML uses a fixed, unchangeable set of tags.

In XML, you make up your own tags.

Page 7: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 7

HTML vs. XML (2)

• HTML is for humans– HTML describes web pages– You don’t want to see error messages about the web pages you

visit– Browsers ignore and/or correct as many HTML errors as they can,

so HTML is often sloppy• XML is for computers

– XML describes data– The rules are strict and errors are not allowed

• In this way, XML is like a programming language– Current versions of most browsers can display XML

• However, browser support of XML is spotty at best

Page 8: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 8

XML-related technologies

• DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and their attributes for particular purposes

• XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another

• SAX (Simple API for XML)

Page 9: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 9

XML Building blocks - Elements

• Delimited by angle brackets • Identify the nature of the content they surround• General format: <element> … </element>• Empty element: <empty-Element />• XML Elements have Relationships

– Elements are related as parents and children

• Elements have Content– Elements can have different content types:

• Element, mixed, Simple, empty

Page 10: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 10

XML Building blocks - Attributes

Name-value pairs that occur inside start-tags after element name, like: <element attribute=“value” />

• Provide additional information about elements that often is not a part of data.

• Attributes and elements are somewhat interchangeable• Should I use an element or an attribute?• Example using just elements:

<name> <first>David</first> <last>Matuszek</last></name>

• Example using attributes: <name first="David" last="Matuszek"></name>

metadata (data about data) should be stored as attributes, and that data itself should be stored as elements

Page 11: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 11

XML Building blocks - Entities

Five special characters must be written as entities: &amp; for & (almost always necessary) &lt; for < (almost always necessary) &gt; for > (not usually necessary) &quot; for " (necessary inside double quotes) &apos; for ' (necessary inside single quotes)

These entities can be used even in places where they are not absolutely required.These are the only predefined entities in XML.

Page 12: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 12

XML Building blocks - Declaration

The XML declaration looks like this:<?xml version="1.0" encoding="UTF-8" standalone="yes"?>– The XML declaration is not required by browsers, but is

required by most XML processors (so include it!)– If present, the XML declaration must be first--not even

whitespace should precede it– Note that the brackets are <? and ?>– version="1.0" is required (this is the only version so far)– encoding can be "UTF-8" (ASCII) or "UTF-16" (Unicode),

or something else, or it can be omitted– standalone tells whether there is a separate DTD

Page 13: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 13

XML Building blocks - Processing instructions

• PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)

• A PI is a command to the program processing the XML document to handle it in a certain way

• XML documents are typically processed by more than one program

• Programs that do not recognize a given PI should just ignore it• General format of a PI: <?target instructions?>• Example: <?xml-stylesheet type="text/css"

href="mySheet.css"?>

Page 14: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 14

XML Building blocks - Comments

• <!-- This is a comment in both HTML and XML -->• Comments can be put anywhere in an XML document• Comments are useful for:

– Explaining the structure of an XML document– Commenting out parts of the XML during development and testing

• The character sequence -- cannot occur in the comment• Comments are not displayed by browsers, but can be

seen by anyone who looks at the source code

Page 15: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 15

CDATA

• By default, all text inside an XML document is parsed• You can force text to be treated as unparsed character

data by enclosing it in <![CDATA[ ... ]]>• Any characters, even & and <, can occur inside a CDATA• Whitespace inside a CDATA is (usually) preserved• The only real restriction is that the character sequence ]]>

cannot occur inside a CDATA• CDATA is useful when your text has a lot of illegal

characters (for example, if your XML document contains some HTML text)

Page 16: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 16

XML Syntax

• All XML elements must have a closing tag • XML tags are case sensitive• All XML elements must be properly nested• All XML documents must have a root tag• Attribute values must always be quoted• With XML, white space is preserved• With XML, a new line is always stored as LF • Comments in XML: <!-- This is a comment -->

Page 17: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 17

Well-formed XML

• Every element must have both a start tag and an end tag, e.g. <name> ... </name>– But empty elements can be abbreviated: <break />.– XML tags are case sensitive– XML tags may not begin with the letters xml, in any combination

of cases• Elements must be properly nested, e.g. not <b><i>bold and

italic</b></i>• Every XML document must have one and only one root element• The values of attributes must be enclosed in single or double quotes,

e.g. <time unit="days">• Character data cannot contain < or &

Page 18: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 18

Displaying XML

• XML documents do not carry information about how to display the data

• We can add display information to XML with– CSS (Cascading Style Sheets)– XSL (eXtensible Stylesheet Language) --- preferred

Page 19: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 19

XML Applications (1)

Separate data

XML can Separate Data from HTML• Store data in separate XML files• Using HTML for layout and display• Using Data Islands• Data Islands can be bound to HTML elements

Benefits:Changes in the underlying data will not require any changes to your HTML

Page 20: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 20

XML Applications (2)

Exchange data

XML is used to Exchange Data• Text format• Software-independent, hardware-independent• Exchange data between incompatible systems, given that they agree on

the same tag definition.• Can be read by many different types of applications

Benefits:• Reduce the complexity of interpreting data• Easier to expand and upgrade a system

Page 21: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 21

XML Application (3)

Store Data

XML can be used to Store Data • Plain text file• Store data in files or databases• Application can be written to store and retrieve information from the store• Other clients and applications can access your XML files as data sources

Benefits:Accessible to more applications

Page 22: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 22

XML Applications (4)

Create new language

XML can be used to Create new Languages, e.g. : • WML (Wireless Markup Language) used to markup Internet applications

for handheld devices like mobile phones (WAP)• MusicXML used to publishing musical scores

Page 23: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 23

Names in XML

• Names (as used for tags and attributes) must begin with a letter or underscore, and can consist of:– Letters, both Roman (English) and foreign– Digits, both Roman and foreign . (dot) - (hyphen) _ (underscore) : (colon) should be used only for namespaces– Combining characters and extenders (not used in English)

Page 24: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 24

Namespaces

• Namespaces are a simple mechanism for creating globally unique names for the elements and attributes of your markup language.

• Benefits:– De-conflicts the meaning of identical names in different markup

languages.– Allows different markup languages to be mixed together without

ambiguity.

• Namespaces are implemented by requiring every XML name to consist of two parts: a prefix and a local part: <xsd:integer>

Page 25: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 25

Namespaces and URIs

• A namespace is defined as a unique string– To guarantee uniqueness, typically a URI (Uniform

Resource Indicator) is used, because the author “owns” the domain

– It doesn't have to be a “real” URI; it just has to be a unique string

– Example: http://ce.sharif.edu/sw

• There are two ways to use namespaces:– Declare a default namespace– Associate a prefix with a namespace, then use the

prefix in the XML to refer to the namespace

Page 26: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 26

Namespace syntax

• In any start tag you can use the reserved attribute name xmlns: <book xmlns="http://ce.sharif.edu/sw">– This namespace will be used as the default for all elements up to

the corresponding end tag– You can override it with a specific prefix

• You can use almost this same form to declare a prefix: <book xmlns:dave="http://ce.sharif.edu/sw">– Use this prefix on every tag and attribute you want to use from

this namespace, including end tags--it is not a default prefix <dave:chapter dave:number="1">To

Begin</dave:chapter>• You can use the prefix in the start tag in which it is defined:

<dave:book xmlns:dave=“http://ce.sharif.edu/sw">

Page 27: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 27

Review of XML rules

• Start with <?xml version="1"?>• XML is case sensitive• You must have exactly one root element that

encloses all the rest of the XML• Every element must have a closing tag• Elements must be properly nested• Attribute values must be enclosed in double or

single quotation marks• There are only five pre-declared entities

Page 28: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 28

XML as a tree

• An XML document represents a hierarchy; a hierarchy is a tree

novel

foreword chapternumber="1"

paragraph paragraph paragraph

This is the greatAmerican novel.

It was a darkand stormy night.

Suddenly, a shotrang out!

Page 29: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 29

Extended document standards

• You can define your own XML tag sets, but here are some already available:– XHTML: HTML redefined in XML– SMIL: Synchronized Multimedia Integration Language– MathML: Mathematical Markup Language– SVG: Scalable Vector Graphics– DrawML: Drawing MetaLanguage– ICE: Information and Content Exchange– ebXML: Electronic Business with XML– cxml: Commerce XML– CBL: Common Business Library

Page 30: XML & XML Schema

XML Schema

Page 31: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 31

XML Validation

• "Well Formed" XML document– correct XML syntax

• "Valid" XML document– “well formed”– Conforms to the rules of a DTD

• XML DTD– defines the legal building blocks of an XML document– Can be inline in XML or as an external reference

• XML Schema– an XML based alternative to DTD, more powerful– Support namespace and data types

Page 32: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 32

An Example XML with DTD

<?xml version="1.0"?><!DOCTYPE note [

<!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

]> <note>

<to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body>

</note>

Page 33: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 33

XML Schemas

• “Schema” is a general term– DTDs are a form of XML schemas

• When we say “XML Schemas,” we usually mean the W3C XML Schema Language– This is also known as “XML Schema Definition”

language, or XSD.

Page 34: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 34

XSD vs. DTD

• DTDs provide a very weak specification language– You can’t put any restrictions on text content– You have very little control over mixed content (text plus

elements)– You have little control over ordering of elements

• DTDs are written in a strange (non-XML) format– You need separate parsers for DTDs and XML

• The XML Schema Definition language solves these problems– XSD gives you much more control over structure and content– XSD is written in XML

Page 35: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 35

Referring to a schema

• To refer to a DTD in an XML document, the reference goes before the root element:– <?xml version="1.0"?>

<!DOCTYPE rootElement SYSTEM "url"><rootElement> ... </rootElement>

• To refer to an XML Schema in an XML document, the reference goes in the root element:– <?xml version="1.0"?>

<rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" (The XML Schema Instance reference is required) xsi:noNamespaceSchemaLocation="url.xsd"> (This is where your XML Schema definition can be found) ...</rootElement>

Page 36: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 36

The XSD document

• Since the XSD is written in XML, it can get confusing which we are talking about.

• The file extension is .xsd• The root element is <schema>• The XSD starts like this:• <?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.rg/2001/XMLSchema">

Page 37: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 37

<schema>

• The <schema> element may have attributes:– xmlns:xs="http://www.w3.org/2001/

XMLSchema"• This is necessary to specify where all our XSD tags are

defined– elementFormDefault="qualified"

• This means that all XML elements must be qualified (use a namespace)

• It is highly desirable to qualify all elements, or problems will arise when another schema is added

Page 38: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 38

“Simple” and “complex” elements

• A “simple” element is one that contains text and nothing else– A simple element cannot have attributes– A simple element cannot contain other elements– A simple element cannot be empty– However, the text can be of many different types, and may have

various restrictions applied to it

• If an element isn’t simple, it’s “complex”– A complex element may have attributes– A complex element may be empty, or it may contain text, other

elements, or both text and other elements

Page 39: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 39

Defining a simple element

• A simple element is defined as <xs:element name="name" type="type" />where:– name is the name of the element– the most common values for type are

xs:boolean xs:integer xs:date xs:string xs:decimal xs:time

• Other attributes a simple element may have:– default="default value" if no other value is specified– fixed="value" no other value may be specified

Page 40: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 40

Defining an attribute

• Attributes themselves are always declared as simple types• An attribute is defined as

<xs:attribute name="name" type="type" />where:– name and type are the same as for xs:element

• Other attributes a simple element may have:– default="default value" if no other value is specified– fixed="value" no other value may be specified– use="optional" the attribute is not required (default)– use="required" the attribute must be present

Page 41: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 41

Restrictions, or “facets”

• The general form for putting a restriction on a text value is:– <xs:element name="name"> (or

xs:attribute) <xs:restriction base="type"> ... the restrictions ... </xs:restriction></xs:element>

• For example:– <xs:element name="age">

<xs:restriction base="xs:integer"> <xs:minInclusive value="0"> <xs:maxInclusive value="140"> </xs:restriction></xs:element>

Page 42: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 42

Restrictions on numbers

• minInclusive -- number must be ≥ the given value • minExclusive -- number must be > the given value • maxInclusive -- number must be ≤ the given value • maxExclusive -- number must be < the given value • totalDigits -- number must have exactly value digits

• fractionDigits -- number must have no more than value digits after the decimal point

Page 43: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 43

Restrictions on strings

• length -- the string must contain exactly value characters • minLength -- the string must contain at least value characters• maxLength -- the string must contain no more than value characters• pattern -- the value is a regular expression that the string must

match• whiteSpace -- not really a “restriction”--tells what to do with

whitespace– value="preserve" Keep all whitespace– value="replace" Change all whitespace characters to spaces– value="collapse" Remove leading and trailing whitespace, and

replace all sequences of whitespace with a single space

Page 44: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 44

Enumeration

• An enumeration restricts the value to be one of a fixed set of values

• Example:– <xs:element name="season">

<xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Spring"/> <xs:enumeration value="Summer"/> <xs:enumeration value="Autumn"/> <xs:enumeration value="Fall"/> <xs:enumeration value="Winter"/> </xs:restriction> </xs:simpleType></xs:element>

Page 45: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 45

Complex elements

• A complex element is defined as <xs:element name="name"> <xs:complexType> ... information about the complex type... </xs:complexType> </xs:element>

• Example: <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>

• <xs:sequence> says that elements must occur in this order• Remember that attributes are always simple types

Page 46: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 46

Declaration and use

• So far we’ve been talking about how to declare types, not how to use them

• To use a type we have declared, use it as the value of type="..."– Examples:

•<xs:element name="student" type="person"/>•<xs:element name="professor"

type="person"/>– Scope is important: you cannot use a type if is local

to some other type

Page 47: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 47

xs:sequence

• We’ve already seen an example of a complex type whose elements must occur in a specific order:

• <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>

Page 48: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 48

xs:all

• xs:all allows elements to appear in any order• <xs:element name="person">

<xs:complexType> <xs:all> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:all> </xs:complexType> </xs:element>

• Despite the name, the members of an xs:all group can occur once or not at all

• You can use minOccurs="0" to specify that an element is optional (default value is 1)– In this context, maxOccurs is always 1

Page 49: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 49

Empty elements

• Empty elements are (ridiculously) complex

• <xs:complexType name="counter"> <xs:complexContent> <xs:extension base="xs:anyType"/> <xs:attribute name="count" type="xs:integer"/> </xs:complexContent></xs:complexType>

Page 50: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 50

Mixed elements

• Mixed elements may contain both text and elements• We add mixed="true" to the xs:complexType element• The text itself is not mentioned in the element, and

may go anywhere (it is basically ignored)

• <xs:complexType name="paragraph" mixed="true"> <xs:sequence> <xs:element name="someName” type="xs:anyType"/> </xs:sequence></xs:complexType>

Page 51: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 51

Extensions

• You can base a complex type on another complex type

• <xs:complexType name="newType"> <xs:complexContent> <xs:extension base="otherType"> ...new stuff... </xs:extension> </xs:complexContent></xs:complexType>

Page 52: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 52

Predefined string types

• Recall that a simple element is defined as: <xs:element name="name" type="type" />

• Here are a few of the possible string types:– xs:string -- a string– xs:normalizedString -- a string that doesn’t contain tabs,

newlines, or carriage returns– xs:token -- a string that doesn’t contain any whitespace other

than single spaces

• Allowable restrictions on strings:– enumeration, length, maxLength, minLength, pattern,

whiteSpace

Page 53: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 53

Predefined date and time types

• xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05

• xs:time -- A date in the format hh:mm:ss (hours, minutes, seconds)

• xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss– The T is part of the syntax

• Allowable restrictions on dates and times:– enumeration, minInclusive, minExclusive,

maxInclusive, maxExclusive, pattern, whiteSpace

Page 54: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 54

Predefined numeric types

• Here are some of the predefined numeric types:

• Allowable restrictions on numeric types:– enumeration, minInclusive, minExclusive, maxInclusive,

maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace

xs:decimal xs:positiveIntegerxs:byte xs:negativeIntegerxs:short xs:nonPositiveIntegerxs:int xs:nonNegativeIntegerxs:long

Page 55: XML & XML Schema

Questions?

Page 56: XML & XML Schema

Semantic web - Computer Engineering Dept. - Spring 2008 56

References

• http://www.w3.org/XML/ • http://www.w3.org/XML/Schema