XML
CSC1310 Fall 2009



HTML (TIM BERNERS-LEE)

HTML (HyperText Markup Language): December 1990.

Markup is a symbol embedded in text (<name>) that enhances the meaning of the information, identifying its parts and how they relate to each other.

<!DOCTYPE html>
<html>
<head>
<title>Hello HTML</title>
</head>
<body>
<b>Hello World!</b>
</body>
</html>


HTML DRAWBACKS

• HTML isn't extensible
– There are no user-defined tags; the tag set is fixed by browser makers and the W3C (World Wide Web Consortium).
• HTML is very display-centric
– HTML mixes a document's logical structure (titles, paragraphs) with presentation tags (bold, image alignment).
– HTML is useless for data replication or application services.
• HTML isn't usually directly reusable
– The entire translation needs to be redone if anything changes.
• HTML only provides one 'view' of data
– Dynamic HTML requires a huge amount of scripting.
• HTML has little or no semantic structure
– Web applications need to represent data by meaning rather than by layout.
– HTML has no way to specify what a particular page item means.


XML (JON BOSAK, MARCH 1997)

• XML stands for EXtensible Markup Language.
• XML was designed to carry data, not to display data.
• XML tags are not predefined. You must define your own tags.
• XML is not a replacement for HTML: HTML is about displaying information, while XML is about carrying information.


XML DOES NOT DO ANYTHING

XML is just plain text. XML-aware applications can handle the XML tags specially to structure, store, and transport information.


XML USAGE

• XML data is stored in plain text format.
• XML is a cross-platform, software- and hardware-independent tool for transmitting information.
• Data can be exchanged between incompatible systems.
• Data become available to more applications.
• XML simplifies platform changes.
• XML separates data from HTML: with a few lines of JavaScript, you can read an external XML file and update the data content of your HTML.
• XML is going to be the main language for exchanging financial information between businesses over the Internet.


XML SYNTAX

The first line in the document, the XML declaration, defines the XML version and the character encoding used in the document. It is optional.


XML SYNTAX

The second line describes the root element of the document ("this document is a note").


XML SYNTAX

The next 4 lines describe four child elements of the root (to, from, heading, and body).
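The example document these three slides refer to is not preserved in the text. The sketch below is one plausible reconstruction: a hypothetical note document whose element names (note, to, from, heading, body) follow the slides, while the text values and the UTF-8 encoding are made up for illustration. It is parsed with Python's standard xml.etree.ElementTree module to show the declaration, the root, and the four children.

```python
import xml.etree.ElementTree as ET

# Hypothetical note document. Element names come from the slides;
# the text values and the encoding are assumptions for illustration.
NOTE = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Reminder</heading>
  <body>Lab report is due Friday.</body>
</note>
"""

root = ET.fromstring(NOTE)

# Line 2 of the document: the root element.
print(root.tag)

# Lines 3 to 6: the four child elements of the root, in document order.
children = [child.tag for child in root]
print(children)
```

The declaration is passed as bytes here so that the parser itself reads the stated encoding.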


TREE STRUCTURE

• XML documents must contain a root element, "the parent" of all other elements.
• The elements in an XML document form a document tree, which starts at the root and branches to the lowest level of the tree.
• All elements can have sub-elements, text content, and attributes.
• Children on the same level are called siblings.


EXAMPLE
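The original example for this slide did not survive in the text. As a stand-in, here is a minimal sketch of a document tree, assuming a hypothetical bookstore document (all names and values are invented). The walk function prints the tree from the root down, and the two book elements are siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root, two sibling <book> children,
# each with its own children and an attribute.
BOOKSTORE = """
<bookstore>
  <book category="children">
    <title>First Book</title>
    <author>A. Writer</author>
  </book>
  <book category="web">
    <title>Second Book</title>
    <author>B. Writer</author>
  </book>
</bookstore>
"""

def walk(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(walk(child, depth + 1))
    return lines

root = ET.fromstring(BOOKSTORE)
outline = walk(root)
print("\n".join(outline))

# The <book> elements share the same parent, so they are siblings.
siblings = [book.get("category") for book in root.findall("book")]
print(siblings)
```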


ELEMENT VS ATTRIBUTE

There are no rules about when to use attributes and when to use elements.

• In contrast to elements, attributes cannot:
– contain multiple values
– contain tree structures
– be easily expanded.
• Use elements for data.
• Use attributes for metadata (data about data).
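A short sketch of this guideline, assuming a hypothetical person record (the names and values are invented): the identifier is metadata and sits in an attribute, while the phone numbers are data and use repeated elements, which an attribute could not hold as separate values.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: "id" is metadata (attribute),
# name and phone numbers are data (elements).
PERSON = """
<person id="p42">
  <name>Ann Lee</name>
  <phone>555-0100</phone>
  <phone>555-0199</phone>
</person>
"""

person = ET.fromstring(PERSON)

# An attribute holds exactly one string value.
person_id = person.get("id")

# Elements can repeat, so several values fit naturally.
phones = [phone.text for phone in person.findall("phone")]

print(person_id, phones)
```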


ALL XML ELEMENTS MUST HAVE A CLOSING TAG

With XML, it is illegal to omit the closing tag, except for an empty-element tag.

In HTML:
<p>This is a paragraph.
<p>This is another paragraph</p>

In XML:
<p>This is another paragraph</p>

An empty element references information that should be used rather than containing it:
<graphic fileref="me.eps"/>
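The rule can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name parses and the wrapping doc element are assumptions added so that each sample has a single root.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML-style markup: the first <p> is never closed.
unclosed = "<doc><p>This is a paragraph.<p>This is another paragraph</p></doc>"
# The same content with every element closed.
closed = "<doc><p>This is a paragraph.</p><p>This is another paragraph</p></doc>"

print(parses(unclosed), parses(closed))

# An empty element closes itself with "/>" and carries its
# information in attributes instead of content.
graphic = ET.fromstring('<graphic fileref="me.eps"/>')
print(graphic.attrib, len(graphic), graphic.text)
```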


XML TAGS ARE CASE SENSITIVE

Unlike HTML, XML tags are case sensitive.

<Message>This is incorrect</message>
<message>This is correct</message>
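A minimal sketch of the consequence, using Python's standard parser: the mismatched pair is rejected, and Message and message count as two different element names. The wrapping root element in the second sample is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Opening and closing tags differ only in case: not well formed.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False

# Matching case parses fine.
matched = ET.fromstring("<message>This is correct</message>")

# <Message> and <message> are distinct names inside one document.
root = ET.fromstring("<root><Message/><message/></root>")
lowercase_only = root.findall("message")

print(mismatched_ok, matched.text, len(lowercase_only))
```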


XML ELEMENTS MUST BE PROPERLY NESTED

Improper nesting of tags makes no sense to XML.

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>
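Feeding both variants to an XML parser shows the difference. This is a small sketch with Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

improper = "<b><i>This text is bold and italic</b></i>"
proper = "<b><i>This text is bold and italic</i></b>"

# The improperly nested version closes <b> while <i> is still open.
try:
    ET.fromstring(improper)
    improper_ok = True
except ET.ParseError:
    improper_ok = False

# The properly nested version yields <b> containing <i>.
bold = ET.fromstring(proper)
inner_text = bold.find("i").text

print(improper_ok, bold.tag, inner_text)
```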


XML ATTRIBUTE VALUES MUST BE QUOTED
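The body of this slide is not preserved, so the following is only a sketch of the rule in the title. It assumes a hypothetical note element with a date attribute; both double and single quotes are accepted by XML, while an unquoted value is rejected.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: only the quoting of the attribute value differs.
unquoted = "<note date=2009-09-01><to>Alice</to></note>"
double_quoted = '<note date="2009-09-01"><to>Alice</to></note>'
single_quoted = "<note date='2009-09-01'><to>Alice</to></note>"

print(parses(unquoted), parses(double_quoted), parses(single_quoted))

date = ET.fromstring(double_quoted).get("date")
print(date)
```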


ENTITY REFERENCES

<message>if salary < 1000 then</message>

• Replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
• There are 5 predefined entity references in XML: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
• Only the characters "<" and "&" are strictly illegal in XML text content.
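A brief sketch of entity references in practice, using Python's standard library. The escape function from xml.sax.saxutils replaces &, < and > in text; the parser turns the references back into characters. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# A bare "<" in content is read as the start of a tag: not well formed.
try:
    ET.fromstring("<message>" + raw_text + "</message>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# escape() replaces "<" with the entity reference "&lt;".
escaped = escape(raw_text)
message = ET.fromstring("<message>" + escaped + "</message>")

# All five predefined references are resolved by the parser.
five = ET.fromstring("<m>&lt;&gt;&amp;&apos;&quot;</m>").text

print(raw_ok, escaped, message.text, five)
```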


NAMING RULES

• Names can contain letters, numbers, and other characters.
• Names must not start with a number or punctuation character (an underscore is allowed).
• Names cannot contain spaces (use an underscore).
• Any name can be used; no words are reserved (except names beginning with "xml").
• Make names descriptive.
• Names should be short and simple.
• Avoid:
– "-": software may think it is subtraction.
– ".": software may think it is a property.
– ":": colons are reserved for namespaces.
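As a rough illustration of the hard rules (not the style advice), the sketch below asks Python's standard parser whether a name works as an element name. The helper name_ok is hypothetical and deliberately naive: it simply builds an empty element from the candidate name and is not a full implementation of the XML name grammar.

```python
import xml.etree.ElementTree as ET

def name_ok(name):
    """Rough check: does <name/> parse as a well-formed element?"""
    try:
        ET.fromstring("<" + name + "/>")
        return True
    except ET.ParseError:
        return False

# Sample names are invented for illustration.
results = {
    "first_name": name_ok("first_name"),   # letters and underscore
    "1st_name": name_ok("1st_name"),       # starts with a number
    "-name": name_ok("-name"),             # starts with punctuation
    "first name": name_ok("first name"),   # contains a space
    "first-name": name_ok("first-name"),   # legal, but discouraged above
}
for name, ok in results.items():
    print(name, ok)
```

Note that a hyphen inside a name is accepted by the parser; the advice to avoid it is about downstream software, not about XML itself.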


WELL FORMED XML DOCS

• A well formed XML doc follows XML syntax:
– XML documents must have a root element
– XML elements must have a closing tag
– XML tags are case sensitive
– XML elements must be properly nested
– XML attribute values must be quoted
• Result: compatibility.
• Some programs perform complex operations on highly specific data and need a more concrete markup language (a document model).
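Well-formedness is exactly what a non-validating parser checks, so a checker can be sketched in a few lines with Python's standard library. The function name is_well_formed and the sample strings are assumptions; the samples focus on the root-element rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

one_root = "<root><a/><b/></root>"   # single root element
two_roots = "<a/><b/>"               # two top-level elements, no single root
no_root = ""                         # no element at all

print(is_well_formed(one_root), is_well_formed(two_roots), is_well_formed(no_root))
```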


DOCUMENT TYPE DEFINITIONS (DTDS)

• A DTD defines the structure of an XML document.
• It is a collection of rules (declarations) describing the list of legal elements and other markup objects.
• A DTD does not restrict what kind of data goes inside elements.
• PCDATA is text that will be parsed by a parser for entities and markup; CDATA is text that will not be parsed.

• http://www.w3schools.com/dtd/default.asp
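To make the declarations concrete, here is a sketch of a hypothetical DTD for a note document, written as an internal subset. The element names and text values are assumptions for illustration. Python's standard xml.etree.ElementTree is a non-validating parser, so it checks well-formedness only: the second document breaks the DTD (no body element) and still parses. Enforcing the DTD would need a validating parser, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: a note holds to, from, heading, body in that order,
# and each of those holds parsed character data.
DTD = """<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
"""

follows_dtd = DTD + (
    "<note><to>Alice</to><from>Bob</from>"
    "<heading>Hi</heading><body>Text</body></note>"
)
# Well formed, but <body> is missing, so it does not follow the DTD.
breaks_dtd = DTD + "<note><to>Alice</to><from>Bob</from><heading>Hi</heading></note>"

# The non-validating parser accepts both documents.
tags_ok = [child.tag for child in ET.fromstring(follows_dtd)]
tags_short = [child.tag for child in ET.fromstring(breaks_dtd)]
print(tags_ok)
print(tags_short)
```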


XML SCHEMA

• A schema is an alternative way to specify patterns for data.
• Schemas are more powerful than DTDs.
• Schemas support data types.

• http://www.w3schools.com/schema/schema_intro.asp
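A schema is itself an XML document, so it can be read with an ordinary XML parser. The sketch below assumes a hypothetical schema for a note element (the element names and chosen types are invented) and lists the data type declared for each child. It only reads the declarations; Python's standard library does not validate documents against a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: each child of <note> has a declared data type.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="sent" type="xs:date"/>
        <xs:element name="priority" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# ElementTree spells namespaced tags as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring(XSD)
declared_types = {
    element.get("name"): element.get("type")
    for element in schema.iter(XS + "element")
    if element.get("type") is not None
}
print(declared_types)
```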