1 Module 2 XML Basics (XML, Namespaces, Usage scenarios, DTDs)

  • View
    219

  • Download
    1

Embed Size (px)

Text of 1 Module 2 XML Basics (XML, Namespaces, Usage scenarios, DTDs)

  • Slide 1

1 Module 2 XML Basics (XML, Namespaces, Usage scenarios, DTDs) Slide 2 2 History: SGML vs. HTML vs. XML SGML (1960) XML(1996) HTML(1990) XHTML(2000) http://www.w3.org/TR/2006/REC-xml-20060816/ Slide 3 3 Why XML ? HTML is to be interpreted by browsers HTML is to be interpreted by browsers Shown on the screen to a human Shown on the screen to a human Desire to separate the content from presentation Desire to separate the content from presentation Presentation has to please the human eye Presentation has to please the human eye Content can be interpreted by machines, for machines presentation is a handicap Content can be interpreted by machines, for machines presentation is a handicap Semantic markup of the data Semantic markup of the data Slide 4 4 Information about a book in HTML Politics of experience by Ronald Laing, published in 1967 Item number:320070381076 Politics of experience by Ronald Laing, published in 1967 Item number:320070381076 Slide 5 5 The same information in XML The same information in XML Politics of experience Politics of experience Ronald Ronald Laing Laing Elements Information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked with semantic meaning It can be processed by machines Like HTML, only syntax, not logical abstract data model Slide 6 6 XML key concepts Documents Documents Elements Elements Attributes Attributes Namespace declarations Namespace declarations Text Text Comments Comments Processing Instructions Processing Instructions All inherited from SGML, then HTML All inherited from SGML, then HTML Slide 7 7 The key concepts of XML The key concepts of XML Politics of experience Politics of experience Ronald Ronald Laing Laing Elements Documents Elements Attributes Text Nested structure Conceptual tree Order is important Only characters, not integers, etc Slide 8 8Elements Enclosed in Tags Enclosed in Tags Begin Tag: e.g., Begin Tag: e.g., End Tag: e.g., End Tag: e.g., Element without content: e.g., is a shorthand for Element without content: e.g., is a shorthand for Elements can be nested Wilde Wutz Elements can be nested Wilde Wutz Subelements can implement multisets...... Subelements can implement multisets...... Order is important ! Order is important ! Documents must be well-formed is forbidden! is forbidden! Documents must be well-formed is forbidden! is forbidden! Slide 9 9Attributes Attribute are associated to Elements...... Attribute are associated to Elements...... Elements can have only attributes Elements can have only attributes Attribute names must be unique! (No Multisets) is illegal! Attribute names must be unique! (No Multisets) is illegal! What is the difference between a nested element and an attribute? Are attributes useful? What is the difference between a nested element and an attribute? Are attributes useful? Modeling decision: should name be an attribute or a subelement of a person ? What about age ? Modeling decision: should name be an attribute or a subelement of a person ? What about age ? Slide 10 10 Text and Mixed Content Text appears in element content Text appears in element content The politics of experience The politics of experience Can be mixed with other subelements Can be mixed with other subelements The politics of experience The politics of experience Mixed Content Mixed Content For documents data -- very useful For documents data -- very useful The need does not arise in data processing, only entities and relationships The need does not arise in data processing, only entities and relationships People speak in sentences, not entities and relationships. XML allows to preserve the structure of natural language, while adding semantic markup that can be interpreted by machines. People speak in sentences, not entities and relationships. XML allows to preserve the structure of natural language, while adding semantic markup that can be interpreted by machines. Slide 11 11 Continuous spectrum between natural language, semi-structured data, and structured data 1. Dana said that the book entitled The politics of experience is really excellent ! 2. The book entitled The politics of experience is really excellent ! 2. The book entitled The politics of experience is really excellent ! 3. The book entitled The politics of experience is really excellent ! 3. The book entitled The politics of experience is really excellent ! 4. 4. Dana Dana The politics of experience The politics of experience excellent excellent Slide 12 12 CDATA sections Sometimes we would like to preserve the original characters, and not interpret them as markup Sometimes we would like to preserve the original characters, and not interpret them as markup CDATA sections CDATA sections Not parsed as XML Not parsed as XML Hello,world! Hello,world!Hello, world! ]]> Hello, world! ]]> Slide 13 13 Comments, PIs, Prolog Comment: Syntax as in HTML Comment: Syntax as in HTML Processing Instructions Processing Instructions Contain no data - interpretation by processor Contain no data - interpretation by processor Syntax: Syntax: Pause is Target; 10secs is Content Pause is Target; 10secs is Content XML is a reserved target for prolog XML is a reserved target for prolog Prolog Prolog Standalone defines whether there is a DTD Standalone defines whether there is a DTD Encoding is usually Unicode. Encoding is usually Unicode. Slide 14 14 Whitespaces declaration Whitespace = Continuous sequence of Space, Tab and Return character Whitespace = Continuous sequence of Space, Tab and Return character Special Attribute xml:space to control use Special Attribute xml:space to control use Human-readible XML (with Whitespace) The politics of experience Ronald laing Human-readible XML (with Whitespace) The politics of experience Ronald laing (Efficient) machine-readible XML (no WS) The politics of experience Ronald Laing (Efficient) machine-readible XML (no WS) The politics of experience Ronald Laing Performance improvement: ca. Factor 2. Performance improvement: ca. Factor 2. Slide 15 15 Language declaration The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. What colour is it? What colour is it? What color is it? What color is it? Slide 16 16 Universal Resource Identifiers on the Web URLs, URIs, IRIs URLs, URIs, IRIs URL (Universal Resource Locators): deferenceable identifier on the Web URL (Universal Resource Locators): deferenceable identifier on the Web The target of an URL pointer is an HTML file (virtual or materialized) The target of an URL pointer is an HTML file (virtual or materialized) URIs (Unique Resource Identifier): general purpose key to resources on the Web URIs (Unique Resource Identifier): general purpose key to resources on the Web Uniquely identifies a resource Uniquely identifies a resource Target is not an HTML file, can be anything (schema, table, file, entity, object, tuple, person, physical item, etc) Target is not an HTML file, can be anything (schema, table, file, entity, object, tuple, person, physical item, etc) Lifetime and scope of this key is user dependent Lifetime and scope of this key is user dependent IRI (Internationalized Resource Identifiers) IRI (Internationalized Resource Identifiers) Allow non Latin characters (Chinese, Arabic, Japanese, etc) Allow non Latin characters (Chinese, Arabic, Japanese, etc) URL, URI, IRIs URL, URI, IRIs All strings All strings Very LONG strings Very LONG strings Slide 17 17Namespaces Integration of Data from diverse data sources Integration of Data from diverse data sources Integration of different XML Vocabularies (aka Namespaces) Integration of different XML Vocabularies (aka Namespaces) Each vocabulary has a unique key, identified by a URI/IRI Each vocabulary has a unique key, identified by a URI/IRI Same local name, from different vocabularies can have Same local name, from different vocabularies can have Different meaning Different meaning Different structure associated with it Different structure associated with it Qualified Names (Qname) to attach a name to its vocabulary Qualified Names (Qname) to attach a name to its vocabulary for all nodes in an XML document that has names (Attributes, Elements, Pis for all nodes in an XML document that has names (Attributes, Elements, Pis QName ::= triple ( URI [ prefix: ] localname ) QName ::= triple ( URI [ prefix: ] localname ) Binding (prefix, URI) is introduced in elements start tag Binding (prefix, URI) is introduced in elements start tag Later only the prefix is used, not the long URIs Later only the prefix is used, not the long URIs Prefix is optional, default namespaces Prefix is optional, default namespaces Prefix and localname a separated by : Prefix and localname a separated by : http://w3.org/TR/1999/REC-xml-names http://w3.org/TR/1999/REC-xml-names Slide 18 18 Namespaces (cont) Namespace definitions look like Attributes Namespace definitions look like Attributes Identified by xmlns:prefix or xmlns (default) Identified by xmlns:prefix or xmlns (default) Bind the Prefix to the URI Bind the Prefix to the URI Scope is the entire element where the namespace is declared Scope is the entire element where the namespace is declared Includes the element itslef, its attributes and ist subtrees Includes the element itslef, its attributes and ist subtrees Example Example content content Slide 19 19 Default namespaces Default namespaces, no prefix Default namespaces, no prefixOnly applies to subelements, not attributes Only applies to subelements, not attributesSlide 20 20 Example: Namespaces DQ1 defines dish for china DQ1 defines dish for china Diameter, Volume, Decor,... Diameter, Volume, Decor,... DQ2 defines dish for satellites DQ2 defines dish for satellites Diameter, Frequency Diameter, Frequency How many dishes are there? How many dishes are there? Better ask for: Better ask for: How many dishes are there? or How many dishes are there? or How many dishes are