XML, Namespaces, DTD: XML Techniques for E-Commerce (Michael Sonntag)


  • © Michael Sonntag 2004

    Basics: XML, Namespaces, DTD

    XML Techniques for E-Commerce, Budapest 2004

    Mag. iur. Dr. techn. Michael Sonntag

    Institute for Information Processing and Microprocessor Technology (FIM)

    Johannes Kepler University Linz, Austria
    E-Mail: sonntag@fim.uni-linz.ac.at
    http://www.fim.uni-linz.ac.at/staff/sonntag.htm

  • Questions? Please ask immediately!

  • Content

    Introduction
      XML: Structure and principles
      The problems of XML

    XML Structure
      Elements
      Attributes
      Other concepts

    DTDs
      What is it and why is it not enough?
      Defining content with DTD

    Short but important: Namespaces

  • Introduction

    What is this record?

    10001011011011010110010001101011 00101100111111000110110010000001 00001110110111010011010101001101 01001110100100110101100110111011 01111010101101101000000111001111

    This is an order from a customer for a new Laptop and an external firewire harddisk.

    Isn't this obvious???

  • Introduction

    10001011011011010110010001101011 00101100111111000110110010000001 00001110110111010011010101001101 01001110100100110101100110111011 01111010101101101000000111001111

    Fields marked in the record: Order number, Ordering date, Customer name, Laptop, Firewire harddisk, 1 piece, Delivery date

    Usage by recipient:

      if ((orderBytes[15] & 0x3f) == 0x17) ...
      String customerName = extractBytes(orderBytes, 4, 20);
      ...

  • Introduction

    This is NOT a very good design!
      Where is the description of the record?
      How to write the parser to extract all the data?
      What if we want to add a "comment" field?
      ...

    Better:

    13911.11.2003 Michael Sonntag

    Laptop Firewire harddisk (External)

    2.1.2004, 10:30
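
    The advantage of the textual form can be sketched with a generic parser. The markup below is hypothetical: the element names `order`, `customer`, `item` and `delivery`, the class name `OrderXmlSketch` and the helper `texts` are assumptions made for illustration; only the values come from the slide. A minimal sketch using the DOM parser from the Java standard library:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderXmlSketch {
    // Hypothetical markup: the element names are assumptions made for this sketch.
    static final String ORDER_XML =
          "<order>"
        + "<customer>Michael Sonntag</customer>"
        + "<item>Laptop</item>"
        + "<item>Firewire harddisk (External)</item>"
        + "<delivery>2.1.2004, 10:30</delivery>"
        + "</order>";

    // Returns the text of every element with the given name, in document order.
    static List<String> texts(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(elementName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("Could not parse order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Customer: " + texts(ORDER_XML, "customer").get(0));
        System.out.println("Items:    " + texts(ORDER_XML, "item"));
        System.out.println("Delivery: " + texts(ORDER_XML, "delivery").get(0));
    }
}
```

    Fields are found by name instead of by byte offset, so an additional element such as a "comment" would not disturb the existing lookups.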

  • Reasons & Goals for XML (1)

    Format for storage of data
      Independent from presentation (XSLT for this)

    Extensibility
      To accommodate all the different needs without formal process

    Exact definition
      Strict set of rules: no alternative ways to express the same thing, yet still complete
      Possibility to check correctness on different levels
        » E. g. without knowing about the content

    Platform independence
      Very simple basic character set (others optional; see below)
      Everything is text (no binary data which can lead to problems)

    Full character set
      XML supports Unicode

  • Reasons & Goals for XML (2)

    Open definition
      Not proprietary (would allow revocation/changes at any time)
      Can be standardized

    Simplicity
      Should be easy to understand
        » A document can partly be used even without knowing the definition
        » Human readable: No special tools needed for basic information
      Easier to implement

    Application independence
      Not only for webpages but also for any application

    Terseness unimportant
      Rely on text compression programs for this

  • Reasons & Goals for XML (3)

    History: Binary formats
      Pro: Very small, rather simple to create
      Con: Unintelligible without the format description, complicated to parse, special parser for every message/format, etc., works for only one type of computer (or only with large difficulties on others)

    XML: Textual format
      ONE parser for all documents
      Description of structure can be contained
        » Or can be easily exchanged
      Automatic checking whether document conforms to structure
      Human readable, easily understandable
      Completely portable (systems, character sets, etc.)

  • XML vs. HTML

    HTML: Describes the visual representation
    XML: Describes the structure of the data

    Example: A poem
      HTML: Linebreak defines when a new line on the screen starts
        » Might be off if the line is long and the screen small

          What's in a name? that which we call a rose
          By any other name would smell as sweet;

      XML: Linebreak defines when one line of the poem ends
        » Whether a single line of the poem is printed on one or two lines, the second perhaps indented, … is NOT specified!

          What's in a name? that which we call a rose
          By any other name would smell as sweet;

    http://the-tech.mit.edu/Shakespeare/romeo_juliet/romeo_juliet.2.2.html
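
    The separation of structure and presentation can be illustrated as follows. The markup is hypothetical (the element names `poem` and `line`, the class `PoemLayoutSketch` and the four-space continuation indent are assumptions for this sketch): the XML only says where a line of the poem ends, and the rendering code alone decides how it is wrapped for a given width.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PoemLayoutSketch {
    // Hypothetical markup: each <line> element is one line of the poem.
    static final String POEM_XML =
          "<poem>"
        + "<line>What's in a name? that which we call a rose</line>"
        + "<line>By any other name would smell as sweet;</line>"
        + "</poem>";

    // Structure: the lines of the poem, independent of any screen width.
    static List<String> poemLines(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("line");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                lines.add(nodes.item(i).getTextContent());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse poem", e);
        }
    }

    // Presentation: wrap each poem line at word boundaries to fit the width;
    // continuation lines are indented (a choice of this sketch, not of the XML).
    static List<String> render(String xml, int width) {
        List<String> out = new ArrayList<>();
        for (String line : poemLines(xml)) {
            StringBuilder current = new StringBuilder();
            for (String word : line.split(" ")) {
                if (current.length() > 0 && current.length() + 1 + word.length() > width) {
                    out.add(current.toString());
                    current = new StringBuilder("    " + word);
                } else {
                    if (current.length() > 0) {
                        current.append(' ');
                    }
                    current.append(word);
                }
            }
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        render(POEM_XML, 80).forEach(System.out::println); // wide screen
        render(POEM_XML, 30).forEach(System.out::println); // narrow screen
    }
}
```

    The same document yields two screen lines at width 80 and four at width 30, while `poemLines` always reports two lines of the poem.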

  • What XML is not

    No programming language
      XML is for data; programs can be stored as data, but not executed

    No successor/replacement of HTML
      XML + ..... (e. g. stylesheets) could replace HTML
      This is often, but not always a good idea!

    No database
      File format: A lot of data can be stored, but there is e. g. no (efficient) query language
        » XPath and XQuery work, but currently performance is nowhere near SQL
      Efficiency as a database is very bad (slow, no transactions, ...)

    Not reserved for special applications ("Only" Web, EDI, ...)
      Universal, can be used everywhere

    No "jack of all trades"
      Suitable for many applications, but obviously not for all!

  • Of the origin of XML (1)

    (Grand)parent: SGML (ISO 8879; 15.10.1986)
      Rather complicated; easy implementation was NOT a design goal!

    Initial idea: SGML Editorial Review Board (1996)
      Participation from the SGML Working Group
      All in the scope of the W3C

    Name changed to XML Working Group
      Participation from XML Special Interest Group

    Now: XML Core Working Group

    First Version: 10.2.1998 (W3C Recommendation)
      Extensible Markup Language (XML) 1.0

    Second Version: 6.10.2000 (W3C Recommendation)
      Extensible Markup Language (XML) 1.0 (Second Edition)
        » Not a new version, just includes all the errata since 1998!

    [Figure: timeline GML (1969), SGML (1985), HTML (1993), XML (1998)]

  • Of the origin of XML (2)

    Third Version: 4.2.2004 (W3C Recommendation)
      Extensible Markup Language (XML) 1.0 (Third Edition)
        » Not a new version, just includes all the errata since 2000!

    New Version: 4.2.2004 (W3C Recommendation)
      Extensible Markup Language (XML) 1.1
      Official encouragement to use 1.0, if new features are not required!
      Changes/new features:
        » New Unicode characters can now also be used in names
          – In content text already possible!
        » Names are more "loose": everything not forbidden is allowed
          – 1.0: Everything not allowed is forbidden
        » Additional line termination characters (important for mainframes only)
          – XML files are then plain text (instead of binaries) also on these computers
        » Normalization: Allows binary comparison even for Unicode characters
          – Uses the "Unicode Normalization Form C"
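
    Why normalization matters for binary comparison can be shown with a short sketch. It uses `java.text.Normalizer` from the Java standard library; the class name `NfcSketch`, the helper `equalAfterNfc` and the sample strings are assumptions for illustration. The letter "é" can be stored either as one code point (U+00E9) or as "e" followed by a combining accent (U+0301); both look identical but differ as raw character sequences until they are brought into Normalization Form C.

```java
import java.text.Normalizer;

class NfcSketch {
    // The same visible text in two encodings: precomposed and decomposed.
    static final String COMPOSED = "caf\u00e9";     // "é" as a single code point
    static final String DECOMPOSED = "cafe\u0301";  // "e" + combining acute accent

    // Compares two strings after converting both to Normalization Form C.
    static boolean equalAfterNfc(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        System.out.println("Raw comparison:        " + COMPOSED.equals(DECOMPOSED));
        System.out.println("Comparison after NFC:  " + equalAfterNfc(COMPOSED, DECOMPOSED));
    }
}
```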

  • Some XML technologies

    [Figure: overview of XML technologies: XML, HTML, XML FO, XML Schema, XML Namespace, XSLT, XPath, Java, ebXML, SOAP, Security, Metadata, ...]

  • Some XML technologies

    [Figure: the same overview of XML technologies, with each technology marked as "Vital" or "Optional"]

  • Basic XML

    [Figure: the same overview of XML technologies: XML, HTML, XML FO, XML Schema, XML Namespace, XSLT, XPath, Java, ebXML, SOAP, Security, Metadata, ...]

  • Structure of XML: Elements

    A "tag" is a name within angle brackets (“<” and “>”)

    Each "element" consists of a start and an end tag
      Empty elements are “fused” together (modified end tag alone)
      Between start and end tag there may be some content
      Start or empty tags may contain attributes (see later)

    Restrictions for tag names:
      Case-sensitive (Unicode character number, glyphs don’t matter)
      May not start with xml (or XML, xML, xMl, …)
      Name must start with letter, “_” or “:”
      Within a name: Letter, Digit, “.”, “-”, “_”, “:” + some other Unicode chars
        » If namespaces are used, “:” is NOT allowed anymore!

    Examples:

    … ; …
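
    The naming rules above can be approximated by a small check. This is a simplified sketch, not the full grammar of the specification: it accepts ASCII letters and digits only, whereas XML also allows many other Unicode characters. The class name `XmlNameSketch`, the method `isAcceptableName` and the parameter `namespacesUsed` are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class XmlNameSketch {
    // ASCII-only simplification: first character is a letter, "_" or ":",
    // followed by letters, digits, ".", "-", "_" or ":".
    private static final Pattern ASCII_NAME =
            Pattern.compile("[A-Za-z_:][A-Za-z0-9._:-]*");

    static boolean isAcceptableName(String name, boolean namespacesUsed) {
        if (!ASCII_NAME.matcher(name).matches()) {
            return false; // wrong start character, illegal character, or empty
        }
        if (name.toLowerCase(Locale.ROOT).startsWith("xml")) {
            return false; // "xml" in any capitalisation is reserved
        }
        if (namespacesUsed && name.contains(":")) {
            return false; // following the slide: no ":" once namespaces are used
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"order", "_order-item.2", "2order", "xMlData", "order item"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAcceptableName(candidate, false));
        }
    }
}
```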

  • Structure of XML: Well-formedness

    Name of start tag must exactly match name of end tag

    No “interleaving”

      ………: INVALID!

      ………: VALID!

      ……: VALID!

    At the top level there may be only a single element
      The “document element” (its tag name is irrelevant, however!)

    Any attribute may occur only once in a single tag

    + Several rules for entities and entity references
      See specification for details!

    ………: VALID!
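
    The rules above can be tried out with any XML parser, because a conforming parser must reject a document that is not well-formed. A minimal sketch using the DOM parser from the Java standard library; the class name `WellFormedSketch`, the method `isWellFormed` and the element names `a` and `b` in the sample documents are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness violation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        String[] documents = {
            "<a><b></b></a>",        // properly nested
            "<a><b></a></b>",        // interleaved tags
            "<a></A>",               // names differ in case
            "<a/><b/>",              // two top-level elements
            "<a x=\"1\" x=\"2\"/>"   // attribute occurs twice
        };
        for (String document : documents) {
            System.out.println(document + " -> " + isWellFormed(document));
        }
    }
}
```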

  • Structure of XML: Attributes

    Attribute: name “=“ value Value MUST be qu