  • 8/13/2019 An Introduction to the Extensible Markup Language


    An Introduction to the Extensible Markup Language (XML)

    by Martin Bryan of The SGML Centre

    The SGML Centre, 1997

    This file gives a very brief overview of the most commonly used components of the World Wide Web Consortium's (W3C) Extensible Markup Language (XML), as specified in the W3C Recommendation approved on 10th February 1998.

    What is XML?

    XML is a subset of the Standard Generalized Markup Language (SGML) defined in ISO standard 8879:1986 that is designed to make it easy to interchange structured documents over the Internet. XML files always clearly mark where the start and end of each of the logical parts (called elements) of an interchanged document occur. XML restricts the use of SGML constructs to ensure that fallback options are available when access to certain components of the document is not currently possible over the Internet. It also defines how Internet Uniform Resource Locators can be used to identify component parts of XML data streams.

    By defining the role of each element of text in a formal model, known as a Document Type Definition (DTD), users of XML can check that each component of a document occurs in a valid place within the interchanged data stream. An XML DTD allows computers to check, for example, that users do not accidentally enter a third-level heading without first having entered a second-level heading, something that cannot be checked using the HyperText Markup Language (HTML) previously used to code documents that form part of the World Wide Web (WWW) of documents accessible through the Internet.

    However, unlike SGML, XML does not require the presence of a DTD. If no DTD is available, either because all or part of it is not accessible over the Internet or because the user failed to create it, an XML system can assign a default definition for undeclared components of the markup.

    XML allows users to:

        bring multiple files together to form compound documents

        identify where illustrations are to be incorporated into text files, and the format used to encode each illustration

        provide processing control information to supporting programs, such as document validators and browsers

        add editorial comments to a file.

    http://www.w3.org/TR/REC-xml

    It is important to note, however, that XML is not:

        a predefined set of tags, of the type defined for HTML, that can be used to mark up documents

        a standardized template for producing particular types of documents.

    XML was not designed to be a standardized way of coding text: in fact it is impossible to devise a single coding scheme that would suit all languages and all applications. Instead XML is a formal language that can be used to pass information about the component parts of a document to another computer system. XML is flexible enough to be able to describe any logical text structure, whether it be a form, memo, letter, report, book, encyclopedia, dictionary or database.

    The components of XML

    XML is based on the concept of documents composed of a series of entities. (


    Because XML tag sets are based on the logical structure of the document they are somewhat easier to understand, and remember, than physically based markup schemes of the type typically provided by word processors. An XML memo might be coded as:

    All staff
    Martin Bryan
    5th November
    Cats and Dogs
    Please remember to keep all cats and dogs indoors tonight.
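
    The markup of the memo did not survive in this copy, so the following is only a sketch of what such a coding could look like. The tag names memo, to, from, date, subject and text are assumptions, and Python's standard xml.etree.ElementTree module is used here simply as a convenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names: the original markup was lost from this copy.
MEMO_XML = """<memo>
<to>All staff</to>
<from>Martin Bryan</from>
<date>5th November</date>
<subject>Cats and Dogs</subject>
<text>Please remember to keep all cats and dogs indoors tonight.</text>
</memo>"""

memo = ET.fromstring(MEMO_XML)

# Each logical part of the memo is an element that can be read by name.
memo_parts = {child.tag: child.text for child in memo}
for name, value in memo_parts.items():
    print(f"{name}: {value}")
```

    Because every part is labelled, a program can pick out the subject or the date without guessing from layout.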

    This form of the file is ideal for a computer to follow, and therefore to process. The start and end of each logical element of the file has been clearly identified by entry of a start-tag (of the form <name>) and an end-tag (of the form </name>).

    Notice that at this point nothing has been said about the format of the final document. From the neutral format provided by XML users can choose to display the memo on a screen, whose size can be varied to suit user preferences, to print the text onto a pre-printed form, or to generate a completely new form, positioning each element of the document where needed.

    Defining your own tag sets

    To define tag sets users must create a Document Type Definition that formally identifies the relationships between the various elements that form their documents. For a simple memo the XML DTD might take the form:
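
    The DTD example itself is missing from this copy. As a rough illustration only, the sketch below places a hypothetical memo DTD in the document's internal subset; the element names and the content model are assumptions. Python's xml.etree.ElementTree is a non-validating parser, so the declared sequence is re-checked by hand with a small helper, follows_memo_model, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for a simple memo, carried in the internal subset.
MEMO_WITH_DTD = """<!DOCTYPE memo [
<!ELEMENT memo    (to, from, date, subject, text)>
<!ELEMENT to      (#PCDATA)>
<!ELEMENT from    (#PCDATA)>
<!ELEMENT date    (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT text    (#PCDATA)>
]>
<memo><to>All staff</to><from>Martin Bryan</from><date>5th November</date>
<subject>Cats and Dogs</subject>
<text>Please remember to keep all cats and dogs indoors tonight.</text></memo>"""

# ElementTree parses the declarations but does not enforce them, so the
# sequence model declared for <memo> is checked by hand here.
EXPECTED_ORDER = ["to", "from", "date", "subject", "text"]

def follows_memo_model(xml_text):
    """Return True if the memo's children match the declared sequence."""
    root = ET.fromstring(xml_text)
    return root.tag == "memo" and [c.tag for c in root] == EXPECTED_ORDER

print(follows_memo_model(MEMO_WITH_DTD))
```

    A validating parser would perform this check directly from the element declarations; the hand-written comparison only stands in for that step.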


    Where the position of an element in the model is variable the element can be defined as part of a repeatable choice of elements. For example, to allow references to books or figures to occur anywhere in the text of a paragraph, but not in the heading, the model definition for the element could be modified to read:

    This tells the computer that the start-tag can be amended to read or if a variant font is required. If no such change is requested the program is to use the default value to make the tag read .

    One especially important type of attribute is the unique identifier. Because it is unique it can be used to provide a cross reference between two points in the document. For example, you can ensure that a unique identifier is assigned to each figure by adding an attribute list declaration of the following form to the DTD:


    This tells the computer that every .

    Unique identifiers can be referred to within the text by use of attributes that form identifier references. Typically a figure reference element might have its attribute declaration list defined as:

    Once such a declaration has been made in the DTD users can use an entity reference of the form &company; in place of the full name of the company. An advantage of using this technique is that, should the name of the company referred to by the mnemonic change later, only the entry in the DTD needs to be changed as the entity reference will automatically call in the current definition.
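
    The entity declaration referred to above is missing from this copy. A minimal sketch of the mechanism, assuming Python's standard xml.etree.ElementTree parser, might look as follows. The entity name company comes from the text; the replacement text and the para element are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The entity name "company" comes from the text; its replacement text and
# the <para> element are assumptions made for illustration.
DOC_WITH_ENTITY = """<!DOCTYPE para [
<!ENTITY company "The SGML Centre">
]>
<para>This overview was prepared by &company;.</para>"""

para = ET.fromstring(DOC_WITH_ENTITY)

# The parser substitutes the declared text wherever &company; occurs.
print(para.text)
```

    Changing the single ENTITY line would change every place where the reference is used, which is the advantage described above.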

    Text stored in another file can also be incorporated into a file using entity references. In this case the entity declaration in the DTD identifies the location of the file containing the text to be referenced, e.g.:


    When the string is encountered in the text the computer will replace it by the code whose decimal value is 233.

    Alternatively the decimal character number, or its hexadecimal equivalent, preceded by #, can be used directly as part of a character reference, e.g. &#xE9; to generate é.
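
    As a small illustration, assuming Python's standard xml.etree.ElementTree parser, the decimal and hexadecimal forms of a character reference can be seen to produce the same character. The para element and the sample word are hypothetical.

```python
import xml.etree.ElementTree as ET

# Decimal 233 and hexadecimal E9 name the same character, e-acute.
CHAR_REF_DOC = "<para>caf&#233; and caf&#xE9;</para>"

text = ET.fromstring(CHAR_REF_DOC).text
print(text)
```

    Note that the hexadecimal form is written with an x after the # sign.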

    Illustrations, tables and other special elements

    XML provides a number of techniques for handling non-standard document elements. Where the coding scheme of an element of the file such as an illustration differs from that used for normal text the contents of the element can be treated as an entity with a special notation, e.g.:


    An XML file normally consists of three types of markup, the first two of which are optional:

    1. An XML processing instruction identifying the version of XML being used, the way in which it is encoded, and whether it references other files or not, e.g.

    Document Object Model (DOM), which provides a CORBA IDL interface between applications exchanging XML data.

    Data stored using non-XML notations will need appropriate application software to process it, but the XML-coded file will correctly identify where each piece of such data belongs in the complete document and where it has been stored prior to use.

    By storing data in the clearly defined format provided by XML you can ensure that your data will be transferable to a wide range of hardware and software environments. New techniques in programming and processing data will not affect the logical structure of your document's message. If more detail needs to be added to the file all you need to do is to update the model and then add new markup tags where required in the document instance. If a completely new style is required then the existing document model can be linked to the new one to provide automatic updating of document structures.

    Webmaster: mtbryan@sgml.u-net.com

    Lesson 1: Authoring XML Elements


    What is an XML element?

    XML is a meta-markup language, a set of rules for creating semantic tags used to describe data. An XML element is made up of a start tag, an end tag, and data in between. The start and end tags describe the data within the tags, which is considered the value of the element. For example, the following XML element is a <director> element with the value "Matthew Dunn."

    <director>Matthew Dunn</director>

    The element name "director" allows you to mark up the value "Matthew Dunn" semantically, so you can differentiate that particular bit of data from another, similar bit of data. For example, there might be another element with the value "Matthew Dunn."

    <actor>Matthew Dunn</actor>

    Because each element has a different tag name, you can easily tell that one element refers to Matthew Dunn, the director, while the other refers to Matthew Dunn, the actor. If there were no way to mark up the data semantically, having two elements with the same value might cause confusion.
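
    A short sketch of the same idea, assuming Python's standard xml.etree.ElementTree parser: the film wrapper element is a hypothetical container added so that both elements can sit in one document, while director and actor are the element names discussed above.

```python
import xml.etree.ElementTree as ET

# The <film> wrapper is a hypothetical container; <director> and <actor>
# are the element names discussed in the text.
FILM_XML = "<film><director>Matthew Dunn</director><actor>Matthew Dunn</actor></film>"

film = ET.fromstring(FILM_XML)

# The two values are identical, so only the tag names tell them apart.
roles = [(child.tag, child.text) for child in film]
print(roles)
print(film.find("director").text == film.find("actor").text)
```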

    In addition, XML tags are case-sensitive, so the following are each a different element.
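
    The example elements are missing from this copy. As an illustration only, with hypothetical element names and Python's standard xml.etree.ElementTree parser, case sensitivity can be observed like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical names that differ only in letter case.
CASE_XML = "<places><City>a</City><CITY>b</CITY><city>c</city></places>"
distinct_names = {child.tag for child in ET.fromstring(CASE_XML)}
print(sorted(distinct_names))

# A start tag and an end tag that differ in case do not pair up.
try:
    ET.fromstring("<City>Westfield</city>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)
```

    The parser reports three separate element names, and it refuses an end tag whose case differs from its start tag.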

    Attributes

    An element can optionally contain one or more attributes. An attribute is a name-value pair separated by an equal sign (=).

    <CITY ZIP="01085">Westfield</CITY>

    In this example, ZIP="01085" is an attribute of the <CITY> element. Attributes are used to attach additional, secondary information to an element, usually meta information. Attributes can also accept default values, while elements cannot. Each attribute of an element can be specified only once, but in any order.
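
    The rules for attributes can be tried out with a short sketch, assuming Python's standard xml.etree.ElementTree parser. The ZIP values and the STATE attribute are illustrative assumptions rather than part of the lesson.

```python
import xml.etree.ElementTree as ET

# The ZIP values and the STATE attribute are illustrative assumptions.
city = ET.fromstring('<CITY ZIP="01085">Westfield</CITY>')

# The attribute is a name-value pair; the element value is held separately.
print(city.attrib, city.text)

# Attribute order carries no meaning: both spellings give the same pairs.
first = ET.fromstring('<CITY ZIP="01085" STATE="MA"/>').attrib
second = ET.fromstring('<CITY STATE="MA" ZIP="01085"/>').attrib
same_pairs = first == second

# Naming the same attribute twice on one element is rejected.
try:
    ET.fromstring('<CITY ZIP="01085" ZIP="01086">Westfield</CITY>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(same_pairs, duplicate_accepted)
```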

    Try it!

    In the following text box, type the title of a favorite movie and then click Continue.

    Check the syntax

    Because XML is a highly structured language, it is important that all XML be well-formed. That is, the XML must have both a start tag and end tag, and must be authored using the proper syntax. In the following box, create an XML element with a start tag, an end tag, and a value on a single line. Click the Well-Formed? button to see if your XML is correct.
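
    Without the interactive box, a similar check can be approximated in a few lines. This is a sketch that assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the movie element are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed element tree."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<movie>Casablanca</movie>"))   # start tag, value, end tag
print(is_well_formed("<movie>Casablanca"))           # end tag missing
print(is_well_formed("<movie>Casablanca</film>"))    # end tag does not match
```

    Only the first line is accepted; the other two fail because the start tag is never properly closed.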