
HTMLsvenkat/STSD/slides/pdf/session20.pdf · 2018. 6. 27. · – Size value may be 1 to 7 (Times 8, 10, 12, 14, 18, 24, 36) –Szie may aslo be + n or –n to specify a point higher


Venkat Subramaniam – [email protected]

HTML and XML

HTML

• Hyper Text Markup Language
• HTML 4.0 is defined as an SGML application; XHTML is its reformulation in strict compliance with the XML standard
• Presentation details are presented with the information, using markup
• Browsers act as interpreters/parsers in
  – parsing through HTML documents
  – displaying the contents of the documents

Tags, Elements and Attributes

<STRONG>boldface Text</STRONG>
<HR>
<TABLE BORDER="1">…</TABLE>

• Tag starts with < and ends with >
• Elements generally have start and end tags
  – starts with <TagName>
  – ends with </TagName> (optional in some cases)
  – contents of elements included between tags
• Attributes
  – Name=Value specifies information about contents in an element
  – Provided between tag name and ending >
  – Multiple attributes separated by space

Tags, Case, Well-formedness

• HTML is relaxed when it comes to case and well-formedness
• <HR> is as good as <hr>, as are <Hr> and <hR>
• <STRONG>This is <I> italics</I> Text</STRONG>
• However,
  – <STRONG>This is <I> italics</STRONG> </I> Text
  – is generally accepted, though not well-formed
  – How does a browser handle this? Try it on different browsers
• XML, on the other hand, must be well-formed and is case sensitive
• XHTML is HTML following XML restrictions
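
The relaxed handling of case and nesting in HTML can be observed with a lenient parser. The sketch below uses Python's standard `html.parser` module as a stand-in for a browser (an assumption made for illustration; real browsers apply their own recovery rules). `TagLogger` and `tag_events` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase, whatever the source case
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events for a fragment of HTML."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(tag_events("<HR><Hr><hR>"))
    # Overlapping tags are reported without any error being raised
    print(tag_events("<STRONG>This is <I> italics</STRONG> </I> Text"))
```

The lenient parser neither rejects the overlapping tags nor distinguishes `<HR>` from `<hr>`; an XML parser would stop at the first mismatch.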

Tags, Line Breaks, Special Characters

• Block-level tags affect a block of text/content
  – HEAD, BODY, P, H1, BR, UL, TABLE
• Inline tags affect only a few letters or words
  – EM, B, IMG
• Line breaks
  – generally included automatically with block-level tags
  – Not so with inline tags
• Special characters
  – <, >, & and " are special characters
  – To display these use names (&lt;, &gt;, &amp;, &quot;) or numbers (&#60;, &#62;, &#38;, &#34;)
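
As a small illustration of the special-character rule, the sketch below converts raw text into displayable HTML using Python's standard `html` module. The helper name `to_display_text` is hypothetical.

```python
import html


def to_display_text(raw):
    """Replace <, >, & and quotes with character entity references."""
    return html.escape(raw, quote=True)


if __name__ == "__main__":
    print(to_display_text('<TABLE BORDER="1"> & more'))
    # Named and numeric references decode back to the same characters
    print(html.unescape("&lt;&#60; &amp;&#38;"))
```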

Common Tags

• <HTML> Optional tag indicating content type
• <TITLE> Title of a web page
• <BODY> Content of a web page
• <Hn ALIGN=direction>
  – Level 1 to 6 of header (Times New Roman 24, 18, 14, 12, 10 and 8 points)
  – direction = left, right or center
• <P ALIGN=direction>
  – Space between paragraphs

Text Formatting – Font, Size

• Specifying Font (deprecated in HTML 4.0)
• <FONT SIZE="value" FACE="name1, name2" COLOR="value">
  – Size value may be 1 to 7 (Times 8, 10, 12, 14, 18, 24, 36)
  – Size may also be +n or –n to specify a point higher or lower
    • Also may be altered with <BIG> or <SMALL> tags
  – If name1 is not available on system, select name2
    • More alternatives may be specified
  – If none of the alternatives available, choose default
• You may set default size for entire document using <BASEFONT SIZE="value">

Text Formatting – Color

• Color value can be specified
  – using either #rrggbb value
  – or using "color" for one of 16 predefined colors
• <BODY TEXT="value">
  – Sets the default color for text in the document
• <FONT COLOR="value">
  – Sets the color for the content of this element

Text Formatting – Miscellaneous

• <SUB> for subscript
• <SUP> for superscript
• <STRIKE> for strikeout
• <U> for underline
• <B> or <STRONG> for boldface
• <I> or <EM> for italics
• <CODE>, <KBD>, <SAMP>, <TT> for monospace
• <BLINK> for blinking text
• <!-- to start comments and --> to end them
• All these tags have a start and end tag

Links

• Links are used to relate documents together
  – to navigate, to view, to take some action, etc.
• Link has three parts: destination, label and target
  <A HREF="anotherPage.html">Next</A>
  – HREF provides the destination, Next is the label
  – A special attribute called TARGET may be used to tell browser to display in another frame or new window (_blank)
    • target names are case sensitive
    • <BASE TARGET="…"> in head section sets default target for page
• Good practice to use relative URL
  – use absolute for outside web pages
• Links may be of other types: ftp, news, mailto, etc.

Links and Anchors

• You may define an anchor within a document
  – <A NAME="anchorName">…</A>
• You may link to that location in document by
  – <A HREF="#anchorName">label</A>
  – <A HREF="URL#anchorName">label</A>
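
How a relative URL or an anchor reference is resolved against the current page can be sketched with Python's standard `urllib.parse`. The base URL `http://example.com/docs/page.html` and the helper name `resolve` are assumptions made for illustration.

```python
from urllib.parse import urljoin, urldefrag


def resolve(base_url, href):
    """Resolve an HREF value the way a browser would against the page URL."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    base = "http://example.com/docs/page.html"  # hypothetical current page
    print(resolve(base, "anotherPage.html"))    # relative link
    print(resolve(base, "#anchorName"))         # anchor within the same page
    # urldefrag splits a URL into the document part and the anchor name
    print(urldefrag(resolve(base, "other.html#anchorName")))
```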

Tables

<TABLE>
  <TR>
    <TD>cell 1 content</TD>
    <TD>cell 2 content</TD>
  </TR>
  …
</TABLE>

• TABLE attribute BORDER=n defines thickness – default is 2
  – If you do not specify, the border is drawn with space, not line
  – to add extra space around table, use HSPACE or VSPACE
• TABLE attribute ALIGN=center will center the table
• TABLE or TD attribute WIDTH=n sets the table or cell width in pixels
  – the specified size is ignored if the space is too small for the contents
• Attribute of TD, COLSPAN=n specifies number of columns to span
  – use ROWSPAN to span across rows
• Use <TH> for table header, centered and boldface
• Use <CAPTION> for a table caption
  – attribute ALIGN=direction (top, bottom, left, right)

Lists

• You may create (un)ordered lists and definition lists
  – May be plain, numbered, bulleted

<OL TYPE=X>
  <LI> list item 1</LI>
  <LI> list item 2</LI>
</OL>

  – Type is optional (defaults to 1 for numbers)
  – A for capital letters, a for small letters, I for capital roman numerals, i for small roman numerals
  – Use START=n for initial value for list item
    • always numeric and converted automatically to proper type
  – In LI, may override TYPE, VALUE for this & following items

Unordered List

• Use <UL> to create unordered list
• Use attribute TYPE=shape for bullet type
  – disc for solid round bullet (default for 1st level)
  – circle for an empty round bullet (default for 2nd level)
  – square for square bullets (default for >= 3rd level)
• <LI> may override the type

Definition Lists

• Great to create lists that describe items
  – Like glossaries

<DL>
  Text here will appear on own line
  <DT>Text To Appear On Own Line Aligned Left</DT>
  <DD> Definition text </DD>
  …
</DL>

  – You may have multiple DTs and DDs to allow multiple words or definitions

Images

• HTML tag IMG allows placement of images
• <IMG SRC="LocationAndNameOfImageFile">
• Attributes
  – BORDER="n"
  – ALT="tooltip or alternate text"
    • specify a text that may appear instead of image
    • this also serves as a tool tip on Windows
    • a required attribute in HTML 4
  – WIDTH="x" HEIGHT="y"
    • allows browser to optimize size for image while displaying text
  – LOWSRC
    • specify a fast load low resolution image to be shown first
    • high resolution image is loaded slowly replacing the low resolution image
  – ALIGN
    • align left or right to allow text wrapping around image
  – HSPACE="pixel" VSPACE="pixel"
    • Provides padding on sides (horizontal and vertical) around image

BR, CLEAR and Text Wrapping

• <BR> command provides a line break
• CLEAR attribute says do not begin text until the specified margin is clear
  – <BR CLEAR="left">
    • Do not begin text until left margin is clear of images
  – <BR CLEAR="right">
    • Do not begin text until right margin is clear of images
  – <BR CLEAR="all">
    • Do not begin text until both margins are clear of images

Forms

• Form has three parts
  – FORM tag with URL of the action script
  – form elements, text, radio buttons, etc.
  – Submit button to send data to the script

<FORM METHOD=POST ACTION="scriptURL">…</FORM>

• The method may be POST or GET
  – GET is limiting in the amount of information sent
    • sent as part of query string
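
To make the GET behavior concrete, the sketch below builds the URL a browser would request when a form is submitted with METHOD=GET. It uses Python's standard `urllib.parse`; the action URL, field names and the helper `build_get_url` are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def build_get_url(action_url, fields):
    """Append name=value pairs to the action URL as a query string."""
    return action_url + "?" + urlencode(fields)


if __name__ == "__main__":
    url = build_get_url("http://example.com/scriptURL",
                        {"name": "p01", "note": "a b&c"})
    print(url)
    # The server-side script recovers the fields from the query string
    print(parse_qs(url.split("?", 1)[1]))
```

Because every field travels in the URL itself, the amount of data is bounded by what browsers and servers accept as a URL, which is the limitation noted for GET.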

FORM elements

• Elements are created using
  <INPUT TYPE="type" NAME="name" VALUE="initvalue">
  – name and user given value are sent as name=value
  – Use attributes DISABLED or READONLY if desired
• Text box
  – TYPE="text"
  – Attributes: SIZE="n" MAXLENGTH=n
  – last two attributes are in number of characters, optional
  – SIZE defaults to 20
• Password box
  – A text box where what you type is not shown (asterisks)
  – Not encrypted when sent to server, though

FORM elements…

• Radio button
  – TYPE="radio"
  – NAME="radioset"
    • where radioset is group name for mutually exclusive buttons
    • verifies that only one of the group is set
    • This is the name sent to server side script, as well
  – attribute CHECKED if you like button checked initially
  – VALUE="value" is the value sent if this button is checked
• Check box
  – TYPE="checkbox"
  – attribute CHECKED if you like button checked initially
  – VALUE="value" is the value sent if this button is checked

FORM elements…

• Uploading files
  – TYPE="file"
  – NAME="title" for server to identify
  – SIZE=n number of chars of field to enter path/file
    • default 20
  – In the FORM tag, use attribute ENCTYPE="multipart/form-data"
  – METHOD on FORM should be POST
• Hidden fields
  – Useful to maintain session information
  – TYPE="hidden"

FORM elements…

• Menu
  <SELECT NAME="name" SIZE="n" MULTIPLE>
    <OPTION SELECTED VALUE="value">label</OPTION>
    …
  </SELECT>
  – SIZE is height in lines
  – SELECTED is optional, initial selection of menu item
• Text Area
  – When one line is not enough
  – <TEXTAREA NAME="name" ROWS="n" COLS="n" WRAP>
  – ROWS defaults to 4 and COLS to 40, WRAP optional
  – User may provide up to 32,700 chars

FORM elements…

• Submit button
  <INPUT TYPE="submit" VALUE="button text">
  – if you do not provide value, the word Submit appears
  – if you set the name attribute, value is sent to server
• Use TYPE="reset" to provide a clear/reset button
• HTML 4 adds BUTTON tag that allows you to
  – change the font
  – background color
  – image

<BUTTON TYPE="submit" NAME="name" VALUE="value"
        STYLE="font: size FontName; background: color">
  Text to left of image <IMG SRC="imageFileName"> Text to right of image
</BUTTON>

FORM elements…

• You may also use an image to send information
• <INPUT TYPE="image" SRC="imageFileName" NAME="name">
• Mouse coordinate on which user clicks is sent
  – as name.x and name.y
  – Top-left of image is (0, 0)

Organizing Form Elements

• You may put a box around elements

<FORM…>
  <FIELDSET>
    <LEGEND ALIGN=right>box caption</LEGEND>
    … elements …
  </FIELDSET>
  … other fieldsets
</FORM>

• Simply surround elements with FIELDSET element

Running a Script on Input

• It is useful to run a script when user makes a selection
  – JavaScript is the default scripting language
• Simply add an attribute of an event type to the tag
• Specify the code to execute
  – You may either type the code right there or refer to it

<BUTTON TYPE="button" NAME="Time"
        ONCLICK="alert('Today is ' + Date())">
  Current Time
</BUTTON>

We will see this put to work in the JavaScript session

HTML Events

• ONBLUR user leaves an element that has focus
• ONCHANGE user modifies content of element (like INPUT)
• ONCLICK / ONDBLCLICK user clicks / double clicks on specified area
• ONFOCUS user selects, clicks or tabs to element
• ONKEYDOWN / ONKEYPRESS user types something in the specified area
• ONKEYUP user releases key after typing
• ONLOAD page is loaded in browser
• ONMOUSEDOWN mouse pressed down over the element
• ONMOUSEMOVE mouse moved over after pointing at element
• ONMOUSEOVER mouse moved onto element
• ONMOUSEOUT mouse moved away from element after being over
• ONMOUSEUP mouse released after the click
• ONRESET form's reset button clicked
• ONSELECT selected one or more words in element
• ONSUBMIT form's submit button clicked
• ONUNLOAD browser loads different page after specified page

Cascading Style Sheets

• HTML allows specification of fonts, colors, etc.
• These may be placed throughout the document
  – results in poor maintainability
  – What if you want to change these?
• This is where CSS comes in
• You specify the formatting or styling separately in
  – the top of the document
  – or in a separate document

CSS: Specifying Style

• Instead of defining style all over document,
• specify at the top and simply refer to it in document
• Specification has two parts:
  – selector
    • this is a name you associate a style with
  – declarations
    • this is definition of how it should look
• The specification may be local, internal or external
• The cascade:
  – local overrides internal which in turn may override external specifications

CSS: Local Style

• This style applies to the element on which it is declared
• This takes a local effect
• Useful to alter the style specified internally in the document or externally from another file

CSS: Internal Style

• Specified between the <HEAD> and the </HEAD>
• Provide one or more selectors
  – Separate by comma for declarations to apply to all of the selectors
  – Separate by space if declarations are to apply only to nested selectors and not to other appearances
• Provide the declarations
  – within the {}, separated by ;

CSS: External Style Sheet

• Writing the style in a separate file allows sharing of the style and applying it to more than one page
• Pages link the style sheet that specifies the style
• You may apply internal style sheet as well as local at the same time

CSS: Defining Classes

• You can define a class or category and style for that class
• Any element defined to be part of that class will use the specified style for that class
• Classes are defined to belong to a certain selector type using the format selectorName.className

CSS: Defining IDs

• ID can be defined for individual elements in your document
  – The ID must be unique
• Style can be specified for that tag/element
  – Tag name followed by # followed by the ID
• The style applies only for that element with that ID
• Scripts may also identify that element in document

CSS: DIV and SPAN

• Style may be specified on pre-defined tags, like Hn and P
  – how to apply style on a wide range of items?
• DIV and SPAN allow you to define areas of document over which a style may be applied
• DIV is a block-level tag while SPAN is an inline tag

CSS: Font Styles

• font-family
  – specify a list of fonts to choose from
  – font-family: "Times Roman", "Helvetica", "Arial"
• font-style
  – specify whether font should be italic, oblique, or normal
  – font-style:italic
  – to remove italic font-style:normal
• font-weight
  – specifies boldness of text; possible values: bold, bolder, lighter
  – or multiple of 100s between 100 and 900, with 400 for book weight and 700 for bold
  – normal will remove bold
• font-size
  – specify absolute font size: xx-small, x-small, small, medium, large, x-large, xx-large
  – specify relative font size: larger, smaller
  – exact point size: 18pt
  – percentage relative size: 200%

CSS: Font Style…

• line-height
  – specifies the space between lines (leading) within a paragraph
  – line-height:15pt or line-height:50%
• All the font styles may be specified in one shot as well, using the font property
  – Specify in the following order, space separated:
    • font-style font-variant font-weight font-size/line-height font-family
  – / separates font-size from line-height

CSS: Text Color Style

• color
  – specify one of 16 colors or #rrggbb or rgb(r, g, b) or rgb(r%, g%, b%)
• background
  – transparent or a color value
  – url(image.gif) to specify an image file name
  – repeat to tile the image, repeat-x for horizontal tiling, repeat-y for vertical tiling
  – fixed or scroll for background to scroll along canvas
  – x y for position of background image from top-left corner

CSS: Text Spacing Style

• word-spacing
• letter-spacing
• text-indent
• white-space
  – pre to preserve extra spaces; nowrap to keep elements on same line; normal to return to normal behavior
• text-align
  – left, center, right, justify
• text-decoration
  – underline, overline, line-through, none, blink
    • blink not supported by IE, generally not recommended as well
• text-transform
  – capitalize, uppercase, lowercase, none
• font-variant:small-caps displays lowercase letters as capitals at a smaller size

Markup and XML

• Markup
  – conveying metadata with literals/tags to delimit, describe
  – Generalized Markup Language (GML)
  – Standard Generalized Markup Language (SGML)
    • adopted by ISO
    • Widely used, but too complex
• eXtensible Markup Language (XML)
  – designed by World Wide Web Consortium (W3C)
  – subset of SGML
  – simpler to read, write and develop parsers

Why XML?

• HTML is de facto standard for markup
  – Markup for information presentation
  – Talks about how information looks, is presented
  – Does not let you add more markups of your own
• What about the information itself?
• Need to
  – describe information
  – Extend the descriptions
  – Must be structured, easy to express and validate

What is XML?

• XML is about extensibility and flexibility
• tags describe and surround the data
• Example:

<?xml version = "1.0" ?>
<equipment>
  <pump>
    <name> p01 </name>
    <pressure units="psi"> 32.23 </pressure>
  </pump>
  <pump>
    <name> p02 </name>
    <pressure units="psi"> 22.887 </pressure>
  </pump>
</equipment>

• Open, extensible
• Platform independent
• Self describing data
  – Data Exchange
• Supports query and discovery of data
• Dynamic Data Exchange
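
As one illustration of "self describing data", the equipment document from this slide can be read by a program that knows only the tag names. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `read_pumps` and the tuple layout it returns are assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The example document from the slide, reproduced as a string
EQUIPMENT_XML = """<?xml version = "1.0" ?>
<equipment>
  <pump>
    <name> p01 </name>
    <pressure units="psi"> 32.23 </pressure>
  </pump>
  <pump>
    <name> p02 </name>
    <pressure units="psi"> 22.887 </pressure>
  </pump>
</equipment>"""


def read_pumps(xml_text):
    """Return (name, pressure, units) for every pump element."""
    root = ET.fromstring(xml_text)
    pumps = []
    for pump in root.findall("pump"):
        pressure = pump.find("pressure")
        pumps.append((pump.findtext("name").strip(),
                      float(pressure.text),
                      pressure.get("units")))
    return pumps


if __name__ == "__main__":
    for name, value, units in read_pumps(EQUIPMENT_XML):
        print(name, value, units)
```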

What does XML provide?

• Tags delimit content
  – lets you define structure of arbitrary complexity
• Self Describing Data
  – tags describe and name the data being defined
  – name related to the information it models/represents
• standard eXtensibility
  – in defining new tags & semantics
• Vocabularies
  – description of data used for information exchange
  – within specific domains
• Separates contents from presentation

XML System

[Diagram boxes: XML Document; XML Constraint (DTD, Schema); XML Parser/Processor/Styling; XML App]

Features of XML technologies

• Well-Formed syntax
• Document Type Definitions (DTDs)
  – Captures rules added to extend core syntax rules
• Document Object Model (DOM)
  – API for manipulating, parsing, creating XML documents
  – provides a tree-structured view of the document
  – Standard API
• Simple API for XML (SAX)
  – Provides events as document is being parsed
  – Leaves it to application to keep state and content information
• Styling and Transformation (XSL and XSLT)

The Markup Syntax

• XML Entity
  – A file or stream with a well-formed structure
• Tags delimit the elements of the structure
• XML Tags are case-sensitive
• XML uses Unicode character set
• Names are used to identify structures
  – Names begin with letter, underscore or colon
    • Followed by any chars, including numbers, hyphen & period

[Diagram labels: Start Tag, Attributes, Content, End Tag]

Structure of a Document

• Prolog (optional): comments, processing instructions
  – Document Type Declaration: comments, processing instructions, Document Type Definitions
    • Element Declarations
    • Attribute Declarations
    • Entity Declarations
    • Notation Declarations
• Body: Root Element, comments, processing instructions
  – Elements
    • Attributes: CDATA, Entities, ID, …
    • PCDATA, Entity References
    • Entity References, CDATA Sections

Markups that go in XML Document

• The following tags may be contained in any XML document
  – Element start and end tags
  – Attributes
  – Comments
  – Entity references
  – Processing instructions
  – Character data sections (CDATA)
  – Document type declarations

A Sample XML File

Elements

• Building blocks of an XML document
• Element content may include
  – Other elements
  – Character data
  – Character references
  – Entity references
  – Processing instructions
  – Comments
  – CDATA sections
• Empty elements may be abbreviated to save space
  – <ElementTypeName/> indicates an empty element

StartTag            Content    EndTag
<ElementTypeName>   …          </ElementTypeName>

Document and Elements

• XML document may be viewed as a hierarchical tree

[Diagram: Document (Document Root) contains Prolog, Document Element and Epilog; an Element contains Element*; the connector represents containment/aggregation]

Contents

• Element content
  – Contains other elements but no character data
• Mixed content
  – Contains character data and other elements
• Character content
  – Contains nothing but character data
• Empty element
  – Contains nothing

Nesting

• XML requires proper nesting of elements
• Items must be fully contained within their nested level
• XML is strict about proper nesting, unlike HTML
  – Allowing ambiguity leads to programming complexity
  – Keep-it-simple policy
  – Gives a not-well-formed error if encountered
  – Results in fatal error/termination of parsing

Name

• A name
  – begins with an alphabetic character or an underscore
  – followed by alphanumeric characters, periods (full stops), hyphens or underscores

Name = (Letter | '_') (Char)*
Char = Letter | Digit | '.' | '-' | '_'

XML String Literals

• Literals are delimited by apostrophe or quote
  – "hello" 'hi'
• Character used as delimiter can't appear in literal
  – "George, What's up!"
  – 'He said "what a nice day!"'
  – Following is not valid: 'what's up'
• XML has no escape character for the delimiter
  – an apostrophe may appear as is inside a quote-delimited literal, and a quote inside an apostrophe-delimited literal
  – the delimiter itself can only be written as an entity reference
• What if you need to use both apostrophe and quote?
  – You may use entity references: &apos; or &quot;
    • 'I asked George, What&apos;s up, "He said, fine"'
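
Choosing a delimiter for a literal can be automated. The sketch below relies on `quoteattr` from Python's standard `xml.sax.saxutils`, which picks quote or apostrophe depending on the content and falls back to `&quot;` when both characters occur. It shows one possible strategy under those assumptions, not the only valid one.

```python
from xml.sax.saxutils import quoteattr

if __name__ == "__main__":
    # Only an apostrophe inside: the literal is delimited by quotes
    print(quoteattr("George, What's up!"))
    # Only quotes inside: the literal is delimited by apostrophes
    print(quoteattr('He said "what a nice day!"'))
    # Both inside: quotes are used as delimiter, inner quotes become &quot;
    print(quoteattr('I asked George, What\'s up, "He said, fine"'))
```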

Attributes

• Element generally describes & contains information
• Attributes provide information that is part of the element rather than being contained in it
  – Generally talks about the information format, etc.
• Name-value pair
  – attributeName="value"
  – attributeName='value'
  – The value must be a string literal; unquoted numbers are not allowed
  – An attribute may appear only once within a tag
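
The two attribute rules (quoted values, no repeated names) can be checked with any conforming parser. A minimal sketch using Python's standard `xml.etree.ElementTree`; the helper `attributes_of` is hypothetical and returns `None` when the tag is rejected.

```python
import xml.etree.ElementTree as ET


def attributes_of(tag_text):
    """Return the attributes of a single element, or None if it is not well-formed."""
    try:
        return dict(ET.fromstring(tag_text).attrib)
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(attributes_of('<pressure units="psi" scale=\'1\'/>'))  # accepted
    print(attributes_of('<pressure units="psi" units="bar"/>'))  # repeated name
    print(attributes_of('<pressure scale=1/>'))                  # unquoted value
```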

Special Attributes

• xml:space
  – White spaces are not generally preserved
  – How does one indicate that there is a space?
  – xml:space tells that a space is encoded into the document
  – Recommends that the space must be preserved
  – Applications may choose to honor or ignore the space
  – Must take a value of "preserve" or "default"
• xml:lang
  – Indicates the language/locale info of the XML document
• If present, these two attributes apply on all nested elements as well

Special Characters

• White spaces:
  – Horizontal Tab (09), Line-feed (0A), Carriage-return (0D), space (20)
  – Parsers preserve white spaces within element content
  – May remove from attributes and element tags
• End-of-line
  – End of line is generally indicated by
    • A carriage-return followed by line-feed
    • Only a line-feed
    • Only a carriage-return
• XML parsers required to convert to single line-feed
  – UNIX-style favored
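
The end-of-line rule can be observed directly. The sketch below feeds all three line-ending styles to Python's standard `xml.etree.ElementTree` (an expat-based parser, used here as one example of a conforming processor); `element_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def element_text(xml_text):
    """Return the character data of the root element as the parser reports it."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # CR LF, lone CR and lone LF all arrive as a single line-feed
    print(repr(element_text("<a>one\r\ntwo\rthree\nfour</a>")))
    # Other white space inside element content is preserved
    print(repr(element_text("<a>  x\t</a>")))
```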

Character References

• Character References
  – Represent displayable characters that can't be placed in a well-formed document as is
  – The character may be represented using
    • &# prefixed before a decimal number representing char
    • &#x prefixed before a hexadecimal number representing char
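
A short sketch of both notations, using Python's standard library. `char_ref` is a hypothetical helper that builds a reference from a character's code point, and the parser call shows the reverse direction.

```python
import xml.etree.ElementTree as ET


def char_ref(ch, hexadecimal=False):
    """Build a decimal (&#N;) or hexadecimal (&#xN;) character reference."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


if __name__ == "__main__":
    print(char_ref("A"), char_ref("B", hexadecimal=True))
    # A parser replaces each reference with the character it stands for
    print(ET.fromstring("<c>&#65;&#x42; &#169;</c>").text)
```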

Entity References

• Entity References
  – Think of these as macro definitions
  – Allows insertion of string literals
  – Provides mnemonic equivalence
  – Starts with an & and ends with a ;
  – Predefined entity references:
    • &amp;, &lt;, &gt;, &apos;, &quot;
• Rather than repeating content, you can refer where to find it
  – Declare the substitution text in doctype
  – Refer to it by &name;
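
The macro-like behavior can be sketched with an internal entity declared in the doctype. The entity name `company` and its text are made up for illustration; the parser is Python's standard `xml.etree.ElementTree`, which expands internal entities while parsing.

```python
import xml.etree.ElementTree as ET

# "company" is a hypothetical entity declared in the internal DTD subset
MEMO_XML = """<!DOCTYPE memo [
  <!ENTITY company "Hypothetical Pumps Inc.">
]>
<memo>&company; &lt;sales&gt; &amp; &apos;support&apos;</memo>"""

if __name__ == "__main__":
    # Declared and predefined entity references are both replaced by their text
    print(ET.fromstring(MEMO_XML).text)
```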

Processing Instructions

• Processing Instructions (PI) allow you to provide hints to applications as part of the document
• PI consists of two things:
  – a target tag followed by instruction
    • <?target instruction ?>
  – The target tag is an XML name that identifies the application the instruction is intended for
  – Instruction is a string literal
• To avoid confusion with
  – <?xml version = "1.0" ?>
  – the PI target can't be the string "xml" or "XML"

XML Comments

• Comments may be present anywhere in a document
  – Except as part of other markup
• Comments start with <!-- and end with -->
• May contain any string that
  – does not contain --
  – does not end with -
• Entities within comments are not expanded
• Markups within comments are not interpreted

CDATA Sections

• CDATA sections are bulk of document that will not be interpreted for markup
  <![CDATA[ non parsed data ]]>
• Starts with the tag:
  – <![CDATA[
• Ends with the tag
  – ]]>
• The contained text can't have
  – String that contains the delimiter ]]>
  – Nested CDATA
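
A minimal sketch of writing and reading a CDATA section with Python's standard library. `wrap_cdata` is a hypothetical helper that enforces the one restriction on the contained text; the `code` element name is likewise made up.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in a CDATA section, refusing text that contains the end delimiter."""
    if "]]>" in text:
        raise ValueError("CDATA content must not contain ']]>'")
    return "<![CDATA[" + text + "]]>"


if __name__ == "__main__":
    source = "if (a < b && c > d) { run(); }"
    document = "<code>" + wrap_cdata(source) + "</code>"
    # The parser hands back the text untouched; < and & are not treated as markup
    print(ET.fromstring(document).text)
```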

Prolog

• Optional member of an XML document
• Provides hints and information on encoding methods
• Contains
  – Optional XML declaration
  – Optional comments (several)
  – PIs
  – White space characters
  – Optional Document Type Declarations (not DTDs)
    • Ties DTD to the document

XML Declaration

• XML declaration is optional
• If present
  – Must be the first in the document
    • No comments or white spaces allowed to precede
  – The xml tag must be lowercase
    • <?xml version="1.0" ?>
• Attributes:
  – version: required. For future versions
  – encoding: optional. UTF-8, UTF-16, ISO-8859-1 (Latin-1), etc.
  – standalone: optional. yes or no (external DTD required)

Epilog

• Optional member of an XML document
• Contains
  – Optional comments (several)
  – PIs
  – White space characters
• Use of this is ambiguous since it is optional and most applications may not wait for reading this

Well-formed Document

• An XML document is said to be well-formed if
  – The document syntax conforms to XML specifications
  – Elements form a hierarchical tree with a single root node
  – There are no references to external entities
    • Unless DTD is provided
  – A well-formed XML document is
    • case sensitive
    • expects you to close tags
    • does not allow overlapping tags
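
These rules can be tried out with a non-validating parser. The sketch below wraps Python's standard `xml.etree.ElementTree` in a hypothetical helper, `is_well_formed`, and runs it on one document per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<a><b/></a>",      # properly nested, single root
        "<a></A>",          # case mismatch between start and end tag
        "<a><b></a>",       # b is never closed
        "<a><b></a></b>",   # overlapping tags
        "<a/><b/>",         # more than one root element
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```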

Parsers

• An XML Processor or Parser is an application that will read through an XML document and interpret it
• Parser Types
  – Non-validating
    • Ensures data object/document is well-formed XML
  – Validating
    • Validates, using DTD, well-formed data object's form and content
• Parser Implementations
  – Event-driven Parsers
    • Parser calls back into application as it identifies data
    • Applications handle the data
    • Parser does not keep the tree structure or the data upon parsing
    • Memory resource usage is minimal
  – Tree-based Parsers
    • A tree structure of the document is built in memory
    • This tree is then manipulated using an interface
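
The two implementation styles can be compared side by side. The sketch below lists element names once through an event-driven parser (`xml.sax`) and once through a tree-based one (`xml.dom.minidom`), both from Python's standard library. `NameCollector` and the two function names are hypothetical.

```python
import xml.sax
from xml.dom import minidom


class NameCollector(xml.sax.ContentHandler):
    """Event-driven: the parser calls back; the application keeps its own state."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names_event_driven(xml_text):
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


def element_names_tree_based(xml_text):
    # Tree-based: the whole document is built in memory, then queried
    document = minidom.parseString(xml_text)
    return [node.tagName for node in document.getElementsByTagName("*")]


if __name__ == "__main__":
    text = "<equipment><pump><name>p01</name></pump></equipment>"
    print(element_names_event_driven(text))
    print(element_names_tree_based(text))
```

Both calls visit the same elements; the difference is that the SAX handler never holds more than the names it chose to keep, while the DOM version keeps the full tree available for later manipulation.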

XML Parsers

• Several parsers available in the market
  – Xerces (Apache)
  – JAXP (More of an API from Sun)
  – MSXML (Microsoft)
  – Expat (James Clark)
  – RXP (Richard Tobin)
  – XP (James Clark)
  – XML4J (IBM)
  – XML::Parser (Clark Cooper)
  – Pyexpat (Jack Jansen)
  – Lark (Tim Bray)
  – TclXML (Steve Ball)

Major APIs• DOM API

• SAX API

• JDOM

• XSLT

• XPath