Web engineering UNIT IV as per RGPV syllabus

Unit-IV/Web Engineering Truba College of Sc. & Tech., Bhopal

Prepared By: Ms. Nandini Sharma (CSE DEPT.)

INTRODUCTION

XML (Extensible Markup Language) is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. For example, computer makers might agree on a standard way to describe the information about a computer product (processor speed, memory size, and so forth) and then describe that product information format with XML. Such a standard way of describing data would enable a user to send an intelligent agent (a program) to each computer maker's Web site, gather data, and then make a valid comparison. XML can be used by any individual, group, or company that wants to share information in a consistent way.

XML, a formal recommendation from the World Wide Web Consortium (W3C), is similar to the language of today's Web pages, the Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe the contents of a page or file. HTML, however, describes the content of a Web page (mainly text and graphic images) only in terms of how it is to be displayed and interacted with. For example, the letter "p" placed within markup tags starts a new paragraph. XML describes the content in terms of what data is being described. For example, the word "phonenum" placed within markup tags could indicate that the data that followed was a phone number. This means that an XML file can be processed purely as data by a program, stored with similar data on another computer, or, like an HTML file, displayed. For example, depending on how the application in the receiving computer wanted to handle the phone number, it could be stored, displayed, or dialled.

XML is "extensible" because, unlike HTML, the markup symbols are unlimited and self-defining. XML is actually a simpler and easier-to-use subset of the Standard Generalized Markup Language (SGML), the standard for how to create a document structure. It is expected that HTML and XML will be used together in many Web applications. XML markup, for example, may appear within an HTML page.

Early applications of XML include Microsoft's Channel Definition Format (CDF), which describes a channel, a portion of a Web site that has been downloaded to your hard disk and is then updated periodically as information changes. A specific CDF file contains data that specifies an initial Web page and how frequently it is updated. Another early application is ChartWare, which uses XML as a way to describe medical charts so that they can be shared by doctors. Applications related to banking, e-commerce ordering, personal preference profiles, purchase orders, litigation documents, part lists, and many others are anticipated.

VALIDATING XML FILES

When you validate your XML file, the XML validator checks that your file is valid and well-formed. The Eclipse XML editor, which the following steps assume, will still open XML files that are invalid or not well-formed: it uses heuristics to interpret the tagging as well as it can. For example, an element with a missing end tag is simply assumed to end at the end of the document. As you make updates to a file, the editor incrementally reinterprets your document, changing the highlighting, tree view, and so on. Many formation errors are easy to spot in the syntax highlighting, so you can correct obvious errors on the fly. In other cases, however, it is worth performing formal validation on your documents.

You can validate your file by selecting it in the Navigator view, right-clicking it, and clicking Validate. Any validation problems are listed in the Problems view. Double-clicking an individual error takes you to the invalid tag in the file so that you can correct it.

Note: If you receive an error message indicating that the Problems view is full, you can increase the number of error messages allowed by clicking Window > Preferences and selecting General > Markers. Select the Use marker limits check box and change the number in the Limit visible items per group field.

You can set up a project's properties so that different types of project resources are automatically validated when you save them. From a project's pop-up menu, click Properties, then select Validation. Any validators you can run against your project will be listed in the Validation page.

The purpose of a Document Type Definition (DTD) is to define the structure of a document encoded in XML (Extensible Markup Language).

For introductory material about XML, see the Introduction at the start of this unit.

It is possible to build and use files containing XML tags without ever defining which tags are legal. However, if you want to ensure that files conform to a known structure, writing a DTD is the preferred method.

A well-formed file is one that obeys the general XML syntax rules: tags must be properly nested, every opening tag must have a matching closing tag, empty tags must end with '/>', attribute values must be quoted, and there must be exactly one root element.

A valid file is not only well-formed but also conforms to a DTD (declared inside the document or referenced by it) that specifies which tags it uses, what attributes those tags can contain, and which tags can occur inside which other tags, among other properties.

The advantage of a valid file is that its contents are more predictable for applications that want to process or present that file. The DTD ensures that only certain tags can be used in certain places.
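Well-formedness can be checked by any XML parser. As a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module, the hypothetical helper is_well_formed simply tries to parse the text. The standard-library parser checks well-formedness only; checking validity against a DTD needs a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML.

    Only well-formedness is checked: the standard-library parser
    does not validate the document against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<memo><from>Ann</from><to>Raj</to><br/></memo>"
bad_nesting = "<memo><from>Ann</to></from></memo>"  # closing tags cross over
unclosed = "<memo><from>Ann</from>"                  # </memo> is missing

print(is_well_formed(good))         # True
print(is_well_formed(bad_nesting))  # False
print(is_well_formed(unclosed))     # False
```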

DEFINITIONS

We need to review some terminology before proceeding:

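A proper XML name must start with a letter, an underscore (_), or a colon (:); the remaining characters may be letters, digits, underscores, hyphens (-), periods (.), or colons. (Colons are normally reserved for namespace prefixes, so avoid them in your own names.)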

A tag is one of the XML constructs used to mark up documents. All tags start with a less-than symbol (<) and end with a greater-than symbol (>).

An element is a section of an XML document that acts as a unit. It may be an empty element, or it may have content.

An empty element consists of a single tag of the form

<gi.../>

where gi is the tag type (or "generic identifier"), and the tag may include attributes. Note the slash before the closing ">"; this signifies an empty tag.

An opening tag begins a section of an XML document that ends with the corresponding closing tag. An opening tag has this form:

<gi...>

where gi is the tag type (or "generic identifier"), and the tag may include attributes. A closing tag has the form:

</gi>

The content is everything between the opening tag and its corresponding closing tag. The content may be other elements, plain text, or a mixture of both.
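The naming rule from the definitions above can be expressed as a regular expression. The Python sketch below is an ASCII-only approximation: the full XML 1.0 Name production also admits many non-ASCII letters, and is_xml_name is a hypothetical helper name used only for illustration.

```python
import re

# ASCII-only approximation of the XML 1.0 Name rule (hypothetical helper).
# First character: letter, underscore, or colon.
# Later characters: letters, digits, underscore, hyphen, period, or colon.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if candidate is a legal (ASCII) XML name."""
    return NAME_RE.fullmatch(candidate) is not None


for name in ["start-time", "_id", "Olive_Oyl", "2nd-item", "first name"]:
    print(f"{name!r}: {is_xml_name(name)}")
```

Names such as 2nd-item (starts with a digit) and first name (contains a space) are rejected, while start-time, _id, and Olive_Oyl are accepted.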

The DTD can contain several different types of declarations:

Element declarations let you specify what kinds of tags can be used, and what (if anything) can appear inside the contents of the element.

Attribute declarations define what attributes you can use inside a given element.

Entity declarations define chunks of fixed text that can be included elsewhere.

Notation declarations define file types (like JPG and WAV files) so you can refer to non-XML files like image and sound files.

ELEMENTS WITH MIXED CONTENT

In general, an element can have any mixture of text and other elements as children. You can specify exactly which elements can be children. If you like, you can even specify that the children must occur in a given order, or that some child elements are optional. So, in the general form of the declaration <!ELEMENT gi (content)>, the content part is an expression: it consists of operators and operands that can be combined in arbitrarily complex ways. Let's start with some simple cases to show you the features of a content declaration, but keep in mind that these features can be used in combination. The simplest case is when an element a has a single child element b:

<!ELEMENT a (b)>

The above declaration in a DTD means that an element <a>...</a> must contain exactly one <b> element.

To specify that a child element can occur one or more times, append a plus sign (+) after the child element name. For example, to say that a <squid> element must contain one or more <tentacle> elements:

<!ELEMENT squid (tentacle+)>

You can also specify that a child element can occur any number of times, or not at all. Append an asterisk (*), meaning "zero or more of the previous", after the child element name:

<!ELEMENT lizard (leg*)> <!-- some <lizard>s have no <leg>s -->

The question-mark suffix (?) means the child element is optional: it can occur zero times or once in the content of the element you're declaring. For example, suppose an <oven> element can either be empty or contain a <pie> element:

<!ELEMENT oven (pie?)>

If you want a certain sequence of children, name the child elements in a comma-separated list. For example, suppose a <memo> element must contain exactly one <from> element, then one <to> element, one <subject> element, and one <message> element:

<!ELEMENT memo (from,to,subject,message)>

You can also use the +, *, and ? operators in such a sequence. For example, suppose that a <memo> must have <from> and <to> elements, but the <subject> element is optional, and the memo can have zero or more <message> elements. You'd then declare it like this:

<!ELEMENT memo (from,to,subject?,message*)>

Sometimes you need to specify that there is a choice of children. The "or" operator (|) separates the choices. For example, suppose that a <trophy> element can have either a child named <bowling> or a child named <tennis>. Here's how you'd declare it:

<!ELEMENT trophy (bowling|tennis)>

You can also apply the usual suffix operators to groups of elements. For example, suppose you have an element <timerecord> that starts with a required <purpose> element, followed by zero or more pairs of <start-time> and <end-time> elements:

<!ELEMENT timerecord (purpose,(start-time,end-time)*)>

Here's another more general example:

<!ELEMENT stock ((pig|chicken|cow)*)>

The above example says a <stock> element can contain any number of the three child elements, in any order.
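Because content models are built from sequence (,), choice (|), and the suffixes ?, * and +, they behave like regular expressions over the names of the child elements. The Python sketch below, with the hypothetical helpers content_model_to_regex and children_match, translates an element-only model (no #PCDATA) into a Python regular expression and tests a list of child names against it. It illustrates the idea only; it is not how a real validating parser is built, and real parsers also enforce the XML rule that content models be deterministic.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model into a regular expression.

    Each child name becomes the group "(?:name )", so a following ?, * or +
    applies to the whole name. A comma (sequence) becomes plain
    concatenation, and | (choice) stays alternation. #PCDATA is not handled.
    """
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # sequence: just concatenate
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)            # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None


memo = "(from,to,subject?,message*)"
print(children_match(memo, ["from", "to", "message", "message"]))  # True
print(children_match(memo, ["to", "from"]))                        # False
print(children_match("(purpose,(start-time,end-time)*)",
                     ["purpose", "start-time", "end-time"]))       # True
print(children_match("(bowling|tennis)", ["bowling", "tennis"]))   # False
```

Each child name is followed by a space in both the pattern and the test string, so a name such as cow can never accidentally match part of cows.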

Moreover, you can allow regular, untagged text to be mixed in with your specified child tags by placing #PCDATA at the start of a list of choices. For example, suppose a <speech> element can contain any mixture of regular text and text tagged with the elements <loud> and <soft>:

<!ELEMENT speech ((#PCDATA|loud|soft)*)>

<!ELEMENT loud (#PCDATA)>

<!ELEMENT soft (#PCDATA)>

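So, the content part of the element declaration can be arbitrarily complex. There are some restrictions on how #PCDATA can be used: in particular, when text is mixed with elements, the model must take the form (#PCDATA|a|b)*, so you cannot constrain the order or number of the child elements. There are also other, less common features you may need; refer to the XML standard or a good book on the subject.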
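When a parser reads mixed content, the untagged text and the child elements arrive interleaved. As a minimal sketch using Python's xml.etree.ElementTree (which stores text before the first child in .text and text after each child in that child's .tail), the hypothetical speech_segments function below walks a <speech> element shaped like the declaration above and labels each piece of text.

```python
import xml.etree.ElementTree as ET


def speech_segments(xml_text: str) -> list:
    """Split a <speech> element's mixed content into (style, text) pairs.

    Assumes <loud> and <soft> contain only text, as declared above.
    """
    speech = ET.fromstring(xml_text)
    segments = []
    if speech.text:                      # text before the first child
        segments.append(("plain", speech.text))
    for child in speech:
        segments.append((child.tag, child.text or ""))
        if child.tail:                   # text that follows this child
            segments.append(("plain", child.tail))
    return segments


doc = "<speech>Hello, <loud>STOP</loud> please <soft>thanks</soft></speech>"
print(speech_segments(doc))
# [('plain', 'Hello, '), ('loud', 'STOP'), ('plain', ' please '), ('soft', 'thanks')]
```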

ATTRIBUTE DECLARATIONS

If an element is to have attributes, the names and possible values of those attributes must be declared in the DTD. Here is the general form:

<!ATTLIST ename {aname atype default} ...>

where ename is the name of the element for which you're defining attributes, aname is the name of one of that element's possible attributes, atype describes what values it can have, and default describes whether it has a default value. The last three items can be repeated inside an <!ATTLIST...> declaration, one group per attribute.

The atype part describing the attribute's type can have three kinds of values:

The keyword CDATA means that the attribute can have any character string as a value.

For example, suppose you want every <play> element to have a title attribute that can contain any text, and that attribute is required. Here is the complete attribute declaration:

<!ATTLIST play title CDATA #REQUIRED>

There are several tokenized attribute types, which are required to have a certain structure. See Tokenized Attributes below.

You can provide a specific set of legal values for the attribute; see Enumerated Attributes below.

The last part of the declaration, default, specifies whether the attribute can be omitted, and what value it will have if omitted. This must be one of the following:

#REQUIRED - The attribute must always be supplied.

#IMPLIED - The attribute can be omitted, and the DTD does not provide a default value. Anyone reading this file may assume a default value, but that is not the DTD's problem.

"value" - The attribute can be omitted, and the default value is the quoted string that you provide.

#FIXED "value" - The attribute always has the given "value". It may be omitted, in which case the processor supplies "value"; if it is given, it must match "value" exactly.
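The four default rules can be summarized in a few lines of code. The Python sketch below is hypothetical: AttributeDecl and apply_defaults are invented names, and the declarations are written by hand rather than read from a real DTD. It shows which attributes an application would end up seeing for one element, and when a validator would report an error. The lang, version, and note attributes are extra examples added to the <play> declaration from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AttributeDecl:
    """One attribute from a hand-written <!ATTLIST> (hypothetical model)."""
    name: str
    default: str                 # "#REQUIRED", "#IMPLIED", "#FIXED" or "value"
    value: Optional[str] = None  # the quoted default, for "#FIXED" and "value"


def apply_defaults(decls: List[AttributeDecl], given: Dict[str, str]) -> Dict[str, str]:
    """Return the attributes an application sees, or raise ValueError."""
    result = dict(given)
    for d in decls:
        supplied = d.name in given
        if d.default == "#REQUIRED" and not supplied:
            raise ValueError(f"missing required attribute {d.name!r}")
        if d.default == "#FIXED" and supplied and given[d.name] != d.value:
            raise ValueError(f"attribute {d.name!r} must be {d.value!r}")
        if d.default in ("#FIXED", "value") and not supplied:
            result[d.name] = d.value  # the parser fills in the declared default
        # "#IMPLIED": nothing is added when the attribute is omitted
    return result


# Mirrors: <!ATTLIST play title   CDATA #REQUIRED
#                         lang    CDATA "en"
#                         version CDATA #FIXED "1.0"
#                         note    CDATA #IMPLIED>
play_decls = [
    AttributeDecl("title", "#REQUIRED"),
    AttributeDecl("lang", "value", "en"),
    AttributeDecl("version", "#FIXED", "1.0"),
    AttributeDecl("note", "#IMPLIED"),
]
print(apply_defaults(play_decls, {"title": "Hamlet"}))
# {'title': 'Hamlet', 'lang': 'en', 'version': '1.0'}
```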

TOKENIZED ATTRIBUTES

You can restrict an attribute to have only values with a certain structure. Here are the possible values of the atype part of the attribute declaration for such attributes:

ID

The value of an ID attribute must be unique within the document; this allows other elements to refer to the element that carries it. The attribute value must also be a valid XML name (see above).

IDREF

The value of an IDREF attribute must match the value of an ID attribute on some element in the same document; it is a reference to that element.

For example, suppose that in your DTD there is a <sailor> element with an ID-type nickname attribute, and another element <duty> with an IDREF-type attribute called sailor-nick. Then if you have an element like this:

<sailor nickname='Bluto'>...</sailor>

then this tag would refer to that element:

<duty sailor-nick='Bluto'>...</duty>

IDREFS

The value of an IDREFS attribute must contain one or more ID references separated by spaces.

Example:

<roster sailor-nicks='Bluto Popeye Olive_Oyl'/>

ENTITY

Use this attribute type to refer to external, non-parsed entities. See the section on notations below.

ENTITIES

Like ENTITY, but the attribute can be a list of one or more entity names separated by spaces.

NMTOKEN

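The attribute value must be a name token: a string made only of characters that are allowed in XML names (letters, digits, underscores, hyphens, periods, and colons). Unlike a name, a name token may begin with any of these characters, so 1998 is a valid NMTOKEN.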

NMTOKENS

Like NMTOKEN, but the attribute value can contain one or more name tokens separated by spaces.
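A validating parser checks these references for you. To show what the check involves, the Python sketch below uses xml.etree.ElementTree, which does not read DTDs, so the caller has to say which attribute plays the ID, IDREF, and IDREFS role. The function dangling_references and the <crew> wrapper element are hypothetical; the sailor data reuses the examples from the text.

```python
import xml.etree.ElementTree as ET


def dangling_references(root, id_attr, idref_attr, idrefs_attr):
    """Return IDREF/IDREFS values that match no ID in the document.

    ElementTree does not read the DTD, so the caller names the attributes
    that play the ID, IDREF and IDREFS roles. Raises ValueError on a
    duplicate ID.
    """
    ids = set()
    for el in root.iter():
        value = el.get(id_attr)
        if value is None:
            continue
        if value in ids:
            raise ValueError(f"duplicate ID {value!r}")
        ids.add(value)

    missing = []
    for el in root.iter():
        refs = []
        if el.get(idref_attr) is not None:
            refs.append(el.get(idref_attr))
        if el.get(idrefs_attr) is not None:
            refs.extend(el.get(idrefs_attr).split())  # IDREFS: space-separated
        missing.extend(r for r in refs if r not in ids)
    return missing


crew = ET.fromstring("""<crew>
  <sailor nickname='Bluto'/>
  <sailor nickname='Popeye'/>
  <duty sailor-nick='Bluto'/>
  <roster sailor-nicks='Bluto Popeye Olive_Oyl'/>
</crew>""")
print(dangling_references(crew, "nickname", "sailor-nick", "sailor-nicks"))
# ['Olive_Oyl']
```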

ENUMERATED ATTRIBUTES

You can specify that attributes must have one of a set of one or more values. Here is the general form of the atype part of the <!ATTLIST...> declaration:

(value1|value2|...)

For example, suppose you want your <vehicle> element to have a kind attribute that must have a value of either "car", "truck", or "boat":

<!ATTLIST vehicle kind (car|truck|boat) #REQUIRED>

You can also supply a default value in quotes. For example:

<!ATTLIST vehicle kind (car|truck|boat) "car">
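An enumerated attribute with a default behaves like a small lookup: use the default if the attribute is missing, otherwise check the value against the list. The following Python sketch, with the hypothetical helper effective_enum_value, mirrors the second <vehicle> declaration above (and the first one when no default is passed).

```python
from typing import Dict, Optional, Tuple


def effective_enum_value(attrs: Dict[str, str], name: str,
                         allowed: Tuple[str, ...],
                         default: Optional[str] = None) -> str:
    """Value an application sees for an enumerated attribute.

    default=None models #REQUIRED; a string models a quoted default.
    """
    value = attrs.get(name, default)
    if value is None:
        raise ValueError(f"attribute {name!r} is required")
    if value not in allowed:
        raise ValueError(f"{name!r} must be one of {allowed}, not {value!r}")
    return value


KINDS = ("car", "truck", "boat")
print(effective_enum_value({}, "kind", KINDS, default="car"))  # car
print(effective_enum_value({"kind": "boat"}, "kind", KINDS))   # boat
```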

DECLARING AND USING ENTITIES

In a DTD, entities come in four flavours:

A general entity is a chunk of text with a name attached, so you can use the entity as a sort of shorthand to get the related text substituted in its place.

For example, suppose you are working on a new product called Project Giant-Slayer, but you know that the marketing department will change the name when it's released to the market. You could define the current product name as an entity named &product; and use it everywhere in your product literature. Then, when the marketing department decides on the final name, you can change the declaration of the entity and the new name will appear in place of the old one in all your web pages and brochures.

A character entity is one of the many standardized special characters that you can use when you need a character unavailable in your local character set.

A parameter entity is like a general entity, but it is used only inside the DTD, for example as shorthand for parts of a content declaration in an element declaration.

A binary or non-parsed entity represents an external file that is not in XML format.

GENERAL ENTITIES

General entities are referenced in the form &name;, where the name follows the usual rules for XML names (above).

To declare a general entity, use a declaration of this general form in your DTD:

<!ENTITY ename "text">

where ename is the name of the entity you are defining (without the initial & and final ;), and text is the text you want substituted for that entity.

For example, to define an entity named &cr; with your copyright string, you might use a declaration like this:

<!ENTITY cr "Copyright (C) 1763 Cotton Mather LLP">
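Entity substitution can be observed with any non-validating XML parser that reads the internal DTD subset. A minimal sketch, assuming Python's standard xml.etree.ElementTree (whose underlying expat parser expands internal general entities): the document declares &product; and &cr; in its <!DOCTYPE> and uses them in the body. The <brochure>, <title>, and <footer> element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The <!DOCTYPE> internal subset declares two general entities; the body uses them.
DOC = """<!DOCTYPE brochure [
  <!ENTITY product "Project Giant-Slayer">
  <!ENTITY cr "Copyright (C) 1763 Cotton Mather LLP">
]>
<brochure><title>Introducing &product;</title><footer>&cr;</footer></brochure>"""

root = ET.fromstring(DOC)
print(root.find("title").text)   # Introducing Project Giant-Slayer
print(root.find("footer").text)  # Copyright (C) 1763 Cotton Mather LLP
```

Changing the text inside <!ENTITY product ...> and re-parsing changes every place that uses &product;, which is the renaming scenario described above.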

CHARACTER ENTITIES

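To use special characters in your document, you can use the form &#n; where n is the decimal (Unicode) number of the character you want, or &#xh; where h is the same number in hexadecimal. A table of the named character entities defined for HTML 4.01 is online at http://www.w3.org/TR/html401/sgml/entities.html. Note that XML itself predefines only five named entities (&lt;, &gt;, &amp;, &apos;, and &quot;), so the HTML names cannot be used in XML unless they are declared in the DTD.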
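To see numeric character references and the predefined entities being decoded, the short Python sketch below parses a fragment with xml.etree.ElementTree; the sample text is invented. It also shows that an HTML-only name such as &copy; is rejected by an XML parser when no DTD declares it.

```python
import xml.etree.ElementTree as ET

# &#169; (decimal) and &#xA9; (hex) are both U+00A9 COPYRIGHT SIGN;
# &lt;, &gt; and &amp; are three of XML's five predefined entities.
p = ET.fromstring("<p>&#169; &#xA9; &lt;tag&gt; &amp;</p>")
print(p.text)  # © © <tag> &

# &copy; is defined for HTML, not for XML, so without a DTD
# declaring it the parser reports an error.
try:
    ET.fromstring("<p>&copy;</p>")
except ET.ParseError as err:
    print("rejected:", err)
```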

PARAMETER ENTITIES

The purpose of a parameter entity is to serve as a shorthand for some or all of the content part of an element declaration.

The general form is:

<!ENTITY % ename "text">

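For example, suppose you have a lot of tags whose content model is "(#PCDATA|bold|ital)*". You could define a parameter entity like this:

<!ENTITY % bitext "(#PCDATA|bold|ital)*">

Then, to define an element <excuse> with that content:

<!ELEMENT excuse %bitext;>

Note that the % sign appears both in the declaration and in the reference (%bitext;). A parameter-entity reference inside a declaration, as in the <!ELEMENT> line, is allowed only in an external DTD file; within the internal subset of a document's <!DOCTYPE>, parameter-entity references may appear only between declarations.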

BINARY (NON-PARSED) ENTITIES

This last type of entity represents a file, like an image or sound file, that is not XML. To declare such an entity:

<!ENTITY ename SYSTEM "url" NDATA nname>

where ename is the name of the entity you are defining, url is the URL where the file can be found, and nname is the name of the notation that the file uses. See the section on notations below for an example.

NOTATION DECLARATIONS

The purpose of a notation declaration is to define the format of some external non-XML file, such as a sound or image file, so you can refer to such files in your document.

The general form of a notation declaration can be either of these:

<!NOTATION nname PUBLIC std>

<!NOTATION nname SYSTEM url>

where nname is the name you are giving to the notation, std is the published name of a public notation, and url is a reference to a program that can render a file in the given notation.

There are four steps to connecting an attribute to a notation:

1. Declare the notation. For example:

<!NOTATION jpeg PUBLIC "JPG 1.0">

2. Declare the entity. For example:

<!ENTITY bogie-pic SYSTEM "http://stars.com/bogart.jpg" NDATA jpeg>

3. Declare the attribute as type ENTITY. For example:

<!ATTLIST star-bio pin-shot ENTITY #REQUIRED>

4. Use the attribute:

<star-bio pin-shot="bogie-pic">...</star-bio>
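A non-validating parser still reports notation and unparsed-entity declarations to the application, which then decides what to do with the external file. As a sketch, assuming Python's standard xml.parsers.expat module, the handlers below collect the declarations from the steps above and resolve the pin-shot attribute to its URL and notation. The handler function names are hypothetical; the parser does not fetch the image itself.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE star-bio [
  <!NOTATION jpeg PUBLIC "JPG 1.0">
  <!ENTITY bogie-pic SYSTEM "http://stars.com/bogart.jpg" NDATA jpeg>
  <!ATTLIST star-bio pin-shot ENTITY #REQUIRED>
]>
<star-bio pin-shot="bogie-pic">...</star-bio>"""

notations = {}  # notation name -> public (or system) identifier
unparsed = {}   # entity name -> (URL, notation name)
uses = []       # entity names found in pin-shot attributes


def on_notation(name, base, system_id, public_id):
    notations[name] = public_id or system_id


def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    if notation is not None:  # only NDATA (unparsed) entities carry a notation
        unparsed[name] = (system_id, notation)


def on_start(tag, attrs):
    if "pin-shot" in attrs:
        uses.append(attrs["pin-shot"])


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

for entity_name in uses:
    url, notation = unparsed[entity_name]
    print(f"{entity_name}: {url} ({notation} = {notations[notation]})")
# bogie-pic: http://stars.com/bogart.jpg (jpeg = JPG 1.0)
```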

Arguably, the most widespread use of XML on the Web is XHTML. Because XHTML is simply HTML 4 reformulated as XML, many HTML 4.0 sites are in effect using an invalid form of XHTML.

But the benefit of XML is not that it already exists as XHTML, but that you can create web documents from XML by using XSLT to transform your documents into HTML. You can send your XML to an XSLT processor on the web server and serve the result to the web browser. This makes your documentation available in whatever format you need it to be in.

XML AND CONTENT MANAGEMENT

Ironically, on most websites that use XML, the web designers and content developers might not even know that XML is there. This is because a CMS (content management system) generally sits in front of the XML, so that content writers can write their web content without worrying about how to write HTML or design web pages.

XML AND DOCUMENTATION

Many companies are moving to XML to write their internal documentation. A widely used XML vocabulary for this is DocBook. The advantage of XML for documentation is that it can be used to define the common traits of books, magazines, stories, advertisements, and so forth, and DocBook already has that type of information defined.

The best thing about XML for documentation is that it is easy for humans to understand, both the documentation itself and the XML markup surrounding it. XML can be used for any type of documentation, from publishing-house titles to marketing materials.

Here is an example of documentation written in XML:

<howto>
  <title>How to Write a Mail Link</title>
  <author>Jennifer Kyrnin, Web Design Guide</author>
  <description>
    <paragraph>
      Use an HTML tag to allow your readers to send email directly from your Web site.
    </paragraph>
  </description>
  <directions>
    <step>Write a link as usual <a href="">email me</a></step>
    <step>Where you would normally put a URL, put the code "mailto" <a href="mailto:">email me</a></step>
    <step>Then put your email address after the colon <a href="mailto:[email protected]">email me</a></step>
  </directions>
</howto>

As you can see, both the data and the XML are readable and understandable. The content is also in the order that a human reading the document would expect.

XML AND DATABASE DEVELOPMENT

Databases are a natural use for XML, because XML is all about data. Unlike XML for documentation, XML for databases does not need to be readable by humans. The data is simply written in a way that allows machines to read it and make it accessible to a database.

Here's XML that might be loaded into a database:

<item number="00001">
  <name>
    <first>Jane</first>
    <middle>Q</middle>
    <last>Public</last>
  </name>
  <phone type="voice">
    <areacode>407</areacode>
    <number>555-1212</number>
  </phone>
  <phone type="fax">
    <areacode>407</areacode>
    <number>555-1213</number>
  </phone>
  <email>[email protected]</email>
</item>
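As a sketch of loading such a record into a relational database, the Python code below uses the standard-library xml.etree.ElementTree and sqlite3 modules with an in-memory database. The person and phone tables and the load_item function are hypothetical, and the sample record is the one above with the <email> element left out to keep the example short.

```python
import sqlite3
import xml.etree.ElementTree as ET

ITEM_XML = """<item number="00001">
  <name><first>Jane</first><middle>Q</middle><last>Public</last></name>
  <phone type="voice"><areacode>407</areacode><number>555-1212</number></phone>
  <phone type="fax"><areacode>407</areacode><number>555-1213</number></phone>
</item>"""


def load_item(conn: sqlite3.Connection, xml_text: str) -> None:
    """Insert one <item> into the hypothetical person and phone tables."""
    item = ET.fromstring(xml_text)
    item_no = item.get("number")  # kept as text so "00001" keeps its zeros
    name = item.find("name")
    full_name = " ".join(name.findtext(p) for p in ("first", "middle", "last"))
    conn.execute("INSERT INTO person (item_no, name) VALUES (?, ?)",
                 (item_no, full_name))
    for phone in item.findall("phone"):  # one row per <phone> element
        conn.execute(
            "INSERT INTO phone (item_no, kind, areacode, number) VALUES (?, ?, ?, ?)",
            (item_no, phone.get("type"),
             phone.findtext("areacode"), phone.findtext("number")),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (item_no TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE phone (item_no TEXT, kind TEXT, areacode TEXT, number TEXT)")
load_item(conn, ITEM_XML)
print(conn.execute("SELECT name FROM person").fetchall())  # [('Jane Q Public',)]
print(conn.execute("SELECT kind, number FROM phone ORDER BY kind").fetchall())
# [('fax', '555-1213'), ('voice', '555-1212')]
```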

Unlike the documentation XML, it's not necessary that this be easily readable by humans. Since it is meant to be input into a database, it is only important that it be processable by a computer.

HTML versus XML

The most salient difference between HTML and XML is that HTML describes presentation and XML describes content. An HTML document rendered in a web browser is human readable. XML is aimed toward being both human and machine readable.

Consider the following HTML.

<html>
<head><title>Books</title></head>
<body>
<h2>Books</h2>
<hr>
<em>Sense and Sensibility</em>, <b>Jane Austen</b>, 1811<br>
<em>Pride and Prejudice</em>, <b>Jane Austen</b>, 1813<br>
<em>Alice in Wonderland</em>, <b>Lewis Carroll</b>, 1866<br>
<em>Through the Looking Glass</em>, <b>Lewis Carroll</b>, 1872<br>
</body>
</html>

The previous HTML is rendered in a browser as follows.

The HTML above describes how bibliography information is to be presented and formatted for a human to view in a web browser. Knowing that Sense and Sensibility is enclosed in italic (<em>) tags does not, however, help a program determine that it is the title of a book. XML attempts to fill this gap by describing what the web data means.

The following is XML describing the contents of the books HTML page above.

<books>
  <book>
    <title>Sense and Sensibility</title>
    <author>Jane Austen</author>
    <year>1811</year>
  </book>
  <book>
    <title>Pride and Prejudice</title>
    <author>Jane Austen</author>
    <year>1813</year>
  </book>
  <book>
    <title>Alice in Wonderland</title>
    <author>Lewis Carroll</author>
    <year>1866</year>
  </book>
  <book>
    <title>Through the Looking Glass</title>
    <author>Lewis Carroll</author>
    <year>1872</year>
  </book>
</books>

A program parsing this data can take advantage of the fact that all book titles are enclosed in <title> tags. Where would such a program find such information? An XML document may contain an optional description of its grammar. A grammar describes which tags are used in the XML document and how such tags can be nested; it is a schema, or road map, for the XML document. Originally an XML grammar was specified in a DTD (Document Type Definition). A newer W3C standard, XML Schema (often called XSD), has since been adopted; it addresses some of the limitations of DTDs, such as their lack of data types.
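A minimal sketch of such a program, assuming Python's standard xml.etree.ElementTree, is shown below. It relies only on the tag names in the books document above (repeated here so the example is self-contained): it lists every <title>, then asks a question the HTML version cannot answer reliably, namely which titles are by Lewis Carroll.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<books>
  <book><title>Sense and Sensibility</title><author>Jane Austen</author><year>1811</year></book>
  <book><title>Pride and Prejudice</title><author>Jane Austen</author><year>1813</year></book>
  <book><title>Alice in Wonderland</title><author>Lewis Carroll</author><year>1866</year></book>
  <book><title>Through the Looking Glass</title><author>Lewis Carroll</author><year>1872</year></book>
</books>"""

books = ET.fromstring(BOOKS_XML)

# Every title, found purely by tag name.
titles = [t.text for t in books.iter("title")]
print(titles)

# A data question: which books are by a given author?
carroll = [b.findtext("title") for b in books.findall("book")
           if b.findtext("author") == "Lewis Carroll"]
print(carroll)  # ['Alice in Wonderland', 'Through the Looking Glass']
```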

As can be seen above, the XML contains no information about how the document should be rendered in a browser. XML therefore separates data from presentation. The beauty of this feature is that the same data can be presented in a variety of ways without having to replicate any data (for example, one page could show book titles in bold and authors in italics).

XML SYNTAX DIFFERS FROM HTML

New tags may be defined at will

Tags may be nested to arbitrary depth

May contain an optional description of its grammar

XML can be used to store data inside HTML documents: XML data can be embedded in HTML pages as "data islands". HTML provides a way to format and display the data, while the XML holds the data inside the HTML document. The data contained in an XML file is of little value unless it can be displayed, and HTML files are used for that purpose.

The simple way to insert XML code into an HTML file is to use the <xml> tag, an Internet Explorer-specific extension that other browsers do not support. The <xml> tag informs the browser that its contents are to be parsed and interpreted by the XML parser. Like most other HTML tags, the <xml> tag has attributes. The most important attribute is ID, which gives the island a unique name. The contents of the <xml> tag come from one of two sources: inline XML code or an imported XML file.

If the code appears in the current location, it's said to be inline.

Example

Embedding XML code inside an HTML File.

<html>
<xml id="msg">
  <message>
    <to>Visitors</to>
    <from>Author</from>
    <Subject>XML Code Islands</Subject>
    <body>In this example, XML code is embedded inside HTML code</body>
  </message>
</xml>
</html>

A more efficient way is to put the XML in a separate file and import it by using the SRC attribute of the <xml> tag.

Syntax

<xml id="msg" src="example1.xml"></xml>

DATA BINDING

Data binding involves mapping, synchronizing, and moving data from a data source, usually on a remote server, to an end user's local system, where the user can manipulate the data. With data binding, after a remote server transmits data, the user can perform some minor data manipulations on their own local system, so the remote server does not have to perform all the data manipulations or repeatedly transmit variations of the same data.

Data binding involves moving data from a data source to a local system and then manipulating it there, for example by searching, sorting, and filtering.

When you bind data in this way, you do not have to ask the remote server to manipulate the data and then retransmit the results; you can perform some data manipulation locally.

In data binding, the data source provides the data, and the appropriate applications

retrieve and synchronize the data and present it on the terminal screen.

If the data changes, the applications are written so they can alter their presentation to

reflect those changes.

Data binding is used to reduce traffic on the network and to reduce the work of the Web

server, especially for minor data manipulations.

Binding data also separates the task of maintaining data from the tasks of developing and

maintaining binding and presentation programs.
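The following Python sketch shows the idea. It assumes that the server has already delivered an XML payload shaped like the employees document used later in this unit. The bind function and the record layout are illustrative assumptions. In a browser, the page's own script would do the same work.

```python
import xml.etree.ElementTree as ET

# Assumed payload, sent once by the remote server (same shape as the
# employees document used later in these notes).
PAYLOAD = """<employees>
  <employee id="1">
    <name><firstName>Mohit</firstName><lastName>Jain</lastName></name>
    <city>Karnal</city><state>Haryana</state><zipcode>98122</zipcode>
  </employee>
  <employee id="2">
    <name><firstName>Rahul</firstName><lastName>Kapoor</lastName></name>
    <city>Ambala</city><state>Haryana</state><zipcode>98112</zipcode>
  </employee>
</employees>"""


def bind(xml_text: str) -> list:
    """Parse the payload once into local records (the 'bound' copy of the data)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(emp.get("id")),
            "name": emp.findtext("name/firstName") + " " + emp.findtext("name/lastName"),
            "city": emp.findtext("city"),
            "zipcode": emp.findtext("zipcode"),
        }
        for emp in root.iter("employee")
    ]


records = bind(PAYLOAD)

# Sorting, filtering and searching below touch only the local copy;
# none of them needs another request to the server.
by_zipcode = sorted(records, key=lambda r: r["zipcode"])
in_ambala = [r for r in records if r["city"] == "Ambala"]
mohit = next((r for r in records if r["name"].startswith("Mohit")), None)

print([r["name"] for r in by_zipcode])  # ['Rahul Kapoor', 'Mohit Jain']
```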

CONVERTING XML TO HTML FOR DISPLAY

There are several ways to convert XML to HTML for display on the Web.

Using HTML alone

If your XML file has a simple tabular form that is only two levels deep, you can display it using HTML alone.

Using HTML + CSS

This approach is substantially more powerful than HTML alone, but it lacks the full power and flexibility of the methods listed below.

Using HTML with JavaScript

Fully general XML files of any type and complexity can be processed and displayed using a combination of HTML and JavaScript. The advantage of this approach is that any transformation and display can be carried out, because JavaScript is a fully general-purpose programming language. The disadvantage is that it often requires large, complex, and very detailed programs that use recursive functions (functions that call themselves). Most people find these very difficult to grasp.
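The notes describe this approach in JavaScript. The following sketch shows the same recursive idea in Python, so treat it as an illustration of the technique rather than as browser code. The function names are hypothetical. Each call renders one element and then calls itself once for every child. This recursion lets it handle XML of any depth.

```python
import xml.etree.ElementTree as ET
from html import escape


def element_to_html(elem: ET.Element) -> str:
    """Render one element, then recurse into its children, as nested <ul> lists."""
    label = escape(elem.tag)
    if elem.attrib:
        attrs = ", ".join(f"{name}={value}" for name, value in elem.attrib.items())
        label += " (" + escape(attrs) + ")"
    text = (elem.text or "").strip()
    if text:
        label += ": " + escape(text)
    children = list(elem)
    if not children:
        return "<li>" + label + "</li>"
    # Recursive step: the function calls itself once per child element.
    # (Text that follows a child, i.e. mixed content, is ignored in this sketch.)
    inner = "".join(element_to_html(child) for child in children)
    return "<li>" + label + "<ul>" + inner + "</ul></li>"


def xml_to_html(xml_text: str) -> str:
    """Convert a whole XML document of any depth into an HTML list."""
    return "<ul>" + element_to_html(ET.fromstring(xml_text)) + "</ul>"


print(xml_to_html('<employee id="1"><name>Mohit Jain</name><city>Karnal</city></employee>'))
# <ul><li>employee (id=1)<ul><li>name: Mohit Jain</li><li>city: Karnal</li></ul></li></ul>
```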

Using XSL and XPath

XSL (eXtensible Stylesheet Language) is considered the best way to convert XML to HTML. It has several advantages:

- The language is very compact, so relatively small programs can produce very sophisticated HTML.
- It is easy to re-purpose XML to serve a variety of purposes.
- It is non-procedural: you generally specify only what you want to accomplish, not detailed instructions on how to achieve it.
- It greatly reduces or eliminates the need to write recursive functions yourself.

Its disadvantages are that it requires a very different mindset to use, and the language is still evolving. As a result, many XSL processors in Web servers are out of date, and newer ones must sometimes be invoked from the command line (the DOS prompt).

DISPLAYING XML DOCUMENT USING XSL

XSL is a language for expressing stylesheets. It consists of two parts:

A language for transforming XML documents (XSLT)

An XML vocabulary for specifying formatting semantics (XSL-FO)

An XSL stylesheet specifies the presentation of a class of XML documents. It does this by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.

Like a CSS stylesheet, an XSL stylesheet is linked to an XML document and tells the browser how to display each of the document's elements. You can open an XML document with an attached stylesheet directly in a browser such as Internet Explorer. You do not need an HTML page to access and display the data.

There are two basic steps for using a stylesheet (CSS or XSL) to display an XML document:

Create the stylesheet file.

Link the stylesheet to the XML document.

CREATING THE STYLESHEET FILE

A stylesheet is a plain text file that contains a set of rules. The rules tell the web browser how to format and display the elements of a specific XML document. A CSS stylesheet is saved with a .css extension. An XSL stylesheet is saved with a .xsl extension and is itself an XML document. The following example is a CSS stylesheet for the employees document shown later. You can create it in any text editor, such as Notepad or WordPad, or in an HTML editor:

general.css

/* CSS selectors match XML element names. "id" is an attribute, not an
   element, so each employee's block is styled through the employee element. */
employees
{
  background-color: #ffffff;
  width: 100%;
}

employee
{
  display: block; margin-bottom: 30pt; margin-left: 0;
}

name
{
  color: #FF0000;
  font-size: 20pt;
}

city, state, zipcode
{
  color: #0000FF;
  font-size: 20pt;
}

LINKING

To link a style sheet to an XML document, you use an XML processing instruction that associates the style sheet with the current document. This instruction must appear before the root element of the document.

<?xml-stylesheet type="text/css" href="styles/general.css"?>

The instruction has two attributes:

href

The URL of the style sheet.

type

The MIME type of the style sheet being linked. In this case, it is text/css. (To link an XSL style sheet instead, use type="text/xsl" and set href to the .xsl file.)

MIME stands for Multipurpose Internet Mail Extensions. The standard was originally defined to tell e-mail systems what type of content a message includes. The Web uses the same content type labels, such as text/css.

The stylesheet is attached to the XML document as follows:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--This XML file represents the details of employees-->
<?xml-stylesheet type="text/css" href="styles/general.css"?>
<employees>
  <employee id="1">
    <name>
      <firstName>Mohit</firstName>
      <lastName>Jain</lastName>
    </name>
    <city>Karnal</city>
    <state>Haryana</state>
    <zipcode>98122</zipcode>
  </employee>
  <employee id="2">
    <name>
      <firstName>Rahul</firstName>
      <lastName>Kapoor</lastName>
    </name>
    <city>Ambala</city>
    <state>Haryana</state>
    <zipcode>98112</zipcode>
  </employee>
</employees>

REWRITING

Suppose you have a proxy running on www.myproxy.com and have mapped the site www.remotesite.com to the directory /remote. The pages on www.remotesite.com do not know that they are being proxied, so their links can cause problems. Let's start by looking at the three different link types.

<a href="myfile.html"> - This link will work

<a href="/myfile.html"> - This link won't work

<a href="http://www.remotesite.com/myfile.html"> - This link won't work

The first link works because it is relative to the location of the current page.

The second link is relative to the root, so the browser requests the page http://www.myproxy.com/myfile.html. That file is not found, because the proxy forwards only requests for the directory /remote to www.remotesite.com. We have to change the link so that it points to /remote/myfile.html.

The third link is absolute, so the browser follows it to http://www.remotesite.com/myfile.html. This works correctly, but only if the client can reach the remote site. Often the site being proxied is an internal server that is not accessible from outside. We therefore have to change the link to http://www.myproxy.com/remote/myfile.html.
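The three cases fit in one small function. The following Python sketch is an illustration under stated assumptions, not the proxy's actual code. The constants PROXY, REMOTE and MOUNT hard-code the example mapping above, and the function name rewrite_url is hypothetical.

```python
PROXY = "http://www.myproxy.com"      # where the proxy runs
REMOTE = "http://www.remotesite.com"  # the site being proxied
MOUNT = "/remote"                     # proxy directory mapped to REMOTE


def rewrite_url(link: str) -> str:
    """Make one link from a proxied page point back through the proxy."""
    if link == REMOTE or link.startswith(REMOTE + "/"):
        # Absolute link to the (possibly hidden) remote server.
        return PROXY + MOUNT + link[len(REMOTE):]
    if link.startswith("/") and not link.startswith("//"):
        # Root-relative link: it must stay inside the mapped directory.
        return MOUNT + link
    # Page-relative links already resolve correctly; links to other sites are left alone.
    return link


for link in ["myfile.html", "/myfile.html", "http://www.remotesite.com/myfile.html"]:
    print(link, "->", rewrite_url(link))
```

The "//" check keeps protocol-relative links (such as //cdn.example.com/x.js) from being treated as root-relative.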

The rewrite filter

As you should already have learned, the proxy is built around a filter that proxies all incoming requests. A second filter, the rewrite filter, is supplied to handle rewriting. The proxy filter works perfectly well without a rewrite filter and has no knowledge that links might be rewritten. This makes it just as easy to run the proxy with rewriting as without it.

How it works

Rewriting is currently done by parsing the HTML, JavaScript, and CSS files and looking for links with regular expressions.

The proxy uses regular expressions because it can then apply the same type of parsing to find links in both CSS and HTML. There is one other reason to use regular expressions rather than an XML parser: most pages are not written in XHTML. So many pages are not XML-compatible that a standard XML parser would not work. There are other options, such as javax.swing.text.html, and moving away from regular expressions is being considered for future versions. However, there would have to be a measurable performance benefit to justify the change.
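The following Python sketch shows regex-based link finding. It assumes that the link-rewriting rule is supplied as a function. The patterns and names are illustrative assumptions and are far simpler than a real proxy's. For instance, they do not handle single-quoted or unquoted HTML attributes. The sketch still shows why one technique covers both HTML and CSS, and why badly formed, non-XHTML pages do not break it.

```python
import re

# Hypothetical patterns. HTML: double-quoted href/src attribute values.
# CSS: url(...) references, quoted or not. Neither needs the page to be valid XML.
HTML_LINK = re.compile(r'(\b(?:href|src)\s*=\s*")([^"]*)(")', re.IGNORECASE)
CSS_LINK = re.compile(r'(url\(\s*["\']?)([^"\')]*)(["\']?\s*\))', re.IGNORECASE)


def to_proxy(link: str) -> str:
    """Simplified rule for this sketch: move root-relative links under /remote."""
    if link.startswith("/") and not link.startswith("//"):
        return "/remote" + link
    return link


def rewrite_text(text: str, rewrite=to_proxy) -> str:
    """Apply `rewrite` to every link that either regular expression finds."""
    def swap(match):
        # Keep the surrounding syntax (groups 1 and 3), replace only the URL.
        return match.group(1) + rewrite(match.group(2)) + match.group(3)

    text = HTML_LINK.sub(swap, text)
    return CSS_LINK.sub(swap, text)


# Mismatched <P>...</p>-style tags would stop an XML parser, but not the regexes.
page = '<P><a href="/myfile.html">Broken <b>markup</P><img src="/logo.png">'
print(rewrite_text(page))
print(rewrite_text('body { background: url("/img/bg.png"); }'))
```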

Turn on rewrite

web.xml

By default, the proxy does not rewrite any links, but you can easily turn rewriting on by adding the rewrite filter. The proxy comes with an alternate web.xml that has rewriting enabled. The file is called web_rewriting.xml and is located in TOMCAT_HOME/webapps/J2EP_INSTALL_DIR/WEB-INF/. To enable rewriting, rename web_rewriting.xml to web.xml, overwriting the existing file.

data.xml (config file)

The good news is that you have (almost) nothing to do. If you have mapped a site for the proxy, all of its links except the absolute ones will be rewritten. Absolute links are not rewritten because you might want to leave them as they are and let the user follow them directly.

However, you will probably want to turn absolute link rewriting on. To do this, simply add the parameter isRewriting="true" to the server. The proxy then checks every absolute link found on a page against the mappings in the config. If the server is mapped and isRewriting is set to "true", the proxy rewrites absolute links to that server.

Not all servers support isRewriting="true". For instance, RoundRobinCluster always does rewriting. Consult the documentation of each server for more information.

Other forms of rewriting

Rewriting raises two more issues. The first occurs when the server reports that a page has moved and sends the location of the new page; the proxy has to rewrite that location. The second occurs when the server sends a cookie; the proxy has to change the cookie so that it is set for the correct directory. The proxy handles both issues without any extra configuration.
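The following Python sketch shows both header fixes. It reuses the example mapping of www.remotesite.com to /remote on www.myproxy.com. The function names are hypothetical. The proxy handles these cases internally, so the sketch only illustrates the transformation. A real implementation may also need to deal with the cookie's Domain attribute.

```python
PROXY = "http://www.myproxy.com"
REMOTE = "http://www.remotesite.com"
MOUNT = "/remote"


def rewrite_location(location: str) -> str:
    """Rewrite a redirect target (Location header) so the browser stays on the proxy."""
    if location == REMOTE or location.startswith(REMOTE + "/"):
        return PROXY + MOUNT + location[len(REMOTE):]
    if location.startswith("/") and not location.startswith("//"):
        return MOUNT + location
    return location


def rewrite_set_cookie(header: str) -> str:
    """Move the Path attribute of a Set-Cookie header under the proxy directory."""
    rewritten = []
    for part in (p.strip() for p in header.split(";")):
        name, _, value = part.partition("=")
        if name.strip().lower() == "path":
            value = value.strip()
            if not value.startswith("/"):
                value = "/" + value
            rewritten.append("Path=" + MOUNT + value)
        else:
            rewritten.append(part)  # cookie value and other attributes unchanged
    return "; ".join(rewritten)


print(rewrite_location("http://www.remotesite.com/new/page.html"))
print(rewrite_set_cookie("session=abc123; Path=/; HttpOnly"))
```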

HTML, SGML, and XML

First, SGML (Standard Generalized Markup Language) is the basis for both HTML and XML. SGML is an international standard (ISO 8879) that was published in 1986.

Second, XHTML is XML. "XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML."

Third, XML is not a language in itself. It is a set of rules for creating XML-based languages. XHTML 1.0, for example, uses the tags of HTML 4.01 but follows the rules of XML.
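Because XHTML follows the rules of XML, an XHTML page must be well-formed. Every element must be closed, tags must nest properly, and start and end tag names must match exactly, including case. The following Python sketch uses the standard library's XML parser to show that markup accepted as HTML 4.01 can be rejected as XML. The helper is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


samples = {
    "<p>Line one<br>Line two</p>": "HTML 4.01 style: <br> is never closed",
    "<p>Line one<br />Line two</p>": "XHTML style: empty element closed with />",
    "<P>Upper-case start tag</p>": "XML names are case-sensitive, so P and p differ",
}
for markup, note in samples.items():
    print(is_well_formed(markup), "-", note)
```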

The Document

A typical document is made up of three layers:

Structure

Content

Style

Structure

Structure consists of the document's title, author, paragraphs, topics, chapters, head, body, etc.

Content

Content is the actual information that makes up the title, author, paragraphs, etc.

Style

Style is how the content within the structural elements is displayed, such as font color, type, and size, text alignment, etc.

Markup

HTML, SGML, and XML all mark up content using tags. The difference is that SGML and XML mainly deal with the relationship between content and structure. Their structural tags are not predefined (you can make up your own language), and style is kept totally separate. HTML, on the other hand, mixes content marked up with both structural and stylistic tags, and the HTML language predefines its tags.

Mixing structure, content, and style limits you to one form of presentation. In HTML's case, that form is a limited group of browsers for the World Wide Web.

Separating structure and content from style lets you take one file and present it in multiple forms. XML can be transformed to HTML/XHTML and displayed on the Web, or the information can be transformed and published on paper. Any XML-aware browser or application can also read the data.
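The following Python sketch keeps one XML source and produces two different presentations from it. The two formatting functions are hypothetical stand-ins for stylesheets. In practice, CSS or XSL plays this role.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source document: structure and content only, no presentation.
SOURCE = """<employee id="1">
  <name><firstName>Mohit</firstName><lastName>Jain</lastName></name>
  <city>Karnal</city>
</employee>"""


def full_name(emp: ET.Element) -> str:
    return emp.findtext("name/firstName") + " " + emp.findtext("name/lastName")


def as_html(emp: ET.Element) -> str:
    """Presentation 1: an HTML fragment for a Web page."""
    return "<h2>" + escape(full_name(emp)) + "</h2><p>" + escape(emp.findtext("city")) + "</p>"


def as_plain_text(emp: ET.Element) -> str:
    """Presentation 2: a line of plain text, e.g. for a printed report."""
    return full_name(emp).upper() + ", " + emp.findtext("city")


employee = ET.fromstring(SOURCE)
print(as_html(employee))        # <h2>Mohit Jain</h2><p>Karnal</p>
print(as_plain_text(employee))  # MOHIT JAIN, Karnal
```

Changing either presentation function leaves the source document untouched. This is the benefit of keeping style separate.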

SGML (Standard Generalized Markup Language)

Historically, electronic publishing applications such as Microsoft Word, Adobe PageMaker, or QuarkXPress "marked up" documents in a proprietary format that only that particular application recognized. The markup for both structure and style was mixed in with the content, and the document was published to only one medium: the printed page.

These programs and their proprietary markup could not define the appearance of the information for any medium other than paper. They also described the actual content of the document poorly beyond paragraphs, headings, and titles. Other programs could not read or exchange the file format, so it was useful only within the application that created it.

Because SGML is a nonproprietary international standard, it lets you create documents that are independent of any specific hardware or software. The document structure (which elements are used and how they relate to each other) is described in a file called the DTD (Document Type Definition). The DTD defines the relationships between a document's elements, creating a consistent, logical structure for each document.

SGML is good for handling large-scale, long-term information management needs, and it has long been the language of defense contractors and the electronic publishing industry. Because SGML is very large, powerful, and complex, it is hard to learn and understand, and it is not well suited to the Web environment.

XML (Extensible Markup Language)

XML is a "restricted form of SGML" that removes some of SGML's complexity. Like SGML, XML retains the flexibility to describe customized markup languages with a user-defined document structure (DTD). It uses a non-proprietary file format for both storage and exchange of text and data, on and off the Web.

As mentioned before, XML separates structure and content from style. Its structural markup tags can actually describe the content, because they can be customized for each XML-based markup language. A good example is the Mathematical Markup Language (MathML), an XML application that describes mathematical notation and captures both its structure and content.

Before MathML, the Web could communicate mathematical expressions mainly by displaying images (JPG or GIF) of the scientific notation or by posting the document as a PDF file. MathML allows the information to be displayed on the Web and makes it available for searching, indexing, or reuse in other applications.

HTML (Hypertext Markup Language)

HTML is a single, predefined markup language that forces Web designers to use its limiting and lax syntax and structure. The HTML standard was not designed with other platforms in mind, such as Web TVs, mobile phones, or PDAs. Its structural markup does little to describe the content beyond paragraph, list, title, and heading.

XML breaks the restricting chains of HTML by allowing people to create their own markup languages for exchanging information. The tags can describe the content, and authors use style sheets (CSS and XSL) to decide how the document will be displayed. Because XML has a consistent syntax and structure, documents can be transformed and published to multiple forms of media, and content can be exchanged with other XML applications.

HTML has played a useful part in the success of the Web, but the Web has outgrown it. The Web now requires more robust, flexible languages to support its expanding forms of communication and data exchange.

XML will never completely replace SGML, because SGML is still considered better for long-term storage of complex documents. When XHTML 1.0 was published in 2000, however, XML-based XHTML became the W3C's recommended successor to HTML 4.01.

At the time, HTML 4.01 was expected to be the last version of HTML, and XHTML (an XML application) was presented as the foundation for a universally accessible, device-independent Web. XHTML did not make the HTML already on the Web obsolete. Development of HTML later resumed with HTML5 (a W3C Recommendation in 2014), which can be written in either an HTML or an XML syntax.

Semantic Web Services, like conventional web services, are the server end of a client–server system for machine-to-machine interaction via the World Wide Web. Semantic services are a component of the Semantic Web because their markup makes data machine-readable in a detailed and sophisticated way. Human-readable HTML, by contrast, is usually not easily "understood" by computer programs.

WEB ONTOLOGY LANGUAGE

The Web Ontology Language (OWL) is a family of knowledge representation languages, or ontology languages, for authoring ontologies or knowledge bases. The languages are characterized by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed by the World Wide Web Consortium (W3C) and has attracted academic, medical, and commercial interest.

In October 2007, a new W3C working group was started to extend OWL with several new features, as proposed in the OWL 1.1 member submission. The W3C announced the new version of OWL on 27 October 2009. This new version, called OWL 2, was soon adopted by semantic editors such as Protégé and semantic reasoners such as Pellet. The OWL family contains many species, serializations, syntaxes, and specifications with similar names. OWL and OWL2 refer to the 2004 and 2009 specifications, respectively. Full species names include the specification version (for example, OWL2 EL). The term "OWL Family" refers to the languages more generally.

TYPES OF ONTOLOGIES

Domain ontology - A domain ontology (or domain-specific ontology) models a specific domain, which represents part of the world. The domain ontology provides the particular meanings that terms have within that domain. For example, the word card has many different meanings. An ontology about the domain of poker would model the "playing card" meaning of the word, while an ontology about the domain of computer hardware would model the "punched card" and "video card" meanings.

Because domain ontologies represent concepts in very specific and often eclectic ways, they are often incompatible. As systems that rely on domain ontologies expand, they often need to merge domain ontologies into a more general representation. This presents a challenge to the ontology designer. Different ontologies in the same domain arise from different languages, different intended uses of the ontologies, and different perceptions of the domain (based on cultural background, education, ideology, etc.).

At present, merging ontologies that are not developed from a common foundation ontology is a largely manual process, and therefore time-consuming and expensive. Some domain ontologies use the same foundation ontology to provide a set of basic elements that specify the meanings of their own elements. These domain ontologies can be merged automatically. There are studies on generalized techniques for merging ontologies, but this area of research is still largely theoretical.
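The following toy Python sketch shows the advantage of a shared foundation ontology. Everything in it is an assumption. Each domain ontology is reduced to a mapping from its own terms to concept identifiers in a common upper ontology, and the upper: identifiers are made up. Because the domains agree on those identifiers, merging becomes a mechanical grouping step. The ambiguous word card no longer collides, and different words for the same concept line up.

```python
# Toy domain ontologies: local term -> shared foundation-ontology concept (hypothetical IDs).
POKER = {"card": "upper:PlayingCard", "chip": "upper:Token"}
HARDWARE = {"card": "upper:ExpansionBoard", "chip": "upper:IntegratedCircuit"}
CASINO = {"token": "upper:Token", "deck": "upper:CardDeck"}


def merge(*domains):
    """Group every (domain, term) pair under the foundation concept it maps to."""
    merged = {}
    for domain_name, ontology in domains:
        for term, concept in ontology.items():
            merged.setdefault(concept, []).append((domain_name, term))
    return merged


merged = merge(("poker", POKER), ("hardware", HARDWARE), ("casino", CASINO))
print(merged["upper:Token"])        # [('poker', 'chip'), ('casino', 'token')]
print(merged["upper:PlayingCard"])  # [('poker', 'card')]
```

Without the shared identifiers, a designer would have to decide by hand that poker's chip and the casino's token mean the same thing. That manual step is what the paragraph above describes as slow and expensive.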

Upper ontology - An upper ontology (or foundation ontology) is a model of the common objects that apply generally across a wide range of domain ontologies. It employs a core glossary that contains the terms and their associated object descriptions as they are used in various relevant domain sets.

Several standardized upper ontologies are available for use, including Dublin Core, GFO, OpenCyc/ResearchCyc, SUMO, and DOLCE. Some consider WordNet an upper ontology, but it is not strictly an ontology. However, it has been used as a linguistic tool for learning domain ontologies.

Hybrid ontology - The Gellish ontology is an example of a combination of an upper and a domain ontology.