Bsc IT 5th Sem Assignment Solved Answer

KU 5TH SEM ASSIGNMENT - BSIT (TA) - 51 (GRAPHICS & MULTIMEDIA)

Assignment: TA (Compulsory)

1. What is the meaning of interactive computer graphics? List the various applications of computer graphics.

The term interactive graphics refers to devices and systems that facilitate man-machine graphic communication in a way that is more convenient than written notation. For example, to draw a straight line between two points without interaction, one has to type in the coordinates of the two end points. In interactive graphics, a graphical input technique lets the user indicate the two end points on the display screen, and the system draws the line.

Various applications of computer graphics are listed below:
i). Building Design and Construction
ii). Electronics Design
iii). Mechanical Design
iv). Entertainment and Animation
v). Aerospace Industry
vi). Medical Technology
vii). Cartography
viii). Art and Commerce

2. Explain in detail the hardware required for effective graphics on the computer system.

The hardware components required to generate interactive graphics are the input device, the output device (usually a display) and the computer system. The human operator is also an integral part of the interactive system. The text and graphics displayed act as an input to the human visual system, and therefore the reaction of the operator depends on how quickly one can see and appreciate the graphics present on the display.

3. Compare raster scan system with random scan system.

In a raster scan display, the electron beam is swept across the screen, one row at a time from top to bottom. The picture definition is stored in a memory area called the refresh buffer or frame buffer. In a random scan display unit, the CRT has the electron beam directed only to the parts of the screen where a picture is to be drawn. It draws a picture one line at a time, and so such units are referred to as vector displays.

Raster scan

The most common type of graphics monitor employing a CRT is the raster-scan display, based on television technology. In a raster-scan system, the electron beam is swept across the screen, one row at a time from top to bottom. The picture definition is stored in a memory area called the refresh buffer or frame buffer. Each point on the screen is called a pixel. On a black-and-white system with one bit per pixel, the frame buffer is called a bitmap. For systems with multiple bits per pixel, the frame buffer is referred to as a pixmap.
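The link between bits per pixel and frame buffer size can be illustrated with a small sketch. It is written in JavaScript, the scripting language used later in these answers. The function name frameBufferBytes and the 640 x 480 resolution are assumptions chosen for illustration; the calculation also assumes the pixels are packed with no padding.

```javascript
// Size in bytes of a frame buffer that stores every pixel of the screen.
// frameBufferBytes is a hypothetical helper name used only for illustration.
function frameBufferBytes(width, height, bitsPerPixel) {
  return (width * height * bitsPerPixel) / 8;
}

// 640 x 480 is an assumed resolution.
console.log(frameBufferBytes(640, 480, 1));  // bitmap, 1 bit per pixel: 38400 bytes
console.log(frameBufferBytes(640, 480, 8));  // pixmap, 8 bits per pixel: 307200 bytes
console.log(frameBufferBytes(640, 480, 24)); // pixmap, 24 bits per pixel: 921600 bytes
```

The same screen therefore needs 24 times more memory as a 24-bit pixmap than as a 1-bit bitmap.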

Refreshing on a raster scan display is carried out at the rate of 60 to 80 frames per second. Some displays use an interlaced refresh procedure: first, all points on the even-numbered scan lines are displayed, then all the points along the odd-numbered lines are displayed. This is an effective technique for avoiding flicker.

Random scan display

When operated as a random-scan display unit, a CRT has the electron beam directed only to the parts of the screen where a picture is to be drawn. Random scan monitors draw a picture one line at a time, and for this reason they are also referred to as vector displays (or stroke-writing or calligraphic displays). The component lines of a picture can be drawn and refreshed by a random-scan system in any specified order. A pen plotter operates in a similar way and is an example of a random-scan, hard-copy device.

Refresh rate on a random-scan system depends on the number of lines to be displayed. Picture definition is now stored as a set of line-drawing commands in an area of memory referred to as the refresh display file. Sometimes the refresh display file is called the display list, display program, or simply the refresh buffer.

4. How many colors are possible if (a) 24 bits/pixel is used, (b) 8 bits/pixel is used? Justify your answer.

a). 24-bit color provides 2^24 = 16,777,216 (about 16.7 million) colors per pixel. The 24 bits are divided into 3 bytes, one each for the red, green, and blue components of a pixel.

b). 8 bits per pixel provides 2^8 = 256 colors.

The widely accepted industry standard uses 3 bytes, or 24 bits, per pixel, with one byte for each primary color. This results in 256 different intensity levels for each primary color, so a pixel can take on a color from 256 x 256 x 256, or about 16.7 million, possible choices. In bi-level image representation, one bit per pixel is used to represent black-and-white images. A gray-level image uses 8 bits per pixel to allow a total of 256 intensity or gray levels. Image representation using a lookup table can be viewed as a compromise between the desire for a lower storage requirement and the need to support a reasonably sufficient number of simultaneous colors.
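The arithmetic above can be checked with a short sketch. It is written in JavaScript, the scripting language used later in these answers; the function names colorCount and levelsPerPrimary are illustrative assumptions, not part of any graphics library.

```javascript
// Number of distinct colors representable with a given number of bits per pixel.
// colorCount is a hypothetical helper name used only for illustration.
function colorCount(bitsPerPixel) {
  return Math.pow(2, bitsPerPixel);
}

// Intensity levels per primary when the bits are split evenly over red, green and blue.
function levelsPerPrimary(bitsPerPixel) {
  return Math.pow(2, bitsPerPixel / 3);
}

console.log(colorCount(24));       // 16777216 (about 16.7 million)
console.log(colorCount(8));        // 256
console.log(colorCount(1));        // 2 (bi-level image)
console.log(levelsPerPrimary(24)); // 256
```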

5. List and explain different text mode built-in functions of C Programming language.

The different text mode built-in functions of C Programming language (Turbo C, conio.h) are listed below:

i). textmode(int mode);
This function sets the number of rows and columns of the screen. The mode variable can take the values 0, 1, 2, or 3.
0: represents 40 column black and white
1: represents 40 column color
2: represents 80 column black and white
3: represents 80 column color
Example: textmode(2); // sets the screen to 80 column black and white

ii). clrscr();
This function clears the entire screen and locates the cursor at the top left corner (1,1).
Example: clrscr(); // clears the screen

iii). gotoxy(int x, int y);
This function positions the cursor at the location specified by x and y. x represents the column number and y represents the row number.
Example: gotoxy(10,20); // cursor is placed in the 10th column of the 20th row

iv). textbackground(int color);
This function changes the background color of the text mode. Valid colors for the CGA are from 0 to 6, namely BLACK, BLUE, GREEN, CYAN, RED, MAGENTA and BROWN.
Example: textbackground(1); or textbackground(BLUE); // changes background color to blue

v). textcolor(int color);
This function sets the color of subsequent text, numbered from 0 to 15, plus 128 for blinking.
Example: textcolor(3); // sets the next text color to cyan

vi). delline();
This function deletes the line of text at the cursor; after the deletion all the subsequent lines are pushed up by one line.
Example: gotoxy(1,5); delline(); /* deletes the 5th line */

vii). insline();
This function inserts a blank line at the current cursor position.
Example: gotoxy(1,3); insline(); /* inserts a line at the 3rd row */

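6. Write a C program to create Indian national flag.

#include <graphics.h>
#include <conio.h>

/* Draws the Indian national flag with the Turbo C BGI graphics library. */
int main(void)
{
    int gd = DETECT, gm, x, y;

    initgraph(&gd, &gm, "c:\\tc\\bgi");   /* path to the BGI driver files */
    x = getmaxx();
    y = getmaxy();
    clearviewport();

    /* Background covering the whole screen */
    setfillstyle(LINE_FILL, BLUE);
    bar(0, 0, x, y);

    /* Flag outline and the three horizontal bands */
    setcolor(BROWN);
    rectangle(50, 50, 300, 200);
    setfillstyle(SOLID_FILL, BROWN);      /* BROWN (6) stands in for saffron */
    bar(50, 50, 300, 100);
    setfillstyle(SOLID_FILL, WHITE);
    bar(50, 100, 300, 150);
    setfillstyle(SOLID_FILL, GREEN);
    bar(50, 150, 300, 200);

    /* Flag pole */
    setcolor(BLUE);
    rectangle(45, 45, 50, 300);
    /* setfillpattern() expects a pattern array, so setfillstyle() is used here */
    setfillstyle(SOLID_FILL, MAGENTA);
    bar(45, 45, 50, 400);

    /* Simplified chakra: a circle with eight spokes (the real flag has 24) */
    setcolor(BLUE);
    circle(175, 125, 25);
    line(175, 125, 200, 125);
    line(175, 125, 175, 150);
    line(175, 125, 150, 125);
    line(175, 125, 175, 100);
    line(175, 125, 159, 107);
    line(175, 125, 193, 143);
    line(175, 125, 159, 143);
    line(175, 125, 193, 107);

    /* Title banner across the top of the screen */
    setcolor(YELLOW);
    rectangle(0, 0, 640, 43);
    setfillstyle(SOLID_FILL, YELLOW);
    bar(0, 0, 640, 43);
    setcolor(BLACK);
    settextstyle(1, HORIZ_DIR, 5);
    outtextxy(150, 0, "INDIAN FLAG");

    getch();        /* wait for a key press */
    closegraph();   /* leave graphics mode */
    return 0;
}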

KU 5TH SEM ASSIGNMENT - BSIT (TB) - 51 (GRAPHICS & MULTIMEDIA)

Assignment: TB (Compulsory)

1. What is the need for computer graphics?

Computers have become a powerful tool for the rapid and economical production of pictures. Computer graphics remains one of the most exciting and rapidly growing fields. The old Chinese saying "One picture is worth a thousand words" can be modified in this computer era into "One picture is worth many kilobytes of data". It is natural to expect that graphical communication, which is an older and more popular method of exchanging information than verbal communication, will often be more convenient when computers are utilized for this purpose. This is true because one must represent objects in two-dimensional and three-dimensional spaces. Computer graphics has revolutionized almost every computer-based application in science and technology.

2. What is a graphics processor? Why is it needed?

To provide a visual interface, additional processing capability has to be added to the existing CPU. The solution is to provide a dedicated graphics processor. It manages the screen faster than an equivalent software algorithm executed on the CPU, and a certain amount of parallelism can be achieved in completing a graphics command. Several manufacturers of personal computers use a proprietary graphics processor. For example, the Intel 82786 is essentially a line drawing processor; the Texas Instruments 34010 is a high performance general-purpose graphics processor.

3. What is a pixel?

Pixel (picture element): A pixel may be defined as the smallest object or color spot that can be displayed and addressed on a monitor. Any image that is displayed on the monitor is made up of thousands of such small pixels. The closely spaced pixels divide the image area into a compact and uniform two-dimensional grid of pixel lines and columns.

4. Why is C language popular for graphics programming?

Turbo C++ is for C++ and C programmers. It is compatible with the ANSI C standard and fully supports the Kernighan and Ritchie definitions. It includes C++ class libraries, mouse support, multiple overlapping windows, a multi-file editor, hypertext help, far objects and error analysis. Turbo C++ comes with a complete set of graphics functions to facilitate preparation of charts and diagrams. It supports the same graphics adapters as Turbo Pascal. The graphics library consists of over 70 graphics functions, ranging from high-level support, such as setting a view port, drawing 3-D bar charts and drawing polygons, to bit-oriented functions like getimage and putimage. The graphics library supports numerous objects and line styles and provides several text fonts that let one justify and orient text horizontally and vertically. Note that the graphics functions use far pointers and are not supported in the tiny memory model.

5. Define resolution.

Resolution: Image resolution refers to the pixel spacing, i.e. the distance from one pixel to the next pixel. A typical PC monitor displays screen images with a resolution somewhere between 25 pixels per inch and 80 pixels per inch. A pixel is the smallest element of a displayed image, and dots (red, green and blue) are the smallest elements of a display surface (monitor screen). The dot pitch is the measure of screen resolution. The smaller the dot pitch, the higher the resolution, sharpness and detail of the image displayed.

6. Define aspect ratio.

Aspect ratio: The aspect ratio of an image is the ratio of the number of X pixels to the number of Y pixels. The standard aspect ratio for PCs is 4:3, and some use 5:4. Monitors are calibrated to this standard so that when you draw a circle it appears as a circle and not an ellipse.
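As a rough illustration of this definition, the ratio can be reduced to its simplest form by dividing both pixel counts by their greatest common divisor. The sketch below is in JavaScript; the function names and the two sample resolutions are assumptions chosen for illustration, and it treats pixels as square.

```javascript
// Greatest common divisor by Euclid's algorithm.
function gcd(a, b) {
  while (b !== 0) {
    const r = a % b;
    a = b;
    b = r;
  }
  return a;
}

// Reduce a pixel resolution to its simplest X:Y ratio.
// aspectRatio is a hypothetical helper name used only for illustration.
function aspectRatio(xPixels, yPixels) {
  const d = gcd(xPixels, yPixels);
  return (xPixels / d) + ":" + (yPixels / d);
}

console.log(aspectRatio(640, 480));   // 4:3
console.log(aspectRatio(1280, 1024)); // 5:4
```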

7. Why is refreshing required in a CRT?

When the electron beam strikes a dot of phosphor material, it glows for a fraction of a second and then fades. As the brightness of the dots begins to reduce, the screen image becomes unstable and gradually fades out. In order to maintain a stable image, the electron beam must sweep the entire surface of the screen and then return to redraw it a number of times per second. This process is called refreshing the screen. If the electron beam takes too long to return and redraw a pixel, the pixel begins to fade, which results in flicker in the image. In order to avoid flicker, the screen image must be redrawn quickly enough that the eye cannot tell that a refresh is going on. The refresh rate is the number of times per second that the screen is refreshed. Some monitors use a technique called interlacing for refreshing the lines of the screen. In the first pass, odd-numbered lines are refreshed, and in the second pass, even-numbered lines are refreshed. This allows the refresh rate to be doubled because only half the screen is redrawn at a time.

8. Name the different positioning devices.

The mouse, the tablet and the joystick are called positioning devices. They are able to position the cursor at any point on the screen, so that we can operate at that point or along a chain of points. Often, one also needs devices that can point to a given position on the screen. This becomes essential when a diagram is already on the screen but some changes are to be made. Instead of trying to find its coordinates, it is advisable to simply point to that portion of the picture and ask for changes. The simplest of such devices is the light pen. Its principle is extremely simple.

9. What are pointing devices?

A pointing device is an input interface (specifically a human interface device) that allows a user to input spatial (i.e., continuous and multi-dimensional) data to a computer. CAD systems and graphical user interfaces (GUI) allow the user to control and provide data to the computer using physical gestures (point, click, and drag), for example by moving a hand-held mouse across the surface of the physical desktop and activating switches on the mouse. Movements of the pointing device are echoed on the screen by movements of the pointer (or cursor) and other visual changes.

10. What is multimedia?

The word multimedia seems to be everywhere nowadays. It is a compound of the Latin prefix multi, meaning many, and the Latin-derived word media, which is the plural of the word medium. So multimedia simply means using more than one kind of medium. Multimedia is the mixture of two or more media (hypertext, still images, sound, animation and video) that the user can interact with on a computer terminal.

11. What are sound cards?

Sound cards: The first Sound Blaster was an 8-bit card with 22 kHz sampling, and it came equipped with a number of drivers and utilities. This became a kind of model for the other sound cards. Next came the Sound Blaster Pro, again with 8-bit sound but with a higher sampling rate of 44 kHz, which supports a wider frequency range. Then there was the Yamaha OPL3 chipset with more voices. Another development was a built-in CD-ROM interface through which huge files could be played directly via the sound card.

12. What is sampling?

Sampling: Sampling is like breaking a sound into tiny pieces and storing each piece as a small, digital sample of sound. The rate at which a sound is sampled can affect its quality. The higher the sampling rate (the more pieces of sound that are stored), the better the quality of the sound. Higher quality sound occupies more space on the hard disk because there are more samples.
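The trade-off between sampling rate and disk space can be estimated with a short sketch. It is written in JavaScript and assumes uncompressed audio; the function name audioBytes, the exact rates of 22,050 Hz and 44,100 Hz, and the one-minute mono clip are assumptions chosen for illustration.

```javascript
// Bytes needed to store uncompressed audio.
// audioBytes is a hypothetical helper name used only for illustration.
function audioBytes(sampleRateHz, bitsPerSample, channels, seconds) {
  return sampleRateHz * (bitsPerSample / 8) * channels * seconds;
}

// One minute of 8-bit mono sound at two assumed sampling rates.
console.log(audioBytes(22050, 8, 1, 60)); // 1323000 bytes
console.log(audioBytes(44100, 8, 1, 60)); // 2646000 bytes
```

Doubling the sampling rate doubles the number of samples, and therefore the storage, for the same duration.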

13. What is morphing?

Morphing: Morphing is making one image change into another by identifying key points, so that the displacement of the key points is taken into consideration for the change. The best examples would be the Kawasaki advertisement, where the motorbike changes into a cheetah, or the MRF muscle changing into a real muscle.

14. What is rendering?

Rendering: The process of converting your designed objects, with texturing and animation, into an image or a series of images is called rendering. Various parameters are available here, such as resolution, colors and type of render.

15. What is warping?

Warping: Certain parts of an image can be marked for a change and made to change into something different. For example, if the eyes of an owl have to morph into the eyes of a cat, the eyes alone can be marked and warped.

16. Why do we use a scanner?

Photographs, illustrations, and paintings continue to be made the old-fashioned way, even by visual artists who are otherwise immersed in digital imaging technology. Traditional photographs, illustrations, and paintings are easily imported into computers through the use of a device called a scanner.

A scanner scans over an image such as a photo, drawing or logo and converts it into a digital image that can be seen on the screen. Using a good paint program or image editor, we can then add or remove colors, apply filters, mask colors, and so on.

17. What is gamut in Photoshop?

Write yourself...

18. What is a layer?

The concept of layering is similar to that of compositing as we make the different layers by keying out the uniform color and making it transparent so that layer beneath becomes visible. In case of future modifications we will be able to work with individual layers and need not work with the image as a whole.

19. What are editing tools? Why are they needed?

You can use the editing tools to draw on a layer, and you can copy and paste selections to a layer.

Two of the editing tools are:

i).Eraser tool: The eraser tool changes pixels in the image as you drag through them. You can choose to change the color and transparency of the affected pixels, or to revert the affected area to its previously saved version.

ii).Smudge tool: The smudge tool simulates the actions of dragging a finger through wet paint. The tool picks up color from where the stroke begins and pushes it in the direction in which you drag.

20. What is file format?

File Format: Whether you create an image (by scanning it into your computer, drawing it from scratch on your monitor or capturing it with a camera) or record voice or music (from a two-in-one or a connected musical instrument), the result must be saved to your disk. Otherwise it would become an ethereal artifact that could never again be seen or heard. Once the computer's power is turned off, it's gone forever unless it is saved. The method by which the software organizes the data in the saved file is called the file format.

KU 5TH SEM ASSIGNMENT - BSIT (TA) - 52 (WEB PROGRAMMING)

Assignment: TA (Compulsory)

1. What is the meaning of Web? Explain in detail the building elements of the web.

The Web is a complex network of international, cross-platform, and cross-cultural communicating devices, connected to each other without any ordering or pattern. The two most important building blocks of the web are HTML and HTTP.

HTML: HTML stands for Hyper Text Markup Language. HTML is a very simple language used to describe the logical structure of a document. Although HTML is often called a programming language, it is really not. Programming languages are Turing-complete, or computable. That is, programming languages can be used to compute something such as the square root of pi or some other such task. Typically, programming languages use conditional branches and loops and operate on data contained in abstract data structures. HTML is much easier than all of that. HTML is simply a markup language used to define a logical structure rather than compute anything.

HTTP: HTTP is a request-response type protocol. It is a language spoken between a web browser (client software) and a web server (server software) so that they can communicate with each other and exchange files. A client/server system using HTTP works something like this: a big computer (called a server) sits in some office somewhere with a bunch of files that people might want access to. This computer runs a software package that listens all day long for requests over the wires.

2. "HTML is the Language of the Web". Justify the statement.

Although HTML is often called a programming language, it is really not. Programming languages are Turing-complete, or computable. That is, programming languages can be used to compute something such as the square root of pi or some other such task. Typically, programming languages use conditional branches and loops and operate on data contained in abstract data structures. HTML is much easier than all of that. HTML is simply a markup language used to define a logical structure rather than compute anything. For example, it can describe which text the browser should emphasize, which text should be considered body text versus header text, and so forth.

The beauty of HTML is that it is generic enough to be read and interpreted by a web browser running on any machine or operating system. This is because it focuses only on describing the logical nature of the document, not on the specific style. The web browser is responsible for adding style. For instance, emphasized text might be bolded in one browser and italicized in another; it is up to the browser to decide.

3. Give the different classification of HTML tags with examples for each category.

List of HTML tags:
Tags for Document Structure: HTML, HEAD, BODY
Header Tags (used inside HEAD): TITLE, BASE, META, STYLE, LINK
Block-Level Text Elements: ADDRESS, BLOCKQUOTE, DIV, H1 through H6, P, PRE, XMP
Lists: DD, DIR, DL, DT, LI, MENU, OL, UL
Text Characteristics: B, BASEFONT, BIG, BLINK, CITE, CODE, EM, FONT, I, KBD, PLAINTEXT, S, SMALL

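4. Write a CGI application which accepts 3 numbers from the user and displays the biggest number using GET and POST methods.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# CGI.pm's param() reads form data sent by either GET or POST.
my $cgi = CGI->new;
print $cgi->header;
print $cgi->start_html("Biggest of Three Numbers");

my $one   = $cgi->param('one');
my $two   = $cgi->param('two');
my $three = $cgi->param('three');

my $number = qr/^-?\d+(?:\.\d+)?$/;   # accepts integers and decimals

if (defined $one && defined $two && defined $three
    && $one =~ $number && $two =~ $number && $three =~ $number) {
    my $biggest = findBiggest($one, $two, $three);
    print "Biggest number is $biggest";
}
else {
    # No valid input yet: show the form.
    # Change method to "get" to submit the same form with GET.
    print '<form method="post">';
    print 'Enter First Number <input type="text" name="one"><br>';
    print 'Enter Second Number <input type="text" name="two"><br>';
    print 'Enter Third Number <input type="text" name="three"><br>';
    print '<input type="submit" value="Find Biggest">';
    print '</form>';
}
print $cgi->end_html;

# Returns the largest of the numbers passed in.
sub findBiggest {
    my $max = shift;
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    return $max;
}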

5. What is JavaScript? Give its importance in web.

JavaScript is an easy-to-learn way to script your web pages, that is, to have them perform actions that cannot be handled with HTML alone. With JavaScript, you can make text scroll across the screen like ticker tape, make pictures change when you move the mouse over them, or add any number of other dynamic enhancements. JavaScript is generally only used inside HTML documents.

i) JavaScript controls document appearance and content.
ii) JavaScript controls the browser.
iii) JavaScript interacts with document content.
iv) JavaScript interacts with the user.
v) JavaScript reads and writes client state with cookies.
vi) JavaScript interacts with applets.
vii) JavaScript manipulates embedded images.

6. Explain briefly Cascading Style Sheets.

Cascading Style Sheets (CSS) is a part of DHTML that controls the look and placement of the elements on the page. With CSS you can set basically any style property of any element on an HTML page. One of the biggest advantages of CSS over the regular way of changing the look of elements is that you separate content from design. You can, for instance, link one CSS file to all the pages in your site to set their look, so if you want to change, say, the font size of your main text, you just change it in the CSS file and all pages are updated.

7. What is CGI? List the different CGI environment variables.

CGI, or Common Gateway Interface, is a specification which allows web users to run programs on the server from their own computer. CGI is the part of the web server that can communicate with other programs running on the server. With CGI, the web server can call up a program while passing user-specific data to it. The program then processes that data, and the server passes the program's response back to the web browser. When a CGI program is called, the information that is made available to it can be roughly broken into three groups:
i). Information about the client, server and user.
ii). Form data that are user supplied.
iii). Additional pathname information.

Most information about the client, server and user is placed in CGI environment variables. User-supplied form data and extra pathname information are also placed in environment variables. Some of these variables are:
i). GATEWAY_INTERFACE: The revision of the Common Gateway Interface that the server uses.
ii). SERVER_NAME: The server's hostname or IP address.
iii). SERVER_PORT: The port number of the host on which the server is running.
iv). REQUEST_METHOD: The method with which the information request is issued.
v). PATH_INFO: Extra path information passed to the CGI program.

8. What is PERL? Explain PERL control structures with the help of an example.

Perl control structures include conditional statements, such as if/elsif/else blocks, as well as loops like foreach, for and while.

i). Conditional statements

- If condition: The structure always starts with the word if, followed by a condition to be evaluated, then a pair of braces indicating the beginning and end of the code to be executed if the condition is true.

if (condition) {
    # code to be executed
}

- Unless: unless is similar to if, but it executes the code only if a certain condition is false. Using if, such a test looks like this:

if ($varname != 23) {
    # code to execute if $varname is not 23
}

The same test can be done using unless:

unless ($varname == 23) {
    # code to execute if $varname is not 23
}

ii). Looping

Looping allows you to repeat code for as long as a condition is met. Perl has several loop control structures: foreach, for, while and until.

- While loop: A while loop executes as long as a particular condition is true:

while (condition) {
    # code to run as long as condition is true
}

- Until loop: An until loop is the reverse of while. It executes as long as a particular condition is not true:

until (condition) {
    # code to run as long as condition is not true
}

KU 5TH SEM ASSIGNMENT - BSIT (TB) - 52 (WEB PROGRAMMING)

Assignment: TB (Compulsory)

Part - A

a) What is the difference between Internet and Intranet?

Internet: The Internet is a global network of networks. It began as a tool for collaboration in academic research, and it has become a medium for exchanging and distributing information of all kinds. It is an interconnection between computers of different types belonging to various networks all over the globe.

Intranet: An intranet is not global. It is a mini web that is limited to the user machines and software programs of a particular organization or company.

b) List any five HTML tags.

Five HTML tags are:
i). UL (unordered list): The UL tag displays a bulleted list. You can use the tag's TYPE attribute to change the bullet style.
ii). TYPE (an attribute of UL rather than a tag in itself): Defines the type of bullet used for each list item. The value can be one of the following: CIRCLE, DISC, SQUARE.
iii). LI (list item): The LI tag indicates an itemized element, which is usually preceded by a bullet, a number, or a letter. LI is used inside list elements such as OL (ordered list) and UL (unordered list).
iv). TABLE (table): The TABLE tag defines a table. Inside the TABLE tag, use the TR tag to define rows in the table, the TH tag to define row or column headings, and the TD tag to define table cells.
v). HTML (outermost tag): The HTML tag identifies a document as an HTML document. All HTML documents should start with the <HTML> tag and end with the </HTML> tag.

c) Write the difference between HTML and DHTML.

HTML: HTML stands for Hyper Text Markup Language. It is a language. A plain HTML page cannot be changed after the page loads. HTML may or may not be used with JavaScript.

DHTML: DHTML stands for Dynamic Hyper Text Markup Language. DHTML isn't really a language or a thing in itself; it's just a mix of technologies such as HTML, CSS and JavaScript. Dynamic HTML is simply HTML that can change even after a page has been loaded into a browser. DHTML is used with JavaScript.

d) Explain the different types of PERL variables.

Perl has three types of variables:
i). Scalars
ii). Arrays
iii). Hashes

i). Scalars: A scalar variable stores a single (scalar) value. Perl scalar names are prefixed with a dollar sign ($), so for example, $username and $url are both scalar variable names. A scalar can hold data of any type, be it a string, a number, or whatnot. We can also use scalars in double-quoted strings:

my $fnord = 23;
my $blee = "The magic number is $fnord.";

Now if we print $blee, we get: The magic number is 23. Perl interpolates the variables in the string, replacing the variable name with the value of that variable.

ii). Arrays: An array stores an ordered list of values. While a scalar variable can only store one value, an array can store many. Perl array names are prefixed with an @ sign, e.g.:

my @colors = ("red", "green", "blue");
foreach my $i (@colors) {
    print "$i\n";
}

iii). Hashes: Hashes are an advanced form of array. One of the limitations of an array is that the information contained within it can be difficult to get to. For example, imagine that you have a list of people and their ages stored in an array @ages. The hash solves this problem very neatly by allowing us to access the ages not by an index, but by a scalar key. For example, to look up the ages of different people, we can use their names as the keys of a hash.

e) How are JSPs better than servlets?

With servlets alone, Java programming knowledge is needed to develop and maintain all aspects of the application, since the processing code and the HTML elements are lumped together. Changing the look and feel of the application, or adding support for a new type of client, requires the servlet code to be updated and recompiled. It is also hard to take advantage of web-page development tools when designing the application interface. If such tools are used to develop the web page layout, the generated HTML must then be manually embedded into the servlet code, a process which is time consuming, error prone, and extremely boring. Adding JSP to the puzzle solves these problems, so JSPs are better than servlets.

Part - B

1. a) Explain GET and POST method with the help of an example.

When a client sends a request to the server using the GET method, the client can also send additional information with the URL to describe what exactly is required as output from the server. The additional sequence of characters appended to the URL is called a query string. However, the length of the query string is limited (to 240 characters according to the course text; the actual limit depends on the browser and server). Moreover, the query string is visible in the browser and can therefore be a security risk. To overcome these disadvantages, the POST method can be used. The POST method sends the data in the body of the request rather than in the URL, so the data does not appear in the browser's address bar. The disadvantage of the POST method is that it is slower compared to the GET method because the data is sent to the server separately from the URL.
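The difference in where the form data travels can be sketched as follows. This is an illustrative JavaScript example, not part of the original answer: the script path /cgi-bin/biggest.pl and the sample values are hypothetical, and the field names one, two and three are borrowed from the CGI example in this assignment. URLSearchParams is the standard class for encoding name=value pairs.

```javascript
// Form fields as a browser would collect them (sample values are assumptions).
const fields = { one: "3", two: "7", three: "5" };

// URLSearchParams encodes the fields as name=value pairs joined by "&".
const encoded = new URLSearchParams(fields).toString();

// GET: the encoded data is appended to the URL after "?" (the query string).
// "/cgi-bin/biggest.pl" is a hypothetical script path.
const getUrl = "/cgi-bin/biggest.pl?" + encoded;

// POST: the same encoded data travels in the request body, not in the URL.
const postRequest = {
  url: "/cgi-bin/biggest.pl",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: encoded,
};

console.log(getUrl);           // /cgi-bin/biggest.pl?one=3&two=7&three=5
console.log(postRequest.url);  // /cgi-bin/biggest.pl
console.log(postRequest.body); // one=3&two=7&three=5
```

With GET the numbers are visible in the address itself, while with POST the address stays the same and the data is carried separately.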

b) Explain in detail the role played by CGI programming in web programming.

CGI opened the gates to more complex Web applications. It enabled developers to write scripts which can communicate with server applications and databases. In addition, it enables developers to write scripts that parse the client's input, process it, and present it in a user-friendly way.

The Common Gateway Interface, or CGI, is a standard for external gateway programs to interface with information servers such as HTTP servers. A plain HTML document that the Web daemon retrieves is static, which means it exists in a constant state: a text file that doesn't change. A CGI program, on the other hand, is executed in real time, so that it can output dynamic information.

CGI programming allows us to automate passing information to and from web pages. It can also be used to capture and process that information, or pass it off to other software (such as an SQL database).

CGI programs (sometimes called scripts) can be written in any programming language, but the two most commonly used are Perl and PHP. Despite all the flashy graphics, Internet technology is fundamentally a text-based system. Perl was designed to be optimal for text processing, so it quickly became a popular CGI tool. PHP is a scripting language designed specifically to make web programming quick and easy.

2. a) With the help of an example explain the embedding of an image in an HTML tag.

b) Create a HTML page to demonstrate the usage of Anchor tags.

A Cold Autumn Day

If this anchor is in a file called "nowhere.htm", you could define a link that jumps to the anchor as follows:

Jump to the second section A Cold Autumn Day in the mystery "A man from Nowhere."

3. a) Explain the usage of script tags.

Using the SCRIPT tag: The following example uses the SCRIPT tag to define a JavaScript script in the HEAD tag. The script is loaded before anything else in the document is loaded. The JavaScript code in this example defines a function, changeBGColor(), that changes the document's background color. The body of the document contains a form with two buttons. Each button invokes the changeBGColor() function to change the background of the document to a different color.

Script Example

function changeBGColor(newcolor) {
    document.bgColor = newcolor;
    return false;
}

Select a background color:

Your browser is not JavaScript-enabled.These buttons will not work.

b) What is Java script? List the use of Java script.

JavaScript is a scripting language (like a simple programming language) that can be used for client-side scripting. In the browser, JavaScript is embedded in, or referenced from, HTML documents. With JavaScript, we can, for example, make text scroll across the screen like ticker tape.

The uses of JavaScript are:
i). Control Document Appearance and Content
ii). Control the Browser
iii). Interact with Document Content
iv). Interact with the User
v). Read and Write Client State with Cookies
vi). Interact with Applets

4. a) With the help of an example explain any five CGI environment variables.

i). SERVER_NAME : The server's host name or IP address.
ii). SERVER_PORT : The port number of the host on which the server is running.
iii). SERVER_SOFTWARE : The name and version of the server software that is answering the client request.
iv). SERVER_PROTOCOL : The name and revision of the information protocol that the request came in with.
v). GATEWAY_INTERFACE : The revision of the Common Gateway Interface that the server uses.

Example:-

#!/usr/local/bin/perl
# Print the HTTP header, then an HTML page listing five CGI environment variables.
print "Content-type: text/html", "\n\n";
print "<HTML>", "\n";
print "<HEAD><TITLE>About this Server</TITLE></HEAD>", "\n";
print "<BODY><H1>About this Server</H1>", "\n";
print "<HR><PRE>";
print "Server Name: ", $ENV{'SERVER_NAME'}, "<BR>", "\n";
print "Running on Port: ", $ENV{'SERVER_PORT'}, "<BR>", "\n";
print "Server Software: ", $ENV{'SERVER_SOFTWARE'}, "<BR>", "\n";
print "Server Protocol: ", $ENV{'SERVER_PROTOCOL'}, "<BR>", "\n";
print "CGI Revision: ", $ENV{'GATEWAY_INTERFACE'}, "<BR>", "\n";
print "<HR></PRE>", "\n";
print "</BODY></HTML>", "\n";
exit (0);

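b) Write a CGI application which accepts three numbers from the user and displays the biggest number using GET and POST methods.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $cgi = CGI->new;
print $cgi->header;
print $cgi->start_html( "Biggest of Three Numbers" );

# param() returns the submitted value whether the form used GET or POST.
my $one = $cgi->param( 'one' );
my $two = $cgi->param( 'two' );
my $three = $cgi->param( 'three' );

# Treat a missing or empty field as "form not yet submitted".
my @missing = grep { !defined($_) || $_ eq '' } ( $one, $two, $three );

if ( !@missing ) {
    my $biggest = findBiggest( $one, $two, $three );
    print "Biggest number is $biggest";
}
else {
    # Change method to "get" to send the numbers in the query string instead.
    print '<form method="post">';
    print 'Enter First Number <input type="text" name="one"><br>';
    print 'Enter Second Number <input type="text" name="two"><br>';
    print 'Enter Third Number <input type="text" name="three"><br>';
    print '<input type="submit" value="Find Biggest">';
    print '</form>';
}
print $cgi->end_html;

# Return the largest of the numbers passed in.
sub findBiggest {
    my $max = shift;
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    return $max;
}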

5. a) List the differences between web server and application server.

The main differences between Web servers and application servers:-

A Web server is where Web components are deployed and run. An application server is where components that implement the business logic are deployed. For example, in a JSP-EJB Web application, the JSP pages will be deployed on the Web server whereas the EJB components will be deployed on the application server.

A Web server usually supports only HTTP (and sometimes SMTP and FTP). An application server, however, supports HTTP as well as various other protocols such as SOAP.

In other words, the differences between an application server and a Web server are:-
i). A Web server serves pages for viewing in a web browser; an application server exposes business logic to client applications through various protocols.
ii). A Web server exclusively handles HTTP requests; an application server serves business logic to application programs through any number of protocols.
iii). The Web server delegation model is fairly simple: when a request comes into the Web server, it simply passes the request to the program best able to handle it (a server-side program). It may not support transactions and database connection pooling.
iv). An application server is more capable of dynamic behaviour than a Web server. We can also configure an application server to work as a Web server. Put simply, an application server is a superset of a Web server.

b) What is a war file? Explain its importance.

A WAR (Web Application Archive) file is a packaged servlet Web application. Servlet applications are usually distributed as WAR files. A WAR file is a JAR file used to distribute a collection of JavaServer Pages, servlets, Java classes, XML files, tag libraries and static Web pages (HTML and related files) that together constitute a Web application.

6. a) Explain implicit objects out, request, response in a JSP page.

Following are the implicit objects in a JSP page:-

out: This implicit object represents a JspWriter that provides a stream back to the requesting client. The most common method of this object is out.println(), which prints text that will be displayed in the client's browser.

request: This implicit object represents the javax.servlet.http.HttpServletRequest interface. The request object is associated with every HTTP request. One common use of the request object is to access request parameters. You can do this by calling the request object's getParameter() method with the parameter name you are seeking. It will return a string with the value matching the named parameter.

response: This implicit object represents the javax.servlet.http.HttpServletResponse interface. The response object is used to pass data back to the requesting client. A common use of this object is writing HTML output back to the client browser.

b) With the help of an example explain JSP elements.

JSP elements are of 3 types:-

Directive: Specifies information about the page itself that remains the same between requests. For example, it can be used to specify whether session tracking is required or not, buffering requirements, and the name of the page that should be used to report errors.

Action: Performs some action based on information that is required at the exact time the JSP page is requested by a browser. An action, for instance, can access parameters sent with the request to look up a database.

Scripting: Allows you to add small pieces of code in a JSP page.

KU 5TH SEM ASSIGNMENT - BSIT (TA) - 53 (DATA WAREHOUSING & DATA MINING)

1. With neat diagram explain the main parts of the computer.

A computer has 3 basic main parts:
i). A central processing unit that does all the arithmetic and logical operations. This can be thought of as the heart of any computer, and computers are identified by the type of CPU that they use.
ii). The memory, which holds the programs and data. All the computers that we come across these days are what are known as stored program computers. The programs are stored beforehand in the memory, and the CPU accesses these programs line by line and executes them.
iii). The input/output devices: These devices facilitate the interaction of the users with the computer. The input devices are used to send information to the computer, while the output devices accept the processed information from the computer and make it available to the user.

Diagram:-

2. Briefly explain the types of memories.

There are two types of memories: primary memory, which is embedded in the computer and which is the main source of data to the computer, and secondary memory like floppy disks, CDs etc., which can be carried around and used in different computers. Secondary memories cost much less than primary memory, but the CPU can access data only from the primary memory. The main advantage of computer memories, both primary and secondary, is that they can store data indefinitely and accurately.

3. Describe the basic concept of databases.

The Concept of Database :-

We have seen in the previous section how data can be stored in a computer. Such stored data becomes a database: a collection of data. For example, if all the marks scored by all the students of a class are stored in the computer memory, it can be called a database. From such a database, we can answer questions like: Who has scored the highest marks? In which subject have the maximum number of students failed? Which students are weak in more than one subject? Of course, appropriate programs have to be written to do these computations. Also, as the database becomes large and more and more data keeps getting included at different periods of time, there are several other problems about maintaining these data, which will not be dealt with here.

Since handling of such databases has become one of the primary jobs of the computer in recent years, it becomes difficult for the average user to keep writing such programs. Hence, special languages called database query languages have been devised, which make such programming easy. These languages help in getting specific queries answered easily.
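The three questions mentioned above can be sketched with a small in-memory database. This is only an illustration: the student names, the marks and the pass mark of 35 are assumptions made up for the example, and Python's built-in sqlite3 module stands in for whatever database system is actually used.

```python
import sqlite3

# Hypothetical sample data: (student, subject, marks). Pass mark assumed to be 35.
MARKS = [
    ("Asha", "Maths", 82), ("Asha", "Physics", 30), ("Asha", "English", 28),
    ("Ravi", "Maths", 25), ("Ravi", "Physics", 91), ("Ravi", "English", 64),
    ("Meena", "Maths", 31), ("Meena", "Physics", 77), ("Meena", "English", 58),
]
PASS_MARK = 35

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, subject TEXT, marks INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?)", MARKS)

# Who has scored the highest marks?
top = conn.execute(
    "SELECT student, subject, marks FROM marks ORDER BY marks DESC LIMIT 1"
).fetchone()

# In which subject have the maximum number of students failed?
worst_subject = conn.execute(
    "SELECT subject, COUNT(*) AS failures FROM marks WHERE marks < ? "
    "GROUP BY subject ORDER BY failures DESC, subject LIMIT 1",
    (PASS_MARK,),
).fetchone()

# Which students are weak in more than one subject?
weak = [
    row[0]
    for row in conn.execute(
        "SELECT student FROM marks WHERE marks < ? "
        "GROUP BY student HAVING COUNT(*) > 1 ORDER BY student",
        (PASS_MARK,),
    )
]

print(top, worst_subject, weak)
```

Each question becomes one short query, which is the point of a query language: the user states what is wanted rather than writing a new program for every question.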

4. With example explain the different views of a data.

Data is normally stored in tabular form. Unless storage in other formats becomes advantageous, we store data in what are technically called relations or, in simple terms, tables.

Views are mainly of 2 types:
i). Simple View
ii). Complex View

Simple view:
- It is created by selecting from only one table.
- It does not contain functions.
- DML operations (SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, LOCK TABLE) can be performed through a simple view.

Complex view:
- It is created by selecting from more than one table.
- It can contain functions.
- DML operations cannot always be performed through a complex view.

5. Briefly explain the concept of normalization.

Normalization is dealt with in several chapters of any book on database management systems. Here, we will take the simplest definition, which suffices for our purpose: no field should have subfields.

Again consider the following student table. Here, under the field marks, we have 3 subfields: marks for subject1, marks for subject2 and marks for subject3.

However, it is preferable to split these subfields into regular fields as shown below.

Quite often, the original table which comes with subfields will have to be modified suitably by the process of normalization.
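A minimal sketch of this idea follows. The student records, the field names and the `flatten` helper are all hypothetical; the code only shows how the subfields of "marks" can be promoted to regular fields.

```python
# Hypothetical student records whose "marks" field has three subfields.
nested = [
    {"name": "Asha", "marks": {"subject1": 82, "subject2": 30, "subject3": 28}},
    {"name": "Ravi", "marks": {"subject1": 25, "subject2": 91, "subject3": 64}},
]


def flatten(record, parent="marks"):
    """Promote every subfield of `parent` to a regular top-level field."""
    flat = {key: value for key, value in record.items() if key != parent}
    for sub, value in record[parent].items():
        flat[f"{parent}_{sub}"] = value
    return flat


flat_table = [flatten(record) for record in nested]

for row in flat_table:
    print(row)
```

After flattening, every row has the same set of simple fields (name, marks_subject1, marks_subject2, marks_subject3), so it can be stored directly as one row of a table.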

6. Explain the concept of data ware house delivery process in detail.

The concept of data ware house delivery process :-

This section deals with the data ware house from a different view point: how the different components that go into it enable the building of a data ware house. The study helps us in two ways:
i) to have a clear view of the data ware house building process.
ii) to understand the working of the data ware house in the context of the components.

Now we look at the concepts in detail :-
i). IT Strategy : The company must have an overall IT strategy, and data ware housing has to be a part of that overall strategy.
ii). Business case analysis : This looks like an obvious thing, but is most often misunderstood. An overall understanding of the business and the importance of its various components is a must. This will ensure that one can clearly justify the appropriate level of investment that goes into the data ware house design and also the amount of returns accruing.
iii). Education : This has two roles to play. The first is to make people, specially top level policy makers, comfortable with the concept. The second is to aid the prototyping activity.
iv). Business Requirements : As has been discussed earlier, it is essential that the business requirements are fully understood by the data ware house planner. This would ensure that the ware house is incorporated adequately in the overall setup of the organization.
v). Technical blue prints : This is the stage where the overall architecture that satisfies the requirements is delivered.
vi). Building the vision : Here the first physical infrastructure becomes available. The major infrastructure components are set up, and the first stages of loading and generation of data start up.
vii). History load : Here the system is made fully operational by loading the required history into the ware house, i.e. whatever data is available over the previous years is put into the data ware house to make it fully operational.
viii). Adhoc Query : Now we configure a query tool to operate against the data ware house.
ix). Automation : This phase automates the various operational processes like -
a) Extracting and loading of data from the sources.
b) Transforming the data into a suitable form for analysis.
c) Backing up, restoration and archiving.
d) Generating aggregations.
e) Monitoring query profiles.
x). Extending Scope : There is no single mechanism by which this can be achieved. As and when needed, a new set of data may be added, new formats may be included, or it may even involve major changes.
xi). Requirement Evolution : Business requirements will constantly change during the life of the ware house. Hence, the process that supports the ware house also needs to be constantly monitored and modified.

7. What are three major activities of data ware house? Explain.Three major activities of data ware house are :- i) Populating the ware house (i.e. inclusion of data) ii) day-to-day management of the ware house. iii) Ability to accommodate the changes.

i). The processes to populate the ware house have to be able to extract the data, clean it up, and make it available to the analysis systems. This is done on a daily / weekly basis depending on the quantum of the data population to be incorporated.
ii). The day to day management of the data ware house is not to be confused with maintenance and management of hardware and software. When large amounts of data are stored and new data are being continually added at regular intervals, maintenance of the quality of data becomes an important element.
iii). Ability to accommodate changes implies the system is structured in such a way as to be able to cope with future changes without the entire system being remodeled.

Based on these, we can view the processes that a typical data ware house scheme should support as follows.

8. Explain the extract and load process of data ware house.

Extract and Load Process : This forms the first stage of the data ware house. External physical systems like the sales counters which give the sales data, the inventory systems that give inventory levels etc. constantly feed data to the warehouse. Needless to say, the format of these external data has to be monitored and modified before loading them into the ware house. The data ware house must extract the data from the source systems, load them into its data bases, remove unwanted fields (either because they are not needed or because they are already there in the data base), add new fields / reference data and finally reconcile them with the other data. We shall see a few more details of these broad actions in the subsequent paragraphs.
i). A mechanism should be evolved to control the extraction of data, check their consistency etc. For example, in some systems, the data is not authenticated until it is audited.
ii). Having a set of consistent data is equally important. This is especially an issue when several online systems are feeding the data.
iii). Once data is extracted from the source systems, it is loaded into a temporary data storage before it is cleaned and loaded into the warehouse.
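The stages described above (extract into temporary storage, remove unwanted fields, add reference data, load) can be sketched roughly as follows. The source rows, the price list and all function names are assumptions made for illustration; a real load manager would be far more involved.

```python
# Hypothetical rows arriving from a sales-counter system.
source_rows = [
    {"sale_id": 1, "item": "pen", "qty": 3, "cashier_note": "gift wrap"},
    {"sale_id": 2, "item": "book", "qty": 1, "cashier_note": ""},
    {"sale_id": 2, "item": "book", "qty": 1, "cashier_note": ""},  # fed twice
]
ITEM_PRICES = {"pen": 10, "book": 150}  # reference data (assumed)
UNWANTED_FIELDS = {"cashier_note"}


def extract(rows):
    """Copy the source rows into a temporary staging area."""
    return [dict(row) for row in rows]


def clean(staged):
    """Drop unwanted fields and repeated rows, then add reference data."""
    seen, cleaned = set(), []
    for row in staged:
        if row["sale_id"] in seen:  # already present, so skip it
            continue
        seen.add(row["sale_id"])
        row = {k: v for k, v in row.items() if k not in UNWANTED_FIELDS}
        row["amount"] = row["qty"] * ITEM_PRICES[row["item"]]  # new field
        cleaned.append(row)
    return cleaned


def load(warehouse, cleaned):
    """Append the cleaned rows to the warehouse table."""
    warehouse.extend(cleaned)
    return warehouse


warehouse = load([], clean(extract(source_rows)))
print(warehouse)
```

Keeping the staging copy separate from the source means the cleaning step never alters the operational system's own data.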

9. In what ways data needs to be cleaned up and checked? Explain briefly.Data needs to be cleaned up and checked in the following ways :- i) It should be consistent with itself. ii) It should be consistent with other data from the same source. iii) It should be consistent with other data from other sources. iv) It should be consistent with the information already available in the data ware house.

While it is easy to list the needs of clean data, it is more difficult to set up systems that automatically clean up the data. The normal course is to suspect the quality of data if it does not meet the normal standards of common sense or if it contradicts data from other sources, data already available in the data ware house etc. Normal intuition doubts the validity of the new data, and effective measures like rechecking, retransmission etc. are undertaken. When none of these are possible, one may even resort to ignoring the entire set of data and getting on with the next set of incoming data.
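The four consistency checks can be sketched as one function that returns the reasons for suspecting a record. The record layout (sale_id, item, qty, unit_price, amount) and the sample data are assumptions invented for this example.

```python
def check_record(record, same_source, other_source, warehouse):
    """Return a list of reasons to suspect `record`; an empty list means it looks clean."""
    problems = []
    # 1. Consistent with itself.
    if record["qty"] * record["unit_price"] != record["amount"]:
        problems.append("amount does not equal qty * unit_price")
    # 2. Consistent with other data from the same source.
    if any(row["sale_id"] == record["sale_id"] for row in same_source):
        problems.append("duplicate sale_id within the same source")
    # 3. Consistent with data from other sources.
    if record["item"] not in other_source:
        problems.append("item unknown to the inventory system")
    # 4. Consistent with the information already in the warehouse.
    known_price = warehouse.get(record["item"])
    if known_price is not None and known_price != record["unit_price"]:
        problems.append("unit_price differs from the warehouse price")
    return problems


good = {"sale_id": 7, "item": "pen", "qty": 3, "unit_price": 10, "amount": 30}
bad = {"sale_id": 7, "item": "ink", "qty": 2, "unit_price": 40, "amount": 90}
earlier_rows = [good]                        # rows already received from this source
inventory_items = {"pen", "book"}            # items known to another source system
warehouse_prices = {"pen": 10, "book": 150}  # prices already held in the warehouse

good_report = check_record(good, [], inventory_items, warehouse_prices)
bad_report = check_record(bad, earlier_rows, inventory_items, warehouse_prices)
print(good_report)
print(bad_report)
```

A record with a non-empty report would then be rechecked, retransmitted or, as a last resort, ignored.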

10. Explain the architecture of data warehouse.

The architecture for a data ware house is indicated below. Before we proceed further, we should be clear about the concept of architecture. It only gives the major items that make up a data ware house. The size and complexity of each of these items depend on the actual size of the ware house itself, the specific requirements of the ware house and the actual details of implementation.

11. Briefly explain the functions of each manager of data warehouse.The Warehouse Manager : The ware house manager is a component that performs all operations necessary to support the ware house management process. Unlike the load manager, the warehouse management process is driven by the extent to which the operational management of the data ware house has been automated.

The ware house manager can easily be termed the most complex of the ware house components, and it performs a variety of tasks. A few of them are listed below.
i) Analyze the data to confirm data consistency and data integrity.
ii) Transform and merge the source data from the temporary data storage into the ware house.
iii) Create indexes, cross references, partition views etc.
iv) Check for normalizations.
v) Generate new aggregations, if needed.
vi) Update all existing aggregations.
vii) Create backups of data.
viii) Archive the data that needs to be archived.

12. Explain the star schema to represent the sales analysis.

Star schemas are data base schemas that structure the data to exploit a typical decision support enquiry. When the components of typical enquiries are examined, a few similarities stand out.
i) The queries examine a set of factual transactions, sales for example.
ii) The queries analyze the facts in different ways, by aggregating them on different bases / graphing them in different ways.

The central concept of most such transactions is a fact table. The surrounding references are called dimension tables. The combination can be called a star schema.
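As a rough illustration of a sales star schema, the sketch below builds one fact table surrounded by two dimension tables and then aggregates the facts on two different bases. The table names, column names and figures are all assumptions, and sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds the transactions and points at each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    qty        INTEGER,
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'pen'), (2, 'book');
INSERT INTO dim_store   VALUES (1, 'Mysore'), (2, 'Bangalore');
INSERT INTO fact_sales  VALUES (1, 1, 3, 30), (2, 1, 1, 150),
                               (1, 2, 5, 50), (2, 2, 2, 300);
""")

# The same facts aggregated on two different bases.
sales_by_city = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales f JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.city ORDER BY s.city
""").fetchall()

sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name ORDER BY p.name
""").fetchall()

print(sales_by_city)
print(sales_by_product)
```

Both queries read the same central fact table and differ only in which dimension table they join to, which is what gives the schema its star shape.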

13. What do you mean by partition of data? Explain briefly.

Partitioning of data :-

In most ware houses, the size of the fact data tables tends to become very large. This leads to several problems of management, backup, processing etc. These difficulties can be overcome by partitioning each fact table into separate partitions.

Data ware houses tend to exploit these ideas by partitioning the large volume of data into data sets. For example, data can be partitioned on a weekly / monthly basis, so as to minimize the amount of data scanned before answering a query. This technique minimizes the data to be scanned without the overhead of using an index, which improves the overall efficiency of the system. However, having too many partitions can be counter productive, so choosing an optimal size and number of partitions is of vital importance.

Partitioning generally helps in the following ways.
i) It assists in better management of data.
ii) It eases backup / recovery, since the volumes are less.
iii) Star schemas with partitions produce better performance.
iv) Since several hardware architectures operate better in a partitioned environment, the overall system performance improves.
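The effect of monthly partitioning on the amount of data scanned can be sketched as follows. The fact rows and helper names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows: (ISO date, amount).
facts = [
    ("2024-01-05", 30), ("2024-01-20", 150),
    ("2024-02-03", 50), ("2024-02-14", 300), ("2024-03-01", 75),
]


def partition_by_month(rows):
    """Group the fact rows into one partition per 'YYYY-MM' month."""
    partitions = defaultdict(list)
    for day, amount in rows:
        partitions[day[:7]].append((day, amount))
    return dict(partitions)


def monthly_total(partitions, month):
    """Answer a monthly query by scanning only that month's partition."""
    rows = partitions.get(month, [])
    return sum(amount for _, amount in rows), len(rows)


partitions = partition_by_month(facts)
feb_total, rows_scanned = monthly_total(partitions, "2024-02")
print(feb_total, rows_scanned, "of", len(facts), "rows scanned")
```

The February query touches two rows instead of all five, and no index was needed to find them.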

14. Describe the terms data mart and Meta data.Data mart :-

A data mart is a subset of the information content of a data ware house, stored in its own data base. The data of a data mart may have been collected through a ware house or, in some cases, directly from the source. In a crude sense, if you consider a data ware house as a wholesale shop of data, a data mart can be thought of as a retailer.

Meta data :-

Meta data is simply data about data. Data normally describe the objects, their quantity, size, how they are stored etc. Similarly, meta data stores data about how data (of objects) is stored, etc.

Meta data is useful in a number of ways. It can map data sources to the common view of information within the warehouse. It is helpful in query management, for example to direct a query to the most appropriate source.

The structure of meta data is different for each process. It means for each volume of data, there are multiple sets of meta data describing the same volume. While this is a very convenient way of managing data, managing meta data itself is not a very easy task.

15. Enlist the differences between fact and dimension.

This ensures that key dimensions are not treated as fact tables.

Consider the following example :-

Let us elaborate a little on the example. Consider a customer A. If there is a situation where the warehouse is building the profiles of customers, then A becomes a fact: against the name A, we can list his address, purchases, debts etc. One can ask questions like how many purchases A has made in the last 3 months. Then A is a fact. On the other hand, if the data is likely to be used to answer questions like how many customers have made more than 10 purchases in the last 6 months, and one uses the data of A, as well as of other customers, to give the answer, then A becomes a dimension. The rule is, in such cases, to avoid making A a candidate key.

16. Explain the designing of star-flake schema in detail.

A star flake schema, as we have defined previously, is a schema that uses a combination of denormalised star and normalized snow flake schemas. They are most appropriate in decision support data ware houses. Generally, the detailed transactions are stored within a central fact table, which may be partitioned horizontally or vertically. A series of combinatory data base views are created to allow the user access tools to treat the fact table partitions as a single, large table.

The key reference data is structured into a set of dimensions. These can be referenced from the fact table. Each dimension is stored in a series of normalized tables (snow flakes), with an additional denormalised star dimension table.

17. What is query redirection? Explain.

Query Redirection :-

One of the basic requirements for successful operation of a star flake schema (or any schema, for that matter) is the ability to direct a query to the most appropriate source. Note that once the available data grows beyond a certain size, partitioning becomes essential. In such a scenario, in order to optimize the time spent on querying, the queries should be directed to the appropriate partitions that store the data required by the query.

The basic method is to design the access tool in such a way that it automatically determines the location to which the query is to be redirected.
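One way an access tool might work out where to send a query is sketched below, assuming (purely for illustration) that the fact data is partitioned by month and that each query carries a date range. The function names and partition keys are hypothetical.

```python
from datetime import date


def months_between(start, end):
    """Yield the 'YYYY-MM' key of every month touched by the range [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month == 13:
            year, month = year + 1, 1


def redirect(start, end, available_partitions):
    """Return the partitions to which a query over [start, end] should be sent."""
    return [m for m in months_between(start, end) if m in available_partitions]


available = {"2023-11", "2023-12", "2024-01", "2024-02"}
targets = redirect(date(2023, 12, 15), date(2024, 1, 10), available)
print(targets)
```

The user only states the date range; the tool decides that two of the four partitions need to be consulted.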

18. In detail, explain the multidimensional schema.

Multidimensional schemas :-

Before we close, we look at the interesting concept of multiple dimensions. This is a very convenient method of analyzing data when it goes beyond the normal tabular relations.

For example, a store maintains a table of the sales of each item over a month in each of its 10 outlets.

This is a 2 dimensional table. On the other hand, if the company wants the data of all items sold by its outlets, it can be obtained simply by superimposing the 2 dimensional tables for each of these items one behind the other. Then it becomes a 3 dimensional view.

Then the query, instead of looking for a 2 dimensional rectangle of data, will look for a 3 dimensional cuboid of data.

There is no reason why the dimensioning should stop at 3 dimensions. In fact almost all queries can be thought of as approaching a multi-dimensioned unit of data from a multidimensioned volume of the schema.
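A small sketch of this idea: the cube below is keyed by three dimensions (item, outlet, day), and a query fixes some dimensions while leaving the others free. The data and the `slice_cube` helper are assumptions for illustration; adding a fourth dimension would only mean a longer key.

```python
# Hypothetical sales cube keyed by (item, outlet, day) -> units sold.
cube = {
    ("pen", "outlet1", 1): 3, ("pen", "outlet1", 2): 4,
    ("pen", "outlet2", 1): 5, ("book", "outlet1", 1): 1,
    ("book", "outlet2", 2): 2,
}


def slice_cube(cube, item=None, outlet=None, day=None):
    """Keep the cells that match every dimension that was specified."""
    wanted = (item, outlet, day)
    return {
        key: units
        for key, units in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


pen_table = slice_cube(cube, item="pen")            # a 2 dimensional table
day1_total = sum(slice_cube(cube, day=1).values())  # another cut of the cuboid
print(pen_table)
print(day1_total)
```

Fixing the item gives back the original 2 dimensional outlet-by-day table for that item, while fixing the day cuts across all items and outlets.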

19. Why partitioning is needed in large data warehouse?

Partitioning is needed in any large data ware house to ensure that performance and manageability are improved. It also helps query redirection to send the queries to the appropriate partition, thereby reducing the overall time taken for query processing.

20. Explain the types of partitioning in detail.

i). Horizontal partitioning :-
This essentially means that the table is partitioned after the first few thousand entries, then after the next few thousand entries, and so on. This is because in most cases, not all the information in the fact table is needed all the time. Thus horizontal partitioning helps to reduce the query access time by directly cutting down the amount of data to be scanned by the queries.

ii). Vertical partitioning :-
As the name suggests, a vertical partitioning scheme divides the table vertically, i.e. each row is divided into 2 or more partitions.

iii). Hardware partitioning :-
Needless to say, the data ware house design process should try to maximize the performance of the system. One of the ways to ensure this is to optimize by designing the data base with respect to the specific hardware architecture.

21. Explain the mechanism of row splitting.

Row Splitting :-

The method involves identifying the less frequently used fields and putting them into another table. This ensures that the frequently used fields can be accessed with much less computation time.

It can be noted that row splitting may not reduce or increase the overall storage needed, whereas normalization may involve a change in the overall storage space needed. In row splitting, the mapping is 1 to 1, whereas normalization may produce one to many relationships.
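Row splitting can be sketched as follows. The customer table, the choice of which fields count as frequently used, and the helper names are all assumptions for illustration.

```python
# Hypothetical customer table; "fax" and "notes" are assumed to be rarely used.
customers = [
    {"id": 1, "name": "A", "city": "Mysore", "fax": "n/a", "notes": "old account"},
    {"id": 2, "name": "B", "city": "Hubli", "fax": "0836-1", "notes": ""},
]
HOT_FIELDS = ("id", "name", "city")  # assumed to be frequently used


def split_rows(rows, key, hot_fields):
    """Split each row into a frequently used part and a rarely used part."""
    hot = [{f: row[f] for f in hot_fields} for row in rows]
    cold = [
        {f: v for f, v in row.items() if f == key or f not in hot_fields}
        for row in rows
    ]
    return hot, cold


def rejoin(hot, cold, key):
    """Rebuild the full rows; each hot row matches exactly one cold row."""
    cold_by_key = {row[key]: row for row in cold}
    return [{**h, **cold_by_key[h[key]]} for h in hot]


hot, cold = split_rows(customers, "id", HOT_FIELDS)
print(hot)
print(cold)
```

Both tables have the same number of rows and share the key, which is the 1 to 1 mapping mentioned above; queries that need only the name and city never read the fax or notes columns.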

22. Explain the guidelines used for hardware partitioning.

Guidelines used for hardware partitioning :-

Needless to say, the data ware house design process should try to maximize the performance of the system. One of the ways to ensure this is to optimize by designing the data base with respect to the specific hardware architecture. Obviously, the exact details of optimization depend on the hardware platform. Normally the following guidelines are useful:-
i). Maximize the processing, disk and I/O operations.
ii). Reduce bottlenecks at the CPU and I/O.

23. What is aggregation? Explain the need of aggregation. Give example.

Aggregation : Data aggregation is an essential component of any decision support data ware house. It helps us to ensure cost effective query performance, which in other words means that the costs incurred to get the answers to a query would be more than offset by the benefits of the query answer. Data aggregation attempts to do this by reducing the processing power needed to process the queries. However, too many aggregations would only lead to unacceptable levels of operational costs, while too few aggregations may not improve the performance to the required levels. A fine balancing of the two is essential to maintain the requirements stated above. One rule of thumb that is often suggested is that about three out of every four queries would be optimized by the aggregation process, whereas the fourth will take its own time to get processed.

The second, though minor, advantage of aggregations is that they allow us to see the overall trends in the data. While looking at individual data such overall trends may not be obvious, whereas aggregated data will help us draw certain conclusions easily.
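As an example, the sketch below precomputes a summary of sales per month and item once, so that later queries read the small summary instead of scanning every transaction. The sales rows and function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical detailed sales: (ISO date, item, amount).
sales = [
    ("2024-01-05", "pen", 30), ("2024-01-09", "pen", 20),
    ("2024-01-20", "book", 150), ("2024-02-03", "pen", 50),
    ("2024-02-14", "book", 300),
]


def build_summary(rows):
    """Aggregate the detailed rows into totals per (month, item)."""
    summary = defaultdict(int)
    for day, item, amount in rows:
        summary[(day[:7], item)] += amount
    return dict(summary)


summary = build_summary(sales)  # computed once, e.g. after each load


def monthly_item_total(month, item):
    """Answer the query from the summary without touching the detail rows."""
    return summary.get((month, item), 0)


print(monthly_item_total("2024-01", "pen"))
print(len(summary), "summary rows instead of", len(sales), "detail rows")
```

The trade-off described above shows up directly: every extra summary of this kind speeds up some queries but has to be rebuilt and stored whenever new data is loaded.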

24. Explain the different aspects for designing the summary table.

Summary tables are designed by following the steps given below :-
i). Decide the dimensions along which aggregation is to be done.
ii). Determine the aggregation of multiple facts.
iii). Aggregate multiple facts into the summary table.
iv). Determine the level of aggregation and the extent of embedding.
v). Design time into the table.
vi). Index the summary table.

25. Give the reasons for creating the data mart.

The following are the reasons for which data marts are created :-
i). Since the volume of data scanned is small, they speed up the query processing.
ii). Data can be structured in a form suitable for a user access tool.
iii). Data can be segmented or partitioned so that they can be used on different platforms, and different control strategies become applicable.

26. Explain the two stages in setting up data marts.

There are two stages in setting up data marts :-
i). To decide whether data marts are needed at all. The above listed facts may help you to decide whether it is worthwhile to set up data marts or to operate from the warehouse itself. The problem is almost similar to that of a merchant deciding whether he wants to set up retail shops or not.
ii). If you decide that setting up data marts is desirable, then the following steps have to be gone through before you can freeze on the actual strategy of data marting.
a) Identify the natural functional splits of the organization.
b) Identify the natural splits of data.
c) Check whether the proposed access tools have any special data base structures.
d) Identify the infrastructure issues, if any, that can help in identifying the data marts.
e) Look for restrictions on access control. They can serve to demarcate the warehouse details.

27. What are disadvantages of data mart?

There are certain disadvantages :-
i). The cost of setting up and operating data marts is quite high.
ii). Once a data strategy is put in place, the data mart formats become fixed. It may be fairly difficult to change the strategy later, because the data mart formats also have to be changed.

28. What is role of access control issue in data mart design?

Role of access control issue in data mart design :-

This is one of the major constraints in data mart designs. Any data warehouse, with its huge volume of data, is more often than not subject to various access controls as to who can access which part of the data. The easiest case is where the data is partitioned so clearly that a user of each partition cannot access any other data. In such cases, each of these partitions can be put in a data mart, and the user of each can access only his data. In the data ware house, the data pertaining to all these marts are stored, but the partitions are retained. If a super user wants to get an overall view of the data, suitable aggregations can be generated.

29. Explain the purpose of using metadata in detail.Metadata will be used for the following purposes :-i). data transformation and loading.ii). data management.iii). query generation.

30. Explain the concept of metadata management.

Meta data should be able to describe data as it resides in the data warehouse. This will help the warehouse manager to control data movements. The purpose of the metadata is to describe the objects in the database. Some of the descriptions are listed here.

Tables
- Columns
* Names
* Types

Indexes
- Columns
* Name
* Type

Views
- Columns
* Name
* Type

Constraints
- Name
- Type
- Table
* Columns

31. How the query manager uses the Meta data? Explain in detail.

Meta data is also required to generate queries. The query manager uses the metadata to build a history of all queries run and to generate a query profile for each user, or group of users. We simply list a few of the commonly used meta data for the query. The names are self explanatory.

o Query
o Table accessed: column accessed, name, reference identifier
o Restrictions applied: column name, table name, reference identifier, restrictions
o Join criteria applied: column name, table name, reference identifier; column name, table name, reference identifier
o Aggregate function used: column name, reference identifier, aggregate function
o Group by criteria: column name, reference identifier, sort direction
o Syntax
o Resources
o Disk: read, write, temporary
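A minimal sketch of a query history and per-user profile is given below. It records only a few of the items listed above (tables accessed, aggregate function used, rows read), and the function names, user names and figures are assumptions made up for the example.

```python
from collections import Counter

query_history = []  # metadata about every query run so far


def record_query(user, tables, aggregate=None, rows_read=0):
    """Store a few pieces of metadata about one query."""
    query_history.append({
        "user": user,
        "tables": tuple(tables),
        "aggregate": aggregate,
        "rows_read": rows_read,
    })


def query_profile(user):
    """Summarize the recorded history for one user."""
    own = [q for q in query_history if q["user"] == user]
    tables = Counter(t for q in own for t in q["tables"])
    return {
        "queries": len(own),
        "tables_accessed": dict(tables),
        "rows_read": sum(q["rows_read"] for q in own),
    }


record_query("anita", ["fact_sales", "dim_store"], aggregate="SUM", rows_read=400)
record_query("anita", ["fact_sales"], rows_read=120)
record_query("john", ["dim_product"], rows_read=10)

profile = query_profile("anita")
print(profile)
```

A profile like this tells the warehouse administrators which tables a user leans on most, which can guide decisions about new aggregations or partitions.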

32. Why we need different managers to a data ware house? Explain.

Need for managers to a data ware house :-

Data warehouses are not just large databases. They are complex environments that integrate many technologies. They are not static, but will be continuously changing both in content and in structure. Thus, there is a constant need for maintenance and management. Since huge amounts of time, money and effort are involved in the development of data warehouses, sophisticated management tools are always justified in the case of data warehouses.

When computer systems were in their initial stages of development, there used to be an army of human managers, who went around doing all the administration and management. But such a scheme became both unwieldy and prone to errors as the systems grew in size and complexity. Further, most of the management principles were adhoc in nature and were subject to human errors and fatigue.

33. With neat diagram explain the boundaries of process managers.A schematic diagram that defines the boundaries of the three types of managers :-

34. Explain the responsibilities of each manager of a data warehouse.
Warehouse Manager :-
The warehouse manager is responsible for maintaining the data of the warehouse. It should also create and maintain a layer of meta data. Some of the responsibilities of the warehouse manager are:
o Data movement
o Meta data management
o Performance monitoring
o Archiving
Data movement includes the transfer of data within the warehouse, aggregation, and the creation and maintenance of tables, indexes and other objects of importance. The warehouse manager should be able to create new aggregations as well as remove old ones. Creating additional rows / columns, keeping track of the aggregation processes and creating meta data are also its functions.

25. What are the different system management tools used for data warehouse?
The different system management tools used for a data warehouse :-
i). Configuration managers
ii). Schedule managers
iii). Event managers
iv). Database managers
v). Backup recovery managers
vi). Resource and performance monitors

KU 5TH SEM ASSIGNMENT - BSIT (TB) - 53 (DATA WAREHOUSING & DATA MINING)
PART - A
I. Note: Answer all the questions.
a) What is Normalization? What are the different forms of Normalization?
The usual approach to normalization in database applications is to ensure that the data is divided into two or more tables, such that updating the data in one of them does not lead to anomalies (the student is advised to refer to any book on database management systems for details, if interested). The idea is to ensure that, when combined, the available data is consistent. In data warehousing, however, one may even break a large table into several denormalized smaller tables. This may use a lot of extra space, but it helps in an indirect way: it avoids the overhead of joining the data during queries.

b) Define Data warehouse. What are the roles of education in a data warehousing delivery process?
Data Warehouse: In its simplest form, a data warehouse is a collection of key pieces of information used to manage and direct the business for the most profitable outcome. It would help decide the amount of inventory to be held, the number of employees to be hired, the amount to be procured on loan, etc. This definition may not be precise, but that is how data warehouse systems are. Different authors give different definitions, but we keep this idea in mind and proceed. A data warehouse is a large collection of data and a set of process managers that use this data to make information available. The data can be meta data, facts, dimensions and aggregations. The process managers can be load managers, warehouse managers or query managers. The information made available allows the end users to make informed decisions.
Roles of education in a data warehousing delivery process :-
Education has two roles to play. The first is to make people, especially top-level policy makers, comfortable with the concept. The second is to aid the prototyping activity. To take care of the education concept, an initial (usually scaled-down) prototype is created and people are encouraged to interact with it. This helps achieve both roles: the users become comfortable with the system, and the warehouse developer becomes aware of the limitations of the prototype, which can then be improved upon.

c) What are process managers? What are the different types of process managers?
Process Managers: These are responsible for the smooth flow, maintenance and upkeep of data into and out of the database.
The main types of process managers are :-
i). Load manager: to take care of source interaction, data transformation and data load.
ii). Warehouse manager: to take care of data movement, meta data management and performance monitoring.
iii). Query manager: to control query scheduling and monitoring.

We shall look into each of them briefly. Before that, we look at a schematic diagram that defines the boundaries of the three types of managers.

d) Give the architectures of data mining systems.

e) What are the guidelines for a KDD environment?
It is customary in the computer industry to formulate rules of thumb that help information technology (IT) specialists apply new developments. In setting up a reliable data mining environment, we may follow these guidelines so that the KDD system works in the manner we desire.
i). Support extremely large data sets
ii). Support hybrid learning
iii). Establish a data warehouse
iv). Introduce data cleaning facilities
v). Facilitate working with dynamic coding
vi). Integrate with decision support system
vii). Choose extendible architecture
viii). Support heterogeneous databases
ix). Introduce client/server architecture
x). Introduce cache optimization

PART - B
II. Answer any FIVE full questions.
1. a) With the help of a diagram explain the architecture of a data warehouse.
The architecture of a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture. It only gives the major items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific requirements of the warehouse and the actual details of implementation.

Before looking into the details of each of the managers, we can get a broad idea of their functionality by mapping the processes that we studied in the previous chapter to the managers. The extracting and loading processes are taken care of by the load manager. The processes of cleanup and transformation of data, as well as backup and archiving, are the duties of the warehouse manager, while the query manager, as the name implies, takes care of query management.

b) Indicate the important functions of a Load Manager and a Warehouse Manager.
Important functions of the Load Manager:
i) To extract data from the source(s).
ii) To load the data into a temporary storage device.
iii) To perform simple transformations to map it to the structures of the data warehouse.

Important functions of the Warehouse Manager:
i) Analyze the data to confirm data consistency and data integrity.
ii) Transform and merge the source data from the temporary data storage into the warehouse.
iii) Create indexes, cross references, partition views, etc.
iv) Check for normalizations.
v) Generate new aggregations, if needed.
vi) Update all existing aggregations.
vii) Create backups of data.
viii) Archive the data that needs to be archived.

2. a) Differentiate between vertical partitioning and horizontal partitioning.
In horizontal partitioning, we simply put the first few thousand entries in one partition, the second few thousand in the next, and so on. This can be done by partitioning by time, wherein all data pertaining to the first month / first year is put in the first partition, the data for the second in the second partition, and so on. The other alternatives are partitioning on different-sized dimensions, partitioning on other dimensions, partitioning on the size of the table, and round-robin partitioning. Each of them has certain advantages as well as disadvantages.
In vertical partitioning, some columns are stored in one partition and certain other columns of the same row in a different partition. This can be achieved either by normalization or by row splitting, each with its own trade-offs.
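The difference between the two schemes can be sketched on a handful of rows. This is only an illustration: the row layout `(order_id, month, customer, amount)` and the function names are assumptions, and a real warehouse would partition inside the database rather than in application code.

```python
from collections import defaultdict

# Hypothetical rows: (order_id, month, customer, amount)
rows = [
    (1, "2024-01", "Asha", 120.0),
    (2, "2024-01", "Ravi", 80.0),
    (3, "2024-02", "Asha", 45.5),
]


def horizontal_partition(rows):
    """Keep rows whole and group them by month (partitioning by time)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[1]].append(row)
    return dict(parts)


def vertical_partition(rows):
    """Split the columns of each row into two partitions sharing the row key."""
    left = [(r[0], r[1]) for r in rows]         # order_id, month
    right = [(r[0], r[2], r[3]) for r in rows]  # order_id, customer, amount
    return left, right


by_month = horizontal_partition(rows)
keys_part, details_part = vertical_partition(rows)
```

Note that the vertical split repeats `order_id` in both partitions so that a full row can be rebuilt by joining them.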

b) What is a schema? Distinguish between facts and dimensions.
A schema, by definition, is a logical arrangement of facts that facilitates ease of storage and retrieval, as described by the end users. The end user is not bothered about the overall arrangement of the data or the fields in it. For example, a sales executive trying to project the sales of a particular item is only interested in the sales details of that item, whereas a tax practitioner looking at the same data will be interested only in the amounts received by the company and the profits made.
The star schema looks like a good solution to the problem of warehousing. It simply states that one should identify the facts and store them in the read-only area, with the dimensions surrounding that area. Whereas the dimensions are liable to change, the facts are not. But given a set of raw data from the sources, how does one identify the facts and the dimensions? It is not always easy, but the following steps can help in that direction.
i) Look for the fundamental transactions in the entire business process. These basic entities are the facts.
ii) Find out the important dimensions that apply to each of these facts. They are the candidates for dimension tables.
iii) Ensure that facts do not include candidates that are actually dimensions with a set of facts attached to them.
iv) Ensure that dimensions do not include candidates that are actually facts.

3. a) What is an event in data warehousing? List any five events.
An event is defined as a measurable, observable occurrence of a defined action. If this definition seems vague, it is because it encompasses a very large set of operations. The event manager is software that continuously monitors the system for the occurrence of events and then takes any action that is suitable (note that the event is a measurable and observable occurrence). The action to be taken is normally specific to the event.
A partial list of the common events that need to be monitored is as follows:
i). Running out of memory space
ii). A process dying
iii). A process using excessive resources
iv). I/O errors
v). Hardware failure
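The monitor-then-act behaviour of an event manager can be sketched as below. This is a toy illustration under assumptions: the thresholds, event names, snapshot layout and action texts are all hypothetical, and a real event manager would read these measurements from the operating system.

```python
# Hypothetical thresholds, for illustration only.
MEMORY_LIMIT_MB = 100
CPU_LIMIT_PERCENT = 90

# Each kind of event maps to an action specific to that event.
ACTIONS = {
    "running_out_of_memory": "free temporary space",
    "process_died": "restart process",
    "excessive_resources": "alert administrator",
}


def detect_events(snapshot):
    """Return the events observed in one measured system snapshot."""
    events = []
    if snapshot["free_memory_mb"] < MEMORY_LIMIT_MB:
        events.append("running_out_of_memory")
    for name, proc in snapshot["processes"].items():
        if not proc["alive"]:
            events.append(f"process_died:{name}")
        elif proc["cpu_percent"] > CPU_LIMIT_PERCENT:
            events.append(f"excessive_resources:{name}")
    return events


def actions_for(events):
    """Look up the action for each event by its kind (text before ':')."""
    return [ACTIONS[e.split(":")[0]] for e in events]


snapshot = {
    "free_memory_mb": 50,
    "processes": {
        "loader": {"alive": False, "cpu_percent": 0},
        "query": {"alive": True, "cpu_percent": 95},
    },
}
events = detect_events(snapshot)
planned = actions_for(events)
```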

b) What is a summary table? Describe the aspects to be looked into while designing a summary table.
The main purpose of using summary tables is to cut down the time taken to execute a specific query. The main methodology involves minimizing the volume of data scanned each time the query is to be answered. In other words, partial answers to the query are made available in advance. For example, in the above-cited example of the mobile market, suppose one expects that citizens (i) above 18 years of age, (ii) with salaries greater than 15,000 and (iii) with professions that involve traveling are the potential customers. Then, every time the query is processed (maybe every month or every quarter), one has to look at the entire database to compute these values and then combine them suitably to get the relevant answers. The other method is to prepare summary tables that hold the values pertaining to each of these sub-queries beforehand, and then combine them as and when the query is raised.
Summary tables are designed by following the steps given below:
i) Decide the dimensions along which aggregation is to be done.
ii) Determine the aggregation of multiple facts.
iii) Aggregate multiple facts into the summary table.
iv) Determine the level of aggregation and the extent of embedding.
v) Design time into the table.
vi) Index the summary table.
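The mobile-market example can be sketched as a small pre-aggregation step. This is an illustration only: the sample rows, the choice of `region` as the aggregation dimension and the function name `build_summary` are assumptions, not part of the original example.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, age, salary, travels_for_work)
citizens = [
    ("north", 25, 20000, True),
    ("north", 17, 0, False),
    ("south", 40, 30000, True),
    ("south", 35, 12000, True),
    ("north", 30, 18000, False),
]


def build_summary(rows):
    """Pre-aggregate, per region, the counts needed by each sub-query."""
    summary = defaultdict(
        lambda: {"adults": 0, "high_salary": 0, "travelers": 0, "all_three": 0}
    )
    for region, age, salary, travels in rows:
        s = summary[region]
        adult, rich = age > 18, salary > 15000
        s["adults"] += adult
        s["high_salary"] += rich
        s["travelers"] += travels
        s["all_three"] += adult and rich and travels
    return dict(summary)


# The summary is built once; later queries read it instead of the detail rows.
summary = build_summary(citizens)
potential_customers = sum(s["all_three"] for s in summary.values())
```

Once the summary exists, the periodic query touches one small row per region rather than every citizen record.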

4. a) List the significant issues in automatic cluster detection.
Most of the issues related to automatic cluster detection are connected to the kinds of questions we want answered in the data mining project, or to the data preparation needed for their successful application.
i). Distance measure
Most clustering techniques use the Euclidean distance formula as the distance measure (the square root of the sum of the squares of the distances along each attribute axis). Non-numeric variables must be transformed and scaled before clustering can take place. Depending on these transformations, the categorical variables may dominate the clustering results or may even be completely ignored.
ii). Choice of the right number of clusters
If the number of clusters k in the K-means method is not chosen to match the natural structure of the data, the results will not be good. The proper way to alleviate this is to experiment with different values of k. In principle, the best k value will exhibit the smallest intra-cluster distances and the largest inter-cluster distances.
iii). Cluster interpretation
Once the clusters are discovered, they have to be interpreted in order to have some value for the data mining project.
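The distance measure and the intra-cluster / inter-cluster comparison can be sketched as follows. This is a minimal illustration, not a K-means implementation: the points are hypothetical and are assumed to have already been assigned to k = 2 clusters, and the helper names are made up for this sketch.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared differences along each attribute."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Mean of the points along each attribute axis."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def mean_intra_distance(cluster):
    """Average distance from each point to its own cluster centroid."""
    c = centroid(cluster)
    return sum(euclidean(p, c) for p in cluster) / len(cluster)


# Hypothetical two-attribute data, already split into k = 2 clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(10.0, 0.0), (10.0, 2.0)]

intra_a = mean_intra_distance(cluster_a)
inter_ab = euclidean(centroid(cluster_a), centroid(cluster_b))
```

When experimenting with different values of k, one would repeat these two measurements for each candidate clustering and prefer the one with small intra-cluster and large inter-cluster distances.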

b) Define data marting. List the reasons for data marting.
A data mart stores a subset of the data available in the warehouse, so that one need not always scan through the entire content of the warehouse. It is similar to a retail outlet. A data mart speeds up queries, since the volume of data to be scanned is much smaller. It also helps to have tailor-made processes for different access tools, to impose control strategies, etc.
Following are the reasons for which data marts are created:
i) Since the volume of data scanned is small, they speed up query processing.
ii) Data can be structured in a form suitable for a user access tool.
iii) Data can be segmented or partitioned so that it can be used on different platforms, and so that different control strategies become applicable.

5. a) Explain how to categorize data mining systems.
There are many data mining systems available or being developed. Some are specialized systems dedicated to a given data source or confined to limited data mining functionalities; others are more versatile and comprehensive. Data mining systems can be categorized according to various criteria. Among other classifications are the following:
a) Classification according to the type of data source mined: this classification categorizes data mining systems according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, World Wide Web, etc.
b) Classification according to the data model drawn on: this classification categorizes data mining systems based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional, etc.
c) Classification according to the kind of knowledge discovered: this classification categorizes data mining systems based on the kind of knowledge discovered or data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.
d) Classification according to mining techniques used: data mining systems employ and provide different techniques. This classification categorizes data mining systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc.

b) List and explain the different kinds of data that can be mined.
The different kinds of data that can be mined are listed below :-
i). Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level.
ii). Relational Databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships.
iii). Data Warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema.
iv). Multimedia Databases: Multimedia databases include video, images, audio and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system.
v). Spatial Databases: Spatial databases are databases that, in addition to the usual data, store geographical information like maps, and global or regional positioning.
vi). Time-Series Databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes creates the need for challenging real-time analysis.
vii). World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily.

6. a) Give the syntax for task-relevant data specification.
Syntax for task-relevant data specification :-
The first step in defining a data mining task is the specification of the task-relevant data, that is, the data on which mining is to be performed. This involves specifying the database and tables or data warehouse containing the relevant data, the conditions for selecting the relevant data, the relevant attributes or dimensions for exploration, and instructions regarding the ordering or grouping of the data retrieved. DMQL provides clauses for the specification of such information, as follows :-
i). use database (database_name) or use data warehouse (data_warehouse_name): The use clause directs the mining task to the database or data warehouse specified.
ii). from (relation(s)/cube(s)) [where (condition)]: The from and where clauses respectively specify the database tables or data cubes involved, and the conditions defining the data to be retrieved.
iii). in relevance to (attribute_or_dimension_list): This clause lists the attributes or dimensions for exploration.
iv). order by (order_list): The order by clause specifies the sorting order of the task-relevant data.
v). group by (grouping_list): The group by clause specifies criteria for grouping the data.
vi). having (conditions): The having clause specifies the condition by which groups of data are considered relevant.
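To see how the clauses fit together, the sketch below assembles them into one specification string. It is only an illustration: the helper `task_relevant_spec`, the database name `shop_db` and the relation and attribute names are hypothetical, and the clauses are simply emitted in the order in which they are listed above, which is an assumption rather than a rule of the language.

```python
def task_relevant_spec(database, relations, attributes, where=None,
                       order_by=None, group_by=None, having=None):
    """Assemble the task-relevant data clauses into one specification string."""
    lines = [
        f"use database {database}",
        "from " + ", ".join(relations) + (f" where {where}" if where else ""),
        "in relevance to " + ", ".join(attributes),
    ]
    # The remaining clauses are optional and are added only when supplied.
    if order_by:
        lines.append("order by " + ", ".join(order_by))
    if group_by:
        lines.append("group by " + ", ".join(group_by))
    if having:
        lines.append(f"having {having}")
    return "\n".join(lines)


# Hypothetical database, relations and attributes, for illustration only.
spec = task_relevant_spec(
    database="shop_db",
    relations=["customer", "purchase"],
    attributes=["age", "income", "amount"],
    where="amount > 100",
    group_by=["age"],
)
```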

b) Explain the designing of a GUI based on a data mining query language.
A data mining query language provides necessary primitives that allow users to communicate with data mining systems. But novice users may find a data mining query language difficult to use and its syntax difficult to remember. Instead, users may prefer to communicate with data mining systems through a graphical user interface (GUI). In relational database technology, SQL serves as a standard core language for relational systems, on top of which GUIs can easily be designed. Similarly, a data mining query language may serve as a core language for data mining system implementations, providing a basis for the development of GUIs for effective data mining.
A data mining GUI may consist of the following functional components :-
a) Data collection and data mining query composition - This component allows the user to specify task-relevant data sets and to compose data mining queries. It is similar to GUIs used for the specification of relational queries.
b) Presentation of discovered patterns - This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves and other visualization techniques.
c) Hierarchy specification and manipulation - This component allows for concept hierarchy specification, either manually by the user or automatically. In addition, this component should allow concept hierarchies to be modified by the user or adjusted automatically based on a given data set distribution.
d) Manipulation of data mining primitives - This component may allow the dynamic adjustment of data mining thresholds, as well as the selection, display and modification of concept hierarchies. It may also allow the modification of previous data mining queries or conditions.
e) Interactive multilevel mining - This component should allow roll-up or drill-down operations on discovered patterns.
f) Other miscellaneous information - This component may include on-line help manuals, indexed search, debugging and other interactive graphical facilities.

7. a) Explain how decision trees are useful in data mining.
Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to their simplicity and to the fact that decision trees represent rules. Rules can readily be expressed in a form that humans can understand, or in a database access language like SQL, so that records falling into a particular category may be retrieved.
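The claim that a decision tree represents rules can be made concrete with a small hand-built tree. This is a sketch under assumptions: the tree, its attributes and thresholds, and the class labels are hypothetical and were not learned from data; the rule strings merely resemble SQL `WHERE` conditions.

```python
# Hypothetical hand-built tree: internal nodes test one attribute,
# leaves carry the predicted class.
tree = {
    "attribute": "income",
    "threshold": 15000,
    "low": {"label": "not_customer"},
    "high": {
        "attribute": "age",
        "threshold": 18,
        "low": {"label": "not_customer"},
        "high": {"label": "customer"},
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following the test at each node."""
    while "label" not in node:
        above = record[node["attribute"]] > node["threshold"]
        node = node["high" if above else "low"]
    return node["label"]


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into an SQL-style condition and a class."""
    if "label" in node:
        return [(" AND ".join(conditions) or "TRUE", node["label"])]
    attr, thr = node["attribute"], node["threshold"]
    return (to_rules(node["low"], conditions + (f"{attr} <= {thr}",))
            + to_rules(node["high"], conditions + (f"{attr} > {thr}",)))


rules = to_rules(tree)
prediction = classify(tree, {"income": 20000, "age": 30})
```

Each rule condition could be pasted after `WHERE` in a query to retrieve the records that fall into that category.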

b) Identify an application and also explain the techniques that can be incorporated in solving the problem using data mining techniques.Write yourself...

8. Write short notes on:
i) Data Mining Querying Language
ii) Schedule Manager
iii) Data Formatting.
i) Data Mining Querying Language
A data mining language helps in effective knowledge discovery from data mining systems. Designing a comprehensive data mining language is challenging because data mining covers a wide spectrum of tasks, from data characterization to mining association rules, data classification and evolution analysis. Each task has different requirements. The design of an effective data mining query language requires a deep understanding of the power, limitations and underlying mechanisms of the various kinds of data mining tasks.
ii) Schedule manager
Scheduling is the key to successful warehouse management. Almost all operations in the warehouse need some type of scheduling. Every operating system has its own scheduler and batch control mechanism, but these schedulers may not be capable of fully meeting the requirements of a data warehouse. Hence it is more desirable to have specially designed schedulers to manage the operations.
iii) Data formatting
This is the final data preparation step. It consists of syntactic modifications to the data that do not change its meaning, but are required by the particular modelling tool chosen for the DM task. These include:
a). Reordering of the attributes or records: some modelling tools require reordering of the attributes (or records) in the dataset, such as putting the target attribute at the beginning or at the end, or randomizing the order of records (required by neural networks, for example).
b). Changes related to the constraints of modelling tools: removing commas or tabs and special characters, trimming strings to the maximum allowed number of characters, or replacing special characters with an allowed set of special characters.
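The data formatting steps described above can be sketched on a single record. This is only an illustration: the helper names, the sample record and the choice to replace commas and tabs with spaces are assumptions, since the required changes depend on the modelling tool actually used.

```python
import random


def move_target_last(record, target):
    """Reorder attributes so the target attribute comes last."""
    ordered = {k: v for k, v in record.items() if k != target}
    ordered[target] = record[target]
    return ordered


def clean_string(value, max_len):
    """Replace commas and tabs with spaces, then trim to max_len characters."""
    return value.replace(",", " ").replace("\t", " ")[:max_len]


def shuffle_records(records, seed):
    """Return the records in a random order without modifying the input."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical record whose target attribute is "label".
record = {"label": "yes", "name": "Rao,\tK", "age": 31}
formatted = move_target_last(record, "label")
formatted["name"] = clean_string(formatted["name"], max_len=6)
```

None of these steps changes what the record means; they only change how it is laid out for the tool.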

Software Quality and Testing
Assignment: TA (Compulsory)
1. What is software testing? Software testing is tougher than hardware testing, justify your answer.
Ans:- Software testing is the process of executing a program with the intent of finding errors. It is used to ensure the correctness of a software product. Software testing is also done to add value to software so that its quality and reliability are raised.
Software testing is a critical element of software quality assurance and represents the ultimate process to ensure the correctness of the product. The quality of a product always enhances the customer's confidence in using it, thereby improving the business economics. In other words, a good-quality product means zero defects, which is derived from a better quality process in testing.
Testing the product means adding value to it, which means raising the quality or reliability of the program. Raising the reliability of the product means finding and removing errors. Hence one should not test a product to show that it works; rather, one should start with the assumption that the program contains errors and then test the program to find as many errors as possible.
2. Explain the test information flow in a typical software test life cycle.
Ans:- Testing is a complex process and requires effort similar to software development. A typical test information flow is shown in the figure.

[Figure: test information flow; surviving label: Predicted Reliability]
Software Configuration includes a Software Requirements Specification, a Design Specification, and source code. A test configuration includes a Test Plan and Procedures, test cases, and testing tools. It is difficult to predict the time needed to debug the code, hence it is difficult to schedule.
Once the right software is available for testing, a proper test plan and test cases are developed. Then the software is tested with simulated test data. After the test execution, the test results are examined. The software may have defects, or it may pass without any defect. Software with defects is subjected to debugging and tested again for correctness. This process continues until the testing reports zero defects or the time for testing runs out.
3. What is risk in software testing? How does risk management improve the quality of the software?
Ans:- The risk associated
