
LINUX PROGRAMMING AND DATA MINING LAB MANUAL

IV-B.TECH

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VIDYA VIKAS INSTITUTE OF TECHNOLOGY

Contents

Week 1
1. Write a shell script that accepts a file name and starting and ending line numbers as arguments and displays all the lines between the given line numbers.
2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.
3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.
4. Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

Week 2
5. Write a shell script that accepts a list of file names as its arguments, and counts and reports the occurrence of each word that is present in the first argument file in the other argument files.
6. Write a shell script to list all of the directory files in a directory.
7. Write a shell script to find the factorial of a given integer.

Week 3
8. Write an awk script to count the number of lines in a file that do not contain vowels.
9. Write an awk script to find the number of characters, words and lines in a file.
10. Write a C program that makes a copy of a file using standard I/O and system calls.

Week 4
11. Implement in C the following UNIX commands using system calls: (a) cat (b) ls (c) mv.
12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file: (a) file type, (b) number of links, (c) time of last access, (d) read, write and execute permissions.

Week 5
13. Write a C program to emulate the UNIX ls -l command.
14. Write a C program to list, for every file in a directory, its inode number and file name.
15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Week 6
16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.
17. Write a C program to create a zombie process.
18. Write a C program that illustrates how an orphan is created.

Week 7
19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort
20. Write C programs that illustrate communication between two unrelated processes using a named pipe.
21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.
22. Write a C program that receives the messages (from the message queue specified in (21)) and displays them.

Week 8
23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using (a) semaphores (b) flock or lockf system calls.
24. Write a C program that illustrates suspending and resuming processes using signals.

Week 9
25. Write a C program that implements a producer-consumer system with two processes (using semaphores).
26. Write client and server programs (in C) for interaction between server and client processes using Unix domain sockets.

Week 10
27. Write client and server programs (in C) for interaction between server and client processes using Internet domain sockets.
28. Write a C program that illustrates two processes communicating using shared memory.

Data Mining
13. Listing of categorical attributes and the real-valued attributes separately.
14. Rules for identifying attributes.
15. Training a decision tree.
16. Test on classification of decision tree.
17. Testing on the training set.
18. Using cross-validation for training.
19. Significance of attributes in decision tree.
20. Trying generation of decision tree with various numbers of decision trees.
21. Find out differences in results using decision tree and cross-validation on a data set.
22. Decision trees.
23. Reduced error pruning for training decision trees using cross-validation.
24. Convert a decision tree into if-then-else rules.

Week 1

1. Write a shell script that accepts a file name and starting and ending line numbers as arguments and displays all the lines between the given line numbers.

Aim: To write a shell script that accepts a file name and starting and ending line numbers as arguments and displays all the lines between the given line numbers.

Script:
# usage: sh ex1.sh <file> <start-line> <end-line>
awk -v start=$2 -v end=$3 'NR >= start && NR <= end { print $0 }' $1

Input (lines.dat):
line1
line2
line3
line4
line5

Output ($ sh ex1.sh lines.dat 2 4):
line2
line3
line4

2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

Aim: To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

Script:
clear
if [ $# -eq 0 ]
then
    echo "no argument passed"
    exit 1
fi
echo "enter the word to be deleted"
read word
for file in "$@"
do
    # grep -v keeps only the lines that do not contain the word;
    # write to a temporary file first, then replace the original
    grep -v "$word" "$file" > tmp$$ && mv tmp$$ "$file"
done

Output:
$ sh ex2.sh test1
the contents before deleting (test1):
hello hello
bangalore
mysore city
enter the word to be deleted
city
after deleting:
hello hello
bangalore

$ sh ex2.sh
no argument passed

3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.

Aim: To write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.

Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
    cd $dir
    ls > f
    exec < f
    while read line
    do
        if [ -f $line ]
        then
            if [ -r $line -a -w $line -a -x $line ]
            then
                echo "$line has all permissions"
            else
                echo "$line does not have all permissions"
            fi
        fi
    done
fi

4. Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

Aim: To write a shell script that receives any number of file names as arguments and checks if every argument supplied is a file or a directory.

Script:
for x in "$@"
do
    if [ -f $x ]
    then
        echo "$x is a file"
        echo "no. of lines in the file: `wc -l < $x`"
    elif [ -d $x ]
    then
        echo "$x is a directory"
    else
        echo "enter a valid file name or directory name"
    fi
done


Week 2

5. Write a shell script that accepts a list of file names as its arguments, and counts and reports the occurrence of each word that is present in the first argument file in the other argument files.

Aim: To write a shell script that accepts a list of file names as its arguments, and counts and reports the occurrence of each word that is present in the first argument file in the other argument files.

Script:
if [ $# -ne 2 ]
then
    echo "Error: invalid number of arguments"
    exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
    echo "Word = $a, Count = `grep -c \"$a\" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh ex5.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6. Write a shell script to list all of the directory files in a directory.

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
    echo "list of files in the directory"
    ls $dir
else
    echo "enter a proper directory name"
fi

Output:
enter directory name
Atri
list of files in the directory
CSE.txt ECE.txt

7. Write a shell script to find the factorial of a given integer.

Script:
#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    let num--
done
echo "factorial of $n is $fact"

Output:
enter a number
5
factorial of 5 is 120


Week 3

8. Write an awk script to count the number of lines in a file that do not contain vowels.

9. Write an awk script to find the number of characters, words and lines in a file.
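The manual lists exercise 8 but provides no script for it. A minimal sketch, assuming the script is saved as ex8.sh and the file name is passed as the first argument:

```shell
#!/bin/bash
# count the lines of the given file that contain no vowel (either case);
# count+0 prints 0 rather than an empty string when no line matches
awk '!/[aeiouAEIOU]/ { count++ } END { print count+0 }' "$1"
```

Usage: sh ex8.sh test1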

Aim: To write an awk script to find the number of characters, words and lines in a file.

Script:
BEGIN { print "record\t characters\t words" }
# body section: runs once for every input line
{
    len = length($0)
    total_len += len
    print NR "\t" len "\t" NF "\t" $0
    words += NF
}
END {
    print "\ntotal:"
    print "characters\t" total_len
    print "words\t" words
    print "lines\t" NR
}

10 Write a c program that makes a copy of a file using standard IO and system calls

include ltunistdhgt include ltfcntlhgtint main(int argc char argv[])int fd1 fd2char buffer[100]long int n1if(((fd1 = open(argv[1] O_RDONLY)) == -1) ||((fd2 = open(argv[2] O_CREAT|O_WRONLY|O_TRUNC0700)) == -1))perror(file problem )exit(1)while((n1=read(fd1 buffer 100)) gt 0)if(write(fd2 buffer n1) = n1)perror(writing problem )exit(3)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 13

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Case of an error exit from the loopif(n1 == -1)perror(Reading problem )exit(2)close(fd2)exit(0)


Week 4

11. Implement in C the following UNIX commands using system calls: (a) cat (b) ls (c) mv.

AIM: Implement in C the cat Unix command using system calls.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);   /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:
1. Start
2. Open the directory using the opendir() system call.
3. Read the directory using the readdir() system call.
4. Print dp->d_name and dp->d_ino.
5. Repeat the above steps until the end of the directory.
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();
char pathname[MAXPATHLEN];
int file_select(struct direct *entry);

int main()
{
    int count, i;
    struct direct **files;

    if (getwd(pathname) == NULL)
    {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0)
    {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
    return 0;
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return FALSE;
    else
        return TRUE;
}

AIM: Implement in C the Unix command mv using system calls.

Algorithm:
1. Start
2. Open the existing file and create the new file using the open() and creat() system calls.
3. Read the contents of the existing file using the read() system call.
4. Write those contents into the new file using the write() system call.
5. Repeat the above 2 steps until end of file.
6. Close both files using the close() system call.
7. Delete the existing file using the unlink() system call.
8. End

Program:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IRUSR | S_IWUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);   /* copy the contents to the new file */
    close(fd1);
    close(fd2);
    unlink(argv[1]);          /* remove the original file */
    printf("file is moved\n");
    return 0;
}

12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file:
(a) file type (b) number of links (c) time of last access (d) read, write and execute permissions.

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;

    stream = fopen("test", "r");
    if (stream == (FILE *)0)
    {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF)
    {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13. Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating directories.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry from the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* Simulation of the ls command */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

int main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL)
    {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)) != NULL)
        printf("%s\n", d->d_name);
    closedir(p);
    return 0;
}

SAMPLE OUTPUT

Enter directory name: iii
f2

14. Write a C program to list, for every file in a directory, its inode number and file name.

The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {            /* portable directory entry */
    long ino;               /* inode number */
    char name[NAME_MAX+1];  /* name + '\0' terminator */
} Dirent;

typedef struct {  /* minimal DIR: no buffering, etc. */
    int fd;       /* file descriptor for the directory */
    Dirent d;     /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {  /* inode information returned by stat */
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;  /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {             /* directory entry */
    ino_t d_ino;            /* inode number */
    char  d_name[DIRSIZ];   /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure.

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of the file
- Device ID
- User ID of the file
- Group ID of the file
- The file mode information and access privileges for owner, group and others
- File protection flags
- The timestamps for file creation, modification, etc.
- Link counter to determine the number of hard links
- Pointers to the blocks storing the file's contents


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent");
    else
        printf("child");
    return 0;
}

17. Write a C program to create a zombie process.

If a child terminates before its parent has called wait() for it, the terminated child is kept in the process table as a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0)
    {
        printf("child process");
        exit(0);      /* child terminates immediately */
    }
    else
    {
        sleep(100);   /* parent does not wait(); the dead child is a zombie */
        printf("parent process");
    }
    return 0;
}

18. Write a C program that illustrates how an orphan is created.

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0)
    {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile; the child becomes an orphan */
        printf("child prints 2 item\n");
    }
    else
    {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing pipes.

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write data into the pipe. The syntax is as follows:

   write(filedesc[1], ip_string, size);

   filedesc[1] is the write end of the pipe, ip_string is the string to be written into the pipe, and size is the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

   read(filedesc[0], buf, size);

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)      /* parent acts as the client */
    {
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    }
    else                        /* child acts as the server */
    {
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20. Write C programs that illustrate communication between two unrelated processes using a named pipe.

AIM: Implementing IPC using a FIFO (or named pipe).

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode/access mode. The dev value is ignored for a FIFO.

Once a FIFO is created, it must be opened for reading or writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write data into the FIFO. The syntax is as follows:

   write(filedesc[1], ip_string, size);

   filedesc[1] is the write end of the FIFO, ip_string is the string to be written into the FIFO, and size is the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(filedesc[0], buf, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0)
    {
        wfd = open(FIFO1, 1);   /* parent writes on FIFO1 */
        rfd = open(FIFO2, 0);   /* and reads from FIFO2 */
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else
    {
        rfd = open(FIFO1, 0);   /* child reads from FIFO1 */
        wfd = open(FIFO2, 1);   /* and writes on FIFO2 */
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0)
    {
        printf("\ncan't create message queue");
        exit(1);
    }
    printf("\nqueue id: %d", mid);
    mesg.type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0)
    {
        strcpy(mesg.mtext, buff);
        if (msgsnd(mid, &mesg, strlen(mesg.mtext), 0) == -1)
            printf("\nmessage write error");
    }
    if ((mid = msgget(1006, 0)) < 0)
    {
        printf("\ncan't access message queue");
        exit(1);
    }
    while ((n = msgrcv(mid, &mesg, MAX, 6, IPC_NOWAIT)) > 0)
    {
        write(1, mesg.mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\nno messages on queue %d", mid);
    return 0;
}

22. Write a C program that receives the messages (from the message queue created in (21)) and displays them.

Aim: To create a message queue.

DESCRIPTION

Message passing between processes is provided by the operating system through a message queue. Messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

msgflag values:

Numeric   Symbolic      Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used. Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.

flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR; without it, msgrcv returns an error if length is not large enough to receive the message; with it, a message whose data portion is greater than length is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID as cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1  1234L
#define MKEY2  2345L
#define PERMS  0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* union semun must be defined by the caller on Linux */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                        /* initial value 1 = resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the program exits silently on success; on failure perror() reports "ftok", "semget", or "semctl".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a private semaphore set (semaphores start at 0) */
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                          /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op  = -1;     /* wait until the parent signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                         /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op  = 1;      /* signal the waiting process */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec  = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */

...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {   /* internal record of attached segments */
    int   shmid;        /* shmid of attached segment */
    char *shmaddr;      /* attach point */
    int   shmflg;       /* flags used on attach */
} ap[MAXnap];           /* state of current attached segments */
int nap;                /* number of currently attached segments */

...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;

p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the pointer returned by shmat().
6. Read the contents back from shared memory through the same pointer.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((key = ftok(".", 'R')) == -1) {   /* derive a key so both runs agree */
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets, with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk, Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find frequent set Lk−1.

• Join Step:

  o Ck is generated by joining Lk−1 with itself.

• Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where

• Ck: candidate itemset of size k

• Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}

    k ← 2

    while L(k−1) ≠ ∅

        C(k) ← Generate(L(k−1))

        for transactions t ∈ T

            C(t) ← Subset(C(k), t)

            for candidates c ∈ C(t)

                count[c] ← count[c] + 1

        L(k) ← {c ∈ C(k) | count[c] ≥ ε}

        k ← k + 1

    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select trees > J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write: p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn)
  = p(C) p(F1, ..., Fn | C)
  = p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
  = p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
  = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, ..., Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "trees"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options "Use training set"

10) if need select attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?" by evaluating the model on a supplied test set.

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click "Set".

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

The effect can be observed across the different problem solutions encountered while practising.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure that the model's accuracy holds.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validating the training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Under Test options, select "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the save button in the top panel.

13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Under Test options, select "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right click on the result list and select the "visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select, from the attributes list, the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set after changing the cost matrix, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Under Test options, select "Training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Under Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be assessed based on the database and on user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validated training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Tree".

9) Select "NBTree", i.e. the Naive Bayesian tree.

10) Under Test options, select "Use training set".

11) Right click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output.

17) Right click on the result list and select the "visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure:

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given: the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class: relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Contents

S.No / Topic / Page no.

S.No 3: Week 1 (Page 7)

1. Write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.

2. Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

3. Write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.

4. Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

S.No 4: Week 2 (Page 10)

5. Write a shell script that accepts a list of file names as its arguments, counts and reports the occurrence of each word that is present in the first argument file in the other argument files.

6. Write a shell script to list all of the directory files in a directory.

7. Write a shell script to find the factorial of a given integer.

S.No 5: Week 3 (Page 13)

8. Write an awk script to count the number of lines in a file that do not contain vowels.

9. Write an awk script to find the number of characters, words and lines in a file.

10. Write a C program that makes a copy of a file using standard I/O and system calls.

S.No 6: Week 4 (Page 15)

11. Implement in C the following UNIX commands using system calls: A. cat B. ls C. mv

12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file: A. File type B. Number of links C. Time of last access D. Read, Write and Execute permissions

S.No 7: Week 5 (Page 19)

13. Write a C program to emulate the UNIX ls -l command.

14. Write a C program to list, for every file in a directory, its inode number and file name.

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

S.No 8: Week 6 (Page 29)

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

17. Write a C program to create a Zombie process.

18. Write a C program that illustrates how an orphan is created.

S.No 9: Week 7 (Page 31)

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort

20. Write C programs that illustrate communication between two unrelated processes using a named pipe.

21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

22. Write a C program that receives the messages (from the above message queue, as specified in (21)) and displays them.

S.No 10: Week 8 (Page 40)

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls.

24. Write a C program that illustrates suspending and resuming processes using signals.

S.No 11: Week 9 (Page 41)

25. Write a C program that implements a producer-consumer system with two processes (using semaphores).

26. Write client and server programs (in C) for interaction between server and client processes using Unix Domain sockets.

S.No 12: Week 10 (Page 47)

27. Write client and server programs (in C) for interaction between server and client processes using Internet Domain sockets.

28. Write a C program that illustrates two processes communicating using shared memory.

S.No 13: Listing of categorical attributes and the real-valued attributes separately (Page 55)

S.No 14: Rules for identifying attributes (Page 56)

S.No 15: Training a decision tree (Page 59)

S.No 16: Test on classification of decision tree (Page 63)

S.No 17: Testing on the training set (Page 67)

S.No 18: Using cross-validation for training (Page 68)

S.No 19: Significance of attributes in decision tree (Page 71)

S.No 20: Trying generation of decision tree with various numbers of decision trees (Page 74)

S.No 21: Find out differences in results using decision tree and cross-validation on a data set (Page 76)

S.No 22: Decision trees (Page 78)

S.No 23: Reduced error pruning for training Decision Trees using cross-validation (Page 78)

S.No 24: Convert a Decision Tree into if-then-else rules (Page 81)

Week1

1 Write a shell script that accepts a file name starting and ending line numbers as arguments and displays all the lines between the given line numbers

Aim: To write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.

Script:

$ awk 'NR<2 || NR>4 {print $0}' 5lines.dat

Input (5lines.dat):
line1
line2
line3
line4
line5

Output:
line1
line5
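The awk sample above hard-codes the line numbers; a sketch that actually takes the file name and the starting and ending line numbers as arguments (written as a shell function here so the name, display_lines, is illustrative) can use sed:

```shell
# display_lines FILE START END -- print lines START..END of FILE (sketch)
display_lines() {
    sed -n "$2,$3p" "$1"
}
```

Usage: `display_lines 5lines.dat 2 4` prints line2 through line4.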

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Script:

clear
echo "enter the word to be deleted"
read word
for file in "$@"
do
    grep -v "$word" "$file" > temp
    mv temp "$file"
done

Output:

$ sh 1b.sh test1
the contents before deleting test1:
hello hello


bangalore
mysore city
enter the word to be deleted
city
after deleting:
hello hello
bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:

echo "enter the directory name"
read dir
if [ -d $dir ]
then
    cd $dir
    ls > f
    exec < f
    while read line
    do
        if [ -f $line ]
        then
            if [ -r $line -a -w $line -a -x $line ]
            then
                echo "$line has all permissions"
            else
                echo "files not having all permissions"
            fi
        fi
    done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:

for x in "$@"
do
    if [ -f $x ]
    then
        echo "$x is a file"
        echo "no of lines in the file: `wc -l < $x`"
    elif [ -d $x ]
    then
        echo "$x is a directory"
    else
        echo "enter a valid file name or directory name"
    fi
done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:

if [ $# -ne 2 ]
then
    echo "Error : Invalid number of arguments"
    exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
    echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:

$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:

#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
    echo "list of files in the directory"
    ls $dir
else
    echo "enter a proper directory name"
fi

Output:

enter directory name
Atri
list of files in the directory
CSE.txt ECE.txt

7. Write a shell script to find the factorial of a given integer.

Script:

#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    let num--
done
echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120
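An equivalent sketch as an awk one-liner (wrapped in a function here for convenience; the name, factorial, is illustrative), which avoids spawning expr on every iteration:

```shell
# factorial N -- print N! computed inside a single awk process
factorial() {
    awk -v n="$1" 'BEGIN { f = 1; for (i = 2; i <= n; i++) f *= i; print f }'
}
```

Usage: `factorial 5` prints 120.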


Week 3

8. Write an awk script to count the number of lines in a file that do not contain vowels.

9. Write an awk script to find the number of characters, words and lines in a file.

Aim: To write an awk script to find the number of characters, words and lines in a file.

Script:

BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print(NR "\t" len "\t" NF "\t" $0)
    words += NF
}
END {
    print("\n total:")
    print("characters \t" total_len)
    print("words \t" words)
    print("lines \t" NR)
}
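The manual gives no script for program 8; one possible sketch, counting the lines that contain no vowel (upper or lower case), again wrapped in an illustratively named function:

```shell
# count_no_vowel_lines FILE -- print the number of lines with no vowels
count_no_vowel_lines() {
    awk '!/[aeiouAEIOU]/ { n++ } END { print n + 0 }' "$1"
}
```

The `n + 0` in the END block forces 0 to be printed (instead of an empty string) when every line contains a vowel.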

10. Write a C program that makes a copy of a file using standard I/O and system calls.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }
    /* in case of an error, exit from the loop */
    if (n1 == -1) {
        perror("reading problem");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11. Implement in C the following UNIX commands using system calls: A. cat B. ls C. mv

AIM: Implement in C the cat UNIX command using system calls.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFSIZE 1

int main(int argc, char **argv)
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return (0);
}

AIM: Implement in C the ls UNIX command using system calls.

Algorithm:

1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above steps until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i - 1]->d_name);
}

int file_select(struct direct *entry)
{
    /* skip the "." and ".." entries */
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM: Implement in C the UNIX command mv using system calls.

Algorithm:

1. Start
2. Open the existing file, and create the new file, using the open()/creat() system calls
3. Read the contents from the existing file using the read() system call
4. Write those contents into the new file using the write() system call
5. Repeat the above 2 steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    int fd1, fd2;
    int n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR);
    /* copy the contents, then remove the original */
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);
    printf("file is moved\n");
    return (0);
}

12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A. File type B. Number of links C. Time of last access D. Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

main()
{
    FILE *stream;
    int buffer_character;

    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if ((fclose(stream)) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return (0);
}
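The C skeleton above only opens and closes a file; until it is extended with stat(2), the information the aim asks for can be cross-checked from the shell with stat(1). The function name below is illustrative, and the format flags are GNU coreutils specific (on BSD/macOS the stat option syntax differs):

```shell
# report_file_info FILE... -- type, link count, last access time, permissions
report_file_info() {
    for f in "$@"; do
        stat --printf 'name: %n\ntype: %F\nlinks: %h\nlast access: %x\npermissions: %A\n' "$f"
    done
}
```

Usage: `report_file_info test /tmp` prints one block of fields per argument.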


Week 5

13. Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating directories.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call, and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of the ls command */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while (d = readdir(p))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14. Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {             /* portable directory entry */
    long ino;                /* inode number */
    char name[NAME_MAX+1];   /* name + '\0' terminator */
} Dirent;

typedef struct {             /* minimal DIR: no buffering, etc. */
    int fd;                  /* file descriptor for the directory */
    Dirent d;                /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file or -1 if there is an error That is

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize If the mode obtained from stat indicates that a file is not a directory then the size is at hand and can be printed directly If the name is a directory however then we have to process that directory one file at a time it may in turn contain sub-directories so the process is recursive

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else


        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize:  print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk:  apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }


    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir, and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;            /* inode number */
    char  d_name[DIRSIZ];   /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program: it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir:  open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir:  close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir:  read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
                    == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group, and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents
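Program 15 itself (redirecting standard output to a file, as the shell does for ls > f1) can be sketched with dup2(). This is a minimal sketch, not the manual's own listing; the file name f1 and the helper name redirect_demo are illustrative.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Redirect stdout to the file "f1", print a line, restore stdout,
   then read the file back to confirm the bytes landed there. */
int redirect_demo(void)
{
    int saved = dup(STDOUT_FILENO);   /* remember where stdout went */
    int fd = open("f1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1 || saved == -1)
        return -1;
    dup2(fd, STDOUT_FILENO);          /* stdout now refers to f1 */
    close(fd);
    printf("hello\n");                /* goes into f1, not the terminal */
    fflush(stdout);
    dup2(saved, STDOUT_FILENO);       /* restore the original stdout */
    close(saved);

    char buf[16] = {0};
    fd = open("f1", O_RDONLY);
    if (fd == -1)
        return -1;
    read(fd, buf, sizeof(buf) - 1);
    close(fd);
    return strcmp(buf, "hello\n") == 0 ? 0 : -1;
}
```

An external program such as ls can be redirected the same way: call dup2() on the opened file before exec.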


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17. Write a C program to create a zombie process. If a child terminates before the parent has waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {   /* child terminates immediately */
        printf("Child process");
        exit(0);
    } else {                    /* parent does not wait(); the dead child
                                   stays a zombie while the parent sleeps */
        sleep(100);
        printf("Parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile; child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function:

int pipe(int filedesc[2]);

It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size);

filedesc[1] – the write end of the pipe returned by pipe()

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(filedesc[0], buffer, size);
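The three steps above can be sketched in one short function. This is a sketch only; the name pipe_demo and the message text are illustrative and not part of the lab listing that follows.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Step 1: create the pipe; step 2: write into filedesc[1];
   step 3: read the same bytes back from filedesc[0]. */
int pipe_demo(char *out, int outsize)
{
    int filedesc[2];
    const char *msg = "hello pipe";

    if (pipe(filedesc) == -1)
        return -1;
    write(filedesc[1], msg, strlen(msg));
    int n = read(filedesc[0], out, outsize - 1);
    out[n] = '\0';
    close(filedesc[0]);
    close(filedesc[1]);
    return 0;
}
```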

PROGRAM

includeltstdiohgtincludeltstringhgtmain() int pipe1[2]pipe2[2]childpid

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 31

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

if(pipe(pipe1)lt0 || pipe(pipe2) lt 0) printf(pipe creation error) if (( childpid=fork())lt0) printf(cannot fork) else if(childpid gt0) close(pipe1[0]) close(pipe2[1]) client(pipe2[0]pipe1[1]) while (wait((int ) 0 ) =childpid) close(pipe1[1]) close(pipe2[0]) exit(0) else close(pipe1[1]) close(pipe2[0]) server(pipe1[0]pipe2[1]) close(pipe1[0]) close(pipe2[1]) exit(0) client(int readfdint writefd)int nchar buff[1024] if(fgets(buff1024stdin)==NULL) printf(file name read error) n=strlen(buff) if(buff[n-1]==n) n-- if(write(writefdbuffn)=n) printf(file name write error) while((n=read(readfdbuff1024))gt0) if(write(1buffn)=n)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 32

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

printf(data write error) if(nlt0) printf(data error) server(int readfdint writefd) char buff[1024]errmsg[50] int nfd n=read(readfdbuff1024) buff[n]=0 if((fd=open(buff0))lt0) sprintf(bufffile does nit exist) write(writefdbuff1024) else while((n=read(fdbuff1024))gt0) write(writefdbuffn)

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name; here the name is that of a file that multiple processes can open(), read, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or writing) using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(fd, ip_string, size);

fd – the file descriptor returned when the FIFO was opened for writing

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(fd, buffer, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int, int);
void server(int, int);

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent writes on FIFO1, reads FIFO2 */
        wfd = open(FIFO1, O_WRONLY);
        rfd = open(FIFO2, O_RDONLY);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                        /* child reads FIFO1, writes FIFO2 */
        rfd = open(FIFO1, O_RDONLY);
        wfd = open(FIFO2, O_WRONLY);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Desc
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on the queue, the msgsnd() function is used. Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type, > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag can be MSG_NOERROR: with it, a message whose data portion is greater than length is truncated and returned; without it, msgrcv returns an error if length is not large enough to receive the message.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.
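Put together, a minimal round trip through msgget(), msgsnd(), msgrcv(), and msgctl() might look like this. It is a sketch: the structure name msgbuf_demo is illustrative, and it uses a private queue (IPC_PRIVATE) rather than the fixed keys of the server/client below.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct msgbuf_demo {
    long mtype;       /* message type, > 0 */
    char mtext[64];   /* data */
};

/* Send one message of type 1, receive it back, remove the queue. */
int msgq_demo(char *out, int outsize)
{
    int msqid = msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
    if (msqid < 0)
        return -1;

    struct msgbuf_demo m = { 1, "hello queue" };
    if (msgsnd(msqid, &m, strlen(m.mtext) + 1, 0) == -1)
        return -1;

    struct msgbuf_demo r;
    if (msgrcv(msqid, &r, sizeof(r.mtext), 1, 0) == -1)
        return -1;
    snprintf(out, outsize, "%s", r.mtext);

    msgctl(msqid, IPC_RMID, NULL);   /* delete the queue */
    return 0;
}
```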

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");
    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {          /* must be defined by the caller on Linux */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;   /* initialize the semaphore to 1: resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT semgetsmctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
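A complete demonstration (a sketch, with the function name invented here) wraps those two kill() calls around a fork(): the parent stops the child, observes the stop with waitpid(..., WUNTRACED), resumes it with SIGCONT, then terminates it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, suspend it with SIGSTOP, resume it with SIGCONT,
   then terminate and reap it. */
int suspend_resume_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)                       /* child: sleep until signalled */
        for (;;)
            pause();

    int status;
    kill(pid, SIGSTOP);                 /* suspend (like ctrl+z) */
    waitpid(pid, &status, WUNTRACED);   /* wait for the stop */
    if (!WIFSTOPPED(status))
        return -1;
    kill(pid, SIGCONT);                 /* resume */
    kill(pid, SIGTERM);                 /* clean up */
    waitpid(pid, &status, 0);
    return 0;
}
```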

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();

    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:               /* child: producer waits on the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:              /* parent: consumer signals the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by all systems' headers */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, sizeof(buffer) - 1);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by all systems' headers */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, sizeof(buffer) - 1);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(); it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 50

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator, or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;       /* command code for shmctl() */
int shmid;     /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 52

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

#include <sys/shm.h>

static struct state {        /* internal record of attached segments */
    int shmid;               /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int shmflg;              /* flags used on attach */
} ap[MAXnap];                /* state of current attached segments */
int nap;                     /* number of currently attached segments */
...
char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n",
        p->shmaddr);
...
i = shmdt(addr);
if (i == -1)
    perror("shmop: shmdt failed");
else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents back from the shared memory.
7. End.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 5678;   /* the original listing leaves the key unspecified;
                           any key agreed on by the communicating processes works */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) -1) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input:  ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X => Y is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
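These support and confidence computations can be sketched in a few lines (a hypothetical Python rendering of the toy supermarket database; the transaction rows are illustrative assumptions, not the document's actual table):

```python
# Toy transaction database: each row is the set of items bought (hypothetical).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer"},
    {"milk", "beer"},
    {"bread", "butter"},
]

def support(itemset):
    """Proportion of transactions containing every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                 # 2/5 = 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 0.2/0.4 = 0.5
```

The two printed values match the worked figures above (0.4 and 0.5).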

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1

· Join Step: Ck is generated by joining Lk−1 with itself

· Prune Step: Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

(Ck: candidate itemset of size k; Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)
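The join/prune loop above can be turned into a runnable sketch (Python here for brevity; the `apriori` function and the toy database are illustrative assumptions, not part of the Weka workflow, and the support threshold is an absolute count):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset whose support count is >= min_support."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates having an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, n in counts.items() if n >= min_support}
        frequent |= current
        k += 1
    return frequent

db = [{"milk", "bread", "butter"}, {"milk", "bread"}, {"beer"},
      {"milk", "beer"}, {"bread", "butter"}]
print(sorted(sorted(s) for s in apriori(db, 2)))
```

On the toy database, {milk, bread} and {bread, butter} survive as frequent 2-itemsets at a support count of 2, while no 3-itemset does.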

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training a data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options "Use training set"


10) If needed, select attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) … p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) … = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

  P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

  i.e., P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · … · P(xn | Ci)
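As a sketch of the derivation above (hypothetical toy records; the attribute names and class labels are made up for illustration), the unnormalised posterior P(Ci) ∏ P(xk | Ci) can be computed directly:

```python
# Minimal naive Bayes sketch on a hypothetical categorical dataset.
# Each row: (features dict, class label). Names are illustrative only.
data = [
    ({"income": "high", "children": "yes"}, "good"),
    ({"income": "high", "children": "no"},  "good"),
    ({"income": "low",  "children": "yes"}, "bad"),
    ({"income": "low",  "children": "no"},  "bad"),
    ({"income": "high", "children": "yes"}, "good"),
]

def posterior_scores(x):
    """Return the unnormalised P(C) * prod P(fi|C) for each class C."""
    classes = {label for _, label in data}
    scores = {}
    for c in classes:
        rows = [f for f, label in data if label == c]
        prior = len(rows) / len(data)          # P(C)
        likelihood = 1.0
        for attr, value in x.items():          # class-conditional independence
            likelihood *= sum(1 for f in rows if f[attr] == value) / len(rows)
        scores[c] = prior * likelihood
    return scores

scores = posterior_scores({"income": "high", "children": "yes"})
print(max(scores, key=scores.get))  # the maximum a posteriori class
```

Note that a real implementation (such as Weka's NaiveBayes) would also apply smoothing so that a single unseen attribute value does not zero out the whole product.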

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29, and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross-validation of the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
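As a sketch of how a split is chosen during recursive partitioning (hypothetical toy records; Weka's J48 performs this selection internally, with extra refinements such as gain ratio), the information gain of each candidate attribute can be computed as:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder

# Hypothetical loan records: attribute dicts plus a good/bad class label.
rows = [
    {"income": "high", "employed": "yes"},
    {"income": "high", "employed": "no"},
    {"income": "low",  "employed": "yes"},
    {"income": "low",  "employed": "no"},
]
labels = ["good", "bad", "good", "bad"]

# The root split is the attribute with the highest information gain.
best = max(("income", "employed"), key=lambda a: information_gain(rows, labels, a))
print(best)  # employed: it separates good from bad perfectly here
```

Splitting then recurses on each resulting subset until the labels in a node are pure or no split adds value, exactly as described above.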

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select J48 tree

17) Select Test options "Use training set"

18) If needed, select attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better or a long rule, checking the bias, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes we want to study. This can be decided based on the database and the user requirement.

EXPERIMENT-11

Aim To create a Decision tree by using pruned mode and reduced-error pruning, and show the accuracy for the cross-validation trained data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

Example rule: IF (children = yes) ∧ (income > 30000) THEN (car = yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the "unpruned" mode from "false" to "true"

13) Change the reduced-error pruning as needed

14) If needed, select attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
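A minimal sketch of the OneR idea (hypothetical rows echoing the accounting rule above; Weka's rules.OneR additionally handles discretisation and missing values) could look like:

```python
def one_r(rows, labels):
    """Pick the single attribute whose value->majority-class rule errs least."""
    best_attr, best_rule, best_errors = None, None, len(labels) + 1
    for attr in rows[0]:
        rule, errors = {}, 0
        for value in {r[attr] for r in rows}:
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            majority = max(set(subset), key=subset.count)  # majority class
            rule[value] = majority
            errors += sum(1 for lab in subset if lab != majority)
        if errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors
    return best_attr, best_rule, best_errors

# Hypothetical training rows: two candidate attributes, one class label each.
rows = [
    {"accounting": 1, "science": 1},
    {"accounting": 1, "science": 0},
    {"accounting": 0, "science": 1},
    {"accounting": 0, "science": 0},
    {"accounting": 1, "science": 0},
]
labels = ["A", "A", "B", "B", "A"]

attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)  # accounting wins: its rule makes no errors
```

The winning one-attribute rule here mirrors the form of the IF/THEN rules listed above.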

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 3: Lpdm Lab Manul

7 Week 5 (Page 19)
13 Write a C program to emulate the UNIX ls -l command
14 Write a C program to list, for every file in a directory, its inode number and file name
15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

8 Week 6 (Page 29)
16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen
17 Write a C program to create a Zombie process
18 Write a C program that illustrates how an orphan is created

9 Week 7 (Page 31)
19 Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort
20 Write C programs that illustrate communication between two unrelated processes using a named pipe
21 Write a C program to create a message queue with read and write permissions and write 3 messages to it with different priority numbers
22 Write a C program that receives the messages (from the message queue specified in 21) and displays them

10 Week 8 (Page 40)
23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls
24 Write a C program that illustrates suspending and resuming processes using signals

11 Week 9 (Page 41)
25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)
26 Write client and server programs (in C) for interaction between server and client processes using Unix Domain sockets

12 Week 10 (Page 47)
27 Write client and server programs (in C) for interaction between server and client processes using Internet Domain sockets
28 Write a C program that illustrates two processes communicating using shared memory

13 Listing of categorical attributes and the real-valued attributes separately (Page 55)
14 Rules for identifying attributes (Page 56)
15 Training a decision tree (Page 59)
16 Test on classification of decision tree (Page 63)
17 Testing on the training set (Page 67)
18 Using cross-validation for training (Page 68)
19 Significance of attributes in decision tree (Page 71)
20 Trying generation of decision tree with various numbers of decision trees (Page 74)
21 Find out differences in results using decision tree and cross-validation on a data set (Page 76)
22 Decision trees (Page 78)
23 Reduced error pruning for training Decision Trees using cross-validation (Page 78)
24 Convert a Decision Tree into if-then-else rules (Page 81)

Week 1

1 Write a shell script that accepts a file name, starting and ending line numbers as arguments and displays all the lines between the given line numbers

Aim: To write a shell script that accepts a file name, starting and ending line numbers as arguments and displays all the lines between the given line numbers.

Script:
$ awk 'NR<2 || NR>4 {print $0}' 5_lines.dat

Input (5_lines.dat):
line1
line2
line3
line4
line5

Output:
line1
line5
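The awk one-liner above prints the lines outside lines 2-4. For the stated aim (file name plus starting and ending line numbers passed as command-line arguments), a minimal sed-based sketch could look like this (the script name display.sh is our own choice, not from the manual):

```shell
# display.sh - print lines $2 through $3 of file $1 (a sketch, not from the manual)
file=$1
start=$2
end=$3
sed -n "${start},${end}p" "$file"
```

Given the 5_lines.dat file above, `sh display.sh 5_lines.dat 2 4` would print lines 2 to 4.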

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Script:
#!/bin/bash
clear
if [ $# -eq 0 ]
then
echo "no argument passed"
exit 1
fi
echo "enter the word to be deleted"
read word
for file in "$@"
do
grep -v "$word" "$file" > tmp$$
mv tmp$$ "$file"
done

Output:
$ sh 1b.sh test1
the contents before deleting (test1):
hello hello
bangalore
mysore city
enter the word to be deleted
city
after deleting:
hello hello
bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
cd $dir
ls > f
exec < f
while read line
do
if [ -f $line ]
then
if [ -r $line -a -w $line -a -x $line ]
then
echo "$line has all permissions"
else
echo "files not having all permissions"
fi
fi
done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:
for x in "$@"
do
if [ -f $x ]
then
echo "$x is a file"
echo "no of lines in the file are `wc -l < $x`"
elif [ -d $x ]
then
echo "$x is a directory"
else
echo "enter valid filename or directory name"
fi
done

Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:
if [ $# -ne 2 ]
then
echo "Error : Invalid number of arguments"
exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2

6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
echo "list of files in the directory"
ls $dir
else
echo "enter proper directory name"
fi

Output:
Enter directory name
Atri
List of all files in the directory:
CSE.txt ECE.txt

7 Write a shell script to find the factorial of a given integer

Script:

#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
fact=`expr $fact \* $num`
let num--
done
echo "factorial of $n is $fact"

Output:
Enter a number
5
Factorial of 5 is 120

Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters, words and lines in a file
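The manual gives no script for problem 8. A minimal sketch (the test file name vowel.txt and its contents are our own choice): a line with no vowels fails to match the vowel character class, so the negated pattern selects exactly the lines to count.

```shell
# Count lines that contain no vowels (sketch, not from the manual)
printf 'sky\nhello\nmyth\n' > vowel.txt
awk '!/[aeiouAEIOU]/ { count++ } END { print count+0 }' vowel.txt   # prints 2
```

Here "sky" and "myth" contain no vowels, so the count printed is 2.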

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record \t characters \t words" }
# BODY section
{
len = length($0)
total_len += len
print NR "\t" len "\t" NF "\t" $0
words += NF
}
END {
print "\ntotal:"
print "characters \t" total_len
print "words \t" words
print "lines \t" NR
}

10 Write a C program that makes a copy of a file using standard I/O and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1))
    {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
    {
        if (write(fd2, buffer, n1) != n1)
        {
            perror("writing problem ");
            exit(3);
        }
    }
    /* Case of an error exit from the loop */
    if (n1 == -1)
    {
        perror("Reading problem ");
        exit(2);
    }
    close(fd2);
    exit(0);
}

Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM: Implement in C the cat Unix command using system calls.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:
1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

main()
{
    int count, i;
    struct dirent **files;

    if (getwd(pathname) == NULL)
    {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0)
    {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

AIM Implement in C the Unix command mv using system calls

Algorithm:
1. Start
2. Open the existing file and a new file using the open() system call
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above 2 steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[512];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

main()
{
    FILE *stream;

    stream = fopen("test", "r");
    if (stream == (FILE *)0)
    {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF)
    {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}

Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory
Step 2: Declare and initialize the required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read an entry from the directory
Step 6: Display the directory entry, i.e. the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries are read

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL)
    {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii

f2

14 Write a C program to list, for every file in a directory, its inode number and file name.

The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h:

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {            /* portable directory entry */
    long ino;                   /* inode number */
    char name[NAME_MAX+1];      /* name + '\0' terminator */
} Dirent;

typedef struct {            /* minimal DIR: no buffering, etc. */
    int fd;                     /* file descriptor for the directory */
    Dirent d;                   /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat     /* inode information returned by stat */
{
    dev_t   st_dev;     /* device of inode */
    ino_t   st_ino;     /* inode number */
    short   st_mode;    /* mode bits */
    short   st_nlink;   /* number of links to file */
    short   st_uid;     /* owner's user id */
    short   st_gid;     /* owner's group id */
    dev_t   st_rdev;    /* for special files */
    off_t   st_size;    /* file size in characters */
    time_t  st_atime;   /* time last accessed */
    time_t  st_mtime;   /* time last modified */
    time_t  st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000   /* type of file */
#define S_IFDIR 0040000   /* directory */
#define S_IFCHR 0020000   /* character special */
#define S_IFBLK 0060000   /* block special */
#define S_IFREG 0010000   /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system types" is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}

Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description

An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification etc.
Link counter to determine the number of hard links
Pointers to the blocks storing the file's contents

Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent");
    else
        printf("child");
}

17 Write a C program to create a Zombie process

If a child terminates before the parent has collected its status, the terminated child remains in the process table and is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0)
    {
        printf("child process");
        exit(0);   /* child exits immediately */
    }
    else
    {
        sleep(100);   /* parent does not wait, so the dead child stays a zombie */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0)
    {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits during this sleep, orphaning the child */
        printf("child prints 2 item\n");
    }
    else
    {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}

Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function:

int pipe(int filedesc[2]);

It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size)

fd - file descriptor; if int filedesc[2] is the variable, then use filedesc[1] (the write end) as the first parameter
ip_string - the string to be written into the pipe
size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size)

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)      /* parent acts as the client */
    {
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    }
    else                        /* child acts as the server */
    {
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
    {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

D ESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 33

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.
2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size)

fd - the file descriptor returned by open() for the FIFO
ip_string - the string to be written into the FIFO
size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <sys/stat.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0)      /* parent: client */
    {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else                        /* child: server */
    {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
    {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg
{
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0)
    {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0)
    {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0)
    {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0)
    {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process may write a message to a queue and exit, and another process may read it at a later time.

ALGORITHM

Before defining the message structures, the ipc_perm structure should be available; this is done by including the following files:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 37

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds
{
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no of msgs on queue */
    ushort msg_qbytes;          /* max no of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf
{
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.

flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Else the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag MSG_NOERROR allows a message whose data portion is greater than length to be truncated and returned; without it, msgrcv returns an error if length is not large enough to receive the message.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID is given in cmd to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* on Linux, the program must define union semun itself */
    union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };

    int main(void)
    {
        key_t key;
        int semid;
        union semun arg;

        /* the path passed to ftok() must refer to an existing file */
        if ((key = ftok("semdemo.c", 'j')) == -1) {
            perror("ftok");
            exit(1);
        }
        if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
            perror("semget");
            exit(1);
        }
        arg.val = 1;    /* initialize the semaphore to 1: the resource is free */
        if (semctl(semid, 0, SETVAL, arg) == -1) {
            perror("semctl");
            exit(1);
        }
        return 0;
    }

Output: the program creates the semaphore set and initializes its value to 1; on failure, perror() reports the failing call (ftok, semget or semctl).

24 Write a C program that illustrates suspending and resuming processes using signals

    #include <sys/types.h>
    #include <signal.h>

    /* suspend the process (same as hitting Ctrl+Z) */
    kill(pid, SIGSTOP);

    /* continue the process */
    kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <unistd.h>
    #include <time.h>

    #define NUM_LOOPS 2

    int main(int argc, char *argv[])
    {
        int sem_set_id;
        int child_pid, i;
        struct sembuf sem_op;
        struct timespec delay;

        sem_set_id = semget(IPC_PRIVATE, 2, 0600);
        if (sem_set_id == -1) {
            perror("main: semget");
            exit(1);
        }
        printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

        child_pid = fork();
        switch (child_pid) {
        case -1:
            perror("fork");
            exit(1);
        case 0:    /* child process */
            for (i = 0; i < NUM_LOOPS; i++) {
                sem_op.sem_num = 0;
                sem_op.sem_op = -1;    /* wait on the semaphore */
                sem_op.sem_flg = 0;
                semop(sem_set_id, &sem_op, 1);
                printf("producer '%d'\n", i);
                fflush(stdout);
            }
            break;
        default:   /* parent process */
            for (i = 0; i < NUM_LOOPS; i++) {
                printf("consumer '%d'\n", i);
                fflush(stdout);
                sem_op.sem_num = 0;
                sem_op.sem_op = 1;     /* signal the semaphore */
                sem_op.sem_flg = 0;
                semop(sem_set_id, &sem_op, 1);
                if (rand() > 3 * (RAND_MAX / 4)) {
                    delay.tv_sec = 0;
                    delay.tv_nsec = 10;
                    nanosleep(&delay, NULL);
                }
            }
            break;
        }
        return 0;
    }

Output:

    semaphore set created, semaphore set id '327690'
    producer '0'
    consumer '0'
    producer '1'
    consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <string.h>

    int connection_handler(int connection_fd)
    {
        int nbytes;
        char buffer[256];

        nbytes = read(connection_fd, buffer, 255);
        buffer[nbytes] = 0;

        printf("MESSAGE FROM CLIENT: %s\n", buffer);
        nbytes = snprintf(buffer, 256, "hello from the server");
        write(connection_fd, buffer, nbytes);

        close(connection_fd);
        return 0;
    }

    int main(void)
    {
        struct sockaddr_un address;
        int socket_fd, connection_fd;
        socklen_t address_length = sizeof(struct sockaddr_un);
        pid_t child;

        socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
        if (socket_fd < 0) {
            printf("socket() failed\n");
            return 1;
        }

        unlink("demo_socket");

        /* start with a clean address structure */
        memset(&address, 0, sizeof(struct sockaddr_un));

        address.sun_family = AF_UNIX;
        snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

        if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
            printf("bind() failed\n");
            return 1;
        }

        if (listen(socket_fd, 5) != 0) {
            printf("listen() failed\n");
            return 1;
        }

        while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
            child = fork();
            if (child == 0) {
                /* now inside the newly created connection-handling process */
                return connection_handler(connection_fd);
            }
            /* still inside the server process */
            close(connection_fd);
        }

        close(socket_fd);
        unlink("demo_socket");
        return 0;
    }

Client.c

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #include <string.h>

    int main(void)
    {
        struct sockaddr_un address;
        int socket_fd, nbytes;
        char buffer[256];

        socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
        if (socket_fd < 0) {
            printf("socket() failed\n");
            return 1;
        }

        /* start with a clean address structure */
        memset(&address, 0, sizeof(struct sockaddr_un));

        address.sun_family = AF_UNIX;
        snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

        if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
            printf("connect() failed\n");
            return 1;
        }

        nbytes = snprintf(buffer, 256, "hello from a client");
        write(socket_fd, buffer, nbytes);

        nbytes = read(socket_fd, buffer, 255);
        buffer[nbytes] = 0;

        printf("MESSAGE FROM SERVER: %s\n", buffer);

        close(socket_fd);
        return 0;
    }

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        int listenfd = 0, connfd = 0;
        struct sockaddr_in serv_addr;

        char sendBuff[1025];
        time_t ticks;

        listenfd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&serv_addr, 0, sizeof(serv_addr));
        memset(sendBuff, 0, sizeof(sendBuff));

        serv_addr.sin_family = AF_INET;
        serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
        serv_addr.sin_port = htons(5000);

        bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

        listen(listenfd, 10);

        while (1) {
            connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

            ticks = time(NULL);
            snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
            write(connfd, sendBuff, strlen(sendBuff));

            close(connfd);
            sleep(1);
        }
    }

Client.c

    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <arpa/inet.h>

    int main(int argc, char *argv[])
    {
        int sockfd = 0, n = 0;
        char recvBuff[1024];
        struct sockaddr_in serv_addr;

        if (argc != 2) {
            printf("\n Usage: %s <ip of server> \n", argv[0]);
            return 1;
        }

        memset(recvBuff, 0, sizeof(recvBuff));
        if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
            printf("\n Error: Could not create socket \n");
            return 1;
        }

        memset(&serv_addr, 0, sizeof(serv_addr));

        serv_addr.sin_family = AF_INET;
        serv_addr.sin_port = htons(5000);

        if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
            printf("\n inet_pton error occurred\n");
            return 1;
        }

        if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
            printf("\n Error: Connect failed \n");
            return 1;
        }

        while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
            recvBuff[n] = 0;
            if (fputs(recvBuff, stdout) == EOF)
                printf("\n Error: fputs error\n");
        }

        if (n < 0)
            printf("\n Read error \n");

        return 0;
    }

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

    int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    key_t key;     /* key to be passed to shmget() */
    int shmflg;    /* shmflg to be passed to shmget() */
    int shmid;     /* return value from shmget() */
    int size;      /* size to be passed to shmget() */
    ...
    key = ...;
    size = ...;
    shmflg = ...;

    if ((shmid = shmget(key, size, shmflg)) == -1) {
        perror("shmget: shmget failed");
        exit(1);
    } else {
        (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
        exit(0);
    }

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int cmd;                    /* command code for shmctl() */
    int shmid;                  /* segment ID */
    struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
    ...
    shmid = ...;
    cmd = ...;
    if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
        perror("shmctl: shmctl failed");
        exit(1);
    }

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    static struct state {       /* internal record of attached segments */
        int shmid;              /* shmid of attached segment */
        char *shmaddr;          /* attach point */
        int shmflg;             /* flags used on attach */
    } ap[MAXnap];               /* state of currently attached segments */
    int nap;                    /* number of currently attached segments */
    ...
    char *addr;                 /* address work variable */
    register int i;             /* work area */
    register struct state *p;   /* ptr to current state entry */
    ...
    p = &ap[nap++];
    p->shmid = ...;
    p->shmaddr = ...;
    p->shmflg = ...;
    p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
    if (p->shmaddr == (char *) -1) {
        perror("shmop: shmat failed");
        nap--;
    } else
        (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
    ...
    i = shmdt(addr);
    if (i == -1) {
        perror("shmop: shmdt failed");
    } else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        for (p = ap, i = nap; i--; p++)
            if (p->shmaddr == addr)
                *p = ap[--nap];
    }

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment identifier).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached pointer.
6. Read the contents back from the shared memory.
7. End.

Source code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/types.h>
    #include <string.h>
    #include <sys/shm.h>

    #define SHM_SIZE 1024

    int main(int argc, char *argv[])
    {
        key_t key;
        int shmid;
        char *data;

        if (argc > 2) {
            fprintf(stderr, "usage: shmdemo [data_to_write]\n");
            exit(1);
        }
        key = ftok("shmdemo.c", 'R');    /* derive a key from an existing file path */
        if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
            perror("shmget");
            exit(1);
        }
        data = shmat(shmid, (void *) 0, 0);
        if (data == (char *) (-1)) {
            perror("shmat");
            exit(1);
        }
        if (argc == 2) {
            strncpy(data, argv[1], SHM_SIZE);    /* copy the argument into the segment */
            printf("writing to segment: \"%s\"\n", data);
        }
        if (shmdt(data) == -1) {
            perror("shmdt");
            exit(1);
        }
        return 0;
    }

Input:

    ./a.out koteswararao

Output:

    writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible: interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.

3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (downloadable from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka.

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:

- Find the frequent set Lk−1.

- Join step: Ck is generated by joining Lk−1 with itself.

- Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where

- Ck: candidate itemset of size k

- Lk: frequent itemset of size k

Apriori pseudocode:

    Apriori(T, minsup)
        L1 <- {large 1-itemsets that appear in more than minsup transactions}
        k <- 2
        while L(k-1) != ∅
            C(k) <- Generate(L(k-1))
            for transactions t ∈ T
                C(t) <- Subset(C(k), t)
                for candidates c ∈ C(t)
                    count[c] <- count[c] + 1
            L(k) <- {c ∈ C(k) | count[c] >= minsup}
            k <- k + 1
        return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and selecting trees > J48.

9) Select Test options "Use training set".


10) If needed, select an attribute.

11) Click Start

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ...

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

bull P(h): prior probability of hypothesis h

bull P(D): prior probability of training data D

bull P(h|D): probability of h given D

bull P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

bull D: set of tuples

ndash Each tuple is an 'n'-dimensional attribute vector X: (x1, x2, x3, ..., xn)

bull Let there be 'm' classes: C1, C2, C3, ..., Cm

bull The NB classifier predicts that X belongs to class Ci iff

ndash P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

bull Maximum a posteriori hypothesis:

ndash P(Ci|X) = P(X|Ci) P(Ci) / P(X)

ndash Maximize P(X|Ci) P(Ci), as P(X) is constant

bull With many attributes, it is computationally expensive to evaluate P(X|Ci)

bull Naïve assumption of "class conditional independence":

ndash P(X|Ci) = ∏(k=1..n) P(xk|Ci)

ndash P(X|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to Classify tab


7) Choose the classifier "trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554               92.3333 %

Incorrectly Classified Instances       46                7.6667 %

Kappa statistic                         0.845

Mean absolute error                     0.1389

Root mean squared error                 0.2636

Relative absolute error                27.9979 %

Root relative squared error            52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class

0.894     0.052     0.935       0.894    0.914       0.936      YES

0.948     0.106     0.914       0.948    0.931       0.936      NO

Weighted Avg.   0.923   0.081   0.924   0.923   0.923   0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see that the accuracy is comparable, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a Decision tree by cross-validation on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list those attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased

17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a Decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel

16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, by checking the bias on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set, and on the relationships among attributes that we want to study. It can be viewed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a Decision tree using Prune mode and Reduced-error Pruning, and show the accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set.

  - Stop when further pruning decreases the DT accuracy over the validation set.

Example rule:

IF (Children = yes) Λ (income > 30000)
THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select "NBTree", i.e. the Naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the "unpruned" mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select an attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier ldquoRulesrdquo

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



Contents (continued):

6. Week 4 (page 15)
   11. Implement in C the following UNIX commands using system calls: A) cat B) ls C) mv
   12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file: A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

7. Week 5 (page 19)
   13. Write a C program to emulate the UNIX ls -l command
   14. Write a C program to list, for every file in a directory, its inode number and file name
   15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

8. Week 6 (page 29)
   16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen
   17. Write a C program to create a Zombie process
   18. Write a C program that illustrates how an orphan is created

9. Week 7 (page 31)
   19. Write a C program that illustrates how to execute two commands concurrently with a command pipe. Ex: ls -l | sort
   20. Write C programs that illustrate communication between two unrelated processes using a named pipe
   21. Write a C program to create a message queue with read and write permissions, to write 3 messages to it with different priority numbers
   22. Write a C program that receives the messages (from the above message queue, as specified in 21) and displays them

10. Week 8 (page 40)
   23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls
   24. Write a C program that illustrates suspending and resuming processes using signals

11. Week 9 (page 41)
   25. Write a C program that implements a producer-consumer system with two processes (using semaphores)
   26. Write client and server programs (in C) for interaction between server and client processes using Unix Domain sockets

12. Week 10 (page 47)
   27. Write client and server programs (in C) for interaction between server and client processes using Internet Domain sockets
   28. Write a C program that illustrates two processes communicating using shared memory

13. Listing of categorical attributes and the real-valued attributes separately (page 55)

14. Rules for identifying attributes (page 56)

15. Training a decision tree (page 59)

16. Test on classification of decision tree (page 63)

17. Testing on the training set (page 67)

18. Using cross-validation for training (page 68)

19. Significance of attributes in decision tree (page 71)

20. Trying generation of decision tree with various number of decision trees (page 74)

21. Find out differences in results using decision tree and cross-validation on a data set (page 76)

22. Decision trees (page 78)

23. Reduced error pruning for training Decision Trees using cross-validation (page 78)

24. Convert a Decision Tree into if-then-else rules (page 81)

Week1

1 Write a shell script that accepts a file name, and starting and ending line numbers, as arguments, and displays all the lines between the given line numbers

Aim: To write a shell script that accepts a file name, and starting and ending line numbers, as arguments, and displays all the lines between the given line numbers

Script:
$ awk 'NR < 2 || NR > 4 { print $0 }' 5lines.dat

Input (5lines.dat):
line1
line2
line3
line4
line5

Output:
line1
line5
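The one-liner above hard-codes the line range; a version that actually takes the file name and the starting and ending line numbers as arguments, as the aim asks, could look like this (the script name midlines.sh is illustrative):

```shell
# midlines.sh - display all lines between the given line numbers
# usage: sh midlines.sh <file> <start> <end>
awk -v s="$2" -v e="$3" 'NR >= s && NR <= e { print $0 }' "$1"
```

For the five-line input above, `sh midlines.sh 5lines.dat 2 4` would print line2, line3 and line4.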

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Script:
clear
if [ $# -eq 0 ]
then
echo "no argument passed"
exit 1
fi
echo "the contents before deleting"
cat "$@"
echo "enter the word to be deleted"
read word
for file in "$@"
do
grep -v "$word" "$file" > tmp
mv tmp "$file"
done
echo "after deleting"
cat "$@"

Output:
$ sh 1b.sh test1
the contents before deleting
hello hello
bangalore
mysore city
enter the word to be deleted
city
after deleting
hello hello
bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
cd $dir
ls > f
exec < f
while read line
do
if [ -f $line ]
then
if [ -r $line -a -w $line -a -x $line ]
then
echo "$line has all permissions"
else
echo "files not having all permissions"
fi
fi
done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:
for x in "$@"
do
if [ -f $x ]
then
echo "$x is a file"
echo "no of lines in the file are"
wc -l $x
elif [ -d $x ]
then
echo "$x is a directory"
else
echo "enter valid filename or directory name"
fi
done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:
if [ $# -ne 2 ]
then
echo "Error : Invalid number of arguments"
exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
echo "list of files in the directory"
ls $dir
else
echo "enter proper directory name"
fi

Output:
enter directory name
Atri
list of files in the directory
CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:
#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
fact=`expr $fact \* $num`
let num--
done
echo "factorial of $n is $fact"

Output:
Enter a number
5
Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters, words and lines in a file

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record\t characters\t words" }
# BODY section
{
len = length($0)
total_len += len
print(NR "\t" len "\t" NF "\t" $0)
words += NF
}
END {
print("\ntotal")
print("characters \t" total_len)
print("words \t" words)
print("lines \t" NR)
}
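Item 8 above is stated but no script is given in the manual; a minimal sketch (assuming the file name is passed as the first argument):

```shell
# count the lines that contain no vowels (upper- or lower-case)
awk '!/[aeiouAEIOU]/ { n++ } END { print n + 0 }' "$1"
```

The `n + 0` forces a numeric 0 to be printed even when every line contains a vowel.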

10 Write a C program that makes a copy of a file using standard I/O and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;
    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem ");
            exit(3);
        }
    /* case of an error, exit from the loop */
    if (n1 == -1) {
        perror("Reading problem ");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM: Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;
    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);   /* or write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls

Algorithm:
1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();
int file_select(struct direct *entry);

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM: Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. Open the existing file, and create a new file, using the open()/creat() system calls
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above 2 steps until end of file
6. Close the 2 files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    int n;
    char buf[512];
    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);   /* remove the original file */
    printf("file is copied");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

main()
{
    FILE *stream;
    int buffer_character;
    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}
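The C listing above only opens and closes a file; the information the aim actually asks for can be sketched in shell using GNU stat (a sketch, assuming GNU coreutils; %h is the hard-link count and %x the last access time):

```shell
# report type, link count, last access time and permissions for each argument
for f in "$@"
do
    if [ -d "$f" ]; then echo "$f: directory"; else echo "$f: regular file"; fi
    echo "links: $(stat -c %h "$f")"
    echo "last access: $(stat -c %x "$f")"
    [ -r "$f" ] && echo "readable"
    [ -w "$f" ] && echo "writable"
    [ -x "$f" ] && echo "executable"
done
```
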


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call, and report an error if the directory is not available.
Step 5: Read an entry from the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of the ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;
    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while (d = readdir(p))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list for every file in a directory its inode number and file name

The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h:

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {              /* portable directory entry */
    long ino;                 /* inode number */
    char name[NAME_MAX+1];    /* name + '\0' terminator */
} Dirent;

typedef struct {              /* minimal DIR: no buffering, etc. */
    int fd;                   /* file descriptor for the directory */
    Dirent d;                 /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {             /* inode information returned by stat */
    dev_t  st_dev;        /* device of inode */
    ino_t  st_ino;        /* inode number */
    short  st_mode;       /* mode bits */
    short  st_nlink;      /* number of links to file */
    short  st_uid;        /* owner's user id */
    short  st_gid;        /* owner's group id */
    dev_t  st_rdev;       /* for special files */
    off_t  st_size;       /* file size in characters */
    time_t st_atime;      /* time last accessed */
    time_t st_mtime;      /* time last modified */
    time_t st_ctime;      /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000   /* type of file */
#define S_IFDIR 0040000   /* directory */
#define S_IFCHR 0020000   /* character special */
#define S_IFBLK 0060000   /* block special */
#define S_IFREG 0010000   /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>        /* flags for read and write */
#include <sys/types.h>    /* typedefs */
#include <sys/stat.h>     /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file name */
main(int argc, char **argv)
{
    if (argc == 1)            /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesisation matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print the size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {              /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>    /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;    /* local directory structure */
    static Dirent d;         /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)    /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';    /* ensure termination */
        return &d;
    }
    return NULL;
}
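The routines above refer to the Dirent and DIR types without showing them. In K&R's presentation they are a portable directory entry and a minimal directory descriptor, roughly as follows (treat the exact sizes as system-dependent):

```c
#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {              /* portable directory entry: */
    long ino;                 /* inode number */
    char name[NAME_MAX+1];    /* name + '\0' terminator */
} Dirent;

typedef struct {              /* minimal DIR: no buffering, etc. */
    int fd;                   /* file descriptor for the directory */
} DIR;
```

With these definitions, d.ino and d.name in readdir, and dp->name in dirwalk, resolve as expected.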

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents
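No listing follows for program 15, so here is a minimal sketch of the `ls > f1` behaviour using open() and dup2(); the file name f1 and the helper name are just illustrative choices:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Point standard output at the named file, the way the shell
   implements "ls > f1"; returns 0 on success, -1 on error. */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (dup2(fd, STDOUT_FILENO) == -1) {   /* fd 1 now refers to the file */
        close(fd);
        return -1;
    }
    close(fd);   /* descriptor 1 keeps the file open */
    return 0;
}
```

After calling redirect_stdout("f1"), the program could execlp("ls", "ls", (char *) 0), so that the listing lands in f1 instead of the terminal.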


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17. Write a C program to create a Zombie process. If the child terminates before the parent, and the parent has not yet collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {       /* child terminates immediately */
        printf("child process");
        exit(0);
    } else {                        /* parent lingers without reaping the child, */
        sleep(100);                 /* so the child remains a zombie meanwhile */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* parent exits meanwhile; the child is orphaned */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }

    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write data into the pipe. The syntax is as follows:

write(int[], ip_string, size);

int[] - file descriptor variable: in this case, if int filedesc[2] is the variable, then use filedesc[1] (the write end) as the first parameter.

ip_string - the string to be written into the pipe.

size - buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int[], char[], size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)    /* read a file name from the user */
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)        /* send the file name to the server */
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)          /* copy the contents to stdout */
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);            /* receive the file name */
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name; here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.
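On modern systems the same effect is usually obtained with mkfifo(), a thin POSIX wrapper over the mknod() call described above; a small sketch (the path name is just an example):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Create a FIFO special file; equivalent to
   mknod(path, 0666 | S_IFIFO, 0) as described above. */
int make_fifo(const char *path)
{
    return mkfifo(path, 0666);
}
```

After make_fifo("Fifo1") succeeds, the FIFO can be opened for reading or writing exactly as described in the following algorithm.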

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(fd, ip_string, size);

fd - file descriptor: the descriptor obtained by opening the FIFO for writing.

ip_string - the string to be written into the FIFO.

size - buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(fd, buffer, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent acts as client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child acts as server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msgflag values:

Num val   Symbolic value   Description
0400      MSG_R            Read by owner
0200      MSG_W            Write by owner
0040      MSG_R >> 3       Read by group
0020      MSG_W >> 3       Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. Length is the size to be received and stored in the pointer area. Flag can be MSG_NOERROR: without it, msgrcv returns an error if length is not large enough to receive the message; with it, a message whose data portion is greater than length is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.
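The three calls just described can be exercised in one short round trip; the sketch below sends a single message through a private queue and removes the queue afterwards (the message size and type are arbitrary choices, not from the text):

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct mbuf {
    long mtype;       /* message type, must be > 0 */
    char mtext[64];
};

/* Send one message with msgsnd(), read it back with msgrcv(),
   then delete the queue with msgctl(IPC_RMID). */
int msg_roundtrip(const char *text, char *out, int outlen)
{
    struct mbuf m;
    int n, qid = msgget(IPC_PRIVATE, 0666 | IPC_CREAT);

    if (qid == -1)
        return -1;
    m.mtype = 1;
    strncpy(m.mtext, text, sizeof(m.mtext) - 1);
    m.mtext[sizeof(m.mtext) - 1] = '\0';
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == -1)
        return -1;
    n = msgrcv(qid, &m, sizeof(m.mtext), 1, 0);    /* type 1 only */
    if (n == -1)
        return -1;
    strncpy(out, m.mtext, outlen);
    msgctl(qid, IPC_RMID, (struct msqid_ds *) 0);  /* remove the queue */
    return 0;
}
```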

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");
    client(readid, writeid);
    /* delete message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {               /* must be defined by the caller on Linux */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: on success the program prints nothing; on failure, perror reports "ftok", "semget" or "semctl".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                       /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;   /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                      /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;    /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3*(RAND_MAX/4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.
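The lifecycle just described (shmget, shmat, use, shmdt, shmctl) fits in a few lines; a minimal sketch using an anonymous (IPC_PRIVATE) segment, with an arbitrarily chosen size:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create, attach, write, read back, detach and remove a segment. */
int shm_roundtrip(const char *text, char *out, int outlen)
{
    char *mem;
    int shmid = shmget(IPC_PRIVATE, 1024, 0600 | IPC_CREAT);

    if (shmid == -1)
        return -1;
    mem = (char *) shmat(shmid, (void *) 0, 0);      /* attach */
    if (mem == (char *) -1)
        return -1;
    strncpy(mem, text, 1023);                        /* write through the segment */
    strncpy(out, mem, outlen);                       /* read it back */
    shmdt(mem);                                      /* detach */
    shmctl(shmid, IPC_RMID, (struct shmid_ds *) 0);  /* remove */
    return 0;
}
```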

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK

-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {   /* internal record of attached segments */
    int shmid;          /* shmid of attached segment */
    char *shmaddr;      /* attach point */
    int shmflg;         /* flags used on attach */
} ap[MAXnap];           /* state of current attached segments */
int nap;                /* number of currently attached segments */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents from the shared memory through the attached address.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');    /* generate a key for the segment */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules Here is one such dataset ( original) Excel spreadsheet version of the German credit data (download from web)

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories, good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
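The support and confidence values quoted above can be reproduced with a few lines of code. Note that the original transaction table was lost in formatting, so the five transactions below are an assumed toy database chosen to match the quoted numbers:

```python
# Assumed toy transaction database (the manual's table was lost); chosen so
# that supp({milk, bread}) = 0.4 and conf({milk, bread} => {butter}) = 0.5.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"beer"},
    {"butter"},
    {"milk"},
]

def support(itemset):
    """Proportion of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X union Y) / supp(X)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"milk", "bread"}))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 0.5
```

These are exactly the 2/5 = 0.4 and 0.2/0.4 = 0.5 computations from the text.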

ALGORITHM

The goal of association rule mining is to find association rules that satisfy the predefined minimum support and confidence for a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets under the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik-1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1

· Join Step:

o Ck is generated by joining Lk−1 with itself

· Prune Step:

o Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

where (Ck: candidate itemset of size k; Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
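The join/prune/count loop of the pseudocode can be rendered as a short runnable sketch. This is a minimal illustration on an assumed toy database, not Weka's implementation:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return every itemset appearing in at least min_count transactions,
    following the join / prune / count loop of the pseudocode above."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items
               if sum(s <= t for t in transactions) >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join step: C(k) from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: keep candidates meeting the support threshold
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) >= min_count}
        frequent |= current
        k += 1
    return frequent

db = [{"milk", "bread", "butter"}, {"milk", "bread"},
      {"milk", "beer"}, {"bread", "butter"}]
for s in sorted(apriori(db, 2), key=sorted):
    print(sorted(s))
```

With a threshold of 2 transactions, {milk, bread} and {bread, butter} survive but {milk, butter} is pruned, so no 3-itemset candidate remains.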

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka using Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm using the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

· Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

· Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) Under trees, select J48

8) Select Test options "Use training set"

9) if needed, select attribute

10) Click Start

11) now we can see the output details in the Classifier output

12) right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by the decision tree model created above, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C, and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed as:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
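The factored model p(C) ∏i p(Fi | C) can be demonstrated with a few lines of code. The training tuples below are assumptions chosen for illustration (binary features, two classes, as in the Bernoulli case just described):

```python
from collections import Counter, defaultdict

# Assumed toy training data: each row is ((feature1, feature2), class).
data = [((1, 1), "yes"), ((1, 0), "yes"), ((0, 1), "no"),
        ((0, 0), "no"), ((1, 1), "yes"), ((0, 1), "no")]

classes = Counter(c for _, c in data)
prior = {c: n / len(data) for c, n in classes.items()}   # p(C)

# p(Fi = v | C = c), estimated by counting
cond = defaultdict(lambda: defaultdict(Counter))
for feats, c in data:
    for i, v in enumerate(feats):
        cond[c][i][v] += 1

def p_feature(c, i, v):
    return cond[c][i][v] / classes[c]

def posterior_scores(feats):
    """p(C) * prod_i p(Fi | C) for each class (unnormalized: Z dropped,
    since it is the same for every class)."""
    scores = {}
    for c in classes:
        s = prior[c]
        for i, v in enumerate(feats):
            s *= p_feature(c, i, v)
        scores[c] = s
    return scores

print(posterior_scores((1, 1)))
```

For the feature vector (1, 1), the "yes" score dominates because every "yes" example has feature1 = 1, while no "no" example does.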

P(h | D) = P(D | h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h | D): Probability of h given D


• P(D | h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), since P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) if needed, select attribute

11) Click Start

12) now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions encountered while practising.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.
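The headline figures can be recomputed directly from the confusion matrix shown above:

```python
# Rows are actual classes (YES, NO); columns are predicted classes.
matrix = [[245, 29],   # actual YES: 245 predicted YES, 29 predicted NO
          [17, 309]]   # actual NO:  17 predicted YES, 309 predicted NO

total = sum(sum(row) for row in matrix)        # all instances
correct = matrix[0][0] + matrix[1][1]          # main diagonal
accuracy = correct / total

print(total)                     # 600
print(correct)                   # 554
print(round(100 * accuracy, 4))  # 92.3333
```

This matches the 554 correctly classified instances (92.3333 %) reported in the evaluation summary.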

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
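The splitting step at the heart of recursive partitioning can be sketched as follows. The records are an assumed toy example, and the information-gain criterion shown is the one C4.5/J48 uses at each node:

```python
from collections import Counter
from math import log2

# Assumed toy records (x, y) with two binary input variables.
records = [((1, 0), "yes"), ((1, 1), "yes"), ((0, 0), "no"),
           ((0, 1), "no"), ((1, 1), "yes"), ((0, 0), "no")]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows):
    """Pick the input variable whose attribute-value test gives the
    largest information gain -- the core of recursive partitioning."""
    base = entropy([y for _, y in rows])
    best, best_gain = None, -1.0
    for i in range(len(rows[0][0])):
        remainder = 0.0
        for v in {x[i] for x, _ in rows}:
            subset = [y for x, y in rows if x[i] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = i, gain
    return best, best_gain

print(best_split(records))  # attribute 0 separates the classes perfectly
```

Here attribute 0 is chosen with gain 1.0: splitting on it leaves every subset pure, so the recursion would stop immediately at its children.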

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) if needed, select attribute

12) Click Start

13) now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select the J48 tree

17) Select Test options "Use training set"

18) if needed, select attribute

19) Click Start

20) now we can see the output details in the Classifier output

21) right click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) check whether the accuracy increased or decreased

24) check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) if needed, select attribute

12) Click Start

13) now we can see the output details in the Classifier output

14) right click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) check whether the accuracy increased or decreased


17) check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) if needed, select attribute

19) Click Start

20) now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study. It can be viewed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using Prune mode and Reduced Error Pruning, and show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision tree's accuracy over the validation set

– Stop when further pruning decreases the decision tree's accuracy over the validation set

Example rule: IF (Children = yes) Λ (income > 30000) THEN (car = Yes)
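The pruning steps above can be sketched as a short program. This is a simplified, subtree-local variant operating on a hand-built tree; the tree, the validation rows, and the node representation are all assumptions for illustration, not Weka's implementation:

```python
# A tree node is either a class label (a leaf, represented as a string) or a
# tuple (attribute_index, {value: subtree}, majority_label_at_node).

def classify(tree, x):
    while not isinstance(tree, str):
        attr, branches, majority = tree
        tree = branches.get(x[attr], majority)
    return tree

def accuracy(tree, rows):
    return sum(classify(tree, x) == y for x, y in rows) / len(rows)

def prune(tree, rows):
    """Bottom-up reduced-error pruning (subtree-local approximation):
    replace a subtree by its majority-class leaf whenever that is no
    worse on the validation rows that reach it."""
    if isinstance(tree, str):
        return tree
    attr, branches, majority = tree
    # Prune children first, routing each validation row down its branch.
    new_branches = {v: prune(sub, [(x, y) for x, y in rows if x[attr] == v])
                    for v, sub in branches.items()}
    pruned = (attr, new_branches, majority)
    if not rows:
        return pruned
    leaf_acc = sum(y == majority for _, y in rows) / len(rows)
    # Make the node a leaf if that performs no worse on the validation set.
    return majority if leaf_acc >= accuracy(pruned, rows) else pruned

# An over-fit branch: for x[0] == 1 the tree splits again on x[1], but the
# validation rows say the plain leaf "yes" is at least as accurate there.
tree = (0, {0: "no", 1: (1, {0: "yes", 1: "no"}, "yes")}, "no")
val = [((1, 0), "yes"), ((1, 1), "yes"), ((0, 0), "no")]
print(prune(tree, val))  # the x[0]==1 subtree collapses to the leaf "yes"
```

After pruning, the over-fit inner split is replaced by its majority-class leaf, and the pruned tree classifies every validation row correctly.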

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e., the naive Bayesian tree

10) Select Test options "Use training set"

11) right click on the text box beside the Choose button and select Show properties

12) now change the unpruned mode from "false" to "true"

13) change the reduced-error pruning as needed

14) if needed, select attribute

15) Click Start

16) now we can see the output details in the Classifier output

17) right click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) if needed, select attribute

12) Click Start

13) now we can see the output details in the Classifier output

14) right click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) if needed, select attribute

12) Click Start

13) now we can see the output details in the Classifier output

Procedure for "PART":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) if needed, select attribute

12) Click Start

13) now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
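OneR's single-attribute rule construction, which produced rules like the two above, can be sketched as follows (the rows are an assumed toy example):

```python
from collections import Counter, defaultdict

def one_r(rows):
    """OneR sketch (illustration only, not Weka's code): for each attribute,
    build one rule per value mapping it to the majority class, then keep
    the attribute whose rule set makes the fewest errors.
    rows: list of (attribute_tuple, class_label)."""
    best_attr, best_rules, best_errors = None, None, None
    for i in range(len(rows[0][0])):
        by_value = defaultdict(Counter)
        for x, y in rows:
            by_value[x[i]][y] += 1
        # one rule per attribute value: predict the majority class
        rules = {v: counts.most_common(1)[0][0]
                 for v, counts in by_value.items()}
        # errors = instances not covered by the majority class
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = i, rules, errors
    return best_attr, best_rules, best_errors

rows = [((1, 0), "A"), ((1, 1), "A"), ((0, 0), "B"),
        ((0, 1), "B"), ((1, 0), "A"), ((0, 1), "A")]
print(one_r(rows))
```

On this toy data, attribute 0 wins with one error, giving the rule set "IF attr0 = 1 THEN A; IF attr0 = 0 THEN B", the same shape as the accounting rules above.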

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff


One R


PART


Week1

1 Write a shell script that accepts a file name starting and ending line numbers as arguments and displays all the lines between the given line numbers

Aim: To write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.

Script:
awk -v start=$2 -v end=$3 'NR >= start && NR <= end { print $0 }' $1

(Save as, e.g., mid.sh and run: sh mid.sh lines.dat 2 4)

IP: line1 line2 line3 line4 line5

OP: line2 line3 line4

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim: To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

Script:
clear
echo enter the word to be deleted
read word
for file in $*
do
grep -v "$word" "$file" > tmp
mv tmp "$file"
done

Output:
$ sh 1b.sh test1
the contents before deleting, test1:
hello hello
bangalore
mysore city
enter the word to be deleted
city
after deleting:
hello hello
bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim: To write a shell script that displays a list of all the files in the current directory to which the user has read, write and execute permissions.

Script:
echo enter the directory name
read dir
if [ -d $dir ]
then
cd $dir
ls > f
exec < f
while read line
do
if [ -f $line ]
then
if [ -r $line -a -w $line -a -x $line ]
then
echo $line has all permissions
else
echo files not having all permissions
fi
fi
done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim: To write a shell script that receives any number of file names as arguments and checks if every argument supplied is a file or a directory.

Script:
for x in $*
do
if [ -f $x ]
then
echo $x is a file
echo no of lines in the file are `wc -l $x`
elif [ -d $x ]
then
echo $x is a directory
else
echo enter valid filename or directory name
fi
done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:
if [ $# -ne 2 ]
then
  echo "Error: Invalid number of arguments"
  exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
  echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d "$dir" ]
then
  echo "list of files in the directory"
  ls "$dir"
else
  echo "enter proper directory name"
fi

Output:
Enter directory name
Atri
List of all files in the directory
CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:
#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
  fact=`expr $fact \* $num`
  let num--
done
echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters words and lines in a file
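The manual gives no script for exercise 8; one possible awk sketch (the demo file name is illustrative, not from the manual) is:

```shell
# exercise 8: count lines that contain no vowels
printf 'sky\nhello\nxyz\n' > novowel_demo.txt
awk '!/[aeiouAEIOU]/ { n++ } END { print n+0 }' novowel_demo.txt   # prints 2
rm -f novowel_demo.txt
```

The pattern `!/[aeiouAEIOU]/` matches every line with no vowel in it; `n+0` makes the count print as 0 even when no line matched.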

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record\t characters\t words" }
# BODY section
{
  len = length($0)
  total_len += len
  print(NR "\t" len "\t" NF "\t" $0)
  words += NF
}
END {
  print("\n total")
  print("characters \t" total_len)
  print("lines \t" NR)
}

10 Write a c program that makes a copy of a file using standard IO and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem ");
            exit(3);
        }
    /* Case of an error exit from the loop */
    if (n1 == -1) {
        perror("Reading problem ");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);      /* or write(1, &buf, 1); */
    return 0;
}

AIM Implement in C the following ls Unix command using system calls Algorithm

1. Start
2. open the directory using the opendir( ) system call
3. read the directory using the readdir( ) system call
4. print dp->name and dp->inode
5. repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. open the existing file and one new file using the open() system call
3. read the contents from the existing file using the read() system call
4. write these contents into the new file using the write() system call
5. repeat the above 2 steps until eof
6. close the 2 files using the close() system call
7. delete the existing file using the unlink() system call
8. End

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);          /* remove the original, completing the move */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A. File type B. Number of links C. Time of last access D. Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

main()
{
    FILE *stream;
    int buffer_character;

    stream = fopen("test", "r");
    if (stream == (FILE *) 0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include necessary header files for manipulating the directory.
Step 2: Declare and initialize required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries are read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name.

The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {          /* portable directory entry */
    long ino;                   /* inode number */
    char name[NAME_MAX+1];      /* name + '\0' terminator */
} Dirent;

typedef struct {          /* minimal DIR: no buffering, etc. */
    int fd;                     /* file descriptor for the directory */
    Dirent d;                   /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat   /* inode information returned by stat */
{
    dev_t   st_dev;    /* device of inode */
    ino_t   st_ino;    /* inode number */
    short   st_mode;   /* mode bits */
    short   st_nlink;  /* number of links to file */
    short   st_uid;    /* owner's user id */
    short   st_gid;    /* owner's group id */
    dev_t   st_rdev;   /* for special files */
    off_t   st_size;   /* file size in characters */
    time_t  st_atime;  /* time last accessed */
    time_t  st_mtime;  /* time last modified */
    time_t  st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>        /* flags for read and write */
#include <sys/types.h>    /* typedefs */
#include <sys/stat.h>     /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file name */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print the name of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;              /* inode number */
    char d_name[DIRSIZ];      /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An Inode number points to an Inode An Inode is a data structure that stores the following information about a file

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner group and others

File protection flags

The timestamps for file creation modification etc

link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent process");
    else
        printf("child process");
}

17 Write a C program to create a Zombie process. If a child terminates before its parent calls wait(), the terminated child (which still occupies a process-table entry) is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);
    } else {
        sleep(100);   /* the exited child stays a zombie while the parent sleeps */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* the parent may finish first, leaving this child an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]); It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

   write(int [], ip_string, size);

   int [] - file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.
   ip_string - the string to be written into the pipe.
   size - buffer size for storing the input.
3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

   read(int [], char, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/wait.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                          /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading or writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.
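At the shell level the same idea can be tried with mkfifo (the FIFO name demo_fifo is illustrative, not from the manual):

```shell
mkfifo demo_fifo              # shell-level equivalent of mknod(path, 0666 | S_IFIFO, 0)
echo "hello" > demo_fifo &    # the writer blocks until a reader opens the FIFO
cat demo_fifo                 # reads and prints: hello
wait
rm -f demo_fifo
```

This shows the defining property of a FIFO: the name lives in the filesystem, so unrelated processes (here, the background echo and the cat) can rendezvous through it.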

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

   write(int [], ip_string, size);

   int [] - file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.
   ip_string - the string to be written into the FIFO.
   size - buffer size for storing the input.
3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the queue structure, the ipc_perm structure should be available, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no of msgs on queue */
    ushort msg_qbytes;          /* max no of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used.
Syntax: int msgget(key_t key, int msgflag);

msgflag values:

  Numeric  Symbolic     Description
  0400     MSG_R        Read by owner
  0200     MSG_W        Write by owner
  0040     MSG_R >> 3   Read by group
  0020     MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of message in bytes


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.
Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, msgrcv returns an error if length is not large enough to receive the message; with it, a message whose data portion is greater than length is truncated and returned.

3. A variety of control operations on messages can be done through the msgctl() function:
int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server process:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: on failure, perror prints "semget" or "semctl" followed by the reason.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. create a semaphore using the semget( ) system call
3. if successful, it returns a positive value
4. create two new processes
5. the first process will produce
6. until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                          /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;      /* wait for the producer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                         /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;       /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2)
    {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0)
    {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0)
    {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
        {
            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0)
    {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                  /* command code for shmctl() */
int shmid;                /* segment ID */
struct shmid_ds shmid_ds; /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {     /* internal record of attached segments */
    int shmid;            /* shmid of attached segment */
    char *shmaddr;        /* attach point */
    int shmflg;           /* flags used on attach */
} ap[MAXnap];             /* state of currently attached segments */
int nap;                  /* number of currently attached segments */

char *addr;               /* address work variable */
register int i;           /* work area */
register struct state *p; /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to the shared memory segment through the attached pointer
6. Read the contents from the shared memory segment through the attached pointer
7. End

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2)
    {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
    {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1))
    {
        perror("shmat");
        exit(1);
    }

    if (argc == 2)
        printf("writing to segment: \"%s\"\n", data);

    if (shmdt(data) == -1)
    {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment (unless you really can consult a real loan officer!).

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k−1)

· Join Step: C(k) is generated by joining L(k−1) with itself

· Prune Step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

where

· C(k): candidate itemset of size k

· L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L(1) ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in Preprocess tab

4) Select only nominal values

5) Go to Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options "Use training set"

10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = [p(C) p(F1, …, Fn | C)] / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= … = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C)

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "Trees"

8) Select "NBTree", i.e., Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances       554      92.3333 %

Incorrectly Classified Instances      46       7.6667 %

Kappa statistic                    0.845

Mean absolute error                0.1389

Root mean squared error            0.2636

Relative absolute error           27.9979 %

Root relative squared error       52.9137 %

Total Number of Instances        600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894   0.052   0.935     0.894     0.914     0.936     YES


0.948   0.106   0.914     0.948     0.931     0.936     NO

Weighted Avg.   0.923   0.081   0.924     0.923     0.923     0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions encountered while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances       539      89.8333 %

Incorrectly Classified Instances      61      10.1667 %

Kappa statistic                    0.7942

Mean absolute error                0.167

Root mean squared error            0.305

Relative absolute error           33.6511 %

Root relative squared error       61.2344 %

Total Number of Instances        600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861   0.071   0.911     0.861     0.886     0.883     YES

0.929   0.139   0.889     0.929     0.909     0.883     NO

Weighted Avg.   0.898   0.108   0.899     0.898     0.898     0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select "J48"

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select "J48"

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of steps 15 and 20

22)Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better than a long rule? Check the bias by training the data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

• Pruning a node consists of

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select "NBTree", i.e. the Naive Bayes tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Week1

1 Write a shell script that accepts a file name starting and ending line numbers as arguments and displays all the lines between the given line numbers

Aim To write a shell script that accepts a file name, starting and ending line numbers as arguments and displays all the lines between the given line numbers

Script:

$ awk 'NR < 2 || NR > 4 {print $0}' lines.dat

Input: line1 line2 line3 line4 line5

Output: line1 line5
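Note that the awk one-liner above prints the lines *outside* NR 2–4, which is why only line1 and line5 appear in the output. A sketch that matches the stated aim (file name and the two line numbers passed as $1, $2 and $3; the script name `display.sh` is assumed) is:

```shell
#!/bin/sh
# display.sh: print lines $2 through $3 of file $1
# Usage: sh display.sh lines.dat 2 4
sed -n "${2},${3}p" "$1"
```

Here sed's -n suppresses default output and the address range `$2,$3` with the `p` command prints only the requested lines.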

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Script:

clear
if [ $# -eq 0 ]
then
    echo "no argument passed"
    exit 1
fi
echo "enter the word to be deleted"
read word
for f in "$@"
do
    grep -v "$word" "$f" > tmp && mv tmp "$f"
done

Output:

$ sh 1b.sh test1
the contents before deleting test1:
hello hello


bangalore
mysore city
enter the word to be deleted: city
after deleting:
hello hello bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:

echo "enter the directory name"
read dir
if [ -d $dir ]
then
    cd $dir
    ls > f
    exec < f
    while read line
    do
        if [ -f $line ]
        then
            if [ -r $line -a -w $line -a -x $line ]
            then
                echo "$line has all permissions"
            else
                echo "files not having all permissions"
            fi
        fi


    done
fi
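The same check can be written without the temporary file `f` by applying the shell's file-test operators directly to the globbed names in the current directory, as the exercise asks:

```shell
#!/bin/sh
# perms.sh: list files in the current directory that are
# readable, writable and executable by the user (a minimal sketch)
for f in *
do
    if [ -f "$f" ] && [ -r "$f" ] && [ -w "$f" ] && [ -x "$f" ]
    then
        echo "$f has all permissions"
    fi
done
```

The -r, -w and -x tests check the permissions as seen by the user running the script, which is what the exercise requires.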

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:

for x in "$@"
do
    if [ -f $x ]
    then
        echo "$x is a file"
        echo "no of lines in the file are `wc -l < $x`"
    elif [ -d $x ]
    then
        echo "$x is a directory"
    else
        echo "enter valid filename or directory name"
    fi

done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:

if [ $# -ne 2 ]
then
    echo "Error: Invalid number of arguments"
    exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
    echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:

$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2
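One caveat: grep -c counts matching *lines*, not occurrences, so a word appearing twice on one line is counted once (the sample output happens to agree because each word occurs at most once per line). A sketch that counts whole-word occurrences, assuming GNU grep's -o option, is:

```shell
#!/bin/sh
# wordcount.sh: for each word in file $1, count its occurrences in file $2
# (grep -o prints each match on its own line; -w matches whole words only)
for w in $(cat "$1")
do
    echo "Word = $w, Count = $(grep -o -w "$w" "$2" | wc -l)"
done
```

Words repeated in the first file are reported more than once; de-duplicating with `sort -u` first would avoid that.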


6 Write a shell script to list all of the directory files in a directory

Script:

#!/bin/bash
echo "enter directory name"
read dir
if [ -d "$dir" ]
then
    echo "list of files in the directory"
    ls "$dir"
else
    echo "enter proper directory name"
fi

Output:

Enter directory name: Atri
List of all files in the directory:
CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:

#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    num=`expr $num - 1`
done
echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters words and lines in a file

Aim To write an awk script to find the number of characters words and lines in a file

Script:

BEGIN { print "record\t characters\t words" }

# BODY section
{
    len = length($0)
    total_len += len
    print(NR "\t" len "\t" NF "\t" $0)
    words += NF
}

END {
    print("\n total")
    print("characters \t" total_len)
    print("lines \t" NR)
}
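Experiment 8 has no script listed above; a one-line awk sketch that counts the lines containing no vowels (the file name `file.txt` is a placeholder) is:

```shell
# Count lines that do not contain any vowel, upper or lower case.
# !/[aeiouAEIOU]/ selects lines where the vowel pattern does NOT match;
# "count + 0" forces numeric output (0) even when no line qualifies.
awk '!/[aeiouAEIOU]/ { count++ } END { print count + 0 }' file.txt
```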

10 Write a c program that makes a copy of a file using standard IO and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;
    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }


    /* In case of an error, exit from the loop */
    if (n1 == -1) {
        perror("reading problem");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>

#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;
    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);   /* or write(1, &buf, 1) */
    return 0;
}

AIM Implement in C the ls Unix command using system calls

Algorithm:

1 Start
2 Open the directory using the opendir( ) system call
3 Read the directory using the readdir( ) system call
4 Print dp->name and dp->inode
5 Repeat the above steps until the end of the directory
6 End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>


#include <stdio.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();
    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:

1 Start
2 Open the existing file and create a new file using the open( ) system call
3 Read the contents from the existing file using the read( ) system call
4 Write these contents into the new file using the write( ) system call
5 Repeat the above 2 steps until end of file
6 Close both files using the close( ) system call
7 Delete the existing file using the unlink( ) system call
8 End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    int n;
    char buf[512];
    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IRUSR | S_IWUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);   /* remove the original: copy + unlink = mv */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A File type B Number of links C Time of last access D Read Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;
    stream = fopen("test", "r");


    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls ndashl command

ALGORITHM

Step 1 Include necessary header files for manipulating the directory
Step 2 Declare and initialize required objects
Step 3 Read the directory name from the user
Step 4 Open the directory using the opendir() system call and report an error if the directory is not available
Step 5 Read an entry from the directory
Step 6 Display the directory entry, i.e. the name of the file or sub-directory
Step 7 Repeat steps 5 and 6 until all the entries are read

/* 1 Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;
    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while (d = readdir(p))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list for every file in a directory its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {      /* portable directory entry */

    long ino;                 /* inode number */

    char name[NAME_MAX+1];    /* name + '\0' terminator */

} Dirent;

typedef struct {      /* minimal DIR: no buffering, etc. */

    int fd;                   /* file descriptor for the directory */

    Dirent d;                 /* the directory entry */

} DIR;

DIR *opendir(char *dirname);

Dirent *readdir(DIR *dfd);

void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;


struct stat stbuf;

int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {   /* inode information returned by stat */

    dev_t st_dev;      /* device of inode */

    ino_t st_ino;      /* inode number */

    short st_mode;     /* mode bits */

    short st_nlink;    /* number of links to file */

    short st_uid;      /* owner's user id */

    short st_gid;      /* owner's group id */

    dev_t st_rdev;     /* for special files */

    off_t st_size;     /* file size in characters */

    time_t st_atime;   /* time last accessed */

    time_t st_mtime;   /* time last modified */

    time_t st_ctime;   /* time originally created */

};

Most of these values are explained by the comment fields. Types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */

#define S_IFDIR 0040000  /* directory */


#define S_IFCHR 0020000  /* character special */

#define S_IFBLK 0060000  /* block special */

#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize If the mode obtained from stat indicates that a file is not a directory then the size is at hand and can be printed directly If the name is a directory however then we have to process that directory one file at a time it may in turn contain sub-directories so the process is recursive

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>

#include <string.h>

#include "syscalls.h"

#include <fcntl.h>      /* flags for read and write */

#include <sys/types.h>  /* typedefs */

#include <sys/stat.h>   /* structure returned by stat */

#include "dirent.h"

void fsize(char *);

/* print file sizes */

main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */

        fsize(".");

    else


        while (--argc > 0)

            fsize(*++argv);

    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==

int stat(char *, struct stat *);

void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */

void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory It opens the directory loops through the files in it calling the function on each then closes the


directory and returns Since fsize calls dirwalk on each directory the two functions call each other recursively

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */

void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }


    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever

Down to this last level the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ

#define DIRSIZ 14

#endif

struct direct {   /* directory entry */

    ino_t d_ino;             /* inode number */

    char d_name[DIRSIZ];     /* long name does not have '\0' */

};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>

opendir opens the directory verifies that the file is a directory (this time by the system call fstat which is like stat except that it applies to a file descriptor) allocates a directory structure and records the information

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */

DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */

void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally readdir uses read to read each directory entry If a directory slot is not currently in use (because a file has been removed) the inode number is zero and this position is skipped Otherwise the inode number and name are placed in a static structure and a pointer to that is returned to the user Each call overwrites the information from the previous one

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */

Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An Inode number points to an Inode. An Inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents
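Most of the inode fields listed above can be inspected from the shell with the stat command. The -c format string below is GNU coreutils stat (BSD stat uses -f instead); the file name `demo.txt` is just a placeholder:

```shell
# Create a file, then show its inode number, link count,
# permission string, owner uid and size straight from the inode.
touch demo.txt
stat -c 'inode=%i links=%h mode=%A uid=%u size=%s' demo.txt
```

A freshly created regular file reports a link count of 1; making a hard link with `ln demo.txt demo2.txt` would raise it to 2.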


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent");
    else
        printf("child");
}

17 Write a C program to create a Zombie process. If the child terminates before the parent process, and the parent has not yet waited for it, the terminated child is called a zombie process

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);   /* child exits first and becomes a zombie */
    }


    else {
        sleep(100);   /* parent does not wait immediately, so the child stays a zombie */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>

main()
{
    int id;
    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);
        printf("child prints 2 item\n");
    }
    else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }

    printf("After fork()");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex - ls ndashl | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size)

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string ndash The string to be written in the pipe

Size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe

The syntax is as follows: read(int [], char, size)
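Before writing the C version, the behaviour being emulated can be seen directly in the shell, where the kernel sets up exactly such a pipe() + fork() arrangement between the two commands:

```shell
# The shell connects the two processes with a pipe:
# ls -l writes into it, sort reads from it.
# Here the listing is sorted numerically by the size column (column 5, GNU sort).
ls -l | sort -k5 -n
```

Data written to the pipe by ls is read by sort in the same way client() and server() exchange data through filedesc[0] and filedesc[1] in the program below.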

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;


    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    }
    else {                            /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)


            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

D ESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO


Once a FIFO is created it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a

FIFO

1) Create a FIFO through the mknod() function call

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size)

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string ndash The string to be written in the fifo

Size ndash buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)


        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}


server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;
    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }


    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a part of the operating system that is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msgflag values:

Num val   Symbolic value   Description
0400      MSG_R            Read by owner
0200      MSG_W            Write by owner
0040      MSG_R >> 3       Read by group
0020      MSG_W >> 3       Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on the queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id (a unique id). msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd() will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, msgrcv() returns an error if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID as cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server process:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: on success the semaphore is created and initialized silently; on failure, perror() prints the message from ftok, semget, or semctl.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:    /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:   /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)


            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator, or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                      /* command code for shmctl() */
int shmid;                    /* segment ID */
struct shmid_ds shmid_ds;     /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents back from the shared memory through the same pointer.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2)
        printf("writing to segment: \"%s\"\n", data);
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}


return 0

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread, the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


- Find frequent set Lk−1.

- Join Step:

  - Ck is generated by joining Lk−1 with itself.

- Prune Step:

  - Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k)

and (Lk: frequent itemset of size k).

Apriori Pseudocode:

Apriori(T, minsup)
    L1 <- {large 1-itemsets that appear in more than minsup transactions}
    k <- 2
    while L(k-1) ≠ ∅
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ minsup}
        k <- k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab.

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and selecting trees → J48.

9) Select Test options: "Use training set".


10) If needed, select an attribute.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)
                  = p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
                  = p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
                  = ...
                  = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that

p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

In terms of a hypothesis h and training data D, Bayes' theorem reads:

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naive Bayes Classifier: Derivation

• D: a set of tuples
  – Each tuple is an 'n'-dimensional attribute vector
  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff
  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i

• Maximum a posteriori hypothesis:
  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)
  – Maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes it is computationally expensive to evaluate P(X|Ci), so make the naive assumption of "class conditional independence":

P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
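As a toy numeric illustration of this decision rule (the priors and likelihoods below are invented for the example, not estimated from the bank data):

```latex
% Two classes YES/NO, two binary features x_1, x_2 (invented numbers):
%   P(YES) = 0.6,  P(x_1|YES) = 0.8,  P(x_2|YES) = 0.5
%   P(NO)  = 0.4,  P(x_1|NO)  = 0.2,  P(x_2|NO)  = 0.5
\begin{align*}
P(\mathrm{YES}\mid X) &\propto 0.6 \times 0.8 \times 0.5 = 0.24 \\
P(\mathrm{NO}\mid X)  &\propto 0.4 \times 0.2 \times 0.5 = 0.04 \\
Z &= 0.24 + 0.04 = 0.28 \\
P(\mathrm{YES}\mid X) &= 0.24 / 0.28 \approx 0.857
\end{align*}
```

Since 0.857 > 0.143, the classifier predicts class YES for this tuple.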

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.


7) Choose the Classifier group "trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Under Test options select "Use training set".

10) If needed, select the class attribute.

11) Click Start.

12) The output details now appear in the Classifier output pane.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances         554               92.3333 %

Incorrectly Classified Instances        46                7.6667 %

Kappa statistic                          0.845

Mean absolute error                      0.1389

Root mean squared error                  0.2636

Relative absolute error                 27.9979 %

Root relative squared error             52.9137 %

Total Number of Instances              600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES


0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.     0.923    0.081    0.924    0.923    0.923    0.936

=== Confusion Matrix ===

   a   b   <-- classified as

 245  29 |  a = YES

  17 309 |  b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through the different problem solutions encountered while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model; if the two are close, the model should not break down when unknown or future data is applied to it.
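The accuracy figure quoted above can also be recomputed by hand from the confusion matrix of the previous experiment; a minimal sketch with awk (the matrix entries are the ones printed by Weka above):

```shell
# Recompute accuracy from the confusion matrix:
#   245  29 |  a = YES
#    17 309 |  b = NO
awk 'BEGIN {
    correct = 245 + 309               # diagonal: correctly classified
    total   = 245 + 29 + 17 + 309     # all 600 instances
    printf "accuracy = %.4f\n", correct / total
}'
```

This prints accuracy = 0.9233, matching the 92.3333 % reported in the Correctly Classified Instances line.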

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
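The recursive partitioning described above can be summarised as follows (a pseudocode sketch, not a runnable Weka component):

```
BuildTree(records S):
    if all records in S have the same value of the target y:
        return a leaf labelled with that value
    if no split of S adds value to the predictions:
        return a leaf labelled with the majority value of y in S
    choose the input attribute x_i (and test) that best splits S
    for each outcome v of the test:
        S_v     = the records of S with that outcome
        child_v = BuildTree(S_v)        # recursive partitioning
    return an interior node testing x_i with children child_v
```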

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".


6) Go to the Classify tab.

7) Choose the Classifier group "trees".

8) Select J48.

9) Under Test options select "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select the class attribute.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances         539               89.8333 %

Incorrectly Classified Instances        61               10.1667 %

Kappa statistic                          0.7942

Mean absolute error                      0.167

Root mean squared error                  0.305

Relative absolute error                 33.6511 %

Root relative squared error             61.2344 %

Total Number of Instances              600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911      0.861   0.886      0.883     YES

0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.     0.898    0.108    0.899    0.898    0.898    0.883

=== Confusion Matrix ===

   a   b   <-- classified as

 236  38 |  a = YES

  23 303 |  b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to Open file and browse to the newly saved file (the one with the attribute deleted).

14) Go to the Classify tab.

15) Choose the Classifier group "trees".

16) Select the J48 tree.

17) Under Test options select "Use training set".

18) If needed, select the class attribute.

19) Click Start.

20) The output details now appear in the Classifier output pane.

21) Right-click the entry in the Result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select, from the attributes list, those attributes which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose the Classifier group "trees".

9) Select J48.

10) Under Test options select "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Right-click the entry in the Result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the Classifier group "trees".

8) Select J48.

9) Under Test options select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details now appear in the Classifier output pane.


16) Under Test options select "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select the class attribute.

19) Click Start.

20) The output details now appear in the Classifier output pane.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, i.e., to check the bias, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study; it can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:
  – Removing the sub-tree rooted at the pruned node
  – Making the pruned node a leaf node
  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:
  – Always select a node whose removal most increases the decision-tree accuracy over the validation set
  – Stop when further pruning decreases the decision-tree accuracy over the validation set

An example of a rule read off such a tree:

IF (Children = yes) ∧ (income > 30000)
THEN (car = Yes)
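The iterative pruning loop described above can be summarised as follows (a pseudocode sketch):

```
ReducedErrorPruning(tree T, validation set V):
    repeat:
        for each interior node n of T:
            T_n = copy of T with n replaced by a leaf labelled with
                  the most common class of the training instances at n
            gain(n) = accuracy(T_n, V) - accuracy(T, V)
        n* = the node with the largest gain
        if gain(n*) >= 0:        # pruned tree performs no worse
            T = T_n*
        else:
            stop                 # further pruning decreases accuracy
    return T
```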

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "trees".

9) Select "NBTree", i.e., the naive Bayesian tree.


10) Under Test options select "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" option from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select the class attribute.

15) Click Start.

16) The output details now appear in the Classifier output pane.

17) Right-click the entry in the Result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "trees".

9) Select "J48".

10) Under Test options select "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Right-click the entry in the Result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "rules".

9) Select "OneR".

10) Under Test options select "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details now appear in the Classifier output pane.

Procedure for "PART":

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose the Classifier group "rules".

9) Select "PART".

10) Under Test options select "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details now appear in the Classifier output pane.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
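For reference, the OneR method that produces single-attribute rules like those above can be summarised as follows (a pseudocode sketch):

```
OneR(training set D):
    for each attribute A:
        for each value v of A:
            make the rule: IF A = v THEN class = the most frequent
            class among records of D with A = v
        error(A) = number of records misclassified by A's rule set
    return the rule set of the attribute with the smallest error
```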

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Week 1

1. Write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.

Aim: To write a shell script that accepts a file name and starting and ending line numbers as arguments, and displays all the lines between the given line numbers.

Script (lines.sh):
awk -v start=$2 -v end=$3 'NR >= start && NR <= end { print $0 }' $1

Usage: $ sh lines.sh 5_lines.dat 2 4

I/P (5_lines.dat): line1 line2 line3 line4 line5

O/P: line2 line3 line4

2 Write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it

Aim: To write a shell script that deletes all lines containing a specified word in one or more files supplied as arguments to it.

Script:
clear
if [ $# -eq 0 ]
then
    echo "no argument passed"
    exit
fi
echo "enter the word to be deleted"
read word
for file in "$@"
do
    grep -v "$word" "$file" > tmp
    mv tmp "$file"
done

Output:
$ sh 1b.sh test1
the contents before deleting, test1:
hello hello bangalore
mysore city
enter the word to be deleted
city
after deleting:
hello hello bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
    cd $dir
    ls > f
    exec < f
    while read line
    do
        if [ -f $line ]
        then
            if [ -r $line -a -w $line -a -x $line ]
            then
                echo "$line has all permissions"
            else
                echo "files not having all permissions"
            fi
        fi
    done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:
for x in "$@"
do
    if [ -f $x ]
    then
        echo "$x is a file"
        echo "no of lines in the file are"
        wc -l $x
    elif [ -d $x ]
    then
        echo "$x is a directory"
    else
        echo "enter valid filename or directory name"
    fi
done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:
if [ $# -ne 2 ]
then
    echo "Error : Invalid number of arguments"
    exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
    echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
    echo "list of files in the directory"
    ls $dir
else
    echo "enter proper directory name"
fi

Output:
enter directory name
Atri
list of files in the directory
CSE.txt  ECE.txt

7. Write a shell script to find the factorial of a given integer.

Script:
#!/bin/bash
echo "enter a number"
read n
fact=1
num=$n
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    num=`expr $num - 1`
done
echo "factorial of $n is $fact"

Output:
Enter a number
5
Factorial of 5 is 120


Week 3

8. Write an awk script to count the number of lines in a file that do not contain vowels.

9. Write an awk script to find the number of characters, words and lines in a file.
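The manual supplies no sample script for exercise 8; a minimal sketch (the file name data.txt is only a placeholder):

```shell
# Count the lines of data.txt that contain no vowel (upper or lower case)
awk '!/[aeiouAEIOU]/ { count++ } END { print count + 0 }' data.txt
```

For a file containing the three lines "sky", "hello" and "rhythm", this prints 2, since only "hello" contains a vowel.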

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print NR "\t" len "\t" NF "\t" $0
    words += NF
}
END {
    print "\ntotal:"
    print "characters \t" total_len
    print "words \t" words
    print "lines \t" NR
}

10. Write a C program that makes a copy of a file using standard I/O and system calls.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem ");
            exit(3);
        }
    /* Case of an error exit from the loop */
    if (n1 == -1) {
        perror("Reading problem ");
        exit(2);
    }
    close(fd1);
    close(fd2);
    exit(0);
}


Week 4

11. Implement in C the following UNIX commands using system calls: A) cat B) ls C) mv

AIM: Implement in C the cat UNIX command using system calls.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);      /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls UNIX command using system calls.

Algorithm:
1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->d_name and the inode number
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>

#define FALSE 0
#define TRUE  1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM: Implement in C the mv UNIX command using system calls.

Algorithm:
1. Start
2. Open the existing file, and create a new file, using the open()/creat() system calls
3. Read the contents of the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above two steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[512];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);        /* remove the original, completing the move */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

The program below uses the stat() system call to report the required information:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char *argv[])
{
    struct stat st;
    int i;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &st) == -1) {
            perror(argv[i]);
            continue;
        }
        printf("%s:\n", argv[i]);
        printf("  File type           : %s\n",
               S_ISDIR(st.st_mode) ? "directory" :
               S_ISREG(st.st_mode) ? "regular file" : "other");
        printf("  Number of links     : %ld\n", (long) st.st_nlink);
        printf("  Time of last access : %s", ctime(&st.st_atime));
        printf("  Owner permissions   : %c%c%c\n",
               (st.st_mode & S_IRUSR) ? 'r' : '-',
               (st.st_mode & S_IWUSR) ? 'w' : '-',
               (st.st_mode & S_IXUSR) ? 'x' : '-');
    }
    return 0;
}


Week 5

13. Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call, and report an error if the directory is not available.
Step 5: Read an entry from the directory.
Step 6: Display the directory entry, i.e., the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while (d = readdir(p))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14. Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14      /* longest filename component; system-dependent */

typedef struct {                /* portable directory entry */
    long ino;                   /* inode number */
    char name[NAME_MAX+1];      /* name + '\0' terminator */
} Dirent;

typedef struct {                /* minimal DIR: no buffering, etc. */
    int fd;                     /* file descriptor for the directory */
    Dirent d;                   /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {            /* inode information returned by stat */
    dev_t  st_dev;       /* device of inode */
    ino_t  st_ino;       /* inode number */
    short  st_mode;      /* mode bits */
    short  st_nlink;     /* number of links to file */
    short  st_uid;       /* owner's user id */
    short  st_gid;       /* owner's group id */
    dev_t  st_rdev;      /* for special files */
    off_t  st_size;      /* file size in characters */
    time_t st_atime;     /* time last accessed */
    time_t st_mtime;     /* time last modified */
    time_t st_ctime;     /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)           /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize:  print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk:  apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure.

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir:  open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir:  close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir:  read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification etc.
Link counter to determine the number of hard links
Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17 Write a C program to create a Zombie process. If a child terminates before its parent, and the parent has not yet called wait() for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {   /* child terminates immediately */
        printf("Child process");
        exit(0);
    } else {
        /* parent sleeps without calling wait(); during this time the
           terminated child remains a zombie (visible with ps) */
        sleep(100);
        printf("Parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile; the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - the file descriptor of the write end; if int filedesc[2] is the variable, use filedesc[1] as the first parameter.

ip_string - the string to be written into the pipe.

size - the number of bytes to write.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                          /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - the file descriptor returned by open() on the FIFO, opened for writing.

ip_string - the string to be written into the FIFO.

size - the number of bytes to write.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                          /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process may write a message to a queue and exit, and another process may read it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used. Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. If the data portion of the message is larger than length, msgrcv returns an error unless MSG_NOERROR is set in flag, in which case the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server process:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");
    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("sem demo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;   /* initialize the semaphore to 1: resource available */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: on success the semaphore is created and initialized silently; on failure, perror prints "ftok", "semget" or "semctl".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                            /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;        /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                           /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;         /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3*(RAND_MAX/4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }
    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;     /* command code for shmctl() */
int shmid;   /* segment ID */
int rtrn;    /* return value */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {   /* internal record of attached segments */
    int shmid;          /* shmid of attached segment */
    char *shmaddr;      /* attach point */
    int shmflg;         /* flags used on attach */
} ap[MAXnap];           /* state of current attached segments */
int nap;                /* number of currently attached segments */

char *addr;             /* address work variable */
register int i;         /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the address returned by shmat().
6. Read the contents back from the shared memory through the same address.
7. End

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');   /* make the key */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);   /* attach to the segment */
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {                      /* write to the segment */
        printf("writing to segment: \"%s\"\n", argv[1]);
        strcpy(data, argv[1]);
    } else                                /* read what another process wrote */
        printf("segment contains: \"%s\"\n", data);
    if (shmdt(data) == -1) {              /* detach */
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes (a) manually and (b) using Weka.

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
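The support and confidence computations above can be sketched directly in Python. The transaction sets mirror the milk/bread example; the rows beyond the two stated transactions are hypothetical fillers chosen only so the supports match the text:

```python
# Toy transaction database (sets instead of 0/1 rows); rows are illustrative.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer"},
    {"bread", "butter"},
    {"milk", "beer"},
]

def support(itemset, db):
    """Proportion of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """Estimate of P(RHS | LHS): support of the union over support of the LHS."""
    return support(lhs | rhs, db) / support(lhs, db)
```

With these rows, support({milk, bread}) comes out to 2/5 = 0.4 and the confidence of {milk, bread} ⇒ {butter} to 0.2/0.4 = 0.5, matching the figures above.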

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set Lk−1.

· Join step: Ck is generated by joining Lk−1 with itself.

· Prune step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
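The join, prune, and counting steps above can be sketched as a minimal in-memory Python version. This is an illustrative sketch (set-based database, relative support threshold), not Weka's implementation:

```python
from itertools import combinations

def apriori(db, min_support):
    """Frequent-itemset search following the join/prune steps above (a sketch)."""
    def supp(s):
        # Proportion of transactions containing itemset s.
        return sum(s <= t for t in db) / len(db)

    items = {i for t in db for i in t}
    frequent = {}
    # Large 1-itemsets that meet the minimum support threshold.
    level = [frozenset([i]) for i in items if supp(frozenset([i])) >= min_support]
    k = 2
    while level:
        for s in level:
            frequent[s] = supp(s)
        # Join step: C(k) is generated by joining L(k-1) with itself.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and supp(c) >= min_support]
        k += 1
    return frequent

# The supermarket example from the text, as sets rather than 0/1 rows.
db = [{"milk", "bread", "butter"}, {"milk", "bread"}, {"beer"},
      {"bread", "butter"}, {"milk", "beer"}]
frequent = apriori(db, 0.4)
```

On this toy database, {milk, bread} is frequent at support 0.4, while {milk, butter} is pruned away.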

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set.

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h | D): Probability of h given D


• P(D | h): Probability of D given h

Naiumlve Bayes Classifier Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
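The counting estimates behind this derivation can be sketched in Python. The toy fruit data, the add-one smoothing, and the helper names are illustrative assumptions, not part of Weka's implementation:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit class priors P(Ci) and per-attribute likelihoods P(xk|Ci)
    by counting, with add-one smoothing (a minimal sketch)."""
    n = len(labels)
    prior = Counter(labels)
    cond = defaultdict(Counter)     # cond[(k, value)][class] = count
    values = defaultdict(set)       # distinct values seen per attribute
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            cond[(k, v)][c] += 1
            values[k].add(v)

    def classify(x):
        # Maximize P(Ci) * prod_k P(xk|Ci); P(X) is constant and ignored.
        best, best_p = None, -1.0
        for c in prior:
            p = prior[c] / n
            for k, v in enumerate(x):
                p *= (cond[(k, v)][c] + 1) / (prior[c] + len(values[k]))
            if p > best_p:
                best, best_p = c, p
        return best

    return classify

# Hypothetical toy data echoing the apple example: color and shape features.
model = train_nb([("red", "round"), ("red", "round"), ("green", "long")],
                 ["apple", "apple", "banana"])
```

Calling `model(("red", "round"))` picks the class whose prior-times-likelihood product is largest.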

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO
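The summary figures in this sample output follow directly from the confusion matrix; a quick sketch of the arithmetic:

```python
# Confusion matrix from the sample output: rows are actual classes,
# columns are predicted classes (a = YES, b = NO).
tp, fn = 245, 29      # actual YES: predicted YES / predicted NO
fp, tn = 17, 309      # actual NO:  predicted YES / predicted NO

total = tp + fn + fp + tn
accuracy = (tp + tn) / total          # "Correctly Classified Instances" ratio
tp_rate_yes = tp / (tp + fn)          # recall (TP rate) for class YES
precision_yes = tp / (tp + fp)        # precision for class YES
```

This reproduces 554/600 ≈ 92.33 % accuracy and the 0.894 / 0.935 figures in the YES row.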

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced through different problem solutions while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation training of the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
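The attribute-value test used to split the source set is typically chosen by an impurity measure such as information gain; a minimal sketch (the weather-style records are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum p*log2(p) over the class frequencies of the target."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, k):
    """Entropy reduction from splitting the records on input variable x_k."""
    n = len(labels)
    remainder = 0.0
    for v in {x[k] for x in rows}:
        subset = [y for x, y in zip(rows, labels) if x[k] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical records (x, y): one input variable and a yes/no target.
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
gain = information_gain(rows, labels, 0)   # a perfect split on x_0
```

Recursive partitioning repeats this choice on each derived subset until the subsets are pure or no split adds value.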

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO
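Stratification aside, 10-fold cross-validation simply partitions the 600 records into ten disjoint test folds, training on the remaining nine each time; a minimal index-splitting sketch (the round-robin assignment is an illustrative simplification of Weka's stratified folds):

```python
def kfold_indices(n, k=10):
    """Split record indices 0..n-1 into k disjoint (test, train) pairs
    by round-robin assignment (a stratification-free sketch)."""
    return [([j for j in range(n) if j % k == i],
             [j for j in range(n) if j % k != i])
            for i in range(k)]

# 600 instances, 10 folds, as in the experiment above.
folds = kfold_indices(600, 10)
```

Each fold holds out 60 instances for testing and leaves 540 for training; the reported accuracy is averaged over the ten runs.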


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select J48 tree

17) Select Test options "Use training set"

18) If needed, select attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation training of the data set, changing the cost matrix in the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be decided based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for the cross-validation-trained data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
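The pruning criterion above (prune only if the validation set does not suffer) can be sketched for a single node; the function and the toy labels are hypothetical illustrations, not Weka internals:

```python
from collections import Counter

def should_prune(val_labels_at_node, subtree_correct):
    """Reduced-error check for one node: replace the sub-tree with a
    majority-class leaf only if the leaf classifies at least as many of
    the validation instances reaching this node as the sub-tree did."""
    majority = Counter(val_labels_at_node).most_common(1)[0][0]
    leaf_correct = sum(y == majority for y in val_labels_at_node)
    return leaf_correct >= subtree_correct, majority
```

Iterative reduced-error pruning applies this check bottom-up, always pruning the node that most improves validation accuracy, and stops when no pruning helps.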

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select "NBTree", i.e. the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning settings as needed

14) If needed, select attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees/Rules"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
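OneR derives exactly this kind of rule set: for the chosen attribute, predict each value's most frequent class and total the misclassifications. A sketch reproducing the accounting rules above (the raw rows are reconstructed from the stated counts, so they are illustrative):

```python
from collections import Counter, defaultdict

def one_r(rows, labels, k):
    """OneR for attribute k: one rule per attribute value, predicting that
    value's most frequent class; also returns the total error count."""
    by_value = defaultdict(Counter)
    for x, y in zip(rows, labels):
        by_value[x[k]][y] += 1
    rules = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
    errors = sum(sum(cnt.values()) - cnt[rules[v]]
                 for v, cnt in by_value.items())
    return rules, errors

# 7 instances with accounting=1 (all class A) and 13 with accounting=0
# (9 of class B, 4 of class A), matching the coverage figures above.
rows = [(1,)] * 7 + [(0,)] * 13
labels = ["A"] * 7 + ["B"] * 9 + ["A"] * 4
rules, errors = one_r(rows, labels, 0)
```

The full OneR algorithm repeats this for every attribute and keeps the attribute with the fewest errors.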

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



bangalore mysore city
enter the word to be deleted: city
after deleting: hello hello Bangalore

$ sh 1b.sh
no argument passed

3 Write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Aim To write a shell script that displays a list of all the files in the current directory to which the user has read write and execute permissions

Script:
echo "enter the directory name"
read dir
if [ -d $dir ]
then
cd $dir
ls > f
exec < f
while read line
do
if [ -f $line ]
then
if [ -r $line -a -w $line -a -x $line ]
then
echo "$line has all permissions"
else
echo "files not having all permissions"
fi
fi


done
fi

4 Write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory and reports accordingly Whenever the argument is a file the number of lines on it is also reported

Aim To write a shell script that receives any number of file names as arguments checks if every argument supplied is a file or a directory

Script:
for x in "$@"

do
if [ -f $x ]
then
echo "$x is a file"
echo "no of lines in the file are `wc -l < $x`"
elif [ -d $x ]
then
echo "$x is a directory"
else
echo "enter valid filename or directory name"
fi

done


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:

if [ $# -ne 2 ]
then
  echo "Error : Invalid number of arguments"
  exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
  echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:

#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
  echo "list of files in the directory"
  ls $dir
else
  echo "enter proper directory name"
fi

Output:
Enter directory name: Atri
List of all files in the directory: CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:

#!/bin/bash
echo "enter a number"
read n
num=$n
fact=1
while [ $num -ge 1 ]
do
  fact=`expr $fact \* $num`
  let num--
done
echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters, words and lines in a file

Aim To write an awk script to find the number of characters words and lines in a file

Script:

BEGIN { print "record\t characters\t words" }
# BODY section
{
  len = length($0)
  total_len += len
  print(NR "\t" len "\t" NF "\t" $0)
  words += NF
}
END {
  print("\n total:")
  print("characters \t" total_len)
  print("lines \t" NR)
}
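No script is given above for task 8; one possible sketch is a single awk pattern that counts lines containing no vowels (the sample file path /tmp/novowels.txt is only for illustration):

```shell
# Task 8 (sketch): count lines that contain no vowels.
printf 'sky\nhello\nfly by\n' > /tmp/novowels.txt
awk '!/[aeiouAEIOU]/ { n++ } END { print n+0 }' /tmp/novowels.txt   # prints 2
```

Here only "hello" contains a vowel, so 2 of the 3 lines are counted; `n+0` forces a numeric 0 when no line matches.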

10 Write a c program that makes a copy of a file using standard IO and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0700)) == -1)) {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem ");
            exit(3);
        }
    /* Case of an error exit from the loop */
    if (n1 == -1) {
        perror("Reading problem ");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:
1. Start
2. Open the directory using the opendir( ) system call
3. Read the directory using the readdir( ) system call
4. Print dp->d_name and dp->d_ino
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

int main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
    return 0;
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. Open the existing file and a new file using the open( ) system call
3. Read the contents from the existing file using the read( ) system call
4. Write these contents into the new file using the write( ) system call
5. Repeat the above two steps until end of file
6. Close both files using the close( ) system call
7. Delete the existing file using the unlink( ) system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    /* rename( ) takes the two path names, not file descriptors,
       and already removes the old directory entry */
    if (rename(argv[1], argv[2]) == -1) {
        perror("rename");
        return 1;
    }
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A. File type B. Number of links C. Time of last access D. Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;

    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}
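The listing above only opens and closes a file and never reports the requested fields. A sketch that does, using stat(2); the helper names perm_string and report are mine, not the manual's:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Build an "rwx"-style string for the owner permission bits. */
void perm_string(mode_t m, char out[4])
{
    out[0] = (m & S_IRUSR) ? 'r' : '-';
    out[1] = (m & S_IWUSR) ? 'w' : '-';
    out[2] = (m & S_IXUSR) ? 'x' : '-';
    out[3] = '\0';
}

/* Report file type, link count, owner permissions and last access time. */
int report(const char *name)
{
    struct stat sb;
    char perms[4];

    if (stat(name, &sb) == -1) {
        perror(name);
        return -1;
    }
    perm_string(sb.st_mode, perms);
    printf("%s: %s, links=%ld, owner perms=%s, last access=%s",
           name,
           S_ISDIR(sb.st_mode) ? "directory" : "regular file",
           (long)sb.st_nlink, perms, ctime(&sb.st_atime));
    return 0;
}
```

A main would simply loop over the command-line arguments: `for (i = 1; i < argc; i++) report(argv[i]);`.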


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory
Step 2: Declare and initialize the required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read an entry from the directory
Step 6: Display the directory entry, i.e. the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries are read

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

int main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)) != NULL)
        printf("%s\n", d->d_name);
    return 0;
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {            /* portable directory entry */
    long ino;                     /* inode number */
    char name[NAME_MAX+1];        /* name + '\0' terminator */
} Dirent;

typedef struct {            /* minimal DIR: no buffering, etc. */
    int fd;                       /* file descriptor for the directory */
    Dirent d;                     /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat   /* inode information returned by stat */
{
    dev_t   st_dev;     /* device of inode */
    ino_t   st_ino;     /* inode number */
    short   st_mode;    /* mode bits */
    short   st_nlink;   /* number of links to file */
    short   st_uid;     /* owner's user id */
    short   st_gid;     /* owner's group id */
    dev_t   st_rdev;    /* for special files */
    off_t   st_size;    /* file size in characters */
    time_t  st_atime;   /* time last accessed */
    time_t  st_mtime;   /* time last modified */
    time_t  st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize:  print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk:  apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {        /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir:  open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir:  close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir:  read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
              == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification etc.
Link counter to determine the number of hard links
Pointers to the blocks storing the file's contents
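The manual gives no listing for this exercise; a minimal sketch of the mechanism a shell uses for `ls > f1`, based on open and dup2 (the function name redirect_stdout is mine):

```c
#include <fcntl.h>
#include <unistd.h>

/* Point standard output at the named file, as the shell does for "ls > f1". */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    dup2(fd, STDOUT_FILENO);  /* file descriptor 1 now refers to the file */
    close(fd);                /* the duplicate on fd 1 is enough */
    return 0;
}
```

A main could call redirect_stdout("f1") and then execlp("ls", "ls", (char *)0); everything ls writes to fd 1 then lands in f1.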


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
    return 0;
}

17 Write a C program to create a Zombie process. If a child terminates before its parent has collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);
    } else {
        sleep(100);    /* child exits first; until the parent waits, it is a zombie */
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* the parent exits meanwhile, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

   write(int fd, char *ip_string, int size);

   fd - file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
   ip_string - the string to be written into the pipe
   size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

   read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid > 0) {      /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC, FIFO (First In First Out), is sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

   write(int fd, char *ip_string, int size);

   fd - file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
   ip_string - the string to be written into the FIFO
   size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1) {
        printf("cannot fork");
    } else if (childpid > 0) {      /* parent acts as the client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                        /* child acts as the server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is done through a message queue, which is part of the operating system. Messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue, exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no of msgs on queue */
    ushort msg_qbytes;          /* max no of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val    Symb value     Description
0400       MSG_R          Read by owner
0200       MSG_W          Write by owner
0040       MSG_R >> 3     Read by group
0020       MSG_W >> 3     Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type > 0 */
    char mtext[1];  /* data */
};

length is the size of message in bytes

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 38

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

flag is IPC_NOWAIT, which allows the sys call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Else flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. Length is the size to be received and stored in the pointer area. Flag has MSG_NOERROR: it returns an error if length is not large enough to receive the msg; if the data portion is greater than msg length, it truncates and returns.

3. A variety of control operations on a msg queue can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {            /* must be defined by the caller on many systems */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT semgetsmctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
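These two calls only make sense inside a program that knows the target pid; a self-contained sketch wrapping them (the function names suspend_process and resume_process are mine):

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Suspend a process: same effect as hitting Ctrl+Z in its terminal. */
int suspend_process(pid_t pid)
{
    return kill(pid, SIGSTOP);
}

/* Resume a process previously stopped with SIGSTOP. */
int resume_process(pid_t pid)
{
    return kill(pid, SIGCONT);
}
```

A parent could fork a worker, call suspend_process(child), sleep for a while, and then call resume_process(child) to let it continue.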

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget( ) system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:    /* child: consumer, waits until the producer signals */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:   /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3*(RAND_MAX/4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if(argc != 2)
    {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if(inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0)
    {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if(connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0)
    {
        recvBuff[n] = 0;
        if(fputs(recvBuff, stdout) == EOF)
        {
            printf("\n Error : Fputs error\n");
        }
    }

    if(n < 0)
    {
        printf("\n Read error \n");
    }

    return 0;
}

28. Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before processing the data; such synchronization is the programmer's responsibility.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments. */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments. */
int nap;                   /* Number of currently attached segments. */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if(p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if(i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached pointer.
6. Read the contents from the shared memory segment.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if(argc > 2)
    {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* make the key (the original left it uninitialized; ftok() fixes that) */
    key = ftok("shmdemo.c", 'R');

    /* create (or locate) the segment */
    if((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
    {
        perror("shmget");
        exit(1);
    }

    /* attach to the segment to get a pointer to it */
    data = shmat(shmid, (void *)0, 0);
    if(data == (char *)(-1))
    {
        perror("shmat");
        exit(1);
    }

    if(argc == 2)
    {
        strncpy(data, argv[1], SHM_SIZE);   /* the second process can now read this */
        printf("writing to segment: \"%s\"\n", data);
    }

    /* detach from the segment */
    if(shmdt(data) == -1)
    {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input:
./a.out koteswararao

Output:
writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X => Y is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined to be interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set L(k−1).

· Join Step:

  o C(k) is generated by joining L(k−1) with itself.

· Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

Where: · (C(k): candidate itemset of size k) · (L(k): frequent itemset of size k)

Apriori Pseudocode:

Apriori(T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k−1) ≠ ∅
        C(k) <- Generate(L(k−1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

· Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

· Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set

Tools/Apparatus: Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C|F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

P(C|F1, ..., Fn) = p(C) p(F1, ..., Fn|C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn|C)
                  = p(C) p(F1|C) p(F2, ..., Fn|C, F1)
                  = p(C) p(F1|C) p(F2|C, F1) p(F3, ..., Fn|C, F1, F2)
                  = p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ... p(Fn|C, F1, F2, F3, ..., Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1|C) p(F2|C) ... = p(C) Π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1, ..., Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes, and if a model for each p(Fi|C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D

• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "Trees"

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances     554     92.3333 %

Incorrectly Classified Instances    46      7.6667 %

Kappa statistic                 0.845

Mean absolute error             0.1389

Root mean squared error         0.2636

Relative absolute error        27.9979 %

Root relative squared error    52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class

0.894     0.052     0.935       0.894    0.914       0.936      YES

0.948     0.106     0.914       0.948    0.931       0.936      NO

Weighted Avg.    0.923     0.081     0.924       0.923    0.923       0.936

=== Confusion Matrix ===

a    b    <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced with different problem solutions while practicing.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances     539     89.8333 %

Incorrectly Classified Instances    61     10.1667 %

Kappa statistic                 0.7942

Mean absolute error             0.167

Root mean squared error         0.305

Relative absolute error        33.6511 %

Root relative squared error    61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861     0.071     0.911       0.861    0.886       0.883      YES

0.929     0.139     0.889       0.929    0.909       0.883      NO

Weighted Avg.    0.898     0.108     0.899       0.898    0.898       0.883

=== Confusion Matrix ===

a    b    <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect
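The same preprocessing step can also be run from the command line. This is a sketch only: the jar location and the file names bank.arff / bank-noid.arff are assumptions, and the data must already be converted to ARFF.

```shell
# Apply the Remove filter from the command line, deleting attribute 1 (the id)
# and writing the result to a new ARFF file:
java -cp weka.jar weka.filters.unsupervised.attribute.Remove -R 1 \
     -i bank.arff -o bank-noid.arff
```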

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select from the attributes list those attributes which are to be removed, and remove them. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Trees".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Select Test options "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.
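The cross-validation run in steps 16-20 has a command-line equivalent. A sketch, assuming weka.jar is on the classpath and the data has been converted to bank.arff (both names are assumptions):

```shell
# -t names the training file, -x sets the number of cross-validation folds
java -cp weka.jar weka.classifiers.trees.J48 -t bank.arff -x 10
```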

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule (i.e., to check the bias), by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree by using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory:


Reduced-error pruning:

- Each node of the (over-fit) tree is examined for pruning.

- A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

- Pruning a node consists of:
  • Removing the sub-tree rooted at the pruned node
  • Making the pruned node a leaf node
  • Assigning the pruned node the most common classification of the training instances attached to that node

- Pruning nodes iteratively:
  • Always select a node whose removal most increases the DT accuracy over the validation set
  • Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
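For J48 (used in the other experiments), reduced-error pruning can also be switched on from the command line. A sketch, assuming weka.jar and a converted bank.arff (both names are assumptions):

```shell
# -R enables reduced-error pruning; -N sets the number of folds
# held out as the validation (pruning) set
java -cp weka.jar weka.classifiers.trees.J48 -R -N 3 -t bank.arff
```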

Procedure:

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "NBTree", i.e., the Naive Bayesian tree.

10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select the class attribute.

15) Click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
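All three classifiers can also be compared in one pass from the command line, mirroring the java invocation shown in the sample output. A sketch, assuming weka.jar on the classpath and the data converted to bank.arff (both names are assumptions):

```shell
# Train each classifier on the same file and compare the printed statistics
java -cp weka.jar weka.classifiers.trees.J48  -t bank.arff
java -cp weka.jar weka.classifiers.rules.OneR -t bank.arff
java -cp weka.jar weka.classifiers.rules.PART -t bank.arff
```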

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


done fi

4 Write a shell script that receives any number of file names as arguments, checks if every argument supplied is a file or a directory, and reports accordingly. Whenever the argument is a file, the number of lines in it is also reported.

Aim: To write a shell script that receives any number of file names as arguments and checks if every argument supplied is a file or a directory.

Script:

for x in $*
do
  if [ -f $x ]
  then
    echo "$x is a file"
    echo "no. of lines in the file: `wc -l < $x`"
  elif [ -d $x ]
  then
    echo "$x is a directory"
  else
    echo "enter a valid file name or directory name"
  fi
done


Week 2

5 Write a shell script that accepts a list of file names as its arguments, counts and reports the occurrence of each word that is present in the first argument file in the other argument files.

Aim: To write a shell script that accepts a list of file names as its arguments, counts and reports the occurrence of each word that is present in the first argument file in the other argument files.

Script:

if [ $# -ne 2 ]
then
  echo "Error : Invalid number of arguments"
  exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
  echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:

$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:

#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
  echo "list of files in the directory"
  ls $dir
else
  echo "enter a proper directory name"
fi

Output:

Enter directory name: Atri
List of all files in the directory: CSE.txt ECE.txt

7 Write a shell script to find the factorial of a given integer.

Script:

#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
  fact=`expr $fact \* $num`
  let num--
done
echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels.

9 Write an awk script to find the number of characters, words, and lines in a file.
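The manual gives no script for problem 8; a minimal sketch (the positional file-name argument is an assumption):

```shell
#!/bin/bash
# Problem 8: count the lines of the file $1 that contain no vowels.
# A line is counted only when it matches none of a,e,i,o,u (either case);
# n+0 makes the script print 0 instead of nothing for an all-vowel file.
awk '!/[aeiouAEIOU]/ { n++ } END { print n+0 }' "$1"
```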

Aim: To write an awk script to find the number of characters, words, and lines in a file.

Script:

BEGIN { print "record\t characters\t words" }
# body section: runs once per input line
{
    len = length($0)
    total_len += len
    words += NF
    print(NR "\t" len "\t" NF "\t" $0)
}
END {
    print("\ntotal:")
    print("characters\t" total_len)
    print("words\t" words)
    print("lines\t" NR)
}

10 Write a C program that makes a copy of a file using standard I/O and system calls.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }
    /* case of an error exit from the loop */
    if (n1 == -1) {
        perror("Reading problem");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM: Implement in C the cat Unix command using system calls.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:
1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above steps until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();
char pathname[MAXPATHLEN];

int main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
    return 0;
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return FALSE;
    else
        return TRUE;
}

AIM: Implement in C the mv Unix command using system calls.

Algorithm:
1. Start
2. Open the existing file, and create the new file, using the open()/creat() system calls
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above two steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);   /* remove the original, so the file is "moved" */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A) File type
B) Number of links
C) Time of last access
D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;

    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e., the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of the ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>

int main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
    return 0;
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {          /* portable directory entry */
    long ino;             /* inode number */
    char name[NAME_MAX+1];/* name + '\0' terminator */
} Dirent;

typedef struct {          /* minimal DIR: no buffering etc. */
    int fd;               /* file descriptor for the directory */
    Dirent d;             /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. Types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file name */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print the size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;           /* inode number */
    char  d_name[DIRSIZ];  /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program: it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure, and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
               == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1
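The effect can be sketched in the shell itself with exec, which re-points file descriptor 1 the same way a C program would with dup2(); the file name f1 follows the example above:

```shell
#!/bin/bash
# Redirect standard output to the file f1, run a command, then restore it.
exec 3>&1        # save the current stdout on fd 3
exec 1>f1        # fd 1 now points at f1
ls               # this listing lands in f1, not on the terminal
exec 1>&3 3>&-   # restore stdout and close the saved descriptor
wc -l < f1       # f1 now holds the listing
```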

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of the file
- Device ID
- User ID of the file
- Group ID of the file
- The file mode information and access privileges for owner, group and others
- File protection flags
- The timestamps for file creation, modification etc.
- Link counter to determine the number of hard links
- Pointers to the blocks storing the file's contents
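These inode fields can be inspected directly from the shell; a sketch using GNU coreutils' stat (the format specifiers are GNU-specific, and demo.txt is just an example file):

```shell
#!/bin/bash
# Print selected inode fields of a file: inode number, hard-link count,
# permission bits and size -- all read from the inode via stat(2).
touch demo.txt
stat -c 'inode: %i  links: %h  perms: %A  size: %s bytes' demo.txt
```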


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent\n");   /* parent branch */
    else
        printf("child\n");    /* child branch */
}

17 Write a C program to create a zombie process. If the child terminates before the parent process, and the parent has not yet collected its exit status with wait(), the child is called a zombie process.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {    /* child terminates immediately */
        printf("child process");
        exit(0);
    } else {                     /* parent sleeps without calling wait(),
                                    leaving the child a zombie */
        sleep(100);
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing pipes.

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - the file descriptor: if int filedesc[2] is the variable, then use filedesc[1] (the write end) as the first parameter.

ip_string - the string to be written into the pipe.

size - the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size);
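The shell builds exactly this plumbing when two commands are joined with |: the kernel connects the writer's filedesc[1] to the reader's filedesc[0]. For example:

```shell
#!/bin/bash
# ls -l writes into the pipe's write end; sort reads from its read end.
ls -l | sort
```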

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                          /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (named pipe).

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read from, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode and access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading or writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a

FIFO:

1) Create a FIFO through a mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - the file descriptor of the FIFO opened for writing.

ip_string - the string to be written into the FIFO.

size - the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size);
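The same sequence can be tried from the shell, where mkfifo plays the role of mknod() (the FIFO name Fifo1 matches the program below):

```shell
#!/bin/bash
# Writer and reader are separate processes joined only by the FIFO's name.
mkfifo Fifo1                         # shell equivalent of mknod(FIFO1, 0666|S_IFIFO, 0)
echo "hello through fifo" > Fifo1 &  # writer blocks until a reader opens
read line < Fifo1                    # reader end
echo "$line"
rm Fifo1                             # unlink the FIFO when done
```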

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before the message queue structures can be defined, the ipc_perm structure must be available; it is obtained by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel for each queue; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing queue, the msgget() function is used.

Syntax: int msgget(key_t key, int msgflag);

Msgflag values:

Numeric value   Symbolic value   Description
0400            MSG_R            Read by owner
0200            MSG_W            Write by owner
0040            MSG_R >> 3       Read by group
0020            MSG_W >> 3       Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd() will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, an error is returned if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

Giving IPC_RMID as cmd removes the message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server process:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                       /* 1 = resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
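The two kill() calls above can be exercised end to end by the following sketch, which forks a child, stops it, resumes it, and finally terminates it. The helper name suspend_resume_demo and the sleep lengths are illustrative choices, not part of the original exercise.

```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Fork a child that prints a message every second, suspend it with
 * SIGSTOP (same effect as hitting Ctrl+Z), resume it with SIGCONT,
 * then terminate it with SIGTERM. Returns 0 on success, -1 on error. */
int suspend_resume_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                       /* child: loop forever */
        for (;;) {
            printf("child %d working\n", (int) getpid());
            sleep(1);
        }
    }
    sleep(2);                             /* let the child run a little */
    if (kill(pid, SIGSTOP) == -1)         /* suspend the child */
        return -1;
    sleep(2);                             /* child prints nothing now */
    if (kill(pid, SIGCONT) == -1)         /* resume the child */
        return -1;
    sleep(2);
    if (kill(pid, SIGTERM) == -1)         /* terminate the child */
        return -1;
    waitpid(pid, NULL, 0);                /* reap the child */
    return 0;
}
```

Calling suspend_resume_demo() from main() shows a two-second gap in the child's output while it is stopped.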

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                           /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;       /* wait until the other process signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                          /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;        /* signal the waiting process */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets

server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets

server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                       /* command code for shmctl() */
int shmid;                     /* segment ID */
struct shmid_ds shmid_ds;      /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {        /* internal record of attached segments */
    int shmid;               /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int shmflg;              /* flags used on attach */
} ap[MAXnap];                /* state of current attached segments */
int nap;                     /* number of currently attached segments */

char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment.
6. Read the contents from the shared memory segment.
7. End.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2)
        printf("writing to segment: \"%s\"\n", data);
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X => Y is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
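The support and confidence numbers above can be checked mechanically. The sketch below encodes the five-transaction milk/bread/butter/beer database as bit masks; the exact encoding is an assumption for illustration, chosen so the counts match the text.

```c
#include <stdio.h>

#define MILK   1
#define BREAD  2
#define BUTTER 4
#define BEER   8

/* supp(X): fraction of the n transactions that contain every item of X */
double support(const int *t, int n, int itemset)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if ((t[i] & itemset) == itemset)
            count++;
    return (double) count / n;
}

/* conf(X => Y) = supp(X union Y) / supp(X) */
double confidence(const int *t, int n, int x, int y)
{
    return support(t, n, x | y) / support(t, n, x);
}
```

With the database {milk, bread}, {bread, butter}, {beer}, {milk, bread, butter}, {bread}, support(t, 5, MILK | BREAD) gives 0.4 and confidence(t, 5, MILK | BREAD, BUTTER) gives 0.5, matching the worked numbers in the text.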

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find the frequent set Lk-1.

• Join step:

  o Ck is generated by joining Lk-1 with itself.

• Prune step:

  o Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where:

• Ck: candidate itemset of size k

• Lk: frequent itemset of size k

Apriori pseudocode:

Apriori(T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k-1) ≠ ∅
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka via Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) Select trees > J48

8) Select Test options: "Use training set"

9) If needed, select an attribute

10) Click Start

11) Now we can see the output details in the Classifier output

12) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write: P(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence
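As a quick numeric check of the relation above (the probability values here are made up purely for illustration):

```c
#include <stdio.h>

/* Bayes' theorem: posterior = (prior * likelihood) / evidence */
double posterior(double prior, double likelihood, double evidence)
{
    return prior * likelihood / evidence;
}
```

For example, with prior P(C) = 0.01, likelihood P(F|C) = 0.9 and evidence P(F) = 0.1, posterior(0.01, 0.9, 0.1) yields 0.09: observing F raises the probability of C ninefold.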


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn)
= p(C) p(F1, ..., Fn | C)
= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
= ...
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... p(Fn | C) = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier Derivation

• D: a set of tuples.

– Each tuple is an 'n'-dimensional attribute vector:

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm.

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i

• Maximum posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant.

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select "NBTree", i.e. the Naïve Bayesian tree.

9) In Test options, select "Use training set".

10) If needed, select the class attribute.

11) Now click Start.

12) The output details can now be seen in the Classifier output panel.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To check whether "testing is a good idea".

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

The exact figures will vary with the test set chosen; the following is a representative analysis.

The important numbers to focus on here are those next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validating the training data set, using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select J48.

9) In Test options, select "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can now be seen in the Classifier output panel.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and observe the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) In the Filter panel, click the Choose button. This shows a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This removes the id attribute and creates a new working relation.

12) To save the new working relation as an ARFF file, click the Save button in the top panel.

13) Go to Open file and browse to the newly saved file (with the attribute deleted).

14) Go to the Classify tab.

15) Choose classifier "Trees".

16) Select the J48 tree.

17) In Test options, select "Use training set".

18) If needed, select the class attribute.

19) Now click Start.

20) The output details can now be seen in the Classifier output panel.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and observe the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select the attributes to be removed from the attributes list and remove them. After this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose classifier "Trees".

9) Select J48.

10) In Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can now be seen in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set while changing the cost matrix, in the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select J48.

9) In Test options, select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click the Set button.

12) Set the matrix values and click Resize; then close the window.

13) Click OK.

14) Click Start.

15) The output details can be seen in the Classifier output panel.

16) In Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Now click Start.

20) The output details can now be seen in the Classifier output panel.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, checking the bias by training the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes that we want to study; it can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy on a cross-validated training data set, using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - removing the sub-tree rooted at the pruned node;

  - making the pruned node a leaf node;

  - assigning the pruned node the most common classification of the training instances attached to that node.

• Pruning nodes iteratively:

  - always select a node whose removal most increases the decision-tree accuracy over the validation set;

  - stop when further pruning decreases the decision-tree accuracy over the validation set.

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees".

9) Select "NBTree", i.e. the Naïve Bayesian tree.

10) In Test options, select "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned property from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select the class attribute.

15) Now click Start.

16) The output details can now be seen in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees/Rules".

9) Select "J48".

10) In Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can now be seen in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Rules".

9) Select "OneR".

10) In Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can now be seen in the Classifier output panel.

Procedure for "PART"

1) Load the given Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Rules".

9) Select "PART".

10) In Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can now be seen in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Week 2

5 Write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Aim To write a shell script that accepts a list of file names as its arguments counts and reports the occurrence of each word that is present in the first argument file on other argument files

Script:
if [ $# -ne 2 ]
then
    echo "Error : Invalid number of arguments."
    exit
fi
str=`cat $1 | tr '\n' ' '`
for a in $str
do
    echo "Word = $a, Count = `grep -c "$a" $2`"
done

Output:
$ cat test
hello ATRI
$ cat test1
hello ATRI
hello ATRI
hello
$ sh 1.sh test test1
Word = hello, Count = 3
Word = ATRI, Count = 2


6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d $dir ]
then
    echo "list of files in the directory"
    ls $dir
else
    echo "enter proper directory name"
fi

Output:
Enter directory name: Atri
List of all files in the directory:
CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:
#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    let num--
done

echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters, words and lines in a file

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print(NR "\t" len "\t" NF "\t" $0)
    words += NF
}
END {
    print("\ntotal:")
    print("characters\t" total_len)
    print("lines\t" NR)
}

10 Write a c program that makes a copy of a file using standard IO and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;
    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0700)) == -1)) {
        perror("file problem ");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem ");
            exit(3);
        }
    /* in case of an error, exit from the loop */
    if (n1 == -1) {
        perror("Reading problem ");
        exit(2);
    }
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#define BUFSIZE 1
int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;
    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);  /* or: write(1, &buf, 1); */
    return (0);
}

AIM Implement in C the following ls Unix command using system calls Algorithm

1. Start.
2. Open the directory using the opendir() system call.
3. Read the directory using the readdir() system call.
4. Print dp->d_name and dp->d_ino.
5. Repeat the above step until the end of the directory.
6. End.

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>


#include <stdio.h>
#define FALSE 0
#define TRUE 1
extern int alphasort();
char pathname[MAXPATHLEN];
main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s \n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start.
2. Open the existing file and a new file using the open() system call.
3. Read the contents from the existing file using the read() system call.
4. Write these contents into the new file using the write() system call.
5. Repeat the above 2 steps until end of file.
6. Close both files using the close() system call.
7. Delete the existing file using the unlink() system call.
8. End.

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
int main(int argc, char *argv[])
{
    /* rename() takes path names, not file descriptors */
    if (rename(argv[1], argv[2]) == -1) {
        perror("mv");
        return 1;
    }
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>
main()
{
    FILE *stream;
    int buffer_character;
    stream = fopen("test", "r");


    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>
main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;
    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {           /* portable directory entry */
    long ino;              /* inode number */
    char name[NAME_MAX+1]; /* name + '\0' terminator */
} Dirent;

typedef struct {           /* minimal DIR: no buffering etc. */
    int fd;                /* file descriptor for the directory */
    Dirent d;              /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is, given

char *name;


struct stat stbuf;

int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {        /* inode information returned by stat */
    dev_t st_dev;    /* device of inode */
    ino_t st_ino;    /* inode number */
    short st_mode;   /* mode bits */
    short st_nlink;  /* number of links to file */
    short st_uid;    /* owner's user id */
    short st_gid;    /* owner's group id */
    dev_t st_rdev;   /* for special files */
    off_t st_size;   /* file size in characters */
    time_t st_atime; /* time last accessed */
    time_t st_mtime; /* time last modified */
    time_t st_ctime; /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesisation matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;  /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif
struct direct {          /* directory entry */
    ino_t d_ino;         /* inode number */
    char d_name[DIRSIZ]; /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

    #include <stdio.h>
    #include <string.h>

    main()
    {
        int childpid;

        if ((childpid = fork()) < 0)
            printf("cannot fork");
        else if (childpid > 0)
            printf("Parent process");
        else
            printf("Child process");
    }

17 Write a C program to create a Zombie process. If the child terminates before the parent has waited for it, the terminated child is called a zombie process.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    main()
    {
        int childpid;

        if ((childpid = fork()) < 0)
            printf("cannot fork");
        else if (childpid == 0) {      /* child terminates at once */
            printf("child process");
            exit(0);
        } else {                       /* parent does not reap the child yet,
                                          so the dead child stays a zombie */
            sleep(100);
            printf("parent process");
        }
    }

18 Write a C program that illustrates how an orphan is created

    #include <stdio.h>

    main()
    {
        int id;

        printf("Before fork()\n");
        id = fork();
        if (id == 0) {
            printf("Child has started: %d\n", getpid());
            printf("Parent of this child: %d\n", getppid());
            printf("child prints 1 item\n");
            sleep(25);       /* parent exits meanwhile, orphaning the child */
            printf("child prints 2 item\n");
        } else {
            printf("Parent has started: %d\n", getpid());
            printf("Parent of the parent proc: %d\n", getppid());
        }
        printf("After fork()");
    }


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function:

    int pipe(int filedesc[2]);

It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

       write(fd, ip_string, size);

   fd – the file descriptor for the write end; if int filedesc[2] is the variable, use filedesc[1] as the first parameter.
   ip_string – the string to be written into the pipe.
   size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

       read(fd, buffer, size);

PROGRAM

    #include <stdio.h>
    #include <string.h>

    main()
    {
        int pipe1[2], pipe2[2], childpid;

        if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
            printf("pipe creation error");
        if ((childpid = fork()) < 0)
            printf("cannot fork");
        else if (childpid > 0) {          /* parent acts as client */
            close(pipe1[0]);
            close(pipe2[1]);
            client(pipe2[0], pipe1[1]);
            while (wait((int *) 0) != childpid)
                ;
            close(pipe1[1]);
            close(pipe2[0]);
            exit(0);
        } else {                          /* child acts as server */
            close(pipe1[1]);
            close(pipe2[0]);
            server(pipe1[0], pipe2[1]);
            close(pipe1[0]);
            close(pipe2[1]);
            exit(0);
        }
    }

    client(int readfd, int writefd)
    {
        int n;
        char buff[1024];

        if (fgets(buff, 1024, stdin) == NULL)
            printf("file name read error");
        n = strlen(buff);
        if (buff[n - 1] == '\n')
            n--;
        if (write(writefd, buff, n) != n)
            printf("file name write error");
        while ((n = read(readfd, buff, 1024)) > 0)
            if (write(1, buff, n) != n)
                printf("data write error");
        if (n < 0)
            printf("data error");
    }

    server(int readfd, int writefd)
    {
        char buff[1024], errmsg[50];
        int n, fd;

        n = read(readfd, buff, 1024);
        buff[n] = '\0';
        if ((fd = open(buff, 0)) < 0) {
            sprintf(buff, "file does not exist");
            write(writefd, buff, 1024);
        } else {
            while ((n = read(fd, buff, 1024)) > 0)
                write(writefd, buff, n);
        }
    }

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

    int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

       write(fd, ip_string, size);

   fd – the file descriptor returned by opening the FIFO for writing.
   ip_string – the string to be written into the FIFO.
   size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

       read(fd, buffer, size);

PROGRAM

    #define FIFO1 "Fifo1"
    #define FIFO2 "Fifo2"
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    main()
    {
        int childpid, wfd, rfd;

        mknod(FIFO1, 0666 | S_IFIFO, 0);
        mknod(FIFO2, 0666 | S_IFIFO, 0);
        if ((childpid = fork()) == -1)
            printf("cannot fork");
        else if (childpid > 0) {          /* parent acts as client */
            wfd = open(FIFO1, 1);
            rfd = open(FIFO2, 0);
            client(rfd, wfd);
            while (wait((int *) 0) != childpid)
                ;
            close(rfd);
            close(wfd);
            unlink(FIFO1);
            unlink(FIFO2);
        } else {                          /* child acts as server */
            rfd = open(FIFO1, 0);
            wfd = open(FIFO2, 1);
            server(rfd, wfd);
            close(rfd);
            close(wfd);
        }
    }

    client(int readfd, int writefd)
    {
        int n;
        char buff[1024];

        printf("enter a file name: ");
        if (fgets(buff, 1024, stdin) == NULL)
            printf("file name read error");
        n = strlen(buff);
        if (buff[n - 1] == '\n')
            n--;
        if (write(writefd, buff, n) != n)
            printf("file name write error");
        while ((n = read(readfd, buff, 1024)) > 0)
            if (write(1, buff, n) != n)
                printf("data write error");
        if (n < 0)
            printf("data error");
    }

    server(int readfd, int writefd)
    {
        char buff[1024], errmsg[50];
        int n, fd;

        n = read(readfd, buff, 1024);
        buff[n] = '\0';
        if ((fd = open(buff, 0)) < 0) {
            sprintf(buff, "file does not exist");
            write(writefd, buff, 1024);
        } else {
            while ((n = read(fd, buff, 1024)) > 0)
                write(writefd, buff, n);
        }
    }

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #define MAX 255

    struct mesg {
        long type;
        char mtext[MAX];
    } *mesg;

    char buff[MAX];

    main()
    {
        int mid, fd, n, count = 0;

        if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
            printf("\n Can't create Message Q");
            exit(1);
        }
        printf("\n Queue id: %d", mid);
        mesg = (struct mesg *) malloc(sizeof(struct mesg));
        mesg->type = 6;
        fd = open("fact", O_RDONLY);
        while (read(fd, buff, 25) > 0) {
            strcpy(mesg->mtext, buff);
            if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
                printf("\n Message Write Error");
        }
        if ((mid = msgget(1006, 0)) < 0) {
            printf("\n Can't access Message Q");
            exit(1);
        }
        while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
            write(1, mesg->mtext, n);
            count++;
        }
        if ((n == -1) && (count == 0))
            printf("\n No Message on Queue %d", mid);
    }

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to the queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


    #include <sys/types.h>
    #include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

    struct msqid_ds {
        struct ipc_perm msg_perm;   /* operation permission */
        struct msg *msg_first;      /* ptr to first msg on queue */
        struct msg *msg_last;       /* ptr to last msg on queue */
        ushort msg_cbytes;          /* current bytes on queue */
        ushort msg_qnum;            /* current no. of msgs on queue */
        ushort msg_qbytes;          /* max no. of bytes on queue */
        ushort msg_lspid;           /* pid of last msg send */
        ushort msg_lrpid;           /* pid of last msg recvd */
        time_t msg_stime;           /* time of last msg snd */
        time_t msg_rtime;           /* time of last msg rcv */
        time_t msg_ctime;           /* time of last msg ctl */
    };

To create a new message queue, or to access an existing message queue, the msgget() function is used. Syntax:

    int msgget(key_t key, int msgflag);

Msg flag values:

    Num val   Symbolic value   Description
    0400      MSG_R            Read by owner
    0200      MSG_W            Write by owner
    0040      MSG_R >> 3       Read by group
    0020      MSG_W >> 3       Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used. Syntax:

       int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

    struct msgbuf {
        long mtype;      /* message type, > 0 */
        char mtext[1];   /* data */
    };

length is the size of the message in bytes.

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 38

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd returns -1 if there is no room on the queue. Otherwise flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

       int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag may contain MSG_NOERROR: without it, msgrcv returns an error if length is not large enough to receive the message; with it, a data portion greater than length is truncated and the call returns.

3. A variety of control operations on a message queue can be done through the msgctl() function:

       int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <sys/errno.h>
    extern int errno;

    #define MKEY1 1234L
    #define MKEY2 2345L
    #define PERMS 0666

Server operation algorithm:

    #include "msgq.h"

    main()
    {
        int readid, writeid;

        if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
            err_sys("server: can't get message queue 1");
        if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
            err_sys("server: can't get message queue 2");

        server(readid, writeid);
        exit(0);
    }

Client process:

    #include "msgq.h"

    main()
    {
        int readid, writeid;

        /* open the queues which the server has already created */
        if ((writeid = msgget(MKEY1, 0)) < 0)
            err_sys("client: can't msgget message queue 1");
        if ((readid = msgget(MKEY2, 0)) < 0)
            err_sys("client: can't msgget message queue 2");

        client(readid, writeid);

        /* delete the message queues */
        if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
            err_sys("client: can't RMID message queue 1");
        if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
            err_sys("client: can't RMID message queue 2");

        exit(0);
    }

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* on Linux, union semun must be declared by the program itself */
    union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };

    int main(void)
    {
        key_t key;
        int semid;
        union semun arg;

        if ((key = ftok("semdemo.c", 'j')) == -1) {
            perror("ftok");
            exit(1);
        }
        if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
            perror("semget");
            exit(1);
        }
        arg.val = 1;                 /* initialise the semaphore to 1 (unlocked) */
        if (semctl(semid, 0, SETVAL, arg) == -1) {
            perror("semctl");
            exit(1);
        }
        return 0;
    }

OUTPUT: the program exits silently on success; "ftok", "semget" or "semctl" is printed only on error.

24 Write a C program that illustrates suspending and resuming processes using signals

    #include <sys/types.h>
    #include <signal.h>

    /* suspend the process (same as hitting Ctrl+Z) */
    kill(pid, SIGSTOP);

    /* continue the process */
    kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <unistd.h>
    #include <time.h>
    #define NUM_LOOPS 2

    int main(int argc, char *argv[])
    {
        int sem_set_id;
        int child_pid, i;
        struct sembuf sem_op;
        struct timespec delay;

        sem_set_id = semget(IPC_PRIVATE, 2, 0600);
        if (sem_set_id == -1) {
            perror("main: semget");
            exit(1);
        }
        printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
        child_pid = fork();
        switch (child_pid) {
        case -1:
            perror("fork");
            exit(1);
        case 0:                          /* child */
            for (i = 0; i < NUM_LOOPS; i++) {
                sem_op.sem_num = 0;
                sem_op.sem_op = -1;      /* wait on the semaphore */
                sem_op.sem_flg = 0;
                semop(sem_set_id, &sem_op, 1);
                printf("producer: '%d'\n", i);
                fflush(stdout);
            }
            break;
        default:                         /* parent */
            for (i = 0; i < NUM_LOOPS; i++) {
                printf("consumer: '%d'\n", i);
                fflush(stdout);
                sem_op.sem_num = 0;
                sem_op.sem_op = 1;       /* signal the semaphore */
                sem_op.sem_flg = 0;
                semop(sem_set_id, &sem_op, 1);
                if (rand() > 3 * (RAND_MAX / 4)) {
                    delay.tv_sec = 0;
                    delay.tv_nsec = 10;
                    nanosleep(&delay, NULL);
                }
            }
            break;
        }
        return 0;
    }

Output:

    semaphore set created, semaphore set id '327690'
    producer: '0'
    consumer: '0'
    producer: '1'
    consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <string.h>

    int connection_handler(int connection_fd)
    {
        int nbytes;
        char buffer[256];

        nbytes = read(connection_fd, buffer, 256);
        buffer[nbytes] = 0;

        printf("MESSAGE FROM CLIENT: %s\n", buffer);
        nbytes = snprintf(buffer, 256, "hello from the server");
        write(connection_fd, buffer, nbytes);

        close(connection_fd);
        return 0;
    }

    int main(void)
    {
        struct sockaddr_un address;
        int socket_fd, connection_fd;
        socklen_t address_length;
        pid_t child;

        socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
        if (socket_fd < 0) {
            printf("socket() failed\n");
            return 1;
        }

        unlink("./demo_socket");

        /* start with a clean address structure */
        memset(&address, 0, sizeof(struct sockaddr_un));

        address.sun_family = AF_UNIX;
        snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

        if (bind(socket_fd, (struct sockaddr *) &address,
                 sizeof(struct sockaddr_un)) != 0) {
            printf("bind() failed\n");
            return 1;
        }

        if (listen(socket_fd, 5) != 0) {
            printf("listen() failed\n");
            return 1;
        }

        while ((connection_fd = accept(socket_fd,
                                       (struct sockaddr *) &address,
                                       &address_length)) > -1) {
            child = fork();
            if (child == 0) {
                /* now inside newly created connection handling process */
                return connection_handler(connection_fd);
            }
            /* still inside server process */
            close(connection_fd);
        }

        close(socket_fd);
        unlink("./demo_socket");
        return 0;
    }

Client.c:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #include <string.h>

    int main(void)
    {
        struct sockaddr_un address;
        int socket_fd, nbytes;
        char buffer[256];

        socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
        if (socket_fd < 0) {
            printf("socket() failed\n");
            return 1;
        }

        /* start with a clean address structure */
        memset(&address, 0, sizeof(struct sockaddr_un));

        address.sun_family = AF_UNIX;
        snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

        if (connect(socket_fd, (struct sockaddr *) &address,
                    sizeof(struct sockaddr_un)) != 0) {
            printf("connect() failed\n");
            return 1;
        }

        nbytes = snprintf(buffer, 256, "hello from a client");
        write(socket_fd, buffer, nbytes);

        nbytes = read(socket_fd, buffer, 256);
        buffer[nbytes] = 0;

        printf("MESSAGE FROM SERVER: %s\n", buffer);

        close(socket_fd);
        return 0;
    }

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        int listenfd = 0, connfd = 0;
        struct sockaddr_in serv_addr;

        char sendBuff[1025];
        time_t ticks;

        listenfd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&serv_addr, 0, sizeof(serv_addr));
        memset(sendBuff, 0, sizeof(sendBuff));

        serv_addr.sin_family = AF_INET;
        serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
        serv_addr.sin_port = htons(5000);

        bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

        listen(listenfd, 10);

        while (1) {
            connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

            ticks = time(NULL);
            snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
            write(connfd, sendBuff, strlen(sendBuff));

            close(connfd);
            sleep(1);
        }
    }

Client.c:

    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <errno.h>
    #include <arpa/inet.h>

    int main(int argc, char *argv[])
    {
        int sockfd = 0, n = 0;
        char recvBuff[1024];
        struct sockaddr_in serv_addr;

        if (argc != 2) {
            printf("\n Usage: %s <ip of server>\n", argv[0]);
            return 1;
        }

        memset(recvBuff, 0, sizeof(recvBuff));
        if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
            printf("\n Error: Could not create socket\n");
            return 1;
        }

        memset(&serv_addr, 0, sizeof(serv_addr));

        serv_addr.sin_family = AF_INET;
        serv_addr.sin_port = htons(5000);

        if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
            printf("\n inet_pton error occured\n");
            return 1;
        }

        if (connect(sockfd, (struct sockaddr *) &serv_addr,
                    sizeof(serv_addr)) < 0) {
            printf("\n Error: Connect Failed\n");
            return 1;
        }

        while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
            recvBuff[n] = 0;
            if (fputs(recvBuff, stdout) == EOF)
                printf("\n Error: Fputs error\n");
        }

        if (n < 0)
            printf("\n Read error\n");

        return 0;
    }

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

    int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    key_t key;      /* key to be passed to shmget() */
    int shmflg;     /* shmflg to be passed to shmget() */
    int shmid;      /* return value from shmget() */
    int size;       /* size to be passed to shmget() */
    ...
    key = ...;
    size = ...;
    shmflg = ...;

    if ((shmid = shmget(key, size, shmflg)) == -1) {
        perror("shmget: shmget failed");
        exit(1);
    } else {
        (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
        exit(0);
    }

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int cmd;                    /* command code for shmctl() */
    int shmid;                  /* segment ID */
    struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
    ...
    shmid = ...;
    cmd = ...;
    if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
        perror("shmctl: shmctl failed");
        exit(1);
    }

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

    #include <sys/types.h>
    #include <sys/ipc.h>


    #include <sys/shm.h>

    static struct state {       /* internal record of attached segments */
        int shmid;              /* shmid of attached segment */
        char *shmaddr;          /* attach point */
        int shmflg;             /* flags used on attach */
    } ap[MAXnap];               /* state of current attached segments */
    int nap;                    /* number of currently attached segments */
    char *addr;                 /* address work variable */
    register int i;             /* work area */
    register struct state *p;   /* ptr to current state entry */
    ...
    p = &ap[nap++];
    p->shmid = ...;
    p->shmaddr = ...;
    p->shmflg = ...;
    p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
    if (p->shmaddr == (char *) -1) {
        perror("shmop: shmat failed");
        nap--;
    } else
        (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
    ...
    i = shmdt(addr);
    if (i == -1) {
        perror("shmop: shmdt failed");
    } else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        for (p = ap, i = nap; i--; p++)
            if (p->shmaddr == addr)
                *p = ap[--nap];
    }

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents back from the shared memory.
7. End.

Source Code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #define SHM_SIZE 1024

    int main(int argc, char *argv[])
    {
        key_t key;
        int shmid;
        char *data;

        if (argc > 2) {
            fprintf(stderr, "usage: shmdemo [data_to_write]\n");
            exit(1);
        }
        if ((key = ftok("shmdemo.c", 'R')) == -1) {   /* obtain a key for the segment */
            perror("ftok");
            exit(1);
        }
        if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
            perror("shmget");
            exit(1);
        }
        data = shmat(shmid, (void *) 0, 0);           /* attach to the segment */
        if (data == (char *) (-1)) {
            perror("shmat");
            exit(1);
        }
        if (argc == 2) {
            printf("writing to segment: \"%s\"\n", argv[1]);
            strncpy(data, argv[1], SHM_SIZE);
        }
        if (shmdt(data) == -1) {                      /* detach */
            perror("shmdt");
            exit(1);
        }
        return 0;
    }

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules Here is one such dataset ( original) Excel spreadsheet version of the German credit data (download from web)

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = Φ. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.
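The rule-generation loop just described can be sketched in Python. The toy database and the `support` helper below are illustrative assumptions, not part of the manual.

```python
# Toy database and support() are illustrative assumptions.
db = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(set(itemset) <= t for t in db) / len(db)

def rules_from_itemset(items, min_conf):
    """First rule: {I1..Ik-1} => {Ik}; then keep moving the last
    antecedent item into the consequent, keeping confident rules."""
    rules = []
    for cut in range(len(items) - 1, 0, -1):
        lhs, rhs = items[:cut], items[cut:]
        conf = support(items) / support(lhs)
        if conf >= min_conf:
            rules.append((lhs, rhs, conf))
    return rules

rules = rules_from_itemset(("milk", "bread", "butter"), 0.4)
# keeps ("milk", "bread") => ("butter",) with confidence 0.5
```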

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 58

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

· Find the frequent set Lk−1

· Join step: Ck is generated by joining Lk−1 with itself

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
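A minimal Python sketch of the same pseudocode, using an absolute support count as the threshold ε; the join and prune steps correspond to Generate, and the toy database at the end is a hypothetical example, not the lab's Bank data.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Minimal Apriori sketch: returns every itemset occurring in at
    least min_count transactions (an absolute support threshold)."""
    db = [frozenset(t) for t in transactions]
    items = {i for t in db for i in t}
    # L1: frequent 1-itemsets
    current = {frozenset([i]) for i in items
               if sum(i in t for t in db) >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join step: unions of (k-1)-itemsets that give size-k candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count support and keep the frequent candidates
        current = {c for c in candidates
                   if sum(c <= t for t in db) >= min_count}
        frequent |= current
        k += 1
    return frequent

db = [{"milk", "bread", "butter"}, {"milk", "bread"}, {"beer"},
      {"milk"}, {"bread", "butter"}]
result = apriori(db, 2)
```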

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka via Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

· Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

· Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select the Preprocess tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) Select tree J48

9) Select Test options "Use training set"


10) If needed, select attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

Posterior = (Prior × Likelihood) / Evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
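A toy posterior computation helps make the factorization concrete. Every number below is a hypothetical prior or Bernoulli likelihood, chosen only to illustrate p(C | F1, …, Fn) = p(C) · Π p(Fi | C) / Z; none of them come from the lab's Bank data.

```python
# Hypothetical priors and Bernoulli likelihoods p(Fi = 1 | C).
priors = {"YES": 0.5, "NO": 0.5}
likelihood = {
    "YES": [0.8, 0.6],
    "NO":  [0.3, 0.4],
}

def posterior(features):
    """features: 0/1 observations for F1, F2."""
    unnorm = {}
    for c in priors:
        p = priors[c]
        for f, p1 in zip(features, likelihood[c]):
            p *= p1 if f == 1 else (1.0 - p1)  # Bernoulli feature model
        unnorm[c] = p
    z = sum(unnorm.values())                   # the scaling factor Z
    return {c: p / z for c, p in unnorm.items()}

post = posterior([1, 1])   # both features observed as 1
```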

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attribute

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO
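The summary statistics above can be recomputed directly from this confusion matrix, which is a useful sanity check when reading Weka output:

```python
# Confusion matrix from the sample output (rows = actual class,
# columns = predicted class).
cm = [[245, 29],   # actual YES: 245 predicted YES, 29 predicted NO
      [17, 309]]   # actual NO:  17 predicted YES, 309 predicted NO

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total           # correctly classified
tp_rate_yes = cm[0][0] / (cm[0][0] + cm[0][1])     # recall for YES
precision_yes = cm[0][0] / (cm[0][0] + cm[1][0])   # precision for YES

print(round(accuracy, 4), round(tp_rate_yes, 3), round(precision_yes, 3))
# → 0.9233 0.894 0.935
```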

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

The exact numbers will vary; they can be observed by working through different problems during practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
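As an illustration of one recursive-partitioning step, the sketch below picks the attribute with the largest information gain. The toy records and the `entropy`/`best_split` names are assumptions for illustration, not the lab's Bank data or Weka's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows, target):
    """One recursive-partitioning step: pick the attribute whose
    value test gives the largest information gain."""
    base = entropy([r[target] for r in rows])
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        remainder = 0.0
        for v in {r[attr] for r in rows}:
            sub = [r[target] for r in rows if r[attr] == v]
            remainder += len(sub) / len(rows) * entropy(sub)
        gain = base - remainder
        if best is None or gain > best[1]:
            best = (attr, gain)
    return best

# Hypothetical toy records (not the lab's Bank data)
rows = [
    {"children": "yes", "income": "high", "car": "yes"},
    {"children": "yes", "income": "low",  "car": "yes"},
    {"children": "no",  "income": "high", "car": "no"},
    {"children": "no",  "income": "low",  "car": "no"},
]
attr, gain = best_split(rows, "car")   # "children" splits perfectly
```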

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased
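The cross-validation estimate selected in step 9 can be sketched as follows; `train_and_eval` is a stand-in for whatever learner is trained and scored on each fold (in this lab, Weka's J48 plays that role internally).

```python
import random

def kfold_accuracy(n, train_and_eval, k=10, seed=1):
    """Plain k-fold cross-validation skeleton: shuffle the n record
    indices, split them into k folds, train on k-1 folds and test on
    the held-out fold, then average the per-fold accuracies.
    train_and_eval(train_idx, test_idx) -> accuracy is supplied by
    the caller."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        accs.append(train_and_eval(train, test))
    return sum(accs) / k

# demo with a stand-in evaluator that always reports accuracy 1.0
acc = kfold_accuracy(600, lambda train, test: 1.0, k=10)
```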

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to Open file and browse the file that was newly saved (the attribute-deleted file)

14) Go to the Classify tab

15) Choose classifier "trees"

16) Select J48 tree

17) Select Test options "Use training set"

18) If needed, select attribute

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose classifier "trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attribute

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule (i.e., to check the bias) by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationship among attributes that we want to study. It can be decided based on the database and the user's requirement.

EXPERIMENT-11

Aim To create a decision tree by using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

· Each node of the (over-fit) tree is examined for pruning

· A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

· Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

· Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "trees"

9) Select "NBTree", i.e., the naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning as needed

14) If needed, select attribute

15) Now start Weka

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "trees/rules"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose classifier "rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
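OneR itself is easy to sketch: for each attribute, build one rule per attribute value predicting the majority class for that value, and keep the attribute whose rules make the fewest total errors. The records below are hypothetical illustrations, not the lab's data.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR sketch: for each attribute build one rule per attribute
    value (predict the majority class for that value) and keep the
    attribute whose rules make the fewest total errors."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        by_value = defaultdict(list)
        for r in rows:
            by_value[r[attr]].append(r[target])
        errors = sum(len(lbls) - Counter(lbls).most_common(1)[0][1]
                     for lbls in by_value.values())
        if best is None or errors < best[1]:
            best = (attr, errors)
    return best

# Hypothetical records; 'accounting' misclassifies only one row
rows = [
    {"accounting": 1, "science": 0, "cls": "A"},
    {"accounting": 1, "science": 1, "cls": "A"},
    {"accounting": 0, "science": 1, "cls": "B"},
    {"accounting": 0, "science": 0, "cls": "B"},
    {"accounting": 0, "science": 1, "cls": "A"},
]
attr, errors = one_r(rows, "cls")
```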

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


6 Write a shell script to list all of the directory files in a directory

Script:
#!/bin/bash
echo "enter directory name"
read dir
if [ -d "$dir" ]
then
    echo "list of files in the directory"
    ls "$dir"
else
    echo "enter proper directory name"
fi

Output:
Enter directory name
Atri
List of all files in the directory:
CSE.txt ECE.txt

7 Write a shell script to find factorial of a given integer

Script:
#!/bin/bash
echo "enter a number"
read num
n=$num
fact=1
while [ $num -ge 1 ]
do
    fact=`expr $fact \* $num`
    let num--
done

echo "factorial of $n is $fact"

Output Enter a number

5


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels

9 Write an awk script to find the number of characters, words and lines in a file

Aim To write an awk script to find the number of characters words and lines in a file

Script:
BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print(NR "\t" len "\t" NF "\t" $0)
    words += NF
}
END {
    print("\ntotal")
    print("characters\t" total_len)
    print("words\t" words)
    print("lines\t" NR)
}

10 Write a C program that makes a copy of a file using standard I/O and system calls

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }
    /* In case of an error, exit from the loop */
    if (n1 == -1) {
        perror("reading problem");
        exit(2);
    }
    close(fd1);
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1, n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, BUFSIZE)) > 0)
        printf("%c", buf);      /* or write(1, &buf, 1); */
    return 0;
}

AIM Implement in C the following ls Unix command using system calls Algorithm

Algorithm:
1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();
char pathname[MAXPATHLEN];
int file_select(struct direct *entry);

main()
{
    int count, i;
    struct dirent **files;

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i - 1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. Open the existing file and create the new file using the open() and creat() system calls
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above two steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[512];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);    /* copy + unlink = move */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

/* The original listing only opened and closed a file with fopen/fclose;
   the stat() call below reports the information the aim asks for. */
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat sb;
    int i;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &sb) == -1) {
            perror(argv[i]);
            continue;
        }
        printf("%s:\n", argv[i]);
        printf("  file type : %s\n",
               S_ISDIR(sb.st_mode) ? "directory" :
               S_ISREG(sb.st_mode) ? "regular file" : "other");
        printf("  links     : %ld\n", (long) sb.st_nlink);
        printf("  accessed  : %s", ctime(&sb.st_atime));
        printf("  perms     : %c%c%c\n",
               (sb.st_mode & S_IRUSR) ? 'r' : '-',
               (sb.st_mode & S_IWUSR) ? 'w' : '-',
               (sb.st_mode & S_IXUSR) ? 'x' : '-');
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls ndashl command

ALGORITHM

Step 1: Include the necessary header files for manipulating directories
Step 2: Declare and initialize the required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read the entries available in the directory
Step 6: Display the directory entry, i.e., the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries have been read

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <dirent.h>

int main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
    return 0;
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {              /* portable directory entry */
    long ino;                 /* inode number */
    char name[NAME_MAX+1];    /* name + '\0' terminator */
} Dirent;

typedef struct {              /* minimal DIR: no buffering, etc. */
    int fd;                   /* file descriptor for the directory */
    Dirent d;                 /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat   /* inode information returned by stat */
{
    dev_t   st_dev;     /* device of inode */
    ino_t   st_ino;     /* inode number */
    short   st_mode;    /* mode bits */
    short   st_nlink;   /* number of links to file */
    short   st_uid;     /* owner's user id */
    short   st_gid;     /* owner's group id */
    dev_t   st_rdev;    /* for special files */
    off_t   st_size;    /* file size in characters */
    time_t  st_atime;   /* time last accessed */
    time_t  st_mtime;   /* time last modified */
    time_t  st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory It opens the directory loops through the files in it calling the function on each then closes the


directory and returns Since fsize calls dirwalk on each directory the two functions call each other recursively

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;            /* inode number */
    char  d_name[DIRSIZ];   /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program: it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory verifies that the file is a directory (this time by the system call fstat which is like stat except that it applies to a file descriptor) allocates a directory structure and records the information

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification, etc.
A link counter to determine the number of hard links
Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent\n");
    else
        printf("child\n");
    return 0;
}

17 Write a C program to create a Zombie process. If a child terminates before its parent has collected the child's exit status (with wait), the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);                /* child exits immediately */
    } else {
        sleep(100);             /* parent does not wait, so the dead
                                   child stays a zombie meanwhile */
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile; the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size);

int [] - file descriptor variable; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
ip_string - the string to be written into the pipe
size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.

Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] - file descriptor variable; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
ip_string - the string to be written into the FIFO
size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no of msgs on queue */
    ushort msg_qbytes;          /* max no of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symbolic value   Description
0400      MSG_R            Read by owner
0200      MSG_W            Write by owner
0040      MSG_R >> 3       Read by group
0020      MSG_W >> 3       Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag has MSG_NOERROR: it returns an error if length is not large enough to receive the msg; if the data portion is greater than the msg length, it truncates and returns.

3. A variety of control operations on a msg can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't access message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {            /* must be declared by the caller on Linux */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;         /* binary semaphore: resource initially free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                      /* child: consumer waits on the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                     /* parent: producer signals the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(struct sockaddr_un);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }
    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before reading the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(); it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows: int shmctl(int shmid, int cmd, struct shmid_ds *buf); The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a sructure of type struct shmid_ds which is defined in ltsysshmhgt The following code illustrates shmctl() include ltsystypeshgtinclude ltsysipchgtinclude ltsysshmhgtint cmd command code for shmctl() int shmid segment ID struct shmid_ds shmid_ds shared memory data structure to hold results shmid = cmd = if ((rtrn = shmctl(shmid cmd shmid_ds)) == -1) perror(shmctl shmctl failed) exit(1) Attaching and Detaching a Shared Memory Segment shmat() and shmdt() are used to attach and detach shared memory segments They are prototypes as follows void shmat(int shmid const void shmaddr int shmflg)int shmdt(const void shmaddr)shmat() returns a pointer shmaddr to the head of the shared segment associated with a valid shmid shmdt() detaches the shared memory segment located at the address indicated by shmaddr The following code illustrates calls to shmat() and shmdt() include ltsystypeshgt include ltsysipchgt


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the pointer returned by shmat().
6. Read the contents of shared memory through the same attached pointer.
7. End.

(Note: there are no shmsnd()/shmrcv() system calls; once attached, shared memory is read and written like ordinary memory.)

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');        /* make the key */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);   /* attach to the segment */
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {                     /* write mode */
        printf("writing to segment \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {             /* detach */
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment "koteswararao"
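A classroom variation of the program above does the write and the read back in a single process, using IPC_PRIVATE so no ftok key is needed. The function name and layout are ours, a hedged sketch rather than the manual's program:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SEG_SIZE 1024

/* Write msg into a fresh shared segment through the attached
 * pointer, read it back into out, then detach and remove the
 * segment. Returns 0 on success, -1 on failure. */
int shm_roundtrip(const char *msg, char *out, size_t outlen)
{
    int shmid = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    char *data = shmat(shmid, NULL, 0);      /* attach */
    if (data == (char *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    strncpy(data, msg, SEG_SIZE - 1);        /* "write": plain store */
    data[SEG_SIZE - 1] = '\0';
    strncpy(out, data, outlen - 1);          /* "read": plain load */
    out[outlen - 1] = '\0';

    shmdt(data);                             /* detach */
    shmctl(shmid, IPC_RMID, NULL);           /* remove segment */
    return 0;
}
```

Calling `shm_roundtrip("koteswararao", buf, sizeof buf)` from main() should copy the string out of the segment unchanged.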

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select the Preprocess tab.

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute.

Sample Output

EXPERIMENT-2

Aim: To identify the rules, with some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread, the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X): the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
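The support and confidence arithmetic above can be checked mechanically. A minimal C sketch follows; since the transaction table referenced in the text did not survive extraction, the database below is a made-up one chosen to reproduce the quoted numbers (supp = 0.4, conf = 0.5):

```c
#include <stddef.h>

/* Hypothetical 5-transaction 0/1 database over {milk, bread, butter, beer}. */
enum { MILK, BREAD, BUTTER, BEER, NITEMS };
static const int db[5][NITEMS] = {
    {1, 1, 0, 0},
    {0, 1, 1, 0},
    {0, 0, 0, 1},
    {1, 1, 1, 0},
    {0, 1, 0, 0},
};
#define NTRANS 5

/* supp(X): proportion of transactions containing every item of X. */
double support(const int *items, size_t n)
{
    int hits = 0;
    for (int t = 0; t < NTRANS; t++) {
        int all = 1;
        for (size_t i = 0; i < n; i++)
            if (!db[t][items[i]])
                all = 0;           /* item missing from transaction t */
        if (all)
            hits++;
    }
    return (double)hits / NTRANS;
}

/* conf(X => Y) = supp(X u Y) / supp(X). */
double confidence(const int *x, size_t nx, const int *xy, size_t nxy)
{
    return support(xy, nxy) / support(x, nx);
}
```

With this table, `support` of {milk, bread} is 2/5 = 0.4 and the confidence of {milk, bread} ⇒ {butter} is 0.2/0.4 = 0.5, matching the text.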

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

o Ck is generated by joining Lk−1 with itself.

· Prune Step:

o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where · (Ck: candidate itemset of size k)

· (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
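The first pass of the pseudocode (counting 1-itemsets and keeping the large ones) can be sketched in C; the matrix and threshold below are illustrative, not from the manual:

```c
/* Hypothetical first Apriori pass over a 0/1 transaction matrix:
 * count each single item and keep those whose support meets the
 * threshold. Returns the number of frequent 1-itemsets and stores
 * their item indices in out[]. */
#define AP_NTRANS 5
#define AP_NITEMS 4

static const int ap_db[AP_NTRANS][AP_NITEMS] = {
    {1, 1, 0, 0},
    {0, 1, 1, 0},
    {0, 0, 0, 1},
    {1, 1, 1, 0},
    {0, 1, 0, 0},
};

int frequent_1_itemsets(double minsup, int *out)
{
    int nfreq = 0;
    for (int i = 0; i < AP_NITEMS; i++) {
        int count = 0;                       /* occurrences of item i */
        for (int t = 0; t < AP_NTRANS; t++)
            count += ap_db[t][i];
        if ((double)count / AP_NTRANS >= minsup)
            out[nfreq++] = i;                /* item i is "large" */
    }
    return nfreq;
}
```

Later passes would join these survivors into candidate 2-itemsets and repeat the same counting loop, exactly as the pseudocode's while loop describes.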

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

· Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

· Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select the Preprocess tab.

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select trees > J48.

9) Select Test options: "Use training set".

10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C|F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

p(C|F1, …, Fn) = p(C) p(F1, …, Fn|C) / p(F1, …, Fn)

In plain English, the above equation can be written as:

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn|C)

= p(C) p(F1|C) p(F2, …, Fn|C, F1)

= p(C) p(F1|C) p(F2|C, F1) p(F3, …, Fn|C, F1, F2)

= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ⋯ p(Fn|C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1|C) p(F2|C) ⋯ p(Fn|C) = p(C) Π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1, …, Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples
– Each tuple is an 'n'-dimensional attribute vector
– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff
– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:
– P(Ci|X) = P(X|Ci) P(Ci) / P(X)
– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)
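The final product formula is cheap to compute. A tiny hedged helper follows; the function name and the numbers in the usage are ours, not taken from the bank data:

```c
#include <stddef.h>

/* Score proportional to P(Ci|X): the class prior P(Ci) times the
 * product of the per-attribute likelihoods P(xk|Ci). The caller
 * supplies the probabilities; nothing here is estimated from data. */
double nb_score(double prior, const double *likelihoods, size_t n)
{
    double score = prior;
    for (size_t k = 0; k < n; k++)
        score *= likelihoods[k];   /* class-conditional independence */
    return score;
}
```

To classify, compute `nb_score` once per class with that class's prior and likelihoods, and predict the class with the larger score; the shared denominator P(X) can be ignored, as noted above.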

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to Classify tab


7) Choose Classifier "Trees".

8) Select "NBTree", i.e. the naive Bayesian tree.

9) Select Test options: "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO
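The headline accuracy can be recomputed from the confusion matrix above: (245 + 309) / 600 = 92.33%. A small sketch of that arithmetic (the helper name is ours):

```c
/* Accuracy = (diagonal sum) / (total) for a 2x2 confusion matrix
 * laid out as in the Weka output: rows = actual class,
 * columns = predicted class. */
double accuracy2x2(int m[2][2])
{
    int correct = m[0][0] + m[1][1];                    /* diagonal */
    int total = m[0][0] + m[0][1] + m[1][0] + m[1][1];  /* all cases */
    return (double)correct / total;
}
```

Feeding in the matrix printed above, {{245, 29}, {17, 309}}, reproduces the 92.3333% Correctly Classified Instances figure from the summary.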

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29, and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation training of the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can be described also as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".


6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options: "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to Open file and browse the file that is newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Trees".

16) Select the J48 tree.

17) Select Test options: "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Trees".

9) Select J48.

10) Select Test options: "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation training of the data set, changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options: "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.


16) Select Test options: "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. This can be decided based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for a cross-validation trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:
  – Removing the sub-tree rooted at the pruned node
  – Making the pruned node a leaf node
  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:
  – Always select a node whose removal most increases the DT accuracy over the validation set
  – Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) Λ (income > 30000)
THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "NBTree", i.e. the naive Bayesian tree.

10) Select Test options: "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Factorial of 5 is 120


Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels.

9 Write an awk script to find the number of characters, words and lines in a file.

Aim: To write an awk script to find the number of characters, words and lines in a file.

Script:

BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print NR "\t" len "\t" NF "\t" $0
    words += NF
}
END {
    print "\ntotal:"
    print "characters\t" total_len
    print "words\t" words
    print "lines\t" NR
}

10 Write a C program that makes a copy of a file using standard I/O and system calls.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0) {
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }
    }


    /* in case of an error, exit from the loop */
    if (n1 == -1) {
        perror("Reading problem");
        exit(2);
    }
    close(fd1);
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using system calls: A. cat  B. ls  C. mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:

1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->d_name and dp->d_ino
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>


#include <stdio.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. Open an existing file and one new file using the open() system call
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above 2 steps until EOF
6. Close the 2 files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);    /* remove the original, completing the move */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A. File type  B. Number of links  C. Time of last access  D. Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;
    int buffer_character;

    stream = fopen("test", "r");


    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls –l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries are read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14    /* longest filename component; system-dependent */

typedef struct {       /* portable directory entry */
    long ino;                  /* inode number */
    char name[NAME_MAX+1];     /* name + '\0' terminator */
} Dirent;

typedef struct {       /* minimal DIR: no buffering, etc. */
    int fd;                    /* file descriptor for the directory */
    Dirent d;                  /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>        /* flags for read and write */
#include <sys/types.h>    /* typedefs */
#include <sys/stat.h>     /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to the information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {    /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
                  == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

• Size of file
• Device ID
• User ID of the file
• Group ID of the file
• The file mode information and access privileges for owner, group and others
• File protection flags
• The timestamps for file creation, modification, etc.
• Link counter to determine the number of hard links
• Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17 Write a C program to create a Zombie process. If a child terminates before its parent has called wait() for it, the terminated child remains in the process table as a zombie process.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {       /* child terminates at once */
        printf("child process");
        exit(0);
    } else {                        /* parent sleeps without calling wait(), */
        sleep(100);                 /* leaving the dead child a zombie       */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);      /* parent exits meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls –l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd – file descriptor; if int filedesc[2] is the variable, then use filedesc[1] (the write end) as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 33

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd – file descriptor of the FIFO opened for writing.

ip_string – the string to be written into the FIFO.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a facility of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 37

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag may be MSG_NOERROR: without it, an error is returned if length is not large enough to receive the msg; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a msg can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                              /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;          /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                             /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;           /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created
semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108   /* not defined by all systems */

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {
            printf("\n Error: Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error\n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before changing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached address.
6. Read the contents from the shared memory segment through the attached address.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');  /* the original leaves key unset; ftok() supplies a usable key */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);  /* copy the argument into the segment */
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table below. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      0       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk Lk = I1 I2 hellip Ik association rules with this itemsets are generated in the following way the first rule is I1 I2 hellip Ik1 and Ik by checking the confidence this rule can be determined as interesting or not Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent further the confidences of the new rules are checked to determine the interestingness of them Those processes iterated until the antecedent becomes empty Since the second subproblem is quite straight forward most of the researches focus on the first subproblem The Apriori algorithm finds the frequent sets L In Database D


· Find frequent set Lk−1.

· Join Step: Ck is generated by joining Lk−1 with itself.

· Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where
· (Ck: candidate itemset of size k)
· (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ∪k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented as J48 in Java and can be selected by clicking the Choose button,

7) and selecting tree > J48.

9) Select Test option "Use training set".


10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)
= p(C) p(F1, …, Fn | C)
= p(C) p(F1 | C) p(F2, …, Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
= …
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "Trees".

8) Select "NBTree", i.e., the Naïve Bayesian tree.

9) Select Test option "Use training set".

10) If needed, select attributes.

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be explored through the different problem solutions while practicing.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29, and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross-validation on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select J48.

9) Select Test option "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the newly saved file (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select from the attributes list the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim To check whether a small rule or a long rule is better, by checking the bias on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user requirement.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision-tree accuracy over the validation set

  - Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. Naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning options as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Week 3

8 Write an awk script to count the number of lines in a file that do not contain vowels 9 Write an awk script to find the number of characters words and lines in a file
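The manual gives no script for item 8; a minimal awk sketch is shown below. It is one possible solution, and the file name `somefile.txt` is only an assumption.

```shell
# Count the lines that contain no vowels (item 8).
# Lines matching no character from [aeiouAEIOU] increment the counter;
# `count+0` prints 0 instead of an empty string when every line has a vowel.
awk '!/[aeiouAEIOU]/ { count++ } END { print count+0 }' somefile.txt
```

Words such as "rhythm" or "sky" count as vowel-free here, since only a, e, i, o, u are treated as vowels.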

Aim To write an awk script to find the number of characters, words and lines in a file

Script:

BEGIN { print "record\t characters\t words" }
# BODY section
{
    len = length($0)
    total_len += len
    print(NR "\t" len "\t" NF "\t" $0)
    words += NF
}
END {
    print("\ntotal")
    print("characters\t" total_len)
    print("lines\t" NR)
}
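A script of this kind can be exercised from the command line as follows; this is an illustrative run (the input file name and contents are assumptions), with the same per-line and END logic given inline.

```shell
# Create a small sample file (two lines, three words, 12 characters).
printf 'one two\nthree\n' > sample.txt

# Per record: print NR, line length and word count; at END: the totals.
awk 'BEGIN { print "record\tcharacters\twords" }
     { len = length($0); total_len += len
       print NR "\t" len "\t" NF "\t" $0; words += NF }
     END { print "\ntotal"
           print "characters\t" total_len
           print "lines\t" NR }' sample.txt
```

NF is the number of fields (words) on the current line and NR the running record count, so the END block needs no separate line counter.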

10 Write a c program that makes a copy of a file using standard IO and system calls

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    char buffer[100];
    long int n1;

    if (((fd1 = open(argv[1], O_RDONLY)) == -1) ||
        ((fd2 = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0700)) == -1)) {
        perror("file problem");
        exit(1);
    }
    while ((n1 = read(fd1, buffer, 100)) > 0)
        if (write(fd2, buffer, n1) != n1) {
            perror("writing problem");
            exit(3);
        }


    /* in case of an error, exit from the loop */
    if (n1 == -1) {
        perror("reading problem");
        exit(2);
    }
    close(fd1);
    close(fd2);
    exit(0);
}


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1, n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM Implement in C the ls Unix command using system calls

Algorithm

1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->name and dp->inode
5. Repeat the above step until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();
char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm
1. Start
2. Open the existing file and one new file using the open() system call
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above 2 steps until end of file
6. Close the 2 files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);    /* copy the contents */
    close(fd1);
    close(fd2);
    unlink(argv[1]);           /* remove the original file */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A. File type B. Number of links C. Time of last access D. Read, Write and Execute permissions

#include <stdio.h>

main()
{
    FILE *stream;
    int buffer_character;

    stream = fopen("test", "r");


    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls ndashl command

ALGORITHM

Step 1: Include the necessary header files for manipulating directories. Step 2: Declare and initialize the required objects. Step 3: Read the directory name from the user. Step 4: Open the directory using the opendir() system call and report an error if the directory is not available. Step 5: Read an entry available in the directory. Step 6: Display the directory entry, i.e. the name of the file or sub-directory. Step 7: Repeat steps 5 and 6 until all the entries have been read

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while (d = readdir(p))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list for every file in a directory its inode number and file name The Dirent structure contains the inode number and the name The maximum length of a filename component is NAME_MAX which is a system-dependent value opendir returns a pointer to a structure called DIR analogous to FILE which is used by readdir and closedir This information is collected into a file called direnth

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {           /* portable directory entry */
    long ino;              /* inode number */
    char name[NAME_MAX+1]; /* name + '\0' terminator */
} Dirent;

typedef struct {           /* minimal DIR: no buffering, etc. */
    int fd;                /* file descriptor for the directory */
    Dirent d;              /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file or -1 if there is an error That is

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {         /* inode information returned by stat */
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize If the mode obtained from stat indicates that a file is not a directory then the size is at hand and can be printed directly If the name is a directory however then we have to process that directory one file at a time it may in turn contain sub-directories so the process is recursive

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;  /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {           /* directory entry */
    ino_t d_ino;          /* inode number */
    char  d_name[DIRSIZ]; /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program: it might be different on a different system, so the typedef is better. A complete set of "system types" is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of the file
- Device ID
- User ID of the file
- Group ID of the file
- The file mode information and access privileges for owner, group and others
- File protection flags
- The timestamps for file creation, modification, etc.
- A link counter to determine the number of hard links
- Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display ldquoparentrdquo and the child to display ldquochildrdquo on the screen

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17 Write a C program to create a Zombie process. If a child terminates before its parent has called wait(), the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);    /* child exits first */
    }


    else {
        sleep(100);    /* parent does not call wait(), so the child stays a zombie */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* the parent may exit meanwhile, making the child an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex - ls ndashl | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size)

filedesc[1] – the write end of the pipe (if int filedesc[2] is the descriptor array, use filedesc[1] as the first parameter)

ip_string – the string to be written into the pipe

size – the buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(filedesc[0], buffer, size)

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                          /* child: server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or named pipe)

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions: fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a

FIFO:

1) Create a FIFO through a mknod() function call.
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(fd, ip_string, size)

fd – the file descriptor obtained by opening the FIFO for writing

ip_string – the string to be written into the FIFO

size – the buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(fd, buffer, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                          /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a part of the operating system that is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create new message queue or access existing message queue ldquomsgget()rdquo function is used Syntaxint msgget(key_t key int msgflag) Msg flag values

Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id (a unique id). msgbuf is the actual content to send: a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type, > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. If the flag has MSG_NOERROR set and the data portion is greater than length, the message is truncated and returned; otherwise an error is returned if length is not large enough to receive the message.

3. A variety of control operations on messages can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID is given in cmd to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;   /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default: /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;    /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output: semaphore set created


semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Serverc

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on many systems */

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(struct sockaddr_un);
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Serverc

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Clientc

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of currently attached segments */
int nap;                   /* number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents from the shared memory through the same pointer.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 5678;   /* fixed key agreed upon by both processes */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X): the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk, with Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set Lk−1.

· Join Step:

  o Ck is generated by joining Lk−1 with itself.

· Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

Where: Ck is the candidate itemset of size k, and Lk is the frequent itemset of size k.

Apriori Pseudocode:

Apriori(T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k−1) ≠ ∅
        C(k) <- Generate(L(k−1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the button Choose,

7) and select trees > J48

9) Select Test options "Use training set"

10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set.

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

P(C | F1, ..., Fn) = [p(C) p(F1, ..., Fn | C)] / p(F1, ..., Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn)
= p(C) p(F1, ..., Fn | C)
= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(hD)= P(Dh) P(h) P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏(k=1..n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.


7) Choose Classifier → "trees".

8) Select "NBTree", i.e. the Naive Bayesian tree.

9) Select Test option "Use training set".

10) If needed, select attributes.

11) Now click Start.

12) Now we can see the output details in the Classifier output pane.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554    92.3333 %

Incorrectly Classified Instances       46     7.6667 %

Kappa statistic                         0.845

Mean absolute error                     0.1389

Root mean squared error                 0.2636

Relative absolute error                27.9979 %

Root relative squared error            52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES


0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.  0.923    0.081    0.924      0.923   0.923      0.936

=== Confusion Matrix ===

a    b    <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To check "Is testing a good idea?"

Tools / Apparatus: Weka mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click "Set...".

3) Choose the file which contains records that were not in the training set we used to create the model.


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced by trying different problem solutions during practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set, using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
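Recursive partitioning picks, at each node, the attribute-value test that leaves the purest subsets. A common purity measure for such splits is the Gini index; the tiny C sketch below (with binary labels, as an assumed simplification) shows how a candidate subset would be scored — 0 for a pure subset, 0.5 for a 50/50 mix.

```c
/* gini: Gini impurity of a set of binary class labels (1 = positive).
   Lower is purer; a split is chosen to minimise the weighted impurity
   of the subsets it produces. */
double gini(const int *labels, int n)
{
    int i, pos = 0;
    double p;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        if (labels[i])
            pos++;
    p = (double)pos / n;              /* fraction of positive labels */
    return 1.0 - p * p - (1.0 - p) * (1.0 - p);
}
```

A pure subset {1,1,1,1} scores 0.0 and an even mix {1,1,0,0} scores 0.5, so the learner would prefer splits whose children look like the former.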

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".


6) Go to the Classify tab.

7) Choose Classifier → "trees".

8) Select J48.

9) Select Test option "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output pane.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539    89.8333 %

Incorrectly Classified Instances       61    10.1667 %

Kappa statistic                         0.7942

Mean absolute error                     0.167

Root mean squared error                 0.305

Relative absolute error                33.6511 %

Root relative squared error            61.2344 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.861    0.071    0.911      0.861   0.886      0.883     YES

0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.  0.898    0.108    0.899      0.898   0.898      0.883

=== Confusion Matrix ===

a    b    <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to OPEN file and browse the newly saved file (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier → "trees".

16) Select the J48 tree.

17) Select Test option "Use training set".

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output pane.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select from the attributes list the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier → "trees".

9) Select J48.

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool.

Tools / Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier → "trees".

8) Select J48.

9) Select Test option "Use training set".

10) Click on "More options...".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output pane.


16) Select Test option "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output pane.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule (check the bias) by training the data set, using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study; it can be decided based on the database and the user's requirement.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Prune nodes iteratively:

– Always select a node whose removal most increases the decision tree's accuracy over the validation set

– Stop when further pruning decreases the decision tree's accuracy over the validation set

IF (Children = yes) ∧ (income >= 30000)

THEN (car = Yes)
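The pruning test described above reduces to a comparison on the validation set: collapse a node to a majority-class leaf only if the leaf classifies at least as many validation instances correctly as the subtree did. A minimal sketch, with made-up label arrays:

```c
/* should_prune: decide whether to replace a subtree with a leaf.
   `val_labels` are the validation-set class labels (0/1) reaching this node,
   `majority_class` is the most common training class at the node, and
   `subtree_correct` is how many of these instances the full subtree gets
   right. Returns 1 (prune) when the leaf performs no worse. */
int should_prune(const int *val_labels, int n, int majority_class,
                 int subtree_correct)
{
    int i, leaf_correct = 0;

    for (i = 0; i < n; i++)
        if (val_labels[i] == majority_class)
            leaf_correct++;    /* the leaf predicts majority_class for all */
    return leaf_correct >= subtree_correct;
}
```

With validation labels {1,1,1,0} and majority class 1, the leaf gets 3 of 4 right, so a subtree that also scores 3 would be pruned, while one scoring 4 would be kept.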

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier → "trees".

9) Select "NBTree", i.e. the Naive Bayesian tree.


10) Select Test option "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" mode from "false" to "true".

13) Change the reducedErrorPruning option as needed.

14) If needed, select attributes.

15) Now click Start.

16) Now we can see the output details in the Classifier output pane.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools / Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier → "trees".

9) Select "J48".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier → "rules".

9) Select "OneR".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output pane.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose Classifier → "rules".

9) Select "PART".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output pane.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
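The OneR idea behind rules like the two above can be sketched directly: for one attribute, predict the majority class for each attribute value and count how many training rows the rule gets wrong; OneR then keeps the attribute with the fewest errors. The arrays below are made-up binary data, not the bank set.

```c
/* oner_errors: training-set error count of a one-attribute rule.
   `attr` holds a binary attribute value per row, `cls` the binary class.
   For each attribute value the rule predicts the majority class, so the
   minority rows for that value are the rule's errors. */
int oner_errors(const int *attr, const int *cls, int n)
{
    int count[2][2] = {{0, 0}, {0, 0}};   /* count[value][class] */
    int v, i, errors = 0;

    for (i = 0; i < n; i++)
        count[attr[i]][cls[i]]++;
    for (v = 0; v < 2; v++)
        errors += count[v][0] < count[v][1] ? count[v][0] : count[v][1];
    return errors;
}
```

For attr {1,1,1,0,0} with classes {1,1,0,0,0}, value 1 is majority-class 1 (one error) and value 0 is majority-class 0 (no errors), so the rule's error count is 1 out of 5, analogous to the 4/13 figure quoted above.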

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


/* Case of an error exit from the loop */
if (n1 == -1) {
    perror("Reading problem");
    exit(2);
}
close(fd2);
exit(0);


Week 4

11 Implement in C the following UNIX commands using system calls: A. cat  B. ls  C. mv

AIM Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>
#define BUFSIZE 1

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    printf("Welcome to ATRI\n");
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);    /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls.

Algorithm:

1. Start.
2. Open the directory using the opendir() system call.
3. Read the directory using the readdir() system call.
4. Print dp->name and dp->inode.
5. Repeat the above steps until the end of the directory.
6. End.

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>


#include <stdio.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start.
2. Open the existing file, and create a new file, using the open()/creat() system calls.
3. Read the contents from the existing file using the read() system call.
4. Write these contents into the new file using the write() system call.
5. Repeat the above 2 steps until EOF.
6. Close both files using the close() system call.
7. Delete the existing file using the unlink() system call.
8. End.

Program:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    int n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);    /* copy the contents */
    close(fd1);
    close(fd2);
    unlink(argv[1]);           /* then delete the original */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more file/directory names as command-line input and reports the following information on the file:

A. File type  B. Number of links  C. Time of last access  D. Read, write and execute permissions

#include <stdio.h>
#include <stdlib.h>

main()
{
    FILE *stream;
    int buffer_character;

    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}
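The fopen/fclose skeleton above only checks that the file can be opened; the information the task actually asks for comes from the stat() system call. A minimal sketch of such a report (the function name `report` and the printed layout are illustrative choices, not part of the manual):

```c
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/* report: print file type, link count, last-access time and the owner's
   read/write/execute permissions for `path`. Returns 0 on success. */
int report(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1) {
        perror(path);
        return -1;
    }
    printf("File type        : %s\n",
           S_ISDIR(st.st_mode) ? "directory" :
           S_ISREG(st.st_mode) ? "regular file" : "other");
    printf("Number of links  : %ld\n", (long)st.st_nlink);
    printf("Last access time : %s", ctime(&st.st_atime));
    printf("Permissions      : %c%c%c\n",
           st.st_mode & S_IRUSR ? 'r' : '-',
           st.st_mode & S_IWUSR ? 'w' : '-',
           st.st_mode & S_IXUSR ? 'x' : '-');
    return 0;
}
```

Calling report() once per command-line argument, as in `for (i = 1; i < argc; i++) report(argv[i]);`, covers the multi-file requirement.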


Week 5

13 Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call, and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e. the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of the ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {            /* portable directory entry */
    long ino;               /* inode number */
    char name[NAME_MAX+1];  /* name + '\0' terminator */
} Dirent;

typedef struct {            /* minimal DIR: no buffering, etc. */
    int fd;                 /* file descriptor for the directory */
    Dirent d;               /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {          /* inode information returned by stat */
    dev_t   st_dev;    /* device of inode */
    ino_t   st_ino;    /* inode number */
    short   st_mode;   /* mode bits */
    short   st_nlink;  /* number of links to file */
    short   st_uid;    /* owner's user id */
    short   st_gid;    /* owner's group id */
    dev_t   st_rdev;   /* for special files */
    off_t   st_size;   /* file size in characters */
    time_t  st_atime;  /* time last accessed */
    time_t  st_mtime;  /* time last modified */
    time_t  st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0010000  /* regular */

Now we are ready to write the program fsize If the mode obtained from stat indicates that a file is not a directory then the size is at hand and can be printed directly If the name is a directory however then we have to process that directory one file at a time it may in turn contain sub-directories so the process is recursive

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {           /* directory entry */
    ino_t d_ino;          /* inode number */
    char d_name[DIRSIZ];  /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

A link counter to determine the number of hard links

Pointers to the blocks storing the file's contents
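For the stated aim of this task, the shell's `ls > f1` behaviour can be reproduced with dup2(): open the target file, then make file descriptor 1 refer to it, so everything written to standard output lands in the file. The file name and function name below are illustrative sketch choices.

```c
#include <fcntl.h>
#include <unistd.h>

/* redirect_stdout: make fd 1 (standard output) refer to `fname`,
   the way the shell does for `command > fname`. Returns 0 on success. */
int redirect_stdout(const char *fname)
{
    int fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    dup2(fd, 1);   /* fd 1 now points at the file */
    close(fd);     /* the duplicate on fd 1 stays open */
    return 0;
}
```

After calling redirect_stdout("f1"), an execlp("ls", "ls", NULL) (or a plain printf) would write its output into f1 instead of the terminal.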


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent\n");    /* parent process */
    else
        printf("child\n");     /* child process */
}

17 Write a C program to create a zombie process. If a child terminates before its parent calls wait(), the terminated child (which still has an entry in the process table) is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);               /* child exits immediately */
    } else {
        sleep(100);            /* parent does not wait(), so the dead
                                  child remains a zombie meanwhile */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, so this child
                        becomes an orphan (adopted by init) */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK, and -1 on error. ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size);

filedesc[1] – the write end of the pipe (if int filedesc[2] is the descriptor array, use filedesc[1] as the first parameter);

ip_string – the string to be written into the pipe;

size – the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(filedesc[0], buffer, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid > 0) {      /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;                        /* strip the trailing newline */
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name; the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev)

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a

FIFO

1) Create a FIFO through a mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(int fd, ip_string, size)

int fd - file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string - the string to be written into the FIFO.

size - buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char buffer[], size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);

    if ((childpid = fork()) == -1) {
        printf("cannot fork\n");
    } else if (childpid > 0) {          /* parent: client */
        wfd = open(FIFO1, 1);           /* open for writing */
        rfd = open(FIFO2, 0);           /* open for reading */
        client(rfd, wfd);
        while (wait((int *)0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error\n");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error\n");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error\n");
    if (n < 0)
        printf("data error\n");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist\n");
        write(writefd, buff, strlen(buff));
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
        close(fd);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main(void)
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);

    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a part of the operating system that is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 37

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type, > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag may have MSG_NOERROR: without it, an error is returned if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the "msgctl()" function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client: can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;    /* one process at a time may hold the resource */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created and initialized; on failure, perror reports "ftok", "semget" or "semctl".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a private set of 2 semaphores; only semaphore 0 is used */
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                             /* child: the consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;         /* wait until the producer posts */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                            /* parent: the producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;          /* post: let the consumer run */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    /* sizeof(sun_path) is used here; <sys/un.h> does not define UNIX_PATH_MAX on Linux */
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }
    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl().

Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process.

A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment It is prottyped by

int shmget(key_t key size_t size int shmflg)


The key argument is a access value associated with the semaphore ID The size argument is the size in bytes of the requested shared memory The shmflg argument specifies the initial access permissions and creation control flags

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {           /* internal record of attached segments */
    int shmid;                  /* shmid of attached segment */
    char *shmaddr;              /* attach point */
    int shmflg;                 /* flags used on attach */
} ap[MAXnap];                   /* state of current attached segments */
int nap;                        /* number of currently attached segments */
...
char *addr;                     /* address work variable */
register int i;                 /* work area */
register struct state *p;       /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment id).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents back from the shared memory through the same pointer, then detach it with shmdt().
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');   /* the key was left uninitialized in the original listing */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules Here is one such dataset ( original) Excel spreadsheet version of the German credit data (download from web)

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk, Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent, and the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set Lk−1.

· Join Step:

o Ck is generated by joining Lk−1 with itself.

· Prune Step:

o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load ldquoBankcsvrdquo in Weka by Open file in Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select Apriori algorithm from ldquoChoose ldquo button present in Associator

wekaassociationsApriori -N 10 -T 0 -C 09 -D 005 -U 10 -M 01 -S -10 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm is chosen; it is implemented as J48 in Weka and can be selected by clicking the Choose button

7) Select trees > J48

8) Select Test option "Use training set"

9) If needed, select the class attribute

10) Click Start

11) Now we can see the output details in the Classifier output

12) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C|F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem we write p(C|F1, ..., Fn) = p(C) p(F1, ..., Fn|C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn|C)

= p(C) p(F1|C) p(F2, ..., Fn|C, F1)

= p(C) p(F1|C) p(F2|C, F1) p(F3, ..., Fn|C, F1, F2)

= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ... p(Fn|C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C)

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1|C) p(F2|C) ... = p(C) Π p(Fi|C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1, ..., Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes, and if a model for each p(Fi|C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum Posteriori Hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci) as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose classifier "trees"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test option "Use training set"

10) If needed, select the class attribute

11) Click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test option "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose classifier "trees"

16) Select J48

17) Select Test option "Use training set"

18) If needed, select the class attribute

19) Click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select from the attributes list the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose classifier "trees"

9) Select J48

10) Select Test option "Use training set"

11) If needed, select the class attribute

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test option "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize, then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test option "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. This can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree by using prune mode and reduced-error pruning, and show the accuracy for a cross-validation trained data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000) THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "trees"

9) Select "NBTree", i.e., the Naive Bayesian tree

10) Select Test option "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the "unpruned" mode from "false" to "true"

13) Change the "reducedErrorPruning" option as needed

14) If needed, select the class attribute

15) Click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training a data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "trees"

9) Select "J48"

10) Select Test option "Use training set"

11) If needed, select the class attribute

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "rules"

9) Select "OneR"

10) Select Test option "Use training set"

11) If needed, select the class attribute

12) Click Start

13) Now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose classifier "rules"

9) Select "PART"

10) Select Test option "Use training set"

11) If needed, select the class attribute

12) Click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Week 4

11 Implement in C the following UNIX commands using System calls A cat B ls C mv

AIM: Implement in C the cat Unix command using system calls

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    while ((n = read(fd1, &buf, 1)) > 0)
        printf("%c", buf);        /* or: write(1, &buf, 1); */
    return 0;
}

AIM: Implement in C the ls Unix command using system calls

Algorithm:

1. Start
2. Open the directory using the opendir() system call
3. Read the directory using the readdir() system call
4. Print dp->d_name and dp->d_ino
5. Repeat steps 3 and 4 until the end of the directory
6. End

#include <sys/types.h>
#include <sys/dir.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FALSE 0
#define TRUE  1

extern int alphasort();

char pathname[MAXPATHLEN];

int main()
{
    int count, i;
    struct dirent **files;
    int file_select();

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i-1]->d_name);
    return 0;
}

int file_select(struct direct *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) ||
        (strcmp(entry->d_name, "..") == 0))
        return FALSE;
    else
        return TRUE;
}

AIM Implement in C the Unix command mv using system calls


Algorithm:
1. Start
2. Open the existing file and create a new file using the open() and creat() system calls
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above two steps until end of file
6. Close both files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2;
    int n;
    char buf;

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, &buf, 1)) > 0)   /* copy byte by byte */
        write(fd2, &buf, 1);
    close(fd1);
    close(fd2);
    unlink(argv[1]);                       /* remove the original file */
    printf("file is moved\n");
    return 0;
}

12 Write a program that takes one or more filedirectory names as command line input and reports the following information on the file

A File type B Number of links C Time of last access D Read, Write and Execute permissions

#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;
    int i;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &st) == -1) {
            fprintf(stderr, "Error: cannot stat %s\n", argv[i]);
            continue;
        }
        printf("%s: %s, links = %ld, last access = %s",
               argv[i],
               S_ISDIR(st.st_mode) ? "directory" : "regular file",
               (long)st.st_nlink, ctime(&st.st_atime));
        printf("owner permissions: %c%c%c\n",
               (st.st_mode & S_IRUSR) ? 'r' : '-',
               (st.st_mode & S_IWUSR) ? 'w' : '-',
               (st.st_mode & S_IXUSR) ? 'x' : '-');
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory
Step 2: Declare and initialize the required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read an entry from the directory
Step 6: Display the directory entry, i.e., the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries have been read

1. Simulation of ls command

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>

int main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
    return 0;
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14    /* longest filename component; system-dependent */

typedef struct {       /* portable directory entry */
    long ino;                  /* inode number */
    char name[NAME_MAX+1];     /* name + '\0' terminator */
} Dirent;

typedef struct {       /* minimal DIR: no buffering, etc. */
    int fd;            /* file descriptor for the directory */
    Dirent d;          /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {    /* inode information returned by stat */
    dev_t  st_dev;     /* device of inode */
    ino_t  st_ino;     /* inode number */
    short  st_mode;    /* mode bits */
    short  st_nlink;   /* number of links to file */
    short  st_uid;     /* owner's user id */
    short  st_gid;     /* owner's group id */
    dev_t  st_rdev;    /* for special files */
    off_t  st_size;    /* file size in characters */
    time_t st_atime;   /* time last accessed */
    time_t st_mtime;   /* time last modified */
    time_t st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000    /* type of file */
#define S_IFDIR 0040000    /* directory */
#define S_IFCHR 0020000    /* character special */
#define S_IFBLK 0060000    /* block special */
#define S_IFREG 0100000    /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)    /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir, and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ  14
#endif

struct direct {              /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure.

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.
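The routines above also rely on the portable types Dirent and DIR, which this extract never defines. Following the dirent.h header sketched alongside this code in K&R (NAME_MAX and the field names are that sketch's choices, not a system header), they look roughly like this:

```c
#define NAME_MAX 14    /* longest filename component; system-dependent */

typedef struct {               /* portable directory entry */
    long ino;                  /* inode number */
    char name[NAME_MAX+1];     /* name + '\0' terminator */
} Dirent;

typedef struct {       /* minimal DIR: no buffering, etc. */
    int fd;            /* file descriptor for the directory */
    Dirent d;          /* the directory entry */
} DIR;
```

With these definitions, dirwalk's dp->name and readdir's d.ino and d.name refer to the fields shown here.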

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir:  open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir:  close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir:  read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
                    == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification etc.

A link counter to determine the number of hard links

Pointers to the blocks storing the file's contents
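Exercise 15's program itself is not reproduced in this extract. A minimal sketch (the helper name redirect_stdout_to is ours, not part of the exercise) uses open and dup2, the same mechanism the shell uses for ls > f1:

```c
#include <fcntl.h>
#include <unistd.h>

/* Send standard output to "path", as the shell does for "ls > f1".
   Returns a duplicate of the old stdout so it can be restored,
   or -1 on error. */
int redirect_stdout_to(const char *path)
{
    int fd, saved;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    saved = dup(1);    /* remember the old stdout */
    dup2(fd, 1);       /* descriptor 1 now refers to path */
    close(fd);
    return saved;
}
```

A main program can call redirect_stdout_to("f1") and then execlp("ls", "ls", (char *) 0); everything ls writes to descriptor 1 then lands in f1.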


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)      /* parent */
        printf("parent");
    else                        /* child */
        printf("child");
}

17. Write a C program to create a Zombie process. If the child terminates before the parent process and the parent has not yet collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {    /* parent keeps running while the */
        sleep(100);             /* exited child remains a zombie */
        printf("parent process");
    } else {                    /* child terminates immediately */
        printf("child process");
        exit(0);
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd, char ip_string[], int size);

fd – file descriptor; if int filedesc[2] is the pipe, use filedesc[1] (the write end) as the first parameter

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char buf[], int size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode/access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd, char ip_string[], int size);

fd – file descriptor of the FIFO opened for writing

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char buf[], int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent: acts as client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                        /* child: acts as server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to the queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used.

Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag may have MSG_NOERROR; without it, msgrcv returns an error if length is not large enough to receive the message, and with it, if the data portion is greater than length, msgrcv truncates the message and returns.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server program:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client program:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");
    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client: can't RMID message queue 2");
    exit(0);
}

Week 8

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores b) flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24. Write a C program that illustrates suspending and resuming processes using signals.

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
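A fuller, runnable sketch of the same idea (the wrapper name suspend_resume is ours): the parent stops a target process with SIGSTOP and later resumes it with SIGCONT:

```c
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/* Suspend the given process, give it a moment, then resume it.
   Returns 0 on success, -1 if either kill() fails. */
int suspend_resume(pid_t pid)
{
    if (kill(pid, SIGSTOP) == -1)   /* same as hitting ctrl+z */
        return -1;
    sleep(1);                       /* the process is stopped here */
    if (kill(pid, SIGCONT) == -1)   /* same as the shell's fg/bg */
        return -1;
    return 0;
}
```

A parent can fork a child that sleeps, call suspend_resume(childpid), and then reap the child with wait().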

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                          /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;      /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                         /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;       /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26. Write client and server programs (in C) for interaction between server and client processes using Unix domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (in C) for interaction between server and client processes using Internet domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK

-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                     /* command code for shmctl() */
int shmid;                   /* segment ID */
struct shmid_ds shmid_ds;    /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of current attached segments */
int nap;                   /* number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached pointer.
6. Read the contents back from the shared memory.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by, because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure:

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.


3) Select the Preprocess tab.

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute.

Sample Output:

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.


• Find the frequent set Lk−1

• Join step:

  o Ck is generated by joining Lk−1 with itself

• Prune step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, and hence should be removed

where

• Ck: candidate itemset of size k

• Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C|F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C|F1, …, Fn) = p(C) p(F1, …, Fn|C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn|C)

= p(C) p(F1|C) p(F2, …, Fn|C, F1)

= p(C) p(F1|C) p(F2|C, F1) p(F3, …, Fn|C, F1, F2)

= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) … p(Fn|C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1|C) p(F2|C) …

= p(C) ∏i p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1, …, Fn) = (1/Z) p(C) ∏i p(Fi|C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

– P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

– P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Trees"

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false negatives (YES classified as NO) are 29 and the false positives (NO classified as YES) are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a Decision tree by using pruned mode and reduced-error pruning, and show the accuracy for the cross-validation trained data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  o Removing the sub-tree rooted at the pruned node

  o Making the pruned node a leaf node

  o Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  o Always select a node whose removal most increases the DT accuracy over the validation set

  o Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (Income > 30000)

THEN (Car = yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning options as needed

14) If needed, select attributes

15) Now start Weka

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees/Rules"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting=1 THEN class=A (Error = 0, Coverage = 7 instances)

IF accounting=0 THEN class=B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>
#include <sys/param.h>

#define FALSE 0
#define TRUE 1

extern int alphasort();

char pathname[MAXPATHLEN];

int file_select(struct dirent *entry)
{
    if ((strcmp(entry->d_name, ".") == 0) || (strcmp(entry->d_name, "..") == 0))
        return (FALSE);
    else
        return (TRUE);
}

int main()
{
    int count, i;
    struct dirent **files;

    if (getwd(pathname) == NULL) {
        printf("Error getting path\n");
        exit(0);
    }
    printf("Current Working Directory = %s\n", pathname);
    count = scandir(pathname, &files, file_select, alphasort);
    if (count <= 0) {
        printf("No files in this directory\n");
        exit(0);
    }
    printf("Number of files = %d\n", count);
    for (i = 1; i < count + 1; ++i)
        printf("%s\n", files[i - 1]->d_name);
    return 0;
}

AIM Implement in C the Unix command mv using system calls


Algorithm:

1. Start
2. Open the existing file and one new file using the open() system call
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above 2 steps until end of file
6. Close the two files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    char buf[1024];
    int fd1, fd2, n;

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)  /* copy the contents */
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);   /* remove the source: copy + unlink = mv */
    printf("file is moved\n");
    return 0;
}

12. Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *stream;

    stream = fopen("test", "r");

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 17

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13. Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include the necessary header files for manipulating the directory
Step 2: Declare and initialize the required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read an entry available in the directory
Step 6: Display the directory entry, i.e. the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries have been read

/* 1. Simulation of ls command */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

int main(void)
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
    closedir(p);
    return 0;
}

SAMPLE OUTPUT

enter directory name: iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {     /* portable directory entry */
    long ino;                /* inode number */
    char name[NAME_MAX+1];   /* name + '\0' terminator */
} Dirent;

typedef struct {     /* minimal DIR: no buffering, etc. */
    int fd;          /* file descriptor for the directory */
    Dirent d;        /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat   /* inode information returned by stat */
{
    dev_t  st_dev;    /* device of inode */
    ino_t  st_ino;    /* inode number */
    short  st_mode;   /* mode bits */
    short  st_nlink;  /* number of links to file */
    short  st_uid;    /* owner's user id */
    short  st_gid;    /* owner's group id */
    dev_t  st_rdev;   /* for special files */
    off_t  st_size;   /* file size in characters */
    time_t st_atime;  /* time last accessed */
    time_t st_mtime;  /* time last modified */
    time_t st_ctime;  /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
int main(int argc, char **argv)
{
    if (argc == 1)       /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print the size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself and its parent; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;           /* inode number */
    char  d_name[DIRSIZ];  /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>  /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file
Device ID
User ID of the file
Group ID of the file
The file mode information and access privileges for owner, group and others
File protection flags
The timestamps for file creation, modification etc.
Link counter to determine the number of hard links
Pointers to the blocks storing the file's contents
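The exercise lists no program of its own here, so the redirection idea can be sketched directly (the helper names redirect_stdout and restore_stdout are assumptions for illustration): file descriptor 1 is pointed at a file with dup2(), so anything printed with printf afterwards lands in the file, exactly as with ls > f1.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Point fd 1 at path; returns a saved copy of the old stdout, or -1 on error. */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int saved;
    if (fd < 0)
        return -1;
    saved = dup(1);   /* keep the original stdout */
    dup2(fd, 1);      /* fd 1 now refers to the file */
    close(fd);
    return saved;
}

/* Flush pending output into the file, then restore the original stdout. */
void restore_stdout(int saved)
{
    fflush(stdout);
    dup2(saved, 1);
    close(saved);
}
```

A shell does the same thing before exec()ing ls: open f1, dup2 it onto descriptor 1, then run the command.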


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent");
    else
        printf("child");
    return 0;
}

17 Write a C program to create a Zombie process. If a child terminates before its parent and the parent has not yet waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);
    } else {
        sleep(100);  /* child exits first and remains a zombie while the parent sleeps */
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);  /* parent exits first, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.
2) Use the write() function to write data into the pipe. The syntax is as follows:
write(filedesc[1], ip_string, size)
filedesc[1] - the write end of the pipe (if int filedesc[2] is the descriptor pair)
ip_string - the string to be written into the pipe
size - buffer size for storing the input
3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:
read(filedesc[0], buffer, size)

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {   /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                   /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}
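The program above passes file contents between a client and a server over two pipes; the example command from the exercise heading, ls -l | sort, can itself be sketched with pipe(), dup2() and execlp() (the helper name run_pipeline is an assumption for illustration, not the manual's program):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Run "ls -l | sort": returns 0 when both commands exit successfully. */
int run_pipeline(void)
{
    int fd[2], status1, status2;
    pid_t p1, p2;

    if (pipe(fd) < 0)
        return -1;
    if ((p1 = fork()) == 0) {   /* first command: ls -l */
        dup2(fd[1], 1);         /* stdout -> write end of the pipe */
        close(fd[0]); close(fd[1]);
        execlp("ls", "ls", "-l", (char *) NULL);
        _exit(127);             /* only reached if exec fails */
    }
    if ((p2 = fork()) == 0) {   /* second command: sort */
        dup2(fd[0], 0);         /* stdin <- read end of the pipe */
        close(fd[0]); close(fd[1]);
        execlp("sort", "sort", (char *) NULL);
        _exit(127);
    }
    close(fd[0]); close(fd[1]); /* parent must close both ends so sort sees EOF */
    waitpid(p1, &status1, 0);
    waitpid(p2, &status2, 0);
    return (WIFEXITED(status1) && WEXITSTATUS(status1) == 0 &&
            WIFEXITED(status2) && WEXITSTATUS(status2) == 0) ? 0 : -1;
}
```

Closing both pipe ends in the parent is essential: if the write end stays open anywhere, sort never sees end-of-file and the pipeline hangs.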

20 Write C programs that illustrate communication between two unrelated processes using a named pipe

AIM: Implementing IPC using a FIFO (or named pipe)

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through an mknod() function call.
2) Use the write() function to write data into the FIFO. The syntax is as follows:
write(filedesc[1], ip_string, size)
filedesc[1] - the write end of the FIFO
ip_string - the string to be written into the FIFO
size - buffer size for storing the input
3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:
read(filedesc[0], buffer, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {   /* parent: client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                   /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main(void)
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }
    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue, exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag);

Msgflag values:

0400  MSG_R     Read by owner
0200  MSG_W     Write by owner
0040  MSG_R>>3  Read by group
0020  MSG_W>>3  Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.

flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag can be MSG_NOERROR: without it, msgrcv returns an error if length is not large enough to receive the message; with it, if the data portion is greater than length, msgrcv truncates the message and returns.

3. A variety of control operations on a message queue can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

/* Server operation algorithm */
#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

/* Client process */
#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");
    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");
    exit(0);
}
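The calls described above can also be exercised in a single process with a private queue. This round-trip sketch (the helper name mq_roundtrip is an assumption for illustration, not the lab's client/server pair) sends one message and receives it back:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {
    long mtype;
    char mtext[32];
};

/* Send "hello" with type 1 on a private queue, receive it back into out.
 * Returns 0 on success, -1 on any failure. */
int mq_roundtrip(char *out)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    struct demo_msg m;
    int n;

    if (qid == -1)
        return -1;
    m.mtype = 1;
    strcpy(m.mtext, "hello");
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == -1)
        return -1;
    n = msgrcv(qid, &m, sizeof(m.mtext), 1, 0);
    msgctl(qid, IPC_RMID, (struct msqid_ds *) 0);  /* always remove the queue */
    if (n == -1)
        return -1;
    strcpy(out, m.mtext);
    return 0;
}
```

IPC_PRIVATE avoids key collisions with other students' queues; removing the queue with IPC_RMID matters because System V queues outlive the process otherwise.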

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("sem demo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;  /* lock available: cooperating processes decrement to acquire */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
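The two kill() calls can be demonstrated end to end (the helper name suspend_resume_demo is an assumption for illustration): fork a child, stop it with SIGSTOP, observe the stop with waitpid(WUNTRACED), then resume it with SIGCONT.

```c
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 0 if the child was observed stopped and was then resumed
 * and terminated; -1 on failure. */
int suspend_resume_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {   /* child: just wait for signals */
        pause();
        _exit(0);
    }
    kill(pid, SIGSTOP);                  /* suspend (like Ctrl+Z) */
    waitpid(pid, &status, WUNTRACED);    /* reports the stop */
    if (!WIFSTOPPED(status))
        return -1;
    kill(pid, SIGCONT);                  /* resume */
    kill(pid, SIGTERM);                  /* then terminate the child */
    waitpid(pid, &status, 0);
    return 0;
}
```

WUNTRACED is what lets the parent see the stopped state; without it, waitpid would block until the child actually terminates.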

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(void)
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:   /* child: consumer waits on the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:  /* parent: producer signals the semaphore */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }
    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }
    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);
        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));
        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }
    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }
    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }
    while ((n = read(sockfd, recvBuff, sizeof(recvBuff)-1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }
    if (n < 0)
        printf("\n Read error\n");
    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */

char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1)
    perror("shmop: shmdt failed");
else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory segment through the attached pointer.
6. Read the contents of the shared memory segment through the attached pointer.
7. End.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024    /* make the segment 1 KB */

int main(int argc, char *argv[])
{
    key_t key = ftok(".", 'R');   /* derive a key from the current directory */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* create (or locate) the segment */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    /* attach to the segment to get a pointer to it */
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the German credit data (an Excel spreadsheet version can be downloaded from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment (unless you really can consult a real loan officer).

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics of that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

  o Ck is generated by joining Lk−1 with itself.

· Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

Where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}

    k ← 2

    while L(k−1) ≠ ∅

        C(k) ← Generate(L(k−1))

        for transactions t ∈ T

            C(t) ← Subset(C(k), t)

            for candidates c ∈ C(t)

                count[c] ← count[c] + 1

        L(k) ← {c ∈ C(k) | count[c] ≥ ε}

        k ← k + 1

    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka via Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm via the "Choose" button in the Associator panel

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options: "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j <> i

• Maximum a posteriori hypothesis

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ P(xk|Ci), with the product taken over k = 1 … n

• P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab


7) Choose Classifier "Trees"

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select attributes

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES

0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed by working through different problem solutions during practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options: "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911      0.861   0.886      0.883     YES

0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options: "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options: "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options: "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options: "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision tree's accuracy over the validation set

  - Stop when further pruning decreases the decision tree's accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income >= 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options: "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options: "Use training set"

11) If needed, select the attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options: "Use training set"

11) If needed, select the attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab


8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options: "Use training set"

11) If needed, select the attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class - relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Algorithm:
1. Start
2. Open the existing file and one new file using the open() system call
3. Read the contents from the existing file using the read() system call
4. Write these contents into the new file using the write() system call
5. Repeat the above two steps until EOF
6. Close the two files using the close() system call
7. Delete the existing file using the unlink() system call
8. End

Program:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    int fd1, fd2, n;
    char buf[1024];

    fd1 = open(argv[1], O_RDONLY);
    fd2 = creat(argv[2], S_IWUSR | S_IRUSR);
    while ((n = read(fd1, buf, sizeof(buf))) > 0)
        write(fd2, buf, n);
    close(fd1);
    close(fd2);
    unlink(argv[1]);          /* delete the original file */
    printf("file is copied\n");
    return 0;
}

12 Write a program that takes one or more file/directory names as command line input and reports the following information on the file:

A) File type B) Number of links C) Time of last access D) Read, Write and Execute permissions

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *stream;

    stream = fopen("test", "r");
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit(1);
    }
    if (fclose(stream) == EOF) {
        fprintf(stderr, "Error closing stream (printed to standard error)\n");
        exit(1);
    }
    return 0;
}


Week 5

13 Write a C program to emulate the UNIX ls -l command

ALGORITHM

Step 1: Include necessary header files for manipulating the directory
Step 2: Declare and initialize required objects
Step 3: Read the directory name from the user
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available
Step 5: Read the entries available in the directory
Step 6: Display the directory entry, i.e. the name of the file or sub-directory
Step 7: Repeat steps 5 and 6 until all the entries are read

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

int main(void)
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
    return 0;
}

SAMPLE OUTPUT

enter directory name iii


f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14        /* longest filename component; system-dependent */

typedef struct {           /* portable directory entry */
    long ino;                  /* inode number */
    char name[NAME_MAX+1];     /* name + '\0' terminator */
} Dirent;

typedef struct {           /* minimal DIR: no buffering, etc. */
    int fd;                    /* file descriptor for the directory */
    Dirent d;                  /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t   st_dev;     /* device of inode */
    ino_t   st_ino;     /* inode number */
    short   st_mode;    /* mode bits */
    short   st_nlink;   /* number of links to file */
    short   st_uid;     /* owner's user id */
    short   st_gid;     /* owner's group id */
    dev_t   st_rdev;    /* for special files */
    off_t   st_size;    /* file size in characters */
    time_t  st_atime;   /* time last accessed */
    time_t  st_mtime;   /* time last modified */
    time_t  st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;              /* inode number */
    char d_name[DIRSIZ];      /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of file
- Device ID
- User ID of the file
- Group ID of the file
- The file mode information and access privileges for owner, group and others
- File protection flags
- The timestamps for file creation, modification etc.
- Link counter to determine the number of hard links
- Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
    return 0;
}

17 Write a C program to create a Zombie process. If a child terminates before its parent collects its exit status, the terminated child remains in the process table as a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);            /* child exits immediately */
    } else {
        sleep(100);         /* parent does not wait(): the child stays a zombie */
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]); It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call
2) Use the write() function to write the data into the pipe. The syntax is as follows:

   write(int fd, char *ip_string, int size);

   fd - file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
   ip_string - the string to be written into the pipe
   size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

   read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid > 0) {          /* parent acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode and access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call
2) Use the write() function to write the data into the FIFO. The syntax is as follows:

   write(int fd, char *ip_string, int size);

   fd - file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter
   ip_string - the string to be written into the FIFO
   size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1) {
        printf("cannot fork");
    } else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main(void)
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue, exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

  Num val   Symb value    Description
  0400      MSG_R         Read by owner
  0200      MSG_W         Write by owner
  0040      MSG_R >> 3    Read by group
  0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag has MSG_NOERROR; it returns an error if length is not large enough to receive the message; if the data portion is greater than the message length, it truncates and returns.

3. A variety of control operations on messages can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");
    client(readid, writeid);

    /* delete msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {               /* required on Linux */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                       /* initialise the semaphore to 1 */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a semaphore set, initially 0 */
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                              /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;          /* wait until the producer posts */
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                             /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;          /* signal the consumer */
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* size of sun_path; not defined by all systems */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* size of sun_path; not defined by all systems */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */

char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment identifier).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents from shared memory through the attached pointer.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 5678;   /* fixed key so other processes can find the segment
                           (the original left this uninitialized) */
    int shmid;
    char *data;

    if(argc > 2)
    {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
    {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if(data == (char *)(-1))
    {
        perror("shmat");
        exit(1);
    }
    if(argc == 2)
    {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strcpy(data, argv[1]);
    }
    else
        printf("segment contains: \"%s\"\n", data);
    if(shmdt(data) == -1)
    {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown below. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Transaction ID | milk | bread | butter | beer
1 | 1 | 1 | 0 | 0
2 | 0 | 0 | 1 | 0
3 | 0 | 0 | 0 | 1
4 | 1 | 1 | 1 | 0
5 | 0 | 1 | 0 | 0

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk, Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find the frequent itemset Lk−1

• Join Step:

o Ck is generated by joining Lk−1 with itself

• Prune Step:

o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

Where • (Ck: candidate itemset of size k)

• (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}

    k ← 2

    while L(k−1) ≠ ∅

        C(k) ← Generate(L(k−1))

        for transactions t ∈ T

            C(t) ← Subset(C(k), t)

            for candidates c ∈ C(t)

                count[c] ← count[c] + 1

        L(k) ← {c ∈ C(k) | count[c] ≥ ε}

        k ← k + 1

    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java; it can be selected by clicking the button Choose

7) and selecting tree J48

9) Select Test options "Use training set"


10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C)

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options "Use training set"

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure that the accuracy of the model holds.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds" to e.g. 10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select the "visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Click Open file and browse to the file "bank.csv" that is already stored in the system.

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select "J48".

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details appear in the Classifier output panel.


16) Select Test options "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select the class attribute.

19) Click Start.

20) The output details appear in the Classifier output panel.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This depends on the attribute set and on which relationships among the attributes we want to study; it is assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruning and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - removing the sub-tree rooted at the pruned node;

  - making the pruned node a leaf node;

  - assigning the pruned node the most common classification of the training instances attached to that node.

• Pruning nodes iteratively:

  - always select a node whose removal most increases the DT accuracy over the validation set;

  - stop when further pruning decreases the DT accuracy over the validation set.

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Click Open file and browse to the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "NBTree", i.e., the Naive Bayesian tree.


10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select the class attribute.

15) Click Start.

16) The output details appear in the Classifier output panel.

17) Right-click on the entry in the Result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Click Open file and browse to the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the entry in the Result list and select the "Visualize tree" option.

(or, from the command line)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Click Open file and browse to the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Click Open file and browse to the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

Attribute relevance with respect to the class: relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


if (stream == (FILE *) 0) {
    fprintf(stderr, "Error opening file (printed to standard error)\n");
    exit(1);
}
if (fclose(stream) == EOF) {
    fprintf(stderr, "Error closing stream (printed to standard error)\n");
    exit(1);
}
return 0;


Week 5

13 Write a C program to emulate the UNIX ls ndashl command

ALGORITHM

Step 1: Include the header files needed for manipulating directories.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call and report an error if the directory is not available.
Step 5: Read an entry from the directory.
Step 6: Display the directory entry, i.e., the name of the file or subdirectory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;

    printf("Enter directory name: ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii


f2

14. Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {           /* portable directory entry */
    long ino;                  /* inode number */
    char name[NAME_MAX+1];     /* name + '\0' terminator */
} Dirent;

typedef struct {           /* minimal DIR: no buffering, etc. */
    int fd;                    /* file descriptor for the directory */
    Dirent d;                  /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;


struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h> and typically looks like this:

struct stat {            /* inode information returned by stat */
    dev_t st_dev;            /* device of inode */
    ino_t st_ino;            /* inode number */
    short st_mode;           /* mode bits */
    short st_nlink;          /* number of links to file */
    short st_uid;            /* owner's user id */
    short st_gid;            /* owner's group id */
    dev_t st_rdev;           /* for special files */
    off_t st_size;           /* file size in characters */
    time_t st_atime;         /* time last accessed */
    time_t st_mtime;         /* time last modified */
    time_t st_ctime;         /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */

#define S_IFDIR 0040000  /* directory */


#define S_IFCHR 0020000  /* character special */

#define S_IFBLK 0060000  /* block special */

#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)       /* default: current directory */
        fsize(".");
    else


        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide whether the file is a directory. Parenthesization matters because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {              /* directory entry */
    ino_t d_ino;                 /* inode number */
    char d_name[DIRSIZ];         /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;    /* local directory structure */
    static Dirent d;         /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
               == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of the file

- Device ID

- User ID of the file

- Group ID of the file

- The file mode information and access privileges for owner, group and others

- File protection flags

- The timestamps for file creation, modification, etc.

- Link counter to determine the number of hard links

- Pointers to the blocks storing the file's contents


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent process");
    else
        printf("child process");
}

17. Write a C program to create a zombie process. If the child terminates before the parent process calls wait(), the terminated child whose exit status has not yet been collected is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {      /* child terminates immediately */
        printf("child process");
        exit(0);
    } else {                       /* parent keeps running without wait(), */
        sleep(100);                /* so the dead child stays a zombie meanwhile */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* the parent exits meanwhile, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }

    printf("After fork()");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd[], ip_string, size);

int fd[] – the file-descriptor array; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd[], char buf[], size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or named pipe)

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name; here the name is that of a file that multiple processes can open(), read, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a

FIFO

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd[], ip_string, size);

int fd[] – the file-descriptor array; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the FIFO.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd[], char buf[], size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent: acts as client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child: acts as server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21. Write a C program to create a message queue with read and write permissions and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue.

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag); Msg flag values:

Num val   Symb value    Desc
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd returns -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: with it, if the data portion is greater than length, the message is truncated and returned; without it, a message longer than length produces an error.

3. A variety of control operations on a message queue can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores, b) flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, the program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;    /* the lock is initially available */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the program reports an error from ftok, semget or semctl if any call fails; otherwise the semaphore is created silently.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id, i;
    pid_t child_pid;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                              /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;          /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                             /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;           /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26. Write client and server programs (using C) for interaction between server and client processes using Unix domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* size of sun_path; not exposed by <sys/un.h> on all systems */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 45

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

still inside server process close(connection_fd)

close(socket_fd) unlink(demo_socket) return 0

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108    /* size of sun_path on Linux */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before modifying the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment: shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment: shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of current attached segments */
int nap;                   /* number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached pointer.
6. Read the contents back from the shared memory.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = ftok(".", 'R');   /* key for the shared segment */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible: interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (the original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D by the following steps.


· Find frequent set L(k−1)

· Join step: C(k) is generated by joining L(k−1) with itself

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where
· C(k): candidate itemset of size k
· L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L(1) ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka via Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in its Java implementation, and it can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options: "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C, and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C) = p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ...

= p(C) ∏ p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ... P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select attributes

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances       554               92.3333 %

Incorrectly Classified Instances      46                7.6667 %

Kappa statistic                        0.845

Mean absolute error                    0.1389

Root mean squared error                0.2636

Relative absolute error               27.9979 %

Root relative squared error           52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894     0.052     0.935     0.894     0.914     0.936     YES

0.948     0.106     0.914     0.948     0.931     0.936     NO

Weighted Avg.    0.923     0.081     0.924     0.923     0.923     0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step remains: validating our classification tree, which means running our test set through the model and confirming the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options: "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances     539     89.8333 %

Incorrectly Classified Instances    61     10.1667 %

Kappa statistic                0.7942

Mean absolute error            0.167

Root mean squared error        0.305

Relative absolute error       33.6511 %

Root relative squared error   61.2344 %

Total Number of Instances    600

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

               0.861    0.071    0.911      0.861   0.886      0.883     YES

               0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.  0.898    0.108    0.899      0.898   0.898      0.883

=== Confusion Matrix ===

  a   b   <-- classified as

236  38 |  a = YES

 23 303 |  b = NO

EXPERIMENT-7

Aim: Delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to Open file and browse the newly saved file (with the attribute deleted).

14) Go to the Classify tab.

15) Choose Classifier "Trees".

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select the class attribute.

19) Now click Start.

20) The output details can be seen in the Classifier output panel.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy has increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output

EXPERIMENT-8

Aim: Select some attributes from the GUI Explorer, perform classification and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Trees".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can be seen in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy has increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output

EXPERIMENT-9

Aim: To create a decision tree by cross-validation of the training data set, after changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details can be seen in the Classifier output panel.

16) Select Test options "Cross-validation".

17) Set "Folds" (e.g., 10).

18) If needed, select the class attribute.

19) Now click Start.

20) The output details can be seen in the Classifier output panel.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output

EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) Λ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "NBTree", i.e., the Naive Bayesian tree.

10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the "unpruned" mode from "False" to "True".

13) Change the reduced-error pruning option as needed.

14) If needed, select the class attribute.

15) Now click Start.

16) The output details can be seen in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output

EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can be seen in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff

Procedure for "OneR":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can be seen in the Classifier output panel.

Procedure for "PART":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file "bank.csv" that is already stored in the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details can be seen in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff

OneR

PART

Week 5

13 Write a C program to emulate the UNIX ls -l command.

ALGORITHM

Step 1: Include the necessary header files for manipulating directories.
Step 2: Declare and initialize the required objects.
Step 3: Read the directory name from the user.
Step 4: Open the directory using the opendir() system call, and report an error if the directory is not available.
Step 5: Read an entry available in the directory.
Step 6: Display the directory entry, i.e., the name of the file or sub-directory.
Step 7: Repeat steps 5 and 6 until all the entries have been read.

/* 1. Simulation of ls command */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>     /* needed for DIR and struct dirent */
main()
{
    char dirname[10];
    DIR *p;
    struct dirent *d;
    printf("Enter directory name ");
    scanf("%s", dirname);
    p = opendir(dirname);
    if (p == NULL) {
        perror("Cannot find dir");
        exit(-1);
    }
    while ((d = readdir(p)))
        printf("%s\n", d->d_name);
}

SAMPLE OUTPUT

enter directory name iii

f2

14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14  /* longest filename component; system-dependent */

typedef struct {            /* portable directory entry */
    long ino;               /* inode number */
    char name[NAME_MAX+1];  /* name + '\0' terminator */
} Dirent;

typedef struct {            /* minimal DIR: no buffering, etc. */
    int fd;                 /* file descriptor for the directory */
    Dirent d;               /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is,

char *name;

struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {          /* inode information returned by stat */
    dev_t  st_dev;     /* device of inode */
    ino_t  st_ino;     /* inode number */
    short  st_mode;    /* mode bits */
    short  st_nlink;   /* number of links to file */
    short  st_uid;     /* owner's user id */
    short  st_gid;     /* owner's group id */
    dev_t  st_rdev;    /* for special files */
    off_t  st_size;    /* file size in characters */
    time_t st_atime;   /* time last accessed */
    time_t st_mtime;   /* time last modified */
    time_t st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */

#define S_IFDIR 0040000  /* directory */

#define S_IFCHR 0020000  /* character special */

#define S_IFBLK 0060000  /* block special */

#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else

        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize:  print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory It opens the directory loops through the files in it calling the function on each then closes the

directory and returns Since fsize calls dirwalk on each directory the two functions call each other recursively

#define MAX_PATH 1024

/* dirwalk:  apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
         || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }

    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {             /* directory entry */
    ino_t d_ino;            /* inode number */
    char d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory verifies that the file is a directory (this time by the system call fstat which is like stat except that it applies to a file descriptor) allocates a directory structure and records the information

int fstat(int fd, struct stat *);

/* opendir:  open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
     || fstat(fd, &stbuf) == -1
     || (stbuf.st_mode & S_IFMT) != S_IFDIR
     || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir:  close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}

Finally readdir uses read to read each directory entry If a directory slot is not currently in use (because a file has been removed) the inode number is zero and this position is skipped Otherwise the inode number and name are placed in a static structure and a pointer to that is returned to the user Each call overwrites the information from the previous one

#include <sys/dir.h>   /* local directory structure */

/* readdir:  read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
                  == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description

An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents

Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17 Write a C program to create a zombie process. If the child terminates before the parent process, and the parent has not yet collected the child's exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);    /* child exits first */
    }

    else {
        sleep(100);  /* parent continues without wait(), leaving the child a zombie */
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>
main()
{
    int id;
    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* parent exits meanwhile, making this child an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}

Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size);

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char *, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}
client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)

            printf("data write error");
    if (n < 0)
        printf("data error");
}
server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file access mode. The dev value is ignored for a FIFO.

Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions – fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a

FIFO

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the FIFO.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char *, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)

        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}
client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255
struct mesg {
    long type;
    char mtext[MAX];
} *mesg;
char buff[MAX];
main()
{
    int mid, fd, n, count = 0;
    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }

    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process may write a message to a queue, exit, and another process may read it at a later time.

ALGORITHM

Before defining the msqid_ds structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R>>3     Read by group
0020      MSG_W>>3     Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.

flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag can be MSG_NOERROR: if the data portion of the message is greater than length, the message is truncated and returned instead of an error.

3. A variety of control operations on the message queue can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>

#include <sys/ipc.h>
#include <sys/sem.h>

/* on many systems, union semun must be defined by the program */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;
    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a semaphore set with one semaphore */
    sem_set_id = semget(IPC_PRIVATE, 1, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                       /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;   /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                      /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;    /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created


semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */
...
key = ...
size = ...
shmflg = ...

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                     /* command code for shmctl() */
int shmid;                   /* segment ID */
int rtrn;                    /* return value from shmctl() */
struct shmid_ds shmid_ds;    /* shared memory data structure to hold results */
...
shmid = ...
cmd = ...

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of current attached segments */
int nap;                   /* number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...
p->shmaddr = ...
p->shmflg = ...
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call
5. Write to the shared memory through the attached address
6. Read the contents from the shared memory through the attached address
7. End

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* derive the key from an existing file so both runs agree on it */
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. These steps iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set Lk−1

· Join step: Ck is generated by joining Lk−1 with itself

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

(where Ck is the candidate itemset of size k and Lk is the frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka using Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm using the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) Select trees → J48

8) Select Test options "Use training set"

9) If needed, select an attribute

10) Click Start

11) Now we can see the output details in the Classifier output

12) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e. testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)
= p(C) p(F1 | C) p(F2, …, Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
= …
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) … p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) … = p(C) ∏i p(Fi | C).

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C),

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence"

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier ldquoTreerdquo

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) Click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions encountered while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) if needed, select the attribute

12) now click Start

13) now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) if needed, select the attribute

19) now click Start

20) now we can see the output details in the Classifier output

21) right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24) check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) if needed, select the attribute

12) now click Start

13) now we can see the output details in the Classifier output

14) right click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

17) check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output

16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) if needed, select the attribute

19) now click Start

20) now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, by checking the bias on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and the relationship among the attributes we want to study. This can be viewed based on the database and the user requirement.

EXPERIMENT-11

Aim: To create a decision tree using Prune mode and Reduced Error Pruning, and to show the accuracy for the cross-validation trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000) THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options "Use training set"

11) right click on the text box beside the Choose button and select Show properties

12) now change the unpruned mode from "false" to "true"

13) change the reduced-error pruning as needed

14) if needed, select the attribute

15) now click Start

16) now we can see the output details in the Classifier output

17) right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) if needed, select the attribute

12) now click Start

13) now we can see the output details in the Classifier output

14) right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) if needed, select the attribute

12) now click Start

13) now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) if needed, select the attribute

12) now click Start

13) now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



14 Write a C program to list, for every file in a directory, its inode number and file name. The Dirent structure contains the inode number and the name. The maximum length of a filename component is NAME_MAX, which is a system-dependent value. opendir returns a pointer to a structure called DIR, analogous to FILE, which is used by readdir and closedir. This information is collected into a file called dirent.h.

#define NAME_MAX 14   /* longest filename component; system-dependent */

typedef struct {      /* portable directory entry */
    long ino;                /* inode number */
    char name[NAME_MAX+1];   /* name + '\0' terminator */
} Dirent;

typedef struct {      /* minimal DIR: no buffering, etc. */
    int fd;       /* file descriptor for the directory */
    Dirent d;     /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a filename and returns all of the information in the inode for that file, or -1 if there is an error. That is, given

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t  st_dev;     /* device of inode */
    ino_t  st_ino;     /* inode number */
    short  st_mode;    /* mode bits */
    short  st_nlink;   /* number of links to file */
    short  st_uid;     /* owner's user id */
    short  st_gid;     /* owner's group id */
    dev_t  st_rdev;    /* for special files */
    off_t  st_size;    /* file size in characters */
    time_t st_atime;   /* time last accessed */
    time_t st_mtime;   /* time last modified */
    time_t st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too.

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/types.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>       /* flags for read and write */
#include <sys/types.h>   /* typedefs */
#include <sys/stat.h>    /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)   /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;            /* inode number */
    char  d_name[DIRSIZ];   /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space.

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <string.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent process");
    else
        printf("child process");
}

17 Write a C program to create a zombie process. If the child terminates before the parent process, and the parent has not yet waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {   /* child terminates immediately */
        printf("child process");
        exit(0);
    } else {                    /* parent sleeps on; the dead child stays a zombie */
        sleep(100);
        printf("parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd[], ip_string, size);

int fd[] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd[], char *, size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {    /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                    /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd[], ip_string, size);

int fd[] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the FIFO.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd[], char *, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}


server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions, to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }
    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Desc
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag MSG_NOERROR makes it return an error if length is not large enough to receive the msg; if the data portion is greater than the msg length, it truncates and returns.

3. A variety of control operations on a msg can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;    /* initialise the semaphore to 1 (resource unlocked) */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: on success the program creates and initialises the semaphore set silently; on failure it prints the failing call via perror, e.g. "semget: ..." or "semctl: ...".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a set of two semaphores; only semaphore 0 is used, initial value 0 */
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                        /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;    /* wait until the producer posts */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                       /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;     /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(struct sockaddr_un);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#define UNIX_PATH_MAX 108

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;
    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():


#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of currently attached segments */

int nap;                   /* number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;

p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached pointer.
6. Read the contents back from the shared memory through the same pointer.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* both writer and reader derive the same key from this file */
    key = ftok("shmdemo.c", 'R');
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else
        printf("segment contains: \"%s\"\n", data);

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = Φ. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X): the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets under the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk-1.

· Join Step: Ck is generated by joining Lk-1 with itself.

· Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where
· Ck: candidate itemset of size k
· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k-1) ≠ Φ
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

wekaassociationsApriori -N 10 -T 0 -C 09 -D 005 -U 10 -M 01 -S -10 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select trees > J48

9) Select Test options: "Use training set"


10) If needed, select the attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)
= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
= p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k - 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ... P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "Trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options: "Use training set"

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?" by evaluating the model on a supplied test set

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model

One final step in validating our classification tree is to run our test set through the model and ensure that the accuracy of the model holds

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it

EXPERIMENT-6

Aim: To create a decision tree by cross-validating the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select "J48"

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: Delete one attribute in the GUI Explorer and see the effect using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute had any significant effect

Sample output


EXPERIMENT-8

Aim: Select some attributes in the GUI Explorer, perform classification, and see the effect using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set, after changing the cost matrix, in the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select "J48"

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes we want to study, and can be decided based on the database and the user's requirements

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for a cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "False" to "True"

13) Change the reduced-error pruning option as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error = 0, Coverage = 7 instances)

IF accounting=0 THEN class=B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

struct stat stbuf;
int stat(char *, struct stat *);

stat(name, &stbuf);

fills the structure stbuf with the inode information for the file name. The structure describing the value returned by stat is in <sys/stat.h>, and typically looks like this:

struct stat {   /* inode information returned by stat */
    dev_t  st_dev;     /* device of inode */
    ino_t  st_ino;     /* inode number */
    short  st_mode;    /* mode bits */
    short  st_nlink;   /* number of links to file */
    short  st_uid;     /* owner's user id */
    short  st_gid;     /* owner's group id */
    dev_t  st_rdev;    /* for special files */
    off_t  st_size;    /* file size in characters */
    time_t st_atime;   /* time last accessed */
    time_t st_mtime;   /* time last modified */
    time_t st_ctime;   /* time originally created */
};

Most of these values are explained by the comment fields. The types like dev_t and ino_t are defined in <sys/types.h>, which must be included too

The st_mode entry contains a set of flags describing the file. The flag definitions are also included in <sys/stat.h>; we need only the part that deals with file type:

#define S_IFMT  0160000  /* type of file */
#define S_IFDIR 0040000  /* directory */
#define S_IFCHR 0020000  /* character special */
#define S_IFBLK 0060000  /* block special */
#define S_IFREG 0100000  /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive

The main routine deals with command-line arguments it hands each argument to the function fsize

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;  /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {   /* directory entry */
    ino_t d_ino;             /* inode number */
    char d_name[DIRSIZ];     /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;  /* local directory structure */
    static Dirent d;       /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
              == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)  /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';  /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display ldquoparentrdquo and the child to display ldquochildrdquo on the screen

#include <stdio.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
}

17 Write a C program to create a zombie process. If a child terminates before its parent has waited for it, the terminated child is called a zombie process

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {  /* child terminates immediately */
        printf("Child process");
        exit(0);
    } else {                   /* parent sleeps; meanwhile the child is a zombie */
        sleep(100);
        printf("Parent process");
    }
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);  /* parent exits first, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()");
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call. 2) Use the write() function to write data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size);

filedesc[1] – the write end of the pipe (if int filedesc[2] is the descriptor variable, use filedesc[1] as the first parameter)

ip_string – the string to be written into the pipe

size – buffer size for storing the input. 3) Use the read() function to read the data that has been written to the pipe.

The syntax is as follows: read(filedesc[0], buffer, size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)     /* send file name to server */
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)       /* copy file contents to stdout */
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);         /* receive file name */
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read from, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode and access mode. The dev value is ignored for a FIFO


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call. 2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(fd, ip_string, size);

fd – the file descriptor returned when the FIFO was opened for writing

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(fd, buffer, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {        /* parent: client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                        /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);

    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create new message queue or access existing message queue ldquomsgget()rdquo function is used Syntaxint msgget(key_t key int msgflag) Msg flag values

Numeric   Symbolic       Description
0400      MSG_R          Read by owner
0200      MSG_W          Write by owner
0040      MSG_R >> 3     Read by group
0020      MSG_W >> 3     Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id (a unique id); msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;       /* message type, > 0 */
    char mtext[1];    /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue. When this flag is specified, msgsnd() returns -1 if there is no room on the queue; otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag can have MSG_NOERROR: without it, msgrcv() returns an error if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

Passing IPC_RMID as cmd removes a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

include ldquomsgqhrdquomain() int readid writeid open queues which server has already created it If ( (wirteid =msgget(MKEY10))lt0)

err_sys(ldquoclient cant access msgget message queue 1rdquo)if((readid=msgget(MKEY20))lt0)

err_sys(ldquoclient cant msgget messages queue 2rdquo)

client(readidwriteid)

delete msg queuu

If (msgctl(readid IPC_RMID( struct msqid_ds )0)lt0) err_sys(ldquoClient cant RMID message queue1rdquo) if(msgctl(writeid IPC_RMID (struct msqid_ds ) 0) lt0)

err_sys(ldquoClient cant RMID message queue 2rdquo)

exit(0)

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, union semun must be defined by the program */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;    /* initialize the semaphore to 1: resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: no output on success; on failure, perror() prints "semget" or "semctl".

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:    /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;     /* wait until the parent posts */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:   /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;      /* post, releasing the child */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Serverc

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Serverc

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)


            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {        /* internal record of attached segments */
    int shmid;               /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int shmflg;              /* flags used on attach */
} ap[MAXnap];                /* state of currently attached segments */
int nap;                     /* number of currently attached segments */

char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to the shared memory segment through the attached pointer
6. Read the contents from the shared memory segment through the attached pointer
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    /* derive a key from an existing file */
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.
· Join Step: Ck is generated by joining Lk−1 with itself.
· Prune Step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where:
· Ck: candidate itemset of size k
· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented as J48 in Weka and can be selected by clicking the Choose button

7) Select trees > J48

9) Select Test options: "Use training set"

10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output panel

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)
= p(C) p(F1, …, Fn | C)
= p(C) p(F1 | C) p(F2, …, Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that

p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(hD)= P(Dh) P(h) P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of training tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, ..., xn)

• Let there be 'm' classes C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• That is, P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · ... · P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab


7) Choose the "trees" classifier group

8) Select "NBTree", i.e. the naive Bayesian tree

9) Under Test options, select "Use training set"

10) If needed, select the class attribute

11) Click Start

12) The output details now appear in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?" by evaluating the model on a supplied test set

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) Click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

The exact output depends on the test set supplied; a representative run is discussed below.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.33 percent) and the Incorrectly Classified Instances (7.67 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: for the YES class, there are 17 false positives and 29 false negatives in this matrix.

Based on our accuracy rate of 92.33 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we can judge the accuracy of the model; comparable figures indicate that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x comprises the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"


6) Go to Classify tab

7) Choose the "trees" classifier group

8) Select "J48"

9) Under Test options, select "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Click Start

13) The output details now appear in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose the "trees" classifier group

16) Select the J48 tree

17) Under Test options, select "Use training set"

18) If needed, select the class attribute

19) Click Start

20) The output details now appear in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select from the attributes list the attributes that are to be removed. After this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose the "trees" classifier group

9) Select "J48"

10) Under Test options, select "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details now appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose the "trees" classifier group

8) Select "J48"

9) Under Test options, select "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) The output details now appear in the Classifier output panel


16) Under Test options, select "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Click Start

20) The output details now appear in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, by checking the bias on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes that we want to study; it can be decided based on the database and the user requirement.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000) THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose the "trees" classifier group

9) Select "NBTree", i.e. the naive Bayesian tree

10) Under Test options, select "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Click Start

16) The output details now appear in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose the "trees" classifier group

9) Select "J48"

10) Under Test options, select "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details now appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose the "rules" classifier group

9) Select "OneR"

10) Under Test options, select "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details now appear in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose the "rules" classifier group

9) Select "PART"

10) Under Test options, select "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details now appear in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


#define S_IFCHR 0020000 /* character special */
#define S_IFBLK 0060000 /* block special */
#define S_IFREG 0100000 /* regular */

Now we are ready to write the program fsize. If the mode obtained from stat indicates that a file is not a directory, then the size is at hand and can be printed directly. If the name is a directory, however, then we have to process that directory one file at a time; it may in turn contain sub-directories, so the process is recursive.

The main routine deals with command-line arguments; it hands each argument to the function fsize.

#include <stdio.h>
#include <string.h>
#include "syscalls.h"
#include <fcntl.h>      /* flags for read and write */
#include <sys/types.h>  /* typedefs */
#include <sys/stat.h>   /* structure returned by stat */
#include "dirent.h"

void fsize(char *);

/* print file sizes */
main(int argc, char **argv)
{
    if (argc == 1)  /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size and name of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);
    printf("%8ld %s\n", stbuf.st_size, name);
}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the


directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {              /* directory entry */
    ino_t d_ino;             /* inode number */
    char  d_name[DIRSIZ];    /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system types" is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
            == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)       /* parent branch */
        printf("parent\n");
    else                         /* child branch */
        printf("child\n");
    return 0;
}

17 Write a C program to create a zombie process. If the child terminates before the parent process, and the parent has not yet waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {    /* child terminates immediately */
        printf("child process");
        exit(0);
    } else {                     /* parent sleeps without calling wait(),
                                    so the dead child stays a zombie */
        sleep(100);
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, orphaning this child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex - ls ndashl | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function:

int pipe(int filedesc[2]);

It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size);

int [] – file-descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int, int);
void server(int, int);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent: acts as client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                          /* child: acts as server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using a named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading or writing, using either the open system call or one of the standard I/O open functions: fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] – file-descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int, int);
void server(int, int);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {     /* parent: acts as client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                     /* child: acts as server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg.type = 6;
    fd = open("fact", O_RDONLY);
    while ((n = read(fd, buff, 25)) > 0) {
        buff[n] = '\0';
        strcpy(mesg.mtext, buff);
        if (msgsnd(mid, &mesg, strlen(mesg.mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, &mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg.mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message-queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before using the message-queue structures, the ipc_perm structure should be defined; this is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

msgflag values:

Numeric   Symbolic      Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd() returns -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, msgrcv() returns an error if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID is given in cmd to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server program:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>


#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: the semaphore set is created and initialized; on failure the program prints a semget or semctl error message.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                             /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                            /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created,


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {        /* Internal record of attached segments. */
    int shmid;               /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int shmflg;              /* flags used on attach */
} ap[MAXnap];                /* State of current attached segments. */
int nap;                     /* Number of currently attached segments. */
...
char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment.
6. Read the contents from the shared memory segment.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such (original) dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


- Find frequent set Lk−1.
- Join Step: Ck is generated by joining Lk−1 with itself.
- Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where:
- Ck: candidate itemset of size k
- Lk: frequent itemset of size k

Apriori Pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on a data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the button Choose

7) and then selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select an attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools/Apparatus: Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)
                = p(C) p(F1 | C) p(F2, …, Fn | C, F1)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
                = p(C) p(F1 | C) p(F2 | C, F1) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as:

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏(k = 1..n) P(xk|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select "NBTree", i.e. the naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To examine "Is testing a good idea?" by evaluating the model on a supplied test set.

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through the different problem solutions encountered while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the save button in the top panel.

13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select from the attributes list the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Select Test options "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training a data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among the attributes we want to study. It can be judged from the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  • Removing the sub-tree rooted at the pruned node

  • Making the pruned node a leaf node

  • Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  • Always select a node whose removal most increases the decision tree accuracy over the validation set

  • Stop when further pruning decreases the decision tree accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Tree".

9) Select "NBTree", i.e. the naive Bayesian tree.

10) Select Test options "Use training set".

11) Right click on the text box beside the Choose button and select Show properties.

12) Now change the "unpruned" mode from "False" to "True".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output.

17) Right click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training a data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


    while (--argc > 0)

        fsize(*++argv);

    return 0;

The function fsize prints the size of the file. If the file is a directory, however, fsize first calls dirwalk to handle all the files in it. Note how the flag names S_IFMT and S_IFDIR are used to decide if the file is a directory. Parenthesization matters, because the precedence of & is lower than that of ==.

int stat(char *, struct stat *);

void dirwalk(char *, void (*fcn)(char *));

/* fsize: print size of file "name" */

void fsize(char *name)

{

    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {

        fprintf(stderr, "fsize: can't access %s\n", name);

        return;

    }

    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)

        dirwalk(name, fsize);

    printf("%8ld %s\n", stbuf.st_size, name);

}

The function dirwalk is a general routine that applies a function to each file in a directory. It opens the directory, loops through the files in it, calling the function on each, then closes the


directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */

void dirwalk(char *dir, void (*fcn)(char *))

{

    char name[MAX_PATH];

    Dirent *dp;

    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {

        fprintf(stderr, "dirwalk: can't open %s\n", dir);

        return;

    }

    while ((dp = readdir(dfd)) != NULL) {

        if (strcmp(dp->name, ".") == 0

            || strcmp(dp->name, "..") == 0)

            continue;    /* skip self and parent */

        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))

            fprintf(stderr, "dirwalk: name %s/%s too long\n",

                dir, dp->name);

        else {

            sprintf(name, "%s/%s", dir, dp->name);

            (*fcn)(name);

        }

    }


    closedir(dfd);

}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ

#define DIRSIZ 14

#endif

struct direct {    /* directory entry */

    ino_t d_ino;             /* inode number */

    char d_name[DIRSIZ];     /* long name does not have '\0' */

};

Some versions of the system permit much longer names and have a more complicated directory structure.

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */

DIR *opendir(char *dirname)

{

    int fd;

    struct stat stbuf;

    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1

        || fstat(fd, &stbuf) == -1

        || (stbuf.st_mode & S_IFMT) != S_IFDIR

        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)

        return NULL;

    dp->fd = fd;

    return dp;

}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */

void closedir(DIR *dp)

{

    if (dp) {

        close(dp->fd);

        free(dp);

    }

}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero, and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>    /* local directory structure */

/* readdir: read directory entries in sequence */

Dirent *readdir(DIR *dp)

{

    struct direct dirbuf;    /* local directory structure */

    static Dirent d;         /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))

            == sizeof(dirbuf)) {

        if (dirbuf.d_ino == 0)    /* slot not in use */

            continue;

        d.ino = dirbuf.d_ino;

        strncpy(d.name, dirbuf.d_name, DIRSIZ);

        d.name[DIRSIZ] = '\0';    /* ensure termination */

        return &d;

    }

    return NULL;

}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16. Write a C program to create a child process, and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>

#include <string.h>

main()

{

    int childpid;

    if ((childpid = fork()) < 0)

        printf("cannot fork");

    else if (childpid > 0)

        printf("parent");

    else

        printf("child");

}

17. Write a C program to create a Zombie process. If the child terminates before the parent process has called wait() for it, the terminated child is called a zombie process.

#include <stdio.h>

#include <string.h>

main()

{

    int childpid;

    if ((childpid = fork()) < 0)

        printf("cannot fork");

    else if (childpid == 0) {     /* child terminates immediately */

        printf("child process");

        exit(0);

    } else {                      /* parent lives on without wait() */

        sleep(100);

        printf("parent process");

    }

}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>

main()

{

    int id;

    printf("Before fork()\n");

    id = fork();

    if (id == 0) {

        printf("Child has started: %d\n", getpid());

        printf("Parent of this child: %d\n", getppid());

        printf("child prints 1 item\n");

        sleep(25);

        printf("child prints 2 item\n");

    } else {

        printf("Parent has started: %d\n", getpid());

        printf("Parent of the parent proc: %d\n", getppid());

    }

    printf("After fork()");

}


Week 7

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size);

int [] - file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string - the string to be written into the pipe

size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char, size);

PROGRAM

#include <stdio.h>

#include <string.h>

main()

{

    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)

        printf("pipe creation error");

    if ((childpid = fork()) < 0)

        printf("cannot fork");

    else if (childpid > 0) {            /* parent: acts as client */

        close(pipe1[0]);

        close(pipe2[1]);

        client(pipe2[0], pipe1[1]);

        while (wait((int *) 0) != childpid)

            ;

        close(pipe1[1]);

        close(pipe2[0]);

        exit(0);

    } else {                            /* child: acts as server */

        close(pipe1[1]);

        close(pipe2[0]);

        server(pipe1[0], pipe2[1]);

        close(pipe1[0]);

        close(pipe2[1]);

        exit(0);

    }

}

client(int readfd, int writefd)

{

    int n;

    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)

        printf("file name read error");

    n = strlen(buff);

    if (buff[n-1] == '\n')

        n--;

    if (write(writefd, buff, n) != n)

        printf("file name write error");

    while ((n = read(readfd, buff, 1024)) > 0)

        if (write(1, buff, n) != n)

            printf("data write error");

    if (n < 0)

        printf("data error");

}

server(int readfd, int writefd)

{

    char buff[1024];

    int n, fd;

    n = read(readfd, buff, 1024);

    buff[n] = 0;

    if ((fd = open(buff, 0)) < 0) {

        sprintf(buff, "file does not exist");

        write(writefd, buff, 1024);

    } else

        while ((n = read(fd, buff, 1024)) > 0)

            write(writefd, buff, n);

}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] - file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string - the string to be written into the FIFO

size - buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"

#define FIFO2 "Fifo2"

#include <stdio.h>

#include <string.h>

#include <sys/types.h>

#include <fcntl.h>

#include <sys/stat.h>

main()

{

    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);

    mknod(FIFO2, 0666 | S_IFIFO, 0);

    if ((childpid = fork()) == -1)

        printf("cannot fork");

    else if (childpid > 0) {

        wfd = open(FIFO1, 1);

        rfd = open(FIFO2, 0);

        client(rfd, wfd);

        while (wait((int *) 0) != childpid)

            ;

        close(rfd);

        close(wfd);

        unlink(FIFO1);

        unlink(FIFO2);

    } else {

        rfd = open(FIFO1, 0);

        wfd = open(FIFO2, 1);

        server(rfd, wfd);

        close(rfd);

        close(wfd);

    }

}

client(int readfd, int writefd)

{

    int n;

    char buff[1024];

    printf("enter a file name: ");

    if (fgets(buff, 1024, stdin) == NULL)

        printf("file name read error");

    n = strlen(buff);

    if (buff[n-1] == '\n')

        n--;

    if (write(writefd, buff, n) != n)

        printf("file name write error");

    while ((n = read(readfd, buff, 1024)) > 0)

        if (write(1, buff, n) != n)

            printf("data write error");

    if (n < 0)

        printf("data error");

}

server(int readfd, int writefd)

{

    char buff[1024];

    int n, fd;

    n = read(readfd, buff, 1024);

    buff[n] = 0;

    if ((fd = open(buff, 0)) < 0) {

        sprintf(buff, "file does not exist");

        write(writefd, buff, 1024);

    } else

        while ((n = read(fd, buff, 1024)) > 0)

            write(writefd, buff, n);

}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>

#include <sys/ipc.h>

#include <sys/msg.h>

#include <fcntl.h>

#define MAX 255

struct mesg {

    long type;

    char mtext[MAX];

} *mesg;

char buff[MAX];

main()

{

    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {

        printf("\n Can't create Message Q");

        exit(1);

    }

    printf("\n Queue id: %d", mid);

    mesg = (struct mesg *) malloc(sizeof(struct mesg));

    mesg->type = 6;

    fd = open("fact", O_RDONLY);

    while (read(fd, buff, 25) > 0) {

        strcpy(mesg->mtext, buff);

        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)

            printf("\n Message Write Error");

    }

    if ((mid = msgget(1006, 0)) < 0) {

        printf("\n Can't access Message Q");

        exit(1);

    }

    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {

        write(1, mesg->mtext, n);

        count++;

    }

    if ((n == -1) && (count == 0))

        printf("\n No Message on Queue %d", mid);

}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>

#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {

    struct ipc_perm msg_perm;   /* operation permission */

    struct msg *msg_first;      /* ptr to first msg on queue */

    struct msg *msg_last;       /* ptr to last msg on queue */

    ushort msg_cbytes;          /* current bytes on queue */

    ushort msg_qnum;            /* current no. of msgs on queue */

    ushort msg_qbytes;          /* max no. of bytes on queue */

    ushort msg_lspid;           /* pid of last msg send */

    ushort msg_lrpid;           /* pid of last msg recvd */

    time_t msg_stime;           /* time of last msg snd */

    time_t msg_rtime;           /* time of last msg rcv */

    time_t msg_ctime;           /* time of last msg ctl */

};

To create a new message queue or access an existing message queue, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag). Msg flag values:

Num val   Symb value    Desc

0400      MSG_R         Read by owner

0200      MSG_W         Write by owner

0040      MSG_R >> 3    Read by group

0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {

    long mtype;      /* message type, > 0 */

    char mtext[1];   /* data */

};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd() returns -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored, and length is the size to be received and stored in the pointer area. If flag has MSG_NOERROR and the data portion is greater than length, the message is truncated and returned; without it, msgrcv() returns an error when length is not large enough to receive the message.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

Passing IPC_RMID in cmd removes a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores b) the flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: no output on success; on failure, perror() prints a diagnostic beginning with "semget" or "semctl".

24. Write a C program that illustrates suspending and resuming processes using signals.

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
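The two kill() calls above are only fragments. A complete sketch that forks a child, verifies the stop with waitpid(..., WUNTRACED), resumes it, and then cleans up might look like this (the function name stop_cont_demo() is an illustrative assumption):

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, stop it with SIGSTOP, confirm the stop with
   waitpid(..., WUNTRACED), resume it with SIGCONT, then kill it.
   Returns 0 if every step succeeded. */
int stop_cont_demo(void)
{
    int status;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child: idle until signalled */
        for (;;)
            pause();
    }
    if (kill(pid, SIGSTOP) == -1)   /* same as hitting Ctrl+Z */
        return -1;
    if (waitpid(pid, &status, WUNTRACED) == -1 || !WIFSTOPPED(status))
        return -1;
    if (kill(pid, SIGCONT) == -1)   /* resume it */
        return -1;
    if (kill(pid, SIGKILL) == -1)   /* clean up */
        return -1;
    return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```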

Week 9


25. Write a C program that implements a producer-consumer system with two processes (using semaphores).

Algorithm

1. Start.
2. Create a semaphore set using the semget() system call.
3. If successful, it returns a positive value (the semaphore set id).
4. Create two processes with fork().
5. The first process produces.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int i;
    pid_t child_pid;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                             /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op  = -1;        /* wait until the consumer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                            /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op  = 1;         /* signal the producer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec  = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'
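The semop() bookkeeping in the listing above can be wrapped in small P (wait) and V (signal) helpers. The names sem_p()/sem_v() and the round-trip function below are illustrative, not from the manual:

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Classic P (wait) and V (signal) operations on semaphore 0 of a set. */
static int sem_p(int semid)
{
    struct sembuf op = { 0, -1, 0 };
    return semop(semid, &op, 1);
}

static int sem_v(int semid)
{
    struct sembuf op = { 0, +1, 0 };
    return semop(semid, &op, 1);
}

/* Create a private set, signal once, wait once, then remove the set.
   Returns 0 on success, -1 on failure. */
int sem_roundtrip(void)
{
    int semid = semget(IPC_PRIVATE, 1, 0600);
    if (semid == -1)
        return -1;
    int rc = (sem_v(semid) == 0 && sem_p(semid) == 0) ? 0 : -1;
    semctl(semid, 0, IPC_RMID);
    return rc;
}
```

In the producer-consumer program, sem_p() corresponds to the child's sem_op = -1 step and sem_v() to the parent's sem_op = +1 step.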

26. Write client and server programs (in C) for interaction between server and client processes using Unix domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}
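Where no filesystem path is needed, the same request/reply exchange can be tested with socketpair(2), which creates a connected pair of AF_UNIX sockets. This condensed sketch (the function name unix_pair_demo() is an illustrative assumption) exchanges the same two messages as the programs above in a single source file:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The same request/reply exchange as the programs above, but over an
   unnamed AF_UNIX socket pair, so no filesystem path is needed.
   Returns 0 when the parent receives the expected reply. */
int unix_pair_demo(void)
{
    int sv[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                       /* child plays the server */
        close(sv[0]);
        ssize_t c = read(sv[1], buf, sizeof(buf) - 1);
        if (c <= 0)
            _exit(1);
        write(sv[1], "hello from the server", strlen("hello from the server"));
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                         /* parent plays the client */
    write(sv[0], "hello from a client", strlen("hello from a client"));
    ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);

    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return strcmp(buf, "hello from the server") == 0 ? 0 : -1;
}
```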

Week 10

27. Write client and server programs (in C) for interaction between server and client processes using Internet domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {
            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}
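The address setup used by the client (htons() for the port, inet_pton() for the dotted-quad address) can be checked in isolation. This small sketch round-trips the port and address used above; the function name addr_roundtrip() is an illustrative assumption:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill a sockaddr_in exactly as the client above does, then convert the
   address back to text.  Returns 0 if the round trip is consistent. */
int addr_roundtrip(void)
{
    struct sockaddr_in sa;
    char text[INET_ADDRSTRLEN];

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(5000);                     /* host -> network order */
    if (inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr) != 1)
        return -1;

    if (ntohs(sa.sin_port) != 5000)                /* network -> host order */
        return -1;
    if (inet_ntop(AF_INET, &sa.sin_addr, text, sizeof(text)) == NULL)
        return -1;
    return strcmp(text, "127.0.0.1") == 0 ? 0 : -1;
}
```

Getting the byte order wrong (storing 5000 directly in sin_port) is the most common bug in this experiment; the round trip above makes it visible immediately.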

28. Write a C program that illustrates two processes communicating using shared memory.

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In shared memory concept if one process is reading into some shared memory for example other processes must wait for the read to finish before processing the data

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key size_t size int shmflg)


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK

-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                     /* command code for shmctl() */
int shmid;                   /* segment ID */
struct shmid_ds shmid_ds;    /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {        /* internal record of attached segments */
    int   shmid;             /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int   shmflg;            /* flags used on attach */
} ap[MAXnap];                /* state of current attached segments */
int nap;                     /* number of currently attached segments */

char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;

p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else {
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
}

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment id).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached pointer.
6. Read the contents back from the shared memory through the attached pointer.
7. End.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((key = ftok("shmdemo.c", 'R')) == -1) {  /* key from an existing file */
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"
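The same write/read cycle can be verified inside one process by attaching the segment twice; both pointers refer to the same physical memory. This sketch uses IPC_PRIVATE and an illustrative function name, shm_roundtrip(), rather than the ftok()-based key of the program above:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Attach a private segment at two different addresses, write through one
   attachment, read through the other, then detach and remove.
   Returns 0 on success, -1 on failure. */
int shm_roundtrip(void)
{
    int shmid = shmget(IPC_PRIVATE, 1024, 0600 | IPC_CREAT);
    if (shmid == -1)
        return -1;

    char *w = (char *) shmat(shmid, NULL, 0);
    char *r = (char *) shmat(shmid, NULL, 0);

    int rc = -1;
    if (w != (char *) -1 && r != (char *) -1) {
        strcpy(w, "koteswararao");                    /* write via one mapping */
        rc = strcmp(r, "koteswararao") == 0 ? 0 : -1; /* read via the other */
    }
    if (w != (char *) -1) shmdt(w);
    if (r != (char *) -1) shmdt(r);
    shmctl(shmid, IPC_RMID, NULL);                    /* always remove */
    return rc;
}
```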

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (the original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows: let I be a set of n binary attributes called items, and let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown below. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      1       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
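The support and confidence figures quoted above (0.4 and 0.5) can be reproduced mechanically. This sketch hard-codes the five-transaction supermarket example; the item order (milk, bread, butter, beer) and function names are assumptions of the illustration:

```c
/* The five supermarket transactions from the example, one row per
   transaction; columns are milk, bread, butter, beer. */
static const int T[5][4] = {
    { 1, 1, 0, 0 },   /* milk, bread         */
    { 0, 1, 1, 0 },   /* bread, butter       */
    { 0, 0, 0, 1 },   /* beer                */
    { 1, 1, 1, 0 },   /* milk, bread, butter */
    { 0, 1, 0, 0 },   /* bread               */
};

/* supp(X): fraction of transactions containing every item whose bit is
   set in mask (bit 0 = milk, bit 1 = bread, bit 2 = butter, bit 3 = beer). */
double support(unsigned mask)
{
    int hits = 0;
    for (int t = 0; t < 5; t++) {
        unsigned row = 0;
        for (int i = 0; i < 4; i++)
            if (T[t][i])
                row |= 1u << i;
        if ((row & mask) == mask)
            hits++;
    }
    return hits / 5.0;
}

/* conf(X => Y) = supp(X u Y) / supp(X) */
double confidence(unsigned x, unsigned y)
{
    return support(x | y) / support(x);
}
```

With mask 3 ({milk, bread}) support() gives 0.4, and confidence(3, 4) for {milk, bread} => {butter} gives 0.5, matching the worked numbers in the text.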

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems: one is to find those itemsets whose occurrences exceed a predefined threshold in the database (those itemsets are called frequent or large itemsets); the second is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


- Find the frequent set L(k−1).
- Join step: C(k) is generated by joining L(k−1) with itself.
- Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (C(k): candidate itemset of size k) and (L(k): frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)
    L(1) ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)
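The first pass of the pseudocode (building L(1) by counting candidate 1-itemsets) can be sketched over the five-transaction supermarket example from earlier; the table layout and function name are illustrative assumptions:

```c
/* First Apriori pass over the five-transaction supermarket example:
   count each single item, then keep the items meeting the minimum
   support count.  Columns: milk, bread, butter, beer. */
static const int TX[5][4] = {
    { 1, 1, 0, 0 },
    { 0, 1, 1, 0 },
    { 0, 0, 0, 1 },
    { 1, 1, 1, 0 },
    { 0, 1, 0, 0 },
};

/* Fills count[0..3] with per-item occurrence counts and returns how many
   single items are frequent at the given minimum count. */
int frequent_1_itemsets(int min_count, int count[4])
{
    int frequent = 0;
    for (int i = 0; i < 4; i++) {
        count[i] = 0;
        for (int t = 0; t < 5; t++)
            count[i] += TX[t][i];
        if (count[i] >= min_count)
            frequent++;
    }
    return frequent;
}
```

With a minimum count of 2, milk (2), bread (4) and butter (2) survive into L(1) while beer (1) is pruned; exactly this counting is what Weka's Apriori associator repeats for larger candidate itemsets.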

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java; it can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select an attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C|F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C|F1, …, Fn) = p(C) p(F1, …, Fn|C) / p(F1, …, Fn).

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)
= p(C) p(F1, …, Fn|C)
= p(C) p(F1|C) p(F2, …, Fn|C, F1)
= p(C) p(F1|C) p(F2|C, F1) p(F3, …, Fn|C, F1, F2)
= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) … p(Fn|C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1|C) p(F2|C) … = p(C) Π p(Fi|C).

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1, …, Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes, and if a model for each p(Fi|C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naive Bayes Classifier: Derivation

• D: a set of tuples
  – Each tuple is an 'n'-dimensional attribute vector
  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff
  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis
  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)
  – Maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naive assumption of "class conditional independence":
  – P(X|Ci) = Π (k = 1 … n) P(xk|Ci)
  – P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab


7) Choose Classifier → "Trees"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select the class attribute

11) Now click Start

12) Now we can see the output details in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES

0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.    0.923    0.081    0.924    0.923    0.923    0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button

2) Click Set

3) Choose the file which contains records that were not in the training set we used to create the model

4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

If the Correctly Classified Instances figure from this test set is comparable to the Correctly Classified Instances figure from the training set, the model is accurate, which indicates that the model will not break down with unknown data or when future data is applied to it.
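The same check can be run from the command line. A sketch, assuming weka.jar is on the classpath and that bank.arff and bank-test.arff are hypothetical training and supplied-test files:

```shell
# Train J48 on the training set (-t) and evaluate on a separate supplied test set (-T)
java -cp weka.jar weka.classifiers.trees.J48 -t bank.arff -T bank-test.arff
```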

EXPERIMENT-6

Aim: To create a Decision tree by cross-validation of the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"


6) Go to Classify tab

7) Choose Classifier → "Trees"

8) Select J48

9) Select Test options: "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased
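Steps 8–10 can also be reproduced from the command line. A sketch, assuming weka.jar is on the classpath and a hypothetical bank.arff:

```shell
# 10-fold cross-validation (-x sets the number of folds) with J48
java -cp weka.jar weka.classifiers.trees.J48 -t bank.arff -x 10
```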

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.861    0.071    0.911      0.861   0.886      0.883     YES

0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.    0.898    0.108    0.899    0.898    0.898    0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier → "Trees"

16) Select the J48 tree

17) Select Test options: "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect
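The Remove filter of steps 7–11 has a command-line equivalent as well. A sketch, assuming weka.jar on the classpath and hypothetical input/output file names:

```shell
# Remove attribute 1 (-R 1) and write the reduced data set to a new ARFF file
java -cp weka.jar weka.filters.unsupervised.attribute.Remove -R 1 -i bank.arff -o bank-removed.arff
```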

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Select from the attributes list those attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab

8) Choose Classifier → "Trees"

9) Select J48

10) Select Test options: "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a Decision tree by cross-validation of the training data set, by changing the cost matrix in the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose Classifier → "Trees"

8) Select J48

9) Select Test options: "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options: "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among the attributes we want to study. It can be viewed based on the database and the user requirement.

EXPERIMENT-11

Aim: To create a Decision tree by using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) Λ (income = >30000)

THEN (car = Yes)
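For J48 (as opposed to NBTree), reduced-error pruning can also be switched on from the command line. A sketch, assuming weka.jar on the classpath and a hypothetical bank.arff:

```shell
# -R enables reduced-error pruning; -N sets the number of folds, one of which is held out as the pruning set
java -cp weka.jar weka.classifiers.trees.J48 -R -N 3 -t bank.arff
```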

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier → "Trees"

9) Select "NBTree", i.e., the Naive Bayesian tree

10) Select Test options: "Use training set"

11) Right click on the text box beside the Choose button and select "Show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning options as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier → "Trees" / "Rules"

9) Select "J48"

10) Select Test options: "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier → "Rules"

9) Select "OneR"

10) Select Test options: "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier → "Rules"

9) Select "PART"

10) Select Test options: "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
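The three classifiers can likewise be compared from the command line. A sketch, assuming weka.jar on the classpath and a hypothetical bank.arff:

```shell
java -cp weka.jar weka.classifiers.trees.J48  -t bank.arff
java -cp weka.jar weka.classifiers.rules.OneR -t bank.arff
java -cp weka.jar weka.classifiers.rules.PART -t bank.arff
```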

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



directory and returns. Since fsize calls dirwalk on each directory, the two functions call each other recursively.

#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0
            || strcmp(dp->name, "..") == 0)
            continue;    /* skip self and parent */
        if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n",
                    dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {               /* directory entry */
    ino_t d_ino;              /* inode number */
    char d_name[DIRSIZ];      /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally readdir uses read to read each directory entry If a directory slot is not currently in use (because a file has been removed) the inode number is zero and this position is skipped Otherwise the inode number and name are placed in a static structure and a pointer to that is returned to the user Each call overwrites the information from the previous one

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An Inode number points to an Inode. An Inode is a data structure that stores the following information about a file:

Size of file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent");
    else
        printf("child");
    return 0;
}

17 Write a C program to create a Zombie process. If the child terminates before the parent process collects its exit status, the terminated child remains in the process table as a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;

    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid == 0) {
        printf("child process");
        exit(0);              /* child terminates first */
    } else {
        sleep(100);           /* parent lives on; the dead child stays a zombie meanwhile */
        printf("parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;

    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* parent exits meanwhile; the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex - ls ndashl | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM:

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size);

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char, size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid > 0) {              /* parent runs the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                                /* child runs the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)       /* read a file name from the user */
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)           /* send the file name to the server */
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)  /* copy the reply to stdout */
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);               /* read the requested file name */
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)  /* send the file contents back */
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), which is sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read from, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a

FIFO

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1) {
        printf("cannot fork");
    } else if (childpid > 0) {      /* parent: client side */
        wfd = open(FIFO1, 1);       /* write end */
        rfd = open(FIFO2, 0);       /* read end */
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                        /* child: server side */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }
    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining this structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Numeric value   Symbolic value   Description
0400            MSG_R            Read by owner
0200            MSG_W            Write by owner
0040            MSG_R >> 3       Read by group
0020            MSG_W >> 3       Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on the queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the sys call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. If flag has MSG_NOERROR and the data portion is greater than length, it truncates the message and returns; without it, an oversized message is an error.

3. A variety of control operations on a msg queue can be done through the "msgctl()" function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>


#include <sys/ipc.h>
#include <sys/sem.h>

/* union semun must be defined by the program on Linux */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created and initialized to 1; on failure, the perror message from semget or semctl is printed.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                         /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                        /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(struct sockaddr_un);
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)


            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In shared memory concept if one process is reading into some shared memory for example other processes must wait for the read to finish before processing the data

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached address (no separate write system call is needed).
6. Read the contents from the shared memory through the same address.
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok(".", 'R');   /* derive a key both processes can compute */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread, the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk Lk = I1 I2 hellip Ik association rules with this itemsets are generated in the following way the first rule is I1 I2 hellip Ik1 and Ik by checking the confidence this rule can be determined as interesting or not Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent further the confidences of the new rules are checked to determine the interestingness of them Those processes iterated until the antecedent becomes empty Since the second subproblem is quite straight forward most of the researches focus on the first subproblem The Apriori algorithm finds the frequent sets L In Database D


· Find the frequent set Lk−1.

· Join step: Ck is generated by joining Lk−1 with itself.

· Prune step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is implemented as J48 in Weka and can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options ldquoUse training setrdquo


10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select rdquo visualize tree ldquooption

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C, and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)
                = p(C) p(F1 | C) p(F2, …, Fn | C, F1)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
                = …
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D


• P(D|h): Probability of D given h

Naiumlve Bayes Classifier Derivation

• D: Set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum Posteriori Hypothesis

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naiumlve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier ldquoTreerdquo

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances       554      92.3333 %

Incorrectly Classified Instances     46        7.6667 %

Kappa statistic                      0.845

Mean absolute error                  0.1389

Root mean squared error              0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894    0.052    0.935      0.894    0.914      0.936     YES


0.948    0.106    0.914      0.948    0.931      0.936     NO

Weighted Avg.    0.923    0.081    0.924      0.923    0.923      0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button

2) Click Set

3) Choose the file that contains records which were not in the training set we used to create the model

4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced with different problem solutions during practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the numbers of false positives and false negatives: in this matrix the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we can say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm its accuracy.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 70

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable y is the target variable that we are trying to understand, classify or generalise. The vector x comprises the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"


6) Go to the Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances     539     89.8333 %

Incorrectly Classified Instances    61     10.1667 %

Kappa statistic                   0.7942

Mean absolute error               0.167

Root mean squared error           0.305

Relative absolute error          33.6511 %

Root relative squared error      61.2344 %

Total Number of Instances          600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

 0.861    0.071     0.911     0.861    0.886      0.883    YES

 0.929    0.139     0.889     0.929    0.909      0.883    NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and observe the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This shows a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to Open file and browse to the newly saved file (the attribute-deleted file)

14) Go to the Classify tab

15) Choose Classifier "Tree"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification and observe the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Select from the attributes list the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule (check the bias) by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be viewed based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

bull Each node of the (over-fit) tree is examined for pruning

bull A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

bull Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

bull Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e., the Naïve Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given: the Bank database for mining

2) Open the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to Open file and browse to the file already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


closedir(dfd);

Each call to readdir returns a pointer to information for the next file, or NULL when there are no files left. Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Down to this last level, the code is independent of how directories are formatted. The next step is to present minimal versions of opendir, readdir and closedir for a specific system. The following routines are for Version 7 and System V UNIX systems; they use the directory information in the header <sys/dir.h>, which looks like this:

#ifndef DIRSIZ
#define DIRSIZ 14
#endif

struct direct {               /* directory entry */
    ino_t d_ino;              /* inode number */
    char  d_name[DIRSIZ];     /* long name does not have '\0' */
};

Some versions of the system permit much longer names and have a more complicated directory structure.

The type ino_t is a typedef that describes the index into the inode list. It happens to be unsigned short on the systems we use regularly, but this is not the sort of information to embed in a program; it might be different on a different system, so the typedef is better. A complete set of "system" types is found in <sys/types.h>.

opendir opens the directory, verifies that the file is a directory (this time by the system call fstat, which is like stat except that it applies to a file descriptor), allocates a directory structure, and records the information:

int fstat(int fd, struct stat *);


/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
               == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of file, Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process\n");
    else
        printf("Child process\n");
    return 0;
}

17 Write a C program to create a zombie process. If the child terminates before the parent process, and the parent has not yet waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {       /* child terminates immediately */
        printf("child process\n");
        exit(0);
    } else {                        /* parent sleeps without reaping; child stays a zombie */
        sleep(100);
        printf("parent process\n");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* parent exits meanwhile, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd – file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {           /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                           /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using a named pipe.

AIM: Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC, FIFO (First In First Out), is sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through an mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd – file descriptor of the FIFO opened for writing

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {           /* parent acts as the client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                           /* child acts as the server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main(void)
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the msgget() function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag MSG_NOERROR makes msgrcv return an error if length is not large enough to receive the message; if the data portion is greater than the message length, it truncates and returns.

3. A variety of control operations on messages can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID as cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");
    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores b) flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id, child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                       /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;   /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                      /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;    /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(struct sockaddr_un);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */

char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call
5. Write to the shared memory segment through the pointer returned by shmat()
6. Read the contents back from the shared memory segment
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    /* derive the key from an existing file; the original listing
       left the key uninitialized */
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k−1)

· Join Step: C(k) is generated by joining L(k−1) with itself

· Prune Step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where
· C(k): candidate itemset of size k
· L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L(1) ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select trees > J48

9) Select Test options "Use training set"


10) If needed, select an attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn)
= p(C) p(F1, ..., Fn | C)
= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
= ...
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

· P(h): prior probability of hypothesis h

· P(D): prior probability of training data D

· P(h|D): probability of h given D

· P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

· D: set of tuples
  – each tuple is an 'n'-dimensional attribute vector X: (x1, x2, x3, ..., xn)

· Let there be 'm' classes: C1, C2, C3, ..., Cm

· The NB classifier predicts that X belongs to class Ci iff
  P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

· Maximum posteriori hypothesis:
  P(Ci|X) = P(X|Ci) P(Ci) / P(X)
  Maximize P(X|Ci) P(Ci), as P(X) is constant

· With many attributes, it is computationally expensive to evaluate P(X|Ci)

· Naïve assumption of "class conditional independence":

  P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

  P(X|Ci) = P(x1|Ci) · P(x2|Ci) · ... · P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by trying out different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29, and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is complete when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the classifier "Trees".

8) Select J48.

9) Under Test options, select "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to Open file and browse to the newly saved (attribute-deleted) file.

14) Go to the Classify tab.

15) Choose the classifier "Trees".

16) Select the J48 tree.

17) Under Test options, select "Use training set".

18) If needed, select the class attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) In the attributes list, select the attributes which are to be removed and remove them. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose the classifier "Trees".

9) Select J48.

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set while changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the classifier "Trees".

8) Select J48.

9) Under Test options, select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Under Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, by checking the bias when training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study; it can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision tree's accuracy over the validation set

  - Stop when further pruning decreases the decision tree's accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier "Trees".

9) Select "NBTree", i.e. the Naive Bayes tree.

10) Under Test options, select "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select the class attribute.

15) Now click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier "Trees".

9) Select "J48".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier "Rules".

9) Select "OneR".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier "Rules".

9) Select "PART".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


opendir opens a directory for readdir calls:

/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    int fd;
    struct stat stbuf;
    DIR *dp;

    if ((fd = open(dirname, O_RDONLY, 0)) == -1
        || fstat(fd, &stbuf) == -1
        || (stbuf.st_mode & S_IFMT) != S_IFDIR
        || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    dp->fd = fd;
    return dp;
}

closedir closes the directory file and frees the space:

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        close(dp->fd);
        free(dp);
    }
}


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
           == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15. Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <unistd.h>

int main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork\n");
    else if (childpid > 0)
        printf("parent\n");
    else
        printf("child\n");
    return 0;
}

17. Write a C program to create a zombie process. If a child terminates before the parent process has collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork\n");
    else if (childpid == 0) {   /* child terminates immediately */
        printf("child process\n");
        exit(0);
    } else {                    /* parent sleeps without calling wait(),
                                   so the dead child remains a zombie */
        sleep(100);
        printf("parent process\n");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main()
{
    int id;
    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }

    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating writing to and reading from a pipe

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - file descriptor; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string - the string to be written into the pipe.

size - buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error\n");
    if ((childpid = fork()) < 0)
        printf("cannot fork\n");
    else if (childpid > 0) {        /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                        /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error\n");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error\n");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error\n");
    if (n < 0)
        printf("data error\n");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name; here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or writing) using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd, char *ip_string, int size);

fd - file descriptor; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string - the string to be written into the FIFO.

size - buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd, char *buf, int size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork\n");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error\n");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error\n");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error\n");
    if (n < 0)
        printf("data error\n");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue.

DESCRIPTION

Message passing between processes is a service of the operating system, provided through a message queue. Messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining our structure, the ipc_perm structure should be available, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag). Msgflag values:

Num val   Symb value    Desc
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag MSG_NOERROR controls truncation: without it, an error is returned if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");
    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) semaphores b) flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, the program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created by semget and initialised to 1 by semctl.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                         /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;     /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                        /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;      /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26. Write client and server programs (using C) for interaction between server and client processes using Unix domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {     /* Internal record of attached segments. */
    int shmid;            /* shmid of attached segment */
    char *shmaddr;        /* attach point */
    int shmflg;           /* flags used on attach */
} ap[MAXnap];             /* State of current attached segments. */
int nap;                  /* Number of currently attached segments. */
...
char *addr;               /* address work variable */
register int i;           /* work area */
register struct state *p; /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the address returned by shmat().
6. Read the contents from the shared memory segment through the attached address.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2)
    {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* derive a key for the segment (the original listing leaves key uninitialized) */
    if ((key = ftok("shmdemo.c", 'R')) == -1)
    {
        perror("ftok");
        exit(1);
    }

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
    {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1))
    {
        perror("shmat");
        exit(1);
    }

    if (argc == 2)
    {
        strncpy(data, argv[1], SHM_SIZE);
        printf("writing to segment: \"%s\"\n", data);
    }

    if (shmdt(data) == -1)
    {
        perror("shmdt");
        exit(1);
    }


return 0

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule conf(X => Y) is defined as supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk-1.

· Join Step: Ck is generated by joining Lk-1 with itself.

· Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori (T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k-1) ≠ ∅
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Weka and can be selected by clicking the button Choose

7) and selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select an attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e. testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1|C) p(F2, ..., Fn | C, F1)

= p(C) p(F1|C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1|C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1|C) p(F2|C) ... = p(C) Π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k - 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by solving different problems during practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we can see the accuracy of the model, which indicates whether the model will break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select an attribute

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select an attribute

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier: "Trees".

9) Select J48.

10) Select Test options: "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set, changing the cost matrix, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier: "Trees".

8) Select J48.

9) Select Test options: "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output panel.


16) Select Test options: "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output panel.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on which relationships among the attributes we want to study. It can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision-tree accuracy over the validation set

  - Stop when further pruning decreases the decision-tree accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier: "Trees".

9) Select "NBTree", i.e. the Naive Bayes tree.


10) Select Test options: "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "False" to "True".

13) Change the reduced-error pruning setting as needed.

14) If needed, select the class attribute.

15) Now click Start.

16) Now we can see the output details in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier: "Trees".

9) Select "J48".

10) Select Test options: "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier: "Rules".

9) Select "OneR".

10) Select Test options: "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier: "Rules".

9) Select "PART".

10) Select Test options: "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Finally, readdir uses read to read each directory entry. If a directory slot is not currently in use (because a file has been removed), the inode number is zero and this position is skipped. Otherwise, the inode number and name are placed in a static structure and a pointer to that is returned to the user. Each call overwrites the information from the previous one.

#include <sys/dir.h>   /* local directory structure */

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
    struct direct dirbuf;   /* local directory structure */
    static Dirent d;        /* return: portable structure */

    while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
               == sizeof(dirbuf)) {
        if (dirbuf.d_ino == 0)   /* slot not in use */
            continue;
        d.ino = dirbuf.d_ino;
        strncpy(d.name, dirbuf.d_name, DIRSIZ);
        d.name[DIRSIZ] = '\0';   /* ensure termination */
        return &d;
    }
    return NULL;
}

15 Write a C program that demonstrates redirection of standard output to a file. Ex: ls > f1

Description


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

Size of the file

Device ID

User ID of the file

Group ID of the file

The file mode information and access privileges for owner, group and others

File protection flags

The timestamps for file creation, modification, etc.

Link counter, to determine the number of hard links

Pointers to the blocks storing the file's contents


Week 6

16 Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("Parent process");
    else
        printf("Child process");
    return 0;
}

17 Write a C program to create a Zombie process. If the child terminates before the parent has waited for it, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int childpid;

    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid == 0) {
        printf("Child process");
        exit(0);      /* child exits first ...                       */
    } else {
        sleep(100);   /* ... so while the parent sleeps without      */
                      /* calling wait, the child stays a zombie      */
        printf("Parent process");
    }
    return 0;
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int id;

    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);   /* the parent exits meanwhile, so the child becomes an orphan */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing pipes

DESCRIPTION:

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM:

The following is the simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int fd[], char *ip_string, int size);

fd[] – the file-descriptor array; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int fd[], char *buf, int size);
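The three steps above can be sketched in one small self-contained function (the name pipe_roundtrip is our own illustration; the full client/server program below splits the two ends across processes):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create a pipe, write a message into it, and read it back.
 * Returns the number of bytes read, or -1 on error. */
int pipe_roundtrip(const char *msg, char *out, int outsize)
{
    int fd[2];
    if (pipe(fd) == -1)         /* fd[0]: read end, fd[1]: write end */
        return -1;
    write(fd[1], msg, strlen(msg));
    int n = read(fd[0], out, outsize - 1);
    if (n >= 0)
        out[n] = '\0';
    close(fd[0]);
    close(fd[1]);
    return n;
}
```

Because the kernel buffers pipe data, a short message can be read back even within a single process; across a fork, the same two descriptors connect parent and child.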

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");

    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid > 0) {          /* parent: client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child: server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using a named pipe.

AIM Implementing IPC using a FIFO (or) named pipe

DESCRIPTION:

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int fd[], char *ip_string, int size);

fd[] – the file-descriptor array; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the FIFO.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int fd[], char *buf, int size);
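A minimal sketch of the algorithm above, using the modern mkfifo(3) wrapper for mknod (the function name fifo_roundtrip and the FIFO path are our own illustration):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO, have a child write a message into it, and read the
 * message back in the parent.  Returns bytes read, or -1 on error. */
int fifo_roundtrip(const char *path, const char *msg, char *out, int outsize)
{
    unlink(path);                        /* remove any stale FIFO */
    if (mkfifo(path, 0666) == -1)        /* same as mknod(path, 0666|S_IFIFO, 0) */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child: writer */
        int wfd = open(path, O_WRONLY);  /* blocks until a reader opens */
        write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(0);
    }

    int rfd = open(path, O_RDONLY);      /* parent: reader */
    int n = read(rfd, out, outsize - 1);
    if (n >= 0)
        out[n] = '\0';
    close(rfd);
    wait(NULL);                          /* reap the writer */
    unlink(path);
    return n;
}
```

Unlike an anonymous pipe, the two processes here need share nothing but the pathname, which is what makes FIFOs usable between unrelated processes.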

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main(void)
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);

    if ((childpid = fork()) == -1) {
        printf("cannot fork");
    } else if (childpid > 0) {          /* parent: client */
        wfd = open(FIFO1, O_WRONLY);
        rfd = open(FIFO2, O_RDONLY);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child: server */
        rfd = open(FIFO1, O_RDONLY);
        wfd = open(FIFO2, O_WRONLY);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main(void)
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);

    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: it returns an error if length is not large enough to receive the message; if the data portion is greater than the message length, it truncates and returns.

3. A variety of control operations on a message queue can be done through the msgctl() function: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.
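The msgget/msgsnd/msgrcv/msgctl calls described above can be exercised end-to-end in one small function. This is our own illustration (the structure and function names are ours), using IPC_PRIVATE so no key is needed:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

struct msgbuf_demo {
    long mtype;
    char mtext[64];
};

/* Create a private queue, send one message, receive it back,
 * remove the queue.  Returns bytes received, or -1 on error. */
int msgq_roundtrip(const char *msg, char *out, int outsize)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0666);
    if (qid == -1)
        return -1;

    struct msgbuf_demo m;
    m.mtype = 1;                                 /* must be > 0 */
    strncpy(m.mtext, msg, sizeof(m.mtext) - 1);
    m.mtext[sizeof(m.mtext) - 1] = '\0';

    if (msgsnd(qid, &m, strlen(m.mtext), 0) == -1)
        return -1;

    int n = msgrcv(qid, &m, sizeof(m.mtext), 1, 0);
    if (n > 0 && n < outsize) {
        memcpy(out, m.mtext, n);
        out[n] = '\0';
    }
    msgctl(qid, IPC_RMID, NULL);                 /* delete the queue */
    return n;
}
```

In the real client/server programs below, the two processes instead share a well-known key (MKEY1/MKEY2) so that unrelated processes can find the same queue.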

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;   /* initialize the semaphore to 1: resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                        /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;    /* wait until the producer signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                       /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;     /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;
    printf("MESSAGE FROM CLIENT: %s\n", buffer);

    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd,
                                   (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;
    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }
    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator, or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID                */
int rtrn;                  /* return value              */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* Internal record of attached segments  */
    int shmid;             /* shmid of attached segment             */
    char *shmaddr;         /* attach point                          */
    int shmflg;            /* flags used on attach                  */
} ap[MAXnap];              /* State of current attached segments    */
int nap;                   /* Number of currently attached segments */
...
char *addr;                /* address work variable                 */
register int i;            /* work area                             */
register struct state *p;  /* ptr to current state entry            */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If the call is successful, it returns a positive value, the segment ID.
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents from the shared memory through the attached address.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok(".", 'R');   /* initialize the key before calling shmget() */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.

3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the German credit data (original, and an Excel spreadsheet version; download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X => Y is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that the rule is correct for 50% of the transactions containing milk and bread. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}; association rules with this itemset are generated in the following way. The first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D as follows:


· Find frequent set Lk-1

· Join Step:

o Ck is generated by joining Lk-1 with itself

· Prune Step:

o Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

where:

· Ck: Candidate itemset of size k

· Lk: Frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk-1 ≠ ∅
        Ck ← Generate(Lk-1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options: "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ...

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k - 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D

• P(D|h): Probability of D given h

Naïve Bayes Classifier Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum Posteriori Hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options: "Cross-validation"

10) Set "Folds", e.g., 10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select J48

17) Select Test options: "Use training set"

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options: "Use training set"

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options: "Training set"

10) Click on "More options"

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output


16) Select Test options: "Cross-validation"

17) Set "Folds", e.g., 10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule (check the bias) by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree


10) Select Test options: "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" mode from "false" to "true".

13) Change the "reducedErrorPruning" option as needed.

14) If needed, select an attribute.

15) Now click the Start button.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier: "Trees".

9) Select "J48".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Now click the Start button.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier: "Rules".

9) Select "OneR".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Now click the Start button.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose Classifier: "Rules".

9) Select "PART".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Now click the Start button.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output:

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


An inode number points to an inode. An inode is a data structure that stores the following information about a file:

- Size of the file
- Device ID
- User ID of the file
- Group ID of the file
- The file mode information and access privileges for owner, group and others
- File protection flags
- The timestamps for file creation, modification, etc.
- Link counter to determine the number of hard links
- Pointers to the blocks storing the file's contents


Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent process");
    else
        printf("child process");
}

17. Write a C program to create a zombie process. If the child terminates before the parent has collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

main()
{
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid == 0) {
        printf("child process");
        exit(0);            /* child exits first and becomes a zombie */
    }


    else {
        sleep(100);         /* parent sleeps without reaping the child */
        printf("parent process");
    }
}

18. Write a C program that illustrates how an orphan is created.

#include <stdio.h>
#include <unistd.h>

main()
{
    int id;
    printf("Before fork()\n");
    id = fork();

    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);      /* the parent may exit meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }

    printf("After fork()\n");
}


Week 7

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing pipes.

DESCRIPTION:

A pipe is created by calling the pipe() function:

int pipe(int filedesc[2]);

It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM:

The following is a simple algorithm for creating, writing to and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size);

filedesc[1] – the write end of the pipe; if int filedesc[2] is the descriptor array, use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – the buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(filedesc[0], buffer, size);

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;


    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");

    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent runs the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child runs the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)


            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or named pipe).

DESCRIPTION:

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading or writing, using either the open() system call or one of the standard I/O open functions, fopen() or freopen().

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(filedesc[1], ip_string, size);

where filedesc[1] is the descriptor open for writing, ip_string is the string to be written into the FIFO, and size is the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(filedesc[0], buffer, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)


        printf("cannot fork");
    else if (childpid > 0) {            /* parent runs the client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child runs the server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter the file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}


server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }


    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue.

DESCRIPTION:

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM:

Before defining our structure, the ipc_perm structure should be defined; this is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num. val   Symbolic value   Description
0400       MSG_R            Read by owner
0200       MSG_W            Write by owner
0040       MSG_R >> 3       Read by group
0020       MSG_W >> 3       Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used. Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, msgrcv returns an error if length is not large enough to receive the message; with it, if the data portion is greater than length, it truncates the message and returns.

3. A variety of control operations on a message queue can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID as cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server process:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process

include ldquomsgqhrdquomain() int readid writeid open queues which server has already created it If ( (wirteid =msgget(MKEY10))lt0)

err_sys(ldquoclient cant access msgget message queue 1rdquo)if((readid=msgget(MKEY20))lt0)

err_sys(ldquoclient cant msgget messages queue 2rdquo)

client(readidwriteid)

delete msg queuu

If (msgctl(readid IPC_RMID( struct msqid_ds )0)lt0) err_sys(ldquoClient cant RMID message queue1rdquo) if(msgctl(writeid IPC_RMID (struct msqid_ds ) 0) lt0)

err_sys(ldquoClient cant RMID message queue 2rdquo)

exit(0)

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>


#include <sys/ipc.h>
#include <sys/sem.h>

/* union semun must be defined by the application on many systems */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define num_loops 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                         /* child: producer */
        for (i = 0; i < num_loops; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                        /* parent: consumer */
        for (i = 0; i < num_loops; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Outputsemaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)


            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of currently attached segments */
int nap;                    /* number of currently attached segments */

char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1)
    perror("shmop: shmdt failed");
else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached address.
6. Read the contents from the shared memory.
7. Detach the segment using shmdt() and end.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strcpy(data, argv[1]);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (turn in your answers to the following tasks):

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER, present in Applications.

3) Select the Preprocess tab.

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics of that selected attribute.

Sample output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
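The support and confidence arithmetic above can be checked with a short script. This is an illustrative sketch, not part of the Weka procedure; the transaction table referenced in the text did not survive extraction, so the five transactions below are hypothetical, chosen to reproduce the quoted values (support 0.4, confidence 0.5).

```python
# Support and confidence on a five-transaction toy database.
# The transactions are invented to match the 0.4 / 0.5 figures
# quoted in the text, since the original table is missing.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    """supp(X): proportion of transactions containing all of X."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 0.5
```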

ALGORITHM

Association rule mining is to find out association rules that satisfy predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way. The first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set L(k−1).

· Join step: Ck is generated by joining L(k−1) with itself.

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)
  L1 ← {large 1-itemsets that appear in more than ε transactions}
  k ← 2
  while L(k−1) ≠ ∅
    Ck ← Generate(L(k−1))
    for transactions t ∈ T
      Ct ← Subset(Ck, t)
      for candidates c ∈ Ct
        count[c] ← count[c] + 1
    Lk ← {c ∈ Ck | count[c] ≥ ε}
    k ← k + 1
  return ⋃k Lk
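The pseudocode above can be turned into a compact runnable sketch. This is only an illustration of the levelwise join, prune, and count steps, not Weka's implementation; the threshold ε is taken as an absolute transaction count, and the example database is made up.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return every frequent itemset (as a frozenset) contained in at
    least `minsup` transactions, via levelwise candidate generation."""
    items = {frozenset([i]) for t in transactions for i in t}
    # L1: frequent 1-itemsets
    freq = {c for c in items
            if sum(c <= t for t in transactions) >= minsup}
    result = set(freq)
    k = 2
    while freq:
        # Join step: Ck from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq
                             for s in combinations(c, k - 1))}
        # Count step: Lk = candidates meeting the support threshold
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) >= minsup}
        result |= freq
        k += 1
    return result

db = [{"milk", "bread"}, {"milk", "bread", "butter"},
      {"bread", "butter"}, {"milk", "beer"}, {"bread"}]
print(apriori(db, 2))   # frequent itemsets occurring in >= 2 transactions
```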

Procedure

1) Given the Bank database for mining.

2) Select EXPLORER in the Weka GUI Chooser.

3) Load "bank.csv" into Weka by Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

· Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

· Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER, present in Applications.

3) Select the Preprocess tab.

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Click the Choose button; here the C4.5 algorithm is used, which is implemented as J48 in Weka.

7) Select trees → J48.

8) Select Test options: "Use training set".

9) If needed, select an attribute.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English, the above equation can be written as:

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples.

– Each tuple is an 'n'-dimensional attribute vector.

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm.

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant.

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
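The product rule above can be sketched in a few lines of code. This is a from-scratch illustration on a made-up toy dataset (not the bank data, and not Weka's NBTree): it estimates P(Ci) and P(xk|Ci) by frequency counting and picks the class that maximizes P(Ci) ∏k P(xk|Ci).

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate priors P(C) and per-attribute likelihoods P(x_k | C)
    by frequency counting (no smoothing, for brevity)."""
    prior = Counter(labels)
    likes = defaultdict(Counter)   # (class, attribute index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            likes[(c, k)][v] += 1
    return prior, likes, len(labels)

def predict(row, prior, likes, total):
    """Pick argmax_C P(C) * prod_k P(x_k | C)."""
    best, best_p = None, -1.0
    for c, nc in prior.items():
        p = nc / total
        for k, v in enumerate(row):
            p *= likes[(c, k)][v] / nc
        if p > best_p:
            best, best_p = c, p
    return best

# Hypothetical toy data: (children, income band) -> buys car
rows = [("yes", "high"), ("yes", "high"), ("no", "high"),
        ("no", "low"), ("yes", "low"), ("no", "low")]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]
model = train(rows, labels)
print(predict(("yes", "high"), *model))   # -> Yes
```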

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select Test options: "Use training set".

10) If needed, select an attribute.

11) Click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced with the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
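Recursive partitioning as described above can be sketched directly: split the records on an attribute-value test, recurse on each subset, and stop when a subset is pure. The data and attribute names below are invented (not the bank dataset), and the split criterion here simply minimizes misclassifications, standing in for the information-gain measure a real learner such as C4.5 would use.

```python
def majority(labels):
    """Most common label in a list."""
    return max(set(labels), key=labels.count)

def build_tree(rows, labels, attrs):
    """Recursive partitioning: stop on a pure subset (or no attributes
    left); otherwise split on the attribute that misclassifies least."""
    if len(set(labels)) == 1:
        return labels[0]                    # pure leaf
    if not attrs:
        return majority(labels)             # fallback leaf
    def errors(a):
        err = 0
        for v in {r[a] for r in rows}:
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            err += len(sub) - sub.count(majority(sub))
        return err
    a = min(attrs, key=errors)
    node = {"attr": a, "children": {}}
    for v in {r[a] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        node["children"][v] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [b for b in attrs if b != a])
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

rows = [{"income": "high", "children": "yes"},
        {"income": "high", "children": "no"},
        {"income": "low", "children": "yes"},
        {"income": "low", "children": "no"}]
labels = ["Yes", "Yes", "No", "No"]
tree = build_tree(rows, labels, ["income", "children"])
print(classify(tree, {"income": "high", "children": "no"}))  # -> Yes
```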

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select J48.

9) Select Test options: "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select an attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to Open file and browse the newly saved file (the file with the attribute deleted).

14) Go to the Classify tab.

15) Choose the classifier group "trees".

16) Select the J48 tree.

17) Select Test options: "Use training set".

18) If needed, select an attribute.

19) Click Start.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select, from the attributes list, some of the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose the classifier group "trees".

9) Select J48.

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select J48.

9) Select Test options: "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.


16) Select Test options: "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select an attribute.

19) Click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on the relationships among attributes we want to study. It can be viewed based on the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for the cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node.

– Making the pruned node a leaf node.

– Assigning the pruned node the most common classification of the training instances attached to that node.

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision tree accuracy over the validation set.

– Stop when further pruning decreases the decision tree accuracy over the validation set.

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
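The pruning loop above can be illustrated with a small sketch. The tree here is a hand-built dict with hypothetical attributes (echoing the children/income rule above), not one produced by Weka: each internal node records the majority class of its training instances, and a node is replaced by a leaf whenever doing so does not lower accuracy on a held-out validation set.

```python
def classify(tree, row):
    """Walk the dict-based tree until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["children"][row[tree["attr"]]]
    return tree["leaf"]

def accuracy(tree, data):
    return sum(classify(tree, r) == y for r, y in data) / len(data)

def reduced_error_prune(node, root, val):
    """Bottom-up: try replacing each internal node by a leaf labelled
    with its training-majority class; keep the prune only if accuracy
    on the validation set `val` does not drop."""
    if "leaf" in node:
        return
    for child in node["children"].values():
        reduced_error_prune(child, root, val)
    before = accuracy(root, val)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]    # prune in place ...
    if accuracy(root, val) < before:    # ... and undo if it hurt
        node.clear()
        node.update(saved)

# Hand-built over-fit tree: the income split carries the signal;
# the children split below it merely memorizes training noise.
tree = {"attr": "income>30000", "majority": "Yes", "children": {
    True:  {"attr": "children", "majority": "Yes", "children": {
            "yes": {"leaf": "Yes"}, "no": {"leaf": "No"}}},
    False: {"leaf": "No"}}}

validation = [({"income>30000": True,  "children": "yes"}, "Yes"),
              ({"income>30000": True,  "children": "no"},  "Yes"),
              ({"income>30000": False, "children": "no"},  "No")]

reduced_error_prune(tree, tree, validation)
# The noisy children split collapses; the useful income split survives.
print(tree)
```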

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "trees".

9) Select "NBTree", i.e., the naive Bayesian tree.

10) Select Test options: "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select an attribute.

15) Click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "trees".

9) Select "J48".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "rules".

9) Select "OneR".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "rules".

9) Select "PART".

10) Select Test options: "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Week 6

16. Write a C program to create a child process and allow the parent to display "parent" and the child to display "child" on the screen.

#include <stdio.h>
#include <unistd.h>
int main() {
    int childpid;
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
        printf("parent\n");    /* parent branch */
    else
        printf("child\n");     /* child branch */
    return 0;
}

17. Write a C program to create a zombie process. If the child terminates before the parent process has collected its exit status, the terminated child is called a zombie process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
    int childpid;
    if ((childpid = fork()) < 0) {
        printf("cannot fork");
    } else if (childpid == 0) {
        printf("child process\n");
        exit(0);               /* child exits immediately */
    } else {
        sleep(100);            /* parent does not wait, so the exited
                                  child remains a zombie meanwhile */
        printf("parent process\n");
    }
    return 0;
}

18. Write a C program that illustrates how an orphan is created.

#include <stdio.h>
#include <unistd.h>
int main() {
    int id;
    printf("Before fork()\n");
    id = fork();
    if (id == 0) {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);    /* the parent may exit meanwhile, orphaning the child */
        printf("child prints 2 item\n");
    } else {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
    return 0;
}


Week 7

19. Write a C program that illustrates how to execute two commands concurrently with a command pipe.

Ex: ls -l | sort

AIM: Implementing pipes.

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is the simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call.

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int [], ip_string, size)

int [] – the file-descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string – the string to be written into the pipe.

size – buffer size for storing the input.

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int [], char [], size)

PROGRAM

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent acts as the client */
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *)0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {                            /* child acts as the server */
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, O_RDONLY)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using a named pipe

AIM Implementing IPC using a FIFO (or) named pipe

D ESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through a mknod() function call.

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

   write(fd, ip_string, size)

   fd - file descriptor returned by opening the FIFO for writing
   ip_string - the string to be written into the FIFO
   size - the number of bytes to write

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(fd, buffer, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

void client(int readfd, int writefd);
void server(int readfd, int writefd);

int main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);

    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {            /* parent: client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *)0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                            /* child: server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
    return 0;
}

void client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

void server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

int main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);

    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
    return 0;
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining the msqid_ds structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used.

Syntax: int msgget(key_t key, int msgflag);

Msgflag values:

Num val   Symbolic value   Description
0400      MSG_R            Read by owner
0200      MSG_W            Write by owner
0040      MSG_R >> 3       Read by group
0020      MSG_W >> 3       Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. ptr points to the actual content to send, a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type, > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: normally an error is returned if length is not large enough to hold the message; with MSG_NOERROR, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function.

Syntax: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.
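The three calls above can be combined into a small receiver for the queue created in program 21. This is a sketch rather than the manual's own listing: the helper name receive_all is ours, and the key 1006 and message type 6 are assumptions carried over from program 21.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAXLEN 255

struct mesg {
    long type;
    char mtext[MAXLEN];
};

/* Drain every type-6 message from the queue identified by `key`,
   print each one, and return how many were received
   (-1 if the queue cannot be opened). */
int receive_all(key_t key)
{
    struct mesg m;
    int mid, n, count = 0;

    if ((mid = msgget(key, 0666)) < 0)
        return -1;
    while ((n = msgrcv(mid, &m, MAXLEN, 6, IPC_NOWAIT)) >= 0) {
        m.mtext[n] = '\0';
        printf("%s", m.mtext);
        count++;
    }
    return count;
}
```

Program 22's requirement - receive and display the messages written in (21) - is then a single call, receive_all(1006).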

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666
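The server and client sketches below call err_sys(), which the manual never defines. A minimal version - our own assumption that it should report the message plus the errno text and quit - could also go into msgq.h:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* print the caller's message plus the errno text, then exit(1) */
void err_sys(const char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    exit(1);
}
```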

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                     /* initialise the semaphore to 1: resource free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore is created; on failure an error message from ftok, semget or semctl is printed

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);
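The two kill() calls are only fragments; wrapped into a complete helper they become testable. The function name suspend_resume is our own choice, not from the manual:

```c
#include <signal.h>
#include <sys/types.h>

/* Suspend the process `pid` and then resume it.
   Returns 0 if both signals were delivered, -1 otherwise. */
int suspend_resume(pid_t pid)
{
    if (kill(pid, SIGSTOP) == -1)   /* same effect as hitting Ctrl+Z */
        return -1;
    /* ... while stopped, the process receives no CPU time ... */
    if (kill(pid, SIGCONT) == -1)   /* let it run again */
        return -1;
    return 0;
}
```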

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 1, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                         /* child process: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;     /* block until the producer has produced */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                        /* parent process: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;      /* signal the consumer */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(); it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                      /* command code for shmctl() */
int shmid;                    /* segment ID */
struct shmid_ds shmid_ds;     /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {             /* internal record of attached segments */
    int shmid;                    /* shmid of attached segment */
    char *shmaddr;                /* attach point */
    int shmflg;                   /* flags used on attach */
} ap[MAX];                        /* state of currently attached segments */
int nap;                          /* number of currently attached segments */
...
char *addr;                       /* address work variable */
register int i;                   /* work area */
register struct state *p;         /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call
5. Write to the shared memory segment
6. Read the contents from the shared memory segment
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* the original listing never initialises key; derive one with ftok() */
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strcpy(data, argv[1]);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
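The two measures can be checked numerically. The sketch below hard-codes a made-up five-transaction table (the rows are our own illustration, chosen so the supports come out at 0.4 and 0.2 as quoted above) and computes itemset support:

```c
#define NTRANS 5
#define NITEMS 4   /* item columns: 0=milk, 1=bread, 2=butter, 3=beer */

/* A made-up database (1 codes presence, 0 absence), chosen so that
   supp({milk,bread}) = 2/5 = 0.4 and supp({milk,bread,butter}) = 1/5 = 0.2. */
static const int db[NTRANS][NITEMS] = {
    {1, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1},
    {1, 1, 1, 0},
    {0, 1, 0, 0},
};

/* support of an itemset = fraction of transactions containing all its items */
double support(const int *items, int n)
{
    int t, i, hits = 0;
    for (t = 0; t < NTRANS; t++) {
        int all = 1;
        for (i = 0; i < n; i++)
            if (!db[t][items[i]])
                all = 0;
        hits += all;
    }
    return (double)hits / NTRANS;
}
```

Confidence is then a ratio of supports: support({milk, bread, butter}) / support({milk, bread}) = 0.2 / 0.4 = 0.5 for the rule {milk, bread} => {butter}.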

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., I(k-1)} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k-1).

· Join step: C(k) is generated by joining L(k-1) with itself.

· Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed,

where (C(k): candidate itemset of size k) and (L(k): frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)
    L(1) <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k-1) ≠ ∅
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

• Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented in Weka under the name J48 and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options "Use training set"

10) If needed, select the class attribute

11) Click Start

12) The output details appear in the Classifier output panel

13) Right-click on the result list and select the "Visualize tree" option

Sample output

The decision tree constructed by using the implemented C4.5 algorithm

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C)

= p(C) Π p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select the class attribute

11) Click Start

12) The output details appear in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                      0.845

Mean absolute error                  0.1389

Root mean squared error              0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894  0.052  0.935  0.894  0.914  0.936  YES

0.948  0.106  0.914  0.948  0.931  0.936  NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model

4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across different problem solutions while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select the class attribute

12) Click Start

13) The output details appear in the Classifier output panel

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                      0.7942

Mean absolute error                  0.167

Root mean squared error              0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.861  0.071  0.911  0.861  0.886  0.883  YES

0.929  0.139  0.889  0.929  0.909  0.883  NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Click Start

20) The output details appear in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased

17) Check whether removing these attributes has any significant effect

Sample output

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) The output details appear in the Classifier output panel

16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select the class attribute

19) Click Start

20) The output details appear in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output

EXPERIMENT-10

Aim To check whether a small rule is better than a long rule, i.e., to check the bias, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study. This can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Pruned mode and Reduced-error Pruning, and show accuracy for a cross-validation trained data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory

Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select "NBTree", i.e., the naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the "unpruned" property from "false" to "true"

13) Change the "reducedErrorPruning" property as needed

14) If needed, select the class attribute

15) Click Start

16) The output details appear in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output

EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff

Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details appear in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details appear in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff

One R

PART

else
{
    wait(NULL);
    printf("parent process");
}

18 Write a C program that illustrates how an orphan is created

#include <stdio.h>

main()
{
    int id;
    printf("Before fork()\n");
    id = fork();
    if (id == 0)
    {
        printf("Child has started: %d\n", getpid());
        printf("Parent of this child: %d\n", getppid());
        printf("child prints 1 item\n");
        sleep(25);
        printf("child prints 2 item\n");
    }
    else
    {
        printf("Parent has started: %d\n", getpid());
        printf("Parent of the parent proc: %d\n", getppid());
    }
    printf("After fork()\n");
}

Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. This function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(int[], ip_string, size)

int[] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(int[], char, size)

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0)
    {
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    }
    else
    {
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)

            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM Implementing IPC using a FIFO (or) named pipe

D ESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev)

The pathname is a normal Unix pathname and this is the name of the FIFO

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.

Once a FIFO is created, it must be opened for reading (or) writing, using either the open() system call or one of the standard I/O open functions – fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int[], ip_string, size)

int[] – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the FIFO

size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int[], char, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)

        printf("cannot fork");
    else if (childpid > 0)
    {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else
    {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <fcntl.h>
#define MAX 255

struct mesg
{
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;
    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0)
    {
        printf("\n Can't create Message Q");
        exit(1);
    }

    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0)
    {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0)
    {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0)
    {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue: %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is a part of the operating system that is done through a message queue, where messages are stored in the kernel and associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel for each queue; it contains the following:

struct msqid_ds
{
    struct ipc_perm msg_perm;   /* operation permissions         */
    struct msg *msg_first;      /* ptr to first msg on queue     */
    struct msg *msg_last;       /* ptr to last msg on queue      */
    ushort msg_cbytes;          /* current bytes on queue        */
    ushort msg_qnum;            /* current no. of msgs on queue  */
    ushort msg_qbytes;          /* max no. of bytes on queue     */
    ushort msg_lspid;           /* pid of last msg send          */
    ushort msg_lrpid;           /* pid of last msg recvd         */
    time_t msg_stime;           /* time of last msg snd          */
    time_t msg_rtime;           /* time of last msg rcv          */
    time_t msg_ctime;           /* time of last msg ctl          */
};

To create a new message queue or access an existing message queue, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag). Msgflag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id (a unique id). ptr points to the actual content to send, a structure which contains the following:

struct msgbuf {
    long mtype;     /* message type, > 0 */
    char mtext[1];  /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue: when this flag is specified, msgsnd() returns -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored; length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: if the data portion is greater than length, the message is truncated and returned; without this flag, msgrcv() returns an error when length is not large enough to receive the message.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

Giving IPC_RMID as cmd removes a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");

    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");

    client(readid, writeid);

    /* delete msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>


#include <sys/ipc.h>
#include <sys/sem.h>

union semun {                     /* the caller must define this on Linux */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the program exits silently on success; on failure it prints the failing call ("ftok", "semget", or "semctl")

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process produces.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                          /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                         /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error: Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error\n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is writing into some shared memory, for example, other processes must wait for the write to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                     /* command code for shmctl() */
int shmid;                   /* segment ID */
int rtrn;                    /* return value from shmctl() */
struct shmid_ds shmid_ds;    /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* state of current attached segments */
int nap;                    /* number of currently attached segments */

char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached pointer.
6. Read the contents of the shared memory through the attached pointer.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((key = ftok("shmdemo.c", 'R')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strcpy(data, argv[1]);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

  o Ck is generated by joining Lk−1 with itself.

· Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, and hence should be removed.

where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select tree j48

9) Select Test options "Use training set"


10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

                = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

                = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

                = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) … p(Fn | C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) … = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = p(C) ∏i p(Fi | C) / Z

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

bull D Set of tuples

ndash Each Tuple is an lsquonrsquo dimensional attribute vector

ndash X (x1x2x3hellip xn)

bull Let there me lsquomrsquo Classes C1C2C3hellipCm

bull NB classifier predicts X belongs to Class Ci iff

ndash P (CiX) gt P(CjX) for 1lt= j lt= m j ltgt i

bull Maximum Posteriori Hypothesis

ndash P(CiX) = P(XCi) P(Ci) P(X)

ndash Maximize P(XCi) P(Ci) as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To check "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

The exact figures vary with the supplied test set; a representative run is discussed below.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure that the accuracy of the model holds.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select J48 tree

17) Select Test options "Use training set"

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set after changing the cost matrix, in the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule (i.e. to check the bias) by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruning (reduced-error pruning) and show the accuracy on a cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e. the naive Bayesian tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reducedErrorPruning option as needed

14) If needed, select an attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Week 7

19 Write a C program that illustrates how to execute two commands concurrently with a command pipe

Ex: ls -l | sort

AIM: Implementing Pipes

DESCRIPTION

A pipe is created by calling the pipe() function: int pipe(int filedesc[2]). It returns a pair of file descriptors: filedesc[0] is open for reading and filedesc[1] is open for writing. The function returns 0 if OK and -1 on error.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a pipe:

1) Create a pipe through a pipe() function call

2) Use the write() function to write the data into the pipe. The syntax is as follows:

write(filedesc[1], ip_string, size)

filedesc[] – the file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the pipe

size – the buffer size for storing the input

3) Use the read() function to read the data that has been written to the pipe. The syntax is as follows:

read(filedesc[0], buffer, size)

PROGRAM

#include <stdio.h>
#include <string.h>

main()
{
    int pipe1[2], pipe2[2], childpid;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        printf("pipe creation error");
    if ((childpid = fork()) < 0)
        printf("cannot fork");
    else if (childpid > 0) {
        close(pipe1[0]);
        close(pipe2[1]);
        client(pipe2[0], pipe1[1]);
        while (wait((int *) 0) != childpid)
            ;
        close(pipe1[1]);
        close(pipe2[0]);
        exit(0);
    } else {
        close(pipe1[1]);
        close(pipe2[0]);
        server(pipe1[0], pipe2[1]);
        close(pipe1[0]);
        close(pipe2[1]);
        exit(0);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: Implementing IPC using a FIFO (or named pipe)

DESCRIPTION

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe except that it has a name; here the name is that of a file that multiple processes can open(), read and write. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev)

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions, fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through a mknod() function call

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(filedesc[1], ip_string, size)

filedesc[] – the file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string – the string to be written into the FIFO

size – the buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(filedesc[0], buffer, size)

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *) 0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *) malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't open Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is an operating-system facility provided through a message queue. Messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process can write a message to a queue and exit, and another process can read it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no of msgs on queue */
    ushort msg_qbytes;         /* max no of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue, or to access an existing one, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag). Msg flag values:

Num val   Symb value   Desc
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag)

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag can be IPC_NOWAIT, which allows the sys call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag)

ptr is a pointer to the structure where the received message is to be stored. Length is the size to be received and stored in the pointer area. Flag can be MSG_NOERROR: without it, an error is returned if length is not large enough to receive the msg; with it, a message whose data portion is greater than length is truncated and returned.

3. A variety of control operations on a msg queue can be done through the "msgctl()" function: int msgctl(int msqid, int cmd, struct msqid_ds *buff)

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");
    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux the caller must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;    /* initial value: resource unlocked */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                              /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;          /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                             /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;           /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that, for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, processes must synchronise their accesses themselves: if one process is reading from a shared segment, for example, other processes must wait for the read to finish before modifying the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(); it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {         /* Internal record of attached segments. */
    int shmid;                /* shmid of attached segment */
    char *shmaddr;            /* attach point */
    int shmflg;               /* flags used on attach */
} ap[MAXnap];                 /* State of current attached segments. */
int nap;                      /* Number of currently attached segments. */

char *addr;                   /* address work variable */
register int i;               /* work area */
register struct state *p;     /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the shmid).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents back from the shared memory through the same pointer.
7. End

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    key = ftok("shmdemo.c", 'R');    /* make the key from this source file's path */

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE - 1);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories, good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules involving some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

  o Ck is generated by joining Lk−1 with itself.

· Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka via Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree from a training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented in Weka (in Java) under the name J48 and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options "Use training set"


10) if needed, select an attribute

11) Click Start

12) now we can see the output details in the Classifier output

13) right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the decision tree model created above, ie testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)
  = p(C) p(F1, …, Fn | C)
  = p(C) p(F1 | C) p(F2, …, Fn | C, F1)
  = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
  = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) Π p(Fi | C)

This means that, under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, ie a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – maximize P(X|Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

             n
• P(X|Ci) =  Π  P(xk|Ci)
            k=1

• P(X|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", ie the naive Bayesian tree

9) Select Test options "Use training set"

10) if needed, select an attribute

11) now start Weka

12) now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554    92.3333 %

Incorrectly Classified Instances       46     7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error             27.9979 %

Root relative squared error         52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894    0.052    0.935    0.894    0.914    0.936    YES


0.948    0.106    0.914    0.948    0.931    0.936    NO

Weighted Avg.    0.923    0.081    0.924    0.923    0.923    0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by trying different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix, the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree with cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", eg 10

11) if needed, select an attribute

12) now start Weka

13) now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539    89.8333 %

Incorrectly Classified Instances       61    10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error             33.6511 %

Root relative squared error         61.2344 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911    0.861    0.886    0.883    YES

0.929    0.139    0.889    0.929    0.909    0.883    NO

Weighted Avg.    0.898    0.108    0.899    0.898    0.898    0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: Select some attributes from GUI Explorer and perform classification and see the effect using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

This will be based on the attribute set and the relationships among attributes we want to study. This can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for the cross-validated training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory:


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision tree accuracy over the validation set

  - Stop when further pruning decreases the decision tree accuracy over the validation set

Example rule:

IF (Children = yes) Λ (income > 30000)

THEN (car = Yes)

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning as needed

14) If needed, select an attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for "PART":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
    printf("pipe creation error");
if ((childpid = fork()) < 0)
    printf("cannot fork");
else if (childpid > 0) {
    close(pipe1[0]);
    close(pipe2[1]);
    client(pipe2[0], pipe1[1]);
    while (wait((int *)0) != childpid)
        ;
    close(pipe1[1]);
    close(pipe2[0]);
    exit(0);
} else {
    close(pipe1[1]);
    close(pipe2[0]);
    server(pipe1[0], pipe2[1]);
    close(pipe1[0]);
    close(pipe2[1]);
    exit(0);
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20. Write C programs that illustrate communication between two unrelated processes using a named pipe.

AIM: Implementing IPC using a FIFO (or) named pipe.

DESCRIPTION:

Another kind of IPC is the FIFO (First In, First Out), sometimes also called a named pipe. It is like a pipe, except that it has a name. Here the name is that of a file that multiple processes can open(), read and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode/access mode. The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading (or) writing, using either the open system call or one of the standard I/O open functions - fopen or freopen.

ALGORITHM:

The following is the simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call

2) Use the write() function to write the data into the FIFO. The syntax is as follows:

write(int [], ip_string, size);

int [] - the file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

ip_string - the string to be written into the FIFO

size - the buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(int [], char, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *)0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter the file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21. Write a C program to create a message queue with read and write permissions, and write 3 messages to it with different priority numbers.

#include <stdio.h>
#include <sys/ipc.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;
    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22. Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them.

Aim: To create a message queue

DESCRIPTION:

Message passing between processes is a part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM:

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used. Syntax:

int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Else, flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag has MSG_NOERROR: it returns an error if length is not large enough to receive the msg; if the data portion is greater than the msg length, it truncates and returns.

3. A variety of control operations on a msg can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23. Write a C program to allow cooperating processes to lock a resource for exclusive use, using a) Semaphores b) flock or lockf system calls.

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    key_t key;
    int semid;
    union semun arg;
    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24. Write a C program that illustrates suspending and resuming processes using signals.

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25. Write a C program that implements a producer-consumer system with two processes (using semaphores).

Algorithm:

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child: the producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default: /* parent: the consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26. Write client and server programs (using C) for interaction between server and client processes using Unix Domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28. Write a C program that illustrates two processes communicating using shared memory.

DESCRIPTION:

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment:

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;      /* command code for shmctl() */
int shmid;    /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment:

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {       /* Internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* State of current attached segments */
int nap;                    /* Number of currently attached segments */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5 write to shared memory using shmsnd( ) system call6 read the contents from shared memory using shmrcv( )system call7 End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');   /* generate the IPC key */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.


• Find frequent set L(k−1).

• Join Step:

  o C(k) is generated by joining L(k−1) with itself.

• Prune Step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where:

• C(k): candidate itemset of size k

• L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, minsup)
    L1 <- {large 1-itemsets that appear in more than minsup transactions}
    k <- 2
    while L(k−1) ≠ ∅
        C(k) <- Generate(L(k−1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ minsup}
        k <- k + 1
    return ∪(k) L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator.

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm has been chosen (it is entitled J48 in its Java implementation), which can be selected by clicking the Choose button

7) and selecting trees > J48.

9) Select Test options: "Use training set".

10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) Π p(Fi | C).

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) Π p(Fi | C),

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

  – P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

  – P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ⋯ P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select Test options: "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                    0.845

Mean absolute error                0.1389

Root mean squared error            0.2636

Relative absolute error            27.9979 %

Root relative squared error        52.9137 %

Total Number of Instances          600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894  0.052  0.935  0.894  0.914  0.936  YES

0.948  0.106  0.914  0.948  0.931  0.936  NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across the different problem solutions while practicing.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select J48.

9) Select Test options: "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                    0.7942

Mean absolute error                0.167

Root mean squared error            0.305

Relative absolute error            33.6511 %

Root relative squared error        61.2344 %

Total Number of Instances          600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.861  0.071  0.911  0.861  0.886  0.883  YES

0.929  0.139  0.889  0.929  0.909  0.883  NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose classifier "Trees".

16) Select the J48 tree.

17) Select Test options: "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose classifier "Trees".

9) Select J48.

10) Select Test options: "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Trees".

8) Select J48.

9) Select Test options: "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Select Test options: "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

Example rule: IF (Children = yes) ∧ (income ≥ 30000) THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees".

9) Select "NBTree", i.e., the naive Bayesian tree.

10) Select Test options: "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees".

9) Select "J48".

10) Select Test options: "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "Rules".

9) Select "OneR".

10) Select Test options: "Use training set".

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose a classifier: "Rules".

9) Select "PART".

10) Select Test options: "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) Now we can see the output details in the Classifier output panel.

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting=1 THEN class=A (Error=0, Coverage = 7 instances)

IF accounting=0 THEN class=B (Error=4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


        printf("data write error");
    if (n < 0)
        printf("data error");
}

server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
    {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

20 Write C programs that illustrate communication between two unrelated processes using named pipe

AIM: To implement IPC using a FIFO (named pipe).

DESCRIPTION

Another kind of IPC is the FIFO (First In First Out), sometimes also called a named pipe. It is like a pipe except that it has a name. Here the name is that of a file that multiple processes can open(), read, and write to. A FIFO is created using the mknod() system call. The syntax is as follows:

int mknod(char *pathname, int mode, int dev);

The pathname is a normal Unix pathname, and this is the name of the FIFO.

The mode argument specifies the file mode (access mode). The dev value is ignored for a FIFO.


Once a FIFO is created, it must be opened for reading or writing, using either the open() system call or one of the standard I/O open functions - fopen() or freopen().

ALGORITHM

The following is a simple algorithm for creating, writing to, and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

write(fd, ip_string, size);

fd - the file descriptor; if int filedesc[2] is the variable, then use filedesc[1] as the first parameter.

ip_string - the string to be written into the FIFO.

size - the buffer size for storing the input.

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

read(fd, buf, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;
    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)


        printf("cannot fork");
    else if (childpid > 0)
    {
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *)0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    }
    else
    {
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];
    printf("enter s file name");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n-1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}


server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;
    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0)
    {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    }
    else
    {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <sys/ipc.h>
#include <fcntl.h>
#define MAX 255

struct mesg
{
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;
    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0)
    {
        printf("\n Can't create Message Q");
        exit(1);
    }


    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0)
    {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0)
    {
        printf("\n Can't create Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0)
    {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue.

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining a structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no of msgs on queue */
    ushort msg_qbytes;         /* max no of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value   Description
0400      MSG_R        Read by owner
0200      MSG_W        Write by owner
0040      MSG_R >> 3   Read by group
0020      MSG_W >> 3   Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id (a unique id). msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue.

When this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. The flag can be MSG_NOERROR: it returns an error if length is not large enough to receive the msg; if the data portion is greater than the msg length, it truncates and returns.

3. A variety of control operations on a message queue can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1)
    {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1)
    {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1)
    {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created and initialized (semget, semctl).

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i, sem_val;
    struct sembuf sem_op;
    int rc;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1)
    {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();
    switch (child_pid)
    {
    case -1:
        perror("fork");
        exit(1);
    case 0:                             /* child: producer */
        for (i = 0; i < NUM_LOOPS; i++)
        {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                            /* parent: consumer */
        for (i = 0; i < NUM_LOOPS; i++)
        {
            printf("consumer '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4))
            {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:
semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0)
    {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0)
    {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1)
    {
        child = fork();
        if (child == 0)
        {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0)
    {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1)
    {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2)
    {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0)
    {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0)
    {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
        {
            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0)
    {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1)
{
    perror("shmget: shmget failed");
    exit(1);
}
else
{
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1)
{
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1)
{
    perror("shmop: shmat failed");
    nap--;
}
else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1)
{
    perror("shmop: shmdt failed");
}
else
{
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment.
6. Read the contents from the shared memory segment.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;
    int mode;

    if (argc > 2)
    {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok("shmdemo.c", 'R');   /* derive a key from an existing file */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
    {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1))
    {
        perror("shmat");
        exit(1);
    }
    if (argc == 2)
        printf("writing to segment: \"%s\"\n", data);
    if (shmdt(data) == -1)
    {
        perror("shmdt");
        exit(1);
    }


return 0

Input:
./a.out koteswararao

Output:
writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse to the file "bank.csv" already stored on the system.

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y where


X, Y ⊆ I and X ∩ Y = Φ. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k-1).

· Join Step: C(k) is generated by joining L(k-1) with itself.

· Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, and hence should be removed.

Where: C(k) is the candidate itemset of size k, and L(k) is the frequent itemset of size k.

Apriori Pseudocode

Apriori(T, ε)
    L(1) <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k-1) ≠ Φ
        C(k) <- Generate(L(k-1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka by Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select the Preprocess tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) Select trees > J48

8) Select Test options "Use training set"

9) If needed, select attributes

10) Click Start

11) Now we can see the output details in the Classifier output

12) Right click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

P(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

Posterior = (Prior × Likelihood) / Evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)
                  = p(C) p(F1 | C) p(F2, ..., Fn | C, F1)
                  = p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)
                  = p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h | D): Probability of h given D


• P(D | h): Probability of D given h

Naive Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j != i

• Maximum Posteriori Hypothesis

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naive assumption of "class conditional independence":

• P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To check "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed with the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 923 percent we say that upon initial analysis this is a good model

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with that of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel

13) Go to OPEN file and browse the file that is newly saved (the attribute-deleted file)

14) Go to the Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased

17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select Cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize, then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output

16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

· Each node of the (over-fit) tree is examined for pruning

· A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

· Pruning a node consists of:

• Removing the sub-tree rooted at the pruned node

• Making the pruned node a leaf node

• Assigning the pruned node the most common classification of the training instances attached to that node

· Pruning nodes iteratively:

• Always select a node whose removal most increases the DT accuracy over the validation set

• Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) Λ (income = >30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e., the naive Bayesian tree

10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


Once a FIFO is created, it must be opened for reading or writing, using either the open system call or one of the standard I/O open functions: fopen or freopen.

ALGORITHM

The following is a simple algorithm for creating, writing to and reading from a FIFO:

1) Create a FIFO through the mknod() function call.

2) Use the write() function to write data into the FIFO. The syntax is as follows:

   write(fd, ip_string, size);

   fd – file descriptor variable; in this case, if int filedesc[2] is the variable, then use filedesc[1] as the first parameter

   ip_string – the string to be written into the FIFO

   size – buffer size for storing the input

3) Use the read() function to read the data that has been written to the FIFO. The syntax is as follows:

   read(fd, buffer, size);

PROGRAM

#define FIFO1 "Fifo1"
#define FIFO2 "Fifo2"
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>

main()
{
    int childpid, wfd, rfd;

    mknod(FIFO1, 0666 | S_IFIFO, 0);
    mknod(FIFO2, 0666 | S_IFIFO, 0);
    if ((childpid = fork()) == -1)
        printf("cannot fork");
    else if (childpid > 0) {          /* parent acts as client */
        wfd = open(FIFO1, 1);
        rfd = open(FIFO2, 0);
        client(rfd, wfd);
        while (wait((int *)0) != childpid)
            ;
        close(rfd);
        close(wfd);
        unlink(FIFO1);
        unlink(FIFO2);
    } else {                          /* child acts as server */
        rfd = open(FIFO1, 0);
        wfd = open(FIFO2, 1);
        server(rfd, wfd);
        close(rfd);
        close(wfd);
    }
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter a file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data read error");
}

server(int readfd, int writefd)
{
    char buff[1024];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = 0;
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Messages on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that a process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:

#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue or access an existing message queue, the "msgget()" function is used. Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num. val   Symb. value   Description
0400       MSG_R         Read by owner
0200       MSG_W         Write by owner
0040       MSG_R >> 3    Read by group
0020       MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on the queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. If the flag has MSG_NOERROR, then when the data portion is greater than length it truncates and returns; otherwise an error is returned if length is not large enough to receive the message.

3. A variety of control operations on messages can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process

include ldquomsgqhrdquomain() int readid writeid open queues which server has already created it If ( (wirteid =msgget(MKEY10))lt0)

err_sys(ldquoclient cant access msgget message queue 1rdquo)if((readid=msgget(MKEY20))lt0)

err_sys(ldquoclient cant msgget messages queue 2rdquo)

client(readidwriteid)

delete msg queuu

If (msgctl(readid IPC_RMID( struct msqid_ds )0)lt0) err_sys(ldquoClient cant RMID message queue1rdquo) if(msgctl(writeid IPC_RMID (struct msqid_ds ) 0) lt0)

err_sys(ldquoClient cant RMID message queue 2rdquo)

exit(0)

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    key_t key;
    int semid;
    union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    } arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                        /* resource initially free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT semgetsmctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9

25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value (the semaphore set id).
4. Create two processes with fork().
5. The first process produces.
6. Until the first process has produced, the second process cannot consume.
7. End.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                               /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;           /* wait until the parent signals */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                              /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;            /* signal the child */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, if one process is reading from the shared region, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);

The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {           /* internal record of attached segments */
    int shmid;                  /* shmid of attached segment */
    char *shmaddr;              /* attach point */
    int shmflg;                 /* flags used on attach */
} ap[MAXnap];                   /* state of current attached segments */
int nap;                        /* number of currently attached segments */
char *addr;                     /* address work variable */
register int i;                 /* work area */
register struct state *p;       /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;

p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *) -1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment id).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory segment.
6. Read the contents from the shared memory segment.
7. End.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    /* the original listing leaves key unset; any agreed-upon key
       works, e.g. one generated with ftok() */
    key = ftok("shmdemo.c", 'R');
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *) 0, 0);
    if (data == (char *) (-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form

3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where

X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}; association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:

· Find frequent set Lk−1.

· Join Step: Ck is generated by joining Lk−1 with itself.

· Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser

3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

wekaassociationsApriori -N 10 -T 0 -C 09 -D 005 -U 10 -M 01 -S -10 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output

EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button,

7) and select trees > J48.

8) Select Test options "Use training set".

9) If needed, select an attribute.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right-click on the result list and select the "Visualize tree" option.

Sample output

The decision tree constructed by using the implemented C4.5 (J48) algorithm:

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naiumlve Bayes Classifier Derivation

• D: set of tuples

    – Each tuple is an 'n'-dimensional attribute vector

    – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

    – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis:

    – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

    – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naiumlve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naive assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now click Start

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                   0.845

Mean absolute error               0.1389

Root mean squared error           0.2636

Relative absolute error          27.9979 %

Root relative squared error      52.9137 %

Total Number of Instances       600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES

0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model

4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix, the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Now click Start

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with that of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, by changing the cost matrix in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, by checking the bias on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and the user's requirement.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

- Removing the sub-tree rooted at the pruned node

- Making the pruned node a leaf node

- Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

- Always select a node whose removal most increases the DT accuracy over the validation set

- Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayes tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the "unpruned" mode from "False" to "True"

13) Change the reduced-error pruning setting as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



    printf("cannot fork");
else if (childpid > 0) {
    wfd = open(FIFO1, 1);
    rfd = open(FIFO2, 0);
    client(rfd, wfd);
    while (wait((int *)0) != childpid)
        ;
    close(rfd);
    close(wfd);
    unlink(FIFO1);
    unlink(FIFO2);
} else {
    rfd = open(FIFO1, 0);
    wfd = open(FIFO2, 1);
    server(rfd, wfd);
    close(rfd);
    close(wfd);
}
}

client(int readfd, int writefd)
{
    int n;
    char buff[1024];

    printf("enter the file name: ");
    if (fgets(buff, 1024, stdin) == NULL)
        printf("file name read error");
    n = strlen(buff);
    if (buff[n - 1] == '\n')
        n--;
    if (write(writefd, buff, n) != n)
        printf("file name write error");
    while ((n = read(readfd, buff, 1024)) > 0)
        if (write(1, buff, n) != n)
            printf("data write error");
    if (n < 0)
        printf("data error");
}


server(int readfd, int writefd)
{
    char buff[1024], errmsg[50];
    int n, fd;

    n = read(readfd, buff, 1024);
    buff[n] = '\0';
    if ((fd = open(buff, 0)) < 0) {
        sprintf(buff, "file does not exist");
        write(writefd, buff, 1024);
    } else {
        while ((n = read(fd, buff, 1024)) > 0)
            write(writefd, buff, n);
    }
}

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <fcntl.h>
#define MAX 255

struct mesg {
    long type;
    char mtext[MAX];
} *mesg;

char buff[MAX];

main()
{
    int mid, fd, n, count = 0;

    if ((mid = msgget(1006, IPC_CREAT | 0666)) < 0) {
        printf("\n Can't create Message Q");
        exit(1);
    }
    printf("\n Queue id: %d", mid);
    mesg = (struct mesg *)malloc(sizeof(struct mesg));
    mesg->type = 6;
    fd = open("fact", O_RDONLY);
    while (read(fd, buff, 25) > 0) {
        strcpy(mesg->mtext, buff);
        if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
            printf("\n Message Write Error");
    }

    if ((mid = msgget(1006, 0)) < 0) {
        printf("\n Can't access Message Q");
        exit(1);
    }
    while ((n = msgrcv(mid, mesg, MAX, 6, IPC_NOWAIT)) > 0) {
        write(1, mesg->mtext, n);
        count++;
    }
    if ((n == -1) && (count == 0))
        printf("\n No Message on Queue %d", mid);
}

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim: To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system, and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be defined, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it should contain the following:

struct msqid_ds {
    struct ipc_perm msg_perm;  /* operation permission */
    struct msg *msg_first;     /* ptr to first msg on queue */
    struct msg *msg_last;      /* ptr to last msg on queue */
    ushort msg_cbytes;         /* current bytes on queue */
    ushort msg_qnum;           /* current no. of msgs on queue */
    ushort msg_qbytes;         /* max no. of bytes on queue */
    ushort msg_lspid;          /* pid of last msg send */
    ushort msg_lrpid;          /* pid of last msg recvd */
    time_t msg_stime;          /* time of last msg snd */
    time_t msg_rtime;          /* time of last msg rcv */
    time_t msg_ctime;          /* time of last msg ctl */
};

To create a new message queue, or access an existing message queue, the "msgget()" function is used. Syntax:

int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the "msgsnd()" function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;    /* message type, > 0 */
    char mtext[1]; /* data */
};

length is the size of the message in bytes.


flag is IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue;

when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise the flag can be specified as 0.

2. To receive a message, the "msgrcv()" function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can be MSG_NOERROR: without it, an error is returned if length is not large enough to receive the msg; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the "msgctl()" function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>
extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");

    client(readid, writeid);

    /* delete the msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux the program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: the semaphore set is created with semget and initialised with semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();

    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default: /* parent */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created
semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                  /* command code for shmctl() */
int shmid;                /* segment ID */
struct shmid_ds shmid_ds; /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr.

The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


include ltsysshmhgt static struct state Internal record of attached segments int shmid shmid of attached segment char shmaddr attach point int shmflg flags used on attach ap[MAXnap] State of current attached segments int nap Number of currently attached segments char addr address work variable register int i work area register struct state p ptr to current state entry p = ampap[nap++]p-gtshmid = p-gtshmaddr = p-gtshmflg = p-gtshmaddr = shmat(p-gtshmid p-gtshmaddr p-gtshmflg)if(p-gtshmaddr == (char )-1) perror(shmop shmat failed) nap-- else (void) fprintf(stderr shmop shmat returned 88xnp-gtshmaddr) i = shmdt(addr)if(i == -1) perror(shmop shmdt failed) else (void) fprintf(stderr shmop shmdt returned dn i)for (p = ap i = nap i-- p++) if (p-gtshmaddr == addr) p = ap[--nap] Algorithm

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If successful, it returns a positive value (the segment identifier).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents from the shared memory through the same pointer.
7. End.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = ftok(".", 'R');  /* derive an IPC key for shmget() */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    /* attach to the segment to get a pointer to it */
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else
        printf("segment contains: \"%s\"\n", data);
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence.

Suppose one of the large itemsets is Lk, Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1

· Join Step:

o Ck is generated by joining Lk−1 with itself

· Prune Step:

o Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) and select tree "J48"

9) Select Test options "Use training set"


10) if need select attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum Posteriori Hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab


7) Choose classifier group "Trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.33 percent) and Incorrectly Classified Instances (7.67 percent). Another important number is in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.33 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose classifier group "Trees"

16) Select "J48"

17) Select Test options "Use training set"

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose classifier group "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) Right click on the result list and select the "Visualize tree" option

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased


17)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) Go to Classify tab

7) Choose classifier group "Trees"

8) Select "J48"

9) Select Test options "Use training set"

10) Click on "More options"

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier group "Trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 36: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

server(int readfdint writefd) char buff[1024]errmsg[50] int nfd n=read(readfdbuff1024) buff[n]=0 if((fd=open(buff0))lt0) sprintf(bufffile does nit exist) write(writefdbuff1024) else while((n=read(fdbuff1024))gt0) write(writefdbuffn)

21 Write a C program to create a message queue with read and write permissions to write 3 messages to it with different priority numbers

include ltstdiohgt include ltsysipchgt include ltfcntlhgt define MAX 255 struct mesg long type char mtext[MAX] mesg char buff[MAX] main() int midfdncount=0 if((mid=msgget(1006IPC_CREAT | 0666))lt0) printf(ldquon Canrsquot create Message Qrdquo) exit(1)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 36

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

printf(ldquon Queue iddrdquo mid) mesg=(struct mesg )malloc(sizeof(struct mesg)) mesg -gttype=6 fd=open(ldquofactrdquoO_RDONLY) while(read(fdbuff25)gt0) strcpy(mesg -gtmtextbuff) if(msgsnd(midmesgstrlen(mesg -gtmtext)0)== -1) printf(ldquon Message Write Errorrdquo)

if((mid=msgget(10060))lt0) printf(ldquon Canrsquot create Message Qrdquo) exit(1) while((n=msgrcv(midampmesgMAX6IPC_NOWAIT))gt0) write(1mesgmtextn) count++ if((n= = -1)amp(count= =0)) printf(ldquon No Message Queue on Queuedrdquomid)

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes are part of operating system which are done through a message queue Where messages are stored in kernel and are associated with message queue identifier (ldquomsqidrdquo) Processes read and write messages to an arbitrary queue in a way such that a process writes a message to a queue exits and other process reads it at later time

ALGORITHM

Before defining a structure ipc_perm structure should be defined which is done by including following file

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 37

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

include ltsystypeshgtinclude ltsysipchgt

A structure of information is maintained by kernel it should contain followingstruct msqid_ds

struct ipc_perm msg_perm operation permissionstruct msg msg_first ptr to first msg on queuestruct msg msg_last ptr to last msg on queueushort msg_cbytes current bytes on queueushort msg_qnum current no of msgs on queueushort msg_qbytes max no of bytes on queueushort msg_lspid pid o flast msg sendushort msg_lrpid pid of last msgrecvdtime_t msg_stime time of last msg sndtime_t msg_rtime time of last msg rcvtime_t msg_ctime time of last msg ctl

To create new message queue or access existing message queue ldquomsgget()rdquo function is used Syntaxint msgget(key_t key int msgflag) Msg flag values

Num val Symb value desc 0400 MSG_R Read by owner 0200 MSG_w Write by owner 0040 MSG_R gtgt3 Read by group 0020 MSG_Wgtgt3 Write by group

Msgget returns msqid or -1 if error1 To put message on queue ldquomsgsnd()rdquo function is used

Syntax int msgsnd(int msqid struct msgbuf ptrint length int flag)

msqid is message queue id a unique idmsgbuf is actual content to send a pointer to structure which contain following struct msgbuf

Long mtype message type gt0 Char mtext[1] data

length is the size of message in bytes

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 38

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

flag is - IPC_NOWAIT which allows sys call to return immediately when no room on queue

when this is specified msgsnd will return -1 if no room on queueElse flag can be specified as 0

2 To receive Message ldquomsgrcv()rdquo function is usedSyntaxInt msgrcv(int msqid struct msgbuf ptr int length long msgtype int flag)

ptr is pointer to structure where message received is to be storedLength is size to be received and stored in pointer areaFlag has MSG_NOERROR it returns an error if length is not large enough to receive msg if data portion is greater than msg length it truncates and returns

3 Variety of control operations on msg can be done through ldquomsgctl()rdquo functionInt msgctl(int msqid int cmd struct msqid_ds buff)

IPC_RMID in cmd is given to remove a message queue from the system

Let us create a header file msgqh with following in it

include ltsystypehgtinclude ltsysipchgtinclude ltsysmsghgt

include ltsyserrnohgtextern int errno

define MKEY1 1234Ldefine MKEY2 2345Ldefine PERMS 0666

Server operation algorithminclude ldquomsgqhrdquo

main() Int readid writeid

If((readid = msgget(MSGKEY1 PERMS |IPC_CREAT))lt0)err_sys(ldquoServer cant get message queue 1rdquo)

If((writeid= msgget(MKEY PERMS | IPC_CREAT))lt0)err_sys(ldquoServer cant get message queue 2rdquo)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 39

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

server(readidwriteid)exit(0)

Client process

include ldquomsgqhrdquomain() int readid writeid open queues which server has already created it If ( (wirteid =msgget(MKEY10))lt0)

err_sys(ldquoclient cant access msgget message queue 1rdquo)if((readid=msgget(MKEY20))lt0)

err_sys(ldquoclient cant msgget messages queue 2rdquo)

client(readidwriteid)

delete msg queuu

If (msgctl(readid IPC_RMID( struct msqid_ds )0)lt0) err_sys(ldquoClient cant RMID message queue1rdquo) if(msgctl(writeid IPC_RMID (struct msqid_ds ) 0) lt0)

err_sys(ldquoClient cant RMID message queue 2rdquo)

exit(0)

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

includeltstdiohgtincludeltstdlibhgtincludelterrorhgtincludeltsystypeshgt

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 40

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

includeltsysipchgtincludeltsyssemhgtint main(void)key_t keyint semidunion semun argif((key==ftok(sem democj))== -1)perror(ftok)exit(1)if(semid=semget(key10666|IPC_CREAT))== -1)perror(semget)exit(1)argval=1if(semctl(semid0SETVALarg)== -1)perror(smctl)exit(1)return 0

OUTPUT semgetsmctl

24 Write a C program that illustrates suspending and resuming processes using signals

includeltsystypeshgtincludeltsignalhgtsuspend the process(same as hitting crtl+z)kill(pidSIGSTOP)

continue the processkill(pidSIGCONT)

Week 9

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 41

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1 Start2 create semaphore using semget( ) system call3 if successful it returns positive value4 create two new processes5 first process will produce6 until first process produces second process cannot consume7 End

Source code

includeltstdiohgtincludeltstdlibhgtincludeltsystypeshgtincludeltsysipchgtincludeltsyssemhgtincludeltunistdhgtdefine num_loops 2int main(int argcchar argv[])int sem_set_idint child_pidisem_valstruct sembuf sem_opint rcstruct timespec delayclrscr()sem_set_id=semget(ipc_private20600)if(sem_set_id==-1)perror(ldquomainsemgetrdquo)exit(1)printf(ldquosemaphore set createdsemaphore setidlsquodrsquon rdquosem_set_id)child_pid=fork()

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 42

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

switch(child_pid)case -1perror(ldquoforkrdquo)exit(1)case 0for(i=0iltnum_loopsi++)sem_opsem_num=0sem_opsem_op=-1sem_opsem_flg=0semop(sem_set_idampsem_op1)printf(ldquoproducerrsquodrsquonrdquoi)fflush(stdout)breakdefaultfor(i=0iltnum_loopsi++)printf(ldquoconsumerrsquodrsquonrdquoi)fflush(stdout)sem_opsem_num=0sem_opsem_op=1sem_opsem_flg=0semop(sem_set_idampsem_op1)if(rand()gt3(rano_max14))delaytv_sec=0delaytv_nsec=10nanosleep(ampdelaynull)breakreturn 0

Outputsemaphore set created

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 43

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

semaphore set id lsquo327690rsquoproducer lsquo0rsquoconsumerrsquo0rsquoproducerrsquo1rsquo

consumerrsquo1rsquo

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Serverc

include ltstdiohgtinclude ltsyssockethgtinclude ltsysunhgtinclude ltsystypeshgtinclude ltunistdhgtinclude ltstringhgt

int connection_handler(int connection_fd) int nbytes char buffer[256]

nbytes = read(connection_fd buffer 256) buffer[nbytes] = 0

printf(MESSAGE FROM CLIENT sn buffer) nbytes = snprintf(buffer 256 hello from the server) write(connection_fd buffer nbytes)

close(connection_fd) return 0

int main(void) struct sockaddr_un address int socket_fd connection_fd socklen_t address_length pid_t child

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 44

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

socket_fd = socket(PF_UNIX SOCK_STREAM 0) if(socket_fd lt 0) printf(socket() failedn) return 1

unlink(demo_socket)

start with a clean address structure memset(ampaddress 0 sizeof(struct sockaddr_un))

addresssun_family = AF_UNIX snprintf(addresssun_path UNIX_PATH_MAX demo_socket)

if(bind(socket_fd (struct sockaddr ) ampaddress sizeof(struct sockaddr_un)) = 0) printf(bind() failedn) return 1

if(listen(socket_fd 5) = 0) printf(listen() failedn) return 1

while((connection_fd = accept(socket_fd (struct sockaddr ) ampaddress ampaddress_length)) gt -1) child = fork() if(child == 0) now inside newly created connection handling process return connection_handler(connection_fd)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 45

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

still inside server process close(connection_fd)

close(socket_fd) unlink(demo_socket) return 0

Clientcinclude ltstdiohgtinclude ltsyssockethgtinclude ltsysunhgtinclude ltunistdhgtinclude ltstringhgt

int main(void) struct sockaddr_un address int socket_fd nbytes char buffer[256]

socket_fd = socket(PF_UNIX SOCK_STREAM 0) if(socket_fd lt 0) printf(socket() failedn) return 1

start with a clean address structure memset(ampaddress 0 sizeof(struct sockaddr_un))

addresssun_family = AF_UNIX snprintf(addresssun_path UNIX_PATH_MAX demo_socket)

if(connect(socket_fd (struct sockaddr ) ampaddress sizeof(struct sockaddr_un)) = 0)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 46

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

printf(connect() failedn) return 1

nbytes = snprintf(buffer 256 hello from a client) write(socket_fd buffer nbytes)

nbytes = read(socket_fd buffer 256) buffer[nbytes] = 0

printf(MESSAGE FROM SERVER sn buffer)

close(socket_fd) return 0

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Serverc

include ltsyssockethgtinclude ltnetinetinhgtinclude ltarpainethgtinclude ltstdiohgtinclude ltstdlibhgtinclude ltunistdhgtinclude lterrnohgtinclude ltstringhgtinclude ltsystypeshgtinclude lttimehgt

int main(int argc char argv[]) int listenfd = 0 connfd = 0 struct sockaddr_in serv_addr

char sendBuff[1025] time_t ticks

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 47

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

listenfd = socket(AF_INET SOCK_STREAM 0) memset(ampserv_addr 0 sizeof(serv_addr)) memset(sendBuff 0 sizeof(sendBuff))

serv_addrsin_family = AF_INET serv_addrsin_addrs_addr = htonl(INADDR_ANY) serv_addrsin_port = htons(5000)

bind(listenfd (struct sockaddr)ampserv_addr sizeof(serv_addr))

listen(listenfd 10)

while(1) connfd = accept(listenfd (struct sockaddr)NULL NULL)

ticks = time(NULL) snprintf(sendBuff sizeof(sendBuff) 24srn ctime(ampticks)) write(connfd sendBuff strlen(sendBuff))

close(connfd) sleep(1)

Clientc

include ltsyssockethgtinclude ltsystypeshgtinclude ltnetinetinhgtinclude ltnetdbhgtinclude ltstdiohgtinclude ltstringhgtinclude ltstdlibhgtinclude ltunistdhgtinclude lterrnohgtinclude ltarpainethgt

int main(int argc char argv[])

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 48

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

int sockfd = 0 n = 0 char recvBuff[1024] struct sockaddr_in serv_addr

if(argc = 2) printf(n Usage s ltip of servergt nargv[0]) return 1

memset(recvBuff 0sizeof(recvBuff)) if((sockfd = socket(AF_INET SOCK_STREAM 0)) lt 0) printf(n Error Could not create socket n) return 1

memset(ampserv_addr 0 sizeof(serv_addr))

serv_addrsin_family = AF_INET serv_addrsin_port = htons(5000)

if(inet_pton(AF_INET argv[1] ampserv_addrsin_addr)lt=0) printf(n inet_pton error occuredn) return 1

if( connect(sockfd (struct sockaddr )ampserv_addr sizeof(serv_addr)) lt 0) printf(n Error Connect Failed n) return 1

while ( (n = read(sockfd recvBuff sizeof(recvBuff)-1)) gt 0) recvBuff[n] = 0 if(fputs(recvBuff stdout) == EOF)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 49

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

printf(n Error Fputs errorn)

if(n lt 0) printf(n Read error n)

return 0

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared Memory is an efficeint means of passing data between programs One program will create a memory portion which other processes (if permitted) can access

The problem with the pipes FIFOrsquos and message queues is that for two processes to exchange information the information has to go through the kernel Shared memory provides a way around this by letting two or more processes share a memory segment

In shared memory concept if one process is reading into some shared memory for example other processes must wait for the read to finish before processing the data

A process creates a shared memory segment using shmget()| The original owner of a shared memory segment can assign ownership to another user with shmctl() It can also revoke this assignment Other processes with proper permission can perform various control functions on the shared memory segment using shmctl() Once created a shared segment can be attached to a process address space using shmat() It can be detached using shmdt() (see shmop()) The attaching process must have the appropriate permissions for shmat() Once attached the process can read or write to the segment as allowed by the permission requested in the attach operation A shared segment can be attached multiple times by the same process A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory The identifier of the segment is called the shmid The structure definition for the shared memory segment control structures and prototypews can be found in ltsysshmhgt

shmget() is used to obtain access to a shared memory segment It is prottyped by

int shmget(key_t key size_t size int shmflg)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 50

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The key argument is a access value associated with the semaphore ID The size argument is the size in bytes of the requested shared memory The shmflg argument specifies the initial access permissions and creation control flags

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget() include ltsystypeshgtinclude ltsysipchgt include ltsysshmhgt key_t key key to be passed to shmget() int shmflg shmflg to be passed to shmget() int shmid return value from shmget() int size size to be passed to shmget()

key = size = shmflg) =

if ((shmid = shmget (key size shmflg)) == -1) perror(shmget shmget failed) exit(1) else (void) fprintf(stderr shmget shmget returned dn shmid) exit(0) Controlling a Shared Memory Segment shmctl() is used to alter the permissions and other characteristics of a shared memory segment It is prototyped as follows int shmctl(int shmid int cmd struct shmid_ds buf)The process must have an effective shmid of owner creator or superuser to perform this command The cmd argument is one of following control commands SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 51

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a sructure of type struct shmid_ds which is defined in ltsysshmhgt The following code illustrates shmctl() include ltsystypeshgtinclude ltsysipchgtinclude ltsysshmhgtint cmd command code for shmctl() int shmid segment ID struct shmid_ds shmid_ds shared memory data structure to hold results shmid = cmd = if ((rtrn = shmctl(shmid cmd shmid_ds)) == -1) perror(shmctl shmctl failed) exit(1) Attaching and Detaching a Shared Memory Segment shmat() and shmdt() are used to attach and detach shared memory segments They are prototypes as follows void shmat(int shmid const void shmaddr int shmflg)int shmdt(const void shmaddr)shmat() returns a pointer shmaddr to the head of the shared segment associated with a valid shmid shmdt() detaches the shared memory segment located at the address indicated by shmaddr The following code illustrates calls to shmat() and shmdt() include ltsystypeshgt include ltsysipchgt

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 52

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

include ltsysshmhgt static struct state Internal record of attached segments int shmid shmid of attached segment char shmaddr attach point int shmflg flags used on attach ap[MAXnap] State of current attached segments int nap Number of currently attached segments char addr address work variable register int i work area register struct state p ptr to current state entry p = ampap[nap++]p-gtshmid = p-gtshmaddr = p-gtshmflg = p-gtshmaddr = shmat(p-gtshmid p-gtshmaddr p-gtshmflg)if(p-gtshmaddr == (char )-1) perror(shmop shmat failed) nap-- else (void) fprintf(stderr shmop shmat returned 88xnp-gtshmaddr) i = shmdt(addr)if(i == -1) perror(shmop shmdt failed) else (void) fprintf(stderr shmop shmdt returned dn i)for (p = ap i = nap i-- p++) if (p-gtshmaddr == addr) p = ap[--nap] Algorithm

1 Start2 create shared memory using shmget( ) system call3 if success full it returns positive value4 attach the created shared memory using shmat( ) systemcall

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 53

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

5 write to shared memory using shmsnd( ) system call6 read the contents from shared memory using shmrcv( )system call7 End

Source Codeincludeltstdiohgtincludeltstdlibhgtincludeltsysipchgtincludeltsystypeshgtincludeltstringhgtincludeltsysshmhgtdefine shm_size 1024int main(int argcchar argv[])key_t keyint shmidchar dataint modeif(argcgt2)fprintf(stderrrdquousagestdemo[data_to_writte]nrdquo)exit(1)if((shmid=shmget(keyshm_size0644ipc_creat))==-1)perror(ldquoshmgetrdquo)exit(1)data=shmat(shmid(void )00)if(data==(char )(-1))perror(ldquoshmatrdquo)exit(1)if(argc==2)printf(writing to segmentrdquosrdquordquonrdquodata)if(shmdt(data)==-1)perror(ldquoshmdtrdquo)exit(1)

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 54

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

return 0

Inputaout koteswararao

Outputwriting to segment koteswararao

Data Mining Lab

Credit Risk Assessment

Description The business of banks is making loans Assessing the credit worthiness of an applicant is of crucial importance You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad A bankrsquos business rules regarding loans must consider two opposing factors On the one hand a bank wants to make as many loans as possible Interest on these loans is the banrsquos profit source On the other hand a bank cannot afford to make too many bad loans Too many bad loans could lead to the collapse of the bank The bankrsquos loan policy must involve a compromise not too strict and not too lenient

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 55

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules Here is one such dataset ( original) Excel spreadsheet version of the German credit data (download from web)

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant The goal is the classify the applicant into one of two categories good or bad

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 56

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as Let be a set of n binary attributes called items Let be a set of transactions called the database Each transaction in D has a unique transaction ID and contains a subset of the items in I A rule is defined as an implication of the form X=gtY where

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 57

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

XY C I and X Π Y=Φ The sets of items (for short itemsets) X and Y are called antecedent (left hand side or LHS) and consequent (righthandside or RHS) of the rule respectively

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = milkbreadbutterbeer and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right An example rule for the supermarket could be meaning that if milk and bread is bought customers also buy butter

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules constraints on various measures of significance and interest can be used The bestknown constraints are minimum thresholds on support and confidence The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset In the example database the itemset milkbread has a support of 2 5 = 04 since it occurs in 40 of all transactions (2 out of 5 transactions)

The confidence of a rule is defined For example the rule has a confidence of 02 04 = 05 in the database which means that for 50 of the transactions containing milk and bread the rule is correct Confidence can be interpreted as an estimate of the probability P(Y | X) the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk Lk = I1 I2 hellip Ik association rules with this itemsets are generated in the following way the first rule is I1 I2 hellip Ik1 and Ik by checking the confidence this rule can be determined as interesting or not Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent further the confidences of the new rules are checked to determine the interestingness of them Those processes iterated until the antecedent becomes empty Since the second subproblem is quite straight forward most of the researches focus on the first subproblem The Apriori algorithm finds the frequent sets L In Database D

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 58

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

middot Find frequent set Lk minus 1

middot Join Step

o Ck is generated by joining Lk minus 1with itself

middot Prune Step

o Any (k minus 1) itemset that is not frequent cannot be a subset of a

frequent k itemset hence should be removed

Where middot (Ck Candidate itemset of size k)

middot (Lk frequent itemset of size k)

Apriori Pseudocode

Apriori (Tpound)

Llt Large 1itemsets that appear in more than transactions

Klt2

while L(k1)ne Φ

C(k)ltGenerate( Lk minus 1)

for transactions t euro T

C(t)Subset(Ckt)

for candidates c euro C(t)

count[c]ltcount[ c]+1

L(k)lt c euro C(k)| count[c] ge pound

KltK+ 1

return Ụ L(k) k

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 59

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

3) Load ldquoBankcsvrdquo in Weka by Open file in Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select Apriori algorithm from ldquoChoose ldquo button present in Associator

wekaassociationsApriori -N 10 -T 0 -C 09 -D 005 -U 10 -M 01 -S -10 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 60

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.

3) Select the Preprocess tab.

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button.

7) Select trees, then J48.

8) Select Test options "Use training set".

9) If needed, select attributes.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j != i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci)

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?" by evaluating the model on a supplied test set.

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of misclassifications: for class YES, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Trees".

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute had any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select, from the attributes list, those attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Trees".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes had any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Select Test options "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among the attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree by using prune mode and reduced-error pruning, and show the accuracy for a cross-validation trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision tree's accuracy over the validation set.

– Stop when further pruning decreases the decision tree's accuracy over the validation set.

An example of a rule extracted from a pruned tree:

IF (children = yes) ∧ (income > 30000)

THEN (car = yes)

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select J48.

10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


/* 21: send the contents of file "fact" on the queue */
printf("\n Queue id: %d", mid);
mesg = (struct mesg *)malloc(sizeof(struct mesg));
mesg->type = 6;
fd = open("fact", O_RDONLY);
while (read(fd, buff, 25) > 0) {
    strcpy(mesg->mtext, buff);
    if (msgsnd(mid, mesg, strlen(mesg->mtext), 0) == -1)
        printf("\n Message Write Error");
}

/* 22: receive the messages back from the queue */
if ((mid = msgget(1006, 0)) < 0) {
    printf("\n Can't create Message Q");
    exit(1);
}
while ((n = msgrcv(mid, &mesg, MAX, 6, IPC_NOWAIT)) > 0) {
    write(1, mesg.mtext, n);
    count++;
}
if ((n == -1) && (count == 0))
    printf("\n No Message Queue on Queue: %d", mid);

22 Write a C program that receives the messages (from the above message queue as specified in (21)) and displays them

Aim To create a message queue

DESCRIPTION

Message passing between processes is part of the operating system and is done through a message queue, where messages are stored in the kernel and are associated with a message queue identifier ("msqid"). Processes read and write messages to an arbitrary queue, in such a way that one process writes a message to a queue and exits, and another process reads it at a later time.

ALGORITHM

Before defining the structure, the ipc_perm structure should be available, which is done by including the following files:


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permission */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msg send */
    ushort msg_lrpid;           /* pid of last msg recvd */
    time_t msg_stime;           /* time of last msg snd */
    time_t msg_rtime;           /* time of last msg rcv */
    time_t msg_ctime;           /* time of last msg ctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used.

Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val   Symb value    Description
0400      MSG_R         Read by owner
0200      MSG_W         Write by owner
0040      MSG_R >> 3    Read by group
0020      MSG_W >> 3    Write by group

msgget returns the msqid, or -1 on error.

1. To put a message on a queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send, a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.

flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can have MSG_NOERROR: without it, an error is returned if length is not large enough to receive the message; with it, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on messages can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID is given in cmd to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");

    client(readid, writeid);

    /* delete message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, the program must declare union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: semget, semctl

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                       /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;   /* wait on the semaphore */
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                      /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;   /* signal the semaphore */
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {   /* occasionally yield */
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(struct sockaddr_un);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {
            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective user ID of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator, or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr) *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to shared memory through the pointer returned by shmat().
6. Read the contents from shared memory through the same pointer.
7. End.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    /* the original listing leaves key uninitialized; derive one here */
    key = ftok(".", 'R');

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);
        printf("writing to segment: \"%s\"\n", data);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where

X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer} and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; the confidences of the new rules are then checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.


· Find frequent set Lk−1.

· Join Step: Ck is generated by joining Lk−1 with itself.

· Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select Apriori algorithm from ldquoChoose ldquo button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select tree J48

9) Select Test options ldquoUse training setrdquo


10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C|F1,…,Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem we write P(C|F1,…,Fn) = p(C) p(F1,…,Fn|C) / p(F1,…,Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C,F1,…,Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C,F1,…,Fn) = p(C) p(F1,…,Fn|C) = p(C) p(F1|C) p(F2,…,Fn|C,F1)

= p(C) p(F1|C) p(F2|C,F1) p(F3,…,Fn|C,F1,F2)

= p(C) p(F1|C) p(F2|C,F1) p(F3|C,F1,F2) ⋯ p(Fn|C,F1,F2,…,Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C,Fj) = p(Fi|C),

and so the joint model can be expressed as p(C,F1,…,Fn) = p(C) p(F1|C) p(F2|C) ⋯

= p(C) ∏i p(Fi|C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1,…,Fn) = (1/Z) p(C) ∏i p(Fi|C)

where Z (the evidence) is a scaling factor dependent only on F1,…,Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples.

  – Each tuple is an 'n'-dimensional attribute vector.

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm.

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.

• Maximum posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant.

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options ldquoUse training setrdquo

10) If needed, select attributes.

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554               92.3333 %

Incorrectly Classified Instances       46                7.6667 %

Kappa statistic                         0.845

Mean absolute error                     0.1389

Root mean squared error                 0.2636

Relative absolute error                27.9979 %

Root relative squared error            52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES


0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.  0.923    0.081    0.924    0.923   0.923    0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by trying different problem solutions during practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set "Folds", e.g., 10

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539               89.8333 %

Incorrectly Classified Instances       61               10.1667 %

Kappa statistic                         0.7942

Mean absolute error                     0.167

Root mean squared error                 0.305

Relative absolute error                33.6511 %

Root relative squared error            61.2344 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911      0.861   0.886    0.883     YES

0.929    0.139    0.889      0.929   0.909    0.883     NO

Weighted Avg.  0.898    0.108    0.899    0.898   0.898    0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the newly saved file (the file with the attribute deleted)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) The output details appear in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select the attributes in the attributes list that are to be removed, and remove them. After this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation training of the data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) The output details appear in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) The output details appear in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and the relationships among the attributes that we want to study. It can be decided based on the database and the user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruning and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  · Removing the sub-tree rooted at the pruned node

  · Making the pruned node a leaf node

  · Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  · Always select a node whose removal most increases the DT accuracy over the validation set

  · Stop when further pruning decreases the DT accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)
THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) The output details appear in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


#include <sys/types.h>
#include <sys/ipc.h>

A structure of information is maintained by the kernel; it contains the following:

struct msqid_ds {
    struct ipc_perm msg_perm;   /* operation permissions */
    struct msg *msg_first;      /* ptr to first msg on queue */
    struct msg *msg_last;       /* ptr to last msg on queue */
    ushort msg_cbytes;          /* current bytes on queue */
    ushort msg_qnum;            /* current no. of msgs on queue */
    ushort msg_qbytes;          /* max no. of bytes on queue */
    ushort msg_lspid;           /* pid of last msgsnd */
    ushort msg_lrpid;           /* pid of last msgrcv */
    time_t msg_stime;           /* time of last msgsnd */
    time_t msg_rtime;           /* time of last msgrcv */
    time_t msg_ctime;           /* time of last msgctl */
};

To create a new message queue, or to access an existing message queue, the msgget() function is used.

Syntax: int msgget(key_t key, int msgflag);

Msg flag values:

Num val    Symbolic value    Description
0400       MSG_R             Read by owner
0200       MSG_W             Write by owner
0040       MSG_R >> 3        Read by group
0020       MSG_W >> 3        Write by group

msgget() returns the msqid, or -1 on error.

1. To put a message on the queue, the msgsnd() function is used.

Syntax: int msgsnd(int msqid, struct msgbuf *ptr, int length, int flag);

msqid is the message queue id, a unique id. msgbuf is the actual content to send: a pointer to a structure which contains the following:

struct msgbuf {
    long mtype;      /* message type, > 0 */
    char mtext[1];   /* data */
};

length is the size of the message in bytes.

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 38

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

flag can be IPC_NOWAIT, which allows the system call to return immediately when there is no room on the queue; when this is specified, msgsnd will return -1 if there is no room on the queue. Otherwise flag can be specified as 0.

2. To receive a message, the msgrcv() function is used.

Syntax: int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

ptr is a pointer to the structure where the received message is to be stored. length is the size to be received and stored in the pointer area. flag can have MSG_NOERROR: if the data portion of the message is greater than length, the message is truncated and the call returns; without this flag, an error is returned if length is not large enough to receive the message.

3. A variety of control operations on a message queue can be done through the msgctl() function.

Syntax: int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system.

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 1");

    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("Server can't get message queue 2");

    server(readid, writeid);
    exit(0);
}

Client process:

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("Client can't access msgget message queue 1");

    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("Client can't msgget message queue 2");

    client(readid, writeid);

    /* delete msg queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 1");

    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *)0) < 0)
        err_sys("Client can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, union semun must be defined by the application */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                      /* resource initially free */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created and initialized (semget, semctl)

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                        /* child: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;    /* wait until the producer posts */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("consumer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                       /* parent: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("producer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;     /* post: allow one consume */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment's ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                     /* command code for shmctl() */
int shmid;                   /* segment ID */
int rtrn;                    /* return value */
struct shmid_ds shmid_ds;    /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {        /* internal record of attached segments */
    int shmid;               /* shmid of attached segment */
    char *shmaddr;           /* attach point */
    int shmflg;              /* flags used on attach */
} ap[MAXnap];                /* state of current attached segments */
int nap;                     /* number of currently attached segments */

char *addr;                  /* address work variable */
register int i;              /* work area */
register struct state *p;    /* ptr to current state entry */

/* attach */
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

/* detach */
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to the shared memory through the attached address
6. Read the contents from the shared memory through the attached address
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 5678;    /* any agreed-on key */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find frequent set Lk−1

• Join step: Ck is generated by joining Lk−1 with itself

• Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" into Weka via Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse to the file "bank.csv" that is already stored in the system

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) Select trees > J48

9) Select the Test option "Use training set"


10) If needed, select the class attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools/Apparatus: Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed as

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naiumlve Bayes Classifier Derivation

• D: a set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

Naiumlve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏(k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) Go to Classify tab


7) Choose classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select the Test option "Use training set"

10) If needed, select the class attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %
Incorrectly Classified Instances     46       7.6667 %
Kappa statistic                    0.845
Mean absolute error                0.1389
Root mean squared error            0.2636
Relative absolute error           27.9979 %
Root relative squared error       52.9137 %
Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894  0.052  0.935  0.894  0.914  0.936  YES


0.948  0.106  0.914  0.948  0.931  0.936  NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

The output varies with the chosen test set and can be observed while practicing with different problem sets.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation training of the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system


6) Go to the Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select the Test option "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %
Incorrectly Classified Instances     61      10.1667 %
Kappa statistic                    0.7942
Mean absolute error                0.167
Root mean squared error            0.305
Relative absolute error           33.6511 %
Root relative squared error       61.2344 %
Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861  0.071  0.911  0.861  0.886  0.883  YES

0.929  0.139  0.889  0.929  0.909  0.883  NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: Delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the save button in the top panel


13) Go to Open file and browse to the newly saved (attribute-deleted) file

14) Go to the Classify tab

15) Choose classifier "trees"

16) Select the J48 tree

17) Select the Test option "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-8

Aim: Select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) Select from the attributes list some of the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose classifier "trees"

9) Select J48

10) Select the Test option "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation training of the data set, changing the cost matrix in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select the Test option "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select the Test option "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for a cross-validation-trained data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income ≥ 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree


10) Select the Test option "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "trees"

9) Select "J48"

10) Select the Test option "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose classifier "rules"

9) Select "OneR"

10) Select the Test option "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file "bank.csv" that is already stored in the system

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose classifier "rules"

9) Select "PART"

10) Select the Test option "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (error = 0, coverage = 7 instances)

IF accounting = 0 THEN class = B (error = 4/13, coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


The flag IPC_NOWAIT allows the system call to return immediately when there is no room on the queue; when it is specified, msgsnd returns -1 if there is no room on the queue. Otherwise, flag can be specified as 0.

2. To receive a message, the msgrcv() function is used. Syntax:

int msgrcv(int msqid, struct msgbuf *ptr, int length, long msgtype, int flag);

Here ptr is a pointer to the structure where the received message is to be stored, and length is the size to be received and stored in the pointer area. If flag does not include MSG_NOERROR, an error is returned when length is not large enough to receive the message; with MSG_NOERROR, if the data portion is greater than length, the message is truncated and returned.

3. A variety of control operations on a message queue can be done through the msgctl() function:

int msgctl(int msqid, int cmd, struct msqid_ds *buff);

IPC_RMID in cmd is given to remove a message queue from the system

Let us create a header file msgq.h with the following in it:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/errno.h>

extern int errno;

#define MKEY1 1234L
#define MKEY2 2345L
#define PERMS 0666

Server operation algorithm:

#include "msgq.h"

main()
{
    int readid, writeid;

    if ((readid = msgget(MKEY1, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 1");
    if ((writeid = msgget(MKEY2, PERMS | IPC_CREAT)) < 0)
        err_sys("server: can't get message queue 2");


    server(readid, writeid);
    exit(0);
}

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>


#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux the caller must define union semun */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: on success, the program exits silently after creating and initializing the semaphore; on failure, perror prints ftok, semget, or semctl.

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    pid_t child_pid;
    int i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child process: producer */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default: /* parent process: consumer */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26. Write client and server programs (using C) for interaction between server and client processes using Unix domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(address);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                  /* command code for shmctl() */
int shmid;                /* segment ID */
struct shmid_ds shmid_ds; /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {       /* Internal record of attached segments. */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* State of current attached segments. */
int nap;                    /* Number of currently attached segments. */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the shared memory id).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory segment.
6. Read the contents from the shared memory segment.
7. End.

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok(".", 'R');   /* derive an IPC key from the current directory */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible: interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
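Written out as formulas, the two measures used above are:

```latex
\mathrm{supp}(X) \;=\; \frac{|\{t \in D : X \subseteq t\}|}{|D|},
\qquad
\mathrm{conf}(X \Rightarrow Y) \;=\; \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```

With the example numbers, supp({milk, bread, butter}) = 1/5 = 0.2 and supp({milk, bread}) = 2/5 = 0.4, giving a confidence of 0.2 / 0.4 = 0.5 for the rule.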

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way. The first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

o Ck is generated by joining Lk−1 with itself.

· Prune Step:

o Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where · (Ck: candidate itemset of size k)

· (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button,

7) and select trees > J48

9) Select Test options: "Use training set"


10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = [p(C) p(F1, …, Fn | C)] / p(F1, …, Fn)

In plain English the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)
= p(C) p(F1, …, Fn | C)
= p(C) p(F1 | C) p(F2, …, Fn | C, F1)
= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
= …
= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
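The 2n + 1 count in the last sentence is just the stated formula evaluated at the common special case:

```latex
(k-1) + n\,r\,k \;\xrightarrow{\;k=2,\ r=1\;}\; (2-1) + n \cdot 1 \cdot 2 \;=\; 2n + 1
```

That is, one parameter for the class prior plus two Bernoulli parameters per feature (one per class).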

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence"

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO
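The summary accuracy can be recomputed directly from this confusion matrix: the diagonal entries are the correctly classified instances, so

```latex
\mathrm{accuracy} \;=\; \frac{245 + 309}{600} \;=\; \frac{554}{600} \;\approx\; 0.9233
```

which matches the 92.3333 % reported in the summary above.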

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29, and the false negatives are 17, in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure that the accuracy of the model holds up.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to the Classify tab

15) Choose Classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list those attributes that are to be removed, and remove them. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a decision tree by cross-validating the training data set after changing the cost matrix, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with those of Experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes that we want to study. It can be assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy on the cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision tree's accuracy over the validation set

  - Stop when further pruning decreases the decision tree's accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e., the Naive Bayes tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

server(readid, writeid);
exit(0);

Client process

#include "msgq.h"

main()
{
    int readid, writeid;

    /* open the queues which the server has already created */
    if ((writeid = msgget(MKEY1, 0)) < 0)
        err_sys("client: can't msgget message queue 1");
    if ((readid = msgget(MKEY2, 0)) < 0)
        err_sys("client: can't msgget message queue 2");

    client(readid, writeid);

    /* delete the message queues */
    if (msgctl(readid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 1");
    if (msgctl(writeid, IPC_RMID, (struct msqid_ds *) 0) < 0)
        err_sys("client: can't RMID message queue 2");

    exit(0);
}

Week 8

23 Write a C program to allow cooperating processes to lock a resource for exclusive use using a) Semaphores b) flock or lockf system calls

PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux, the program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;                              /* initialise semaphore 0 to 1 */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

Output: on success the program creates and initialises the semaphore set and exits silently; on failure, perror reports "ftok", "semget" or "semctl"

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting Ctrl+Z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start.
2. Create a semaphore using the semget() system call.
3. If successful, it returns a positive value.
4. Create two new processes.
5. The first process will produce.
6. Until the first process produces, the second process cannot consume.
7. End.

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                                   /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;               /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                                  /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;                /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created, semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using C) for interaction between server and client processes using Unix domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* size of sun_path; not defined by all libcs */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* size of sun_path; not defined by all libcs */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before modifying the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective user ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of currently attached segments */
int nap;                   /* number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;

p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached pointer.
6. Read the contents from the shared memory segment.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = IPC_PRIVATE;   /* note: the original listing leaves key uninitialised;
                                  a real program would obtain it with ftok() */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2)
        printf("writing to segment: \"%s\"\n", data);
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and the consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most research focuses on the first subproblem. The Apriori algorithm finds the frequent itemsets L in database D.
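The rule-generation loop just described can be sketched in Python. The database and minimum confidence here are illustrative (the supermarket example), not part of the manual's Weka procedure:

```python
def rules_from_itemset(items, db, min_conf):
    """Generate rules from one large itemset by repeatedly moving the
    last antecedent item into the consequent, as described above."""
    supp = lambda s: sum(set(s) <= t for t in db) / len(db)
    out = []
    ante, cons = list(items), []
    while len(ante) > 1:               # stop before the antecedent empties
        cons.insert(0, ante.pop())     # delete last item, prepend to consequent
        conf = supp(ante + cons) / supp(ante)
        if conf >= min_conf:           # keep only "interesting" rules
            out.append((tuple(ante), tuple(cons), conf))
    return out

db = [{"milk", "bread"}, {"bread", "butter"}, {"beer"},
      {"milk", "bread", "butter"}, {"bread", "butter"}]
rules = rules_from_itemset(["milk", "bread", "butter"], db, 0.5)
```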

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 58

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

· Find frequent set Lk−1

· Join step

  o Ck is generated by joining Lk−1 with itself

· Prune step

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori (T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
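The pseudocode can also be rendered as a runnable sketch. This is an illustrative Python implementation of the level-wise join/prune scheme, not Weka's implementation:

```python
from itertools import combinations

def apriori(db, min_support):
    """Minimal Apriori sketch: level-wise generation of frequent itemsets."""
    n = len(db)
    supp = lambda s: sum(s <= t for t in db) / n
    items = sorted({i for t in db for i in t})
    # L1: frequent 1-itemsets
    level = [frozenset([i]) for i in items if supp(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        # Join step: unions of (k-1)-itemsets that yield k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if supp(c) >= min_support]
        frequent += level
        k += 1
    return frequent

db = [{"milk", "bread"}, {"bread", "butter"}, {"beer"},
      {"milk", "bread", "butter"}, {"bread", "butter"}]
result = apriori(db, 0.4)
```

With minimum support 0.4, the frequent itemsets are the three frequent single items plus {milk, bread} and {bread, butter}.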

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka using Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.
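As a hedged sketch, the two-way split described above might look like the following in Python. The 70/30 ratio, seed, and record count are illustrative assumptions, not values from the manual:

```python
import random

def split(records, test_fraction=0.3, seed=42):
    """Shuffle and divide records into a training set and a testing set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# e.g. 600 cases, as in the bank data used later in this manual
train_set, test_set = split(range(600))
print(len(train_set), len(test_set))  # 420 180
```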

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) and select trees > J48

9) Select Test options ldquoUse training setrdquo


10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) Right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the decision tree model created above, i.e. testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

P(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1|C) p(F2, ..., Fn | C, F1)

= p(C) p(F1|C) p(F2|C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ... p(Fn|C, F1, F2, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1|C) p(F2|C) ... = p(C) Π p(Fi|C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1, ..., Fn) = (1/Z) p(C) Π p(Fi|C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes and if a model for each p(Fi|C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X = (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum posteriori hypothesis

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

  P(X|Ci) = Π (k = 1 to n) P(xk|Ci)

  = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
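To make the product rule concrete, here is a toy naive Bayes classifier over categorical features in Python. The weather-style data is purely illustrative, not from the bank data set:

```python
from collections import Counter, defaultdict

def train(rows, labels):
    classes = Counter(labels)                 # class counts -> priors P(Ci)
    cond = defaultdict(Counter)               # (feature index, class) -> value counts
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            cond[(k, c)][v] += 1

    def predict(x):
        def score(c):
            p = classes[c] / len(labels)            # prior P(Ci)
            for k, v in enumerate(x):
                p *= cond[(k, c)][v] / classes[c]   # likelihood P(xk|Ci)
            return p
        return max(classes, key=score)              # argmax P(X|Ci) P(Ci)

    return predict

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
predict = train(rows, labels)
print(predict(("rainy", "mild")))   # -> yes
```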

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier ldquoTreerdquo

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false positives are 17 and the false negatives are 29 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down with unknown data or when future data is applied to it.
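The Experiment 4 summary figures can be recomputed by hand from its confusion matrix, which is a useful sanity check on how the numbers relate:

```python
# Confusion matrix from the Experiment 4 output (rows = actual class).
yes_yes, yes_no = 245, 29    # actual YES predicted as YES / as NO
no_yes, no_no = 17, 309      # actual NO predicted as YES / as NO

total = yes_yes + yes_no + no_yes + no_no
accuracy = (yes_yes + no_no) / total            # Correctly Classified Instances
precision_yes = yes_yes / (yes_yes + no_yes)    # Precision for YES (0.935)
recall_yes = yes_yes / (yes_yes + yes_no)       # TP Rate for YES (0.894)

print(round(accuracy, 4), round(precision_yes, 3), round(recall_yes, 3))
# 0.9233 0.935 0.894
```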

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
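The recursive partitioning step above hinges on choosing the attribute whose test best separates the classes. A minimal entropy-based split chooser in Python (the toy data is hypothetical, for illustration only):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Pick the attribute index whose value test most reduces entropy."""
    base, n = entropy(labels), len(labels)
    best = None
    for k in range(len(rows[0])):
        by_val = {}
        for x, y in zip(rows, labels):
            by_val.setdefault(x[k], []).append(y)
        # remaining entropy after splitting on attribute k
        rem = sum(len(ys) / n * entropy(ys) for ys in by_val.values())
        gain = base - rem
        if best is None or gain > best[0]:
            best = (gain, k)
    return best[1]

rows = [("yes", "high"), ("yes", "low"), ("no", "high"), ("no", "low")]
labels = ["buy", "buy", "skip", "skip"]
print(best_split(rows, labels))  # attribute 0 separates the classes perfectly
```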

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) Right click on the result list and select the "visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) Right click on the result list and select the "visualize tree" option

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased


17)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output


16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and user requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select "NBTree", i.e. the naive Bayesian tree


10) Select Test options ldquoUse training setrdquo

11) Right click on the text box beside the Choose button and select show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) Right click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) Right click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting=1 THEN class=A (Error=0, Coverage = 7 instances)

IF accounting=0 THEN class=B (Error=4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux, union semun must be declared by the program */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void)
{
    key_t key;
    int semid;
    union semun arg;

    if ((key = ftok("semdemo.c", 'j')) == -1) {
        perror("ftok");
        exit(1);
    }
    if ((semid = semget(key, 1, 0666 | IPC_CREAT)) == -1) {
        perror("semget");
        exit(1);
    }
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl");
        exit(1);
    }
    return 0;
}

OUTPUT: the semaphore set is created with semget() and initialized with semctl()

24 Write a C program that illustrates suspending and resuming processes using signals

#include <sys/types.h>
#include <signal.h>

/* suspend the process (same as hitting ctrl+z) */
kill(pid, SIGSTOP);

/* continue the process */
kill(pid, SIGCONT);

Week 9


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>

#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    /* create a private semaphore set with 2 semaphores */
    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);

    child_pid = fork();
    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                               /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;           /* wait for the semaphore */
            sem_op.sem_op = -1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default:                              /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;           /* signal the semaphore */
            sem_op.sem_op = 1;
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created


semaphore set id '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26 Write client and server programs (using c) for interaction between server and client processes using Unix Domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX 108   /* not defined by <sys/un.h> on all systems */
#endif

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK -- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>.

The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {       /* Internal record of attached segments */
    int shmid;              /* shmid of attached segment */
    char *shmaddr;          /* attach point */
    int shmflg;             /* flags used on attach */
} ap[MAXnap];               /* State of current attached segments */
int nap;                    /* Number of currently attached segments */
...
char *addr;                 /* address work variable */
register int i;             /* work area */
register struct state *p;   /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If successful, it returns a positive segment identifier (shmid).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents from the shared memory through the same pointer.
7. Detach the segment using shmdt() and end.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: a.out [data_to_write]\n");
        exit(1);
    }

    key = ftok(".", 'R');   /* derive an IPC key so key is initialized */

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);
        printf("writing to segment: \"%s\"\n", data);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible: interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: let I = {i1, i2, ..., in} be a set of n binary attributes called items, and let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X => Y is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

The goal of association rule mining is to find association rules that satisfy the predefined minimum support and confidence for a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


- Find frequent set L(k-1).

- Join step: C(k) is generated by joining L(k-1) with itself.

- Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where

- C(k): candidate itemset of size k

- L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, minsup)

    L1 <- {large 1-itemsets that appear in more than minsup transactions}

    k <- 2

    while L(k-1) != {}

        C(k) <- Generate(L(k-1))

        for transactions t in T

            C(t) <- Subset(C(k), t)

            for candidates c in C(t)

                count[c] <- count[c] + 1

        L(k) <- {c in C(k) | count[c] >= minsup}

        k <- k + 1

    return union over k of L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka by Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select tree J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

                  = p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

                  = p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

                  = p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ...

                  = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

- P(h): prior probability of hypothesis h

- P(D): prior probability of training data D

- P(h|D): probability of h given D

- P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

- D: a set of tuples

  - Each tuple is an 'n'-dimensional attribute vector

  - X: (x1, x2, x3, ..., xn)

- Let there be 'm' classes: C1, C2, C3, ..., Cm

- The NB classifier predicts that X belongs to class Ci iff

  - P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i

- Maximum a posteriori hypothesis:

  - P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  - Maximize P(X|Ci) P(Ci), as P(X) is constant

- With many attributes, it is computationally expensive to evaluate P(X|Ci)

- Naïve assumption of "class conditional independence":

  P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

          = P(x1|Ci) P(x2|Ci) ... P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the Naïve Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894  0.052  0.935  0.894  0.914  0.936  YES


0.948  0.106  0.914  0.948  0.931  0.936  NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Now start Weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861  0.071  0.911  0.861  0.886  0.883  YES

0.929  0.139  0.889  0.929  0.909  0.883  NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the newly saved file (the attribute-deleted file)
14) Go to the Classify tab
15) Choose the classifier under "Trees"
16) Select J48
17) Under Test options, select "Use training set"
18) If needed, select the class attribute
19) Click Start
20) Now we can see the output details in the Classifier output
21) Right-click on the result list and select the "Visualize tree" option
22) Compare the output results with those of the 4th experiment
23) Check whether the accuracy increased or decreased
24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: Select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Select from the attributes list those attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel
7) Then go to the Classify tab
8) Choose the classifier under "Trees"
9) Select J48
10) Under Test options, select "Use training set"
11) If needed, select the class attribute
12) Click Start
13) Now we can see the output details in the Classifier output
14) Right-click on the result list and select the "Visualize tree" option
15) Compare the output results with those of the 4th experiment
16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation of the training data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Go to the Classify tab
7) Choose the classifier under "Trees"
8) Select J48
9) Under Test options, select "Use training set"
10) Click on "More options"
11) Select cost-sensitive evaluation and click on the Set button
12) Set the matrix values and click on Resize. Then close the window
13) Click OK
14) Click Start
15) We can see the output details in the Classifier output


16) Under Test options, select "Cross-validation"
17) Set "Folds", e.g. 10
18) If needed, select the class attribute
19) Click Start
20) Now we can see the output details in the Classifier output
21) Compare the results of steps 15 and 20
22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better than a long rule? Check the bias by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes that we want to study. It is assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning
• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set
• Pruning a node consists of:
  - Removing the sub-tree rooted at the pruned node
  - Making the pruned node a leaf node
  - Assigning the pruned node the most common classification of the training instances attached to that node
• Pruning nodes iteratively:
  - Always select a node whose removal most increases the decision-tree accuracy over the validation set
  - Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Select some of the attributes from the attributes list
7) Go to the Classify tab
8) Choose the classifier under "Trees"
9) Select "NBTree", i.e., the Naive Bayesian tree


10) Under Test options, select "Use training set"
11) Right-click on the text box beside the Choose button and select Show properties
12) Now change the unpruned property from "false" to "true"
13) Change the reducedErrorPruning property as needed
14) If needed, select the class attribute
15) Click Start
16) Now we can see the output details in the Classifier output
17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Select some of the attributes from the attributes list
7) Go to the Classify tab
8) Choose the classifier under "Trees/Rules"
9) Select "J48"
10) Under Test options, select "Use training set"
11) If needed, select the class attribute
12) Click Start
13) Now we can see the output details in the Classifier output
14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining
2) Use the Weka GUI Chooser
3) Select EXPLORER present in Applications
4) Select Preprocess Tab
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Select some of the attributes from the attributes list
7) Go to the Classify tab
8) Choose the classifier under "Rules"
9) Select "OneR"
10) Under Test options, select "Use training set"
11) If needed, select the class attribute
12) Click Start
13) Now we can see the output details in the Classifier output

Procedure for "PART":

1) Given the Bank database for mining
2) Use the Weka GUI Chooser
3) Select EXPLORER present in Applications
4) Select Preprocess Tab
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"
6) Select some of the attributes from the attributes list
7) Go to the Classify tab


8) Choose the classifier under "Rules"
9) Select "PART"
10) Under Test options, select "Use training set"
11) If needed, select the class attribute
12) Click Start
13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


25 Write a C program that implements a producer-consumer system with two processes (using Semaphores)

Algorithm

1. Start
2. Create a semaphore set using the semget() system call
3. If successful, it returns a positive value
4. Create two new processes
5. The first process will produce
6. Until the first process produces, the second process cannot consume
7. End

Source code

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>
#include <time.h>
#define NUM_LOOPS 2

int main(int argc, char *argv[])
{
    int sem_set_id;
    int child_pid, i;
    struct sembuf sem_op;
    struct timespec delay;

    sem_set_id = semget(IPC_PRIVATE, 2, 0600);
    if (sem_set_id == -1) {
        perror("main: semget");
        exit(1);
    }
    printf("semaphore set created, semaphore set id '%d'\n", sem_set_id);
    child_pid = fork();


    switch (child_pid) {
    case -1:
        perror("fork");
        exit(1);
    case 0:  /* child process */
        for (i = 0; i < NUM_LOOPS; i++) {
            sem_op.sem_num = 0;
            sem_op.sem_op = -1;    /* wait on the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            printf("producer: '%d'\n", i);
            fflush(stdout);
        }
        break;
    default: /* parent process */
        for (i = 0; i < NUM_LOOPS; i++) {
            printf("consumer: '%d'\n", i);
            fflush(stdout);
            sem_op.sem_num = 0;
            sem_op.sem_op = 1;     /* signal the semaphore */
            sem_op.sem_flg = 0;
            semop(sem_set_id, &sem_op, 1);
            if (rand() > 3 * (RAND_MAX / 4)) {
                delay.tv_sec = 0;
                delay.tv_nsec = 10;
                nanosleep(&delay, NULL);
            }
        }
        break;
    }
    return 0;
}

Output:

semaphore set created


semaphore set id '327690'
producer '0'
consumer '0'
producer '1'
consumer '1'

26. Write client and server programs (using C) for interaction between server and client processes using Unix domain sockets

Serverc

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address,
             sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address,
                                   &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address,
                sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets

Serverc

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;    /* key to be passed to shmget() */
int shmflg;   /* shmflg to be passed to shmget() */
int shmid;    /* return value from shmget() */
int size;     /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to the shared memory segment through the address returned by shmat()
6. Read the contents from the shared memory segment
7. End

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok(".", 'R');   /* derive an IPC key (the original listing left key unset) */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strcpy(data, argv[1]);   /* write the argument into the segment */
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find frequent set Lk-1

• Join step: Ck is generated by joining Lk-1 with itself

• Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k)

Apriori pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k-1) ≠ ∅
        C(k) ← Generate(L(k-1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab
4) Select only nominal values
5) Go to the Associate tab
6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button
8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is named J48 in its Java implementation and can be selected by clicking the Choose button

7) Select trees -> J48

8) Select Test options "Use training set"

9) If needed, select attributes

10) Click Start

11) Now we can see the output details in the Classifier output

12) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the decision tree model created above, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C|F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C|F1, ..., Fn) = p(C) p(F1, ..., Fn|C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

Posterior = (Prior × Likelihood) / Evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn|C)
                  = p(C) p(F1|C) p(F2, ..., Fn|C, F1)
                  = p(C) p(F1|C) p(F2|C, F1) p(F3, ..., Fn|C, F1, F2)
                  = ...
                  = p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ... p(Fn|C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j != i.

This means that p(Fi|C, Fj) = p(Fi|C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1|C) p(F2|C) ...
                  = p(C) ∏i p(Fi|C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi|C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes, and if a model for each p(Fi|C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples
  – Each tuple is an 'n'-dimensional attribute vector
  – X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff
  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i

• Maximum a posteriori hypothesis:
  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)
  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
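The product formula above can be turned into a few lines of C. The sketch below uses hypothetical counts (the class_count and counts tables are made-up toy data, not drawn from the bank data set, and nb_score is an illustrative name) to score P(X|Ci) P(Ci) for two binary features:

```c
/* counts[c][k][v] = number of training cases of class c whose feature k
 * has value v; class_count[c] = number of cases of class c.
 * Class 0 = YES, class 1 = NO (hypothetical toy data). */
static const double class_count[2] = {4, 4};
static const double counts[2][2][2] = {
    {{1, 3}, {2, 2}},   /* class YES */
    {{3, 1}, {2, 2}},   /* class NO  */
};

/* Score P(X|Ci) P(Ci) using class conditional independence:
 * P(X|Ci) = product over features k of P(xk|Ci). */
double nb_score(int c, const int x[2])
{
    double total = class_count[0] + class_count[1];
    double score = class_count[c] / total;            /* prior P(Ci) */
    for (int k = 0; k < 2; k++)
        score *= counts[c][k][x[k]] / class_count[c]; /* P(xk|Ci) */
    return score;
}
```

The predicted class is simply the one with the larger score, exactly as in the "P(Ci|X) > P(Cj|X)" rule above; the evidence P(X) is never computed because it is the same for every class.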

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554      92.3333 %

Incorrectly Classified Instances       46       7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894    0.052    0.935    0.894    0.914    0.936    YES


0.929    0.106    0.914    0.948    0.931    0.936    NO

Weighted Avg.    0.923    0.081    0.924    0.923    0.923    0.936

=== Confusion Matrix ===

a    b    <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button

2) Click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced through the different problem solutions encountered while practicing.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
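The attribute-value test described above can be illustrated with a one-attribute "stump", the smallest possible partitioning step. The following sketch (the function name best_split and the data are illustrative, not from the manual) scans every observed value of a numeric attribute as a candidate threshold and keeps the one with the fewest misclassifications:

```c
#include <stddef.h>

/* One recursive-partitioning step on a single numeric attribute:
 * try each observed value t as a threshold for the test
 * "x < t -> class 0, else class 1" and return the t that
 * misclassifies the fewest of the n training records. */
double best_split(const double *x, const int *y, size_t n)
{
    double best_t = x[0];
    size_t best_err = n + 1;          /* worse than any real split */
    for (size_t i = 0; i < n; i++) {  /* candidate thresholds */
        double t = x[i];
        size_t err = 0;
        for (size_t j = 0; j < n; j++) {
            int pred = x[j] < t ? 0 : 1;
            if (pred != y[j])
                err++;
        }
        if (err < best_err) {
            best_err = err;
            best_t = t;
        }
    }
    return best_t;
}
```

A full tree learner such as J48 applies this idea recursively: it chooses the best attribute and split at each node, partitions the records, and repeats on each subset until a stopping condition is met.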

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539      89.8333 %

Incorrectly Classified Instances       61      10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911    0.861    0.886    0.883    YES

0.929    0.139    0.889    0.929    0.909    0.883    NO

Weighted Avg.    0.898    0.108    0.899    0.898    0.898    0.883

=== Confusion Matrix ===

a    b    <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select attributes

19) Click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select the attributes from the attributes list which are to be removed, and remove them. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab

8) Choose Classifier "trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree with cross-validation on the training data set by changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select attributes

19) Click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes we want to study. This can be decided based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node
  - Making the pruned node a leaf node
  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the decision tree accuracy over the validation set
  - Stop when further pruning decreases the decision tree accuracy over the validation set

Example rule:

IF (children = yes) AND (income > 30000)
THEN (car = yes)
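The iterative selection rule above can be sketched in a few lines of C. Here NodeStats and pick_prune are hypothetical names, not part of Weka: each candidate node carries its validation-set error before pruning and its error if collapsed to a majority-class leaf, and pruning stops once no candidate is at least as good pruned as unpruned:

```c
#include <stddef.h>

/* Validation-set statistics for one candidate node. */
typedef struct {
    int subtree_val_errors;   /* errors of the unpruned subtree      */
    int leaf_val_errors;      /* errors if collapsed to a leaf       */
} NodeStats;

/* One reduced-error-pruning iteration: return the index of the node
 * whose pruning most reduces validation error (gain >= 0 means the
 * pruned tree performs no worse), or -1 when pruning should stop. */
int pick_prune(const NodeStats *nodes, size_t n)
{
    int best = -1;
    int best_gain = -1;
    for (size_t i = 0; i < n; i++) {
        int gain = nodes[i].subtree_val_errors - nodes[i].leaf_val_errors;
        if (gain > best_gain) {
            best_gain = gain;
            best = (int)i;
        }
    }
    return best_gain >= 0 ? best : -1;   /* -1: further pruning would hurt */
}
```

A driver would call this in a loop, collapse the returned node to a leaf, recompute the statistics, and stop at -1, which mirrors the "stop when further pruning decreases accuracy" rule above.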

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "trees"

9) Select "NBTree", i.e., the naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the "unpruned" mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select attributes

15) Click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
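A OneR-style rule set like the one above can be scored by summing, over the attribute's values, the cases that the majority-class prediction gets wrong. The sketch below uses counts chosen to echo the error and coverage figures above (the table and the helper name oner_errors are illustrative, not Weka's API):

```c
/* counts[v][c] = training cases with attribute value v and class c.
 * OneR predicts the majority class for each value v, so its total
 * error for this attribute is the sum of the minority counts. */
int oner_errors(const int counts[2][2])
{
    int err = 0;
    for (int v = 0; v < 2; v++) {
        int minority = counts[v][0] < counts[v][1] ? counts[v][0]
                                                   : counts[v][1];
        err += minority;   /* cases the majority rule misclassifies */
    }
    return err;
}
```

OneR computes this error for every attribute and keeps the single attribute with the lowest total, which is why it produces one rule per value of exactly one attribute, in contrast to the multi-attribute trees of J48 and the rule lists of PART.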

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



LINUX PROGRAMMING AND DATA MINING LAB MANUAL

switch (child_pid) {
case -1:
    perror("fork");
    exit(1);
case 0:                              /* child: producer */
    for (i = 0; i < num_loops; i++) {
        sem_op.sem_num = 0;
        sem_op.sem_op = -1;
        sem_op.sem_flg = 0;
        semop(sem_set_id, &sem_op, 1);
        printf("producer: '%d'\n", i);
        fflush(stdout);
    }
    break;
default:                             /* parent: consumer */
    for (i = 0; i < num_loops; i++) {
        printf("consumer: '%d'\n", i);
        fflush(stdout);
        sem_op.sem_num = 0;
        sem_op.sem_op = 1;
        sem_op.sem_flg = 0;
        semop(sem_set_id, &sem_op, 1);
        if (rand() > 3 * (RAND_MAX / 4)) {
            delay.tv_sec = 0;
            delay.tv_nsec = 10;
            nanosleep(&delay, NULL);
        }
    }
    break;
}
return 0;

Output:
semaphore set created


semaphore set id: '327690'
producer: '0'
consumer: '0'
producer: '1'
consumer: '1'

26. Write client and server programs (in C) for interaction between server and client processes using UNIX domain sockets.

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("./demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (bind(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    address_length = sizeof(address);
    while ((connection_fd = accept(socket_fd, (struct sockaddr *)&address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection handling process */
            return connection_handler(connection_fd);
        }


        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *)&address, sizeof(struct sockaddr_un)) != 0) {


        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (in C) for interaction between server and client processes using Internet domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {


            printf("\n Error: fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error\n");
    }

    return 0;
}

28. Write a C program that illustrates two processes communicating using shared memory.

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;      /* key to be passed to shmget() */
int shmflg;     /* shmflg to be passed to shmget() */
int shmid;      /* return value from shmget() */
int size;       /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int cmd;                    /* command code for shmctl() */
    int shmid;                  /* segment ID */
    struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

    ...

    shmid = ...;
    cmd = ...;

    if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
        perror("shmctl: shmctl failed");
        exit(1);
    }

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

    #include <sys/types.h>
    #include <sys/ipc.h>


    #include <sys/shm.h>

    static struct state {       /* Internal record of attached segments. */
        int shmid;              /* shmid of attached segment */
        char *shmaddr;          /* attach point */
        int shmflg;             /* flags used on attach */
    } ap[MAXnap];               /* State of current attached segments. */
    int nap;                    /* Number of currently attached segments. */

    ...

    char *addr;                 /* address work variable */
    register int i;             /* work area */
    register struct state *p;   /* ptr to current state entry */

    ...

    p = &ap[nap++];
    p->shmid = ...;
    p->shmaddr = ...;
    p->shmflg = ...;

    p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
    if (p->shmaddr == (char *)-1) {
        perror("shmop: shmat failed");
        nap--;
    } else
        (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n",
            p->shmaddr);

    ...

    i = shmdt(addr);
    if (i == -1) {
        perror("shmop: shmdt failed");
    } else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        for (p = ap, i = nap; i--; p++)
            if (p->shmaddr == addr)
                *p = ap[--nap];
    }

Algorithm

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If successful, it returns a non-negative segment identifier.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the pointer returned by shmat().
6. Read the contents from the shared memory through the same pointer.
7. Detach the segment with shmdt() and end.

Source Code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define SHM_SIZE 1024   /* size of the shared memory segment */

    int main(int argc, char *argv[])
    {
        key_t key = 5678;   /* fixed demo key; ftok() could be used instead */
        int shmid;
        char *data;

        if (argc > 2) {
            fprintf(stderr, "usage: shmdemo [data_to_write]\n");
            exit(1);
        }
        /* create the segment (or get the ID of an existing one) */
        if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
            perror("shmget");
            exit(1);
        }
        /* attach the segment to our address space */
        data = shmat(shmid, (void *)0, 0);
        if (data == (char *)(-1)) {
            perror("shmat");
            exit(1);
        }
        if (argc == 2) {    /* write mode */
            printf("writing to segment: \"%s\"\n", argv[1]);
            strncpy(data, argv[1], SHM_SIZE);
        } else {            /* read mode */
            printf("segment contains: \"%s\"\n", data);
        }
        /* detach from the segment */
        if (shmdt(data) == -1) {
            perror("shmdt");
            exit(1);
        }


        return 0;
    }

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread, the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
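The support and confidence computations above can be sketched in Python. The five-transaction database below is illustrative only, reconstructed to match the 0.4 and 0.5 figures in the text (the original table is not reproduced here):

```python
# Toy transaction database matching the supermarket example:
# each transaction is the set of items it contains.
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    """supp(X): proportion of transactions containing every item of X."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                  # 0.4 (2 of 5 transactions)
print(confidence({"milk", "bread"}, {"butter"}))   # 0.5 (half of those also have butter)
```

The subset test `itemset <= t` is Python's set-containment operator, which directly expresses "the transaction contains the itemset".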

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set Lk−1.

· Join Step:

o Ck is generated by joining Lk−1 with itself.

· Prune Step:

o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

Where:

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk
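The pseudocode above can be realized as a compact, unoptimized Python sketch. The item names and minimum count are illustrative, not from the bank dataset:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return all itemsets appearing in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) >= min_count}
    frequent = set(level)
    k = 2
    while level:
        # Join step: size-k candidates from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_count}
        frequent |= level
        k += 1
    return frequent

db = [{"milk", "bread"}, {"bread", "butter"}, {"beer"},
      {"milk", "bread", "butter"}, {"bread"}]
print(sorted(tuple(sorted(s)) for s in apriori(db, 2)))
```

The join here unions every pair of (k−1)-itemsets and keeps only size-k results; the prune step then enforces the downward-closure property, which is what makes Apriori tractable.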

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab.

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) and selecting trees > J48.

9) Select Test options: "Use training set".


10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right click on the result list and select the "visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C),

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h | D): Probability of h given D


• P(D | h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
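The class-conditional product above can be exercised on a toy dataset in Python. The tuples below are illustrative only (not the bank data), and no smoothing is applied, so attribute values unseen for a class get probability zero:

```python
from collections import Counter, defaultdict

# Toy training tuples: (attribute vector X, class C) -- illustrative data only.
data = [
    (("sunny", "hot"), "no"), (("sunny", "cool"), "no"),
    (("rainy", "hot"), "yes"), (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"), (("rainy", "hot"), "no"),
]

prior = Counter(c for _, c in data)          # class counts for P(Ci)
cond = defaultdict(Counter)                  # cond[(k, Ci)][value] counts
for x, c in data:
    for k, v in enumerate(x):
        cond[(k, c)][v] += 1

def posterior_score(x, c):
    """P(Ci) * product of P(xk|Ci): the maximum-a-posteriori numerator."""
    p = prior[c] / len(data)
    for k, v in enumerate(x):
        p *= cond[(k, c)][v] / prior[c]
    return p

x = ("rainy", "cool")
pred = max(prior, key=lambda c: posterior_score(x, c))
print(pred)  # "yes": P(rainy|yes)*P(cool|yes) dominates
```

Since P(X) is the same for every class, comparing the numerators is enough to pick the maximum a posteriori class, exactly as the derivation states.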

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Go to Classify tab


7) Choose Classifier "Tree".

8) Select "NBTree", i.e., the Naive Bayesian tree.

9) Select Test options: "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances       554       92.3333 %

Incorrectly Classified Instances      46        7.6667 %

Kappa statistic                        0.845

Mean absolute error                    0.1389

Root mean squared error                0.2636

Relative absolute error               27.9979 %

Root relative squared error           52.9137 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894    0.052    0.935    0.894    0.914    0.936    YES


0.948    0.106    0.914    0.948    0.931    0.936    NO

Weighted Avg.    0.923    0.081    0.924    0.923    0.923    0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed across the different problem solutions while practicing.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29, and the false negatives are 17, in this matrix.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run the test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim: To create a decision tree by cross-validating a training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
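The attribute-value test used for splitting can be chosen by entropy reduction, the criterion behind C4.5/J48. A minimal Python sketch (the four records are illustrative, not from the bank dataset):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, target_index=-1):
    """Entropy reduction from splitting rows on one attribute."""
    labels = [r[target_index] for r in rows]
    total = entropy(labels)
    remainder = 0.0
    for value, count in Counter(r[attr_index] for r in rows).items():
        subset = [r[target_index] for r in rows if r[attr_index] == value]
        remainder += (count / len(rows)) * entropy(subset)
    return total - remainder

# Illustrative records (x1, x2, y): splitting on x1 separates y perfectly.
rows = [("a", "p", "yes"), ("a", "q", "yes"), ("b", "p", "no"), ("b", "q", "no")]
print(information_gain(rows, 0))  # 1.0: x1 fully determines y
print(information_gain(rows, 1))  # 0.0: x2 carries no information about y
```

Recursive partitioning simply picks the attribute with the highest gain at each node and repeats on each resulting subset until the subsets are pure or no split adds value.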

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".


6) Go to Classify tab

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options: "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances       539       89.8333 %

Incorrectly Classified Instances      61       10.1667 %

Kappa statistic                        0.7942

Mean absolute error                    0.167

Root mean squared error                0.305

Relative absolute error               33.6511 %

Root relative squared error           61.2344 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861    0.071    0.911    0.861    0.886    0.883    YES

0.929    0.139    0.889    0.929    0.909    0.883    NO

Weighted Avg.    0.898    0.108    0.899    0.898    0.898    0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: Delete one attribute from GUI Explorer and see the effect using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Select Test options: "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Right click on the result list and select the "visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: Select some attributes from GUI Explorer, perform classification, and see the effect using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Select Test options: "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating a training data set while changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) Go to Classify tab

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options: "Training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.


16) Select Test options: "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. This can be decided based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  - Removing the sub-tree rooted at the pruned node

  - Making the pruned node a leaf node

  - Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  - Always select a node whose removal most increases the DT accuracy over the validation set

  - Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system: "bank.csv".

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
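The OneR idea behind rules like these can be sketched in Python: for each attribute, build one rule per value (predict the majority class) and keep the attribute with the fewest errors. This is a simplified illustration, not Weka's implementation; the toy data below is made up to mirror the accounting rules quoted above (7 instances with accounting = 1, 13 with accounting = 0 of which 4 are misclassified):

```python
# Minimal OneR sketch over a list of row dicts.
from collections import Counter

def one_r(rows, attributes, target):
    best = None
    for a in attributes:
        by_value = {}
        for r in rows:                         # tally classes per attribute value
            by_value.setdefault(r[a], Counter())[r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(r[target] != rule[r[a]] for r in rows)
        if best is None or errors < best[2]:   # keep the lowest-error attribute
            best = (a, rule, errors)
    return best

rows = [{"accounting": 1, "class": "A"}] * 7 \
     + [{"accounting": 0, "class": "B"}] * 9 \
     + [{"accounting": 0, "class": "A"}] * 4
attr, rule, errors = one_r(rows, ["accounting"], "class")
print(attr, errors)   # accounting 4
```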

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


semaphore set id '327690' producer '0' consumer '0' producer '1' consumer '1'

26. Write client and server programs (using C) for interaction between server and client processes using UNIX domain sockets

Server.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

int connection_handler(int connection_fd)
{
    int nbytes;
    char buffer[256];

    nbytes = read(connection_fd, buffer, 255);
    buffer[nbytes] = 0;
    printf("MESSAGE FROM CLIENT: %s\n", buffer);

    nbytes = snprintf(buffer, 256, "hello from the server");
    write(connection_fd, buffer, nbytes);

    close(connection_fd);
    return 0;
}

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length = sizeof(struct sockaddr_un);
    pid_t child;

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }
        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 255);
    buffer[nbytes] = 0;
    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27. Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *) NULL, NULL);

        /* send the current time to the connected client */
        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28. Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */
...
key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator, or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* state of currently attached segments */
int nap;                   /* number of currently attached segments */
...
char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */
...
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
...
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value (the segment ID)
4. Attach the created shared memory using the shmat() system call

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 53

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

5. Write to the shared memory through the attached pointer
6. Read the contents from the shared memory
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 1234;   /* agreed-upon key for the segment */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);   /* copy the argument into the segment */
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (downloadable from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
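These numbers can be checked directly. Since the 1/0 transaction table itself was lost in extraction, the five transactions below are an assumed database consistent with the quoted figures:

```python
# Support and confidence over a 5-transaction toy database
# (assumed contents; chosen to match supp({milk,bread}) = 0.4
# and conf({milk,bread} => {butter}) = 0.5).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset, db):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """supp(lhs ∪ rhs) / supp(lhs)."""
    return support(lhs | rhs, db) / support(lhs, db)

print(support({"milk", "bread"}, transactions))                  # 0.4
print(confidence({"milk", "bread"}, {"butter"}, transactions))   # 0.5
```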

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk Lk = I1 I2 hellip Ik association rules with this itemsets are generated in the following way the first rule is I1 I2 hellip Ik1 and Ik by checking the confidence this rule can be determined as interesting or not Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent further the confidences of the new rules are checked to determine the interestingness of them Those processes iterated until the antecedent becomes empty Since the second subproblem is quite straight forward most of the researches focus on the first subproblem The Apriori algorithm finds the frequent sets L In Database D


• Find the frequent set L(k−1)

• Join step: C(k) is generated by joining L(k−1) with itself

• Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

where C(k) is the candidate itemset of size k, and L(k) is the frequent itemset of size k.

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ∪k L(k)
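A compact executable version of the pseudocode above, as an illustrative sketch with a toy database (ε is taken as an absolute minimum count):

```python
# Apriori sketch: join frequent (k-1)-itemsets into candidate k-itemsets,
# prune candidates with an infrequent (k-1)-subset, then count support.
from itertools import combinations

def apriori(transactions, min_count):
    items = {i for t in transactions for i in t}
    L = [frozenset([i]) for i in items
         if sum(i in t for t in transactions) >= min_count]
    result = list(L)
    k = 2
    while L:
        # join step: unions of (k-1)-itemsets that yield size-k sets
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = [c for c in candidates
             if sum(c <= t for t in transactions) >= min_count]
        result += L
        k += 1
    return result

db = [{"milk", "bread", "butter"}, {"milk", "bread"},
      {"bread", "butter"}, {"milk", "bread", "butter"}]
freq = apriori(db, min_count=2)
print(len(freq))   # 7 frequent itemsets (3 singletons, 3 pairs, 1 triple)
```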

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka via Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm via the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) and selecting tree J48

9) Select Test options "Use training set"


10) If needed, select an attribute

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)
                = p(C) p(F1 | C) p(F2, …, Fn | C, F1)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏(k = 1..n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)
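A numeric illustration of this decision rule: pick the class Ci maximizing P(Ci) · ∏k P(xk|Ci). The probability tables below are made-up hypothetical values for illustration, not estimates from the bank data:

```python
# Naive Bayes prediction under class-conditional independence.
def nb_predict(x, priors, cond):
    """Score each class as prior * product of per-attribute likelihoods."""
    scores = {c: priors[c] for c in priors}
    for c in priors:
        for attr, value in x.items():
            scores[c] *= cond[c][attr][value]
    return max(scores, key=scores.get), scores

priors = {"good": 0.7, "bad": 0.3}          # hypothetical class priors
cond = {                                     # hypothetical P(value | class)
    "good": {"children": {"yes": 0.6, "no": 0.4},
             "income":   {"high": 0.8, "low": 0.2}},
    "bad":  {"children": {"yes": 0.5, "no": 0.5},
             "income":   {"high": 0.3, "low": 0.7}},
}
label, scores = nb_predict({"children": "yes", "income": "high"}, priors, cond)
print(label)   # good  (0.7*0.6*0.8 = 0.336 beats 0.3*0.5*0.3 = 0.045)
```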

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?" by evaluating the model on a supplied test set

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the Supplied test set radio button

2) Click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

The exact figures depend on the supplied test set; the discussion below refers to the numbers obtained in the 4th experiment.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and check that the accuracy of the model holds up.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see whether the accuracy is comparable; if it is, the model is unlikely to break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation of the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the newly saved file (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation of the training data set, changing the cost matrix in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule (i.e., to check the bias) by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes we want to study; it can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

IF (children = yes) ∧ (income > 30000)

THEN (car = yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e., the Naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning setting as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    unlink("demo_socket");

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("bind() failed\n");
        return 1;
    }

    if (listen(socket_fd, 5) != 0) {
        printf("listen() failed\n");
        return 1;
    }

    while ((connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length)) > -1) {
        child = fork();
        if (child == 0) {
            /* now inside the newly created connection-handling process */
            return connection_handler(connection_fd);
        }

        /* still inside the server process */
        close(connection_fd);
    }

    close(socket_fd);
    unlink("demo_socket");
    return 0;

Client.c:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet Domain sockets

Server.c:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF) {
            printf("\n Error : Fputs error\n");
        }
    }

    if (n < 0) {
        printf("\n Read error \n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that, for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

With shared memory, if one process is updating a shared region, for example, other processes must wait for that update to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
int size;      /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of currently attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to the shared memory segment through the attached pointer
6. Read the contents back from the shared memory segment
7. End

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    key = ftok("shmdemo.c", 'R');   /* derive an IPC key from a file path */

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE - 1);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (the original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find frequent set Lk−1

• Join step:

  o Ck is generated by joining Lk−1 with itself

• Prune step:

  o Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed

where:

• Ck: candidate itemset of size k

• Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "Bank.csv" in Weka using Open file in the Preprocess tab

4) Select only nominal attributes

5) Go to the Associate tab

6) Select the Apriori algorithm using the "Choose" button in the Associator panel:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

• Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

• Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm is chosen; it is named J48 in Weka (its Java implementation) and can be selected by clicking the Choose button

7) Select trees > J48

8) Select Test options "Use training set"


9) If needed, select the class attribute

10) Click Start

11) Now we can see the output details in the Classifier output panel

12) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on each other or on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C)

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an n-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be m classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab


7) Choose Classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select the class attribute

11) Now click Start

12) Now we can see the output details in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To check whether testing on a separate test set is a good idea

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be observed by trying different problem solutions during practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step remains to validate our classification tree, which is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"


6) Go to the Classify tab

7) Choose Classifier "trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds" (e.g., 10)

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button; this will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK; now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data; this will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to Open file and browse the newly saved file (with the attribute deleted)

14) Go to the Classify tab

15) Choose Classifier "trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed; with this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds" (e.g., 10)

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule (check the bias) by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will depend on the attribute set and on the relationships among attributes that we want to study; it can be assessed based on the database and the user requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the decision tree accuracy over the validation set

  – Stop when further pruning decreases the decision tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income >= 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "Show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "trees/rules"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



LINUX PROGRAMMING AND DATA MINING LAB MANUAL

    /* still inside the server process */
    close(connection_fd);

    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}

Client.c

#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    struct sockaddr_un address;
    int socket_fd, nbytes;
    char buffer[256];

    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (socket_fd < 0) {
        printf("socket() failed\n");
        return 1;
    }

    /* start with a clean address structure */
    memset(&address, 0, sizeof(struct sockaddr_un));

    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, UNIX_PATH_MAX, "./demo_socket");

    if (connect(socket_fd, (struct sockaddr *) &address,
                sizeof(struct sockaddr_un)) != 0)


    {
        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using c) for interaction between server and client processes using Internet Domain sockets

Serverc

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    char sendBuff[1025];
    time_t ticks;


    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Clientc

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{


    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect Failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)


            printf("\n Error: Fputs error\n");
    }

    if (n < 0) {
        printf("\n Read error\n");
    }

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs, and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading into some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(); it can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 50

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;     /* key to be passed to shmget() */
int shmflg;    /* shmflg to be passed to shmget() */
int shmid;     /* return value from shmget() */
size_t size;   /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If successful, it returns a positive value, the segment identifier.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached address.
6. Read the contents back from the shared memory through the attached address.
7. End.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = ftok(".", 'S');   /* derive a key; the original left key unset */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k−1).

· Join step: C(k) is generated by joining L(k−1) with itself.

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where:

· C(k): candidate itemset of size k

· L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

middot Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented as J48 in Weka and can be selected by clicking the Choose button

7) Select trees → J48

9) Select Test options: "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

· P(h): Prior probability of hypothesis h

· P(D): Prior probability of training data D

· P(h|D): Probability of h given D


· P(D|h): Probability of D given h

Naïve Bayes Classifier: Derivation

· D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, ..., xn)

· Let there be 'm' classes: C1, C2, C3, ..., Cm

· The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

· Maximum Posteriori Hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

· With many attributes, it is computationally expensive to evaluate P(X|Ci)

· Naïve assumption of "class conditional independence":

· P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

· P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options: "Use training set"

10) If needed, select attributes

11) Now start Weka

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554               92.3333 %

Incorrectly Classified Instances       46                7.6667 %

Kappa statistic                         0.845

Mean absolute error                     0.1389

Root mean squared error                 0.2636

Relative absolute error                27.9979 %

Root relative squared error            52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

  a   b   <-- classified as

245  29 |  a = YES

 17 309 |  b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced through the different problem solutions while practising.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down when unknown or future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options: "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539               89.8333 %

Incorrectly Classified Instances       61               10.1667 %

Kappa statistic                         0.7942

Mean absolute error                     0.167

Root mean squared error                 0.305

Relative absolute error                33.6511 %

Root relative squared error            61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

  a   b   <-- classified as

236  38 |  a = YES

 23 303 |  b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "trees"

16) Select J48

17) Select Test options: "Use training set"

18) If needed, select attributes

19) Now start Weka

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "trees"

9) Select J48

10) Select Test options: "Use training set"

11) If needed, select attributes

12) Now start Weka

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "trees"

8) Select J48

9) Select Test options: "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation".

17) Set "Folds" (e.g., 10).

18) If needed, select attributes.

19) Click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. This can be viewed based on the database and user requirements.

EXPERIMENT-11

Aim: To create a decision tree using Prune mode and Reduced Error Pruning, and show the accuracy for a cross-validation trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node
  – Making the pruned node a leaf node
  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set
  – Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to Classify tab.

8) Choose Classifier "Trees".

9) Select "NBTree", i.e., Naive Bayesian tree.


10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to Classify tab


8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

        printf("connect() failed\n");
        return 1;
    }

    nbytes = snprintf(buffer, 256, "hello from a client");
    write(socket_fd, buffer, nbytes);

    nbytes = read(socket_fd, buffer, 256);
    buffer[nbytes] = 0;

    printf("MESSAGE FROM SERVER: %s\n", buffer);

    close(socket_fd);
    return 0;
}

Week 10

27 Write client and server programs (using C) for interaction between server and client processes using Internet domain sockets.

Server.c

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;
    char sendBuff[1025];
    time_t ticks;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    memset(sendBuff, 0, sizeof(sendBuff));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);

    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

    listen(listenfd, 10);

    while (1) {
        connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

        ticks = time(NULL);
        snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
        write(connfd, sendBuff, strlen(sendBuff));

        close(connfd);
        sleep(1);
    }
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2) {
        printf("\n Usage: %s <ip of server>\n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Error: Could not create socket\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
        printf("\n inet_pton error occurred\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\n Error: Connect failed\n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error: fputs error\n");
    }

    if (n < 0)
        printf("\n Read error\n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create a shared memory segment using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached address.
6. Read the contents from the shared memory segment through the attached address.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    if ((key = ftok("shmdemo.c", 'R')) == -1) {  /* make the key */
        perror("ftok");
        exit(1);
    }

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    /* attach to the segment to get a pointer to it */
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    /* write to the segment when an argument is given, otherwise read it */
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }

    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment (unless you really can consult a real loan officer).

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output:

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka.

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way. The first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D by the following steps:


• Find the frequent set Lk−1.

• Join step:

  – Ck is generated by joining Lk−1 with itself.

• Prune step:

  – Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where

• Ck: candidate itemset of size k
• Lk: frequent itemset of size k

Apriori pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

• Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

• Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented as J48 in Weka and can be selected by clicking the Choose button.

7) Select trees > J48.

8) Select Test options "Use training set".

9) If needed, select attributes.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right-click on the result list and select the "visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model p(C | F1, …, Fn), over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as:

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)
                = p(C) p(F1 | C) p(F2, …, Fn | C, F1)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)
                = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as:

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h
• P(D): prior probability of training data D
• P(h | D): probability of h given D
• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples
  – Each tuple is an 'n'-dimensional attribute vector
  – X: (x1, x2, x3, …, xn)
• Let there be 'm' classes: C1, C2, C3, …, Cm
• The NB classifier predicts that X belongs to class Ci iff
  – P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i
• Maximum posteriori hypothesis:
  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)
  – Maximize P(X | Ci) P(Ci), as P(X) is constant
• With many attributes, it is computationally expensive to evaluate P(X | Ci)
• Naïve assumption of "class conditional independence":
  – P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)
  – P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · … · P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to Classify tab.

7) Choose Classifier "Trees".

8) Select "NBTree", i.e., Naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?" by evaluating the model on a supplied test set

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

The exact figures will vary with the data set and test set used; the following discussion is based on the sample output of the previous experiment.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix the false negatives are 29 and the false positives are 17.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and confirm the accuracy of the model.

If the Correctly Classified Instances percentage on this test set is close to that on the training set, we gain confidence that the model will not break down when unknown or future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test option "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to OPEN file and browse the newly saved (attribute-deleted) file.

14) Go to the Classify tab.

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Select Test option "Use training set".

18) If needed, select attributes.

19) Click Start.

20) The output details appear in the Classifier output panel.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) From the attributes list, select and remove the attributes that are not needed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test option "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details appear in the Classifier output panel.

16) Select Test option "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Click Start.

20) The output details appear in the Classifier output panel.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim To check whether a short rule is better than a long rule, by examining the bias on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set chosen and on which relationships among attributes we want to study; it should be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

An example of the kind of rule such a tree yields:

IF (Children = yes) ∧ (income > 30000)
THEN (car = Yes)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Tree".

9) Select "NBTree", i.e., the Naive Bayesian tree.

10) Select Test option "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the "unpruned" mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Click Start.

16) The output details appear in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details appear in the Classifier output panel.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details appear in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



listenfd = socket(AF_INET, SOCK_STREAM, 0);
memset(&serv_addr, 0, sizeof(serv_addr));
memset(sendBuff, 0, sizeof(sendBuff));

serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
serv_addr.sin_port = htons(5000);

bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));

listen(listenfd, 10);

while (1)
{
    connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);

    ticks = time(NULL);
    snprintf(sendBuff, sizeof(sendBuff), "%.24s\r\n", ctime(&ticks));
    write(connfd, sendBuff, strlen(sendBuff));

    close(connfd);
    sleep(1);
}

Client.c

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[1024];
    struct sockaddr_in serv_addr;

    if (argc != 2)
    {
        printf("\n Usage: %s <ip of server> \n", argv[0]);
        return 1;
    }

    memset(recvBuff, 0, sizeof(recvBuff));
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0)
    {
        printf("\n inet_pton error occured\n");
        return 1;
    }

    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("\n Error : Connect Failed \n");
        return 1;
    }

    while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0)
    {
        recvBuff[n] = 0;
        if (fputs(recvBuff, stdout) == EOF)
            printf("\n Error : Fputs error\n");
    }

    if (n < 0)
        printf("\n Read error \n");

    return 0;
}

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program creates a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In shared memory concept if one process is reading into some shared memory for example other processes must wait for the read to finish before processing the data

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permissions requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definitions for the shared memory segment control structures and the prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key size_t size int shmflg)


The key argument is an access value associated with the shared memory ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
int rtrn;                  /* return value */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

/* Attach: */
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

/* Detach: */
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget( ) system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat( ) system call.


5. Write to shared memory using the shmsnd( ) system call.
6. Read the contents from shared memory using the shmrcv( ) system call.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = IPC_PRIVATE;  /* a fixed key (e.g. from ftok()) would let another process attach */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules Here is one such dataset ( original) Excel spreadsheet version of the German credit data (download from web)

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.

3) Select the Preprocess tab.

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics of that selected attribute.

Sample output

EXPERIMENT-2

Aim To identify the rules for some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I be a set of n binary attributes called items, and let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = Φ. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking its confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find the frequent set Lk−1.

• Join step:

Ck is generated by joining Lk−1 with itself.

• Prune step:

Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, and hence should be removed.

where

• Ck: candidate itemset of size k

• Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ Φ
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ∪k Lk

Procedure

1) Given the Bank database for mining.

2) Select EXPLORER in the WEKA GUI Chooser.

3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in the Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm is chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) Select trees → J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "Trees"

8) Select "NBTree", i.e. the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be observed through different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose classifier "Trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize, then close the window

13) Click OK

14) Click Start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of steps 15 and 20

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better, or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be decided based on the database and the user requirement.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set

  – Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) Λ (Income ≥ 30000)

THEN (Car = Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "Trees"

9) Select "NBTree", i.e. the naive Bayesian tree

10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning options as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class, for a relevant attribute:

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


int sockfd = 0, n = 0;
char recvBuff[1024];
struct sockaddr_in serv_addr;

if (argc != 2) {
    printf("\n Usage: %s <ip of server>\n", argv[0]);
    return 1;
}

memset(recvBuff, 0, sizeof(recvBuff));
if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    printf("\n Error: Could not create socket\n");
    return 1;
}

memset(&serv_addr, 0, sizeof(serv_addr));

serv_addr.sin_family = AF_INET;
serv_addr.sin_port = htons(5000);

if (inet_pton(AF_INET, argv[1], &serv_addr.sin_addr) <= 0) {
    printf("\n inet_pton error occurred\n");
    return 1;
}

if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
    printf("\n Error: Connect Failed\n");
    return 1;
}

while ((n = read(sockfd, recvBuff, sizeof(recvBuff) - 1)) > 0) {
    recvBuff[n] = 0;
    if (fputs(recvBuff, stdout) == EOF)
        printf("\n Error: Fputs error\n");
}

if (n < 0)
    printf("\n Read error\n");

return 0;

28 Write a C program that illustrates two processes communicating using shared memory

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In shared memory concept if one process is reading into some shared memory for example other processes must wait for the read to finish before processing the data

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(); it can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(), and detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped by:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the shared memory segment. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds it returns the shared memory segment ID This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion)

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;        /* key to be passed to shmget() */
int shmflg;       /* shmflg to be passed to shmget() */
int shmid;        /* return value from shmget() */
int size;         /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                    /* command code for shmctl() */
int shmid;                  /* segment ID */
int rtrn;                   /* return value from shmctl() */
struct shmid_ds shmid_ds;   /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;

if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {         /* Internal record of attached segments */
    int shmid;                /* shmid of attached segment */
    char *shmaddr;            /* attach point */
    int shmflg;               /* flags used on attach */
} ap[MAXnap];                 /* State of current attached segments */
int nap;                      /* Number of currently attached segments */

char *addr;                   /* address work variable */
register int i;               /* work area */
register struct state *p;     /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value.
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached pointer.
6. Read the contents from the shared memory segment.
7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }

    key = ftok("shmdemo.c", 'R');     /* make the key for the segment */

    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }

    data = shmat(shmid, (void *)0, 0);    /* attach to the segment */
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }

    if (argc == 2) {
        printf("writing to segment \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }

    if (shmdt(data) == -1) {              /* detach from the segment */
        perror("shmdt");
        exit(1);
    }

    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans; too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, LHS) and consequent (right-hand side, RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4 since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k − 1)

· Join Step

o C(k) is generated by joining L(k − 1) with itself

· Prune Step

o Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where · C(k): candidate itemset of size k

· L(k): frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)

L(1) ← {large 1-itemsets that appear in more than ε transactions}

k ← 2

while L(k − 1) ≠ ∅

    C(k) ← Generate(L(k − 1))

    for transactions t ∈ T

        C(t) ← Subset(C(k), t)

        for candidates c ∈ C(t)

            count[c] ← count[c] + 1

    L(k) ← {c ∈ C(k) | count[c] ≥ ε}

    k ← k + 1

return ⋃k L(k)

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

· Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

· Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) and select tree j48

9) Select Test options "Use training set"


10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select the "visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) … p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) … p(Fn | C)

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e., the Naive Bayesian tree

9) Select Test options "Use training set"

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances       554      92.3333 %

Incorrectly Classified Instances      46       7.6667 %

Kappa statistic                        0.845

Mean absolute error                    0.1389

Root mean squared error                0.2636

Relative absolute error               27.9979 %

Root relative squared error           52.9137 %

Total Number of Instances            600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894   0.052   0.935   0.894   0.914   0.936   YES

0.948   0.106   0.914   0.948   0.931   0.936   NO

Weighted Avg.   0.923   0.081   0.924   0.923   0.923   0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances       539      89.8333 %

Incorrectly Classified Instances      61      10.1667 %

Kappa statistic                        0.7942

Mean absolute error                    0.167

Root mean squared error                0.305

Relative absolute error               33.6511 %

Root relative squared error           61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861   0.071   0.911   0.861   0.886   0.883   YES

0.929   0.139   0.889   0.929   0.909   0.883   NO

Weighted Avg.   0.898   0.108   0.899   0.898   0.898   0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Tree"

16) Select j48 tree

17) Select Test options "Use training set"

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select the "visualize tree" option

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier "Tree"

9) Select j48

10) Select Test options "Use training set"

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select the "visualize tree" option

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased


17)check whether removing these attributes have any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Tree"

8) Select j48

9) Select Test options "Training set"

10) Click on "more options"

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output


EXPERIMENT-10

Aim Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. This can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e., the Naive Bayesian tree


10) Select Test options "Use training set"

11) right-click on the text box beside the Choose button and select show properties

12) now change the unpruned mode from "false" to "true"

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Trees/Rules"

9) Select "J48"

10) Select Test options "Use training set"

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) select some of the attributes from attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting=1 THEN class=A (Error=0, Coverage = 7 instances)

IF accounting=0 THEN class=B (Error=4/13, Coverage = 13 instances)
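OneR builds, for each attribute, one rule per attribute value predicting the majority class, and keeps the attribute whose rules make the fewest training errors. A minimal Python sketch of that idea (illustrative only, not Weka's implementation; the toy rows and the `one_r` name are hypothetical):

```python
from collections import Counter

def one_r(rows, target):
    """Return (attribute, value->class rules, errors) for the single
    attribute whose majority-class rules err least on the training data."""
    best = None
    for attr in (a for a in rows[0] if a != target):
        by_value = {}
        for r in rows:
            by_value.setdefault(r[attr], Counter())[r[target]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for r in rows if rules[r[attr]] != r[target])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Toy data in the spirit of the accounting rules above (hypothetical values)
rows = [
    {"accounting": 1, "science": 0, "class": "A"},
    {"accounting": 1, "science": 1, "class": "A"},
    {"accounting": 0, "science": 1, "class": "B"},
    {"accounting": 0, "science": 0, "class": "B"},
]
attr, rules, errors = one_r(rows, "class")
print(attr, errors)  # accounting 0
```

Here the accounting attribute classifies the toy data with zero errors, so OneR keeps its two rules and discards the noisier science attribute.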

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


printf("\n Error: fputs error\n");

if (n < 0) printf("\n Read error\n");

return 0;

28. Write a C program that illustrates two processes communicating using shared memory.

DESCRIPTION

Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access.

The problem with pipes, FIFOs and message queues is that for two processes to exchange information, the information has to go through the kernel. Shared memory provides a way around this by letting two or more processes share a memory segment.

In the shared memory concept, if one process is reading from some shared memory, for example, other processes must wait for the read to finish before processing the data.

A process creates a shared memory segment using shmget(). The original owner of a shared memory segment can assign ownership to another user with shmctl(). It can also revoke this assignment. Other processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt() (see shmop()). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared segment can be attached multiple times by the same process. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid. The structure definition for the shared memory segment control structures and prototypes can be found in <sys/shm.h>.

shmget() is used to obtain access to a shared memory segment. It is prototyped as:

int shmget(key_t key, size_t size, int shmflg);


The key argument is an access value associated with the segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

key_t key;   /* key to be passed to shmget() */
int shmflg;  /* shmflg to be passed to shmget() */
int shmid;   /* return value from shmget() */
int size;    /* size to be passed to shmget() */

key = ...;
size = ...;
shmflg = ...;

if ((shmid = shmget(key, size, shmflg)) == -1) {
    perror("shmget: shmget failed");
    exit(1);
} else {
    (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
    exit(0);
}

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective shmid of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory. The process must have the effective ID of superuser to perform this command.

SHM_UNLOCK


-- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf. The process must have read permission on the segment to perform this command.

IPC_SET -- Set the effective user and group identification and access permissions. The process must have an effective ID of owner, creator or superuser to perform this command.

IPC_RMID -- Remove the shared memory segment.

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                   /* command code for shmctl() */
int shmid;                 /* segment ID */
struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */

shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);

int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>


#include <sys/shm.h>

static struct state {      /* Internal record of attached segments */
    int shmid;             /* shmid of attached segment */
    char *shmaddr;         /* attach point */
    int shmflg;            /* flags used on attach */
} ap[MAXnap];              /* State of current attached segments */
int nap;                   /* Number of currently attached segments */

char *addr;                /* address work variable */
register int i;            /* work area */
register struct state *p;  /* ptr to current state entry */

p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr) *p = ap[--nap];
}

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory segment through the attached address.
6. Read the contents from the shared memory segment.
7. Detach with shmdt() and end.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    key = ftok(".", 'R');  /* reconstructed: both processes must derive the same key */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        strncpy(data, argv[1], SHM_SIZE);  /* reconstructed: copy the argument into the segment */
        printf("writing to segment: \"%s\"\n", data);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }


    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways:

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample Output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, a) manually and b) using Weka.

Tools/Apparatus: Weka mining tool

Theory

Association rule mining is defined as follows: Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
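The support and confidence numbers above can be checked mechanically. A small sketch (the five transactions below are one database consistent with the quoted figures; the helper names are illustrative):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(lhs => rhs) = supp(lhs U rhs) / supp(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# 1 codes presence of an item in a transaction
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
print(support({"milk", "bread"}, transactions))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}, transactions))  # 0.5
```

{milk, bread} occurs in 2 of 5 transactions (support 0.4), and only one of those two also contains butter, giving confidence 0.2/0.4 = 0.5, as in the text.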

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik−1} => {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. Those processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find frequent set L(k−1).

· Join step: C(k) is generated by joining L(k−1) with itself.

· Prune step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed.

where · C(k) is the candidate itemset of size k

· L(k) is the frequent itemset of size k

Apriori Pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)
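The pseudocode above translates almost line for line into a level-wise search. A minimal sketch (illustrative only; `minsup` is an absolute transaction count playing the role of the threshold ε, and the join/prune steps mirror the bullets above):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all itemsets contained in at least `minsup` transactions."""
    def counts(cands):
        return {c: sum(c <= t for t in transactions) for c in cands}
    items = {frozenset([i]) for t in transactions for i in t}
    level = {c for c, n in counts(items).items() if n >= minsup}  # L1
    frequent = set(level)
    k = 2
    while level:
        # Join step: C(k) from unions of L(k-1) members
        cands = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c, n in counts(cands).items() if n >= minsup}  # L(k)
        frequent |= level
        k += 1
    return frequent

transactions = [{"milk", "bread"}, {"bread", "butter"}, {"beer"},
                {"milk", "bread", "butter"}, {"bread"}]
freq = apriori(transactions, 2)
```

On the five supermarket transactions with minsup = 2 this finds {milk}, {bread}, {butter}, {milk, bread} and {bread, butter}; {milk, butter} occurs only once and is dropped.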

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse the file that is already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen (it is entitled J48 in Java), and it can be selected by clicking the Choose button

7) and selecting trees > J48.

9) Select Test options "Use training set".


10) If needed, select an attribute.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English, the above equation can be written as:

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, ..., Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ...

= p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j <> i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

• P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ... P(xn | Ci)
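The derivation above reduces classification to maximizing P(Ci) Π P(xk | Ci), which is just counting. A minimal sketch (illustrative only; no smoothing, so unseen attribute values get probability zero; the fruit data echoes the red/round apple example and is hypothetical):

```python
from collections import Counter, defaultdict

def train(rows, labels):
    prior = Counter(labels)        # class counts, giving P(Ci)
    cond = defaultdict(Counter)    # (class, position) -> attribute-value counts
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            cond[(c, k)][v] += 1
    return prior, cond

def predict(x, prior, cond):
    total = sum(prior.values())
    def score(c):                  # P(Ci) * prod_k P(xk | Ci)
        p = prior[c] / total
        for k, v in enumerate(x):
            p *= cond[(c, k)][v] / prior[c]
        return p
    return max(prior, key=score)

rows = [("red", "round"), ("red", "round"), ("green", "long"), ("green", "round")]
labels = ["apple", "apple", "banana", "apple"]
prior, cond = train(rows, labels)
print(predict(("red", "round"), prior, cond))  # apple
```

For ("red", "round") the apple score is (3/4)·(2/3)·(3/3) = 0.5, while the banana score is zero (no red bananas in the counts), so the classifier picks apple.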

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) Go to Classify tab


7) Choose Classifier "Trees".

8) Select "NBTree", i.e., the Naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select an attribute.

11) Now start Weka.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through the different problem solutions during practice.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.
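Each quoted statistic follows directly from the confusion matrix; a quick arithmetic check using the Experiment 4 numbers (the variable names are ours, and YES is taken as the positive class):

```python
# Confusion matrix from Experiment 4 (rows = actual class)
tp, fn = 245, 29   # actual YES: classified YES, classified NO
fp, tn = 17, 309   # actual NO:  classified YES, classified NO

total = tp + fn + fp + tn
accuracy = (tp + tn) / total          # "Correctly Classified Instances"
tp_rate_yes = tp / (tp + fn)          # recall of class YES
precision_yes = tp / (tp + fp)

print(round(accuracy * 100, 2))       # 92.33
print(round(tp_rate_yes, 3))          # 0.894
print(round(precision_yes, 3))        # 0.935
```

The recomputed values match the summary table: 554/600 correct (92.33 %), TP rate 0.894 and precision 0.935 for class YES.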

EXPERIMENT-6

Aim: To create a decision tree by cross-validation training on a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
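Splitting the source set "based on an attribute value test" needs a way to score candidate splits; one common criterion is information gain, the drop in entropy of the target after the split. A minimal sketch (the credit-style rows below are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Entropy reduction from partitioning `rows` on the values of `attr`."""
    gain = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

rows = [
    {"employed": "yes", "risk": "good"},
    {"employed": "yes", "risk": "good"},
    {"employed": "no",  "risk": "bad"},
    {"employed": "no",  "risk": "good"},
]
print(round(info_gain(rows, "employed", "risk"), 4))  # 0.3113
```

Recursive partitioning would pick the attribute with the highest gain at each node, then recurse into each subset until the target is pure or no split adds value, exactly as described above.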

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".


6) Go to Classify tab

7) Choose Classifier "Trees".

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select an attribute.

12) Now start Weka.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to Open file and browse the file that was newly saved (the attribute-deleted file).

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect
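The effect of the Remove filter ("Remove -R 1" in step 10) can be mimicked outside Weka; a minimal Python sketch that drops the first column of a made-up miniature of bank.csv (the rows and column names are hypothetical):

```python
import csv
import io

# A made-up miniature of bank.csv (hypothetical rows, for illustration only).
raw = "id,age,pep\n1,35,YES\n2,51,NO\n"

rows = list(csv.reader(io.StringIO(raw)))
# Drop attribute 1 (Weka's "-R 1" uses 1-based indices; index 0 here).
kept = [[v for i, v in enumerate(row) if i != 0] for row in rows]
print(kept[0])  # ['age', 'pep']
```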

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and observe the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select the attributes to be removed from the attributes list. After this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "trees".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "trees".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click the Set button.

12) Set the matrix values and click Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details appear in the Classifier output panel.

16) Select Test options "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Click Start.

20) The output details appear in the Classifier output panel.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.
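Cost-sensitive evaluation replaces plain error counting with a cost-weighted sum over the confusion matrix; a sketch, using the Experiment 4 confusion matrix and an assumed (illustrative) cost matrix:

```python
# Confusion matrix from Experiment 4's training-set run (actual class by row),
# and an assumed cost matrix: misclassifying an actual NO as YES costs 5,
# the opposite mistake costs 1, and correct decisions cost 0.
confusion = [[245, 29],   # actual YES: [classified YES, classified NO]
             [17, 309]]   # actual NO:  [classified YES, classified NO]
cost = [[0, 1],
        [5, 0]]

total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
print(total_cost)  # 29*1 + 17*5 = 114
```

Changing the cost matrix in step 12 changes which classifier Weka prefers, even when plain accuracy is unchanged.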

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

This depends on the attribute set and on which relationships among attributes we want to study, and can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using prune mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory:

Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:
  - removing the sub-tree rooted at the pruned node;
  - making the pruned node a leaf node;
  - assigning the pruned node the most common classification of the training instances attached to that node.

• Nodes are pruned iteratively:
  - always select the node whose removal most increases the decision-tree accuracy over the validation set;
  - stop when further pruning decreases the decision-tree accuracy over the validation set.

Example rule:

IF (Children = yes) Λ (income > 30000)
THEN (car = Yes)
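The pruning loop described above can be sketched in Python; the tree encoding, validation set, and helper names below are assumptions for illustration, not Weka's implementation:

```python
def classify(node, x):
    """Walk the tree; a node is either a leaf dict or an internal-split dict."""
    if "leaf" in node:
        return node["leaf"]
    return classify(node["branches"][x[node["attr"]]], x)

def accuracy(tree, val):
    return sum(classify(tree, x) == y for x, y in val) / len(val)

def try_prune(tree, node, majority, val):
    """Turn `node` into a leaf predicting `majority`; keep the change only
    if the pruned tree performs no worse over the validation set."""
    before = accuracy(tree, val)
    saved = dict(node)
    node.clear()
    node["leaf"] = majority
    if accuracy(tree, val) < before:   # pruning hurt accuracy: undo it
        node.clear()
        node.update(saved)

# Toy over-fit tree: the split on noisy attribute "a" does not help;
# the most common training class at that node is "Y".
tree = {"attr": "a", "branches": {0: {"leaf": "Y"}, 1: {"leaf": "N"}}}
val = [({"a": 0}, "Y"), ({"a": 1}, "Y")]
try_prune(tree, tree, "Y", val)
print(tree)  # {'leaf': 'Y'}
```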

Procedure:

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "trees".

9) Select "NBTree", i.e. the naive Bayesian tree.

10) Select Test options "Use training set".

11) Right-click the text box beside the Choose button and select "Show properties".

12) Change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select the class attribute.

15) Click Start.

16) The output details appear in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure for "J48":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or, from the command line:)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

Procedure for "PART":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select the class attribute.

12) Click Start.

13) The output details appear in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (error = 0, coverage = 7 instances)

IF accounting = 0 THEN class = B (error = 4/13, coverage = 13 instances)
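The error and coverage figures attached to such single-attribute rules can be recomputed; the records below are hypothetical, chosen to be consistent with the two rules above:

```python
# Hypothetical (attribute value, class) records: 7 instances with
# accounting=1 (all class A), 13 with accounting=0 (9 of class B, 4 of A).
records = [("1", "A")] * 7 + [("0", "B")] * 9 + [("0", "A")] * 4
rule = {"1": "A", "0": "B"}   # OneR: majority class per attribute value

errors = sum(cls != rule[val] for val, cls in records)
coverage = {v: sum(val == v for val, _ in records) for v in rule}
print(errors)    # 4  (the accounting=0 instances of class A)
print(coverage)  # {'1': 7, '0': 13}
```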

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



The key argument is an access value associated with the shared memory segment ID. The size argument is the size in bytes of the requested shared memory. The shmflg argument specifies the initial access permissions and creation control flags.

When the call succeeds, it returns the shared memory segment ID. This call is also used to get the ID of an existing shared segment (from a process requesting sharing of some existing memory portion).

The following code illustrates shmget():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    key_t key;     /* key to be passed to shmget() */
    int shmflg;    /* shmflg to be passed to shmget() */
    int shmid;     /* return value from shmget() */
    int size;      /* size to be passed to shmget() */
    ...
    key = ...;
    size = ...;
    shmflg = ...;

    if ((shmid = shmget(key, size, shmflg)) == -1) {
        perror("shmget: shmget failed");
        exit(1);
    } else {
        (void) fprintf(stderr, "shmget: shmget returned %d\n", shmid);
        exit(0);
    }

Controlling a Shared Memory Segment

shmctl() is used to alter the permissions and other characteristics of a shared memory segment. It is prototyped as follows:

    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

The process must have an effective user ID of owner, creator or superuser to perform this command. The cmd argument is one of the following control commands:

SHM_LOCK

-- Lock the specified shared memory segment in memory The process must have the effective ID of superuser to perform this command

SHM_UNLOCK


-- Unlock the shared memory segment The process must have the effective ID of superuser to perform this command

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf argument is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int cmd;                   /* command code for shmctl() */
    int shmid;                 /* segment ID */
    int rtrn;                  /* return value */
    struct shmid_ds shmid_ds;  /* shared memory data structure to hold results */
    ...
    shmid = ...;
    cmd = ...;

    if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
        perror("shmctl: shmctl failed");
        exit(1);
    }

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

    void *shmat(int shmid, const void *shmaddr, int shmflg);
    int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    static struct state {      /* Internal record of attached segments */
        int shmid;             /* shmid of attached segment */
        char *shmaddr;         /* attach point */
        int shmflg;            /* flags used on attach */
    } ap[MAXnap];              /* State of current attached segments */
    int nap;                   /* Number of currently attached segments */
    ...
    char *addr;                /* address work variable */
    register int i;            /* work area */
    register struct state *p;  /* ptr to current state entry */
    ...
    p = &ap[nap++];
    p->shmid = ...;
    p->shmaddr = ...;
    p->shmflg = ...;
    p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
    if (p->shmaddr == (char *)-1) {
        perror("shmop: shmat failed");
        nap--;
    } else
        (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);
    ...
    i = shmdt(addr);
    if (i == -1) {
        perror("shmop: shmdt failed");
    } else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        for (p = ap, i = nap; i--; p++)
            if (p->shmaddr == addr)
                *p = ap[--nap];
    }

Algorithm:

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment ID).
4. Attach the created shared memory using the shmat() system call.
5. Write to the shared memory through the attached address.
6. Read the contents from the shared memory.
7. End.

Source Code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define SHM_SIZE 1024    /* size of the shared memory segment */

    int main(int argc, char *argv[])
    {
        key_t key = 5678;    /* any agreed-upon key (original value lost in extraction) */
        int shmid;
        char *data;

        if (argc > 2) {
            fprintf(stderr, "usage: shmdemo [data_to_write]\n");
            exit(1);
        }

        /* create (or locate) the segment */
        if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
            perror("shmget");
            exit(1);
        }

        /* attach to the segment to get a pointer to it */
        data = shmat(shmid, (void *)0, 0);
        if (data == (char *)(-1)) {
            perror("shmat");
            exit(1);
        }

        /* write the command-line argument into the segment */
        if (argc == 2) {
            strncpy(data, argv[1], SHM_SIZE);
            printf("writing to segment: \"%s\"\n", data);
        }

        /* detach from the segment */
        if (shmdt(data) == -1) {
            perror("shmdt");
            exit(1);
        }

        return 0;
    }

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.

3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the creditworthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset: the Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse to the file already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel shows the basic statistics of the selected attribute.

Sample output:

EXPERIMENT-2

Aim: To identify the rules involving some of the important attributes, (a) manually and (b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y|X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
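The support and confidence computations can be checked directly; the five transactions below are a plausible reconstruction of the example database (the original table image was lost), chosen to reproduce the 0.4 and 0.5 figures:

```python
# A plausible five-transaction reconstruction of the example database.
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

supp = count({"milk", "bread"}) / len(transactions)
conf = count({"milk", "bread", "butter"}) / count({"milk", "bread"})
print(supp)  # 0.4
print(conf)  # 0.5
```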

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last items in the antecedent and inserting them into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:

• Find the frequent set L(k − 1).

• Join step: C(k) is generated by joining L(k − 1) with itself.

• Prune step: any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence should be removed;

where C(k) is the candidate itemset of size k and L(k) is the frequent itemset of size k.

Apriori Pseudocode:

    Apriori(T, ε)
        L(1) ← {large 1-itemsets that appear in more than ε transactions}
        k ← 2
        while L(k − 1) ≠ ∅
            C(k) ← Generate(L(k − 1))
            for transactions t ∈ T
                C(t) ← Subset(C(k), t)
                for candidates c ∈ C(t)
                    count[c] ← count[c] + 1
            L(k) ← {c ∈ C(k) | count[c] ≥ ε}
            k ← k + 1
        return ⋃k L(k)
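The pseudocode can be realized as a short Python sketch (an absolute count threshold stands in for ε; the toy transactions are illustrative, not the bank data):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frequent itemset: support count} with count >= min_count."""
    singletons = {frozenset([i]) for t in transactions for i in t}
    L = {s for s in singletons
         if sum(s <= t for t in transactions) >= min_count}
    frequent, k = {}, 2
    while L:
        frequent.update({s: sum(s <= t for t in transactions) for s in L})
        # Join step: C(k) is generated by joining L(k-1) with itself.
        C = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        C = {c for c in C
             if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = {c for c in C
             if sum(c <= t for t in transactions) >= min_count}
        k += 1
    return frequent

db = [{"milk", "bread"}, {"bread", "butter"}, {"beer"},
      {"milk", "bread", "butter"}, {"bread"}]
freq = apriori(db, 2)
print(freq[frozenset({"milk", "bread"})])  # 2
```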

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka via Open file in the Preprocess tab.

4) Select only the nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical rather than a categorical target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to Open file and browse to the file already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm is chosen; it is entitled J48 in Weka's Java implementation, and can be selected by clicking the Choose button and picking trees > J48.

7) Select Test options "Use training set".

8) If needed, select the class attribute.

9) Click Start.

10) The output details appear in the Classifier output panel.

11) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by the decision tree model created above, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C|F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

P(C|F1, …, Fn) = p(C) p(F1, …, Fn|C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn|C)
= p(C) p(F1|C) p(F2, …, Fn|C, F1)
= p(C) p(F1|C) p(F2|C, F1) p(F3, …, Fn|C, F1, F2)
= p(C) p(F1|C) p(F2|C, F1) p(F3|C, F1, F2) ⋯ p(Fn|C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i. This means that

p(Fi|C, Fj) = p(Fi|C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1|C) p(F2|C) ⋯ = p(C) ∏i p(Fi|C)

This means that, under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C|F1, …, Fn) = (1/Z) p(C) ∏i p(Fi|C)

where Z (the evidence) is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi|C). If there are k classes, and if a model for each p(Fi|C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
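The parameter count can be checked numerically; taking n = 20 below to match the 20 attributes of the German credit data is an assumption for illustration (those attributes are not all binary):

```python
def nb_params(k, n, r):
    """Naive Bayes parameter count: (k - 1) + n*r*k, per the text."""
    return (k - 1) + n * r * k

# Binary classification (k = 2) with Bernoulli features (r = 1): 2n + 1.
print(nb_params(2, 20, 1))  # 41
```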

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D

• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples.

  – Each tuple is an 'n'-dimensional attribute vector.

  – X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm.

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i.

• Maximum posteriori hypothesis:

  – P(Ci|X) = P(X|Ci) P(Ci) / P(X)

  – Maximize P(X|Ci) P(Ci), as P(X) is constant.

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

  – P(X|Ci) = ∏(k = 1 to n) P(xk|Ci)

  – P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)
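The maximum-posteriori rule can be illustrated numerically; the priors and conditional probabilities below are made-up values for two binary features, not estimates from the bank data:

```python
priors = {"YES": 0.5, "NO": 0.5}
cond = {  # cond[class][k] = P(x_k = 1 | class), made-up values
    "YES": [0.8, 0.6],
    "NO":  [0.3, 0.4],
}

def score(cls, x):
    """p(C) * prod_k P(x_k | C); P(X) is a common factor and is ignored."""
    p = priors[cls]
    for k, xk in enumerate(x):
        p *= cond[cls][k] if xk == 1 else 1 - cond[cls][k]
    return p

x = (1, 1)
best = max(priors, key=lambda c: score(c, x))
print(best)                       # YES
print(round(score("YES", x), 2))  # 0.24
```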

Procedure:

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "trees".

8) Select "NBTree", i.e. the naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select the class attribute.

11) Click Start.

12) The output details appear in the Classifier output panel.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.894    0.052    0.935      0.894   0.914      0.936     YES
0.948    0.106    0.914      0.948   0.931      0.936     NO
Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

  a   b   <-- classified as
245  29 | a = YES
 17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with that of the 4th experiment

Sample output

The exact figures vary from run to run; the discussion below refers to the training-set results of the 4th experiment.

The important numbers to focus on here are the ones next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the numbers of false positives and false negatives: with YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the true accuracy of the model, and indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validating the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when all records in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3 etc. that are used for that task.
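The recursive-partitioning idea above hinges on choosing the attribute to split on at each node; J48 (C4.5) does this with an entropy-based gain criterion. A minimal Python sketch of that selection step (the attribute names and toy records are illustrative, not from the bank data set):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on attribute attr."""
    n = len(labels)
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[attr], []).append(y)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

# Toy records (income, children) with a yes/no target label.
rows = [("high", "no"), ("high", "yes"), ("low", "no"), ("low", "yes")]
labels = ["yes", "yes", "no", "no"]

# The split chosen for the root is the attribute with the largest gain.
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
print(best)  # prints 0: income alone separates the classes perfectly
```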

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO
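The cross-validation scheme behind the output above can be pictured as follows: the data is split into k folds, each fold is held out once, and the fold results are pooled into one accuracy estimate. A minimal Python sketch (the majority-class "classifier" is a stand-in for illustration, not J48):

```python
from collections import Counter

def cross_validate(rows, labels, train_fn, k=10):
    """Hold out each of k folds once, train on the rest, pool the accuracy."""
    n = len(rows)
    correct = 0
    for i in range(k):
        test_idx = set(range(i, n, k))          # every k-th record -> fold i
        train = [(r, y) for j, (r, y) in enumerate(zip(rows, labels))
                 if j not in test_idx]
        model = train_fn(train)
        correct += sum(model(rows[j]) == labels[j] for j in test_idx)
    return correct / n                          # pooled accuracy estimate

def train_majority(train):
    """Stand-in learner: always predict the majority class of the train split."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda row: majority

rows = list(range(20))                          # record ids; content unused here
labels = ["YES"] * 14 + ["NO"] * 6
print(cross_validate(rows, labels, train_majority, k=10))  # prints 0.7
```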


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select, from the attributes list, the attributes that are to be removed, and remove them. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set while changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6
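Cost-sensitive evaluation weighs each cell of the confusion matrix by the corresponding cost. A small Python sketch using the cross-validation confusion matrix from experiment 6 and an illustrative, hypothetical cost matrix (the cost values are assumptions, not from the manual):

```python
# Confusion matrix from the cross-validation run in experiment 6
# (rows = actual YES/NO, columns = predicted YES/NO).
confusion = [[236, 38],
             [23, 303]]

# Hypothetical cost matrix: approving a bad loan (actual NO predicted
# YES, bottom-left cell) is taken to be 5x as costly as rejecting a
# good one; correct decisions on the diagonal cost nothing.
cost = [[0, 1],
        [5, 0]]

# Total cost is the elementwise product of the two matrices, summed.
total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
avg_cost = total_cost / sum(map(sum, confusion))
print(total_cost, round(avg_cost, 4))  # prints 153 0.255
```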

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, i.e. to check the bias, by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will depend on the attribute set and on the relationships among attributes that we want to study; it can be examined based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using prune mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  • Removing the sub-tree rooted at the pruned node

  • Making the pruned node a leaf node

  • Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  • Always select a node whose removal most increases the decision tree accuracy over the validation set

  • Stop when further pruning decreases the decision tree accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
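The pruning steps above can be sketched on a toy tree; the node representation and attribute names here are illustrative, not Weka's internals:

```python
def classify(node, row):
    """A node is either a leaf label (str) or (attr, branches, majority_label)."""
    if isinstance(node, str):
        return node
    attr, branches, majority = node
    return classify(branches.get(row[attr], majority), row)

def accuracy(tree, validation):
    return sum(classify(tree, r) == y for r, y in validation) / len(validation)

def prune_once(tree, validation):
    """One pass over the root's children: replace a subtree by its majority
    leaf whenever the pruned tree performs no worse on the validation set."""
    attr, branches, majority = tree
    before = accuracy(tree, validation)
    for value, child in list(branches.items()):
        if isinstance(child, str):
            continue                      # already a leaf
        branches[value] = child[2]        # tentatively prune to majority label
        if accuracy(tree, validation) < before:
            branches[value] = child       # pruning hurt: undo it

# Over-fit subtree under income="low" (as if a noisy training record created it).
tree = ("income",
        {"high": "yes",
         "low": ("children", {"no": "yes", "yes": "no"}, "no")},
        "yes")
validation = [({"income": "low", "children": "no"}, "no"),
              ({"income": "low", "children": "yes"}, "no"),
              ({"income": "high", "children": "no"}, "yes")]
prune_once(tree, validation)
print(tree[1]["low"])  # prints no: the subtree collapsed to its majority leaf
```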

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options "Use training set"

11) Right click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning setting as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
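OneR, compared in this experiment, builds one branch per value of a single attribute, predicts each value's majority class, and keeps the attribute whose rule makes the fewest errors. A minimal Python sketch on toy data echoing the accounting rules above (the records themselves are illustrative):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (errors, attribute index, rule) for the best single-attribute rule."""
    best = None
    for attr in range(len(rows[0])):
        by_value = defaultdict(Counter)          # value -> class counts
        for row, y in zip(rows, labels):
            by_value[row[attr]][y] += 1
        # One branch per value, predicting that value's majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best

# Toy records (accounting, sports) -> class, echoing the rules quoted above.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["A", "A", "B", "B", "A"]
errors, attr, rule = one_r(rows, labels)
print(attr, errors)  # prints 0 1: accounting is the best single attribute
```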

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


SHM_UNLOCK -- Unlock the shared memory segment. The process must have the effective ID of superuser to perform this command.

IPC_STAT -- Return the status information contained in the control structure and place it in the buffer pointed to by buf The process must have read permission on the segment to perform this command

IPC_SET -- Set the effective user and group identification and access permissions The process must have an effective ID of owner creator or superuser to perform this command

IPC_RMID -- Remove the shared memory segment

The buf is a structure of type struct shmid_ds, which is defined in <sys/shm.h>. The following code illustrates shmctl():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int cmd;                      /* command code for shmctl() */
int shmid;                    /* segment ID */
struct shmid_ds shmid_ds;     /* shared memory data structure to hold results */
...
shmid = ...;
cmd = ...;
if ((rtrn = shmctl(shmid, cmd, &shmid_ds)) == -1) {
    perror("shmctl: shmctl failed");
    exit(1);
}

Attaching and Detaching a Shared Memory Segment

shmat() and shmdt() are used to attach and detach shared memory segments. They are prototyped as follows:

void *shmat(int shmid, const void *shmaddr, int shmflg);
int shmdt(const void *shmaddr);

shmat() returns a pointer, shmaddr, to the head of the shared segment associated with a valid shmid. shmdt() detaches the shared memory segment located at the address indicated by shmaddr. The following code illustrates calls to shmat() and shmdt():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static struct state {         /* Internal record of attached segments */
    int shmid;                /* shmid of attached segment */
    char *shmaddr;            /* attach point */
    int shmflg;               /* flags used on attach */
} ap[MAXnap];                 /* State of current attached segments */
static int nap;               /* Number of currently attached segments */

char *addr;                   /* address work variable */
register int i;               /* work area */
register struct state *p;     /* ptr to current state entry */

/* Attaching a segment: */
p = &ap[nap++];
p->shmid = ...;
p->shmaddr = ...;
p->shmflg = ...;
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
    perror("shmop: shmat failed");
    nap--;
} else
    (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

/* Detaching a segment: */
i = shmdt(addr);
if (i == -1) {
    perror("shmop: shmdt failed");
} else {
    (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
    for (p = ap, i = nap; i--; p++)
        if (p->shmaddr == addr)
            *p = ap[--nap];
}

Algorithm

1. Start
2. Create shared memory using the shmget() system call
3. If successful, it returns a positive value
4. Attach the created shared memory using the shmat() system call


5. Write to shared memory using the shmsnd() system call
6. Read the contents from shared memory using the shmrcv() system call
7. End

Source code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <string.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = 1234;   /* a fixed key; ftok() could be used instead */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    }
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}

Input: ./a.out koteswararao

Output: writing to segment "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such data set: the (original) Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark the unit of currency worth about 90 cents Canadian (but looks and acts like a quarter)

Owns_telephone German phone rates are much higher than in Canada so fewer people own telephones

Foreign_worker There are millions of these in Germany (many from Turkey) It is very hard to get German citizenship if you were not born of German parents

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim: To identify rules involving some of the important attributes, (a) manually and (b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X => Y, where


X, Y ⊆ I and X ∩ Y = Φ. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} => {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X => Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} => {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
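The support and confidence figures quoted above can be reproduced on a small illustrative transaction database (the transactions below are assumptions chosen to match the counts in the text; any 5-transaction database with the same counts gives the same numbers):

```python
# Five illustrative transactions: {milk, bread} occurs in 2 of 5,
# {milk, bread, butter} in 1 of 5, matching the example figures.
transactions = [{"milk", "bread"}, {"butter"}, {"beer"},
                {"milk", "bread", "butter"}, {"bread"}]

def support(itemset):
    """Proportion of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """supp(LHS ∪ RHS) / supp(LHS), an estimate of P(RHS | LHS)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                 # prints 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # prints 0.5
```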

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik−1} => {Ik}; by checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


· Find the frequent set Lk−1

· Join step: Ck is generated by joining Lk−1 with itself

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k)

Apriori pseudocode

Apriori(T, ε)
    L1 <- {large 1-itemsets that appear in more than ε transactions}
    k <- 2
    while L(k−1) ≠ Φ
        C(k) <- Generate(L(k−1))
        for transactions t ∈ T
            C(t) <- Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] <- count[c] + 1
        L(k) <- {c ∈ C(k) | count[c] ≥ ε}
        k <- k + 1
    return ∪k L(k)
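The pseudocode above can be sketched in Python; here Generate's join step and the prune step are folded into the candidate set comprehension and the support test (a compact sketch, not an optimized implementation):

```python
def apriori(transactions, min_support):
    """Frequent itemsets and their supports, following the pseudocode above."""
    n = len(transactions)
    def supp(s):
        return sum(s <= t for t in transactions) / n
    freq = {}
    # Large 1-itemsets first.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if supp(s) >= min_support}
    k = 1
    while level:
        freq.update({s: supp(s) for s in level})
        # Join: unions of frequent k-itemsets of size k+1; prune: drop any
        # candidate below min_support (covers the infrequent-subset rule too).
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {c for c in candidates if supp(c) >= min_support}
        k += 1
    return freq

transactions = [frozenset(t) for t in ({"milk", "bread"}, {"bread", "butter"},
                {"milk", "bread", "butter"}, {"milk", "bread"}, {"beer"})]
result = apriori(transactions, min_support=0.4)
print(result[frozenset({"milk", "bread"})])  # prints 0.6
```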

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is entitled J48 in Java and can be selected by clicking the Choose button

7) and selecting trees > J48

9) Select Test options "Use training set"


10) If needed, select the class attribute

11) Click Start

12) Now we can see the output details in the Classifier output panel

13) Right click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write:

P(C | F1, ..., Fn) = [p(C) p(F1, ..., Fn | C)] / p(F1, ..., Fn)

In plain English, the above equation can be written as:

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
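As a concrete sketch of this factored model, with made-up numbers (k = 2 classes, n = 3 Bernoulli features, hence 2n + 1 = 7 free parameters: one prior and p(Fi = 1 | C) per class/feature pair), the normalized posterior can be computed directly:

```python
# Hypothetical parameters for illustration only.
prior = {"C1": 0.6, "C2": 0.4}
p_given = {                       # p(Fi = 1 | C) for each class
    "C1": [0.9, 0.2, 0.5],
    "C2": [0.3, 0.7, 0.5],
}

def posterior(features):
    """p(C | F1..Fn) = p(C) * prod_i p(Fi | C) / Z, for 0/1 features."""
    unnorm = {}
    for c in prior:
        p = prior[c]
        for pi, f in zip(p_given[c], features):
            p *= pi if f else (1 - pi)
        unnorm[c] = p
    z = sum(unnorm.values())      # Z depends only on the features
    return {c: p / z for c, p in unnorm.items()}

print(posterior([1, 0, 1]))       # C1 is far more probable here
```

The scaling factor Z is simply the sum of the unnormalized scores, exactly as in the equation above.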

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

P(X | Ci) = ∏ (k = 1 … n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) … P(xn | Ci)
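A minimal count-based sketch of this decision rule (the data and attribute names are invented for illustration; Laplace smoothing is added so that unseen attribute values do not zero out the product):

```python
from collections import Counter, defaultdict

def train_nb(rows):
    """rows: list of (attribute_dict, class_label) tuples."""
    priors = Counter(y for _, y in rows)        # class counts
    cond = defaultdict(Counter)                 # cond[(c, a)][v] = count
    for x, y in rows:
        for a, v in x.items():
            cond[(y, a)][v] += 1
    return priors, cond, len(rows)

def predict_nb(model, x):
    priors, cond, n = model
    def score(c):                               # P(Ci) * prod_k P(xk | Ci)
        p = priors[c] / n
        for a, v in x.items():
            seen = cond[(c, a)]
            p *= (seen[v] + 1) / (priors[c] + len(seen) + 1)  # Laplace
        return p
    return max(priors, key=score)               # maximum a posteriori class

# Hypothetical toy data:
rows = [({"income": "high"}, "good"), ({"income": "high"}, "good"),
        ({"income": "low"}, "bad"), ({"income": "low"}, "bad")]
model = train_nb(rows)
print(predict_nb(model, {"income": "high"}))    # -> good
```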

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to Classify tab


7) Choose classifier "Tree".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select test option "Use training set".

10) If needed, select attributes.

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances        554        92.3333 %

Incorrectly Classified Instances       46         7.6667 %

Kappa statistic                         0.845

Mean absolute error                     0.1389

Root mean squared error                 0.2636

Relative absolute error                27.9979 %

Root relative squared error            52.9137 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class

0.894     0.052     0.935       0.894    0.914       0.936      YES


0.948     0.106     0.914       0.948    0.931       0.936      NO

Weighted Avg.   0.923   0.081   0.924   0.923   0.923   0.936

=== Confusion Matrix ===

a    b    <-- classified as

245   29 |  a = YES

17   309 |  b = NO

EXPERIMENT-5

Aim: To answer "Is testing a good idea?" by running the model on a supplied test set

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (Weka will run this test data set through the model we already created).

5) Compare the output results with that of the 4th experiment

Sample output

The exact figures depend on the supplied test set; different results can be experienced while practicing with different problem sets.

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.33 percent) and the Incorrectly Classified Instances (7.67 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.33 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.
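The headline figures above can be re-derived from the confusion matrix of the previous experiment (245 and 309 correct, 29 and 17 misclassified), as a quick sanity check:

```python
# Confusion matrix from the training-set evaluation (class YES vs NO).
tp, fn = 245, 29          # actual YES: correctly / incorrectly classified
fp, tn = 17, 309          # actual NO misclassified as YES / correct NOs

total = tp + fn + fp + tn
accuracy = (tp + tn) / total            # fraction classified correctly
precision_yes = tp / (tp + fp)          # of predicted YES, how many are YES
recall_yes = tp / (tp + fn)             # of actual YES, how many were found

print(round(accuracy, 4))       # 0.9233
print(round(precision_yes, 3))  # 0.935
print(round(recall_yes, 3))     # 0.894
```

These match the "Correctly Classified Instances" percentage and the YES-row Precision and Recall (TP Rate) in the sample output.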

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is complete when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
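The recursive partitioning described above can be sketched as a small ID3-style learner (the records and attribute names here are illustrative, not taken from the bank data):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attrs):
    """rows: list of (attribute_dict, label) pairs."""
    labels = [y for _, y in rows]
    # Recursion stops when the subset at a node is pure, or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]      # majority-label leaf
    def weighted_entropy(a):                             # quality of a split on a
        groups = defaultdict(list)
        for x, y in rows:
            groups[x[a]].append(y)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    best = min(attrs, key=weighted_entropy)              # most informative split
    subsets = defaultdict(list)
    for x, y in rows:
        subsets[x[best]].append((x, y))
    rest = [a for a in attrs if a != best]
    return {"attr": best,
            "branches": {v: build_tree(rs, rest) for v, rs in subsets.items()}}

# Illustrative records of the form (x, y):
rows = [({"income": "high", "children": "no"}, "car"),
        ({"income": "high", "children": "yes"}, "car"),
        ({"income": "low", "children": "no"}, "no_car")]
print(build_tree(rows, ["income", "children"]))
```

Each dict node is one interior node of the tree; each branch key is one possible value of the chosen input variable, as in the description above.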

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".


6) Go to Classify tab

7) Choose classifier "Tree".

8) Select J48.

9) Select test option "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances        539        89.8333 %

Incorrectly Classified Instances       61        10.1667 %

Kappa statistic                         0.7942

Mean absolute error                     0.167

Root mean squared error                 0.305

Relative absolute error                33.6511 %

Root relative squared error            61.2344 %

Total Number of Instances             600

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class

0.861     0.071     0.911       0.861    0.886       0.883      YES

0.929     0.139     0.889       0.929    0.909       0.883      NO

Weighted Avg.   0.898   0.108   0.899   0.898   0.898   0.883

=== Confusion Matrix ===

a    b    <-- classified as

236   38 |  a = YES

23   303 |  b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the save button in the top panel.


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose classifier "Tree".

16) Select the J48 tree.

17) Select test option "Use training set".

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes had any significant effect.

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose classifier "Tree".

9) Select J48.

10) Select test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes had any significant effect.

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Tree".

8) Select J48.

9) Select test option "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.


16) Select test option "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the DT accuracy over the validation set.

  – Stop when further pruning decreases the DT accuracy over the validation set.

Example rule:

IF (Children = yes) ∧ (income > 30000) THEN (car = Yes)
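The iterative pruning loop above can be sketched as follows, using a toy tree encoding invented for illustration (an internal node is a dict carrying its majority label; a leaf is a bare label):

```python
import copy

def classify(tree, x):
    while isinstance(tree, dict):
        tree = tree["branches"].get(x.get(tree["attr"]), tree["majority"])
    return tree

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def internal_nodes(tree, path=()):
    """Yield the branch-value path to every internal (prunable) node."""
    if isinstance(tree, dict):
        yield path
        for v, sub in tree["branches"].items():
            yield from internal_nodes(sub, path + (v,))

def pruned_at(tree, path):
    """Copy of the tree with the node at `path` replaced by its majority leaf."""
    t = copy.deepcopy(tree)
    if not path:
        return t["majority"]
    node = t
    for v in path[:-1]:
        node = node["branches"][v]
    node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
    return t

def reduced_error_prune(tree, validation):
    best = accuracy(tree, validation)
    while isinstance(tree, dict):
        # Pick the single pruning step that most helps validation accuracy...
        acc, path = max((accuracy(pruned_at(tree, p), validation), p)
                        for p in internal_nodes(tree))
        if acc < best:          # ...and stop once pruning starts to hurt.
            break
        tree, best = pruned_at(tree, path), acc
    return tree

# Toy over-fit tree and validation set (made up for this sketch):
tree = {"attr": "outlook", "majority": "no", "branches": {
    "sunny": {"attr": "wind", "majority": "yes",
              "branches": {"strong": "no", "weak": "yes"}},
    "rain": "no"}}
val = [({"outlook": "sunny", "wind": "weak"}, "yes"),
       ({"outlook": "sunny", "wind": "strong"}, "yes"),
       ({"outlook": "rain"}, "no")]
# Accuracy rises from 2/3 to 1.0 on this toy validation set.
print(accuracy(tree, val), accuracy(reduced_error_prune(tree, val), val))
```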

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Tree".

9) Select "NBTree", i.e., the naive Bayesian tree.


10) Select test option "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select attributes.

15) Now click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees".

9) Select "J48".

10) Select test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Rules".

9) Select "OneR".

10) Select test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose classifier "Rules".

9) Select "PART".

10) Select test option "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



#include <sys/shm.h>

static struct state {                 /* Internal record of attached segments */
        int     shmid;                /* shmid of attached segment */
        char    *shmaddr;             /* attach point */
        int     shmflg;               /* flags used on attach */
} ap[MAXnap];                         /* State of current attached segments */
static int nap;                       /* Number of currently attached segments */

char    *addr;                        /* address work variable */
register int i;                       /* work area */
register struct state *p;             /* ptr to current state entry */

/* Attach: record the segment, then call shmat(). The shmid, address and
   flags of the new entry are filled in elsewhere (e.g. from user input). */
p = &ap[nap++];
p->shmaddr = shmat(p->shmid, p->shmaddr, p->shmflg);
if (p->shmaddr == (char *)-1) {
        perror("shmop: shmat failed");
        nap--;
} else
        (void) fprintf(stderr, "shmop: shmat returned %#8.8x\n", p->shmaddr);

/* Detach: call shmdt() and drop the segment from the table. */
i = shmdt(addr);
if (i == -1) {
        perror("shmop: shmdt failed");
} else {
        (void) fprintf(stderr, "shmop: shmdt returned %d\n", i);
        for (p = ap, i = nap; i--; p++)
                if (p->shmaddr == addr)
                        *p = ap[--nap];
}

Algorithm

1. Start.
2. Create shared memory using the shmget() system call.
3. If successful, it returns a positive value (the segment identifier).
4. Attach the created shared memory using the shmat() system call.


5. Write to the shared memory through the attached address.
6. Read the contents from the shared memory through the attached address.
7. Detach with the shmdt() system call and end.

Source Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
        key_t key = IPC_PRIVATE;  /* or a key obtained from ftok() */
        int shmid;
        char *data;

        if (argc > 2) {
                fprintf(stderr, "usage: shmdemo [data_to_write]\n");
                exit(1);
        }
        if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
                perror("shmget");
                exit(1);
        }
        data = shmat(shmid, (void *)0, 0);
        if (data == (char *)(-1)) {
                perror("shmat");
                exit(1);
        }
        if (argc == 2) {
                printf("writing to segment: \"%s\"\n", argv[1]);
                strncpy(data, argv[1], SHM_SIZE);
        }
        if (shmdt(data) == -1) {
                perror("shmdt");
                exit(1);
        }
        return 0;
}

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German you should probably make use of it for this assignment (Unless you really can consult a real loan officer )

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

SampleOutput

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
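For instance, support and confidence can be computed over a five-transaction database consistent with the numbers above (the transactions themselves are made up, since the original table is not reproduced here):

```python
transactions = [{"milk", "bread"},
                {"bread", "butter"},
                {"beer"},
                {"milk", "bread", "butter"},
                {"bread"}]

def support(itemset):
    # Proportion of transactions containing the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # conf(X => Y) = supp(X u Y) / supp(X)
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 0.5
```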

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined as interesting or not. Then, other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.


· Find frequent set Lk−1.

· Join Step: Ck is generated by joining Lk−1 with itself.

· Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where

· Ck: candidate itemset of size k

· Lk: frequent itemset of size k

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}

    k ← 2

    while Lk−1 ≠ ∅

        Ck ← Generate(Lk−1)

        for transactions t ∈ T

            Ct ← Subset(Ck, t)

            for candidates c ∈ Ct

                count[c] ← count[c] + 1

        Lk ← {c ∈ Ck | count[c] ≥ ε}

        k ← k + 1

    return ∪k Lk
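The pseudocode above can be sketched compactly in Python (itemsets as frozensets; ε is the minimum support count):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (frozensets) mapped to their counts."""
    def count(cands):
        counts = {c: 0 for c in cands}
        for t in transactions:
            for c in cands:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)               # frequent 1-itemsets
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        cands = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        cands = {c for c in cands
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = count(cands)
        result.update(frequent)
        k += 1
    return result
```

With the five-transaction supermarket example and ε = 2, this yields the singletons (except beer) plus {milk, bread} and {bread, butter}.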

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select Start button

8) now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen; it is implemented as J48 in Java and can be selected by clicking the Choose button

7) and selecting trees → J48.

9) Select test option "Use training set".


10) If needed, select attributes.

11) Click Start.

12) Now we can see the output details in the Classifier output.

13) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

Bayes' theorem: P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier Derivation

• D: a set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ⋯ P(xn | Ci)
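The factored form p(C) ∏ p(xk | C) can be exercised numerically; a minimal sketch of class prediction by maximum posterior on a tiny hypothetical data set (invented here for illustration, not the bank data):

```python
from collections import Counter, defaultdict

# Tiny training set: each row is (features..., class). Hypothetical data.
rows = [
    ("red", "round", "apple"), ("red", "round", "apple"),
    ("green", "round", "apple"), ("yellow", "long", "banana"),
    ("yellow", "long", "banana"), ("green", "long", "banana"),
]

classes = Counter(r[-1] for r in rows)   # class counts, giving priors P(C)
feat = defaultdict(Counter)              # per-class feature-value counts
for *xs, c in rows:
    for i, x in enumerate(xs):
        feat[c][(i, x)] += 1

def predict(xs):
    # argmax over classes of P(C) * prod_k P(x_k | C)
    def score(c):
        p = classes[c] / len(rows)
        for i, x in enumerate(xs):
            p *= feat[c][(i, x)] / classes[c]
        return p
    return max(classes, key=score)

print(predict(("red", "round")))   # -> apple
```

A real implementation would add smoothing so unseen feature values do not zero out the whole product.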

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO
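The headline figures can be re-derived from the confusion matrix above; a quick check in Python using the four cell counts:

```python
# Confusion matrix from the output above: rows = actual, columns = predicted
tp, fn = 245, 29   # actual YES: predicted YES, predicted NO
fp, tn = 17, 309   # actual NO: predicted YES, predicted NO

n = tp + fn + fp + tn                     # 600 instances
accuracy = (tp + tn) / n                  # correctly classified fraction
# Chance agreement, as used by the kappa statistic
p_yes = ((tp + fn) / n) * ((tp + fp) / n)
p_no = ((fp + tn) / n) * ((fn + tn) / n)
kappa = (accuracy - (p_yes + p_no)) / (1 - (p_yes + p_no))

print(round(accuracy * 100, 4))   # 92.3333
print(round(kappa, 3))            # 0.845
```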

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced through the different problem solutions worked during practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives. The false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step remains in validating our classification tree, which is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we can see whether the accuracy of the model holds, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
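The attribute-value test chosen at each split of recursive partitioning can be scored by information gain; a minimal sketch with hypothetical records (C4.5 itself uses the related gain-ratio measure, so this is illustrative only):

```python
from math import log2
from collections import Counter

def entropy(labels):
    # H(S) = -sum p_i * log2(p_i) over the class proportions in S
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    # Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        sub = [r[target] for r in rows if r[attr] == v]
        remainder += len(sub) / n * entropy(sub)
    return base - remainder

# Hypothetical records, not the bank data
rows = [
    {"income": "high", "pep": "YES"}, {"income": "high", "pep": "YES"},
    {"income": "low", "pep": "NO"}, {"income": "low", "pep": "NO"},
]
print(info_gain(rows, "income", "pep"))   # 1.0: this split separates the classes perfectly
```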

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased
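The fold-splitting that "Cross-validation" performs can be sketched as follows (illustrative only; Weka's stratified folds additionally preserve the class proportions in each fold):

```python
import random

def kfold_indices(n, k, seed=0):
    # Shuffle the indices once, then deal them round-robin into k nearly equal folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 600 bank instances, 10 folds: each fold is held out once as the test set
# while the other nine folds are used to train the tree
folds = kfold_indices(600, 10)
train_for_fold0 = [j for f in folds[1:] for j in f]
print(len(folds), len(folds[0]), len(train_for_fold0))   # 10 60 540
```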

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to Open file and browse to the newly saved file (with the attribute deleted)

14) Go to the Classify tab

15) Choose classifier "trees"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select attributes

19) Click Start

20) Now we can see the output details in the Classifier output

21) Right-click on the result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing these attributes has any significant effect
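The effect of the Remove filter can be imitated outside Weka; a sketch that drops the first column of a small hypothetical CSV (the column names here are invented for illustration):

```python
import csv
import io

# Drop the first column ("id") from a CSV, mirroring "Remove -R 1"
src = io.StringIO("id,age,pep\n1,35,YES\n2,51,NO\n")
rows = list(csv.reader(src))
pruned = [row[1:] for row in rows]        # -R 1 removes attribute index 1
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(pruned)
print(out.getvalue())
```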

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose classifier "trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose classifier "trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select attributes

19) Click Start

20) Now we can see the output details in the Classifier output

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6
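Cost-sensitive evaluation weights each confusion-matrix cell by a user-supplied cost; a minimal sketch using the Experiment-4 confusion matrix and a hypothetical cost matrix (the cost values here are assumptions for illustration):

```python
# Rows = actual class, columns = predicted class
confusion = [[245, 29],   # actual YES: [predicted YES, predicted NO]
             [17, 309]]   # actual NO
# Hypothetical costs: a missed YES (false negative) costs 5, a false positive costs 1
cost = [[0, 5],
        [1, 0]]
total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
print(total_cost)   # 29*5 + 17*1 = 162
```

Changing the cost matrix changes which kind of misclassification the evaluation (and a cost-sensitive learner) tries hardest to avoid.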

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule, i.e., to check the bias, by training the data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among the attributes we want to study. It can be assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced-error Pruning and show the accuracy for the cross-validated training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the decision-tree accuracy over the validation set

  – Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
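The pruning loop described above can be sketched on a toy tree (an illustrative sketch, not Weka's implementation; a full implementation would compute each node's majority class from its own training instances, whereas here it is passed in for brevity):

```python
# A node is a dict: either {"label": ...} (leaf) or {"split": attr, "children": {...}}

def predict(node, x):
    while "label" not in node:
        node = node["children"][x[node["split"]]]
    return node["label"]

def accuracy(node, data):
    return sum(predict(node, x) == y for x, y in data) / len(data)

def prune(node, root, val, majority):
    # Post-order walk: consider pruning children before this node
    if "label" in node:
        return
    for child in node["children"].values():
        prune(child, root, val, majority)
    before = accuracy(root, val)
    saved = dict(node)
    node.clear()
    node["label"] = majority               # collapse the sub-tree to a leaf
    if accuracy(root, val) < before:       # keep the prune only if no worse
        node.clear()
        node.update(saved)

# The split on "b" is redundant (both branches predict NO), so it gets pruned
tree = {"split": "a", "children": {
    "0": {"label": "NO"},
    "1": {"split": "b", "children": {"0": {"label": "NO"}, "1": {"label": "NO"}}}}}
val = [({"a": "0"}, "NO"), ({"a": "1", "b": "0"}, "NO")]
prune(tree, tree, val, "NO")
print(tree)   # collapsed to a single leaf: {'label': 'NO'}
```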

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select "show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select attributes

15) Click Start

16) Now we can see the output details in the Classifier output

17) Right-click on the result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

14) Right-click on the result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier "rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose classifier "rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Click Start

13) Now we can see the output details in the Classifier output

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
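OneR's rule construction can be sketched as follows (a minimal illustration on hypothetical records, not the bank data): for each attribute, build one rule per value predicting the majority class, then keep the attribute whose rules make the fewest errors.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    # For each attribute, one rule per value (majority class); keep the
    # attribute with the lowest total error count
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[attr]][r[target]] += 1
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[1]:
            rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            best = (attr, errors, rules)
    return best

# Hypothetical records
rows = [
    {"accounting": "1", "science": "0", "class": "A"},
    {"accounting": "1", "science": "1", "class": "A"},
    {"accounting": "0", "science": "1", "class": "B"},
    {"accounting": "0", "science": "0", "class": "B"},
    {"accounting": "0", "science": "1", "class": "A"},
]
attr, errors, rules = one_r(rows, "class")
print(attr, errors, rules)   # accounting wins with 1 error
```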

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART



LINUX PROGRAMMING AND DATA MINING LAB MANUAL

5. Write to the shared memory segment through the pointer returned by shmat(). 6. Read the contents back from the shared memory segment through the same pointer. 7. End.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE 1024

int main(int argc, char *argv[])
{
    key_t key = ftok(".", 'R');  /* derive an IPC key (the original left key uninitialised) */
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        exit(1);
    }
    /* create (or open) the segment */
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1) {
        perror("shmget");
        exit(1);
    }
    /* attach the segment to our address space */
    data = shmat(shmid, (void *)0, 0);
    if (data == (char *)(-1)) {
        perror("shmat");
        exit(1);
    }
    if (argc == 2) {
        printf("writing to segment: \"%s\"\n", argv[1]);
        strncpy(data, argv[1], SHM_SIZE - 1);
    } else {
        printf("segment contains: \"%s\"\n", data);
    }
    /* detach */
    if (shmdt(data) == -1) {
        perror("shmdt");
        exit(1);
    }
    return 0;
}


return 0

Input: ./a.out koteswararao

Output: writing to segment: "koteswararao"

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible; interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment you first and foremost need some knowledge about the world of credit You can acquire such knowledge in a number of ways

1 Knowledge Engineering Find a loan officer who is willing to talk Interview her and try to represent her knowledge in the form of production rules

2 Books Find some training manuals for loan officers or perhaps a suitable textbook on finance Translate this knowledge from text form to production rule form


3 Common sense Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant

4 Case histories Find records of actual cases where competent loan officers correctly judged when not to approve a loan application

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such data set: an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these workers in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser

2) Select EXPLORER present in Applications


3) Select Preprocess Tab

4) Go to Open file and browse to the file that is already stored in the system, "bank.csv"

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute

Sample output

EXPERIMENT-2

Aim To identify the rules with some of the important attributes by a) manually and b) Using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
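The support and confidence computations can be checked directly; a short sketch on the supermarket example (the exact basket contents are assumed for illustration, chosen so that {milk, bread} appears in 2 of 5 transactions and {milk, bread, butter} in 1):

```python
# Hypothetical transaction database over I = {milk, bread, butter, beer}
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"beer"},
    {"bread"},
    {"milk"},
]

def supp(itemset):
    # Proportion of transactions containing the whole itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(lhs, rhs):
    # conf(X => Y) = supp(X u Y) / supp(X)
    return supp(lhs | rhs) / supp(lhs)

print(supp({"milk", "bread"}))              # 0.4
print(conf({"milk", "bread"}, {"butter"}))  # 0.5
```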

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database The problem is usually decomposed into two subproblems One is to find those itemsets whose occurrences exceed a predefined threshold in the database those itemsets are called frequent or large itemsets The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence

Suppose one of the large itemsets is Lk Lk = I1 I2 hellip Ik association rules with this itemsets are generated in the following way the first rule is I1 I2 hellip Ik1 and Ik by checking the confidence this rule can be determined as interesting or not Then other rule are generated by deleting the last items in the antecedent and inserting it to the consequent further the confidences of the new rules are checked to determine the interestingness of them Those processes iterated until the antecedent becomes empty Since the second subproblem is quite straight forward most of the researches focus on the first subproblem The Apriori algorithm finds the frequent sets L In Database D


• Find frequent set Lk−1

• Join step:

  Ck is generated by joining Lk−1 with itself

• Prune step:

  Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k)

Apriori Pseudocode

Apriori(T, ε)

    L1 ← {large 1-itemsets that appear in more than ε transactions}

    k ← 2

    while L(k−1) ≠ ∅

        C(k) ← Generate(L(k−1))

        for transactions t ∈ T

            C(t) ← Subset(C(k), t)

            for candidates c ∈ C(t)

                count[c] ← count[c] + 1

        L(k) ← {c ∈ C(k) | count[c] ≥ ε}

        k ← k + 1

    return ⋃k L(k)
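The Apriori pseudocode corresponds to the following sketch (a minimal implementation for illustration, not Weka's; the transactions reuse the supermarket example with assumed basket contents):

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise search: frequent k-itemsets are built from (k-1)-itemsets
    items = sorted({i for t in transactions for i in t})
    def support(itemset):
        return sum(set(itemset) <= t for t in transactions)
    L = [frozenset([i]) for i in items if support([i]) >= min_support]
    result = list(L)
    k = 2
    while L:
        # Join step: candidates of size k from the previous level
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        prev = set(L)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        L = [c for c in candidates if support(c) >= min_support]
        result += L
        k += 1
    return result

ts = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"beer"},
      {"bread"}, {"milk"}]
print(sorted(sorted(s) for s in apriori(ts, 2)))   # frequent itemsets at min support 2
```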

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" into Weka by Open file in the Preprocess tab

4) Select only Nominal values

5) Go to Associate Tab

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes The goal of classification is to accurately predict the target class for each case in the data For example a classification model could be used to identify loan applicants as low medium or high credit risks

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.

3) Select the Preprocess tab.

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button.

7) Select trees > J48.

8) Select Test options "Use training set".

9) If needed, select an attribute.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C45 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

Bayes' theorem:

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

– P(X | Ci) = ∏k=1..n P(xk | Ci)

– P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ⋯ P(xn | Ci)
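The product rule above translates directly into counting code. The following is a minimal categorical naive Bayes sketch with made-up fruit data for illustration; it is not Weka's implementation, and all names are hypothetical:

```python
# Naive Bayes: P(Ci|X) is proportional to P(Ci) * product_k P(xk|Ci),
# with every probability estimated from simple counts.
from collections import Counter, defaultdict

# Toy training tuples: (attribute vector, class).
train = [(("red", "round"), "apple"), (("red", "round"), "apple"),
         (("red", "long"), "pepper"), (("green", "round"), "apple"),
         (("green", "long"), "pepper"), (("red", "round"), "pepper")]

class_counts = Counter(c for _, c in train)
# cond[(position, value, class)] = how often the value occurs in the class
cond = defaultdict(int)
for x, c in train:
    for k, v in enumerate(x):
        cond[(k, v, c)] += 1

def posterior_score(x, c):
    """Unnormalized P(c) * product of P(x_k | c); P(X) is constant, dropped."""
    score = class_counts[c] / len(train)
    for k, v in enumerate(x):
        score *= cond[(k, v, c)] / class_counts[c]
    return score

x = ("red", "round")
scores = {c: posterior_score(x, c) for c in class_counts}
print(max(scores, key=scores.get))   # → apple
```

With this data, P(apple) · P(red|apple) · P(round|apple) = 0.5 · 2/3 · 1 beats the corresponding product for "pepper", so "apple" is predicted.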

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select an attribute.

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %

Incorrectly Classified Instances     46       7.6667 %

Kappa statistic                       0.845

Mean absolute error                   0.1389

Root mean squared error               0.2636

Relative absolute error              27.9979 %

Root relative squared error          52.9137 %

Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.894    0.052    0.935      0.894   0.914      0.936     YES

0.948    0.106    0.914      0.948   0.931      0.936     NO

Weighted Avg.  0.923  0.081  0.924  0.923  0.923  0.936

=== Confusion Matrix ===

  a   b   <-- classified as

245  29 |  a = YES

 17 309 |  b = NO
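The summary figures above can be recomputed directly from the confusion matrix. A minimal sketch (illustrative Python, not Weka code):

```python
# Recompute Weka's summary statistics from the confusion matrix above.
# Rows are actual classes (YES, NO); columns are predicted classes.
conf = [[245, 29],   # actual YES: 245 predicted YES, 29 predicted NO
        [17, 309]]   # actual NO:  17 predicted YES, 309 predicted NO

total = sum(sum(row) for row in conf)
correct = conf[0][0] + conf[1][1]
accuracy = correct / total

# Per-class rates for the YES class (index 0).
tp, fn = conf[0][0], conf[0][1]
fp, tn = conf[1][0], conf[1][1]
tp_rate = tp / (tp + fn)          # recall for YES
fp_rate = fp / (fp + tn)
precision = tp / (tp + fp)
f_measure = 2 * precision * tp_rate / (precision + tp_rate)

print(f"accuracy  = {accuracy:.4f}")   # 0.9233 (92.33 %)
print(f"TP rate   = {tp_rate:.3f}")    # 0.894
print(f"FP rate   = {fp_rate:.3f}")    # 0.052
print(f"precision = {precision:.3f}")  # 0.935
print(f"F-measure = {f_measure:.3f}")  # 0.914
```

The values match the YES row of the Detailed Accuracy table, confirming how each column is derived.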

EXPERIMENT-5

Aim: To answer "Is testing a good idea?" by running a supplied test set through the model.

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation of the training data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
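The splitting criterion behind such trees can be illustrated with a small entropy/information-gain computation. This is an ID3-style sketch on invented data, for illustration only; J48 (C4.5) uses the related gain-ratio criterion and is more elaborate:

```python
import math

# Toy records (x, y): one boolean input attribute and a class label.
data = [("yes", "A"), ("yes", "A"), ("yes", "B"),
        ("no", "B"), ("no", "B"), ("no", "B")]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(rows):
    """Entropy reduction from splitting on the single input attribute."""
    labels = [y for _, y in rows]
    before = entropy(labels)
    after = 0.0
    for value in set(x for x, _ in rows):
        subset = [y for x, y in rows if x == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

print(round(information_gain(data), 3))   # → 0.459
```

Recursive partitioning repeatedly picks the attribute with the highest such gain, splits, and recurses on each subset.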

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select an attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.
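The 10-fold cross-validation performed in step 9 can be sketched as follows. The majority-class "classifier" here is a stand-in for Weka's model building, chosen only so the sketch is self-contained:

```python
# Sketch of k-fold cross-validation with a trivial majority-class model.
# 'train' and 'accuracy' stand in for Weka's model building/evaluation.
from collections import Counter

def train(rows):
    # Majority-class "classifier": always predicts the most common label.
    return Counter(y for _, y in rows).most_common(1)[0][0]

def accuracy(model, rows):
    return sum(1 for _, y in rows if y == model) / len(rows)

def cross_validate(rows, k=10):
    folds = [rows[i::k] for i in range(k)]   # k disjoint folds
    scores = []
    for i in range(k):
        test = folds[i]                      # hold out fold i
        training = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(accuracy(train(training), test))
    return sum(scores) / k                   # mean held-out accuracy

data = [(n, "YES" if n % 3 else "NO") for n in range(60)]
print(round(cross_validate(data, 10), 3))   # → 0.667
```

Each record is used exactly once for testing and k−1 times for training, which is why cross-validated accuracy is usually a more honest estimate than testing on the training set.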

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %

Incorrectly Classified Instances     61      10.1667 %

Kappa statistic                       0.7942

Mean absolute error                   0.167

Root mean squared error               0.305

Relative absolute error              33.6511 %

Root relative squared error          61.2344 %

Total Number of Instances           600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class

0.861    0.071    0.911      0.861   0.886      0.883     YES

0.929    0.139    0.889      0.929   0.909      0.883     NO

Weighted Avg.  0.898  0.108  0.899  0.898  0.898  0.883

=== Confusion Matrix ===

  a   b   <-- classified as

236  38 |  a = YES

 23 303 |  b = NO
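The Kappa statistic reported above corrects raw accuracy for chance agreement, and can be verified from this confusion matrix (a small sketch):

```python
# Verify the Kappa statistic from the stratified cross-validation
# confusion matrix: kappa = (p_o - p_e) / (1 - p_e).
conf = [[236, 38],   # actual YES
        [23, 303]]   # actual NO

total = sum(sum(row) for row in conf)
p_o = (conf[0][0] + conf[1][1]) / total   # observed accuracy

# Chance agreement: product of row and column marginals, summed per class.
p_e = sum((conf[i][0] + conf[i][1]) * (conf[0][i] + conf[1][i])
          for i in range(2)) / total ** 2

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 4))   # → 0.7942
```

A kappa near 0.8 means the classifier agrees with the true labels far more than a chance-level predictor with the same class proportions would.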


EXPERIMENT-7

Aim: Delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to OPEN file and browse the file that is newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select an attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Right click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: Select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation of the training data set, changing the cost matrix in the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.

16) Select Test options "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select an attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and the relationships among attributes we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and show the accuracy for the cross-validation-trained data set, using the Weka mining tool.

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node.

– Making the pruned node a leaf node.

– Assigning the pruned node the most common classification of the training instances attached to that node.

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set.

– Stop when further pruning decreases the decision-tree accuracy over the validation set.

IF (children = yes) ∧ (income > 30000)

THEN (car = yes)
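The pruning procedure above can be sketched on a toy tree. This is an illustrative bottom-up reduced-error pruning pass over a hand-built tree (the data, attribute names, and tree are invented; this is not Weka's J48 code):

```python
# Reduced-error pruning sketch. A node is either a class label (a leaf)
# or a tuple (attribute, {value: subtree}).
from collections import Counter

tree = ("children", {"yes": ("income", {"high": "car", "low": "no_car"}),
                     "no": "no_car"})

# Validation records: (attribute dict, true class).
valid = ([({"children": "yes", "income": "high"}, "car")] * 5 +
         [({"children": "yes", "income": "low"}, "car")] * 4 +
         [({"children": "yes", "income": "low"}, "no_car")] +
         [({"children": "no", "income": "low"}, "no_car")] * 5)

def classify(node, row):
    while not isinstance(node, str):
        attr, branches = node
        node = branches[row[attr]]
    return node

def accuracy(node, rows):
    return sum(classify(node, r) == y for r, y in rows) / len(rows)

def prune(node, rows):
    """Bottom-up: replace a subtree by its majority-class leaf whenever
    the leaf does no worse on the validation rows reaching that node."""
    if isinstance(node, str) or not rows:
        return node
    attr, branches = node
    node = (attr, {v: prune(st, [(r, y) for r, y in rows if r[attr] == v])
                   for v, st in branches.items()})
    leaf = Counter(y for _, y in rows).most_common(1)[0][0]
    return leaf if accuracy(leaf, rows) >= accuracy(node, rows) else node

print(prune(tree, valid))
# The overfit "income" split is collapsed into the single leaf "car".
```

The "income" subtree misclassifies 4 of the 10 validation rows that reach it, while the majority leaf "car" misclassifies only 1, so that subtree is pruned; the root split survives because removing it would hurt validation accuracy.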

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Tree".

9) Select "NBTree", i.e., the naive Bayesian tree.

10) Select Test options "Use training set".

11) Right click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select an attribute.

15) Now click Start.

16) Now we can see the output details in the Classifier output.

17) Right click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select an attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
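OneR's idea — for each attribute, build one rule per attribute value (predict the majority class) and keep the attribute whose rules make the fewest training errors — can be sketched as follows. The toy data is invented to mirror the rules quoted above; this is an illustration, not Weka's OneR code:

```python
# OneR sketch: one rule per value of each attribute; keep the attribute
# with the fewest total training errors.
from collections import Counter, defaultdict

# Toy records: (attribute dict, class label).
data = [({"accounting": 1, "maths": 0}, "A")] * 7 + \
       [({"accounting": 0, "maths": 0}, "B")] * 9 + \
       [({"accounting": 0, "maths": 1}, "A")] * 4

def one_r(rows):
    best = None
    for attr in rows[0][0]:
        by_value = defaultdict(Counter)
        for r, y in rows:
            by_value[r[attr]][y] += 1          # class counts per value
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values())
                     for c in by_value.values())
        if best is None or errors < best[1]:
            best = (attr, errors, rules)
    return best

attr, errors, rules = one_r(data)
print(attr, errors, rules)
```

With this data the winning attribute is "accounting", and its rules mirror the ones quoted above: accounting = 1 → A (0/7 errors) and accounting = 0 → B (4/13 errors).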

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


return 0;

Input: ./a.out koteswararao

Output: writing to segment: koteswararao

Data Mining Lab

Credit Risk Assessment

Description: The business of banks is making loans. Assessing the credit worthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank's business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible. Interest on these loans is the bank's profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank's loan policy must involve a compromise: not too strict, and not too lenient.

To do the assignment, you first and foremost need some knowledge about the world of credit. You can acquire such knowledge in a number of ways.

1. Knowledge Engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.

2. Books: Find some training manuals for loan officers, or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim: To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.

3) Select the Preprocess tab.

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute.

Sample output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka.

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread, the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
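The support and confidence figures above can be computed directly from the transaction table. The five transactions below are illustrative, chosen to reproduce the 0.4 and 0.5 values quoted in the text:

```python
# Support and confidence for the rule {milk, bread} => {butter}.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"butter", "beer"},
    {"milk"},
    {"bread", "beer"},
]

def supp(itemset):
    """Proportion of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(lhs, rhs):
    """conf(X => Y) = supp(X union Y) / supp(X)."""
    return supp(lhs | rhs) / supp(lhs)

print(supp({"milk", "bread"}))               # → 0.4
print(conf({"milk", "bread"}, {"butter"}))   # → 0.5
```

The `<=` operator on Python sets is the subset test, so `supp` counts exactly the transactions that contain the whole itemset.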

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk, Lk = {I1, I2, …, Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined as interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D as follows:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 58

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

• Find the frequent set Lk−1.

• Join step: Ck is generated by joining Lk−1 with itself.

• Prune step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        C(k) ← Generate(L(k−1))
        for transactions t ∈ T
            C(t) ← Subset(C(k), t)
            for candidates c ∈ C(t)
                count[c] ← count[c] + 1
        L(k) ← {c ∈ C(k) | count[c] ≥ ε}
        k ← k + 1
    return ⋃k L(k)
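The pseudocode can be turned into a compact runnable sketch of the level-wise idea (an illustration, not Weka's Apriori; the join step here is simplified to unions of frequent sets differing by one item):

```python
# Level-wise Apriori: generate candidate k-itemsets from frequent
# (k-1)-itemsets, count support, and keep those meeting the threshold.
def apriori(transactions, min_count):
    items = sorted({i for t in transactions for i in t})
    frequent = []
    k_sets = [frozenset([i]) for i in items]   # candidate 1-itemsets
    while k_sets:
        # Count support of each candidate (subset test against each txn).
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        level = [c for c in k_sets if counts[c] >= min_count]
        frequent.extend(level)
        # Join step: unions of frequent sets that grow the size by one.
        k = len(level[0]) + 1 if level else 0
        k_sets = list({a | b for a in level for b in level
                       if len(a | b) == k})
    return frequent

transactions = [frozenset(t) for t in
                [{"milk", "bread", "butter"}, {"milk", "bread"},
                 {"butter", "beer"}, {"milk"}, {"bread", "beer"}]]
for itemset in apriori(transactions, 2):
    print(sorted(itemset))
# Prints the four frequent 1-itemsets, then ['bread', 'milk'].
```

With a minimum count of 2 on the toy database from the theory section, the only frequent 2-itemset is {bread, milk}, matching the support computation shown earlier.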

Procedure

1) Given the Bank database for mining.

2) Select EXPLORER in the Weka GUI Chooser.

3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm via the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target; the other attributes would be the predictors; and the data for each customer would constitute a case.

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER from Applications.

3) Select the Preprocess tab.

4) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

5) Go to the Classify tab.

6) Click the Choose button and select trees > J48. (Here the C4.5 algorithm is used, which is implemented in Java under the name J48.)

7) Under Test options, select "Use training set".

8) If needed, select attributes.

9) Click Start.

10) Now we can see the output details in the Classifier output panel.

11) Right-click on the result list and select the "visualize tree" option.

Sample output


The decision tree constructed by the implemented C4.5 (J48) algorithm.


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e., testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4″ in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix. The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write P(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯

= p(C) ∏i p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · … · P(xn | Ci)
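Under the class-conditional independence assumption, training and prediction reduce to frequency counting. A minimal sketch in Python (the toy fruit records are hypothetical, not the bank dataset used in the experiments):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(C) and P(x_k | C) by frequency counting."""
    n = len(labels)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}
    cond = defaultdict(Counter)          # (class, attribute index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(c, k)][v] += 1
    return prior, cond

def classify_nb(prior, cond, row):
    """Maximum posteriori class: argmax_C P(C) * prod_k P(x_k | C)."""
    best, best_p = None, -1.0
    for c, pc in prior.items():
        p = pc
        for k, v in enumerate(row):
            counts = cond[(c, k)]
            p *= counts[v] / sum(counts.values())   # P(x_k = v | C = c)
        if p > best_p:
            best, best_p = c, p
    return best

# Hypothetical toy records: (colour, shape) -> fruit
rows = [("red", "round"), ("red", "round"), ("green", "long"), ("green", "round")]
labels = ["apple", "apple", "banana", "apple"]
prior, cond = train_nb(rows, labels)
print(classify_nb(prior, cond, ("red", "round")))   # apple
```

In practice a Laplace (add-one) correction is applied so that an unseen attribute value does not zero out the whole product; Weka's naive Bayes implementations handle this internally.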

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Under Test options, select "Use training set".

10) If needed, select attributes.

11) Now start Weka.

12) Now we can see the output details in the Classifier output panel.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

The exact figures will vary with the test set supplied; this can be experienced by trying different problem sets while practising.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.
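The headline figures can be re-derived directly from the confusion matrix itself; a quick sketch (taking YES as the positive class):

```python
# Confusion matrix from the training-set evaluation (YES = positive class):
#                  predicted YES   predicted NO
tp, fn = 245, 29   # actual YES
fp, tn = 17, 309   # actual NO

total = tp + fn + fp + tn                 # 600 instances
accuracy = (tp + tn) / total              # correctly classified fraction
tp_rate_yes = tp / (tp + fn)              # recall (TP rate) for class YES
precision_yes = tp / (tp + fp)            # precision for class YES

print(round(accuracy, 4))                 # 0.9233 -> 92.33 percent
print(round(tp_rate_yes, 3))              # 0.894
print(round(precision_yes, 3))            # 0.935
```

These match the 92.3333 % accuracy and the YES row (TP Rate 0.894, Precision 0.935) of the detailed accuracy table above.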

EXPERIMENT-6

Aim To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
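The attribute-value test chosen at each split is typically the one that most reduces class entropy (information gain, as in C4.5). A small sketch over hypothetical loan records:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index attr."""
    n = len(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in by_value.values())
    return entropy(labels) - remainder

# Hypothetical loan records: (employed, housing) -> credit class
rows = [("yes", "own"), ("yes", "rent"), ("no", "rent"), ("no", "own")]
labels = ["good", "good", "bad", "bad"]
print(information_gain(rows, labels, 0))   # 1.0: 'employed' separates the classes
print(information_gain(rows, labels, 1))   # 0.0: 'housing' is uninformative
```

Recursive partitioning repeatedly picks the highest-gain attribute, splits the records, and recurses on each subset until a subset is pure or no split helps.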

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "trees".

8) Select J48.

9) Under Test options, select "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output panel.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.
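The 10-fold cross-validation that Weka performs can be sketched as follows. A trivial majority-class learner stands in for J48 here, and the data is illustrative; Weka additionally stratifies the folds so each keeps the overall class proportions:

```python
from collections import Counter

def cross_validate(rows, labels, train_fn, predict_fn, folds=10):
    """Estimate accuracy by k-fold cross-validation: each instance is
    held out exactly once and predicted by a model trained on the rest."""
    n = len(rows)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))          # every folds-th instance
        train = [i for i in range(n) if i not in test_idx]
        model = train_fn([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
    return correct / n

# Trivial stand-in classifier: always predicts the majority training class.
train_fn = lambda rows, labels: Counter(labels).most_common(1)[0][0]
predict_fn = lambda model, row: model

rows = [(i,) for i in range(10)]
labels = ["YES"] * 7 + ["NO"] * 3
print(cross_validate(rows, labels, train_fn, predict_fn))   # 0.7
```

Because every prediction is made on data the model did not see, the cross-validated accuracy (89.83 % below) is usually lower, and more honest, than the training-set accuracy of experiment 4.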

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to Open file and browse to the newly saved (attribute-deleted) file.

14) Go to the Classify tab.

15) Choose classifier "trees".

16) Select the J48 tree.

17) Under Test options, select "Use training set".

18) If needed, select attributes.

19) Now start Weka.

20) Now we can see the output details in the Classifier output panel.

21) Right-click on the result list and select the "visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select from the attributes list the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose classifier "trees".

9) Select J48.

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose classifier "trees".

8) Select J48.

9) Under Test options, select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output panel.

16) Under Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) Now start Weka.

19) Now we can see the output details in the Classifier output panel.

20) Compare the results of steps 15 and 19.

21) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule, i.e., to check the bias, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the relationships among attributes that we want to study. It can be decided based on the database and the user requirement.

EXPERIMENT-11

Aim To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– removing the sub-tree rooted at the pruned node;

– making the pruned node a leaf node;

– assigning the pruned node the most common classification of the training instances attached to that node.

• Pruning nodes iteratively:

– always select a node whose removal most increases the decision-tree accuracy over the validation set;

– stop when further pruning decreases the decision-tree accuracy over the validation set.

Example rule: IF (Children = yes) Λ (income > 30000) THEN (car = Yes)
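The pruning loop above can be sketched in a few lines. The node representation here is a hypothetical toy structure (not Weka's internal one), and the example tree and validation records are invented for illustration:

```python
# A node is ("leaf", cls) or ("split", attr, {value: subtree}, majority_cls).

def predict(node, row):
    if node[0] == "leaf":
        return node[1]
    _, attr, children, majority = node
    child = children.get(row[attr])
    return predict(child, row) if child else majority

def accuracy(node, rows, labels):
    return sum(predict(node, r) == y for r, y in zip(rows, labels)) / len(labels)

def reduced_error_prune(node, val_rows, val_labels):
    """Bottom-up: collapse a subtree to its majority-class leaf whenever
    the leaf does no worse on the validation examples reaching this node."""
    if node[0] == "leaf" or not val_rows:
        return node
    _, attr, children, majority = node
    # Route validation examples to the matching child; prune children first.
    buckets = {}
    for r, y in zip(val_rows, val_labels):
        buckets.setdefault(r[attr], ([], []))[0].append(r)
        buckets[r[attr]][1].append(y)
    children = {v: reduced_error_prune(c, *buckets.get(v, ([], [])))
                for v, c in children.items()}
    node = ("split", attr, children, majority)
    leaf_acc = sum(y == majority for y in val_labels) / len(val_labels)
    return ("leaf", majority) if leaf_acc >= accuracy(node, val_rows, val_labels) else node

# An over-fit split that the validation data does not support:
tree = ("split", 0, {"a": ("leaf", "yes"), "b": ("leaf", "no")}, "yes")
val_rows, val_labels = [("a",), ("a",), ("b",)], ["yes", "no", "yes"]
pruned = reduced_error_prune(tree, val_rows, val_labels)
print(pruned)   # ('leaf', 'yes'): the split was pruned away
```

Here the split is only right on 1 of 3 validation cases, while the majority leaf gets 2 of 3, so the subtree is replaced by a leaf, exactly the criterion stated above.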

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "trees".

9) Select "NBTree", i.e., the naive Bayesian tree.

10) Under Test options, select "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning setting as needed.

14) If needed, select attributes.

15) Now start Weka.

16) Now we can see the output details in the Classifier output panel.

17) Right-click on the result list and select the "visualize tree" option.

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "trees".

9) Select "J48".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "rules".

9) Select "OneR".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output panel.

Procedure for ldquoPARTrdquo

1) Given the bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "rules".

9) Select "PART".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Now start Weka.

13) Now we can see the output details in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (error = 0, coverage = 7 instances)

IF accounting = 0 THEN class = B (error = 4/13, coverage = 13 instances)
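OneR builds exactly this kind of single-attribute rule set: one rule per attribute value, predicting the majority class for that value, keeping the attribute whose rules make the fewest training errors. A sketch over hypothetical records (the accounting/science data shown is invented to mirror the rules above):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """OneR: for each attribute, build one rule per attribute value
    (predict the majority class for that value); keep the attribute
    whose rule set makes the fewest errors on the training data."""
    best = None
    for attr in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[attr]][y] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(n for v, c in by_value.items()
                     for y, n in c.items() if y != rules[v])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical records: (accounting, science) -> class
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["A", "A", "B", "B", "A"]
attr, rules, errors = one_r(rows, labels)
print(attr, rules, errors)   # attribute 0 (accounting) wins with 1 training error
```

J48 and PART will usually fit the training data more closely than such a one-attribute rule, which is the comparison this experiment asks for.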

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can be used to judge the credit-worthiness of a loan applicant.

4. Case histories: Find records of actual cases where competent loan officers correctly judged when not to approve a loan application.

The German Credit Data

Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such dataset (original): an Excel spreadsheet version of the German credit data (download from the web).

In spite of the fact that the data is German, you should probably make use of it for this assignment. (Unless you really can consult a real loan officer!)

A few notes on the German dataset:

DM stands for Deutsche Mark, the unit of currency, worth about 90 cents Canadian (but looks and acts like a quarter).

Owns_telephone: German phone rates are much higher than in Canada, so fewer people own telephones.

Foreign_worker: There are millions of these in Germany (many from Turkey). It is very hard to get German citizenship if you were not born of German parents.

There are 20 attributes used in judging a loan applicant. The goal is to classify the applicant into one of two categories: good or bad.

Subtasks (Turn in your answers to the following tasks)

Laboratory Manual For Data Mining

EXPERIMENT-1

Aim To list all the categorical (or nominal) attributes and the real-valued attributes using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER from Applications.

3) Select the Preprocess tab.

4) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

5) Clicking on any attribute in the left panel will show the basic statistics for that selected attribute.

Sample output

EXPERIMENT-2

Aim To identify the rules involving some of the important attributes, a) manually and b) using Weka

Tools Apparatus Weka mining tool

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where

X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side or LHS) and consequent (right-hand side or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40 % of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50 % of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
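Both measures can be computed directly from the transactions. A sketch over a hypothetical five-transaction database chosen to be consistent with the numbers quoted above (the original table did not survive extraction):

```python
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    """Proportion of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(RHS | LHS): supp(LHS u RHS) / supp(LHS)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                 # 0.4
print(confidence({"milk", "bread"}, {"butter"}))  # 0.5
```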

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way. The first rule is {I1, I2, …, Ik−1} ⇒ {Ik}; by checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidences of the new rules are then checked to determine their interestingness. This process iterates until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D.


• Find frequent set Lk−1.

• Join step:

Ck is generated by joining Lk−1 with itself.

• Prune step:

Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k)

(Lk: frequent itemset of size k)

Apriori pseudocode:

Apriori(T, ε)

L1 ← {large 1-itemsets that appear in more than ε transactions}

k ← 2

while Lk−1 ≠ ∅

Ck ← Generate(Lk−1)

for transactions t ∈ T

Ct ← Subset(Ck, t)

for candidates c ∈ Ct

count[c] ← count[c] + 1

Lk ← {c ∈ Ck | count[c] ≥ ε}

k ← k + 1

return ∪k Lk
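The join and prune steps above can be sketched in a few lines of Python. The transactions are a toy example (minimum support count of 2), not the bank data:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return all itemsets appearing in at least min_count transactions."""
    count = lambda s: sum(s <= t for t in transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in current if count(s) >= min_count}       # L1
    frequent = set(current)
    k = 2
    while current:
        # Join step: candidate k-itemsets from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_count}
        frequent |= current
        k += 1
    return frequent

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread"}, {"beer"}]
result = apriori(transactions, 2)
print(result)   # {milk}, {bread} and {milk, bread} are frequent
```

Rule generation then runs over each frequent itemset, checking the confidence of each candidate antecedent/consequent split, which is the second, easier subproblem described above.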

Procedure

1) Given the bank database for mining.

2) Select EXPLORER in the WEKA GUI Chooser.

3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the Choose button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim To create a decision tree by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER, present in Applications.

3) Select the Preprocess tab.

4) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java; it can be selected by clicking the Choose button

7) and selecting trees > J48.

8) Select Test options "Use training set".

9) If needed, select attributes.

10) Click Start.

11) Now we can see the output details in the Classifier output.

12) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm.


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above created decision tree model, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C) = p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= … = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) … p(Fn | C, F1, F2, F3, …, Fn−1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) …

= p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h|D): Probability of h given D


• P(D|h): Probability of D given h

Naïve Bayes Classifier Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• Maximum Posteriori Hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier Derivation

• With many attributes, it is computationally expensive to evaluate P(X|Ci).

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)
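As an illustration of the counting described above, the sketch below estimates P(Ci) and P(xk|Ci) from frequencies and picks the maximum-posteriori class. The tiny weather-style data set and its attribute values are invented for demonstration; this is not Weka's implementation:

```python
from collections import Counter, defaultdict

# Hypothetical training data: each row is ((attribute values), class label)
rows = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("rainy", "yes"), "no"),
    (("sunny", "no"), "yes"),
]

# Estimate P(Ci) and P(xk | Ci) by counting frequencies in the data.
class_counts = Counter(label for _, label in rows)
feat_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
for features, label in rows:
    for k, value in enumerate(features):
        feat_counts[(k, label)][value] += 1

def posterior_scores(features):
    """Score each class Ci by P(Ci) * prod_k P(xk | Ci)."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total
        for k, value in enumerate(features):
            p *= feat_counts[(k, c)][value] / n_c
        scores[c] = p
    return scores

scores = posterior_scores(("sunny", "no"))
prediction = max(scores, key=scores.get)  # the maximum-posteriori class
```

P(X) is never computed, exactly as in the "Maximize P(X|Ci) P(Ci), as P(X) is constant" step above.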

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select "NBTree", i.e., the Naive Bayesian tree.

9) Select Test options "Use training set".

10) If needed, select attributes.

11) Now click Start.

12) Now we can see the output details in the Classifier output.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO
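The summary figures above can be re-derived by hand from the confusion matrix, which is a useful check when reading Weka output. A short sketch, treating YES as the positive class:

```python
# Confusion matrix from the output above: rows = actual class, cols = predicted
yes_yes, yes_no = 245, 29   # actual YES predicted YES / predicted NO
no_yes, no_no = 17, 309     # actual NO  predicted YES / predicted NO

total = yes_yes + yes_no + no_yes + no_no
accuracy = (yes_yes + no_no) / total          # correctly classified fraction
precision = yes_yes / (yes_yes + no_yes)      # precision for class YES
recall = yes_yes / (yes_yes + yes_no)         # TP rate for class YES

print(round(accuracy, 4), round(precision, 3), round(recall, 3))
# 0.9233 0.935 0.894 -- matching the summary and the YES row above
```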

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set we used to create the model.

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be explored through different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a Decision tree by cross-validation of the training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable, given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
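The splitting step described above can be illustrated with a small information-gain computation. The records and attribute names below are invented for the example; note that C4.5/J48 itself uses the related gain-ratio criterion rather than raw gain:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, labels):
    """Reduction in entropy obtained by splitting `rows` on `attr`."""
    n = len(rows)
    split_entropy = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

# Invented records (x, y): 'children' separates the target perfectly,
# while 'income' carries no information about it.
rows = [{"children": "yes", "income": "high"},
        {"children": "yes", "income": "low"},
        {"children": "no", "income": "high"},
        {"children": "no", "income": "low"}]
labels = ["car", "car", "no car", "no car"]
best_attr = max(rows[0], key=lambda a: information_gain(rows, a, labels))
```

Recursive partitioning repeats this choice on each subset until a node is pure or no split adds value.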

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".


6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds", e.g., 10.

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.
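The 10-fold cross-validation selected in steps 9 and 10 can be sketched as follows. This illustrative routine only shows how folds are formed and rotated; Weka additionally stratifies the folds so class proportions are preserved:

```python
def cross_validation_folds(n_instances, k=10):
    """Partition instance indices into k folds; each fold is used once
    as the test set while the remaining folds form the training set."""
    folds = [list(range(n_instances))[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With the 600 instances of bank.csv and Folds = 10, each model is
# trained on 540 instances and evaluated on the remaining 60.
splits = list(cross_validation_folds(600, k=10))
```

The reported accuracy is the aggregate over all k test folds, which is why it is usually a more honest estimate than testing on the training set.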

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to OPEN file and browse to the file that was newly saved (the attribute-deleted file).

14) Go to the Classify tab.

15) Choose Classifier "Tree".

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose Classifier "Tree".

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a Decision tree by cross-validation of the training data set, changing the cost matrix in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose Classifier "Tree".

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output.


16) Select Test options "Cross-validation".

17) Set "Folds", e.g., 10.

18) If needed, select attributes.

19) Now click Start.

20) Now we can see the output details in the Classifier output.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.
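The cost-sensitive evaluation of step 11 weights each cell of the confusion matrix by the corresponding cost-matrix entry. A minimal sketch, using the confusion counts from Experiment 4 and a purely hypothetical cost matrix:

```python
# Confusion matrix from Experiment 4 (rows = actual YES/NO, cols = predicted)
confusion = [[245, 29],
             [17, 309]]

# Hypothetical cost matrix: misclassifying an actual NO as YES costs 5,
# an actual YES as NO costs 1, and correct decisions cost nothing.
cost_matrix = [[0.0, 1.0],
               [5.0, 0.0]]

total_cost = sum(confusion[i][j] * cost_matrix[i][j]
                 for i in range(2) for j in range(2))
average_cost = total_cost / sum(map(sum, confusion))
```

Changing the cost-matrix values changes which kinds of errors the evaluation (and a cost-sensitive classifier) tries hardest to avoid, which is exactly the effect this experiment asks you to observe.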

Sample output


EXPERIMENT-10

Aim: To check whether a small rule is better or a long rule, by checking the bias using the training data set with the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on the relationships among the attributes that we want to study. It can be assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a Decision tree by using Pruned mode and Reduced-error Pruning, and to show the accuracy for the cross-validated training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
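The pruning loop above can be sketched in a few lines of Python. The Node structure, the tiny over-fit tree, and the validation set below are illustrative only, not Weka's own implementation:

```python
class Node:
    """Minimal decision-tree node: internal nodes split on `attr`,
    leaves carry `label`; `majority` is the most common training class."""
    def __init__(self, attr=None, children=None, label=None, majority=None):
        self.attr = attr
        self.children = children or {}
        self.label = label
        self.majority = majority

def classify(node, x):
    if node.label is not None:
        return node.label
    child = node.children.get(x[node.attr])
    return classify(child, x) if child else node.majority

def accuracy(tree, validation):
    return sum(classify(tree, x) == y for x, y in validation) / len(validation)

def reduced_error_prune(root, node, validation):
    """Bottom-up pass: prune a node only if the pruned tree performs
    no worse than the original over the validation set."""
    for child in node.children.values():
        if child.label is None:
            reduced_error_prune(root, child, validation)
    if node.label is None:
        before = accuracy(root, validation)
        node.label = node.majority          # tentatively turn it into a leaf
        if accuracy(root, validation) < before:
            node.label = None               # revert: pruning hurt accuracy

# Over-fit toy tree: the split on "b" memorizes noise in the training data.
leaf_yes, leaf_no, leaf_no2 = Node(label="yes"), Node(label="no"), Node(label="no")
inner = Node(attr="b", children={"0": leaf_yes, "1": leaf_no}, majority="yes")
root = Node(attr="a", children={"0": leaf_no2, "1": inner}, majority="no")
validation = [({"a": "1", "b": "1"}, "yes"),
              ({"a": "1", "b": "0"}, "yes"),
              ({"a": "0", "b": "0"}, "no")]
reduced_error_prune(root, root, validation)
```

After pruning, the noisy "b" split collapses to a leaf and validation accuracy rises from 2/3 to 1.0, while the useful root split survives.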

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Tree".

9) Select "NBTree", i.e., the Naive Bayesian tree.


10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the "unpruned" mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Now click Start.

16) Now we can see the output details in the Classifier output.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Trees".

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose Classifier "Rules".

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Procedure for "PART"

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER, present in Applications.

4) Select the Preprocess tab.

5) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose Classifier "Rules".

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select attributes.

12) Now click Start.

13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
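A OneR rule set like the one above (one rule per value of a single attribute, each predicting the majority class) can be reproduced with a short sketch. The student records below are invented for illustration and do not come from bank.csv:

```python
from collections import Counter, defaultdict

def one_r(rows, labels, attributes):
    """OneR: for each attribute build one rule per value (majority class),
    then keep the attribute whose rules make the fewest training errors."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical student records: 'accounting' predicts the class almost
# perfectly, while 'maths' does not.
rows = [{"accounting": 1, "maths": 1}, {"accounting": 1, "maths": 0},
        {"accounting": 0, "maths": 1}, {"accounting": 0, "maths": 0},
        {"accounting": 0, "maths": 1}]
labels = ["A", "A", "B", "B", "A"]
attr, rules, errors = one_r(rows, labels, ["accounting", "maths"])
```

J48 and PART can usually beat this single-attribute baseline, which is the comparison this experiment asks for.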

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART




3) Select the Preprocess tab.

4) Go to OPEN file and browse to the file that is already stored in the system, "bank.csv".

5) Clicking on any attribute in the left panel will show the basic statistics of that selected attribute.

Sample output

EXPERIMENT-2

Aim: To identify the rules with some of the important attributes, (a) manually and (b) using Weka.

Tools/Apparatus: Weka mining tool.

Theory

Association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where


X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain.

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2/0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
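The support and confidence figures quoted above can be checked with a few lines of Python. The five transactions below are a hypothetical database consistent with those figures (the original table is not reproduced in this text):

```python
# Hypothetical 5-transaction database consistent with the numbers above:
# supp({milk, bread}) = 2/5 = 0.4 and conf({milk, bread} => {butter}) = 0.5.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X u Y) / supp(X), an estimate of P(Y | X)."""
    return support(lhs | rhs) / support(lhs)

s = support({"milk", "bread"})                 # 2 of 5 transactions
c = confidence({"milk", "bread"}, {"butter"})  # 0.2 / 0.4
```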

ALGORITHM

Association rule mining aims to find association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets under the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules with this itemset are generated in the following way: the first rule is {I1, I2, …, Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D:


• Find the frequent set Lk−1.

• Join step:

– Ck is generated by joining Lk−1 with itself.

• Prune step:

– Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where (Ck: candidate itemset of size k) and (Lk: frequent itemset of size k).

Apriori Pseudocode

Apriori(T, ε)

  L1 ← {large 1-itemsets that appear in more than ε transactions}

  k ← 2

  while Lk−1 ≠ ∅

    Ck ← Generate(Lk−1)

    for transactions t ∈ T

      Ct ← Subset(Ck, t)

      for candidates c ∈ Ct

        count[c] ← count[c] + 1

    Lk ← {c ∈ Ck | count[c] ≥ ε}

    k ← k + 1

  return ⋃k Lk
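The pseudocode can be turned into a runnable sketch. This minimal Python version uses an absolute support count for ε and a small hypothetical transaction database; it is an illustration, not Weka's `weka.associations.Apriori` implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori search for frequent itemsets.
    `min_support` is an absolute transaction count (the ε threshold)."""
    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}
    # L1: frequent single items
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c for c, n in count(items).items() if n >= min_support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Join step: build C(k) from L(k-1) joined with itself
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c for c, n in count(candidates).items() if n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent

# Hypothetical toy database (5 transactions)
transactions = [frozenset(t) for t in
                [{"milk", "bread", "butter"}, {"milk", "bread"},
                 {"beer"}, {"bread", "butter"}, {"milk"}]]
result = apriori(transactions, min_support=2)
```

With a support threshold of 2 transactions, {milk, bread} and {bread, butter} survive, while {milk, butter} is eliminated; rule generation then runs only over these frequent itemsets.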

Procedure

1) Given the Bank database for mining.

2) Select EXPLORER in the WEKA GUI Chooser.

3) Load "bank.csv" in Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in the Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button.

8) Now we can see the sample rules.

Sample output


EXPERIMENT-3

Aim: To create a Decision tree by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order Continuous floatingpoint values would indicate a numerical rather than a categorical target A predictive model with a numerical target uses a regression algorithm not a classification algorithm

The simplest type of classification problem is binary classification In binary classification the target attribute has only two possible values for example high credit rating or low credit rating Multiclass targets have more than two values for example low medium high or unknown credit rating

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Go to Classify tab

6) Here the c45 algorithm has been chosen which is entitled as j48 in Java and can be selected by clicking the button choose

7) and select tree j48

9) Select Test options ldquoUse training setrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 62

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 63

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The decision tree constructed by using the implemented C45 algorithm

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 64

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature For example a fruit may be considered to be an apple if it is red round and about 4 in diameter Even though these features depend on the existence of the other features a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification Because independent variables are assumed only the variances of the variables for each class need to be determined and not the entirecovariance matrix The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 65

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) ... p(Fn | C, F1, F2, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i. This means that

p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) Π p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) Π p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): Prior probability of hypothesis h

• P(D): Prior probability of training data D

• P(h | D): Probability of h given D


• P(D | h): Probability of D given h

Naïve Bayes Classifier: Derivation

• D: Set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X: (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = Π (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci)
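The counting scheme above — a class prior P(Ci) multiplied by per-attribute conditional frequencies P(xk | Ci) — can be sketched in a few lines of Python. This is an illustrative toy implementation over made-up fruit data, not the Weka NBTree used in the procedure:

```python
from collections import Counter, defaultdict

def train(rows, labels):
    prior = Counter(labels)                  # class counts, giving P(Ci)
    cond = defaultdict(Counter)              # (class, attribute index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(c, k)][v] += 1
    return prior, cond

def predict(prior, cond, row):
    total = sum(prior.values())
    best, best_p = None, -1.0
    for c, n in prior.items():
        p = n / total                        # P(Ci)
        for k, v in enumerate(row):
            p *= cond[(c, k)][v] / n         # P(xk | Ci), unsmoothed
        if p > best_p:
            best, best_p = c, p
    return best

rows = [("red", "round"), ("red", "round"), ("green", "long"), ("green", "round")]
labels = ["apple", "apple", "banana", "apple"]
prior, cond = train(rows, labels)
print(predict(prior, cond, ("red", "round")))   # -> apple
```

A real implementation would add Laplace smoothing so that a single unseen attribute value does not zero out an entire class probability.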

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select an attribute

11) Now click Start

12) Now we can see the output details in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO
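The summary statistics in this output can be reproduced from the confusion matrix alone. The following sketch recomputes the accuracy, the kappa statistic, and the YES-class precision and recall from the counts [[245, 29], [17, 309]]:

```python
# Rows of the confusion matrix are actual YES/NO, columns are predicted YES/NO.
tp, fn = 245, 29      # actual YES instances
fp, tn = 17, 309      # actual NO instances
total = tp + fn + fp + tn                          # 600 instances

accuracy = (tp + tn) / total                       # correctly classified fraction

# Kappa compares observed accuracy with the agreement expected by chance.
p_yes = ((tp + fn) / total) * ((tp + fp) / total)  # chance agreement on YES
p_no = ((fp + tn) / total) * ((fn + tn) / total)   # chance agreement on NO
kappa = (accuracy - (p_yes + p_no)) / (1 - (p_yes + p_no))

precision_yes = tp / (tp + fp)                     # of predicted YES, how many correct
recall_yes = tp / (tp + fn)                        # of actual YES, how many found

print(round(accuracy, 4), round(kappa, 3), round(precision_yes, 3), round(recall_yes, 3))
# -> 0.9233 0.845 0.935 0.894
```

These match the Correctly Classified Instances (92.3333 %), Kappa statistic (0.845), and the YES row of the Detailed Accuracy By Class table above.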

EXPERIMENT-5

Aim: To answer the question "Is testing on a supplied test set a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (92.3 percent) and the Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run the test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows how well the model generalizes, i.e., whether the model will break down when unknown or future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable Y is the target variable that we are trying to understand, classify, or generalize. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
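The attribute-value test used for splitting the source set can be illustrated with a tiny information-gain computation. The data below is made up for illustration (it is not the bank set); the attribute with the highest information gain is chosen as the split:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Entropy of the parent minus the weighted entropy of the child subsets.
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# x = (outlook, windy), y = play
rows = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["no", "no", "yes", "yes"]
best = max(range(2), key=lambda a: info_gain(rows, labels, a))
print(best, info_gain(rows, labels, 0), info_gain(rows, labels, 1))  # -> 0 1.0 0.0
```

Here attribute 0 (outlook) separates the classes perfectly (gain 1.0), while attribute 1 adds nothing (gain 0.0), so recursive partitioning would split on outlook first. (C4.5/J48 actually uses gain ratio, a normalized variant of this quantity.)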

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g., 10

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased
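The 10-fold cross-validation selected in steps 9–10 partitions the 600 instances into 10 disjoint folds; each fold serves once as the test set while the other nine are used for training. A minimal index-splitting sketch (unstratified, unlike Weka's stratified default):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k consecutive folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(600, 10)
# Each instance lands in exactly one test fold; training uses the other 9 folds.
print(len(folds), len(folds[0]))   # -> 10 60
```

The reported accuracy is then the aggregate over all 10 held-out folds, which is why it is usually a more honest estimate than testing on the training set.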

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g., 10

18) If needed, select an attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among attributes we want to study. This can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision tree accuracy over the validation set

– Stop when further pruning decreases the decision tree accuracy over the validation set

An example of a rule obtained from a pruned tree:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
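The pruning loop described above can be sketched on a hand-built tree (hypothetical data, not the bank set): a subtree is replaced by a leaf carrying its majority training class whenever that performs no worse on the validation set:

```python
def classify(tree, x):
    # Internal nodes are dicts {"attr": i, "majority": cls, "branches": {...}};
    # leaves are plain class labels.
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["attr"]]]
    return tree

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def prune(tree, validation):
    if not isinstance(tree, dict):
        return tree
    # Prune bottom-up: children first, then consider collapsing this node.
    tree["branches"] = {v: prune(b, validation) for v, b in tree["branches"].items()}
    leaf = tree["majority"]          # most common training class at this node
    if accuracy(leaf, validation) >= accuracy(tree, validation):
        return leaf                  # the split does not help on validation data
    return tree

tree = {"attr": 0, "majority": "yes", "branches": {"a": "yes", "b": "no"}}
validation = [(("a",), "yes"), (("b",), "yes"), (("a",), "yes")]
print(prune(tree, validation))   # -> yes  (the split hurts validation accuracy)
```

In Weka's J48 the same idea is exposed through the `unpruned` and `reducedErrorPruning` properties changed in the procedure below.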

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select an attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select an attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
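OneR's single-attribute rule set, and error/coverage figures like the ones quoted above, come from a very simple procedure: for each attribute, predict the majority class per attribute value, then keep the attribute whose rules make the fewest training errors. A toy sketch (invented data, not the bank set):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    best_attr, best_rules, best_err = None, None, len(labels) + 1
    for a in range(len(rows[0])):
        # Count class frequencies for each value of attribute a.
        per_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            per_value[row[a]][y] += 1
        # One rule per value: predict the majority class for that value.
        rules = {v: c.most_common(1)[0][0] for v, c in per_value.items()}
        err = sum(y != rules[row[a]] for row, y in zip(rows, labels))
        if err < best_err:
            best_attr, best_rules, best_err = a, rules, err
    return best_attr, best_rules, best_err

rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["A", "A", "B", "B"]
print(one_r(rows, labels))   # attribute 0 separates the classes with 0 errors
```

Comparing this one-attribute baseline against J48 and PART, as the experiment asks, shows how much the extra model complexity actually buys.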

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


An association rule is an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (for short, itemsets) X and Y are called the antecedent (left-hand side, or LHS) and consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts we use a small example from the supermarket domain

The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be {milk, bread} ⇒ {butter}, meaning that if milk and bread are bought, customers also buy butter.

Note this example is extremely small In practical applications a rule needs a support of several hundred transactions before it can be considered statistically significant and datasets often contain thousands or millions of transactions

To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset {milk, bread} has a support of 2/5 = 0.4, since it occurs in 40% of all transactions (2 out of 5 transactions).

The confidence of a rule X ⇒ Y is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). For example, the rule {milk, bread} ⇒ {butter} has a confidence of 0.2 / 0.4 = 0.5 in the database, which means that for 50% of the transactions containing milk and bread the rule is correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
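The support and confidence numbers can be checked mechanically. The transaction table below is an assumption (the original table is an image and is not reproduced here), chosen to be consistent with the figures in the text:

```python
# Hypothetical 5-transaction database matching supp({milk, bread}) = 0.4.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    # Proportion of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # conf(X => Y) = supp(X u Y) / supp(X)
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread"}))                  # -> 0.4
print(confidence({"milk", "bread"}, {"butter"}))   # -> 0.5
```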

ALGORITHM

Association rule mining is to find out association rules that satisfy the predefined minimum support and confidence from a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraint of minimal confidence.

Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik−1} ⇒ {Ik}. By checking the confidence, this rule can be determined to be interesting or not. Then other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; further, the confidences of the new rules are checked to determine their interestingness. These processes iterate until the antecedent becomes empty. Since the second subproblem is quite straightforward, most of the research focuses on the first subproblem. The Apriori algorithm finds the frequent sets L in database D by the following steps:


• Find the frequent itemset Lk−1

• Join step:

– Ck is generated by joining Lk−1 with itself

• Prune step:

– Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed

where Ck is the candidate itemset of size k, and Lk is the frequent itemset of size k.

Apriori pseudocode:

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while Lk−1 ≠ ∅
        Ck ← Generate(Lk−1)
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk
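The pseudocode can be turned into a runnable sketch. This minimal implementation (illustrative, with a made-up four-transaction database) follows the same join / prune / count loop:

```python
from itertools import combinations

def apriori(transactions, min_count):
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for i in t:
            counts[frozenset([i])] = counts.get(frozenset([i]), 0) + 1
    L = {s for s, c in counts.items() if c >= min_count}
    frequent = set(L)
    k = 2
    while L:
        # Join step: union pairs of (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Count step: scan the transactions once for all surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        L = {c for c, n in counts.items() if n >= min_count}
        frequent |= L
        k += 1
    return frequent

ts = [frozenset(t) for t in [{"milk", "bread"}, {"milk", "bread", "butter"},
                             {"bread", "butter"}, {"milk", "bread", "butter"}]]
print(sorted(len(s) for s in apriori(ts, 2)))   # -> [1, 1, 1, 2, 2, 2, 3]
```

With a minimum count of 2, all three single items, all three pairs, and the triple {milk, bread, butter} come out frequent; rule generation with a confidence threshold would then be run over these itemsets.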

Procedure

1) Given the Bank database for mining

2) Select EXPLORER in WEKA GUI Chooser


3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only nominal values

5) Go to Associate tab

6) Select the Apriori algorithm from the "Choose" button present in Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button

8) Now we can see the sample rules

Sample output


EXPERIMENT-3

Aim: To create a decision tree by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java, and can be selected by clicking the Choose button

7) Select trees > J48

8) Select Test options "Use training set"

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 62

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 63

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The decision tree constructed by using the implemented C45 algorithm

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 64

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature For example a fruit may be considered to be an apple if it is red round and about 4 in diameter Even though these features depend on the existence of the other features a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification Because independent variables are assumed only the variances of the variables for each class need to be determined and not the entirecovariance matrix The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C|F1 Fn) over a dependent class variable C with a small number of outcomes or classes conditional on several feature variables F1 through Fn The problem is that if the number of features n is large or when a feature can take on a large number of values then basing such a model on probability tables is infeasible We therefore reformulate the model to make it more tractable

Using Bayes theorem we write P(C|F1Fn)=[p(C)p(F1Fn|C)p(F1Fn)]

In plain English the above equation can be written as

Posterior= [(prior likehood)evidence]

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 65

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

In practice we are only interested in the numerator of that fraction since the denominator does not depend on C and the values of the features Fi are given so that the denominator is effectively constant The numerator is equivalent to the joint probability model p(CF1Fn) which can be rewritten as follows using repeated applications of the definition of conditional probability

p(CF1Fn) =p(C) p(F1Fn|C) =p(C)p(F1|C) p(F2Fn|CF1F2)

=p(C)p(F1|C) p(F2|CF1)p(F3Fn|CF1F2)

= p(C)p(F1|C) p(F2|CF1)p(F3Fn|CF1F2)p(Fn|CF1F2F3Fn1)

Now the naive conditional independence assumptions come into play assume that each feature Fi is conditionally independent of every other feature Fj for jnei

This means that p(Fi|CFj)=p(Fi|C)

and so the joint model can be expressed as p(CF1Fn)=p(C)p(F1|C)p(F2|C)

=p(C)π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1Fn)= p(C) πp(Fi|C)

Z

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable since they factor into a so called class prior p(C) and independent probability distributions p(Fi|C) If there are k classes and if a model for eachp(Fi|C=c) can be expressed in terms of r parameters then the corresponding naive Bayes model has (k minus 1) + n r k parameters In practice often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common and so the total number of parameters of the naive Bayes model is 2n + 1 where n is the number of binary features used for prediction

P(hD)= P(Dh) P(h) P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 66

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

bull P(Dh) Probability of D given h

Naiumlve Bayes Classifier Derivation

bull D Set of tuples

ndash Each Tuple is an lsquonrsquo dimensional attribute vector

ndash X (x1x2x3hellip xn)

bull Let there me lsquomrsquo Classes C1C2C3hellipCm

bull NB classifier predicts X belongs to Class Ci iff

ndash P (CiX) gt P(CjX) for 1lt= j lt= m j ltgt i

bull Maximum Posteriori Hypothesis

ndash P(CiX) = P(XCi) P(Ci) P(X)

ndash Maximize P(XCi) P(Ci) as P(X) is constant

Naiumlve Bayes Classifier Derivation

bull With many attributes it is computationally expensive to evaluate P(XCi)

bull Naiumlve Assumption of ldquoclass conditional independencerdquo

bull P(XCi) = n P( xk Ci)

k = 1

bull P(XCi) = P(x1Ci) P(x2Ci) hellip P(xn Ci)

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Tree".

8) Select "NBTree", i.e. the naïve Bayesian tree.

9) Under Test options, select "Use training set".

10) If needed, select attributes.

11) Click Start.

12) The output details can now be seen in the Classifier output pane.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To answer the question "Is testing a good idea?" by evaluating the model on a supplied test set

Tools Apparatus Weka Mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file which contains records that were not in the training set used to create the model.

4) Click Start. (WEKA will run this test data set through the model already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

The exact figures depend on the supplied test set; the discussion below refers to the results shown in the 4th experiment.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the misclassifications: 29 actual YES instances were classified as NO, and 17 actual NO instances were classified as YES.

Based on our accuracy rate of 92.3 percent, we can say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

By comparing the Correctly Classified Instances from this test set with those from the training set, we can judge the accuracy of the model, which indicates how well the model will hold up when unknown or future data is applied to it.
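The headline figures can be recomputed directly from the confusion-matrix counts in the sample output above, as this small Python sketch shows:

```python
def summarize(tp, fn, fp, tn):
    """Accuracy / precision / recall for the positive (YES) class
    from confusion-matrix counts as reported by Weka."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Counts from the sample output: 245 YES correct, 29 YES classified
# as NO, 17 NO classified as YES, 309 NO correct.
m = summarize(tp=245, fn=29, fp=17, tn=309)
print(round(m["accuracy"], 4))   # 0.9233
print(round(m["precision"], 3))  # 0.935
print(round(m["recall"], 3))     # 0.894
```

The printed values match the Precision and Recall reported for the YES row of the Detailed Accuracy table.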

EXPERIMENT-6

Aim To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
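The attribute-value test used for splitting can be sketched in a few lines of Python, here scored by information gain on hypothetical toy data (C4.5/J48 uses the related gain-ratio criterion):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy from splitting on attribute index `attr`."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Toy data: attribute 0 separates the classes perfectly, attribute 1
# carries no information, so a tree learner would split on 0 first.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["YES", "YES", "NO", "NO"]
print(info_gain(rows, labels, 0))  # 1.0
print(info_gain(rows, labels, 1))  # 0.0
```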

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Tree".

8) Select J48.

9) Under Test options, select "Cross-validation".

10) Set "Folds", e.g. 10.

11) If needed, select attributes.

12) Click Start.

13) The output details can now be seen in the Classifier output pane.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.
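Cross-validation partitions the data into k folds, each held out once for testing while the rest train the model. Weka does this bookkeeping internally; a minimal index-only sketch:

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds; each fold is
    held out once for testing while the others are used for training."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# 600 instances and 10 folds, as in this experiment.
folds = kfold_indices(600, 10)
assert len(folds) == 10
assert all(len(f) == 60 for f in folds)   # each fold holds 60 instances
assert sum(len(f) for f in folds) == 600  # every instance is tested once
```

(Weka's "Stratified cross-validation" additionally keeps the class proportions roughly equal in every fold, which this sketch omits.)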

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to Open file and browse to the newly saved (attribute-deleted) file.

14) Go to the Classify tab.

15) Choose classifier "Tree".

16) Select the J48 tree.

17) Under Test options, select "Use training set".

18) If needed, select attributes.

19) Click Start.

20) The output details can now be seen in the Classifier output pane.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing these attributes had any significant effect.

Sample output


EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select, from the attributes list, the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose classifier "Tree".

9) Select J48.

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details can now be seen in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes had any significant effect.

Sample output


EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose classifier "Tree".

8) Select J48.

9) Under Test options, select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize; then close the window.

13) Click OK.

14) Click Start.

15) The output details can be seen in the Classifier output pane.

16) Under Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select attributes.

19) Click Start.

20) The output details can now be seen in the Classifier output pane.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.

Sample output


EXPERIMENT-10

Aim To check whether a small rule is better than a long rule, i.e. to check the bias, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This depends on the attribute set and on the relationships among attributes that we want to study; it can be assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

IF (children = yes) ∧ (income > 30000)

THEN (car = yes)
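The pruning test described in the bullets above — replace a subtree by a leaf if the leaf does no worse on the validation set — can be sketched as follows (hypothetical stub trees and validation data):

```python
def accuracy(predict, rows, labels):
    """Fraction of validation rows the predictor gets right."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

# A stub "tree" that splits on attribute 0, and a "pruned" version that
# replaces the whole subtree by the majority class of its training instances.
full = lambda r: "YES" if r[0] == 1 else "NO"
pruned = lambda r: "NO"

# Hypothetical validation set.
val_rows = [(1,), (0,), (0,), (0,)]
val_labels = ["NO", "NO", "NO", "YES"]

# Keep the pruned node only if it performs no worse over the validation set.
keep_pruned = accuracy(pruned, val_rows, val_labels) >= accuracy(full, val_rows, val_labels)
print(keep_pruned)  # True: the leaf does no worse here
```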

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Tree".

9) Select "NBTree", i.e. the naïve Bayesian tree.

10) Under Test options, select "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select attributes.

15) Click Start.

16) The output details can now be seen in the Classifier output pane.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim To compare the OneR classifier, which uses a single attribute and a single rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Trees".

9) Select "J48".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details can now be seen in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t C:\temp\bank.arff


Procedure for ldquoOneRrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Rules".

9) Select "OneR".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details can now be seen in the Classifier output pane.

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining.

2) Use the Weka GUI Chooser.

3) Select EXPLORER present in Applications.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose classifier "Rules".

9) Select "PART".

10) Under Test options, select "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details can now be seen in the Classifier output pane.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
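The OneR idea behind such rules — predict the majority class for each value of a single attribute and count the misses — can be sketched as follows (hypothetical data shaped like the example above):

```python
from collections import Counter

def one_r(rows, labels, attr):
    """OneR on one attribute: predict the majority class of each
    attribute value; the rule's error is the number of misses."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
    errors = sum(rule[row[attr]] != y for row, y in zip(rows, labels))
    return rule, errors

# accounting=1 -> 7 class-A instances; accounting=0 -> 9 B and 4 A.
rows = [(1,)] * 7 + [(0,)] * 13
labels = ["A"] * 7 + ["B"] * 9 + ["A"] * 4
rule, errors = one_r(rows, labels, attr=0)
print(rule)    # value 1 predicts A, value 0 predicts B
print(errors)  # 4 misses, matching Error = 4/13 on the accounting=0 rule
```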

Sample output

J48

java weka.classifiers.trees.J48 -t C:\temp\bank.arff


OneR


PART


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

· Find the frequent set Lk−1.

· Join step: Ck is generated by joining Lk−1 with itself.

· Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, hence it should be removed.

where Ck is the candidate itemset of size k, and Lk is the frequent itemset of size k.

Apriori Pseudocode

Apriori(T, ε)
    L1 ← {large 1-itemsets that appear in more than ε transactions}
    k ← 2
    while L(k−1) ≠ ∅
        Ck ← Generate(L(k−1))
        for transactions t ∈ T
            Ct ← Subset(Ck, t)
            for candidates c ∈ Ct
                count[c] ← count[c] + 1
        Lk ← {c ∈ Ck | count[c] ≥ ε}
        k ← k + 1
    return ⋃k Lk

Here ε is the minimum-support threshold.
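A runnable Python sketch of the same level-wise join/prune loop (toy transactions; Weka's implementation differs in its details):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining: generate size-k candidates by
    joining frequent (k-1)-itemsets, then prune by counting support."""
    items = {i for t in transactions for i in t}
    frequent = []
    k, current = 1, [
        frozenset([i]) for i in items
        if sum(i in t for t in transactions) >= min_support
    ]
    while current:
        frequent.extend(current)
        k += 1
        # Join step: unions of pairs of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        prev = set(current)
        # Prune step: every (k-1)-subset must be frequent, and the
        # candidate's support must meet the threshold.
        current = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
            and sum(c <= t for t in transactions) >= min_support
        ]
    return frequent

# Hypothetical toy transactions.
T = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"])]
result = apriori(T, min_support=2)
print(sorted("".join(sorted(s)) for s in result))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```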

Procedure

1) Given the Bank database for mining.

2) Select EXPLORER in the WEKA GUI Chooser.

3) Load "bank.csv" into Weka via Open file in the Preprocess tab.

4) Select only nominal values.

5) Go to the Associate tab.

6) Select the Apriori algorithm from the "Choose" button present in Associator:

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Click the Start button.

8) The sample rules can now be seen.

Sample output


EXPERIMENT-3

Aim To create a Decision tree by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known For example a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time

In addition to the historical credit rating the data might track employment history home ownership or rental years of residence number and type of investments and so on Credit rating would be the target the other attributes would be the predictors and the data for each customer would constitute a case

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

· Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

· Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER present in Applications.

3) Select the Preprocess tab.

4) Go to Open file and browse to the file that is already stored in the system, "bank.csv".

5) Go to the Classify tab.

6) Here the C4.5 algorithm is chosen; it is implemented as J48 in Java and can be selected by clicking the Choose button and picking trees → J48.

7) Under Test options, select "Use training set".

8) If needed, select attributes.

9) Click Start.

10) The output details can now be seen in the Classifier output pane.

11) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set

Tools Apparatus Weka mining tool

Theory

A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1|C) p(F2, …, Fn | C, F1)

= p(C) p(F1|C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1|C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)
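As a quick numeric check of Bayes' theorem, the following sketch uses hypothetical values for the prior and likelihoods:

```python
# Bayes' rule on toy numbers: P(h|D) = P(D|h) P(h) / P(D),
# with the evidence P(D) expanded over the two hypotheses.
p_h = 0.3                  # prior P(h) (hypothetical)
p_d_given_h = 0.8          # likelihood P(D|h) (hypothetical)
p_d_given_not_h = 0.2      # likelihood P(D|not h) (hypothetical)

p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)  # evidence P(D)
posterior = p_d_given_h * p_h / p_d
print(round(posterior, 3))  # 0.632
```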

Now the naive conditional independence assumptions come into play assume that each feature Fi is conditionally independent of every other feature Fj for jnei

This means that p(Fi|CFj)=p(Fi|C)

and so the joint model can be expressed as p(CF1Fn)=p(C)p(F1|C)p(F2|C)

=p(C)π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1Fn)= p(C) πp(Fi|C)

Z

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable since they factor into a so called class prior p(C) and independent probability distributions p(Fi|C) If there are k classes and if a model for eachp(Fi|C=c) can be expressed in terms of r parameters then the corresponding naive Bayes model has (k minus 1) + n r k parameters In practice often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common and so the total number of parameters of the naive Bayes model is 2n + 1 where n is the number of binary features used for prediction

P(hD)= P(Dh) P(h) P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 66

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

bull P(Dh) Probability of D given h

Naiumlve Bayes Classifier Derivation

bull D Set of tuples

ndash Each Tuple is an lsquonrsquo dimensional attribute vector

ndash X (x1x2x3hellip xn)

bull Let there me lsquomrsquo Classes C1C2C3hellipCm

bull NB classifier predicts X belongs to Class Ci iff

ndash P (CiX) gt P(CjX) for 1lt= j lt= m j ltgt i

bull Maximum Posteriori Hypothesis

ndash P(CiX) = P(XCi) P(Ci) P(X)

ndash Maximize P(XCi) P(Ci) as P(X) is constant

Naiumlve Bayes Classifier Derivation

bull With many attributes it is computationally expensive to evaluate P(XCi)

bull Naiumlve Assumption of ldquoclass conditional independencerdquo

bull P(XCi) = n P( xk Ci)

k = 1

bull P(XCi) = P(x1Ci) P(x2Ci) hellip P(xn Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 67

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

7) Choose Classifier ldquoTreerdquo

8) Select ldquoNBTreerdquo ie Navie Baysiean tree

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 923333

Incorrectly Classified Instances 46 76667

Kappa statistic 0845

Mean absolute error 01389

Root mean squared error 02636

Relative absolute error 279979

Root relative squared error 529137

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0894 0052 0935 0894 0914 0936 YES

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 68

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

0948 0106 0914 0948 0931 0936 NO

Weighted Avg 0923 0081 0924 0923 0923 0936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To ldquoIs testing a good ideardquo

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 69

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (923 percent) and the Incorrectly Classified Instances (76 percent) Other important numbers are in the ROC Area column in the first row (the 0936) Finally in the Confusion Matrix it shows the number of false positives and false negatives The false positives are 29 and the false negatives are 17 in this matrix

Based on our accuracy rate of 923 percent we say that upon initial analysis this is a good model

One final step to validating our classification tree which is to run our test set through the model and ensure that accuracy of the model

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 70

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 71

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO
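The summary statistics above can be recomputed directly from the confusion matrix; a quick check of the accuracy, the per-class precision/recall/F-measure for YES, and the kappa statistic:

```python
# Confusion matrix from the sample output: rows = actual class, columns = predicted
cm = [[236, 38],   # actual YES: 236 predicted YES, 38 predicted NO
      [23, 303]]   # actual NO:   23 predicted YES, 303 predicted NO

total = sum(map(sum, cm))
accuracy = (cm[0][0] + cm[1][1]) / total                 # 539 / 600

precision_yes = cm[0][0] / (cm[0][0] + cm[1][0])         # of predicted YES, how many correct
recall_yes = cm[0][0] / (cm[0][0] + cm[0][1])            # of actual YES, how many found
f_measure_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Kappa: agreement beyond what chance alone would produce
p_chance = (sum(cm[0]) * (cm[0][0] + cm[1][0]) +
            sum(cm[1]) * (cm[0][1] + cm[1][1])) / total**2
kappa = (accuracy - p_chance) / (1 - p_chance)
```

These reproduce the 89.8333 % accuracy, the 0.911/0.861/0.886 row for class YES, and the 0.7942 kappa in the listing.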


EXPERIMENT-7

Aim: Delete one attribute from GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now, in the filter box, you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to the Classify tab

15) Choose Classifier "Trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect
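Outside Weka, the effect of the Remove filter can be imitated on the raw CSV; a sketch using Python's csv module (the miniature data and column names are invented, not the real bank.csv):

```python
import csv
import io

def remove_attribute(csv_text, index):
    # Drop one column (0-based index) from CSV text, mimicking the idea of
    # weka.filters.unsupervised.attribute.Remove -R <index+1>
    rows = list(csv.reader(io.StringIO(csv_text)))
    kept = [[v for i, v in enumerate(row) if i != index] for row in rows]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(kept)
    return out.getvalue()

# Hypothetical miniature of bank.csv with an id attribute in column 0
data = "id,age,pep\n1,48,YES\n2,40,NO\n"
cleaned = remove_attribute(data, 0)
```

The cleaned text contains only the `age` and `pep` columns, analogous to the new working relation created in step 11.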

Sample output


EXPERIMENT-8

Aim: Select some attributes from GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a Decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select Cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds" (e.g., 10)

18) If needed, select attributes

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6
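Cost-sensitive evaluation weighs each cell of the confusion matrix by the corresponding cost-matrix entry instead of treating all errors equally. A sketch using the cross-validation confusion matrix from experiment 6 and an invented cost matrix:

```python
# Confusion matrix (rows = actual YES/NO, columns = predicted YES/NO)
confusion = [[236, 38],
             [23, 303]]

# Hypothetical cost matrix: misclassifying an actual YES costs 5,
# misclassifying an actual NO costs 1, correct predictions cost 0
cost = [[0, 5],
        [1, 0]]

total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
average_cost = total_cost / sum(map(sum, confusion))
```

With these weights the 38 missed YES cases dominate the total cost, which is why changing the cost matrix in step 12 can push the learner toward a different tree even when plain accuracy barely moves.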

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will be based on the attribute set and on which relationships among the attributes we want to study. This can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a Decision tree by using Prune mode and Reduced-error pruning, and show the accuracy for a cross-validation trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the decision tree's accuracy over the validation set

  – Stop when further pruning decreases the decision tree's accuracy over the validation set

IF (Children = yes) ∧ (income >= 30000)

THEN (car = Yes)
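The pruning test described above can be sketched minimally: a subtree is replaced by a majority-class leaf whenever the leaf does no worse on the validation set. The subtree, majority label and validation pairs below are invented toy values:

```python
def node_accuracy(predictor, validation):
    # Fraction of validation (x, y) pairs the predictor gets right
    return sum(predictor(x) == y for x, y in validation) / len(validation)

def prune_if_no_worse(subtree_predict, majority_label, validation):
    # Replace a subtree by a leaf (majority label of its training
    # instances) when the leaf performs no worse on the validation set
    leaf_predict = lambda x: majority_label
    if node_accuracy(leaf_predict, validation) >= node_accuracy(subtree_predict, validation):
        return leaf_predict, True    # pruned
    return subtree_predict, False    # kept

# Hypothetical over-fit subtree that memorised noise: predicts YES only for x == 3
subtree = lambda x: "YES" if x == 3 else "NO"
validation = [(1, "NO"), (2, "NO"), (3, "NO"), (4, "NO")]
predict, pruned = prune_if_no_worse(subtree, "NO", validation)
```

The over-fit branch scores 3/4 on the validation set while the plain "NO" leaf scores 4/4, so the node is pruned — exactly the replace-and-compare step the bullets describe.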

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the Naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning options as needed

14) If needed, select attributes

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select attributes

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
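OneR itself is simple enough to sketch: for one attribute, map each of its values to the majority class of the instances carrying that value, then count the training errors the rule makes. The toy records below mirror the accounting example above but are invented:

```python
from collections import Counter, defaultdict

def one_r(records, labels, attr_index):
    # Build a one-attribute rule: each value maps to its majority class.
    by_value = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_value[rec[attr_index]].append(lab)
    rule = {v: Counter(labs).most_common(1)[0][0] for v, labs in by_value.items()}
    # Count training instances the rule misclassifies
    errors = sum(lab != rule[rec[attr_index]] for rec, lab in zip(records, labels))
    return rule, errors

# Hypothetical single-attribute records in the spirit of the accounting rule
records = [("1",), ("1",), ("0",), ("0",), ("0",)]
labels = ["A", "A", "B", "B", "A"]
rule, errors = one_r(records, labels, 0)
```

The full OneR algorithm simply builds one such rule per attribute and keeps the attribute whose rule makes the fewest errors.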

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART



LINUX PROGRAMMING AND DATA MINING LAB MANUAL

3) Load "bank.csv" in Weka by Open file in the Preprocess tab

4) Select only nominal values

5) Go to the Associate tab

6) Select the Apriori algorithm from the "Choose" button present in the Associator

weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

7) Select the Start button

8) Now we can see the sample rules
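The heart of Apriori is counting itemset support against a minimum-support threshold (the -M option above). The sketch below shows just that counting step by brute-force enumeration, without the candidate-generation pruning of the real algorithm; the transactions are invented nominal values, not the bank data:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Return every itemset whose support (fraction of transactions
    # containing it) reaches min_support.
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            support = sum(set(combo) <= t for t in transactions) / n
            if support >= min_support:
                frequent[combo] = support
                found = True
        if not found:   # no frequent set of this size => none larger either
            break
    return frequent

# Hypothetical nominal attribute values treated as market-basket items
transactions = [{"married", "car"}, {"married", "car"}, {"married"}, {"single"}]
freq = frequent_itemsets(transactions, min_support=0.5)
```

Association rules such as "married ⇒ car" are then read off the frequent itemsets, keeping those whose confidence clears the -C threshold.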

Sample output


EXPERIMENT-3

Aim: To create a Decision tree by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process a classification algorithm finds relationships between the values of the predictors and the values of the target Different classification algorithms use


different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

5) Go to the Classify tab

6) Here the C4.5 algorithm has been chosen, which is entitled J48 in Java and can be selected by clicking the Choose button

7) Select trees > J48

9) Select Test options "Use training set"


10) If needed, select attributes

11) Click Start

12) Now we can see the output details in the Classifier output panel

13) Right-click on the result list and select the "Visualize tree" option

Sample output


The decision tree constructed by using the implemented C4.5 algorithm


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the above-created decision tree model, i.e. testing on the training set

Tools Apparatus Weka mining tool

Theory

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features depend on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn)

In plain English, the above equation can be written as

Posterior = (Prior × Likelihood) / Evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, ..., Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)

= p(C) p(F1 | C) p(F2, ..., Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, ..., Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ... p(Fn | C, F1, F2, F3, ..., Fn-1)

Now the "naive" conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, ..., Fn) = p(C) p(F1 | C) p(F2 | C) ... = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, ..., Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, ..., Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naive Bayes Classifier: Derivation

• D: set of tuples

  – Each tuple is an 'n'-dimensional attribute vector

  – X = (x1, x2, x3, ..., xn)

• Let there be 'm' classes: C1, C2, C3, ..., Cm

• The NB classifier predicts that X belongs to class Ci iff

  – P(Ci | X) > P(Cj | X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

  – P(Ci | X) = P(X | Ci) P(Ci) / P(X)

  – Maximize P(X | Ci) P(Ci), as P(X) is constant

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naive assumption of "class conditional independence":

  P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

  P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × ... × P(xn | Ci)
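The counting estimates behind this derivation can be sketched directly — estimate P(Ci) and each P(xk | Ci) by frequency, then pick the class maximising their product. The attributes, values and labels below are invented toy data, not the bank file, and no smoothing is applied:

```python
from collections import Counter, defaultdict

def train_nb(records, labels):
    # Estimate class counts and per-attribute value counts by counting
    class_counts = Counter(labels)
    cond = defaultdict(Counter)   # (class, attr_index) -> Counter of values
    for rec, lab in zip(records, labels):
        for k, value in enumerate(rec):
            cond[(lab, k)][value] += 1
    return class_counts, cond

def predict_nb(record, class_counts, cond):
    # argmax over C of P(C) * prod_k P(x_k | C); P(X) is constant, so dropped
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n
        for k, value in enumerate(record):
            score *= cond[(c, k)][value] / count
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical training data: (children, income_band) -> class
records = [("yes", "high"), ("yes", "high"), ("no", "low"), ("no", "high")]
labels = ["YES", "YES", "NO", "NO"]
cc, cond = train_nb(records, labels)
pred = predict_nb(("yes", "high"), cc, cond)
```

For the query ("yes", "high") the YES score is 0.5 × 1 × 1 while the NO score collapses to 0, so YES is predicted; real implementations add Laplace smoothing so unseen values never zero out a class.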

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose Classifier "Trees"

8) Select "NBTree", i.e. the Naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select attributes

11) Now click Start

12) Now we can see the output details in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column in the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and check the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see whether the accuracy holds up, which indicates whether the model will break down with unknown data, or when future data is applied to it.
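That comparison is just two accuracies computed from two confusion matrices; a sketch, where the training-set matrix is the one from Experiment 4 and the supplied-test-set matrix is invented purely for illustration:

```python
def accuracy(cm):
    # Correctly classified fraction from a 2x2 confusion matrix
    return (cm[0][0] + cm[1][1]) / sum(map(sum, cm))

train_cm = [[245, 29], [17, 309]]   # training-set evaluation (Experiment 4)
test_cm = [[80, 12], [9, 99]]       # hypothetical supplied-test-set result

train_acc = accuracy(train_cm)
test_acc = accuracy(test_cm)
# A large train/test gap would suggest the model over-fits the training data
gap = train_acc - test_acc
```

A gap of a few percentage points, as here, would support the conclusion that the model generalises; a much larger gap would mean the training-set figures were optimistic.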

EXPERIMENT-6

Aim: To create a Decision tree by cross-validation on the training data set using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.


LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 61: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-3

Aim: To create a decision tree by training a data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory

Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.

A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time.

In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm.

The simplest type of classification problem is binary classification. In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.

In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 61


different techniques for finding relationships. These relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.

Classification models are tested by comparing the predicted values to known target values in a set of test data. The historical data for a classification project is typically divided into two data sets: one for building the model, the other for testing the model.

Scoring a classification model results in class assignments and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.

Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification:

• Decision Tree

Decision trees automatically generate rules, which are conditional statements that reveal the logic used to build the tree.

• Naive Bayes

Naive Bayes uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.

Procedure

1) Open the Weka GUI Chooser.

2) Select EXPLORER from the Applications panel.

3) Select the Preprocess tab.

4) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

5) Go to the Classify tab.

6) Click the Choose button and, under trees, select J48 (the C4.5 algorithm, whose Weka implementation is entitled J48).

7) Select the Test option "Use training set".

8) If needed, select attributes.

9) Click Start.

10) The output details now appear in the Classifier output pane.

11) Right-click on the result list and select the "Visualize tree" option.

Sample output


The decision tree constructed using the C4.5 (J48) algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by the decision tree model created above, i.e., testing on the training set.

Tools/Apparatus: Weka mining tool.

Theory

The naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even though these features may depend on one another, a naive Bayes classifier considers all of these properties to contribute independently to the probability that the fruit is an apple.

An advantage of the naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The naive Bayes probabilistic model:

The probability model for a classifier is a conditional model P(C | F1, …, Fn) over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are interested only in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn)

= p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, F3, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C),

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C) = p(C) ∏i p(Fi | C)

This means that, under the above independence assumptions, the conditional distribution over the class variable C can be expressed as

p(C | F1, …, Fn) = (1/Z) p(C) ∏i p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n·r·k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.
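The parameter count quoted above is easy to check numerically. The sketch below (plain Python, not tied to Weka) encodes the formula (k − 1) + n·r·k and confirms that it collapses to 2n + 1 in the common binary/Bernoulli case:

```python
# Illustrative check of the parameter-count formula from the text:
# a naive Bayes model with k classes, n features, and r parameters
# per class-conditional distribution has (k - 1) + n * r * k
# free parameters.

def nb_param_count(k: int, n: int, r: int) -> int:
    """(k - 1) class-prior parameters, plus one r-parameter
    distribution p(Fi | C = c) for each of the n features and k classes."""
    return (k - 1) + n * r * k

# Binary classification with Bernoulli features: k = 2, r = 1,
# so the count collapses to 2n + 1, as the text states.
for n in (1, 5, 20):
    assert nb_param_count(2, n, 1) == 2 * n + 1

print(nb_param_count(2, 5, 1))  # 11 parameters for n = 5
```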

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples.

– Each tuple is an 'n'-dimensional attribute vector X: (x1, x2, x3, …, xn).

• Let there be 'm' classes: C1, C2, C3, …, Cm.

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i.

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant.

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

i.e., P(X | Ci) = P(x1 | Ci) · P(x2 | Ci) · … · P(xn | Ci)
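The maximum-posteriori rule derived above fits in a few lines of code. The following is a minimal sketch on a hypothetical fruit-style toy data set (echoing the apple example earlier), not Weka's NBTree implementation; the Laplace smoothing constant is an added assumption to keep unseen values from zeroing the product:

```python
from collections import Counter, defaultdict

# Minimal naive Bayes sketch: predict the class Ci maximizing
# P(Ci) * product over k of P(xk | Ci), per the derivation above.
def train(rows, labels):
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, c in zip(rows, labels):
        for i, v in enumerate(x):
            cond[(c, i)][v] += 1
    return priors, cond, len(labels)

def predict(x, priors, cond, total):
    def score(c):
        s = priors[c] / total  # class prior P(Ci)
        for i, v in enumerate(x):
            # Laplace smoothing (assumes ~2 values per feature)
            s *= (cond[(c, i)][v] + 1) / (priors[c] + 2)
        return s
    return max(priors, key=score)

rows = [("red", "round"), ("red", "round"), ("yellow", "long"), ("yellow", "round")]
labels = ["apple", "apple", "banana", "banana"]
model = train(rows, labels)
print(predict(("red", "round"), *model))   # classified as "apple"
```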

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select "NBTree", i.e., the naive Bayesian tree.

9) Select the Test option "Use training set".

10) If needed, select attributes.

11) Click Start.

12) The output details now appear in the Classifier output pane.

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO
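All of the headline figures in this summary can be recomputed from the confusion matrix alone; a quick sketch:

```python
# Recomputing the headline figures of the training-set evaluation
# directly from the confusion matrix printed above.
tp, fn = 245, 29   # actual YES: classified a (YES) / b (NO)
fp, tn = 17, 309   # actual NO:  classified a (YES) / b (NO)

total = tp + fn + fp + tn                 # 600 instances
correct = tp + tn                         # 554 correctly classified
accuracy = correct / total                # 554/600 -> 92.3333 %
precision_yes = tp / (tp + fp)            # 245/262 = 0.935
recall_yes = tp / (tp + fn)               # 245/274 = 0.894 (TP rate)
print(f"{accuracy:.4%}  precision={precision_yes:.3f}  recall={recall_yes:.3f}")
```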

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options, select the "Supplied test set" radio button.

2) Click Set.

3) Choose a file that contains records that were not in the training set used to create the model.

4) Click Start (Weka will run this test data set through the model we already created).

5) Compare the output results with those of the 4th experiment.

Sample output

This can be experienced through the different problem solutions worked during practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set shows how well the model generalizes, which indicates whether the model will break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute-value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation, and generalization of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
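The recursive partitioning described above can be sketched as follows. The split-attribute choice here is deliberately naive (first attribute available), not J48's gain-ratio criterion, and the records are hypothetical:

```python
# Sketch of recursive partitioning: split the source set on an
# attribute-value test and recurse until a subset is pure (or no
# attributes remain), then emit a majority-class leaf.
def build_tree(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attrs:       # pure subset or no attrs left
        return max(set(labels), key=labels.count)  # majority-class leaf
    a = attrs[0]                                 # naive split choice, not gain ratio
    tree = {}
    for v in set(r[a] for r in rows):
        subset = [r for r in rows if r[a] == v]  # one child per attribute value
        tree[(a, v)] = build_tree(subset, attrs[1:])
    return tree

# records of the form (x1, x2, y)
rows = [("yes", "high", "car"), ("yes", "low", "nocar"),
        ("no", "high", "nocar"), ("no", "low", "nocar")]
print(build_tree(rows, [0, 1]))
```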

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select J48.

9) Select the Test option "Cross-validation".

10) Set "Folds" (e.g., 10).

11) If needed, select attributes.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO
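Putting this next to Experiment 4's training-set evaluation makes the comparison in step 14 concrete (554 vs. 539 correct out of 600):

```python
# Comparing the training-set evaluation (Experiment 4) with the
# 10-fold cross-validation above: accuracy drops because CV tests
# on instances the tree never saw during training.
train_correct, cv_correct, total = 554, 539, 600
train_acc = train_correct / total     # 92.3333 %
cv_acc = cv_correct / total           # 89.8333 %
print(f"training-set: {train_acc:.4%}, cross-validation: {cv_acc:.4%}, "
      f"optimism: {train_acc - cv_acc:.4%}")
```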


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) In the Filter panel, click on the Choose button. This shows a popup window with a list of the available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false).

10) Then click OK. In the filter box you will now see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This removes the id attribute and creates a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.

13) Go to Open file and browse to the newly saved (attribute-deleted) file.

14) Go to the Classify tab.

15) Choose the classifier group "trees".

16) Select the J48 tree.

17) Select the Test option "Use training set".

18) If needed, select attributes.

19) Click Start.

20) The output details now appear in the Classifier output pane.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) From the attributes list, select the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Go to the Classify tab.

8) Choose the classifier group "trees".

9) Select J48.

10) Select the Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Go to the Classify tab.

7) Choose the classifier group "trees".

8) Select J48.

9) Select the Test option "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize, then close the window.

13) Click OK.

14) Click Start.

15) The output details appear in the Classifier output pane.

16) Select the Test option "Cross-validation".

17) Set "Folds" (e.g., 10).

18) If needed, select attributes.

19) Click Start again.

20) The new output details appear in the Classifier output pane.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of Experiment 6.

Sample output


EXPERIMENT-10

Aim: To check whether a short rule is better than a long rule, by examining the bias on the training data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will depend on the attribute set and on which relationships among the attributes we want to study; it can be examined based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
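The pruning step described above can be sketched on a toy dictionary-based tree; the tree representation, accessors, and data below are illustrative assumptions, not Weka's internal structures:

```python
import copy

# Sketch of reduced-error pruning: a node is replaced by a leaf
# holding its majority training class whenever the pruned tree does
# no worse on a held-out validation set.
def classify(node, x):
    while isinstance(node, dict):
        node = node["children"][x[node["attr"]]]
    return node

def accuracy(tree, val):
    return sum(classify(tree, x) == y for x, y in val) / len(val)

def prune(tree, node_path, majority, val):
    """Return a pruned copy if replacing the node at node_path by a
    leaf labelled `majority` does not reduce validation accuracy."""
    pruned = copy.deepcopy(tree)
    parent = pruned
    for key in node_path[:-1]:
        parent = parent["children"][key]
    parent["children"][node_path[-1]] = majority   # sub-tree -> leaf
    return pruned if accuracy(pruned, val) >= accuracy(tree, val) else tree

# toy over-fit tree: split on attribute 0, then attribute 1
tree = {"attr": 0, "children": {
    "yes": {"attr": 1, "children": {"high": "car", "low": "car"}},
    "no": "nocar"}}
val = [(("yes", "high"), "car"), (("yes", "low"), "car"), (("no", "high"), "nocar")]
tree = prune(tree, ["yes"], "car", val)   # inner node collapses to a leaf
print(tree)
```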

Procedure

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "trees".

9) Select "NBTree", i.e., the naive Bayesian tree.

10) Select the Test option "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning settings as needed.

14) If needed, select attributes.

15) Click Start.

16) The output details now appear in the Classifier output pane.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and a single rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure for "J48"

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "trees".

9) Select "J48".

10) Select the Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details now appear in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

(or, from the command line)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "rules".

9) Select "OneR".

10) Select the Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details now appear in the Classifier output pane.

Procedure for "PART"

1) Load the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file "bank.csv" that is already stored on the system.

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the classifier group "rules".

9) Select "PART".

10) Select the Test option "Use training set".

11) If needed, select attributes.

12) Click Start.

13) The output details now appear in the Classifier output pane.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
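OneR builds one rule set over the single best attribute: one rule per attribute value, predicting the majority class, keeping whichever attribute makes the fewest training errors. A minimal sketch on hypothetical toy data mirroring the accounting rules above:

```python
from collections import Counter, defaultdict

# Minimal OneR sketch: for each attribute, make one rule per value
# (predict the majority class for that value), then keep the single
# attribute whose rule set makes the fewest training errors.
def one_r(rows, labels):
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for x, c in zip(rows, labels):
            by_value[x[a]][c] += 1
        rules = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
        # errors: instances not matching the majority class of their value
        errors = sum(sum(cnt.values()) - max(cnt.values())
                     for cnt in by_value.values())
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best  # (attribute index, value -> class rules, training errors)

rows = [(1, "x"), (1, "y"), (0, "x"), (0, "y")]
labels = ["A", "A", "B", "B"]
attr, rules, errors = one_r(rows, labels)
print(attr, rules, errors)   # attribute 0 classifies the toy data perfectly
```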

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 62: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

different techniques for finding relationships These relationships are summarized in a model which can then be applied to a different data set in which the class assignments are unknown

Classification models are tested by comparing the predicted values to known target values in a set of test data The historical data for a classification project is typically divided into two data sets one for building the model the other for testing the model

Scoring a classification model results in class assignments and probabilities for each case For example a model that classifies customers as low medium or high value would also predict the probability of each classification for each customer

Classification has many applications in customer segmentation business modeling marketing credit analysis and biomedical and drug response modeling

Different Classification Algorithms

Oracle Data Mining provides the following algorithms for classification

middot Decision Tree

Decision trees automatically generate rules which are conditional statements that reveal the logic used to build the tree

middot Naive Bayes

Naive Bayes uses Bayes Theorem a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data

Procedure

1) Open Weka GUI Chooser

2) Select EXPLORER present in Applications

3) Select Preprocess Tab

4) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

5) Go to Classify tab

6) Here the c45 algorithm has been chosen which is entitled as j48 in Java and can be selected by clicking the button choose

7) and select tree j48

9) Select Test options ldquoUse training setrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 62

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 63

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The decision tree constructed by using the implemented C45 algorithm

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 64

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature For example a fruit may be considered to be an apple if it is red round and about 4 in diameter Even though these features depend on the existence of the other features a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification Because independent variables are assumed only the variances of the variables for each class need to be determined and not the entirecovariance matrix The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence


In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= … = p(C) p(F1 | C) p(F2 | C, F1) p(F3 | C, F1, F2) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C) = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1 / Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D


• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (contd.)

• With many attributes, it is computationally expensive to evaluate P(X | Ci)

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 … n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
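The product formula above can be sketched as a tiny categorical naive Bayes classifier. The following is a minimal illustration, assuming a hypothetical toy data set (outlook, windy → play) rather than the bank data used in the procedure; it uses raw frequency estimates with no smoothing.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(Ci) and P(xk | Ci) from categorical data, then
    classify by maximizing P(Ci) * product of P(xk | Ci)."""
    n = len(labels)
    prior = Counter(labels)                 # class -> count
    cond = defaultdict(Counter)             # (attr index, class) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(k, c)][v] += 1

    def predict(row):
        best, best_p = None, -1.0
        for c, pc in prior.items():
            p = pc / n                      # P(Ci)
            for k, v in enumerate(row):
                p *= cond[(k, c)][v] / pc   # P(xk | Ci)
            if p > best_p:
                best, best_p = c, p
        return best

    return predict

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "no", "yes", "no"]
predict = train_nb(rows, labels)
print(predict(("sunny", "no")))   # -> yes
```

Here "windy = yes" always coincides with "play = no" in the toy data, so that factor dominates the product, exactly as the class-conditional independence assumption intends.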

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab


7) Choose classifier group "trees"

8) Select "NBTree", i.e., the naive Bayesian tree

9) Select Test options "Use training set"

10) If needed, select the class attribute

11) Click Start

12) The output details can now be seen in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO
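The summary figures above can be recomputed from the confusion matrix alone. The sketch below rederives the accuracy and the YES-class TP rate and precision from the counts reported in this output.

```python
# Confusion matrix from the output above, as (classified YES, classified NO):
cm = {"YES": (245, 29),   # actual YES
      "NO":  (17, 309)}   # actual NO

correct = cm["YES"][0] + cm["NO"][1]                    # diagonal counts
total = sum(a + b for a, b in cm.values())
accuracy = correct / total                              # 554 / 600
tp_rate_yes = cm["YES"][0] / sum(cm["YES"])             # recall for YES
precision_yes = cm["YES"][0] / (cm["YES"][0] + cm["NO"][0])

print(round(accuracy, 4), round(tp_rate_yes, 3), round(precision_yes, 3))
# -> 0.9233 0.894 0.935
```

These match the 92.3333 % accuracy and the 0.894 / 0.935 figures in the YES row of the detailed accuracy table.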

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Another important number is in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree remains: running our test set through the model and checking its accuracy.

By comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see how accurate the model is, which indicates whether the model will hold up when unknown or future data is applied to it.

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify, or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
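The recursive-partitioning idea described above can be sketched in a few lines. This is a minimal illustration on a hypothetical toy data set; it splits on attributes in a fixed order rather than by information gain or gain ratio as C4.5/J48 does.

```python
from collections import Counter

def grow(rows, labels, attrs):
    """Recursive partitioning: stop when the node is pure or no attributes remain."""
    counts = Counter(labels)
    if len(counts) == 1 or not attrs:
        return counts.most_common(1)[0][0]    # leaf = majority class
    a = attrs[0]                              # naive split choice (C4.5 uses gain ratio)
    children = {}
    for v in {r[a] for r in rows}:
        sub = [(r, c) for r, c in zip(rows, labels) if r[a] == v]
        srows, slabels = zip(*sub)
        children[v] = grow(list(srows), list(slabels), attrs[1:])
    return (a, children)

def classify(tree, row):
    while isinstance(tree, tuple):            # follow edges down to a leaf
        a, children = tree
        tree = children[row[a]]
    return tree

rows = [("sunny", "hot"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "yes", "yes"]
tree = grow(rows, labels, attrs=[0, 1])
print(classify(tree, ("sunny", "cool")))   # -> no
```

The recursion stops as soon as a subset is pure, which is exactly the stopping condition described in the theory above.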

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to Classify tab

7) Choose classifier group "trees"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Click Start

13) The output details can now be seen in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased
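Cross-validation holds each fold out once as the test set and trains on the rest. The sketch below shows only the fold bookkeeping behind the "Folds" option (three folds over six instances, for brevity); it is an illustration, not Weka's stratified implementation.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists; each fold is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        train = [j for f in folds if f is not test for j in f]
        yield train, test

splits = list(kfold_indices(6, 3))
print(splits[0])   # -> ([1, 4, 2, 5], [0, 3])
```

Every instance is tested exactly once across the k runs, which is why the cross-validated accuracy is a fairer estimate than testing on the training set.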

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and observe the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to False)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose classifier group "trees"

16) Select J48

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Click Start

20) The output details can now be seen in the Classifier output panel

21) Right-click the entry in the Result list and select the "visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect
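Outside the GUI, dropping a column from a CSV file can be sketched directly. This is a minimal stand-in for what the Remove filter does, with hypothetical column names; note that Weka's -R option counts attributes from 1, while the index below is 0-based.

```python
import csv
import io

def remove_attribute(csv_text, index):
    """Drop the column at the given 0-based index from CSV text."""
    rows = csv.reader(io.StringIO(csv_text))
    return "\n".join(
        ",".join(v for i, v in enumerate(row) if i != index) for row in rows
    )

data = "id,age,pep\n1,30,YES\n2,45,NO"
print(remove_attribute(data, 0))
# -> age,pep
#    30,YES
#    45,NO
```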

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and observe the effect, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list the attributes that are to be removed. After this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab

8) Choose classifier group "trees"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details can now be seen in the Classifier output panel

14) Right-click the entry in the Result list and select the "visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set after changing the cost matrix, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to Classify tab

7) Choose classifier group "trees"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) The output details can now be seen in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Click Start

20) The output details can now be seen in the Classifier output panel

21) Compare the results of steps 15 and 20

22) Compare the results with those of Experiment 6
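With a cost matrix in place, evaluation weights each kind of error instead of merely counting them. The sketch below applies a hypothetical cost matrix (missing a YES costs 5, the reverse costs 1; these values are assumptions for illustration) to the cross-validation confusion matrix from Experiment 6.

```python
# (actual, predicted) -> count, from the Experiment 6 confusion matrix
confusion = {("YES", "YES"): 236, ("YES", "NO"): 38,
             ("NO", "YES"): 23, ("NO", "NO"): 303}

# Hypothetical costs: only misclassifications cost anything
cost = {("YES", "NO"): 5.0, ("NO", "YES"): 1.0}

total_cost = sum(cost.get(cell, 0.0) * n for cell, n in confusion.items())
avg_cost = total_cost / sum(confusion.values())
print(total_cost, round(avg_cost, 4))   # -> 213.0 0.355
```

A classifier tuned against such a matrix will prefer extra false positives over the heavily penalised false negatives, which is why the resulting tree can differ from the one in Experiment 6.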

Sample output


EXPERIMENT-10

Aim: Is a small rule better than a long rule? Check the bias by training the data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be judged from the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validated training data set, using the Weka mining tool

Tools Apparatus Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

IF (children = yes) Λ (income > 30000)

THEN (car = yes)
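The pruning steps above can be sketched on a toy tree. In this minimal illustration (hypothetical structure and validation data), each internal node stores its split attribute, its children, and the majority training class it would become as a leaf; a subtree is replaced by that leaf whenever doing so is no worse on the validation set. A full implementation would also iterate, always pruning the node that helps most.

```python
def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

def accuracy(tree, data):
    return sum(classify(tree, r) == c for r, c in data) / len(data)

def prune(tree, val_data):
    """Reduced-error pruning, bottom-up: turn a node into a leaf labelled with
    its majority training class if that is no worse on the validation data."""
    if not isinstance(tree, dict):
        return tree
    tree["children"] = {v: prune(t, val_data) for v, t in tree["children"].items()}
    leaf = tree["majority"]
    if accuracy(leaf, val_data) >= accuracy(tree, val_data):
        return leaf
    return tree

# Toy over-fit stump: splits on attribute 0; majority training class is "no"
tree = {"attr": 0, "majority": "no",
        "children": {"sunny": "yes", "rain": "no"}}
validation = [(("sunny",), "no"), (("rain",), "no"), (("rain",), "no")]
print(prune(tree, validation))   # -> no  (the split did not help on validation)
```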

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier group "trees"

9) Select "NBTree", i.e., the naive Bayesian tree


10) Select Test options "Use training set"

11) Right-click the text box beside the Choose button and select "Show properties"

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Click Start

16) The output details can now be seen in the Classifier output panel

17) Right-click the entry in the Result list and select the "visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier group "trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details can now be seen in the Classifier output panel

14) Right-click the entry in the Result list and select the "visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier group "rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details can now be seen in the Classifier output panel

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose classifier group "rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Click Start

13) The output details can now be seen in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (error = 0, coverage = 7 instances)

IF accounting = 0 THEN class = B (error = 4/13, coverage = 13 instances)
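Rules of the kind shown above are exactly what OneR produces. The sketch below is a minimal OneR on a hypothetical two-attribute data set: for each attribute it maps every value to its majority class, then keeps the single attribute whose rule set makes the fewest training errors.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute index, value->class rule, training errors) for the
    attribute whose one-level rule set makes the fewest errors."""
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, c in zip(rows, labels):
            by_value[row[a]][c] += 1
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
        errors = sum(c != rule[row[a]] for row, c in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical data: (accounting, other) -> class
rows = [("0", "x"), ("1", "x"), ("1", "y"), ("0", "y")]
labels = ["B", "A", "A", "B"]
attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)   # -> 0 {'0': 'B', '1': 'A'} 0
```

Here the first attribute predicts the class perfectly, so OneR keeps it and discards the other, mirroring the "IF accounting = 1 THEN class = A" style of rule above.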

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 63: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) if need select attribute

11) Click Start

12)now we can see the output details in the Classifier output

13) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 63

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

The decision tree constructed by using the implemented C45 algorithm

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 64

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature For example a fruit may be considered to be an apple if it is red round and about 4 in diameter Even though these features depend on the existence of the other features a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification Because independent variables are assumed only the variances of the variables for each class need to be determined and not the entirecovariance matrix The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C|F1 Fn) over a dependent class variable C with a small number of outcomes or classes conditional on several feature variables F1 through Fn The problem is that if the number of features n is large or when a feature can take on a large number of values then basing such a model on probability tables is infeasible We therefore reformulate the model to make it more tractable

Using Bayes theorem we write P(C|F1Fn)=[p(C)p(F1Fn|C)p(F1Fn)]

In plain English the above equation can be written as

Posterior= [(prior likehood)evidence]

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 65

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

In practice we are only interested in the numerator of that fraction since the denominator does not depend on C and the values of the features Fi are given so that the denominator is effectively constant The numerator is equivalent to the joint probability model p(CF1Fn) which can be rewritten as follows using repeated applications of the definition of conditional probability

p(CF1Fn) =p(C) p(F1Fn|C) =p(C)p(F1|C) p(F2Fn|CF1F2)

=p(C)p(F1|C) p(F2|CF1)p(F3Fn|CF1F2)

= p(C)p(F1|C) p(F2|CF1)p(F3Fn|CF1F2)p(Fn|CF1F2F3Fn1)

Now the naive conditional independence assumptions come into play assume that each feature Fi is conditionally independent of every other feature Fj for jnei

This means that p(Fi|CFj)=p(Fi|C)

and so the joint model can be expressed as p(CF1Fn)=p(C)p(F1|C)p(F2|C)

=p(C)π p(Fi|C)

This means that under the above independence assumptions the conditional distribution over the class variable C can be expressed like this

p(C|F1Fn)= p(C) πp(Fi|C)

Z

where Z is a scaling factor dependent only on F1Fn ie a constant if the values of the feature variables are known

Models of this form are much more manageable since they factor into a so called class prior p(C) and independent probability distributions p(Fi|C) If there are k classes and if a model for eachp(Fi|C=c) can be expressed in terms of r parameters then the corresponding naive Bayes model has (k minus 1) + n r k parameters In practice often k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common and so the total number of parameters of the naive Bayes model is 2n + 1 where n is the number of binary features used for prediction

P(hD)= P(Dh) P(h) P(D)

bull P(h) Prior probability of hypothesis h

bull P(D) Prior probability of training data D

bull P(hD) Probability of h given D

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 66

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

bull P(Dh) Probability of D given h

Naiumlve Bayes Classifier Derivation

bull D Set of tuples

ndash Each Tuple is an lsquonrsquo dimensional attribute vector

ndash X (x1x2x3hellip xn)

bull Let there me lsquomrsquo Classes C1C2C3hellipCm

bull NB classifier predicts X belongs to Class Ci iff

ndash P (CiX) gt P(CjX) for 1lt= j lt= m j ltgt i

bull Maximum Posteriori Hypothesis

ndash P(CiX) = P(XCi) P(Ci) P(X)

ndash Maximize P(XCi) P(Ci) as P(X) is constant

Naiumlve Bayes Classifier Derivation

bull With many attributes it is computationally expensive to evaluate P(XCi)

bull Naiumlve Assumption of ldquoclass conditional independencerdquo

bull P(XCi) = n P( xk Ci)

k = 1

bull P(XCi) = P(x1Ci) P(x2Ci) hellip P(xn Ci)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 67

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

7) Choose Classifier ldquoTreerdquo

8) Select ldquoNBTreerdquo ie Navie Baysiean tree

9) Select Test options ldquoUse training setrdquo

10) if need select attribute

11) now Start weka

12)now we can see the output details in the Classifier output

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 923333

Incorrectly Classified Instances 46 76667

Kappa statistic 0845

Mean absolute error 01389

Root mean squared error 02636

Relative absolute error 279979

Root relative squared error 529137

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0894 0052 0935 0894 0914 0936 YES

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 68

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

0948 0106 0914 0948 0931 0936 NO

Weighted Avg 0923 0081 0924 0923 0923 0936

=== Confusion Matrix ===

a b lt-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim To ldquoIs testing a good ideardquo

Tools Apparatus Weka Mining tool

Procedure

1) In Test options select the Supplied test set radio button

2) click Set

3) Choose the file which contains records that were not in the training set we used to create the model

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 69

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (923 percent) and the Incorrectly Classified Instances (76 percent) Other important numbers are in the ROC Area column in the first row (the 0936) Finally in the Confusion Matrix it shows the number of false positives and false negatives The false positives are 29 and the false negatives are 17 in this matrix

Based on our accuracy rate of 923 percent we say that upon initial analysis this is a good model

One final step to validating our classification tree which is to run our test set through the model and ensure that accuracy of the model

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 70

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 71

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) In the Filter panel, click on the Choose button. This will show a popup window with the list of available filters

7) Select "weka.filters.unsupervised.attribute.Remove"

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see "Remove -R 1"

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to OPEN file and browse the file that was newly saved (the attribute-deleted file)

14) Go to the Classify tab

15) Choose Classifier "Tree"

16) Select the J48 tree

17) Select Test options "Use training set"

18) If needed, select the class attribute

19) Now click Start

20) The output details appear in the Classifier output panel

21) Right-click on the result list and select the "Visualize tree" option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect

Sample output
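Outside Weka, the effect of the Remove filter can be sketched on a hypothetical three-record miniature of bank.csv (the column names are invented; only the slicing mirrors "Remove -R 1", which drops the first, 1-based, attribute):

```python
import csv
import io

# Hypothetical miniature of bank.csv with an id attribute in position 1.
raw = "id,age,income,pep\n1,30,20000,YES\n2,45,50000,NO\n3,52,31000,YES\n"
rows = list(csv.reader(io.StringIO(raw)))
# "Remove -R 1" drops attribute index 1 (Weka indices are 1-based).
filtered = [row[1:] for row in rows]
print(filtered[0])  # ['age', 'income', 'pep']
print(filtered[1])  # ['30', '20000', 'YES']
```

Removing an identifier attribute like id matters because a unique id lets a decision tree memorise every training instance instead of generalising.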


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select from the attributes list those attributes which are to be removed. With this step only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose Classifier "Tree"

9) Select J48

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output:


EXPERIMENT-9

Aim: To create a decision tree by cross-validation training of the data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Use training set"

10) Click on "More options"

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize. Then close the window

13) Click OK

14) Click Start

15) The output details appear in the Classifier output panel


16) Select Test options "Cross-validation"

17) Set "Folds", e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) The output details appear in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6

Sample output:


EXPERIMENT-10

Aim: Is a small rule better than a long rule? Check the bias by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

This will be based on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory:


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the decision-tree accuracy over the validation set

– Stop when further pruning decreases the decision-tree accuracy over the validation set

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
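A rule like this can be checked directly in code; a minimal sketch, with the field names and the threshold taken from the rule above and the test records made up:

```python
# Evaluate the rule: IF (Children = yes) AND (income > 30000) THEN (car = Yes).
def predicts_car(record):
    return record["children"] == "yes" and record["income"] > 30000

print(predicts_car({"children": "yes", "income": 42000}))  # True
print(predicts_car({"children": "no", "income": 42000}))   # False
```

The rule fires only when every condition in its antecedent holds; pruning trades some of these specific conditions for simpler rules that generalise better.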

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Tree"

9) Select "NBTree", i.e. the Naïve Bayesian tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "false" to "true"

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) The output details appear in the Classifier output panel

17) Right-click on the result list and select the "Visualize tree" option

Sample output:


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

14) Right-click on the result list and select the "Visualize tree" option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

Procedure for "PART":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
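Rules of this kind are what OneR produces: for a single attribute, predict the majority class of each attribute value. A minimal sketch over a made-up data set chosen to reproduce the two rules above:

```python
from collections import Counter, defaultdict

# Made-up (accounting, class) records: 7 with accounting=1 (all A),
# 13 with accounting=0 (9 B, 4 A) -- matching the coverage/error figures above.
data = [(1, "A")] * 7 + [(0, "B")] * 9 + [(0, "A")] * 4

by_value = defaultdict(Counter)
for value, cls in data:
    by_value[value][cls] += 1
# OneR rule: each attribute value predicts its majority class.
rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
errors = sum(cls != rule[v] for v, cls in data)
print(rule)    # {1: 'A', 0: 'B'}
print(errors)  # 4 (the 4 accounting=0 instances with class A)
```

Full OneR repeats this for every attribute and keeps the attribute whose rule set makes the fewest errors, which is why it serves as a one-attribute baseline against J48 and PART.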

Sample output:

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR:


PART



The decision tree constructed by using the implemented C4.5 algorithm:


EXPERIMENT-4

Aim: To find the percentage of examples that are classified correctly by using the decision tree model created above, i.e. testing on the training set

Tools/Apparatus: Weka mining tool

Theory:

A Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round and about 4" in diameter. Even though these features depend on the existence of the other features, a Naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

An advantage of the Naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined, and not the entire covariance matrix.

The Naive Bayes probabilistic model:

The probability model for a classifier is a conditional model

p(C | F1, …, Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem we write

p(C | F1, …, Fn) = p(C) p(F1, …, Fn | C) / p(F1, …, Fn)

In plain English, the above equation can be written as

Posterior = (prior × likelihood) / evidence


In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model p(C, F1, …, Fn), which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= p(C) p(F1 | C) p(F2 | C, F1) … p(Fn | C, F1, F2, …, Fn-1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) … = p(C) ∏ p(Fi | C)

This means that, under the above independence assumptions, the conditional distribution over the class variable C can be expressed like this:

p(C | F1, …, Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e. a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding Naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the Naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h|D) = P(D|h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h|D): probability of h given D


• P(D|h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (continued)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of "class conditional independence":

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
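The product form P(X|Ci) = P(x1|Ci) × … × P(xn|Ci) is easy to compute from counted frequencies. A minimal sketch over made-up categorical records (the feature values and class labels are invented for illustration):

```python
from collections import Counter, defaultdict

# Made-up training tuples of the form ((children, income_band), class).
train = [(("yes", "high"), "car"), (("yes", "low"), "nocar"),
         (("no", "high"), "nocar"), (("yes", "high"), "car")]
prior = Counter(c for _, c in train)   # class counts, for P(Ci)
cond = defaultdict(Counter)            # counts for estimating P(xk|Ci)
for feats, c in train:
    for k, v in enumerate(feats):
        cond[(k, v)][c] += 1

def classify(feats):
    # Maximize P(Ci) * product over k of P(xk|Ci); P(X) is constant.
    scores = {}
    for c, n in prior.items():
        p = n / len(train)
        for k, v in enumerate(feats):
            p *= cond[(k, v)][c] / n
        scores[c] = p
    return max(scores, key=scores.get)

print(classify(("yes", "high")))  # car
```

A production implementation would also smooth the counts (e.g. Laplace smoothing), since a zero count for any P(xk|Ci) would zero out the whole product.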

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"

6) Go to the Classify tab


7) Choose Classifier "Tree"

8) Select "NBTree", i.e. the Naïve Bayesian tree

9) Select Test options "Use training set"

10) If needed, select the class attribute

11) Now click Start

12) The output details appear in the Classifier output panel

Sample output:

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %

Kappa statistic 0.845

Mean absolute error 0.1389

Root mean squared error 0.2636

Relative absolute error 27.9979 %

Root relative squared error 52.9137 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES


0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure:

1) In Test options, select the Supplied test set radio button

2) Click Set

3) Choose a file which contains records that were not in the training set we used to create the model


4) Click Start (Weka will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output:

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.7 percent). Other important numbers are in the ROC Area column of the first row (the 0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: the false positives are 29 and the false negatives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and ensure the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation training of the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory:


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
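The attribute-value test used for each split is typically chosen to maximise information gain, i.e. the drop in class entropy. A minimal sketch on five made-up records of the form (x1, y):

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum p * log2(p) over the class distribution.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Made-up records (x1, y).
data = [("low", "NO"), ("low", "NO"), ("high", "YES"),
        ("high", "YES"), ("high", "NO")]
base = entropy([y for _, y in data])
# Weighted entropy of the subsets produced by splitting on x1.
after = sum(
    len([y for x, y in data if x == v]) / len(data)
    * entropy([y for x, y in data if x == v])
    for v in {"low", "high"}
)
print(round(base - after, 2))  # 0.42 bits of information gain
```

Recursive partitioning repeats this choice on each subset; J48 (Weka's C4.5) additionally normalises the gain by the split's own entropy (gain ratio) and prunes the result.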

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv"


6) Go to the Classify tab

7) Choose Classifier "Tree"

8) Select J48

9) Select Test options "Cross-validation"

10) Set "Folds", e.g. 10

11) If needed, select the class attribute

12) Now click Start

13) The output details appear in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 65: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-4

Aim To find the percentage of examples that are classified correctly by using the above created decision tree model ie Testing on the training set

Tools Apparatus Weka mining tool

Theory

Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature For example a fruit may be considered to be an apple if it is red round and about 4 in diameter Even though these features depend on the existence of the other features a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple

An advantage of the naive Bayes classifier is that it requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification Because independent variables are assumed only the variances of the variables for each class need to be determined and not the entirecovariance matrix The naive Bayes probabilistic model

The probability model for a classifier is a conditional model

P(C|F1 Fn) over a dependent class variable C with a small number of outcomes or classes conditional on several feature variables F1 through Fn The problem is that if the number of features n is large or when a feature can take on a large number of values then basing such a model on probability tables is infeasible We therefore reformulate the model to make it more tractable

Using Bayes theorem we write P(C|F1Fn)=[p(C)p(F1Fn|C)p(F1Fn)]

In plain English the above equation can be written as

Posterior= [(prior likehood)evidence]

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 65

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

In practice we are only interested in the numerator of that fraction since the denominator does not depend on C and the values of the features Fi are given so that the denominator is effectively constant The numerator is equivalent to the joint probability model p(CF1Fn) which can be rewritten as follows using repeated applications of the definition of conditional probability

p(C, F1, …, Fn) = p(C) p(F1, …, Fn | C)

= p(C) p(F1 | C) p(F2, …, Fn | C, F1)

= p(C) p(F1 | C) p(F2 | C, F1) p(F3, …, Fn | C, F1, F2)

= … = p(C) p(F1 | C) p(F2 | C, F1) ⋯ p(Fn | C, F1, F2, …, Fn−1)

Now the naive conditional independence assumptions come into play: assume that each feature Fi is conditionally independent of every other feature Fj, for j ≠ i.

This means that p(Fi | C, Fj) = p(Fi | C)

and so the joint model can be expressed as

p(C, F1, …, Fn) = p(C) p(F1 | C) p(F2 | C) ⋯ p(Fn | C) = p(C) ∏ p(Fi | C)

This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed as

p(C | F1, …, Fn) = (1/Z) p(C) ∏ p(Fi | C)

where Z is a scaling factor dependent only on F1, …, Fn, i.e., a constant if the values of the feature variables are known.

Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(Fi | C). If there are k classes, and if a model for each p(Fi | C = c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1, where n is the number of binary features used for prediction.

P(h | D) = P(D | h) P(h) / P(D)

• P(h): prior probability of hypothesis h

• P(D): prior probability of training data D

• P(h | D): probability of h given D

• P(D | h): probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples.

– Each tuple is an 'n'-dimensional attribute vector.

– X: (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm.

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i

• Maximum posteriori hypothesis:

– P(Ci | X) = P(X | Ci) P(Ci) / P(X)

– Maximize P(X | Ci) P(Ci), as P(X) is constant.

• With many attributes, it is computationally expensive to evaluate P(X | Ci).

• Naïve assumption of "class conditional independence":

• P(X | Ci) = ∏ (k = 1 to n) P(xk | Ci)

• P(X | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
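The factored model above can be checked numerically. A minimal sketch, assuming made-up prior and likelihood values for a two-class, two-feature problem (illustrative numbers, not the bank data):

```python
# Naive Bayes posterior sketch: P(C | F1, F2) = P(C) * P(F1|C) * P(F2|C) / Z.
# All probabilities below are hypothetical, chosen only to illustrate the math.

priors = {"YES": 0.5, "NO": 0.5}
# likelihoods[class][i] = P(Fi = observed value | class)
likelihoods = {"YES": [0.8, 0.6], "NO": [0.3, 0.4]}

# Numerator of the model for each class: prior times product of likelihoods.
unnormalised = {c: priors[c] * likelihoods[c][0] * likelihoods[c][1]
                for c in priors}

# Z is the scaling factor: the sum of the numerators over all classes.
Z = sum(unnormalised.values())
posterior = {c: v / Z for c, v in unnormalised.items()}

print(posterior["YES"])  # approximately 0.8
```

The denominator Z never has to be modelled explicitly: normalising the per-class numerators makes the posteriors sum to 1.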

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Go to the Classify tab.
7) Choose Classifier "Tree".
8) Select "NBTree", i.e., the Naive Bayesian tree.
9) Select Test options "Use training set".
10) If needed, select attributes.
11) Now start Weka.
12) Now we can see the output details in the Classifier output.

Sample output:

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 554 92.3333 %
Incorrectly Classified Instances 46 7.6667 %
Kappa statistic 0.845
Mean absolute error 0.1389
Root mean squared error 0.2636
Relative absolute error 27.9979 %
Root relative squared error 52.9137 %
Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.894 0.052 0.935 0.894 0.914 0.936 YES
0.948 0.106 0.914 0.948 0.931 0.936 NO
Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as
245 29 | a = YES
17 309 | b = NO
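The headline figures in this output can be recomputed directly from the confusion matrix. A minimal sketch, taking YES as the positive class:

```python
# Recompute the summary metrics from the confusion matrix shown above.
# Rows are actual classes, columns are predicted: [[245, 29], [17, 309]].
tp, fn = 245, 29   # actual YES predicted YES / predicted NO
fp, tn = 17, 309   # actual NO predicted YES / predicted NO

total = tp + fn + fp + tn
accuracy = (tp + tn) / total          # correctly classified fraction
precision_yes = tp / (tp + fp)        # Precision column, YES row
recall_yes = tp / (tp + fn)           # Recall (= TP Rate) column, YES row

print(round(accuracy, 4))       # 0.9233  (554 of 600)
print(round(precision_yes, 3))  # 0.935
print(round(recall_yes, 3))     # 0.894
```

These match the Correctly Classified Instances percentage and the YES row of the Detailed Accuracy table.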

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool.

Procedure

1) In Test options, select the Supplied test set radio button.
2) Click Set.
3) Choose the file which contains records that were not in the training set we used to create the model.
4) Click Start. (WEKA will run this test data set through the model we already created.)
5) Compare the output results with those of the 4th experiment.

Sample output

This can be observed across the different problem solutions while doing practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: taking YES as the positive class, the false negatives are 29 and the false positives are 17 in this matrix.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim: To create a Decision tree by cross-validation on the training data set using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable y is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
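Attribute-value tests such as those J48 uses are typically chosen by information gain, i.e. the reduction in entropy achieved by a split. A minimal sketch with toy labels (hypothetical data, not the bank set):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Toy node: 4 YES and 4 NO instances, split into two pure children.
parent = ["YES"] * 4 + ["NO"] * 4
left, right = ["YES"] * 4, ["NO"] * 4

# Information gain = parent entropy minus the weighted child entropies.
gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                       - (len(right) / len(parent)) * entropy(right)
print(gain)  # 1.0: a perfect split removes all uncertainty
```

Recursive partitioning repeats this choice on each subset until a node is pure or no split adds value.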

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Go to the Classify tab.
7) Choose Classifier "Tree".
8) Select J48.
9) Select Test options "Cross-validation".
10) Set "Folds", e.g. 10.
11) If needed, select attributes.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.
14) Compare the output results with those of the 4th experiment.
15) Check whether the accuracy increased or decreased.
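The cross-validation in step 9 partitions the instances once and rotates the held-out fold. A minimal, unstratified sketch of the splitting logic (Weka's version additionally stratifies each fold by class):

```python
# Sketch of k-fold cross-validation index splitting (what "Folds = 10" does).
def kfold_indices(n, k):
    """Yield (train, test) index lists for k folds over n instances."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = all instances outside the held-out fold.
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

# 600 instances (as in the bank data set), 10 folds of 60 each.
splits = list(kfold_indices(600, 10))
print(len(splits))        # 10 train/test rounds
print(len(splits[0][1]))  # 60 test instances per fold
print(len(splits[0][0]))  # 540 training instances per fold
```

The reported accuracy is then the aggregate over all 10 held-out folds, so every instance is tested exactly once.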

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %
Incorrectly Classified Instances 61 10.1667 %
Kappa statistic 0.7942
Mean absolute error 0.167
Root mean squared error 0.305
Relative absolute error 33.6511 %
Root relative squared error 61.2344 %
Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.861 0.071 0.911 0.861 0.886 0.883 YES
0.929 0.139 0.889 0.929 0.909 0.883 NO
Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as
236 38 | a = YES
23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from the GUI Explorer and see the effect, using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.
7) Select "weka.filters.unsupervised.attribute.Remove".
8) Next, click on the text box immediately to the right of the Choose button.
9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)
10) Then click OK. Now, in the filter box, you will see "Remove -R 1".
11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.
12) To save the new working relation as an ARFF file, click on the Save button in the top panel.
13) Go to OPEN file and browse the file that is newly saved (the attribute-deleted file).
14) Go to the Classify tab.
15) Choose Classifier "Tree".
16) Select the J48 tree.
17) Select Test options "Use training set".
18) If needed, select attributes.
19) Now start Weka.
20) Now we can see the output details in the Classifier output.
21) Right-click on the result list and select the "visualize tree" option.
22) Compare the output results with those of the 4th experiment.
23) Check whether the accuracy increased or decreased.
24) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and see the effect, using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Select from the attributes list some of the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.
7) Then go to the Classify tab.
8) Choose Classifier "Tree".
9) Select J48.
10) Select Test options "Use training set".
11) If needed, select attributes.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.
14) Right-click on the result list and select the "visualize tree" option.
15) Compare the output results with those of the 4th experiment.
16) Check whether the accuracy increased or decreased.
17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a Decision tree by cross-validation on the training data set, changing the cost matrix, in Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Go to the Classify tab.
7) Choose Classifier "Tree".
8) Select J48.
9) Select Test options "Use training set".
10) Click on "More options".
11) Select cost-sensitive evaluation and click on the Set button.
12) Set the matrix values and click on resize. Then close the window.
13) Click OK.
14) Click Start.
15) We can see the output details in the Classifier output.
16) Select Test options "Cross-validation".
17) Set "Folds", e.g. 10.
18) If needed, select attributes.
19) Now start Weka.
20) Now we can see the output details in the Classifier output.
21) Compare the results of the 15th and 20th steps.
22) Compare the results with those of experiment 6.
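Cost-sensitive evaluation weights each cell of the confusion matrix by the corresponding cost-matrix entry. A minimal sketch, assuming a hypothetical cost matrix and the confusion counts from experiment 4:

```python
# Total misclassification cost = sum over cells of (count * cost).
# Confusion matrix rows = actual class, columns = predicted class.
confusion = [[245, 29],   # actual YES: predicted YES, predicted NO
             [17, 309]]   # actual NO:  predicted YES, predicted NO

# Hypothetical cost matrix: a missed YES (false negative) costs 5,
# a false positive costs 1, and correct predictions cost 0.
cost = [[0, 5],
        [1, 0]]

total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
print(total_cost)  # 29*5 + 17*1 = 162
```

Changing the cost matrix in step 12 changes which errors the evaluation penalises most, even when the raw confusion counts stay the same.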

Sample output


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training the data set using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure

This will be based on the attribute set and on the relationships among attributes we want to study. It can be viewed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a Decision tree by using Pruned mode and Reduced-error Pruning, and show the accuracy for a cross-validation trained data set, using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

– Removing the sub-tree rooted at the pruned node

– Making the pruned node a leaf node

– Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

– Always select a node whose removal most increases the DT accuracy over the validation set

– Stop when further pruning decreases the DT accuracy over the validation set

IF (Children = yes) Λ (income > 30000)

THEN (car = Yes)
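The pruning test above can be sketched on a toy tree. The tree encoding, validation data and labels below are hypothetical, purely to illustrate the accept/reject rule:

```python
# Reduced-error pruning sketch. A leaf is a class label; an internal node is
# (feature_index, left_subtree, right_subtree), splitting on x[f] == 0 / 1.

def predict(tree, x):
    if not isinstance(tree, tuple):  # reached a leaf
        return tree
    f, left, right = tree
    return predict(left, x) if x[f] == 0 else predict(right, x)

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

# Over-fit tree: the second split memorises noise in the training data.
tree = (0, "NO", (1, "YES", "NO"))
validation = [((1, 0), "YES"), ((1, 1), "YES"), ((0, 0), "NO"), ((0, 1), "NO")]

# Candidate: replace the right sub-tree with a leaf holding the majority
# class of its training instances (assumed here to be "YES").
pruned = (0, "NO", "YES")
if accuracy(pruned, validation) >= accuracy(tree, validation):
    tree = pruned  # keep the pruned tree: it performs no worse

print(accuracy(tree, validation))  # 1.0 after pruning (was 0.75)
```

The key point is that the accept/reject decision uses the validation set, not the training set the tree over-fitted.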

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Tree".
9) Select "NBTree", i.e., the Naive Bayesian tree.
10) Select Test options "Use training set".
11) Right-click on the text box beside the Choose button and select show properties.
12) Now change the unpruned mode from "false" to "true".
13) Change the reduced-error pruning as needed.
14) If needed, select attributes.
15) Now start Weka.
16) Now we can see the output details in the Classifier output.
17) Right-click on the result list and select the "visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Trees".
9) Select "J48".
10) Select Test options "Use training set".
11) If needed, select attributes.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.
14) Right-click on the result list and select the "visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Rules".
9) Select "OneR".
10) Select Test options "Use training set".
11) If needed, select attributes.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Rules".
9) Select "PART".
10) Select Test options "Use training set".
11) If needed, select attributes.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


One R


PART


6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 67: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

• P(D|h): the probability of D given h

Naïve Bayes Classifier: Derivation

• D: a set of tuples

– Each tuple is an 'n'-dimensional attribute vector

– X = (x1, x2, x3, …, xn)

• Let there be 'm' classes: C1, C2, C3, …, Cm

• The NB classifier predicts that X belongs to class Ci iff

– P(Ci|X) > P(Cj|X) for 1 <= j <= m, j <> i

• Maximum a posteriori hypothesis:

– P(Ci|X) = P(X|Ci) P(Ci) / P(X)

– Maximize P(X|Ci) P(Ci), as P(X) is constant

Naïve Bayes Classifier: Derivation (contd.)

• With many attributes, it is computationally expensive to evaluate P(X|Ci)

• Naïve assumption of “class conditional independence”:

• P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)

• P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
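The derivation above can be turned into a tiny worked example. The sketch below (plain Python; the training records and attribute values are invented for illustration, not taken from bank.csv) scores each class by P(Ci) · ∏k P(xk|Ci) and picks the maximum:

```python
from collections import Counter

def naive_bayes_predict(train, query):
    """Score each class Ci by P(Ci) * prod_k P(x_k | Ci); return the best class."""
    labels = [y for _, y in train]
    n = len(train)
    scores = {}
    for c, count_c in Counter(labels).items():
        prior = count_c / n                       # P(Ci)
        likelihood = 1.0
        for k, value in enumerate(query):
            # class-conditional independence: estimate each P(x_k | Ci) separately
            matches = sum(1 for x, y in train if y == c and x[k] == value)
            likelihood *= matches / count_c
        scores[c] = prior * likelihood            # maximize P(X|Ci) P(Ci)
    return max(scores, key=scores.get), scores

# Hypothetical records: (children, income band) -> buys a car?
train = [
    (("yes", "high"), "YES"), (("yes", "high"), "YES"), (("no", "high"), "YES"),
    (("no", "low"), "NO"), (("yes", "low"), "NO"), (("no", "low"), "NO"),
]
best, scores = naive_bayes_predict(train, ("yes", "high"))
print(best)   # YES
```

Note that an attribute value never seen within a class drives the whole product to zero; practical implementations, including Weka's, smooth the counts (e.g. a Laplace correction) to avoid this.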

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 67


7) Choose classifier group “Trees”

8) Select “NBTree”, i.e., the naive Bayesian tree

9) Select the test option “Use training set”

10) If needed, select the class attribute

11) Now click Start

12) Now we can see the output details in the Classifier output panel

Sample output

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances      554      92.3333 %
Incorrectly Classified Instances     46       7.6667 %
Kappa statistic                    0.845
Mean absolute error                0.1389
Root mean squared error            0.2636
Relative absolute error           27.9979 %
Root relative squared error       52.9137 %
Total Number of Instances          600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
 0.894    0.052    0.935      0.894   0.914      0.936    YES


 0.948    0.106    0.914      0.948   0.931      0.936    NO

Weighted Avg.   0.923    0.081    0.924    0.923   0.923    0.936

=== Confusion Matrix ===

   a   b   <-- classified as
 245  29 |  a = YES
  17 309 |  b = NO
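The summary figures can be re-derived from the confusion matrix alone. A quick check (plain Python; the four cell counts are the ones printed above):

```python
# Rows are actual classes, columns are predicted classes (a = YES, b = NO)
tp, fn = 245, 29    # actual YES: classified as YES / as NO
fp, tn = 17, 309    # actual NO:  classified as YES / as NO

total = tp + fn + fp + tn                  # 600 instances
accuracy = (tp + tn) / total               # correctly classified fraction
tp_rate_yes = tp / (tp + fn)               # recall for class YES
fp_rate_yes = fp / (fp + tn)               # share of NO instances called YES
precision_yes = tp / (tp + fp)

print(round(accuracy * 100, 4))   # 92.3333
print(round(tp_rate_yes, 3), round(fp_rate_yes, 3), round(precision_yes, 3))
```

The printed values match the Correctly Classified Instances percentage and the YES row of the detailed-accuracy table.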

EXPERIMENT-5

Aim: To answer “Is testing a good idea?”

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the “Supplied test set” radio button

2) Click Set

3) Choose the file which contains records that were not in the training set we used to create the model


4) Click Start (WEKA will run this test data set through the model we already created)

5) Compare the output results with those of the 4th experiment

Sample output

This can be experienced through the different problem solutions while doing practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the numbers of false positives and false negatives: in this matrix the false positives are 29 and the false negatives are 17.

Based on our accuracy rate of 92.3 percent, we say that upon initial analysis this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the accuracy of the model.

By comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see the accuracy of the model, which indicates that the model will not break down with unknown data, or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalization of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, …, xk, y)

The dependent variable, y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc. that are used for that task.
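The attribute-value test chosen at each split is typically the one that most reduces impurity. A minimal sketch of that selection using information gain (plain Python; the toy records and attribute values below are invented for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting the source set on one attribute."""
    base = entropy(labels)
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr_index] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder

# Hypothetical records (x1 = outlook, x2 = windy) with target y
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["NO", "NO", "YES", "YES"]
g1 = information_gain(rows, labels, 0)   # splitting on x1 separates the classes perfectly
g2 = information_gain(rows, labels, 1)   # splitting on x2 gains nothing
```

Recursive partitioning would now split on x1 and recurse on each subset until the labels are pure. (J48, the C4.5 implementation used below, ranks splits by gain ratio, a normalised variant of this measure.)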

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system


6) Go to Classify tab

7) Choose classifier “Trees”

8) Select J48

9) Select the test option “Cross-validation”

10) Set “Folds”, e.g. 10

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Compare the output results with those of the 4th experiment

15) Check whether the accuracy increased or decreased

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances      539      89.8333 %
Incorrectly Classified Instances     61      10.1667 %
Kappa statistic                    0.7942
Mean absolute error                0.167
Root mean squared error            0.305
Relative absolute error           33.6511 %
Root relative squared error       61.2344 %
Total Number of Instances          600

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
 0.861    0.071    0.911      0.861   0.886      0.883    YES
 0.929    0.139    0.889      0.929   0.909      0.883    NO

Weighted Avg.   0.898    0.108    0.899    0.898   0.898    0.883

=== Confusion Matrix ===

   a   b   <-- classified as
 236  38 |  a = YES
  23 303 |  b = NO
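The kappa statistic reported above measures agreement beyond chance, and it too follows from the confusion matrix. A quick check (plain Python, using the cross-validation matrix printed above):

```python
tp, fn = 236, 38    # actual YES: classified as YES / as NO
fp, tn = 23, 303    # actual NO:  classified as YES / as NO
total = tp + fn + fp + tn

observed = (tp + tn) / total     # observed agreement, 539/600
# chance agreement: row marginal * column marginal, summed over both classes
p_yes = ((tp + fn) / total) * ((tp + fp) / total)
p_no = ((fp + tn) / total) * ((fn + tn) / total)
expected = p_yes + p_no
kappa = (observed - expected) / (1 - expected)

print(round(observed * 100, 4), round(kappa, 4))   # 89.8333 0.7942
```

Both printed values agree with the summary block above, which is a useful sanity check when reading Weka output.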


EXPERIMENT-7

Aim: To delete one attribute in the GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters

7) Select “weka.filters.unsupervised.attribute.Remove”

8) Next, click on the text box immediately to the right of the Choose button

9) In the resulting dialog box, enter the index of the attribute to be filtered out (make sure that the invertSelection option is set to false)

10) Then click OK. Now in the filter box you will see “Remove -R 1”

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file, click on the Save button in the top panel


13) Go to Open file and browse to the newly saved file (the file with the attribute deleted)

14) Go to the Classify tab

15) Choose classifier “Trees”

16) Select the J48 tree

17) Select the test option “Use training set”

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Right-click on the result list and select the “Visualize tree” option

22) Compare the output results with those of the 4th experiment

23) Check whether the accuracy increased or decreased

24) Check whether removing this attribute has any significant effect
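The effect of the Remove filter can be reproduced outside Weka as a simple column drop. A small sketch (plain Python; the miniature CSV content is invented, standing in for bank.csv):

```python
import csv
import io

def remove_attribute(rows, index):
    """Drop one 0-based column from every row -- what Weka's Remove filter
    does with -R <index + 1> and invertSelection = false."""
    return [row[:index] + row[index + 1:] for row in rows]

# Hypothetical miniature of bank.csv: header row plus two records
text = "id,age,income,pep\n1,35,high,YES\n2,51,low,NO\n"
rows = list(csv.reader(io.StringIO(text)))

filtered = remove_attribute(rows, 0)   # "Remove -R 1" drops the id attribute
print(filtered[0])                     # ['age', 'income', 'pep']
```

Dropping the id attribute matters because a unique identifier can be memorised perfectly by a tree learner while carrying no predictive value.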

Sample output


EXPERIMENT-8

Aim: To select some attributes in the GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Select, from the attributes list, the attributes which are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel

7) Then go to the Classify tab

8) Choose classifier “Trees”

9) Select J48

10) Select the test option “Use training set”

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the “Visualize tree” option

15) Compare the output results with those of the 4th experiment

16) Check whether the accuracy increased or decreased


17) Check whether removing these attributes has any significant effect

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set, changing the cost matrix, in the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Go to the Classify tab

7) Choose classifier “Trees”

8) Select J48

9) Select the test option “Use training set”

10) Click on “More options”

11) Select cost-sensitive evaluation and click on the Set button

12) Set the matrix values and click on Resize; then close the window

13) Click OK

14) Click Start

15) We can see the output details in the Classifier output panel


16) Select the test option “Cross-validation”

17) Set “Folds”, e.g. 10

18) If needed, select the class attribute

19) Now click Start

20) Now we can see the output details in the Classifier output panel

21) Compare the results of the 15th and 20th steps

22) Compare the results with those of experiment 6
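What the cost matrix changes can be seen by weighting the confusion-matrix cells. A minimal sketch (plain Python; the confusion matrix is the one from experiment 6, while the cost values are hypothetical):

```python
def total_cost(confusion, cost):
    """Sum over cells: (instances of actual class i predicted as j) * cost[i][j]."""
    k = len(confusion)
    return sum(confusion[i][j] * cost[i][j] for i in range(k) for j in range(k))

# Rows: actual YES, NO; columns: predicted YES, NO (experiment 6 matrix)
confusion = [[236, 38],
             [23, 303]]
# Hypothetical costs: a missed YES (false negative) costs 5, a false positive costs 1
cost = [[0, 5],
        [1, 0]]
print(total_cost(confusion, cost))   # 38 * 5 + 23 * 1 = 213
```

Cost-sensitive evaluation reports this weighted cost rather than the raw error count, so a tree that trades extra false positives for fewer false negatives can come out ahead.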

Sample output


EXPERIMENT-10

Aim: Is a small rule better than a long rule? Check the bias by training on the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will depend on the attribute set and on which relationships among the attributes we want to study. It can be decided based on the database and the user requirement

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validation-trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning

• Each node of the (over-fit) tree is examined for pruning

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

• Pruning a node consists of:

  – Removing the sub-tree rooted at the pruned node

  – Making the pruned node a leaf node

  – Assigning the pruned node the most common classification of the training instances attached to that node

• Pruning nodes iteratively:

  – Always select a node whose removal most increases the decision-tree accuracy over the validation set

  – Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)

THEN (car = Yes)
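The iterative loop described above can be sketched abstractly. In the sketch below (plain Python), each candidate node is reduced to a single number, the change in validation-set accuracy if that node were collapsed to a leaf; the node names and gains are hypothetical:

```python
def reduced_error_prune(candidates):
    """candidates: {node: change in validation accuracy if pruned}.
    Repeatedly prune the node whose removal most increases accuracy;
    stop once every remaining candidate would decrease it."""
    remaining = dict(candidates)
    pruned = []
    while remaining:
        node = max(remaining, key=remaining.get)
        if remaining[node] < 0:       # further pruning performs worse -> stop
            break
        pruned.append(node)           # no worse than the original: prune it
        del remaining[node]
    return pruned

# Hypothetical candidate nodes and their validation-accuracy gains
order = reduced_error_prune({"n1": 0.02, "n2": -0.01, "n3": 0.05})
print(order)   # ['n3', 'n1']
```

A full implementation would recompute the gains after every pruning step, since collapsing one node changes the accuracy contributions of its ancestors.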

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier “Trees”

9) Select “NBTree”, i.e., the naive Bayesian tree


10) Select the test option “Use training set”

11) Right-click on the text box beside the Choose button and select “Show properties”

12) Now change the unpruned mode from “false” to “true”

13) Change the reduced-error pruning option as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the result list and select the “Visualize tree” option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose the classifier group “Trees/Rules”

9) Select “J48”

10) Select the test option “Use training set”

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the result list and select the “Visualize tree” option

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for “OneR”

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose classifier “Rules”

9) Select “OneR”

10) Select the test option “Use training set”

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for “PART”

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to Open file and browse to the file “bank.csv” that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose classifier “Rules”

9) Select “PART”

10) Select the test option “Use training set”

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class – relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
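OneR itself is simple enough to sketch in a few lines: for a chosen attribute, each value is mapped to its majority class, and the rule set's error is the number of minority instances. The toy data below is invented to mirror the accounting rules above (7 instances with accounting = 1, all class A; 13 with accounting = 0, of which 9 are B):

```python
from collections import Counter

def one_r(rows, labels, attr_index):
    """One-attribute rule set: majority class per attribute value,
    plus the number of training errors those rules make."""
    rules, errors = {}, 0
    for value in set(r[attr_index] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr_index] == value]
        majority, count = Counter(subset).most_common(1)[0]
        rules[value] = majority
        errors += len(subset) - count    # minority instances are misclassified
    return rules, errors

rows = [(1,)] * 7 + [(0,)] * 13
labels = ["A"] * 7 + ["B"] * 9 + ["A"] * 4
rules, errors = one_r(rows, labels, 0)
print(sorted(rules.items()), errors)   # [(0, 'B'), (1, 'A')] 4
```

OneR builds such a rule set for every attribute and keeps the one with the fewest errors; comparing it against J48 and PART shows how much the extra tree or rule structure actually buys.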

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART



This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 69: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

0.948 0.106 0.914 0.948 0.931 0.936 NO

Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as

245 29 | a = YES

17 309 | b = NO

EXPERIMENT-5

Aim: To answer the question "Is testing a good idea?"

Tools/Apparatus: Weka mining tool

Procedure

1) In Test options, select the Supplied test set radio button.

2) Click Set.

3) Choose the file that contains records that were not in the training set used to create the model.

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 69

4) Click Start. (WEKA will run this test data set through the model we already created.)

5) Compare the output results with those of the 4th experiment.

Sample output

This can be observed across the different problem solutions worked through during practice.

The important numbers to focus on here are those next to Correctly Classified Instances (92.3 percent) and Incorrectly Classified Instances (7.6 percent). Other important numbers are in the ROC Area column of the first row (0.936). Finally, the Confusion Matrix shows the number of false positives and false negatives: in this matrix, the false positives are 29 and the false negatives are 17.
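The arithmetic behind these figures can be checked directly against the confusion matrix quoted above. A minimal sketch (plain Python, independent of Weka):

```python
# Recompute the headline numbers from the confusion matrix above.
# Rows are actual classes (YES, NO); columns are predicted classes.
matrix = [[245, 29],   # actual YES: 245 predicted YES, 29 predicted NO
          [17, 309]]   # actual NO: 17 predicted YES, 309 predicted NO

total = sum(sum(row) for row in matrix)        # 600 instances
correct = matrix[0][0] + matrix[1][1]          # diagonal: 554 correct
accuracy = 100.0 * correct / total             # 92.33 percent
error_rate = 100.0 - accuracy                  # 7.67 percent
misclassified = (matrix[0][1], matrix[1][0])   # the 29 and 17 off-diagonal cells
```

This confirms that the 92.3 percent accuracy and the 29/17 misclassification counts reported in the text all come from the same four cells.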

Based on our accuracy rate of 92.3 percent, we can say that, upon initial analysis, this is a good model.

One final step in validating our classification tree is to run our test set through the model and confirm the model's accuracy.

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set, we see that the accuracy of the model holds, which indicates that the model will not break down with unknown data or when future data is applied to it.

EXPERIMENT-6

Aim: To create a decision tree by cross-validation on the training data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This section deals with decision trees in data mining.

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be learned by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is complete when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions.

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(x, y) = (x1, x2, x3, ..., xk, y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
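The recursive-partitioning idea above can be sketched in a few lines. The following is an illustrative information-gain split chooser, not Weka's J48 implementation, and the toy records are assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(records, labels):
    """Return the attribute index whose value test most reduces entropy
    (highest information gain) -- one step of recursive partitioning."""
    base = entropy(labels)
    best_attr, best_gain = None, 0.0
    for i in range(len(records[0])):
        # Partition the source set into subsets by the values of attribute i.
        parts = {}
        for x, y in zip(records, labels):
            parts.setdefault(x[i], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        if base - remainder > best_gain:
            best_attr, best_gain = i, base - remainder
    return best_attr, best_gain

# Toy records (x1, x2) with class label y; attribute 0 separates the classes.
records = [("yes", "high"), ("yes", "low"), ("no", "high"), ("no", "low")]
labels = ["A", "A", "B", "B"]
attr, gain = best_split(records, labels)
```

Repeating `best_split` on each resulting subset until the labels are pure is exactly the recursion the theory paragraph describes.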

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".


6) Go to the Classify tab.

7) Choose the "Trees" classifier group.

8) Select J48.

9) Select Test options "Cross-validation".

10) Set "Folds" (e.g., 10).

11) If needed, select the attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

14) Compare the output results with those of the 4th experiment.

15) Check whether the accuracy increased or decreased.
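Setting "Folds" to 10 in step 10 means each instance is tested exactly once, by a model trained on the remaining nine folds. A minimal sketch of the bookkeeping (plain Python; fold assignment here is simple round-robin, whereas Weka additionally stratifies by class):

```python
def folds(n_instances, k=10):
    """Split instance indices into k folds by round-robin assignment."""
    idx = list(range(n_instances))
    return [idx[f::k] for f in range(k)]

# 600 instances, as in the sample output: ten folds of 60 instances each.
parts = folds(600, 10)
```

Each fold in turn serves as the test set while the other nine are used for training, and the reported accuracy is aggregated over all ten runs.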

Sample output


=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %

Kappa statistic 0.7942

Mean absolute error 0.167

Root mean squared error 0.305

Relative absolute error 33.6511 %

Root relative squared error 61.2344 %

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.861 0.071 0.911 0.861 0.886 0.883 YES

0.929 0.139 0.889 0.929 0.909 0.883 NO

Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as

236 38 | a = YES

23 303 | b = NO


EXPERIMENT-7

Aim: To delete one attribute from GUI Explorer and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

7) Select "weka.filters.unsupervised.attribute.Remove".

8) Next, click on the text box immediately to the right of the Choose button.

9) In the resulting dialog box, enter the index of the attribute to be filtered out. (Make sure that the invertSelection option is set to false.)

10) Then click OK. Now in the filter box you will see "Remove -R 1".

11) Click the Apply button to apply this filter to the data. This will remove the id attribute and create a new working relation.

12) To save the new working relation as an ARFF file, click on the Save button in the top panel.


13) Go to OPEN file and browse the newly saved file (the one with the attribute deleted).

14) Go to the Classify tab.

15) Choose the "Trees" classifier group.

16) Select the J48 tree.

17) Select Test options "Use training set".

18) If needed, select the attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output panel.

21) Right-click on the result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.
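What the filter configured above ("Remove -R 1") does can be pictured on plain rows: it drops the first attribute (here, the id column) from every record. A sketch with assumed toy data:

```python
# "Remove -R 1" removes attribute 1 (Weka attribute indices are 1-based),
# i.e. the first column of every row -- here an illustrative id column.
rows = [["id", "age", "pep"],
        ["1", "48", "YES"],
        ["2", "40", "NO"]]

filtered = [row[1:] for row in rows]   # drop the first attribute from each row
```

The remaining columns form the new working relation that gets saved as the ARFF file in step 12.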

Sample output


EXPERIMENT-8

Aim: To select some attributes from GUI Explorer, perform classification, and see the effect, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select from the attributes list the attributes that are to be removed. With this step, only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Choose the "Trees" classifier group.

9) Select J48.

10) Select Test options "Use training set".

11) If needed, select the attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.


17) Check whether removing these attributes has any significant effect.

Sample output


EXPERIMENT-9

Aim: To create a decision tree by cross-validation on the training data set after changing the cost matrix, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Choose the "Trees" classifier group.

8) Select J48.

9) Select Test options "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click on the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) We can see the output details in the Classifier output panel.


16) Select Test options "Cross-validation".

17) Set "Folds" (e.g., 10).

18) If needed, select the attribute.

19) Now click Start.

20) Now we can see the output details in the Classifier output panel.

21) Compare the results of steps 15 and 20.

22) Compare the results with those of experiment 6.
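Cost-sensitive evaluation (steps 10-13) weights each cell of the confusion matrix by the corresponding cell of the cost matrix, so some kinds of mistake count more than others. A sketch using the Experiment 6 confusion matrix; the cost values here are assumptions for illustration only:

```python
def total_cost(confusion, costs):
    """Sum each (actual, predicted) count weighted by its cost-matrix entry."""
    return sum(confusion[i][j] * costs[i][j]
               for i in range(len(confusion))
               for j in range(len(confusion)))

confusion = [[236, 38],    # from the Experiment 6 sample output
             [23, 303]]
costs = [[0, 1],           # assumed: misclassifying a YES costs 1
         [5, 0]]           # assumed: misclassifying a NO costs 5

cost = total_cost(confusion, costs)   # 38*1 + 23*5 = 153
```

With a uniform cost matrix ([[0, 1], [1, 0]]) the total cost is simply the number of errors; changing the off-diagonal costs is what pushes the learner toward avoiding the expensive mistakes.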

Sample output


EXPERIMENT-10

Aim: To check whether a small rule or a long rule is better, by checking the bias on the training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

This will depend on the attribute set and on the relationships among attributes that we want to study. It can be assessed based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using Prune mode and Reduced-Error Pruning, and to show the accuracy for a cross-validation trained data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory


Reduced-error pruning:

• Each node of the (over-fit) tree is examined for pruning.

• A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

• Pruning a node consists of:

- Removing the sub-tree rooted at the pruned node

- Making the pruned node a leaf node

- Assigning the pruned node the most common classification of the training instances attached to that node

• Nodes are pruned iteratively:

- Always select a node whose removal most increases the decision-tree accuracy over the validation set

- Stop when further pruning decreases the accuracy over the validation set

An example rule from such a tree:

IF (Children = yes) ∧ (income > 30000) THEN (car = Yes)
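The pruning loop described above can be sketched on a toy tree structure. This is an illustrative implementation in plain Python; the Node layout and the data are assumptions, not Weka's internals:

```python
from collections import Counter

class Node:
    def __init__(self, attr=None, branches=None, label=None, train_labels=()):
        self.attr = attr                        # attribute index tested here
        self.branches = branches or {}          # attribute value -> child Node
        self.label = label                      # class label if this is a leaf
        self.train_labels = list(train_labels)  # training labels reaching this node

def classify(node, x):
    while node.label is None:
        node = node.branches[x[node.attr]]
    return node.label

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def prune(tree, node, validation):
    """Reduced-error pruning, bottom-up: turn a node into a leaf labelled with
    its most common training class, keeping the change only if accuracy over
    the validation set does not get worse."""
    for child in node.branches.values():
        if child.label is None:
            prune(tree, child, validation)
    if node.label is not None:
        return
    before = accuracy(tree, validation)
    attr, branches = node.attr, node.branches
    node.label = Counter(node.train_labels).most_common(1)[0][0]
    node.attr, node.branches = None, {}
    if accuracy(tree, validation) < before:    # performed worse: undo the prune
        node.label, node.attr, node.branches = None, attr, branches

# Toy over-fit tree: root tests attribute 0; its "no" branch tests attribute 1.
inner = Node(attr=1, branches={"x": Node(label="A"), "y": Node(label="B")},
             train_labels=["A", "A", "B"])
root = Node(attr=0, branches={"yes": Node(label="A"), "no": inner},
            train_labels=["A", "A", "A", "A", "B"])

# Validation set on which the inner split turns out not to help.
validation = [(("no", "x"), "A"), (("no", "y"), "A"), (("yes", "x"), "A")]
prune(root, root, validation)   # collapses the over-fit tree to a single leaf
```

On this validation set the inner split only hurts accuracy, so the sub-tree is replaced by the majority leaf, exactly as the bullet points above prescribe.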

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the "Trees" classifier group.

9) Select "NBTree", i.e., the Naive Bayesian tree.


10) Select Test options "Use training set".

11) Right-click on the text box beside the Choose button and select Show properties.

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select the attribute.

15) Now click Start.

16) Now we can see the output details in the Classifier output panel.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training on the data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the "Trees" classifier group.

9) Select "J48".

10) Select Test options "Use training set".

11) If needed, select the attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the "Rules" classifier group.

9) Select "OneR".

10) Select Test options "Use training set".

11) If needed, select the attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

Procedure for "PART"

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.


8) Choose the "Rules" classifier group.

9) Select "PART".

10) Select Test options "Use training set".

11) If needed, select the attribute.

12) Now click Start.

13) Now we can see the output details in the Classifier output panel.

Attribute relevance with respect to the class – relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
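OneR's idea, as the two rules above illustrate, is to build one rule per attribute (each attribute value mapped to its majority class) and keep the attribute whose rule makes the fewest training errors. A minimal sketch in plain Python, with assumed toy data; this is not Weka's implementation:

```python
from collections import Counter, defaultdict

def one_r(records, labels):
    """For each attribute, map each value to its majority class, then keep
    the attribute whose single-attribute rule makes the fewest errors."""
    best_attr, best_rule, best_errors = None, None, len(labels) + 1
    for i in range(len(records[0])):
        by_value = defaultdict(list)
        for x, y in zip(records, labels):
            by_value[x[i]].append(y)
        rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
        errors = sum(y != rule[x[i]] for x, y in zip(records, labels))
        if errors < best_errors:
            best_attr, best_rule, best_errors = i, rule, errors
    return best_attr, best_rule, best_errors

# Toy data shaped like the rules above: attribute 0 plays the role of "accounting".
records = [(1, "p"), (1, "q"), (0, "p"), (0, "q"), (0, "p")]
labels = ["A", "A", "B", "B", "B"]
attr, rule, errors = one_r(records, labels)
```

Here attribute 0 wins with the rule {1: "A", 0: "B"} and zero training errors, which is precisely the shape of the IF/THEN pair quoted above.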

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 70: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

4) click Start(WEKA will run this test data set through the model we already created )

5) Compare the output results with that of the 4th experiment

Sample output

This can be experienced by the different problem solutions while doing practice

The important numbers to focus on here are the numbers next to the Correctly Classified Instances (923 percent) and the Incorrectly Classified Instances (76 percent) Other important numbers are in the ROC Area column in the first row (the 0936) Finally in the Confusion Matrix it shows the number of false positives and false negatives The false positives are 29 and the false negatives are 17 in this matrix

Based on our accuracy rate of 923 percent we say that upon initial analysis this is a good model

One final step to validating our classification tree which is to run our test set through the model and ensure that accuracy of the model

Comparing the Correctly Classified Instances from this test set with the Correctly Classified Instances from the training set we see the accuracy of the model which indicates that the model will not break down with unknown data or when future data is applied to it

EXPERIMENT-6

Aim To create a Decision tree by cross validation training data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 70

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 71

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim: Is a small rule better than a long rule? Check the bias by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure:

This depends on the attribute set and on which relationships among the attributes we want to study; it can be decided from the database and the user's requirements

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for a cross-validated training data set, using the Weka mining tool

Tools/Apparatus: Weka mining tool

Theory:


Reduced-error pruning:

- Each node of the (over-fit) tree is examined for pruning

- A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set

- Pruning a node consists of:

  • Removing the sub-tree rooted at the pruned node

  • Making the pruned node a leaf node

  • Assigning the pruned node the most common classification of the training instances attached to that node

- Pruning nodes iteratively:

  • Always select a node whose removal most increases the decision-tree accuracy over the validation set

  • Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) ∧ (income > 30000)
THEN (car = Yes)
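The pruning bullets above can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Weka's implementation; the tree and validation data are invented, modeled on the Children/income rule:

```python
# Toy reduced-error pruning: a tree is either a leaf (a class label) or
# a dict holding the tested attribute, its branches, and the majority
# class of the training instances that reached the node.
def classify(tree, x):
    while isinstance(tree, dict):
        tree = tree['branches'].get(x.get(tree['attr']), tree['majority'])
    return tree

def prune(tree, val):
    """Prune bottom-up; val holds the (instance, label) validation
    pairs that reach this node."""
    if not isinstance(tree, dict):
        return tree
    for v in list(tree['branches']):
        subset = [(x, y) for x, y in val if x.get(tree['attr']) == v]
        tree['branches'][v] = prune(tree['branches'][v], subset)
    # Keep the subtree only if it beats its own majority leaf on val.
    subtree_err = sum(classify(tree, x) != y for x, y in val)
    leaf_err = sum(y != tree['majority'] for _, y in val)
    return tree['majority'] if leaf_err <= subtree_err else tree

# Over-fit tree modeled on the Children/income rule above.
tree = {'attr': 'children', 'majority': 'No', 'branches': {
    'yes': {'attr': 'income', 'majority': 'Yes',
            'branches': {'high': 'Yes', 'low': 'No'}},
    'no': 'No'}}
# Validation set on which the income split does not help.
val = [({'children': 'yes', 'income': 'high'}, 'Yes'),
       ({'children': 'yes', 'income': 'low'}, 'Yes'),
       ({'children': 'no', 'income': 'high'}, 'No')]

pruned = prune(tree, val)
print(pruned['branches'])  # income node collapsed to the leaf 'Yes'
```

The income sub-tree is replaced by a leaf because the leaf makes no more validation errors than the sub-tree did, which is exactly the "performs no worse" criterion stated above.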

Procedure:

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER, present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse to the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "NBTree", i.e. the naive Bayes tree


10) Select Test options "Use training set"

11) Right-click on the text box beside the Choose button and select Show properties

12) Now change the unpruned mode from "False" to "True"

13) Change the reduced-error pruning setting as needed

14) If needed, select the class attribute

15) Now click Start

16) Now we can see the output details in the Classifier output panel

17) Right-click on the entry in the Result list and select the "Visualize tree" option

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool

Tools/Apparatus: Weka mining tool

Procedure for "J48":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER, present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse to the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Trees"

9) Select "J48"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

14) Right-click on the entry in the Result list and select the "Visualize tree" option

(or, from the command line)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER, present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse to the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab

8) Choose Classifier "Rules"

9) Select "OneR"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Procedure for "PART":

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER, present in Applications

4) Select the Preprocess tab

5) Go to OPEN file and browse to the file "bank.csv" that is already stored in the system

6) Select some of the attributes from the attributes list

7) Go to the Classify tab


8) Choose Classifier "Rules"

9) Select "PART"

10) Select Test options "Use training set"

11) If needed, select the class attribute

12) Now click Start

13) Now we can see the output details in the Classifier output panel

Attribute relevance with respect to the class — relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
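OneR's idea — pick the single attribute whose one-attribute rule set makes the fewest training errors — can be sketched in plain Python. The training rows below are invented, echoing the accounting rules above; they are not from the bank data set:

```python
from collections import Counter

# OneR: for every attribute, make one rule per attribute value that
# predicts the majority class for that value; keep the attribute whose
# rule set makes the fewest errors on the training data.
def one_r(rows, target):
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        rules, errors = {}, 0
        for value in {r[attr] for r in rows}:
            counts = Counter(r[target] for r in rows if r[attr] == value)
            majority, hits = counts.most_common(1)[0]
            rules[value] = majority
            errors += sum(counts.values()) - hits
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Invented training rows echoing the accounting rules above.
rows = [
    {'accounting': 1, 'science': 0, 'class': 'A'},
    {'accounting': 1, 'science': 1, 'class': 'A'},
    {'accounting': 0, 'science': 1, 'class': 'B'},
    {'accounting': 0, 'science': 0, 'class': 'B'},
    {'accounting': 0, 'science': 1, 'class': 'A'},
]
attr, rules, errors = one_r(rows, 'class')
print(attr, errors)  # 'accounting' wins with 1 training error
```

J48 and PART can use many attributes per rule, which is why comparing their output with OneR's single-attribute rules shows the trade-off between rule length and error that this experiment asks about.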

Sample output

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART


  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 71: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Decision tree learning used in data mining and machine learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value In these tree structures leaves represent classifications and branches represent conjunctions of features that lead to those classifications In decision analysis a decision tree can be used to visually and explicitly represent decisions and decision making In data mining a decision tree describes data but not decisions rather the resulting classification tree can be an input for decision making This page deals with decision trees in data mining

Decision tree learning is a common method used in data mining The goal is to create a model that predicts the value of a target variable based on several input variables Each interior node corresponds to one of the input variables there are edges to children for each of the possible values of that input variable Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf

A tree can be learned by splitting the source set into subsets based on an attribute value test This process is repeated on each derived subset in a recursive manner called recursive partitioning The recursion is completed when the subset at a node all has the same value of the target variable or when splitting no longer adds value to the predictions

In data mining trees can be described also as the combination of mathematical and computational techniques to aid the description categorisation and generalization of a given set of data

Data comes in records of the form

(x y) = (x1 x2 x3 xk y)

The dependent variable Y is the target variable that we are trying to understand classify or generalise The vector x is comprised of the input variables x1 x2 x3 etc that are used for that task

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 71

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 72: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select J48

9) Select Test options ldquoCross-validationrdquo

10) Set ldquoFoldsrdquo Ex10

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14)Compare the output results with that of the 4th experiment

15) check whether the accuracy increased or decreased

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 72

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 73: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 539 898333

Incorrectly Classified Instances 61 101667

Kappa statistic 07942

Mean absolute error 0167

Root mean squared error 0305

Relative absolute error 336511

Root relative squared error 612344

Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0861 0071 0911 0861 0886 0883 YES

0929 0139 0889 0929 0909 0883 NO

Weighted Avg 0898 0108 0899 0898 0898 0883

=== Confusion Matrix ===

a b lt-- classified as

236 38 | a = YES

23 303 | b = NO

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 73

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to Open file and browse to the newly saved file (the attribute-deleted file).

14) Go to the Classify tab.

15) Under Choose, open the "trees" group.

16) Select the J48 tree.

17) Under Test options, select "Use training set".

18) If needed, select the class attribute.

19) Now click Start.

20) The output details appear in the Classifier output panel.

21) Right-click on the entry in the Result list and select the "Visualize tree" option.

22) Compare the output results with those of the 4th experiment.

23) Check whether the accuracy increased or decreased.

24) Check whether removing this attribute has any significant effect.

Sample output:


EXPERIMENT-8

Aim: To select some attributes from the GUI Explorer, perform classification, and observe the effect, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) From the attributes list, select the attributes that are to be removed, and remove them. After this step only the attributes necessary for classification are left in the attributes panel.

7) Then go to the Classify tab.

8) Under Choose, open the "trees" group.

9) Select J48.

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the entry in the Result list and select the "Visualize tree" option.

15) Compare the output results with those of the 4th experiment.

16) Check whether the accuracy increased or decreased.

17) Check whether removing these attributes has any significant effect.

Sample output:


EXPERIMENT-9

Aim: To create a decision tree by cross-validating the training data set while changing the cost matrix, in the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Go to the Classify tab.

7) Under Choose, open the "trees" group.

8) Select J48.

9) Under Test options, select "Use training set".

10) Click on "More options".

11) Select cost-sensitive evaluation and click the Set button.

12) Set the matrix values and click on Resize. Then close the window.

13) Click OK.

14) Click Start.

15) The output details appear in the Classifier output panel.

16) Under Test options, select "Cross-validation".

17) Set "Folds", e.g. 10.

18) If needed, select the class attribute.

19) Now click Start.

20) The output details appear in the Classifier output panel.

21) Compare the results of the 15th and 20th steps.

22) Compare the results with those of experiment 6.
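With cost-sensitive evaluation, each off-diagonal cell of the confusion matrix is weighted by the corresponding cost-matrix entry. The totals can be checked by hand, as in this shell/awk sketch; the misclassification counts come from the sample confusion matrix earlier in the manual, while the two cost values are assumptions made purely for illustration:

```shell
#!/bin/sh
# Illustrative cost computation from the sample confusion matrix.
# fn = YES instances classified NO, fp = NO instances classified YES.
# c_fn and c_fp are assumed cost-matrix entries, not values from the manual.
awk 'BEGIN {
  fn = 38; fp = 23; n = 600      # counts from the sample output
  c_fn = 1; c_fp = 5             # assumed costs of the two error types
  total = fn * c_fn + fp * c_fp
  printf "Total cost: %d  Average cost: %.4f\n", total, total / n
}'
```

With these assumed costs the script prints a total cost of 153 and an average cost of 0.2550 over the 600 instances; when cost-sensitive evaluation is enabled, Weka reports comparable Total Cost and Average Cost figures in its Classifier output.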

Sample output:


EXPERIMENT-10

Aim: Is a small rule better or a long rule? Check the bias by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

This depends on the attribute set and on the relationships among attributes that we want to study. It can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using pruned mode and reduced-error pruning, and to show the accuracy for the cross-validation-trained data set, using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory:


Reduced-error pruning:

* Each node of the (over-fit) tree is examined for pruning.

* A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.

* Pruning a node consists of:

  - removing the sub-tree rooted at the pruned node,

  - making the pruned node a leaf node,

  - assigning the pruned node the most common classification of the training instances attached to that node.

* Pruning nodes iteratively:

  - always select a node whose removal most increases the decision-tree accuracy over the validation set;

  - stop when further pruning decreases the decision-tree accuracy over the validation set.

An example rule read off such a tree:

IF (Children = yes) AND (income > 30000)

THEN (car = Yes)

Procedure:

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Under Choose, open the "trees" group.

9) Select "NBTree", i.e. the naive Bayes tree.

10) Under Test options, select "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Now change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning option as needed.

14) If needed, select the class attribute.

15) Now click Start.

16) The output details appear in the Classifier output panel.

17) Right-click on the entry in the Result list and select the "Visualize tree" option.

Sample output:


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and a single rule, with the J48 and PART classifiers, by training on the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure for "J48":

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Under Choose, open the "trees" group.

9) Select "J48".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details appear in the Classifier output panel.

14) Right-click on the entry in the Result list and select the "Visualize tree" option.

(or, from the command line)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Under Choose, open the "rules" group.

9) Select "OneR".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details appear in the Classifier output panel.

Procedure for "PART":

1) Given the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications panel.

4) Select the Preprocess tab.

5) Go to Open file and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Under Choose, open the "rules" group.

9) Select "PART".

10) Under Test options, select "Use training set".

11) If needed, select the class attribute.

12) Now click Start.

13) The output details appear in the Classifier output panel.

Attribute relevance with respect to the class: relevant attribute (science).

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
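OneR builds one rule per value of a single attribute: it assigns each value the majority class among the instances carrying that value, and the rule's error count is whatever remains. A minimal shell/awk sketch of that counting step, over a tiny made-up data set (the attribute values and class labels are illustrative only, not taken from bank.csv):

```shell
#!/bin/sh
# Toy OneR: one column of attribute values, one column of class labels.
printf '%s\n' '1 A' '1 A' '1 A' '0 B' '0 B' '0 A' |
awk '{ cnt[$1 " " $2]++; total[$1]++; if (!seen[$1]++) vals[++nv] = $1 }
END {
  errors = 0
  for (i = 1; i <= nv; i++) {
    v = vals[i]; best = ""; bestc = -1
    # majority class for this attribute value
    for (key in cnt) {
      split(key, p, " ")
      if (p[1] == v && cnt[key] > bestc) { bestc = cnt[key]; best = p[2] }
    }
    errors += total[v] - bestc
    printf "IF attr=%s THEN class=%s (errors %d/%d)\n", v, best, total[v] - bestc, total[v]
  }
  printf "total error rate %d/%d\n", errors, NR
}'
```

On this toy input the script prints one rule per attribute value and a total error rate of 1/6; the accounting rules quoted above have the same IF/THEN shape, with the error and coverage counts produced by exactly this kind of tally.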

Sample output:

J48

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR


PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 74: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-7

Aim Delete one attribute from GUI Explorer and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) In the Filter panel click on the Choose button This will show a popup window with list available filters

7) Select ldquowekafiltersunsupervisedattributeRemoverdquo

8) Next click on text box immediately to the right of the Choose button

9) In the resulting dialog box enter the index of the attribute to be filtered out (Make sure that the invertSelection option is set to false )

10) Then click OK Now in the filter box you will see Remove -R 1

11) Click the Apply button to apply this filter to the data This will remove the id attribute and create a new working relation

12) To save the new working relation as an ARFF file click on save button in the top panel

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 74

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 75: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

13) Go to OPEN file and browse the file that is newly saved (attribute deleted file)

14) Go to Classify tab

15) Choose Classifier ldquoTreerdquo

16) Select j48 tree

17) Select Test options ldquoUse training setrdquo

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21) right click on the result list and select rdquo visualize tree ldquooption

22) Compare the output results with that of the 4th experiment

23) check whether the accuracy increased or decreased

24)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 75

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 76: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 76

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications list.

4) Select the Preprocess tab.

5) Go to "Open file" and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "Trees".

9) Select "NBTree", i.e., the Naive Bayes tree.

10) Select the Test option "Use training set".

11) Right-click on the text box beside the Choose button and select "Show properties".

12) Change the unpruned mode from "false" to "true".

13) Change the reduced-error pruning settings as needed.

14) If needed, select an attribute.

15) Click Start.

16) The output details appear in the Classifier output pane.

17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and a single rule, with the J48 and PART classifiers, by training the data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool

Procedure for "J48":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications list.

4) Select the Preprocess tab.

5) Go to "Open file" and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "Trees".

9) Select "J48".

10) Select the Test option "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) The output details appear in the Classifier output pane.

14) Right-click on the result list and select the "Visualize tree" option.

(or, from the command line)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


Procedure for "OneR":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications list.

4) Select the Preprocess tab.

5) Go to "Open file" and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "Rules".

9) Select "OneR".

10) Select the Test option "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) The output details appear in the Classifier output pane.
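Before comparing OneR with J48 and PART in the GUI, it may help to see what OneR actually computes. The sketch below (a toy Java illustration with made-up data and names, not Weka's implementation) builds, for each attribute, one rule per attribute value predicting the majority class, and keeps the attribute whose rule set makes the fewest errors:

```java
// Minimal OneR sketch: pick the single attribute whose value->majority-class
// rules misclassify the fewest training instances.
public class OneRDemo {
    // Toy data: rows are instances, columns are nominal attributes coded as ints.
    static final int[][] DATA = {
        {0, 1}, {0, 0}, {1, 1}, {1, 0}, {0, 1}, {1, 1}
    };
    static final int[] LABELS = {1, 1, 0, 0, 1, 0}; // binary class of each instance

    // Errors made by the one-attribute rule set for attribute a.
    static int errors(int[][] data, int[] labels, int a) {
        // counts[v][c] = how often value v co-occurs with class c (values in 0..9)
        int[][] counts = new int[10][2];
        for (int i = 0; i < data.length; i++) counts[data[i][a]][labels[i]]++;
        int err = 0;
        for (int[] c : counts) err += Math.min(c[0], c[1]); // minority votes are errors
        return err;
    }

    // The attribute OneR would select.
    static int bestAttribute(int[][] data, int[] labels) {
        int best = 0, bestErr = Integer.MAX_VALUE;
        for (int a = 0; a < data[0].length; a++) {
            int e = errors(data, labels, a);
            if (e < bestErr) { bestErr = e; best = a; }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("OneR picks attribute " + bestAttribute(DATA, LABELS));
    }
}
```

On this toy data attribute 0 separates the classes perfectly (0 errors) while attribute 1 makes 3 errors, so OneR keeps attribute 0.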

Procedure for "PART":

1) Given: the Bank database for mining.

2) Open the Weka GUI Chooser.

3) Select EXPLORER from the Applications list.

4) Select the Preprocess tab.

5) Go to "Open file" and browse to the file already stored in the system, "bank.csv".

6) Select some of the attributes from the attributes list.

7) Go to the Classify tab.

8) Choose the Classifier group "Rules".

9) Select "PART".

10) Select the Test option "Use training set".

11) If needed, select an attribute.

12) Click Start.

13) The output details appear in the Classifier output pane.

Attribute relevance with respect to the class: relevant attribute (science)

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
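The Error and Coverage figures attached to rules like the ones above can be computed directly. The following toy Java sketch (hypothetical data and names, shown only to illustrate the metrics) counts, for a rule of the form "IF attribute = v THEN class = c", how many instances the rule covers and how many of those it misclassifies:

```java
// Error/Coverage of a single rule: coverage = instances matching the
// antecedent; error = covered instances whose class differs from the
// rule's conclusion.
public class RuleStats {
    static int[] stats(int[] attribute, char[] cls, int value, char predicted) {
        int covered = 0, wrong = 0;
        for (int i = 0; i < attribute.length; i++) {
            if (attribute[i] == value) {     // antecedent matches
                covered++;
                if (cls[i] != predicted) wrong++; // conclusion is wrong
            }
        }
        return new int[]{wrong, covered};
    }

    public static void main(String[] args) {
        // Toy data: 7 instances of attribute "accounting" with their classes.
        int[] acc  = {1, 1, 1, 0, 0, 0, 0};
        char[] cls = {'A', 'A', 'A', 'B', 'B', 'A', 'B'};
        int[] s = stats(acc, cls, 1, 'A'); // rule: IF accounting=1 THEN class=A
        System.out.println("Error=" + s[0] + " Coverage=" + s[1]);
    }
}
```

On this toy data the rule covers 3 instances and misclassifies none, which is how figures such as "Error = 0, Coverage = 7 instances" are read off a larger data set.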

Sample output

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff


OneR:


PART:

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 77: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 77

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 78: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-8

Aim Select some attributes from GUI Explorer and perform classification and see the effect using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list which are to be removed With this step only the attributes necessary for classification are left in the attributes panel

7) The go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select j48

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

15)Compare the output results with that of the 4th experiment

16) check whether the accuracy increased or decreased

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 78

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 79: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

17)check whether removing these attributes have any significant effect

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 79

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 85

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-12

Aim To compare OneR classifier which uses single attribute and rule with J48 and PART classifierrsquos by training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreesRulesrdquo

9) Select ldquoJ48rdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

14) right click on the result list and select rdquo visualize tree ldquooption

(or)

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 86

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Procedure for ldquoOneRrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoOneRrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Procedure for ldquoPARTrdquo

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 87

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

8) Choose Classifier ldquoRulesrdquo

9) Select ldquoPARTrdquo

10) Select Test options ldquoUse training setrdquo

11) if need select attribute

12) now Start weka

13)now we can see the output details in the Classifier output

Attribute relevance with respect to the class ndash relevant attribute (science)

IF accounting=1 THEN class=A (Error=0 Coverage = 7 instance)

IF accounting=0 THEN class=B (Error=413 Coverage = 13 instances)

Sample output

J48

java wekaclassifierstreesJ48 -t ctempbankarff

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 88

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

One R

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 89

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

PART

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 90

  • Controlling a Shared Memory Segment
  • Attaching and Detaching a Shared Memory Segment
Page 80: Lpdm Lab Manul

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-9

Aim To create a Decision tree by cross validation training data set by changing the cost matrix in Weka mining tool

Tools Apparatus Weka mining tool

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) Go to Classify tab

7) Choose Classifier ldquoTreerdquo

8) Select j48

9) Select Test options ldquoTraining setrdquo

10)Click on ldquomore optionsrdquo

11)Select cost sensitive evaluation and click on set button

12)Set the matrix values and click on resize Then close the window

13)Click Ok

14)Click start

15) we can see the output details in the Classifier output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 80

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options ldquoCross-validationrdquo

17) Set ldquoFoldsrdquo Ex10

18) if need select attribute

19) now Start weka

20)now we can see the output details in the Classifier output

21)Compare results of 15th and 20th steps

22)Compare the results with that of experiment 6

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

EXPERIMENT-10

Aim Is small rule better or long rule check the biasby training data set using Weka mining tool

Tools Apparatus Weka mining tool

Procedure

This will be based on the attribute set and the requirement of relationship among attribute we want to study This can be viewed based on the database and user requirement

EXPERIMENT-11

Aim To create a Decision tree by using Prune mode and Reduced error Pruning and show accuracy for cross validation trained data set using Weka mining tool

Tools Apparatus Weka mining tool

Theory

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 82

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

Reduced-error pruning

1048708 Each node of the (over-fit) tree is examined for pruning

1048708 A node is pruned (removed) only if the resulting pruned tree

performs no worse than the original over the validation set

1048708 Pruning a node consists of

bull Removing the sub-tree rooted at the pruned node

bull Making the pruned node a leaf node

bull Assigning the pruned node the most common classification of the training instances attached to that node

1048708 Pruning nodes iteratively

bull Always select a node whose removal most increases the DT accuracy over the validation set

bull Stop when further pruning decreases the DT accuracy over the validation set

IF (Children=yes) Λ (income=gt30000)

THEN (car=Yes)

Procedure

1) Given the Bank database for mining

2) Use the Weka GUI Chooser

3) Select EXPLORER present in Applications

4) Select Preprocess Tab

5) Go to OPEN file and browse the file that is already stored in the system ldquobankcsvrdquo

6) select some of the attributes from attributes list

7) Go to Classify tab

8) Choose Classifier ldquoTreerdquo

9) Select ldquoNBTreerdquo ie Navie Baysiean tree

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 83

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

10) Select Test options ldquoUse training setrdquo

11) right click on the text box besides choose button select show properties

12) now change unprone mode ldquofalserdquo to ldquotruerdquo

13) change the reduced error pruning as needed

14) if need select attribute

15) now Start weka

16)now we can see the output details in the Classifier output

17) right click on the result list and select rdquo visualize tree ldquooption

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 84

LINUX PROGRAMMING AND DATA MINING LAB MANUAL

16) Select Test options: "Cross-validation".
17) Set "Folds" (e.g. 10).
18) If needed, select an attribute.
19) Now start Weka.
20) Now we can see the output details in the Classifier output.
21) Compare the results of steps 15 and 20.
22) Compare the results with those of Experiment 6.
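The "Cross-validation" test option in step 16 partitions the data set into k disjoint folds, trains on k-1 of them, and tests on the held-out fold, once per fold. A minimal stdlib-only Python sketch of the index splitting (toy sizes, not Weka's implementation):

```python
# Sketch of k-fold cross-validation index splitting, as used by
# Weka's "Cross-validation" test option (toy stdlib-only version).

def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k disjoint folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validation_splits(n, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# With 20 instances and 10 folds, each fold holds 2 test instances
# and the remaining 18 instances are used for training.
splits = list(cross_validation_splits(20, 10))
```

Every instance appears in exactly one test fold, so the reported accuracy is an average over predictions on unseen data, unlike the "Use training set" option.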

Sample output

VIDYA VIKAS INSTITUTE OF TECHNOLOGY Page 81


EXPERIMENT-10

Aim: To check whether a small rule is better than a long rule, i.e. to examine the bias, by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

This depends on the attribute set and on the relationships among the attributes that we want to study; it can be decided based on the database and the user's requirements.

EXPERIMENT-11

Aim: To create a decision tree using the pruned mode and reduced-error pruning, and to show the accuracy on a cross-validated training data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Theory:


Reduced-error pruning:

- Each node of the (over-fit) tree is examined for pruning.
- A node is pruned (removed) only if the resulting pruned tree performs no worse than the original over the validation set.
- Pruning a node consists of:
  - Removing the sub-tree rooted at the pruned node
  - Making the pruned node a leaf node
  - Assigning the pruned node the most common classification of the training instances attached to that node
- Pruning nodes iteratively:
  - Always select the node whose removal most increases the decision-tree accuracy over the validation set
  - Stop when further pruning decreases the decision-tree accuracy over the validation set

Example rule:

IF (Children = yes) AND (Income >= 30000)
THEN (Car = yes)
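The bullet points above can be sketched directly in code. The following is an illustrative stdlib-only Python version of reduced-error pruning on a tiny hand-built tree; the attribute names and validation data are made up for the example and are not Weka's implementation or the bank.csv data:

```python
# Reduced-error pruning on a tiny hand-built decision tree.
# A node is a dict: leaves hold {"leaf": label}; internal nodes hold
# {"attr": ..., "branches": {value: child}, "majority": label}.

def classify(node, x):
    while "leaf" not in node:
        node = node["branches"][x[node["attr"]]]
    return node["leaf"]

def accuracy(root, data):
    return sum(classify(root, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, root, validation):
    """Bottom-up: collapse a node to its majority-class leaf whenever
    the whole tree performs no worse on the validation set."""
    if "leaf" in node:
        return
    for child in node["branches"].values():
        reduced_error_prune(child, root, validation)
    before = accuracy(root, validation)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]          # tentatively prune
    if accuracy(root, validation) < before:   # got worse -> undo
        node.clear()
        node.update(saved)

# Over-fit tree: IF Children=yes THEN test Income, ELSE Car=no.
tree = {"attr": "Children", "majority": "no",
        "branches": {
            "yes": {"attr": "Income", "majority": "yes",
                    "branches": {"high": {"leaf": "yes"},
                                 "low":  {"leaf": "no"}}},
            "no": {"leaf": "no"}}}

validation = [({"Children": "yes", "Income": "high"}, "yes"),
              ({"Children": "yes", "Income": "low"},  "yes"),
              ({"Children": "no",  "Income": "low"},  "no")]

reduced_error_prune(tree, tree, validation)
# The Income test is pruned away: the simpler rule
# "IF Children=yes THEN Car=yes" is at least as accurate here.
```

On this validation set the Income sub-tree misclassifies one instance, so replacing it with its majority leaf raises the accuracy and the node is pruned, while pruning the root would lower accuracy and is undone.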

Procedure:

1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to Open file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Trees".
9) Select "NBTree", i.e. the Naive Bayesian tree.
10) Select Test options: "Use training set".
11) Right-click on the text box beside the Choose button and select "Show properties".
12) Now change the unpruned mode from "false" to "true".
13) Change the reduced-error pruning option as needed.
14) If needed, select an attribute.
15) Now start Weka.
16) Now we can see the output details in the Classifier output.
17) Right-click on the result list and select the "Visualize tree" option.

Sample output


EXPERIMENT-12

Aim: To compare the OneR classifier, which uses a single attribute and a single rule, with the J48 and PART classifiers, by training a data set using the Weka mining tool.

Tools/Apparatus: Weka mining tool.

Procedure:

1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to Open file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Trees".
9) Select "J48".
10) Select Test options: "Use training set".
11) If needed, select an attribute.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.
14) Right-click on the result list and select the "Visualize tree" option.

(or)

java weka.classifiers.trees.J48 -t c:\temp\bank.arff
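The "Use training set" option in step 10 evaluates the classifier on the same instances it was trained on. A minimal stdlib-only Python sketch of what that evaluation computes, using made-up class labels for illustration (not Weka output):

```python
# Sketch of "Use training set" evaluation: compare each training
# instance's actual class with the classifier's prediction and
# summarise accuracy plus confusion counts per (actual, predicted)
# pair. The label lists below are hypothetical.
from collections import Counter

def evaluate(actual, predicted):
    assert len(actual) == len(predicted)
    correct = sum(a == p for a, p in zip(actual, predicted))
    confusion = Counter(zip(actual, predicted))
    return correct / len(actual), confusion

actual    = ["yes", "yes", "no", "no", "no"]
predicted = ["yes", "no",  "no", "no", "yes"]
acc, confusion = evaluate(actual, predicted)
# acc is 0.6; confusion counts, e.g. ("no", "no") appears twice.
```

Because the model has already seen these instances, this accuracy is optimistic; comparing it against the cross-validation result (as in the previous experiments) shows how much the model over-fits.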


Procedure for "OneR":

1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to Open file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Rules".
9) Select "OneR".
10) Select Test options: "Use training set".
11) If needed, select an attribute.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.

Procedure for "PART":

1) Given the Bank database for mining.
2) Use the Weka GUI Chooser.
3) Select EXPLORER present in Applications.
4) Select the Preprocess tab.
5) Go to Open file and browse the file that is already stored in the system, "bank.csv".
6) Select some of the attributes from the attributes list.
7) Go to the Classify tab.
8) Choose Classifier "Rules".
9) Select "PART".
10) Select Test options: "Use training set".
11) If needed, select an attribute.
12) Now start Weka.
13) Now we can see the output details in the Classifier output.

Attribute relevance with respect to the class - relevant attribute (science):

IF accounting = 1 THEN class = A (Error = 0, Coverage = 7 instances)

IF accounting = 0 THEN class = B (Error = 4/13, Coverage = 13 instances)
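OneR produces exactly this style of one-attribute rule set: for each attribute it maps every attribute value to its majority class, counts the training errors of that rule set, and keeps the attribute with the fewest errors. A stdlib-only Python sketch on a made-up toy data set (the values below are hypothetical, not bank.csv):

```python
# OneR sketch: build a one-level rule set per attribute, score each
# by its training errors, keep the best attribute. Toy data only.
from collections import Counter

def one_r(rows, target):
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # majority class for each value of this attribute
        by_value = {}
        for row in rows:
            by_value.setdefault(row[attr], Counter())[row[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(rule[row[attr]] != row[target] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best  # (attribute, {value: class}, error count)

rows = [{"accounting": 1, "science": 0, "class": "A"}] * 3 + \
       [{"accounting": 0, "science": 0, "class": "B"},
        {"accounting": 0, "science": 1, "class": "B"},
        {"accounting": 0, "science": 1, "class": "A"}]
attr, rule, errors = one_r(rows, "class")
# Here "accounting" wins: its rule set misclassifies only one row.
```

Despite its simplicity, OneR is a useful baseline against the J48 and PART output in the comparison this experiment asks for.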

Sample output:

J48:

java weka.classifiers.trees.J48 -t c:\temp\bank.arff

OneR:

PART:
