DM record1


  • 7/31/2019 DM record1


    Roll No : 04 09 8106

    1. Gain insight for running pre-defined decision trees and explore results

    using MS OLAP Analytics.

    Solution:

    The purpose of this experiment is to generate a decision tree for a given data set. We can
    either write our own data set or use a predefined one; in this case we use the provided
    bank example.

    We start by opening Weka.

    The Weka start screen appears; click on the Explorer button to begin.


    Once the Explorer is open, click Open file and select the appropriate data set.

    Remember: a data set that you write yourself must be saved as an .arff file. For now we
    import an existing file using the Open file option, as shown below.
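    For readers writing their own data set, the sketch below shows roughly what an .arff file
    contains by building one as text in Python. The attribute names are those of the bank data
    set used here; the attribute types, the nominal value lists for sex and the YES/NO
    attributes, and the two sample rows are assumptions made only for illustration.

```python
# Minimal sketch: build the text of a small ARFF file.
# Attribute names follow the bank data set; types and sample rows are assumed.
ATTRIBUTES = [
    ("age", "numeric"),
    ("sex", "{FEMALE,MALE}"),
    ("region", "{INNER_CITY,TOWN,RURAL,SUBURBAN}"),
    ("income", "numeric"),
    ("married", "{NO,YES}"),
    ("children", "{NO,YES}"),
    ("car", "{NO,YES}"),
    ("mortgage", "{NO,YES}"),
    ("pep", "{YES,NO}"),
]

# Hypothetical sample rows, one value per attribute, in declaration order.
ROWS = [
    [48, "FEMALE", "INNER_CITY", 17546.0, "NO", "YES", "NO", "NO", "YES"],
    [40, "MALE", "TOWN", 30085.1, "YES", "YES", "YES", "YES", "NO"],
]


def build_arff(relation, attributes, rows):
    """Return ARFF text: a header of @attribute lines followed by @data rows."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


arff_text = build_arff("bank", ATTRIBUTES, ROWS)
print(arff_text)
```

    Saving the printed text with an .arff extension should give a file that the Open file
    option can load.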


    Once the data set is imported, Weka generates the following view.

    Weka displays all the attributes of the imported data set, along with statistics and a
    graph for the selected attribute.

    Click on the Classify tab and choose the J48 algorithm, as shown below.


    After selecting J48, we can see the default settings: Cross-validation is selected as the
    test option, and the nominal attribute pep is selected in the drop-down box. Click on the
    Start button and an output screen like the one below will be shown.
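    To make the default test option concrete, here is a minimal sketch of how 10-fold
    cross-validation divides the 300 bank instances into training and test parts. It assigns
    instances to folds round-robin, which is an assumption for simplicity; the Weka run in
    this experiment reports a stratified cross-validation, which also balances the class
    values across folds.

```python
# Minimal sketch of k-fold cross-validation index splitting.
# Assumption: plain round-robin folds, without the stratification Weka applies.

def k_fold_indices(n_instances, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(start, n_instances, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(k_fold_indices(300, k=10))
for number, (train, test) in enumerate(splits, start=1):
    print(f"fold {number}: train={len(train)} test={len(test)}")
```

    Each instance is used for testing exactly once, and the reported accuracy is computed
    over all ten test folds together.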


    In the result list, right-click trees.J48 and choose the Visualize tree option, as seen below.


    The output as below will be generated,

    We also save the result buffer as follows,

    The buffer result gives information about the TP RATE, FP RATE, PRECISION,

    RECALL, F-MEASURE and the CONFUSION MATRIX.


    Buffer result.arff (Includes Time taken, Decision tree details, confusion matrix and all other

    details)

    Upon opening the buffer result.arff file we get,

    === Run information ===

    Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
    Relation:     bank
    Instances:    300
    Attributes:   9
                  age
                  sex
                  region
                  income
                  married
                  children
                  car
                  mortgage
                  pep
    Test mode:    10-fold cross-validation

    === Classifier model (full training set) ===

    J48 pruned tree
    ------------------

    children = YES
    | income
    | | car = YES: NO (50.0/15.0)
    | | car = NO
    | | | married = YES
    | | | | income 13106.6
    | | | | | mortgage = YES: YES (12.0/3.0)
    | | | | | mortgage = NO
    | | | | | | income 18923: NO (10.0/3.0)
    | | | married = NO: NO (22.0/6.0)
    | income > 30099.3: YES (59.0/7.0)
    children = NO
    | married = YES
    | | mortgage = YES
    | | | region = INNER_CITY
    | | | | income 39547.8: NO (4.0)
    | | | region = RURAL: NO (3.0/1.0)
    | | | region = TOWN: NO (9.0/2.0)
    | | | region = SUBURBAN: NO (4.0/1.0)
    | | mortgage = NO: NO (57.0/9.0)
    | married = NO
    | | mortgage = YES
    | | | age 39: NO (11.0)
    | | mortgage = NO: YES (20.0/1.0)

    Number of Leaves : 17

    Size of the tree : 31

    Time taken to build model: 0.09 seconds

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances         206               68.6667 %
    Incorrectly Classified Instances        94               31.3333 %
    Kappa statistic                          0.3576
    Mean absolute error                      0.379
    Root mean squared error                  0.4816
    Relative absolute error                 76.2791 %
    Root relative squared error             96.6145 %
    Total Number of Instances              300

    === Detailed Accuracy By Class ===

                   TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                   0.536    0.185    0.712      0.536   0.612      0.683     YES
                   0.815    0.464    0.673      0.815   0.737      0.683     NO
    Weighted Avg.  0.687    0.336    0.691      0.687   0.68       0.683

    === Confusion Matrix ===

      a   b
     74  64 | a = YES
     30 132 | b = NO
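    The figures in the result buffer can be cross-checked by hand. The sketch below recomputes
    the per-class TP rate, FP rate, precision and F-measure, plus the overall accuracy and
    kappa statistic, from the confusion matrix alone. The function names are our own and not
    part of Weka; the formulas are the standard definitions, with rows taken as actual classes
    and columns as predicted classes.

```python
# Confusion matrix from the J48 run: rows are actual classes,
# columns are predicted classes, in the order (YES, NO).
matrix = [[74, 64],
          [30, 132]]
labels = ["YES", "NO"]


def per_class_metrics(matrix, index):
    """Return TP rate (recall), FP rate, precision and F-measure for one class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp
    fp = sum(row[index] for row in matrix) - tp
    tn = total - tp - fn - fp
    tp_rate = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "f_measure": 2 * precision * tp_rate / (precision + tp_rate),
    }


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa(matrix):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    total = sum(sum(row) for row in matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(len(matrix))
    ) / total ** 2
    observed = accuracy(matrix)
    return (observed - expected) / (1 - expected)


for i, label in enumerate(labels):
    values = per_class_metrics(matrix, i)
    print(label, {name: round(value, 3) for name, value in values.items()})
print("accuracy %:", round(100 * accuracy(matrix), 4))
print("kappa:", round(kappa(matrix), 4))
```

    For the YES class this gives a TP rate of 74/138 and a precision of 74/104, which match
    the 0.536 and 0.712 reported in the buffer.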

    This completes the first experiment.

    2. Design a data mart from scratch to store the credit history of customers of

    a bank. Use this credit profiling to process future loan applications.


    Solution:

    The purpose of this experiment is to predict the unknown class values of new records,
    using records whose class labels are already known.

    Start the software SPRINT (WELLY), as shown below,

    Click on the Explorer button to open the Welly Explorer,


    Use the open file button to import the bank data set,


    Select the Classify tab, choose the J48 algorithm and, as before, click the Start button.


    After this, in the Test options section, select the Supplied test set option, click the Set
    button and provide the test set file bank-new.arff.

    Once the test set bank-new.arff has been provided, right-click the entry in the result list
    and choose the Visualize classifier errors option, as shown below.


    The output screen as below will be seen,


    On this page, click on the Save option and save the file as shown below.

    The saved file contains the predicted values. We show this by comparing the three data sets
    used here: Bank.arff (main data set), Bank-new.arff (supplied test set) and
    bank-predicted.arff (generated output).


    Bank.arff:

    Bank-new.arff:


    Bank-predicted.arff:


    Thus the values have been predicted: the question marks in the test set have been
    replaced with a YES or NO value.
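    The idea of replacing question marks with predictions can be sketched in a few lines. The
    predict_pep function below is a hypothetical stand-in for the trained J48 model: it encodes
    only two leaves that are visible in the pruned tree from experiment 1 (children = NO and
    mortgage = NO, split on married) and returns None elsewhere. The sample records are
    invented for illustration and are not rows of bank-new.arff.

```python
# Hypothetical stand-in for the trained model: covers only two leaves
# of the pruned tree printed in experiment 1.
def predict_pep(record):
    if record["children"] == "NO" and record["mortgage"] == "NO":
        return "YES" if record["married"] == "NO" else "NO"
    return None  # branch not covered by this sketch


def fill_predictions(records):
    """Return copies of the records with '?' class values replaced where possible."""
    filled = []
    for record in records:
        updated = dict(record)
        if updated["pep"] == "?":
            prediction = predict_pep(updated)
            if prediction is not None:
                updated["pep"] = prediction
        filled.append(updated)
    return filled


# Invented test records with an unknown class value.
test_records = [
    {"children": "NO", "married": "NO", "mortgage": "NO", "pep": "?"},
    {"children": "NO", "married": "YES", "mortgage": "NO", "pep": "?"},
]
predicted = fill_predictions(test_records)
for record in predicted:
    print(record)
```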


    3. For a given dataset generate the Association rules using weka and based

    on these association rules describe which rules are Strong and which

    rules are Weak.

    Solution:

    The purpose of this experiment is to generate the association rules for a given data set;
    in this case we use the contact lenses data set. The point to remember is that association
    rules can be generated only for nominal attributes (that is, attributes whose values come
    from a fixed set of choices such as [yes, no] or [male, female]). Based on the generated
    rules we calculate the support and confidence values and then describe which rules are
    strong and which are weak.

    We start by opening Weka.

    The Weka start screen appears; click on the Explorer button to begin.


    Now provide the contact-lenses data set using the open file option,


    Once we have done this, click on the Associate tab and choose the Apriori algorithm, which
    should already be selected by default.

    Then click the Start button; the association rules will be generated as below.


    === Run information ===

    Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
    Relation:     contact-lenses
    Instances:    24
    Attributes:   5
                  age
                  spectacle-prescrip
                  astigmatism
                  tear-prod-rate
                  contact-lenses

    === Associator model (full training set) ===

    Apriori
    =======

    Minimum support: 0.2 (5 instances)
    Minimum metric : 0.9
    Number of cycles performed: 16

    Generated sets of large itemsets:

    Size of set of large itemsets L(1): 11

    Size of set of large itemsets L(2): 21

    Size of set of large itemsets L(3): 6

    Best rules found:

     1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1)
     2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
     3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
     4. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
     5. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
     6. contact-lenses=soft 5 ==> astigmatism=no 5    conf:(1)
     7. contact-lenses=soft 5 ==> tear-prod-rate=normal 5    conf:(1)
     8. tear-prod-rate=normal contact-lenses=soft 5 ==> astigmatism=no 5    conf:(1)
     9. astigmatism=no contact-lenses=soft 5 ==> tear-prod-rate=normal 5    conf:(1)
    10. contact-lenses=soft 5 ==> astigmatism=no tear-prod-rate=normal 5    conf:(1)

    Confidence values are already given. The support of each rule is calculated by dividing the
    instance count shown in the rule by the total number of instances.

    For example, the first rule says that there are 12 instances in which the tear-prod-rate
    attribute has the value reduced, and in all of them the contact-lenses attribute has the
    value none. The support is therefore 12 divided by the total number of instances, which
    is 24.

    Hence 12/24 = 0.5. Thus the support for rule 1 is 0.5 and its confidence is 1.

    Support and confidence values should be calculated in the same way for all the rules.
    Whether a rule is strong or weak is then decided by the thresholds given in the question.
    (For example, the question may state that rules with support 0.5 or above and confidence 1
    are strong; we then calculate the values and show which rules are strong and which are
    weak.)
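    The calculation can be repeated for all ten rules with a short script. The sketch below
    takes, for each rule, the two counts printed by Apriori (instances matching the left-hand
    side, and instances matching both sides) and applies the example thresholds mentioned
    above, support of at least 0.5 and confidence of 1. The thresholds and the variable names
    are assumptions; a real question would supply its own limits.

```python
# (rule number, count for the left-hand side, count for the whole rule),
# copied from the "Best rules found" list.
RULE_COUNTS = [
    (1, 12, 12), (2, 6, 6), (3, 6, 6), (4, 6, 6), (5, 6, 6),
    (6, 5, 5), (7, 5, 5), (8, 5, 5), (9, 5, 5), (10, 5, 5),
]
TOTAL_INSTANCES = 24

# Example thresholds only; the actual question decides these.
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 1.0


def evaluate(rule_counts, total):
    """Return {rule number: (support, confidence, 'strong' or 'weak')}."""
    results = {}
    for number, lhs_count, rule_count in rule_counts:
        support = rule_count / total
        confidence = rule_count / lhs_count
        strong = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
        results[number] = (support, confidence, "strong" if strong else "weak")
    return results


results = evaluate(RULE_COUNTS, TOTAL_INSTANCES)
for number, (support, confidence, verdict) in results.items():
    print(f"rule {number}: support={support:.3f} confidence={confidence:.1f} {verdict}")
```

    With these example thresholds only rule 1 is strong; rules 2 to 5 have support 0.25 and
    rules 6 to 10 have support of about 0.208, so they would be classed as weak.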


    4. To understand ETL (Extract Transform Load) processes.

    Solution:

    The purpose of this experiment is to create two tables and demonstrate an inner join on
    them with a query in MySQL 5.0.

    We begin by starting the MySQL command prompt. The command prompt asks for a

    password. As shown below,

    Enter password: ******* (i.e. root123)

    Welcome to the MySQL monitor. Commands end with ; or \g.

    Your MySQL connection id is 1

    Server version: 5.1.42-community MySQL Community Server (GPL)

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> show databases;

    +--------------------+

    | Database |

    +--------------------+

    | information_schema |

    | mysql |

    | test |

    | useless |

    +--------------------+

    4 rows in set (0.03 sec)

    mysql> use information_schema;


    Database changed

    mysql> show tables;

    +---------------------------------------+

    | Tables_in_information_schema |

    +---------------------------------------+

    | CHARACTER_SETS |

    | COLLATIONS |

    | COLLATION_CHARACTER_SET_APPLICABILITY |

    | COLUMNS |

    | COLUMN_PRIVILEGES |

    | ENGINES |

    | EVENTS |

    | FILES |

    | GLOBAL_STATUS |

    | GLOBAL_VARIABLES |

    | KEY_COLUMN_USAGE |

    | PARTITIONS |

    | PLUGINS |

    | PROCESSLIST |

    | PROFILING |

    | REFERENTIAL_CONSTRAINTS |

    | ROUTINES |

    | SCHEMATA |

    | SCHEMA_PRIVILEGES |

    | SESSION_STATUS |

    | SESSION_VARIABLES |

    | STATISTICS |

    | TABLES |

    | TABLE_CONSTRAINTS |

    | TABLE_PRIVILEGES |


    | TRIGGERS |

    | USER_PRIVILEGES |

    | VIEWS |

    +---------------------------------------+

    28 rows in set (0.00 sec)

    mysql> use mysql;

    Database changed

    mysql> show tables;

    +---------------------------+

    | Tables_in_mysql |

    +---------------------------+

    | columns_priv |

    | db |

    | event |

    | func |

    | general_log |

    | help_category |

    | help_keyword |

    | help_relation |

    | help_topic |

    | host |

    | ndb_binlog_index |

    | plugin |

    | proc |

    | procs_priv |

    | servers |

    | slow_log |

    | tables_priv |

    | time_zone |


    | time_zone_leap_second |

    | time_zone_name |

    | time_zone_transition |

    | time_zone_transition_type |

    | user |

    +---------------------------+

    23 rows in set (0.19 sec)

    mysql> create database jkl;

    Query OK, 1 row affected (0.00 sec)

    mysql> use jkl;

    Database changed

    mysql> create table orders(orderid varchar(5),productid varchar(5),quantity int(5),unitsaleprice int(5),discountprice int(5),numoffreeservice int(3));
    Query OK, 0 rows affected (0.05 sec)

    mysql> insert into orders values('O101','P121',10,5000,10,2);
    Query OK, 1 row affected (0.03 sec)

    mysql> insert into orders values('O101','P01',1,234,5,3);
    Query OK, 1 row affected (0.02 sec)

    mysql> insert into orders values('O101','P180',2,4000,12,3);
    Query OK, 1 row affected (0.02 sec)

    mysql> insert into orders values('O102','P02',5,2500,3,2);
    Query OK, 1 row affected (0.02 sec)

    mysql> insert into orders values('O102','P122',2,2800,3,2);
    Query OK, 1 row affected (0.03 sec)

    mysql> SELECT * FROM ORDERS;
    +---------+-----------+----------+---------------+---------------+------------------+
    | orderid | productid | quantity | unitsaleprice | discountprice | numoffreeservice |
    +---------+-----------+----------+---------------+---------------+------------------+
    | O101    | P121      |       10 |          5000 |            10 |                2 |
    | O101    | P01       |        1 |           234 |             5 |                3 |
    | O101    | P180      |        2 |          4000 |            12 |                3 |
    | O102    | P02       |        5 |          2500 |             3 |                2 |
    | O102    | P122      |        2 |          2800 |             3 |                2 |
    +---------+-----------+----------+---------------+---------------+------------------+
    5 rows in set (0.00 sec)

    mysql> create table products(productid varchar(5) primary key,companyid varchar(5),productname varchar(10),producttype varchar(10),productprice int(5),productdom date,productinstock int(5));
    Query OK, 0 rows affected (0.06 sec)

    mysql> insert into products values('P01','30','CABLE',121,234,'1999-01-09',25);
    Query OK, 1 row affected (0.02 sec)

    mysql> insert into products values('P02','28','OPCABLE',122,500,'1998-08-04',35);
    Query OK, 1 row affected (0.03 sec)

    mysql> insert into products values('P121','12','MONITOR',147,5000,'2001-09-25',19);
    Query OK, 1 row affected (0.03 sec)

    mysql> insert into products values('P122','11','BATTERY',124,1400,'2003-08-15',15);
    Query OK, 1 row affected (0.03 sec)

    mysql> insert into products values('P180','2','FAX',168,2100,'2003-03-12',12);
    Query OK, 1 row affected (0.01 sec)

    mysql> SELECT * FROM PRODUCTS;
    +-----------+-----------+-------------+-------------+--------------+------------+----------------+
    | productid | companyid | productname | producttype | productprice | productdom | productinstock |
    +-----------+-----------+-------------+-------------+--------------+------------+----------------+
    | P01       | 30        | CABLE       | 121         |          234 | 1999-01-09 |             25 |
    | P02       | 28        | OPCABLE     | 122         |          500 | 1998-08-04 |             35 |
    | P121      | 12        | MONITOR     | 147         |         5000 | 2001-09-25 |             19 |
    | P122      | 11        | BATTERY     | 124         |         1400 | 2003-08-15 |             15 |
    | P180      | 2         | FAX         | 168         |         2100 | 2003-03-12 |             12 |
    +-----------+-----------+-------------+-------------+--------------+------------+----------------+
    5 rows in set (0.00 sec)

    mysql> select p1.productid,p1.productname,p2.quantity,p2.orderid from products as p1 INNER JOIN orders as p2 on p1.productid=p2.productid;
    +-----------+-------------+----------+---------+
    | productid | productname | quantity | orderid |
    +-----------+-------------+----------+---------+
    | P01       | CABLE       |        1 | O101    |
    | P02       | OPCABLE     |        5 | O102    |
    | P121      | MONITOR     |       10 | O101    |
    | P122      | BATTERY     |        2 | O102    |
    | P180      | FAX         |        2 | O101    |
    +-----------+-------------+----------+---------+
    5 rows in set (0.00 sec)

    Commands such as show databases;, show tables; and use database-name; can also be used
    to explore the server, as at the start of this session.
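    The same join can be reproduced outside the MySQL prompt. The sketch below loads the two
    tables into an in-memory SQLite database from Python and runs the inner join; using SQLite
    in place of the MySQL server is an assumption made so that the example runs without any
    setup, and the column types are SQLite equivalents of the ones declared above. An ORDER BY
    is added only to make the row order deterministic.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderid TEXT, productid TEXT, quantity INTEGER, "
    "unitsaleprice INTEGER, discountprice INTEGER, numoffreeservice INTEGER)"
)
conn.execute(
    "CREATE TABLE products (productid TEXT PRIMARY KEY, companyid TEXT, "
    "productname TEXT, producttype TEXT, productprice INTEGER, "
    "productdom TEXT, productinstock INTEGER)"
)

# Same rows as in the MySQL session.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", [
    ("O101", "P121", 10, 5000, 10, 2),
    ("O101", "P01", 1, 234, 5, 3),
    ("O101", "P180", 2, 4000, 12, 3),
    ("O102", "P02", 5, 2500, 3, 2),
    ("O102", "P122", 2, 2800, 3, 2),
])
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("P01", "30", "CABLE", "121", 234, "1999-01-09", 25),
    ("P02", "28", "OPCABLE", "122", 500, "1998-08-04", 35),
    ("P121", "12", "MONITOR", "147", 5000, "2001-09-25", 19),
    ("P122", "11", "BATTERY", "124", 1400, "2003-08-15", 15),
    ("P180", "2", "FAX", "168", 2100, "2003-03-12", 12),
])

# Inner join: keep only product/order pairs whose productid matches.
joined = conn.execute(
    "SELECT p1.productid, p1.productname, p2.quantity, p2.orderid "
    "FROM products AS p1 INNER JOIN orders AS p2 "
    "ON p1.productid = p2.productid ORDER BY p1.productid"
).fetchall()
for row in joined:
    print(row)
```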


    5. Generate a report using the Report Studio of cognos 8 using attributes

    from provided tables.

    Solution:

    The purpose of this experiment is to show the generation of a report using the report studio

    of cognos using the tables provided.

    We start by opening Internet Explorer and typing in the address,

    192.100.100.150/cognos 8

    A page as below will appear,


    Click on Quick Tour, select the MJCET public folder from the options, and click on the
    Report Studio option at the top right of the screen, as shown below.

    Once you click on Report Studio, it opens as below.


    Choose the create new report option and begin,


    You can choose any of the listed options. This example uses a list, so click on the List
    option and press OK.


    Using the insert table options on the left of Report Studio, we can use any of the tables
    in the mypck folder. Here we use the customer table and bring some of its attributes onto
    the right side, as shown above. We simply drag and drop each attribute onto the right side
    to generate the view above.

    We then click the play/run button on the horizontal tools menu to generate the report.

    The run button is the 12th button from the left. After clicking it we have to enter a user
    id and password (sa and lab4 respectively) and click OK to generate the report, as shown
    below.

    The generated report is as shown below,


    6. Generate query-based reports using Query Studio that perform the following operations.

    i. Pivot

    ii. Group

    iii. Ungroup

    iv. Filter

    Solution:

    The purpose of this experiment is to show the generation of query-based reports using the
    Query Studio of Cognos and the tables provided.

    We start by opening Internet Explorer and typing in the address,

    192.100.100.150/cognos 8

    A page as below will appear,


    Click on Quick Tour, select the MJCET public folder from the options, and click on the
    Query Studio option at the top right of the screen, as shown below.


    The query studio begins and we enter the username and password as below,


    Select the attributes by dragging and dropping them from the left side to the right, or
    use the Insert button on the left side of Query Studio, as shown in the above image. Use
    the buttons in the toolbar to generate the reports for the corresponding operations.

    Query Studio, like Report Studio, has a list of tools, as shown below.

    These tools include the run, group, ungroup, pivot and filter buttons, as shown above.


    i. Pivot:

    In Query Studio we choose attributes of the order and product tables to show the
    operations. With pivot we show the relation between attributes, as shown below.

    As we can see, we have selected orderId, productId and Quantity from the two tables
    mentioned above, and we have selected orderId as the attribute on which the pivot
    operation is applied; this is highlighted in yellow. We then use the pivot button in the
    tools options to generate the report below.


    ii. Group:

    In the group operation we show how groups are formed for a particular inserted value. Here
    we use the orderId, productId and productName attributes and group on the orderId
    attribute, as we can see below.


    Selection of orderId as the attribute on which group operation is applied is signified by

    yellow colour.

    iii. Ungroup:


    The ungroup operation is the opposite of the group operation: it removes the crosstab
    formed using the group operation, as we can see below.

    Selection of orderId as the attribute on which group operation is applied is signified by

    yellow colour.


    iv. Filter:

    In the filter operation we choose which values of the selected attributes we want to
    view, as we can see below.

    We have selected orderId, productId and Quantity as the attributes, and orderId as the
    attribute on which the filter operation is applied, signified by the yellow colour.


    Here we select the orderId values for which the report should be generated as we can see in

    the previous figure. We then hit the ok button and we generate the report below,


    7. Design and build a DataWareHouse using bottom-up approach titled

    Citizen Information System. This should be able to serve the analytical needs

    of the various Government Departments and also provide a global integrated

    view.


    Objective: Three tables (birthcertf, college, addrproof) are created, along with one
    master table (citizeninfosystem) that joins these three tables. The objective is to
    provide complete information about a citizen using queries in MySQL 5.0.

    mysql> create database prog8;

    Query OK, 1 row affected (0.01 sec)

    mysql> use prog8;

    Database changed

    mysql> create table birthcertf(name varchar(30), fathersname varchar(30),birth DATE,location varchar(30), SSN varchar(3),

    PRIMARY KEY(SSN));

    Query OK, 0 rows affected (0.06 sec)

    mysql> insert into birthcertf values('KLM','STR','1988-03-27','DELHI','665');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into birthcertf values('ABC','XYZ','1989-05-03','HYD','667');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into birthcertf values('GHI','MNO','1990-11-17','BNGLR','645');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into birthcertf values('DEF','PQR','1987-06-07','HYD','785');

    Query OK, 1 row affected (0.01 sec)

    mysql> insert into birthcertf values('GTH','THR','1989-06-01','CHNNI','765');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into birthcertf values('RTU','YHN','1994-07-24','DELHI','378');


    Query OK, 1 row affected (0.01 sec)

    mysql> select * from birthcertf;

    +------+-------------+------------+----------+-----+

    | name | fathersname | birth | location | SSN |

    +------+-------------+------------+----------+-----+

    | RTU | YHN | 1994-07-24 | DELHI | 378 |

    | GHI | MNO | 1990-11-17 | BNGLR | 645 |

    | KLM | STR | 1988-03-27 | DELHI | 665 |

    | ABC | XYZ | 1989-05-03 | HYD | 667 |

    | GTH | THR | 1989-06-01 | CHNNI | 765 |

    | DEF | PQR | 1987-06-07 | HYD | 785 |

    +------+-------------+------------+----------+-----+

    6 rows in set (0.00 sec)

    mysql> create table college (name varchar(30),fathersname varchar(30),birth DATE

    ,SSN varchar(3) REFERENCES birthcertf(SSN),rollno varchar(3));

    Query OK, 0 rows affected (0.06 sec)

    mysql> insert into collge values('ABC','XYZ','1989-05-03','667','10');

    ERROR 1146 (42S02): Table 'prog8.collge' doesn't exist

    mysql> insert into college values('ABC','XYZ','1989-05-03','667','10');

    Query OK, 1 row affected (0.01 sec)

    mysql> insert into college values('KLM','STR','1988-03-27','665','11');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into college values('GHI','MNO','1990-11-17','645','15');

    Query OK, 1 row affected (0.02 sec)


    mysql> insert into college values('DEF','PQR','1987-06-07','785','14');

    Query OK, 1 row affected (0.02 sec)

    mysql> insert into college values('GTH','THR','1989-06-01','765','12');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into college values('RTU','YHN','1994-07-24','378','19');

    Query OK, 1 row affected (0.02 sec)

    mysql> select * from college;

    +------+-------------+------------+------+--------+

    | name | fathersname | birth | SSN | rollno |

    +------+-------------+------------+------+--------+

    | ABC | XYZ | 1989-05-03 | 667 | 10 |

    | KLM | STR | 1988-03-27 | 665 | 11 |

    | GHI | MNO | 1990-11-17 | 645 | 15 |

    | DEF | PQR | 1987-06-07 | 785 | 14 |

    | GTH | THR | 1989-06-01 | 765 | 12 |

    | RTU | YHN | 1994-07-24 | 378 | 19 |

    +------+-------------+------------+------+--------+

    6 rows in set (0.00 sec)

    mysql> create table addrproof ( name1 varchar(30),name2 varchar(30),

    address varchar(30), SSN varchar(30)

    REFERENCES birthcertf(SSN));

    Query OK, 0 rows affected (0.08 sec)

    mysql> insert into addrproof values('ABC','XYZ','HYD','667');

    Query OK, 1 row affected (0.03 sec)


    mysql> insert into addrproof values('RTU','YHN','HYD','378');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into addrproof values('GHI','MNO','DELHI','645');

    Query OK, 1 row affected (0.01 sec)

    mysql> insert into addrproof values('KLM','STR','BNGLR','665');

    Query OK, 1 row affected (0.02 sec)

    mysql> insert into addrproof values('GTH','THR','BNGLR','765');

    Query OK, 1 row affected (0.03 sec)

    mysql> insert into addrproof values('DEF','PQR','CHNNI','785');

    Query OK, 1 row affected (0.02 sec)

    mysql> select * from addrproof;

    +-------+-------+---------+------+

    | name1 | name2 | address | SSN |

    +-------+-------+---------+------+

    | ABC | XYZ | HYD | 667 |

    | RTU | YHN | HYD | 378 |

    | GHI | MNO | DELHI | 645 |

    | KLM | STR | BNGLR | 665 |

    | GTH | THR | BNGLR | 765 |

    | DEF | PQR | CHNNI | 785 |

    +-------+-------+---------+------+

    6 rows in set (0.00 sec)


    mysql> create table CitizenInfoSystem as( select birthcertf.name,

    -> birthcertf.fathersname,birthcertf.birth,

    -> birthcertf.SSN,

    -> (birthcertf.location) as birthloc,

    -> addrproof.address,

    -> college.rollno

    -> from birthcertf INNER JOIN addrproof

    -> on birthcertf.name=addrproof.name1

    -> INNER JOIN college

    -> on birthcertf.name=college.name)

    -> ;

    Query OK, 6 rows affected (0.08 sec)

    Records: 6 Duplicates: 0 Warnings: 0

    mysql> select * from CitizenInfoSystem;

    +------+-------------+------------+-----+----------+---------+--------+

    | name | fathersname | birth | SSN | birthloc | address | rollno |

    +------+-------------+------------+-----+----------+---------+--------+

    | ABC | XYZ | 1989-05-03 | 667 | HYD | HYD | 10 |

    | KLM | STR | 1988-03-27 | 665 | DELHI | BNGLR | 11 |

    | GHI | MNO | 1990-11-17 | 645 | BNGLR | DELHI | 15 |

    | DEF | PQR | 1987-06-07 | 785 | HYD | CHNNI | 14 |

    | GTH | THR | 1989-06-01 | 765 | CHNNI | BNGLR | 12 |

    | RTU | YHN | 1994-07-24 | 378 | DELHI | HYD | 19 |

    +------+-------------+------------+-----+----------+---------+--------+

    6 rows in set (0.00 sec)
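    The master table above is built by joining on the name columns, which works here because
    every name is unique. Since SSN is the primary key of birthcertf, joining on SSN is a
    possible alternative that would still work if two citizens shared a name. The sketch below
    tries this variant in an in-memory SQLite database from Python; the use of SQLite instead
    of MySQL and the table aliases b, a and c are assumptions for illustration, not the method
    used in this record.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE birthcertf (name TEXT, fathersname TEXT, birth TEXT,
                         location TEXT, SSN TEXT PRIMARY KEY);
CREATE TABLE college (name TEXT, fathersname TEXT, birth TEXT,
                      SSN TEXT REFERENCES birthcertf(SSN), rollno TEXT);
CREATE TABLE addrproof (name1 TEXT, name2 TEXT, address TEXT,
                        SSN TEXT REFERENCES birthcertf(SSN));
""")

# Same rows as in the MySQL session.
db.executemany("INSERT INTO birthcertf VALUES (?, ?, ?, ?, ?)", [
    ("KLM", "STR", "1988-03-27", "DELHI", "665"),
    ("ABC", "XYZ", "1989-05-03", "HYD", "667"),
    ("GHI", "MNO", "1990-11-17", "BNGLR", "645"),
    ("DEF", "PQR", "1987-06-07", "HYD", "785"),
    ("GTH", "THR", "1989-06-01", "CHNNI", "765"),
    ("RTU", "YHN", "1994-07-24", "DELHI", "378"),
])
db.executemany("INSERT INTO college VALUES (?, ?, ?, ?, ?)", [
    ("ABC", "XYZ", "1989-05-03", "667", "10"),
    ("KLM", "STR", "1988-03-27", "665", "11"),
    ("GHI", "MNO", "1990-11-17", "645", "15"),
    ("DEF", "PQR", "1987-06-07", "785", "14"),
    ("GTH", "THR", "1989-06-01", "765", "12"),
    ("RTU", "YHN", "1994-07-24", "378", "19"),
])
db.executemany("INSERT INTO addrproof VALUES (?, ?, ?, ?)", [
    ("ABC", "XYZ", "HYD", "667"),
    ("RTU", "YHN", "HYD", "378"),
    ("GHI", "MNO", "DELHI", "645"),
    ("KLM", "STR", "BNGLR", "665"),
    ("GTH", "THR", "BNGLR", "765"),
    ("DEF", "PQR", "CHNNI", "785"),
])

# Build the master table by joining on the key column SSN instead of name.
db.execute("""
CREATE TABLE CitizenInfoSystem AS
SELECT b.name, b.fathersname, b.birth, b.SSN,
       b.location AS birthloc, a.address, c.rollno
FROM birthcertf AS b
INNER JOIN addrproof AS a ON b.SSN = a.SSN
INNER JOIN college AS c ON b.SSN = c.SSN
""")

citizens = db.execute("SELECT * FROM CitizenInfoSystem ORDER BY SSN").fetchall()
for row in citizens:
    print(row)
```

    On this data the SSN join returns the same six citizens as the name join.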


    8. Case study for drawing a data warehouse schema using star schema, snowflake
    schema or fact constellation schema.

    Solution:

    Star Schema:


    Snowflake Schema:

    Fact Constellation Schema:


    9. Create an example of a data set and form its decision tree.

    SOLUTION:

    car.arff


    We start by opening Weka.

    The Weka start screen appears; click on the Explorer button to begin.

    Supply the car.arff data set as the input.


    Click on the classify tab, and choose the J48 algorithm. Then click on start.

    The following J48 pruned tree information is created with the confusion matrix.

    === Run information ===

    Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
    Relation:     car
    Instances:    10
    Attributes:   5
                  color
                  tyres
                  numberplate
                  AC
                  quality
    Test mode:    10-fold cross-validation

    === Classifier model (full training set) ===

    J48 pruned tree
    ------------------

    color = red: average (4.0/2.0)
    color = black: average (3.0/1.0)
    color = white: poor (3.0/1.0)

    Number of Leaves : 3

    Size of the tree : 4

    Time taken to build model: 0 seconds

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances           0                0      %
    Incorrectly Classified Instances        10              100      %
    Kappa statistic                         -0.5385
    Mean absolute error                      0.563
    Root mean squared error                  0.6536
    Relative absolute error                117.8295 %
    Root relative squared error            128.5972 %
    Total Number of Instances               10

    === Detailed Accuracy By Class ===

                   TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                   0        0.571    0          0       0          0         good
                   0        0.833    0          0       0          0.229     average
                   0        0.143    0          0       0          0.286     poor
    Weighted Avg.  0        0.548    0          0       0          0.177

    === Confusion Matrix ===

    a b c
