Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm
Hector Leiva, Anna Atramentov and Vasant Honavar*
Artificial Intelligence Laboratory
Department of Computer Science and
Graduate Program in Bioinformatics and Computational Biology
Iowa State University
Ames, IA 50011, USA
www.cs.iastate.edu/~honavar/aigroup/html
* Support provided in part by National Science Foundation, Carver Foundation, and Pioneer Hi-Bred, Inc.
Motivation

Importance of multi-relational learning:
- Growth of data stored in multi-relational databases (MRDBs)
- Techniques for learning from unstructured data often extract the data into an MRDB

Expansion of techniques for multi-relational learning:
- Blockeel's framework (ILP) (1998)
- Getoor's framework (first-order extensions of probabilistic models) (2001)
- Knobbe's framework (MRDM) (1999)

Problem: no experimental results available

Goals:
- Perform experiments and evaluate the performance of Knobbe's framework
- Understand the strengths and limits of the approach
Multi-Relational Learning Literature

- Inductive Logic Programming
- First-order extensions of probabilistic models
- Multi-Relational Data Mining
- Propositionalization methods
- PRMs extension for cumulative learning and reasoning as agents interact with the world
- Approaches for mining data in the form of graphs

Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000
Problem Formulation

Example of a multi-relational database.
Given: data stored in a relational database.
Goal: build a decision tree for predicting a target attribute in the target table.
Instances:

Department
ID  Specialization    #Students
d1  Math              1000
d2  Physics           300
d3  Computer Science  400

Staff
ID  Name    Department  Position           Salary
p1  Dale    d1          Professor          70-80k
p2  Martin  d3          Postdoc            30-40k
p3  Victor  d2          Visitor Scientist  40-50k
p4  David   d3          Professor          80-100k

Graduate Student
ID  Name    GPA  #Publications  Advisor  Department
s1  John    2.0  4              p1       d3
s2  Lisa    3.5  10             p4       d3
s3  Michel  3.9  3              p4       d4

Schema:

Department(ID, Specialization, #Students)
Staff(ID, Name, Department, Position, Salary)
Grad.Student(ID, Name, GPA, #Publications, Advisor, Department)
[Figure: decision tree splitting the instance set {d1, d2, d3, d4} into {d1, d2} and {d3, d4}]

Tree_induction(D: data)
    A = optimal_attribute(D)
    if stopping_criterion(D)
        return leaf(D)
    else
        D_left  := split(D, A)
        D_right := split_complement(D, A)
        child_left  := Tree_induction(D_left)
        child_right := Tree_induction(D_right)
        return node(A, child_left, child_right)
Propositional decision tree algorithm. Construction phase
Day  Outlook   Temp  Humidity  Wind    PlayTennis
d1   Sunny     Hot   High      Weak    No
d2   Sunny     Hot   High      Strong  No
d3   Overcast  Hot   High      Weak    Yes
d4   Overcast  Cold  Normal    Weak    No
[Figure: the tree first splits on Outlook (sunny: {d1, d2} → No), then on Temperature (hot vs. not hot), separating {d3} and {d4}; the sub-tables at each branch show the corresponding partitions of the data]
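The Tree_induction procedure above can be sketched as executable Python. This is a minimal illustrative version: the attribute selection and stopping criterion are simplified placeholders (binary equality tests, stop on pure labels or zero gain), not the original implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def optimal_split(data, labels):
    """Pick the binary test (attribute == value) with the highest information gain."""
    best, best_gain = None, 0.0
    base, n = entropy(labels), len(labels)
    for attr in data[0]:
        for val in {row[attr] for row in data}:
            left = [y for row, y in zip(data, labels) if row[attr] == val]
            right = [y for row, y in zip(data, labels) if row[attr] != val]
            if not left or not right:
                continue
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if gain > best_gain:
                best, best_gain = (attr, val), gain
    return best

def tree_induction(data, labels):
    """Recursive construction: return a leaf (majority class) or a node with two subtrees."""
    split = optimal_split(data, labels)
    if len(set(labels)) <= 1 or split is None:
        return Counter(labels).most_common(1)[0][0]
    attr, val = split
    left = [(r, y) for r, y in zip(data, labels) if r[attr] == val]
    right = [(r, y) for r, y in zip(data, labels) if r[attr] != val]
    return {"test": (attr, val),
            "left": tree_induction([r for r, _ in left], [y for _, y in left]),
            "right": tree_induction([r for r, _ in right], [y for _, y in right])}
```

On the PlayTennis-style rows above this recursively partitions the day set exactly as the figure suggests.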
MR setting. Splitting data with Selection Graphs

[Figure: the Department, Staff, and Graduate Student instance tables from the previous slide, split by a selection graph over Staff and Grad.Student with the condition GPA > 2.0 and its complement selection graphs]

Staff with no Grad.Student:
ID  Name    Department  Position           Salary
p2  Martin  d3          Postdoc            30-40k
p3  Victor  d2          Visitor Scientist  40-50k

Staff with at least one Grad.Student with GPA > 2.0:
ID  Name   Department  Position   Salary
p4  David  d3          Professor  80-100k

Staff whose Grad.Students all have GPA <= 2.0:
ID  Name  Department  Position   Salary
p1  Dale  d1          Professor  70-80k
What is a selection graph?

[Figure: example selection graphs, e.g. Staff with an open edge to Grad.Student (GPA > 3.9) and a closed edge to Grad.Student; Staff with an edge to a Department node with Specialization = math]

- It corresponds to a subset of the instances of the target table
- Nodes correspond to tables in the database
- Edges correspond to associations between tables
- Open edge = "has at least one"
- Closed edge = "has none"
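A selection graph can be captured in a small data structure. The following Python sketch is illustrative only; the class and field names are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SGNode:
    table: str                                       # table in the database schema
    conditions: list = field(default_factory=list)   # e.g. ["GPA > 3.9"]

@dataclass
class SGEdge:
    parent: "SGNode"
    child: "SGNode"
    present: bool = True   # open edge: "has at least one"; closed edge: "has none"

@dataclass
class SelectionGraph:
    target: SGNode         # root node, always the target table
    edges: list = field(default_factory=list)

# Example: Staff having at least one Grad.Student with GPA > 3.9
staff = SGNode("Staff")
student = SGNode("Graduate_Student", conditions=["GPA > 3.9"])
graph = SelectionGraph(target=staff, edges=[SGEdge(staff, student, present=True)])
```

The `present` flag on each edge distinguishes the open ("has at least one") from the closed ("has none") case.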
Automatically transforming selection graphs into SQL queries

Staff with Position = Professor:
select distinct T0.id
from Staff T0
where T0.Position = 'Professor'

Staff with at least one Grad.Student:
select distinct T0.id
from Staff T0, Graduate_Student T1
where T0.id = T1.Advisor

Staff with no Grad.Student:
select distinct T0.id
from Staff T0
where T0.id not in (select T1.Advisor from Graduate_Student T1)

Staff with at least one Grad.Student, but none with GPA > 3.9:
select distinct T0.id
from Staff T0, Graduate_Student T1
where T0.id = T1.Advisor
and T0.id not in (select T1.Advisor from Graduate_Student T1 where T1.GPA > 3.9)

Generic query:
select distinct T0.primary_key
from table_list
where join_list and condition_list
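The generic query above can be assembled mechanically from the graph's parts. The helper below is a hypothetical sketch (not the paper's code); closed edges, which become `not in` subqueries as shown above, are omitted for brevity.

```python
def selection_graph_to_sql(tables, joins, conditions, primary_key="id"):
    """Assemble: select distinct T0.<pk> from <table_list> where <join_list> and <condition_list>.
    Each table gets an alias T0, T1, ... in order; T0 is the target table."""
    table_list = ", ".join(f"{t} T{i}" for i, t in enumerate(tables))
    where_clauses = list(joins) + list(conditions)
    sql = f"select distinct T0.{primary_key} from {table_list}"
    if where_clauses:
        sql += " where " + " and ".join(where_clauses)
    return sql

# Staff having at least one Grad.Student with GPA > 3.9
query = selection_graph_to_sql(
    ["Staff", "Graduate_Student"],
    joins=["T0.id = T1.Advisor"],
    conditions=["T1.GPA > 3.9"])
```

With no joins and a single table this degenerates to the first query on the slide.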
MR decision tree

[Figure: a multi-relational decision tree whose internal nodes contain selection graphs, e.g. Staff with a Grad.Student having GPA > 3.9]

- Each node contains a selection graph
- Each child's selection graph is a supergraph of its parent's selection graph
How to choose selection graphs in nodes?

Problem: there are too many supergraph selection graphs to choose from at each node.
Solution:
- start with the initial selection graph
- use a greedy heuristic to choose supergraph selection graphs: refinements
- use binary splits for simplicity
- for each refinement, get its complement refinement
- choose the best refinement based on the information gain criterion

Problem: some potentially good refinements may give no immediate benefit.
Solution: look-ahead capability.
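Scoring a refinement/complement pair by information gain only needs the per-class instance counts on each side of the split. A minimal sketch (the count-vector interface is an assumption for illustration):

```python
import math

def entropy(counts):
    """Entropy of a class-count vector, e.g. [positives, negatives]."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def information_gain(parent, refined, complement):
    """Gain of the binary split given by a refinement and its complement refinement.
    All three arguments are class-count vectors over the same classes."""
    n = sum(parent)
    return (entropy(parent)
            - sum(refined) / n * entropy(refined)
            - sum(complement) / n * entropy(complement))
```

A refinement that perfectly separates two balanced classes has gain 1.0 bit; one that leaves the class mix unchanged on both sides (such as adding an edge that does not change which instances are covered) has gain 0.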
Refinements of selection graph

- add a condition to a node – explores attribute information in the tables
- add a present edge and open node – explores relational properties between the tables

[Figure: the running example selection graph to be refined: Staff with a Grad.Student having GPA > 3.9, and Staff in a Department with Specialization = math]
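The two refinement operators can be sketched over a simple dict-based graph encoding (an illustrative representation, not the paper's). Each operator returns a new, strictly more specific selection graph, leaving the original intact:

```python
import copy

def add_condition(graph, node_index, condition):
    """Refinement 1: add a condition to an existing node (explores attributes)."""
    g = copy.deepcopy(graph)
    g["nodes"][node_index]["conditions"].append(condition)
    return g

def add_present_edge(graph, parent_index, table, association):
    """Refinement 2: add a present edge and open a new node (explores associations)."""
    g = copy.deepcopy(graph)
    g["nodes"].append({"table": table, "conditions": []})
    g["edges"].append({"from": parent_index, "to": len(g["nodes"]) - 1,
                       "association": association, "present": True})
    return g

# Start from the target table alone and apply each operator once:
g0 = {"nodes": [{"table": "Staff", "conditions": []}], "edges": []}
g1 = add_condition(g0, 0, "Position = Professor")
g2 = add_present_edge(g0, 0, "Graduate_Student", "Staff.id = Graduate_Student.Advisor")
```

Every refined graph is a supergraph of its parent, which is exactly the property the MR decision tree relies on.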
Refinements of selection graph (examples)

[Figures: each of the following refinements, with its complement refinement, applied to the running example graph (Staff with a Grad.Student having GPA > 3.9, in a Department with Specialization = math):
- add the condition Position = Professor to the Staff node; complement: Position != Professor
- add the condition GPA > 2.0 to the open Grad.Student node; complement: a closed Grad.Student node with GPA > 2.0
- add the condition #Students > 200 to the Department node; complement likewise
- add a present edge and open Department node; complement (note: information gain = 0)
- add a present edge and open Staff node; complement
- add a present edge and open Grad.Student node; complement]
Look ahead capability

[Figures: adding a present edge to a Department node alone yields no immediate gain; a two-step look-ahead refinement adds the edge together with the condition #Students > 200, shown with its complement refinement]
MR decision tree algorithm. Construction phase

[Figure: the tree grows by refining the selection graph at each node, e.g. Staff with a Grad.Student having GPA > 3.9]

For each non-leaf node:
- consider all possible refinements of the node's selection graph and their complements
- choose the best one based on the information gain criterion
- create the children nodes
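The construction loop above can be sketched generically. The three callbacks (candidate generator, gain scorer over a refinement/complement pair, and stopping criterion) are placeholders for the actual MRDTL components:

```python
def mrdtl_induce(graph, candidates, gain, stop):
    """Grow a binary MR decision tree: each non-leaf node stores a selection
    graph; its children use the best-scoring refinement and its complement.
    candidates(graph) yields (refinement, complement) pairs; gain scores a pair."""
    if stop(graph):
        return {"leaf": graph}
    refined, complement = max(candidates(graph), key=gain)
    return {"graph": graph,
            "left": mrdtl_induce(refined, candidates, gain, stop),
            "right": mrdtl_induce(complement, candidates, gain, stop)}

# Toy run: "graphs" are strings; the refinement appends "R", the complement "C".
tree = mrdtl_induce("", lambda g: [(g + "R", g + "C")],
                    gain=lambda pair: 0, stop=lambda g: len(g) >= 2)
```

The toy run grows a depth-2 tree whose leaves are the four refinement histories, mirroring how each child's selection graph extends its parent's.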
MR decision tree algorithm. Classification phase

[Figure: the leaves of the tree carry selection graphs and class labels, e.g. Staff with a Grad.Student having GPA > 3.9 in a Department with Spec = math → 70-80k; with Spec = physics and Position = Professor → 80-100k]

For each leaf:
- apply the selection graph of the leaf to the test data
- classify the resulting instances with the classification of the leaf
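The classification phase can be sketched as running each leaf's query against the test database. Here `run_query` is a placeholder for executing the leaf's SQL; the toy stand-in below maps query names directly to matching ids.

```python
def classify(leaves, run_query):
    """leaves: list of (selection_graph_query, class_label) pairs.
    Each test instance matched by a leaf's query receives that leaf's label."""
    predictions = {}
    for query, label in leaves:
        for instance_id in run_query(query):
            predictions.setdefault(instance_id, label)
    return predictions

# Hypothetical stand-in for a test database:
fake_db = {"gpa_high_math": ["p1", "p4"], "gpa_high_physics": ["p2", "p3"]}
preds = classify([("gpa_high_math", "70-80k"), ("gpa_high_physics", "30-40k")],
                 run_query=fake_db.get)
```

Because the leaf selection graphs partition the target table, each test instance matches exactly one leaf query in the intended setting; `setdefault` simply keeps the first match.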
Experimental results. Mutagenesis

The most widely used database in ILP. It describes molecules of certain nitroaromatic compounds.
Goal: predict their mutagenic activity (label attribute) – the ability to cause DNA to mutate. High mutagenic activity can cause cancer.

Class distribution:
Compounds              Active  Inactive  Total
Regression friendly    125     63        188
Regression unfriendly  13      29        42
Total                  138     92        230

5 levels of background knowledge: B0, B1, B2, B3, B4. They provide increasingly richer descriptions of the examples. Only the first three levels (B0, B1, B2) are used here.
Experimental results. Mutagenesis

Results of 10-fold cross-validation for the regression friendly set:

Systems  Accuracy (%)     Time (secs.)
         B0    B1    B2   B0     B1     B2
Progol   79    86    86   8595   4627   6530
Progol   76    81    83   117k   64k    42k
FOIL     61    61    83   4950   9138   0.5
TILDE    75    79    85   41     170    142
MRDTL    67    87    88   0.85   332    221

Size of decision trees:

Systems  Number of nodes
         B0  B1  B2
MRDTL    1   53  51
Experimental results. Mutagenesis

Results of leave-one-out cross-validation for the regression unfriendly set:

Background  Accuracy  Time       #Nodes
B0          70%       0.6 secs.  1
B1          81%       86 secs.   24
B2          81%       60 secs.   22

Two recent approaches, (Sebag and Rouveirol, 1997) and (Kramer and De Raedt, 2001), using B3 have achieved 93.6% and 94.7%, respectively, on the mutagenesis database.
Experimental results. KDD Cup 2001

The data consists of a variety of details about the various genes of one particular type of organism. Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another in order to perform crucial functions.

Task: prediction of gene/protein localization (15 possible values).
Target table: Gene. Target attribute: Localization. 862 training genes, 381 test genes.
Challenge: many attribute values are missing.

Approach: use a special value to encode a missing value. Result: accuracy of 50%.
Have to find good techniques for filling in missing values.
Experimental results. KDD Cup 2001

Approach: replace missing values by the most common value of the attribute for the class.
Results:
- accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated
- accuracy of 80% when limiting the number of times an association can be instantiated
- accuracy of around 75% when following associations only in the forward direction

This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets. In practice, however, since the class labels for the test data are unknown, it is not possible to apply this method directly.

Approach: extension of the Naive Bayes algorithm for relational data.
Result: no improvement compared to the first approach.

Have to incorporate the handling of missing values into the decision tree algorithm.
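The most-common-value-per-class imputation described above can be sketched as follows. This is a simplified stand-in for the actual preprocessing; as noted, it requires known class labels, so it applies only to training data.

```python
from collections import Counter

def impute_most_common_per_class(rows, labels, missing=None):
    """Fill each missing attribute value with the most common observed value
    of that attribute among rows of the same class. Returns new rows."""
    observed = {}  # (class, attribute) -> Counter of non-missing values
    for row, y in zip(rows, labels):
        for attr, val in row.items():
            if val is not missing:
                observed.setdefault((y, attr), Counter())[val] += 1
    filled = []
    for row, y in zip(rows, labels):
        new = dict(row)
        for attr, val in row.items():
            if val is missing and (y, attr) in observed:
                new[attr] = observed[(y, attr)].most_common(1)[0][0]
        filled.append(new)
    return filled
```

Attributes that are missing for an entire class are left as-is, since no per-class mode exists for them.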
Experimental results. Adult database

Information from the 1994 census. Task: determine whether a person makes over 50k a year.
Suitable for propositional learning: one table, 6 numerical attributes, 8 nominal attributes.

Class distribution for the adult database:
                     Training        Test            Total
                     >50k   <=50k    >50k   <=50k
With missing values  7841   24720    3846   12435    48842
W/o missing values   7508   22654    3700   11360    45222

Result after removal of missing values, using the original train/test split: 82.2%.
Filling missing values with the Naive Bayes approach yields 83%. C4.5 result: 84.46%.
Summary
- The algorithm is a promising alternative to existing algorithms such as Progol, FOIL, and TILDE
- The running time is comparable with the best existing approaches
- If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data
- The approach is an extension of propositional learning, and can also be applied successfully to propositional learning

Questions:
- Why can't we split the data based on the value of an attribute in an arbitrary table right away?
- Is there a less restrictive and simpler way of representing the splits of data than selection graphs?
- The running time for computing the first nodes of the decision tree is much less than for the rest of the nodes. Is this unavoidable? Can we implement the same idea more efficiently?
Future work
- Incorporation of more sophisticated techniques for handling missing values
- Incorporation of more sophisticated pruning techniques or complexity regularization
- More extensive evaluation of MRDTL on real-world data sets
- Development of ontology-guided multi-relational decision tree learning algorithms to generate classifiers at multiple levels of abstraction [Zhang et al., 2002]
- Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms [Caragea et al., 2002]
- Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration