Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores

Page 1

Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores

Frank DiMaio and Jude Shavlik, UW-Madison Computer Sciences

ICDM Foundations and New Directions of Data Mining Workshop, 19 November 2003

Page 2

Rule-Based Learning

• Goal: Induce a rule (or rules) that explains ALL positive examples and NO negative examples

[Figure: a scatter of positive examples and negative examples]

Page 3

Inductive Logic Programming (ILP)

• Encode background knowledge in first-order logic, as facts …

containsBlock(ex1,block1A).
containsBlock(ex1,block1B).
isRed(block1A).    isSquare(block1A).
isBlue(block1B).   isRound(block1B).
onTopOf(block1B,block1A).

… and as logical relations:

above(A,B) :- onTopOf(A,B).
above(A,B) :- onTopOf(A,Z), above(Z,B).

Page 4

Inductive Logic Programming (ILP)• Covering algorithm applied to explain all data

[Figure: a scatter of positive (+) and negative (−) examples; each induced rule covers a region of the positives]

1. Choose some positive example
2. Generate the best rule that covers this example
3. Remove all examples covered by this rule
4. Repeat until every positive example is covered (see the sketch below)
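A minimal sketch of this covering loop, with find_best_rule and covers as hypothetical placeholders for the ILP system's rule search and coverage test (they are not defined in the talk):

    def covering(positives, negatives, find_best_rule, covers):
        """Generic covering loop; find_best_rule and covers are supplied
        by the ILP system (hypothetical callables, not from the talk)."""
        rules = []
        uncovered = set(positives)
        while uncovered:
            seed = next(iter(uncovered))                       # 1. pick a positive example
            rule = find_best_rule(seed, uncovered, negatives)  # 2. best rule covering it
            rules.append(rule)
            # 3. remove the positives this rule now covers
            uncovered = {e for e in uncovered if not covers(rule, e)}
        return rules                                           # 4. repeat until none left

In Aleph-style systems, find_best_rule corresponds to the saturate-and-search procedure described on the following slides.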

Page 5

Inductive Logic Programming (ILP)

• Saturate an example by writing everything true about it
• The saturation of an example is the bottom clause (⊥)

[Figure: example ex2, a stack of blocks with block C on top of block B on top of block A]

positive(ex2) :-
    containsBlock(ex2,block2A), containsBlock(ex2,block2B), containsBlock(ex2,block2C),
    isRed(block2A), isRound(block2A),
    isBlue(block2B), isRound(block2B),
    isBlue(block2C), isSquare(block2C),
    onTopOf(block2B,block2A), onTopOf(block2C,block2B),
    above(block2B,block2A), above(block2C,block2B), above(block2C,block2A).

Page 6

Inductive Logic Programming (ILP)

• Candidate clauses are generated by choosing literals from ⊥ and converting ground terms to variables

• Search through the space of candidate clauses using standard AI search algorithms

• The bottom clause ensures the search space is finite

Selected literals from ⊥:

containsBlock(ex2,block2B)

isRed(block2A)

onTopOf(block2B,block2A)

Candidate Clause

positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).
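The "converting ground terms to variables" step (often called variabilization) can be sketched as below; the helper is illustrative, not code from the talk:

    from string import ascii_uppercase

    def variabilize(literals):
        """Replace ground terms with variables, reusing the same
        variable whenever the same term reappears (illustrative sketch)."""
        var_of = {}
        def var(term):
            if term not in var_of:
                var_of[term] = ascii_uppercase[len(var_of)]  # A, B, C, ...
            return var_of[term]
        return [f"{name}({','.join(var(a) for a in args)})"
                for name, args in literals]

    lits = [("containsBlock", ("ex2", "block2B")),
            ("isRed", ("block2A",)),
            ("onTopOf", ("block2B", "block2A"))]
    print(variabilize(lits))
    # ['containsBlock(A,B)', 'isRed(C)', 'onTopOf(B,C)']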

Page 7

ILP Time Complexity

• The time complexity of ILP systems depends on:
  - size of the bottom clause |⊥|
  - maximum clause length c
  - number of examples |E|
  - search algorithm Π

• O(|⊥|^c |E|) for exhaustive search

• O(|⊥|^2 |E|) for greedy search

• Assumes constant-time clause evaluation!
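To make the scale concrete (illustrative numbers, not from the talk): with |⊥| = 100 literals, maximum clause length c = 4, and |E| = 500 examples, exhaustive search scores on the order of 100^4 = 10^8 candidate clauses, each against all 500 examples, and as the next slide notes, a single clause evaluation is itself far from constant-time.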

Page 8

Ideas in Speeding Up ILP

• Search algorithm improvements
  - better heuristic functions, search strategy
  - Srinivasan's (2000) random uniform sampling (consider O(1) candidate clauses)

• Faster clause evaluations
  - evaluation time of a clause (on 1 example) is exponential in the number of variables
  - clause reordering & optimization (Blockeel et al. 2002, Santos Costa et al. 2003)

• Evaluation of a candidate is still O(|E|)

Page 9

A Faster Clause Evaluation

• Our idea: predict a clause's evaluation score in O(1) time (i.e., independent of the number of examples)

• Use a multilayer feed-forward neural network to approximately score candidate clauses

• The NN inputs specify which bottom-clause literals are selected

• There is a unique input vector for every candidate clause in the search space

Page 10

Neural Network Topology

Selected literals from ⊥:

containsBlock(ex2,block2B)

isRed(block2A)

onTopOf(block2B,block2A)

[Figure: network inputs, one 0/1 unit per bottom-clause literal (containsBlock(ex2,block2B) = 1, onTopOf(block2B,block2A) = 1, isRed(block2A) = 1, isRound(block2A) = 0), feeding a summation unit (Σ) that produces the predicted output]

Candidate Clause

positive(A) :- containsBlock(A,B),

onTopOf(B,C), isRed(C).
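A minimal sketch of such a scoring network in NumPy; the one-hidden-layer topology, layer sizes, and random (untrained) weights below are assumptions for illustration, not the authors' exact architecture:

    import numpy as np

    rng = np.random.default_rng(0)

    class ClauseScoreNet:
        """Feed-forward net mapping a 0/1 'literal selected?' vector
        (one input per bottom-clause literal) to a predicted score."""
        def __init__(self, n_literals, n_hidden=8):
            # Small random weights; in practice they would be trained by
            # backpropagation on (candidate clause, true score) pairs.
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_literals))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(0.0, 0.1, n_hidden)
            self.b2 = 0.0

        def predict(self, selected):
            h = np.tanh(self.W1 @ selected + self.b1)  # hidden layer
            return self.W2 @ h + self.b2               # linear output: score

    # Bottom clause with 4 literals; candidate selects literals 0, 1, 2
    net = ClauseScoreNet(n_literals=4)
    print(net.predict(np.array([1.0, 1.0, 1.0, 0.0])))

Each prediction costs a fixed number of arithmetic operations, independent of |E|, which is the O(1) evaluation claimed above.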

Page 11

Speeding Up ILP

• The trained neural network provides a tool for approximate clause evaluation in O(1) time

• Given enough examples (large |E|), approximate evaluation is essentially free compared with evaluation on the data

• During ILP's search over the hypothesis space …
  - approximately evaluate every candidate explored
  - only evaluate a clause on the data if it is "promising"
  - adaptive sampling: use real evaluations to improve the approximation during search

Page 12

When to Evaluate Approximated Clauses?

• Treat the neural network's predicted score as the mean of a Gaussian distribution over the true score

• Only evaluate a clause on the data when there is sufficient likelihood that it is the best seen so far, e.g.

[Figure: Gaussians over clause scores for two potential moves from the current hypothesis, with the current best score at 22. The move predicted at 18.9 has P(Best) = 0.24, so it is evaluated on the data; the move predicted at 11.1 has P(Best) = 0.03, so it is not.]
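A sketch of this decision rule using SciPy; the standard deviation σ would be estimated from past prediction errors, and the σ and threshold values here are illustrative assumptions, not numbers from the talk:

    from scipy.stats import norm

    def should_evaluate(pred_score, sigma, best_score, threshold=0.05):
        """Evaluate on data only if the Gaussian around the predicted
        score puts enough mass above the best score seen so far."""
        p_best = norm.sf(best_score, loc=pred_score, scale=sigma)
        return p_best > threshold, p_best

    # Scores from the slide's figure (sigma = 4.0 is a guess)
    print(should_evaluate(18.9, 4.0, 22.0))  # high P(Best): evaluate on data
    print(should_evaluate(11.1, 4.0, 22.0))  # tiny P(Best): skip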

Page 13

Results

• Trained and tested the score learner on four benchmark datasets: Carcinogenesis, Mutagenesis, Protein Metabolism, and Nuclear Smuggling

• Clauses generated by random sampling

• Clause evaluation metric (sketched in code after this list):

  compression = (posCovered - negCovered - length + 1) / totalPositives

• 10-fold cross-validation learning curves
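The compression metric from the slide, written out as a small function (the example numbers are made up for illustration):

    def compression(pos_covered, neg_covered, length, total_positives):
        """compression = (posCovered - negCovered - length + 1) / totalPositives"""
        return (pos_covered - neg_covered - length + 1) / total_positives

    # e.g. a 3-literal clause covering 40 of 50 positives and 5 negatives
    print(compression(40, 5, 3, 50))  # 0.66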

Page 14

Results

[Figure: learning curves plotting 10-fold c.v. RMS error (y-axis, 0.00 to 0.16) against training-set size (x-axis, 0 to 1000) for the Protein Metabolism, Nuclear Smuggling, Mutagenesis, and Carcinogenesis datasets]

Page 15

Future Work

• Test in an ILP system
  - potential for speedup in datasets with many examples
  - will inaccuracy hurt search?

[Figure: predicted score plotted as a surface over the space of clauses]

• The trained network defines a function over the space of candidate clauses

• We can use this function to …
  - extract concepts
  - escape local maxima in heuristic search

Page 16

Acknowledgements

Funding provided by

NLM grant 1T15 LM007359-01

NLM grant 1R01 LM07050-01

DARPA EELD grant F30602-01-2-0571