Experiments with Randomisation and Boosting for Multi-instance Classification

Luke Bjerring, James Foulds, Eibe Frank

University of Waikato

September 13, 2011


DESCRIPTION

A fairly recent development in the WEKA software has been the addition of algorithms for multi-instance classification, in particular, methods for ensemble learning. Ensemble classification is a well-known approach for obtaining highly accurate classifiers for single-instance data. This talk will first discuss how randomisation can be applied to multi-instance data by adapting Blockeel et al.'s multi-instance tree inducer to form an ensemble classifier, and then investigate how Maron's diverse density learning method can be used as a weak classifier to form an ensemble using boosting. Experimental results show the benefit of ensemble learning in both cases.


Page 1: Experiments with Randomisation and Boosting for Multi-instance Classification

September 13, 2011

Experiments with Randomisation and Boosting for Multi-instance Classification

Luke Bjerring, James Foulds, Eibe Frank

University of Waikato

Page 2: Experiments with Randomisation and Boosting for Multi-instance Classification

What's in this talk?

• What is multi-instance learning?

• Basic multi-instance data format in WEKA

• The standard assumption in multi-instance learning

• Learning decision trees and rules

• Ensembles using randomisation

• Diverse density learning

• Boosting diverse density learning

• Experimental comparison

• Conclusions


Page 3: Experiments with Randomisation and Boosting for Multi-instance Classification

Multi-instance learning

• Generalized (supervised) learning scenario where each example for learning is a bag of instances


[Figure, based on a diagram in Dietterich et al. (1997): in the single-instance model, one feature vector maps to a classification; in the multi-instance model, multiple feature vectors together map to one classification]

Page 4: Experiments with Randomisation and Boosting for Multi-instance Classification

Example applications

• Applicable whenever an object can best be represented as an unordered collection of instances

• Two popular application areas in the literature:

−Image classification (e.g. does an image contain a tiger?)

• Approach: image is split into regions, each region becomes an instance described by a fixed-length feature vector

• Motivation for MI learning: location of object not important for classification, some “key” regions determine outcome

−Activity of molecules (e.g. does molecule smell musky?)

• Approach: instances describe possible conformations in 3D space, based on fixed-length feature vector

• Motivation for MI learning: conformations cannot easily be ordered, only some responsible for activity


Page 5: Experiments with Randomisation and Boosting for Multi-instance Classification

Multi-instance data in WEKA

• Bag of instances given as the value of a relation-valued attribute (see the sketch below)


[Screenshot of a multi-instance ARFF file, highlighting the bag identifier, the instances in the bag, and the class label]
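For concreteness, here is a minimal sketch of what such a file looks like (a made-up two-attribute dataset, not one of the datasets used in the talk): the bag identifier is a nominal attribute, the instances are packed into a relational attribute, and the class label comes last.

@relation toy-mi
@attribute bag_id {bag1,bag2}
@attribute bag relational
  @attribute x numeric
  @attribute y numeric
@end bag
@attribute class {0,1}

@data
bag1,'0.1,0.2\n0.5,0.5\n0.9,0.1',1
bag2,'0.2,0.8\n0.7,0.9',0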

Page 6: Experiments with Randomisation and Boosting for Multi-instance Classification

What's the big deal?

• Multi-instance learning is challenging because instance-level classifications are assumed to be unknown

−Algorithm is told that an image contains a tiger, but not which regions are “tiger-like”

−Similarly, a molecule is known to be active (or inactive), but algorithm is not told which conformation is responsible for this

• Basic (standard) assumption in MI learning: bag is positive iff it contains at least one positive instance

−Example: molecule is active if at least one conformation is active, and inactive otherwise

• Generalizations of this are possible that assume interactions between instances in a bag

• Alternative: instances contribute collectively to bag label
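The standard assumption amounts to a logical OR over the (hidden) instance labels. A minimal illustrative snippet (plain Java, not WEKA code; the class and method names are made up for illustration):

public class StandardMIAssumption {

    // A bag is positive iff at least one of its instances is positive.
    // At training time the instance-level labels are hidden; this is only
    // the rule assumed to have generated the observed bag labels.
    static boolean bagLabel(boolean[] instanceLabels) {
        for (boolean positive : instanceLabels) {
            if (positive) {
                return true;   // one positive instance suffices
            }
        }
        return false;          // no positive instance: bag is negative
    }

    public static void main(String[] args) {
        System.out.println(bagLabel(new boolean[] {false, true, false})); // true
        System.out.println(bagLabel(new boolean[] {false, false}));       // false
    }
}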


Page 7: Experiments with Randomisation and Boosting for Multi-instance Classification

A synthetic example

• 10 positive/negative bags, 10 instances per bag


Page 8: Experiments with Randomisation and Boosting for Multi-instance Classification

A synthetic example

• Bag positive iff at least one instance in (0.4,0.6)x(0.4,0.6)
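The slides do not say exactly how this data was generated, but the labelling rule can be reproduced with a simple sampler; the uniform sampling and the seed below are assumptions made for illustration.

import java.util.Random;

// Draws bags of 10 instances uniformly from [0,1]^2 and labels a bag
// positive iff at least one instance falls inside (0.4,0.6) x (0.4,0.6).
public class SyntheticBags {
    public static void main(String[] args) {
        Random rand = new Random(1);
        int positive = 0, negative = 0;
        for (int b = 0; b < 20; b++) {
            boolean bagIsPositive = false;
            for (int i = 0; i < 10; i++) {
                double x = rand.nextDouble();
                double y = rand.nextDouble();
                if (x > 0.4 && x < 0.6 && y > 0.4 && y < 0.6) {
                    bagIsPositive = true;   // standard MI assumption
                }
            }
            if (bagIsPositive) positive++; else negative++;
        }
        System.out.println(positive + " positive, " + negative + " negative bags");
    }
}

To obtain exactly 10 positive and 10 negative bags as on the slide, one would keep sampling bags of each class until both counts are reached.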


Page 9: Experiments with Randomisation and Boosting for Multi-instance Classification

Assigning bag labels to instances...

• 100 positive/negative bags, 10 instances per bag


Page 10: Experiments with Randomisation and Boosting for Multi-instance Classification

Partitioning generated by C4.5

• Many leaf nodes, only one of them matters...


Page 11: Experiments with Randomisation and Boosting for Multi-instance Classification

Partitioning generated by C4.5

• Many leaf nodes, only one of them matters...


Page 12: Experiments with Randomisation and Boosting for Multi-instance Classification

Blockeel et al.'s MITI tree learner

• Idea: home in on big positive leaf node, remove instances associated with that leaf node

y <= 0.3942 : 443 [0 / 443] (-)
y > 0.3942 : 1189
| y <= 0.6004 : 418
| | x <= 0.6000 : 262
| | | x <= 0.3676 : 59 [0 / 59] (-)
| | | x > 0.3676 : 128
| | | | x <= 0.3975 : 2 [0 / 2] (-)
| | | | x > 0.3975 : 118
| | | | | y <= 0.3989 : 1 [0 / 1] (-)
| | | | | y > 0.3989 : 116 [116 / 0] (+)
| | x > 0.6000 : 88 [0 / 88] (-)
| y > 0.6004 : 407 [0 / 407] (-)

• Blockeel et al. proposed tree learner MITI for multi-instance data, with two key modifications:

• Nodes are expanded in best-first manner, based on proportion of positive instances (→ identify positive leaf nodes early)

• Once a positive leaf node has been found, all bags associated with this leaf node are removed from the training data (→ all other instances in these bags are possible false positives)


Page 13: Experiments with Randomisation and Boosting for Multi-instance Classification

How MITI works

• Two key modifications compared to standard top-down decision tree inducers:

‒Nodes are expanded in best-first manner, based on the proportion of positive instances (→ identify positive leaf nodes early)

‒Once a positive leaf node has been found, all bags associated with this leaf node are removed from the training data (→ all other instances in these bags are irrelevant)

• Blockeel et al. also use special purpose splitting criterion and biased estimate of proportion of positives

• Our experiments indicate that it is better to use Gini index and unbiased estimate of proportion

→Trees are generally slightly more accurate and substantially smaller (also affects runtime)
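For reference, a single MITI tree can be built in WEKA roughly as follows (a sketch assuming the multiInstanceLearning package, which provides weka.classifiers.mi.MITI, is installed; the file name is a placeholder for any multi-instance ARFF file):

import weka.classifiers.mi.MITI;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BuildMITI {
    public static void main(String[] args) throws Exception {
        // Load a multi-instance dataset (relational bag attribute, class attribute last)
        Instances data = DataSource.read("toy-mi.arff");
        data.setClassIndex(data.numAttributes() - 1);

        MITI miti = new MITI();     // default settings
        miti.buildClassifier(data);
        System.out.println(miti);   // prints the induced tree, as on the previous slide
    }
}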


Page 14: Experiments with Randomisation and Boosting for Multi-instance Classification

Learning rules: MIRI

• Conceptual drawback of MITI tree learner: deactivated data may have already been used to grow other branches

• Simple fix based on separate-and-conquer rule learning using partial trees:

‒When a positive leaf is found, make the path to this leaf into an if-then rule and discard the rest of the tree

‒Start (partial) tree generation from scratch on the remaining data to generate the next rule

‒Stop when no positive leaf can be made; add a default rule

• Experiments show: resulting rule learner (MIRI) has similar classification accuracy to MITI

• However: rule sets are much more compact than corresponding decision trees


Page 15: Experiments with Randomisation and Boosting for Multi-instance Classification

Random forests for MI learning

• Random forests are well-known to be high-performance ensemble classifiers in single-instance learning

• Straightforward to adapt MITI to learn semi-random decision trees from multi-instance data

– At each node, choose random fixed-size subset of attributes, then choose best split amongst those

– Also possible to apply semi-random node expansion (not best-first), but this yields little benefit

• Can trivially apply this to MIRI rule learning as well: it's based on partially grown MITI trees

• Ensemble can be generated in WEKA using RandomCommittee meta classifier
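A sketch of what this looks like programmatically, again assuming the multiInstanceLearning package is installed; options are left at their defaults, and MITI's own option for choosing a random attribute subset at each node needs to be switched on (see its option list) so that the committee members actually differ.

import weka.classifiers.meta.RandomCommittee;
import weka.classifiers.mi.MITI;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MITIForest {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("toy-mi.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Each committee member is a semi-random MITI tree built with a
        // different random seed; the members' probability estimates are averaged.
        RandomCommittee committee = new RandomCommittee();
        committee.setClassifier(new MITI());
        committee.setNumIterations(100);
        committee.buildClassifier(data);
        System.out.println(committee);
    }
}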


Page 16: Experiments with Randomisation and Boosting for Multi-instance Classification

Some experimental results: MITI


Page 17: Experiments with Randomisation and Boosting for Multi-instance Classification

Some experimental results: MIRI


Page 18: Experiments with Randomisation and Boosting for Multi-instance Classification

Maron's diverse density learning

• Idea: identify point x in instance space where positive bags overlap, centre bell-shaped function at this point

• Using this function, the probability that instance B_ij is positive, given the current hypothesis h, is assumed to be:

\Pr(\mathrm{positive} \mid B_{ij}, h) = \exp\Bigl(-\sum_{k} s_k^2 \,(B_{ijk} - x_k)^2\Bigr)

where the hypothesis h includes the location x, but also a feature scaling vector s

• Instance-level probabilities are turned into bag-level probabilities using the noisy-or function:

\Pr(\mathrm{positive} \mid B_i, h) = 1 - \prod_{j}\bigl(1 - \Pr(\mathrm{positive} \mid B_{ij}, h)\bigr)


Page 19: Experiments with Randomisation and Boosting for Multi-instance Classification

Boosting diverse density learning

• Point x and scaling vector s are found using gradient descent, by maximising the bag-level likelihood (see below)

• Problem: very slow; takes a very long time to converge

• QuickDD heuristic: find best point x first, using fixed scaling vector s, then optimise s; if necessary, iterate

• Much faster, similar accuracy on benchmark data (also, compares favourably to subsampling-based EMDD)

• Makes it computationally practical to apply boosting (RealAdaboost) to improve accuracy:

– In this case, QuickDD is applied with weighted likelihood, symmetric learning, and localised model
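Written out, the bag-level log-likelihood that is maximised over the hypothesis h = (x, s), using the noisy-or probabilities from the previous slide, is the standard diverse-density objective (the slides leave this implicit):

\log L(h) = \sum_{i:\, y_i = 1} \log \Pr(\mathrm{positive} \mid B_i, h) \;+\; \sum_{i:\, y_i = 0} \log\bigl(1 - \Pr(\mathrm{positive} \mid B_i, h)\bigr)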


Page 20: Experiments with Randomisation and Boosting for Multi-instance Classification

Some experimental results: Boosted DD


Page 21: Experiments with Randomisation and Boosting for Multi-instance Classification

So how do the ensembles compare?


Page 22: Experiments with Randomisation and Boosting for Multi-instance Classification

But: improvement on “naive” methods?

• Can apply standard single-instance random forests to multi-instance data using data transformations...
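One way to do this in WEKA is via a propositionalisation wrapper; the sketch below uses MIWrapper, which assigns each instance its bag's class label, trains a single-instance classifier, and averages the instance-level predictions within each bag (SimpleMI, which condenses each bag into a single summary vector, is another option). It assumes the relevant multi-instance package is installed.

import weka.classifiers.mi.MIWrapper;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveMIForest {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("toy-mi.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // "Naive" approach: transform away the bag structure and run a
        // standard single-instance random forest on the result.
        MIWrapper wrapper = new MIWrapper();
        wrapper.setClassifier(new RandomForest());
        wrapper.buildClassifier(data);
    }
}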


Page 23: Experiments with Randomisation and Boosting for Multi-instance Classification

Summary

• MITI and MIRI are fast methods for learning compact decision trees and rule sets for MI data

• Randomisation for ensemble learning yields significantly improved accuracy in both cases

• Heuristic QuickDD variant of diverse density learning makes it computationally practical to boost DD learning

• Boosting yields substantially improved accuracy

• Neither boosting nor randomisation has a clear advantage in accuracy, but randomisation is much faster

• However: only a marginal improvement in accuracy compared to “naive” methods


Page 24: Experiments with Randomisation and Boosting for Multi-instance Classification

Where in WEKA?

• Location of the multi-instance learners in the Explorer GUI: under weka.classifiers.mi in the classifier chooser

• In WEKA 3.7 they are available via the package manager, which also provides MITI, MIRI, and QuickDD


Page 25: Experiments with Randomisation and Boosting for Multi-instance Classification

Details on QuickDD for RealAdaboost

• Weights in RealAdaboost are updated using the odds ratio given by the weak model's class probability estimate: with f_m(B_i) = \frac{1}{2}\ln\frac{\Pr_m(\mathrm{positive} \mid B_i)}{1 - \Pr_m(\mathrm{positive} \mid B_i)}, bag weights are updated as w_i \leftarrow w_i \exp(-y_i f_m(B_i)), where y_i \in \{-1, +1\}

• Weighted conditional likelihood is used when fitting QuickDD within the ensemble: \sum_i w_i \log \Pr(y_i \mid B_i, h) is maximised rather than the unweighted likelihood

• QuickDD model is thresholded at 0.5 probability to achieve a local effect on weight updates: predicted probabilities below 0.5 are raised to 0.5, so bags outside the model's region of influence receive zero log-odds and their weights remain unchanged

• Symmetric learning is applied (i.e. both classes are tried as the positive class in turn)

– Of the two models, the one that maximises weighted conditional likelihood is added into the ensemble


Page 26: Experiments with Randomisation and Boosting for Multi-instance Classification

Random forest vs. bagging and boosting
