Multiple Classifier System
Farzad Vasheghani Farahani – Machine Learning
Outline
Introduction
Decision Making
General Idea
Brief History
Reasons & Rationale
Statistical
Large volumes of data
Too little data
Divide and Conquer
Data Fusion
Multiple Classifier System
Designing
Diversity
Create an Ensemble
Combining Classifiers
Example
Conclusions
References
Ensemble-based Systems in Decision Making
For many tasks, we often seek a second opinion before making a decision, sometimes many more:
Consulting different doctors before a major surgery
Reading reviews before buying a product
Requesting references before hiring someone
We consider decisions of multiple experts in our daily lives
Why not follow the same strategy in automated decision making?
Multiple classifier systems, committee of classifiers, mixture of experts,
ensemble based systems
Ensemble-based Classifiers
How to (i) generate the individual components of the ensemble system
(base classifiers), and (ii) combine the outputs of the individual
classifiers?
Brief History of Ensemble Systems
Dasarathy and Sheela (1979) partitioned the feature space using two
or more classifiers
Schapire (1990) proved that a strong classifier can be generated by
combining weak classifiers through boosting; this was the predecessor
of the AdaBoost algorithm
Two types of combination:
classifier selection
classifier fusion
Why Ensemble Based Systems?
1. Statistical reasons
A set of classifiers with similar training performances may have different generalization performances
Combining outputs of several classifiers reduces the risk of selecting a poorly performing classifier
Example:
Suppose there are 25 base classifiers
Each classifier has an error rate ε = 0.35
Probability that the ensemble classifier (majority vote) makes a wrong prediction:

P(error) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
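The 0.06 figure follows from a binomial tail: under the assumption that the 25 classifiers err independently, a majority vote is wrong only when 13 or more individual classifiers are wrong. A quick stdlib check:

```python
from math import comb

eps = 0.35  # error rate of each base classifier
n = 25      # number of base classifiers

# Majority vote fails when at least 13 of the 25 classifiers err;
# with independent errors this is a binomial tail probability.
p = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
# p ≈ 0.06, far below the individual error rate of 0.35
```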
Why Ensemble Based Systems?
2. Large volumes of data
If the amount of data to be analyzed is too large, a single classifier
may not be able to handle it; train different classifiers on
different partitions of data
Why Ensemble Based Systems?
3. Too little data
Ensemble systems can also be used when there is too little data;
resampling techniques
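Bootstrapping is the standard such resampling technique: each base classifier trains on a replicate of the small data set drawn with replacement. A minimal sketch (the function name is illustrative):

```python
import random

def bootstrap_replicates(data, n_replicates, seed=0):
    """Draw n_replicates bootstrap samples, each the size of the
    original data set, sampling with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_replicates)]

data = list(range(10))  # a small data set
replicates = bootstrap_replicates(data, n_replicates=5)
# each replicate has 10 items, some repeated, some left out
```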
Why Ensemble Based Systems?
4. Divide and Conquer
Divide data space into smaller & easier-to-learn partitions; each classifier learns only one of the simpler partitions
Why Ensemble Based Systems?
5. Data Fusion
Given several sets of data from various sources, where the nature
of the features differs (heterogeneous features), training a single
classifier may not be appropriate (e.g., MRI data, EEG recordings,
blood tests, ..)
Multiple Classifier System Design
Major Steps
All ensemble systems must have two key components:
Generate component classifiers of the ensemble
Method for combining the classifier outputs
“Diversity” of Ensemble
Objective: create many classifiers, and combine their outputs
to improve the performance of a single classifier
Intuition: if each classifier makes different errors, then their
strategic combination can reduce the total error!
Need base classifiers whose decision boundaries are adequately
different from those of others
Such a set of classifiers is said to be “diverse”
How to achieve classifier diversity?
A. Use different training sets to train individual classifiers
B. Use different training parameters for a classifier
C. Different types of classifiers (MLPs, decision trees, nearest-neighbor
classifiers, SVMs) can be combined for added diversity
D. Use random feature subsets, called the random subspace
method
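Option D can be sketched in a few lines: assign each base classifier its own random subset of the features (all names below are illustrative).

```python
import random

def random_subspaces(features, n_classifiers, subspace_size, seed=0):
    """Pick a random feature subset for each base classifier
    (the random subspace method)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(features, subspace_size))
            for _ in range(n_classifiers)]

features = ["hair", "feathers", "eggs", "milk", "legs", "tail"]
subspaces = random_subspaces(features, n_classifiers=4, subspace_size=3)
# each classifier would then be trained on only its 3 assigned features
```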
Create an Ensemble (Coverage Optimization)
Creating An Ensemble
Two questions:
1. How will the individual classifiers be generated?
2. How will they differ from each other?
Ensemble Creation Methods
1. Subsample Approach (Data sample)
Bagging
Random forest
Boosting
AdaBoost
Wagging
Rotation forest
RotBoost
Mixture of Experts
2. Subspace Approach (Feature Level)
Random based
Feature reduction
Performance based
3. Classifier Level Approach
Bagging
Boosting
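As a concrete instance of the subsample approach, here is a minimal bagging sketch (stdlib only; the decision-stump base learner and all names are illustrative, not from the slides): each stump trains on a bootstrap replicate, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a 1-D decision stump: choose the threshold (among the
    sample's x-values) that misclassifies the fewest points,
    predicting 1 for x >= threshold."""
    best_errs, best_pred = None, None
    for t, _ in sample:
        pred = lambda x, t=t: 1 if x >= t else 0
        errs = sum(pred(x) != y for x, y in sample)
        if best_errs is None or errs < best_errs:
            best_errs, best_pred = errs, pred
    return best_pred

def bagging(train, n_estimators=11, seed=0):
    """Train one stump per bootstrap replicate; combine by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        replicate = [rng.choice(train) for _ in train]
        stumps.append(fit_stump(replicate))
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]

train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
model = bagging(train)
# model(0.05) -> 0, model(0.95) -> 1
```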
Combining Classifiers (Decision Optimization)
Two Important Concepts (i)
(i) trainable vs. non-trainable
Trainable rules: parameters of the combiner, called “weights”, are determined through a separate training algorithm
Non-trainable rules: combination parameters are available as soon as the classifiers are generated; weighted majority voting is an example
Two Important Concepts (ii)
(ii) Type of the output of the classifiers:
Absolute (label) output: Majority Voting, Naïve Bayes, Behavior Knowledge Space
Ranked output: Borda Count, Maximum Ranking
Continuous output: Algebraic Methods, Fuzzy Integral, Decision Templates
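Two of these combiners are simple enough to sketch directly (stdlib only; function names are illustrative): plurality voting over label outputs, and the Borda count over ranked outputs.

```python
from collections import Counter

def majority_vote(labels):
    """Combine absolute (label) outputs: return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def borda_count(rankings):
    """Combine ranked outputs: position i in a ranking of k classes
    earns k - 1 - i points; the highest-scoring class wins."""
    k = len(rankings[0])
    scores = Counter()
    for ranking in rankings:
        for i, label in enumerate(ranking):
            scores[label] += k - 1 - i
    return max(scores, key=scores.get)

winner = majority_vote(["cat", "dog", "cat"])   # -> "cat"
ranked = borda_count([["cat", "dog", "fish"],
                      ["dog", "cat", "fish"],
                      ["cat", "fish", "dog"]])  # -> "cat"
```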
Example (“Zoo” UCI Data Set)
1. animal name: unique for each instance
2. hair: Boolean
3. feathers: Boolean
4. eggs: Boolean
5. milk: Boolean
6. airborne: Boolean
7. aquatic: Boolean
8. predator: Boolean
9. toothed: Boolean
10. backbone: Boolean
11. breathes: Boolean
12. venomous: Boolean
13. fins: Boolean
14. legs: numeric (set of values: {0, 2, 4, 5, 6, 8})
15. tail: Boolean
16. domestic: Boolean
17. catsize: Boolean
18. type: numeric (integer values in range [1, 7])
Conclusions
Ensemble systems are useful in practice
Diversity of the base classifiers is important
Ensemble generation techniques: bagging, AdaBoost, mixture of
experts
Classifier combination strategies: algebraic combiners, voting
methods, and decision templates
No single ensemble generation algorithm or combination rule is
universally better than others
Effectiveness on real world data depends on the classifier diversity
and characteristics of the data
References
[1] R. Polikar, “Ensemble Based Systems in Decision Making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[2] R. Polikar, “Bootstrap Inspired Techniques in Computational Intelligence,” IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 56-72, 2007.
[3] R. Polikar, “Ensemble Learning,” Scholarpedia, 2008.
[4] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. New York, NY: Wiley, 2004.