Fuzzy-Rough Instance Selection


Richard Jensen and Chris Cornelis

Chris Cornelis, Ghent University, Belgium

Richard Jensen, Aberystwyth University, UK


Outline

• The importance of instance selection

• Rough set theory

• Fuzzy-rough sets

• Fuzzy-rough instance selection

• Experimentation

• Conclusion


Instance selection

• Knowledge discovery

• The problem of too much data
  • Requires storage
  • Intractable for data mining algorithms

• Removing data that is noisy or irrelevant


Rough set theory

Rx is the set of all points that are indiscernible with point x

[Figure: set A with its lower and upper approximations, built from equivalence classes Rx]
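As a concrete illustration of the picture above, here is a minimal Python sketch of crisp rough-set approximations. Objects that share identical values on the chosen features are indiscernible and form an equivalence class Rx; the lower approximation of A is the union of the classes lying entirely inside A, and the upper approximation is the union of the classes that overlap A. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, features):
    """Group objects with identical values on `features` (the classes Rx)."""
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[f] for f in features)
        classes[key].add(name)
    return list(classes.values())


def approximations(objects, features, target):
    """Return the (lower, upper) approximations of the crisp set `target`."""
    lower, upper = set(), set()
    for eq in equivalence_classes(objects, features):
        if eq <= target:   # class lies entirely inside A
            lower |= eq
        if eq & target:    # class overlaps A
            upper |= eq
    return lower, upper


# Hypothetical toy data: four objects described by two discrete features.
objects = {
    "o1": {"colour": "red", "size": "small"},
    "o2": {"colour": "red", "size": "small"},
    "o3": {"colour": "blue", "size": "large"},
    "o4": {"colour": "blue", "size": "small"},
}
A = {"o1", "o3"}
lower, upper = approximations(objects, ["colour", "size"], A)
print(sorted(lower), sorted(upper))  # ['o3'] ['o1', 'o2', 'o3']
```

Here o1 and o2 are indiscernible, so o1 can only be said to *possibly* belong to A: it lands in the upper approximation but not the lower one.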


Fuzzy-rough sets

• Approximate equality

• Handle real-valued features via fuzzy tolerance relations instead of crisp equivalence

• Better noise and uncertainty handling

• Focus has been on feature selection, not instance selection


Fuzzy-rough sets

• Parameterized relation

• Fuzzy-rough definitions:
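Below is a minimal sketch of the ingredients named on this slide. It uses forms that are common in the fuzzy-rough literature, which may differ from the exact definitions in the talk; all of them are assumptions here. The per-feature relation is R_a(x, y) = max(0, 1 - alpha * |a(x) - a(y)| / range(a)), combined across features with min. The parameter name `alpha` is hypothetical, and larger values make the relation stricter. The lower approximation is (R↓A)(y) = inf_x I(R(x, y), A(x)) with the Łukasiewicz implicator I(a, b) = min(1, 1 - a + b). The upper approximation is (R↑A)(y) = sup_x T(R(x, y), A(x)) with the Łukasiewicz t-norm T(a, b) = max(0, a + b - 1).

```python
def similarity(x, y, ranges, alpha=1.0):
    """Assumed parameterized fuzzy tolerance relation R(x, y).

    Per feature: max(0, 1 - alpha * |x_a - y_a| / range_a); features combined with min.
    """
    sims = []
    for xa, ya, ra in zip(x, y, ranges):
        diff = abs(xa - ya) / ra if ra > 0 else 0.0
        sims.append(max(0.0, 1.0 - alpha * diff))
    return min(sims)


def lukasiewicz_implicator(a, b):
    return min(1.0, 1.0 - a + b)


def lukasiewicz_tnorm(a, b):
    return max(0.0, a + b - 1.0)


def lower_upper(data, membership, alpha=1.0):
    """Fuzzy-rough lower and upper approximations of the fuzzy set `membership`."""
    ranges = [max(col) - min(col) for col in zip(*data)]
    lower, upper = [], []
    for y in data:
        rel = [similarity(x, y, ranges, alpha) for x in data]
        lower.append(min(lukasiewicz_implicator(r, m) for r, m in zip(rel, membership)))
        upper.append(max(lukasiewicz_tnorm(r, m) for r, m in zip(rel, membership)))
    return lower, upper


data = [[0.0], [0.2], [1.0]]   # one real-valued feature (toy values)
in_A = [1.0, 1.0, 0.0]         # crisp class A, written as a fuzzy set
low, up = lower_upper(data, in_A, alpha=1.0)
print([round(v, 2) for v in low], [round(v, 2) for v in up])
# [1.0, 0.8, 0.0] [1.0, 1.0, 0.2]
```

Because the object at 0.2 is slightly similar to the non-A object at 1.0, it belongs to the lower approximation of A only to degree 0.8. Raising `alpha` to 2.0 makes the relation stricter, so that similarity drops to 0 and the object enters the lower approximation fully.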


Instance selection: basic idea

[Figure: example data in which some objects are marked "Not needed"]

Remove objects while keeping the underlying approximations unchanged
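One way to make this idea concrete is shown in the minimal Python sketch below. It rests on assumptions that the slides do not state. The approximations being preserved are taken to be each object's membership to the fuzzy-rough positive region, meaning the lower approximation of its own decision class. That membership is computed with the Łukasiewicz implicator and an assumed min-combined relation with a hypothetical parameter `alpha`. The helpers `positive_region` and `select_instances` are hypothetical, and the single greedy pass is only an illustration. It is not FRIS-I, FRIS-II or FRIS-III, whose definitions are not given here.

```python
def similarity(x, y, ranges, alpha=1.0):
    """Assumed relation: per-feature max(0, 1 - alpha*|x_a - y_a|/range_a), combined with min."""
    sims = [max(0.0, 1.0 - alpha * (abs(xa - ya) / ra if ra > 0 else 0.0))
            for xa, ya, ra in zip(x, y, ranges)]
    return min(sims)


def positive_region(data, labels, keep, alpha=1.0):
    """POS(y) for every original object y, using only objects whose index is in `keep`.

    With crisp classes and the Lukasiewicz implicator I(r, c) = min(1, 1 - r + c),
    same-class objects contribute I(r, 1) = 1, so POS(y) is 1 minus the similarity
    of y to its most similar retained object from another class (1.0 if none remain).
    """
    ranges = [max(col) - min(col) for col in zip(*data)]
    pos = []
    for y, ly in zip(data, labels):
        pos.append(min((1.0 - similarity(data[i], y, ranges, alpha)
                        for i in keep if labels[i] != ly), default=1.0))
    return pos


def select_instances(data, labels, alpha=1.0, eps=1e-9):
    """Hypothetical greedy pass: drop an object if no POS(y) changes without it."""
    keep = list(range(len(data)))
    target = positive_region(data, labels, keep, alpha)
    for i in range(len(data)):
        trial = [j for j in keep if j != i]
        pos = positive_region(data, labels, trial, alpha)
        if all(abs(p - t) <= eps for p, t in zip(pos, target)):
            keep = trial
    return keep


data = [[0.0], [0.1], [0.2], [0.9], [1.0]]   # toy one-feature data
labels = ["a", "a", "a", "b", "b"]
print(select_instances(data, labels))  # [2, 3]
```

On this toy data, only objects 2 and 3 survive. They are the closest pair from opposite classes, and every other object's positive-region membership is fixed by one of them, so removing the rest leaves all memberships unchanged.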


FRIS-I


FRIS-II


FRIS-III


Experimentation: setup


Results: FRIS-I (heart)

• (214 objects, 9 features)


Results: FRIS-II (heart)


Results: FRIS-III (heart)


Conclusion

• Proposed new techniques for instance selection based on fuzzy-rough sets

• Reduced the number of instances significantly while retaining classification accuracy

• Future work
  • Many possibilities for novel fuzzy-rough instance selection methods
  • Comparisons with non-rough techniques
  • Reducing the computational complexity of FRIS-III
  • Combined instance/feature selection


• WEKA implementations of all fuzzy-rough methods can be downloaded from:

http://users.aber.ac.uk/rkj/book/weka.zip
