14
1 Intelligent Database Systems Lab 國國國國國國國國 National Yunlin University of Science and T echnology Knowledge discovery with classification rules in a cardiovascular dataset Advisor : Dr. Hsu Presenter : Zih-Hui Lin Author :Viii Podgorelec a,*, Pete r Kokol a, Milojka Molan Sti81ic b, Ma rjan Heri :ko a, Ivan Rozrnan a Computer Methods and Programs in Biomedicine, Volume 80, Supplement 1, December 2005, Pages S39-S49

Knowledge discovery with classification rules in a cardiovascular dataset

  • Upload
    kimball

  • View
    31

  • Download
    0

Embed Size (px)

DESCRIPTION

Knowledge discovery with classification rules in a cardiovascular dataset. Advisor : Dr. Hsu Presenter : Zih-Hui Lin Author :Viii Podgorelec a,*, Peter Kokol a, Milojka Molan Sti81ic b, Marjan Heri :ko a, Ivan Rozrnan a. - PowerPoint PPT Presentation

Citation preview

Page 1: Knowledge discovery with classification rules in a cardiovascular dataset

1Intelligent Database Systems Lab

國立雲林科技大學National Yunlin University of Science and Technology

Knowledge discovery with classification rules in a cardiovascular dataset

Advisor : Dr. Hsu

Presenter : Zih-Hui Lin

Author :Viii Podgorelec a,*, Peter Kokol a, Milojka Molan Sti81ic b, Marjan Heri :ko a, Ivan Rozrnan a

Computer Methods and Programs in Biomedicine, Volume 80, Supplement 1, December 2005, Pages S39-S49

Page 2: Knowledge discovery with classification rules in a cardiovascular dataset

2

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Motivation Objective The AREX algorithm Experiment Conclusions

Outline

Page 3: Knowledge discovery with classification rules in a cardiovascular dataset

3

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Motivation

Modern medicine generates huge amounts of data and there is an acute and widening gap between data collection and data comprehension.

it is very difficult for a human to make use of such amount of information (i.e. hundreds of attributes, thousand of images, several channels of 24 hours of ECG or EEG signals)

Page 4: Knowledge discovery with classification rules in a cardiovascular dataset

4

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Objective

confirm their existing knowledge about medical problem (ie. hundreds of attributes)

enable searching for new facts, which should reveal some new interesting patterns and possibly improve the existing medical knowledge.

Page 5: Knowledge discovery with classification rules in a cardiovascular dataset

5

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

The AREX algorithm

s

N

1.2Classify object with nt randomly chosen trees

S*

1.4From all N decision create M initial classification rules

1 multi-population self-adapting genetic algorithm for the induction of decision trees.

2 evolution of programs in an arbitrary programming language, which is used to evolve classification

3 an optimal set of classification rules is determined with a simple genetic algorithm

12

3

1.3if frequency of the most frequent decision class classified by nt trees > nt - ct

Oi

(ct=nt/2)

2.1create m/2+1 rules (randomly)

2.3 If s is not empty•Add |s| randomly chosen objects from s* to s•ct=ct+1•repeat 1.1

1.1Build N decision trees upon objects from S

2.2

Page 6: Knowledge discovery with classification rules in a cardiovascular dataset

6

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

2. Select an attribute Xi

Genetic algorithm for the construction of decision trees

population

1.Number of attribute nodes M that will be in the tree

M attributes

Xi

null null

root

null nullXi

null null

4. Randomly select an attribute Xi( 還沒被選過的機率較高 )

•For each empty leaf the following algorithmdetermines the appropriate decision class

3. 選一空節點, (tree 深度愈高,選中機率愈低 )

Xi

null null

(1)Continuous attributes→split constant(2)Discrete attributes →randomly defined two disjunctive sets

Page 7: Knowledge discovery with classification rules in a cardiovascular dataset

7

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.proGenesys system & Finding the optimal set of rules

Page 8: Knowledge discovery with classification rules in a cardiovascular dataset

8

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Introduction

Advantage ─ transparency of the classification process that one can

easily interpret, understand and criticize.

Disadvantages─ poor processing of incomplete, noisy data,─ inability to build several trees for the same dataset─ inability to use the preferred attributes, etc.

Page 9: Knowledge discovery with classification rules in a cardiovascular dataset

9

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Dataset contains data of 100 patients from Maribor Hospital. The attributes include

─ general data (age, sex, etc.)─ health status (data from family history and child's previous illnesses), ─ general cardiovascular data (blood pressure, pulse, chest pain, etc.) ─ more specialized cardiovascular data - data from child's cardiac history

and clinical examinations (with findings of ultrasound, ECG, etc.). dataset five different diagnoses are possible:

─ innocent heart murmur良性雜音─ congenital heart disease with left-to-right shunt先天性心臟病 (左向右分流 )

─ aortic valve disease with aorta coarctation,主動脈辨疾病 (主動脈縮窄 )

─ arrhythmias心律不整─ chest pain.心悸

Page 10: Knowledge discovery with classification rules in a cardiovascular dataset

10

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Classification result –training set

Overfitting

Page 11: Knowledge discovery with classification rules in a cardiovascular dataset

11

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Classification result –testing set

Page 12: Knowledge discovery with classification rules in a cardiovascular dataset

12

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Classification result

Page 13: Knowledge discovery with classification rules in a cardiovascular dataset

13

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.Conclusions

One of the most evident advantages of AREX is the simultaneous very good ─ Generalization → high and similar overall accuracy on both

training set and test set─ Specialization → high and very similar accuracy of all decision

classes, also the least frequent ones.

equip physicians with a powerful technique to─ (1) confirm their existing knowledge about some medical

problem─ (2) enable searching for new facts, which should reveal some new

interesting patterns and possibly improve the existing medical knowledge.

Page 14: Knowledge discovery with classification rules in a cardiovascular dataset

14

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.My opinion

Advantage: 依屬性給予權重 Disadvantage: Apply: 台中醫院?