


ZEROTH REVIEW

SUBANYA.B

10CSR021

LAVANYA.M

10CSL149

RAJA.R

10CSR025

PROJECT GUIDE: Dr. R.R. RAJALAXMI


INTRODUCTION

DATA MINING

• Data mining is the process of extracting knowledge from large amounts of data

• Also referred to as Knowledge Discovery in Databases (KDD)

BASIC DATA MINING TASKS

• Predictive: Classification, Regression, Time Series Analysis, Prediction

• Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery

CLASSIFICATION

• Predicts categorical class labels

• Constructs a model from the training set and uses it to classify new data

• An algorithm that implements classification is known as a classifier
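To make the classification bullets concrete, here is a minimal sketch of one very simple classifier. It is a nearest-centroid model, chosen only for illustration; it is not one of the methods in the surveyed papers. The "model" built from the training set is one mean vector per class, and a new item receives the label of the closest centroid. The data, the class names, and the function names `fit_centroids` and `predict` are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Build the model from the training set: one mean vector (centroid) per class label."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, x):
    """Predict the categorical label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical training set: two numeric features, two class labels.
X_train = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.4, 3.9]]
y_train = ["healthy", "healthy", "disease", "disease"]
model = fit_centroids(X_train, y_train)
print(predict(model, [0.9, 1.1]))  # healthy
print(predict(model, [4.1, 4.0]))  # disease
```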

DIMENSIONALITY REDUCTION

FEATURE EXTRACTION

• Linear

• Non-linear

FEATURE SELECTION

• Feature Ranking

• Subset Selection

• Approaches: Filter, Wrapper, Embedded
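A filter approach ranks features using a statistic of the data alone, without training a classifier. The sketch below ranks features by the absolute Pearson correlation between each feature and the class label. That criterion is an assumption made for illustration; the slides do not name a specific ranking statistic, and real filter methods also use mutual information, chi-square, and similar scores.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def rank_features(X, y):
    """Filter-style ranking: score each feature column independently against the label."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical data: feature 1 is constant (useless), feature 2 is strongly (negatively) related to y.
X = [[1, 7, 9], [2, 7, 8], [3, 7, 2], [4, 7, 3]]
y = [0, 0, 1, 1]
order, scores = rank_features(X, y)
print(order)  # [2, 0, 1]
```

Taking the absolute value matters here. Feature 2 is negatively correlated with the label, but it is still the most informative feature.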


LITERATURE SURVEY

• Reducing bioinformatics data dimension with ABC-KNN

• Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases 

• Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients


PAPER-I

Reducing bioinformatics data dimension with ABC-KNN

• Authors: Thananan Prasartvit

Anan Banharnsakun

Boonserm Kaewkamnerdpong

Tiranee Achalakul

• Year: 2013


PROBLEM

• Analyzing a large amount of data often consumes extensive computational resources and execution time

• Not all features contribute equally to the end results

• The major contributing features need to be identified so that features with low contribution can be eliminated

• The need for dimension reduction arises because biological data can be massive, with tens of thousands of features to be explored

• The objective is to design an effective algorithm that can selectively remove irrelevant dimensions from data while preserving the semantics of the original data.


PROPOSED WORK

• Proposes the Artificial Bee Colony (ABC) algorithm as a method for data dimension reduction in classification problems

• The K-Nearest Neighbor (KNN) method is used for fitness evaluation within the ABC framework

• ABC feature selection wrapped with KNN for classification (ABC-KNN)

Artificial Bee Colony (ABC)

Begin
    Initialize_Solutions
    Repeat
        // Employed bees process
        Updating_Feasible_Solutions
        // Onlooker bees process
        Selecting_Feasible_Solutions
        Updating_Feasible_Solutions
        // Scout bees process
        Avoiding_Sub-Optimal_Solutions
    Until (maximum number of iterations or the stopping criterion is met)
End
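The ABC loop above can be sketched for feature selection as follows. This is a sketch under several stated assumptions and not the authors' ABC-KNN. Each food source is a binary mask over the features. "Updating" flips one random bit and keeps the change only if fitness improves (greedy selection). Onlookers pick sources in proportion to fitness. Scouts replace a source that has failed to improve more than `limit` times. Fitness is leave-one-out 1-NN accuracy on the selected features, a simple stand-in for the KNN fitness evaluation described in the bullets. The parameter values, the helper names, and the toy data are hypothetical.

```python
import random
from math import dist


def loo_1nn_accuracy(X, y, mask):
    """Fitness (assumption): leave-one-out 1-NN accuracy using only features whose bit is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    proj = [[row[j] for j in idx] for row in X]
    correct = 0
    for i, p in enumerate(proj):
        nearest = min((k for k in range(len(proj)) if k != i), key=lambda k: dist(p, proj[k]))
        correct += y[nearest] == y[i]
    return correct / len(X)


def abc_feature_selection(X, y, n_sources=6, limit=5, max_iter=30, seed=0):
    rng = random.Random(seed)
    n_feat = len(X[0])

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_feat)]

    sources = [random_mask() for _ in range(n_sources)]  # Initialize_Solutions
    fitness = [loo_1nn_accuracy(X, y, s) for s in sources]
    trials = [0] * n_sources
    best_i = max(range(n_sources), key=lambda i: fitness[i])
    best_mask, best_fit = sources[best_i][:], fitness[best_i]

    def update(i):
        # Updating_Feasible_Solutions: flip one bit, keep it only if fitness improves.
        cand = sources[i][:]
        j = rng.randrange(n_feat)
        cand[j] = 1 - cand[j]
        f = loo_1nn_accuracy(X, y, cand)
        if f > fitness[i]:
            sources[i], fitness[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_sources):  # employed bees: one per food source
            update(i)
        weights = fitness if sum(fitness) > 0 else None  # uniform choice if all fitness is 0
        for _ in range(n_sources):  # onlooker bees: fitness-proportional selection
            update(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):  # scout bees: abandon sources that stopped improving
            if trials[i] > limit:
                sources[i] = random_mask()
                fitness[i], trials[i] = loo_1nn_accuracy(X, y, sources[i]), 0
        for i in range(n_sources):  # memorize the best source found so far
            if fitness[i] > best_fit:
                best_mask, best_fit = sources[i][:], fitness[i]
    return best_mask, best_fit


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 3.3], [0.2, 1.0, 9.1], [0.15, 7.5, 0.4],
     [0.9, 4.8, 8.8], [1.0, 0.9, 0.2], [0.95, 7.7, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(abc_feature_selection(X, y))
```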

K-Nearest Neighbor (KNN)

Begin
    For i = 1 to number of training data items
        Store_data
    End
    For j = 1 to number of testing data items
        Measure_distance
        Sort_by_distance
        Evaluate_data_class
    End
End
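The KNN pseudocode above translates almost line for line into the sketch below. The slide does not fix the distance metric or the value of k, so Euclidean distance, k = 3, and majority voting are assumptions made here. The function name `knn_predict` and the data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, test_X, k=3):
    # Store_data: KNN simply keeps the training items; there is no other training step.
    stored = list(zip(train_X, train_y))
    predictions = []
    for x in test_X:
        # Measure_distance + Sort_by_distance: keep the k closest training items.
        neighbours = sorted(stored, key=lambda item: dist(item[0], x))[:k]
        # Evaluate_data_class: majority vote among the neighbours' labels.
        votes = Counter(label for _, label in neighbours)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [[1.5, 1.5], [6.5, 6.5]]
print(knn_predict(train_X, train_y, test_X))  # ['A', 'B']
```

All of the work happens at query time. This is why the ABC-KNN wrapper evaluates KNN repeatedly on candidate feature subsets rather than training a model once.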


DATASETS DESCRIPTION


THE FLOWCHART OF ABC-KNN METHOD


RESULTS

[Figure: Classification accuracy (%) by data set (Colon cancer, Acute leukemia, Hepatocellular carcinoma, High-grade glioma, Prostate cancer) for LS-SVM, PCA-FDA, MSDR-LGC, LLDE-KNN and ABC-KNN]


RESULTS (cont…)


CONCLUSION

• The experimental results of the gene expression analysis show that the proposed method can effectively reduce the data dimension while maintaining high classification accuracy

• ABC-KNN can thus be employed to exclude the non-essential data as well as identify the vital elements from a vast amount of biological data


PAPER-II

Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases

• Authors: Swati Shilaskar

Ashok Ghatol

• Year: 2013


PROBLEM

• To find a suitable algorithm that generates a smaller feature subset from high-dimensional data with improved diagnostic ability for cardiovascular diseases


PROPOSED METHOD

FEATURE SELECTION METHODS

• Forward Feature Inclusion

• Back-elimination Feature Selection

• Forward Feature Selection

DATA SETS DESCRIPTION

DATASET          NO. OF SAMPLES   NO. OF FEATURES   CATEGORIES
Arrhythmia       452              279               16
SPECTF cardiac   267              44                2
Heart disease    303              14 (used)         4


HYBRID MODEL OF FEATURE SELECTION PROCESS


FORWARD FEATURE INCLUSION ALGORITHM


Back-elimination Feature Selection
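The algorithm figure for this slide did not survive. As a generic illustration of back-elimination (sequential backward elimination), the sketch below starts from the full feature set. At each step it drops the feature whose removal yields the best score, and it stops when every possible removal would lower the score. It does not reproduce the paper's hybrid model. The scoring function (leave-one-out 1-NN accuracy), the helper names, and the toy data are assumptions.

```python
from math import dist


def loo_1nn_accuracy(X, y, features):
    """Assumed wrapper score: leave-one-out 1-NN accuracy on the listed feature indices."""
    if not features:
        return 0.0
    proj = [[row[j] for j in features] for row in X]
    correct = 0
    for i, p in enumerate(proj):
        nearest = min((k for k in range(len(proj)) if k != i), key=lambda k: dist(p, proj[k]))
        correct += y[nearest] == y[i]
    return correct / len(X)


def backward_elimination(X, y, score=loo_1nn_accuracy):
    selected = list(range(len(X[0])))  # start from the full feature set
    best = score(X, y, selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        s, drop = max((score(X, y, [f for f in selected if f != r]), r) for r in selected)
        if s < best:  # every removal hurts: stop
            break
        selected.remove(drop)  # remove the least useful feature
        best = s
    return selected, best


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 3.3], [0.2, 1.0, 9.1], [0.15, 7.5, 0.4],
     [0.9, 4.8, 8.8], [1.0, 0.9, 0.2], [0.95, 7.7, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(backward_elimination(X, y))
```

A removal is accepted when the score stays equal, so the method prefers smaller subsets whenever accuracy does not drop.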


FORWARD FEATURE SELECTION
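This slide's algorithm figure is also missing. The sketch below shows generic sequential forward selection, the mirror image of back-elimination. It starts from an empty set, adds the single feature that raises the score the most, and stops when no addition improves the score. As before, leave-one-out 1-NN accuracy stands in for the paper's classifier, and the names and data are hypothetical.

```python
from math import dist


def loo_1nn_accuracy(X, y, features):
    """Assumed wrapper score: leave-one-out 1-NN accuracy on the listed feature indices."""
    if not features:
        return 0.0
    proj = [[row[j] for j in features] for row in X]
    correct = 0
    for i, p in enumerate(proj):
        nearest = min((k for k in range(len(proj)) if k != i), key=lambda k: dist(p, proj[k]))
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_selection(X, y, score=loo_1nn_accuracy):
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        # Try adding each remaining feature and keep the one that scores highest.
        s, add = max((score(X, y, selected + [f]), f) for f in remaining)
        if s <= best:  # no addition improves the score: stop
            break
        selected.append(add)
        remaining.remove(add)
        best = s
    return selected, best


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 3.3], [0.2, 1.0, 9.1], [0.15, 7.5, 0.4],
     [0.9, 4.8, 8.8], [1.0, 0.9, 0.2], [0.95, 7.7, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(forward_selection(X, y))  # ([0], 1.0)
```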


RESULTS

Classification performance with all features and with the proposed feature selection algorithm

DATA SET         NO. OF ALL FEATURES   ACCURACY WITH ALL FEATURES   NO. OF FEATURES IN SUBSET   ACCURACY WITH FEATURE SUBSET
Arrhythmia       258                   0.79                         23                          0.88
SPECTF cardiac   44                    0.75                         19                          0.78
Heart disease    10                    0.81                         4                           0.85
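As a quick check on what the results table implies, the sketch below computes the relative reduction in feature count and the absolute change in accuracy for each data set. It uses only the numbers reported in the table.

```python
# (no. of all features, accuracy with all features, no. of features in subset, accuracy with subset)
results = {
    "Arrhythmia": (258, 0.79, 23, 0.88),
    "SPECTF cardiac": (44, 0.75, 19, 0.78),
    "Heart disease": (10, 0.81, 4, 0.85),
}


def summarize(table):
    """Return {name: (percent of features removed, accuracy change)}."""
    out = {}
    for name, (n_all, acc_all, n_sub, acc_sub) in table.items():
        out[name] = (100 * (1 - n_sub / n_all), acc_sub - acc_all)
    return out


for name, (reduction, gain) in summarize(results).items():
    print(f"{name}: {reduction:.1f}% fewer features, accuracy {gain:+.2f}")
# Arrhythmia: 91.1% fewer features, accuracy +0.09
# SPECTF cardiac: 56.8% fewer features, accuracy +0.03
# Heart disease: 60.0% fewer features, accuracy +0.04
```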


CONCLUSION

• Accuracy gives a proper estimate of classifier performance when the dataset is balanced

• If the dataset is unbalanced, accuracy is not a correct estimate of classifier performance

• The feature ranking methods investigated in this research work well for the arrhythmia and heart disease datasets

• The hybrid forward feature selection algorithm successfully reduces feature dimensions and improves classifier accuracy

• The highest accuracy is achieved when the forward selection algorithm is used
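The point about unbalanced data can be shown with a small worked example. Balanced accuracy (the mean of the per-class recalls) is used here as one common alternative metric. That is an assumption: the slides do not say which alternative measures the paper used. The labels and the constant classifier below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical unbalanced data: 90 healthy (0) and 10 diseased (1) patients.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # a useless classifier that always predicts "healthy"
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The 0.9 accuracy looks good even though the classifier misses every diseased patient. The 0.5 balanced accuracy exposes this failure.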


PAPER-III

Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

• Authors: Susana M. Vieira

Luís F. Mendonça

Gonçalo J. Farinha

João M.C. Sousa

• Year: 2013


PROBLEM

• The medical condition considered is sepsis, a common clinical condition defined by a whole-body inflammatory state called systemic inflammatory response syndrome (SIRS)

• This condition has different degrees of severity and can progress to severe sepsis and later to septic shock


PROPOSED METHOD

• A modified binary particle swarm optimization (MBPSO) method for feature selection with simultaneous optimization of the SVM kernel parameters

• An enhanced version of BPSO, designed to cope with premature convergence of the BPSO algorithm

• The MBPSO is used as a wrapper method
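For background on what MBPSO modifies, here is a minimal sketch of a standard binary PSO used as a wrapper. Each particle is a bit mask. Velocities are pulled toward the particle's personal best and the swarm's global best, and a sigmoid turns each velocity into the probability that a bit is 1. This is not the paper's MBPSO. The anti-premature-convergence changes and the SVM kernel optimization are not reproduced, and the inertia weight, coefficients, and velocity clamp are assumed values. In a real wrapper, `fitness` would be classifier accuracy (e.g. an SVM) on the selected features. The toy fitness below, which counts matches with a hypothetical target mask, is only for demonstration.

```python
import math
import random


def binary_pso(fitness, n_bits, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward global best
                vel[i][d] = max(-v_max, min(v_max, v))       # clamp the velocity
                # Sigmoid transfer: velocity -> probability that this bit is 1.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: number of bits agreeing with a made-up "relevant feature" mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def matches(mask):
    return sum(a == b for a, b in zip(mask, TARGET))


print(binary_pso(matches, n_bits=8))
```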

DATASETS DESCRIPTION

NO.  DATABASE               SAMPLES     FEATURES   CLASSES
1    German (credit card)   1000        24         2
2    Sonar                  208         60         2
3    WBCO                   683 (699)   9          2
4    WPBC                   198         32         2
5    WDBC                   569         30         2
6    Colon cancer           62          2000       2


MBPSO


RESULTS

RESULTS (cont…)


CONCLUSION

[Figure: Classification accuracy (%) per database (German, Sonar, WBCO, WPBC, WDBC, Colon) for NO-FS, BPSO, IBPSO, GA and MBPSO]

• MBPSO performs better than the other PSO-based methods and gives similar or better results than GA


FUTURE WORK

• Future work considers testing the introduced algorithm (MBPSO) on other medical databases in order to compare its performance more consistently with other feature selection techniques


FINDINGS FROM THE LITERATURE SURVEY

• MBPSO or ABC-KNN can be applied to heart disease databases to improve the accuracy of heart disease diagnosis

• Hybrid models such as PSO-KNN, GA-KNN, or ABC combined with other classification algorithms can be developed and applied to these databases to find feature subsets more efficiently


REFERENCES

[1] Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul, "Reducing bioinformatics data dimension with ABC-kNN", Neurocomputing 116 (2013) 367-381.

[2] Swati Shilaskar, Ashok Ghatol, "Feature selection for medical diagnosis: Evaluation for cardiovascular diseases", Expert Systems with Applications 40 (2013) 4146-4153.

[3] Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa, "Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients", Applied Soft Computing 13 (2013) 3494-3504.
