
Page 1: eNOSE calibration

eNOSE calibration FOR LUNG CANCER DETECTION

Andrea Merlina, Valerio Bruschi

Page 2: eNOSE calibration

Statistics

In 2008, approximately 12.7 million cancers were diagnosed; in 2010, nearly 7.98 million people died of cancer.

Cancers account for approximately 13% of all deaths.

Most common types:

▪ lung cancer (1.4 million deaths)

▪ stomach cancer (740,000)

▪ liver cancer (700,000)

Invasive cancers are the leading cause of death in the developed world and the second leading cause in the developing world.

1. Statistics

2. Objectives

3. Data contextualization

a) Outliers

4. Data Analysis

a) Intro

b) LDA

c) Mahalanobis

d) PCA-DA

e) Feature Selection

f) MLR

5. Conclusions


Page 3: eNOSE calibration

Objectives

The composition of the breath of patients with lung cancer contains information that could be used to detect the disease.

Breath samples were collected and analyzed by two electronic noses

Two goals:

✓ Instrument calibration using a set of key compounds

✓ Quality assurance of eNose data in the medical setting


Mapping between the instruments through the calculation of a model

Distinguishing healthy and ill patients

Page 4: eNOSE calibration

Data contextualization

Two datasets obtained as measurements from two electronic noses

▪ Cyranose

▪ ROTV

In a medical environment


≈ 350 samples per dataset, 10 features, 10 non-selective gas sensors

Page 5: eNOSE calibration
Page 6: eNOSE calibration
Page 7: eNOSE calibration

Outliers

Page 8: eNOSE calibration

Data Analysis

Using raw data to develop an inter-instrument model:

eNose 1 as training set and eNose 2 as test set; results:


• LDA: everybody is classified as ill

• PCA-DA: everybody is classified as healthy

• Mahalanobis (true class on rows, predicted class on columns):

            Class #1   Class #2   Classification error (%)
Class #1       45         74            62.18
Class #2      123        115            51.68

Accuracy: 44.82%

RAW DATA → PREPROCESSING → CLASSIFICATION
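As a concrete illustration of this setup, here is a minimal sketch of the inter-instrument experiment: train on the eNose 1 data, test on eNose 2. The file names, the column layout, and the choice of scikit-learn's LDA as the stand-in classifier are assumptions for illustration, not details from the slides.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Hypothetical files: ~350 rows x (10 sensor features + 1 class label) each.
train = np.loadtxt("enose1_cyranose.csv", delimiter=",")
test = np.loadtxt("enose2_rotv.csv", delimiter=",")
X_tr, y_tr = train[:, :10], train[:, 10]
X_te, y_te = test[:, :10], test[:, 10]

clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)   # stand-in classifier
cm = confusion_matrix(y_te, clf.predict(X_te))       # rows: true, cols: predicted

per_class_error = 100 * (1 - np.diag(cm) / cm.sum(axis=1))
accuracy = 100 * np.trace(cm) / cm.sum()
print(cm, per_class_error, accuracy)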

Page 9: eNOSE calibration

Data Analysis: Intro

Hypothesis:

Normalization of each chunk using a baseline of:

• 1 sample

• 5 samples (about 15%* error for class 1)

• 10 samples (about 15%* error for class 1)

• the entire chunk (25%*)

in order to minimize memory effects and maximize reproducibility (a code sketch follows the criteria below).

Best tradeoff between performance and ease of calibration.


* With our best classification method

1. Max accuracy

2. Min false negatives

3. Min number of samples
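As promised above, a minimal sketch of the chunk-wise baseline normalization. Dividing each sample by the per-sensor mean of the first n baseline samples is one plausible reading; the slides do not specify the exact operation.

import numpy as np

def baseline_normalize(chunk, n_baseline=5):
    # chunk: samples x 10 sensors from one acquisition.
    # Baseline = per-sensor mean of the first n_baseline samples
    # (1, 5, 10, or the entire chunk, as in the hypotheses above).
    baseline = chunk[:n_baseline].mean(axis=0)
    return chunk / baseline   # assumed: ratio to baseline

# Synthetic example chunk with strictly positive sensor responses.
rng = np.random.default_rng(0)
chunk = 1.0 + rng.random((50, 10))
normalized = baseline_normalize(chunk, n_baseline=5)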

Page 10: eNOSE calibration
Page 11: eNOSE calibration
Page 12: eNOSE calibration

Data Analysis: Intro


▪ Linear discrimination (LDA)

▪ Mahalanobis

▪ Components discrimination (PCA-DA)

Page 13: eNOSE calibration

Data Analysis: LDA

Resulting confusion matrix (true class on rows, predicted class on columns):

            Class #1   Class #2   Classification error (%)
Class #1       89         29            24.58
Class #2       99        139            41.60

Accuracy: 64.04%
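The per-class errors and the accuracy in the table follow directly from the confusion matrix; a quick check:

import numpy as np

cm = np.array([[89, 29],     # true class #1: 89 predicted #1, 29 predicted #2
               [99, 139]])   # true class #2: 99 predicted #1, 139 predicted #2
per_class_error = 100 * (1 - np.diag(cm) / cm.sum(axis=1))  # -> [24.58, 41.60]
accuracy = 100 * np.trace(cm) / cm.sum()                    # -> 64.04
print(per_class_error, accuracy)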

Page 14: eNOSE calibration

Data Analysis: Mahalanobis


▪ Sample points are distributed about the center of mass in an ellipsoidal manner.

▪ The probability that a test point belongs to the set depends not only on its distance from the center of mass, but also on the direction.

▪ Result: everybody is classified as healthy.
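A minimal sketch of a Mahalanobis classifier consistent with this description: each test point is assigned to the class with the smallest Mahalanobis distance from the class centroid. Using per-class covariance estimates is an assumption; the slides do not say whether a pooled covariance was used.

import numpy as np

def mahalanobis_classify(X_train, y_train, X_test):
    classes = np.unique(y_train)
    d2_all = []
    for c in classes:
        Xc = X_train[y_train == c]
        mu = Xc.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))  # pinv for stability
        diff = X_test - mu
        # Squared Mahalanobis distance: direction-dependent, unlike Euclidean.
        d2_all.append(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return classes[np.argmin(np.vstack(d2_all), axis=0)]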

Page 15: eNOSE calibration

Data Analysis: PCA-DA


Page 16: eNOSE calibration

Data Analysis: PCA-DA


(true class on rows, predicted class on columns)

            Class #1   Class #2   Classification error (%)
Class #1      100         18            15.25
Class #2       59        179            24.79

Accuracy: 78.37%

• Optimum solution

• Maximum accuracy

• Minimum false negatives
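A minimal sketch of the PCA-DA chain: project the sensor data onto the leading principal components, then apply discriminant analysis to the scores. The number of components (5) is an illustrative choice, not a value from the slides.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

pca_da = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())

# Synthetic stand-in for the ~350 x 10 eNose data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(350, 10))
y_train = rng.integers(1, 3, size=350)   # classes 1 and 2
pca_da.fit(X_train, y_train)
# y_pred = pca_da.predict(X_test)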

Page 17: eNOSE calibration

Data Analysis: Feature Selection


Based on the loadings of the PCA.
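A minimal sketch of loading-based feature selection: rank the sensors by the magnitude of their loadings on the leading principal components and keep the highest-ranked ones. The number of components inspected and keeping 7 of 10 sensors (matching the "3 features removed" in the conclusions) are assumptions.

import numpy as np
from sklearn.decomposition import PCA

def select_by_loadings(X, n_components=2, n_keep=7):
    pca = PCA(n_components=n_components).fit(X)
    # components_ has shape (n_components, n_features); sum the absolute
    # loadings of each sensor over the leading components.
    importance = np.abs(pca.components_).sum(axis=0)
    return np.sort(np.argsort(importance)[::-1][:n_keep])  # kept sensor indices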

Page 18: eNOSE calibration

Data Analysis: Feature Selection

Mahalanobis (true class on rows, predicted class on columns):

            Class #1   Class #2   Classification error (%)
Class #1       32         86            72.88
Class #2       22        216             9.24

Accuracy: 69.66%


PCA-DA:

            Class #1   Class #2   Classification error (%)
Class #1      101         17            14.41
Class #2       65        173            27.31

Accuracy: 76.97%

LDA:

            Class #1   Class #2   Classification error (%)
Class #1       90         28            23.73
Class #2       95        143            39.92

Accuracy: 65.45%

Close to the results without feature selection!

Page 19: eNOSE calibration

Data Analysis: MLR (intro)


❖ Can we find a regression model in order to reduce the number of features (sensors) used in the experiment?

❖ Is the model reliable over time?

We noticed correlation between features on the same eNose.

Page 20: eNOSE calibration
Page 21: eNOSE calibration
Page 22: eNOSE calibration

Data Analysis: MLR

Multiple linear regression attempts to model the relationship between variables by fitting a linear equation to observed data:

$Y = XB + E$

• The dataset was divided into training and test sets temporally (first 3 days as training, the last one as testing).

The coefficients are estimated through the pseudoinverse:

$B = X^{+} \cdot Y$

• RMSECV was computed for each predicted feature:

$RMSECV = \sqrt{PRESS / N}$, with $PRESS = \sum_i \left( \hat{y}_i^{LS} - y_i \right)^2$

MLR hypothesis: $e_y \approx e_x$; otherwise the regression is less robust.

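A minimal sketch of the MLR step: fit B via the pseudoinverse and score a predicted feature with RMSECV, using a temporal train/test split as described above. The synthetic data, the split index, and predicting one sensor from the remaining nine are illustrative assumptions.

import numpy as np

def mlr_fit(X, Y):
    return np.linalg.pinv(X) @ Y            # B = X+ . Y

def rmsecv(y_true, y_pred):
    press = np.sum((y_pred - y_true) ** 2)  # PRESS = sum_i (y_hat_i - y_i)^2
    return np.sqrt(press / len(y_true))

# Predict one sensor from the remaining nine (exploits the correlation
# between features on the same eNose). Synthetic stand-in data:
rng = np.random.default_rng(0)
X = rng.normal(size=(350, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=350)
X_tr, X_te, y_tr, y_te = X[:260], X[260:], y[:260], y[260:]  # temporal split
B = mlr_fit(X_tr, y_tr)
print(rmsecv(y_te, X_te @ B))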

Page 23: eNOSE calibration

Resulting confusion matrix (true class on rows, predicted class on columns):

            Class #1   Class #2   Classification error (%)
Class #1      100         18            15.25
Class #2       66        172            27.73

Accuracy: 76.40%

Page 24: eNOSE calibration

Conclusions


Optimum:

• PCA-DA technique

• Accuracy 78%

Less complexity:

• 3 features removed

• False positives (24% → 27%)

Prediction:

• 1 feature predicted per eNose

• As feature selection

Page 25: eNOSE calibration

Thanks for your attention