
Diagnosis of multiple cancer types by shrunken centroids of gene expression


Course: 550.635 Topics in Bioinformatics
Presenter: Ting Yang
Teacher: Professor Geman

By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu


Nearest Centroid Classification

Example: small round blue cell tumors of childhood

• 63 training samples, 25 testing samples

• 4 classes: BL, EWS, NB, RMS

• Figure 1

• Nearest centroid classification

• Disadvantage
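As an illustration of plain nearest centroid classification, here is a minimal sketch in Python. It assumes an unweighted squared Euclidean distance and uses made-up two-gene data; the function names `class_centroids` and `nearest_centroid` are hypothetical, not taken from the paper or any package.

```python
from collections import defaultdict

def class_centroids(samples, labels):
    """Return {label: per-gene mean expression} for the training samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in groups.items()
    }

def nearest_centroid(x, centroids):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

# Toy data (hypothetical): 2 genes, 2 of the 4 tumor classes.
train = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0], [0.0, 7.0]]
labels = ["BL", "BL", "EWS", "EWS"]
centroids = class_centroids(train, labels)
predicted = nearest_centroid([1.5, 1.0], centroids)
```

Every gene contributes to the distance in this sketch, whether or not it helps to separate the classes.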


Nearest Shrunken Centroids

• A modification of the nearest centroid method

• Idea: first standardize each class centroid by the within-class standard deviation for each gene, then shrink each class centroid toward the overall centroid.


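Details:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \, (s_i + s_0)}, \qquad m_k = \sqrt{\frac{1}{n_k} + \frac{1}{n}} $$

• $\bar{x}_{ik}$: mean expression value in class k for gene i

• $\bar{x}_i$: ith component of the overall centroid

• $s_i$: pooled within-class standard deviation for gene i

• $d_{ik}$: t statistic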


t statistic:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \, (s_i + s_0)} $$

• It measures the difference between the mean expression of gene i in class k and its mean over all classes combined, standardized by the within-class standard deviation.

• Idea: a gene that discriminates one class from the rest will have a statistic of large absolute value for that class.
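The statistic can be computed directly from a table of expression values. The sketch below is one possible reading, under stated assumptions: the statistic has the form d_ik = (mean of gene i in class k - overall mean of gene i) / (m_k (s_i + s_0)), m_k is taken as sqrt(1/n_k + 1/n), s_0 is set to the median of the s_i, and the data are invented. The function name `standardized_differences` is hypothetical.

```python
import math
from statistics import median

def standardized_differences(samples, labels):
    """Return {class: [d_ik for each gene i]}; each sample is a list of gene values."""
    n = len(samples)
    classes = sorted(set(labels))
    genes = range(len(samples[0]))
    counts = {k: labels.count(k) for k in classes}

    # Overall centroid and class centroids (per-gene means).
    overall = [sum(x[i] for x in samples) / n for i in genes]
    centroid = {
        k: [sum(x[i] for x, y in zip(samples, labels) if y == k) / counts[k]
            for i in genes]
        for k in classes
    }

    # Pooled within-class standard deviation s_i for each gene.
    s = [
        math.sqrt(
            sum((x[i] - centroid[y][i]) ** 2 for x, y in zip(samples, labels))
            / (n - len(classes))
        )
        for i in genes
    ]
    s0 = median(s)  # assumption: s_0 is the median of the s_i

    d = {}
    for k in classes:
        m_k = math.sqrt(1 / counts[k] + 1 / n)  # assumption: sign inside the root
        d[k] = [(centroid[k][i] - overall[i]) / (m_k * (s[i] + s0)) for i in genes]
    return d

# Toy data (hypothetical): gene 0 separates the classes, gene 1 does not.
train = [[1.0, 5.0], [3.0, 5.5], [7.0, 5.0], [9.0, 5.5]]
labels = ["A", "A", "B", "B"]
d = standardized_differences(train, labels)
```

In this toy example gene 0 gets a large negative value in class A and a large positive value in class B, while gene 1 gets exactly zero in both classes.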


• Shrink each statistic toward zero by soft thresholding, which eliminates the genes that do not provide sufficient information.

• ‘De-noising’ step

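$$ d'_{ik} = \operatorname{sign}(d_{ik}) \, (|d_{ik}| - \Delta)_+ $$

where $\Delta$ is the shrinkage amount and $(t)_+ = t$ if $t > 0$, and 0 otherwise.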
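A small sketch of this shrinkage step for a single gene, assuming the soft-thresholding rule: reduce the absolute value of each statistic by a threshold and set it to zero if the result is negative. The statistic values and the threshold `delta` are invented for illustration, and `soft_threshold` is a hypothetical helper name.

```python
def soft_threshold(d, delta):
    """Return sign(d) * max(|d| - delta, 0)."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude

# Hypothetical statistics for one gene across the four classes.
d_gene = {"BL": 2.5, "EWS": -0.4, "NB": 0.9, "RMS": -3.0}
delta = 1.0
shrunken = {k: soft_threshold(v, delta) for k, v in d_gene.items()}

# A gene stays active only if at least one class keeps a nonzero value.
is_active = any(v != 0.0 for v in shrunken.values())
```

Here the BL and RMS values survive (1.5 and -2.0) while the EWS and NB values become zero. A gene whose values are zero in every class no longer contributes to classification.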


Choosing the amount of shrinkage

• The shrinkage amount is allowed to vary over a wide range.

• 10-fold cross-validation: choose the shrinkage amount with the smallest cross-validated error rate.

• Divide the set of samples at random into 10 parts of equal size, with the classes distributed proportionally among the 10 parts.

• Fit the model on 90% of the samples, then predict the class labels of the remaining 10% (the test samples).

• Repeat 10 times, once for each held-out part, and add the errors together to get the overall error.

• Figure 2

• Figure 1
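The cross-validation procedure above can be sketched as follows. This is an illustrative outline, not the authors' code: the class sizes are invented, and `stratified_folds`, `cross_validated_errors`, `fit` and `predict` are hypothetical names. The caller supplies `fit` and `predict` for whatever model is being evaluated.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each class is spread across the folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)  # random assignment within each class
        for idx in members:
            folds[position % n_folds].append(idx)
            position += 1
    return folds

def cross_validated_errors(samples, labels, fit, predict, n_folds=10):
    """Total number of misclassified held-out samples over all folds."""
    errors = 0
    for held_out in stratified_folds(labels, n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        errors += sum(predict(model, samples[i]) != labels[i] for i in held_out)
    return errors

# Hypothetical class sizes, chosen only to make the example easy to check.
labels = ["BL"] * 10 + ["EWS"] * 10 + ["NB"] * 10 + ["RMS"] * 10
folds = stratified_folds(labels)
```

To choose the shrinkage amount, one would call `cross_validated_errors` once per candidate value, with a `fit` that applies that value, and keep the value with the fewest errors.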


More Figures

• Figure 3

• Figure 4


Classification

• A new sample is classified by comparing its expression profile with each shrunken centroid, over those 43 active genes.

• Distance function: includes the prior probability of each class.
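One way to write such a distance is a standardized squared distance to each shrunken centroid minus twice the log of the class prior, so that more frequent classes are favored when distances are similar. The sketch below assumes that form; the centroids, scale factors and priors are invented, and `discriminant_score` and `classify` are hypothetical names.

```python
import math

def discriminant_score(x, shrunken_centroid, scale, prior):
    """Standardized squared distance to a shrunken centroid, adjusted by the prior."""
    distance = sum(
        ((xi - ci) / si) ** 2
        for xi, ci, si in zip(x, shrunken_centroid, scale)
    )
    return distance - 2.0 * math.log(prior)

def classify(x, shrunken_centroids, scale, priors):
    """Return the class with the smallest score, plus all scores."""
    scores = {
        k: discriminant_score(x, c, scale, priors[k])
        for k, c in shrunken_centroids.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical values: 2 active genes, 2 classes, per-gene scale s_i + s_0 = 1.
centroids = {"BL": [0.0, 0.0], "EWS": [2.0, 0.0]}
scale = [1.0, 1.0]

label_equal, _ = classify([0.5, 0.0], centroids, scale, {"BL": 0.5, "EWS": 0.5})
# A sample equidistant from both centroids goes to the class with the larger prior.
label_prior, _ = classify([1.0, 0.0], centroids, scale, {"BL": 0.2, "EWS": 0.8})
```

With equal priors the rule reduces to picking the nearest shrunken centroid in standardized distance.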


Statistical details:

• t-statistic:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \, (s_i + s_0)} $$

• Estimates of the class probabilities (Figure 5)
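Class probability estimates can be obtained from per-class discriminant scores by a Gaussian-style normalization, in which each class is weighted by exp(-score/2). The sketch below assumes that form and uses invented scores; `class_probabilities` is a hypothetical name.

```python
import math

def class_probabilities(scores):
    """Convert discriminant scores (smaller is better) into probabilities."""
    best = min(scores.values())
    # Subtracting the best score keeps the exponentials numerically stable
    # and does not change the normalized result.
    weights = {k: math.exp(-0.5 * (s - best)) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical scores for one sample across the four classes.
scores = {"BL": 1.0, "EWS": 1.0, "NB": 50.0, "RMS": 50.0}
probs = class_probabilities(scores)
```

In this example the two classes with equal, small scores share almost all of the probability, and the two distant classes receive values close to zero.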
