Transcriptional Diagnosis by Bayesian Network. Hsun-Hsien Chang and Marco F. Ramoni. Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. March 17, 2009.

Page 1: Transcriptional Diagnosis by Bayesian Network

Harvard Medical School

Transcriptional Diagnosis by Bayesian Network

Hsun-Hsien Chang and Marco F. Ramoni

Children’s Hospital Informatics Program

Harvard-MIT Division of Health Sciences and Technology

Harvard Medical School

March 17, 2009

Page 2: Transcriptional Diagnosis by Bayesian Network

Background

• Microarray technology enables profiling the expression of thousands of genes in parallel on a single chip.

• Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.

• Challenge:
  – The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which leads to overfitting.

• Existing methods:
  – Gene selection: compute statistics (e.g., t-statistics, SNR, PCA) of individual genes and select the high-ranking genes.
  – Classification model: create a classification function of the selected genes.
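
The per-gene ranking step used by existing methods can be sketched as follows. This is only an illustration: it assumes the signal-to-noise ratio takes the common form (mean difference) / (sum of class standard deviations), and the function name `snr_rank`, the cut-off `top_k`, and the synthetic data are hypothetical, not taken from the slides.

```python
import numpy as np

def snr_rank(expr, labels, top_k):
    """Rank genes by absolute signal-to-noise ratio and keep the top_k.

    expr   : (n_genes, n_samples) expression matrix
    labels : (n_samples,) array of 0/1 tissue states
    """
    a = expr[:, labels == 0]
    b = expr[:, labels == 1]
    # SNR per gene: difference of class means over sum of class std devs.
    snr = (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1))
    order = np.argsort(-np.abs(snr))  # strongest separation first
    return order[:top_k], snr

# Synthetic example: 200 genes, 20 samples, only gene 0 is differential.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
expr = rng.normal(size=(200, 20))
expr[0, labels == 1] += 5.0
selected, snr = snr_rank(expr, labels, top_k=5)
print(selected)
```

Note that `top_k` is exactly the kind of cut-off threshold that a ranking-based method has to choose by hand.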

Page 3: Transcriptional Diagnosis by Bayesian Network

Proposed Approach

• Issues:
  – The assumption of gene independence is inadequate.
  – Other genes may be collinearly expressed with the signature.
  – Selection and classification are two non-integrated steps.
  – A cut-off threshold is needed to select the high-ranking genes.

• Proposed strategies:
  – Adopt a systems biology approach to infer the functional dependence among genes.
  – Use the dependence network for tissue discrimination.
  – Integrate gene selection and the classification model in a Bayesian network framework.

Page 4: Transcriptional Diagnosis by Bayesian Network

Data Representation by Bayesian Network

[Figure: expression data matrix (Gene 1 … Gene N by Case 1 … Case M, with cases grouped into tissue state 1 and tissue state 2) and the corresponding Bayesian network over the nodes Pheno, G1, G2, …, GN.]

• Bayesian networks are directed acyclic graphs where:
  – Nodes correspond to random variables.
  – Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
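
As a minimal sketch of this representation, the joint probability of a Bayesian network factorizes into one conditional probability per node given its parents. The three-node topology and all probability values below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical three-node network: Pheno -> G1 -> G2 (all variables binary).
parents = {"Pheno": (), "G1": ("Pheno",), "G2": ("G1",)}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "Pheno": {(): 0.5},
    "G1": {(0,): 0.2, (1,): 0.9},
    "G2": {(0,): 0.3, (1,): 0.7},
}

def joint_probability(assignment):
    """P(assignment) as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# The joint distribution sums to 1 over all 2^3 assignments.
nodes = list(parents)
total = sum(joint_probability(dict(zip(nodes, values)))
            for values in product((0, 1), repeat=len(nodes)))
print(total)
```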

Page 5: Transcriptional Diagnosis by Bayesian Network

Gene Selection by Bayes Factor

[Figure: network diagrams over the nodes Pheno, G1, G2, …, GN, Gp, and Gq, before and after gene selection by Bayes factor.]
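
The slides do not give the scoring formula, so the following is only a sketch of one common way to compute a Bayes factor for a single gene. It assumes discretized expression values and a Dirichlet-multinomial marginal likelihood, and compares a model in which the gene depends on the phenotype against one in which it does not. The function names, the prior weight `alpha_total`, and the count tables are all hypothetical.

```python
from math import lgamma

def log_marginal(counts, alpha_total):
    """Log marginal likelihood of multinomial counts under a symmetric Dirichlet prior."""
    a = alpha_total / len(counts)
    n = sum(counts)
    return (lgamma(alpha_total) - lgamma(alpha_total + n)
            + sum(lgamma(a + c) - lgamma(a) for c in counts))

def log_bayes_factor(table, alpha_total=1.0):
    """log BF of 'gene depends on phenotype' versus 'gene is independent'.

    table[c][k] = number of samples in phenotype class c with gene state k.
    """
    n_classes = len(table)
    # Dependent model: a separate distribution of the gene per phenotype class.
    dependent = sum(log_marginal(row, alpha_total / n_classes) for row in table)
    # Independent model: one distribution of the gene pooled over classes.
    pooled = [sum(col) for col in zip(*table)]
    independent = log_marginal(pooled, alpha_total)
    return dependent - independent

informative = [[18, 2], [3, 17]]     # gene state tracks the phenotype
uninformative = [[10, 10], [11, 9]]  # gene state ignores the phenotype
print(log_bayes_factor(informative), log_bayes_factor(uninformative))
```

Under these assumptions, a gene would be kept when its log Bayes factor is positive, so the data themselves decide inclusion rather than a hand-picked rank cut-off.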

Page 6: Transcriptional Diagnosis by Bayesian Network

Collinearity Elimination via Network Learning

[Figure: network diagrams over the nodes Pheno, G1, G2, Gp, Gq, and GN, before and after collinearity elimination.]

Page 7: Transcriptional Diagnosis by Bayesian Network

Sample Classification

• The phenotype variable is independent of the blue genes, given the green genes.

• Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used for phenotype determination.

• Tissue classification:

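[Figure: Bayesian network over the nodes Pheno, G1, G2, Gp, Gq, and GN, with the signature genes shown in green and the remaining genes in blue.]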
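
The Markov blanket of a node in a directed acyclic graph consists of its parents, its children, and the other parents of its children. A minimal sketch of extracting it is shown here; the network topology is hypothetical and is not the one drawn on the slide.

```python
def markov_blanket(parents, target):
    """Markov blanket of target: its parents, its children, and the children's other parents."""
    children = {node for node, pa in parents.items() if target in pa}
    blanket = set(parents[target]) | children
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(target)
    return blanket

# Hypothetical DAG, given as node -> tuple of parents.
parents = {
    "Pheno": (),
    "G1": ("Pheno",),
    "G2": ("Pheno", "Gp"),
    "Gp": (),
    "Gq": ("G1",),
    "GN": ("Gq",),
}
print(sorted(markov_blanket(parents, "Pheno")))
```

In this toy network, Gq and GN fall outside the blanket: given G1, G2, and Gp, they carry no further information about Pheno.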

Page 8: Transcriptional Diagnosis by Bayesian Network

Algorithm Summary

[Flowchart: Gene Selection by Bayes Factor → Collinearity Elimination → Sample Classification, with the boxes "Optimize Performance" and "Optimize Hyperparameters (sensitivity analysis)".]

Page 9: Transcriptional Diagnosis by Bayesian Network

Discriminate Lung Carcinoma Subtypes

• Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
  – AC and SCC are distinct in survival, chances of metastasis, and responses to chemotherapy and targeted therapy.
  – Physicians lack confidence in correct recognition when there are multiple primary carcinomas.

• Training:
  – 58 ACs and 53 SCCs.
  – 77 genes selected in the network.
  – 25 signature genes.

Page 10: Transcriptional Diagnosis by Bayesian Network

Bayesian Network for Lung Carcinoma

Page 11: Transcriptional Diagnosis by Bayesian Network

Large-Scale Testing on Independent Samples

• 422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, Chinese).

• Classification performance: AUROC = 95.2%.

[Figure: ROC curves; x-axis 1-specificity, y-axis sensitivity, both from 0 to 1. Legend: Proposed Bayes Net (95.2%).]
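
For reference, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The scores and labels are made up for illustration and are unrelated to the study data.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive with every negative.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

# Made-up scores for six samples (1 = one subtype, 0 = the other).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auroc(scores, labels))
```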

Page 12: Transcriptional Diagnosis by Bayesian Network

Comparisons with Other Popular Methods

• Higher classification accuracy.
• Small-sized signature to avoid overfitting.

Method                                        Testing AUROC   p-value   # signature genes
Bayesian Network                              95.2%           ---       25
PCA/LDA                                       91.2%           0.0047    13
PAM (Tibshirani et al., PNAS 2002)            91.0%           0.0014    77
Weighted Voting (Golub et al., Science 1999)  93.4%           0.6240    800

Page 13: Transcriptional Diagnosis by Bayesian Network

KRT6 Family Characterizes the Lung Carcinoma Discrimination

Page 14: Transcriptional Diagnosis by Bayesian Network

KRT6 Family Characterizes the Lung Carcinoma Discrimination

• Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing lung cancer subtypes.

– Accounting for 95% of the accuracy of the whole 25-gene signature.

– Located on chromosome 12q12-q13.

– A nonlinear, concave discriminative surface.

Page 15: Transcriptional Diagnosis by Bayesian Network

Verification by Chr12q12-q13 Aberrations

• Investigate DNA copy number changes in comparative genomic hybridization (CGH) arrays.
  – 12 ACs and 13 SCCs from Vrije University Medical Center, the Netherlands.
  – Treat the average CGH values of genes occupying q12, q13, and q12-13, respectively, as three features to construct a naïve Bayes classifier.
  – A dumbbell-shaped discriminative surface achieves 80% classification accuracy.
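
A three-feature naïve Bayes classifier of this kind can be sketched as follows. The slides do not state the likelihood model, so the Gaussian per-feature likelihood is an assumption, as are the class name `GaussianNaiveBayes` and the synthetic feature values. Only the sample sizes (12 and 13) mirror the slide.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (an assumed likelihood)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small constant keeps the variance away from zero.
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        # log P(class) + sum over features of log N(x | mean, var)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None])
        log_post = self.log_prior_[None, :] + log_lik.sum(axis=2)
        return self.classes_[np.argmax(log_post, axis=1)]

# Synthetic stand-in for three features (average values over q12, q13, q12-13);
# the values are invented and well separated on purpose.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.5, size=(12, 3)), rng.normal(1.0, 0.5, size=(13, 3))])
y = np.array(["AC"] * 12 + ["SCC"] * 13)
model = GaussianNaiveBayes().fit(X, y)
print((model.predict(X) == y).mean())
```

The printed value is a training accuracy on invented data and says nothing about the 80% figure reported on the slide.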

Page 16: Transcriptional Diagnosis by Bayesian Network

Conclusion

• Reverse engineer regulatory network information for tissue classification.

• Adopt the systems biology approach to infer a gene dependency network.
  – Select genes by Bayes factor.
  – Eliminate collinearity via network learning.
  – Integrate gene selection and the classification model in a single Bayesian network framework.

• Demonstrate the promising translational value of the systems biology approach in clinical study.