The Applications of Microarrays and Artificial Intelligence for Diagnosis, Prognosis and Selection of Therapeutic Targets

Oncogenomics Section

Javed Khan, M.D.

February 2005

Case History

• 5-year-old male referred to NCI for a second opinion
• Injury to right groin/inguinal region while playing
• Rapidly evolving mass with initial resolution
• Diagnosis: hematoma
• Treatment: observation

• Several weeks later the mass enlarged
• Suspected malignancy
• Biopsy performed

Lymphoma?

Despite the availability of immunohistochemistry, cytogenetics and molecular techniques, in some cases incorrect diagnoses are made

Alveolar Rhabdomyosarcoma

Lymphoma

Rhabdomyosarcoma

Ewing’s

Neuroblastoma

Small Round Blue Cell Tumors (SRBCT)

Diagnostic Dilemmas

Rhabdomyosarcoma non-Hodgkin's Lymphoma

Origin Muscle Lymphoid

Treatment Chemotherapy Chemotherapy

Lumbar Puncture / Intrathecal drugs

Yes Rarely

Yes No

Accurate Diagnosis is Essential for the Treatment of Small Round Blue Cell Tumors

Prognosis 20-60% survival 50-90% survival

Surgery

Radiation

Evolution of Translational Applications of Microarrays

Schena et al.

Translational Applications of Microarrays: Outline

1. Cancer Diagnosis using Artificial Neural Networks (ANN)

2. Prognosis Prediction using ANNs

3. Array-CGH Investigation of Genomic Imbalances & Characterization of Tumor Progression Models

Applications of Microarrays for Tumor Diagnosis

Hypothesis

• Cancers belonging to a given diagnosis have diagnosis-specific gene expression profiles

Or

• Intrinsic genomic instability of tumors leads to extensive random fluctuations in global gene expression, such that no two cancers have similar profiles.

Do Cancers Exhibit Diagnosis-Specific Gene Expression Profiles?

Multidimensional Scaling (MDS)
• Alveolar Rhabdomyosarcoma (ARMS cell lines)
• 1,238-element cDNA microarray
• 2-class problem comparing ARMS vs. “others”
• First report to demonstrate that cancers of a given diagnosis have similar gene expression profiles
• Utilized visualization (MDS) and clustering tools

Khan et al., Cancer Research, 58, 5009-5013, 1998

Using unsupervised clustering methods we demonstrated that cancers (ARMS) belonging to a specific type have similar gene expression profiles, raising the possibility of applying such profiles to diagnostics.
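To make "similar gene expression profiles" concrete, here is a minimal sketch that computes a pairwise distance (1 minus the Pearson correlation) between expression profiles. A distance matrix of this kind is a typical input to MDS or hierarchical clustering. The sample names, the values, and the choice of correlation distance are all assumptions for illustration, not the data or metric of the cited study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def distance_matrix(profiles):
    """Map each pair of sample names to 1 - Pearson correlation."""
    names = list(profiles)
    return {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names
        for b in names
    }

# Hypothetical log-ratio profiles over five genes (made-up numbers).
profiles = {
    "ARMS_1": [2.0, 1.5, -1.0, 0.2, -0.8],
    "ARMS_2": [1.8, 1.7, -0.9, 0.1, -1.0],
    "other_1": [-1.2, 0.3, 1.6, -0.4, 0.9],
}
dist = distance_matrix(profiles)
```

In this toy input the two hypothetical ARMS profiles are closer to each other than either is to the "other" profile, which is the pattern a 2-D MDS plot would make visible.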

Can gene expression profiles be used to reliably diagnose cancers belonging to multiple classes?

Khan, Wei, Ringnér et al. Nat Med. 7: 673-9, 2001

• Lymphoma (n=8)
• Rhabdomyosarcoma (n=20)
• Ewing’s (n=23)
• Neuroblastoma (n=12)
• Unknown (n=25)
• Non-SRBCT (n=5)

6,567-element cDNA microarray

Unsupervised analysis by Principal Component Analysis (PCA) showed no diagnosis-specific clustering

Plot legend: RMS, NB, EWS, BL

Why Artificial Neural Networks (ANNs)?
• Supervised
• Pattern recognition algorithms
• Modeled on the human neuron/brain
• Learning from prior experience by error minimization

APPLICATIONS
• Defense
• Voice and handwriting recognition
• Fingerprint recognition
• Diagnosis of arrhythmias
• Diagnosis of myocardial infarcts
• Interpreting mammograms, radiographs/MRI

• Input = any type of data, e.g. gene expression
• Output = any given number (1)

ANN Training & Validation of 63 SRBCT samples

3,750 Trained Models

Output: (0-1)

Ideal output, e.g. for EWS:

EWS  RMS  BL  NB
1    0    0   0

25 Unknown Test Samples

Gene Minimization

Increasing Number of Ranked Genes in Order

Identified a minimal set of the top 96 ranked genes (1.5%) that perfectly classified all 4 SRBCT classes

How well do these top 96 genes perform?

EWS NB RMS BL

Hierarchical Clustering using Top 96 Ranked Clones Resulted in “Perfect” Clustering

Identified 41 genes not previously reported to be expressed in SRBCTs

Cancer Diagnosis? 25 Unknown Test Samples

Diagnostic Classification

• Highest output determines classification

• Compute Euclidean distance from the ideal output

• Construct probability distribution of distances

• Calculate 95th percentile

• Diagnosis confirmed if the distance is within the 95th percentile
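The classification rule in this list can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 95th percentile uses the nearest-rank definition, and `reference_distances` stands in for whatever per-class distribution of distances was built from validation samples.

```python
from math import sqrt

CLASSES = ["EWS", "RMS", "BL", "NB"]

def classify(output):
    """Return the class with the highest ANN output."""
    best = max(range(len(CLASSES)), key=lambda i: output[i])
    return CLASSES[best]

def distance_from_ideal(output, label):
    """Euclidean distance between an output vector and the ideal one-hot vector."""
    ideal = [1.0 if c == label else 0.0 for c in CLASSES]
    return sqrt(sum((o - t) ** 2 for o, t in zip(output, ideal)))

def percentile_95(distances):
    """Empirical 95th percentile (nearest-rank definition, an assumption)."""
    ordered = sorted(distances)
    rank = max(1, -(-95 * len(ordered) // 100))  # integer ceil(0.95 * n)
    return ordered[rank - 1]

def diagnose(output, reference_distances):
    """Classify by highest output; confirm only within the 95th percentile."""
    label = classify(output)
    d = distance_from_ideal(output, label)
    return label, d <= percentile_95(reference_distances[label])

# Hypothetical reference distances for one class (made-up numbers).
reference = {"EWS": [0.1, 0.2, 0.3, 0.4]}
confident = diagnose([0.9, 0.1, 0.05, 0.0], reference)
ambiguous = diagnose([0.4, 0.35, 0.3, 0.3], reference)
```

Both hypothetical samples receive the EWS label, but only the first lies close enough to the ideal output (1, 0, 0, 0) for the diagnosis to be confirmed; the second would be left unconfirmed.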

Distance from Ideal (ideal distance = 0)

Ideal output, e.g. for EWS: EWS = 1, RMS = 0, BL = 0, NB = 0

25 Unknown Test Samples

ANN Diagnostic Classification

Cancer   Sensitivity (%)   Specificity (%)
EWS      93                100
BL       100               100
NB       100               100
RMS      96                100

The expression profile of 96 genes can predict the diagnosis of SRBCT using ANNs
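For reference, per-class sensitivity and specificity of the kind reported for the ANN classifier can be computed from true and predicted labels as in this minimal sketch. The labels below are invented for illustration and are not the study's test samples; the function name is hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, target):
    """Return (sensitivity, specificity) for one target class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == target:
            if pred == target:
                tp += 1  # target sample correctly called
            else:
                fn += 1  # target sample missed
        else:
            if pred == target:
                fp += 1  # non-target sample wrongly called target
            else:
                tn += 1  # non-target sample correctly excluded
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels: one EWS sample is misclassified as RMS.
truth = ["EWS", "EWS", "RMS", "BL", "NB"]
predicted = ["EWS", "RMS", "RMS", "BL", "NB"]
ews_sens, ews_spec = sensitivity_specificity(truth, predicted, "EWS")
rms_sens, rms_spec = sensitivity_specificity(truth, predicted, "RMS")
```

In this toy case the missed EWS sample lowers EWS sensitivity, while the same error lowers RMS specificity.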

Cancer Prognosis

Hypothesis

• The expression profiles of cancers contain “prognostic information” at presentation, prior to treatment.

• Computer algorithms can utilize this to predict the outcome of individual patients with no prior knowledge

Converse Hypothesis

• Only a fraction of the original pretreatment tumor mass contains the “resistant clone” and this cannot be detected by whole tumor microarray experimentation.

Timeline: Presentation, Therapy, Remission, Relapse

Cancer Prognosis

Neuroblastoma

Incidence:
• 1 per 100,000 in children < 15 yrs in US
• The most common solid tumor for children < 1 yr
• 7-10% of cancers of childhood

Survival:
• 75% under 1 year of age
• < 30% of children over 1 year old with advanced disease despite aggressive therapy

Known Prognostic Factors
• Age, Stage, Shimada Histology, MYCN amplification, Ploidy
• Other genes, such as TRKA, TRKB, hTERT, BCL2, FYN, CD44 and caspases

Neuroblastoma Prognosis

Age Stage

MYCN amplification Ploidy

Shimada Histology Risk

John Maris, Current Opinion in Pediatrics 2005, 17:7–13

Children’s Oncology Group (COG) Neuroblastoma Risk Stratification

• Low: 90% survival
• Intermediate: 70-90%
• High: 10-30%

Wei JS, Greer BT et al. Cancer Research Oct 1, 2004; 64(19)

42k cDNA Microarray

7 biological repeats

Survival Probability Curves of 49 Patients

Can ANNs predict survival status of each individual patient?

ANN Experimental Design using all clones

49 NB Samples
• 30 Alive (>3 yrs)
• 19 Deceased

Gene Expression Profiling
• 42,578 Clones
• 25,933 Unigenes

PCA & Train ANN

Output: 0=Alive, 1=Dead

Leave-one-out (Test)

ANN prediction (88%) of 49 patients using 38k High Quality Clones in a leave-one-out strategy

Plot: ANN output per patient (0=Alive, 1=Dead), grouped as Alive and Dead

ANNs (using all 38k genes) were able to predict outcome without prior knowledge of known risk factors
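The leave-one-out strategy mentioned above holds out each sample in turn, trains on the rest, and predicts the held-out sample. A minimal sketch follows. To keep it short and dependency-free, a nearest-centroid classifier is swapped in for the ANN; that substitution, the function names, and the toy data are assumptions for illustration only.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample.

    Stand-in for the ANN; not the classifier used in the study.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(samples, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy two-feature data: 0 = alive, 1 = dead (made-up values).
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(samples, labels)
```

Because the held-out sample never contributes to training, the resulting accuracy is an estimate of performance on unseen patients rather than a fit to the training data.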

Can we predict the outcomes with a small number of genes?

Test: 21 NB samples (16 Alive, 5 Deceased)

Clone Minimization (19 Genes)

PCA of NB tumor samples using 19 genes

Plot legend: Alive, Dead

How well do these top 19 genes perform for predicting the outcome?

ANNs were retrained on the 35 NB training samples using the expression profiles of the 19 genes, and outcome was predicted on the 21 test samples

ANN prediction (98%) using 19 top ranked genes

Samples labeled on the plot: St4-A-NB14, St2-NA-NB18

Plot: ANN output per patient (0=Alive, 1=Dead), grouped as Alive and Dead

Survival probability using ANN-ranked top 19 genes

Performance of ANN-19 genes (Train and Test)

Which patients will benefit the most from the 19-gene prediction models?

•Ultra High-Risk

•Good High-Risk


Survival probability of high-risk patients using the top 19 genes

Panels: All high-risk; Without MYCN-A

Patients with COG high-risk disease will benefit most from 19-gene prediction models

The expression of the 24 predictor clones (19 genes)

No single gene performs better than the 19 genes

Details of 19 predictor genes

• 12 known genes, 7 ESTs; 2 previously described (MYCN & CD44)

• 8 of the 12 known genes are neural-specific

• DLK1: human homologue of the Drosophila Delta gene
  - expressed by developing neuroblasts
  - activates the Notch signaling pathway, inhibits neuronal differentiation

• ARC, MYCN, and SLIT3 are also involved in neural development
  - higher expression in the poor-outcome tumors suggests a more aggressive, less differentiated phenotype, reminiscent of proliferating and migrating neural crest progenitors

• SLIT3, a neuronal axon repellent gene, was high; ROBO2, one of its receptors, was low in poor-risk tumors, suggesting these NB cells secrete a substrate to repel connecting axons and potentially prevent differentiation

• Three secreted proteins (DLK1, SLIT3, and PRSS3) in poor-risk tumors

Summary

1. Gene expression profiles contain prognostic information in the pre-treatment diagnostic samples for patients with NB

2. Identified the 19 prognosis-specific genes that performed best

3. Ability to further partition current COG high-risk patients into ultra-high-risk and survivor groups

Future Direction

1. Develop a multiplex PCR-based prognosis prediction assay / serum marker test

2. Validate in a larger cohort of patients: national trials

3. Biological studies of the role of selected genes in the tumorigenic process

4. Isotope-coded affinity tags (ICAT) analysis of ultra-high-risk tumor samples

5. New treatment trial for ultra-high-risk patients

Comparative Genomic Hybridization(CGH)

Known Genomic Alterations that Correlate with Poor Prognosis in NB

•MYCN amplification: 20-25%

•1p deletion: 30-36%

•17q gain: 70-80%

•11q deletion: 44%

Objectives

• Perform a systematic survey of genomic copy number alterations in Neuroblastoma

• Identify genomic alterations that correlate with stage and MYCN amplification

• Infer a model for tumor progression

Chen QR, Bilke S et al.

Preparation of genomic DNA

DNA labeling

Hybridization

Image analysis

Hybridization targets: Metaphase spread, BAC array, DNA array

Workflow shown: Cancer, Harvest DNA, Klenow Labeling, CGH

CGH platform     Resolution   Fold Sensitivity
Metaphase CGH    5-10 Mb      2
BAC              <1 Mb        2
cDNA             Gene         3-5

Neuroblastoma Sample and Dataset

• 12 neuroblastoma cell lines (MYCN-A & NA)

• 73 tumor samples
  - 20 Stage 1 tumor samples
  - 53 Stage 4 tumors
    - 15 Stage 4S
    - 20 Stage 4 without MYCN amplification (4-)
    - 18 Stage 4 with MYCN amplification (4+)

• 42k clone cDNA microarray

Sensitivity of cDNA A-CGH to Detect Single Copy Number Changes

0.5 (Expected) equal to 0.9 (Observed)

Control    X chr   Expected ratio
46XY       1       0.5
46XX       2       1.0 (Ref)
47XXX      3       1.5
48XXXX     4       2.0
49XXXXX    5       2.5
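The expected ratios in this table follow from dividing the number of X chromosomes in each control by the number in the reference. The short sketch below reproduces the arithmetic, assuming the reference carries two X copies (46XX), which is inferred from its ratio of 1.0; the function name is hypothetical.

```python
def expected_x_ratio(x_copies, reference_x_copies=2):
    """Expected test/reference ratio for X-linked clones.

    Assumes a reference with two X chromosomes (46XX).
    """
    return x_copies / reference_x_copies

# X chromosome counts per control karyotype, as listed in the table.
karyotypes = {"46XY": 1, "46XX": 2, "47XXX": 3, "48XXXX": 4, "49XXXXX": 5}
expected = {name: expected_x_ratio(n) for name, n in karyotypes.items()}
```

Comparing these expected values with the observed ratios on X-linked clones shows how strongly the platform compresses a single-copy change.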

Increased Sensitivity for Detecting Single Copy Changes using Topological t-statistics

Sample Ratio Data

Self-Self Ratio Data (x3)

Local genomic sequence mapping information + t-statistics

Bilke S, Chen QR, et al. Bioinformatics. 2004 Nov 11
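The general idea of combining genomic position with a t-statistic can be illustrated with a simplified stand-in: order clones by genomic position, then compare the sample ratios against self-self ratios within a sliding window of neighboring clones. This is not the published topological t-statistic; the Welch form, the window size, the function names, and the numbers are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def windowed_t(sample, self_self, window=5):
    """t-statistic for each sliding window of clones sorted by genomic position."""
    return [
        welch_t(sample[i:i + window], self_self[i:i + window])
        for i in range(len(sample) - window + 1)
    ]

# Hypothetical log-ratios for ten neighboring clones (made-up numbers).
# The last five clones of the sample carry a small, consistent gain.
self_self = [0.05, -0.04, 0.02, -0.03, 0.01, 0.04, -0.02, 0.03, -0.05, 0.02]
sample = [0.03, -0.02, 0.04, -0.01, 0.02, 0.45, 0.38, 0.52, 0.41, 0.47]
scores = windowed_t(sample, self_self)
```

Pooling neighboring clones is what allows a modest but consistent shift to stand out from the noise of individual clones: in this toy input the window covering the gained region scores far higher than the window over the unchanged region.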

Topological t-statistics of NB data set

Chen QR, Bilke S et al. BMC Genomics 2004, 5:70

Regions labeled on the plot: 1p36, 17

Staging of Neuroblastoma

Stage 1: Localized tumor with complete gross excision

Stage 2: Localized tumor with incomplete gross excision

Stage 3: Unresectable unilateral tumor infiltrating across the midline

Stage 4: Distant Spread +/- MYCN Amplification

Stage 4S: <1 yr of age. Localized primary with metastasis to skin, liver, and/or bone marrow (<10% infiltration). Survival >90%

Rajagopalan et al. Nature Reviews 2003, Vol. 9. 695-701

Least Aggressive Most Aggressive

Stage 4S? (>90% survival)

Stage 1 Stage 2 Stage 3 Stage 4 Stage 4-MYCN-A

Tumor Progression Models

3 Principles for building models:

• All the stages arise from a common ancestor

• All changes within a parent genotype must be present in the daughter (the inheritance signature). The daughter will acquire additional genomic changes.

• Unobserved intermediate genotypes are possible, but the model with the smallest number of genotypes is utilized.
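The inheritance principle can be expressed as a set relation: a genotype can be the parent of another only if its alterations form a proper subset of the daughter's. The sketch below checks that relation and derives the alterations shared by all genotypes as a candidate common ancestor. Stage names and alteration labels are hypothetical placeholders, not the study's data, and the search for a minimal set of unobserved intermediates is not implemented.

```python
def compatible_edges(genotypes):
    """List (parent, daughter) pairs that satisfy the inheritance signature.

    The parent's alterations must be a proper subset of the daughter's,
    i.e. the daughter carries everything the parent has plus more.
    """
    return [
        (p, d)
        for p in genotypes
        for d in genotypes
        if p != d and genotypes[p] < genotypes[d]
    ]

def common_ancestor(genotypes):
    """Alterations shared by every genotype (a candidate common ancestor)."""
    return frozenset.intersection(*genotypes.values())

# Hypothetical genotypes: each stage is a set of copy number alterations.
genotypes = {
    "stage_X": frozenset({"gain_A", "loss_B"}),
    "stage_Y": frozenset({"gain_A", "loss_C"}),
    "stage_Z": frozenset({"gain_A", "loss_C", "gain_D"}),
}
edges = compatible_edges(genotypes)
ancestor = common_ancestor(genotypes)
```

In this toy example stage_X and stage_Y each carry a private alteration, so neither can descend from the other; both must branch from an ancestor that carries only the shared change, whereas stage_Z can descend from stage_Y.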

Tumor Progression Model from Array-CGH Data

All Possible Progression Models for 3 Stages (Stage 1, Stage 4+, Stage 4-)

• “Linear Progression” is incompatible with our data
• Tumor type is determined early, when the set of genomic changes is acquired
• Unobserved stages may be neuroblastoma in situ (Beckwith, Perrin 1963)

Genomic progression model for all 4 stages of neuroblastoma

Stages: Stage 1, Stage 4S, Stage 4-, Stage 4+

Regions labeled in the model:
• 17q21-25
• 2p25-24, 1p36
• 7p15-14, 7p11-q11, 7q21-23, 3p21, 4p16, 11q13, 11q23, 14q11
• 11q14, 11q21-25
• 11q15, 11q12-13, 2q23, 2q35, 11p11, 17p11-q11

Summary

• cDNA-based array-CGH is an effective tool for genome-wide high-resolution DNA copy number measurements

• Possible to detect low copy number changes by using topological t-statistics

• Identified genomic alterations specific to stage and MYCN amplification

• Characteristic pattern of genomic imbalances allows the identification of a model of tumor progression for neuroblastoma

• These regions may harbor prognostic markers, tumor suppressor genes or oncogenes and potential drug targets.

Acknowledgements

Oncogenomics Section, Pediatric Oncology Branch, National Cancer Institute
• Jun Wei
• Braden Greer
• Sven Bilke
• Qingrong Chen
• Craig Whiteford
• Nicola Cenacchi
• Alexei Krasnoselsky
• Chang-Gue Son

GCRC, Germany
• Frank Westermann
• Frank Berthold
• Manfred Schwab

Cooperative Human Tissue Network/COG
• John Maris

The Children’s Hospital, Westmead, Australia
• Daniel Catchpoole

NHGRI
• Sean Davis

NCI
• Seth Steinberg

NICHD Brain and Tissue Bank

http://home.ccr.cancer.gov/oncology/oncogenomics/

• 150 Normal Samples
• 19 Different Organs
• 30 Individuals
• 18,927 Unique Genes
• Tool for identifying “cancer specific targets”
• Immune therapy, molecular targeted therapy
• Tool for identifying co-regulated genes

Genome Research March 2005

“... I seem to have been only like a boy playing on the seashore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.” Isaac Newton