Lecture II: Genomic Methods. Dennis P. Wall, PhD; Frederick G. Barr, MD, PhD; Deborah G.B. Leonard, MD, PhD. TRiG Curriculum: Lecture 2, March 2012


Page 1: Lecture II:  Genomic Methods

TRiG Curriculum: Lecture 2 1

Lecture II: Genomic Methods

Dennis P. Wall, PhD

Frederick G. Barr, MD, PhD

Deborah G.B. Leonard, MD, PhD

March 2012

Page 2: Lecture II:  Genomic Methods


Why Pathologists? We have access, we know testing

Personalized risk prediction, medication dosing, diagnosis/prognosis

Physician sends sample to Pathology (blood/tissue)

Pathologists

Access to patient’s genome

Just another laboratory test


Page 3: Lecture II:  Genomic Methods

The path to genomic medicine

Sample collection → Testing: sequencing, gene chips → Analysis

Pathologists: access to patient’s genome


Page 4: Lecture II:  Genomic Methods

What we will cover today:

• Types of genetic alterations
• Current and future molecular testing methods
  – Cytogenetics, in situ hybridization, PCR
  – Microarrays
    • Genotyping
    • Expression profiling
    • Copy number variation
  – Next generation sequencing (NGS)
    • Whole genome
    • Transcriptome


Page 5: Lecture II:  Genomic Methods

DNA alterations – the small stuff

• Point mutation: CCTGAGGAG → CCTGTGGAG
  Example: hemoglobin, beta – sickle cell disease
• Repeat alteration: TTCCAG…(CAG)5…CAGCAA → TTCCAG…(CAG)60…CAGCAA
  Example: huntingtin – Huntington disease
• Deletion/insertion: GAATTAAGAGAAGCA → GAAGCA
  Example: epidermal growth factor receptor – lung cancer
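The point-mutation and repeat examples above can be checked with a few lines of Python. This is a minimal sketch, not a clinical tool: the codon table covers only the three codons on this slide, and the `TTC`/`CAA` flanks around the CAG run are arbitrary stand-ins, not the real huntingtin sequence.

```python
# Illustrative sketch only: a three-codon table and made-up repeat flanks.

# Subset of the standard genetic code covering only the codons used here.
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def translate(seq: str) -> list[str]:
    """Translate an in-frame DNA string codon by codon (partial table)."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3)]


def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Largest number of consecutive, in-frame copies of `unit` in `seq`."""
    size = len(unit)
    best = 0
    for frame in range(size):  # the repeat may start at any offset
        run = 0
        for i in range(frame, len(seq) - size + 1, size):
            run = run + 1 if seq[i:i + size] == unit else 0
            best = max(best, run)
    return best


print(translate("CCTGAGGAG"))  # ['Pro', 'Glu', 'Glu']  (normal HBB stretch)
print(translate("CCTGTGGAG"))  # ['Pro', 'Val', 'Glu']  (sickle-cell stretch)
# Hypothetical alleles: arbitrary "TTC"/"CAA" flanks around the CAG run.
print(longest_repeat_run("TTC" + "CAG" * 5 + "CAA"))   # 5
print(longest_repeat_run("TTC" + "CAG" * 60 + "CAA"))  # 60
```

The single A→T change in the middle codon (GAG → GTG) turns glutamic acid into valine, the sickle-cell substitution. Scanning all three frames keeps the repeat counter from missing a run that starts at an offset.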


Page 6: Lecture II:  Genomic Methods

DNA alterations – the bigger stuff

• Deletion/insertion – example: 22q11.2 region, DiGeorge syndrome
• Amplification – example: 17q21.1 (ERBB2), breast cancer
• Translocation – example: t(11;22)(q24;q12), Ewing’s sarcoma
  [Figure: chromosomes 11 and 22 and the derivative chromosomes der(11) and der(22)]


Page 7: Lecture II:  Genomic Methods

Current strategies to detect DNA alterations

Cytogenetics: large indels, amplification, translocations

In situ hybridization: large indels, amplification, translocations

http://moon.ouhsc.edu

EGFR amplification in glioblastoma

t(6;15) in a woman with repeated abortions

http://www.indianmedguru.com


Page 8: Lecture II:  Genomic Methods

Current strategies to detect DNA alterations

PCR-based approaches: Mutations, small indels, repeat alterations, large indels, amplification, translocations

Alsmadi OA, et al. BMC Genomics. 2003; 4:21

Factor V Leiden mutation


Page 9: Lecture II:  Genomic Methods

What we will cover today:

• Types of genetic alterations
• Current and future molecular testing methods
  – Cytogenetics, in situ hybridization, PCR
  – Microarrays
    • Genotyping
    • Expression profiling
    • Copy number variation
  – Next generation sequencing (NGS)
    • Whole genome
    • Transcriptome


Page 10: Lecture II:  Genomic Methods

DNA microarray - the basics

• Purpose: multiple simultaneous measurements by hybridization of a labeled probe

• DNA elements may be oligonucleotides, cDNAs, or large-insert genomic clones

• Microarrays are generated by printing or synthesis


Page 11: Lecture II:  Genomic Methods


Microarray technologies


Page 12: Lecture II:  Genomic Methods

Organization of a DNA microarray

(adapted from Affymetrix; array shown as 1.28 cm × 1.28 cm)


Page 13: Lecture II:  Genomic Methods

Hybridization of a labeled probe to the microarray

(adapted from Affymetrix)


Page 14: Lecture II:  Genomic Methods

Detection of hybridization on microarray

Light from laser


(adapted from Affymetrix)


Page 15: Lecture II:  Genomic Methods

Hybridization intensities on DNA microarray following laser scanning


Page 16: Lecture II:  Genomic Methods

Overview of SNP array technology

LaFramboise T. Nucleic Acids Res. 2009; 37:4181


Page 17: Lecture II:  Genomic Methods

Microarray Applications

• DNA analysis
  – Polymorphism/mutation detection, e.g. disease susceptibility testing, drug efficacy/sensitivity testing
  – Copy number detection (comparative genomic hybridization), e.g. constitutional or cancer karyotyping
  – Bacterial DNA, e.g. identification and speciation
• RNA analysis
  – Expression profiling, e.g. breast cancer prognosis, cancer of unknown primary origin


Page 18: Lecture II:  Genomic Methods

Genome-wide association study of lung cancer: microarray with 317,139 SNPs

Hung RJ, et al. Nature. 2008; 452:633

Cases/controls from different populations


Page 19: Lecture II:  Genomic Methods

Genotype calling

• Hybridization intensities are translated into genotypes
• Large SNP numbers require an automated procedure
• Recent algorithms use clustering/pooling strategies:
  – Raw hybridization intensities are normalized
  – Information is combined across different samples at each SNP
  – Genotypes are assigned to entire clusters
  – For each sample, the probability of each of the three genotype calls is estimated at each SNP
  – A genotype is assigned based on a defined probability threshold
  – Missing genotypes depend on the algorithm and threshold used

Teo YY. Curr Opin Lipidol. 2008; 19:133
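The clustering-and-threshold steps above can be sketched in Python as follows. This assumes normalized allele A and B intensities per sample at a single SNP; the 1-D k-means, shared Gaussian spread (`sd`), equal cluster priors, and 0.95 threshold are illustrative assumptions, and published calling algorithms use richer models.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def theta(a: float, b: float) -> float:
    """Share of total signal from allele B: ~0 for AA, ~0.5 for AB, ~1 for BB."""
    return b / (a + b)


def fit_clusters(values, init=(0.1, 0.5, 0.9), iterations=10):
    """1-D k-means with three clusters, pooling every sample at one SNP."""
    centers = list(init)
    for _ in range(iterations):
        groups = [[], [], []]
        for v in values:
            nearest = min(range(3), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty (e.g. a rare allele).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def call_genotypes(intensities, sd=0.08, threshold=0.95):
    """Return one call per sample, or None (missing) below the threshold."""
    values = [theta(a, b) for a, b in intensities]
    centers = fit_clusters(values)
    calls = []
    for v in values:
        # Gaussian likelihood under each cluster, equal priors, shared sd.
        likes = [math.exp(-((v - c) ** 2) / (2 * sd ** 2)) for c in centers]
        total = sum(likes)
        probs = [x / total for x in likes]
        best = max(range(3), key=lambda k: probs[k])
        calls.append(GENOTYPES[best] if probs[best] >= threshold else None)
    return calls


samples = [(1000, 50), (950, 60), (500, 520), (480, 500),
           (40, 1000), (60, 980), (600, 230)]
print(call_genotypes(samples))                 # last sample is a no-call
print(call_genotypes(samples, threshold=0.9))  # lower threshold calls it AA
```

Lowering the threshold fills in the missing call, which illustrates why missing-genotype rates depend on both the algorithm and the threshold.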


Page 20: Lecture II:  Genomic Methods

Genotyping - Limitations & quality control

• Accuracy of algorithm
  – Depends on number of samples in each cluster
  – Prone to errors for small numbers of samples or SNPs with rare alleles

• High rates of missing genotypes:
  – Array problems: plating/synthesis issues
  – Poor-quality DNA: degradation
  – Hybridization failure
  – Differential performance between SNPs

• Excess heterozygosity: sample contamination?
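Two of these checks, missing genotypes and excess heterozygosity, are easy to compute per sample. A minimal sketch; the function name and both cutoffs (`min_call_rate`, `max_het`) are hypothetical and would have to be set from a laboratory's own validation data.

```python
def sample_qc(calls, min_call_rate=0.95, max_het=0.40):
    """Summarize one sample's genotype calls (None = missing).

    Returns (call_rate, heterozygosity, flags). Thresholds are illustrative.
    """
    called = [g for g in calls if g is not None]
    call_rate = len(called) / len(calls)
    # A call such as "AB" is heterozygous when its two alleles differ.
    het_rate = sum(g[0] != g[1] for g in called) / len(called) if called else 0.0
    flags = []
    if call_rate < min_call_rate:
        flags.append("high missing rate")
    if het_rate > max_het:
        flags.append("excess heterozygosity (possible contamination)")
    return call_rate, het_rate, flags


clean = ["AA"] * 12 + ["AB"] * 6 + ["BB"] * 2
suspect = ["AB"] * 14 + ["AA"] * 3 + [None] * 3
print(sample_qc(clean))    # (1.0, 0.3, [])
print(sample_qc(suspect))  # both flags raised
```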

Just another laboratory test


Page 21: Lecture II:  Genomic Methods

• Analyzed 8,101 genes on chip microarrays

• Reference: pooled cell lines

• Breast cancer subgroups

Perou CM, et al. Nature. 2000; 406:747


Page 22: Lecture II:  Genomic Methods

Original two-probe strategy for expression profiling on cDNA arrays

Duggan DJ, et al. Nature Genetics. 1999; 21:10


Page 23: Lecture II:  Genomic Methods

Expression profiling: challenges and limitations

Biological
• Dynamic and complex nature of gene expression
• Heterogeneous nature of tissue samples
• Variation in RNA quality

Technological
• Reproducibility across microarray platforms
• Selection of probes: dependence on binding efficiency
• Controlling for technical variability

Statistical/bioinformatic
• Adequate experimental design
• Normalization to remove variability among chips
• Multiple testing correction
• Validation of results

Just another laboratory test
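Multiple testing correction matters because thousands of genes are tested at once. One widely used option is the Benjamini–Hochberg false discovery rate procedure; the slide does not name a specific method, so this Python sketch is only an illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans, True where the hypothesis is rejected while
    controlling the false discovery rate at `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))  # only the first two tests are rejected
```

With ten tests, a Bonferroni cutoff of 0.05/10 = 0.005 would keep only the first one; the FDR approach is less conservative.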

Page 24: Lecture II:  Genomic Methods

Copy number variation: Comparative genomic hybridization

[Figure: conventional CGH (tumor DNA and reference DNA hybridized to metaphase chromosomes) compared with array-CGH (hybridized to arrayed DNAs); both reveal regions of gain and deletion]

http://www.advalytix.com/advalytix/hybridization_330.htm
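In array-CGH, each probe is summarized as a log2 ratio of tumor to reference signal: about 0 for equal copy number, above 0 for a gain, and below 0 for a deletion. A minimal sketch; the ±0.3 cutoffs and the median-centering step are illustrative assumptions, and real pipelines also smooth and segment neighboring probes.

```python
import math
import statistics


def call_copy_number(tumor, reference, gain=0.3, loss=-0.3):
    """Classify each probe from tumor/reference signal (illustrative cutoffs)."""
    ratios = [math.log2(t / r) for t, r in zip(tumor, reference)]
    # Median-center so that most probes (assumed two-copy) sit near log2 = 0.
    center = statistics.median(ratios)
    calls = []
    for value in ratios:
        value -= center
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("deletion")
        else:
            calls.append("normal")
    return calls


tumor = [1000, 1500, 500, 1010, 990]
reference = [1000, 1000, 1000, 1000, 1000]
print(call_copy_number(tumor, reference))
# ['normal', 'gain', 'deletion', 'normal', 'normal']
```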


Page 25: Lecture II:  Genomic Methods

Constitutional genomic imbalances detected by copy number arrays

10.9 Mb deletion at 7q11

7.2 Mb duplication on 11q

Miller DT, et al. Am J Hum Genet. 2010; 86:749

Page 26: Lecture II:  Genomic Methods

Copy number - Limitations & quality control

Artifacts may be caused by:
• GC content
  – Wavy patterns correlate with GC content
  – Algorithms have been developed to remove waviness
• DNA sample quantity and quality
  – Can affect the level of signal noise and the false positive rate
  – Whole genome amplification is associated with signal noise
• Sample composition
  – In cancer studies, normal cells dilute cancer aberrations
  – Tumor heterogeneity also affects copy number
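The dilution by normal cells can be quantified. Assuming a two-copy normal background and a tumor fraction (purity) p, a region with c copies in tumor cells has an expected log2 ratio of log2((p·c + (1 - p)·2) / 2). The sketch below evaluates that formula; it deliberately ignores tumor heterogeneity and changes in tumor ploidy.

```python
import math


def expected_log2_ratio(copies: float, purity: float, ploidy: int = 2) -> float:
    """Expected tumor/normal log2 ratio for a region with `copies` copies in
    tumor cells, when only a `purity` fraction of the sample is tumor."""
    mixed = purity * copies + (1 - purity) * ploidy  # average copies per cell
    return math.log2(mixed / ploidy)


# One-copy loss at decreasing tumor purity.
for purity in (1.0, 0.5, 0.2):
    print(purity, round(expected_log2_ratio(1, purity), 3))
# 1.0 -1.0
# 0.5 -0.415
# 0.2 -0.152
```

A one-copy loss gives -1.0 in a pure tumor but only about -0.15 when 20% of cells are tumor, which can be hard to distinguish from noise.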


Just another laboratory test


Page 27: Lecture II:  Genomic Methods

What we will cover today:

• Types of genetic alterations
• Current and future genetic test methods
  – Cytogenetics, in situ hybridization, PCR
  – Microarrays
    • Genotyping
    • Expression profiling
    • Copy number variation
  – Next generation sequencing (NGS)
    • Whole genome
    • Transcriptome


Page 28: Lecture II:  Genomic Methods


Cancer Treatment: NGS in AML

Welch JS, et al. JAMA. 2011; 305:1577


Page 29: Lecture II:  Genomic Methods


Case History

• 39-year-old woman with APML by morphology

• Cytogenetics and RT-PCR were unable to detect the PML-RAR fusion

• Clinical question: treat with ATRA, or proceed to allogeneic stem cell transplant?


Page 30: Lecture II:  Genomic Methods


Methods/Results

• Paired-end next-generation sequencing

• Result: a cytogenetically cryptic event creating a novel fusion protein

• Time to result: 7 weeks


Page 31: Lecture II:  Genomic Methods

A 77-kilobase segment from chromosome 15 was inserted en bloc into the second intron of the RARA gene on chromosome 17.


Page 32: Lecture II:  Genomic Methods


Workflow

• Raw data analysis: image processing and base calling
• Whole genome mapping: alignment to reference genome
• Variant calling: detection of genetic variation (SNPs, indels, insertions)
• Annotation: linking variants to biological information


Page 33: Lecture II:  Genomic Methods


Overview of Paired End Sequencing

[Figure: DNA is randomly sheared into short inserts, adapters are ligated, and fragments are annealed to a surface, then synthesized and sequenced]

Sequencing is done with labeled NTPs and is massively parallel
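Paired-end reads carry extra information: both mates come from the same short insert, so they should map to the same chromosome within a predictable distance. Pairs that break this expectation can flag structural events such as the cryptic insertion in the AML case. A minimal sketch; the `ReadPair` record, the 200–600 bp window, and the use of mate start positions as a proxy for insert size are hypothetical simplifications, and mate orientation is ignored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record: where each mate of a pair aligned."""
    name: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> str:
    """Label a pair, using mate distance as a rough proxy for insert size."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    distance = abs(pair.pos2 - pair.pos1)
    if not min_insert <= distance <= max_insert:
        return "abnormal distance"
    return "concordant"


def interchromosomal_support(pairs):
    """Count interchromosomal pairs per chromosome pair (order-independent)."""
    counts = Counter()
    for p in pairs:
        if classify_pair(p) == "interchromosomal":
            counts[tuple(sorted((p.chrom1, p.chrom2)))] += 1
    return counts


pairs = [
    ReadPair("r1", "chr17", 1000, "chr17", 1350),
    ReadPair("r2", "chr15", 5000, "chr17", 2000),
    ReadPair("r3", "chr17", 2100, "chr15", 5200),
    ReadPair("r4", "chr17", 1000, "chr17", 80000),
]
print([classify_pair(p) for p in pairs])
print(interchromosomal_support(pairs))  # Counter({('chr15', 'chr17'): 2})
```

Requiring several independent pairs to support the same chromosome pair helps separate real rearrangements from random mismapping.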


Page 34: Lecture II:  Genomic Methods


Short read output format

[Figure: example short-read record, labeled with its read ID, sequence, and quality line]
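Short-read output of this kind is commonly stored as FASTQ: an `@` line with the read ID, the sequence, a `+` separator line, and an ASCII-encoded quality line. A minimal parser sketch, assuming simple unwrapped 4-line records and Phred+33 encoding (some older Illumina pipelines used an offset of 64).

```python
def parse_fastq(text: str, offset: int = 33):
    """Parse 4-line FASTQ records into (read_id, sequence, qualities).

    Assumes Phred+33 encoding by default; pass offset=64 for older data.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("FASTQ text must contain complete 4-line records")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes one base's Phred score.
        records.append((header[1:], seq, [ord(ch) - offset for ch in qual]))
    return records


example = "@read1\nGATCAA\n+\nIIII#5\n"
print(parse_fastq(example))  # [('read1', 'GATCAA', [40, 40, 40, 40, 2, 20])]
```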


Page 35: Lecture II:  Genomic Methods


Quality control is critical

Just another laboratory test


Page 36: Lecture II:  Genomic Methods


Measuring Accuracy

• Phred is a program that assigns a quality score to each base in a sequence. These scores can be used to trim bad data from the ends of reads and to determine how good an overlap actually is.

• Phred scores are logarithmically related to the probability of an error: a score of 10 means a 10% error probability, 20 means a 1% chance, 30 means a 0.1% chance, and so on.

  – A score of 20 is generally considered the minimum acceptable score.
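The Phred relationship is Q = -10·log10(P), so P = 10^(-Q/10). A short sketch of the conversion and of simple end trimming; trimming both ends below Q20 is an illustrative choice matching the minimum score mentioned above, not a fixed rule.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def trim_ends(seq: str, quals: list[int], min_q: int = 20):
    """Drop bases from both ends while their quality is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


for q in (10, 20, 30):
    print(q, phred_to_error(q))
print(trim_ends("ACGTAC", [5, 30, 30, 30, 10, 2]))  # ('CGT', [30, 30, 30])
```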


Page 37: Lecture II:  Genomic Methods


Workflow

• Raw data analysis: image processing and base calling
• Whole genome mapping: alignment to reference genome
• Variant calling: detection of genetic variation (SNPs, indels, insertions)
• Annotation: linking variants to biological information


Page 38: Lecture II:  Genomic Methods


Alignment/Mapping

[Figure: overlapping short reads (e.g., GCGCCCTA, TCGGAAATTT, TTTGCGGTA) aligned to the reference …CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC…]

Read depth is critical for accurate reconstruction
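Why depth matters can be seen with a toy mapper. This brute-force sketch places each read at its lowest-mismatch ungapped position on the slide's reference fragment and counts coverage per base; real aligners use indexes (such as the Burrows-Wheeler transform) instead of scanning every offset. The three reads are taken from the figure, and `max_mismatches=2` is an arbitrary choice.

```python
def map_read(read: str, reference: str, max_mismatches: int = 2):
    """Brute-force ungapped mapping: best (offset, mismatches) or None."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


def read_depth(reads, reference: str, max_mismatches: int = 2):
    """Number of mapped reads covering each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        hit = map_read(read, reference, max_mismatches)
        if hit is None:
            continue  # unmapped reads contribute nothing
        offset, _ = hit
        for pos in range(offset, offset + len(read)):
            depth[pos] += 1
    return depth


reference = "CCATAGGCTATATGCGCCCTATCGGCAATTTGCGGTATAC"
reads = ["GCGCCCTA", "TCGGAAATTT", "TTTGCGGTA"]
print([map_read(r, reference) for r in reads])  # [(13, 0), (21, 1), (28, 0)]
print(read_depth(reads, reference))
```

With only three reads, most positions are covered once or not at all, so the single mismatch in `TCGGAAATTT` cannot be distinguished from a sequencing error; more depth is what makes that call possible.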


Page 39: Lecture II:  Genomic Methods


Alignment approaches

Illumina platform
• ELAND: vendor-provided aligner for Illumina data
• Bowtie: ultrafast, memory-efficient short-read aligner for Illumina data
• Novoalign: a sensitive aligner for Illumina data that uses the Needleman–Wunsch algorithm
• SOAP: short oligo analysis package for alignment of Illumina data
• MrFAST: a mapper that allows alignments to multiple locations for CNV detection

SOLiD platform
• Corona-lite: vendor-provided aligner for SOLiD data
• SHRiMP: efficient Smith–Waterman mapper with colorspace correction

454 platform
• Newbler: vendor-provided aligner and assembler for 454 data
• SSAHA2: SAM-friendly sequence search and alignment by hashing program
• BWA-SW: SAM-friendly Smith–Waterman implementation of BWA for long reads

Multi-platform
• BFAST: BLAT-like fast aligner for Illumina and SOLiD data
• BWA: Burrows-Wheeler aligner for Illumina, SOLiD, and 454 data
• Maq: a widely used mapping tool for Illumina and SOLiD; now superseded by BWA

Koboldt DC, et al. Brief Bioinform. 2010; 11:484
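Several aligners above build on Smith–Waterman local alignment. A compact sketch that returns the best local score plus a traceback, using a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) are illustrative, and production tools use affine gaps and heavy optimization.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell (start of the alignment).
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TGATCATA", "GATCAA"))  # (10, 'GATCA', 'GATCA')
```

Because negative running scores are reset to zero, the alignment is local: the mismatched trailing base of the read is simply left out rather than penalized.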


Page 40: Lecture II:  Genomic Methods


Short read alignment

• Given a reference and a set of reads, report at least one “good” local alignment for each read if one exists

• Approximate answer to the question: where in the genome did the read originate?

• What is “good”? For now, we concentrate on:

  – Fewer mismatches = better: GATCAA against …TGATCATA… is better than GAGAAT against …TGATCATA…

  – Failing to align a low-quality base is better than failing to align a high-quality base: GATcaT against …TGATATTA… is better than GTACAT against …TGATCATA… (lowercase letters mark low-quality bases)
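One way to encode both rules is to score each placement by the sum of base qualities at mismatched positions, so a mismatch at a low-quality base costs little. A minimal ungapped sketch; the quality values assigned to the slide's example reads (5 for lowercase bases, 40 otherwise) are assumptions.

```python
def quality_penalty(read: str, quals: list[int], window: str) -> int:
    """Sum of Phred qualities at positions where the read disagrees with the window."""
    return sum(q for base, q, ref in zip(read, quals, window) if base.upper() != ref)


def best_hit(read: str, quals: list[int], reference: str):
    """(offset, penalty) of the ungapped placement with the lowest penalty."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        penalty = quality_penalty(read, quals, window)
        if best is None or penalty < best[1]:
            best = (offset, penalty)
    return best


# Slide examples; assumed qualities: 5 for lowercase bases, 40 otherwise.
print(best_hit("GATcaT", [40, 40, 40, 5, 5, 40], "TGATATTA"))  # (1, 10)
print(best_hit("GTACAT", [40] * 6, "TGATCATA"))                # (1, 80)
```

Both reads have two mismatches at their best placement, but the first costs far less because its mismatches fall on low-quality bases.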


Page 41: Lecture II:  Genomic Methods


Post alignment: what do you get?

[Figure: alignment of reads, including read pairs, shown as a SAM file (read pair and CIGAR field labeled) and as simplified pileup output]

Li H, et al. Bioinformatics. 2009; 25:2078
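The CIGAR field records how a read aligns: each operation is a length plus a code (M alignment match or mismatch, I insertion, D deletion, S soft clip, and others defined in the SAM specification). This sketch parses a CIGAR string and computes how many reference and read bases it covers; the example string is made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a SAM CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # CIGAR unavailable
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"invalid CIGAR: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, =, X consume the reference."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar: str) -> int:
    """Read bases represented: M, I, S, =, X consume the query."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(parse_cigar("3S8M1I4M2D5M"))
print(reference_span("3S8M1I4M2D5M"), query_length("3S8M1I4M2D5M"))  # 19 21
```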


Page 42: Lecture II:  Genomic Methods


Workflow

• Raw data analysis: image processing and base calling
• Whole genome mapping: alignment to reference genome
• Variant calling: detection of genetic variation (SNPs, indels, insertions)
• Annotation: linking variants to biological information


Page 43: Lecture II:  Genomic Methods


Discovering Genetic Variation

[Figure: short reads aligned to the reference genome ATCCTGATTCGGTGAACGTTATCGACGATCCGATCGAACTGTCAGCGGCAAGCTGATCGATCGATCGATGCTAGTG, illustrating SNPs (reads carrying a base that differs from the reference) and INDELs (reads with a gap relative to the reference)]
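A pileup-based caller counts the bases observed at each reference position and reports positions where enough reads disagree with the reference. This naive sketch handles only ungapped, pre-aligned reads; the `min_depth` and `min_alt_fraction` cutoffs and the toy reads are hypothetical, and real callers also model base quality, mapping quality, and genotype likelihoods.

```python
from collections import Counter


def call_snps(reference: str, aligned_reads, min_depth: int = 4, min_alt_fraction: float = 0.25):
    """Naive pileup SNP caller for ungapped reads given as (offset, sequence)."""
    columns = [Counter() for _ in reference]
    for offset, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    calls = []
    for pos, (ref_base, counts) in enumerate(zip(reference, columns)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        alts = [(base, n) for base, n in counts.items() if base != ref_base]
        if not alts:
            continue
        alt, alt_count = max(alts, key=lambda item: item[1])
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt, alt_count, depth))
    return calls


reference = "ACGTACGT"
reads = [(0, "ACGTA"), (0, "ACTTA"), (1, "CTTAC"), (2, "GTACG"), (2, "TTACG")]
print(call_snps(reference, reads))  # [(2, 'G', 'T', 3, 5)]
```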


Page 44: Lecture II:  Genomic Methods


Page 45: Lecture II:  Genomic Methods


Page 46: Lecture II:  Genomic Methods


Workflow

• Raw data analysis: image processing and base calling
• Whole genome mapping: alignment to reference genome
• Variant calling: detection of genetic variation (SNPs, indels, insertions)
• Annotation: linking variants to biological information


Page 47: Lecture II:  Genomic Methods


Where to go to annotate genomic data and determine clinical relevance?

• Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim)

• International HapMap project (http://hapmap.ncbi.nlm.nih.gov)

• Human genome mutation database (http://www.hgvs.org/dblist/glsdb.html)

• PharmGKB (http://www.pharmgkb.org)

• Scientific literature


Page 48: Lecture II:  Genomic Methods


Case-control study design = variable results

• Need for a clinical-grade database:
  – Ease of use
  – Continually updated
  – Clinically relevant SNPs/variations

Ng PC, et al. Nature. 2009; 461:724


Page 49: Lecture II:  Genomic Methods


Cancer Treatment: NGS of Tumor

Jones SJM, et al. Genome Biol. 2010; 11:R82


Page 50: Lecture II:  Genomic Methods

Case History

• 78-year-old man
• Poorly differentiated papillary adenocarcinoma of the tongue
• Metastatic to lymph nodes
• Failed chemotherapy
• Decision to use next-generation sequencing methods


Page 51: Lecture II:  Genomic Methods


Workflow

• Raw data analysis: image processing and base calling
• Whole genome mapping: alignment to reference genome
• Variant calling: detection of genetic variation (SNPs, indels, insertions)
• Annotation: linking variants to biological information


Page 52: Lecture II:  Genomic Methods

Methods and Results

• Analysis
  – Whole genome
  – Transcriptome
• Findings
  – Upregulation of the RET oncogene
  – Downregulation of PTEN


Page 53: Lecture II:  Genomic Methods

Transcriptome and Whole-exome

• Transcriptome
  – Convert RNA to cDNA
  – Perform sequencing
  – Only expressed genes
  – Can get expression levels
• Whole-exome
  – Use selection procedure to enrich exons
  – No intron data
  – Results depend on selection procedure


Martin JA, Wang Z. Nat Rev Genet. 2011; 12:671.
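Transcriptome sequencing yields expression levels by counting reads per gene, then normalizing for gene length and sequencing depth. RPKM (reads per kilobase of transcript per million mapped reads) is one common normalization; the slide does not specify a method, and the gene names and counts below are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical counts from a run with 10 million mapped reads.
genes = {"GENE_A": (500, 2_000), "GENE_B": (500, 8_000)}
for name, (count, length) in genes.items():
    print(name, rpkm(count, length, 10_000_000))
# GENE_A 25.0
# GENE_B 6.25
```

Both genes receive the same number of reads, but the longer gene gets a lower expression estimate because each of its transcripts yields more fragments.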


Page 54: Lecture II:  Genomic Methods

A few words about samples…

• Formalin-fixed, paraffin-embedded tissue can be used for whole-exome or transcriptome sequencing

• Frozen tissue is needed for whole-genome sequencing
  – Better quality DNA

• Only a small quantity of DNA is needed
  – For whole-exome sequencing, the amount from a few slides


Page 55: Lecture II:  Genomic Methods

Summary

• Microarrays
  – SNPs
  – Expression profiling
  – Copy number variation

• Major steps in NGS
  – Base calling
  – Alignment
  – Variant calling
  – Annotation

• Technology will change, but it is just another test
  – Accuracy
  – Precision
  – Need to validate findings with traditional methods

Roychowdhury S, et al. Sci Transl Med. 2011; 3: 111ra121
