2/3/2015

HMGP 7620: Advanced Genome Analysis

Quantitative RNA Sequencing (RNA-seq) and Exome Analysis

Richard A. Radcliffe, Ph.D.
Professor of Pharmacology
School of Pharmacy, Department of Pharmaceutical Sciences
Room V20-3124
(303) 724-3362
[email protected]


Why RNA-seq?

Crick (1970) Nature 227:561-563

Phenotype

• Genetic architecture
• Developmental stage
• Environmental influences
• Tissue type
• Disease state


Why RNA-seq?

Massively parallel expression analysis

“Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease.”

• Catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs
• Determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications
• Quantify the changing expression levels of each transcript during development and under different conditions
• Pathway/network/ontology analysis

Wang et al. (2009) Nat Rev Genetics 10:57-63

RNA-seq Overview

• Select fraction of interest
• Library prep
• Sequence and map to reference genome
• Analysis (QC, quantitation, transcript annotation)

Adapted from: Pepke et al. (2009) Nat Methods 6:S22-S32


Library Prep

Corney (2013) Mater Methods 3:203

Library Prep: Some Considerations

• RNA fraction
  – Many different RNA species
  – Poly(A)
  – Size (<200 nt vs. >200 nt)
• Strandedness
• Read length
• Single- vs. paired-end
• Multiplexing


RNA Fraction

[Figure panels: Genomic Distribution (labels: Transcribed; Both strands transcribed); Total RNA Distribution. Values shown: ~80%, ~15%]

Mattick & Makunin (2006) Hum Mol Genet 1:R17-29
Genomes, 2nd Edition, Oxford: Wiley-Liss, 2002

Library Prep: Some Considerations

• RNA fraction
  – Many different RNA species
  – Poly(A)
  – Size (<200 nt vs. >200 nt)
• Strandedness
  – Overlapping transcripts
  – Annotation of novel transcripts
• Read length
• Single- vs. paired-end
• Multiplexing

Comment RR34 (Slide 7; Richard Radcliffe, 1/26/2015): The area of the box represents the genome. The area of the large green circle is equivalent to the documented extent of transcription, with the darker green area corresponding to that on both strands. CDSs are protein-coding sequences, and UTRs are 5′- and 3′-untranslated sequences in mRNAs. The dots indicate (and in fact overstate) the proportion of the genome occupied by known snoRNAs and miRNAs.


Strandedness

[Figure: overlapping genes Ncstn (-) and Copa (+), followed through transcription, library prep, and alignment]

• DS library prep: Which strand (gene) did the fragment come from?
• SS library prep: No question about which strand (gene) the fragment came from.
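The contrast between the two library preps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene coordinates are invented, `Gene` and `candidate_genes` are hypothetical names, and `read_strand` is assumed to already be the strand of the originating transcript (some stranded protocols report the opposite strand for a given read, which would need to be flipped first).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def candidate_genes(genes, read_start, read_end, read_strand=None):
    """Return the genes a read could have come from.

    read_strand=None models an unstranded (DS) library, where only
    position is known; "+" or "-" models a strand-specific (SS) library.
    """
    hits = []
    for gene in genes:
        overlaps = read_start <= gene.end and read_end >= gene.start
        if overlaps and (read_strand is None or read_strand == gene.strand):
            hits.append(gene.name)
    return hits


# Hypothetical coordinates for two genes overlapping on opposite strands.
genes = [Gene("Copa", 100, 500, "+"), Gene("Ncstn", 400, 900, "-")]

print(candidate_genes(genes, 420, 470))        # unstranded: ambiguous
print(candidate_genes(genes, 420, 470, "-"))   # stranded: one gene
```

With the unstranded call both genes are returned for a read in the overlap, while the stranded call narrows it to one.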


Read Length

• Read length is related to:
  – Sequencing accuracy: quality declines as a function of the length of a read
  – Mapping accuracy: the longer the read, the more accurately it maps
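One way to see why longer reads map more accurately is to estimate how often a read of a given length would match a genome purely by chance. The sketch below assumes a random genome with equal base frequencies and counts both strands; real genomes contain repeats and gene families, so this is an optimistic lower bound rather than a prediction. The function name and the 3 Gb genome size are illustrative choices.

```python
def expected_chance_matches(read_length, genome_size):
    """Expected number of exact chance matches for one read.

    Assumes a random genome with uniform base composition and counts
    both strands (hence the factor of 2).
    """
    return 2 * genome_size / 4 ** read_length


# Illustrative genome size of roughly 3 billion bases.
for length in (12, 16, 20, 25, 36):
    print(length, expected_chance_matches(length, 3_000_000_000))
```

The expected number of chance matches drops by a factor of four with every added base, which is the intuition behind the mapping-accuracy bullet.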


Single vs. Paired-end

Zhernakova et al. (2013) PLoS Genet e1003594


Mapping to the Reference Genome

@HWUSI-EA541_0032:1:2:0:325#0
CCATCTTTTTGATGTCCGCAATGATTT
+
WTORTSOQXTVVYXRXXXVPTXXXWUUL

Alignment

@HWUSI-EA541_0032:1:2:0:325#0 - chr7 13619194 CCATCTTT…

• Bowtie, BWA
• Computational considerations
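The read shown above is in FASTQ format: an identifier line, the sequence, a "+" separator, and one quality character per base. A minimal sketch of reading such a record and converting the quality characters to Phred scores follows. The record used here is made up, the function names are hypothetical, and the default offset of 64 is an assumption based on the older Illumina-style identifier on the slide; most current data uses an offset of 33.

```python
def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into (read_id, sequence, quality)."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality


def phred_scores(quality, offset=64):
    """Convert quality characters to Phred scores (offset is an assumption)."""
    return [ord(char) - offset for char in quality]


def error_probabilities(scores):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return [10 ** (-q / 10) for q in scores]


# Hypothetical record, not the one from the slide.
record = ["@read1", "ACGT", "+", "hhXT"]
read_id, sequence, quality = parse_fastq_record(record)
scores = phred_scores(quality)
print(read_id, sequence, scores, error_probabilities(scores))
```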


Mapping to the Genome: Some Considerations

• Non-unique reads
  – Gene families
  – Repeat sequences (simple repeats, transposons)
• Depth
  – Probability of representation & limits of detection
  – Transcript isoform quantification
  – Variant calling (SNPs, small indels)
• Reference genome effects

Non-unique Reads

[Figure: fraction of reads suppressed (%) and number of alignments (10^6), plotted against the number of multiple alignment reads allowed (bowtie option -m), from 10^0 to 10^5]
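The trade-off in that figure can be mimicked with a toy calculation. Bowtie's -m option suppresses all alignments for a read that has more than the given number of reportable alignments; the sketch below applies that rule to an invented list of per-read alignment counts. The function name, the input data, and the assumption that every alignment of a retained read is reported are illustrative, not taken from the lecture.

```python
def suppression_curve(alignment_counts, m_values):
    """For each -m style threshold, report the fraction of reads suppressed
    and the number of alignments kept.

    alignment_counts holds one entry per read: how many places it aligns.
    A read is suppressed when it has more than m alignments.
    """
    total_reads = len(alignment_counts)
    curve = {}
    for m in m_values:
        kept = [n for n in alignment_counts if n <= m]
        curve[m] = {
            "suppressed_fraction": (total_reads - len(kept)) / total_reads,
            "alignments_reported": sum(kept),
        }
    return curve


# Invented example: three unique reads, one gene-family read, two repeat reads.
counts = [1, 1, 1, 3, 12, 150]
for m, stats in suppression_curve(counts, [1, 10, 100, 1000]).items():
    print(m, stats)
```

Raising the threshold suppresses fewer reads but increases the number of alignments that downstream steps must handle.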


Non-unique Reads: Gene Families


Non-unique Reads: Repeats


Mapping to the Genome: Some Considerations

• Non-unique reads
  – Gene families
  – Repeat sequences (simple, SINEs, LINEs, etc.)
• Depth
  – Probability of representation & limits of detection
  – Transcript isoform quantification
  – Variant calling (SNPs, small indels)
• Reference genome effects
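The "probability of representation" bullet can be made concrete with a simple sampling model. The sketch below assumes reads are drawn independently and that the number of reads from a transcript follows a Poisson distribution whose mean is the transcript's share of the library times the total read count. That model, the function name, and the example numbers are assumptions for illustration; real libraries add biases that it ignores.

```python
import math


def detection_probability(transcript_fraction, total_reads, min_reads=1):
    """Probability of observing at least min_reads reads from a transcript.

    Assumes a Poisson model with mean = transcript_fraction * total_reads.
    """
    mean = transcript_fraction * total_reads
    p_fewer = sum(
        math.exp(-mean) * mean ** k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_fewer


# A transcript making up 1 in 10 million molecules, at increasing depths.
for depth in (1_000_000, 10_000_000, 50_000_000):
    print(depth, round(detection_probability(1e-7, depth), 3))
```

Requiring more than one read (`min_reads`) raises the depth needed, which is one way to think about a limit of detection.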

Depth: Transcript Quantification


Depth: Variant Calling
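A rough way to see why depth matters for variant calling is to ask how likely it is that a heterozygous site shows enough reads carrying the alternate allele. The sketch below uses a binomial model in which each read carries the alternate allele with probability 0.5; the function name, the minimum-read rule, and the 0.5 default are illustrative assumptions, and real callers also account for base and mapping quality.

```python
from math import comb


def het_detection_probability(depth, min_alt_reads, alt_fraction=0.5):
    """Probability of seeing at least min_alt_reads alternate-allele reads
    at a heterozygous site covered by `depth` reads (binomial model)."""
    p_fewer = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Chance of at least 3 alternate reads at several depths.
for depth in (4, 8, 15, 30):
    print(depth, round(het_detection_probability(depth, 3), 3))
```

In RNA-seq data the depth at a site depends on expression level, and allele-specific expression can push `alt_fraction` away from 0.5, so the same calculation can be rerun with a different value.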


Reference Genome Effects

[Figure labels: RNA-seq: ISS (ISS genome); RNA-seq: ISS (mm10 genome); ILS DNA sequencing; ISS DNA sequencing; Gene annotations]

Analysis

• QC
• Assembly/Quantification
  – Reads Per Kilobase Exon per Million Mapped Reads (RPKM)
• Differential expression
• Pathway/network functional analysis
• Annotation
  – Novel exons
  – Novel splice junctions
  – Novel genes


Quality Control

• Pre-library construction:
  – RNA quality
• Pre-alignment:
  – Per base quality
  – Per read quality
  – Nucleotide distribution per position
  – GC content
  – Sequence over-representation
• Post-alignment:
  – Mean coverage, 5′-3′ and 3′-5′
  – Ribosomal RNA contamination
  – Percent mapped reads
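Two of the pre-alignment checks listed above, nucleotide distribution per position and GC content, are simple enough to sketch directly. The code below is a toy version with hypothetical function names and made-up reads; dedicated QC tools compute these and the other metrics on full FASTQ files.

```python
from collections import Counter


def gc_content(sequence):
    """Fraction of G and C bases in a sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def nucleotide_distribution(reads):
    """Per-position base frequencies across a set of reads.

    Returns one dict per position mapping base -> fraction of reads.
    Reads shorter than a position are skipped at that position.
    """
    longest = max(len(read) for read in reads)
    distribution = []
    for i in range(longest):
        counts = Counter(read[i].upper() for read in reads if i < len(read))
        total = sum(counts.values())
        distribution.append({base: n / total for base, n in counts.items()})
    return distribution


# Made-up reads for illustration.
reads = ["ACGT", "AGGT", "ACGA"]
print([round(gc_content(r), 2) for r in reads])
print(nucleotide_distribution(reads))
```

A strongly skewed distribution at particular positions is the kind of pattern the "Nucleotide distribution per position" check is meant to flag.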

Quality Control: RNA Degradation

[Figure labels: 18S, 28S]


Quality Control

[Figure panels: Quality per position; Quality per read; Nucleotide distribution]


Assembly/Quantification: RPKM

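RPKM = C / (L × N), where C is the number of reads mapped to a gene's exons, L is the total exon length in kilobases, and N is the total number of mapped reads in millions.

[Figure value: 3.18]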
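As a small worked example of the RPKM definition (reads per kilobase of exon per million mapped reads), the function below takes the exon length in base pairs and the raw number of mapped reads and applies the unit conversions explicitly. The function name and the example numbers are illustrative and are not taken from the slide's figure.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon per Million mapped reads.

    read_count: reads mapped to the gene's exons (C)
    exon_length_bp: total exon length in base pairs (converted to kb for L)
    total_mapped_reads: all mapped reads in the sample (converted to millions for N)
    """
    length_kb = exon_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return read_count / (length_kb * mapped_millions)


# 500 reads on a gene with 2 kb of exon, in a library of 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))  # 12.5
```

Dividing by length and by library size is what lets values be compared across genes of different sizes and across samples sequenced to different depths.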


Differential Expression

[Figure: example gene Hddc3]
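The slide does not name a statistical method, so the following is only a generic illustration of the quantity that differential expression analyses usually start from: the log2 fold change between two groups. The function name, the pseudocount, and the expression values are assumptions; dedicated packages model count variability and test for significance, which this sketch does not attempt.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean(group_b) over mean(group_a), with a pseudocount
    added to both means to avoid dividing by zero."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Made-up normalized expression values for one gene in two groups.
control = [10.2, 11.5, 9.8]
treated = [41.0, 38.7, 44.1]
print(round(log2_fold_change(control, treated), 2))
```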


Pathway/Network Functional Analysis

Weighted Gene Co-expression Network Analysis (WGCNA)

Gene Ontology (GO) Cluster Analysis

Darlington et al. (2013) Genes Brain Behav 12:263-274
Bennett et al. (2015) Alcohol Clin Exp Res NIHMS658870


Annotation

Exome Sequencing

• Why
  – Identification of variants (SNPs, CNVs, small indels)
  – Linkage/association/pedigree studies
  – Clinical diagnostics
• How
  – Isolate, fragment DNA
  – Build library
  – Exome enrichment
  – Sequence
  – Align to reference genome
  – Variant calling
  – Higher order genetic analysis


Exome Enrichment

www.genomics.agilent.com

Variant Calling

Altmann et al. (2012) Hum Genetics 131:1541-1554

Comment RR1 (Slide 40; Radcliffe, Richard, 2/1/2015): Examples of intragenic deletion and duplication detected by WES and confirmed by exome aCGH. Each bar in the graphs (a)–(c) and (e)–(g) represents an exon. (a–c) WES data from a family trio in which the (a) proband has inherited a whole-gene duplication of KRT34 from the (b) father, whereas the (c) mother shows normal copy number at that gene. (e–g) WES data from a family trio in which the (e) proband has inherited a partial-gene heterozygous deletion in the SYCP2L gene from the (g) mother, whereas the (f) father shows normal copy number at those exons. Each dot in panels d and h represents an oligonucleotide probe in the gene of interest on the exome array, with a duplication shown by probes deviating to a positive log2 ratio (marked in red) and a deletion shown by probes deviating to a negative log2 ratio (marked in green). Panels d and h show confirmation of the KRT34 duplication and the SYCP2L deletion, respectively, by exome aCGH. aCGH, array comparative genomic hybridization; WES, whole-exome sequencing.
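The log2 ratio mentioned in this comment can be illustrated with a short calculation: a signal above the reference gives a positive ratio (duplication) and a signal below it gives a negative ratio (deletion). In the sketch below the function names, the example signal values, and the 0.3 cutoff are all hypothetical choices for illustration, not values from the cited study.

```python
import math


def log2_ratio(sample_signal, reference_signal):
    """log2 of sample over reference; 0 means equal copy number."""
    return math.log2(sample_signal / reference_signal)


def classify_copy_number(ratio, threshold=0.3):
    """Label a log2 ratio; the threshold is an arbitrary illustrative cutoff."""
    if ratio >= threshold:
        return "duplication"
    if ratio <= -threshold:
        return "deletion"
    return "normal"


# Hypothetical per-exon signals for a sample against a reference of 100.
for signal in (150, 100, 50):
    ratio = log2_ratio(signal, 100)
    print(signal, round(ratio, 2), classify_copy_number(ratio))
```

A heterozygous deletion halves the signal (ratio of -1), while a single-copy duplication raises it by half (ratio of about 0.58).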


Variant Calling: CNVs/Indels

[Figure panels: Child, Father, Mother]

Retterer et al. (2014) Genetics Med doi:10.1038/gim.2014

Genetic Analysis: Mendelian Inheritance

Assumptions:
• Only consider small indels and SNPs
• Causal variants are coding
• Causal variants alter protein sequence
• Near complete penetrance

Rabbani et al. (2012) J Hum Genetics 57:621-632
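The four assumptions above translate naturally into a variant filter. The sketch below is one hypothetical encoding: the variant records, field names, and consequence labels are invented, and near complete penetrance is modeled (under an assumed dominant pattern) as "present in every affected individual and absent from every unaffected one". It is meant to show the logic, not a real annotation pipeline.

```python
# Consequence labels treated as protein-altering; an illustrative set.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel"}


def passes_mendelian_filters(variant, affected, unaffected):
    """Apply the slide's assumptions to one variant record (a dict)."""
    if variant["type"] not in {"SNP", "indel"}:         # small indels and SNPs only
        return False
    if not variant["coding"]:                            # causal variants are coding
        return False
    if variant["consequence"] not in PROTEIN_ALTERING:   # must alter protein sequence
        return False
    carriers = variant["carriers"]
    # Near complete penetrance, assuming a dominant model.
    return affected <= carriers and not (unaffected & carriers)


# Hypothetical variants in a family where child and mother are affected.
variants = [
    {"id": "var1", "type": "SNP", "coding": True, "consequence": "missense",
     "carriers": {"child", "mother"}},
    {"id": "var2", "type": "SNP", "coding": True, "consequence": "synonymous",
     "carriers": {"child", "mother"}},
    {"id": "var3", "type": "CNV", "coding": True, "consequence": "frameshift",
     "carriers": {"child", "mother"}},
    {"id": "var4", "type": "indel", "coding": True, "consequence": "frameshift",
     "carriers": {"child", "mother", "father"}},
]
candidates = [
    v["id"] for v in variants
    if passes_mendelian_filters(v, affected={"child", "mother"}, unaffected={"father"})
]
print(candidates)  # ['var1']
```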


Genetic Analysis

Ku et al. (2012) Ann Neurology 71:5-14

A Few References

RNA-seq:
• Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJ, Tai IT, Marra MA (2010) Alternative expression analysis by RNA sequencing. Nat Methods 7:843-847.
• Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621-628.
• Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, Svenson KL, Keller MP, Attie AD, Hibbs MA, Graber JH, Chesler EJ, Churchill GA (2014) RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations. Genetics 198:59-73.
• Oshlack A, Robinson MD, Young MD (2010) From RNA-seq reads to differential expression results. Genome Biol 11:220.
• Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57-63.

Exome sequencing:
• Altmann A, Weber P, Bader D, Preuß M, Binder E, Müller-Myhsok B (2012) A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet 131:1541-1554.
• Biesecker LG, Green RC (2014) Diagnostic clinical genome and exome sequencing. The New England Journal of Medicine 370:2418-2425.
• Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, Quinlan AR, Nickerson DA, Eichler EE (2012) Copy number variation detection and genotyping from exome sequence data. Genome Res 22:1525-1532.
• Majewski J, Schwartzentruber J, Lalonde E, Montpetit A, Jabado N (2011) What can exome sequencing do for you? Journal of Medical Genetics 48:580-589.
• Singleton AB (2011) Exome sequencing: a transformative technology. The Lancet Neurology 10:942-946.