RNA-Seq as a Discovery Tool Julia Salzman

Page 1: RNA-Seq as a Discovery Tool Julia Salzman. Deciphering the Genome

RNA-Seq as a Discovery Tool

Julia Salzman


Deciphering the Genome


Salzman, Gawad, Wang, Lacayo, Brown, 2012

Power of RNA-Seq: Quantification and Discovery

• RNA isoform-specific gene expression

• Gene fusions

• Overlooked RNA structural variants


Paired-end RNA-Seq

Matched sequences are obtained for each library molecule

CTTC…..GAAG GGAC…..GCCT

Data: millions of 70-150+ bp A/C/G/T sequences


• Part 1: Isoform-Specific Expression


Example: Paired-end Data Aligned

Some reads are informative about isoform-specific expression


Paired-end RNA-Seq for RNA Isoform-Specific Gene Expression

• Since the size distribution of library molecules is known, inferred insert lengths can be used to increase statistical power and improve inference

Rnpep

Goal: estimate the expression of each isoform

Nontrivial: we only observe fragments of sequences

Exon 4 Exon 1


Insert Length Distributions

Insert lengths of the entire library (pooled) can be calculated and used to precisely estimate the distribution of sizes of cDNA in the library.

[Figure: distribution of sequenced molecule length; x-axis: base pairs (100, 200, 300)]
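
As a minimal sketch of the pooling step, the inferred insert lengths from the whole library can be turned into an empirical distribution. The function name and the example lengths are hypothetical, and the sketch assumes the insert lengths have already been inferred from read pairs whose placement is unambiguous.

```python
from collections import Counter


def insert_length_pmf(insert_lengths):
    """Empirical probability mass function of pooled insert lengths (bp)."""
    counts = Counter(insert_lengths)
    total = sum(counts.values())
    # Probability of each observed length = its share of all pooled pairs.
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical inferred insert lengths pooled over the entire library.
pooled = [150, 150, 160, 200, 200, 200, 250, 300]
pmf = insert_length_pmf(pooled)
```

With real data the list would hold millions of lengths, and the resulting distribution is what later steps use to score how plausible a given implied insert length is.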


Paired-end RNA-Seq Model

• Compute genome-wide insert length distribution

• Mapped to Isoform 1: length 150

• Mapped to Isoform 2: length 90

[Figure: distribution of sequenced molecule length; x-axis: base pairs (100, 200, 300)]

Salzman, Jiang, Wong 2011
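
One way to see why the implied insert length is informative: a read pair that implies a 150 bp insert under isoform 1 but a 90 bp insert under isoform 2 is far more likely to come from isoform 1 if 90 bp inserts are rare in the library. The sketch below is a simplified illustration, not the model of the cited paper: it runs a plain EM over isoform fractions, ignores transcript-length normalization, and all names and numbers are assumptions.

```python
def pair_likelihoods(implied_lengths, pmf, floor=1e-12):
    """P(pair | isoform) taken as the pmf of the insert length implied by that isoform.

    `None` marks an isoform the pair cannot map to (likelihood 0).
    """
    return [
        pmf.get(length, 0.0) + floor if length is not None else 0.0
        for length in implied_lengths
    ]


def em_isoform_fractions(pairs, pmf, n_isoforms, n_iter=200):
    """Estimate the fraction of read pairs originating from each isoform."""
    theta = [1.0 / n_isoforms] * n_isoforms
    liks = [pair_likelihoods(p, pmf) for p in pairs]
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        for lik in liks:
            weights = [t * l for t, l in zip(theta, lik)]
            total = sum(weights)
            if total == 0.0:
                continue  # pair is incompatible with every isoform
            for k, w in enumerate(weights):
                expected[k] += w / total  # E-step: posterior assignment
        norm = sum(expected)
        if norm == 0.0:
            break
        theta = [e / norm for e in expected]  # M-step: renormalize
    return theta


# Hypothetical library-wide insert length distribution.
pmf = {90: 0.05, 150: 0.60, 200: 0.35}
# Each pair lists the insert length implied by (isoform 1, isoform 2).
pairs = [(150, 90)] * 6 + [(None, 200)] * 4
theta = em_isoform_fractions(pairs, pmf, n_isoforms=2)
```

The six ambiguous pairs are assigned mostly to isoform 1 because a 150 bp insert is much more probable than a 90 bp one under this hypothetical distribution.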


Using PE for quantification is statistically more powerful

• PE model is a statistical improvement over naïve models and has optimal information reduction

• “Information” gain using PE Sequencing

• Overall, using “mate pair” information gives more power, but experimental artifacts can sometimes affect results


Paired-end Size Distributions are the Foundation for Tophat and other PE RNA-Seq Algorithms

Summary and Problems:

• rely on a reference

• assume uniformity of size distributions in library

• overlook biases

[Figure: panels labeled Rep. 1 and Rep. 2]


• Part 2: Gene Fusions


Recurrent Gene Fusions in Cancer

A handful of recurrent fusions in solid tumors

• PAX8-PPARγ fusion (thyroid cancer)

• EML4-ALK fusion (non-small cell lung cancer)

• TMPRSS2-ERG family fusion (prostate cancer)

More to be learned by unbiased study of RNA

Not Genome-wide


Fusion Discovery

• 2 flavors

– Totally “de novo” discovery

• Search for any RNA fragments out of order with respect to the reference genome, not necessarily coinciding with exon boundaries

• Noisy

– Discovery with a reference database

• Discover fusions at annotated exon boundaries (protein coding) and better statistical checks

• Misses some fusions


Reference Approach

• Search for gene fusions with exon A in gene 1 spliced to exon B of gene 2

Exon A Exon B


Algorithm (with respect to reference)

• Remove all PE reads consistent with the reference

• Identify gene pairs with PE reads where (read1, read2) map to (gene1, gene2)

• Find PE reads of the form: (gene A, gene A-B junction)
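
The three steps above can be sketched as a simple filter over read pairs. This is an illustration under stated assumptions rather than the actual algorithm: each mate's alignment is represented as a tuple of gene names, where a one-element tuple means the mate lies within that gene and a two-element tuple means it aligned to an entry in a hypothetical database of cross-gene exon-exon junctions. The function name and thresholds are also hypothetical.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_spanning=1, min_junction=1):
    """Return {(geneA, geneB): (spanning_pairs, junction_pairs)} for candidate fusions."""
    spanning = Counter()
    junction = Counter()
    for mate1, mate2 in read_pairs:
        if len(mate1) == 1 and len(mate2) == 1:
            if mate1 == mate2:
                continue  # Step 1: pair is consistent with the reference
            # Step 2: mates map to two different genes
            spanning[frozenset((mate1[0], mate2[0]))] += 1
        else:
            # Step 3: one mate inside a gene, the other on an A-B junction
            for anchor, junc in ((mate1, mate2), (mate2, mate1)):
                if (len(anchor) == 1 and len(junc) == 2
                        and junc[0] != junc[1] and anchor[0] in junc):
                    junction[frozenset(junc)] += 1
    return {
        tuple(sorted(pair)): (spanning[pair], junction[pair])
        for pair in spanning
        if spanning[pair] >= min_spanning and junction[pair] >= min_junction
    }


# Hypothetical alignments for four read pairs.
read_pairs = [
    (("geneA",), ("geneA",)),          # consistent with reference, removed
    (("geneA",), ("geneB",)),          # spans geneA and geneB
    (("geneA",), ("geneA", "geneB")),  # (gene A, gene A-B junction)
    (("geneC",), ("geneD",)),          # spanning support only
]
candidates = fusion_candidates(read_pairs)
```

Requiring both spanning and junction support is a design choice in this sketch; in practice the thresholds would be set much higher and followed by statistical checks.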

Exon A Exon B


Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors

• Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors

– genome-wide search

• Top hit of our algorithm: ESRRA-C11orf20

ESRRA

Fusion

C11orf20

• Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong)

Salzman et al, 2011


• Part 3: Exploratory Analysis of RNA Rearrangements


Bioinformatic Analysis

• Thousands of exon scrambling events in RNA from human leukocytes and cancer samples

Wildtype genome: DNA

Canonical transcript

Inconsistent with the reference genome!


Potential Biological Mechanisms for RNA Rearrangements

DNA Rearrangement

RNA rearrangement

Trans-splicing

Template switching

PCR artifact


Analysis of Leukocyte Data

• Exons in ‘scrambled’ (non-increasing) order with respect to canonical exon order

• Thousands of genes with evidence of exon scrambling

• Naïve estimate of fractional abundance: ratio of scrambled read rate to all read rate (per transcript)
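
A rough sketch of the naïve estimate, assuming junction-spanning read counts are used as the "read rate" and exons are numbered in canonical order. The function names and counts are hypothetical.

```python
def is_scrambled(upstream_exon, downstream_exon):
    """True when a junction joins exons in non-increasing canonical order.

    `upstream_exon` is the exon contributing its 3' end to the junction,
    `downstream_exon` the exon contributing its 5' end.
    """
    return downstream_exon <= upstream_exon


def scrambled_fraction(junctions):
    """Fraction of junction reads in scrambled order for one transcript.

    `junctions` is a list of ((upstream_exon, downstream_exon), read_count).
    """
    scrambled = sum(n for (i, j), n in junctions if is_scrambled(i, j))
    total = sum(n for _, n in junctions)
    return scrambled / total if total else float("nan")


# Hypothetical junction read counts for one transcript.
junctions = [((1, 2), 30), ((2, 3), 25), ((3, 2), 45)]
fraction = scrambled_fraction(junctions)
```

Here the exon 3 to exon 2 junction is the only scrambled one, so the estimate is the share of junction reads that support it.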


100s of Transcripts with High Fractions of Scrambled Isoforms

Canonical Isoform

Scrambled Isoform

< 25%

> 75%

100s of genes

100s of transcripts from B cells, stem cells and neutrophils have >50% of copies from the scrambled isoform


What Models Can Explain Exon Scrambling in RNA?


Model 1 to Explain RNA Exon Scrambling


Model 1 Prediction

Can be made statistically precise

Model 1 is statistically inconsistent with vast majority of data


Alternative Model

Model and data are consistent


Mining RNA-Seq Data for Evidence Consistent with Circular RNA?

• In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA)

• In poly-A selected samples, expect to see little evidence of scrambled exons (circular RNA)


Poly-A Depleted Samples Enriched for Scrambled Exons

Align all reads to a custom database
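
The slides do not say what the custom database contains. One plausible construction, shown here purely as an assumption, is a set of junction sequences for every pair of exons within a gene joined in non-increasing order, so that reads spanning a scrambled junction have something to align to. The function name, the `flank` parameter, and the toy sequences are hypothetical.

```python
def scrambled_junction_db(exons, flank=4):
    """Build junction sequences for exon pairs in non-increasing order.

    `exons` lists the exon sequences of one gene in canonical order.
    Each entry joins the last `flank` bases of the upstream exon to the
    first `flank` bases of an exon at the same or an earlier position.
    """
    db = {}
    for i, upstream in enumerate(exons):
        for j in range(i + 1):  # j <= i: non-increasing exon order
            downstream = exons[j]
            db[f"exon{i + 1}_exon{j + 1}"] = upstream[-flank:] + downstream[:flank]
    return db


# Toy exon sequences for a hypothetical three-exon gene.
exons = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
db = scrambled_junction_db(exons)
```

With real reads the flank would be chosen relative to the read length so that a read must cross the junction to align, and the database would be built for all annotated genes.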


Summary of RNA-Seq for NGS

• RNA-Seq can be used for discovery

• Tophat and other fusion/splicing algorithms give a broad picture

• May have significant noise

• Miss important features of RNA expression


Currently, all published/downloadable algorithms will miss identifying circular RNA! (Feel free to contact me for the algorithm to identify circular RNA!)