
RNA-seq analysis: From reads to differential expression

Alicia Oshlack, Murdoch Childrens Research Institute

@AliciaOshlack

The current MCRI bioinformatics team
•  Dr Nadia Davidson
•  Dr Anthony Hawkins
•  Dr Jovana Maksimovic
•  Dr Katrina Bell
•  Dr Belinda Phipson
•  Dr Simon Sadedin
•  Harriet Dashnow
•  Luke Zappia
•  Rebecca Evans

PhD positions available

MCRI Bioinformatics: Collaborative analysis and methods development

Transcriptomics
•  RNA-seq
•  miRNA-seq
•  Differential expression
•  Cancer
•  Non-model organisms
•  Alternative splicing
•  Single cell RNA-seq

Epigenomics
•  Methylation arrays
•  Bisulfite sequencing
•  Histone modifications
•  ChIP-seq

Genomics
•  Exome analysis
•  WGS
•  Targeted capture
•  SNPs
•  CNV
•  Tandem repeats

Clinical Genomics
•  Clinical exomes
•  Tumour RNA-seq

RNA-seq analysis

Genes and transcripts

[Figure: a gene and its transcript. Slide from Alicia Oshlack]

RNA-seq

[Figure from Pepke et al, Nature Methods, 2009]

Two ways to look at sequencing data

Sequence of (mapped) read
•  Genome sequencing
•  Variant detection
•  Genomic rearrangements
•  Bisulfite-seq (methylation)
•  RNA editing etc.

Position of mapped read
•  RNA-seq
•  ChIP-seq
•  MeDIP-seq for DNA methylation etc.

Two ways to look at RNA-seq data

Sequence of (mapped) read
•  Assembly
•  Determining genes/transcripts

Position of mapped read
•  Expression levels
•  Differential expression

Raw data (fastq files)

•  Short sequence reads
•  Quality scores

@HWI-ST1148:308:C694RACXX:5:1101:1768:1990 1:N:0:CGTACG
NTAGGCCTTGGCAGTTTTGGAGAATCACTGCTGCCAAAGAGTCTACTTGG
+
#0<FFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFIIIIIII
@HWI-ST1148:308:C694RACXX:5:1101:3409:1990 1:N:0:CGTACG
NAGTTACCCTAGGGATAACAGCGCAATCCTATTCTAGAGTCCATATCAAC
+
#000BFBFFFFFFF<BFFFFBBBBBFBBFF<<FBFFIBFFFBFFFIIBFF

50 bp sequence
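To make the record layout concrete, here is a minimal Python sketch of reading FASTQ text as four lines per read (header, bases, "+", qualities) and decoding the quality characters. It assumes the Phred+33 encoding, which is not stated on the slide; the function and class names are illustrative only. The example input is the first read above, truncated to its first 10 bases to keep it short.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]  # one Phred score per base


def parse_fastq(text: str, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as four lines per read.

    offset=33 assumes Phred+33 quality encoding (an assumption, not
    something the slide states).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


# First read from the slide, truncated to its first 10 bases.
example = """@HWI-ST1148:308:C694RACXX:5:1101:1768:1990 1:N:0:CGTACG
NTAGGCCTTG
+
#0<FFFFFFF
"""

records = list(parse_fastq(example))
for rec in records:
    print(rec.sequence, rec.qualities)
```

Under this encoding the leading "#" decodes to a Phred score of 2, which fits the uncalled first base "N".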

Differential expression
•  Which genes are changing expression level between samples?
   –  treated vs untreated,
   –  disease vs non disease,
   –  early vs late time points,
   –  cells in one environment vs another environment,
   –  etc, …
•  Thousands of genes but only a few samples
   –  Sophisticated statistical methods are required for analysis

RNA-seq analysis steps

1.  Raw sequence reads
2.  Map onto genome
3.  Summarize reads to transcripts
4.  Statistical testing: Determine differentially expressed genes
5.  Systems biology

Which transcriptome? De novo assembly, annotation based, or genome guided assembly


Mapping reads to the genome
•  Where do the millions of short sequences come from in the genome?
•  Sequencing transcripts, not the genome

[Figure: gene and transcript with CDS blocks; Exon 1 and Exon 2]

Lots of good aligners handle splice junctions well (e.g. TopHat, STAR)

RNA-seq data in IGV

Which transcriptome to use?


Option 1

•  Use annotation (known genes)
   –  Works well for well studied organisms (human, mouse, Arabidopsis, Drosophila, …)
   –  Only as good as your annotation
   –  No novel transcripts are analysed

Option 2: Genome guided transcript assembly

•  Uses the location and density of reads along the genome to assemble transcripts
•  E.g. Cufflinks
•  Can’t assemble across breaks in the genome
   –  Cancer, poor genomes

Option 3: De novo transcriptome assembly

•  Assemble transcripts from the data without using a reference genome
•  “Harder” than genome assembly
   –  Orders of magnitude variation in coverage
   –  Contigs are short
   –  Alternative isoforms/transcripts have overlapping sequences
   –  *Very* computationally intensive
•  Software includes
   –  Trinity
   –  Oases (Velvet)
   –  Trans-ABySS
   –  …

Example: Annotating the chicken W chromosome

[Figure: ZZ = male, ZW = female]

Two hypotheses for mechanisms of avian sex determination:
1.  Dominant ovary determining gene on W (cf mammals)
2.  Dosage of Z-linked genes

There is an annotated chicken genome

•  Chicken W chromosome is poorly assembled
•  Are genes on other chromosomes really on the W, in particular the random chromosome?

Chromosome   Assembled size (Mb)   Size inc. random (Mb)   Estimated size (Mb)   Estimated genes (Ensembl)
Z            69                    70                      80                    796
W            0.24                  0.89                    18-54                 46
Un_random    56                    -                       -                     1287

Experimental design

[Figure: PCR sexing (hand plate for PCR sexing) of +12 hour blastoderms and of Stage 26 paired gonads (day 4.5); RNA from pooled samples. Pool labels, each appearing twice: 12 female, 12 male, 16 female gonads, 16 male gonads]

RNA-seq
•  Illumina HiSeq 2000
•  Paired-end 100 bp
•  4 lanes
•  >80 million reads/sample

Defining the transcriptome

•  Annotation: ~20,000 genes
•  Genome guided assembly (Cufflinks): ~45,000 genes
•  De novo transcriptome assembly: ~2.5 million transcripts (Abyss with filtering)!

A combined approach
•  Assemble Cufflinks genes using transcripts from our de novo assembly

Annotation of the chicken W combined all three approaches

[Figure: blastoderm and gonad read coverage against base position, with Abyss, Cufflinks and Ensembl transcript tracks aligned to the genome, for RASA1-W, ST8SIA3-W, GOLPH3-W, ZSWIM6-W and NEDD4-like-W. Legend: W/W_random chromosome, Un_random chromosome, autosomes]

Full list of W genes/transcripts for differential expression (Ayers et al, 2013)


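Summarization

Take your “transcriptome” and add up the reads

Counting over exons vs counting over genes

[Figure: reads aligned over Exon 1 and Exon 2]

Exon 1 = 8 reads
Exon 2 = 10 reads

Counting over whole gene (Exon 1 + Exon 2) = 15 (a read that overlaps both exons is counted once for the gene, so the gene total is less than 8 + 10)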
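The difference between exon-level and gene-level counting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular counting tool: reads are represented as lists of aligned blocks, the coordinates are invented, and the split into 5 exon-1-only, 7 exon-2-only and 3 junction reads is chosen to reproduce the slide's totals (the slide itself only implies that 8 + 10 - 15 = 3 reads are shared).

```python
from __future__ import annotations

Interval = tuple[int, int]  # half-open (start, end), hypothetical coordinates


def blocks_overlap(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def read_hits(read: list[Interval], feature: Interval) -> bool:
    """A (possibly spliced) read hits a feature if any aligned block overlaps it."""
    return any(blocks_overlap(block, feature) for block in read)


def summarize(reads: list[list[Interval]], exons: dict[str, Interval]):
    # Exon-level: a junction read contributes to every exon it touches.
    per_exon = {name: sum(read_hits(r, iv) for r in reads)
                for name, iv in exons.items()}
    # Gene-level: each read is counted once, however many exons it touches.
    per_gene = sum(any(read_hits(r, iv) for iv in exons.values()) for r in reads)
    return per_exon, per_gene


exons = {"Exon1": (100, 200), "Exon2": (500, 600)}
reads = (
    [[(120, 170)]] * 5                  # inside Exon1 only
    + [[(520, 570)]] * 7                # inside Exon2 only
    + [[(180, 200), (500, 530)]] * 3    # spliced across the junction
)

per_exon, per_gene = summarize(reads, exons)
print(per_exon, per_gene)  # {'Exon1': 8, 'Exon2': 10} 15
```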

Non-model organisms: De novo transcriptome assembly

Corset (our method)

1.  Cluster assembled contigs into “genes” (independent of the assembler)
2.  Perform read counting per cluster

Davidson & Oshlack, Genome Biology, 2014

Summarization turns mapped reads into a table of counts

**very high dimensional data**

Tag ID            A1     B1
ENSG00000124208   478    4830
ENSG00000182463   27     48
ENSG00000125835   132    560
ENSG00000125834   42     131
ENSG00000197818   21     52
ENSG00000125831   0      0
ENSG00000215443   4      9
ENSG00000222008   30     0
ENSG00000101444   46     54
ENSG00000101333   2256   2702
…                 …      …
(tens of thousands more tags)


Which genes are differentially expressed?

•  Like any other experiment, RNA-seq needs to be replicated so we can get a measure of variance

Replication is essential

Tag ID            A1     A2     A3     B1     B2     B3
ENSG00000124208   478    619    559    4830   7165   6651
ENSG00000182463   27     20     18     48     55     56
ENSG00000125835   132    290    450    560    408    266
ENSG00000125834   320    462    355    131    99     91
ENSG00000197818   21     29     23     52     44     65
ENSG00000125831   0      0      0      0      0      0
ENSG00000215443   4      4      4      9      7      3
ENSG00000222008   30     23     23     0      0      0
ENSG00000101444   46     63     55     54     53     52
ENSG00000101333   2256   2793   2931   2702   2976   2226
…                 …      …      …      …      …      …
(tens of thousands more tags)

**very high dimensional data**

Quality control: check your data!

[Figure: data exploration of sorted cell populations. Data from Andrew Elefanty]

Statistical tests for differential expression


Things to think about before statistical testing

•  Normalisation
   –  Library size (sequencing depth)
      •  Include as offset in GLM
      •  Scaling normalisation (size factors)
   –  Composition bias (TMM)
   –  Batch effects (RUVSeq)
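As a rough illustration of the library-size item above, the Python sketch below scales raw counts to counts per million (CPM) and computes the log library sizes that would serve as offsets in a GLM. It covers only sequencing depth; it does not implement TMM or batch correction. The counts are four genes taken from the A1 and B1 columns of the count table, so the "library sizes" here are sums over those four genes only, purely for illustration. Function names are hypothetical.

```python
import math

# Four genes from the count table (columns A1 and B1); a real library size
# would sum over all tags, not just these four.
counts = {
    "ENSG00000124208": {"A1": 478, "B1": 4830},
    "ENSG00000182463": {"A1": 27, "B1": 48},
    "ENSG00000101444": {"A1": 46, "B1": 54},
    "ENSG00000101333": {"A1": 2256, "B1": 2702},
}


def library_sizes(table):
    """Total count per sample (sequencing depth)."""
    samples = next(iter(table.values())).keys()
    return {s: sum(row[s] for row in table.values()) for s in samples}


def counts_per_million(table):
    """Scale each sample so that its counts sum to one million."""
    sizes = library_sizes(table)
    return {gene: {s: 1e6 * n / sizes[s] for s, n in row.items()}
            for gene, row in table.items()}


sizes = library_sizes(counts)
# In a log-link GLM, log(library size) enters as an offset rather than
# being divided out of the counts.
offsets = {s: math.log(n) for s, n in sizes.items()}
cpm = counts_per_million(counts)

for gene, row in cpm.items():
    print(gene, {s: round(v) for s, v in row.items()})
```

Dividing by depth makes samples comparable only if most of the RNA population is shared. When a few highly expressed genes dominate one sample, the remaining genes look artificially low, which is the composition bias that TMM is designed to address.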

[Figure: (a) density of log2(Kidney1/N_K1) − log2(Kidney2/N_K2); (b) density of log2(Liver/N_L) − log2(Kidney/N_K); (c) plot of M = log2(Liver/N_L) − log2(Kidney/N_K) against A, the average log2 abundance in liver and kidney, with housekeeping genes and genes unique to a sample highlighted]

Statistical testing for DE

•  For EACH GENE, is the mean expression level for the gene under one condition significantly different from the mean expression level under a different condition?


Sampling reads from a population of DNA fragments is (approx.) multinomial

•  Take sample
•  Sequence DNA

[Figure: DNA population sampled into Library 1; features 1, 2, 3, … are present in proportions $\lambda_1, \lambda_2, \lambda_3, \dots$]

For a single gene, it’s a coin toss, i.e. Binomial

$Y_i \sim \mathrm{Binomial}(M, \lambda_i)$

•  $Y_i$: observed number of reads for feature $i$
•  $M$: total number of sequences
•  $\lambda_i$: proportion

Large $M$, small $\lambda_i$: approximated well by $\mathrm{Poisson}(\mu_i = M \lambda_i)$
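The Poisson approximation can be checked numerically. The sketch below compares Binomial(M, λ) and Poisson(Mλ) probabilities for a library of M = 20 million reads (the library size quoted elsewhere in the talk) and an assumed proportion λ = 5e-6, giving an expected count of 100. The value of λ and the function names are illustrative assumptions. Probabilities are computed on the log scale to avoid overflow.

```python
import math


def log_binomial_pmf(k: int, M: int, lam: float) -> float:
    """log P(Y = k) for Y ~ Binomial(M, lam)."""
    # log C(M, k) = sum_{j<k} log(M - j) - log(k!)
    log_choose = sum(math.log(M - j) for j in range(k)) - math.lgamma(k + 1)
    return log_choose + k * math.log(lam) + (M - k) * math.log1p(-lam)


def log_poisson_pmf(k: int, mu: float) -> float:
    """log P(Y = k) for Y ~ Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


M = 20_000_000   # total number of sequences
lam = 5e-6       # assumed proportion of reads from one feature
mu = M * lam     # Poisson mean, here about 100 reads

comparison = {}
for k in (80, 100, 120):
    b = math.exp(log_binomial_pmf(k, M, lam))
    p = math.exp(log_poisson_pmf(k, mu))
    comparison[k] = (b, p)
    print(f"k={k}: binomial={b:.6e} poisson={p:.6e}")
```

For these values the two distributions agree to several significant figures, which is why technical replicates can be modelled as Poisson with variance equal to the mean.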

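A small RNA-seq experiment (technical replicates)

[Figure: libraries 1 to 4 across Condition A and Condition B, with proportions $\lambda_{g1}, \dots, \lambda_{g4}$, observed counts $y_{g1}, \dots, y_{g4}$ and library sizes $M_1, \dots, M_4$; genes $g = 1, \dots, 30\mathrm{k}$. Slide from Davis McCarthy]

$E(y_{gi}) = M_i \lambda_{gi}$, with reads $M_i \approx 20$ million

True technical replicates show Poisson variation for each gene

Data: Marioni et al., Genome Res, 2008. Slide from Davis McCarthy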

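A small RNA-seq experiment (biological replicates)

[Figure: libraries 1 to 4 across Condition A and Condition B, with proportions $\lambda_{g1}, \dots, \lambda_{g4}$, observed counts $y_{g1}, \dots, y_{g4}$ and library sizes $M_1, \dots, M_4$; genes $g = 1, \dots, 30\mathrm{k}$. Slide from Davis McCarthy]

$E(y_{gi}) = M_i \lambda_{gi}$, with reads $M_i \approx 20$ million

Biological replicate data shows quadratic mean-variance relationship

(development cycle of slime mould, 2 samples at hr00 and 2 at hr04)

[Figure: binned variance, sample variance]

Data: Parikh et al, Genome Biology, 2010. Slide from Davis McCarthy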

Many different statistical methods
•  Model the counts directly
   –  Negative binomial modelling is best because it captures biological as well as technical variability
   –  Most popular packages in R
      •  edgeR
      •  DESeq/DESeq2
      •  Lots of others exist (baySeq, NBPSeq, …)
•  Transform the counts and use normal-based methods
   –  Voom + limma
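To show what the negative binomial adds over the Poisson, the sketch below uses the standard negative binomial variance, var = μ + φμ², and estimates the dispersion φ for a single gene by the method of moments. This is a deliberately crude illustration, not how edgeR or DESeq2 estimate dispersion (they share information across genes and account for library size). The counts are the condition A replicates of two genes from the count table; function names are hypothetical and differences in library size are ignored.

```python
import math
import statistics


def moment_dispersion(replicate_counts):
    """Method-of-moments estimate of phi in var = mu + phi * mu**2.

    Returns 0.0 when the sample variance does not exceed the mean,
    i.e. when the replicates look no more variable than Poisson.
    """
    mu = statistics.mean(replicate_counts)
    if mu == 0:
        return 0.0
    var = statistics.variance(replicate_counts)  # sample variance
    return max(0.0, (var - mu) / mu ** 2)


# Condition A replicates (A1, A2, A3) for two genes from the count table.
genes = {
    "ENSG00000125835": [132, 290, 450],
    "ENSG00000101444": [46, 63, 55],
}

dispersions = {gene: moment_dispersion(y) for gene, y in genes.items()}
for gene, phi in dispersions.items():
    # sqrt(phi) is often read as the biological coefficient of variation.
    print(f"{gene}: phi={phi:.4f}  sqrt(phi)={math.sqrt(phi):.3f}")
```

With φ = 0 the model reduces to the Poisson case seen in technical replicates; a positive φ gives the quadratic mean-variance relationship seen in biological replicates.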

Statistical testing gives each gene a p-value for evidence of DE

Tag ID            P-value
ENSG00000124208   0.0002
ENSG00000182463   0.12
ENSG00000125835   0.34
ENSG00000125834   0.08
ENSG00000197818   0.64
ENSG00000125831   1
ENSG00000215443   1
ENSG00000222008   0.06
ENSG00000101444   0.73
ENSG00000101333   0.22
…                 …
(tens of thousands more tags)



Learn something!

Moving beyond Differential Expression

Finding fusion genes in cancer

•  Genomic breaks and rearrangements can lead to gene fusions
•  Some fusions are oncogenic, others are just bystanders.
•  Identifying oncogenic gene fusions is beneficial for
   –  Clinical treatments, e.g. BCR-ABL fusion in 95% of CMLs: Imatinib

Non-model organisms

•  Transcriptomes of any organisms can be sequenced and analysed
   –  Black widow venom transcriptome
   –  Desert poplar transcriptome in saline conditions

Exploring the human transcriptome

•  Most genes have more than one expressed isoform
•  Most genes have one major isoform

Unknown and rare events in the human transcriptome

•  ENCODE says 75% of the genome is transcribed
•  Non-canonical RNA structures
   –  Circular RNAs (Memczak et al. Nature, 2013)
   –  Intron retention (Wong et al. Cell, 2013)

The future

•  Analysis methodology is critical and still developing for specific purposes
•  Effectively dealing with no genome or poor quality genomes or cancer
•  Integrating RNA-seq data with other genomics data
•  Opportunities to use this data in new and imaginative ways; this requires new analysis methodology
