RNA-sequencing

Next Generation sequencing analysis 2016

Anne-Mette Bjerregaard

Center for biological sequence analysis (CBS)

17/04/2008 | DTU Systems Biology, Technical University of Denmark

Terms and definitions

TRANSCRIPTOME

The full set of RNA transcripts and their associated abundance in a sample

RNA-seq

High-throughput sequencing technology used for probing the transcriptome of a sample

EXPRESSION

Data generation

Library preparation

Differs from the standard DNA sequencing protocol primarily by an additional reverse transcription step (RNA → cDNA).

RNA-seq Protocol

Martin et al., Nature Reviews Genetics (2011)

Poly-A selection
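
The reverse transcription step can be illustrated in a few lines. This is a simplified sketch, not taken from the slides: it assumes the first-strand cDNA is just the reverse complement of the RNA template written with DNA bases, and it ignores priming, fragmentation and second-strand synthesis. The name `first_strand_cdna` is hypothetical.

```python
# Base pairing used by reverse transcriptase: each RNA base in the template
# is matched by its complementary DNA base (U in RNA pairs with A in DNA).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an RNA given 5'->3'.

    The new strand is synthesised antiparallel to the template, so the
    complemented bases are reversed to report the cDNA 5'->3' as well.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


# Hypothetical short RNA fragment used only for illustration.
print(first_strand_cdna("AUGGCU"))  # AGCCAT
```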

RNA-seq: Strand-specific library preparation

The standard library preparation protocol does not preserve information about which strand was originally transcribed.

Strand-specific sequencing (e.g. dUTP method)

Martin et al., Nature Reviews Genetics (2011)

Levin et al., Nature Methods (2010)
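
With a dUTP library, the strand a read aligns to tells you which strand was transcribed. The sketch below assumes paired-end dUTP data, where read 1 is sequenced from the first-strand cDNA (antisense to the transcript) and read 2 is sense. This is the convention commonly called `fr-firststrand` in TopHat and `--stranded=reverse` in htseq-count. The function name is hypothetical.

```python
# Strand flip used when the aligned read is antisense to the transcript.
OPPOSITE = {"+": "-", "-": "+"}


def transcript_strand_dutp(aligned_strand: str, is_read1: bool) -> str:
    """Infer the originally transcribed strand for a read from a dUTP library.

    In the dUTP protocol the second cDNA strand is labelled with dUTP and
    degraded, so read 1 comes from the first-strand cDNA and aligns to the
    strand opposite the transcript; read 2 aligns to the transcript's strand.
    """
    return OPPOSITE[aligned_strand] if is_read1 else aligned_strand


print(transcript_strand_dutp("+", is_read1=True))   # -
print(transcript_strand_dutp("+", is_read1=False))  # +
```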

More terms

Gene

Transcripts

Wiki: https://en.wikipedia.org/wiki/Alternative_splicing

Data Analysis

Data analysis

Martin et al., Nature Reviews Genetics (2011)

FastQC

Cutadapt

Trimming
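
Quality trimming can be sketched as follows. Cutadapt's documentation describes its 3' quality trimming (the `-q` cutoff) as the same algorithm BWA uses: subtract the cutoff from each quality, accumulate from the 3' end, and cut where the accumulated value is largest. The code below is a minimal re-implementation of that idea for illustration, assuming Phred+33 encoded FASTQ qualities. It is not Cutadapt's own code, and the function names are hypothetical.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def quality_trim_3prime(qualities: list[int], cutoff: int) -> int:
    """Return the index at which to cut the read (BWA-style 3' trimming).

    Walking from the 3' end, accumulate (cutoff - q). Stop as soon as the
    running sum drops below zero, and cut where the running sum was largest.
    """
    running = 0
    best = 0
    stop = len(qualities)
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            stop = i
    return stop


# Hypothetical read whose last three bases have low quality.
sequence = "ACGTACG"
quals = phred_scores("IIII+&#")  # [40, 40, 40, 40, 10, 5, 2]
cut = quality_trim_3prime(quals, cutoff=20)
print(sequence[:cut])  # ACGT
```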

Assemble into transcripts

Reads → Transcripts

Transcriptome assembly strategies

• Reference-based

• De novo

• Combined

• Pseudoalignment

Reference-based assembly

Align reads

Traverse graph

Graph

Assemble
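
The reference-based steps (align reads, build a graph, traverse it, assemble) can be sketched with a toy splice graph. In this graph exons are nodes and splice junctions seen in the alignments are edges. The sketch naively lists every path through the graph as a candidate isoform. Real assemblers such as Cufflinks instead choose a parsimonious set of transcripts that explains the reads. Exon and junction names are hypothetical.

```python
from collections import defaultdict


def build_splice_graph(junctions):
    """Build a directed graph whose nodes are exons and whose edges are observed
    splice junctions (upstream exon -> downstream exon)."""
    graph = defaultdict(list)
    for upstream, downstream in junctions:
        graph[upstream].append(downstream)
    return graph


def enumerate_isoforms(graph, exons):
    """Yield every source-to-sink path; each path is a candidate isoform."""
    targets = {d for successors in graph.values() for d in successors}
    sources = [e for e in exons if e not in targets]

    def walk(path):
        successors = graph.get(path[-1], [])
        if not successors:  # reached a terminal exon
            yield list(path)
            return
        for nxt in successors:
            yield from walk(path + [nxt])

    for src in sources:
        yield from walk([src])


exons = ["E1", "E2", "E3"]
# Junctions as they might be read off spliced alignments; E1->E3 skips exon E2.
junctions = [("E1", "E2"), ("E2", "E3"), ("E1", "E3")]
isoforms = list(enumerate_isoforms(build_splice_graph(junctions), exons))
print(isoforms)  # [['E1', 'E2', 'E3'], ['E1', 'E3']]
```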

Alignment tools

N. Bray et al., Nature Biotechnology (2016)

• Common NGS alignment programs:
  – BWA
  – Bowtie (Bowtie2)
  – Novoalign

• Splice-junction-aware alignment:
  – TopHat (alignment) / Cufflinks (assembly)
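
Splice-aware aligners such as TopHat write spliced alignments in SAM/BAM format. There, the CIGAR operation `N` marks a stretch of reference skipped by the read, which for RNA-seq is an intron. The sketch below recovers intron coordinates from the SAM `POS` field and a CIGAR string. The function name is hypothetical, and the coordinates are 1-based and inclusive as in SAM.

```python
import re

# CIGAR operations that advance along the reference sequence (SAM spec).
REF_CONSUMING = set("MDN=X")


def splice_junctions(pos: int, cigar: str) -> list[tuple[int, int]]:
    """Return introns (1-based, inclusive) implied by 'N' operations in a CIGAR.

    pos is the 1-based leftmost reference position of the alignment (SAM POS).
    """
    introns = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


print(splice_junctions(100, "50M1000N50M"))  # [(150, 1149)]
```

Soft clips (`S`) and insertions (`I`) do not move along the reference, so they are skipped when computing positions.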

De novo assembly

Kmers

De Bruijn graph

Collapse

Traverse graph

Assemble
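
The de novo steps (k-mers, De Bruijn graph, collapse, traverse, assemble) can be sketched on error-free toy reads. The sketch assumes "collapse" means merging identical k-mers coming from different reads, which the set of successors does here. It assumes reads come from the forward strand only, and it only walks non-branching paths. Assemblers such as Trinity also handle sequencing errors, branches and reverse complements. All names and reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix.
    Using a set of successors collapses k-mers seen in several reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Traverse non-branching paths from every node with no incoming edges."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    contigs = []
    for start in [n for n in graph if indegree[n] == 0]:
        contig, node, seen = start, start, {start}
        # Extend while exactly one way forward exists and no node is revisited.
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if nxt in seen:
                break
            contig += nxt[-1]
            seen.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs


# Overlapping reads sampled from the hypothetical transcript ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GCGTACGTTA", "ACGTTAGC"]
contigs = assemble_contigs(build_de_bruijn(reads, k=5))
print(contigs)  # ['ATGCGTACGTTAGC']
```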

De novo assembly tools

N. Bray et al., Nature Biotechnology (2016)

• Velvet – Genomic and transcriptomic

• Trinity – Transcriptomic

• Cufflinks – Transcriptomic; assembles pre-aligned reads into transcripts, which are used to study alternative splicing and differential expression

Combined assembly

Pseudoalignment - Kallisto

N. Bray et al., Nature Biotechnology (2016)
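
The core idea of pseudoalignment is that a read is assigned the set of transcripts compatible with all of its k-mers, without computing a base-level alignment. This sketch uses a plain dictionary from k-mer to transcript set. kallisto itself builds a transcriptome De Bruijn graph and skips redundant k-mers. Further assumptions: forward strand only, and k-mers absent from the index are simply ignored. Transcript names and sequences are hypothetical.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the set of transcripts compatible with every indexed k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # simplification: k-mers missing from the index are skipped
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACGTT"}  # share the prefix ACGTACG
index = build_kmer_index(transcripts, k=5)
print(sorted(pseudoalign("ACGTACG", index, 5)))  # ['t1', 't2']
print(sorted(pseudoalign("GTACGGA", index, 5)))  # ['t1']
```

The resulting transcript sets (equivalence classes) are what the quantification step later distributes read counts over.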

Advantages and disadvantages

Strategy | Novel transcripts | Existing reference | Fast | Trans-spliced genes
Reference-based | ✗ | ✓ | ✓ | ✗
De novo | ✓ | ✗ | ✗ | ✓
Pseudoalignment | ✗ | ✓ | ✓ | ✗

Expression analysis

Expression analysis

Transcripts → Normalized expression

Within sample normalization

Compare expression levels of different transcripts / genes within the same sample.

[Figure: expression levels of Gene 1, Gene 2 and Gene 3 in a single sample]

Within sample normalization

• RPKM (Reads Per Kilobase Million)
  – Single-end reads
• FPKM (Fragments Per Kilobase Million)
  – Paired-end reads
• TPM (Transcripts Per Million)

RPKM / FPKM, order of normalization:
1. Sequencing depth (million)
2. Transcript length (kilobase)

TPM, order of normalization:
1. Transcript length (kilobase)
2. Sequencing depth (million)
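
The two orders of normalization can be written out directly. In this sketch the listed genes are assumed to account for all mapped reads, so their sum serves as the sequencing depth. The function names and the counts and lengths are hypothetical. FPKM is computed the same way as RPKM, with fragment counts in place of read counts.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads.
    Normalise for sequencing depth first, then for transcript length."""
    per_million = sum(counts) / 1e6
    return [(c / per_million) / (length / 1e3) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalise for transcript length first,
    then scale so that the values in the sample sum to one million."""
    rpk = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]


counts = [10, 20, 30]          # reads mapped to three genes
lengths = [1000, 2000, 500]    # transcript lengths in base pairs
print(rpkm(counts, lengths))   # ≈ [166666.7, 166666.7, 1000000.0]
print(tpm(counts, lengths))    # ≈ [125000.0, 125000.0, 750000.0]
```

Because every TPM vector sums to one million, TPM values are easier to compare as proportions than RPKM values, whose totals differ between samples.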

Between sample normalization (BSN)

Compare expression levels of the same transcript between samples.

[Figure: expression of Gene A, Gene B and Gene C in Patient 1 and Patient 2, before and after treatment]
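
One widely used between-sample method, not named on this slide, is the median-of-ratios approach used by DESeq and DESeq2. Each sample gets a size factor: the median, over genes, of the ratio between the sample's count and that gene's geometric mean across samples. Raw counts are then divided by the size factor. This is a minimal sketch. It skips genes with a zero in any sample, as DESeq2 does by default, and the gene names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> list of raw counts (one per sample). Genes with a zero
    count in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median on the log scale, then back-transform.
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 was sequenced twice as deeply as sample 1; true expression is equal.
counts = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [50, 100]}
sf = size_factors(counts)
normalised = {g: [c / s for c, s in zip(v, sf)] for g, v in counts.items()}
print(sf)  # ≈ [0.707, 1.414]
```

After dividing by the size factors, each gene has the same normalized count in both samples, as expected when only sequencing depth differs.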

HTSeq-count

Simply counts the number of reads that map to each feature

• Feature: Can be a gene or transcript annotation

• Overlapping features: union or intersection (htseq-count modes union, intersection-strict and intersection-nonempty; union is the default)
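
The three overlap modes can be sketched as set operations. For each position of the read, take the set of features covering it, then combine the sets by union, by intersection, or by intersection of only the non-empty sets. An empty result counts as no feature, one feature is counted, and several are ambiguous. This is an illustration of the rules described in the HTSeq documentation, not HTSeq's code. It ignores strand (htseq-count assumes stranded data unless told otherwise), alignment quality and multi-mapping reads. The function name and features are hypothetical.

```python
def assign_read(read_positions, features, mode="union"):
    """Assign one read to a feature using htseq-count-style overlap modes.

    read_positions: iterable of reference positions covered by the read.
    features: dict name -> (start, end), inclusive coordinates.
    Returns a feature name, "__no_feature" or "__ambiguous".
    """
    # S(i): the set of features overlapping each position i of the read.
    sets = [
        {name for name, (start, end) in features.items() if start <= i <= end}
        for i in read_positions
    ]
    if mode == "union":
        combined = set().union(*sets)
    elif mode == "intersection-strict":
        combined = set.intersection(*sets) if sets else set()
    elif mode == "intersection-nonempty":
        non_empty = [s for s in sets if s]
        combined = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not combined:
        return "__no_feature"
    if len(combined) > 1:
        return "__ambiguous"
    return next(iter(combined))


features = {"geneA": (1, 100), "geneB": (90, 200)}
# A read that runs off the end of geneB into unannotated sequence:
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(range(190, 220), features, mode))
```

For that read, union and intersection-nonempty give geneB, while intersection-strict gives `__no_feature`, because some read positions lie outside every feature.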

Statistics

http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm

Parameter shared by the Poisson and negative binomial distributions (the mean)

Additional negative binomial parameter (overdispersion): when overdispersion = 0, the negative binomial reduces to the Poisson distribution.
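
The relationship between the two distributions can be checked numerically. The sketch parameterizes the negative binomial by its mean μ and a dispersion α, so that Var = μ + αμ². It shows that α > 0 inflates the variance above the mean, and that the distribution approaches the Poisson as α approaches 0. The parameterization matches the one commonly used for RNA-seq counts. The function names and the values of μ and α are illustrative choices.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(K = k) for a Poisson distribution with mean mu (variance = mu)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def neg_binom_pmf(k: int, mu: float, alpha: float) -> float:
    """P(K = k) for a negative binomial with mean mu and dispersion alpha.

    Parameterised so that Var(K) = mu + alpha * mu**2; alpha > 0 is required.
    """
    r = 1.0 / alpha      # "size" parameter
    p = r / (r + mu)     # success probability in the (r, p) parameterisation
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def mean_and_variance(pmf, k_max: int):
    """Numerically compute the mean and variance of a pmf on 0..k_max."""
    probs = [pmf(k) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mu = 10.0
nb_mean, nb_var = mean_and_variance(lambda k: neg_binom_pmf(k, mu, alpha=0.5), 2000)
print(nb_mean, nb_var)  # close to 10 and 60: variance exceeds the mean
gap = max(abs(neg_binom_pmf(k, mu, 1e-6) - poisson_pmf(k, mu)) for k in range(40))
print(gap)  # tiny: with dispersion near 0 the negative binomial is practically Poisson
```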

DESeq2

Differential gene expression analysis based on the negative binomial distribution

• Input: Read count tables (e.g. from HTSeq-count)
• Output: Table containing statistics for whether each gene is differentially expressed between two conditions
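
DESeq2 itself is an R/Bioconductor package, and it can read htseq-count files directly (for example via `DESeqDataSetFromHTSeqCount`). The sketch below only illustrates what its input looks like: each htseq-count file holds one `feature<TAB>count` line per feature, followed by summary counters whose names start with `__`. These counters are dropped, and the samples are merged into one gene-by-sample table. The sample names and counts are hypothetical.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse one htseq-count output (feature<TAB>count) into a dict.

    Lines whose feature ID starts with '__' are htseq-count's summary counters
    (e.g. __no_feature, __ambiguous) and are left out of the count table.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def count_matrix(samples):
    """Combine per-sample dicts into gene -> [count per sample] (0 if missing)."""
    names = list(samples)
    genes = sorted(set().union(*(s.keys() for s in samples.values())))
    return names, {g: [samples[n].get(g, 0) for n in names] for g in genes}


untreated = io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n__ambiguous\t2\n")
treated = io.StringIO("geneA\t25\ngeneB\t3\n__no_feature\t7\n")
names, matrix = count_matrix({
    "untreated": read_htseq_counts(untreated),
    "treated": read_htseq_counts(treated),
})
print(names, matrix)  # ['untreated', 'treated'] {'geneA': [10, 25], 'geneB': [0, 3]}
```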

Thank you!

Within sample normalization – more details?

1. http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/

2. https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/

Between sample normalization – more details?

https://haroldpimentel.wordpress.com/2014/12/08/in-rna-seq-2-2-between-sample-normalization/