
Obstacles and challenges in the analysis of microRNA ...bioinformatics.org.au/ws/wp-content/uploads/sites/10/2016/07/2015... · Obstacles and challenges in the analysis of microRNA


Obstacles and challenges in the analysis of microRNA sequencing data

(miRNA-Seq)

David Humphreys

Genomics core

Dr Victor Chang AC 1936-1991, Pioneering Cardiothoracic Surgeon and Humanitarian


The ABCs about miRNAs (Annotation, Biogenesis, Curation)

www.mirbase.org
• Mature fasta file
• Stem loop fasta file
• Gff (genome coordinate file)
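As a minimal sketch of working with the mature FASTA file, the snippet below parses FASTA text into a dictionary and converts RNA (U) to DNA (T) so that entries can be compared with sequencing reads. The inline records reuse sequences quoted in these slides; the simplified headers and the function name `parse_mature_fasta` are assumptions for illustration, and real miRBase headers carry extra fields after the ID.

```python
def parse_mature_fasta(text, species_prefix=None):
    """Return {miRNA ID: DNA sequence} from FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the ID.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequencing reads are DNA, so store T instead of U.
            records[current_id] += line.upper().replace("U", "T")
    if species_prefix is not None:
        records = {k: v for k, v in records.items() if k.startswith(species_prefix)}
    return records


# Simplified example records (headers shortened for illustration).
EXAMPLE_FASTA = """\
>hsa-let-7f-5p
UGAGGUAGUAGAUUGUAUAGUU
>hsa-miR-133b
UUUGGUCCCCUUCAACCAGCUA
"""

mature = parse_mature_fasta(EXAMPLE_FASTA, species_prefix="hsa-")
print(mature["hsa-let-7f-5p"])
```

Filtering on the species prefix (here `hsa-`) keeps the reference specific to the organism being sequenced.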


miRNA-Seq applications

Read length covers entire mature transcript

Discovery

- Novel miRNAs

- Isoforms

- Biogenesis
  iii) Non-canonical processing
  iv) Strand selection
  v) Length / non-template additions

Quantification

- Differentially expressed miRNAs

- Differential processing


Experimental design

• Sample selection
  • Species, replicates

• RNA extraction

• Library preparation


Kim et al. (2011) Molecular Cell 43, 1005-1014

Low confluence = 500,000 cells
High confluence = 800,000 cells

Cell number: (L) = 200,000, (H) = 800,000

RNA extraction

                      Column   Liquid   Bead
Prep time             ++       ++++     +++
miRNA purification    +++      ++++     ++++
Recovery              ++++     +++      +++

Limitations/pitfalls: Low input miRNA bias; Early protocols no miRNA; ???

Kim et al. (2012) Molecular Cell 46, 893-895

NO change!!

[Figure axis: Ratio 141/200c]

Down regulated miRNAs:

141, 29b , 21, 106b, 15a, 34a

• Most susceptible:
  - Low GC content
  - Secondary structure

• Small RNAs precipitate together with longer RNA


RNA quantification and integrity

NanoDrop / Qubit / Agilent

seqanswers.com/forums/showthread.php?t=21280

WARNING!
- Accuracy poor below 50 ng/µl
- Careful of concentrations > 1 µg/µl

WARNING!
- Known biases in quantifying ssRNA < 50 ng/µl

WARNING!
- Quantification only accurate in the defined range (read manual)

• Assays specific for DNA/RNA
• Quantitate size
• Can detect salt & other contaminants

[Figure: absorbance spectrum with readings at 230, 260 and 280 nm]


Library prep kit comparison

Sample prep

Adaptor ligation

RT (Reverse Transcription)

PCR

P-miRNA-OH

i) Hybridisation

ii) Ligation

iii) Denaturation

Sequential Ligation

# Hafner et al., (2011) RNA 17(9), 1-16

# Sequence
# Temperature
# Incubation times

# PCR cycles … OK

# Input amount

# pH, buffers/salts/ATP


Summary

• Sample selection
  • Species, replicates

• RNA extraction
  • Use same method for all preps
  • Quantify (2 methods)
  • Assess integrity

• Library preparation
  • Consistent input
  • Consistent ligation conditions (time/temperature)
  • Use same kits


miRNA-Seq Bioinformatics

(Trim - ALIGN – Report)
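The trim step can be sketched as follows. This is only an illustration using exact string matching; the adapter sequence shown is assumed to be the commonly used Illumina small RNA 3' adapter, and the function name `trim_adapter` and the `min_overlap` parameter are hypothetical. Dedicated trimming tools also tolerate sequencing errors in the adapter, which this sketch does not.

```python
# Assumed 3' adapter; replace with the adapter of the library kit actually used.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Remove a 3' adapter, including one truncated by the end of the read."""
    # Case 1: the complete adapter is present somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends part-way through the adapter.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adapter found: return the read unchanged.
    return read


mirna = "TGAGGTAGTAGATTGTATAGTT"  # let-7f-5p as DNA
raw_read = mirna + ADAPTER[:14]   # a 36 nt read that runs into the adapter
print(trim_adapter(raw_read))
```

Because the read length covers the entire mature transcript, nearly every genuine miRNA read is expected to contain adapter sequence; reads with no adapter found deserve a second look before alignment.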


Anscombe’s Quartet

• Maths is a tool for analysis.
• You can blindly ignore biases and errors in data sets.

- mean, stdev, variance, correlation are the same!
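The point can be checked numerically. The sketch below uses the published Anscombe's quartet values and computes the mean, sample variance and Pearson correlation for each of the four data sets; the helper `pearson` and the rounding precision are choices made for this illustration.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# (mean x, variance x, mean y, variance y, correlation), rounded for comparison.
summary = {
    name: (
        round(mean(x), 2),
        round(variance(x), 2),
        round(mean(y), 2),
        round(variance(y), 1),
        round(pearson(x, y), 2),
    )
    for name, (x, y) in QUARTET.items()
}

for name, stats in summary.items():
    print(name, stats)
```

The four summaries agree at this precision even though the scatter plots look completely different, which is why plotting the data remains essential.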

Image from Wikipedia: https://en.wikipedia.org/wiki/Anscombe%27s_quartet


Challenges

Multimappers

Mismatches

Aligners

Sharing data

• Length of a sequence read covers entire microRNA transcript

• Upstream bias will have impacts on analysis

Sample preparation

Library preparation

Sequencing

Clonal amplification

Bioinformatics

Normalisation

Differential expression

Feature counting

Visualisation


Choice of reference?

Genome
• Better discovery
• Possible incorrect/loss of mappings
• Slower, computationally restrictive?

miRBase stem-loop
• Limited discovery
• Forced (biased) mapping
• Faster, less complicated


miR-486

Multi-mappers (1)

• miRBase does NOT ACCURATELY report number of times a read aligns to genome

• Multi-loci miRBase entries provide some information

[Figure: Human multi-mappers #. Histogram of number of miRs against number of mapped locations (0 to > 100), with miR-486 marked as an example.]

# Human miRBase entries mapped using bowtie aligner allowing all multi-mappers
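A histogram of this kind can be tallied from aligner output. The sketch below counts how many alignments each read name received in SAM-formatted lines (for example from an aligner run that reports all valid alignments) and then counts how many entries have each number of locations. The toy records, read names and function names are hypothetical.

```python
from collections import Counter


def alignments_per_read(sam_lines):
    """Count reported alignments for each read name in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4 marks an unmapped read
        counts[fields[0]] += 1
    return counts


def location_histogram(counts):
    """Map 'number of mapped locations' to 'number of entries'."""
    return Counter(counts.values())


# Hypothetical toy alignments, one SAM record per reported location.
TOY_SAM = [
    "@HD\tVN:1.0",
    "toy-miR-A\t0\tchr1\t100\t255\t22M\t*\t0\t0\t*\t*",
    "toy-miR-B\t0\tchr2\t500\t255\t22M\t*\t0\t0\t*\t*",
    "toy-miR-B\t16\tchr7\t900\t255\t22M\t*\t0\t0\t*\t*",
    "toy-miR-B\t0\tchrX\t42\t255\t22M\t*\t0\t0\t*\t*",
    "toy-miR-C\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

per_read = alignments_per_read(TOY_SAM)
print(dict(per_read))
print(dict(location_histogram(per_read)))
```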


Multi-mappers (2)

• Multi-mapping rate increases as read length decreases.

• What should the minimum miRNA read length be?

• Shortest length in miRBase is 17 nt!

miR-133 family

miR-133a-1-3p uuugguccccuucaaccagcug

miR-133a-2-3p uuugguccccuucaaccagcug

miR-133b-1-3p uuugguccccuucaaccagcua
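The miR-133 sequences above differ only at the final nucleotide, which the following sketch makes explicit. It shows that a read trimmed by one base, or aligned with one mismatch allowed, can no longer be assigned to a single family member. The function names and the prefix-matching shortcut are assumptions made for illustration, not how any particular aligner works.

```python
# Sequences from the slide, written as DNA.
REFERENCES = {
    "miR-133a": "TTTGGTCCCCTTCAACCAGCTG",
    "miR-133b": "TTTGGTCCCCTTCAACCAGCTA",
}


def mismatch_positions(a, b):
    """0-based positions where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def candidates(read, references, max_mismatches=0):
    """Names of references whose 5' prefix matches the read within the budget."""
    hits = []
    for name, ref in references.items():
        if len(read) > len(ref):
            continue
        if len(mismatch_positions(read, ref[: len(read)])) <= max_mismatches:
            hits.append(name)
    return hits


full_read = REFERENCES["miR-133a"]
print(mismatch_positions(REFERENCES["miR-133a"], REFERENCES["miR-133b"]))
print(candidates(full_read, REFERENCES))                    # full length, exact
print(candidates(full_read[:21], REFERENCES))               # one base shorter
print(candidates(full_read, REFERENCES, max_mismatches=1))  # one mismatch allowed
```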

• Where do you assign multi-loci counts?

- Assign to each position?

- Assign fraction to each position?

- Intelligently assign to a position?

- Ignore?
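The four options can be compared on a toy example. In the sketch below each read is represented by the list of loci it aligns to. The "weighted" strategy is only one possible reading of "intelligently assign": it splits multi-mapped reads in proportion to the uniquely mapped reads at each locus, and it is an assumption of this sketch rather than the method used by any specific tool. Locus names and counts are hypothetical.

```python
from collections import defaultdict

STRATEGIES = ("each", "fraction", "weighted", "ignore")


def assign_counts(read_hits, strategy="fraction"):
    """Return {locus: count} for reads given as lists of matching loci."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    # First pass: reads that map to exactly one locus.
    unique = defaultdict(float)
    for loci in read_hits:
        if len(loci) == 1:
            unique[loci[0]] += 1.0
    totals = defaultdict(float, unique)
    # Second pass: distribute multi-mapped reads.
    for loci in read_hits:
        if len(loci) < 2 or strategy == "ignore":
            continue
        if strategy == "each":
            for locus in loci:
                totals[locus] += 1.0
        elif strategy == "fraction":
            for locus in loci:
                totals[locus] += 1.0 / len(loci)
        else:  # "weighted": proportional to unique evidence, equal split if none
            weight_sum = sum(unique.get(locus, 0.0) for locus in loci)
            for locus in loci:
                if weight_sum:
                    totals[locus] += unique.get(locus, 0.0) / weight_sum
                else:
                    totals[locus] += 1.0 / len(loci)
    return dict(totals)


# Hypothetical data: 3 unique reads at locusA, 1 at locusB, 4 reads hitting both.
reads = [["locusA"]] * 3 + [["locusB"]] + [["locusA", "locusB"]] * 4
for strategy in STRATEGIES:
    print(strategy, assign_counts(reads, strategy))
```

Note that assigning a full count to each position inflates the library total (12 counts from 8 reads here), whereas ignoring multi-mappers discards half of the reads.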

miR-133a

miR-133b


Mismatches

• Sequencing variants
  i) Error in library prep
  ii) Variants in reference genome
  iii) Sequencer

• RNA editing

Type         Enzyme   Comment
A to I (G)   ADAR     Predominantly on pre-miRs
C to T       APOBEC   Not identified yet?

Chawla et al. (2014) Nucleic Acids Research 42(8): 5245–5255
Tomaselli et al. (2013) Int. J. Mol. Sci. 14, 22796-22816

Ohanian et al. (2013) BMC Genetics, 14:18
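Because inosine is read as G, A-to-I editing shows up in reads as an A-to-G mismatch against the reference. The sketch below lists the mismatches between a read and its reference and labels the substitution types from the table above. It is illustrative only: a single read cannot separate editing from a library prep error, a genomic variant or a sequencer error, so the labels are candidates at best. The function name and the example read are hypothetical.

```python
def classify_mismatches(read, reference):
    """Return (0-based position, reference base, read base, label) per mismatch."""
    labels = {("A", "G"): "candidate A-to-I", ("C", "T"): "candidate C-to-T"}
    found = []
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != read_base:
            label = labels.get((ref_base, read_base), "other")
            found.append((i, ref_base, read_base, label))
    return found


reference = "TGAGGTAGTAGATTGTATAGTT"   # let-7f-5p as DNA
read = "TGGGGTAGTAGATTGTATAGTT"        # hypothetical read with A>G at position 2
print(classify_mismatches(read, reference))
```

Recurrence of the same substitution at the same position across many reads and replicates is what would make a candidate worth following up.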


Aligners

• (Too) Many choices…

• Each aligner has a wide array of options with DIFFERENT default settings.

• The bowtie aligner provides error-rate and multi-mapping control:

bowtie -p 4 -n 1 -l 21 --nomaqround -k 10 --best --strata --chunkmbs 256

-k 10: report up to 10 multi-mappers

-n 1 -l 21: allow 1 mismatch in a seed length of 21 nt

Fastq calibration dataset:

hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- 0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT

• Available for ALL species present in miRBase, features include:

i) Each header defines miRBase mapping location

ii) Contains all miRBase entries with all single nucleotide mismatches

(Header fields: miRNA ID, mapping location #1, mapping location #2)
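Since each calibration read carries its expected locations in its name, a pipeline can be scored automatically. The sketch below parses the header shown above and checks whether the reported alignment matches one of the encoded locations. The field layout is inferred from that single example, so the parsing rules are assumptions: the miRNA ID is taken to contain no underscores, the meaning of the `M` tag is not interpreted, and start coordinates are assumed to be 1-based like the SAM POS field.

```python
def parse_calibration_header(read_name):
    """Split a calibration read name into (miRNA ID, tag, list of loci)."""
    first, *others = read_name.split("#")
    mirna_id, tag, first_locus = first.split("_", 2)
    loci = []
    for locus in [first_locus, *others]:
        # Split from the right so chromosome names may contain underscores.
        chrom, start, end, strand = locus.rsplit("_", 3)
        loci.append((chrom, int(start), int(end), strand))
    return mirna_id, tag, loci


def alignment_is_expected(sam_line):
    """True if the aligned position is one of the locations in the read name."""
    fields = sam_line.split()
    _, _, loci = parse_calibration_header(fields[0])
    flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
    strand = "-" if flag & 16 else "+"  # FLAG bit 0x10 marks the reverse strand
    return any(
        chrom == c and pos == start and strand == s for c, start, _, s in loci
    )


SAM_LINE = (
    "hsa-let-7f-5p_M_chr9_94176353_94176374_+#chrX_53557246_53557267_- "
    "0 chr9 94176353 255 22M * 0 0 TGAGGTAGTAGATTGTATAGTT"
)

mirna_id, tag, loci = parse_calibration_header(SAM_LINE.split()[0])
print(mirna_id, loci)
print(alignment_is_expected(SAM_LINE))
```

Running every calibration read through a check like this gives a direct count of correct, misplaced and missing alignments for a given aligner configuration.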


Non template additions (NTA)

i) Adenylation

ii) Uridylation

Koppers-Lalic et al., (2014), Cell Reports 8, 1649–1658

DETECTION METHODS:

• Aligners tend to softclip 3’ mismatches!!

• Remove adaptor
- Hard trim (18nt)
- Extend alignment.
- Look for mismatch clusters at end of read.

<miRNA seq> + (A)n

<miRNA seq> + (T)n
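The detection steps above can be sketched on an adapter-trimmed read as follows: anchor on the first 18 nt, extend the match base by base along the template, and inspect whatever remains at the 3' end. This is a simplified illustration, not a published algorithm. The template is assumed to be the genomic sequence starting at the miRNA 5' end and running past its 3' end; the 3 nt flank used here is hypothetical, not real genome sequence. Uridylation appears as T in the sequenced cDNA.

```python
def classify_nta(read, template, anchor=18):
    """Return (matched length, 3' tail, label), or None if the anchor fails."""
    # Hard trim: require an exact match over the first `anchor` bases.
    if read[:anchor] != template[:anchor]:
        return None
    # Extend the alignment until the first mismatch or the end of either sequence.
    i = anchor
    while i < len(read) and i < len(template) and read[i] == template[i]:
        i += 1
    tail = read[i:]
    if not tail:
        label = "templated"
    elif set(tail) == {"A"}:
        label = "adenylation"
    elif set(tail) == {"T"}:
        label = "uridylation"
    else:
        label = "other"
    return i, tail, label


MATURE = "TGAGGTAGTAGATTGTATAGTT"   # let-7f-5p as DNA
TEMPLATE = MATURE + "GGC"           # hypothetical downstream flank

print(classify_nta(MATURE, TEMPLATE))
print(classify_nta(MATURE + "AA", TEMPLATE))
print(classify_nta(MATURE + "T", TEMPLATE))
```

One limitation of this approach is that an added base identical to the next genomic base cannot be distinguished from a longer templated 3' end.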


Assigning miRNA counts

Mature miRNA analysis

i) 5’ isomiRs
ii) 3’ isomiRs
iii) Non-canonical
iv) Arm switching
v) Length
vi) Editing

Cistronic Analysis (i) (ii)

Humphreys et al., 2013, NAR
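Two of the categories above, 5' and 3' isomiRs, follow directly from where a read starts and ends relative to the annotated mature miRNA, and arm switching can be summarised as the share of reads from each arm. The sketch below illustrates both under stated assumptions: coordinates are positions on the stem-loop in any consistent convention, and all names and numbers are hypothetical. It does not reproduce the miRspring implementation.

```python
def classify_isomir(read_start, read_end, mature_start, mature_end):
    """Return (5' offset, 3' offset, labels) relative to the annotated mature miRNA."""
    offset_5p = read_start - mature_start
    offset_3p = read_end - mature_end
    labels = []
    if offset_5p != 0:
        labels.append("5' isomiR")  # a shifted 5' end also shifts the seed
    if offset_3p != 0:
        labels.append("3' isomiR")
    if not labels:
        labels.append("canonical")
    return offset_5p, offset_3p, labels


def arm_fraction(count_5p, count_3p):
    """Fraction of reads from the 5p arm, or None if there are no reads."""
    total = count_5p + count_3p
    return count_5p / total if total else None


# Hypothetical mature miRNA annotated at stem-loop positions 10..31.
print(classify_isomir(10, 31, 10, 31))
print(classify_isomir(11, 31, 10, 31))
print(classify_isomir(10, 33, 10, 31))
print(arm_fraction(75, 25))
```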


miRspring

• Small (<2MB) HTML document that replicates the miRNA aligned sequencing data.

• Needs NO internet connectivity.

• Provides visualization of sequence data + research tools == complete transparency.

http://miRspring.victorchang.edu.au

Humphreys D.T., and Suter C.M. Nucleic Acids Research 2013.


Cumulative distribution of miRNA reads

Sampling bias!

TissueAtlas: Heart, Kidney, Liver, Lung, Ovary, Spleen, Testes, Thymus, Brain, Placenta

AGO IP: THP-1

ENCODE: HeLa S3, A549, Ag04450, Bj, Gm1287, H1hesc, HepG2, Huvec, K562, MCF7, NheK, Sknshra

• 73 miRspring documents

• 895 million sequence tags

• < 55 megabytes of disk space

In most cell lines and tissues the most abundant miRNA should comprise < 35% of all aligned miRNA sequences

OK ☺
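The 35% rule of thumb is easy to turn into a quality control check. The sketch below computes the share of the most abundant miRNA and the cumulative distribution of reads over miRNAs ranked by abundance. The counts and names are hypothetical, and the threshold is passed in as a parameter so that it can be adjusted for samples where one dominant miRNA is expected.

```python
def top_fraction(counts):
    """Return (name, fraction of all aligned miRNA reads) for the top miRNA."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned miRNA reads")
    name, n = max(counts.items(), key=lambda item: item[1])
    return name, n / total


def cumulative_distribution(counts):
    """Cumulative read fraction over miRNAs ranked from most to least abundant."""
    total = sum(counts.values())
    running = 0
    fractions = []
    for n in sorted(counts.values(), reverse=True):
        running += n
        fractions.append(running / total)
    return fractions


def passes_dominance_check(counts, threshold=0.35):
    """True if the most abundant miRNA is below the threshold fraction."""
    return top_fraction(counts)[1] < threshold


# Hypothetical counts for illustration only.
balanced = {"toy-miR-a": 300, "toy-miR-b": 250, "toy-miR-c": 250, "toy-miR-d": 200}
skewed = {"toy-miR-a": 900, "toy-miR-b": 50, "toy-miR-c": 50}

print(top_fraction(balanced), passes_dominance_check(balanced))
print(top_fraction(skewed), passes_dominance_check(skewed))
print(cumulative_distribution(balanced))
```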


Top 100 miRNAs typically:
- 22nt long
- Good correlation with miRBase


Conclusions

• Many challenges in miRNA-seq analysis

• Multi-mappers

• Mismatches

• Best practices: be methodical

• Know the question you wish to address

• Know your species (reference/miRBase)

• Know your aligner

• Test your pipeline!

• Know what you are missing

• Quality control metrics/ visualisation


Joshua Ho

Peter Szot

Catherine Suter

Diane Fatkin

Thomas Preiss

St Vincent’s Hospital

Chris Hayward

Kavitha

Andrew Jabbour

If you would like a miRBase test data set for any species/reference combination, please don’t hesitate to contact me.

[email protected]

miRspring.victorchang.edu.au

- Fastq synthetic data sets

- Intelligently assign multi-mappers

- R objects