Processing of miRNA samples and primary data analysis

S

Processing of miRNA samples and primary data

analysis

Preparing the starting material

Initial evaluation of small RNA sample on

Bioanalyzer Bioanalyzer small RNA chip

Mature miRNAs are 16-29 bases (usually 22-23 bases)

Library construction

Size selection for miRNA inserts

(PAGE gel, cut & purify)

80

60

PCR

135 120

Sequence on SOLiD

The size-selected, bar-coded libraries are sequenced on the SOLiD 5500.

Reads are from single end, 50 bp.

Target Read Counts for miRNA

The vast majority of miRNA-seq reads do map successfully to miRNA (~90%)

Target read counts will be a function of how well resolved low abundance miRNA need to be resolved

Large shift or shifts in abundant miRNAs do not necessitate many reads. We aimed for about 10 million reads per condition, which

was achievable for 9 samples on one multiplexed lane

Only a few miRNAs tend to dominate the population

# miRNAs

Cum

ula

tive %

of

Read

s

80% of reads from 30 miRNA; 90% from 54

~340 miRNAs were described by populations of 1000+ reads across conditions in our experiment

Treating Raw miRNA Data

Due to the short length of inserts, trimming of adapter sequence is required.

Due to a high level of redundancy, it’s often advisable to collapse identical reads to speed alignment. Unique sequences align only once rather than

aligning the same sequence thousands of times. Retain count information for quantitation following

alignment.

Aligning miRNA reads

Alignment is often performed in two stages 1st against a prepared reference containing ONLY known

miRNA sequences for the appropriate organism (miRBase or elsewhere).

2nd against the genome for identification of novel small RNA.

Any typical aligner works well for this purpose Novocraft, Bowtie(1), BWA, etc

Other packages exist that ease this process and identification of novel miRNA such as miRanalyzer.

miRanalyzer Available via command-line or by a webapp (common organisms).

http://bioinfo5.ugr.es/miRanalyzer/miRanalyzer.php

Novel miRNA and Quantitation

Novel identified sequences need to be evaluated for the possibility of forming hairpin structures miRanalyzer does this already, scoring novel alignment regions

for the possibility of forming miRNAs

Read count tables are produced for further analysis and comparison Reads per miRNA

Novel miRNA are only really comparable between experiments in which the same species are observed and are typically kept separately

Comparison Between Conditions

Normal RNAseq tools for identifying differential expression from quantitated data tables is the preferred method. DESeq, edgeR, baySeq, limma, etc

DESeq was utilized on count tables produced from miRanalyzer (and is also a part of the webapp package).

Triplicates from three experimental conditions were compared pairwise for differential expression of miRNA. p-values for exact test of change between conditions are

generated padj values result from Benjamini-Hochberg multiple testing to

determine a FDR (cutoff of 0.1 is typically applied here). Output varies depending on tool used.

Additional tasks

Target Database/Prediction mining of differentially expressed miRNAs miRbase, miRanda, TarBase (experimental

observations), etc

Validation of DE of miRNA and targets

Enrichment analysis

Documents

Processing of miRNA samples and primary data analysis