Wet-lab Considerations for Illumina data analysis

Based on a presentation by Henriette O’Geen

Lutz Froenicke

DNA Technologies and Expression Analysis Cores

UCD Genome Center

Bioinformatics workshop Sept 2014



• RNA-seq Gene Expression
• Small RNA
• ChIP-seq
• Genotyping
• De novo Genome Sequencing
• DNA Methylation
• Metagenomics
• Exome Sequencing
• Splice Isoform Abundance
• Genome Resequencing
• 3D Organization
• SNPs, Indels, CNVs
• Rearrangements

Illumina Workflow

Library Preparation → Cluster Formation → Sequencing → Computer Analysis

[Figure: DNA (0.1-1.0 µg) → library preparation → single molecule array → cluster generation → sequencing]

Illumina Sequencing Technology

Sequencing By Synthesis (SBS) Technology

TruSeq Chemistry: Flow Cell

Simplified workflow

• Clusters in a contained environment (no need for clean rooms)

• Sequencing performed in the flow cell on the clusters

Surface of flow cell coated with a lawn of oligo pairs

8 channels

1.6 Billion Clusters Per Flow Cell

[Figure: clusters on the flow cell surface, scale bars 20 microns and 100 microns]

Sequencing workflow

Library Construction → Cluster Formation → Illumina Sequencing → Data Analysis

Examples of DNA input requirements

Library prep kit                  Starting material
TruSeq DNA                        > 100 ng
KAPA DNA                          > 10 ng
NEB Ultra Low                     > 5 ng
TruSeq ChIP/MeDIP                 10-50 ng
Rubicon ThruPLEX                  50 pg – 50 ng
Nextera Kit *                     50 ng
Nextera Kit for Single Cell *     0.125 - 0.375 ng

* Unique protocol using “tagmentation”: DNA is simultaneously fragmented and tagged with sequencing adapters
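The table can also be used as a minimum-input lookup. This is a minimal sketch. It assumes that each kit's lower bound is a hard minimum and that a larger sample can always be diluted down. The dictionary and the helper `kits_for_input` are hypothetical, and the numbers are only the ones transcribed above. Vendor requirements change over time.

```python
# Minimum DNA input per kit in nanograms, transcribed from the table above.
# Assumption: each lower bound is treated as the minimum; a larger sample can
# always be diluted down, so upper bounds are not used to exclude kits.
MIN_INPUT_NG = {
    "TruSeq DNA": 100.0,
    "KAPA DNA": 10.0,
    "NEB Ultra Low": 5.0,
    "TruSeq ChIP/MeDIP": 10.0,
    "Rubicon ThruPLEX": 0.05,             # 50 pg
    "Nextera Kit": 50.0,
    "Nextera Kit for Single Cell": 0.125,
}


def kits_for_input(available_ng: float) -> list[str]:
    """Kits whose minimum input is met by the available DNA, lowest minimum first."""
    usable = [kit for kit, minimum in MIN_INPUT_NG.items() if minimum <= available_ng]
    # sorted() is stable, so kits with equal minima keep the table order.
    return sorted(usable, key=MIN_INPUT_NG.__getitem__)


print(kits_for_input(20.0))
```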

DNA library construction

Fragmented DNA → End Repair → Blunt End Fragments → “A” Tailing → Single Overhang Fragments → Adapter Ligation → DNA Fragments with Adapter Ends

“If you can put adapters on it,

we can sequence it!”

Know your sample

DNA fragmentation

Mechanical shearing:

• NGS BioRuptor
• Covaris

Enzymatic:

• Fragmentase, Transposase

Chemical

All methods are sensitive to:
• Purity of DNA
• DNA concentration

Size selection and clean-up using SPRI beads

SPRI = Solid Phase Reversible Immobilization

The ratio of SPRI bead/PEG solution to sample volume determines the size cutoff: the higher the ratio, the smaller the fragments that are retained.
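A higher bead ratio retains smaller fragments, so two successive bead additions can cut on both sides of a size range. This is a minimal sketch of the volume arithmetic. It assumes the common double-sided scheme, in which both ratios refer to the original sample volume. The function name and the 0.6x/0.8x ratios are illustrative. The actual size cutoff for a given ratio comes from the bead manufacturer's calibration.

```python
def double_sided_bead_volumes(sample_ul: float, first_ratio: float,
                              final_ratio: float) -> tuple[float, float]:
    """Bead volumes (uL) for a two-step SPRI size selection.

    Both ratios are expressed relative to the ORIGINAL sample volume (assumption).
    Step 1: add first_ratio x sample; large fragments bind to the beads and are
            discarded, and the supernatant is transferred to a new tube.
    Step 2: add beads until the cumulative ratio reaches final_ratio; fragments
            above the lower cutoff now bind and are kept (eluted from the beads).
    """
    if not 0 < first_ratio < final_ratio:
        raise ValueError("final_ratio must exceed first_ratio (higher ratio = smaller cutoff)")
    step1 = round(first_ratio * sample_ul, 2)
    step2 = round((final_ratio - first_ratio) * sample_ul, 2)
    return step1, step2


# Example: 50 uL sample, 0.6x then 0.8x (illustrative ratios only).
print(double_sided_bead_volumes(50.0, 0.6, 0.8))  # (30.0, 10.0)
```

The second addition is only the difference between the two ratios because the PEG from the first addition stays in the transferred supernatant.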

Optional: PCR-free libraries

PCR-free library:

– Library can be sequenced if concentration allows

– Reduction of PCR bias against, e.g., GC-rich or AT-rich regions, especially for metagenomic samples

OR

Library enrichment by PCR:

– Ideal combination: high input and low cycle number
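The rule "high input, low cycle number" follows from exponential amplification. With per-cycle efficiency E, the yield after n cycles is roughly input × (1 + E)^n. Each 10-fold increase in input therefore saves about log(10)/log(1 + E) cycles, which is about 3.3 cycles at E = 1. This is a minimal sketch. The idealized constant-efficiency model and the example amounts are assumptions.

```python
import math


def pcr_cycles_needed(input_ng: float, target_ng: float, efficiency: float = 0.9) -> int:
    """Smallest n with input_ng * (1 + efficiency) ** n >= target_ng.

    Idealized model (assumption): constant per-cycle efficiency, no plateau.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_ng <= input_ng:
        return 0
    return math.ceil(math.log(target_ng / input_ng) / math.log(1.0 + efficiency))


# Ten-fold more input saves roughly 3.3 cycles at 100% efficiency.
for input_ng in (1.0, 10.0, 100.0):
    print(input_ng, pcr_cycles_needed(input_ng, 500.0, efficiency=1.0))
```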

Enrichment of library fragments by PCR amplification

THE EVOLUTION OF ILLUMINA ADAPTERS

ISABELLE HENRY

Fragmented DNA + adapters: “Regular” adapters

Advantages: simple; obtain 1-2 reads (F and R)
Problems: no multiplexing

[Figure: forward read and reverse read from the two ends of the insert]

Fragmented DNA + adapters: “In-line barcode” adapters

Advantages: can multiplex; simple; obtain 1-2 reads (F and R)
Problems: cluster detection on the HiSeq (the barcodes occupy the first cycles); sequence data lost to the barcodes

[Figure: forward and reverse reads, each starting with the barcode]
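With in-line barcodes, the sample identity is carried by the first few bases of the read itself. Demultiplexing therefore means matching the start of each read against the barcode list and cutting those bases off. This is a minimal sketch. It assumes equal-length barcodes and a tolerance of one mismatch. The function, barcodes and reads are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex_inline(read: str, barcodes: dict[str, str],
                       max_mismatches: int = 1) -> Optional[tuple[str, str]]:
    """Assign a read to a sample by the in-line barcode at its 5' end.

    barcodes maps barcode sequence -> sample name; all barcodes are assumed to be
    the same length. Returns (sample, remaining read) or None when the match is
    missing or ambiguous. The barcode bases are cut off, which is the read length
    lost to in-line barcoding.
    """
    length = len(next(iter(barcodes)))
    if len(read) < length:
        return None
    prefix = read[:length]
    hits = [sample for bc, sample in barcodes.items()
            if hamming(prefix, bc) <= max_mismatches]
    if len(hits) != 1:
        return None
    return hits[0], read[length:]


# Hypothetical barcodes and reads, for illustration only.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
print(demultiplex_inline("ACGAGGTTCC", BARCODES))  # ('sample_A', 'GGTTCC')
print(demultiplex_inline("GGGGAAAA", BARCODES))    # None
```

The function rejects ambiguous matches rather than guessing. To correct one mismatch unambiguously, the barcodes must differ from each other at three or more positions.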

Fragmented DNA + adapters: “TruSeq-style” indexed adapters

Advantages: index read independent of the forward and reverse reads → more data, no more clustering problems
Problems: need more reagents; index on one side only

[Figure: forward read, reverse read and a separate index read]

Fragmented DNA + adapters: “Dual indexed” adapters

Advantages: cheaper; indexing information on both sides
Problems: TBA…

[Figure: forward read, reverse read, index read 1 and index read 2]

For 96 reactions:
• Simple index: 96 B adapters + 1 A adapter
• Dual index: 12 A adapters + 8 B adapters
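The adapter counts for 96 reactions come from combinatorics. With dual indexing, each of the 12 A adapters can be paired with each of the 8 B adapters. That gives 12 × 8 = 96 unique combinations from only 20 adapters, whereas simple indexing needs 96 + 1 = 97. This is a minimal sketch. The adapter and sample names are hypothetical placeholders.

```python
from itertools import product

N_SAMPLES = 96

# Hypothetical adapter names; the slide gives only the counts.
a_adapters = [f"A{i:02d}" for i in range(1, 13)]   # 12 A adapters
b_adapters = [f"B{j:02d}" for j in range(1, 9)]    # 8 B adapters

# Dual index: every sample is identified by a unique (A, B) combination.
dual_pairs = list(product(a_adapters, b_adapters))
dual_adapter_count = len(a_adapters) + len(b_adapters)

# Simple index: one shared A adapter plus a distinct B adapter per sample.
simple_adapter_count = 1 + N_SAMPLES

# Demultiplexing then looks up the observed pair of index reads.
sample_for_pair = {pair: f"sample_{n:02d}" for n, pair in enumerate(dual_pairs, start=1)}

print(len(dual_pairs), dual_adapter_count, simple_adapter_count)  # 96 20 97
print(sample_for_pair[("A03", "B05")])
```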

Quantitation & QC methods

Intercalating dye methods (PicoGreen, Qubit, etc.):
• Specific to dsDNA, accurate at low DNA levels
• Great for pooling indexed libraries to be sequenced in one lane
• Requires standard curve generation and many accurate pipetting steps

Bioanalyzer:
• Quantitation is good for a rough estimate
• Invaluable for library QC
• High-sensitivity DNA chip allows quantitation of low DNA levels

qPCR:
• Most accurate quantitation method
• More labor-intensive
• Must be compared to a control
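Indexed libraries are pooled for a lane on a molar basis. Dye-based mass concentrations (ng/µL) are therefore converted to molarity using the mean fragment length, which can be read from the Bioanalyzer trace. This is a minimal sketch. It uses the usual approximation of 660 g/mol per base pair of dsDNA. The library names and values are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (nM) via mean fragment length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def equimolar_pool_volumes(libraries: dict[str, tuple[float, float]],
                           fmol_each: float) -> dict[str, float]:
    """Volume (uL) of each library that contributes fmol_each femtomoles.

    libraries maps name -> (concentration in ng/uL, mean fragment length in bp).
    Uses 1 nM = 1 fmol/uL, so volume = fmol / nM.
    """
    return {name: fmol_each / ng_per_ul_to_nM(conc, length)
            for name, (conc, length) in libraries.items()}


# Hypothetical libraries: same mass concentration, different fragment sizes.
libs = {"lib1": (2.0, 300.0), "lib2": (2.0, 600.0)}
for name, vol in equimolar_pool_volumes(libs, fmol_each=50.0).items():
    print(f"{name}: {vol:.2f} uL")
```

At equal ng/µL, the library with twice the fragment length has half the molarity, so twice the volume goes into the pool.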

Library QC by Bioanalyzer

Predominant species of appropriate MW

Minimal primer dimer or adapter dimers

Minimal higher MW material

Bioanalyzer chip options

• DNA1000
• High Sensitivity

0.1-5 ng/µL

Library QC by Bioanalyzer

[Figure: example traces labeled “Beautiful” and “100% Adapters”; adapter peak at ~125 bp]

Library QC

Examples of successful libraries; adapter contamination at ~125 bp
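Adapter dimers, the peak at about 125 bp on the trace, appear in the data as reads that start directly with adapter sequence. Short inserts appear as reads that run into the adapter. This is a minimal sketch of detecting both by exact matching. It assumes the adapter prefix AGATCGGAAGAGC shared by TruSeq-style adapters. The reads are hypothetical. Dedicated trimming tools also tolerate sequencing errors.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: 5' sequence shared by TruSeq-style adapters


def adapter_start(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> int:
    """Position where adapter sequence begins in the read, or len(read) if absent.

    First looks for the full adapter prefix anywhere in the read; otherwise looks
    for a partial adapter (at least min_overlap bases) running off the 3' end.
    Exact matching only; real trimmers also tolerate sequencing errors.
    """
    hit = read.find(adapter)
    if hit != -1:
        return hit
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return len(read)


def is_adapter_dimer(read: str) -> bool:
    """An adapter dimer has no insert: adapter sequence starts at the first base."""
    return adapter_start(read) == 0


reads = [
    "AGATCGGAAGAGCACACGTCTG",      # adapter dimer
    "TTGACCAGTAAGATCGGAAGAGCAC",   # 10 bp insert, then adapter
    "TTGACCAGTACCGGTTAAGATCG",     # partial adapter at the 3' end
    "TTGACCAGTACCGGTTAACC",        # no adapter
]
for r in reads:
    print(adapter_start(r), is_adapter_dimer(r))
```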

Library quantitation by qPCR

This step is usually performed by the sequencing service center.

• Use amplifying primers corresponding to the ends of the adapters
• Use standards of known concentration to generate a standard curve of threshold cycle (Ct) vs. log concentration
• Use a conversion factor to deduce the concentration of unknown libraries
• Take library size into consideration!
• Commercial kits are available

[Figure: Primer 1 and Primer 2 binding the adapter ends]
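The qPCR steps come down to three operations: a linear fit of Ct against log10 concentration, an inversion of that line for each unknown, and a length correction. This is a minimal sketch. The standards, Ct values and dilution are hypothetical, and so is the 452 bp standard length used in the example. A commercial kit's own instructions take precedence.

```python
import math


def fit_standard_curve(concs: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10 ** (-1 / slope) - 1; a slope near -3.32 means ~100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


def library_concentration(ct: float, slope: float, intercept: float, dilution: float,
                          standard_bp: float, library_bp: float) -> float:
    """Undiluted library concentration, in the units of the standards.

    The size factor corrects for fragment length: dsDNA dyes give more signal
    per molecule for longer fragments, so a library longer than the standard
    would otherwise look more concentrated than it is.
    """
    diluted = 10 ** ((ct - intercept) / slope)
    return diluted * dilution * (standard_bp / library_bp)


# Hypothetical standards (pM) and Ct values, for illustration only.
slope, intercept = fit_standard_curve([20.0, 2.0, 0.2, 0.02], [9.0, 12.3, 15.7, 19.0])
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
# Assumed 452 bp standard and 350 bp mean library size.
print(library_concentration(13.5, slope, intercept, dilution=1000,
                            standard_bp=452, library_bp=350))
```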

Examples of RNA input requirements

Library prep kit                      Starting material
mRNA (TruSeq)                         100 ng - 4 μg total RNA
Directional mRNA (TruSeq)             1-5 μg total RNA or 50 ng mRNA
NEB Ultra Directional RNA             10-100 ng mRNA or ribo-depleted RNA
Small RNA (TruSeq)                    1 μg total RNA
Ribo depletion (Epicentre)            1-5 μg total RNA
SMARTer™ Ultra Low RNA (Clontech)     100 pg – 10 ng total RNA; single cell
SMART-seq2                            Single cell

Standard RNA-Seq library protocol

• QC of total RNA to assess integrity
• Removal of rRNA (most common):
  – mRNA isolation
  – rRNA depletion
• Fragmentation of RNA
• Reverse transcription and second-strand cDNA synthesis
• Ligation of adapters
• PCR amplification
• Purify, QC and quantify

Is strand-specific information important?

[Figure: read coverage at the Neu1 locus (sense and antisense strands), standard (non-directional) library vs. strand-specific RNA-seq; tracks for sense transcripts and an antisense non-coding RNA]

Informative for non-coding RNAs and antisense transcripts

Essential when NOT using polyA selection (mRNA)

No disadvantage to preserving strand specificity
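In a directional library, the strand a read aligns to shows which strand the RNA came from. This is what separates a sense transcript from an overlapping antisense non-coding RNA. This is a minimal sketch. It assumes a dUTP-style directional protocol, in which read 1 is antisense to the transcript and read 2 is sense. Check the convention in your kit's documentation. The function is hypothetical.

```python
def read_orientation(read_strand: str, gene_strand: str, read_number: int = 1,
                     protocol: str = "dUTP") -> str:
    """Classify an aligned read as 'sense' or 'antisense' to an annotated gene.

    read_strand / gene_strand are '+' or '-'. Assumption: in dUTP-style directional
    libraries, read 1 is the reverse complement of the RNA and read 2 matches it.
    A non-directional library carries no strand information.
    """
    if protocol == "unstranded":
        return "unknown"
    if protocol != "dUTP":
        raise ValueError(f"unsupported protocol: {protocol}")
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    same_strand = read_strand == gene_strand
    # Read 1 is flipped relative to the RNA; read 2 is not.
    rna_same_as_gene = (not same_strand) if read_number == 1 else same_strand
    return "sense" if rna_same_as_gene else "antisense"


# A read 1 aligned to '-' inside a '+' gene came from a sense transcript;
# a read 1 aligned to '+' there points to antisense (e.g. non-coding) RNA.
print(read_orientation("-", "+"), read_orientation("+", "+"))
```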

Your Sequence Data

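• Filtered vs. unfiltered: Illumina chastity filter (a read fails if the fluorescence ratio, i.e. the brightest channel intensity divided by the sum of the two brightest, falls under the threshold twice in the first 25 bases)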

Passing filter:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:0:_AACGCTTA
CGTTTGATAAGCTGAAAATCGCCCTGACTCAAGCTCCAATTGTGAGAGGACCAG
+
A-ABC7-C9-<CE89,,,CC,CCCC8,CFF8,,;CCF8,CE,E9,,,,,,CD@<

NOT passing filter:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:Y:0:_AACGCTTA
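The chastity filter works per cluster and per cycle. This is a minimal sketch. It assumes that chastity is the brightest of the four channel intensities divided by the sum of the two brightest. It takes a threshold of 0.6 as an assumption, based on the value commonly cited for Illumina's filter. The raw intensities are hypothetical, because the instrument applies this filter before FASTQ files are written.

```python
from typing import Sequence

Intensities = tuple[float, float, float, float]  # A, C, G, T channel intensities


def chastity(intensities: Intensities) -> float:
    """Brightest channel intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_chastity_filter(cycles: Sequence[Intensities], threshold: float = 0.6,
                           window: int = 25, max_failures: int = 1) -> bool:
    """False if more than max_failures of the first `window` cycles fall below threshold."""
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Hypothetical intensities: a clean call and an ambiguous (mixed-cluster) call.
clean = (1000.0, 50.0, 40.0, 30.0)   # chastity ~0.95
mixed = (500.0, 480.0, 10.0, 10.0)   # chastity ~0.51
print(passes_chastity_filter([clean] * 24 + [mixed]))       # True: one low call
print(passes_chastity_filter([clean] * 23 + [mixed] * 2))   # False: two low calls
```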


• PhiX (phiX 174): Illumina internal standard for QC
• Now in all MiSeq and HiSeq lanes

Not aligned to PhiX:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:0:
CGTTTGATAAGCTGAAAATCGCCCTGACTCAAGCTCCAATTGTGAGAGGACCAG
+
A-ABC7-C9-<CE89,,,CC,CCCC8,CFF8,,;CCF8,CE,E9,,,,,,CD@<

Aligned to PhiX:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:18:
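The filter and control flags shown above sit in the second field of the read header. This is a minimal sketch of reading them. It assumes the Casava 1.8 header layout that the example reads follow: instrument:run:flowcell:lane:tile:x:y, then read:is_filtered:control:index. Treating any non-zero control number as a control read is an assumption; the PhiX example carries 18. Quality characters are decoded with the Phred+33 offset.

```python
from dataclasses import dataclass


@dataclass
class ReadHeader:
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    filtered: bool   # 'Y' = read did NOT pass the chastity filter
    control: int     # 0 = no control flag; non-zero as on the PhiX example
    index: str


def parse_header(line: str) -> ReadHeader:
    """Parse a Casava 1.8-style FASTQ header (assumed layout)."""
    name, comment = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":", 3)
    return ReadHeader(instrument, int(run), flowcell, int(lane), int(tile),
                      int(x), int(y), int(read), is_filtered == "Y",
                      int(control), index)


def keep_read(header: ReadHeader) -> bool:
    """Keep reads that passed the filter and carry no control flag."""
    return not header.filtered and header.control == 0


def phred33(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


h = parse_header("@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:18:")
print(h.lane, h.tile, h.filtered, h.control, keep_read(h))   # 1 1101 False 18 False
print(phred33("A-ABC7"))                                      # [32, 12, 32, 33, 34, 22]
```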

Targeted sequencing

- exomes
- cancer gene panels
- RNA-seq
- any non-repeat ROI (region of interest)

- 2 HiSeq lanes / genome
- 20 exomes / lane
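The lane numbers reflect mean-depth arithmetic. Depth is approximately reads × bases per read × on-target fraction / target size. A target about 60-fold smaller than the genome therefore needs a correspondingly smaller share of a lane for similar depth. This is a minimal sketch. The reads per lane, read length, exome target size and on-target fraction are all assumed values, not figures from the slide.

```python
def mean_coverage(n_reads: float, bases_per_read: float, target_bp: float,
                  on_target: float = 1.0) -> float:
    """Mean depth = sequenced bases landing on the target / target size."""
    return n_reads * bases_per_read * on_target / target_bp


# All values below are illustrative assumptions, not numbers from the slide.
READS_PER_LANE = 200e6      # read pairs passing filter per lane
BASES_PER_PAIR = 2 * 100    # paired-end 2 x 100 bp
GENOME_BP = 3.1e9           # approximate human genome size
EXOME_BP = 50e6             # capture target size
ON_TARGET = 0.7             # fraction of exome reads on target

genome_depth = mean_coverage(2 * READS_PER_LANE, BASES_PER_PAIR, GENOME_BP)
exome_depth = mean_coverage(READS_PER_LANE / 20, BASES_PER_PAIR, EXOME_BP, ON_TARGET)
print(f"genome: {genome_depth:.0f}x, exome: {exome_depth:.0f}x")
```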

Amplicon sequencing

• Sequencing of amplified regions of interest

• Common application: 16S/18S small subunit ribosomal RNA (SSU rRNA) genes as phylogenetic markers

[Figure: Primer 1 and Primer 2 flanking the region of interest]

Standard library preparation

OR
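The primer pair determines which region ends up in an amplicon library. This is a minimal in silico PCR sketch. It assumes exact primer matches on one strand. The primer and template sequences are toy examples, not real 16S/18S primers, which usually contain degenerate bases.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Product a primer pair would amplify from template (primers included), or None.

    Exact-match sketch: the forward primer must occur on the given strand and the
    reverse complement of the reverse primer must occur downstream of it.
    Mismatches and degenerate bases are not handled.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rc = revcomp(rev_primer)
    end = template.find(rc, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rc)]


# Toy template: flank + forward primer site + insert + reverse primer site + flank.
template = "GGGG" + "ACGTAC" + "TTTTTT" + revcomp("GGCCAA") + "CCCC"
print(in_silico_amplicon(template, "ACGTAC", "GGCCAA"))
```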

Reduced Representation Sequencing

- RAD-seq (restriction site associated DNA sequencing), GBS (genotyping by sequencing)
- High-variation samples
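Reduced-representation methods sequence only the DNA next to restriction sites. The fraction of the genome sampled therefore depends on the enzyme's recognition site and on the fragment sizes kept. This is a minimal in silico digest sketch. It assumes a single enzyme, given as an exact recognition sequence with a top-strand cut offset; EcoRI (G^AATTC) serves as the example. Degenerate sites are not supported. Real RAD-seq and GBS protocols differ in enzymes and size selection.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the top-strand cut position within the site, e.g. 1 for
    EcoRI (G^AATTC). Only exact, non-degenerate sites are supported.
    """
    pattern = f"(?={re.escape(site)})"   # lookahead also finds overlapping sites
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def in_size_window(lengths: list[int], low: int, high: int) -> list[int]:
    """Fragments that would survive a size selection between low and high bp."""
    return [n for n in lengths if low <= n <= high]


toy = "AAA" + "GAATTC" + "C" * 8 + "GAATTC" + "TT"   # toy sequence, two EcoRI sites
print(digest(toy, "GAATTC", 1))   # [4, 14, 7]
```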

Single Molecule Real Time (SMRT™) sequencing
• Sequencing of a single DNA molecule by a single polymerase
• Very long reads: average over 8 kb, up to 30 kb
• High error rate (~13%)
• Complementary to the short, accurate reads of Illumina

http://pacificbiosciences.com

THIRD GENERATION DNA SEQUENCING

[Figure: “Zero Mode Waveguide” with a 70 nm aperture]

Damien Peltier

Paul Hagerman, Biochemistry and Molecular Medicine, SOM

• Single-molecule sequencing of pure CGG array
  - first for disease-relevant allele. Loomis et al. (2012) Genome Research.
  - applicable to many other tandem repeat disorders.
• Direct genomic DNA sequencing of methyl groups
  - direct epigenetic sequencing (paper under review).
• Discovered 100% bias toward methylation of 20 CGG-repeat allele in female
  - first direct methylated DNA sequencing in human disease.
• DoD STTR award with PacBio. Basis of R01 applications.

First Sequencing of CGG-repeat Alleles in Human Fragile X Syndrome using PacBio RS Sequencer

[Figure: base calls (A, C, G, T) by nucleotide position for the CGG36 and CGG95 alleles]

MinION: disposable DNA sequencer

GridION

www.nanoporetech.com

The best is yet to come ….. e.g.

Nick Loman

Thank you!