Wet-lab Considerations for Illumina data analysis
Based on a presentation by Henriette O’Geen
Lutz Froenicke
DNA Technologies and Expression Analysis Cores
UCD Genome Center
RNA-seq Gene Expression
Small RNA
ChIP-SEQ
Genotyping
De novo genome Sequencing
DNA Methylation
Metagenomics
Exome Sequencing
Splice Isoform Abundance
Genome Resequencing
3D Organization
SNPs, Indels, CNVs
Rearrangements
[Figure: DNA (0.1-1.0 μg), library preparation, cluster generation (single molecule array), sequencing]
Illumina Sequencing Technology
Sequencing By Synthesis (SBS) Technology
TruSeq Chemistry: Flow Cell
Simplified workflow
• Clusters in a contained environment (no need for clean rooms)
• Sequencing performed in the flow cell on the clusters
Surface of flow cell coated with a lawn of oligo pairs
8 channels
Examples of DNA input requirements
Illumina library prep kit          Starting material
TruSeq DNA                         > 100 ng
KAPA DNA                           > 10 ng
NEB Ultra Low                      > 5 ng
TruSeq ChIP/MeDIP                  10-50 ng
Rubicon ThruPLEX                   50 pg – 50 ng
Nextera Kit *                      50 ng
Nextera Kit * for Single Cell      0.125 - 0.375 ng
* Unique protocol using “tagmentation”: DNA is simultaneously fragmented and tagged with sequencing adapters
DNA library construction
Fragmented DNA
• End Repair: Blunt End Fragments
• “A” Tailing: Single Overhang Fragments
• Adapter Ligation: DNA Fragments with Adapter Ends
DNA fragmentation
Mechanical shearing:
• NGS BioRuptor
• Covaris
Enzymatic:
• Fragmentase, Transposase
Chemical
All methods are sensitive to:
• Purity of DNA
• DNA concentration
Size selection and clean-up using SPRI Beads
SPRI = Solid Phase Reversible Immobilization
Ratio of SPRI beads/PEG solution to sample determines size cut-off
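The bead-to-sample ratio translates directly into a pipetting volume. A minimal sketch, assuming a hypothetical helper `spri_bead_volume`; the 0.8x ratio in the usage line is purely illustrative, and the size cut-off that a given ratio produces has to come from the bead manufacturer's documentation:

```python
def spri_bead_volume(sample_volume_ul: float, ratio: float) -> float:
    """Return the volume (in µl) of SPRI bead/PEG suspension to add.

    ratio is the bead-suspension-to-sample volume ratio (e.g. 0.8 for "0.8x").
    """
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


if __name__ == "__main__":
    # Illustrative values only: 50 µl of sample cleaned up at 0.8x.
    print(f"{spri_bead_volume(50.0, 0.8):.1f} µl of beads")
```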
Optional: PCR-free libraries
PCR-free library:
– Library can be sequenced if concentration allows
– Reduction of PCR bias against e.g. GC rich or AT rich regions, especially for metagenomic samples
OR
Library enrichment by PCR:
– Ideal combination: high input and low cycle number
“Regular” adaptors
Fragmented DNA + Adaptors
Advantages:
• Simple
• Obtain 1-2 reads (F and R)
Problems:
• No multiplexing
Reads: Forward read, Reverse read
“In-line barcodes” adaptors
Fragmented DNA + Adaptors
Advantages:
• Can multiplex
• Simple
• Obtain 1-2 reads (F and R)
Problems:
• Cluster detection on the HiSeq
• Lose sequence data in the barcodes
Reads: Forward read, Reverse read
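With in-line barcodes the sample identity is read as the first bases of the read itself, which is why those bases are lost as sequence data. A minimal sketch of demultiplexing by barcode prefix, assuming exact matching, equal-length barcodes, and made-up barcode sequences and sample names (none of these come from the slides):

```python
from collections import defaultdict


def demultiplex_inline(reads, barcodes):
    """Assign reads to samples by exact match of an in-line barcode prefix.

    reads:    iterable of read sequences (strings)
    barcodes: dict mapping barcode sequence -> sample name (hypothetical)
    Returns a dict mapping sample name -> list of reads with the barcode removed.
    """
    bins = defaultdict(list)
    for seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # The barcode bases are consumed from the read itself.
                bins[sample].append(seq[len(barcode):])
                break
        else:
            # No barcode matched the start of this read.
            bins["undetermined"].append(seq)
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # made-up barcodes
    reads = ["ACGTGGATCCA", "TGCATTTGACA", "GGGGACGTACG"]
    for sample, seqs in demultiplex_inline(reads, barcodes).items():
        print(sample, seqs)
```

Each assigned read comes back shorter by the barcode length, which is the data loss listed under "Problems".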
“TruSeq-style” indexed adaptors
Fragmented DNA + Adaptors
Advantages:
• Index independent of read
  -> more data
  -> no more clustering problems
Problems:
• Need more reagents
• Index only on one side
Reads: Forward read, Reverse read, Index read
“Dual indexed” adaptors
Fragmented DNA + Adaptors
Advantages:
• Cheaper
• Indexing information on both sides
Problems:
• TBA…
Reads: Forward read, Reverse read, Index read 1, Index read 2
For 96 reactions
Simple index:
• 96 B adaptors
• 1 A adaptor
Dual index:
• 12 A adaptors
• 8 B adaptors
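The adaptor counts for 96 reactions follow from simple combinatorics: every A adaptor can be paired with every B adaptor. A short sketch that enumerates the pairs; the `A01`/`B01` labels are hypothetical names used only for illustration:

```python
from itertools import product


def index_design(n_a_adaptors: int, n_b_adaptors: int):
    """Return (list of A/B adaptor pairs, number of distinct adaptors needed)."""
    a_names = [f"A{i:02d}" for i in range(1, n_a_adaptors + 1)]
    b_names = [f"B{i:02d}" for i in range(1, n_b_adaptors + 1)]
    pairs = list(product(a_names, b_names))  # every A combined with every B
    return pairs, len(a_names) + len(b_names)


if __name__ == "__main__":
    for label, n_a, n_b in [("Simple index", 1, 96), ("Dual index", 12, 8)]:
        pairs, n_adaptors = index_design(n_a, n_b)
        print(f"{label}: {len(pairs)} reactions from {n_adaptors} adaptors")
```

Both designs label 96 reactions, but the dual-index design needs 20 distinct adaptors instead of 97.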
Quantitation & QC methods
Intercalating dye methods (PicoGreen, Qubit, etc.):
• Specific to dsDNA, accurate at low levels of DNA
• Great for pooling of indexed libraries to be sequenced in one lane
• Requires standard curve generation, many accurate pipetting steps
Bioanalyzer:
• Quantitation is good for rough estimate
• Invaluable for library QC
• High-sensitivity DNA chip allows quantitation of low DNA levels
qPCR:
• Most accurate quantitation method
• More labor-intensive
• Must be compared to a control
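Pooling indexed libraries for one lane is usually done on a molar basis, so a mass concentration from a dye assay has to be combined with the average fragment size from the Bioanalyzer. A minimal sketch, assuming the common approximation of 660 g/mol per base pair of dsDNA; the function names and library values are hypothetical:

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float,
                    g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_volumes(libraries: dict, fmol_each: float) -> dict:
    """Volume (µl) of each library that contributes fmol_each femtomoles.

    libraries maps a library name to (ng/µl, mean fragment size in bp).
    1 nM equals 1 fmol/µl, so volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


if __name__ == "__main__":
    # Made-up measurements for two indexed libraries.
    libraries = {"lib_A": (2.0, 400), "lib_B": (5.0, 350)}
    for name, volume in equimolar_volumes(libraries, fmol_each=20.0).items():
        print(f"{name}: {volume:.2f} µl")
```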
Library QC by Bioanalyzer
Predominant species of appropriate MW
Minimal primer dimer or adapter dimers
Minimal higher MW material
Library quantitation by qPCR
This step is usually performed by the sequencing service center
• Use amplifying primers corresponding to ends of adapters
• Use standards of known concentration to generate a standard curve of threshold cycle (Ct) vs. concentration
• Use conversion factor to deduce concentration of unknown libraries
• Take library size into consideration!
• Commercial kits are available
[Figure: library fragment with Primer 1 and Primer 2 at the adapter ends]
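The standard-curve step can be sketched as a least-squares fit of Ct against log10(concentration), followed by inversion for the unknown library. This is an illustration under assumptions, not a kit protocol: the function names, the standards, and the size correction (scaling by standard length over library length, a convention used by some commercial kits) are assumptions to be checked against the kit actually in use.

```python
import math


def fit_standard_curve(concentrations_pM, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations_pM]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def library_concentration_pM(ct, slope, intercept, standard_bp, library_bp,
                             dilution_factor=1.0):
    """Read a concentration off the curve, then correct for size and dilution."""
    measured = 10 ** ((ct - intercept) / slope)
    # Assumed size correction: scale by standard length / library length.
    return measured * (standard_bp / library_bp) * dilution_factor


if __name__ == "__main__":
    # Made-up standards (pM) and Ct values.
    slope, intercept = fit_standard_curve([10.0, 1.0, 0.1], [10.0, 13.3, 16.6])
    conc = library_concentration_pM(12.0, slope, intercept,
                                    standard_bp=400, library_bp=350,
                                    dilution_factor=10000)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, library={conc:.0f} pM")
```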
Examples of RNA input requirements
Library prep kit                      Starting material
mRNA (TruSeq)                         100 ng - 4 μg total RNA
Directional mRNA (TruSeq)             1-5 μg total RNA or 50 ng mRNA
NEB ultra directional RNA             10-100 ng mRNA or ribo-depleted RNA
Small RNA (TruSeq)                    1 μg total RNA
Ribo depletion (Epicentre)            1-5 μg total RNA
SMARTer™ Ultra Low RNA (Clontech)     100 pg – 10 ng total RNA; single cell
SMART-seq2                            Single cell
Standard RNA-Seq library protocol
QC of total RNA to assess integrity
Removal of rRNA (most common):
• mRNA isolation
• rRNA depletion
Fragmentation of RNA
Reverse transcription and second-strand cDNA synthesis
Ligation of adapters
PCR amplification
Purify, QC and Quantify
Strand-specific RNA-seq
Standard library (non-directional)
Antisense non-coding RNA
Sense transcripts
Informative for non-coding RNAs and antisense transcripts
Essential when NOT using polyA selection (mRNA)
No disadvantage to preserving strand specificity
Your Sequence Data
• Filtered vs. Unfiltered: Illumina chastity filter (a read fails when the fluorescence ratio is under threshold twice in the first 25 bases)
Passing Filter:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:0:_AACGCTTA
CGTTTGATAAGCTGAAAATCGCCCTGACTCAAGCTCCAATTGTGAGAGGACCAG
+
A-ABC7-C9-<CE89,,,CC,CCCC8,CFF8,,;CCF8,CE,E9,,,,,,CD@<
NOT Passing Filter:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:Y:0:_AACGCTTA
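The chastity filter can be sketched from per-cycle channel intensities. Chastity is the brightest intensity divided by the sum of the brightest and second-brightest intensities. The 0.6 threshold used below is the value commonly quoted in Illumina documentation and is treated here as an assumption, as are the function names and intensity values:

```python
def chastity(intensities):
    """Brightest / (brightest + second brightest) for one cycle's four channels."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """True if at most max_failures of the first n_cycles fall below threshold."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


if __name__ == "__main__":
    clean = (100.0, 10.0, 5.0, 5.0)   # one dominant channel
    mixed = (50.0, 45.0, 5.0, 5.0)    # two channels of similar brightness
    print(passes_filter([clean] * 24 + [mixed]))      # one low-chastity cycle
    print(passes_filter([mixed, mixed] + [clean] * 23))  # two low-chastity cycles
```

A cluster with two low-chastity cycles in the first 25 is flagged `Y` (filtered) in the read header; a cluster with at most one is flagged `N`.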
Your Sequence Data
• PhiX (phi X 174): Illumina internal standard for QC
• now in all MiSeq and HiSeq lanes
Not aligned to PhiX:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:0:
CGTTTGATAAGCTGAAAATCGCCCTGACTCAAGCTCCAATTGTGAGAGGACCAG
+
A-ABC7-C9-<CE89,,,CC,CCCC8,CFF8,,;CCF8,CE,E9,,,,,,CD@<
Aligned to PhiX:
@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:18:
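Both the filter flag and the control number sit in the second half of the read header, so they can be read without touching the sequence. A minimal sketch that parses headers of the form shown above; the `IlluminaHeader` type and the `parse_header` function are hypothetical names, and the field layout is assumed to match these examples:

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    passed_filter: bool  # header flag "N" means the read was NOT filtered out
    control: int         # 0 for ordinary reads, non-zero for control reads
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Parse one '@instrument:run:flowcell:lane:tile:x:y read:filter:control:index' line."""
    line = line.strip()
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    location, flags = line[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, is_filtered, control, index = flags.split(":", 3)
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), int(read),
                          is_filtered == "N", int(control), index)


if __name__ == "__main__":
    h = parse_header("@HWI-M02034:55:000000000-A85G4:1:1101:21460:1468 1:N:18:")
    print(h.passed_filter, h.control != 0)
```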
Targeted sequencing
- exomes
- cancer gene panels
- RNA-seq
- any non-repeat ROI
- 2 HiSeq lanes / genome
- 20 exomes / lane
Amplicon sequencing
• Sequencing of amplified regions of interest
• Common application: 16S/18S small subunit ribosomal RNA (SSU rRNA) genes as phylogenetic markers
Primer 1
Primer 2
Standard library preparation
OR
Single Molecule Real Time (SMRT™) sequencing
• Sequencing of single DNA molecule by single polymerase
• Very long reads: average reads over 8 kb, up to 30 kb
• High error rate (~13%)
• Complementary to short accurate reads of Illumina
http://pacificbiosciences.com
THIRD GENERATION DNA SEQUENCING
Paul Hagerman, Biochemistry and Molecular Medicine, SOM.
• Single-molecule sequencing of pure CGG array,
  - first for disease-relevant allele. Loomis et al. (2012) Genome Research.
  - applicable to many other tandem repeat disorders.
• Direct genomic DNA sequencing of methyl groups,
  - direct epigenetic sequencing (paper under review).
• Discovered 100% bias toward methylation of 20 CGG-repeat allele in female,
  - first direct methylated DNA sequencing in human disease.
• DoD STTR award with PacBio. Basis of R01 applications.
First Sequencing of CGG-repeat Alleles in Human Fragile X Syndrome using PacBio RS Sequencer
[Figure: panels CGG36 and CGG95; x-axis: nucleotide position; legend: A, C, G, T]