Workshop on Whole Genome Sequencing and Analysis, 2-4 Oct. 2017

Sequencing techniques


Learning objective:

After this lecture, you should be able to…

…account for different techniques for whole genome sequencing (Illumina, Ion Torrent, PacBio, Nanopore)

…identify the elements that make up the raw sequence files

…assess the quality of your data at a general level


Preparing for sequencing

2nd generation sequencing technologies have many steps in common

1. DNA isolation

2. DNA fragmentation

3. Primer ligation

4. Amplification

[Figure: isolated DNA fragment with ligated amplification primers, sequencing primers, and barcode]


Illumina sequencing


Question

In the figure above, each coloured spot represents a spot on the flow cell where millions of identical DNA templates are clustered, and each grey square represents one cycle of sequencing. What is the sequence of the template DNA strand in the lower right corner of the flow cell?


Illumina reads have equal lengths. One base is determined per cycle.

[Figure: three reads (>Read_1, >Read_2, >Read_3) extended by one base per cycle]

End cycle 1: A, T, C

End cycle 2: AT, TC, CA

End cycle 3: ATA, TCC, CAT
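
The one-base-per-cycle behaviour can be sketched in a few lines of Python. This is a toy illustration only, not how base calling is implemented: the function name `sequence_by_cycles` and the three example clusters are assumptions made up for this sketch, and the bases are read directly instead of modelling fluorescence or complementary-strand synthesis.

```python
def sequence_by_cycles(clusters, n_cycles):
    """Toy model: every cycle adds exactly one base to every read."""
    reads = ["" for _ in clusters]
    history = []
    for cycle in range(n_cycles):
        # One labelled nucleotide is incorporated per cluster per cycle.
        reads = [read + cluster[cycle] for read, cluster in zip(reads, clusters)]
        history.append(list(reads))
    return history


# Hypothetical clusters, chosen only for illustration.
clusters = ["ATAGC", "TCCAG", "CATTG"]
for cycle, reads in enumerate(sequence_by_cycles(clusters, 3), start=1):
    print(f"End cycle {cycle}: {' '.join(reads)}")
```

Because each cycle extends every read by one base, all reads have the same length after any given number of cycles.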


Ion Torrent - also a 2nd gen. sequencing technology

• Does not rely on optical signals from fluorescently labelled nucleotides

• Detects the small pH change caused by H+ release, when a nucleotide is incorporated

https://www.youtube.com/watch?v=ZL7DXFPz8rU&t=4s

• Has difficulties correctly calling homopolymers (stretches of identical nucleotides)


Question

Imagine the above is the output from an Ion Torrent run. Which sequence does it represent?

[Figure: Ion Torrent output; axis label: type of nucleotide flooded across well]


Ion Torrent reads do not usually have equal lengths

[Figure: three reads (>Read_1, >Read_2, >Read_3) shown at the end of cycles 1 to 4]

End cycle 1: T

End cycle 2: TG

End cycle 3: TGCCC

End cycle 4: A, CA, TGCCCA
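
A rough sketch of why flow-based reads end up with different lengths: in each flow a single nucleotide type is flooded across the well, and a read is extended by zero, one, or several bases depending on what comes next in its sequence. The function name `flowgram`, the flow order T, G, C, A, and the three example sequences are assumptions for illustration; the sketch reads the sequence directly and ignores complementarity and signal noise.

```python
def flowgram(sequence, flow_order="TGCA", n_flows=4):
    """Toy model of flow-based sequencing.

    Returns the (nucleotide, number incorporated) pair for each flow and
    the part of the sequence that has been read after n_flows flows.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # A whole homopolymer stretch is incorporated within one flow.
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals, sequence[:pos]


# Hypothetical sequences, chosen only for illustration.
for seq in ("TGCCCAT", "CAGG", "AT"):
    signals, read = flowgram(seq)
    print(seq, signals, read)
```

After the same four flows the three reads are six, two, and one base long. The run of three C's produces a single, stronger signal in one flow, which is where homopolymer miscalls come from.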


Third generation sequencing

• No template amplification step (single molecule sequencing)

• Fast

• Produces very long reads (>10,000 bp)

• Assembly gets much easier


Pacific Biosciences - PacBio (3rd gen. sequencing)

• The first 3rd generation sequencer on the market

• Uses Single Molecule, Real-Time (SMRT) sequencing technology

• Single DNA polymerases are attached to the bottom surface of individual detector wells

• DNA is sequenced as fluorescently labelled nucleotides are incorporated into the complementary strand: incorporation retains the nucleotide at the polymerase, and this retention can be detected


PacBio

Advantages:

• Long reads

• Quick run time

Disadvantages:

• Big, expensive machine

• Relatively low accuracy (but errors are not context specific)

• Reagent cost per run is high when only one bacterial strain is sequenced per run

[Figure: PacBio RS II and Sequel instruments]

Recently: Protocol for multiplexing 5 Mb microbial genomes (e.g., E. coli) up to 12-plex and 2 Mb genomes (e.g., Campylobacter) up to 16-plex, making sequencing of microbial genomes more affordable.


Oxford Nanopore (3rd gen. sequencing)

https://www.youtube.com/watch?v=CE4dW64x3Ts

• The newest kid in class

• Sequences single-stranded DNA as it passes through a nanopore

• The MinION is the size of a small cell phone

• VERY long reads (up to 1,000,000 bp?)

• So far also very high error rates (up to 15%)


Comparing sequencing technologies

Platform      Sequencer    Cost of sequencing platform ($)   Output per run/lane   Max. read length (bp)   Average run time
Illumina      HiSeq 3000   750,000                           150 Gbp               250                     4 days
Illumina      MiSeq        100,000                           15 Gbp                300                     2 days
Ion Torrent   Proton II    224,000                           66 Gbp                200                     4 hours
Ion Torrent   PGM 318      50,000                            2 Gbp                 400                     7 hours
PacBio        RS II        700,000                           400 Mbp               54,000                  3 hours
Nanopore      MinION       1,000                             1-10 Gbp              150,000                 n.a.*

*Machine run time is adjusted to need of sequencing depth. Example given is for 48 hours

Bleidorn C., Systematics and Biodiversity (2016), 14(1):1-8


What is the data? Fastq files

Fastq example:

@FCC0CD5ACXX:1:1101:1103:2048#ACCGT/1
ACNGTGTTTTTAGTTATTGTTTTGTTAAGTTGGGTTTTTTGTACCCAATAGCCAACAAGCCGCCTTTATGGCGGTTTTTTTGTGCCTGAAAAGTGGGCGCA
+
_BP`ccceggcegihiiighiifhihfddgfhi^efgfhhhhhegiiiiiiiihiihihggeeccdddcccacWTT^acc[ab_`]`[_b`^BBBBBBBB
@FCC0CD5ACXX:1:1101:1165:2058#ACGTT/1
ACGTTAGCAGAATCGCTTTCTGTTCGTTTTCCACCTGCGACAGACGCACCGGACCACGGTTGGCGAGATCGTCGCGCAGAATATCGGCGGCACGCTGCGAC
+
bb_eeceefeggehhdagfghhiihfghighhffhifhhcghfdhiihafgdceba`a\aaccc^V]^baccaccXaaX^bbcccaac[_X]]a[aacXT
@FCC0CD5ACXX:1:1101:1135:2082#AGCGT/1
AGCGTGACAAACATTTTATTGCGCCCGGTTTTATCCAGCTTGAATGCCTGACGAAAGAAGATGATGGTGACGACGATGGAGAGAACAATCAGCACCAGATT
+
bbbeeeeefggfgiihgiigiiiiiiiffgifgeghiiihhfefffhhhfgh_fhggdgegeaceeacbdcbcc\^aa]``_^bb]bcccccbac_a^bc
@FCC0CD5ACXX:1:1101:1239:2083#AGCGT/1
AGCGTCTGACTCACACAAAAACGGTAACACAGTTATCCACAGAATCAGGGGATAAGGCCGGAAAGAACATGTGAGCAAAAAGGCAAAGCCAGGACAAAAGG
+
bbbeeeeegggggiiiiiiiiiigifhhiiighiiihhiiiiiiihiiiiiiiiiihiigcdbbdcdcccccdccccccccacccccccbcccacccccc

1 read, 4 lines

The four lines of each read:

Line 1: Header/ID (starts with @)

Line 2: DNA sequence

Line 3: Name field (optional, starts with +)

Line 4: Quality scores (also called PHRED scores)
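
The four-line layout makes a Fastq file straightforward to read programmatically. Below is a minimal sketch, assuming every record occupies exactly four consecutive lines with no blank lines in between; the function name `read_fastq` and the two-read example are made up for illustration and are not part of any particular tool.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, name, quality) tuples from a Fastq stream."""
    while True:
        # Take the next four lines: header, sequence, name field, quality.
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated Fastq record")
        header, sequence, name, quality = record
        if not header.startswith("@") or not name.startswith("+"):
            raise ValueError(f"malformed Fastq record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:], sequence, name[1:], quality


# Hypothetical two-read file, made up for this example.
example = io.StringIO(
    "@read_1\nACGTN\n+\nIIII!\n"
    "@read_2\nGGCAT\n+\nIIHG5\n"
)
records = list(read_fastq(example))
for header, sequence, name, quality in records:
    print(header, sequence, quality)
```

The length check reflects that each base has exactly one quality character.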

Why are quality scores necessary?

In a perfect world…

In our world…


PHRED (Q) quality scores

Encodes the probability of an erroneous call

• PHRED quality score, $Q = -10 \log_{10} P$

• Error probability, $P = 10^{-Q/10}$

Example: A base call with Q = 30 has an error probability of $10^{-3}$, meaning that 1 out of 1000 bases called with this quality score would be wrong

Phred quality score (Q)   Error probability (P)   Probability of incorrect base call   Base call accuracy
10                        0.1                     1 in 10                              90 %
20                        0.01                    1 in 100                             99 %
30                        0.001                   1 in 1000                            99.9 %
40                        0.0001                  1 in 10,000                          99.99 %
50                        0.00001                 1 in 100,000                         99.999 %
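
The relation between Q and P can be checked with a few lines of Python. This is a small sketch of the two formulas only; the function names `phred_to_error` and `error_to_phred` are chosen here for illustration.

```python
import math


def phred_to_error(q):
    """Error probability P for a PHRED score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """PHRED score Q for an error probability P: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = phred_to_error(q)
    print(f"Q = {q}: P = {p:g}, base call accuracy = {100 * (1 - p):g} %")
```

Each step of 10 in Q corresponds to a tenfold drop in error probability.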


The PHRED quality scores are written using ASCII encoding

Shown here is the Sanger/Phred+33 conversion table currently used by Illumina
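
With Phred+33 encoding, each quality character stands for the score obtained by subtracting 33 from its ASCII code. A minimal sketch of that conversion follows; the function name `decode_quality` and the example string are assumptions for illustration, and the `offset` parameter is there because files written with a different encoding need a different offset.

```python
def decode_quality(quality, offset=33):
    """Convert a Fastq quality string into a list of PHRED scores."""
    scores = [ord(char) - offset for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("quality character below the chosen offset")
    return scores


# Hypothetical quality string: 'I' is ASCII 73, '5' is 53, '+' is 43, '!' is 33.
print(decode_quality("II5+!"))  # [40, 40, 20, 10, 0]
```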


Data quality assessed via FastQC

Great data!


Data quality assessed via FastQC

Horrible data!

• FastQC is freely downloadable (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

• Great for generating reports on your WGS data

• Not able to trim the data
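
One thing a quality report summarises is how the quality scores change along the read. The sketch below computes the mean PHRED score at each read position from a set of quality strings. It only illustrates the idea and is not FastQC's implementation; the function name `mean_quality_per_position`, the Phred+33 offset, and the example strings are assumptions.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Mean PHRED score at each read position (reads may differ in length)."""
    totals = []
    counts = []
    for quality in quality_strings:
        for i, char in enumerate(quality):
            if i == len(totals):
                # First read that is long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical quality strings, made up for this example.
qualities = ["IIII5", "III5+", "II5"]
for position, mean in enumerate(mean_quality_per_position(qualities), start=1):
    print(f"position {position}: mean Q = {mean:.1f}")
```

In this made-up example the mean drops towards the end of the reads, which is the pattern to look for when deciding whether trimming is needed.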


How to perform read trimming using PRINSEQ
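
To show what quality trimming does to a read, here is a minimal sketch that removes bases from the 3' end while their PHRED score is below a threshold. It is not PRINSEQ's algorithm or interface; the function name `trim_3prime`, the threshold of 20, and the Phred+33 offset are assumptions chosen for illustration.

```python
def trim_3prime(sequence, quality, min_q=20, offset=33):
    """Remove low-quality bases from the 3' end of a read."""
    end = len(sequence)
    # Step back from the 3' end until a base reaches the threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical read: the last two bases have Q = 10 and Q = 0.
print(trim_3prime("ACGTACGT", "IIIII5+!"))  # ('ACGTAC', 'IIIII5')
```

The sequence and quality string are always cut at the same position so that the record stays valid.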


Recap by multiple choice scratch cards