V4 Sequencing Reagent Experience

Joshua Bridgers, Duke University Center for Human Genome Variation

Slide deck from Josh Bridgers' 2014 presentation at the Illumina user group meeting in RTP (Research Triangle Park, NC). The slides describe our experience with the V3 and V4 chemistries on a very large cohort of exome-sequenced samples.

Page 1: V4 Sequencing Reagent Experience

V4 Sequencing Reagent Experience

Joshua Bridgers, Duke University

Center for Human Genome Variation

Page 2: V4 Sequencing Reagent Experience

V3 vs V4 Chemistry

• V3
  – 100bp x 100bp
  – 12-day run time
  – Requires loading of paired-end reagents
  – ~300 Gb per flowcell

• V4
  – Requires a HiSeq 2500 or a newer HiSeq 2000
  – 125bp x 125bp
  – 6-day run time*
  – Paired-end reagents are loaded at the start of the run
  – ~600 Gb per flowcell

Page 3: V4 Sequencing Reagent Experience

Throughput

[Chart: sequencing throughput, V4 chemistry on HiSeq 2000/2500 vs. V3 chemistry on HiSeq 2000; label: 2400 Gb / 12 days]
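The 2400 Gb per 12 days figure can be reproduced from the per-flowcell yields and run times on the V3 vs V4 Chemistry slide. A minimal back-of-the-envelope sketch, assuming both flowcell positions of a HiSeq 2000/2500 are loaded on every run and runs are started back to back (both are our assumptions, not stated on the slides):

```python
"""Back-of-the-envelope HiSeq throughput over a fixed window (sketch).

Yields and run times come from the V3 vs V4 Chemistry slide; the
two-flowcell and back-to-back-run assumptions are ours.
"""

# Per-flowcell yield (gigabases) and run time (days) per chemistry.
CHEMISTRIES = {
    "V3": {"gb_per_flowcell": 300, "run_days": 12},
    "V4": {"gb_per_flowcell": 600, "run_days": 6},
}

FLOWCELLS_PER_RUN = 2  # assumption: both flowcell positions loaded
WINDOW_DAYS = 12


def yield_in_window(chemistry: str,
                    window_days: int = WINDOW_DAYS,
                    flowcells: int = FLOWCELLS_PER_RUN) -> int:
    """Gigabases produced by the complete runs that fit inside window_days."""
    spec = CHEMISTRIES[chemistry]
    complete_runs = window_days // spec["run_days"]
    return complete_runs * flowcells * spec["gb_per_flowcell"]


if __name__ == "__main__":
    v3 = yield_in_window("V3")
    v4 = yield_in_window("V4")
    print(f"V3: {v3} Gb / {WINDOW_DAYS} days")  # V3: 600 Gb / 12 days
    print(f"V4: {v4} Gb / {WINDOW_DAYS} days")  # V4: 2400 Gb / 12 days
    print(f"V4 / V3: {v4 / v3:.1f}x")           # V4 / V3: 4.0x
```

Under these assumptions V4 delivers four times the V3 output in the same 12 days, consistent with the ~400% throughput figure in the Conclusion.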

Page 4: V4 Sequencing Reagent Experience

2nd Generation Sequencing Advances

• V3 System Chemistry
  – 300 Gb per flowcell
  – 12 days to data
  – Genome: $4700, Exome: $790

• V4 System Chemistry
  – 600 Gb per flowcell
  – 6 days to data
  – Genome: $3000, Exome: $640

• X System Chemistry
  – ~1 Tb per patterned flowcell
  – 3 days to data
  – Genome: $1500, Exome: $500
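The per-sample prices above translate into relative savings as follows. This is plain arithmetic on the slide's numbers; the dictionary layout and the `percent_saving` name are illustrative only.

```python
# Per-sample prices (USD) as listed on the 2nd Generation Sequencing Advances slide.
PRICES = {
    "V3": {"genome": 4700, "exome": 790},
    "V4": {"genome": 3000, "exome": 640},
    "X": {"genome": 1500, "exome": 500},
}


def percent_saving(old_price: float, new_price: float) -> float:
    """Price reduction expressed as a percentage of the old price."""
    return 100.0 * (old_price - new_price) / old_price


if __name__ == "__main__":
    for older, newer in (("V3", "V4"), ("V4", "X")):
        for assay in ("genome", "exome"):
            saving = percent_saving(PRICES[older][assay], PRICES[newer][assay])
            print(f"{older} -> {newer} {assay}: {saving:.1f}% cheaper")
```

Moving from V3 to V4 cuts the listed genome price by about 36% and the exome price by about 19%.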

Page 5: V4 Sequencing Reagent Experience

Data Quality – Percent Q30 V3

[Plot: Percent Q30 (0.65 to 1.0) vs. Cluster Density K/mm² (600 to 1300); series: V3 %Q30 R1, V3 %Q30 R2]

Page 6: V4 Sequencing Reagent Experience

Data Quality – Percent Q30 V4

[Plot: Percent Q30 (0.65 to 1.0) vs. Cluster Density K/mm² (600 to 1300); series: V4 %Q30 R1, V4 %Q30 R2]

Page 7: V4 Sequencing Reagent Experience

Data Quality – Percent Q30

[Plot: Percent Q30 (0.65 to 1.0) vs. Cluster Density K/mm² (600 to 1300); series: V3 %Q30 R1, V4 %Q30 R1, V3 %Q30 R2, V4 %Q30 R2]
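Percent Q30 is the fraction of base calls whose Phred quality score is at least 30, i.e. an estimated error probability of 1 in 1,000 or better. A minimal sketch of computing it, together with the average quality score plotted on the following slides, from FASTQ quality strings. It assumes the Phred+33 (Sanger / Illumina 1.8+) encoding, and the function names are illustrative rather than taken from any Illumina tool.

```python
from typing import Iterable, List

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode one FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def q30_fraction(qualities: Iterable[str]) -> float:
    """Fraction of all base calls with a Phred score of 30 or higher."""
    total = passing = 0
    for quality in qualities:
        for score in phred_scores(quality):
            total += 1
            if score >= 30:
                passing += 1
    return passing / total


def mean_qscore(qualities: Iterable[str]) -> float:
    """Average Phred score over all base calls."""
    scores = [s for q in qualities for s in phred_scores(q)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    reads = ["IIIII", "II5##"]  # 'I' = Q40, '5' = Q20, '#' = Q2
    print(q30_fraction(reads))  # 0.7
    print(mean_qscore(reads))   # 30.4
```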

Page 8: V4 Sequencing Reagent Experience

Data Quality – Average Quality Score V3

[Plot: Quality Score (29 to 37) vs. Cluster Density K/mm² (600 to 1300); series: V3 Avg. Qscore R1, V3 Avg. Qscore R2]

Page 9: V4 Sequencing Reagent Experience

Data Quality – Average Quality Score V4

[Plot: Quality Score (29 to 37) vs. Cluster Density K/mm² (600 to 1300); series: V4 Avg. Qscore R1, V4 Avg. Qscore R2]

Page 10: V4 Sequencing Reagent Experience

Data Quality – Average Quality Score

[Plot: Quality Score (29 to 37) vs. Cluster Density K/mm² (600 to 1300); series: V3 Avg. Qscore R1, V3 Avg. Qscore R2, V4 Avg. Qscore R1, V4 Avg. Qscore R2]
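The quality axis on these plots is on the Phred scale, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A small sketch converting between the two, to show what the roughly 29 to 37 range on the y-axis means in error rates (function names are illustrative):

```python
import math


def q_to_error_prob(q: float) -> float:
    """Phred quality score -> estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_q(p: float) -> float:
    """Estimated base-call error probability -> Phred quality score."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (29, 30, 33, 35, 37):
        p = q_to_error_prob(q)
        print(f"Q{q}: error probability {p:.1e} (about 1 in {round(1 / p):,})")
```

Because the scale is logarithmic, a one- or two-point gain near the top of the axis is a sizable change in estimated error rate: Q35 corresponds to about 1 error in 3,162 calls and Q37 to about 1 in 5,012. For the same reason, averaging Q scores is not the same as averaging error probabilities.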

Page 11: V4 Sequencing Reagent Experience

Data Volume and Processing

• Run folders (.bcl files are now compressed)
  – V3 run folders: ~350 GB/flowcell
  – V4 run folders: ~500 GB/flowcell

• FASTQ generation cluster usage per flowcell
  – V3: 121.5 minutes, 283 GB max memory used
  – V4: 184.9 minutes, 673 GB max memory used
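Raw per-flowcell usage is higher on V4, but each V4 flowcell also yields twice as many bases. A minimal sketch that divides each figure by the per-flowcell yields from the V3 vs V4 Chemistry slide (300 and 600 Gb). The dictionary keys are hypothetical, and normalizing per gigabase is our assumption about how the Conclusion percentages were derived:

```python
# Per-flowcell figures from the Data Volume and Processing slide; yields
# (gigabases per flowcell) from the V3 vs V4 Chemistry slide.
V3 = {"yield_gb": 300, "run_folder_gb": 350, "fastq_minutes": 121.5, "max_mem_gb": 283}
V4 = {"yield_gb": 600, "run_folder_gb": 500, "fastq_minutes": 184.9, "max_mem_gb": 673}


def per_gigabase(stats: dict, metric: str) -> float:
    """Resource usage divided by the gigabases sequenced on the flowcell."""
    return stats[metric] / stats["yield_gb"]


def v4_relative_to_v3(metric: str) -> float:
    """V4 per-gigabase usage as a percentage of V3 per-gigabase usage."""
    return 100.0 * per_gigabase(V4, metric) / per_gigabase(V3, metric)


if __name__ == "__main__":
    for metric in ("run_folder_gb", "fastq_minutes", "max_mem_gb"):
        print(f"{metric}: {v4_relative_to_v3(metric):.0f}% of V3")
```

These come out at roughly 71%, 76% and 119% of V3, close to the 71%, 75% and 120% on the Conclusion slide, which suggests (but does not confirm) that those figures are per-gigabase comparisons.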

Page 12: V4 Sequencing Reagent Experience

Lane-level Alignment

Indel Re-Alignment

Base Quality Recalibration

Merging & Sorting Alignments

PCR Duplicate Removal

BWA - http://bio-bwa.sourceforge.net/

Bioinformatics Pipeline

Page 13: V4 Sequencing Reagent Experience

Lane-level Alignment

Indel Re-Alignment

Base Quality Recalibration

Merging & Sorting Alignments

PCR Duplicate Removal

SAMtools - http://samtools.sourceforge.net/

Bioinformatics Pipeline

Page 14: V4 Sequencing Reagent Experience

Lane-level Alignment

Indel Re-Alignment

Base Quality Recalibration

Merging & Sorting Alignments

PCR Duplicate Removal


Picard MarkDuplicates - http://picard.sourceforge.net/

Bioinformatics Pipeline

Page 15: V4 Sequencing Reagent Experience

Lane-level Alignment

Indel Re-Alignment

Base Quality Recalibration

Merging & Sorting Alignments

PCR Duplicate Removal

GATK - http://www.broadinstitute.org/gatk/

Bioinformatics Pipeline

Page 16: V4 Sequencing Reagent Experience

Core-released Reads

Alignment

Indel Re-Alignment

Base Quality Recalibration

Sorting/Merging Alignments

PCR Duplicate Removal

Analysis-Ready Read Alignments

GATK Unified Genotyper

GATK VQSR

Coverage Depth

Ti/Tv Ratio

dbSNP Overlap

Genotyping & Preliminary QC

Duplicate Read Pct.

Aligned Read Pct.

Gender Check

Bioinformatics Pipeline
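Two of the preliminary QC metrics above, Ti/Tv ratio and dbSNP overlap, are simple to compute from a call set. A minimal sketch, assuming variants are represented as hypothetical (chrom, pos, ref, alt) tuples and that dbSNP has already been loaded into a set of the same tuples; this illustrates the metrics, not the CHGV pipeline's implementation:

```python
from typing import Iterable, Set, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): hypothetical layout


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv_ratio(snvs: Iterable[Variant]) -> float:
    """Transition/transversion ratio over single-nucleotide variants."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in snvs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def dbsnp_overlap(calls: Iterable[Variant], known: Set[Variant]) -> float:
    """Fraction of calls that are also present in a known-sites set such as dbSNP."""
    calls = list(calls)
    return sum(call in known for call in calls) / len(calls)


if __name__ == "__main__":
    snvs = [("1", 100, "A", "G"), ("1", 200, "C", "T"),
            ("2", 50, "G", "A"), ("2", 75, "A", "C")]
    print(ti_tv_ratio(snvs))
    print(dbsnp_overlap(snvs, {("1", 100, "A", "G"), ("2", 75, "A", "C")}))
```

Exome call sets typically show a higher Ti/Tv (around 3) than whole-genome call sets (around 2.0 to 2.1), so a marked drop is a common warning sign of false-positive SNVs.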

Page 17: V4 Sequencing Reagent Experience

Test Sample Description

• Sequenced one trio on both V3 and V4 Illumina chemistries

• 400bp size-selected exome capture
  – V3-sequenced samples have higher overall coverage

Page 18: V4 Sequencing Reagent Experience

Overall Metrics

• Percent of bases covered at 5x is similar despite the coverage difference

• SNV hom/het ratio changed
• Indel hom/het ratio changed
• dbSNP overlap and Ti/Tv similar

Page 19: V4 Sequencing Reagent Experience

Variant Call Overlap

           V3 only   V4 only   Shared
Sample 1   36.5k     13.6k     114.8k
Sample 2   34.5k     13.9k     118.1k
Sample 3   34.5k     13.2k     114.8k

Page 20: V4 Sequencing Reagent Experience

Variant Call Overlap (Pass/Intermediate,Both 10x Covered)

           V3 only   V4 only   Shared
Sample 1   8.2k      7.7k      104.1k
Sample 2   8.6k      7.5k      107.3k
Sample 3   7.4k      7.7k      103.2k

Page 21: V4 Sequencing Reagent Experience

Variant Call Overlap (High Confidence SNV)

           V3 only   V4 only   Shared
Sample 1   141       113       22,553
Sample 2   118       90        22,689
Sample 3   109       83        22,257

Page 22: V4 Sequencing Reagent Experience

Variant Call Overlap (High Confidence SNV)

           V3 only   V4 only   Shared
Sample 1   0.62%     0.50%     98.9%
Sample 2   0.52%     0.39%     99.1%
Sample 3   0.49%     0.37%     99.1%
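The percentages on this slide follow from the high-confidence SNV counts on the previous one: each Venn region divided by the union of V3 and V4 calls. A minimal sketch, assuming the first value beside each chemistry label is that chemistry's unique calls and the remaining value is the overlap (the variable and function names are illustrative):

```python
from typing import Dict, Tuple

# (V3-only, V4-only, shared) high-confidence SNV counts per sample,
# from the Variant Call Overlap (High Confidence SNV) slide.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "Sample 1": (141, 113, 22553),
    "Sample 2": (118, 90, 22689),
    "Sample 3": (109, 83, 22257),
}


def overlap_percentages(v3_only: int, v4_only: int, shared: int) -> Tuple[float, ...]:
    """Each Venn region as a percentage of the union of V3 and V4 calls."""
    union = v3_only + v4_only + shared
    return tuple(100.0 * n / union for n in (v3_only, v4_only, shared))


if __name__ == "__main__":
    for sample, counts in COUNTS.items():
        v3, v4, both = overlap_percentages(*counts)
        print(f"{sample}: V3-only {v3:.2f}%, V4-only {v4:.2f}%, shared {both:.1f}%")
```

Run as is, this reproduces the 0.62% / 0.50% / 98.9% split for Sample 1 and the corresponding values for Samples 2 and 3.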

Page 23: V4 Sequencing Reagent Experience

Homopolymer Runs

[Figure: homopolymer runs, V3 vs. V4]

Page 24: V4 Sequencing Reagent Experience

Variant Call Overlap (Low Complexity Regions)

[Venn diagram, Sample 1: V3 vs. V4 calls; values as extracted: V3 42, V4 2422404 (digits run together)]

Page 25: V4 Sequencing Reagent Experience

CCDS Coverage

• Analyzed 72 unaffected Caucasian adults for % coverage across a modified CCDS release 14

• Same cohort
• 34 V3 samples
• 38 V4 samples

• Gender unbiased
• All unaffected parents
• Overall coverage between 80x and 90x

Page 26: V4 Sequencing Reagent Experience

CCDS Coverage

               V3 Average   V4 Average
3x Coverage    97.61%       98.46%
10x Coverage   95.92%       96.71%
20x Coverage   92.93%       92.83%

• Overall greater coverage at 3x and 10x
• Similar coverage at 20x
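Each cell in this table is a breadth-of-coverage figure: the share of target bases reaching a given read depth, averaged over samples. A minimal per-sample sketch, assuming per-base depths over the CCDS targets are already available as a list of integers (for example from `samtools depth -a -b targets.bed`); the depths in the example are made up:

```python
from typing import Dict, Iterable, Sequence


def fraction_covered(depths: Sequence[int],
                     thresholds: Iterable[int] = (3, 10, 20)) -> Dict[int, float]:
    """Fraction of target bases whose read depth is at least each threshold."""
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


if __name__ == "__main__":
    # Hypothetical per-base depths over a tiny target region.
    depths = [0, 2, 5, 12, 25, 40, 8, 19, 21, 3]
    for threshold, frac in fraction_covered(depths).items():
        print(f">= {threshold}x: {frac:.0%}")
```

Averaging the per-sample fractions within each chemistry gives the kind of cohort-level comparison shown in the table.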

Page 27: V4 Sequencing Reagent Experience

Extended Coverage

[Figure: extended coverage, V3 vs. V4]

Page 28: V4 Sequencing Reagent Experience

Conclusion

• Sequencing throughput rose to ~400% of V3 (about 4x)
  – 71% temporary storage space usage
  – 75% CPU hours for FASTQ conversion
  – 120% maximum Vmem usage

• Higher average quality score at higher cluster densities
• Higher percent Q30 at higher cluster densities

Page 29: V4 Sequencing Reagent Experience

Conclusion

• High confidence variant calls largely unaffected

• Low complexity regions and indel calls can still be problematic

• Overall increased coverage of CCDS

Page 30: V4 Sequencing Reagent Experience

Questions?

Page 31: V4 Sequencing Reagent Experience

Acknowledgements

• CHGV: Brian Krueger, Slave Petrovski, Linda Hong, Erin Campbell

• Illumina: Adam Jerald, Kenny Patridge

Page 32: V4 Sequencing Reagent Experience

Kaizen

改善 (kai + zen)

改 kai: “change”

善 zen: “good”

Cheaper sequencing, extended coverage, lower IT overhead

Page 33: V4 Sequencing Reagent Experience

Data Quality – Percent Q30

V3
• Greater degradation in quality as cycles increase
• Looser distribution

V4
• Small drop in %Q30 as cycles increase
• Tighter distribution
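One way to see the cycle-dependent degradation described above is to compute %Q30 separately at each cycle, i.e. at each base position in the read. A minimal sketch, assuming Phred+33 quality strings from one read of a run; it is a generic illustration, not the tool used to produce these slides:

```python
from typing import List, Sequence

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def per_cycle_q30(qualities: Sequence[str], offset: int = PHRED_OFFSET) -> List[float]:
    """Fraction of bases with Phred >= 30 at each cycle (read position)."""
    n_cycles = max(len(q) for q in qualities)
    fractions = []
    for cycle in range(n_cycles):
        # Only reads long enough to reach this cycle contribute to it.
        scores = [ord(q[cycle]) - offset for q in qualities if cycle < len(q)]
        fractions.append(sum(s >= 30 for s in scores) / len(scores))
    return fractions


if __name__ == "__main__":
    # Hypothetical reads whose quality decays toward the 3' end.
    reads = ["IIII", "II5#", "I5##"]
    print([round(f, 2) for f in per_cycle_q30(reads)])  # [1.0, 0.67, 0.33, 0.33]
```

Plotting these per-cycle fractions for many reads, and looking at their spread across lanes or tiles, shows both the downward trend and how tight or loose the distribution is.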

Page 34: V4 Sequencing Reagent Experience

Data Quality – Cluster Passed Filter

[Plot: Percent Pass Filter (65 to 100) vs. Cluster Density K/mm² (600 to 1300); series: V3 Cluster PF, V4 Cluster PF]

Page 35: V4 Sequencing Reagent Experience

New SNVHomo and IndelHomo Ratios (hom/het, variants at 10x coverage)

Sample                      #SNV Hom  #SNV Het  SNV Hom/Het  #Indel Hom  #Indel Het  Indel Hom/Het
SQC0243F77                  12012     118884    0.101039669  2102        18349       0.114556652
SQC0243F77_V4TEST           10181     101963    0.099849946  1833        14415       0.127159209
SQC0243F77 shared           9637      92697     0.103962372  1283        11216       0.114390157
SQC0243F77 missing          404       3345      0.12077728   404         3345        0.12077728
SQC0243F77_V4TEST missing   1036      11552     0.08968144   446         3617        0.123306608
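The ratio columns in this table are the homozygous count divided by the heterozygous count (for example, 12012 / 118884 ≈ 0.1010 in the first row). A minimal sketch of deriving those counts from VCF-style GT strings, assuming biallelic calls and ignoring hom-ref and missing genotypes; the function name is illustrative:

```python
from typing import Iterable, Tuple

# Biallelic genotype strings, unphased ("/") and phased ("|").
HOM_ALT = {"1/1", "1|1"}
HET = {"0/1", "1/0", "0|1", "1|0"}


def hom_het_ratio(genotypes: Iterable[str]) -> Tuple[int, int, float]:
    """Return (hom-alt count, het count, hom/het); other genotypes are skipped."""
    hom = het = 0
    for gt in genotypes:
        if gt in HOM_ALT:
            hom += 1
        elif gt in HET:
            het += 1
    return hom, het, (hom / het if het else float("inf"))


if __name__ == "__main__":
    print(hom_het_ratio(["0/1", "1/1", "0|1", "0/0", "./."]))  # (1, 2, 0.5)
```

Running this separately over SNV and indel calls restricted to 10x-covered sites gives the two ratio columns.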

Page 36: V4 Sequencing Reagent Experience

Additional Filters (SNV)

• Percent Alt Read 0.3 – 1
• GQ > 50
• SB < 60
• HaplotypeScore < 13
• MQ > 40
• QD > 2
• QUAL > 50
• RPRS > -6
• MQRS > -6
• NON_SYNONYMOUS_CODING | SYNONYMOUS_CODING | START_GAINED | START_LOST | STOP_LOST | STOP_GAINED | SPLICE_SITE_ACCEPTOR | SPLICE_SITE_DONOR | EXON
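The SNV filters above can be expressed as a single predicate. A minimal sketch, assuming a hypothetical record layout (a plain dict rather than a parsed VCF), that "Percent Alt Read" is the fraction of reads supporting the alternate allele, and that RPRS and MQRS are GATK's ReadPosRankSum and MQRankSum annotations; the effect names appear to be SnpEff-style annotations:

```python
from typing import Any, Dict

# Effect classes kept by the filter, as listed on the slide.
KEPT_EFFECTS = {
    "NON_SYNONYMOUS_CODING", "SYNONYMOUS_CODING", "START_GAINED", "START_LOST",
    "STOP_LOST", "STOP_GAINED", "SPLICE_SITE_ACCEPTOR", "SPLICE_SITE_DONOR", "EXON",
}


def passes_snv_filters(v: Dict[str, Any]) -> bool:
    """Apply the slide's SNV thresholds to one variant record.

    Keys are hypothetical; ReadPosRankSum and MQRankSum stand in for the
    slide's RPRS and MQRS (an assumption).
    """
    return (
        0.3 <= v["alt_read_fraction"] <= 1.0
        and v["GQ"] > 50
        and v["SB"] < 60
        and v["HaplotypeScore"] < 13
        and v["MQ"] > 40
        and v["QD"] > 2
        and v["QUAL"] > 50
        and v["ReadPosRankSum"] > -6
        and v["MQRankSum"] > -6
        and v["effect"] in KEPT_EFFECTS
    )


if __name__ == "__main__":
    record = {
        "alt_read_fraction": 0.45, "GQ": 99, "SB": -100.0, "HaplotypeScore": 2.1,
        "MQ": 60.0, "QD": 15.3, "QUAL": 800.0, "ReadPosRankSum": 0.5,
        "MQRankSum": 0.1, "effect": "NON_SYNONYMOUS_CODING",
    }
    print(passes_snv_filters(record))
    print(passes_snv_filters({**record, "QD": 1.5}))
```

Keeping every threshold in one function makes it straightforward to toggle individual conditions and count how many variants each one removes.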

Page 37: V4 Sequencing Reagent Experience

Discarded Variants

• Sample 1
  – 6.9k: NON_SYNONYMOUS_CODING | SYNONYMOUS_CODING | START_GAINED | START_LOST | STOP_LOST | STOP_GAINED | SPLICE_SITE_ACCEPTOR | SPLICE_SITE_DONOR | EXON
  – 25.6k: 10X coverage for both samples
  – 32.1k: 10X coverage for one sample
  – 43.0k: Percent Alt Read 0.3 – 1
  – 43.9k: QUAL > 50
  – 46.3k: GQ > 50
  – 52.5k: HaplotypeScore < 13
  – 56.8k: Passed/Intermediate
  – 59.1k: MQ > 40
  – 59.6k: QD > 2
  – 67.6k: SB < 60
  – 70.9k: MQRS > -6
  – 71.3k: RPRS > -6

Page 38: V4 Sequencing Reagent Experience

Homopolymer Runs

[Figure: homopolymer runs, V3 vs. V4]