Jennifer D. Churchill, Ph.D.
Department of Molecular and Medical Genetics, Institute of Applied Genetics
University of North Texas Health Science Center
Moving Towards a Validated High Throughput Sequencing Solution for Human Identification: An Evaluation of Two SNP Panels, Autosomal STRs, and Whole Mitochondrial Genomes
DNA Typing
• Identification
• Associations
• Investigative leads
• Databases
• Demands
• High volume
• High throughput
• Special attention situations
• Expedites
• Technology developments and enhancements to meet demands
Areas of Interest
• STRs
• Mitochondria
• HID SNPs
• Ancestry SNPs
• Phenotype SNPs
• Pharmacogenetics
• Microbial Forensics
• …gonna need a bigger boat
Features of Massively Parallel Sequencing
• Higher throughput
• Samples
• Markers
• Multiplexing different types of markers
• Determine actual sequence of alleles
• Massively parallel sequencing
• Depth of coverage
• Reduction in error
• Faster
• Lower cost per nucleotide
OVERVIEW OF ION PGM™ TECHNOLOGY AND WORKFLOW
HID-Ion PGM™ Workflow (8 Samples, Ion 314™ Chip): Sample Processing Workflow
[Figure: workflow timeline spanning Day 1 to Day 3, with cycle time and hands-on time per step]
• DNA Extraction: PrepFiler® Express
• Quantification: Quantifiler® Trio
• Library Preparation: Ion AmpliSeq™ Library Kit 2.0, Ion Library Quantitation Kit
• Emulsion PCR: Ion OneTouch™ 2
• Enrichment & Chip loading: Ion OneTouch™ ES
• Sequencing: Ion PGM™
• Data Analysis: Torrent Suite™ software
• Plug-in analysis: HID SNP Genotyper
Cycle Time values shown: 1:30, 7:00, 2:00, 3:00, 2:00, 1:00, 6:00
Hands-on values shown: 1:20, 0:30, 0:30, 2:30, 0:30, 1:30
Library and Template Preparation
[Figure: clonal amplification using emulsion PCR. An emulsion droplet contains the ISP, primer, dNTPs, polymerase, and MgCl2; templated ISPs are then isolated, giving the final templated ISPs ready for sequencing]
Chemistry
Sequence Detection by pH
Data Output is an Ionogram
• An “ionogram” is the output of the signals in flow space
• Must be read “up-and-down” along with “left-to-right”
• Height of bar indicates how many nucleotides were incorporated during the flow
[Figure: example ionogram. Key sequence: TCAG; labeled bars: AA, TTT; sequence: AATCTTCTG…]
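To make the "read up-and-down and left-to-right" idea concrete, here is a minimal sketch of how bar heights in flow space translate into a base sequence. The cyclic flow order `TACG` and the example bar heights are assumptions chosen for illustration (the instrument's actual flow order and signal processing may differ), and real signals are not clean integers.

```python
from itertools import cycle

def decode_flows(flow_order, signals):
    """Convert per-flow signal heights into a base sequence.

    Each flow offers one nucleotide; a bar of height n means that
    nucleotide was incorporated n times in a row during that flow.
    """
    bases = []
    for base, signal in zip(cycle(flow_order), signals):
        bases.append(base * round(signal))  # height 0 adds nothing
    return "".join(bases)

FLOW_ORDER = "TACG"  # assumed flow order, for illustration only

# Hypothetical bar heights that spell the key sequence and the read on the slide.
key = decode_flows(FLOW_ORDER, [1, 0, 1, 0, 0, 1, 0, 1])
read = decode_flows(FLOW_ORDER, [0, 2, 0, 0, 1, 0, 1, 0, 2, 0, 1, 0, 1, 0, 0, 1])

print(key)   # TCAG
print(read)  # AATCTTCTG
```

The height-2 bars are the homopolymers (AA and TT): one flow, two incorporations.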
Bioinformatics
• A science in itself
• Many science experiments are carried out with bioinformatics
• “the new field that merges biology, computer science, and information technology to manage and analyze the data, with the ultimate goal of understanding and modeling living systems.”
• Genomics and Its Impact on Medicine and Society - A 2001 Primer, U.S. Department of Energy Human Genome Program
Markers
• STRs***
• SNPs***
• Indels
• INNULs
• mtDNA***
• HID-Ion PGM™ System
SAMPLES
Samples
• Data from blind study
• Provided by Green Mountain Conference
• 12 samples
• Used 1 ng template DNA
• Evaluation Criteria
  • Coverage
  • Strand Bias
  • Allele coverage ratio
  • Heterozygote allele balance
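The deck does not spell out formulas for its evaluation criteria, so the following is only a sketch under stated assumptions: strand balance is taken as the fraction of reads on the positive strand, and allele coverage ratio as the lower-covered allele divided by the higher-covered allele (analogous to a peak height ratio). The read counts are hypothetical.

```python
def coverage(forward_reads, reverse_reads):
    """Total read depth at a marker."""
    return forward_reads + reverse_reads

def strand_balance(forward_reads, reverse_reads):
    """Assumed definition: fraction of reads on the positive strand."""
    total = forward_reads + reverse_reads
    if total == 0:
        return None  # no reads, balance undefined
    return forward_reads / total

def allele_coverage_ratio(allele_reads):
    """Assumed definition: lower-covered / higher-covered of the top two alleles."""
    top_two = sorted(allele_reads.values(), reverse=True)[:2]
    if len(top_two) < 2 or top_two[0] == 0:
        return None  # fewer than two alleles observed, ratio undefined
    return top_two[1] / top_two[0]

# Hypothetical heterozygous SNP.
print(coverage(1200, 1000))                               # 2200
print(round(strand_balance(1200, 1000), 3))               # 0.545
print(round(allele_coverage_ratio({"A": 1150, "G": 1050}), 3))  # 0.913
```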
STRs
• Primary markers for identity testing
• High discrimination power
[Figure: alleles shown as differing numbers of AATG repeat units]
STR Panel
• 10plex STR Panel
  • Amelogenin • D16S539 • D3S1358 • D5S818 • CSF1PO • D7S820 • D8S1179 • TH01 • TPOX • vWA
• Analyzed with:
  • STRait Razor
  • STR Genotyper Plugin
• Compared data to genotypes generated on 3130xl Genetic Analyzer
• Further evaluation of sequence variants within alleles
STRait Razor: Warshauer et al. 2013
STR Panel: Allele Coverage Ratio
[Figure: allele coverage ratio by STR locus]
STR Panel: Sequence Coverage Ratio
[Figure: sequence coverage ratio by STR locus]
STR Panel
Comparable method of visualizing STR results between MPS and CE platforms
STR Panel
Sample AMEL CSF1PO D16S539 D3S1358 D5S818 D7S820 D8S1179 TH01 TPOX vWA
1 X,Y 11,11 8,12 16*,17 12,12 8,10 13,13 8,9.3 10,12 17,18
3 X,X 10,11 12,12 15,18 9,11 11,12 12*,13 9.3,9.3 11,11 16,17
4 X,Y 12,15 12,13 16*,17 10,12 10,10 11,15 9,9.3 8,8 16,17
5 X,X 11,11 10,12 15,15 11,12 10,11 12*,13 6,7 8,8 15,19
6 X,Y 11,12 10,11 15,17 11,13 8,11 12*,13 7,9.3 8,11 16*,17
7 X,Y 10,12 9,12 15*,16 9,10 11,11 12*,13 6,9 10,11 14,18
10 X,X 12,12 11,14 16,18 11,11 10,11 11,13 7,9 9,11 14*,15
13 X,X 12,12 11,12 16*,17 12,13 10,11 14*,14* 6,7 9,11 15,18
14 X,X 12,13 9,12 16*,17 12,12 8,8 13*,13* 6,8 8,9 15*,17
15 X,X 12,12 9,13 16,17 11,12 8,9 10,13 6,9 9,9 15,16
16 X,Y 11,13 9,9 15,17 12,13 8,11 12,13 6,8 8,11 15,17
17 X,X 10,12 11,12 15*,16 12,13 8,8 13*,13* 7,8 8,9 17,19
All STR genotypes in complete concordance with CE data
* Indicates the presence of a sequence variant for that allele
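A concordance check of this kind can be sketched as below. The MPS genotypes are sample 1's calls from the table; the CE genotypes are hypothetical stand-ins (the CE data are not shown in the deck), and the helper names are illustrative.

```python
def normalize(genotype):
    """Drop the sequence-variant marker (*) and ignore allele order."""
    alleles = [a.strip().rstrip("*") for a in genotype.split(",")]
    return tuple(sorted(alleles))

def discordant_loci(mps_calls, ce_calls):
    """Return loci whose length-based genotypes differ between platforms."""
    return [locus for locus in mps_calls
            if normalize(mps_calls[locus]) != normalize(ce_calls[locus])]

# Sample 1, MPS calls copied from the table above.
mps = {"AMEL": "X,Y", "CSF1PO": "11,11", "D3S1358": "16*,17", "TH01": "8,9.3"}
# Hypothetical CE calls for the same loci.
ce = {"AMEL": "X,Y", "CSF1PO": "11,11", "D3S1358": "17,16", "TH01": "9.3,8"}

print(discordant_loci(mps, ce))  # []
```

The asterisk is stripped because CE reports length only; a sequence variant does not change the length-based allele call.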
Guide for Selection of Candidate STRs
• Select STRs that would be superior for mixture interpretation
Resolving Same Size Alleles
STR Panel
Locus | Allele | Number of Varying Sequences
vWA | 14 | 2
vWA | 15 | 2
vWA | 16 | 2
D3S1358 | 15 | 3
D3S1358 | 16 | 3
D8S1179 | 12 | 2
D8S1179 | 13 | 3
D8S1179 | 14 | 2
mtDNA Whole Genome Sequencing
mtDNA Genome
• Mitochondrial genome
• Sequenced with Ion PGM™ workflow
• Long PCR primers taken from Gunnarsdottir et al. 2011
• Aligned with Ion Torrent Suite Software v4.0.2 and variant calls made with variantCaller plugin v4.0
• Analyzed in IGV, mitoSAVE, and HaploGrep (haplogroups verified with EMMA)
• Provided information on population background and maternal relationships
EMMA: Rock et al. 2013
Mito Library Prep Protocol can be found at ioncommunity.lifetechnolgies.com/community/applications/hid/mito/how_to
mtDNA Genome
[Figure: sequence data viewed in IGV]
mitoSAVE
• Mitochondrial Sequencing Analysis of Variants in Excel
• Excel-based workbook
• Input VCF files
  • All vcf file formats
• Develop haplotypes with standardized forensic conventions
• Outputs an .hsd file that can be uploaded to HaploGrep
mitoSAVE: King et al. 2014
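The VCF-to-.hsd step can be illustrated with a much-simplified sketch. This is not mitoSAVE's logic: it handles only single-base substitutions, applies the 10X coverage threshold quoted later in the deck, and assumes the .hsd line is tab-separated as sample ID, range, haplogroup placeholder, then variants (check the HaploGrep documentation before relying on this layout). The variant records are hypothetical.

```python
def variants_to_hsd(sample_id, variants, min_depth=10, seq_range="1-16569"):
    """Build one .hsd-style line from (position, ref, alt, depth) records.

    Simplifications (assumptions of this sketch): only SNVs are kept,
    indels are skipped, and calls below min_depth are dropped.
    """
    calls = []
    for pos, ref, alt, depth in sorted(variants):
        if depth < min_depth:
            continue  # below the coverage threshold for variant calls
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels need forensic nomenclature rules not covered here
        calls.append(f"{pos}{alt}")
    # "?" marks the haplogroup as unknown (assumed placeholder).
    return "\t".join([sample_id, seq_range, "?"] + calls)

# Hypothetical variant records: (position, reference, alternate, read depth).
example = [(263, "A", "G", 900), (73, "A", "G", 1500),
           (16519, "T", "C", 8), (750, "A", "G", 2100)]
line = variants_to_hsd("Sample1", example)
print(line)
```

The 16519 call is dropped here only because its hypothetical depth (8X) is under the threshold.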
mtDNA Genome
[Figure: coverage plots showing areas of consistently high and low coverage across four individuals]
mtDNA Genome: Average Coverage
[Figure: average coverage across the genome] Coverage across genome ranged from 489X to 7029X
mtDNA Genome: Strand Balance
[Figure: positive and negative strand coverage of the mtDNA genome by nucleotide position]
mtDNA Genome: mtDNA Variants
Coverage threshold for variant calls: 10X
[Figure: mtDNA genome map labeled np 16,569/1, np 8,264, and Control Region; 6 Variants]
mtDNA Genome
Sample | Haplogroup
1 | J1c5
3 | H3b
4 | U7b
5 | H6a1b4
6 | H33
7 | M7b1a1c1
10 | H5n
13 | L2a1f
14 | H1c
15 | H1c
16 | L3e1a1a
17 | H1c
mtDNA Genome
Sample | Haplogroup | Population
1 | J1c5 | European
3 | H3b | European
4 | U7b | European
5 | H6a1b4 | European
6 | H33 | European
7 | M7b1a1c1 | Asian
10 | H5n | European
13 | L2a1f | African
14 | H1c | European
15 | H1c | European
16 | L3e1a1a | African
17 | H1c | European
Single Nucleotide Polymorphisms
• Highly degraded or compromised samples require other DNA markers to obtain genetic information
• SNPs are single nucleotide base substitutions (or insertions/deletions) in the genome and account for 85% of the genetic variability in humans
Types of SNPs
• Individual Identification SNPs:
  • SNPs that collectively give very low probabilities of two individuals having the same multisite genotype; individualization, high heterozygosity, low Fst
• Ancestry Informative SNPs:
  • SNPs that collectively give a high probability of an individual’s ancestry being from one part of the world or being derived from two or more areas of the world
• Lineage Informative SNPs:
  • Sets of tightly linked SNPs that function as multiallelic markers that can serve to identify relatives with higher probabilities than simple di-allelic SNPs
• Phenotype Informative SNPs:
  • SNPs that provide high probability that the individual has particular phenotypes, such as a particular skin color, hair color, eye color, etc.
• Pharmacogenetic SNPs – molecular autopsy
General Criteria for Forensic SNP Use
• Easily typed
• Multiplexing
• Highly informative for the stated purpose
SNP Panels
• Two SNP Panels
  • HID-Ion AmpliSeq™ Identity Panel
    • 90 autosomal SNPs
    • 34 upper Y-clade SNPs
  • HID-Ion AmpliSeq™ Ancestry Panel
    • 165 autosomal SNPs
• Sequenced with Ion PGM™ workflow
• Analyzed with HID SNP Genotyper Plugin
HID-Ion AmpliSeq™ Identity Panel: learn more at lifetechnologies.com/identity
HID-Ion AmpliSeq™ Ancestry Panel: learn more at lifetechnologies.com/ancestry
HID-Ion AmpliSeq™ Identity Panel: Average Coverage
[Figure: read depth by SNP] Average read depth = 2,233X (2SD = 569X to 3898X)
HID-Ion AmpliSeq™ Identity Panel: Strand Balance
[Figure: positive strand coverage ratio by SNP] 89/90 (99%) of the SNPs displayed 60-100% strand balance
HID-Ion AmpliSeq™ Identity Panel: Allele Coverage Ratio
[Figure: major allele coverage ratio by SNP] All SNPs fell within 0.30-0.50 range (all but 2 were >0.40)
HID-Ion AmpliSeq™ Identity Panel Y-SNPs: Average Coverage
[Figure: read depth by SNP] Average read depth = 975X (2SD = 272X to 1678X)
HID-Ion AmpliSeq™ Identity Panel Y-SNPs: Strand Balance
[Figure: positive strand coverage ratio by SNP] 33/34 (97%) of the SNPs displayed 60-100% strand balance
HID-Ion AmpliSeq™ Ancestry Panel: Average Coverage
[Figure: read depth by SNP] Average read depth = 1511X (2SD = 242X to 2783X)
HID-Ion AmpliSeq™ Ancestry Panel: Strand Balance
[Figure: positive strand coverage ratio by SNP] 162/165 (98%) of the SNPs displayed 60-100% strand balance
HID-Ion AmpliSeq™ Ancestry Panel: Allele Coverage Ratio
[Figure: major allele coverage ratio by SNP] 155/165 (94%) of the SNPs fell within 0.30-0.50 range (10 SNPs were homozygotes)
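The "2SD" ranges quoted with each average read depth read as mean ± 2 standard deviations. A minimal sketch of that calculation follows; whether the slides used a sample or population standard deviation is not stated, so the sample form is an assumption, and the example depths are hypothetical.

```python
from statistics import mean, stdev

def two_sd_interval(depths):
    """Return (mean, mean - 2*SD, mean + 2*SD) for per-SNP read depths."""
    m = mean(depths)
    s = stdev(depths)  # sample standard deviation (assumed)
    return m, m - 2 * s, m + 2 * s

def implied_sd(lower, upper):
    """Back out the SD from a reported mean ± 2SD interval."""
    return (upper - lower) / 4

# Hypothetical per-SNP depths.
print(two_sd_interval([900, 1000, 1100]))  # mean 1000, interval 800 to 1200

# Identity Panel autosomal SNPs: 2SD interval reported as 569X to 3898X.
print(implied_sd(569, 3898))  # 832.25
```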
HID-Ion AmpliSeq™ Identity Panel
• Information from Identity SNPs:
  • Identification
  • Potentially analyze degraded samples
  • Familial Relationships
  • Y-SNPs (Gender; Paternal Relationships; Paternal lineage)
• Generated genotypes for 124 SNPs on all male samples and 90 SNPs (autosomal) on all female samples
HID-Ion AmpliSeq™ Identity Panel
Sample | Gender
1 | Male
3 | Female
4 | Male
5 | Female
6 | Male
7 | Male
10 | Female
13 | Female
14 | Female
15 | Female
16 | Male
17 | Female
HID-Ion AmpliSeq™ Identity Panel
Y-SNPs:
SNP | Sample 1 | Sample 4 | Sample 6 | Sample 7 | Sample 16
rs2534636 | C | C | C | C | C
rs35284970 | C | C | C | C | C
rs9786184 | A | C | C | C | C
rs9786139 | G | G | G | G | A
rs16981290 | C | C | C | A | C
rs17250845 | G | G | C | G | G
L298 | T | T | T | T | T
P256 | G | G | G | G | G
P202 | T | T | T | T | T
rs17306671 | T | T | A | T | T
rs4141886 | A | A | A | A | G
rs2032595 | T | T | T | T | T
rs2032599 | T | T | T | T | T
rs20320 | G | G | G | G | G
rs2032602 | T | T | T | T | T
rs8179021 | C | T | C | C | C
rs2032624 | C | A | A | A | A
rs2032636 | G | G | G | G | G
rs9341278 | G | G | G | G | G
rs2032658 | G | A | A | A | A
rs2319818 | G | G | G | G | G
rs17269816 | C | C | C | C | C
rs17222573 | A | A | A | A | A
M479 | C | C | C | C | C
rs3848982 | C | C | C | C | T
rs3900 | G | G | C | G | C
rs3911 | A | A | A | A | A
rs2032631 | A | A | G | G | G
rs2032673 | T | T | T | T | T
rs2032652 | T | T | T | T | C
rs16980426 | T | T | T | G | T
rs13447443 | A | A | A | G | A
rs17842518 | G | G | G | G | T
rs2033003 | C | C | A | C | A
No evidence of paternal relationships among the males
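One simple way to check the statement about paternal relationships is to count, for each pair of males, the Y-SNPs at which their calls differ; males from the same paternal line would be expected to share a haplotype at these markers. The sketch below uses the 34 calls per sample as listed on the Y-SNP slide. The pairwise-difference approach is an illustration, not necessarily the method used in the study.

```python
from itertools import combinations

# Y-SNP calls per male sample, in slide order (34 SNPs each).
Y_CALLS = {
    "1":  "C C A G C G T G T T A T T G T C C G G G G C A C C G A A T T T A G C".split(),
    "4":  "C C C G C G T G T T A T T G T T A G G A G C A C C G A A T T T A G C".split(),
    "6":  "C C C G C C T G T A A T T G T C A G G A G C A C C C A G T T T A G A".split(),
    "7":  "C C C G A G T G T T A T T G T C A G G A G C A C C G A G T T G G G C".split(),
    "16": "C C C A C G T G T T G T T G T C A G G A G C A C T C A G T C T A T A".split(),
}

def differences(calls_a, calls_b):
    """Number of SNP positions where two haplotypes disagree."""
    return sum(a != b for a, b in zip(calls_a, calls_b))

pairwise = {
    (s1, s2): differences(Y_CALLS[s1], Y_CALLS[s2])
    for s1, s2 in combinations(Y_CALLS, 2)
}

for pair, n_diff in pairwise.items():
    print(pair, n_diff)
```

Every pair differs at one or more SNPs, which is consistent with the five males falling into five different Y-clades.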
HID-Ion AmpliSeq™ Identity Panel
Y-SNPs:
Sample | Y-Clade | Region
1 | R1b | West Asia, Russian Plain or Central Asia
4 | Q | Central Asia, the Indian Subcontinent, Siberia
6 | J | Arabian Peninsula
7 | O2 | Asia
16 | E | Africa
HID-Ion AmpliSeq™ Ancestry Panel
• Information from Ancestry SNPs:
  • Population background
• Generated genotypes for 165 SNPs on all 12 samples
[Figure: ancestry results shown for Sample 14, Sample 7, and Sample 5]
Sample | Biogeographic Ancestry
1 | European
3 | European
4 | Asian
5 | European
6 | European
7 | Asian
10 | European
13 | African American
14 | African admix
15 | African admix
16 | African
17 | European
HID-Ion AmpliSeq™ Ancestry Panel
• For this study, bioancestry assignment was limited to major populations
• Marker dependent
• Reference population dependent
• Software dependent
IDENTIFYING RELATIONSHIPS
Identifying Relationships
Genotypes from STRs and Identity SNPs allow for expansion and refinement of the partial pedigree identified with the mitochondrial haplotypes
Identifying Relationships
[Figure: pedigree annotated with D8S1179 genotypes 12,13 and 13,13 (top row), 13,13, and 10,13]
Sequence variants present within the D8S1179 alleles
Genotype | Allele sequences
12,13 | [TCTA]2 [TCTG]1 [TCTA]9; [TCTA]2 [TCTG]1 [TCTA]10
13,13 | [TCTA]13; [TCTA]1 [TCTG]1 [TCTA]11
13,13 | [TCTA]2 [TCTG]1 [TCTA]10; [TCTA]1 [TCTG]1 [TCTA]11
10,13 | [TCTA]10; [TCTA]1 [TCTG]1 [TCTA]11
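The bracket notation on the pedigree slide shows why sequencing adds information beyond length: three D8S1179 alleles all count as "13" by repeat number yet differ in sequence. A minimal sketch that parses the notation, assuming the length-based allele equals the total number of four-base repeat units (true for these simple motifs, not for every STR):

```python
import re

def parse_bracket(allele):
    """'[TCTA]2 [TCTG]1 [TCTA]10' -> [('TCTA', 2), ('TCTG', 1), ('TCTA', 10)]"""
    return [(motif, int(n)) for motif, n in re.findall(r"\[([ACGT]+)\](\d+)", allele)]

def repeat_count(allele):
    """Length-based allele designation: total number of repeat units."""
    return sum(n for _, n in parse_bracket(allele))

def expand(allele):
    """Full repeat-region sequence spelled out base by base."""
    return "".join(motif * n for motif, n in parse_bracket(allele))

# The three distinct '13' alleles shown on the slide.
ALLELES_13 = ["[TCTA]13", "[TCTA]1 [TCTG]1 [TCTA]11", "[TCTA]2 [TCTG]1 [TCTA]10"]

print([repeat_count(a) for a in ALLELES_13])    # [13, 13, 13]
print(len({expand(a) for a in ALLELES_13}))     # 3 distinct sequences
```

Three sequences for allele 13 agrees with the "Number of Varying Sequences" reported for D8S1179 allele 13 earlier in the deck.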
Identifying Relationships
Pedigree supported by data from Ancestry SNP panel
Identifying Relationships
STRs: Likelihood Ratio Results:
• Posterior probability = 0.999999996671495
• Combined likelihood ratio = 300 million
SNPs: Likelihood Ratio Results:
• Combined likelihood ratio = 3.34 E46
Internal Thermo Fisher kinship algorithm used for calculations
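The kinship algorithm itself is internal and not described, but the relationship between the two STR numbers can be sketched under common assumptions: per-locus likelihood ratios are multiplied (independent loci), and the posterior probability follows from the combined LR with equal prior odds (prior = 0.5). The per-locus values below are hypothetical.

```python
from math import prod

def combined_lr(per_locus_lrs):
    """Product rule: assumes the loci are independent."""
    return prod(per_locus_lrs)

def posterior_probability(lr, prior=0.5):
    """Posterior probability of the relationship from LR and prior."""
    posterior_odds = lr * prior / (1 - prior)
    return posterior_odds / (1 + posterior_odds)

def implied_lr(posterior, prior=0.5):
    """Back out the LR from a reported posterior, given the prior."""
    return (posterior / (1 - posterior)) / (prior / (1 - prior))

print(combined_lr([10, 20, 5]))               # 1000 (hypothetical per-locus LRs)
print(posterior_probability(3e8))             # about 0.9999999967
print(implied_lr(0.999999996671495))          # about 3.0e8
```

With a prior of 0.5, the reported STR posterior corresponds to an LR of roughly 300 million, matching the combined LR on the slide.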
Population Affinity Summary
Sample | Gender | Maternal Lineage | Paternal Lineage | Biogeographic Ancestry | Relationship
1 | Male | European | West Asia, Russian Plain, or Central Asia | European | Unknown
3 | Female | European | Unknown | European | Unknown
4 | Male | European | Central Asia, the Indian Subcontinent, Siberia | Asian | Unknown
5 | Female | European | Unknown | European | Unknown
6 | Male | European | Arabian Peninsula | European | Unknown
7 | Male | Asian | Asia | Asian | Unknown
10 | Female | European | Unknown | European | Unknown
13 | Female | African | Unknown | African American | Unknown
14 | Female | European | Unknown | African admix | Mother
15 | Female | European | Unknown | African admix | Daughter
16 | Male | African | Africa | African | Grandfather
17 | Female | European | Unknown | European | Grandmother
Population Affinity Summary
# | Biogeographic Ancestry | GM
1 | European | Caucasian American
3 | European | Caucasian Southern Europe
4 | Asian | Southern Asia
5 | European | Central Caribbean
6 | European | Southern Europe
7 | Asian | Eastern Asia
10 | European | Caucasian American/Eastern Europe
13 | African American | African American
14 | African admix | American/Central Caribbean
15 | African admix | South American/Central Caribbean
16 | African | Central Caribbean/African
17 | European | Eastern European
Population Affinity Summary
Biogeography = Bioancestry
Phenotype: brown hair, hazel eyes, white fair skin complexion
Analysis of 12 Blinded Samples
• All results consistent with information provided by Green Mountain
• Completed all four marker systems on Ion PGM™ in a reasonably quick time-frame
• Complete profiles
• Coverage, strand balance, and allele balance support reliable data
• Future goals:
  • Full validation studies
  • Continue developing tools for data analysis
Next Steps
• Sensitivity of detection
• Alternate polymerases
• Population data and genetic analyses
• Validation studies
• Reproducibility
• Mixtures
• Stochastic effects
• Mock cases
• ……
Sensitivity Study
• Identity Panel
• Y SNPs and Ancestry Panel gave similar results
  • data not shown because of limited time
• 1 ng, 750 pg, 500 pg, 250 pg, 100 pg
• Criteria
  • Depth of Coverage
  • Strand Bias
  • Allele Coverage Ratio (~similar to peak height ratio)
  • Concordance across dilution series
Good Chip
Good Data
Genotype Concordance
One sample with different input DNA
Identity Panel Depth of Coverage
Input | Average | Range
1 ng | 2365X | 800 - 4435X
750 pg | 2054X | 774 - 4040X
500 pg | 1461X | 312 - 3074X
250 pg | 1316X | 381 - 2817X
100 pg | 392X | 7 - 1272X
bone sample | 450X | 0 - 1441X
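To compare how depth scales with input, the averages above can be expressed relative to the 1 ng result. This is simple arithmetic on the reported averages; it assumes the runs are otherwise comparable (total reads per chip can vary between runs, so the ratios are indicative only).

```python
# Average depth of coverage (X) per input amount, from the dilution series above.
AVG_DEPTH = {"1 ng": 2365, "750 pg": 2054, "500 pg": 1461, "250 pg": 1316, "100 pg": 392}

def relative_depth(depths, reference="1 ng"):
    """Each average depth as a fraction of the reference input's depth."""
    ref = depths[reference]
    return {label: depth / ref for label, depth in depths.items()}

rel = relative_depth(AVG_DEPTH)
for label, fraction in rel.items():
    print(f"{label}: {fraction:.2f}")
```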
Identity Panel Strand Bias
[Figures: strand bias by SNP at 1 ng, 750 pg, 500 pg, 250 pg, and 100 pg, and for the bone sample]
Identity Panel ACR
[Figures: allele coverage ratio by SNP at 1 ng, 750 pg, 500 pg, and 250 pg (homozygotes marked), and at 100 pg and for the bone sample (homozygotes and drop out marked)]
“Bad” Chip
But Good Data!
Genotype Concordance (Sample 025)
One sample with different input DNA
Identity Panel Depth of Coverage
[Figures: depth of coverage by SNP at 1 ng, 750 pg, 500 pg, 250 pg, and 100 pg, and for the bone sample]
Identity Panel Strand Bias
[Figures: strand bias by SNP at 1 ng, 750 pg, 500 pg, 250 pg, and 100 pg, and for the bone sample]
Identity Panel ACR
[Figures: allele coverage ratio by SNP at 1 ng, 750 pg, 500 pg, and 250 pg (homozygotes marked), and at 100 pg and for the bone sample (homozygotes and drop out marked)]
Next Steps
• Used 22 PCR cycles
• Establish a baseline
• Increase cycles
• 26 cycles were used for 100 pg (Seo et al. 2013)
• 10 µl of library for input into the OneTouch2
• Identity Panel
• Establish baseline
• Increase input
Conclusions
• Robust panels of identity and ancestry SNPs
• Robust STR panel
• Whole genome mtDNA sequencing
• Highly informative
• Sensitive
• Quantitative – scaling comparison
• Low density chip is not necessarily a bad chip
• Wide range of density can still yield high quality data
• Based on results continue development and validation
ACKNOWLEDGMENTS
• UNT Health Science Center
o Jonathan King
o Bruce Budowle
• Institute of Legal Medicine, Innsbruck Austria; Penn State Eberly College of Science
o Walther Parson
• Thermo Fisher
o Robert Lagace
o Wenchi Liao
o Joseph Chang
o Narasimhan Rajagopalan
o Sharon Wootton
o Chien-Wei Chang
o Reina Langit
o Nnamdi Ihuegbu
o Carolina Dallett
o Gloria Lam
o Jianye Ge