Moving Towards a Validated High Throughput Sequencing Solution for Human Identification: An Evaluation of Two SNP Panels, Autosomal STRs, and Whole Mitochondrial Genomes

Jennifer D. Churchill, Ph.D.
Department of Molecular and Medical Genetics
Institute of Applied Genetics
University of North Texas Health Science Center


Background


http://www.rsc.org/Publishing/Journals/cb/Volume/2009/1/Forensic_science.asp

DNA Typing

• Identification
• Associations
• Investigative leads
• Databases
• Demands
  • High volume
  • High throughput
  • Special attention situations
  • Expedites
• Technology developments and enhancements to meet demands

Areas of Interest

• STRs
• Mitochondria
• HID SNPs
• Ancestry SNPs
• Phenotype SNPs
• Pharmacogenetics
• Microbial Forensics
• …gonna need a bigger boat

Image courtesy: geek.com

Features of Massively Parallel Sequencing

• Higher throughput
  • Samples
  • Markers
• Multiplexing different types of markers
• Determine actual sequence of alleles
• Massively parallel sequencing
  • Depth of coverage
  • Reduction in error
• Faster
• Lower cost per nucleotide

OVERVIEW OF ION PGM™ TECHNOLOGY AND WORKFLOW

HID-Ion PGM™ Workflow (8 Samples, Ion 314™ Chip): Sample Processing Workflow

[Workflow figure spanning Day 1, Day 2, and Day 3]

• Steps: DNA Extraction, Quantification, Library Preparation, Emulsion PCR, Enrichment & Chip loading, Sequencing, Data Analysis, Plug-in analysis
• Kits, instruments, and software: PrepFiler® Express, Quantifiler® Trio, Ion AmpliSeq™ Library Kit 2.0, Ion Library Quantitation Kit, Ion OneTouch™ 2, Ion OneTouch™ ES, Ion PGM™, Torrent Suite™ software, HID SNP Genotyper
• Cycle Time (in figure order): 1:30, 7:00, 2:00, 3:00, 2:00, 1:00, 6:00
• Hands-on (in figure order): 1:20, 0:30, 0:30, 2:30, 0:30, 1:30

Library and Template Preparation

Clonal amplification using emulsion PCR

• Emulsion droplet: ISP, primer, dNTPs, polymerase, MgCl2
• Isolate templated ISPs
• Final templated ISPs ready for sequencing

Chemistry


Sequence Detection by pH

Data Output is an Ionogram

• An “ionogram” is the output of the signals in flow space
• Must be read “up-and-down” along with “left-to-right”
• Height of bar indicates how many nucleotides incorporated during flow

[Figure: ionogram; labels: Key Sequence, TCAG, AA, TTT; Sequence: AATCTTCTG…]
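The reading rule for an ionogram can be sketched in a few lines of Python. This is only an illustration: the cyclic flow order `TACG` and the signal values are assumptions made for the example, not details given on the slides. The `TCAG` prefix matches the key-sequence label in the figure.

```python
def decode_flows(signals, flow_order="TACG"):
    """Turn per-flow incorporation signals into a base sequence.

    signals: one number per nucleotide flow; the bar height approximates
             how many bases were incorporated during that flow.
    flow_order: ASSUMED cyclic order in which nucleotides are flowed.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = int(round(signal))  # nearest whole number of incorporations
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals (not from the slides).
example_signals = [
    1.02, 0.03, 0.97, 0.05,  # T, A, C, G flows: T and C incorporated once
    0.04, 1.01, 0.02, 0.98,  # T, A, C, G flows: A and G incorporated once
    0.01, 2.05, 0.00, 0.03,  # T, A, C, G flows: bar of height ~2 on A
    2.96,                    # T flow: bar of height ~3
]
print(decode_flows(example_signals))  # TCAGAATTT
```

Reading "left-to-right" walks through the flows in order; reading "up-and-down" takes the bar height as the homopolymer length for that flow.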

Bioinformatics

• A science in itself
• Many science experiments are carried out with bioinformatics
• “the new field that merges biology, computer science, and information technology to manage and analyze the data, with the ultimate goal of understanding and modeling living systems.”
  • Genomics and Its Impact on Medicine and Society - A 2001 Primer, U.S. Department of Energy Human Genome Program

Markers

• STRs***
• SNPs***
• Indels
• INNULs
• mtDNA***

• HID-Ion PGM™ System

SAMPLES

http://www.dreamstime.com/stock-photography-loading-dna-samples-pcr-image25092092

Samples

• Data from blind study
  • Provided by Green Mountain Conference
  • 12 samples
  • Used 1 ng template DNA
• Evaluation Criteria
  • Coverage
  • Strand Bias
  • Allele coverage ratio
    • Heterozygote allele balance
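The three evaluation criteria can be written down as simple read-count ratios. The slides do not give formulas, so the definitions below are assumptions chosen for illustration: strand balance as the forward-strand share of all reads, and allele coverage ratio as the lower-coverage allele divided by the higher-coverage allele (by analogy with a peak height ratio). All read counts are hypothetical.

```python
def coverage(forward_reads, reverse_reads):
    """Total read depth at a locus (both strands)."""
    return forward_reads + reverse_reads


def strand_balance(forward_reads, reverse_reads):
    """ASSUMED definition: fraction of reads from the forward strand."""
    total = forward_reads + reverse_reads
    if total == 0:
        return None  # no reads, balance undefined
    return forward_reads / total


def allele_coverage_ratio(allele_a_reads, allele_b_reads):
    """ASSUMED definition: lower-coverage allele / higher-coverage allele."""
    high = max(allele_a_reads, allele_b_reads)
    low = min(allele_a_reads, allele_b_reads)
    if high == 0:
        return None  # locus dropped out
    return low / high


# Hypothetical heterozygous locus.
print(coverage(1200, 1000))
print(round(strand_balance(1200, 1000), 3))
print(round(allele_coverage_ratio(1150, 1050), 3))
```

A ratio near 1 under this definition indicates balanced heterozygote alleles. A different convention (for example, one allele's reads over total reads) would centre on 0.5 instead.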

http://www.cafepress.com/mf/5828392/double-blind-study-mug-large-black-white_mugs

http://www.nitw.ac.in/automation/

STRs

• Primary markers for identity testing
• High discrimination power

[Figure: tandem AATG repeat units]

STR Panel

• 10-plex STR Panel
  • Loci: Amelogenin, D16S539, D3S1358, D5S818, CSF1PO, D7S820, D8S1179, TH01, TPOX, vWA
• Analyzed with:
  • STRait Razor
  • STR Genotyper Plugin
• Compared data to genotypes generated on 3130xl Genetic Analyzer
• Further evaluation of sequence variants within alleles

STRait Razor: Warshauer et al. 2013

STR Panel: Allele Coverage Ratio
[Figure; x-axis: STRs]

STR Panel: Sequence Coverage Ratio
[Figure; x-axis: STRs]

STR Panel
[Figure] Comparable method of visualizing STR results between MPS and CE platforms

STR Panel

Sample  AMEL  CSF1PO  D16S539  D3S1358  D5S818  D7S820  D8S1179  TH01     TPOX   vWA
1       X,Y   11,11   8,12     16*,17   12,12   8,10    13,13    8,9.3    10,12  17,18
3       X,X   10,11   12,12    15,18    9,11    11,12   12*,13   9.3,9.3  11,11  16,17
4       X,Y   12,15   12,13    16*,17   10,12   10,10   11,15    9,9.3    8,8    16,17
5       X,X   11,11   10,12    15,15    11,12   10,11   12*,13   6,7      8,8    15,19
6       X,Y   11,12   10,11    15,17    11,13   8,11    12*,13   7,9.3    8,11   16*,17
7       X,Y   10,12   9,12     15*,16   9,10    11,11   12*,13   6,9      10,11  14,18
10      X,X   12,12   11,14    16,18    11,11   10,11   11,13    7,9      9,11   14*,15
13      X,X   12,12   11,12    16*,17   12,13   10,11   14*,14*  6,7      9,11   15,18
14      X,X   12,13   9,12     16*,17   12,12   8,8     13*,13*  6,8      8,9    15*,17
15      X,X   12,12   9,13     16,17    11,12   8,9     10,13    6,9      9,9    15,16
16      X,Y   11,13   9,9      15,17    12,13   8,11    12,13    6,8      8,11   15,17
17      X,X   10,12   11,12    15*,16   12,13   8,8     13*,13*  7,8      8,9    17,19

* Indicates the presence of a sequence variant for that allele

All STR genotypes in complete concordance with CE data
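A concordance check between MPS and CE genotypes amounts to comparing length-based allele calls locus by locus. The sketch below is one way to do that, not the procedure used in the study: it strips the sequence-variant asterisks, ignores allele order, and reports loci that disagree. The MPS values are Sample 1 from the table; the CE profile is hypothetical, since the CE data are not reproduced in the slides.

```python
def allele_sort_key(allele):
    """Sort numeric alleles (e.g. 9.3) by value and X/Y alphabetically."""
    try:
        return (0, float(allele), "")
    except ValueError:
        return (1, 0.0, allele)


def normalize(genotype):
    """Drop '*' markers and order alleles so '13*,12' equals '12,13'."""
    alleles = [a.strip().rstrip("*") for a in genotype.split(",")]
    return tuple(sorted(alleles, key=allele_sort_key))


def discordant_loci(mps_profile, ce_profile):
    """Return the loci whose length-based genotypes differ."""
    return [
        locus
        for locus, genotype in mps_profile.items()
        if normalize(genotype) != normalize(ce_profile.get(locus, ""))
    ]


# MPS calls for Sample 1 (subset of loci from the table).
mps_sample_1 = {"AMEL": "X,Y", "D3S1358": "16*,17", "TH01": "8,9.3"}
# HYPOTHETICAL CE calls, for illustration only.
ce_sample_1 = {"AMEL": "X,Y", "D3S1358": "16,17", "TH01": "9.3,8"}

print(discordant_loci(mps_sample_1, ce_sample_1))  # []
```

Stripping the asterisk reflects that CE resolves alleles by length only, so a sequence variant within an allele does not affect length-based concordance.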

Guide for Selection of Candidate STRs

• Select STRs that would be superior for mixture interpretation

Resolving Same Size Alleles

STR Panel

Locus    Allele  Number of Varying Sequences
vWA      14      2
vWA      15      2
vWA      16      2
D3S1358  15      3
D3S1358  16      3
D8S1179  12      2
D8S1179  13      3
D8S1179  14      2

mtDNA Whole Genome Sequencing

mtDNA Genome

• Mitochondrial genome
  • Sequenced with Ion PGM™ workflow
  • Long PCR primers taken from Gunnarsdottir et al. 2011
• Aligned with Ion Torrent Suite Software v4.0.2 and variant calls made with variantCaller plugin v4.0
• Analyzed in IGV, mitoSAVE, and HaploGrep (haplogroups verified with EMMA)
• Provided information on population background and maternal relationships

EMMA: Rock et al. 2013
Mito Library Prep Protocol can be found at ioncommunity.lifetechnolgies.com/community/applications/hid/mito/how_to

mtDNA Genome

IGV

mitoSAVE: King et al. 2014

mitoSAVE

• Mitochondrial Sequencing Analysis of Variants in Excel
  • Excel-based workbook
• Input VCF files
  • All VCF file formats
• Develop haplotypes with standardized forensic conventions
• An .hsd file is uploaded to HaploGrep
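The VCF-to-.hsd step can be illustrated with a small Python sketch. It is not mitoSAVE's implementation (mitoSAVE is an Excel workbook). It handles single-base substitutions only, ignores forensic nomenclature rules for indels, and assumes the tab-separated .hsd layout of sample ID, range, haplogroup placeholder, then polymorphisms. The variant records are hypothetical.

```python
def vcf_to_hsd_line(sample_id, vcf_lines, sequence_range="1-16569"):
    """Build one .hsd-style line from VCF records (substitutions only).

    ASSUMED layout: ID <tab> range <tab> haplogroup ('?' if unknown)
    <tab> polymorphisms, each written as position plus variant base.
    """
    variants = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = fields[1], fields[3], fields[4]
        # Keep simple substitutions; indels need nomenclature rules not shown here.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            variants.append((int(pos), f"{pos}{alt}"))
    variants.sort()
    return "\t".join([sample_id, sequence_range, "?"] + [v for _, v in variants])


# HYPOTHETICAL records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrM\t263\t.\tA\tG\t100\tPASS\t.",
    "chrM\t73\t.\tA\tG\t100\tPASS\t.",
    "chrM\t310\t.\tT\tTC\t50\tPASS\t.",  # insertion, skipped by this sketch
]
print(vcf_to_hsd_line("Sample1", example_vcf))
```

The insertion at position 310 is dropped on purpose: representing indels consistently is exactly where standardized forensic conventions matter, and that logic is beyond this sketch.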

mtDNA Genome
[Figure] Coverage plots showing areas of consistently high and low coverage across four individuals

mtDNA Genome: Average Coverage
[Figure] Coverage across genome ranged from 489X – 7029X

mtDNA Genome: Strand Balance
[Figure; legend: Positive, Negative]

mtDNA Genome: mtDNA Variants
[Figure; axis: mtDNA Genome by Nucleotide Position]
Coverage threshold for variant calls: 10X

mtDNA Genome
[Figure; labels: np 16,569/1, np 8,264, Control Region, 6 Variants]

mtDNA Genome

mtDNA Genome

Sample  Haplogroup
1       J1c5
3       H3b
4       U7b
5       H6a1b4
6       H33
7       M7b1a1c1
10      H5n
13      L2a1f
14      H1c
15      H1c
16      L3e1a1a
17      H1c

mtDNA Genome

37

mtDNA Genome

Sample  Haplogroup  Population
1       J1c5        European
3       H3b         European
4       U7b         European
5       H6a1b4      European
6       H33         European
7       M7b1a1c1    Asian
10      H5n         European
13      L2a1f       African
14      H1c         European
15      H1c         European
16      L3e1a1a     African
17      H1c         European


SNPS

http://www.snpsandchips.com/tutorial_a1.html

Single Nucleotide Polymorphisms

• Highly degraded or compromised samples require other DNA markers to obtain genetic information
• SNPs are single nucleotide base substitutions (or insertions/deletions) in the genome and account for 85% of the genetic variability in humans

Types of SNPs

• Individual Identification SNPs:
  • SNPs that collectively give very low probabilities of two individuals having the same multisite genotype; individualization, high heterozygosity, low Fst
• Ancestry Informative SNPs:
  • SNPs that collectively give a high probability of an individual’s ancestry being from one part of the world or being derived from two or more areas of the world
• Lineage Informative SNPs:
  • Sets of tightly linked SNPs that function as multiallelic markers that can serve to identify relatives with higher probabilities than simple di-allelic SNPs
• Phenotype Informative SNPs:
  • SNPs that provide high probability that the individual has particular phenotypes, such as a particular skin color, hair color, eye color, etc.
• Pharmacogenetic SNPs – molecular autopsy

General Criteria for Forensic SNP Use

• Easily typed

• Multiplexing

• Highly informative for the stated purpose

43

SNP Panels

• Two SNP Panels
  • HID-Ion AmpliSeq™ Identity Panel
    • 90 autosomal SNPs
    • 34 upper Y-clade SNPs
  • HID-Ion AmpliSeq™ Ancestry Panel
    • 165 autosomal SNPs
• Sequenced with Ion PGM™ workflow
• Analyzed with HID SNP Genotyper Plugin

HID-Ion AmpliSeq™ Identity Panel

Learn more at lifetechnologies.com/identity

HID-Ion AmpliSeq™ Ancestry Panel

Learn more at lifetechnologies.com/ancestry

HID-Ion AmpliSeq™ Identity Panel: Average Coverage
[Figure; x-axis: SNPs, y-axis: Read Depth]
Average read depth = 2,233X (2SD = 569X – 3898X)
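The "2SD" range is the mean read depth plus or minus two standard deviations. A minimal Python sketch of that summary follows. Whether the slides used a sample or a population standard deviation is not stated, so the use of `statistics.stdev` (sample SD) is an assumption, and the per-SNP depths are hypothetical.

```python
from statistics import mean, stdev


def two_sd_interval(depths):
    """Return (mean, mean - 2*SD, mean + 2*SD) for per-SNP read depths.

    Uses the sample standard deviation; this choice is an ASSUMPTION.
    """
    m = mean(depths)
    sd = stdev(depths)
    return m, m - 2 * sd, m + 2 * sd


def implied_sd(lower, upper):
    """Back out the SD from a reported mean +/- 2SD range."""
    return (upper - lower) / 4


# HYPOTHETICAL per-SNP depths, for illustration only.
example_depths = [1800, 2100, 2600, 2400, 2265]
print(two_sd_interval(example_depths))

# The reported Identity Panel range (569X to 3898X) implies this SD:
print(implied_sd(569, 3898))  # 832.25
```

As a consistency check, 2,233 minus two times 832.25 is 568.5 and 2,233 plus two times 832.25 is 3,897.5, which round to the reported 569X and 3898X.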

HID-Ion AmpliSeq™ Identity Panel: Strand Balance
[Figure; x-axis: SNPs, y-axis: Positive Strand Coverage Ratio]
89/90 (99%) of the SNPs displayed 60-100% strand balance

HID-Ion AmpliSeq™ Identity Panel: Allele Coverage Ratio
[Figure; x-axis: SNPs, y-axis: Major Allele Coverage Ratio]
All SNPs fell within 0.30-0.50 range (all but 2 were >0.40)

HID-Ion AmpliSeq™ Identity Panel Y-SNPs: Average Coverage
[Figure; x-axis: SNPs, y-axis: Read Depth]
Average read depth = 975X (2SD = 272X – 1678X)

HID-Ion AmpliSeq™ Identity Panel Y-SNPs: Strand Balance
[Figure; x-axis: SNPs, y-axis: Positive Strand Coverage Ratio]
33/34 (97%) of the SNPs displayed 60-100% strand balance


HID-Ion AmpliSeq™ Ancestry Panel: Average Coverage
[Figure; x-axis: SNPs, y-axis: Read Depth]
Average read depth = 1511X (2SD = 242X – 2783X)

HID-Ion AmpliSeq™ Ancestry Panel: Strand Balance
[Figure; x-axis: SNPs, y-axis: Positive Strand Coverage Ratio]
162/165 (98%) of the SNPs displayed 60-100% strand balance

HID-Ion AmpliSeq™ Ancestry Panel: Allele Coverage Ratio
[Figure; x-axis: SNPs, y-axis: Major Allele Coverage Ratio]
155/165 (94%) of the SNPs fell within 0.30-0.50 range (10 SNPs were homozygotes)


• Information from Identity SNPs:
  • Identification
  • Potentially analyze degraded samples
  • Familial Relationships
  • Y-SNPs (Gender; Paternal Relationships; Paternal lineage)
• Generated genotypes for 124 SNPs on all male samples and 90 SNPs (autosomal) on all female samples

HID-Ion AmpliSeq™ Identity Panel

Sample  Gender
1       Male
3       Female
4       Male
5       Female
6       Male
7       Male
10      Female
13      Female
14      Female
15      Female
16      Male
17      Female


Y-SNPs:

Marker       Sample 1  Sample 4  Sample 6  Sample 7  Sample 16
rs2534636    C         C         C         C         C
rs35284970   C         C         C         C         C
rs9786184    A         C         C         C         C
rs9786139    G         G         G         G         A
rs16981290   C         C         C         A         C
rs17250845   G         G         C         G         G
L298         T         T         T         T         T
P256         G         G         G         G         G
P202         T         T         T         T         T
rs17306671   T         T         A         T         T
rs4141886    A         A         A         A         G
rs2032595    T         T         T         T         T
rs2032599    T         T         T         T         T
rs20320      G         G         G         G         G
rs2032602    T         T         T         T         T
rs8179021    C         T         C         C         C
rs2032624    C         A         A         A         A
rs2032636    G         G         G         G         G
rs9341278    G         G         G         G         G
rs2032658    G         A         A         A         A
rs2319818    G         G         G         G         G
rs17269816   C         C         C         C         C
rs17222573   A         A         A         A         A
M479         C         C         C         C         C
rs3848982    C         C         C         C         T
rs3900       G         G         C         G         C
rs3911       A         A         A         A         A
rs2032631    A         A         G         G         G
rs2032673    T         T         T         T         T
rs2032652    T         T         T         T         C
rs16980426   T         T         T         G         T
rs13447443   A         A         A         G         A
rs17842518   G         G         G         G         T
rs2033003    C         C         A         C         A

No evidence of paternal relationships among the males
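Because Y-chromosome haplotypes pass essentially unchanged from father to son, males who differ at several Y-SNPs share no recent paternal line within the resolution of the panel. The sketch below counts pairwise differences among the five male samples using the genotype strings from the slide. It is a simple illustration, not the analysis method used in the study.

```python
from itertools import combinations

# Y-SNP calls per male sample, copied from the slide (34 markers each).
Y_HAPLOTYPES = {
    "1": "C C A G C G T G T T A T T G T C C G G G G C A C C G A A T T T A G C",
    "4": "C C C G C G T G T T A T T G T T A G G A G C A C C G A A T T T A G C",
    "6": "C C C G C C T G T A A T T G T C A G G A G C A C C C A G T T T A G A",
    "7": "C C C G A G T G T T A T T G T C A G G A G C A C C G A G T T G G G C",
    "16": "C C C A C G T G T T G T T G T C A G G A G C A C T C A G T C T A T A",
}
Y_HAPLOTYPES = {s: calls.replace(" ", "") for s, calls in Y_HAPLOTYPES.items()}


def mismatches(haplotype_a, haplotype_b):
    """Count marker positions where two haplotypes differ."""
    return sum(a != b for a, b in zip(haplotype_a, haplotype_b))


def pairwise_mismatches(haplotypes):
    """Map each pair of samples to its number of differing markers."""
    return {
        (s1, s2): mismatches(haplotypes[s1], haplotypes[s2])
        for s1, s2 in combinations(haplotypes, 2)
    }


for pair, count in pairwise_mismatches(Y_HAPLOTYPES).items():
    print(pair, count)
```

Every pair differs at one or more markers, which is in line with the slide's conclusion that none of the males appear paternally related.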


Y-SNPs:

Sample  Y-Clade  Region
1       R1b      West Asia, Russian Plain or Central Asia
4       Q        Central Asia, the Indian Subcontinent, Siberia
6       J        Arabian Peninsula
7       O2       Asia
16      E        Africa


• Information from Ancestry SNPs:
  • Population background
• Generated genotypes for 165 SNPs on all 12 samples

[Figure panels: Sample 14, Sample 7, Sample 5]

Sample  Biogeographic Ancestry
1       European
3       European
4       Asian
5       European
6       European
7       Asian
10      European
13      African American
14      African admix
15      African admix
16      African
17      European

• For this study, bioancestry assignment was limited to major populations
  • Marker dependent
  • Reference population dependent
  • Software dependent

IDENTIFYING RELATIONSHIPS

62

Identifying Relationships

Genotypes from STRs and Identity SNPs allow for expansion and refinement of the partial pedigree identified with the mitochondrial haplotypes

63


Sequence variants present within the D8S1179 alleles

[Pedigree figure: D8S1179 genotypes with allele sequences]
• 12,13: [TCTA]2 [TCTG]1 [TCTA]9 and [TCTA]2 [TCTG]1 [TCTA]10
• 13,13: [TCTA]13 and [TCTA]1 [TCTG]1 [TCTA]11
• 13,13: [TCTA]2 [TCTG]1 [TCTA]10 and [TCTA]1 [TCTG]1 [TCTA]11
• 10,13: [TCTA]10 and [TCTA]1 [TCTG]1 [TCTA]11
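The bracketed notation shows how two alleles of the same length can still differ in sequence. As an illustrative sketch, the Python below collapses a repeat region into that notation. It assumes the input is only the repeat region, already in frame with the 4-base repeat unit. Flanking sequence and real read alignment are outside its scope.

```python
def bracket_notation(sequence, unit_length=4):
    """Collapse consecutive identical repeat units into [UNIT]n blocks."""
    if len(sequence) % unit_length != 0:
        raise ValueError("sequence length is not a multiple of the repeat unit")
    blocks = []
    for i in range(0, len(sequence), unit_length):
        unit = sequence[i:i + unit_length]
        if blocks and blocks[-1][0] == unit:
            blocks[-1][1] += 1  # same unit as before: extend the block
        else:
            blocks.append([unit, 1])  # new unit: start a block
    return " ".join(f"[{unit}]{count}" for unit, count in blocks)


def repeat_count(sequence, unit_length=4):
    """Length-based allele call, as CE would report it."""
    return len(sequence) // unit_length


# Two D8S1179 alleles from the pedigree figure, written out in full.
allele_a = "TCTA" * 13
allele_b = "TCTA" * 1 + "TCTG" * 1 + "TCTA" * 11

print(repeat_count(allele_a), bracket_notation(allele_a))
print(repeat_count(allele_b), bracket_notation(allele_b))
```

Both alleles are called 13 by length, yet their bracketed sequences differ, which is what lets sequencing separate the two "13" alleles in the 13,13 genotype.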


Pedigree supported by data from Ancestry SNP panel

STRs: Likelihood Ratio Results:
Posterior probability = 0.999999996671495
Combined likelihood ratio = 300 million

SNPs: Likelihood Ratio Results:
Combined likelihood ratio = 3.34 E46

Internal Thermo Fisher kinship algorithm used for calculations
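The link between a combined likelihood ratio and a posterior probability can be shown with the standard Bayesian odds form. The sketch below is not the internal Thermo Fisher algorithm, which is not described here. It assumes independent loci (so per-locus ratios multiply) and a prior probability of 0.5. Both are assumptions made for illustration.

```python
def combined_likelihood_ratio(per_locus_lrs):
    """Multiply per-locus likelihood ratios (ASSUMES independent loci)."""
    product = 1.0
    for lr in per_locus_lrs:
        product *= lr
    return product


def posterior_probability(likelihood_ratio, prior=0.5):
    """Posterior = LR * prior odds / (1 + LR * prior odds).

    The default prior of 0.5 is an ASSUMPTION, not taken from the slides.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


# HYPOTHETICAL per-locus ratios, for illustration only.
print(combined_likelihood_ratio([2.0, 5.0, 10.0]))  # 100.0

# Combined STR likelihood ratio reported on the slide (rounded).
print(posterior_probability(300e6))
```

With a prior of 0.5, a likelihood ratio of 300 million gives a posterior of roughly 0.9999999967, in line with the value on the slide. The small difference in later digits is expected if the 300 million figure is rounded.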

Population Affinity Summary

Sample  Gender  Maternal Lineage  Paternal Lineage                                Biogeographic Ancestry  Relationship
1       Male    European          West Asia, Russian Plain, or Central Asia       European                Unknown
3       Female  European          Unknown                                         European                Unknown
4       Male    European          Central Asia, the Indian Subcontinent, Siberia  Asian                   Unknown
6       Male    European          Arabian Peninsula                               European                Unknown
7       Male    Asian             Asia                                            Asian                   Unknown
13      Female  African           Unknown                                         African American        Unknown
14      Female  European          Unknown                                         African admix           Mother
15      Female  European          Unknown                                         African admix           Daughter
16      Male    African           Africa                                          African                 Grandfather
17      Female  European          Unknown                                         European                Grandmother


Sample  Biogeographic Ancestry  GM
1       European                Caucasian American
3       European                Caucasian Southern Europe
4       Asian                   Southern Asia
5       European                Central Caribbean
6       European                Southern Europe
7       Asian                   Eastern Asia
10      European                Caucasian American/Eastern Europe
13      African American        African American
14      African admix           American/Central Caribbean
15      African admix           South American/Central Caribbean
16      African                 Central Caribbean/African
17      European                Eastern European

Biogeography = Bioancestry

Phenotype: brown hair, hazel eyes, white fair skin complexion

Analysis of 12 Blinded Samples

• All results consistent with information provided by Green Mountain
• Completed all four marker systems on Ion PGM™ in a reasonably quick time-frame
• Complete profiles
• Coverage, strand balance, and allele balance support reliable data
• Future goals:
  • Full validation studies
  • Continue developing tools for data analysis

Next Steps

• Sensitivity of detection
• Alternate polymerases
• Population data and genetic analyses
• Validation studies
• Reproducibility
• Mixtures
• Stochastic effects
• Mock cases
• ……

Sensitivity Study

• Identity Panel
  • Y SNPs and Ancestry Panel gave similar results
    • Data not shown because of limited time
• 1 ng, 750 pg, 500 pg, 250 pg, 100 pg
• Criteria
  • Depth of Coverage
  • Strand Bias
  • Allele Coverage Ratio (~similar to peak height ratio)
  • Concordance across dilution series

Good Chip

Good Data

Genotype Concordance
One sample with different input DNA

Identity Panel Depth of Coverage

Input DNA    Average  Range
1 ng         2365X    800 – 4435X
750 pg       2054X    774 – 4040X
500 pg       1461X    312 – 3074X
250 pg       1316X    381 – 2817X
100 pg       392X     7 – 1272X
Bone sample  450X     0 – 1441X

Identity Panel Strand Bias
[Figures: 1 ng, 750 pg, 500 pg, 250 pg, 100 pg, bone]

Identity Panel ACR
[Figures: 1 ng, 750 pg, 500 pg, and 250 pg (homozygotes marked); 100 pg (homozygotes and drop out marked); bone]

“Bad” Chip

But Good Data!

Genotype Concordance (Sample 025)
One sample with different input DNA

Identity Panel Depth of Coverage
[Figures: 1 ng, 750 pg, 500 pg, 250 pg, 100 pg, bone sample]

Identity Panel Strand Bias
[Figures: 1 ng, 750 pg, 500 pg, 250 pg, 100 pg, bone]

Identity Panel ACR
[Figures: 1 ng (homozygotes marked), 750 pg, 500 pg, 250 pg, 100 pg, bone (homozygotes and drop out marked)]

Next Steps

• Used 22 PCR cycles
  • Establish a baseline
  • Increase cycles
    • 26 cycles were used for 100 pg (Seo et al., 2013)
• 10 µl of library for input into the OneTouch™ 2
  • Identity Panel
  • Establish baseline
  • Increase input

Conclusions

• Robust panels of identity and ancestry SNPs
• Robust STR panel
• Whole genome mtDNA sequencing
• Highly informative
• Sensitive
• Quantitative – scaling comparison
• Low density chip is not necessarily a bad chip
• Wide range of density can still yield high quality data
• Based on results continue development and validation

ACKNOWLEDGMENTS

• UNT Health Science Center
  o Jonathan King
  o Bruce Budowle
• Institute of Legal Medicine, Innsbruck Austria; Penn State Eberly College of Science
  o Walther Parson
• Thermo Fisher
  o Robert Lagace
  o Wenchi Liao
  o Joseph Chang
  o Narasimhan Rajagopalan
  o Sharon Wootton
  o Chien-Wei Chang
  o Reina Langit
  o Nnamdi Ihuegbu
  o Carolina Dallett
  o Gloria Lam
  o Jianye Ge