8/19/2019
Artificial Intelligence in Forensic DNA Interpretation: Artifact Management and Number of Contributor Prediction
Michael A. Marciano, Ph.D., Research Assistant Professor
Jonathan D. Adelman, M.S., Research Assistant Professor
College of Arts and Sciences | Forensic and National Security Sciences Institute
August 1, 2019
Green Mountain DNA Conference
Overview
• Overview of AI (machine learning)
  • Why?
  • What to expect?
• Application: artifact identification and number of contributors (NoC)
  • PACE v1 vs. PACE v2
The Anatomy of Decision
https://www.pinterest.com/pin/1477812348803917/
This is a long-lasting love story…
[Figure: "Data + ______ → Decision". Labels: data and decision-making experience; validation conclusions; computational/statistical output. Flow: Input → Prediction → Judgement → Decision]
Process of making a decision
Input is needed
• Electropherogram
• Experience
• How much DNA?
• Degraded
• Locus- and profile-wide assessments
• Process-related expectations (e.g., pull-up, stutter)
• Validation data
1. Artifact vs. allele?
2. NOC?
Process of making a decision
JUDGEMENT
How do we value or weight the input?
https://www.abc.net.au/news/2018-11-07/legal-system-1/10465232
What is machine learning?
http://www.itbriefcase.net/machine-learning-an-intuitive-definition
Definitions
• Artificial intelligence
  • Definition: capability of a machine to…
    • …imitate intelligent human behavior
    • …perform tasks that normally require human intelligence, such as:
      • speech recognition
      • image recognition
      • translation
      • decision-making
Definitions
• Machine learning
  • Definition: capability of a computer to learn without being explicitly programmed
  • Branch of AI
  • Unlike other AI, these algorithms are dynamic: they adjust in response to data
  • Example: handwritten address interpretation
Projected utility
• “The last 10 years have been about building a world that is mobile-first. In the next 10 years, we will shift to a world that is AI-first.” (Sundar Pichai, CEO of Google, October 2016)
• “It’s hard to overstate how big of an impact it's going to have on society over the next 20 years.” (Jeff Bezos, CEO of Amazon, May 2016)
• Total value of machine learning M&A, 2014-2017
• A.I. startup acquisitions, 2013-2017
Projected utility
• Two core aspects of machine learning:
  • Data
    • Bottleneck: machine learning requires massive data sets
    • Resolution: big data; internet of things
  • Computational power
    • Bottleneck: Moore's Law
    • Resolution: GPUs
• Take-home points:
  • No immediate bottlenecks
  • Potential application space >> applied application space
Advances
• Rapid adoption… but not in forensic science
  • Latent prints: age estimation (Merkel et al.)
  • Firearms: chemical analysis of GSR (Gallidabino et al.)
  • DNA: 2014 NIJ, mixture interpretation (Marciano and Adelman)
    • PACE
    • Cell morphology (Christopher Ehrhardt et al.)
    • EPG peak classification (Taylor et al.)
• Evolutionary steps forward, but not disruptive innovation
• Biggest hurdles are data availability and practitioners' caution
Conclusions
• AI is coming, whether forensic science is ready or not
• Preparation for the paradigm shift:
  • Treat data as fuel
  • Gain a basic understanding of ML/AI
http://www.buckhamduffy.com/blog/artificial-intelligence-machine-learning-and-data
Application:
PACE: Probabilistic Assessment for Contributor Estimation
Why?
Adapted from: https://undsci.berkeley.edu/article/_0_0/howscienceworks_09
So, how many contributors do you think there are?
You might want to get a snack first… this could take a while.
What is PACE v2?
• Hybrid statistical and machine learning technique
• Probabilistic method to predict the number of contributors and assess profile complexity
  • Fully continuous
  • Rapid
  • 100% reproducible
  • Outputs probabilities of classes (1-4+)… 5+ coming soon
• Artifact identification/correction
  • Not just contributor estimation
  • Traditional and non-traditional stutter
  • Pull-up
  • Excess noise
Introducing PACE: Software Assist Tool
PACE is NOT…
• Not… a method to assign allelic data to individual contributors
  • PACE focus: number of contributors
• Does not directly use allele labels to make conclusions
  • PACE focus: distinguishing true signal from artifactual signal and recognizing patterns associated with NOC
• Not… a magic bullet
  • PACE is an assist tool, meant to complement analyst intuition
PACE Models, Helper Algorithms and Results
Sample sets
Property | PowerPlex® Fusion 6c | GlobalFiler™
Samples | 1969 | 3921
Individuals | 120 | 79
Template range | 3.0 pg to 5.1 ng | 3.0 pg to 3.5 ng
Mixture ratios | 49 | 88
Instruments | 9 (5 × 3500, 4 × 31XX series) | 3 (3500s)
Injection time / voltage | 9 times / 2 kVs | 4 times / 2 kVs
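Dynamic Analytical Threshold: Locus-Sample Specific
[Figure: locus-sample-specific thresholds at 0.0625 ng and 4.0 ng template; annotated values 63.9 and 6.6; annotated expressions 23.12 ± 10.21x and 1.98 ± 1.16x]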
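One reading of the figure annotations, which the slide does not spell out, is threshold = noise mean + x · noise standard deviation. With x = 4, 23.12 + 4(10.21) = 63.96 and 1.98 + 4(1.16) = 6.62, close to the annotated 63.9 and 6.6. The sketch below computes such a threshold from observed baseline-noise heights at one locus; the function names, the use of the sample standard deviation, and x = 4 are assumptions for illustration, not the published LSST procedure.

```python
import statistics


# Hypothetical helper: reads the slide's "mean ± x·sd" annotation as
# threshold = mean + x * sd of baseline noise heights (RFU) at one locus.
# x = 4 is an assumption that happens to reproduce the annotated values.
def threshold_from_summary(mean_rfu: float, sd_rfu: float, x: float = 4.0) -> float:
    return mean_rfu + x * sd_rfu


def locus_sample_threshold(noise_heights: list[float], x: float = 4.0) -> float:
    """Estimate a locus-sample-specific threshold from observed noise heights."""
    if len(noise_heights) < 2:
        raise ValueError("need at least two noise observations")
    mean_rfu = statistics.fmean(noise_heights)
    sd_rfu = statistics.stdev(noise_heights)  # sample standard deviation
    return threshold_from_summary(mean_rfu, sd_rfu, x)


def call_peaks(heights: list[float], threshold: float) -> list[float]:
    """Keep peaks at or above the threshold; the rest are treated as noise."""
    return [h for h in heights if h >= threshold]


print(round(threshold_from_summary(23.12, 10.21), 2))  # 63.96 (annotated: 63.9)
print(round(threshold_from_summary(1.98, 1.16), 2))    # 6.62 (annotated: 6.6)
```

The point of a per-locus, per-sample threshold is visible in the two numbers: a single fixed cut-off would be roughly ten times too strict for the low-template noise level or too lax for the high-template one.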
Artifact Identification/Correction
1. Dynamic analytical threshold
  • Detect alleles and remove low-level noise
  • Locus-sample-specific threshold (LSST)
2. Stutter filter
  • Removes effects of stutter
  • Models: a-10 to a+5
3. Trimming algorithms
  • Removes noise remaining from thresholding
4. Machine learning signal assessment
  • Probabilistic assessment and correction of remaining signal
5. Pull-up
  • Machine learning
  • Automated identification and removal
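To illustrate what a filter covering positions a-10 to a+5 around a parent allele does, here is a generic proportional stutter filter. It is not PACE's modeled stutter filter: it assumes peaks are given as (size in bp, height in RFU), treats the window as base-pair offsets, and uses hypothetical per-offset maximum stutter ratios. Real limits are locus- and kit-specific and come from validation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    size: float    # fragment size in bp (assumed unit)
    height: float  # peak height in RFU


# Hypothetical maximum stutter ratios keyed by offset (bp) from the parent peak.
MAX_RATIO = {-8.0: 0.05, -4.0: 0.15, -2.0: 0.05, 4.0: 0.05}


def filter_stutter(peaks, max_ratio=MAX_RATIO, window=(-10.0, 5.0), tol=0.5):
    """Split peaks into (kept, removed). A peak is removed when some taller peak
    sits at a modeled offset inside the a-10..a+5 window and the height ratio
    does not exceed that offset's limit (generic proportional filter)."""
    kept, removed = [], []
    for p in peaks:
        is_stutter = False
        for parent in peaks:
            if parent is p or parent.height <= p.height:
                continue  # only a taller peak can act as the parent allele
            offset = p.size - parent.size
            if not (window[0] - tol <= offset <= window[1] + tol):
                continue  # outside the modeled stutter window
            for model_offset, limit in max_ratio.items():
                if abs(offset - model_offset) <= tol and p.height / parent.height <= limit:
                    is_stutter = True
        (removed if is_stutter else kept).append(p)
    return kept, removed


peaks = [Peak(150.0, 1000.0), Peak(146.0, 90.0), Peak(154.0, 30.0), Peak(142.0, 400.0)]
kept, removed = filter_stutter(peaks)
print([p.size for p in kept])     # [150.0, 142.0]
print([p.size for p in removed])  # [146.0, 154.0]
```

The peak at 142 bp sits at a-8 but is 40% of its parent, well above the hypothetical 5% limit, so it is kept as a possible minor-contributor allele; a fixed-ratio filter like this cannot express the probabilistic correction described in step 4.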
Results: Accuracy of detection and stutter removal
Threshold / trimming method | Stutter filter | Dropout alleles | Accuracy | Incorrect remaining alleles | Percentage of additional alleles
LSST-NR | modeled | 583 | 97.2% | 142 | 0.79%
LSST | modeled | 362 | 98.2% | 746 | 3.67%
50 RFU-NR | modeled | 1225 | 94.1% | 44 | 0.23%
50 RFU | stock | 3004 | 85.5% | 116 | 0.66%
100 RFU-NR | modeled | 2301 | 88.9% | 16 | 0.09%
100 RFU | stock | 4059 | 80.4% | 51 | 0.31%
150 RFU-NR | modeled | 3330 | 83.9% | 6 | 0.03%
150 RFU | stock | 4957 | 76.0% | 31 | 0.20%
Marciano, M. A., Williamson, V. R. & Adelman, J. D. A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning. Forensic Sci. Int. Genet. 35, 26–37 (2018)
Accuracy: Artifact removal
[Figure, two panels: number of peaks per artifact category (area/height, low-min:max, excess noise, pull-up, stutter, LSST-NR), with per-category removal accuracy for PACE-GF and PACE-PPF6c ranging from 93.5% to 97.0%]
PACE: Maximum Probability PP Fusion 6c® Results
Rows: expected number of contributors; columns: PACE-predicted number of contributors
Expected | 1 | 2 | 3 | 4+ | Accuracy
1 | 58 | 2 | 0 | 0 | 96.0%
2 | 6 | 72 | 0 | 1 | 91.4%
3 | 0 | 7 | 26 | 2 | 94.9%
4+ | 0 | 1 | 1 | 21 | 97.5%
Submitted for publication
$$\text{accuracy} = \frac{\text{number of correctly predicted events}}{\text{number of predicted events}}$$
PACE-PPF6c: Detailed results
PACE: Maximum Probability GlobalFiler™ Results
Rows: expected number of contributors; columns: PACE-predicted number of contributors
Expected | 1 | 2 | 3 | 4+ | Accuracy
1 | 271 | 0 | 0 | 0 | 98.6%
2 | 9 | 137 | 4 | 1 | 96.3%
3 | 2 | 11 | 119 | 8 | 94.4%
4+ | 0 | 4 | 19 | 200 | 95.9%
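The Accuracy column is not row-wise recall: for three contributors, 119/140 is 85.0%, not 94.4%. Reading it as one-vs-rest accuracy, (TP + TN)/N for each class with N = 785 samples, reproduces all four GlobalFiler values. This is a reconstruction from the published counts rather than a definition stated on the slide; the function names are hypothetical, and the overall exact-match figure is derived here, not reported in the talk.

```python
# Rows: expected NoC; columns: predicted NoC (1, 2, 3, 4+), from the GlobalFiler slide.
GF = [
    [271, 0, 0, 0],
    [9, 137, 4, 1],
    [2, 11, 119, 8],
    [0, 4, 19, 200],
]


def one_vs_rest_accuracy(cm):
    """Per-class accuracy treating each class as a binary 'this class vs. the rest' call."""
    n = sum(map(sum, cm))
    out = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # class k predicted as something else
        fp = sum(row[k] for row in cm) - tp   # other classes predicted as k
        tn = n - tp - fn - fp
        out.append((tp + tn) / n)
    return out


def overall_accuracy(cm):
    """Share of samples whose predicted NoC exactly matches the expected NoC."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


print([round(100 * a, 1) for a in one_vs_rest_accuracy(GF)])  # [98.6, 96.3, 94.4, 95.9]
print(round(100 * overall_accuracy(GF), 1))                   # 92.6
```

Because true negatives dominate each binary comparison, one-vs-rest accuracy runs higher than per-class recall; both are worth reporting when judging 3 vs. 4+ resolution.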
PACE-GF: Detailed results
4-contributor, 9:1:1:1, 0.18 ng, degraded (12 mU DNase I): Pr(3) = 0.18, Pr(4+) = 0.82
5-contributor, 1:1:1:1:1, 0.075 ng, degraded (12 mU DNase I): Pr(1) = 0.22, Pr(2) = 0.29, Pr(3) = 0.11, Pr(4+) = 0.37
PPF6c incorrect call: sample quality
Bottom line: low-quality sample
Actual n | Expected mixture ratio | Template DNA amplified (ng) | PACE predicted n | Pr(1) | Pr(2) | Pr(3) | Pr(4+) | % dropout alleles | Mean % allele sharing
2 | 2:1 | 0.0375 | 4 | 0.00 | 0.40 | 0.19 | 0.40 | 27.7% | 29.3%
Interpreting Results
https://marketoonist.com/2014/04/big-data-analytics.html
Interpreting NoC Results… Output
Expected NOC | Correct | Incorrect | Percent correct
1 | 33 | 2 | 94%
2 | 48 | 0 | 100%
3 | 38 | 4 | 90%
4 | 17 | 3 | 85%
P(NOC_1) | P(NOC_2) | P(NOC_3) | P(NOC_4)
0 | 0.65 | 0.35 | 0
0 | 0.73 | 0.27 | 0
• What is the probability threshold? Is 0.73 good enough?
  o At what probability should you know that results lack confidence?
PACE Results: Probability threshold
• At what probability do we expect a correct result?
  o At what probability should you know that results lack confidence?
• Ultimately this is a lab-specific validation task
Maximum class probability | PACE-PPF6c % correct | PACE-PPF6c % of total samples | PACE-GF % correct | PACE-GF % of total samples
0.95 | 97.6 | 62.1% (123/198) | 99.4 | 59.1% (464/785)
0.9 | 97.8 | 69.7% (138/198) | 99.3 | 68.3% (536/785)
0.8 | 96.1 | 77.8% (154/198) | 99.0 | 78.1% (613/785)
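The table implies a workflow in which a NoC call is accepted only when its maximum class probability clears a validated cut-off. A minimal sketch follows, with a hypothetical `call_noc` function and an illustrative default of 0.9 (the slide stresses that the cut-off is a lab-specific validation task), applied to probability vectors shown elsewhere in this talk. For the 4-contributor example, Pr(1) and Pr(2) are taken as 0 because Pr(3) + Pr(4+) already sum to 1.

```python
LABELS = ("1", "2", "3", "4+")


def call_noc(probs, threshold=0.9):
    """Return (winning labels, max probability, confident?).

    Hypothetical decision rule: report the maximum-probability class and flag
    the call as low confidence when the maximum is below a lab-validated
    threshold or when classes tie exactly. The 0.9 default is illustrative only.
    """
    if len(probs) != len(LABELS):
        raise ValueError("expected probabilities for classes 1, 2, 3, 4+")
    p_max = max(probs)
    winners = [label for label, p in zip(LABELS, probs) if p == p_max]
    confident = p_max >= threshold and len(winners) == 1
    return winners, p_max, confident


print(call_noc((0, 0.73, 0.27, 0)))        # (['2'], 0.73, False)
print(call_noc((0, 0, 0.18, 0.82), 0.8))   # (['4+'], 0.82, True)
print(call_noc((0.00, 0.40, 0.19, 0.40)))  # (['2', '4+'], 0.4, False)
```

The last vector is the low-quality PPF6c sample: its maximum is split between 2 and 4+ contributors, so a rule like this routes it to manual review rather than reporting a single NoC.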
Prediction: Putting on the thinking cap
Prediction (method)
• Input
  • Traditional
    • Electropherogram
    • Experience
    • Quality/quantity of DNA
    • Locus- and profile-wide assessments
    • Process-related components
    • Validation data
  • New
    • PACE output
Input → Prediction → Judgement → Decision
PACE… a new “input” tool
Conclusions (1)
• Fully continuous probabilistic approach
• High accuracy and resolution between 3 and 4+ contributors
• Reproducible
• No change in computational resources
• Fast (seconds)
• Use prior to probabilistic genotyping (PG)
Conclusions (2)
PACE assigns weights to each probability class, allowing the analyst to assess the distribution of probabilities to aid in the decision-making process.
The combination of PACE and manual interpretation is more accurate and robust than either used in isolation.
Resources
Developmental Validation of PACE™: Automated Artifact Identification and Contributor Estimation for use with GlobalFiler™ and PowerPlex® Fusion 6c Generated Data
Michael Marciano, Jonathan Adelman
FSI Genetics 2019 (accepted revision last night)
Acknowledgements
Other
• Laura Haarer
• Victoria Williamson
• Angie Zhao
• Ebrar Mohammed
National Institute of Justice
Niche Vision LLC
• Luigi Armogida
• Vic Meles
• Tom Faris
Contributing Laboratories
• CT Division of Scientific Services
• Palm Beach County Sheriff's Office
• Kansas City PD
• Michigan State Police
• Erie County DFS
• Kentucky State Police
• Promega Corporation
• Idaho State Police Forensic Services
• NYC OCME
• Oakland PD
• Indiana State Police
• Washington DC DFS
• Rutgers University
• San Diego Sheriff's Department
• Onondaga County CFS
Thank You
Questions
Phone: 315-443-5279
107 College Place; 120 LSB
Syracuse, NY 13244