26
Computational Strategies for Tool Marks: Principal Component Analysis and Other Methods 3 2 1 0 1 2 3

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods

  • Upload
    caesar

  • View
    33

  • Download
    2

Embed Size (px)

DESCRIPTION

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods. Outline. Introduction Details of Our Approach Data acquisition Surface/Profile Pre-processing Methods of Statistical Discrimination Error Rate Estimates Suggested Operating Guidelines. - PowerPoint PPT Presentation

Citation preview

Page 1: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other

 Methods

3 2 1 0 1 2 3

Page 2: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Outline• Introduction

• Details of Our Approach• Data acquisition

• Surface/Profile Pre-processing

• Methods of Statistical Discrimination

• Error Rate Estimates

• Suggested Operating Guidelines

Page 3: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• All impressions made by tools and firearms can be viewed as numerical patterns– Machine learning trains a computer to recognize

patterns • Can give “…the quantitative difference between an

identification and non-identification”Moran • Can yield identification error rate estimates• May be even confidence for I.D.s, posterior error

probability estimates…??

Background Information

Page 4: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• Obtain striation/impression patterns from 3D confocal microscopy

• Store files in ever expanding database• Data acquisition is labor

intensive and time consuming

• Data files are available to practitioner community through web interface

Data Acquisition

Page 5: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

2D profiles3D surfaces(interactive)

Screwdriver Striation Patterns in Lead

Page 6: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Glock 19 fired cartridge cases

Bottom ofFiring pin imp.

Glock Fired Cartridges

Page 7: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Shear marks on primer of two different Glock 19s

Page 8: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Mean total profile:

Mean “waviness” profile:

Mean “roughness” profile:

Page 9: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• Nice for data sets to be large but real data acquisition is SLOW• Need SOMETHING in the interim

Profile Simulator

• Our simulator is based on DWT MRA• May shed light on processed generating surfaces

• Should be extendable to 2D striations/impressions…

Page 10: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• Multivariate Statistical pattern comparison!

• Modern algorithms are called machine learning• Idea is to measure

features of the physical evidence that characterize it

• Train algorithm to recognize “major” differences between groups of featureswhile taking into account

natural variation and measurement error.

What Other Statistics Can Be Used?

Page 11: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• Need a data matrix to do machine learning

Setup for Multivariate Analysis

Represent as a vector of values

{-4.62, -4.60, -4.58, ...} • Each profile vector is a row in the data matrix • Typical length is ~4000 points/profile• 2D surfaces are far longer• HIGHLY REDUNDANT representation

of surface data

• PCA can:• Remove much of the redundancy• Make discrimination computations

far more tractable

Page 12: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• 3D PCA 24 Glocks, 720 simulated and real primer shear profiles:

• ~47% variance retained

• How many PCs should we use to represent the data??

• No unique answer

• FIRST we need an algorithm to I.D. a toolmark to a tool

Page 13: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Support Vector Machines• Support Vector Machines (SVM) determine

efficient association rules• In the absence of any knowledge of probability

densities

• For small to medium sized data setsGaussian LDA decision boundary

SVM decision boundary

Page 14: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• How many PCs should we use?• Gather training set

• Use systematically increase PC dimension and check Hold-one-out Cross-Validation with SVM

PCA-SVM

With 7 PCs, expect ~3% error rate

Page 15: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

• Cross-Validation: systematically hold-out chunks of data set for testing • Most common: Hold-one-out

Error Rate Estimation

• Bootstrap: Make up data sets with randomly selected observation vectors (with replacement)• Repeat thousands of times

• Can yield confidence intervals around error rate estimate

• The Best: Small training set, BIG test set

Page 16: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Refined bootstrapped I.D. error rate for primer shear striation patterns= 0.35% 95% C.I. = [0%, 0.83%]

(sample size = 720 real and simulated profiles)

18D PCA-SVM Primer Shear I.D. Model, 2000 Bootstrap Resamples

Page 17: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

How good of a “match” is it?Conformal Prediction

• You choose the confidence level• A Set of possible I.D. obtained

• Varying confidence changes

the size of the sets

Cum

ulat

ive

# of

Err

ors

Sequence of Unk Obs Vects

80% confidence20% errorSlope = 0.2

95% confidence5% errorSlope = 0.05

99% confidence1% errorSlope = 0.01

• Can give a judge or jury an easy to understand measure of reliability of classification result • Confidence on a scale of 0%-100%

• This is an orthodox “frequentist”

approach

Page 18: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Empirical Bayes’

• Bayes’ Rule: can we realistically estimate posterior error probabilities empirically/falsifiably??• Given algorithm indicates a toolmark is associated with a

tool, what is the probability it is truly not a “match”?

• Probability that there is another tool in the pop. that can generate a toolmark which the algorithm will match to the wrong tool

Probability of no actual association given a test/algorithm indicates a positive ID

Page 19: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Empirical Bayes’• Erfon’s machinery for “empirical Bayes’ two-groups

model”Efron 2007

• Surprisingly simple!• S-, truly no association, Null hypothesis

• S+, truly an association, Non-null hypothesis

• z, a Gaussian random variate derived from a machine learning task to ID an unknown pattern with a group

• Scheme yields estimate of Pr(S-|z) along with it’s standard error• Called the local false discovery rate (fdr) or posterior error

probability

• Given a similarity score, fdr is an estimate of the probability that the computer is wrong in calling a “match”

• Catch: you need A LOT of z and they should be fairly independent and Pr(S-) > 0.9

Page 20: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Empirical Bayes’• Try Erfon’s machinery for “empirical Bayes’ two-groups

model”• Gives a calibrated “no-match posterior probability” model

The SVM alg got thesePrimer shear IDs wrong

Page 21: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Empirical Bayes’• Model’s use with crime scene “unknowns”:

This is the est. post. prob. of no association = 0.00027 = 0.027%

Computer outputs “match” for: unknown crime scene toolmarks-with knowns from “Bob the burglar” tools

This is an uncertainty in the estimate

Page 22: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Future Directions• All feature vectors must be aligned and of the same length,

• Use CCF to align within and between guns.

• Padded any overhang with uniform random noise

Glock 1

Glock 23

A real pain for 1D, not to mention 2D!!

Page 23: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Future Directions• Try invariant feature extraction from digital imaging:

• Legendre moments:

• Translationally invariant

• Trying for 2D striation patterns

• Zernike moments:

• Rotationally and translationally invariant but more expensive.

• Trying for 2D impression patterns

Page 24: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Future Directions

• Extend ImageJ Surface Metrology functionality

• DigitalSurf Mountains map is EXPEN$IVE!!

• DWT and MODWT modules for

• De-noising

• Form-removal

• Surface Filtering

• GP-GPU implementation of computationally intensive routines

• This WILL work in Java with JOCL bindings

• Need big surfaces to make copying overhead worthwhile

Page 25: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Acknowledgements• Research Team:

• Mr. Peter Diaczuk

• Ms. Carol Gambino

• Dr. James Hamby

• Dr. Brooke Kammrath

• Dr. Thomas Kubic

• Mr. Chris Lucky

• Off. Patrick McLaughlin

• Dr. Linton Mohammed

• Mr. Jerry Petillo

• Mr. Nicholas Petraco

• Dr. Graham Rankin

• Dr. Jacqueline Speir

• Dr. Peter Shenkin

• Mr. Peter Tytell

• Helen Chan

• Julie Cohen

• Aurora Dimitrova

• Frani Kammerman

• Loretta Kuo

• Dale Purcel

• Stephanie Pollut

• Chris Singh

• Melodie Yu

Page 26: Computational Strategies for Tool Marks:  Principal  Component  Analysis  and  Other  Methods

Website Information and Reprints/Preprints:

toolmarkstatistics.no-ip.org/

[email protected]

[email protected]