15
NEAFS Probabilistic DNA Mixture Interpretation Workshop 9/26/2015 Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic DNA Mixture Interpretation Workshop Allentown, PA September 25-26, 2015 Simone Gittelson, Ph.D., [email protected] Michael Coble, Ph.D., [email protected] Acknowledgement I thank Michael Coble, Bruce Weir and John Buckleton for their helpful discussions. Disclaimer Points of view in this presentation are mine and do not necessarily represent the official position or policies of the National Institute of Standards and Technology.

Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 1

Likelihood Ratios for Mixtures: Continuous

Approach

NEAFSProbabilistic DNA Mixture

Interpretation WorkshopAllentown, PA

September 25-26, 2015

Simone Gittelson, Ph.D., [email protected]

Michael Coble, Ph.D., [email protected]

AcknowledgementI thank Michael Coble, Bruce Weir and John Buckleton for their helpful discussions.

DisclaimerPoints of view in this presentation are mine and do not necessarily represent the official position or policies of the National Institute of Standards and Technology.

Page 2: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 2

From Semi-continuous to Continuous

13

14

15

275

103

325

AT1413 15

D8S1179

• The observed peaks as a discrete variable:

• The observed peaks as a continuous variable:

presentabsent

0 50 100 150

Discrete vs. Continuous

0 50 100 150

Page 3: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 3

15 15

Discrete vs. Continuous

The observed peak as a discrete variable:

absent present

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑎𝑏𝑠𝑒𝑛𝑡= 𝐶 1 − 𝑝15 + 𝑝15𝐷

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑝𝑟𝑒𝑠𝑒𝑛𝑡= 𝑝15 𝐷 + 1 − 𝑝15 𝐶𝑝15

no drop-in and donor has no 15or

no drop-in, donor has 15 and drop-out

donor has 15 and no drop-outor

donor has no 15 and drop-in of 15

The observed peaks as a continuous variable:

Discrete vs. Continuous

1515 15 15 15

15111423

14671578

1382

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑖𝑠 1511𝑟𝑓𝑢

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑖𝑠 1423𝑟𝑓𝑢

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑖𝑠 1467𝑟𝑓𝑢

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑖𝑠 1578𝑟𝑓𝑢

𝑃𝑟 𝑝𝑒𝑎𝑘 𝑖𝑠 1382𝑟𝑓𝑢

Page 4: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 4

The observed peaks as a continuous variable:

Discrete vs. Continuous

15

1511

15

1423

15

1467

15

1578

15

1382

Normal distribution

Gamma distribution

• The observed peaks as a discrete variable:

• The observed peaks as a continuous variable:

Discrete vs. Continuous

absence presence

pro

bab

ility

peak height

pro

bab

ility

den

sity

This is a probability.

This is a probability density.

Page 5: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 5

𝐿𝑅 = 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑠𝑒𝑡𝑠|𝐻𝑝

𝑤𝑒𝑖𝑔ℎ𝑡 × 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑚𝑎𝑡𝑐ℎ 𝑝𝑟𝑜𝑏.

𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑠𝑒𝑡𝑠|𝐻𝑑𝑤𝑒𝑖𝑔ℎ𝑡 × 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑚𝑎𝑡𝑐ℎ 𝑝𝑟𝑜𝑏.

Likelihood Ratio: from semi-continuous to continuous

stays the same

(Hardy-Weinberg Law, NRC II recommendation 4.1, NRC II recommendation 4.2)

𝐿𝑅 = 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑠𝑒𝑡𝑠|𝐻𝑝

𝑤𝑒𝑖𝑔ℎ𝑡 × 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑚𝑎𝑡𝑐ℎ 𝑝𝑟𝑜𝑏.

𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑠𝑒𝑡𝑠|𝐻𝑑𝑤𝑒𝑖𝑔ℎ𝑡 × 𝑔𝑒𝑛𝑜𝑡𝑦𝑝𝑒 𝑚𝑎𝑡𝑐ℎ 𝑝𝑟𝑜𝑏.

Likelihood Ratio: from semi-continuous to continuous

changessemi-continuous weight: any value between 0 and 1, including0 and 1, assigned using probabilities of drop-out and drop-in

continuous: a probability density, generally assigned basedon the results of simulations that attempt to reproduce the

observed peak heights

Page 6: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 6

Likelihood Ratio

match probabilities

the probability of genotype set 𝑆𝑗 given that we have observed

genotypes 𝐺𝐾 and that the contributors are as specified in 𝐻𝑝 or 𝐻𝑑

𝐿𝑅 = 𝑗=1𝑚 f 𝐺𝐶𝑆|𝑆𝑗 Pr 𝑆𝑗|𝐺𝐾 , 𝐻𝑝, 𝐼

𝑗=1𝑚 f 𝐺𝐶𝑆|𝑆𝑗 Pr 𝑆𝑗|𝐺𝐾 , 𝐻𝑑 , 𝐼

Likelihood Ratio

𝐿𝑅 = 𝑗=1𝑚 f 𝐺𝐶𝑆|𝑆𝑗 Pr 𝑆𝑗|𝐺𝐾 , 𝐻𝑝, 𝐼

𝑗=1𝑚 f 𝐺𝐶𝑆|𝑆𝑗 Pr 𝑆𝑗|𝐺𝐾 , 𝐻𝑑 , 𝐼

weights

the probability of obtaining these DNA typingresults for genotype set 𝑆𝑗

Page 7: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 7

Probabilistic Genotyping

13

14

15

275

103

325

ST

AT

Major Minor 𝒘 𝒎𝒂𝒕𝒄𝒉 𝒑𝒓𝒐𝒃.𝒎𝒂𝒋𝒐𝒓 𝒎𝒂𝒕𝒄𝒉 𝒑𝒓𝒐𝒃.𝒎𝒊𝒏𝒐𝒓

14,15 12,13 𝟏 2𝑝14𝑝15 2𝑝12𝑝13

14,15 13,13 𝟏 2𝑝14𝑝15 𝑝132

14,15 13,14 𝟏 2𝑝14𝑝15 2𝑝13𝑝14

14,15 13,15 𝟏 2𝑝14𝑝15 2𝑝13𝑝15

14,15 13,16 𝟏 2𝑝14𝑝15 2𝑝13𝑝16

14,15 13,17 𝟏 2𝑝14𝑝15 2𝑝13𝑝17

𝒘𝒆𝒊𝒈𝒉𝒕

the probability of obtaining these DNA typing resultsfor a particular genotype set

Concept of weights

Page 8: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 8

single source:

Concept of weights

14 15

14

15

13 14 15

13 14

13 15

genotype = {14,15}

total: 50 marbles

single source:

Concept of weights

14 15

14

15

13 14 15

13 14

13 15

genotype = {14,15}

weight =5

50

total: 50 marbles

Page 9: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 9

how large or small a weight is depends on:

• possibility of allele drop-out

• possibility of allele drop-in

• amount of template DNA

• possibility of degradation

• possibility of oversized stutter peaks

Concept of weights

Simulations attempt to reproduce the observedpeak heights by varying a set of parameters.

The better 𝑆𝑗 and the set of parameters explain𝐺𝐶𝑆, the greater the probability density𝑓 𝐺𝐶𝑆|𝑆𝑗 assigned to that genotype set.

observed peak heights

simulated peak heights

Continuous Model: short explanation

Page 10: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 10

1) Randomly choose a genotype set and values for all of the parameters that are necessary to model the expected peak heights.

genotype set 𝑆𝑗

parameter 1

parameter 2

parameter 3

Continuous model: longer explanation

Markov Chain Monte Carlo (MCMC) sampling

2) Model the expected peak heights given the chosen genotype set 𝑆𝑗 and parameter values.

genotype set 𝑆𝑗

parameter 1

parameter 2

parameter 3

Continuous model: longer explanation

expected peak heights

Page 11: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 11

3) Compare the expected peak heights with the observed peak heights. How similar are they?

Continuous model: longer explanation

expected peak heights observed peak heights

Assign a probability density for the observed peak heights given the expected peak heights.

4) Repeat steps 1) through 3) a large number of times. A large number of simulations are performed that randomly vary the genotype set and the parameter values.

Continuous model: longer explanation

Page 12: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 12

3) Compare the expected peak heights with the observed peak heights. How similar are they?

4) Repeat steps 1) through 3) a large number of times. A large number of simulations are performed that randomly vary the genotype set and the parameter values.

Continuous model: longer explanation

probability density 1

probability density 2

If probability density 2 ≥ probability density 1,save 𝑆𝑗 and the parameter values of simulation

2.

If probability density 2 < probability density 1,save 𝑆𝑗 and the parameter values of simulation

2 only some of the times. The probability of

saving the values is equal to 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 2

𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 1.

3) Compare the expected peak heights with the observed peak heights. How similar are they?

4) Repeat steps 1) through 3) a large number of times. A large number of simulations are performed that randomly vary the genotype set and the parameter values.

Continuous model: longer explanation

Metropolis-Hastings algorithm

probability density 1

probability density 2

Page 13: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 13

MCRobot-2.1:

after taking 100 samplesafter taking 300 samplesafter taking 2100 samples

burn-in phase

384

Continuous model: longer explanation

5) Assign the weight 𝑓 𝐺𝐶𝑆|𝑆𝑗 based on the

saved simulation results.

Continuous model: longer explanation

saved simulation results:

Genotype set Parameter 1 Parameter 2

6,7 and 8,9

6,7 and 8,8

7,7 and 6,8

6,7 and 8,8

6,7 and 8,8

6,8 and 7,8

6,7 and 8,8

6,7 and 8,8

6,8 and 7,8

discard burn-in

Page 14: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 14

5) Assign the weight 𝑓 𝐺𝐶𝑆|𝑆𝑗 based on the

saved simulation results.

Continuous model: longer explanation

saved simulation results:

Genotype set Parameter 1 Parameter 2

6,7 and 8,9

6,7 and 8,8

7,7 and 6,8

6,7 and 8,8

6,7 and 8,8

6,8 and 7,8

6,7 and 8,8

6,7 and 8,8

6,8 and 7,8

The weight 𝑓 𝐺𝐶𝑆|𝑆𝑗 is

assigned as the proportion of genotype sets 𝑆𝑗 among

the saved genotype sets.

Example: 𝑆𝑗 = 6,7 and 8,8

𝑓 𝐺𝐶𝑆|𝑆𝑗 =4

6

Continuous model: longer explanation

1) Randomly choose a genotype set 𝑆𝑗 and values for all of the parameters that are necessary to model the expected peak heights.

2) Model the expected peak heights given the chosen genotype set 𝑆𝑗 and parameter values.

3) Compare the expected peak heights with the observed peak heights. How similar are they?

4) Repeat steps 1) through 3) a large number of times. A large number of simulations are performed that randomly vary the genotype set 𝑆𝑗 and the parameter values.

5) Assign the weight 𝑓 𝐺𝐶𝑆|𝑆𝑗 based on the saved simulation results.

sim

ula

tio

ns

Page 15: Likelihood Ratios for Mixtures: Continuous Approach...Cedar Crest College Forensic Science Training Institute 1 Likelihood Ratios for Mixtures: Continuous Approach NEAFS Probabilistic

NEAFS Probabilistic DNA Mixture Interpretation Workshop

9/26/2015

Cedar Crest CollegeForensic Science Training Institute 15

• The peak heights are a continuous variable.

• Mathematically speaking, 𝑓 𝐺𝐶𝑆|𝑆𝑗 is the probability density for the observed peak heightsgiven the genotype set 𝑆𝑗.

• The probability densities 𝑓 𝐺𝐶𝑆|𝑆𝑗 are assignedbased on the results of simulations that attemptto reproduce the observed peak heights by varying a set of parameters.

Summary of main points