Introduction Osborn. Daubert is a benchmark!!!: Daubert (1993)- Judges are the “gatekeepers” of scientific evidence. Must determine if the science is

Introduction

Osborn

• Daubert is a benchmark!
  • Daubert (1993): Judges are the “gatekeepers” of scientific evidence.
  • They must determine if the science is reliable:
    • Has empirical testing been done? (Falsifiability)
    • Has the science been subject to peer review?
    • Are there known error rates?
    • Is there general acceptance? (Essentially the Frye standard, 1923)

• Federal Government and 26(-ish) States are “Daubert States”

“Legal” Science

• Any time an observation is made, one is making a “measurement”

Measurement and Randomness

1. Experimental error is inherent in every measurement

• Refers to variation in observations between repetitions of the same experiment.

• It is unavoidable and many sources contribute

2. Error in a statistical context is a technical term (BHH)
  • Experimental error is a form of randomness
  • Randomness: inherent unpredictability in a process
  • The outcomes of the process follow a probability distribution

Measurement and Randomness

• Statistical tools are used to both:
  • Describe the randomness
  • Make inferences taking into account the randomness
• Careful! Bad data, assumptions, and models lead to garbage (GIGO: garbage in, garbage out)

• Frequency: ratio of the number of observations of interest (n_i) to the total number of observations (N)
• Probability (frequentist): frequency of observation i in the limit of a very large number of observations
  • We will almost always use this definition
  • It is EMPIRICAL!
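The frequentist definition can be illustrated with a small simulation. This is only a sketch: the fair six-sided die, the fixed seed, and the function name `relative_frequency` are assumptions made for illustration. It computes n_i / N for the outcome "6" as N grows.

```python
import random

def relative_frequency(outcomes, event):
    """Return n_i / N: the share of observations equal to `event`."""
    n_i = sum(1 for x in outcomes if x == event)
    return n_i / len(outcomes)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n_total in (10, 100, 10_000, 100_000):
    # Assumed process: rolls of a fair six-sided die.
    rolls = [rng.randint(1, 6) for _ in range(n_total)]
    print(n_total, round(relative_frequency(rolls, 6), 4))
```

As N increases, the printed frequency should settle near 1/6, which is what "in the limit of a very large number of observations" refers to.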

Probability

• Belief: a Bayesian's interpretation of probability.
  • The probability of an observation (outcome, event) is a “measure of the state of knowledge” (Jaynes).
  • Bayesian probabilities reflect degree of belief and can be assigned to any statement
  • Beliefs (probabilities) can be updated in light of new evidence (data) via Bayes' theorem.
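Updating a belief via Bayes' theorem can be sketched in a few lines. The prior of 0.5 and the likelihoods 0.9 and 0.2 are made-up numbers, and `bayes_update` is a hypothetical helper name; the point is only that each new piece of evidence moves the degree of belief.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

belief = 0.5  # assumed starting degree of belief in statement H
for step in range(1, 4):
    # Each observation is assumed to be 0.9 likely if H is true, 0.2 likely if not.
    belief = bayes_update(belief, 0.9, 0.2)
    print(step, round(belief, 4))
```

The posterior from one step is used as the prior for the next, so the belief grows with each consistent observation.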

Probability

• Study of relationships in data

• Descriptive Statistics – techniques to summarize data
  • E.g. mean, median, mode, range, standard deviation, stem-and-leaf plots, histograms, box-and-whisker plots, etc.
• Inferential Statistics – techniques to draw conclusions from a given data set taking into account inherent randomness
  • E.g. confidence intervals, hypothesis testing, Bayes' theorem, forecasting, etc.
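A few of the descriptive statistics listed above can be computed with Python's standard `statistics` module. The measurements below are made up for illustration.

```python
import statistics

# Hypothetical repeated measurements of the same quantity.
data = [1.52, 1.49, 1.51, 1.52, 1.50, 1.53, 1.52]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.4f}")
```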

What is Statistics??

• For the Sciences, we ask:
  • Are the differences in measurements characterizing two (or more) objects real or just due to (the characteristic) randomness?

Why do we use statistical tools?

• Furthermore, for the Forensic Sciences we ask:
  • Do two pieces of evidence originate from a common source?

• For this, we must at least answer the above.
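One way to ask whether a difference between two sets of measurements is "real or just due to randomness" is a permutation test. This is a sketch of one possible tool, not necessarily the method used later in these slides. The function name, the data, and the number of shuffles are assumptions. It addresses only the first question (real difference vs. randomness), not the common-source question.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_shuffles=5_000, seed=0):
    """Estimate how often random relabeling gives a mean difference
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the object labels carry no information
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical measurements on two objects.
object_1 = [1.00, 1.10, 0.90, 1.00, 1.05]
object_2 = [2.00, 2.10, 1.90, 2.00, 2.05]
print(permutation_p_value(object_1, object_2))
```

A small value suggests the observed difference is unlikely to arise from randomness alone; a value near 1 suggests the two sets are indistinguishable at this level of measurement.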

• Almost all of statistics is based on a sample drawn from a population.
  • Population: the totality of observations that might occur as a result of repeatedly performing an experiment
  • Why not measure the whole population?
    • Usually impossible
    • Likely wasteful
• Population should be relevant.
  • Part logic
  • Part guess
  • Part philosophy…

Population and Sample

• Sampling:
  • Sample: a few observations that are made from a population
  • Draw members out of a population with some given probability
  • Random sample: if all observations have an equal chance of being made and no observation affects any other
    • Also called an independent and identically distributed (i.i.d.) random sample

• Want a random sample to be representative of the population

• Biased sample if not the case*
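The difference between a representative and a biased sample can be seen in a small simulation. The population (10,000 values drawn from a normal distribution with mean 100 and standard deviation 15) and the biasing rule (keep only the smallest values) are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)

# Assumed population: 10,000 simulated measurements.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Random sample: every member has the same chance of being drawn.
random_sample = rng.sample(population, 200)

# Biased sample: only the 200 smallest values are ever selected.
biased_sample = sorted(population)[:200]

print("population mean:   ", round(mean(population), 2))
print("random sample mean:", round(mean(random_sample), 2))
print("biased sample mean:", round(mean(biased_sample), 2))
```

The random sample's mean should land close to the population mean, while the biased sample's mean falls far below it.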

Population and Sample

• Sample Representations:

Data and Sampling

[Figure: a Population with a Representative Sample; two Population/Sample panels showing Biased Samples]

• Types of sampling:
  • (Simple) Random Sampling
    • Every data item is selected independently of every other.
    • Every member of a population has an equal chance of being selected
  • Systematic Sampling
    • Pick every kth data item to be in the sample
    • Easier to conduct but risks getting a biased sample
  • Stratified Sampling
    • Partition population into disjoint groups (strata) containing specific attributes of a particular category
    • Random sample from the groups
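The three sampling schemes can be sketched as follows. The function names and the example population are hypothetical. The simple random sample here draws without replacement, and the systematic sample starts at a random offset; both are assumptions made for this sketch.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members, each with an equal chance (without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick every k-th item, starting from a random offset below k."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Partition into disjoint strata, then sample randomly within each."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

rng = random.Random(0)
population = list(range(100))  # hypothetical item IDs
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 20, rng))
print(stratified_sample(population, lambda x: x % 2, 3, rng))  # strata: even / odd
```

If the population has a periodic pattern with the same period as k, the systematic sample can pick up only one phase of that pattern, which is one way it becomes biased.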

Data and Sampling

• Parameter: any function of the population
• Statistic: any function of a sample from the population
  • Statistics are used to estimate population parameters
  • Statistics can be biased or unbiased
    • Sample average is an unbiased estimator for population mean
• We may construct distributions for statistics
  • Populations have distributions for observations
  • Samples have distributions for observations and statistics
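Bias in a statistic can be checked by simulation: draw many samples, compute the statistic on each, and compare its average to the population parameter. The sketch below assumes a standard normal population (mean 0, variance 1) and samples of size 5. It compares the sample average with two variance estimators, one dividing by n and one by n - 1; the variance comparison is added here for contrast.

```python
import random

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs, ddof):
    """Sum of squared deviations divided by (n - ddof)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

rng = random.Random(2)
n, trials = 5, 20_000
means, var_div_n, var_div_n_minus_1 = [], [], []
for _ in range(trials):
    # Assumed population: normal with mean 0 and variance 1.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sample_mean(sample))
    var_div_n.append(sample_variance(sample, ddof=0))
    var_div_n_minus_1.append(sample_variance(sample, ddof=1))

print("average of sample means:     ", round(sample_mean(means), 3))
print("average variance (divide n): ", round(sample_mean(var_div_n), 3))
print("average variance (n - 1):    ", round(sample_mean(var_div_n_minus_1), 3))
```

The list `means` is itself an empirical distribution for a statistic. Its average should sit near the population mean of 0, while the divide-by-n variance should average near 0.8 rather than 1, showing a biased statistic.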

Parameters and Statistics

• Univariate Statistics: Statistical tools used to analyze one random variable

Univariate vs. Multivariate Statistics

• Random variable could be raw observation or a statistic

• Common tools are: (univariate) hypothesis testing, ANOVA, linear regression

• Multivariate Statistics: Statistical tools used to analyze many random variables
  • Random variables can also be raw observations (often encountered in chemometrics) or statistics (currently popular in marketing, finance, surface metrology)

• Don't use them if you can see clear differences/similarities in your data and can clearly articulate how in court!
• Do consider them if you can't differentiate, or want to study/search for differences within a well-defined population
  • AND univariate methods don't do the trick:

Why Use Multivariate Statistics?

A linear (or non-linear) combination of many experimental variables (multivariate) may do the trick!
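A toy example of a linear combination doing the trick: two groups whose measurements overlap on each variable taken alone, but separate on a weighted sum of both. The data and the weights (-1, +1) are hand-picked for illustration rather than estimated from data, and the helper names are hypothetical.

```python
def ranges_overlap(u, v):
    """True if the intervals [min(u), max(u)] and [min(v), max(v)] intersect."""
    return max(min(u), min(v)) <= min(max(u), max(v))

def linear_combination(points, weights):
    """Project each (x, y) observation onto a single score w_x * x + w_y * y."""
    return [weights[0] * x + weights[1] * y for x, y in points]

# Hypothetical two-variable measurements on two groups of objects.
group_a = [(1.0, 1.6), (2.0, 2.5), (3.0, 3.6), (4.0, 4.4)]
group_b = [(1.0, 0.5), (2.0, 1.4), (3.0, 2.5), (4.0, 3.6)]

xs_a, ys_a = zip(*group_a)
xs_b, ys_b = zip(*group_b)
print("x alone overlaps:", ranges_overlap(xs_a, xs_b))
print("y alone overlaps:", ranges_overlap(ys_a, ys_b))

weights = (-1.0, 1.0)  # assumed weights: score = y - x
score_a = linear_combination(group_a, weights)
score_b = linear_combination(group_b, weights)
print("combined score overlaps:", ranges_overlap(score_a, score_b))
```

Neither variable separates the groups univariately, but the combined score does.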
