Introduction
Osborn
• Daubert is a benchmark!
  • Daubert (1993): Judges are the “gatekeepers” of scientific evidence.
• Must determine if the science is reliable:
  • Has empirical testing been done?
    • Falsifiability
  • Has the science been subject to peer review?
  • Are there known error rates?
  • Is there general acceptance?
    • Essentially the Frye Standard (1923)
• Federal Government and 26(-ish) States are “Daubert States”
“Legal” Science
• Any time an observation is made, one is making a “measurement”
Measurement and Randomness
1. Experimental error is inherent in every measurement
• Refers to variation in observations between repetitions of the same experiment.
• It is unavoidable and many sources contribute
2. Error in a statistical context is a technical term (BHH)
• Experimental error is a form of randomness
  • Randomness: inherent unpredictability in a process
  • The outcomes of the process follow a probability distribution
Measurement and Randomness
• Statistical tools are used to both:
  • Describe the randomness
  • Make inferences taking into account the randomness
• Careful!
  • Bad data, assumptions and models lead to garbage (GIGO)
• Frequency: ratio of the number of observations of interest (n_i) to the total number of observations (N)
• Probability (frequentist): frequency of observation i in the limit of a very large number of observations
  • We will almost always use this definition
  • It is EMPIRICAL!
Probability
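The frequency definition above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the fair six-sided die, the fixed seed, and the helper name `relative_frequency` are illustrative choices, not part of the original slides.

```python
import random

def relative_frequency(outcomes, target):
    """Frequency n_i / N of `target` among the observed outcomes."""
    return sum(1 for x in outcomes if x == target) / len(outcomes)

# Assumption: a simulated fair die stands in for a real repeated experiment.
rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    # As N grows, the frequency of a six should settle near 1/6.
    print(n, relative_frequency(rolls, 6))
```

With small N the frequency can be far from 1/6; it is only in the limit of many observations that it approaches the probability.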
• Belief: A “Bayesian’s” interpretation of probability.
• An observation (outcome, event) is a “measure of the state of knowledge” (Jaynes).
  • Bayesian probabilities reflect degree of belief and can be assigned to any statement
• Beliefs (probabilities) can be updated in light of new evidence (data) via Bayes’ theorem.
Probability
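Updating a belief via Bayes’ theorem can be sketched as below for a single yes/no statement H. The prior of 0.5 and the likelihoods 0.9 and 0.1 are made-up numbers for illustration, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H), P(E | H) and P(E | not H)."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)  # P(E)
    return numerator / evidence

# Assumption: two independent pieces of evidence, each 9 times more
# likely if H is true than if H is false.
belief = 0.5
for _ in range(2):
    belief = bayes_update(belief, 0.9, 0.1)
    print(round(belief, 4))
```

Each posterior becomes the prior for the next piece of evidence, which is what "updating in light of new data" means in practice.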
• Study of relationships in data
• Descriptive Statistics – techniques to summarize data
  • E.g. mean, median, mode, range, standard deviation, stem and leaf plots, histograms, box and whiskers plots, etc.
• Inferential Statistics – techniques to draw conclusions from a given data set taking into account inherent randomness
  • E.g. confidence intervals, hypothesis testing, Bayes’ theorem, forecasting, etc.
What is Statistics??
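Several of the descriptive statistics listed above can be computed with Python's standard `statistics` module. The measurement values here are hypothetical and serve only to show the calls.

```python
import statistics

# Hypothetical repeated measurements of one quantity (illustrative values only).
measurements = [1.52, 1.49, 1.51, 1.50, 1.53, 1.50, 1.48]

summary = {
    "mean": statistics.mean(measurements),
    "median": statistics.median(measurements),
    "mode": statistics.mode(measurements),
    "range": max(measurements) - min(measurements),
    "stdev": statistics.stdev(measurements),  # sample standard deviation (n - 1)
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```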
• For the Sciences, we ask:
  • Are the differences in measurements characterizing two (or more) objects real or just due to (the characteristic) randomness?
Why do we use statistical tools?
• Furthermore, for the Forensic Sciences we ask:
  • Do two pieces of evidence originate from a common source?
  • For this, we must at least answer the above.
• Almost all of statistics is based on a sample drawn from a population.
  • Population: The totality of observations that might occur as a result of repeatedly performing an experiment
• Why not measure the whole population?
  • Usually impossible
  • Likely wasteful
• Population should be relevant.
  • Part logic
  • Part guess
  • Part philosophy….
Population and Sample
• Sampling:
  • Sample: a few observations that are made from a population
  • Draw members out of a population with some given probability
  • Random sample: if all observations have an equal chance of being made and no observation affects any other
    • Also called independent and identically distributed random sample (I.I.D.)
• Want a random sample to be representative of the population
  • Biased sample if not the case*
Population and Sample
• Sample Representations:
Data and Sampling
[Figure: a population with a representative sample; two populations with biased samples]
• Types of sampling:
  • (Simple) Random Sampling
    • Every data item is selected independently of every other.
    • Every member of a population has an equal chance of being selected
  • Systematic Sampling
    • Pick every kth data item to be in the sample
    • Easier to conduct but risk getting a biased sample
  • Stratified Sampling
    • Partition population into disjoint groups containing specific attributes of a particular category (strata)
    • Random sample from the groups
Data and Sampling
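The three sampling schemes above can be sketched as small functions. The function names, the toy population of 20 numbered items, and the even/odd strata are all hypothetical. Note that `random.Random.sample` draws without replacement, which is one common way to realize simple random sampling.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection (drawn without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Pick every k-th item, beginning at index `start`."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each disjoint group (stratum)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(1)  # fixed seed so the run is repeatable
population = list(range(20))
strata = {"even": [x for x in population if x % 2 == 0],
          "odd": [x for x in population if x % 2 == 1]}

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 4))
print(stratified_sample(strata, 2, rng))
```

The systematic sample here contains only multiples of 4, a small reminder of how a periodic pattern in the data ordering can bias that scheme.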
• Parameter: any function of the population
• Statistic: any function of a sample from the population
  • Statistics are used to estimate population parameters
  • Statistics can be biased or unbiased
    • Sample average is an unbiased estimator for population mean
• We may construct distributions for statistics
  • Populations have distributions for observations
  • Samples have distributions for observations and statistics
Parameters and Statistics
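The claim that the sample average is an unbiased estimator of the population mean, and that a statistic has its own distribution, can be checked numerically. This is a simulation sketch, not a proof; the population of the integers 1 to 100, the sample size of 5, and the 5000 repetitions are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
population = list(range(1, 101))
population_mean = statistics.mean(population)  # the parameter: 50.5

# Each sample mean is one draw from the distribution of the statistic.
sample_means = [statistics.mean(rng.sample(population, 5)) for _ in range(5000)]
mean_of_sample_means = statistics.mean(sample_means)

# Individual sample means scatter widely, but their average sits close
# to the population mean, which is what unbiasedness predicts.
print(population_mean, mean_of_sample_means)
```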
• Univariate Statistics: Statistical tools used to analyze one random variable
Univariate vs. Multivariate Statistics
• Random variable could be raw observation or a statistic
• Common tools are: (univariate) hypothesis testing, ANOVA, linear regression
• Multivariate Statistics: Statistical tools used to analyze many random variables
  • Random variables can also be raw observations (often encountered in chemometrics) or statistics (currently popular in marketing, finance, surface metrology)
• Don’t, if you can see clear differences/similarities in your data and can clearly articulate how in court!
• If you can’t differentiate or want to study/search for differences within a well-defined population
  • AND univariate methods don’t do the trick:
Why Use Multivariate Statistics?
A linear (or non-linear) combination of many experimental variables (multivariate) may do the trick!
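A toy case shows how a linear combination can separate groups that no single variable separates. The two groups and their (x, y) values below are invented for illustration, and the combination y − x is chosen by hand; in practice a multivariate method would have to find a suitable combination from the data.

```python
# Hypothetical two-variable measurements for two groups (made-up values).
group_a = [(1.0, 1.1), (2.0, 2.1), (3.0, 3.1)]
group_b = [(1.1, 1.0), (2.1, 2.0), (3.1, 3.0)]

def value_range(values):
    """Smallest and largest value, to compare how much two groups overlap."""
    return min(values), max(values)

# Univariate view: x alone overlaps heavily between the groups (so does y).
print(value_range([x for x, _ in group_a]), value_range([x for x, _ in group_b]))

# Multivariate view: the linear combination y - x separates them cleanly.
scores_a = [y - x for x, y in group_a]
scores_b = [y - x for x, y in group_b]
print(scores_a, scores_b)
```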