ernest-sparks
Final review - statistics, Spring 03
Also, see final review - research design
Statistics
Descriptive Statistics
Statistics to summarize and describe the data we collected
Inferential Statistics
Statistics to make inferences from samples to the populations
A summary of your data
Center / Central Tendencies
Indicates a central value for the variable
Measures of Dispersion (Variability / Spread)
Indicate how much participants’ scores vary from one another
Measures of Association
Indicate how much variables go together
(Shown in Tables, Graphs, Distributions)
Measures of Center
Mode: The value with the highest frequency (the most common value)
Median: The “middle” score
Mean: The average
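As a minimal sketch of the three measures of center, the snippet below uses Python's standard `statistics` module. The `scores` list is hypothetical and is not taken from this review.

```python
import statistics

# Hypothetical scores, used only for illustration
scores = [4, 5, 6, 6, 7, 7, 7, 8, 9]

mode = statistics.mode(scores)      # the value with the highest frequency
median = statistics.median(scores)  # the middle score once the scores are sorted
mean = statistics.mean(scores)      # the sum of the scores divided by their count

print(mode, median, mean)
```

The mode and median only need counting and ordering, while the mean needs arithmetic on the values themselves.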
WHY are LEVELS / SCALE of MEASUREMENT IMPORTANT?
Because you need to match the statistic you use to the kind of variable you have
Measures of Central Tendency (Center)

Nominal    Ordinal    Interval/Ratio
Mode       Mode       Mode
           Median     Median
                      Mean
Summary
(Figure: Level of Measurement chart. The levels Ratio, Interval, Ordinal, and Nominal are compared on the properties Difference, Order, Equal Interval, Meaningful Zero, and Calculate Math; the axis label reads "Info of difference among values".)
Why “Equal Distance” Matters?
If the distances between values are equal (as in interval or ratio data), you are able to calculate with the values (add, subtract, multiply, divide)
You can get a mean only for interval/ratio variables
A wider variety of statistical tests is available for interval/ratio variables
(Figure: a distribution of scores plotted on an axis from 4 to 10)
What are the Mean, Median, and Mode for this distribution?
What is this distribution shape called?
Types of Measures of Dispersion (Variability / Spread)
Frequencies / Percentages
Range: The distance between the highest score and the lowest score (highest – lowest)
Standard deviation / Variance
Variance / Standard Deviation
Variance (S-squared): An approximate average of the squared deviations from the mean
Standard Deviation (S or SD): Square root of the variance
The larger the variance / SD, the more the scores vary, that is, the more widely they are spread around the mean.
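The definitions above can be sketched step by step. This sketch assumes the sample formula, which divides the sum of squared deviations by n - 1 rather than n (one reading of "approximate average"); the `scores` list is hypothetical.

```python
import statistics

# Hypothetical scores, used only for illustration
scores = [2, 4, 4, 5, 7, 8]

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]

# Assumption: sample variance, dividing by n - 1
variance = sum(squared_deviations) / (len(scores) - 1)
sd = variance ** 0.5                     # standard deviation = square root of variance
score_range = max(scores) - min(scores)  # highest score minus lowest score

print(variance, sd, score_range)
# The hand calculation agrees with the standard library's sample variance
print(statistics.variance(scores))
```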
Measures of Dispersion

Nominal         Ordinal         Interval/Ratio
Frequency, %    Frequency, %    Frequency, %
                Range, IQR      Range, IQR
                                Standard Deviation, Variance
CORRELATION
Co-relation: 2 variables tend to “go together”
Indicates how strongly and in which direction two variables are correlated with each other
*** Correlation does NOT EQUAL cause
SIGN
0: No systematic relationship
• Positive correlation: As one variable increases, so does the 2nd
• Negative correlation: As one variable increases, the 2nd gets smaller
Correlation Coefficient
-1 (perfect negative) ... 0 (none) ... +1 (perfect positive)
The relationship is stronger toward -1 or +1 and weaker toward 0
SIZE
Ranges from –1 to +1
0 or close to 0 indicates NO relationship
+/- .2 - .4 weak
+/- .4 - .6 moderate
+/- .6 - .8 strong
+/- .8 - .9 very strong
+/- 1.00 perfect
Negative relationships are NOT weaker!
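A rough sketch of how sign and size can be read off a coefficient: `pearson_r` computes Pearson's r from its textbook definition, and `describe_size` applies the size labels listed above. Two choices here are assumptions rather than part of the review: values below .2 are labeled "no relationship", and values from .8 up to (but not including) 1 are labeled "very strong". The data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

def describe_size(r):
    """Label the strength of r; the sign only gives the direction."""
    size = abs(r)
    if math.isclose(size, 1.0):
        return "perfect"
    if size >= 0.8:
        return "very strong"  # assumption: covers .8 up to just under 1
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "no relationship"  # assumption: anything below .2

# Hypothetical data: as hours of practice go up, errors go down
hours = [1, 2, 3, 4, 5]
errors = [9, 8, 6, 5, 2]

r = pearson_r(hours, errors)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(r, 3), direction, describe_size(r))
```

Because `describe_size` works on the absolute value, a negative coefficient gets the same strength label as a positive one of the same size.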
Significance Test
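The correlation coefficient also comes with a significance test (p-value)
p = .05: if there were no correlation in the population, a sample correlation at least this strong would occur by chance with probability .05; rejecting H0 at this level carries a 5% risk of TYPE I Error (95% confidence level)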
If p<.05, reject H0 and support Ha at 95% confidence level
1. Infer characteristics of a population from the characteristics of the samples.
2. Hypothesis Testing
3. Statistical Significance
4. The Decision Matrix
Sample Statistics: X, SD, n
Population Parameters: N
(Figure: the sample statistics are used to infer the population parameters)
Inferential Statistics
Assess: are the sample statistics indicators of the population parameters?
Differences between 2 groups: did they happen by chance?
What effect do random sampling errors have on our results?
Random sampling error
Random sampling error: Difference between the sample characteristics and the population characteristics caused by chance
Sampling bias: Difference between the sample characteristics and the population characteristics caused by biased (non-random) sampling
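Random sampling error can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated (10,000 normally distributed scores with mean 50 and SD 10), and the sample size of 30 is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five random samples of 30 and record each sample mean
sample_means = []
for _ in range(5):
    sample = random.sample(population, 30)
    sample_means.append(statistics.mean(sample))

# Random sampling error: sample mean minus population mean
errors = [m - population_mean for m in sample_means]
for m, e in zip(sample_means, errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

Every sample is drawn at random, so the differences from the population mean are due to chance alone, not to sampling bias.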
Probability
Probability (p) ranges between 0 and 1
p = 1 means that the event would occur in every trial
p = 0 means that the event would never occur in any trial
The closer the probability is to 1, the more likely the event will occur
The closer the probability is to 0, the less likely the event will occur
P > .05 means that …
The means of the two groups fall within the central 95% area of a normal distribution around one population mean
(Figure: Mean 1 and Mean 2 both inside the central 95% area)
P < .05 means that …
The means of the two groups do NOT both fall within the central 95% area of a normal distribution around one population mean, so it is more reasonable to assume that they belong to different populations
(Figure: the two group means, labeled 1 and 2)
Null Hypothesis
Says IV has no influence on DV
There is no difference between the two variables.
There is no relationship between the two variables.
Null Hypothesis
States there is NO true difference between the groups
If sample statistics show any difference, it is due to random sampling error
Referred to as H0 (Research Hypothesis = Ha)
If you can reject H0, you can support Ha
If you fail to reject H0, you cannot support Ha
Be conservative.
What are the chances I would get these results if the null hypothesis is true?
Only if the pattern is highly unlikely (p < .05) do you reject the null hypothesis and support your hypothesis
Since you cannot be 100% sure your conclusion is correct, you take up to a 5% risk.
Your p-value tells you the risk (the probability) of making a TYPE I Error
(Figure: decision matrix for choosing a person to marry. The true state, such as “Wrong person to marry”, is crossed with what you think, such as “You think it’s the wrong person to marry”; two cells are Correct, one is a Type I error, and one is a Type II error.)

Ho = null hypothesis = there is NO fire
Ha = alternative hyp. = there IS a FIRE

                        True State
You decide...           Ho (no fire)     Ha (fire)
Accept Ho (no alarm)    Correct          Type II error
Reject Ho (alarm)       Type I error     Correct
Easy ways to LOSE points
Use the word “prove”
Better to say “support the hypothesis” or “consistent with the hypothesis”
Tentative statements acknowledge the possibility of making a Type I or Type II error
Use the word “random” incorrectly
Significance Test
Significance test examines the probability of TYPE I error (falsely rejecting H0)
Significance test examines how probable it is that the observed difference is caused by random sampling error
Reject the null hypothesis if the probability is < .05 (the probability of TYPE I error is smaller than .05)
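The 5% risk can be checked with a simulation in which the null hypothesis is true by construction. This is only a sketch: both groups are drawn from the same simulated population, the two-group t statistic from later in this review is used, and 1.96 is taken as the two-sided 5% cutoff under a normal approximation (an assumption that is reasonable for groups of 50).

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def t_statistic(group_1, group_2):
    """Difference in means divided by its estimated standard error."""
    def mean(xs):
        return sum(xs) / len(xs)
    def sample_variance(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    standard_error = math.sqrt(sample_variance(group_1) / len(group_1)
                               + sample_variance(group_2) / len(group_2))
    return (mean(group_1) - mean(group_2)) / standard_error

trials = 2000
rejections = 0
for _ in range(trials):
    # H0 is true here: both groups come from the same population
    group_1 = [random.gauss(0, 1) for _ in range(50)]
    group_2 = [random.gauss(0, 1) for _ in range(50)]
    # Assumption: |t| > 1.96 stands in for p < .05
    if abs(t_statistic(group_1, group_2)) > 1.96:
        rejections += 1  # a Type I error: H0 rejected although it is true

type_i_rate = rejections / trials
print(f"Share of false rejections: {type_i_rate:.3f}")
```

The share of false rejections should land near .05, which is the risk accepted by using the 5% level.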
Principal Logic
P < .05
Reject Null Hypothesis (H0)
Support Your Hypothesis (Ha)
Logic of Hypothesis Testing
Statistical tests used in hypothesis testing deal with the probability of a particular event occurring by chance.
Is the result a common or a rare occurrence if only chance is operating?
A score (or result of a statistical test) is “Significant” if the score is unlikely to occur on the basis of chance alone.
The “Level of Significance” is a cutoff point for determining significantly rare or unusual scores.
Scores outside the middle 95% of a distribution are considered “Rare” when we adopt the standard “5% Level of Significance”
This level of significance can be written as:
p = .05
Level of Significance
Decision Rules
Reject Ho (accept Ha) when the sample statistic is statistically significant at the chosen p level; otherwise accept Ho (reject Ha).
Possible errors:
A. You reject the Null Hypothesis when in fact it is true, a Type I Error, or Error of Rashness.
B. You accept the Null Hypothesis when in fact it is false, a Type II Error, or Error of Caution.
True state:
  Null is true: data results are by chance
  Null is false: data indicates something is happening
Your decision:
  Reject null: data indicates something significant is happening
  Accept the null: there is nothing happening except chance variation

                   Null is true     Null is false
Reject null        Type I error     Correct
Accept the null    Correct          Type II error
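The decision matrix can also be written as a small lookup. The function name `classify_decision` and its parameters are hypothetical, introduced only for this sketch.

```python
def classify_decision(null_is_true: bool, reject_null: bool) -> str:
    """Return the decision-matrix cell for a true state and a decision."""
    if reject_null and null_is_true:
        return "Type I error"   # rejected a null hypothesis that is true
    if not reject_null and not null_is_true:
        return "Type II error"  # accepted a null hypothesis that is false
    return "Correct"

# Walk through all four cells of the matrix
for null_is_true in (True, False):
    for reject_null in (True, False):
        print(f"null true={null_is_true}, reject={reject_null}: "
              f"{classify_decision(null_is_true, reject_null)}")
```

In the fire alarm example, an alarm with no fire is the Type I error and a fire with no alarm is the Type II error.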
To compare two groups on Mean Scores use a t-test. For more than 2 groups use Analysis of Variance (ANOVA)
Can’t get a mean from nominal or ordinal data.
Chi Square tests the difference in Frequency Distributions of two or more groups.
Parametric Tests
Used with data for which a mean score and standard deviation can be calculated.
t-test, ANOVA and Pearson’s Correlation r.
Use a t-test to compare mean differences between two groups (e.g., male/female and married/single).
Parametric Tests
Use ANalysis Of VAriance (ANOVA) to compare more than two groups (such as age groups or family income groups) to get probability scores for the overall group differences.
Use post hoc tests to identify which subgroups differ significantly from each other.
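The review does not spell out the ANOVA calculation. As a hedged sketch, the usual one-way ANOVA statistic is the F ratio: the variance between group means divided by the variance within groups. The function name `one_way_anova_f` and the three groups of scores are hypothetical.

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA on a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Within-groups sum of squares: how far scores sit from their own group mean
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means)
                    for x in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_anova_f(groups)
print(f_ratio)
```

A large F says only that the groups differ overall; which groups differ from each other is a separate question.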
When comparing two groups on MEAN SCORES use the t-test.
$$ t = \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}} $$
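The t formula can be sketched as a short function: the difference between the two group means, divided by the square root of SD squared over n, summed across the two groups. The function name `t_statistic` and the input numbers are hypothetical and are not taken from the examples in this review.

```python
import math

def t_statistic(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t = (Mean1 - Mean2) / sqrt(SD1^2 / n1 + SD2^2 / n2)."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Hypothetical summary statistics for two groups
t = t_statistic(mean_1=5.0, sd_1=2.0, n_1=50,
                mean_2=4.0, sd_2=2.0, n_2=50)
print(t)
```

With the same difference in means, larger groups or smaller standard deviations shrink the denominator and so produce a larger t.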
T-test
If p<.05, we conclude that the two groups are drawn from populations with different distributions (reject H0) at the 95% confidence level
Our Research Hypothesis: hair length leads to different perceptions of a person.
The Null Hypothesis: there will be no difference between the pictures.
“I think she is one of those people who quickly earns respect.”
Short Hair: Mean = 2.2, SD = 1.9, n = 100
Long Hair: Mean = 4.1, SD = 1.8, n = 100
p = .03
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores reflect just chance differences from a single distribution.
(Figure: the group means 2.2 and 4.1 marked on the score axis, with 3.1 between them)
“In my opinion, she is a mature person.”
Short Hair: Mean = 1.6, SD = 1.7, n = 100
Long Hair: Mean = 3.6, SD = 1.2, n = 100
p = .01
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores reflect just chance differences from a single distribution.
(Figure: the group means 1.6 and 3.6 marked on the score axis, with 2.6 between them)
“I think we are quite similar to one another.”
Short Hair: Mean = 3.7, SD = 1.8, n = 100
Long Hair: Mean = 3.9, SD = 1.5, n = 100
p = .89
Accept Ha: Mean scores come from different distributions.
Accept Ho: Mean scores are just chance differences from a single distribution.
(Figure: the group means 3.7 and 3.9 marked on the score axis, with 3.8 between them)
A nonsignificant result may be caused by a
A. low sample size.
B. very cautious significance level.
C. weak manipulation of independent variables.
D. true null hypothesis.
When to use various statistics
Parametric: Interval or ratio data
Non-parametric: Ordinal and nominal data
Chi-Square (χ²)
Chi Square tests the difference in frequency distributions of two or more groups.
Test of significance for two nominal variables, or for a nominal variable and an ordinal variable
Used with a cross-tabulation table
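As a sketch of how Chi Square works on a cross-tabulation table, the function below compares each observed count with the count expected if the row and column variables were unrelated. The function name `chi_square` and the 2 x 2 table of counts are hypothetical.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical cross-tabulation: rows are two groups, columns are two answers
observed_counts = [[30, 20],
                   [20, 30]]
statistic, degrees_of_freedom = chi_square(observed_counts)
print(statistic, degrees_of_freedom)
# For 1 degree of freedom, the critical value at the .05 level is 3.841
print(statistic > 3.841)
```

Only frequencies enter the calculation, which is why the test suits nominal and ordinal variables that have no mean.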