View
229
Download
0
Embed Size (px)
Citation preview
THE CONCEPT OF STATISTICAL SIGNIFICANCE:CHI-SQUARE AND THE NULL
HYPOTHESIS
READINGS
• Pollock, Essentials, ch. 5 and ch. 6, pp. 121-135
• Pollock, SPSS Companion, ch. 7
OUTLINE
1. Strategies for Sampling
2. Establishing Confidence Intervals
3. Chi-Square and the Null Hypothesis
4. Critical Values of Chi-Square
Why Sample?
•Goal: description of a population
•Advantages: savings of time and money
•Basic paradox: credibility of results from a sample depends on size and quality of the sample itself, and not on the size of the population
Types of Samples
Probability sampling: Every individual in the populationhas a known probability of being included in the sample
Random sample (SRS): each individual has an equal chanceof being selected, and all combinations are equally possible
Systematic sample: every kth individual—more or lessequivalent to SRS if first selection is made through random process
Stratified sample: individuals separated into categories, and independent (SRS) samples selected within the categories
Cluster sample: population divided into clusters, and random sample (SRS) then drawn of the clusters
Parameters and Statistics
A parameter is a number that describes the population. It isa fixed number, though we do not know its value.
A statistic is a number that describes a sample. We use statistics to estimate unknown parameters.
A goal of statistics: To estimate the probability that the null hypothesis holds true for the population. Forms:
(a) Parameter may not fall within a confidence band that can be placed around a sample statistic, or
(b) A relationship observed within a sample may not have a satisfactory probability of existing within the population.
Problems with Sampling (I)
1. Bias:
• A consistent, repeated deviation of the sample statistic from the population parameter• Convenience sampling• Voluntary response sampling • Solution: Use SRS
2. Variation:
• Signal: large standard deviation within sample• Range of sample statistics• Solution: Use larger N
Problems in Sampling (II)
Ho for SampleAccepted Rejected
Ho for Population
True Type I
False Type II
Where Ho = null hypothesis
What is Chi-square?
A measure of “significance” for cross-tabular relationships
Where fo = “observed frequency” (or cell count)
And fe = “expected frequency” (or cell count)
X2 = Σ (fo – fe)2/fe
Calculating Expected Frequencies:
fe = col Σ (row Σ/total N)
for upper left-hand cell
= 802 (200/1,679)
= 95.5
fo = 44
fo – fe = 44 – 95.5 = -51.5 (fo – fe)2 = 2,652.25
(fo – fe)2/fe = 27.77
• Expected frequencies represent the “null hypothesis” (no relationship)
• Observed frequencies present visible results• Question 1: Are observed frequencies different from
expected frequencies?• Question 2: Are they sufficiently different to allow us
to reject the possibility that the true relationship (within the universe of case) is null?
Conceptualizing Chi-Square
Figuring Degrees of Freedom:
df = (r – 1)(c – 1)
Illustration: Given marginal values,
________X__________Y__ L H Σ
L 30 50
H 50
Σ 60 40 100 and df = 1
Characteristics of Chi-Square
• Distribution for null hypothesis has a known distribution—skewed to the right
• Specific distributions have corresponding degrees of freedom, defined as (r-1)(c-1)
• For a 2x2 table, chi-square of 3.841 or greater would occur no more than 5% of the time in event of null hypothesis (thus, “.05 level or better”)
POSTSCRIPT
• X2 = f (strength of relationship, sample size)
• The stronger the observed relationship within the sample, the higher the X2
• The larger the sample (SRS), the higher the X2
• The higher the X2 (given degrees of freedom), the greater the probability that null hypothesis does not hold in the population (p < .05)
Limitations of Chi-Square
• No more than 20% of expected frequencies less than 5 and all individual expected frequencies are 1 or greater
• Directly proportional to N observations• Rejection of null hypothesis does not
directly confirm strength or direction of relationship
• Lambda-b PRE, ranges from zero to unity; measures strength only
• Gamma Form and strength (-1 to +1), based on “pairs” of observations
• Chi-square Significance, based on deviation from “null hypothesis”
Review: Summary Measures for Cross-Tabulations