© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 11: Bivariate Relationships: t-test for Comparing the Means of Two Groups

Bivariate Analysis

• Bivariate – or “two variable” – analysis involves searching for statistical relationships between two variables

• A statistical relationship between two variables asserts that the measurements of one variable tend to change consistently with the measurements of the other, making one variable a good predictor of the other

Independent and Dependent Variables

• The predictor variable is the independent variable

• The predicted variable is the dependent variable

Three Approaches to Measuring Statistical Relationships

1. Difference of means testing (Ch. 11 & 12)

2. Counting the frequencies of joint occurrences of attributes of two nominal/ordinal variables (Ch. 13)

3. Measuring the correlation between two interval/ratio variables (Ch. 14 & 15)

Difference of Means Testing

• Compares means of an interval/ratio variable among the categories or groups of a nominal/ordinal variable

• Chapter 11. The two-group difference of means test – for a dependent interval/ratio and an independent dichotomous nominal/ordinal variable

• Chapter 12. Analysis of variance – to test for a difference among three or more group means

Frequencies of Joint Occurrences of Two Nominal Variables

• Chapter 13. Chi-square test – to determine a relationship between two nominal variables

• Web site Chapter Extensions to Chapter 13: Gamma test – to determine a relationship between two ordinal variables

Measuring Correlation

• Chapter 14-15. Correlation – to determine a relationship between two interval/ratio variables

• Web site Extensions to Chapter 15: Rank-order correlation test – to determine a relationship between two numbered ordinal level variables

2-Group Difference of Means Test: Independent Samples (t-test)

• Useful for testing a hypothesis that the means of a variable differ between two populations comprised of different groups of individuals

When to Use an Independent Samples t-test

• Two variables from one population and sample, one interval/ratio and one dichotomous nominal/ordinal

• Or: There are two populations and samples and one interval/ratio variable; the samples are representative of their population

• The interval/ratio variable is typically the dependent variable

• The groups do not consist of the same subjects

• Population variances are assumed equal

Features of an Independent Samples t-test

• The t-test focuses on the computed difference between two sample means and addresses the question of whether the observed difference between the sample means reflects a real difference in the population means or is simply due to sampling error

Features of an Independent Samples t-test (cont.)

• Step 1. Stating the H0:

The mean of population 1 equals the mean of population 2

• That is, there is no difference in the means of the interval/ratio variable, X, for the two populations

Features of an Independent Samples t-test (cont.)

• Step 2. The sampling distribution is the approximately normal t-distribution

• The pooled variance formula for the standard error is used when we can assume that population variances are equal

• The separate variance formula for the standard error is used when we cannot assume that population variances are equal
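
The slides name the two standard error formulas without writing them out. The sketch below uses the forms commonly given for them; whether the textbook uses exactly these forms is an assumption, and the function names and summary statistics are hypothetical.

```python
import math

def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of the difference between means, pooled variance form."""
    # Average the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def separate_standard_error(s1, n1, s2, n2):
    """Standard error of the difference between means, separate variance form."""
    # Each sample contributes its own variance divided by its own sample size.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical summary statistics (not from the slides).
se_pooled = pooled_standard_error(s1=4.0, n1=10, s2=4.0, n2=10)
se_separate = separate_standard_error(s1=4.0, n1=10, s2=4.0, n2=10)
print(round(se_pooled, 4), round(se_separate, 4))
```

With equal sample sizes and equal standard deviations the two forms agree; they diverge as the sample variances and sample sizes become more unequal.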

Features of an Independent Samples t-test (cont.)

• Step 4. The effect is the difference between the sample means

• The test statistic is the effect divided by the standard error

• The p-value is estimated using the t-distribution table
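
As a rough illustration of Step 4, the sketch below computes the effect and the test statistic from hypothetical summary statistics and compares the result with a critical value of the kind read from a t-distribution table. It assumes equal population variances (pooled standard error), a two-tailed test, and alpha = .05; the names and numbers are illustrative only.

```python
import math

# Two-tailed critical t values at alpha = .05, as listed in a standard t-table.
CRITICAL_T_05 = {10: 2.228, 18: 2.101, 20: 2.086}

def t_statistic(mean1, mean2, standard_error):
    """Test statistic: the effect (difference of sample means) over the standard error."""
    effect = mean1 - mean2
    return effect / standard_error

# Hypothetical summary statistics for two independent groups.
mean1, s1, n1 = 25.0, 4.0, 10
mean2, s2, n2 = 20.0, 4.0, 10

# Pooled variance standard error (assumes equal population variances).
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t = t_statistic(mean1, mean2, se)
df = n1 + n2 - 2
reject_h0 = abs(t) > CRITICAL_T_05[df]
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

A table only brackets the p-value (here, p < .05); statistical software would report it exactly.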

Assumption of Equality of Population Variances

• When neither sample variance is more than twice the size of the other, we take this as a sign that the two population variances are equal and assume equality of variances

• We may use the pooled variance estimate of the standard error

• Equality of variances is also termed homogeneity of variances or homoscedasticity
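
The twice-the-size rule of thumb above can be written as a small check. This is a minimal sketch; the function name and the `max_ratio` parameter are hypothetical, and the rule is a screening heuristic rather than a formal test.

```python
def variances_look_equal(var1, var2, max_ratio=2.0):
    """Rule of thumb: assume equal variances if the larger sample variance
    is no more than max_ratio times the smaller one."""
    larger, smaller = max(var1, var2), min(var1, var2)
    if smaller == 0:
        # Avoid dividing by zero; only two zero variances count as equal.
        return larger == 0
    return larger / smaller <= max_ratio

# Hypothetical sample variances.
print(variances_look_equal(16.0, 25.0))  # ratio 1.5625: pooled formula
print(variances_look_equal(9.0, 36.0))   # ratio 4.0: separate formula
```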

Assumption of Equality of Population Variances (cont.)

• Heterogeneity of variances, or heteroscedasticity, is when variances of the two populations appear unequal

• Here we use the separate variance estimate of the standard error and calculate degrees of freedom differently
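
The slides do not say how degrees of freedom are calculated in the separate variance case. One widely used choice is the Welch-Satterthwaite approximation, sketched below; treating it as the intended method is an assumption, and the textbook may use a different formula.

```python
def welch_degrees_of_freedom(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom
    for a two-group t-test with unequal variances."""
    v1 = s1**2 / n1  # variance of the first sample mean
    v2 = s2**2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical summary statistics.
df_equal = welch_degrees_of_freedom(s1=4.0, n1=10, s2=4.0, n2=10)
df_unequal = welch_degrees_of_freedom(s1=3.0, n1=10, s2=9.0, n2=10)
print(round(df_equal, 2), round(df_unequal, 2))
```

When the variances and sample sizes are equal, the approximation returns n1 + n2 - 2, the pooled value; unequal variances pull it lower, which makes the test more conservative.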

Test for Nonindependent or Matched-Pair Samples

• This is a test of the difference of means between two sets of scores of the same research subjects, such as two questionnaire items or scores measured at two points in time

• This test is especially useful for before-after or test-retest experimental designs

When to Use a Nonindependent Samples t-test

• There is one population with a representative sample from it

• There are two interval/ratio variables with the same score design

• Or: There is a single variable measured twice for the same sample subjects

• There is a target value of the variable (usually zero) to which we may compare the mean of the differences between the two sets of scores

Features of a Nonindependent Samples or Matched-Pair t-test

• Step 1. Stating the H0:

The mean of differences between the scores in a population is equal to zero

Nonindependent Samples or Matched-Pair t-test (cont.)

• Step 2. The sampling distribution is the approximately normal t-distribution

• The standard error is calculated as the standard deviation of differences between scores divided by the square root of n - 1
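
A note on the n - 1 in this standard error: if the standard deviation of the differences is computed with n in its denominator, dividing it by the square root of n - 1 gives the same result as the more familiar form, the sample standard deviation (n - 1 denominator) divided by the square root of n. That reading of the slide's convention is an assumption; the sketch below, with hypothetical before/after scores, only shows that the two forms agree.

```python
import math
import statistics

def standard_error_of_differences(before, after):
    """Standard deviation of the paired differences (divisor n)
    divided by the square root of n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sum_sq = sum((d - mean_diff) ** 2 for d in diffs)
    sd_divisor_n = math.sqrt(sum_sq / n)
    return sd_divisor_n / math.sqrt(n - 1)

# Hypothetical scores for the same five subjects at two points in time.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 13]

se = standard_error_of_differences(before, after)

# The familiar form: sample standard deviation (divisor n - 1) over sqrt(n).
diffs = [a - b for a, b in zip(after, before)]
se_familiar = statistics.stdev(diffs) / math.sqrt(len(diffs))
print(round(se, 5), round(se_familiar, 5))
```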

Nonindependent Samples or Matched-Pair t-test (cont.)

• Step 4. The effect is the mean of differences between scores

• The test statistic is the effect divided by the standard error

• The p-value is estimated using the t-distribution table
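
Step 4 of the matched-pair test can be sketched as follows. The scores are hypothetical, the standard deviation of the differences is assumed to be computed with n in its denominator (so that dividing by the square root of n - 1 is appropriate), and the critical value is the two-tailed alpha = .05 entry for 4 degrees of freedom from a standard t-table.

```python
import math
import statistics

# Hypothetical before/after scores for the same five subjects.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 13]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

effect = statistics.mean(diffs)  # mean of the differences
# pstdev uses divisor n, so the standard error divides by sqrt(n - 1).
se = statistics.pstdev(diffs) / math.sqrt(n - 1)
t = effect / se  # H0 target value is zero
df = n - 1

CRITICAL_T_05_DF4 = 2.776  # two-tailed, alpha = .05, df = 4
reject_h0 = abs(t) > CRITICAL_T_05_DF4
print(f"effect = {effect:.2f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```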

Distinguishing Between Practical and Statistical Significance

• A hypothesis test determines significance in terms of likely sampling error – whether a sample difference is so large that there probably is a difference in the populations

• Practical significance is an issue of substance. A statistically significant difference may not be practically significant

Practical and Statistical Significance (cont.)

• E.g., a hypothesis test reveals a statistically significant difference in the mean number of personal holidays taken by men and women in a corporation: women average 0.1 days per year more. At the 95% level of confidence, the test tells us that the difference observed in the samples reflects a real difference between the population means

• However, is one-tenth day per year meaningful? Might such a small statistical effect be accounted for by some other variable?
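
One reason a tiny effect can reach statistical significance is sample size: the standard error shrinks as n grows, so the same 0.1-day difference yields a larger test statistic. The sketch below uses hypothetical values (a standard deviation of 1.0 day in both groups, equal group sizes) and the large-sample critical value of 1.96 as a rough benchmark.

```python
import math

def two_group_t(mean_diff, sd, n_per_group):
    """t statistic for two equal-sized groups that share the same standard deviation."""
    se = sd * math.sqrt(2 / n_per_group)
    return mean_diff / se

# The same 0.1-day difference tested at three hypothetical sample sizes.
for n in (50, 500, 5000):
    t = two_group_t(mean_diff=0.1, sd=1.0, n_per_group=n)
    # 1.96 is the large-sample two-tailed critical value at alpha = .05.
    print(f"n per group = {n:5d}  t = {t:.2f}  significant: {abs(t) > 1.96}")
```

Only the largest sample makes the difference statistically significant, yet the effect is one-tenth of a day in every case.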

Four Aspects of Statistical Relationships

• When examining a relationship between two variables, we can address four things: existence, direction, strength, and practical applications

• These four aspects provide a checklist for what to say in writing up the results of a hypothesis test

Existence of a Relationship

• Existence: On the basis of statistical analysis of a sample, can we conclude that a relationship exists between two variables among all subjects in the population?

• Established by rejection of the H0

• Testing for the existence of a relationship is the first step in any analysis. If a relationship is found not to exist, the other three aspects of a relationship are irrelevant

Direction of a Relationship

• Direction: Can the dependent variable be expected to increase or decrease as the independent variable increases?

• Direction is stated in the alternative hypothesis (HA) of step 1 of the six steps of statistical inference

Strength of a Relationship

• Strength: To what extent are errors reduced in predicting the scores of a dependent variable when an independent variable is used as a predictor?

Practical Applications of a Relationship

• Practical Applications: In practical, everyday terms, how does knowledge of a relationship between two variables help us understand and predict outcomes of the dependent variable?

Existence of a Relationship for 2-Group Difference of Means Test

• Existence: Established by using independent samples or nonindependent samples t-test

• When the H0 is rejected, a relationship exists

Direction of a Relationship for 2-Group Difference of Means Test

• For the two-group tests, direction and strength are not relevant

• Direction: Not relevant

• Strength: Not relevant

Practical Applications of Relationship for a 2-Group Difference of Means Test

• Practical Applications: Describe the effect of the test in everyday terms, where the effect of the independent variable on the dependent variable is the difference between sample means

Statistical Follies

• Avoid a common tendency: Difference in means testing is so widely used that researchers often focus too heavily on mean differences while ignoring the differences in variances (or standard deviations)
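
To see why the spread deserves a look alongside the means, consider two hypothetical groups with identical means but very different standard deviations. A difference of means test would find nothing here, even though the groups clearly differ.

```python
import statistics

# Hypothetical scores: same mean, very different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [30, 40, 50, 60, 70]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)

print(f"means: {mean_a} vs {mean_b}")
print(f"standard deviations: {sd_a:.2f} vs {sd_b:.2f}")
```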