SW388R6 Data Analysis and Computers I
Slide 1
General Linear Models
The theory of general linear models posits that many statistical tests, including t-tests and ANOVAs, can be expressed and solved as regression analyses.
General linear models become even more useful when our analysis includes both numeric (interval level) and categorical (nominal level) variables, since both can be entered directly into the analysis and SPSS will do any needed dummy coding.
In this example, we will demonstrate the equivalence of regression and ANOVA. We will use the SPSS General Linear Models procedure for a variety of tests in the future.
Slide 2
Homework problems: One-way Analysis of Variance – Specific Relationship Tested
This problem uses the data set GSS2000R.Sav to compare the average score on the variable "highest year of school completed" [educ] for groups of survey respondents defined by the variable "subjective class identification" [class]. Using a one-way analysis of variance and a post hoc test with an alpha of .05, is the following statement true, true with caution, false, or an incorrect application of a statistic?
Survey respondents who said they belonged in the working class completed fewer years of school (M = 12.58, SD = 2.50) than survey respondents who said they belonged in the middle class (M = 13.83, SD = 3.14).
o True
o True with caution
o False
o Incorrect application of a statistic
In the PowerPoint for One-Way ANOVA, we solved this problem, using SPSS’ One-Way ANOVA command.
Applying the theory of general linear models, we will solve this problem with linear regression.
Slide 3
Converting the One-Way ANOVA problem to a Regression problem
To solve this problem with regression, we need to dummy code the independent variable.
Since the problem includes a specific comparison, we need to select the reference group that makes this comparison possible.
Specifically, we will use the working class category as the reference group, so that we can test the difference between the middle class and the working class.
We could just as easily have chosen the middle class as the reference category.
Slide 4
Coding scheme for new variables
The coding scheme for the new variables is shown in the table below.
The class variable contains the four categories in the first column.
We will create three new dichotomous variables: lowerClass, middleClass, and upperClass. Each new variable will have a 1 in the matching category from the original variable and zeros for all of the other categories. The working class category, which is the reference group, has zeros on all three new variables.

                            Coding for New Variables
Original Variable Coding    lowerClass   middleClass   upperClass
1 = lower class                  1            0             0
2 = working class                0            0             0
3 = middle class                 0            1             0
4 = upper class                  0            0             1
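The coding scheme can also be written out as a short Python sketch. This is only an illustration of the logic, not SPSS syntax; the function name dummy_code and the dictionary layout are assumptions made for the example.

```python
# Illustrative sketch of the coding scheme; the names used here are hypothetical.
# Original codes: 1 = lower, 2 = working (reference group), 3 = middle, 4 = upper.
DUMMY_FOR_CODE = {1: "lowerClass", 3: "middleClass", 4: "upperClass"}
VALID_CODES = {1, 2, 3, 4}


def dummy_code(class_code):
    """Return the three dummy variables for one value of the class variable."""
    if class_code not in VALID_CODES:
        raise ValueError(f"unexpected class code: {class_code!r}")
    row = {name: 0 for name in DUMMY_FOR_CODE.values()}
    if class_code in DUMMY_FOR_CODE:
        # Every group except the reference group gets a 1 on its own dummy.
        row[DUMMY_FOR_CODE[class_code]] = 1
    return row


for code in sorted(VALID_CODES):
    print(code, dummy_code(code))
```

The working class (code 2) is the only category with zeros on all three variables, which is what makes it the reference group.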
Slide 5
Using Recoding in SPSS to Create New Variables
Select the Recode > Into Different Variables command from the Transform menu.
Slide 6
Creating the lowerClass variable
First, select the variable to be dummy-coded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box.
Second, type in the name for the new variable.
Third, click on the Change button to replace the ? with this new variable name.
Slide 7
Assigning values to new variable
Next, click on the Old and New Values button to assign values to the new variable.
Slide 8
Preserving missing values
First, mark the System- or user-missing option button on the Old Value panel.
Second, mark the System-missing option button on the New Value panel.
Third, click on the Add button to include this recoding for the variable.
If we forget to explicitly assign missing values, cases with missing data will be recoded with a 0 and become part of the reference group.
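The effect of skipping this step can be seen in a small Python sketch. It assumes that missing data are represented by None, and the two function names are hypothetical stand-ins for the two ways of setting up the recode.

```python
# Hypothetical sketch: None stands in for a system- or user-missing value.

def recode_without_missing_rule(class_code):
    """Only '1 -> 1' and 'all other values -> 0' are specified."""
    return 1 if class_code == 1 else 0


def recode_with_missing_rule(class_code):
    """Missing values are mapped to missing before the other rules apply."""
    if class_code is None:
        return None
    return 1 if class_code == 1 else 0


codes = [1, 2, None, 3]
print([recode_without_missing_rule(c) for c in codes])  # [1, 0, 0, 0]
print([recode_with_missing_rule(c) for c in codes])     # [1, 0, None, 0]
```

In the first result the missing case has silently received a 0 and would be counted with the reference group; in the second it stays missing.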
Slide 9
Coding the lowerClass category
First, to recode the 1 = lower class category to the dummy variable, mark the Value option button and type a 1 in the text box on the Old Value panel.
Second, mark the Value option button and type a 1 in the text box on the New Value panel. This coding says: if they were originally in the lower class category, they are assigned a value of 1 for the lowerClass dummy variable.
Third, click on the Add button to include this recoding for the variable.
Slide 10
Coding the other categories
First, to identify subjects in the categories other than lower class, mark the All other values option button on the Old Value panel.
Second, mark the Value option button and type a 0 in the text box on the New Value panel. This coding says: if they were originally NOT in the lower class category, they are assigned a value of 0 for the lowerClass dummy variable.
Third, click on the Add button to include this recoding for the variable.
Slide 11
Completing the recoding
When we have completed the coding for the new variable, click on the Continue button.
Slide 12
Completing the lowerClass variable
Click on the OK button to create the new variable in the data editor.
Slide 13
Dummy variable coding for middleClass variable
Following the same steps, we create the dummy variable for subjects who were 3 = middle class on the original class variable.
The coding is similar to that for the lowerClass variable, except that the category originally coded 3 = middle class is translated into a 1 on the new variable.
Slide 14
Dummy variable coding for upperClass variable
Following the same steps, we create the dummy variable for subjects who were 4 = upper class on the original class variable.
The coding is similar to that for the lowerClass variable, except that the category originally coded 4 = upper class is translated into a 1 on the new variable.
Slide 15
Dummy-coded variables for class - 1
Subjects with a code value of 2 on the original class variable now have a 0 for all the new variables.
Subjects with a code value of 3 on the original class variable now have a 1 for middleClass and a 0 for the other new variables.
Slide 16
Dummy-coded variables for class - 2
Subjects with a code value of 4 on the original class variable now have a 1 for upperClass and a 0 for the other new variables.
Subjects with a code value of 1 on the original class variable now have a 1 for lowerClass and a 0 for the other new variables.
Since it is very easy to make a mistake in recoding, it is imperative that we check the results of our recoding.
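One way to make this check systematic is to tabulate every combination of the original code and the new variables and compare it with the coding scheme. The Python sketch below does this for a handful of made-up cases; the data and the variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical cases: (class, lowerClass, middleClass, upperClass).
cases = [
    (1, 1, 0, 0),
    (2, 0, 0, 0),
    (3, 0, 1, 0),
    (3, 0, 1, 0),
    (4, 0, 0, 1),
]

# The coding scheme: original code -> expected dummy pattern.
EXPECTED = {1: (1, 0, 0), 2: (0, 0, 0), 3: (0, 1, 0), 4: (0, 0, 1)}

# Count every combination of original code and dummy pattern that occurs.
patterns = Counter((case[0], case[1:]) for case in cases)

# Any combination that differs from the coding scheme signals a recoding error.
errors = {
    combo: count
    for combo, count in patterns.items()
    if EXPECTED[combo[0]] != combo[1]
}

for (code, dummies), count in sorted(patterns.items()):
    print(code, dummies, count)
print("recoding errors:", errors)
```

An empty set of errors means every case has the pattern of zeros and ones that the coding scheme calls for.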
Slide 17
Regression of education on class variables - 1
Select the Regression > Linear command from the Analyze menu.
Slide 18
Regression of education on class variables - 2
First, we move the dependent variable to the Dependent Variable text box.
Second, we move the three dummy-coded variables to the list of Independents.
Third, click on the OK button to produce the output.
Slide 19
Results of regression of education on class variables – overall relationship
The overall relationship is statistically significant (F(3, 264) = 4.97, p < .01).
Slide 20
Comparison to One-way ANOVA of education by class – overall relationship
The overall relationship is statistically significant (F(3, 264) = 4.97, p < .01).
Moreover, all of the statistical values in the ANOVA table are identical to the results from regression.
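The equivalence can be checked numerically outside SPSS. The sketch below is a minimal illustration in plain Python: it uses a small set of made-up scores (not the GSS2000R data), fits the dummy-variable regression by solving the normal equations with a hand-written helper named solve, and computes the one-way ANOVA F ratio separately from the group means. The data and all names are assumptions made for the example.

```python
# Hypothetical data (not GSS2000R): years of school keyed by class code.
data = {1: [10, 12, 11], 2: [12, 13, 11, 12], 3: [14, 13, 15, 14], 4: [16, 15, 17]}


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix: constant, lowerClass, middleClass, upperClass (working = reference).
rows, y = [], []
for code, scores in data.items():
    for score in scores:
        rows.append([1, int(code == 1), int(code == 3), int(code == 4)])
        y.append(score)

# Least-squares coefficients from the normal equations (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(4)]
b = solve(xtx, xty)

n, k = len(y), len(data)
grand_mean = sum(y) / n

# Regression F: mean square regression over mean square residual.
fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
ss_regression = sum((f - grand_mean) ** 2 for f in fitted)
ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
f_regression = (ss_regression / (k - 1)) / (ss_residual / (n - k))

# One-way ANOVA F: mean square between groups over mean square within groups.
means = {code: sum(s) / len(s) for code, s in data.items()}
ss_between = sum(len(s) * (means[code] - grand_mean) ** 2 for code, s in data.items())
ss_within = sum((v - means[code]) ** 2 for code, s in data.items() for v in s)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print(round(f_regression, 4), round(f_anova, 4))
```

The two F ratios agree because the fitted values from the dummy-variable regression are simply the group means, so the regression and residual sums of squares coincide with the between-groups and within-groups sums of squares.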
Slide 21
Results of regression of education on class variables – individual relationships
The tests of individual relationships compare each group to the reference group.
The difference between the middle class group and the working class group is statistically significant.
Slide 22
Results of regression of education on class variables – individual relationships
Subjects in the middle class had, on average, 1.249 more years of education than the working class.
B coefficients are interpreted as the increase or decrease in the estimate of the dependent variable associated with the change from the reference group to the dummy-coded group.
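This interpretation can be checked against the means reported in the problem statement. The short Python sketch below assumes that the regression constant equals the mean of the reference group (12.58 for the working class), which is what dummy coding with a reference group produces; the function name predicted_educ is hypothetical.

```python
# Values reported in the slides, rounded as published.
intercept = 12.58   # assumed constant: mean of the reference group (working class)
b_middle = 1.249    # B coefficient for the middleClass dummy variable


def predicted_educ(middle_class):
    """Estimated years of school for middleClass = 0 or 1 (other dummies at 0)."""
    return intercept + b_middle * middle_class


print(round(predicted_educ(0), 2))  # 12.58
print(round(predicted_educ(1), 2))  # 13.83
```

Moving from the reference group to the middle class raises the estimate by the B coefficient, reproducing the middle class mean of 13.83 given in the problem.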
Slide 23
Comparison to One-way ANOVA of education by class – individual relationship
In the post hoc test, the difference between the middle class and the working class was also 1.249 years of education, and was a statistically significant relationship.
Slide 24
Comparison to One-way ANOVA of education by class – individual relationship
However, the calculations for the post hoc test are completely different from those for the test of the b coefficient in the regression, which is reasonable since they are very different tests. The test of the b coefficient tests the null hypothesis that b is equal to 0 against the alternative that it is not.
Post hoc tests are not hypothesis tests. The only hypothesis tested in the One-Way ANOVA was that at least one of the group means was different from the others. The post hoc test provided additional information about the differences, but it is not a hypothesis test because no hypothesis was specified in advance of the statistical calculations.
The significance of the test of the b coefficient was .001, while the significance of the post hoc test was .005.
In this example we would make a similar interpretation from either test, but that is not always the case.
Slide 25
Using linear contrasts to test specific group hypotheses - 1
It is possible to include a hypothesis test of differences between specific groups within the one-way ANOVA, using linear contrasts.
Using the notation from the text, we would specify the linear contrast as the difference between the working class and the middle class. Since the problem indicated that middle class respondents had more education than working class respondents, we would write the contrast as:
$l = \mu_{\text{middle class}} - \mu_{\text{working class}}$
where $l$ is a linear contrast and the $\mu$'s are group means.
Slide 26
Using linear contrasts to test specific group hypotheses - 2
If we explicitly include coefficients for the population means, the contrast equation
$l = \mu_{\text{middle class}} - \mu_{\text{working class}}$
becomes
$l = +1 \times \mu_{\text{middle class}} - 1 \times \mu_{\text{working class}}$
and if we add in the means for the other groups:
$l = +1 \times \mu_{\text{middle class}} - 1 \times \mu_{\text{working class}} + 0 \times \mu_{\text{lower class}} + 0 \times \mu_{\text{upper class}}$
which is the contrast we will enter into SPSS.
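The arithmetic behind a contrast test can be sketched in Python. The example below uses made-up scores (not the GSS2000R data) and the usual equal-variance formula, in which the standard error of the contrast is based on the pooled within-group mean square; the data and variable names are assumptions for illustration.

```python
from math import sqrt

# Hypothetical data (not GSS2000R): years of school keyed by class code.
data = {1: [10, 12, 11], 2: [12, 13, 11, 12], 3: [14, 13, 15, 14], 4: [16, 15, 17]}

# Contrast coefficients in code order: lower, working, middle, upper.
coefficients = {1: 0, 2: -1, 3: +1, 4: 0}

means = {code: sum(s) / len(s) for code, s in data.items()}
n_total = sum(len(s) for s in data.values())
df_within = n_total - len(data)

# Pooled within-group variance (the mean square within from the ANOVA table).
ss_within = sum((v - means[code]) ** 2 for code, s in data.items() for v in s)
ms_within = ss_within / df_within

# Estimate of the contrast: the weighted sum of the sample means.
estimate = sum(coefficients[code] * means[code] for code in data)

# Standard error and t statistic, assuming equal variances across groups.
std_error = sqrt(ms_within * sum(coefficients[c] ** 2 / len(data[c]) for c in data))
t_value = estimate / std_error

print(estimate, df_within, round(t_value, 3))
```

Because the coefficients for the lower and upper classes are 0, only the middle class and working class means enter the estimate, while all four groups still contribute to the pooled error term.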
Slide 27
Testing a hypothesis comparing groups within One-Way ANOVA - 1
Select the Compare Means > One-Way ANOVA command from the Analyze menu.
Slide 28
Testing a hypothesis comparing groups within One-Way ANOVA - 2
First, move the dependent variable educ and the independent variable class into the list boxes.
Second, click on the Contrasts button to add the linear contrast.
Slide 29
Testing a hypothesis comparing groups within One-Way ANOVA - 3
The contrasts must be entered in the same order that the variable is coded, i.e. from low to high codes for categories.
The contrast coefficients are:
• 0 for lower class
• -1 for working class
• +1 for middle class
• 0 for upper class
First, type the contrast coefficient for the lower class group, 0, into the Coefficients text box.
Second, click on the Add button to add the coefficient to the list box.
Slide 30
Testing a hypothesis comparing groups within One-Way ANOVA - 4
Add the contrast coefficients for the working class (-1), the middle class (+1), and the upper class (0) to the list box.
Click on the Continue button to close the dialog box.
Slide 31
Testing a hypothesis comparing groups within One-Way ANOVA - 5
Click on the OK button to request the output.
Slide 32
Testing a hypothesis comparing groups within One-Way ANOVA - 6
The value and significance of the F-test are identical to the results obtained in the regression, as well as in the one-way ANOVA with the post hoc tests.
Moreover, the results for the contrast test match the test of the b coefficient in the regression analysis (t(264) = 3.372, p < .01).
Slide 33
SPSS’ general linear models procedure
SPSS has a command for directly computing general linear models that is much more versatile than the regression command that we just used. The procedure contains options and diagnostic statistics that are not available in the linear regression command.
The default for group comparisons with this command is to compute contrasts with the group that has the highest numeric code. Since we want the comparison to be with the working class group, we will first change the numeric code for that group from 2 to 5 so that it is the highest numeric value.
Slide 34
Recoding the class variable - 1
To change the numeric coding for the working class category so that it is the highest numeric value, we again select the Recode > Into Different Variables command from the Transform menu.
Slide 35
Recoding the class variable - 2
First, select the variable to be recoded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box.
Second, type in the name for the new variable.
Third, click on the Change button to replace the ? with this new variable name.
Slide 36
Recoding the class variable - 3
Next, click on the Old and New Values button to assign values to the new variable.
Slide 37
Recoding the class variable - 4
First, mark the System- or user-missing option button on the Old Value panel.
Second, mark the System-missing option button on the New Value panel.
Third, click on the Add button to include this recoding for the variable.
Slide 38
Recoding the class variable - 5
First, to recode the 2 = working class category in the new variable, mark the Value option button and type a 2 in the text box on the Old Value panel.
Second, mark the Value option button and type a 5 in the text box on the New Value panel. This coding says: if they were originally in the working class category, they are assigned a value of 5 for the new variable.
Third, click on the Add button to include this recoding for the variable.
Slide 39
Recoding the class variable - 5
First, since we want all of the other codes to remain the same, we click on the All other values option button.
Second, mark the Copy old values option button to retain the codes for the remaining groups.
Third, click on the Add button to include this recoding for the variable.
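Taken together, the three recoding rules (missing stays missing, 2 becomes 5, everything else is copied) amount to the following Python sketch. It is an illustration of the logic rather than SPSS syntax; None stands in for a missing value and the function name recode_class is hypothetical.

```python
# Hypothetical sketch of the recode: None stands in for a missing value.

def recode_class(class_code):
    """Move working class (2) to code 5; keep every other code unchanged."""
    if class_code is None:
        return None          # missing values remain missing
    if class_code == 2:
        return 5             # working class becomes the highest code
    return class_code        # all other values are copied


original = [1, 2, 3, 4, None, 2]
recoded = [recode_class(c) for c in original]
print(recoded)  # [1, 5, 3, 4, None, 5]

# The comparison group is the category with the highest numeric code.
reference_code = max(c for c in recoded if c is not None)
print(reference_code)  # 5
```

After the recode, the working class holds the highest code, so it becomes the group against which the others are compared.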
Slide 40
Recoding the class variable - 6
When we have completed the coding for the new variable, click on the Continue button.
Slide 41
Recoding the class variable - 7
Click on the OK button to create the new variable in the data editor.
Slide 42
Recoding the class variable - 8
We check the values in the data editor to make sure the recode worked as anticipated. In this example, we see that the 2’s for class are correctly recoded as 5’s.
Slide 43
Using SPSS’ general linear models - 1
To solve the problem using SPSS’ General Linear Model command, select General Linear Model > Univariate from the Analyze menu.
The univariate command indicates that we have a single dependent variable.
Slide 44
Using SPSS’ general linear models - 2
First, we move the dependent variable to the Dependent Variable text box.
Second, we move the newly created independent variable to the Fixed Factors list box.
Third, click on the Options button to specify additional output. While the univariate GLM command has numerous specifications, we only need one request for this problem.
Fixed factors are those for which all possible codes are represented in the data set.
Random factors are categorical variables which can take on values different from those in our data set.
Covariates are interval level variables or variables we wish to treat as interval level.
Slide 45
Using SPSS’ general linear models - 3
First, mark the check box for Parameter estimates. This will compute and test the coefficients.
Second, click on the Continue button to close the dialog box.
Slide 46
Using SPSS’ general linear models - 4
Click on the OK button to produce the output.
Slide 47
SPSS’ general linear models output
The value and significance of the F-test are identical to the results obtained in the regression and the one-way ANOVA with the post hoc tests.
Subjects in the middle class (code 3) had, on average, 1.249 more years of education than the working class. The difference is statistically significant and identical to the findings from the other comparisons (t(264) = 3.372, p < .01).