Hypothesis Testing for SPSS
1. T-test: Difference in means
   Tests the statistical significance of the difference in means.
   ex. income by gender, the number of years at work by gender
2. T-test: Difference in proportions
   Tests the statistical significance of the difference in proportions.
   ex. the proportion employed in government jobs by gender
3. Contingency Table/Chi-Square Analysis
   Tests whether all categories contain the same proportion of values by comparing expected and actual values.
   ex. the proportion employed in government jobs by gender
Hypothesis Testing
1. A Research Question
2. The Null Hypothesis
   usually assumes NO difference (2-tailed test)
3. Select Cases
4. T-test or Contingency/Chi-Square Analysis
5. Interpret Test Results
   t-score, significance level, confidence interval, likelihood ratio (for Chi-Square Analysis)
6. “Reject” or “Not reject” the null hypothesis
Hypothesis Testing Procedure
• Research Question: Are there differences in income between male and female graduates and if so, what factors might explain this difference?
1. Is there a difference in average income between male and female graduates?
2. Is there a significant difference in average length of time on the job, between male and female graduates?
3. Is there a difference in the proportion employed in government jobs between males and females?
Hypothesis Testing
Research Question: Is there a difference in average income
between male and female graduates?
H0: There is NO difference in average income between male and female graduates
Note: Limit the data to graduates who are employed full-time or self-employed, with income more than $20,000 and less than $400,000.
1. T-test: Difference in Means
Step 1: Data/Select Cases
• Select Data/Select Cases
• In the Select Cases dialogue box, you specify logical expressions to select cases.
  – Select the “If condition is satisfied” option
  – Click on the If… button
Specifying fullself and income range
Type logical expression: fullself = 1 & income > 20000 & income < 400000
to limit cases to alumni who work full-time or are self-employed and make more than $20,000 and less than $400,000.
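The same selection rule can be written out as a small filter. This is a minimal sketch in plain Python, not SPSS syntax: the records are hypothetical stand-ins for the alumni data file, and treating a case with a missing income as not selected is an assumption of the sketch.

```python
# Hypothetical records standing in for the alumni data file; the variable
# names fullself and income follow the Select Cases expression.
alumni = [
    {"id": 1, "fullself": 1, "income": 55000},
    {"id": 2, "fullself": 0, "income": 60000},   # not full-time or self-employed
    {"id": 3, "fullself": 1, "income": 18000},   # income too low
    {"id": 4, "fullself": 1, "income": 400000},  # boundary value is excluded
    {"id": 5, "fullself": 1, "income": None},    # missing income
]


def is_selected(case):
    """Mirror of: fullself = 1 & income > 20000 & income < 400000."""
    income = case["income"]
    if income is None:  # assumption: a missing value means "not selected"
        return False
    return case["fullself"] == 1 and 20000 < income < 400000


selected = [case for case in alumni if is_selected(case)]
print([case["id"] for case in selected])
```

Both inequalities are strict, so a graduate earning exactly $20,000 or exactly $400,000 is left out.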
Step 2: Independent T-Test
Analyze/Compare Means/Independent-Samples T-Test
Test variable: income
Grouping variable: gender(? ?)
Define Groups: Group 1: 1 for female; Group 2: 2 for male
Note: The grouping variable can only have two categories.
After the groups are defined, the grouping variable appears as gender(1 2)
T-test: Results
Using the Unequal Variance model, we REJECT H0 and conclude that there is a significant difference in average income between male and female graduates.
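The unequal-variance t statistic behind this conclusion can be recomputed from the group sizes, means and standard deviations in the Group Statistics output. The following is a minimal sketch in plain Python (the function name `welch_t` is illustrative, not part of SPSS); it uses the Welch-Satterthwaite approximation for the degrees of freedom and the |t| > 1.96 decision rule used on the slides rather than an exact p-value.

```python
import math


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variance (Welch) t statistic from summary statistics."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    se2_sq = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    diff = mean1 - mean2
    se_diff = math.sqrt(se1_sq + se2_sq)
    t = diff / se_diff
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return diff, se_diff, t, df


# Female (group 1) and male (group 2) income, taken from Group Statistics
diff, se_diff, t, df = welch_t(128, 79868.22, 35165.875, 137, 98606.49, 47980.995)
print(round(diff, 2), round(se_diff, 3), round(t, 3), round(df, 3))

# Decision rule used on the slides: reject H0 when |t| > 1.96
print("reject H0" if abs(t) > 1.96 else "cannot reject H0")
```

The printed values can be compared with the "Equal variances not assumed" row of the Independent Samples Test output.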
Group Statistics (Income)
Gender    N     Mean       Std. Deviation   Std. Error Mean
Female    128   79868.22   35165.875        3108.254
Male      137   98606.49   47980.995        4099.293

Independent Samples Test (Income)
Levene's Test for Equality of Variances: F = 10.443, Sig. = .001
t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -3.605   263       .000              -18738.270        5197.537                -28972.4       -8504.190
Equal variances not assumed   -3.642   249.145   .000              -18738.270        5144.458                -28870.4       -8606.100

Reading the output: |t| > 1.96, Sig. (2-tailed) < 0.05, the mean difference is -18,738, and the 95% confidence interval doesn't include 0.
Possible explanation for the difference in income:
Male income is higher because men have been on the job longer than women.
Research Question:
Is there a difference in average length of time on the job (YEARS) between male and female graduates?
H0: There is NO difference in length of time on the job between male and female graduates
1-2. T-test: Difference in Means
Step 2: Independent T-Test
Analyze/Compare Means/Independent-Samples T-Test
Test variable: Years at Current Position [years]
Grouping variable: gender(1 2)
T-test: Results
Using the Unequal Variance model, we REJECT H0 and conclude that there is a significant difference in average length of time on the job between male and female graduates.
Group Statistics (Years at Current Position)
Gender    N     Mean   Std. Deviation   Std. Error Mean
Female    128   4.15   4.315            .381
Male      137   5.90   5.764            .492

Independent Samples Test (Years at Current Position)
Levene's Test for Equality of Variances: F = 13.386, Sig. = .000
t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -2.786   263       .006              -1.752            .629                    -2.991         -.514
Equal variances not assumed   -2.813   251.276   .005              -1.752            .623                    -2.979         -.525

Reading the output: |t| > 1.96, Sig. (2-tailed) < 0.05, the mean difference is -1.752, and the 95% confidence interval does not include 0.
Possible explanation for the difference in income: Male income is higher because more females work for government than males.
Research Question:
Is there a difference in the proportion employed in government jobs between male and female graduates?
H0: There is NO difference in the proportion employed in government jobs between male and female graduates
2. T-test: Difference in Proportions
• Create a new variable GOV that
  – has the value 1 if the EMPLOYER (1-6) indicates the alumnus works for a government organization.
  – has the value 0 if the EMPLOYER is not 1-6.
Two ways to create it:
1. Use Transform/Compute to convert the EMPLOYER variable into a new categorical variable GOV.
2. Use Transform/Recode/Into Different Variables to create a new categorical variable GOV.
Step 1: Create a new variable (GOV)
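The recoding rule itself is short enough to sketch in plain Python. This is an illustration of the rule, not of either SPSS dialog: the function name `recode_gov` and the sample codes are hypothetical, and keeping a missing EMPLOYER value missing (instead of turning it into 0) is an assumption of the sketch.

```python
def recode_gov(employer):
    """Recode an EMPLOYER code into GOV: 1 for codes 1-6, 0 otherwise.

    Assumption of this sketch: a missing EMPLOYER (None) stays missing
    rather than being recoded to 0.
    """
    if employer is None:
        return None
    return 1 if 1 <= employer <= 6 else 0


# Hypothetical EMPLOYER codes; None stands for a system-missing value
employer_codes = [4, 11, 12, 1, None, 6, 7]
gov = [recode_gov(code) for code in employer_codes]
print(gov)
```

Running a frequency table on the new variable afterwards is a quick way to check that the number of 1s equals the combined frequency of the six government categories.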
OUTPUT: Analyze/Descriptive Statistics/Frequencies

Employer
                                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Gov: Federal             8           2.9       2.9             2.9
          Gov: State               12          4.3       4.3             7.2
          Gov: County              13          4.7       4.7             11.9
          Gov: City                45          16.2      16.2            28.2
          Gov: Special Agency      17          6.1       6.1             34.3
          Gov: Non U.S.            3           1.1       1.1             35.4
          Private: Single Person   5           1.8       1.8             37.2
          Private: 2-4 Persons     5           1.8       1.8             39.0
          Private: 5-19 Persons    20          7.2       7.2             46.2
          Private: 20-49 Persons   11          4.0       4.0             50.2
          Private: >= 50 Persons   47          16.9      17.0            67.1
          Non-Profit (U.S.)        51          18.3      18.4            85.6
          International Org.       4           1.4       1.4             87.0
          Educational Inst.        25          9.0       9.0             96.0
          Other                    11          4.0       4.0             100.0
          Total                    277         99.6      100.0
Missing   System                   1           .4
Total                              278         100.0

EMPLOYER codes: 1-6 Government; 7-11 Private. The Missing/System row holds the missing values.
Transform/Recode/Into Different Variables
Select the EMPLOYER variable, type “GOV” as the name of the output variable, click the “Change” button, then click the “Old and New Values” button…
Save the data file!!
Step 2: Create a frequency table for GOV
• Analyze/Descriptive Statistics/Frequencies
Thirty-five percent of the graduates who are employed full-time or self-employed and make more than $20,000 and less than $400,000 work in government jobs.

Government Job
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     No        179         64.4      64.6            64.6
          Yes       98          35.3      35.4            100.0
          Total     277         99.6      100.0
Missing   System    1           .4
Total               278         100.0
Step 3: Independent T-Test
Analyze/Compare Means/Independent-Samples T-Test
Test variable: gov
Grouping variable: gender(1 2)
T-test: Results
Using the Unequal Variance model, we CANNOT REJECT H0 and cannot conclude that there is a significant difference between male and female graduates with respect to the proportion working in the government sector.
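A t-test on a 0/1 variable works as a test of proportions because the mean of a 0/1 variable is the proportion of 1s. The sketch below, in plain Python, rebuilds the group statistics from counts alone: 45 of 127 females and 50 of 137 males hold government jobs (the counts implied by the group means and shown in the crosstabulation). The helper name `group_stats` is illustrative.

```python
import math


def group_stats(successes, n):
    """Mean, sample SD and SE of the mean for a 0/1 variable."""
    p = successes / n  # mean of a 0/1 variable = proportion of 1s
    sd = math.sqrt(p * (1 - p) * n / (n - 1))  # sample SD, n - 1 denominator
    se = sd / math.sqrt(n)
    return p, sd, se


p_f, sd_f, se_f = group_stats(45, 127)  # female graduates
p_m, sd_m, se_m = group_stats(50, 137)  # male graduates

diff = p_f - p_m
se_diff = math.sqrt(se_f ** 2 + se_m ** 2)  # unequal-variance standard error
t = diff / se_diff

print(round(p_f, 4), round(sd_f, 5), round(se_f, 5))
print(round(p_m, 4), round(sd_m, 5), round(se_m, 5))
print(round(diff, 5), round(se_diff, 5), round(t, 3))
print("reject H0" if abs(t) > 1.96 else "cannot reject H0")
```

The three printed rows can be checked against the Group Statistics table and the "Equal variances not assumed" row of the test output.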
Group Statistics (Government Job)
Gender    N     Mean    Std. Deviation   Std. Error Mean
Female    127   .3543   .48020           .04261
Male      137   .3650   .48319           .04128

Independent Samples Test (Government Job)
Levene's Test for Equality of Variances: F = .129, Sig. = .720
t-test for Equality of Means:
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -.179   262       .858              -.01063           .05934                  -.12748        .10622
Equal variances not assumed   -.179   260.726   .858              -.01063           .05933                  -.12746        .10619

Reading the output: |t| < 1.96, Sig. (2-tailed) > 0.05, and the 95% confidence interval includes 0.
3. Contingency Table/Chi-Square Analysis
The same question can be analyzed by a contingency table with GOV and GENDER and testing using the Chi-Square statistic.
H0: There is NO relationship between employment sector and gender.
Analyze/Descriptive statistics/Crosstabs
Counts: Observed
Percentages: Row, Column
Select “gov” for “Row” & “Gender” for “Column.”
Contingency table: Analyze/Descriptive statistics/Crosstabs
Government Job * Gender Crosstabulation
                                                 Gender
                                                 Female   Male     Total
Government Job   No    Count                     82       87       169
                       % within Government Job   48.5%    51.5%    100.0%
                       % within Gender           64.6%    63.5%    64.0%
                 Yes   Count                     45       50       95
                       % within Government Job   47.4%    52.6%    100.0%
                       % within Gender           35.4%    36.5%    36.0%
Total                  Count                     127      137      264
                       % within Government Job   48.1%    51.9%    100.0%
                       % within Gender           100.0%   100.0%   100.0%
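The Pearson chi-square statistic for this crosstabulation can be recomputed from the four observed counts by comparing them with the counts expected under H0. The following is a minimal sketch in plain Python (the function name `pearson_chi_square` is illustrative); the closed-form p-value via `math.erfc` is valid only for 1 degree of freedom, which is the case for a 2x2 table.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square, d.o.f. and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under H0: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Rows = Government Job (No, Yes); columns = Gender (Female, Male)
observed = [[82, 87], [45, 50]]
chi2, dof, expected = pearson_chi_square(observed)

# For 1 d.o.f. only: upper-tail probability of the chi-square distribution
p_value = math.erfc(math.sqrt(chi2 / 2))
min_expected = min(min(row) for row in expected)

print(round(chi2, 3), dof, round(p_value, 3), round(min_expected, 2))
print("reject H0" if chi2 > 1.96 ** 2 else "cannot reject H0")
```

The minimum expected count is printed because the chi-square approximation is usually considered unreliable when expected counts fall below 5, which is what the footnote in the SPSS output reports on.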
Contingency table: Analyze/Descriptive statistics/Crosstabs
Chi-Square value = 0.032 < 3.84 (1.96^2 = cutoff value at the 95% confidence level with 1 df), and Asymp. Sig. = .857 > 0.05.
We CANNOT REJECT the null hypothesis and cannot conclude there is a statistically significant relationship between gender and whether or not a person works for the government.

Chi-Square Tests
                           Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square         .032(b)   1    .857
Continuity Correction(a)   .003      1    .959
Likelihood Ratio           .032      1    .857
Fisher's Exact Test                                               .898                   .480
N of Valid Cases           264
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 45.70.
EMPLOYER codes 7-11: Private
3-2. Contingency Table/Chi-Square Analysis
How about analyzing the difference in the proportion of males and females in the private sector with a contingency table of PRIVATE and GENDER?
H0: There is NO relationship between employment sector and gender.
Step 1: Create a new variable (PRIVATE)
• Create a new variable PRIVATE that
  – has the value 1 if the EMPLOYER (7-11) indicates the alumnus works for a private-sector organization.
  – has the value 0 if the EMPLOYER is not 7-11 (else).
Method 2. Use Transform/Recode/Into Different Variables to create a new categorical variable PRIVATE.
Analyze/Descriptive statistics/Crosstabs
Counts: Observed
Percentages: Row, Column
Select “private” for “Row” & “Gender” for “Column.”
Contingency table: Analyze/Descriptive statistics/Crosstabs
Private Sector Job * Gender Crosstabulation
                                                         Gender
                                                         Female   Male     Total
Private Sector Job   .00    Count                        92       87       179
                            % within Private Sector Job  51.4%    48.6%    100.0%
                            % within Gender              72.4%    63.5%    67.8%
                     1.00   Count                        35       50       85
                            % within Private Sector Job  41.2%    58.8%    100.0%
                            % within Gender              27.6%    36.5%    32.2%
Total                       Count                        127      137      264
                            % within Private Sector Job  48.1%    51.9%    100.0%
                            % within Gender              100.0%   100.0%   100.0%
Contingency table: Analyze/Descriptive statistics/Crosstabs
Chi-Square value = 2.411 < 3.84 (1.96^2), and Asymp. Sig. = .120 > 0.05.
We CANNOT REJECT the null hypothesis and cannot conclude that the difference in the proportion of males and females in the private sector is statistically significant.

Chi-Square Tests
                           Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square         2.411(b)   1    .120
Continuity Correction(a)   2.019      1    .155
Likelihood Ratio           2.422      1    .120
Fisher's Exact Test                                                .147                   .077
N of Valid Cases           264
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 40.89.
The degrees of freedom in the chi-square test
The degrees of freedom in the chi-square test of a contingency table:
d.o.f. = (r-1)*(c-1)
where r and c are the number of rows and columns (the number of categories of the two variables) in the table.
The number of d.o.f. is the number of comparisons between actual and expected frequencies minus the number of restrictions imposed on these frequencies.
Since the number of cells in a contingency table is r*c, there are r*c actual frequencies to be compared with the corresponding expected frequencies. Because the sum (total) of the frequencies in each row and each column is given, there are r+c-1 restrictions: the r row totals and the c column totals add up to the same grand total, so one of the r+c totals is redundant.
Therefore, the number of d.o.f. is: r*c - (r+c-1) = (r-1)*(c-1).
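The counting argument and the closed formula can be checked against each other for a few table sizes. This is a small sketch in plain Python; the function names and the example table sizes are illustrative only.

```python
def dof_by_counting(r, c):
    """Cells compared minus restrictions imposed by the given totals."""
    cells = r * c
    restrictions = r + c - 1  # row and column totals share one grand total
    return cells - restrictions


def dof_by_formula(r, c):
    """Closed form: (r - 1) * (c - 1)."""
    return (r - 1) * (c - 1)


# A 2x2 table (such as GOV by GENDER) plus two larger, hypothetical sizes
for r, c in [(2, 2), (3, 2), (15, 2)]:
    print(r, c, dof_by_counting(r, c), dof_by_formula(r, c))
```

For the 2x2 tables analyzed here both functions give 1, which is why the cutoff 3.84 (1.96 squared) applies.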
Extensions to the Analysis
• What other factors may influence income?
• Control for job sector (government, private, non-profit), and examine the difference in average income between males and females within each sector.
  – Select cases: Data/Select Cases
      if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & GOV = 1
      if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & PRIVATE = 1
  – Compare Means/Independent-Samples T-Test
• If we see differences within each sector, other factors besides job sector are influencing income.