Hypothesis Testing

Hypothesis Testing for SPSS

Page 1: Hypothesis Testing

1. T-test: Difference in means
   Tests whether a difference in means is statistically significant.
   Ex. income by gender; number of years at work by gender

2. T-test: Difference in proportions
   Tests whether a difference in proportions is statistically significant.
   Ex. the proportion employed in government jobs by gender

3. Contingency Table/Chi-Square Analysis
   Tests whether all categories contain the same proportion of values by comparing expected and observed (actual) counts.
   Ex. the proportion employed in government jobs by gender

Page 2: Hypothesis Testing Procedure

1. A research question
2. The null hypothesis (usually assumes NO difference; two-tailed test)
3. Select cases
4. T-test or contingency table/chi-square analysis
5. Interpret the test results: t-score, significance level, confidence interval, likelihood ratio (for chi-square analysis)
6. "Reject" or "not reject" the null hypothesis

Page 3: Hypothesis Testing

• Research question: Are there differences in income between male and female graduates, and if so, what factors might explain this difference?

1. Is there a difference in average income between male and female graduates?

2. Is there a significant difference in the average length of time on the job between male and female graduates?

3. Is there a difference in the proportion employed in government jobs between male and female graduates?

Page 4: 1. T-test: Difference in Means

Research question: Is there a difference in average income between male and female graduates?

H0: There is NO difference in average income between male and female graduates.

Note: Limit the data to full-time employees or self-employed graduates with income more than $20,000 and less than $400,000.

Page 5: Step 1: Data/Select Cases

• Select Data/Select Cases from the menu.

Page 6: Data/Select Cases

• In the Select Cases dialog box, you specify logical expressions to select cases.
  – Select the "If condition is satisfied" option.
  – Click the If... button.

Page 7: Data/Select Cases

Specifying fullself and the income range

Type the logical expression:

fullself = 1 & income > 20000 & income < 400000

This limits cases to alumni who work full-time or are self-employed and earn more than $20,000 and less than $400,000.
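To reproduce the same case selection outside SPSS, here is a minimal Python sketch of the filter. It assumes each case is a dict with the hypothetical keys `fullself` and `income`. Cases with a missing value are left unselected, which we assume matches how SPSS handles a condition that cannot be evaluated.

```python
from typing import Iterable, List, Optional


def is_selected(fullself: Optional[int], income: Optional[float]) -> bool:
    """Mirror the condition: fullself = 1 & income > 20000 & income < 400000."""
    if fullself is None or income is None:
        # Missing value: the case is not selected (assumed SPSS-like behavior).
        return False
    return fullself == 1 and 20000 < income < 400000


def select_cases(cases: Iterable[dict]) -> List[dict]:
    """Return only the cases that satisfy the selection condition."""
    return [c for c in cases if is_selected(c.get("fullself"), c.get("income"))]
```

The inequalities are strict, so incomes of exactly $20,000 or $400,000 are excluded, as in the SPSS expression.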

Page 8: Data/Select Cases

Page 9: Data/Select Cases

Page 10: Step 2: Independent T-Test

Analyze/Compare Means/Independent-Samples T-Test

Page 11: Step 2: Independent T-Test

Test Variable: income
Grouping Variable: gender(? ?)

SPSS shows "(? ?)" until the two groups are defined.

Page 12: Step 2: Independent T-Test (Define Groups)

Group 1: 1 for female
Group 2: 2 for male

Note: The t-test compares exactly two groups, so only two values of the grouping variable can be specified.

Page 13: Step 2: Independent T-Test

Grouping Variable: gender(1 2)

Page 14: T-test: Results

Levene's test for equality of variances is significant (F = 10.443, Sig. = .001 < 0.05), so equal variances are not assumed. Using the unequal-variance model, we REJECT H0 and conclude that there is a significant difference in average income between male and female graduates.

Group Statistics: Income

Gender    N      Mean        Std. Deviation    Std. Error Mean
Female    128    79868.22    35165.875         3108.254
Male      137    98606.49    47980.995         4099.293

Independent Samples Test: Income

                                    Equal variances    Equal variances
                                    assumed            not assumed
Levene's Test: F                    10.443
Levene's Test: Sig.                 .001
t                                   -3.605             -3.642
df                                  263                249.145
Sig. (2-tailed)                     .000               .000
Mean Difference                     -18738.270         -18738.270
Std. Error Difference               5197.537           5144.458
95% CI of the Difference: Lower     -28972.4           -28870.4
95% CI of the Difference: Upper     -8504.190          -8606.100

Reading the "equal variances not assumed" row: t = -3.642 < -1.96; Sig. (2-tailed) = .000 < 0.05; the mean difference is -18,738; the 95% confidence interval does not include 0.
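As a check on how SPSS gets the two t values, this minimal Python sketch recomputes them from the Group Statistics values alone. It is not part of the SPSS workflow. It uses the standard pooled-variance formula ("equal variances assumed") and the Welch formula ("equal variances not assumed"). The p-values and confidence intervals need the t distribution (for example from SciPy), so they are left out here.

```python
import math


def t_tests_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return the mean difference plus pooled and Welch t-test pieces."""
    diff = mean1 - mean2

    # Equal variances assumed: pool the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    pooled = {"t": diff / se_pooled, "df": n1 + n2 - 2, "se": se_pooled}

    # Equal variances not assumed (Welch): separate variances, Welch-Satterthwaite df.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    welch = {"t": diff / se_welch, "df": df_welch, "se": se_welch}

    return diff, pooled, welch


# Income by gender (female first, so the difference is female minus male)
diff, pooled, welch = t_tests_from_summary(128, 79868.22, 35165.875, 137, 98606.49, 47980.995)
print(diff, pooled, welch)
```

Because the female group is entered first, the difference and both t values are negative, matching the sign in the SPSS output.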

Page 15: 1-2. T-test: Difference in Means

Possible explanation for the difference in income: male income is higher because men have been on the job longer than women.

Research question: Is there a difference in the average length of time on the job (YEARS) between male and female graduates?

H0: There is NO difference in the average length of time on the job between male and female graduates.

Page 16: Step 2: Independent T-Test

Analyze/Compare Means/Independent-Samples T-Test

Page 17: Step 2: Independent T-Test

Test Variable: Years at Current Position [years]
Grouping Variable: gender(1 2)

Page 18: T-test: Results

Levene's test for equality of variances is significant (F = 13.386, Sig. = .000 < 0.05), so equal variances are not assumed. Using the unequal-variance model, we REJECT H0 and conclude that there is a significant difference in the average length of time on the job between male and female graduates.

Group Statistics: Years at Current Position

Gender    N      Mean    Std. Deviation    Std. Error Mean
Female    128    4.15    4.315             .381
Male      137    5.90    5.764             .492

Independent Samples Test: Years at Current Position

                                    Equal variances    Equal variances
                                    assumed            not assumed
Levene's Test: F                    13.386
Levene's Test: Sig.                 .000
t                                   -2.786             -2.813
df                                  263                251.276
Sig. (2-tailed)                     .006               .005
Mean Difference                     -1.752             -1.752
Std. Error Difference               .629               .623
95% CI of the Difference: Lower     -2.991             -2.979
95% CI of the Difference: Upper     -.514              -.525

Reading the "equal variances not assumed" row: t = -2.813 < -1.96; Sig. (2-tailed) = .005 < 0.05; the mean difference is -1.752 years; the 95% confidence interval does not include 0.
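The results slides use three reading rules: t beyond ±1.96, Sig. below 0.05, and a confidence interval that excludes 0. These can be bundled into one small helper. The function below is a hypothetical Python sketch, not an SPSS feature. It bases the decision on the p-value and reports the other two checks alongside it.

```python
def decide(t: float, p: float, ci_lower: float, ci_upper: float,
           alpha: float = 0.05, t_cutoff: float = 1.96) -> dict:
    """Apply the three reading rules from the results slides to one output row."""
    checks = {
        # 1.96 is the large-sample (normal) two-tailed cutoff at the 5% level.
        "|t| > cutoff": abs(t) > t_cutoff,
        # Sig. (2-tailed); SPSS prints .000 when p < .0005.
        "p < alpha": p < alpha,
        "CI excludes 0": not (ci_lower <= 0.0 <= ci_upper),
    }
    decision = "Reject H0" if checks["p < alpha"] else "Do not reject H0"
    return {"decision": decision, "checks": checks}


# Years at current position, equal variances not assumed
print(decide(-2.813, 0.005, -2.979, -0.525))
```

SPSS computes Sig. and the confidence interval from the same t distribution and df, so those two rules always agree. The fixed 1.96 cutoff is a normal approximation and can disagree with them when |t| is very close to 1.96 and df is small.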

Page 19: 2. T-test: Difference in Proportions

Possible explanation for the difference in income: male income is higher because more females than males work for the government.

Research question: Is there a difference in the proportion employed in government jobs between male and female graduates?

H0: There is NO difference in the proportion employed in government jobs between male and female graduates.

Page 20: Step 1: Create a new variable (GOV)

• Create a new variable GOV that
  – has the value 1 if EMPLOYER (codes 1-6) indicates that the alumnus works for a government organization;
  – has the value 0 if EMPLOYER is not 1-6.

Method 1: Use Transform/Compute to convert the EMPLOYER variable into a new categorical variable GOV.

Method 2: Use Transform/Recode/Into Different Variables to create the new categorical variable GOV.
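The recode rule can be sketched in Python as follows. The sketch assumes EMPLOYER is stored as integer codes, with 1-6 for government and 7-11 for private employers (as marked on the Employer frequency table), and uses None for a system-missing value. The function names are hypothetical. The same helper also covers the PRIVATE variable that is created later.

```python
from typing import Optional


def recode_in_range(code: Optional[int], low: int, high: int) -> Optional[int]:
    """Return 1 if low <= code <= high, 0 for any other code, None if missing."""
    if code is None:
        return None  # a system-missing EMPLOYER stays missing in the new variable
    return 1 if low <= code <= high else 0


def recode_gov(employer: Optional[int]) -> Optional[int]:
    # Codes 1-6: Gov: Federal ... Gov: Non U.S.
    return recode_in_range(employer, 1, 6)


def recode_private(employer: Optional[int]) -> Optional[int]:
    # Codes 7-11: Private: Single Person ... Private: >= 50 Persons
    return recode_in_range(employer, 7, 11)
```

Keeping missing values missing, rather than coding them 0, matches the GOV frequency table, which still shows one system-missing case.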

Page 21: OUTPUT: Analyze/Descriptive Statistics/Frequencies

Employer

                                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Gov: Federal                       8       2.9             2.9                  2.9
          Gov: State                        12       4.3             4.3                  7.2
          Gov: County                       13       4.7             4.7                 11.9
          Gov: City                         45      16.2            16.2                 28.2
          Gov: Special Agency               17       6.1             6.1                 34.3
          Gov: Non U.S.                      3       1.1             1.1                 35.4
          Private: Single Person             5       1.8             1.8                 37.2
          Private: 2-4 Persons               5       1.8             1.8                 39.0
          Private: 5-19 Persons             20       7.2             7.2                 46.2
          Private: 20-49 Persons            11       4.0             4.0                 50.2
          Private: >= 50 Persons            47      16.9            17.0                 67.1
          Non-Profit (U.S.)                 51      18.3            18.4                 85.6
          International Org.                 4       1.4             1.4                 87.0
          Educational Inst.                 25       9.0             9.0                 96.0
          Other                             11       4.0             4.0                100.0
          Total                            277      99.6           100.0
Missing   System                             1        .4
Total                                      278     100.0

Codes 1-6 (Gov: Federal through Gov: Non U.S.) are government employers; codes 7-11 (Private: Single Person through Private: >= 50 Persons) are private employers. The system-missing case is a missing value.

Page 22: Transform/Recode/Into Different Variables

Page 23: Transform/Recode/Into Different Variables

Select the employer variable, type "GOV" as the output variable name, click the "Change" button, then click the "Old and New Values" button…

Page 24: Transform/Recode/Into Different Variables

Page 25: Transform/Recode/Into Different Variables

Page 26: Transform/Recode/Into Different Variables

Page 27: Transform/Recode/Into Different Variables

Page 28: Transform/Recode/Into Different Variables

Save the data file!

Page 29: Step 2: Create a frequency table for GOV

• Analyze/Descriptive Statistics/Frequencies

Thirty-five percent of the graduates who are employed full-time or self-employed and earn more than $20,000 and less than $400,000 work in government jobs.

Government Job

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     No             179      64.4            64.6                 64.6
          Yes             98      35.3            35.4                100.0
          Total          277      99.6           100.0
Missing   System           1        .4
Total                    278     100.0
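As a cross-check, the GOV counts can be derived from the Employer frequency table by summing the frequencies of the government categories. The sketch assumes the 15 Employer categories are coded 1-15 in the order they are listed. That coding is our assumption, because the output shows only the labels. `indicator_table` is a hypothetical helper name.

```python
# Valid Employer frequencies from the frequency output, keyed by ASSUMED codes 1-15
# (same order as the value labels: Gov: Federal = 1, ..., Other = 15).
EMPLOYER_FREQ = {
    1: 8, 2: 12, 3: 13, 4: 45, 5: 17, 6: 3,   # government
    7: 5, 8: 5, 9: 20, 10: 11, 11: 47,        # private
    12: 51, 13: 4, 14: 25, 15: 11,            # non-profit, international, educational, other
}


def indicator_table(freq: dict, low: int, high: int) -> dict:
    """Frequencies and valid percent for a 0/1 recode of codes low..high."""
    valid = sum(freq.values())
    yes = sum(n for code, n in freq.items() if low <= code <= high)
    return {
        "No": valid - yes,
        "Yes": yes,
        "Valid": valid,
        "Yes valid %": round(100 * yes / valid, 1),
    }


gov_table = indicator_table(EMPLOYER_FREQ, 1, 6)
print(gov_table)
```

The result reproduces the Government Job table: 179 No, 98 Yes, 277 valid cases, and a valid percent of 35.4 for Yes.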

Page 30: Step 3: Independent T-Test

Analyze/Compare Means/Independent-Samples T-Test

Page 31: Step 3: Independent T-Test

Test Variable: gov
Grouping Variable: gender(1 2)

Page 32: T-test: Results

Levene's test is not significant (F = .129, Sig. = .720 > 0.05), so equal variances can be assumed; in any case, both rows give t = -.179 and Sig. (2-tailed) = .858. We CANNOT REJECT H0 and cannot conclude that there is a significant difference between male and female graduates in the proportion working in the government sector.

Because GOV is coded 0/1, the group means are the proportions of female and male graduates who work in government jobs.

Group Statistics: Government Job

Gender    N      Mean     Std. Deviation    Std. Error Mean
Female    127    .3543    .48020            .04261
Male      137    .3650    .48319            .04128

Independent Samples Test: Government Job

                                    Equal variances    Equal variances
                                    assumed            not assumed
Levene's Test: F                    .129
Levene's Test: Sig.                 .720
t                                   -.179              -.179
df                                  262                260.726
Sig. (2-tailed)                     .858               .858
Mean Difference                     -.01063            -.01063
Std. Error Difference               .05934             .05933
95% CI of the Difference: Lower     -.12748            -.12746
95% CI of the Difference: Upper     .10622             .10619

Reading the results: |t| = .179 < 1.96; Sig. (2-tailed) = .858 > 0.05; the 95% confidence interval includes 0.
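Running a t-test on a 0/1 variable is what makes it a difference-in-proportions test. The Python sketch below shows this. It expands the Government Job * Gender crosstab counts (45 of 127 females and 50 of 137 males work in government) into 0/1 cases and recomputes the group means, the standard deviations, and the Welch t statistic with the standard library. The helper names are hypothetical.

```python
import math
import statistics


def zero_one_sample(successes: int, n: int) -> list:
    """Expand a count into n cases coded 1 (government job) or 0 (not)."""
    return [1] * successes + [0] * (n - successes)


def welch_t(a: list, b: list) -> tuple:
    """Welch (equal variances not assumed) t statistic and df for two samples."""
    v1 = statistics.variance(a) / len(a)  # sample variance with n - 1, like SPSS
    v2 = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


female = zero_one_sample(45, 127)
male = zero_one_sample(50, 137)

# The mean of a 0/1 variable is the proportion coded 1.
print(round(statistics.mean(female), 4), round(statistics.stdev(female), 5))
print(welch_t(female, male))
```

The recomputed means (.3543 and .3650), the standard deviations, and t = -.179 with df of about 260.7 match the SPSS output.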

Page 33: 3. Contingency Table/Chi-Square Analysis

The same question can be analyzed with a contingency table of GOV by GENDER, tested with the chi-square statistic.

H0: There is NO relationship between employment sector and gender.

Page 34: Analyze/Descriptive Statistics/Crosstabs

Page 35: Analyze/Descriptive Statistics/Crosstabs

Cells: Counts: Observed; Percentages: Row, Column

Select "gov" for "Row" and "Gender" for "Column."

Page 36: Contingency table (Analyze/Descriptive Statistics/Crosstabs)

Government Job * Gender Crosstabulation

                                                  Gender
                                                  Female    Male      Total
Government Job   No    Count                      82        87        169
                       % within Government Job    48.5%     51.5%     100.0%
                       % within Gender            64.6%     63.5%     64.0%
                 Yes   Count                      45        50        95
                       % within Government Job    47.4%     52.6%     100.0%
                       % within Gender            35.4%     36.5%     36.0%
Total                  Count                      127       137       264
                       % within Government Job    48.1%     51.9%     100.0%
                       % within Gender            100.0%    100.0%    100.0%

Page 37: Contingency table (Analyze/Descriptive Statistics/Crosstabs)

Chi-square value = 0.032 < 3.84 (3.84 = 1.96², the cutoff value at the 95% confidence level with 1 df), and Asymp. Sig. = .857 > 0.05. We CANNOT REJECT the null hypothesis and cannot conclude that there is a statistically significant relationship between gender and whether or not a person works for the government.

Chi-Square Tests

                              Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                            (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square            .032b    1    .857
Continuity Correction a       .003     1    .959
Likelihood Ratio              .032     1    .857
Fisher's Exact Test                                       .898         .480
N of Valid Cases              264

a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 45.70.
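To see where the Chi-Square Tests values come from, the Python sketch below recomputes them from the observed counts. Each expected count is row total × column total / N. The statistics are the Pearson statistic, the Yates continuity-corrected statistic, and the likelihood-ratio statistic. For 1 df, the upper-tail p-value equals erfc(sqrt(chi2 / 2)), so only the standard library is needed. The function names are hypothetical, and the p-value shortcut is valid only for 1 df, that is, for 2x2 tables.

```python
import math


def expected_counts(table):
    """Expected count for each cell under H0: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 df: P(Z**2 > chi2)."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_tests_2x2(table):
    """Pearson, Yates-corrected, and likelihood-ratio statistics for a 2x2 table."""
    expected = expected_counts(table)
    pearson = yates = lr = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            pearson += (o - e) ** 2 / e
            yates += max(abs(o - e) - 0.5, 0.0) ** 2 / e
            if o > 0:
                lr += 2 * o * math.log(o / e)
    return {
        "Pearson": (pearson, p_value_1df(pearson)),
        "Continuity Correction": (yates, p_value_1df(yates)),
        "Likelihood Ratio": (lr, p_value_1df(lr)),
        "Min expected": min(min(row) for row in expected),
    }


gov_by_gender = [[82, 87], [45, 50]]  # rows: No, Yes; columns: Female, Male
print(chi_square_tests_2x2(gov_by_gender))
```

The same function applied to the Private Sector Job table, [[92, 87], [35, 50]], gives about 2.411 (Pearson) and a minimum expected count of 40.89. Fisher's exact test uses a different (hypergeometric) calculation and is not reproduced here.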

Page 38: OUTPUT: Analyze/Descriptive Statistics/Frequencies

The Employer frequency table from page 21 is shown again, with codes 7-11 (Private: Single Person through Private: >= 50 Persons) marked as Private.

Page 39: 3-2. Contingency Table/Chi-Square Analysis

How about analyzing the difference in the proportion of males and females in the private sector with a contingency table of PRIVATE by GENDER?

H0: There is NO relationship between employment sector (private vs. other) and gender.

Page 40: Step 1: Create a new variable (PRIVATE)

• Create a new variable PRIVATE that
  – has the value 1 if EMPLOYER (codes 7-11) indicates that the alumnus works for a private-sector organization;
  – has the value 0 if EMPLOYER is not 7-11 (else).

Method 2: Use Transform/Recode/Into Different Variables to create the new categorical variable PRIVATE.

Page 41: Analyze/Descriptive Statistics/Crosstabs

Cells: Counts: Observed; Percentages: Row, Column

Select "private" for "Row" and "Gender" for "Column."

Page 42: Contingency table (Analyze/Descriptive Statistics/Crosstabs)

Private Sector Job * Gender Crosstabulation

                                                          Gender
                                                          Female    Male      Total
Private Sector Job   .00 (not private)   Count            92        87        179
                                         % within PSJ     51.4%     48.6%     100.0%
                                         % within Gender  72.4%     63.5%     67.8%
                     1.00 (private)      Count            35        50        85
                                         % within PSJ     41.2%     58.8%     100.0%
                                         % within Gender  27.6%     36.5%     32.2%
Total                                    Count            127       137       264
                                         % within PSJ     48.1%     51.9%     100.0%
                                         % within Gender  100.0%    100.0%    100.0%

(PSJ = Private Sector Job)

Page 43: Contingency table (Analyze/Descriptive Statistics/Crosstabs)

Chi-square value = 2.411 < 3.84 (= 1.96²), and Asymp. Sig. = .120 > 0.05. We CANNOT REJECT the null hypothesis and cannot conclude that the difference in the proportion of males and females in the private sector is statistically significant.

Chi-Square Tests

                              Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                            (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square            2.411b   1    .120
Continuity Correction a       2.019    1    .155
Likelihood Ratio              2.422    1    .120
Fisher's Exact Test                                       .147         .077
N of Valid Cases              264

a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 40.89.
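For a 2x2 table, the Pearson chi-square equals the square of the pooled two-proportion z statistic. Testing the difference in proportions and running the chi-square test are therefore the same test, which is why 3.84 = 1.96² serves as the cutoff. The Python sketch below shows this with the private-sector counts (35 of 127 females and 50 of 137 males). `two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0 (no difference)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Private-sector jobs: female vs. male
z, p = two_proportion_z(35, 127, 50, 137)
print(z, z ** 2, p)
```

Here z² reproduces the Pearson chi-square of 2.411, and the p-value matches the .120 shown in the Chi-Square Tests table. This pooled z-test differs slightly from the t-test on the 0/1 GOV variable, because the t-test uses each group's own n - 1 variance instead of a pooled proportion.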

Page 44: The degrees of freedom in the chi-square test

The degrees of freedom in the chi-square test of a contingency table:

d.o.f. = (r - 1)(c - 1)

where r and c are the numbers of rows and columns (that is, the numbers of categories of the two variables) in the table.

The number of d.o.f. is the number of comparisons between actual and expected frequencies minus the number of restrictions imposed on these frequencies.

Because a contingency table has r*c cells, there are r*c actual frequencies to be compared with the corresponding expected frequencies. The totals of each row and each column are fixed, which gives r + c - 1 independent restrictions: r row totals plus c column totals, minus 1 because the row totals and the column totals both sum to the same grand total.

Therefore, the number of d.o.f. is r*c - (r + c - 1) = (r - 1)(c - 1).
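Written out step by step, the counting argument reduces as shown below. The last two lines apply it to two cases. The first is a hypothetical three-category sector variable (government, private, non-profit) crossed with gender, as the extensions suggest. The second is the 2x2 GOV and PRIVATE tables in this deck, whose 1 df gives the 3.84 cutoff.

```latex
\begin{aligned}
\text{d.o.f.} &= rc - (r + c - 1) = rc - r - c + 1 = (r - 1)(c - 1) \\
\text{sector (3) by gender (2):}\quad \text{d.o.f.} &= (3 - 1)(2 - 1) = 2 \\
\text{GOV or PRIVATE (2) by gender (2):}\quad \text{d.o.f.} &= (2 - 1)(2 - 1) = 1
\end{aligned}
```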

Page 45: Extensions to the Analysis

• What other factors may influence income?
• Control for job sector (government, private, non-profit), and examine the difference in average income between males and females within each sector.
  – Select cases (Data/Select Cases), for example:
    if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & GOV = 1
    if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & PRIVATE = 1
  – Compare Means/Independent-Samples T-Test
• If we still see differences within each sector, factors other than job sector are influencing income.
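The within-sector comparison amounts to selecting cases once per sector and running the same two-group test each time. The Python sketch below does this. It assumes each case is a dict with the hypothetical keys `status`, `income`, `gender` (1 = female, 2 = male, as in Define Groups), and the 0/1 flags `gov` and `private`. It uses the Welch (equal variances not assumed) t statistic. Function names are illustrative, not part of SPSS.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic and df (equal variances not assumed)."""
    v1 = statistics.variance(a) / len(a)
    v2 = statistics.variance(b) / len(b)
    se = math.sqrt(v1 + v2)
    if se == 0:
        return None  # no variation in either group, so t is undefined
    t = (statistics.mean(a) - statistics.mean(b)) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return {"t": t, "df": df}


def income_gap_by_sector(cases, sector_flags=("gov", "private")):
    """Female-vs-male income t-test within each sector flag."""
    results = {}
    for flag in sector_flags:
        # Same selection as the Select Cases expressions, plus the sector flag.
        selected = [
            c for c in cases
            if c.get("status") == 1 and c.get(flag) == 1
            and c.get("income") is not None and 20000 < c["income"] < 400000
        ]
        female = [c["income"] for c in selected if c.get("gender") == 1]
        male = [c["income"] for c in selected if c.get("gender") == 2]
        if len(female) < 2 or len(male) < 2:
            results[flag] = None  # too few cases to estimate both variances
            continue
        results[flag] = welch_t(female, male)
    return results
```

A sector whose t stays beyond ±1.96 still shows an income gap after controlling for job sector. This is the situation in which factors other than sector would need to be examined.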