7/29/2019 A Review of the t
A Review of the t-test
The t-test is used for testing differences between two means. In order to use a t-test,
the same variable must be measured in different groups, at different times, or in
comparison to a known population mean. Comparing a sample mean to a known
population mean is an unusual test that appears in statistics books as a transitional step
in learning about the t-test. The more common applications of the t-test are testing the
difference between independent groups or testing the difference between dependent
groups.
A t-test for independent groups is useful when the same variable has been measured in
two independent groups and the researcher wants to know whether the difference
between group means is statistically significant. "Independent groups" means that the
groups have different people in them and that the people in the different groups have
not been matched or paired in any way. A t-test for related samples or a t-test for
dependent means is the appropriate test when the same people have been measured or
tested under two different conditions, or when people are put into pairs by matching
them on some other variable and then placing each member of the pair into one of two
groups.
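The distinction can be made concrete with a short sketch. The code below is a minimal pure-Python illustration (the data are made up for the example, not taken from this review): the independent-groups t pools the two sample variances, while the dependent-groups t is a one-sample test on the pairwise differences.

```python
import math
from statistics import mean, stdev

def t_independent(a, b):
    """Pooled-variance t for two independent groups; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def t_dependent(a, b):
    """Paired t: a one-sample t-test on the differences; df = len(a) - 1."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical scores measured under two conditions
cond1 = [5, 6, 7]
cond2 = [4, 6, 6]
t_ind = t_independent(cond1, cond2)   # treats the samples as unrelated people
t_dep = t_dependent(cond1, cond2)     # treats them as the same people measured twice
```

Note how the same numbers give different t values depending on the design: the paired test removes between-person variability, which is why matching or repeated measurement can make a test more sensitive.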
The t-test For Independent Groups on SPSS
A t-test for independent groups is useful when the researcher's goal is to compare the
difference between means of two groups on the same variable. Groups may be formed
in two different ways. First, a preexisting characteristic of the participants may be
used to divide them into groups. For example, the researcher may wish to compare
college GPAs of men and women. In this case, the grouping variable is biological
sex and the two groups would consist of men versus women. Other preexisting
characteristics that could be used as grouping variables include age (under 21 years
vs. 21 years and older, or some other reasonable division into two groups), athlete
(plays collegiate varsity sport vs. does not play), type of student (undergraduate vs.
graduate student), type of faculty member (tenured vs. nontenured), or any other
variable for which it makes sense to have two categories. Another way to form groups
is to randomly assign participants to one of two experimental conditions, such as a
group that listens to music versus a group that experiences a control condition.
Regardless of how the groups are determined, one of the variables in the SPSS data
file must contain the information needed to divide participants into the appropriate
groups. SPSS has very flexible features for accomplishing this task.
Like all other statistical tests using SPSS, the process begins with data. Consider the
fictional data on college GPA and weekly hours of studying used in the correlation
example. First, let's add information about the biological sex of each participant to the
database. This requires a numerical code. For this example, let a "1" designate a
female and a "2" designate a male. With the new variable added, the data would look
like this:
Participant Current GPA Weekly Study Time Sex
Participant #01 1.8 15 hrs 2
Participant #02 3.9 38 hrs 1
Participant #03 2.1 10 hrs 2
Participant #04 2.8 24 hrs 1
Participant #05 3.3 36 hrs .
Participant #06 3.1 15 hrs 2
Participant #07 4.0 45 hrs 1
Participant #08 3.4 28 hrs 1
Participant #09 3.3 35 hrs 1
Participant #10 2.2 10 hrs 2
Participant #11 2.5 6 hrs 2
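Outside SPSS, the same data file can be sketched as plain records, with None playing the role of SPSS's system-missing dot and a small dictionary standing in for the value labels (a sketch of the coding scheme, not SPSS's own storage format):

```python
# (participant, GPA, weekly study hours, sex code); None = missing, as for #05
records = [
    ("01", 1.8, 15, 2), ("02", 3.9, 38, 1), ("03", 2.1, 10, 2),
    ("04", 2.8, 24, 1), ("05", 3.3, 36, None), ("06", 3.1, 15, 2),
    ("07", 4.0, 45, 1), ("08", 3.4, 28, 1), ("09", 3.3, 35, 1),
    ("10", 2.2, 10, 2), ("11", 2.5, 6, 2),
]
sex_labels = {1: "Female", 2: "Male"}   # the value labels defined for the Sex variable

n_missing = sum(1 for _, _, _, sex in records if sex is None)
print(n_missing, sex_labels[1])   # 1 Female
```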
With this information added to the file, two methods of dividing participants into
groups can be illustrated. Note that Participant #05 has just a single dot in the column
for sex. This is the standard way that SPSS indicates missing data. Missing data is a common
occurrence, especially in survey data, and SPSS has flexible options for handling this
situation. Begin the analysis by entering the new data for sex. Use the arrow keys or
mouse to move to the empty third column on the spreadsheet. Use the same technique
as previously to enter the new data. When data is missing (such as Participant #05 in
this example), press the Enter key while there is no data in the top line (you will
need to delete any previous entry) and a single dot will appear in the variable
column. Once the data is entered, click Data > Define Variable and type in the name
of the variable, "Sex." Then go to "Value" and type a "1" in the box. For "Value
Label," type "Female." Then click on Add. Repeat the sequence, typing "2" and
"Male" in the appropriate boxes, then click Add again. Finally, click Continue > OK and you will be back to the main SPSS menu.
To request the t-test, click Statistics > Compare Means > Independent-
Samples T Test. Use the right-pointing arrow to transfer COLGPA to the "Test
Variable(s)" box. Then highlight Sex in the left box and click the bottom arrow
(pointing right) to transfer Sex to the "Grouping Variable" box. Then click Define
Groups. Type "1" in the Group 1 box and type "2" in the Group 2 box. Then
click Continue. Click Options and you will see that the confidence interval or the method
of handling missing data can be changed. Since the default options are just fine,
click Continue > OK and the results will quickly appear in the output window.
Results for the example are shown below:
T-Test

Group Statistics

SEX                N    Mean     Std. Deviation   Std. Error Mean
1.00 Female        5    3.4800   .487             .218
2.00 Male          5    2.3400   .493             .220

Independent Samples Test

Levene's Test for Equality of Variances

                                    F       Sig.
SEX   Equal variances assumed       .002    .962
      Equal variances not assumed
t-test for Equality of Means

                                    t      df     Sig. (2-tailed)   Mean Difference
SEX   Equal variances assumed       3.68   8      .021              .1750
      Equal variances not assumed   3.68   8.00   .025              .1750
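As a check on the printed output, the "Equal variances" t of 3.68 can be reproduced by hand from the ten GPA scores entered earlier (the five women and five men; Participant #05 is dropped because sex is missing). A pure-Python sketch:

```python
import math
from statistics import mean, stdev

female_gpa = [3.9, 2.8, 4.0, 3.4, 3.3]   # participants 02, 04, 07, 08, 09 (Sex = 1)
male_gpa = [1.8, 2.1, 3.1, 2.2, 2.5]     # participants 01, 03, 06, 10, 11 (Sex = 2)

n1, n2 = len(female_gpa), len(male_gpa)
# Pooled variance, since Levene's test (F = .002, p = .962) gave no
# evidence against the homogeneity-of-variance assumption
sp2 = ((n1 - 1) * stdev(female_gpa) ** 2 + (n2 - 1) * stdev(male_gpa) ** 2) / (n1 + n2 - 2)
t = (mean(female_gpa) - mean(male_gpa)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(round(t, 2), df)   # 3.68 on 8 degrees of freedom, as in the output
```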
The output begins with the means and standard deviations for the two groups, which
is key information that will need to be included in any related research report. The
"Mean Difference" statistic indicates the magnitude of the difference between means.
When combined with the confidence interval for the difference, this information can
make a valuable contribution to explaining the importance of the results. "Levene's
Test for Equality of Variances" is a test of the homogeneity of variance assumption.
When the value for F is large and the p-value is less than .05, this indicates that the
variances are heterogeneous, which violates a key assumption of the t-test. The next
section of the output provides the actual t-test results in two formats. The first format,
for "Equal" variances, is the standard t-test taught in introductory statistics. This is the
test result that should be reported in a research report under most circumstances. The
second format reports a t-test for "Unequal" variances. This is an alternative way of
computing the t-test that accounts for heterogeneous variances and provides an
accurate result even when the homogeneity assumption has been violated (as indicated
by the Levene test). It is rare that one needs to consider using the "Unequal" variances
format because, under most circumstances, even when the homogeneity assumption is
violated, the results are practically indistinguishable. When the "Equal" variances and
"Unequal" variances formats lead to different conclusions, seek consultation. The
output for both formats shows the degrees of freedom (df) and probability (2-tailed
significance). As in all statistical tests, the basic criterion for statistical significance is
a "2-tailed significance" less than .05. The .021 probability in this example is clearly
less than .05, so the difference is statistically significant.
When two samples are involved, the samples can come from different individuals who
are not matched (the samples are independent of each other), or the samples can come
from the same individuals (the samples are paired with each other and are not
independent). A third alternative is that the samples can come from different
individuals who have been matched on a variable of interest; this type of sample will
not be independent. The form of the t-test is slightly different for the independent-samples
and dependent-samples types of two-sample tests, and SPSS has
separate procedures for performing the two types of tests.
The Independent Samples t-test can be used to see if two means are different from
each other when the two samples that the means are based on were taken from
different individuals who have not been matched. In this example, we will determine
if the students in sections one and two of PSY 216 have a different number of older
siblings.
We will follow our customary steps:
1. Write the null and alternative hypotheses first:
   H0: μSection 1 = μSection 2
   H1: μSection 1 ≠ μSection 2
   where μ is the mean number of older siblings that the PSY 216 students have.
2. Determine if this is a one-tailed or a two-tailed test. Because the hypothesis
   involves the phrase "different" and no ordering of the means is specified, this
   must be a two-tailed test.
3. Specify the α level: α = .05.
4. Determine the appropriate statistical test. The variable of interest, older, is on a
   ratio scale, so a z-score test or a t-test might be appropriate. Because the
   population standard deviation is not known, the z-test would be inappropriate.
   Furthermore, there are different students in sections 1 and 2 of PSY 216, and
   they have not been matched. Because of these factors, we will use the
   independent samples t-test.
5. Calculate the t value, or let SPSS do it for you!
The command for the independent samples t tests is found at Analyze |
Compare Means | Independent-Samples T Test (this is shorthand for clicking
on the Analyze menu item at the top of the window, and then clicking on
Compare Means from the drop-down menu, and Independent-Samples T Test
from the pop-up menu):
The Independent-Samples t Test dialog box will appear:
Select the dependent variable(s) that you want to test by clicking on it in the
left hand pane of the Independent-Samples t Test dialog box. Then click on the
upper arrow button to move the variable into the Test Variable(s) pane. In this
example, move the Older variable (number of older siblings) into the Test
Variable(s) box:
Click on the independent variable (the variable that defines the two groups) in
the left-hand pane of the Independent-Samples t Test dialog box. Then click on
the lower arrow button to move the variable into the Grouping Variable box. In
this example, move the Section variable into the Grouping Variable box:
You need to tell SPSS how to define the two groups. Click on the Define
Groups button. The Define Groups dialog box appears:
In the Group 1 text box, type in the value that determines the first group. In this
example, the value of the 10 AM section is 10. So you would type 10 in the
Group 1 text box. In the Group 2 text box, type the value that determines the
second group. In this example, the value of the 11 AM section is 11. So you
would type 11 in the Group 2 text box:
Click on the Continue button to close the Define Groups dialog box. Click on
the OK button in the Independent-Samples t Test dialog box to perform the t-test.
The output viewer will appear with the results of the t-test. The results
have two main parts: descriptive statistics and inferential statistics. First, the
descriptive statistics:

This gives the descriptive statistics for each of the two groups (as defined by
the grouping variable). In this example, there are 14 people in the 10 AM
section (N), and they have, on average, 0.86 older siblings, with a standard
deviation of 1.027 older siblings. There are 32 people in the 11 AM section
(N), and they have, on average, 1.44 older siblings, with a standard deviation of
1.318 older siblings. The last column gives the standard error of the mean for
each of the two groups.
The second part of the output gives the inferential statistics:
The columns labeled "Levene's Test for Equality of Variances" tell us whether
an assumption of the t-test has been met. The t-test assumes that the variability
of each group is approximately equal. If that assumption isn't met, then a
special form of the t-test should be used. Look at the column labeled "Sig."
under the heading "Levene's Test for Equality of Variances". In this example,
the significance (p value) of Levene's test is .203. If this value is less than or
equal to your α level for the test (usually .05), then you can reject the null
hypothesis that the variability of the two groups is equal, implying that the
variances are unequal. If the p value is less than or equal to the α level, then
you should use the bottom row of the output (the row labeled "Equal variances
not assumed"). If the p value is greater than your α level, then you should use
the middle row of the output (the row labeled "Equal variances assumed"). In
this example, .203 is larger than α, so we will assume that the variances are
equal and we will use the middle row of the output.
The column labeled "t" gives the observed or calculated t value. In this example,
assuming equal variances, the t value is 1.461. (We can ignore the sign of t for
a two-tailed t-test.) The column labeled "df" gives the degrees of freedom
associated with the t-test. In this example, there are 44 degrees of freedom.
The column labeled "Sig. (2-tailed)" gives the two-tailed p value associated
with the test. In this example, the p value is .151. If this had been a one-tailed
test, we would need to look up the critical t in a table.
6. Decide if we can reject H0: As before, the decision rule is given by: if p ≤ α,
then reject H0. In this example, .151 is not less than or equal to .05, so we fail
to reject H0. That implies that we failed to observe a difference in the number
of older siblings between the two sections of this class.
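Since the raw sibling counts are not reproduced here, the same t can be recovered from the summary statistics alone (the group means, standard deviations, and Ns reported above); a minimal sketch:

```python
import math

def t_from_stats(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t and df computed from group summary statistics."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# 11 AM section vs 10 AM section, from the descriptive statistics above
t, df = t_from_stats(1.44, 1.318, 32, 0.86, 1.027, 14)
print(round(t, 3), df)   # 1.461 on 44 df, matching the SPSS output
```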
If we were writing this for publication in an APA journal, we would write it as:

A t-test failed to reveal a statistically reliable difference between the mean number of
older siblings that the 10 AM section has (M = 0.86, s = 1.027) and that the 11 AM
section has (M = 1.44, s = 1.318), t(44) = 1.461, p = .151, α = .05.

The following working examples refer to the dataset from the US General Social
Survey 1993.
1. Analyze -> Compare Means -> Independent-Samples T Test
The "independent samples t-test" is usually adopted to compare means on one
variable (e.g. age or GPA score) between two groups defined by a categorical
variable in a survey.
If each respondent (i.e. each case) has 2 different scores (i.e. 2 variables) to
compare, e.g. GPA of term 1 and GPA of term 2, the "Paired Samples T-Test"
should be used instead; it is used when the 2 measures relate to one another.
2. Select and put the interval (or ratio) scale variable in the "Test Variable(s)" box.
3. Select and put the categorical variable in the "Grouping Variable" box.
The categorical variable can involve 2 or more categories; however, the T-Test
can only compare 2 groups each time.
Define which 2 groups will be included in the comparison by pressing the
Define Groups button. Even when there are only 2 categories in the variable, it
is still necessary to define groups.
Fill in the codes representing the 2 groups to be compared, then press Continue.
As in other analyses, press OK if you want to get the results immediately, or
press Paste to copy out the command syntax, then run it in the Syntax window
to get the output.
4. SPSS Output for T-Test
4.1 We want to know whether the mean age of those who voted was different from
those who did not vote in the 1992 election.
SPSS will first produce the following table to show the mean age of the 2
groups in comparison.
On average, those who voted were about 5 years (47.85 - 42.71) older than those
who did not. From the sample means, we can draw an initial conclusion that
voters were older than non-voters. However, since we are more interested in
inferring from the sample finding to the target population, this conclusion must
be tested for statistical significance.
T-Test

Group Statistics

VOTE92 Voting in 1992 Election      N      Mean    Std. Deviation   Std. Error Mean
AGE Age of      1 voted             1028   47.85   16.953           .529
Respondent      2 did not vote      420    42.71   18.010           .879
4.2 Test for significance of difference
The null hypothesis is: voters and non-voters had no difference in age. Two rows
contain the same type of information (t, df, Sig. (2-tailed), ...):
   Equal variances assumed
   Equal variances not assumed
As you may notice, we have to choose one row of information to believe, but
which one: Equal variances assumed or Equal variances not assumed?
   "Variances" here refer to the variance of age within each group.
   Rule of decision:
      The null hypothesis is: the variances of the 2 groups are equal. Looking at
      the pink box, the significance corresponds to the F-value (in the green box).
      If the significance level is greater than 0.05, the null hypothesis is
      accepted, i.e. choose the blue box, Equal variances assumed, for
      information on the t-test.
      If the significance level is less than or equal to 0.05, the null
      hypothesis is rejected, i.e. choose the yellow box, Equal variances not
      assumed, for information on the t-test.
   Here the significance level is 0.202, therefore the null hypothesis is accepted
   and we choose the blue box, Equal variances assumed, for information on the t-test.
The Sig. (2-tailed) tells us about the level of significance of the t-value.
   The significance shows .000, but that does not mean the probability is zero;
   it actually means the significance level is less than 0.0005.
   As a convention, we reject the null hypothesis at p ≤ 0.05.
Hence, we may conclude that voters were older than non-voters in our target
population.
Independent Samples Test

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                               F       Sig.            t       df        Sig.         Mean         Std. Error   95% Confidence Interval
                                                                         (2-tailed)   Difference   Difference   of the Difference
                                                                                                                Lower     Upper
AGE Age of   Equal variances   1.631   .202            5.141   1446      .000         5.14         1.000        3.179     7.102
Respondent   assumed
             Equal variances                           5.012   737.803   .000         5.14         1.026        3.127     7.154
             not assumed
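Both rows of this table can be reproduced from the group statistics alone (mean, standard deviation, and N for voters and non-voters); a pure-Python sketch, where the Welch formula is the "Equal variances not assumed" version:

```python
import math

m1, s1, n1 = 47.85, 16.953, 1028   # voted
m2, s2, n2 = 42.71, 18.010, 420    # did not vote

# Equal variances assumed: pooled-variance t
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Equal variances not assumed: Welch's t with its own standard error
se_welch = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t_welch = (m1 - m2) / se_welch

print(round(t_pooled, 3), round(t_welch, 3))   # 5.141 and 5.012, as in the table
```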
Bivariate (Pearson) Correlation

A correlation expresses the strength of linkage or co-occurrence between two variables
in a single value between -1 and +1. This value that measures the strength of linkage is
called the correlation coefficient, which is typically represented by the letter r.

The correlation coefficient between two continuous-level variables is also called
Pearson's r or the Pearson product-moment correlation coefficient. A positive r value
expresses a positive relationship between the two variables (the larger A, the larger B)
while a negative r value indicates a negative relationship (the larger A, the smaller B).
A correlation coefficient of zero indicates no relationship between the variables at all.
However, correlations are limited to linear relationships between variables. Even if the
correlation coefficient is zero, a non-linear relationship might exist.

Bivariate correlation and regression evaluate the degree of relationship between two
quantitative variables. Pearson correlation (r), the most commonly used bivariate correlation
technique, measures the association between two quantitative variables without distinction
between the independent and dependent variables (e.g., what is the relationship between SAT
scores and freshman college GPA?).

The Output of the Bivariate (Pearson) Correlation

The output is fairly simple and contains only a single table, the correlation matrix. The
bivariate correlation analysis computes the Pearson's correlation coefficient for a pair of
variables. If the analysis is conducted for more than two variables, it creates a
larger matrix accordingly. The matrix is symmetrical, since the correlation between A
and B is the same as between B and A. Also, the correlation between A and A is always 1.

In this example, Pearson's correlation coefficient is .645, which signifies a medium positive linear
correlation. The significance test has the null hypothesis that there is no positive or negative
correlation between the two variables in the universe (r = 0). The results show a very high
statistical significance of p < 0.001; thus we can reject the null hypothesis and assume that the
Reading and Writing test scores are positively, linearly associated in the general universe.
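The coefficient itself is straightforward to compute. A minimal pure-Python sketch (the scores here are invented for illustration; they are not the Reading and Writing data from the example above):

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

reading = [1, 2, 3, 4, 5]
writing = [2, 1, 4, 3, 5]
r = pearson_r(reading, writing)
print(round(r, 2))   # 0.8: a strong positive linear association in this toy data
```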
Data Analysis & Interpretation

Analysis involves the calculation of a correlation coefficient (i.e., a quantitative measure of a relationship).
Most common is a Pearson correlation coefficient (r), the correlation between two interval variables.
Numerous others exist for various combinations of variables;
however, all are interpreted in a similar manner and range from -1.00 to +1.00 (some range from 0.00 to +1.00).
General rule of thumb for interpretation considers:
Value of the coefficient
P-value (significance)
Sample size
Partial Correlations

This feature requires the Statistics Base option.

The Partial Correlations procedure computes partial correlation coefficients that describe
the linear relationship between two variables while controlling for the effects of one or
more additional variables. Correlations are measures of linear association. Two variables
can be perfectly related, but if the relationship is not linear, a correlation coefficient is not
an appropriate statistic for measuring their association.

Example. Is there a relationship between healthcare funding and disease rates?
Although you might expect any such relationship to be a negative one, a study reports a
significant positive correlation: as healthcare funding increases, disease rates appear to
increase. Controlling for the rate of visits to healthcare providers, however, virtually
eliminates the observed positive correlation. Healthcare funding and disease rates only
appear to be positively related because more people have access to healthcare when
funding increases, which leads to more diseases reported by doctors and hospitals.
Partial Correlation

The partial correlation is the same as the Pearson correlation except that it allows you to
control for, or remove, the influence of another variable.
The influencing variable is typically referred to as a confounding variable. By statistically
controlling for or removing the influence of the confounding variable, you can obtain a
clearer and more accurate indication of the relationship between your two variables of interest.

For example, let's say you wanted to evaluate the correlation between hours studied and
scores on a math test. There may be other factors that also influence test performance, like
IQ scores. So, to remove the influence of IQ scores, we would run a partial correlation
between hours studied and test score, controlling for IQ scores.
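For a single control variable, the partial correlation has a closed form built from the three pairwise Pearson correlations. A sketch, with made-up r values for the hours-studied (x), test-score (y), and IQ (z) example:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: hours-score, hours-IQ, score-IQ
r = partial_r(0.60, 0.50, 0.40)
print(round(r, 3))   # 0.504: the hours-score link weakens once IQ is controlled
```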
All correlations in the partial correlation are Pearson correlation coefficients (r). Just like
the bivariate correlation, the Pearson correlation coefficient (r) can only take on values
from -1 to +1. The sign indicates whether there is a positive correlation (as one variable
increases, the other variable also increases) or a negative relationship (as one variable
increases, the other variable decreases).

The size of the correlation coefficient (ignoring the sign) indicates the strength of the
relationship. The closer the correlation coefficient (r) gets to 1, either positive or negative,
the stronger the relationship. On the other hand, the closer the correlation coefficient gets to
0, the weaker the relationship.
Reliability analysis allows you to study the properties of measurement scales and the
items that compose the scales. The Reliability Analysis procedure calculates a number of
commonly used measures of scale reliability and also provides information about the
relationships between individual items in the scale. Intraclass correlation coefficients can
be used to compute inter-rater reliability estimates.

Example. Does my questionnaire measure customer satisfaction in a useful way? Using
reliability analysis, you can determine the extent to which the items in your questionnaire
are related to each other, you can get an overall index of the repeatability or internal
consistency of the scale as a whole, and you can identify problem items that should be
excluded from the scale.
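The most common internal-consistency index reported by this procedure is Cronbach's alpha, which compares the item variances to the variance of the total score. A pure-Python sketch with a tiny made-up questionnaire (4 respondents, 3 items):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's total
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# One row per questionnaire item, one column per respondent (hypothetical data)
items = [
    [2, 4, 3, 5],
    [3, 4, 3, 5],
    [3, 5, 4, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))   # 0.957: these three items hang together well
```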
Nominal.
A variable can be treated as nominal when its values represent categories with no intrinsic
ranking; for example, the department of the company in which an employee works. Examples of
nominal variables include region, zip code, or religious affiliation.
Ordinal.
A variable can be treated as ordinal when its values represent categories with some intrinsic
ranking; for example, levels of service satisfaction from highly dissatisfied to highly satisfied.
Examples of ordinal variables include attitude scores representing degree of satisfaction or
confidence and preference rating scores. For ordinal string variables, the alphabetic order of
string values is assumed to reflect the true order of the categories. For example, for a string
variable with the values low, medium, high, the order of the categories is interpreted as high,
low, medium, which is not the correct order. In general, it is more reliable to use numeric codes
to represent ordinal data.
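The alphabetic-order pitfall is easy to demonstrate; a short sketch:

```python
levels = ["low", "medium", "high"]

# Alphabetic order does not match the true order of the categories
print(sorted(levels))   # ['high', 'low', 'medium']

# Numeric codes preserve the intended ranking
codes = {"low": 1, "medium": 2, "high": 3}
print(sorted(levels, key=codes.get))   # ['low', 'medium', 'high']
```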
Scale.
A variable can be treated as scale when its values represent ordered categories with a
meaningful metric, so that distance comparisons between values are appropriate. Examples of
scale variables include age in years and income in thousands of dollars.