8/2/2019 Alison (2002) Research Methods & Statistics Handbook - Introduction to Spss
RESEARCH METHODS
&
STATISTICS HANDBOOK
First Term
Dr. Alison, Mr. Brent Snook
Table of Contents
Section I: Introduction
Section II: Practicals
Section III: Extra Material
Appendix: Basic Statistics
Timetable
SECTION I
INTRODUCTION
Course Instructors
The instructors for this year will be Brent Snook (Room 1.79), X, Y & Z. Our offices
are on the second floor of the Eleanor Rathbone Building.
Computing Systems
The University Computing Services Help Desk (Brownlow Hill phone extension
44567) has a full advice and backup service should you need any information and
help.
Computing Environments
Communication between computers and ourselves is mediated by operating systems
that allow us to access the various programmes and packages in the University. The most usual environment, as the systems are known, is Windows. This is controlled
mainly through pointing and clicking the mouse at various icons on the screen.
Another environment is UNIX, which is similar to MS DOS in that the commands are
typed rather than selected with the mouse.
The reason behind discussing these different environments is simply that the various
packages we will be using are stored in these environments.
Computers and Networks
Most computers act both as stand-alone machines, capable of independent use, and as networked machines, which rely on a central server. Generally speaking, we in
Psychology use networked machines for several reasons. Three main networks are
used to access the packages on the different environments: the PC Managed Network
Service, the NT Managed Network and the UNIX System.
The Three Networks
Access to the networks is gained by logging on with your user name and password.
You then have access to your own personal disk space (M: drive) at a central location
that only you can read. You have separate disk space for both Windows 2000 and UNIX, so you can have two separate passwords to increase security, though your user
name remains the same. At the end of a session, you must always log off.
Usually, when a computer is booted you have the option to go on Windows 2000.
Once on the Network, you are in the MS DOS environment and can then use
Windows or UNIX.
Computer Terminals
Virtually all computers upstairs in Psychology and in the Eleanor Rathbone Teaching
Centre (ERTC) are networked to Windows 2000. Also, on the first floor in Psychology is another suite of computers in the Eleanor Rathbone Data Centre
(ERDC). There is a printer in there which you should use in preference to the ones in
the department, though Computing Services doesn't look kindly on people sending
huge printouts to their printers.
INSTALLING APPLICATIONS
There are a number of applications you need to install onto your account. On your
screen you should have an icon labelled "MNTS Applications". Double click on this
icon. Now double click on "All" and you should get a screen full of icons. These are
all of the possible applications that you can install onto your account. Each
application is installed by simply double clicking on the application icon.
Install the following applications:
1. Mulberry (e-mail)
2. SPSS (version 10)
3. Stanford Graphics (on L:INVPSY)
4. Microsoft Office (Word, Excel)
5. WS_FTP
6. Netterm
7. BR Journey Planner
8. The various MDS packages (LIFA2000, UNIX SSA, MSA, POSA)
9. Geographic packages (Dragnet)
Within the limited timeframe, the purpose is to familiarise you with the software that
is available and to encourage you to start using it.
Registering on Windows 2000
Computing Services have all their documentation accessible through the World Wide
Web. You can print off any of the documents once you set up the appropriate printer.
The Computing Services handout will take you through all the basics of Windows
2000 including registering and changing your password.
To register on Windows 2000, you can go to any computer terminal. There should be
a Windows 2000 login screen. Type the word "register" in the username box and
follow the instructions.
Setting up the ERDC Printer
You have the capability to print on a local printer (a printer that is actually attached to
your machine) or a network printer (a printer which is attached to the network). Since we don't have enough printers for everyone it will be necessary to attach to a network
printer.
The printer which is probably most convenient is the one found in the Eleanor
Rathbone Data Centre. The network printer queue for this printer is erdc-Queue.
You are not restricted to just this printer but it is the closest and it is reserved for
postgraduate students studying in the Eleanor Rathbone Building.
To connect to one of the University's networked printers you need to do the
following:
From the Start menu choose Settings and Printers.
Double-click on Add Printer.
Highlight the option Network printer server and click on Next.
Double-click on Netware Network.
Double-click on Novell Directory Services.
Double-click on Liv.
Double-click on O=liv.
Scroll down the list of options until you see OU=PRINT-QUEUES, then double-click on this option.
From the list detailed in Figure 1, select the required printer queue and double-click on it (in this case, erdc-Queue).
Choose OK to install the required printer driver on your local machine.
From the list illustrated in Figure 1, select the required printer manufacturer and then select the printer from the list available (it should be an HP LaserJet 4Si/4Si MX PS). Click on OK.
If this is the first time you have installed this particular type of printer then you will be asked for the location of the files to install.
Replace the line D:\i386 (this might say A:\i386) in the box Copy files from with the path line V:\NT40\i386 and click on OK.
After a few seconds you will be asked if you wish to make this your default printer. Click on Yes.
Click on Next.
You will then receive a message that your printer has been successfully installed. Click on Finish.
The printer driver will now be installed and connected to the specified printer queue.
Remember, once you have connected and installed a network printer driver, you may
need to check the printer settings to ensure that the settings such as paper size and duplex printing are correct. For further information, please see Configuring the Printer
Settings (on the Computing Services web page).
Using UNIX
A lot of your time this year may be spent in UNIX. We will go into more detail about
this system in the section on UNIX. Three versions of the MDS procedures (SSA,
MSA & POSA) are on UNIX.
Double click the Netterm icon. Login with the same username as the PCMNS. Your password is listed on the form Computing Services sent you. You can
change the password by typing in "passwd".
The versions of the MDS packages on the mainframe have a number of advantages
over the two non-Windows PC versions. They are basically more powerful and
therefore more effective. A second feature is flexibility. In comparison to the
mainframe SSA, ShyeSSA (PC package) has only two choices of measures of
association - Pearson's and Guttman's Mu. While PAP offers the widest variety of
measures, the copy we have also tends to be the one that doesn't work. We'll go into
the PC packages in more detail at a later date. Running the mainframe packages is
fairly straightforward, but there are actually five parts to the whole process - preparing your data, uploading/downloading, using the UNIX system itself, using the
ned editor and running (in this case) the SSA.
The SSA package has an option for reading data as free fields (any space between
numbers indicates separate variables), so for UNIX SSA we can leave the data file as is.
For MSA and POSA, the fields must be fixed, so:
a) you don't want any spaces in your rows
b) you need to have each score for each variable start in the same column;
otherwise, when the MDS programs read your data file, they won't be reading the variables properly - you tell the computer where each case's score for each
variable is located by indicating the columns that variable occupies.
Both of these will become apparent when we get to running the SSA. A simple
example:
12 42 131213
13122 71111
In this case, the second variable actually takes up three columns, but, obviously, the
computer does not stick a 0 in front of a score like 42 - that's your job. The same
holds true for the third variable, which requires two columns as the scores are over 9.
The correct version of the two lines of data, with 0s added and spaces removed
should be:
12042131213
13122071111
Where columns 1-2 are variable 1, columns 3-5 are variable 2 and so on. After
correcting the data, you then save the file again, making sure that it is still being saved
in the generic ASCII format.
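If you would rather not pad the scores by hand, the zero-padding described above can be sketched in a few lines of Python. This is only an illustration and not part of the course software: the function names format_case and parse_case are made up for the example, and the column widths assume that the three variables after variable 2 take two columns each.

```python
def format_case(values, widths):
    """Return one fixed-field data line, each score zero-padded to its column width."""
    if len(values) != len(widths):
        raise ValueError("need exactly one width per variable")
    fields = []
    for value, width in zip(values, widths):
        text = str(value).zfill(width)   # e.g. 42 with width 3 becomes "042"
        if len(text) > width:
            raise ValueError(f"score {value} does not fit in {width} columns")
        fields.append(text)
    return "".join(fields)


def parse_case(line, widths):
    """Split a fixed-field line back into integer scores using the column widths."""
    scores, start = [], 0
    for width in widths:
        scores.append(int(line[start:start + width]))
        start += width
    return scores


# Assumed layout: variable 1 in columns 1-2, variable 2 in columns 3-5,
# and the remaining three variables in two columns each.
WIDTHS = [2, 3, 2, 2, 2]

print(format_case([12, 42, 13, 12, 13], WIDTHS))
print(format_case([13, 122, 7, 11, 11], WIDTHS))
print(parse_case("12042131213", WIDTHS))
```

The parse_case function mirrors what the MDS programs do when you tell them which columns each variable occupies, so it is a quick way to check that a line reads back as the scores you intended.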
Uploading/downloading files
The best way to upload (transfer files from the PC/NT Managed Network to the
mainframe) or download (vice versa) is to use the WS-FTP application (in the comms
window). This is a simple package to use.
1) When you first start it up, a window comes up asking for information about a host - this is the location you'll be accessing outside of the PC to transfer information.
Under host name type UNIX. Under host type, select the option UNIX
(standard). Under UserID put your user name. Under password enter your
UNIX password.
2) The format is simple. On the left-hand side is the local host, through which you can change between directories and drives. The right-hand side is your remote
host (UNIX account), which also has directories you can move through. At the
very top of each, your current drives/directories are listed. To transfer a file up,
you select it and hit the right arrow. To transfer down, you hit the left arrow. The only trick is to make sure you have it set up for the right receiving directory,
e.g. selecting the winword directory on your M: drive to receive the results file
from an MSA that you've run. So, locate the coding.dat data file and move it from
the M: drive to your UNIX account.
3) When you've finished, hit the exit button.
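For anyone curious what the transfer amounts to, the same upload can be sketched with Python's standard ftplib module. This is an assumption-laden illustration rather than a replacement for WS_FTP: the helper names upload_text_file and main are invented here, the host name UNIX is simply the one typed into WS_FTP above, and the user name and password are placeholders.

```python
from ftplib import FTP


def upload_text_file(ftp, local_path, remote_name):
    """Send a local file to the remote host in ASCII (text) mode."""
    # storlines() reads the file line by line, so it must be opened in binary mode.
    with open(local_path, "rb") as handle:
        return ftp.storlines(f"STOR {remote_name}", handle)


def main():
    # Placeholder host and credentials: substitute your own details.
    with FTP("UNIX") as ftp:
        ftp.login(user="your_user_name", passwd="your_unix_password")
        upload_text_file(ftp, "coding.dat", "coding.dat")
```

Calling main() would open the connection, log in and send coding.dat. Text mode is used because the data file is saved in plain ASCII format.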
The UNIX Operating System
Imagine that UNIX is set up like the file manager in Windows, but that you have to
type in commands rather than click the mouse to move up and down directories, copy
files, delete files and so on. When you first log in, the info on the left of the dollar
sign prompt indicates the user, the particular machine you're on (in brackets) and
what directory you are in - from left to right. Remember that UNIX is case sensitive -
an "F" is not the same thing as an "f". I find it very useful to start all directories with a
capital letter and files with a lower case one to separate them. The first thing to know
is how to find help. This is done through the man command. If you know the
particular command you want help for, just type man {command}. If you have an
idea as to what type of action you want the computer to carry out, but don't know the
specific command, type man -k {keyword} where the keyword is something related to
the command, e.g. password, to get a list of commands with something to do with
passwords.
Here is a short list of UNIX commands:
cp {directory/filename} {newfilename} - copies a file
rm {filename} - deletes a file
mv {filename} {directory} - moves a file/renames a file
ls - lists the contents of the current directory
cd {directory} - change directory (note that cd .. moves you up one directory)
mkdir {directoryname} - creates a directory
rmdir {directoryname} - removes a directory
ned {filename} - activates the ned editor
pine - e-mail editor
gopher - info source
tin - newsgroups reader
Note that * is a wildcard character, just like in Windows, for selecting multiple files
for commands. All of this information is available in more detail on the WWW.
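To see how the * wildcard and case sensitivity interact, here is a small Python sketch using the standard fnmatch module. It only imitates the matching rules and does not touch any real files; the helper name select and the directory listing are made up for the example.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a wildcard pattern, comparing case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing, used only for illustration.
listing = ["Results", "coding.dat", "Coding.dat", "notes.txt"]

print(select(listing, "*.dat"))   # every name ending in .dat
print(select(listing, "c*"))      # names starting with a lower-case c only
```

Because matching is case sensitive, a pattern starting with a lower-case c does not pick up Coding.dat, which is the same reason the capital-letter convention for directories keeps them apart from files.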
The ned editor
I mentioned this before (briefly), but I thought I'd go into this in more detail, as it's a
useful tool for editing files when you are on UNIX. All of these details are in the
document on the ned editor in the Computing Services on-line documentation, BTW.
Essentially, it's a crude word processor, where all the function keys have
various...functions...as do shift-function keys. None of this silly bolding or italics, no
sir. You can type, you can move your cursor around and you can find and replace.
Word processing for real men. Anyway, on to the lesson.
To start up the ned editor, you have to edit a file on the UNIX account. You do the
latter by typing ned {filename}, so choose any filename and open the ned editor.
A screen will come up with a brownish banner along the bottom listing the various
function key options. All the basic keys operate like in Word: arrow keys move the
cursor around, the home key moves to the start of the line, and end to the end. Page
up/down are also the same. As are insert/typeover, delete, backspace and so on.
Right, now type out everything from "I mentioned this..." on to right here.
Hit the F1 key. To get info on one of the displayed topics, move the cursor to it and hit
enter. For a function key, hit that key. Ctrl-G exits help.
Right, position your cursor at the start of the third line from the bottom, then hit the
F2 key - a new line. Now hit F9, which will delete the line you just created. Now hit
shift-F9, and the line comes back. Right, now hit the F4 key to mark the start location
for cutting/copying and pasting. Move the cursor somewhere on the next couple of
lines then hit F6. Move to the end of the document and hit enter a couple of times,
then hit F5. All the text between the marking point and the cursor is copied to the
new cursor location. The same process is carried out for cutting text, but you hit
shift-F6 instead of F6.
To insert text from another file, hit shift-F7. Ned will ask for the filename, the text of
which will be inserted wherever you've put the cursor. Shift-F4 saves the file, while
shift-F3 saves the file and exits. F3 quits without saving changes. However, the most
important feature of ned for data files that you've uploaded is the replace feature (F8).
With this, you can change 1s to 0s and so on. So, let's change the letter "e" to "i".
Move to the top left of the document. Hit F8, then type the letter "e" [DON'T hit enter now]. Hit F8 again, and type "i". Hit F8 one more time. You'll be prompted to
make a choice about the first "e". If you hit the "Y" key, it will change it; "N" will make
the computer jump to the next occurrence. A "!" will cause ned to make all possible changes. You'll find this handy for changing numbers prior to doing analyses. For
example, changing 0s to 1s and 1s to 2s.
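One thing to watch with a recode like this is the order of the replacements. The Python sketch below is an illustration only (the function names are invented and it is not how ned works internally). It compares a single pass that maps both digits at once with two replace passes run one after the other.

```python
# Map every "0" to "1" and every "1" to "2" in one simultaneous pass.
RECODE = str.maketrans({"0": "1", "1": "2"})


def recode(line):
    """Apply both replacements at once so recoded digits are not changed twice."""
    return line.translate(RECODE)


def recode_in_sequence(line):
    """Replace 0s first and 1s second, as two separate replace passes would."""
    return line.replace("0", "1").replace("1", "2")


print(recode("0110"))
print(recode_in_sequence("0110"))
```

In the two-pass version the original 0s become 1s and are then swept up by the second pass, so everything ends up as 2. If you do the recode as two separate replaces in an editor, changing the 1s to 2s first and the 0s to 1s second avoids this.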
SECTION II
PRACTICALS
WEEK 1: Thursday October 3rd
Introduction to SPSS
SPSS is the primary package for running any statistical procedures outside of the
MDS packages. In addition to providing outputs for various analyses, SPSS allows the user to manipulate the data in a variety of ways and to produce various graphs and
figures that can be added into documents.
In this practical, you will be asked to open and search through a data matrix, and enter
and code data. The procedure for the exercises in this practical involves going
through the steps for each analysis using the data file family.sav.
Where is Family.sav?
The first thing you must do is copy family.sav from the N: drive on your computer to
the M: drive (which is your own personal account). To do this you must create a folder on your M: drive into which the family.sav file will go. You should be
looking at a screen with a number of icons on it. In the top left-hand corner is an icon
called My Computer. Double-click on this icon.
Find the M: drive and double-click on it. You should now see a window containing a
number of folders. Go to FILE, then NEW and choose FOLDER. A new folder
should appear in the bottom of the window labelled "New Folder". Call your new
folder "Survey" and press ENTER. After you have done this, go to FILE and then
CLOSE.
Now, within the same window double-click on your N: drive. Within that drive you will see a folder with the title SPSSEGS (standing for SPSS example files). Double-click
on this folder. Within this folder there is a file labelled family.sav. This is the file
you want to copy into your Survey folder on your M: drive. So, single click on
family.sav and go to EDIT and then COPY.
Go back to your M: drive by closing the N: drive window (click on the X in the right-hand
corner of your N: drive window). Double-click on your M: drive and
double-click on the folder Survey. Survey should be empty. Go to EDIT and then PASTE.
Now you should see the file family.sav.
Exploring the Data Editor Window
Start SPSS for windows by double-clicking on the SPSS icon. Once the program has
been opened a window will appear in the middle of the screen with a number of
options to choose from. You want to select OPEN AN EXISTING DATA
SOURCE.
Go to the directory Survey in your M: drive. Find the file family.sav and double-
click on it. The values from the family.sav file should now appear in the Data Editor
window. Click on the middle button in the top right hand corner of the window to
maximise the size of the window. Once the file is open you will see two sheets at the bottom of the window. One is labelled DATA VIEW and the other is labelled
VARIABLE VIEW. You want to stay on the data view sheet. Click on the VALUE
LABELS (in bold rectangle below) button on your tool bar (it is 2nd from the right).
This will toggle the display between the coded values and their value labels (words). Scroll through
the data to answer the following questions:
1. What is the name of the last variable in the data matrix?
2. What is the case number of the last case?
3. What is the value of IDNUM for the last case?
4. What is Robert's date of birth?
5. What is Jack's marital status?
If you click on a cell when value labels are displayed in the DATA VIEW
WINDOW a scroll bar will appear to provide an indication of the options (value
labels) used in the coding framework. Using this feature, please answer the following
questions:
What are the labels for CAR?
What are they for MORTGATE?
What are they for NAME? Is there a problem with NAME? What is it?
The variable view sheet
In order to view how a variable has been defined in terms of its name, variable label,
value labels and user-missing values you have to click on the sheet VARIABLE VIEW.
Please answer the following questions. Do not forget to use the scroll bars on the
bottom and on the right side of the variable view window to find your answers.
What is the variable label for DATEBLT?
What are the values and value labels for MARSTAT? (hint: click on the grey box)
What is the user-missing value for NCARS?
Coding and Entering Data
Open up a new Data Editor window by going to FILE, then NEW and save DATA to
M: drive. Below is a questionnaire regarding leisure activity and a coding scheme.
Your task is to set up the Data Editor Window and then enter the data below.
Leisure Activity Questionnaire
1. What is your first name?
2. What is your sex? M = male, F = female
3. What is your marital status?
   1 = married     4 = widowed
   2 = cohabiting  5 = divorced
   3 = single      6 = separated
4. Do you watch sports? 1 = yes 2 = no 3 = do not know
5. Do you play sports? 1 = yes 2 = no 3 = do not know
6. Do you visit the seaside? 1 = yes 2 = no 3 = do not know
7. Do you go to films? 1 = yes 2 = no 3 = do not know
8. Do you go to pop concerts? 1 = yes 2 = no 3 = do not know
Coding Framework
Variable Name  Format   Variable Label        Coding Details/Labels
IDNUM          NUMERIC  IDENTITY NUMBER       Unique Number for Each Person
NAME           STRING   FIRST NAME            Enter First Characters of Name
SEX            STRING   SEX                   M = male  F = female
AGE            NUMERIC  AGE IN YEARS          Enter age in years (-9 = Missing)
MARSTAT        NUMERIC  MARITAL STATUS        1 = married     4 = widowed
                                              2 = cohabiting  5 = divorced
                                              3 = single      6 = separated
WATCHSP        NUMERIC  WATCHES SPORTS        1 = yes 2 = no 3 = do not know
PLAYSP         NUMERIC  PLAYS SPORTS          1 = yes 2 = no 3 = do not know
VISITSEA       NUMERIC  VISITS SEASIDE        1 = yes 2 = no 3 = do not know
GOTOFILM       NUMERIC  GOES TO FILMS         1 = yes 2 = no 3 = do not know
GOTOPOP        NUMERIC  GOES TO POP CONCERTS  1 = yes 2 = no 3 = do not know
Data
IDNUM  NAME      SEX  AGE  MARSTAT  WATCHSP  PLAYSP  VISITSEA  GOTOFILM  GOTOPOP
101    MARGARET  F    87   4
201    JACK      M    62   1        1        2       1         2         2
202    JOSIE     F         1        2        2       1         2         2
301    NANCY     F    60   5        1        2       1         2         2
503    VICTORIA  F    11   -9       2        1       1         1         3
1002   JOHN      M    31   2        1        3       1         1         1
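It can help to think of the coding framework as a lookup from codes to labels, which is what the value labels in SPSS provide. The Python sketch below is only an analogy for that idea: the names VALUE_LABELS and label_for are invented for the example, and only three of the variables are included to keep it short.

```python
MISSING = -9  # user-missing code from the coding framework

# Value labels for three of the variables, copied from the coding framework.
VALUE_LABELS = {
    "SEX": {"M": "male", "F": "female"},
    "MARSTAT": {1: "married", 2: "cohabiting", 3: "single",
                4: "widowed", 5: "divorced", 6: "separated"},
    "WATCHSP": {1: "yes", 2: "no", 3: "do not know"},
}


def label_for(variable, value):
    """Return the value label for a coded response, or None if it is missing."""
    if value == MISSING:
        return None
    return VALUE_LABELS[variable][value]


print(label_for("MARSTAT", 4))
print(label_for("SEX", "F"))
print(label_for("MARSTAT", -9))
```

The data matrix stores only the short codes; the labels are attached afterwards, which is why the variables have to be defined before the data are entered.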
You should have a clean window in front of you (i.e., there should not be any data in
the spreadsheet). You now have to set up each column of your data matrix so that you
can eventually enter in your data. The first column will hold IDNUM. To enter
IDNUM into the data view sheet you need to go to the VARIABLE VIEW window.
In fact, defining and labelling all of your variables must be done in your variable view
sheet.
In the first Row (horizontal) you can label and define your first variable IDNUM.
Using the coding framework above enter in the appropriate information. Type in the
variable IDNUM under NAME. The TYPE of variable is NUMERIC (you are entering a number) and under DECIMALS, using the scroll bar, choose 0 decimal
places. Under the heading LABELS you want to type in the definition of the
variable. Make sure this definition clearly defines the variable to avoid confusion.
Depending upon the type of data (i.e., nominal, ordinal, ratio, or interval) you are
measuring you may have to add VALUES. In the case of IDNUM (identity number)
each case has its own unique number, so there are no value labels to define. So,
under VALUES, you should have chosen none. However, in defining nominal data
such as SEX (your third variable to enter) you would have to define male as M and
female as F.
For IDNUM there are no missing values therefore you choose none. The heading
COLUMNS will give you the opportunity to define the width of your column.
Choose a width of 6. The ALIGN value allows you to determine the positioning of
your data in the cell. It may be right, left or centred. In the last column heading is
MEASURE. This column allows you to define the type of data you are working
with. With IDNUM you are working with scale data.
When you define variables such as NAME (i.e., the name of the subject), you want
the TYPE of variable to be STRING, the WIDTH should be 10 (refers to the number
of characters to appear in the name). Using the coding framework above define the
variable NAME.
When you define variables such as sex (nominal data) you want to add value labels in
the column called VALUES. If you click on the cell a value labels window will
appear. Across from value you should type your value M and across from the value
label type male and then click on add. Then you should enter F in the value box and
female in the value label box. Once you have made these changes you can move back
to the DATA VIEW window and view the changes.
Return to the VARIABLE VIEW window and define the numeric variable AGE in
the next row. It has no decimal places, and it requires a missing value of -9 to identify cases where a response is not given. To assign a user-missing value of -9 click on the
MISSING column. A missing values window will appear. Click on Discrete missing
values and enter -9 in the first box. Set up a variable label and a value label for -9 as
shown in the coding scheme for your questionnaire. Now, do the same for the
numeric variable MARSTAT in the next row. This too is numeric with no decimal
places, has a user-missing value of -9 and requires a variable label and several value
labels as shown in the coding scheme.
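The reason for declaring -9 as a user-missing value is that otherwise it would be treated as a real score. The short Python sketch below illustrates the difference; it is not SPSS output, the helper name valid_values is invented, and the list of ages is illustrative, with -9 standing in for a person who gave no answer.

```python
from statistics import mean

MISSING = -9  # user-missing code for AGE


def valid_values(values, missing=MISSING):
    """Drop the user-missing code so it is not treated as a real score."""
    return [value for value in values if value != missing]


ages = [87, 62, -9, 60, 11, 31]   # illustrative; -9 marks a non-response

print(mean(ages))                 # -9 wrongly counted as an age
print(mean(valid_values(ages)))   # mean of the five real ages only
```

Once the missing value has been defined in the MISSING column, SPSS leaves those cases out of the calculation in the same spirit as valid_values does here.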
The remaining 5 variables also need to be defined. To avoid defining each variable
separately you should define the first variable WATCHSP and then copy the cells to
the remaining four below. To do this go to the cell you want to repeat (i.e., the value
labels) and click on EDIT, COPY and then move to the cell where you want the same
definition and then go to EDIT and PASTE.
When you have finished entering all of the data save it into an SPSS file by selecting
FILE, SAVE and clicking on the folder Survey in your M: drive. Save the file under
any name you want (e.g., Person.sav). Exit from SPSS and log off.
WEEK 2: October 10th
Descriptive Statistics, Charts & Manipulating Data in the
Matrix
This practical is divided into two sections. The first section is intended to familiarise you with how to run commands to calculate descriptive statistics and to graph your data.
The second section aims to show you how to compute, re-code, filter and delete your
data.
Section I: Descriptive Statistics & Charts
We shall estimate descriptive statistics for the three variables: TYPACCM,
DATEBLT, & NADULTS.
Question: Are these variables nominal (non-ordered categories), ordinal (with ordered
categories) or metrical (on a measure scale with well-defined differences between
values)? Hint: The second variable is not so obvious.
To run the descriptive statistics click on ANALYZE, DESCRIPTIVE STATISTICS
and then FREQUENCIES. In the left box there should be a list of all the variables
that are present in the spreadsheet. Highlight TYPACCM and click the arrow between
the boxes to move it into the box labelled variables. Continue this for the other two
variables. A shorter route to move the variables to the variables box would be to
double-click on the variables when they are in the left box - removing the variables
may be accomplished in the same manner.
After the three variables are in the variables box, click on STATISTICS at the
bottom of the box. Within the Frequencies: Statistics box there are several options.
Tick the boxes for MEAN, MEDIAN & MODE on the right hand side. In addition,
tick the boxes for STANDARD DEVIATION (Std. Deviations) & RANGE. After,
click on the continue button and wait for the data to process and for the output
window to appear.
Answer the following questions:
What is the most useful measure of central tendency for each of the three variables?
What are the sample values?
What is the maximum value for NADULTS? Does this appear to be correct?
Now, try re-estimating the descriptive statistics for NADULTS, only this time without
the case with the unusual value. Select DATA and then SELECT CASES. Within Select Cases, make sure under Unselected Cases that the Filtered box is
ticked. Then select the IF CONDITION IS SATISFIED option and click on the IF
button. Move the variable NADULTS to the adjacent box by either double-clicking
on it or by clicking on the variable and moving it across using the arrow.
After the variable label use the calculator provided to type less than (
the unusual variable. After this hit continue and then OK to return to the spreadsheet.
Answer the following questions:
Has the case with the unusual value been barred off?
Which case is it?
Now, re-run the Frequencies command for NADULTS only and record the mean,
median & mode with and without the case included.
Which descriptive statistic is most affected by the unusual value?
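If you want to check your reasoning away from SPSS, the comparison can be sketched with Python's standard statistics module. The values below are hypothetical and are not taken from family.sav, and the function names summarise and select_cases are made up for the example.

```python
from statistics import mean, median, mode


def summarise(values):
    """Return the three measures of central tendency for a list of values."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


def select_cases(values, upper_limit):
    """Keep only values below upper_limit, like a 'less than' filter condition."""
    return [value for value in values if value < upper_limit]


nadults = [1, 2, 2, 2, 3, 22]   # hypothetical values with one unusual case

print(summarise(nadults))                     # unusual case included
print(summarise(select_cases(nadults, 10)))   # unusual case filtered out
```

Comparing the two printed summaries shows which of the measures shift when a single extreme case is left out.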
Graphing your Results
Histograms
Histograms are statistical diagrams that show the distribution of variables. In a
histogram, values are grouped together in intervals and a bar is drawn for each
interval whose area is proportional to the number of cases in the interval.
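The grouping step behind a histogram can be sketched in a few lines of Python. This is a rough illustration with hypothetical heights in centimetres, not the method SPSS uses to choose its intervals; the function name histogram_counts and the interval settings are assumptions made for the example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the values falling in each interval of equal width, beginning at start."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)   # which interval the value belongs to
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


heights = [152, 158, 161, 163, 165, 168, 171, 174, 183]   # hypothetical, in cm
counts = histogram_counts(heights, start=150, width=10, n_bins=4)

# Draw a rough text histogram: one star per case in each interval.
for i, count in enumerate(counts):
    low = 150 + 10 * i
    print(f"{low}-{low + 10}: " + "*" * count)
```

Because every interval here has the same width, the length of each bar is proportional to the number of cases in it, which is the equal-width case of the area rule described above.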
To generate a histogram select GRAPHS and HISTOGRAM.
Then move the variable HEIGHT into the variable box. In the same box, click the
display normal curve box and then hit OK.
Upon examining the output window that contains the graph answer the following
question:
Do you think HEIGHT has a normal distribution, or would you run other tests?
Go back to the data editor window, select GRAPHS and HISTOGRAM and run the
same command as done using the HEIGHT variable but with WEIGHT.
From the histogram, would you say that the variable WEIGHT has a normal
distribution or would you try other tests?
Are there any differences between the two histograms?
Scatter plots
Scatter plots show the joint behaviour of two (or more) variables in a diagram.
Values of one of the variables are plotted against values of another, the two variables
usually being metrical. A scatter plot usually shows much more about the behaviour
of the variables than descriptive statistics like correlation.
Scatter plots are also drawn using the GRAPHS command. Click on GRAPHS then
SCATTERPLOT then on the SIMPLE option and then click on the DEFINE button. Select WEIGHT for the Y-axis and HEIGHT for the X-axis. In a scatter plot,
if one of the variables is thought to depend on the other, it is plotted on the vertical Y-
axis. Here, we think that weight depends on height, therefore, weight is plotted on the
Y- axis.
In addition, select SEX for select markers by. This will allow you to identify points on the scatter plot by sex, as males and females tend to have different heights and
weights. Run the command and look at the scatter plot in the chart carousel window.
Can you see any difference between the males and the females in terms of heights and
weights?
To edit the chart simply double-click on it. Now we shall try fitting simple linear
regression lines to the data. Select CHART then OPTIONS and FIT LINES (Select
Subgroups) and FIT OPTIONS. Make sure linear regression has been highlighted
and then click on continue. There should be two different lines for males and
females.
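For reference, a simple linear regression line is the least-squares line through the points, and fitting one per subgroup just means repeating the calculation for each sex. The Python sketch below illustrates this with made-up height and weight pairs (not the family.sav data); the function names fit_line and fit_by_group are assumptions for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_group(rows):
    """rows holds (group, x, y) tuples; fit a separate line for each group."""
    groups = {}
    for group, x, y in rows:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return {group: fit_line(xs, ys) for group, (xs, ys) in groups.items()}


# Hypothetical (sex, height in cm, weight in kg) cases.
cases = [("M", 170, 65), ("M", 180, 75), ("M", 190, 85),
         ("F", 160, 50), ("F", 170, 55), ("F", 180, 60)]

for sex, (slope, intercept) in fit_by_group(cases).items():
    print(sex, "slope:", slope, "intercept:", intercept)
```

Here weight is treated as the dependent (Y) variable and height as the X variable, matching the way the scatter plot was set up.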
What can you say about the slopes of the two regression lines?
Can you see any difference now between the males and the females in terms of
heights and weights?
The markers used to distinguish males and females are drawn in different colours, but
the difference is not very clear. It will become less clear if you print out the scatter
plot on a monochrome printer! Click on any marker in the plot: all markers of that sex
become highlighted in black squares. Then click on the icon depicting a crayon/pencil to change the colour of the marker/symbol. To change the symbol
simply click on FORMAT and then MARKER. There you should have several
options of changing the type and size of the symbol. After making the chosen changes
hit Apply and Close.
Editing a High Resolution Chart
Generate a high-resolution chart, a histogram, to try out some of the editing features.
Histograms are used for metric or quantitative variables, like AGE, which takes on
values along a scale. There are generally too many distinct values to make it worth drawing a bar chart. Instead, the values are grouped into intervals or bands and a bar
is drawn for each interval. The area of each bar is proportional to the number of cases
with values in the interval.
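
The grouping of values into intervals can be sketched in a few lines of Python. This is only an illustration: the ages and the band width of 10 years are made up, the function name is hypothetical, and SPSS chooses its own intervals.

```python
from collections import Counter

def histogram_counts(values, width):
    """Count how many values fall in each interval [k*width, (k+1)*width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages, not taken from family.sav
ages = [3, 8, 15, 22, 27, 29, 34, 41, 45, 67]
bands = histogram_counts(ages, 10)

# One text "bar" per interval; with equal widths, bar length tracks the count
for lower, n in bands.items():
    print(f"{lower:2d}-{lower + 9:2d}: {'#' * n}")
```

Because every band here has the same width, the length of each bar is proportional to the number of cases in it, which is the same idea as the bar areas in the SPSS histogram.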
Still using family.sav select GRAPHS and then HISTOGRAM. Select HWRATIO
for the variable box and click OK. A histogram for HWRATIO is added to the Chart
Carousel Window. The histogram shows some descriptive statistics for the variable
too.
What are the sample mean and standard deviation for HWRATIO?
Double-click on the chart to move the histogram from the Chart Carousel Window to
a Chart Window. The menu bar and tool bar change to show editing facilities.
First, click on CHART then OPTIONS and NORMAL CURVE - then hit OK. The
normal curve superimposed over the histogram is the one for the above mean and
standard deviation. Admittedly, it's difficult to make a decision with such a small sample, but does the curve appear to be a good fit to the histogram?
Now, click on the Swap Axes icon. Does the histogram look better with vertical bars
or horizontal bars?
Now try some of the other icons and tools to change the chart. These changes require
the appropriate part of the chart to have been selected. Click on any bar. The bars will
become highlighted with small black squares at their corners. Then click on the Fill
Pattern tool button (the rectangle with diagonal shading). To apply a pattern, click
on it and then click on Apply. Once you have finished with the patterns, click on Close.
Also, try the Colour Palette tool button (the one with the pen) and the Bar Labels tool button (the one with the fingernails).
You can also change the style of the line showing the Normal curve, and the fill
pattern and colour of the background of the histogram. Once you have finished with
your work, select FILE and then SAVE CHART. Save your histogram as
artwork.chz
To copy or move a chart into Word, click on EDIT and then select COPY CHART.
To move to Word, minimise SPSS and open Word. If Word is already open, press
ALT & TAB to move between programs. Once in Word, select EDIT and then PASTE.
Finally, exit from SPSS for Windows by selecting FILE and then EXIT.
Section II: Manipulating the Data in the Matrix
(Computing, Recoding, Filtering and Deleting Data)
Computing Values
Start off SPSS and open the file family.sav (you should find this file on your M:
drive in the folder that you named survey). We shall use the COMPUTE command
to build up a new variable that will be labelled BMI, which stands for body mass
index. This is calculated as:
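$\text{Body mass index} = \dfrac{\text{weight (kg)}}{\text{height (m)}^{2}}$
Weight is recorded in pounds and height in inches, so both are converted to metric units (1 pound = 0.4536 kg, 1 inch = 0.0254 m) in the expression built up in the dialog box.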
Select TRANSFORM and then COMPUTE and set the Target Variable to bmi.
Click on Type & Label and enter the label body mass index in the label box. Click
Continue to return to the Compute Variable dialog box. Using the source list on the
left and the calculator pad in the centre, build up
Weight * 0.4536 / (height * 0.0254) **2
in the numeric expression box. Run the completed command. The new variable is
added to the end of the data. We shall check the new variable by estimating a few
descriptive statistics using FREQUENCIES (via Analyze, Descriptive Statistics).
(Analyze, Descriptive Statistics, Explore would be a better command, but Frequencies will do here).
Select ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES.
Move body mass index (bmi) to the Variable(s) box. Since bmi is a metric variable
with a potentially different value for every case in the data, suppress frequency tables
by clearing the DISPLAY FREQUENCY TABLES check box. You will now
get a message saying "You have turned off all output. Unless you request
Display Frequency Tables, Statistics or Charts, Frequencies will generate no output."
No worries: we will estimate descriptive statistics by clicking on STATISTICS and
ticking the check boxes for the following: MEAN, MEDIAN, MINIMUM and
MAXIMUM. Run the command and look at the output.
What are the sample values of the mean, median, minimum and maximum?
(The mean should be around 25.0. Any values outside the range 15.0 to 35.0 should
be queried).
Do the sample statistics satisfy these rough checks? If not, something is wrong!
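
For readers who like to see the arithmetic, the same calculation and rough check can be sketched in Python. The function names and the example person are made up for illustration; only the two conversion factors and the 15.0 to 35.0 range come from the text above.

```python
LB_TO_KG = 0.4536   # kilograms per pound
IN_TO_M = 0.0254    # metres per inch

def bmi(weight_lb, height_in):
    """Body mass index in kg/m^2 from weight in pounds and height in inches."""
    return weight_lb * LB_TO_KG / (height_in * IN_TO_M) ** 2

def needs_query(value, low=15.0, high=35.0):
    """Flag values outside the rough plausibility range used above."""
    return not (low <= value <= high)

example = bmi(154, 68)   # hypothetical person: 154 lb, 68 in
print(round(example, 1), needs_query(example))
```

If the conversion factors were left out, the result for the same person would be a tiny fraction rather than a value near 25, which is exactly the kind of error the rough check is meant to catch.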
Conditionally Computing Values
Now we shall use the IF sub-command (via Transform-Compute) to set up a new
variable. The sub-command allows you to set up a new variable under the condition
that the original variable, which it is based on, fulfils certain criteria. We want to set
up a new variable AGEHOH for the age of the head of the household. In other
words, if a person in the sample is head of the household, AGEHOH shall indicate
that person's age.
Select TRANSFORM and then COMPUTE and clear the previous settings by
clicking on RESET. Set the Target Variable to AGEHOH and click on TYPE &
LABEL to assign the label age head of household. Click on Continue, and then set the Numeric Expression to AGE. We want this (i.e., the current age in years) to be
applied when the case is head of household, which occurs when RELTOHOH is zero.
(For the variable RELTOHOH, relationship to head of household, the value 0
denotes that a person is head of household). Select IF and INCLUDE IF CASE
SATISFIES CONDITION. Set up the condition RELTOHOH = 0 in the large box
and run the command. The variable AGEHOH should now be added to the end of the
data. Have a look at the new variable. You should see ages set for some cases only.
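
The logic of a conditional compute can be sketched as follows. This is an illustration only: the cases are invented, the helper name compute_if is hypothetical, and None is used here as a stand-in for an empty cell.

```python
def compute_if(cases, target, source, condition):
    """Set case[target] = case[source] where condition(case) holds, else None."""
    for case in cases:
        case[target] = case[source] if condition(case) else None
    return cases

def is_head(case):
    """A case is head of household when reltohoh is 0."""
    return case["reltohoh"] == 0

# Hypothetical cases; the keys mirror the variable names in the text
cases = [
    {"age": 45, "reltohoh": 0},
    {"age": 43, "reltohoh": 1},
    {"age": 12, "reltohoh": 3},
]
compute_if(cases, "agehoh", "age", is_head)
print([case["agehoh"] for case in cases])
```

Only the first case satisfies the condition, so only that case receives an age in the new variable.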
Let's check AGEHOH by moving it in the data matrix to the column after
RELTOHOH so that we can see what happened more clearly.
First we must make a space in the data matrix by inserting a new variable. Find RELTOHOH by either scrolling through the DATA EDITOR window or by
selecting UTILITIES and VARIABLES, selecting RELTOHOH from the source
lists and then clicking on GO TO and CLOSE. Now click on any cell of the variable
that is immediately to the right of RELTOHOH (this variable should be sex). Then
select DATA and then INSERT VARIABLE. Alternatively, you can click on
INSERT VARIABLE tool (which is the sixth button from the right).
Now, a blank column headed var00001 containing system-missing values (dots) is
inserted before the selected variable. Move the AGEHOH to this column by single-
clicking on AGEHOH to highlight the column and then selecting EDIT and CUT.
To paste it in the desired location single-click on the head of the blank column
(var00001) and select EDIT and then PASTE.
Look at the values in the DATA EDITOR window.
Do all heads of household have AGEHOH set? If not, what might be the reason? (Hint: Look at the variable that AGEHOH is derived from!).
What value is set for cases who are not heads of household?
Re-coding Values
The RECODE command in SPSS is very powerful and efficient but it can be a little
tricky to set up due to the number of clicks required. We shall recode BMI into a new
variable BMIGRP, which takes the values
Value   Range                 Interpretation
1       bmi < 25.0            Okay
2       25.0 ≤ bmi < 30.0     Overweight
3       bmi ≥ 30.0            Obese
Select TRANSFORM and then RECODE and INTO DIFFERENT VARIABLES.
Select BMI from the source list into the central INPUT VARIABLE -> OUTPUT
VARIABLE box. Enter BMIGRP into the Name box and click on Change to
complete the INPUT VARIABLE -> OUTPUT VARIABLE box. Also enter a
suitable variable label for BMIGRP in the LABEL box (e.g., categorical body mass
index).
To set up the recoding, click on OLD and NEW VALUES. We build up the recode
specification for the third category of BMIGRP first. In the OLD VALUE box, select
RANGE and THROUGH HIGHEST and enter 30.0 in the box before THROUGH
HIGHEST. In the NEW VALUE section, enter 3 into the VALUE box. Then click
on ADD to copy the specification 30.0 THROUGH HIGHEST = 3 to the OLD ->
NEW box. Build up the other two specifications, in the order 25.0 THROUGH 30.0 = 2
and LOWEST THROUGH 25.0 = 1. The order matters because the ranges share their end points: a value is recoded by the first specification it matches, so a bmi of exactly 30.0 goes to group 3 and a bmi of exactly 25.0 goes to group 2. Now run the completed command.
To finish, double-click on BMIGRP in the Data Editor window, and define suitable
value labels (i.e., 1= okay, 2 = overweight, 3 = obese).
Are the values of BMIGRP correct for the first ten cases?
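
One way to check a few values by hand is to mimic the three recode specifications in a short Python sketch. The function name and the test values are hypothetical, and the first-match rule is written out explicitly as an assumption about how overlapping end points are resolved.

```python
def recode_bmi(bmi):
    """Map a BMI value to group 1, 2 or 3 using the first matching rule."""
    rules = [
        (30.0, float("inf"), 3),    # 30.0 THROUGH HIGHEST = 3
        (25.0, 30.0, 2),            # 25.0 THROUGH 30.0 = 2
        (float("-inf"), 25.0, 1),   # LOWEST THROUGH 25.0 = 1
    ]
    for low, high, group in rules:
        if low <= bmi <= high:      # both end points are included
            return group
    return None                     # nothing matched (for example NaN)

labels = {1: "okay", 2: "overweight", 3: "obese"}
for value in (22.4, 25.0, 29.9, 30.0):
    print(value, labels[recode_bmi(value)])
```

Listing the rules in a different order would send the boundary values 25.0 and 30.0 to the lower group instead, which would disagree with the table of ranges given above.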
Filtering Cases
In this example, we shall filter cases. The filtering option allows you to exclude
certain cases from further analysis temporarily.
Before filtering, generate a two-way frequency table for ownrent by typaccm by
selecting ANALYZE, then DESCRIPTIVE STATISTICS and then CROSSTABS
and selecting ownrent for Row(s) and typaccm for Column(s). Run the command and
look at the table in the output.
1. What exactly does the frequency count in the first cell of the second table refer to? 6 what?
We shall filter using the variable PERSNO, which is the number of persons in the
household.
2. What will be the effect of selecting cases satisfying the condition persno = 1? What is the impact on households?
Now, select DATA and SELECT CASES and then IF CONDITION IS
SATISFIED and make sure that UNSELECTED CASES are FILTERED (This is very important as the alternative is DELETED, which we want to avoid now!)
Select IF.. and build up the condition persno = 1 in the large box. Run the
completed command. Find persno in the data editor window.
3. What appears in the status bar when filtering is in effect? (The status bar is at the bottom of the window)
4. What has happened to case numbers with persno other than 1?
Rerun the CROSSTABS command (via Analyze, Descriptive Statistics) and look at
the new table in the output.
5. What exactly does the frequency count in the first cell refer to now? 3 what?
Go to the Data Editor Window and save the filtered data as familyf.sav. Then select
DATA, SELECT CASES and then ALL CASES. Run the command.
6. What happens to the status bar and the case numbers?
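
The difference between filtering cases and deleting them (covered in the next exercise) can be sketched in Python. The household sizes are invented and the function names are hypothetical; the point is only that filtering keeps every case and marks it, whereas deleting removes the unselected cases for good.

```python
# Hypothetical data: six cases with their household size
cases = [{"id": i, "persno": p} for i, p in enumerate([1, 3, 1, 2, 4, 1], start=1)]

def one_person(case):
    """The selection condition persno = 1."""
    return case["persno"] == 1

def apply_filter(cases, condition):
    """Filtering: keep every case but mark which ones take part in analyses."""
    return [dict(case, selected=condition(case)) for case in cases]

def delete_unselected(cases, condition):
    """Deleting: drop the cases that fail the condition altogether."""
    return [case for case in cases if condition(case)]

filtered = apply_filter(cases, one_person)
deleted = delete_unselected(cases, one_person)
print(len(filtered), sum(c["selected"] for c in filtered), len(deleted))
```

Turning the filter off simply means ignoring the selected flag again, while the deleted cases can only be recovered by re-opening the original file.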
Deleting Cases
Instead of filtering cases we shall delete unselected cases without doing any harm to
data stored in disk system files. Select DATA, SELECT CASES, IF CONDITION
IS SATISFIED which picks up the previous condition on persno = 1. Then select
UNSELECTED CASES are DELETED. Run the command and have a look at the Data Editor Window.
1. How many cases are left?
2. What are the values of PERSNO?
3. What are the values of HSEMO? What does that successfully show?
Now, rerun the CROSSTABS command in the previous section and look at the
output.
4. Do the results agree with those obtained when cases are filtered?
Return to the Data Editor Window and save the selected cases to a NEW system file
named familyd.sav (after deleting cases you should do this as soon as possible to
avoid overwriting your complete data file by accident).
Finally, re-open familyf.sav, the filtered file you saved in the previous section.
5. Is filtering still on?
Exit from SPSS, saving the contents of the output window into output3.spo
Open up family.sav that you saved to your survey folder.
WEEK 3: October 17th
T-Tests
Section I: Parametric T-tests (related & unrelated)
This practical will show you how to run a t-test so that you can look at the difference
between the means of two sets of scores.
Experimental designs can be of two basic types: within subject (dependent or
related) and between subject (independent or unrelated). The former is when all
subjects are subjected to all conditions (e.g., testing reaction times before and after
receiving a drug). Between subject designs are when you divide subjects into
independent groups, such as on the basis of gender, or into one group that receives a
drug, and a second that receives a placebo.
DEPENDENT OR RELATED SAMPLES T-TEST
First, a quick review of the test layouts.
1. Related Samples - two variables, one for each condition of the experiment. Each subject has two scores, as a result:
Variable 1: first set of scores for the subjects (e.g. reaction time before taking the drug)
Variable 2: second set of scores for the subjects (e.g. reaction time after taking the drug)

Sub. No.   Variable 1   Variable 2
1          10           30
2          11           31
3          12           32
4          10           30
5           9           29
2. Independent or Unrelated Samples - two variables, the first tells SPSS what condition EACH subject belongs to, the second is the actual score for that subject:
Variable 1: what condition each subject belongs to (e.g. group 1 are the controls, group 2 receive the drug)
Variable 2: actual score (e.g. each subject's reaction time)

Sub. No.             Variable 1   Variable 2
1 (control)          1            subject 1 score
2 (control)          1            subject 2 score
3 (experimental)     2            etc.
4 (experimental)     2            etc.
T-Test for Related Sample
This is the parametric comparison of two related groups, for example, when you want
to compare mean scores for subjects at some task before and after taking a drug. Each
set of subject scores for the related t-test must be entered as an individual variable in SPSS. So, in the above example, all the individuals' scores for the task before taking
the drug would be in one column and all the scores after taking the drug in another.
First, open family.sav. The next step is to add a variable to the data file, so that we can
run the related t-test. In this case, the comparison will be between the subjects'
height/weight ratio before they were put on a 4-week diet/exercise plan and after. The
variable already in the data set, HWRATIO, is the measure before. At the end of the
data file, add the variable HWRATIO2 to represent their measurements after the plan.
Using what you learned in the first lesson about entering data, create the new variable
using the information below:
Variable Name: HWRATIO2
Variable Label: Height/Weight Ratio after plan
Data: see table 1 below
To run the procedure, go to ANALYZE, COMPARE MEANS and then
PAIRED-SAMPLES T-TEST.
The usual dialogue box appears. The dialogue box has the two-column format. The
only difference is that you must select pairs of variables and move them across, rather
than just one variable at a time. To do this, you have to click on one variable, then
locate the other variable and click on it. The two variables that you have requested
should appear in the current selection box. After clicking on both, you then press the
arrow button to move the pair across. SPSS will analyse each pair to determine if their
means are significantly different statistically. In this case, select the variables
HWRATIO and HWRATIO2 and move them across, then press the OK button.
Table 1: Data for Height/Weight Ratio after a 4-week diet/exercise plan

Subject Number   HWRATIO2 score
1                .44
2                .52
3                .46
4                .
5                .44
6                .42
7                .33
8                .74
9                .80
10               .32
11               .60
12               .65
13               .40
14               .50
15               .57
16               .41
17               .60
18               .55
19               .49
20               .60
OUTPUT
The results appear in three sections:
1. The first section gives you a table called Paired Samples Statistics with the mean scores, standard deviations and standard error of the mean for the two variables.
2. The second section is a table called Paired Samples Correlations showing the correlation between the two variables and the level of significance.
3. The third section is more important. The table called Paired Samples Test
indicates the significance of the results. This includes the t-value, degrees of
freedom (d.f.) and the two-tailed significance level.
What is the t-value for the comparison between the height to weight ratio scores?
Is there a significant difference between the scores before and after the diet/exercise
plan? If so, which is the greater height/weight ratio?
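
To see where the t-value and degrees of freedom come from, the calculation can be sketched in Python using only the standard library. The scores below are hypothetical (they are not the HWRATIO data), the function name is made up, and the sketch assumes there are no missing values. It takes the difference as after minus before, so the sign of t may be the reverse of what SPSS prints for the same pair.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a related-samples t-test on two equal-length lists."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1

# Hypothetical scores for five subjects
before = [10, 11, 12, 10, 9]
after = [12, 14, 13, 13, 10]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

With 4 degrees of freedom the two-tailed critical value at the 0.05 level is 2.776, so a t of this size would count as significant. SPSS saves you the table look-up by printing the significance level directly.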
T-Test for Independent Samples
This is the parametric t-test for two independent samples - a between-subjects design
where, for example, subjects are randomly assigned to two separate test conditions
(e.g. drug and control) and the mean scores (e.g. reaction time) are compared to
determine if they are significantly different from each other.
In this case, you want to test whether there is a statistical difference in weight to
height ratios between the male and female subjects. The format for variables to be
used in the independent t-test is different from that used in the related. Instead of the scores being placed in two separate columns (variables), all of the scores are placed in
a single column (variable). A second variable identifies for SPSS which of the two
groups each score belongs to. So, in this case, there is the variable HWRATIO2 as the
dependent variable and NSEX as the independent variable.
To run the analysis, go to ANALYZE, COMPARE MEANS and then
INDEPENDENT-SAMPLES T-TEST. As usual, the left column lists all the
variables in your data file. On the right, there are two boxes:
The test variable(s) box is where you move the dependent variable(s). (e.g.,
HWRATIO2)
The grouping variable box is where you move the variable that distinguishes between the two independent groups (e.g. the variable NSEX).
First, select the dependent variable HWRATIO2 and move it over to the test
variable(s) section. Next move NSEX over into the grouping variable section and
press the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable sex, where only two levels are
recorded, you would just enter "1" in the top box for male subjects, and "2" in the
lower one for female subjects. Hit the CONTINUE button, then hit the OK button.
[Note: There may be times where you have a larger range of values, such as five
different education levels, but only want to look at the difference between two of
them. You would enter the two values you wish to compare.]
OUTPUT
There are two sections:
The first section of the output gives you a table called Group Statistics, which indicates the number of cases and the mean scores etc. for each condition.
The second section provides a table called Independent Samples T-test and starts with Levene's Test for Equality of Variance. If Levene's test is
significant, the variances are unequal, and when you look at the results of the t-test
in the final table, you use the line starting with Equal variances not assumed.
If it isn't significant, you look at the line starting with Equal variances assumed.
The final table gives you t-values, degrees of freedom and the two-tailed
significance levels.
In this case, Levene's is not significant (0.137), so we look at the equal variance line.
In this case, it is not significant (two-tailed significance of .478), so we cannot reject
the null hypothesis of no difference between males and females in their height to
weight ratios.
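
The figures on the Equal variances assumed line can be reproduced by hand. The sketch below uses invented scores laid out as described above (all scores in one column plus a grouping variable); the function name is hypothetical and it covers only the pooled-variance version of the test, not the Equal variances not assumed line.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return (t, df) for an unrelated-samples t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, n1 + n2 - 2

# Hypothetical data: one column of scores and one grouping variable
scores = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]
group = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

group1 = [s for s, g in zip(scores, group) if g == 1]
group2 = [s for s, g in zip(scores, group) if g == 2]
t, df = independent_t(group1, group2)
print(f"t = {t:.3f}, df = {df}")
```

Splitting the single column by the grouping variable is what the Define Groups step does for you in SPSS.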
Section II: Non-Parametric T-tests (Wilcoxon - related & Mann-
Whitney - unrelated)
All of the tests today can be found under ANALYZE, NONPARAMETRIC TESTS
Mann-Whitney - Unrelated
This is the non-parametric t-test for two independent samples - a between-subjects
design. To run the analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and
2 INDEPENDENT SAMPLES
As usual, the left column lists all the variables in your data file. On the right, there are
two boxes:
The test variable(s) box is where you move the dependent variable(s).
The grouping variable box is where you move the variable that distinguishes
between the two independent groups (e.g. the variable sex).
So, move HWRATIO2 into the test variable box, and move NSEX into the grouping variable box. Now, click the Define Groups button. Values from the grouping variable
must be entered into the two boxes. In the case of the variable NSEX, you enter "1" in
the top box for male subjects, and "2" in the lower one for female subjects. Hit the
Continue button, then hit the OK button.
OUTPUT
SPSS divides the entire set of subjects into three groups:
those with a score of 1 (male)
those with a score of 2 (female)
cases with missing data (which are excluded from the analysis)
The first section gives the mean ranks for the two conditions that are included, as well
as the sums of the ranks and the numbers of cases.
The second section gives the Z score and p-value for the test.
Is there a difference between males and females? How do the results from this week
compare to last week's?
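
The mean ranks and rank sums in the first section of the output come from ranking all the scores together, regardless of group. A minimal Python sketch of that step is given below; the ratios are invented, the function names are hypothetical, and the sketch stops at the U statistics rather than going on to the Z score.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U1, U2, mean rank of group 1, mean rank of group 2)."""
    ranks = average_ranks(group1 + group2)
    n1, n2 = len(group1), len(group2)
    r1 = sum(ranks[:n1])                 # rank sum for group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2, r1 / n1, (sum(ranks) - r1) / n2

# Hypothetical ratios for four males and four females
males = [0.40, 0.45, 0.50, 0.52]
females = [0.45, 0.55, 0.60, 0.65]
u1, u2, mean_rank_m, mean_rank_f = mann_whitney_u(males, females)
print(u1, u2, mean_rank_m, mean_rank_f)
```

The smaller of the two U values is the one usually reported, and the further apart the two mean ranks are, the smaller it becomes.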
Wilcoxon - Related
This is the non-parametric repeated measures T-test, in a within subjects design. Like
the parametric equivalent, we'll be running a comparison of height to weight ratios for
the sample population before and after a four-week exercise/diet program. To run the
analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and 2 RELATED
SAMPLES.
The dialogue box has the two-column format. The only difference is that you must select pairs of variables and move them across. SPSS will analyse each pair to
determine if their mean ranks are significantly different statistically. For this analysis,
select the two variables HWRATIO and HWRATIO2, then click the OK button.
OUTPUT
The output for this procedure is quite different from the parametric test. The first
section gives you information about how many rank scores for one condition are
less than (LT)
greater than (GT)
equal to (EQ)
the rank scores for the other condition. The mean ranks for each of these three levels
are given, as well as the sums of the ranks for each and the number of cases that fall
under each level.
The main results are underneath this table, where the Z value and the p value are given. The usual standard for levels of significance is used (the result is significant if p is less than 0.05).
How many cases are there where HWRATIO is greater than HWRATIO2?
Is there a significant difference between ranked height/weight ratios before and after
the exercise/diet program?
WEEK 4: October 24th
ANOVAS
This practical will involve familiarising students with the analysis of variance
(ANOVA). The ANOVAs in this practical are used when you want to determine whether there is a significant difference between three or more groups on a
single variable.
One-way ANOVA for Independent Samples
In this case, we want to determine if there is a significant difference in the height to
weight ratio between the three age groups in the sample in family.sav - children,
adults and elderly. We also want to carry out a Tukey's post-hoc test to identify where
those differences lie, if any. The procedure is remarkably similar to carrying out an
unrelated samples t-test. Go to: ANALYZE, COMPARE MEANS, ONE-WAY ANOVA.
As you can see, the layout of the dialogue box is basically the same as the one for
unrelated t-tests from last week. First select your Dependent variable(s) - in this case
move the variable HWRATIO into the dependent list section. Your factor
(independent variable) is the variable AGEGRP. Press the Continue button.
Before running the analysis, press the Post-hoc button and turn on Tukey's test.
Now press the Continue and Ok buttons and the analysis will be carried out.
OUTPUT
There are two sections to the results for the one-way ANOVA.
1. The first section indicates whether any significant differences exist between the different levels of the independent variable. The between groups and within groups
sums of squares are listed, along with the degrees of freedom, the F-ratio and the F-probability
score (significance level). It is this last part that indicates significance. If the
F-prob. is less than 0.05 then a significant difference exists. In this case, the F-prob.
is 0.000, so we can say that there is a statistically significant difference in height
to weight ratios between the three age groups.
2. The post-hoc test identifies where exactly those differences lie. The final part of the second section is a small table with the levels of the independent variable listed
down the side. Looking at the comparisons between these levels we see that
children have a significantly higher mean height to weight ratio than adults and
the elderly (this is also indicated by the asterisks).
For the meantime, ignore the third table of the output.
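
The sums of squares, degrees of freedom and F-ratio in the first table are related in a simple way, which the following Python sketch makes explicit. The three groups of scores are invented and the function name is hypothetical; the sketch stops at the F-ratio and does not compute the F-probability or the Tukey comparisons.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score lies from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three groups of three subjects
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A large F-ratio means the group means differ by much more than the spread of scores inside each group would lead you to expect.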
One-way ANOVA for Related Samples
The procedure for running this is very different from anything you've done before.
The first step is easy enough - you need to add a third height to weight ratio variable,
representing the ratios for the subjects some time after they stopped doing the exercise/diet plan. The data is below:
Variable Name: HWRATIO3
Variable Label: Height/Weight Ratio post-plan
Data: see table below
Subject Number   HWRATIO3 score
1                .42
2                .56
3                .42
4                .
5                .41
6                .40
7                .30
8                .78
9                .71
10               .30
11               .55
12               .64
13               .40
14               .49
15               .55
16               .39
17               .52
18               .54
19               .49
20               .60
The first step is to run a single factor ANOVA by going to: ANALYZE, GENERAL
LINEAR MODEL, REPEATED MEASURES.
The dialogue box is different from the usual format. The first step is to give a name to the factor being analysed, basically the thing the three variables have in common. All
three variables cover height to weight ratios, so
in the Within-Subject Factor Name box, type RATIO
in the Number of Levels box, type 3 (representing the three variables)
press the Add button, then the Define button
The next dialogue box is a bit more familiar. In the right-hand column, there are three
question marks with a number beside each. Select each of the three variables to be
included in the analysis, and move them across with the arrow button. Notice how
each of the variables replaces one of the question marks, indicating to SPSS which
three variables represent the three levels of the factor RATIO. Then proceed by
clicking on OK.
OUTPUT
Firstly, you can ignore the sections of the output titled Multivariate Tests and Mauchly's Test of Sphericity.
You need to examine the section titled Tests of Within-Subjects Effects. This
section indicates whether any significant differences exist between the different levels
of the within subjects variable. The degrees of freedom and sums of squares are listed,
as well as the F-score and its significance level. If the significance level is less than
0.05 then a significant difference exists. In this case, it is 0.001 (look at the measure for
sphericity assumed), so we can say that there is a statistically significant difference in
height to weight ratios between the three times when measurements were taken.
You can ignore the section titled Tests of Between-Subjects Effects. It is irrelevant here.
To do a post-hoc test to identify where the differences lie, the SPSS for Windows
made easy manual recommends doing Paired-Sample T-tests. In this case
HWRATIO & HWRATIO2
HWRATIO & HWRATIO3
HWRATIO2 & HWRATIO3
From these three T-tests, you can determine which of the height to weight ratios are
significantly different from each other.
Kruskal-Wallis ANOVA (KWANOVA - Unrelated)
This is the non-parametric equivalent of the independent ANOVA, where ranks are used
instead of the actual scores. We will run the analysis on the same variables, so go to
ANALYZE, NONPARAMETRIC TESTS, and K INDEPENDENT SAMPLES.
As with the parametric test, move HWRATIO over to the test (dependent) variable list and AGEGRP over to the Grouping (independent) variable list, and define the group
with a minimum of 1 and a maximum of 3. Click the OK button. Notice that the
non-parametric ANOVA doesn't have a post-hoc test. If you run this ANOVA, you'll
have to consult a statistics book as to how to do a post hoc on the results. One way
would be to run a series of t-tests on all the combinations of the conditions.
OUTPUT
The first section gives you the mean ranks and the number of cases for each level of
the independent variable. The second section lists the Chi-Square value, degrees of
freedom and significance of the test.
Is there a significant difference between the three groups (remember you can't say
exactly what that difference is without a post hoc test)?
Friedman's - Related ANOVAs
This is the non-parametric equivalent of the related samples ANOVA, where ranks are used
instead of the actual scores. We will run the analysis on the same variables, so go to
ANALYZE, NONPARAMETRIC TESTS, and K RELATED SAMPLES.
This is much easier to run - just move the three variables (HWRATIO, HWRATIO2
and HWRATIO3) over to the right column and click OK.
OUTPUT
There is the Chi-square score, the d.f. and whether it's significant (as usual, has to be
less than 0.05). Again, for post-hoc tests, you'll probably have to consult a statistics
book or possibly run three non-parametric related samples T-tests.
WEEK 5: October 30th
Study Week
WEEK 6: November 6th
No Practical
WEEK 7: November 14th
QUALITATIVE RESEARCH: STUDENT SEMINAR
PRESENTATION PREPARATION
Students should use this time to prepare work for their presentations. Dr. Alison will be
available in his office for guidance if necessary.
WEEK 8: November 21st
QUALITATIVE RESEARCH: STUDENT SEMINAR
WEEK 9: November 28th
INTERVIEWING AND DISCOURSE ANALYSIS
Conducting interviews etc.
This period should be used to conduct interviews in preparation for the session on
content analysis. Students are expected to conduct interviews or sessions that result in
naturally occurring language. It is important that this material is transcribed in
preparation for week 11's session. Dr. Alison will be available for consultation.
WEEK 10: December 5th
WORKING WITH NATURALLY OCCURRING
LANGUAGE
PREPARATION
Students will use this period to work with their material gathered in the previous
sessions. They should use this time to prepare for presentations in the final practical
session (12th December).
WEEK 11: December 12th
WORKING WITH NATURALLY OCCURRING
LANGUAGE: STUDENT SEMINAR
Students are expected to organise their own seminar presentations in this session on
the results and methods employed regarding the content analysis of their material.
SECTION III
EXTRA MATERIAL
For the benefit of students who wish to follow up other procedures in their own time,
we have included the following section which gives you some opportunity to play
with graphics packages and explore some issues associated with regression in
preparation for next term. Try not to worry if this all sounds unfamiliar at first. This
section is simply to give you a running start when it comes to your work after
Christmas.
REGRESSION
Simple Regression
In simple regression, the values of one variable (the dependent variable (y in this
case)) are estimated from those of another (the independent variable (x in this case))
by a linear (straight line) equation of the general form:
$\hat{y} = b_0 + b_1 x$
where $\hat{y}$ is the estimated value of y, $b_1$ is the slope (known as the regression
coefficient)
and $b_0$ is the intercept (known as the regression constant).
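
As a rough illustration of how the slope and intercept are obtained, the usual least-squares estimates can be computed by hand in Python. The data points are invented and the function name is hypothetical; SPSS reports the same two quantities in its coefficients table.

```python
import statistics

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line: estimate = b0 + b1 * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x      # intercept (regression constant)
    return b0, b1

# Hypothetical data lying exactly on a straight line
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_line(x, y)
print(f"estimate = {b0:.1f} + {b1:.1f} * x")
```

Real data will not fall exactly on a line as these points do, which is why the estimates need a measure of accuracy.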
Multiple Regression
In multiple regression the values of one variable (the dependent variable, y) are
estimated from those of two or more variables (the independent variables x1,
x2, ..., xn). This is achieved by the construction of a linear equation of the general
form:
y' = b0 + b1(x1) + b2(x2) + ... + bn(xn)
where the parameters b1, b2, ..., bn are the partial regression coefficients and the
intercept b0 is the regression constant.
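For readers curious about how the constant and the partial regression coefficients can be computed, here is a minimal sketch in Python. It solves the least-squares normal equations by Gauss-Jordan elimination, which is one standard textbook route and not necessarily how SPSS does it internally. The function name fit_multiple_regression and the data are hypothetical; the y values were generated from known coefficients (1, 2 and 3) so that the result can be checked.

```python
def fit_multiple_regression(rows, y):
    """Return [constant, coef_1, ..., coef_n] minimising the squared error."""
    # Design matrix: a leading 1 for the constant, then the predictors.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented normal equations (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: two predictors, y built exactly from 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

With real data the fit will not be exact, and the leftover discrepancies are the residuals.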
Residuals
When a regression equation is used to estimate the values of a variable (y) from those
of one or more independent variables (x), the estimates (y') will not be totally
accurate (i.e., the data points will not fall precisely on the straight line). The
discrepancies between y (the actual values) and y' (the estimated values) are known
as residuals and are used as a measure of accuracy of the estimates and of the extent
to which the regression model gives a good account of the data in question.
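The definition of a residual can be sketched in a few lines of Python. The three data points are invented, and the slope and intercept used are the least-squares values for those points; the function name residuals is an assumption for this example.

```python
def residuals(x, y, intercept, slope):
    """Return actual value minus estimated value for each case."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data, invented for this example only.
x = [1, 2, 3]
y = [2, 4, 7]
slope = 2.5
intercept = 13 / 3 - slope * 2  # mean of y minus slope times mean of x
res = residuals(x, y, intercept, slope)
sum_of_squares = sum(r ** 2 for r in res)
print([round(r, 4) for r in res], round(sum_of_squares, 4))
```

For a least-squares line the residuals sum to zero, so it is the sum of squared residuals (small here) that indicates how closely the line follows the data.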
The multiple correlation coefficient
One measure of the efficacy of regression for the prediction of y is the Pearson
correlation between the true values of the target variable y and the estimates y'
obtained by substituting the corresponding values of x into the regression equation.
The correlation between y and y' is known as the multiple correlation coefficient, R
(as opposed to r, which is Pearson's correlation between the target variable and any
one independent variable). In simple regression R takes the absolute value of r between
the target variable and the independent variable (so if r = -0.87 then R = 0.87).
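The relationship between R and r in simple regression can be checked numerically. The sketch below uses invented data with a negative slope; the slope and intercept are the least-squares values for these four points, and the helper name pearson is an assumption for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Hypothetical data, invented for this example only.
x = [1, 2, 3, 4]
y = [8, 6, 5, 1]
slope = -2.2       # least-squares slope for these points
intercept = 10.5   # least-squares intercept for these points
estimates = [intercept + slope * xi for xi in x]

r = pearson(x, y)          # correlation between the two variables (negative)
R = pearson(y, estimates)  # correlation between actual and estimated values
print(round(r, 4), round(R, 4))
```

Because the estimates are just a rescaled copy of x, R has the same size as r but is never negative.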
Running Simple Regression
Using the family.sav file we want to look at how accurately we can estimate height to
weight ratios (HWRATIO) using the subject's age (AGE). To run a simple
regression, choose ANALYSE, REGRESSION and LINEAR.
As usual, the left column lists all the variables in your data file. There are two sections
for variables on the right. The Dependent box is where you move the dependent
variable. Move HWRATIO there. The Independent(s) box is where you move AGE.
Next click the STATISTICS button, and turn on the Descriptive option.
As already stated, a residual is the difference between the actual value of the
dependent variable and its predicted value using the regression equation. Analysis
of the residuals gives a measure of how good the prediction is and whether there
are any cases that should be considered outliers and therefore dropped from the
analysis. Click on Case-wise diagnostics to obtain a listing of any exceptionally
large residuals.
Now click on CONTINUE.
Now click on the PLOTS button. Since systematic patterns between the predicted
values and the residuals can indicate possible violations of the assumption of
linearity, you should plot the standardised residuals against the standardised
predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into
the X: box and then click CONTINUE.
Now click OK.
Output
The first thing to consider is whether your data contains any outliers. There are no
outliers in this data. If there were, this would be indicated in a table labelled
Casewise Diagnostics, and the cases that corresponded to these outliers would have
to be removed from your data file using the filter option you learned previously.
With that out of the way, the first table (Descriptive Statistics) to look at is right at the
top. The first part gives the means and standard deviations for the two variables (e.g.
the mean age is 31.77). The next table contains the correlation (Pearson's) for the two
variables, just as if you had run the correlation procedure. The coefficient is -0.57, so
it is fairly high and is negative (as one goes up, the other decreases).
For the time being, ignore the table labelled Variables Entered/Removed.
The next important table is Model Summary. The R and R-squared values are
given for the equation (0.571, as above, and 0.325). Don't worry too much about the
other values in this table.
The next table contains the regression ANOVA. This test indicates how good the
model is - whether there is some overall relationship between the dependent and
independent variable(s). The key element is the F score. For this regression, the F
score has an associated p value of 0.017, well below the .05 cut-off. This indicates
that there is a linear relationship. It should be noted however that only an
examination of the scatter plot of the variables can confirm that the relationship
between two variables is linear.
The next table contains some really important information! The table is labelled
Coefficients and contains the regression equation. The regression coefficient and
constant are given in column B of the table. The equation therefore is:
Predicted height to weight ratio = -.00368(AGE) + .602
The t value indicates whether each independent variable has a significant individual
impact on the regression equation. In simple regression, there is only one independent
variable, and, for this one, it has a significant influence (a t score with an associated p
value of 0.0168 - notice it's the same as the ANOVA score).
The next section begins with Residual Statistics. This gives means, SDs and other
information about the unstandardised and standardised predicted and residual scores in
the regression.
You could follow up the regression by producing a scatter plot. Look at your scatter
plot. Basically, all you need to know is that if the plot shows no obvious pattern then
this confirms that the assumptions of linearity and homogeneity of variance have been
met. Where you get into trouble is if the points form a crescent or funnel shape. If
this is the case, further screening of your data is necessary.
Multiple Regression
Often, it is too simplistic to assume that a single independent variable is all that is
required to make some sort of prediction about the scores for a dependent variable.
This is where you have to run multiple regression.
For now, the regression will look at the impact of age (AGE), height to weight ratio
post-plan (HWRATIO2) and height to weight ratio long after the plan (HWRATIO3)
on the dependent variable, the subject's initial height to weight ratio (HWRATIO). To
run the analysis, choose: ANALYSE, REGRESSION and then LINEAR.
As before, move HWRATIO to the Dependent box. The Independent(s) box is where
you move AGE, HWRATIO2 and HWRATIO3. The rest is as before:
Click the STATISTICS button, and turn on the Descriptive option.
Click on Case-wise diagnostics to obtain a listing of any exceptionally large
residuals.
Now click on CONTINUE.
Now click on the PLOTS button. Since systematic patterns between the predicted
values and the residuals can indicate possible violation of the assumption of linearity,
you should plot the standardised residuals against the standardised predicted values.
To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box and then
click CONTINUE.
Now click OK.
Note: we are only doing a general, all-inclusive multiple regression. There is a box
located directly beneath the Independent(s) box called Method which gives you a
series of additional methods for running the statistics - stepwise, remove, forward and
backward.
Output
Again, the first thing to look for is outliers. Again, there are none.
With that out of the way, the next section to look at is at the top. Everything that
follows is the same as for the simple regression. The first part gives the means and
standard deviations for the four variables (e.g. the mean HWRATIO3 is .526). The
next part gives the correlation (Pearson's) for all of the variables. You can see that
HWRATIO is strongly correlated with the two other height-to-weight ratio variables
(i.e., both over .9).
The next section is under the heading Model Summary. The R and R-squared
values are given for the equation (.98 and .967).
An ANOVA is carried out that indicates how good the model is - whether there is
some overall relationship between the dependent and all of the independent variables.
The key element is the F score. The F score is significant (SPSS displays the p value
as .000, meaning p < 0.001), so there is a strong overall relationship.
The next table (Coefficients) contains information that indicates the individual role of
each independent variable. The values in the column labelled B give the scores to put
into the regression equation:
y' = b1(x1) + b2(x2) + b3(x3) + b0
For this regression, then, the regression equation is
HWRATIO = -.0009(AGE) + .99(HWRATIO2) -.135(HWRATIO3) + 0.063
Note that since the B score for HWRATIO3 is negative, the plus sign in front of that
term turns into a minus sign.
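To see how the equation is used, the short Python sketch below plugs values into it. The coefficients are the ones reported above; the function name predict_hwratio and the example person (age 30, both later ratios 0.5) are made up for illustration.

```python
def predict_hwratio(age, hwratio2, hwratio3):
    """Estimated initial height to weight ratio from the reported equation."""
    # A negative coefficient simply subtracts that term's contribution.
    return -0.0009 * age + 0.99 * hwratio2 - 0.135 * hwratio3 + 0.063


# Hypothetical person, invented for this example only.
example = predict_hwratio(age=30, hwratio2=0.5, hwratio3=0.5)
print(round(example, 4))
```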
The t-test indicates that AGE, as before, is a significant predictor, as is HWRATIO2,
but that HWRATIO3 as a single predictor has no significant influence (p>0.05).
The next section is labelled Residual Statistics. This gives means, SDs and other
information about the unstandardised and standardised predicted and residual scores in
the regression. You should have been taught what, if anything, to do with them.
Scatter plots and Regression Lines
A regression line can easily be added to a scatter plot. As before, to create a
scatterplot go to GRAPH and SCATTER.
You want to leave the graph layout as simple, so just click the DEFINE button.
Move HEIGHT into the X-axis box. Move WEIGHT into the Y-axis box. Now, click the TITLE
button. You can now put in a title in the Line 1 box. You can add an additional title
and sub-title lines if you want. Now press the CONTINUE button and then click the
OK button. The graph should now appear. The window where all the graphs are
stored is called the Chart Carousel, and can be saved as a separate file. The extension
for chart files is always .cht.
What is the line of best fit and what does the value of R-squared tell you?
Chi-Square
There are two ways to run Chi-square. The first is when looking at differences in
frequencies across levels in one variable. In this case, we want to see if there are
differences in the frequencies for the three levels of the variable AGEGRP (age
groups) - child, adult and elderly. You do this through:
Analyze
Nonparametric Tests
Chi-Square
To run a basic Chi-Square, just move the variable(s) to analyse across and click OK. In
this case, move the variable AGEGRP over and run the analysis.
[NOTE: If you're interested in the various options, information about them can be
found by pressing the Help button when you are in a dialogue box]
OUTPUT
The results present the observed and expected frequencies for each of the three levels,
as well as the Chi-Square value, the degrees of freedom (d.f.) and the significance
level. Is there a difference between the three groups in terms of their observed
frequencies?
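The Chi-Square value in this output can be reproduced by hand. The Python sketch below assumes that every level is expected to occur equally often; the observed counts for the three age groups are invented and are not the values in the data file.

```python
def chi_square_equal_expected(observed):
    """Chi-square statistic and d.f. when all levels are expected equally often."""
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


# Hypothetical counts for child, adult and elderly, invented for this example.
observed = [10, 20, 30]
chi_square, df = chi_square_equal_expected(observed)
print(chi_square, df)
```

With 2 degrees of freedom, a value above 5.99 is significant at the .05 level, which corresponds to a significance level below .05 in the output.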
The second way to run a Chi-Square is when carrying out a crosstab. The only change
is that before running the crosstab, you have to turn the Chi-Square option on.
So, go
Analyze
Descriptive Statistics
Crosstabs
Move the variable NSEX into the column box and NCARS into the row box. Make sure
to turn on the Chi-Square option, by clicking the Statistics button, and turning on the
Chi-Square option. Press the Continue button and then the OK button to run the analysis.
OUTPUT
The crosstabs box is displayed, along with a variety of results. The one to be
concerned with is the significance level for the Pearson Chi-Square value.
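For a crosstab, the expected frequency in each cell is its row total multiplied by its column total, divided by the overall total. The Python sketch below works through that calculation for an invented 2 x 2 table; the counts and the function name pearson_chi_square are assumptions for this example, not the NSEX by NCARS results.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and d.f. for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts (rows by columns), invented for this example only.
table = [[10, 20], [20, 10]]
chi_square, df = pearson_chi_square(table)
print(round(chi_square, 3), df)
```

With 1 degree of freedom, a value above 3.84 is significant at the .05 level.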
Microsoft Word Exercises
This exercise shows you how to copy and format a document. To save time, here's one
we prepared earlier. A cast list is given below. Your task is to format the document
(top of page 80) into an organised piece of work (bottom of page 80). As you do this,
note the different techniques you use - they will come in handy as the course
progresses.
These are hands-on sessions, meaning that you should be discovering what to do
yourself. Of course, if you have any difficulties then we are here to assist you. Good
luck, and remember the Help facility.
The Help Facility
Normally you will want to go to the Help menu, then choose Contents and Index.
Click on Index and type in a relevant key word.
The Opening Screen
Word offers a number of ways of viewing the document. The most usual is Normal.
So, go to the View menu, and select Normal.
Alternatively, use the shortcut button at the bottom left of the screen. If you are not
sure what a particular button does then you should hold the pointer arrow over the
button for a second or two without pressing anything. Word will then give a short
description of the button.
The other view often used is Page Layout, which shows how the page will be printed.
Using Zoom from the View menu will allow you to enlarge the screen.
Opening Files
We're going to be e-mailing you two documents entitled play.doc and actone.txt.
Open up the e-mail, and then, one at a time, click the Word icons once with your right
mouse button. Now save the documents by clicking on Save. Find the Msoffice icon
and click on it. Now save your documents in the Msoffice folder under suitable
names (e.g. play.doc and actone.txt).
Now go into Word. To open the file you just saved go to File, Open, click on your M:
drive and find your Msoffice folder. Click on the Msoffice folder and find play.doc.
Double click on play.doc.
Hidden Codes
Certain characters or text in Word are hidden. That is to say, they will only appear
on the screen but not in the final printed version. To turn this option on and off, click
on the reversed P button on the toolbar. That marker is the paragraph marker,
denoting a new line (hard return). Turn the hidden codes off.
Now show the hidden codes. Can you spot the deliberate mistake? Yes, it's one of
those errors in conversion. Double click on {PRIVATE} and the whole word is
selected. (This is a handy trick worth noting). Delete this word.
Correct any other deliberate mistakes.
Page Layout
The original document has margins of 2 inches. Make sure the measurement units are
in inches by going to Tools, Options, General, and clicking inches in the box called
measurement units if this is not already selected. Now go to File, Page Setup and
increase left and right margins to 2 inches from the Margins option. If it asks you if
you want your margins fixed respond with yes. Also note that under Paper Size you
can change the orientation of the paper. Briefly, portrait is upright (for text mainly)
and landscape is horizontal (for graphs and pictures).
To change the justification, select everything by going to Edit, Select All or by
dragging the mouse over the whole document (only if it's a small document). Now,
click the right mouse button over any part of the selected area. Choose Paragraph
from the menu that appears and choose Justified from the Alignment option. You
can also do this from the toolbar. Centre alignment is useful for headings. Change
The Play to centre alignment.
Formatting
Italicise What The Butler Saw by clicking on What and dragging over the other
three words. Now use the toolbar to italicise by hitting the I button.
Highlight all of the text and change the font size to 12 pts.
Type in the other characters, leaving a space between the character and actor names.
Similarly, change the characters' names to small caps by selecting the name and using
the right mouse button. From Font choose the Small caps option. For the other
character names, simply select the name and go to Edit, Repeat. Select the cast and
add a tab into the ruler at two inches by double clicking on the ruler at the two inch
mark. Place your cursor before Stanley. Go to Format and Tabs and add the leader
option 2 (i.e. lots of full stops). Press Ok and then press the tab key. Do this for each
cast member. For the director and designer, the tab is set at 1.5 inches with no leader.
Separate the pieces of text with two hard returns and don't forget to save your work.
The Play
The first London performance of What The Butler Saw was given at the Queens Theatre by
Lewnstein-Delfont Ltd and H.M. Tennnant on 5th March, 1969, with the following cast in order of
appearance.
Dr Prentice Stanley Baxter
Geraldine Barclay Julia Foster
Mrs Prentice Coral Browne
Directed by Robert Chetwyn
Designed by Hutchinson Scott
The final version should look something like this:
The Play
The first London performance of What The Butler Saw was given at the
Queens Theatre by Lewnstein-Delfont Ltd and H.M. Tennent Ltd. on 5th
March, 1969, with the following cast in order of appearance:
DR PRENTICE .................. Stanley Baxter
GERALDINE BARCLAY ..... Julia Foster
MRS PRENTICE ................ Coral Browne
NICHOLAS BECKETT ........ Hayward Morse
DR RANGE ....................... Ralph Richardson
SERGEANT MATCH .......... Peter Bayliss
Directed by Robert Chetwyn
Designed by Hutchinson Scott