Alison (2002) Research Methods & Statistics Handbook - Introduction to Spss


RESEARCH METHODS & STATISTICS HANDBOOK

First Term

Dr. Alison, Mr. Brent Snook



Table of Contents

Section I: Introduction
Section II: Practicals
Section III: Extra Material
Appendix: Basic Statistics
Timetable



SECTION I

INTRODUCTION



Course Instructors

The instructors for this year will be Brent Snook (Room 1.79), X, Y & Z. Our offices are on the second floor of the Eleanor Rathbone Building.

Computing Systems

The University Computing Services Help Desk (Brownlow Hill phone extension

44567) has a full advice and backup service should you need any information and

help.

Computing Environments

Communication between computers and ourselves is mediated by operating systems that allow us to access the various programmes and packages in the University. The most usual environment, as these systems are known, is Windows. This is controlled mainly by pointing and clicking the mouse at various icons on the screen.

Another environment is UNIX, which is similar to MS DOS in that commands are typed rather than selected with the mouse.

We discuss these different environments simply because the packages we will be using are stored in them.

Computers and Networks

Most computers act both as stand-alone machines, capable of independent use, and as networked machines, which rely on a central server. Generally speaking, we in Psychology use networked machines for several reasons. Three main networks are used to access the packages in the different environments: the PC Managed Network Service, the NT Managed Network and the UNIX System.

The Three Networks

Access to the networks is gained by logging on with your user name and password. You then have access to your own personal disk space (M: drive) at a central location that only you can read. You have separate disk space for Windows 2000 and UNIX, so you can have two separate passwords to increase security, though your user name remains the same. At the end of a session, you must always log off.

Usually, when a computer is booted, you have the option to log on to Windows 2000. Once on the network, you are in the MS DOS environment and can then use Windows or UNIX.

Computer Terminals

Virtually all computers upstairs in Psychology and in the Eleanor Rathbone Teaching Centre (ERTC) are networked to Windows 2000. Also, on the first floor in Psychology is another suite of computers in the Eleanor Rathbone Data Centre



(ERDC). There is a printer in there which you should use in preference to the ones in the department, though Computing Services doesn't look kindly on people sending huge printouts to their printers.

INSTALLING APPLICATIONS

There are a number of applications you need to install onto your account. On your

screen you should have an icon labelled MNTS Applications. Double click on this

icon. Now double click on All and you should get a screen full of icons. These are

all of the possible applications that you can install onto your account. Each

application is installed by simply double clicking on the application icon.

Install the following applications:

1. Mulberry (e-mail)
2. SPSS (version 10)
3. Stanford Graphics (on L:INVPSY)
4. Microsoft Office (Word, Excel)
5. WS_FTP
6. Netterm
7. BR Journey Planner
8. The various MDS packages (LIFA2000, UNIX SSA, MSA, POSA)
9. Geographic packages (Dragnet)

Within the limited timeframe, the purpose is to familiarise you with the software that

is available and to encourage you to start using it.

Registering on Windows 2000

Computing Services have all their documentation accessible through the World Wide

Web. You can print off any of the documents once you set up the appropriate printer.

The Computing Services handout will take you through all the basics of Windows

2000 including registering and changing your password.

To register on Windows 2000, you can go to any computer terminal. There should be

a Windows 2000 login screen. Type the word register in the username box and

follow the instructions.

Setting up the ERDC Printer

You have the capability to print on a local printer (a printer that is actually attached to your machine) or a network printer (a printer which is attached to the network). Since we don't have enough printers for everyone, it will be necessary to attach to a network printer.

The printer which is probably most convenient is the one found in the Eleanor

Rathbone Data Centre. The network printer queue for this printer is erdc-Queue.



You are not restricted to just this printer, but it is the closest and it is reserved for postgraduate students studying in the Eleanor Rathbone Building.

To connect to one of the University's networked printers you need to do the following:

1. From the Start menu choose Settings and Printers.
2. Double-click on Add Printer.
3. Highlight the option Network printer server and click on Next.
4. Double-click on Netware Network.
5. Double-click on Novell Directory Services.
6. Double-click on Liv.
7. Double-click on O=liv.
8. Scroll down the list of options until you see OU=PRINT-QUEUES, then double-click on this option.
9. From the list detailed in Figure 1, select the required printer queue and double-click on it (in this case, erdc-Queue).
10. Choose OK to install the required printer driver on your local machine.
11. From the list illustrated in Figure 1, select the required printer manufacturer and then select the printer from the list available (it should be an HP LaserJet 4Si/4Si MX PS). Click on OK.
12. If this is the first time you have installed this particular type of printer, you will be asked for the location of the files to install. Replace the line D:\i386 (this might say A:\i386) in the box Copy files from with the path line V:\NT40\i386 and click on OK.
13. After a few seconds you will be asked if you wish to make this your default printer. Click on Yes.
14. Click on Next.
15. You will then receive a message that your printer has been successfully installed. Click on Finish.



The printer driver will now be installed and connected to the specified printer queue.

Remember, once you have connected and installed a network printer driver, you may need to check the printer settings to ensure that settings such as paper size and duplex printing are correct. For further information, please see Configuring the Printer Settings (on the Computing Services web page).

Using UNIX

A lot of your time this year may be spent in UNIX. We will go into more detail about

this system in the section on UNIX. Three versions of the MDS procedures (SSA,

MSA & POSA) are on UNIX.

Double-click the Netterm icon. Log in with the same username as on the PCMNS. Your password is listed on the form Computing Services sent you. You can change the password by typing passwd.

The versions of the MDS packages on the mainframe have a number of advantages over the two non-Windows PC versions. First, they are more powerful and therefore more effective. A second advantage is flexibility. In comparison to the mainframe SSA, ShyeSSA (a PC package) has only two choices of measures of association: Pearson's and Guttman's Mu. While PAP offers the widest variety of measures, the copy we have also tends to be the one that doesn't work. We'll go into the PC packages in more detail at a later date. Running the mainframe packages is fairly straightforward, but there are several stages to the whole process: preparing your data, uploading/downloading it, using the UNIX system itself, using the ned editor and running (in this case) the SSA.

The SSA package has an option for reading data as free fields (any space between numbers indicates separate variables), so for UNIX SSA we can leave the data file as it is. For MSA and POSA, the fields must be fixed, so:

a) you don't want any spaces in your rows

b) you need each score for each variable to start in the same column; otherwise, when the MDS programs read your data file, they won't read the variables properly. You tell the computer where each case's score for each variable is located by indicating the columns that variable occupies.

Both of these will become apparent when we get to running the SSA. A simple

example:

12 42 131213

13122 71111

In this case, the second variable actually takes up three columns, but the computer does not put a 0 in front of a score like 42 - that's your job. The same holds true for the third variable, which requires two columns because some of its scores are over 9.



The correct version of the two lines of data, with 0s added and spaces removed, should be:

12042131213

13122071111

Here, columns 1-2 hold variable 1, columns 3-5 hold variable 2, and so on. After correcting the data, save the file again, making sure that it is still being saved in the generic ASCII format.
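If you have a lot of cases, padding scores by hand gets tedious. Here is a minimal Python sketch (not part of the MDS packages or of any Computing Services tool) that zero-pads each score to its column width and removes the spaces. The widths for the first three variables follow the example above; treating the last four digits as four single-column variables is an assumption made for illustration.

```python
def to_fixed_width(scores, widths):
    """Zero-pad each score to its column width and join the fields with no spaces."""
    if len(scores) != len(widths):
        raise ValueError("need exactly one width per variable")
    fields = []
    for score, width in zip(scores, widths):
        text = str(score).zfill(width)  # e.g. 42 -> "042" for a 3-column field
        if len(text) > width:
            raise ValueError(f"score {score} does not fit in {width} columns")
        fields.append(text)
    return "".join(fields)


def column_ranges(widths):
    """Return the (first, last) column occupied by each variable, counting from 1."""
    ranges, start = [], 1
    for width in widths:
        ranges.append((start, start + width - 1))
        start += width
    return ranges


# Assumed layout: widths 2, 3 and 2 come from the example; the four 1s are hypothetical.
WIDTHS = [2, 3, 2, 1, 1, 1, 1]
ROWS = [[12, 42, 13, 1, 2, 1, 3], [13, 122, 7, 1, 1, 1, 1]]

if __name__ == "__main__":
    for row in ROWS:
        print(to_fixed_width(row, WIDTHS))
    print(column_ranges(WIDTHS))
```

Both rows come out as 12042131213 and 13122071111, matching the corrected lines, and column_ranges lists the column positions you would then give to MSA or POSA.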

Uploading/downloading files

The best way to upload (transfer files from the PC/NT Managed Network to the mainframe) or download (vice versa) is to use the WS_FTP application (in the comms window). This is a simple package to use.

1) When you first start it up, a window comes up asking for information about a host - this is the location you'll be accessing outside the PC to transfer information. Under host name type UNIX. Under host type, select the option UNIX (standard). Under UserID put your user name. Under password enter your UNIX password.

2) The format is simple. On the left-hand side is the local host, through which you can change between directories and drives. The right-hand side is your remote host (UNIX account), which also has directories you can move through. At the very top of each, your current drives/directories are listed. To transfer a file up, select it and hit the right arrow. To transfer a file down, hit the left arrow. The only trick is to make sure you have set up the right receiving directory, e.g. selecting the winword directory on your M: drive to receive the results file from an MSA that you've run. So, locate the coding.dat data file and move it from the M: drive to your UNIX account.

3) When you've finished, hit the Exit button.

The UNIX Operating System

Imagine that UNIX is set up like the File Manager in Windows, but that you have to type in commands rather than click the mouse to move up and down directories, copy files, delete files and so on. When you first log in, the information to the left of the dollar-sign prompt indicates, from left to right, the user, the particular machine you're on (in brackets) and the directory you are in. Remember that UNIX is case sensitive: an F is not the same thing as an f. I find it very useful to start all directory names with a capital letter and file names with a lower-case one to tell them apart. The first thing to know is how to find help. This is done through the man command. If you know the particular command you want help for, just type man {command}. If you have an idea of what type of action you want the computer to carry out, but don't know the specific command, type man -k {keyword}, where the keyword is something related to the command, e.g. password, to get a list of commands that have something to do with passwords.



Here is a short list of UNIX commands:

cp {directory/filename} {newfilename} - copies a file
rm {filename} - deletes a file
mv {filename} {directory} - moves a file/renames a file
ls - lists the contents of the current directory
cd {directory} - changes directory (note that cd .. moves you up one directory)
mkdir {directoryname} - creates a directory
rmdir {directoryname} - removes a directory
ned {filename} - activates the ned editor
pine - e-mail program
gopher - information source
tin - newsgroup reader

Note that * is a wildcard character, just like in Windows, for selecting multiple files

for commands. All of this information is available in more detail on the WWW.

The ned editor

I mentioned this before (briefly), but I thought I'd go into this in more detail, as it's a useful tool for editing files when you are on UNIX. All of these details are in the document on the ned editor in the Computing Services on-line documentation, BTW. Essentially, it's a crude word processor, where all the function keys have various...functions...as do shift-function keys. None of this silly bolding or italics, no sir. You can type, you can move your cursor around and you can find and replace. Word processing for real men. Anyway, on to the lesson.

To start up the ned editor, you have to edit a file on your UNIX account. You do this by typing ned {filename}, so choose any filename and open the ned editor.

A screen will come up with a brownish banner along the bottom listing the various function key options. All the basic keys operate as in Word: the arrow keys move the cursor around, Home moves to the start of the line and End to the end. Page Up/Page Down, Insert/typeover, Delete, Backspace and so on also work the same way.

Right, now type out everything from "I mentioned this..." on to "right here".

Hit the F1 key for help. To get info on one of the displayed topics, move the cursor to it and hit Enter; for a function key, hit that key. Ctrl-G exits help.

Right, position your cursor at the start of the third line from the bottom, then hit the F2 key to insert a new line. Now hit F9, which will delete the line you just created. Now hit Shift-F9, and the line comes back. Next, hit the F4 key to mark the start location for cutting/copying and pasting. Move the cursor somewhere on the next couple of lines, then hit F6. Move to the end of the document, hit Enter a couple of times, then hit F5. All the text between the marking point and the cursor is copied to the new cursor location. The same process is carried out for cutting text, but you hit Shift-F6 instead of F6.



To insert text from another file, hit Shift-F7. Ned will ask for the filename, and the text of that file will be inserted wherever you've put the cursor. Shift-F4 saves the file, while Shift-F3 saves the file and exits. F3 quits without saving changes. However, the most important feature of ned for data files that you've uploaded is the replace feature (F8). With this, you can change 1s to 0s and so on. So, let's change the letter e to i.

Move to the top left of the document. Hit F8, then type the letter e [DON'T hit Enter now]. Hit F8 again, and type i. Hit F8 one more time. You'll be prompted to make a choice about the first e. If you hit the Y key, it will change it; N will make the computer jump to the next occurrence. A ! will cause ned to make all possible changes. You'll find this handy for changing numbers prior to doing analyses, for example changing 0s to 1s and 1s to 2s.
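A word of warning about recoding with F8: each replace pass works on the text as it stands, so if you change 0s to 1s first and then 1s to 2s, the original 0s end up as 2s. Do the 1-to-2 pass first, or recode everything in one go. Also remember that replace works on characters, not variables, so it is only safe on columns holding single-digit codes; use N to skip any other occurrences. The Python sketch below only illustrates the ordering problem (ned itself has nothing like these functions), using a hypothetical line of 0/1 codes.

```python
def sequential_recode(line, steps):
    """Apply find/replace passes one after another, like repeated F8 '!' replaces."""
    for old, new in steps:
        line = line.replace(old, new)
    return line


def simultaneous_recode(line, mapping):
    """Recode every character in one pass, so a new value is never recoded again."""
    return line.translate(str.maketrans(mapping))


data_line = "0110"  # hypothetical row of single-column 0/1 codes

wrong = sequential_recode(data_line, [("0", "1"), ("1", "2")])        # "2222": the 0s became 2s
right_order = sequential_recode(data_line, [("1", "2"), ("0", "1")])  # "1221"
at_once = simultaneous_recode(data_line, {"0": "1", "1": "2"})        # "1221"

print(wrong, right_order, at_once)
```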



SECTION II

PRACTICALS



WEEK 1: Thursday October 3rd

Introduction to SPSS

SPSS is the primary package for running any statistical procedures outside of the MDS packages. In addition to providing output for various analyses, SPSS allows the user to manipulate the data in a variety of ways and to produce various graphs and figures that can be added into documents.

In this practical, you will be asked to open and search through a data matrix, and enter

and code data. The procedure for the exercises in this practical involves going

through the steps for each analysis using the data file family.sav.

Where is Family.sav?

The first thing you must do is copy family.sav from the N: drive on your computer to the M: drive (which is your own personal account). To do this you must create a folder on your M: drive into which the family.sav file will go. You should be looking at a screen with a number of icons on it. In the top left-hand corner is an icon called My Computer. Double-click on this icon.

Find the M: drive and double-click on it. You should now see a window containing a number of folders. Go to FILE, then NEW and choose FOLDER. A new folder labelled New Folder should appear at the bottom of the window. Name your new folder Survey and press ENTER. After you have done this, go to FILE and then CLOSE.

Now, within the same window, double-click on your N: drive. Within that drive you will see a folder titled SPSSEGS (standing for SPSS example files). Double-click on this folder. Within this folder there is a file labelled family.sav. This is the file you want to copy into your Survey folder on your M: drive. So, single-click on family.sav and go to EDIT and then COPY.

Go back to your M: drive by closing the N: drive window (click on the X in the top right-hand corner of the window). Double-click on your M: drive and double-click on the folder Survey. Survey should be empty. Go to EDIT and then PASTE. Now you should see the file family.sav.

Exploring the Data Editor Window

Start SPSS for Windows by double-clicking on the SPSS icon. Once the program has been opened, a window will appear in the middle of the screen with a number of options to choose from. You want to select OPEN AN EXISTING DATA SOURCE.

Go to the directory Survey in your M: drive. Find the file family.sav and double-click on it. The values from the family.sav file should now appear in the Data Editor window. Click on the middle button in the top right-hand corner of the window to maximise the size of the window. Once the file is open you will see two sheets at the bottom of the window. One is labelled DATA VIEW and the other is labelled



VARIABLE VIEW. You want to stay on the DATA VIEW sheet. Click on the VALUE LABELS button (in bold rectangle below) on your tool bar (it is 2nd from the right).

This button toggles the display between the numeric codes and their value labels (words). Scroll through the data to answer the following questions:

1. What is the name of the last variable in the data matrix?

2. What is the case number of the last case?

3. What is the value of IDNUM for the last case?

4. What is Robert's date of birth?

5. What is Jack's marital status?

If you click on a cell when value labels are displayed in the DATA VIEW window, a drop-down list will appear showing the options (value labels) used in the coding framework. Using this feature, please answer the following questions:

What are the labels for CAR?

What are they for MORTGATE?

What are they for NAME? Is there a problem with NAME? What is it?



The variable view sheet

In order to view how a variable has been defined in terms of its name, variable label, value labels and user-missing values, you have to click on the VARIABLE VIEW sheet.

Please answer the following questions. Do not forget to use the scroll bars on the

bottom and on the right side of the variable view window to find your answers.

What is the variable label for DATEBLT?

What are the values and value labels for MARSTAT? (hint: click on the grey box)

What is the user-missing value for NCARS?




Coding and Entering Data

Open a new Data Editor window by going to FILE, then NEW, then DATA, and save it to your M: drive. Below is a questionnaire regarding leisure activity and a coding scheme. Your task is to set up the Data Editor window and then enter the data below.

Leisure Activity Questionnaire

1. What is your first name?
2. What is your sex? M = male, F = female
3. What is your marital status? 1 = married, 2 = cohabiting, 3 = single, 4 = widowed, 5 = divorced, 6 = separated
4. Do you watch sports? 1 = yes, 2 = no, 3 = do not know
5. Do you play sports? 1 = yes, 2 = no, 3 = do not know
6. Do you visit the seaside? 1 = yes, 2 = no, 3 = do not know
7. Do you go to films? 1 = yes, 2 = no, 3 = do not know
8. Do you go to pop concerts? 1 = yes, 2 = no, 3 = do not know

Coding Framework

Variable Name  Format   Variable Label        Coding Details/Labels
IDNUM          NUMERIC  IDENTITY NUMBER       Unique number for each person
NAME           STRING   FIRST NAME            Enter first characters of name
SEX            STRING   SEX                   M = male, F = female
AGE            NUMERIC  AGE IN YEARS          Enter age in years (-9 = missing)
MARSTAT        NUMERIC  MARITAL STATUS        1 = married, 2 = cohabiting, 3 = single,
                                              4 = widowed, 5 = divorced, 6 = separated
WATCHSP        NUMERIC  WATCHES SPORTS        1 = yes, 2 = no, 3 = do not know
PLAYSP         NUMERIC  PLAYS SPORTS          1 = yes, 2 = no, 3 = do not know
VISITSEA       NUMERIC  VISITS SEASIDE        1 = yes, 2 = no, 3 = do not know
GOTOFILM       NUMERIC  GOES TO FILMS         1 = yes, 2 = no, 3 = do not know
GOTOPOP        NUMERIC  GOES TO POP CONCERTS  1 = yes, 2 = no, 3 = do not know

Data

IDNUM NAME SEX AGE MARSTAT WATCHSP PLAYSP VISITSEA GOTOFILM GOTOPOP
101 MARGARET F 87 4
201 JACK M 62 1 1 2 1 2 2
202 JOSIE F 1 2 2 1 2 2
301 NANCY F 60 5 1 2 1 2 2
503 VICTORIA F 11 -9 2 1 1 1 3
1002 JOHN M 31 2 1 3 1 1 1

You should have a clean window in front of you (i.e., there should not be any data in

the spreadsheet). You now have to set up each column of your data matrix so that you

can eventually enter in your data. The first column will hold IDNUM. To enter

IDNUM into the data view sheet you need to go to the VARIABLE VIEW window.



In fact, defining and labelling all of your variables must be done in your variable view

sheet.

In the first row you can label and define your first variable, IDNUM. Using the coding framework above, enter the appropriate information. Type the variable name IDNUM under NAME. The TYPE of variable is NUMERIC (you are entering a number), and under DECIMALS, using the scroll bar, choose 0 decimal places. Under the heading LABEL you want to type in the definition of the variable. Make sure this definition clearly defines the variable to avoid confusion.

Depending upon the type of data you are measuring (i.e., nominal, ordinal, interval or ratio), you may have to add VALUES. In the case of IDNUM (identity number), each person has their own unique number, so you do not have to define value labels; under VALUES, leave the setting as None. However, in defining nominal data such as SEX (the third variable to enter), you would have to define M as male and F as female.

For IDNUM there are no missing values, therefore you choose None. The heading COLUMNS gives you the opportunity to define the width of your column. Choose a width of 6. The ALIGN value allows you to determine the positioning of your data in the cell: right, left or centred. The last column heading is MEASURE. This column allows you to define the type of data you are working with. With IDNUM you are working with scale data.

When you define variables such as NAME (i.e., the name of the subject), you want the TYPE of variable to be STRING, and the WIDTH should be 10 (the number of characters allowed in the name). Using the coding framework above, define the variable NAME.

When you define variables such as SEX (nominal data), you want to add value labels in the column called VALUES. If you click on the cell, a value labels window will appear. Next to Value, type M; next to Value Label, type male; then click on ADD. Then enter F in the Value box and female in the Value Label box, and click on ADD again. Once you have made these changes you can move back to the DATA VIEW window and view the changes.

Return to the VARIABLE VIEW window and define the numeric variable AGE in the next row. It has no decimal places, and it requires a missing value of -9 to identify cases where a response is not given. To assign a user-missing value of -9, click on the MISSING column. A missing values window will appear. Click on Discrete missing values and enter -9 in the first box. Set up a variable label and a value label for -9 as shown in the coding scheme for your questionnaire. Now, do the same for the numeric variable MARSTAT in the next row. This too is numeric with no decimal places, has a user-missing value of -9 and requires a variable label and several value labels as shown in the coding scheme.
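Value labels and user-missing values amount to a lookup table plus a list of codes to leave out. Here is a minimal Python sketch of that idea; it is not SPSS syntax, and the function names are hypothetical. The codes are the MARSTAT values for Margaret, Jack, Nancy, Victoria and John from the data above.

```python
# Hypothetical Python mirror of the MARSTAT coding scheme (not SPSS syntax).
MARSTAT_LABELS = {1: "married", 2: "cohabiting", 3: "single",
                  4: "widowed", 5: "divorced", 6: "separated"}
USER_MISSING = {-9}  # the user-missing code used in the coding framework


def label_for(code, labels, missing=USER_MISSING):
    """Return the value label for a code, or None if the code is user-missing."""
    if code in missing:
        return None
    return labels.get(code, f"unlabelled code {code}")


def valid_codes(codes, missing=USER_MISSING):
    """Drop user-missing codes so they cannot distort any later statistics."""
    return [c for c in codes if c not in missing]


marstat = [4, 1, 5, -9, 2]  # Margaret, Jack, Nancy, Victoria, John
print([label_for(c, MARSTAT_LABELS) for c in marstat])
print(valid_codes(marstat))
```

The first line shows roughly what you see with VALUE LABELS switched on, with Victoria's -9 coming out as None rather than as a marital status; the second shows the codes left once the user-missing value is excluded.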

The remaining 5 variables also need to be defined. To avoid defining each variable

separately you should define the first variable WATCHSP and then copy the cells to

the remaining four below. To do this go to the cell you want to repeat (i.e., the value



labels) and click on EDIT, COPY and then move to the cell where you want the same

definition and then go to EDIT and PASTE.

When you have finished entering all of the data save it into an SPSS file by selecting

FILE, SAVE and clicking on the folder Survey in your M: drive. Save the file under

any name you want (e.g., Person.sav). Exit from SPSS and log off.



WEEK 2: October 10th

Descriptive Statistics, Charts & Manipulating Data in the Matrix

This practical is divided into two sections. The first section is intended to familiarise you with how to run commands to calculate descriptive statistics and to graph your data. The second section aims to show you how to compute, recode, filter and delete your data.

Section I: Descriptive Statistics & Charts

We shall estimate descriptive statistics for the three variables: TYPACCM,

DATEBLT, & NADULTS.

Question: Are these variables nominal (non-ordered categories), ordinal (with ordered

categories) or metrical (on a measure scale with well-defined differences between

values)? Hint: The second variable is not so obvious.

To run the descriptive statistics, click on ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES. In the left box there should be a list of all the variables that are present in the spreadsheet. Highlight TYPACCM and click the arrow between the boxes to move it into the box labelled Variables. Do the same for the other two variables. A shorter route is to double-click on each variable in the left box; variables can be removed from the Variables box in the same way.

After the three variables are in the Variables box, click on STATISTICS at the bottom of the dialog box. Within the Frequencies: Statistics box there are several options. Tick the boxes for MEAN, MEDIAN and MODE on the right-hand side. In addition, tick the boxes for STANDARD DEVIATION (Std. deviation) and RANGE. Then click on CONTINUE, then OK, and wait for the data to be processed and for the output window to appear.
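To see exactly what these five statistics are, here is a short Python sketch using the standard statistics module. The NADULTS values are hypothetical, not taken from family.sav, and where there is more than one mode the sketch reports the smallest.

```python
import statistics


def describe(values):
    """Mean, median, mode, standard deviation and range for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": min(statistics.multimode(values)),  # smallest mode if there are ties
        "std_dev": statistics.stdev(values),        # sample (n - 1) standard deviation
        "range": max(values) - min(values),
    }


nadults = [2, 1, 2, 3, 2, 1, 2, 4]  # hypothetical NADULTS values, not family.sav
print(describe(nadults))
```

For these made-up values the mean is 2.125, the median and mode are both 2, the range is 3 and the (n - 1) standard deviation is about 0.99.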

Answer the following questions:

What is the most useful measure of central tendency for each of the three variables?

What are the sample values?

What is the maximum value for NADULTS? Does this appear to be correct?

Now, try re-estimating the descriptive statistics for NADULTS, only this time without the case with the unusual value. Select DATA and then SELECT CASES. Within the Select Cases dialog, make sure that Filtered is ticked under Unselected Cases. Then select the IF CONDITION IS SATISFIED option and click on the IF button. Move the variable NADULTS to the adjacent box either by double-clicking on it or by clicking on the variable and moving it across using the arrow.

After the variable name, use the calculator provided to type less than (<) followed by the unusual value. After this, hit CONTINUE and then OK to return to the spreadsheet.

Answer the following questions:

Has the case with the unusual value been barred off?

Which case is it?

Now, re-run the Frequencies command for NADULTS only and record the mean,

median & mode with and without the case included.

Which descriptive statistic is most affected by the unusual value?
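The effect of the filter can be sketched in Python too. The values below are hypothetical (one implausible household of 40 adults added to otherwise ordinary values), and the list comprehension plays the part of the IF condition NADULTS < 40; the real unusual value in family.sav will differ.

```python
import statistics


def centre(values):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)


nadults = [2, 1, 2, 3, 2, 1, 2, 40]  # hypothetical data with one unusual value
cutoff = 40                          # hypothetical unusual value

kept = [v for v in nadults if v < cutoff]  # cases that satisfy NADULTS < cutoff

with_case = centre(nadults)
without_case = centre(kept)
print("with:   ", with_case)
print("without:", without_case)
```

With these made-up numbers the mean drops from 6.625 to about 1.86 once the case is filtered out, while the median and mode stay at 2.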

Graphing your Results

Histograms

Histograms are statistical diagrams that show the distribution of variables. In a

histogram, values are grouped together in intervals and a bar is drawn for each

interval whose area is proportional to the number of cases in the interval.
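The grouping step can be sketched in a few lines of Python. The heights and the interval boundaries (60 to 75 inches in 5-inch bands) are hypothetical; SPSS chooses its own intervals. Because all the intervals have the same width, each bar's height is proportional to its count as well as its area.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each interval [start + k*width, start + (k+1)*width); values outside are ignored."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


heights = [61, 63, 64, 66, 66, 67, 69, 70, 71, 74]  # hypothetical heights in inches
counts = histogram_counts(heights, start=60, width=5, n_bins=3)

for k, count in enumerate(counts):
    low = 60 + k * 5
    # One star per case; each interval includes low and excludes low + 5.
    print(f"{low}-{low + 5}: {'*' * count}")
```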

To generate a histogram, select GRAPHS and then HISTOGRAM. Then move the variable HEIGHT into the Variable box. In the same dialog box, tick Display normal curve and then hit OK.

Upon examining the output window that contains the graph answer the following

question:

Do you think HEIGHT has a normal distribution, or would you run other tests?

Go back to the data editor window, select GRAPHS and HISTOGRAM and run the

same command as done using the HEIGHT variable but with WEIGHT.

From the histogram, would you say that the variable WEIGHT has a normal

distribution or would you try other tests?

Are there any differences between the two histograms?

Scatter plots

Scatter plots show the joint behaviour of two (or more) variables in a diagram.

Values of one of the variables are plotted against values of another, the two variables

usually being metrical. A scatter plot usually shows much more about the behaviour

of the variables than descriptive statistics like correlation.

Scatter plots are also drawn using the GRAPHS command. Click on GRAPHS, then SCATTERPLOT, then on the SIMPLE option, and then click on the DEFINE button. Select WEIGHT for the Y-axis and HEIGHT for the X-axis. In a scatter plot, if one of the variables is thought to depend on the other, it is plotted on the vertical Y-axis. Here, we think that weight depends on height; therefore, weight is plotted on the Y-axis.

In addition, move SEX into the Set Markers by box. This will allow you to identify points on the scatter plot by sex, as males and females tend to have different heights and weights. Run the command and look at the scatter plot in the chart carousel window. Can you see any difference between the males and the females in terms of heights and weights?

To edit the chart, simply double-click on it. Now we shall try fitting simple linear regression lines to the data. Select CHART, then OPTIONS, tick Subgroups under FIT LINE and then click FIT OPTIONS. Make sure linear regression has been highlighted and then click on CONTINUE. There should be two different lines, one for males and one for females.

What can you say about the slopes of the two regression lines?

Can you see any difference now between the males and the females in terms of

heights and weights?
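Each fitted line is an ordinary least-squares regression of weight (Y) on height (X), calculated separately for each sex. Here is a minimal Python sketch of that calculation; the six records are hypothetical (and deliberately tidy), not family.sav data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (sex, height in inches, weight in pounds) records, not family.sav.
people = [("M", 68, 160), ("M", 70, 170), ("M", 72, 180),
          ("F", 62, 120), ("F", 64, 126), ("F", 66, 132)]

lines = {}
for sex in ("M", "F"):
    heights = [h for s, h, _ in people if s == sex]
    weights = [w for s, _, w in people if s == sex]
    lines[sex] = fit_line(heights, weights)  # weight (Y) regressed on height (X)
    print(sex, "slope = %.2f, intercept = %.2f" % lines[sex])
```

For these made-up records the male line rises 5 pounds per inch and the female line 3 pounds per inch, so the two lines are not parallel; comparing slopes like this is what the question above asks you to do with the real chart.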

The markers used to distinguish males and females are drawn in different colours, but the difference is not very clear. It will become even less clear if you print out the scatter plot on a monochrome printer! Click on any marker in the plot: all markers of that sex become highlighted with black squares. Then click on the icon depicting a crayon/pencil to change the colour of the marker/symbol. To change the symbol, simply click on FORMAT and then MARKER. There you should have several options for changing the type and size of the symbol. After making your changes, hit Apply and Close.

Editing a High Resolution Chart

Generate a high-resolution chart, a histogram, to try out some of the editing features. Histograms are used for metric or quantitative variables, like AGE, which take values along a scale. There are generally too many distinct values to make it worth drawing a bar chart. Instead, the values are grouped into intervals or bands and a bar is drawn for each interval. The area of each bar is proportional to the number of cases with values in the interval.

Still using family.sav select GRAPHS and then HISTOGRAM. Select HWRATIO

for the variable box and click OK. A histogram for HWRATIO is added to the Chart

Carousel Window. The histogram shows some descriptive statistics for the variable

too.

What are the sample mean and standard deviation for HWRATIO?



Double-click on the chart to move the histogram from the Chart Carousel Window to

a Chart Window. The menu bar and tool bar change to show editing facilities.

First, click on CHART then OPTIONS and NORMAL CURVE - then hit OK. The

normal curve superimposed over the histogram is the one for the above mean and

standard deviation. Admittedly, it's difficult to make a decision with such a small sample, but does the curve appear to be a good fit to the histogram?

Now, click on the Swap Axes icon. Does the histogram look better with vertical bars

or horizontal bars?

Now try some of the other icons and tools to change the chart. These changes require

the appropriate part of the chart to have been selected. Click on any bar. The bars will

become highlighted with small black squares at their corners. Then click on the Fill

Pattern - tool button (the rectangle with diagonal shading). To apply a pattern, click

on it and then click on apply. Once you have finished with the patterns, click on close.

Also, try the Colour Palette tool button (the one with the pen) and the Bar Labels tool button (the one with the fingernails).

You can also change the style of the line showing the Normal curve, and the fill

pattern and colour of the background of the histogram. Once you have finished with

your work, select FILE and then SAVE CHART. Save your histogram as

artwork.chz

To copy or move a chart into Word, click on EDIT and then select COPY to copy the chart.

To move to Word, minimise SPSS and open Word. If Word is already open, press

ALT & TAB to move between programs. Once in Word, select EDIT and then PASTE.

Finally, exit from SPSS for Windows by selecting FILE and then EXIT.

Section II: Manipulating the Data in the Matrix

(Computing, Recoding, Filtering and Deleting Data)

Computing Values

Start off SPSS and open the file family.sav (you should find this file on your M:

drive in the folder that you named survey). We shall use the COMPUTE command

to build up a new variable that will be labelled BMI, which stands for body mass

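index. This is calculated as:

$\text{BMI} = \dfrac{\text{weight (kg)}}{\text{height (m)}^2}$

Since weight is recorded in pounds and height in inches, the expression below converts them first (1 lb = 0.4536 kg, 1 in = 0.0254 m).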

Select TRANSFORM and then COMPUTE and set the Target Variable to bmi.

Click on Type & Label and enter the label body mass index in the label box. Click

CONTINUE to return to the Compute Variable dialog box. Using the source list on the

left and the calculator pad in the centre, build up



Weight * 0.4536 / (height * 0.0254) **2

in the numeric expression box. Run the completed command. The new variable is

added to the end of the data. We shall check the new variable by estimating a few

descriptive statistics using FREQUENCIES (via ANALYZE, DESCRIPTIVE STATISTICS).

(ANALYZE, DESCRIPTIVE STATISTICS, EXPLORE would be a better command, but FREQUENCIES will do here.)

Select ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES.

Move body mass index (bmi) to the Variable(s) box. Since bmi is a metric variable

with a potentially different value for every case in the data, suppress the frequency tables

by clearing the DISPLAY FREQUENCY TABLES check box. You will now

get a message saying "You have turned off all output. Unless you request

Display Frequency Tables, Statistics or Charts, Frequencies will generate no output."

No worries, we will estimate descriptive statistics by clicking on STATISTICS and

clicking on the check boxes for the following: MEAN, MEDIAN, MINIMUM and

MAXIMUM. Run the command and look at the output.

What are the sample values of the mean, median, minimum and maximum?

(The mean should be around 25.0. Any values outside the range 15.0 to 35.0 should

be queried.)

Do the sample statistics satisfy these rough checks? If not, something is wrong!
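For anyone who wants to see the arithmetic behind the COMPUTE expression and the rough range check, here is a minimal sketch in Python (not something the practical requires). It assumes, as the SPSS expression implies, that weight is recorded in pounds and height in inches; the function names and example cases are hypothetical, and None stands in for a system-missing value.

```python
# Sketch of the bmi COMPUTE step and the rough range check.
# Assumption: weight is recorded in pounds and height in inches.
LB_TO_KG = 0.4536  # kilograms per pound, as in the SPSS expression
IN_TO_M = 0.0254   # metres per inch, as in the SPSS expression


def bmi(weight_lb, height_in):
    """Return body mass index (kg/m^2), or None if either value is missing."""
    if weight_lb is None or height_in is None:
        return None  # SPSS would give a system-missing value here
    return weight_lb * LB_TO_KG / (height_in * IN_TO_M) ** 2


def values_to_query(values, low=15.0, high=35.0):
    """Return the non-missing BMI values that fall outside the plausible range."""
    return [v for v in values if v is not None and not low <= v <= high]


# Hypothetical cases: (weight in pounds, height in inches)
cases = [(154, 69), (120, 64), (300, 60), (None, 70)]
bmis = [bmi(w, h) for w, h in cases]
print(values_to_query(bmis))  # the (300, 60) case would need checking
```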

Conditionally Computing Values

Now we shall use the IF sub-command (via TRANSFORM, COMPUTE) to set up a new

variable. The sub-command allows you to set up a new variable under the condition

that the original variable, which it is based on, fulfils certain criteria. We want to set

up a new variable AGEHOH for the age of the head of the household. In other

words, if a person in the sample is head of the household, AGEHOH shall indicate

that person's age.

Select TRANSFORM and then COMPUTE and clear the previous settings by

clicking on RESET. Set the Target Variable to AGEHOH and click on TYPE &

LABEL to assign the label "age head of household". Click on CONTINUE, and then set the Numeric Expression to AGE. We want this (i.e., the current age in years) to be

applied when the case is head of household, which occurs when RELTOHOH is zero.

(For the variable RELTOHOH, "relationship to head of household", the value 0

denotes that a person is head of household). Select IF and INCLUDE IF CASE

SATISFIES CONDITION. Set up the condition RELTOHOH = 0 in the large box

and run the command. The variable AGEHOH should now be added to the end of the

data. Have a look at the new variable. You should see ages set for some cases only.
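The logic of the conditional COMPUTE can be sketched in Python as below. The function name and the example pairs are hypothetical; None plays the part of the system-missing value that SPSS gives to cases which do not satisfy the IF condition.

```python
def agehoh(age, reltohoh):
    """Mimic COMPUTE agehoh = age, applied only IF reltohoh = 0.

    Cases that do not satisfy the condition (including cases where reltohoh
    is missing) keep the system-missing value, represented here by None.
    A head of household whose age is missing also ends up missing.
    """
    if reltohoh == 0:
        return age  # may itself be None if age was never recorded
    return None


# Hypothetical (age, reltohoh) pairs
people = [(45, 0), (43, 1), (12, 2), (None, 0)]
print([agehoh(age, rel) for age, rel in people])  # [45, None, None, None]
```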

Let's check AGEHOH by moving it in the data matrix to the column after

RELTOHOH so that we can see what happened more clearly.

First we must make a space in the data matrix by inserting a new variable. Find RELTOHOH by either scrolling through the DATA EDITOR window or by



selecting UTILITIES and VARIABLES, selecting RELTOHOH from the source

list and then clicking on GO TO and CLOSE. Now click on any cell of the variable

that is immediately to the right of RELTOHOH (this variable should be sex). Then

select DATA and then INSERT VARIABLE. Alternatively, you can click on

INSERT VARIABLE tool (which is the sixth button from the right).

Now, a blank column headed var00001 containing system-missing values (dots) is

inserted before the selected variable. Move the AGEHOH to this column by single-

clicking on AGEHOH to highlight the column and then selecting EDIT and CUT.

To paste it in the desired location single-click on the head of the blank column

(var00001) and select EDIT and then PASTE.

Look at the values in the DATA EDITOR window.

Do all heads of household have AGEHOH set? If not, what might be the reason? (Hint: look at the variable that AGEHOH is derived from!)

What value is set for cases who are not heads of household?

Re-coding Values

The RECODE command in SPSS is very powerful and efficient but it can be a little

tricky to set up due to the number of clicks required. We shall recode BMI into a new

variable BMIGRP, which takes the values

Value   Range                Interpretation
1       bmi < 25.0           Okay
2       25.0 ≤ bmi < 30.0    Overweight
3       bmi ≥ 30.0           Obese

Select TRANSFORM and then RECODE and INTO DIFFERENT VARIABLES.

Select BMI from the source list into the central INPUT VARIABLE -> OUTPUT

VARIABLE box. Enter BMIGRP into the Name box and click on CHANGE to

complete the INPUT VARIABLE -> OUTPUT VARIABLE box. Also enter a

suitable variable label for BMIGRP in the LABEL box (e.g., categorical body mass

index).

To set up the recoding, click on OLD AND NEW VALUES. We build up the recode

specification for the third category of BMIGRP first. In the OLD VALUE box, select

RANGE and THROUGH HIGHEST and enter 30.0 in the box before THROUGH

HIGHEST. In the NEW VALUE section, enter 3 into the VALUE box. Then click

on ADD to copy the specification 30.0 THROUGH HIGHEST = 3 to the OLD -> NEW

box. Build up the other two specifications, in the order 25.0 THROUGH 30.0 = 2

and LOWEST THROUGH 25.0 = 1. Now run the completed command.
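The order in which the specifications are built matters because neighbouring ranges share an end point (30.0 belongs to both 30.0 THROUGH HIGHEST and 25.0 THROUGH 30.0). As we understand SPSS's RECODE, each value is recoded by the first specification that matches it, so building the top category first sends 30.0 to group 3 and 25.0 to group 2. The Python sketch below mimics that rule; the function name is hypothetical.

```python
def recode_bmigrp(bmi):
    """Recode bmi into BMIGRP: 1 = okay, 2 = overweight, 3 = obese.

    The specifications are checked in the order they were added in the
    dialog, and the first one that matches is used, so the boundary values
    30.0 and 25.0 go to groups 3 and 2 respectively.
    """
    if bmi is None:
        return None  # a missing input gives a missing output
    specs = [
        (30.0, float("inf"), 3),   # 30.0 THROUGH HIGHEST = 3
        (25.0, 30.0, 2),           # 25.0 THROUGH 30.0 = 2
        (float("-inf"), 25.0, 1),  # LOWEST THROUGH 25.0 = 1
    ]
    for low, high, new_value in specs:
        if low <= bmi <= high:  # both end points are inside the range
            return new_value
    return None


print([recode_bmigrp(v) for v in (22.4, 25.0, 29.9, 30.0)])  # [1, 2, 2, 3]
```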



To finish, double-click on BMIGRP in the Data Editor window, and define suitable

value labels (i.e., 1= okay, 2 = overweight, 3 = obese).

Are the values of BMIGRP correct for the first ten cases?

Filtering Cases

In this example, we shall filter cases. The filtering option allows you to exclude

certain cases from further analysis temporarily.

Before filtering, generate a two-way frequency table for ownrent by typaccm by

selecting ANALYZE, then DESCRIPTIVE STATISTICS and then CROSSTABS

and selecting ownrent for Row(s) and typaccm for column(s). Run the command and

look at the table in the output.

1. What exactly does the frequency count in the first cell of the second table refer to? 6 what?

We shall filter using the variable PERSNO, which is the number of persons in the

household.

2. What will be the effect of selecting cases satisfying the condition persno = 1? What is the impact on households?

Now, select DATA and SELECT CASES and then IF CONDITION IS

SATISFIED and make sure that UNSELECTED CASES are FILTERED. (This is very important, as the alternative is DELETED, which we want to avoid now!)

Select IF... and build up the condition persno = 1 in the large box. Run the

completed command. Find persno in the data editor window.
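A rough Python picture of what filtering does, using a handful of made-up cases (the real family.sav has far more): every case stays in the data, but only those flagged as selected take part in the table. The filter_flag name and the helper functions are hypothetical; SPSS keeps track of the selection in its own way.

```python
from collections import Counter

# Hypothetical cases; the real family.sav has many more variables and rows.
cases = [
    {"persno": 1, "ownrent": 1, "typaccm": 1},
    {"persno": 2, "ownrent": 1, "typaccm": 1},
    {"persno": 1, "ownrent": 2, "typaccm": 1},
    {"persno": 3, "ownrent": 1, "typaccm": 2},
    {"persno": 1, "ownrent": 1, "typaccm": 1},
]


def crosstab(rows, row_var, col_var):
    """Count the cases in each (row value, column value) cell of a two-way table."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def filter_cases(rows, condition):
    """Copy every case and add a flag: 1 if it satisfies the condition, else 0.

    Nothing is removed; unselected cases stay in the data but are skipped.
    """
    return [dict(r, filter_flag=1 if condition(r) else 0) for r in rows]


flagged = filter_cases(cases, lambda r: r["persno"] == 1)
selected = [r for r in flagged if r["filter_flag"] == 1]

print(crosstab(cases, "ownrent", "typaccm")[(1, 1)])     # 3 (all cases)
print(crosstab(selected, "ownrent", "typaccm")[(1, 1)])  # 2 (filtered cases only)
```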

3. What appears in the status bar when filtering is in effect? (The status bar is at the bottom of the window.)

4. What has happened to the case numbers of cases with persno other than 1?

Rerun the CROSSTABS command (via ANALYZE, DESCRIPTIVE STATISTICS) and look at

the new table in the output.

5. What exactly does the frequency count in the first cell refer to now? 3 What?

Go to the Data Editor Window and save the filtered data as familyf.sav. Then select

DATA, SELECT CASES and then ALL CASES. Run the command.

6. What happens to the status bar and the case numbers?



Deleting Cases

Instead of filtering cases we shall delete unselected cases without doing any harm to

data stored in disk system files. Select DATA, SELECT CASES, IF CONDITION

IS SATISFIED which picks up the previous condition on persno = 1. Then select

UNSELECTED CASES are DELETED. Run the command and have a look at the Data Editor Window.

1. How many cases are left?

2. What are the values of PERSNO?

3. What are the values of HSEMO? What does that successfully show?

Now, rerun the CROSSTABS command in the previous section and look at the

output.

4. Do the results agree with those obtained when cases are filtered?

Return to the Data Editor Window and save the selected cases to a NEW system file

named familyd.sav (after deleting cases you should do this as soon as possible to

avoid overwriting your complete data file by accident).

Finally, re-open familyf.sav, the filtered file you saved in the previous section.

5. Is filtering still on?

Exit from SPSS, saving the contents of the output window into output3.spo.

Open up family.sav that you saved to your survey folder.




T-Tests

Section I: Parametric T-tests (related & unrelated)

This practical will show you how to run a t-test so that you can look at the difference

between the means of two sets of scores.

Experimental designs can be of two basic types: within-subject (dependent or

related) and between-subject (independent or unrelated). The former is when all

subjects are subjected to all conditions (e.g., testing reaction times before and after

receiving a drug). Between subject designs are when you divide subjects into

independent groups, such as on the basis of gender, or into one group that receives a

drug, and a second that receives a placebo.

DEPENDENT OR RELATED SAMPLES T-TEST

First, a quick review of the test layouts.

1. Related Samples - two variables, one for each condition of the experiment. Each subject has two scores, as a result:

Variable 1 = first set of scores for the subjects (e.g. reaction time before taking the drug)
Variable 2 = second set of scores for the subjects (e.g. reaction time after taking the drug)

Sub. No.   Variable 1   Variable 2
1          10           30
2          11           31
3          12           32
4          10           30
5          9            29

2. Independent or Unrelated Samples - two variables: the first tells SPSS what condition EACH subject belongs to, the second is the actual score for that subject:

Variable 1 = what condition each subject belongs to (e.g. group 1 are the controls, group 2 receive the drug)
Variable 2 = actual score (e.g. each subject's reaction time)

Sub. No.           Variable 1   Variable 2
1 (control)        1            subject 1's score
2 (control)        1            subject 2's score
3 (experimental)   2            etc.
4 (experimental)   2            etc.



T-Test for Related Samples

This is the parametric comparison of two related groups, for example, when you want

to compare mean scores for subjects at some task before and after taking a drug. Each

set of subject scores for the related t-test must be entered as a separate variable in SPSS. So, in the above example, all the individuals' scores for the task before taking

the drug would be in one column and all the scores after taking the drug in another.

First, open family.sav. The next step is to add a variable to the data file, so that we can

run the related t-test. In this case, the comparison will be between the subjects'

height/weight ratio before they were put on a 4-week diet/exercise plan and after. The

variable already in the data set HWRATIO is the measure before. At the end of the

data file, add the variable HWRATIO2 to represent their measurements after the plan.

Using what you learned in the first lesson about entering data, create the new variable

using the information below:

Variable Name: HWRATIO2

Variable Label: Height/Weight Ratio after plan

Data: see table 1 below

To run the procedure, go ANALYZE, COMPARE MEANS and then PAIRED-

SAMPLES T-TEST

The usual dialogue box appears. The dialogue box has the two-column format. The

only difference is that you must select pairs of variables and move them across, rather

than just one variable at a time. To do this, you have to click on one variable, then

locate the other variable and click on it. The two variables that you have requested

should appear in the current selection box. After clicking on both, you then press the

arrow button to move the pair across. SPSS will analyse each pair to determine if their

means are significantly different statistically. In this case, select the variables

HWRATIO and HWRATIO2 and move them across, then press the OK button.

Table 1: Data for Height/Weight Ratio after a 4-week diet/exercise plan

Subject Number   HWRATIO2 score
1                .44
2                .52
3                .46
4                .
5                .44
6                .42
7                .33
8                .74
9                .80
10               .32
11               .60
12               .65
13               .40
14               .50
15               .57
16               .41
17               .60
18               .55
19               .49
20               .60

OUTPUT

The results appear in three sections:

The first section gives you a table called Paired Samples Statistics with the mean scores, standard deviations and standard errors of the mean for the two variables.

The second section is a table called Paired Samples Correlations showing the correlation between the two variables and its level of significance.

The third section is more important. The table called Paired Samples Test

indicates the significance of the results. This includes the t-value, degrees of

freedom (d.f.) and the two-tailed significance level.
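The t-value in the Paired Samples Test table comes from the differences between each subject's two scores. A minimal Python sketch, with hypothetical data and function names, is below. It stops at t and its degrees of freedom, because converting t into a two-tailed significance needs the t distribution, which Python's standard library does not provide (SciPy's scipy.stats.ttest_rel reports both, if you have SciPy installed).

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t-test on two related variables.

    Each difference is first - second (e.g. HWRATIO - HWRATIO2). A subject
    with a missing value (None) in either variable is left out of the test.
    Returns (mean difference, t, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean_diff, mean_diff / std_error, n - 1


# Hypothetical before/after scores; the fourth subject has no "after" score.
before = [1, 2, 3, 9, 4]
after = [2, 4, 5, None, 5]
print(paired_t(before, after))
```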

What is the t-value for the comparison between the height to weight ratio scores?

Is there a significant difference between the scores before and after the diet/exercise

plan? If so, which is the greater height/weight ratio?

T-Test for Independent Samples

This is the parametric t-test for two independent samples - a between-subjects design

where, for example, subjects are randomly assigned to two separate test conditions

(e.g. drug and control) and the mean scores (e.g. reaction time) are compared to

determine if they are significantly different from each other.

In this case, you want to test whether there is a statistical difference in height to

weight ratios between the male and female subjects. The format for variables to be

used in the independent t-test is different from that used in the related t-test. Instead of the scores being placed in two separate columns (variables), all of the scores are placed in

a single column (variable). A second variable identifies for SPSS which of the two

groups each score belongs to. So, in this case, there is the variable HWRATIO2 as the

dependent variable and NSEX as the independent variable.

To run the analysis, go to ANALYZE, COMPARE MEANS and then

INDEPENDENT-SAMPLES T-TEST. As usual, the left column lists all the

variables in your data file. On the right, there are two boxes:

The test variable(s) box is where you move the dependent variable(s). (e.g.,

HWRATIO2)



The grouping variable box is where you move the variable that distinguishes between the two independent groups (e.g. the variable NSEX).

First, select the dependent variable HWRATIO2 and move it over to the test

variable(s) section. Next move NSEX over into the grouping variable section and

press the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable NSEX, where only two levels are

recorded, you would just enter "1" in the top box for male subjects, and "2" in the

lower one for female subjects. Hit the CONTINUE button, then hit the OK button.

[Note: There may be times where you have a larger range of values, such as five

different education levels, but only want to look at the difference between two of

them. You would enter the two values you wish to compare.]

OUTPUT

There are two sections:

The first section of the output gives you a table called Group Statistics, which indicates the number of cases and the mean scores etc. for each condition.

The second section provides a table called Independent Samples Test and starts with Levene's Test for Equality of Variances. If Levene's test is significant, the variances are unequal, so when you look at the results of the t-test in the final table, you use the line starting with "Equal variances not assumed".

If it isn't significant, you look at the line starting with "Equal variances assumed".

The final table gives you t-values, degrees of freedom and the two-tailed

significance levels.
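To see what the two lines of the table correspond to, here is a minimal Python sketch with hypothetical function names and scores. The "Equal variances assumed" line uses a pooled variance estimate, while the "Equal variances not assumed" line is Welch's version of the t-test, which is why its degrees of freedom are often not a whole number. The Levene function uses absolute deviations from each group mean, which we believe matches SPSS; note that SciPy's scipy.stats.levene centres on the median unless you pass center="mean".

```python
import math
import statistics


def independent_t(group1, group2):
    """Return ((t, df) equal variances assumed, (t, df) not assumed)."""
    n1, n2 = len(group1), len(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # Equal variances assumed: pool the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (t_equal, n1 + n2 - 2), (t_welch, df_welch)


def levene_f(group1, group2):
    """Levene's F: a one-way ANOVA on absolute deviations from each group mean."""
    devs = [[abs(x - statistics.mean(g)) for x in g] for g in (group1, group2)]
    pooled = [d for g in devs for d in g]
    grand = statistics.mean(pooled)
    k, n = len(devs), len(pooled)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in devs)
    ss_within = sum((d - statistics.mean(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for two groups (e.g. males and females)
males = [1, 2, 3]
females = [4, 5, 6]
print(levene_f(males, females))
print(independent_t(males, females))
```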

In this case, Levene's test is not significant (0.137), so we look at the equal variances line.

The t-test is not significant either (two-tailed significance of .478), so we cannot reject the

null hypothesis: there is no evidence of a difference between males and females in their height to

weight ratios.

Section II: Non-Parametric T-tests (Wilcoxon - related & Mann-Whitney - unrelated)

All of the tests today can be found under ANALYZE, NONPARAMETRIC TESTS.

Mann-Whitney - Unrelated

This is the non-parametric t-test for two independent samples - a between-subjects

design. To run the analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and

2 INDEPENDENT SAMPLES

As usual, the left column lists all the variables in your data file. On the right, there are

two boxes:



The test variable(s) box is where you move the dependent variable(s).

The grouping variable box is where you move the variable that distinguishes

between the two independent groups (e.g. the variable sex).

So, move HWRATIO2 into the test variable box, and move NSEX into the grouping variable box. Now, click the DEFINE GROUPS button. Values from the grouping variable

must be entered into the two boxes. In the case of the variable NSEX, you enter "1" in

the top box for male subjects, and "2" in the lower one for female subjects. Hit the

CONTINUE button, then hit the OK button.

OUTPUT

SPSS divides the entire set of subjects into three groups:

those with a score of 1 (male)

those with a score of 2 (female)

cases with missing data (which are excluded from the analysis)

The first section gives the mean ranks for the two conditions that are included, as well

as the sums of the ranks and the numbers of cases.

The second section gives the Z score and p-value for the Mann-Whitney test.
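The mean ranks and rank sums in the first table can be reproduced by ranking all the scores together, as in this Python sketch (function names and data are hypothetical). Tied scores share the average of the ranks they occupy. The Z here is the plain normal approximation without a correction for ties, so with tied data it may differ slightly from the value SPSS prints.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank from 1 upwards; tied values share the mean of the ranks they occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney(group1, group2):
    """Mean ranks, rank sums, U and a normal-approximation Z (no tie correction)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))  # rank both groups together
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return {"mean_ranks": (r1 / n1, r2 / n2), "rank_sums": (r1, r2), "U": u, "Z": z}


# Hypothetical scores for two independent groups
print(mann_whitney([1, 2, 3], [4, 5, 6]))
```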

Is there a difference between males and females? How do the results from this week

compare to last week's?

Wilcoxon - Related

This is the non-parametric repeated measures T-test, in a within-subjects design. Like

the parametric equivalent, we'll be running a comparison of height to weight ratios for

the sample population before and after a four-week exercise/diet program. To run the

analysis, choose: ANALYZE, NONPARAMETRIC TESTS, and 2 RELATED

SAMPLES

The dialogue box has the two-column format. The only difference is that you must select pairs of variables and move them across. SPSS will analyse each pair to

determine if their mean ranks are significantly different statistically. For this analysis,

select the two variables HWRATIO and HWRATIO2, then click the Ok button.

OUTPUT

The output for this procedure is quite different from the parametric test. The first

section tells you in how many cases the score for one condition is:

less than (LT)

greater than (GT)

equal to (EQ)

the score for the other condition. The mean ranks for each of these three levels

are given, as well as the sums of the ranks for each and the number of cases that fall

under each level.

The main results are underneath this table, where the Z value and the p value are given. The usual standard for levels of significance is used (p less than 0.05).

How many cases are there where HWRATIO is greater than HWRATIO2?

Is there a significant difference between ranked height/weight ratios before and after

the exercise/diet program?
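The counts and rank sums described above can be sketched in Python as follows (hypothetical function name and data). Differences are taken as second minus first, so a "negative" case is one where the first score (e.g. HWRATIO) is greater than the second (e.g. HWRATIO2). Zero differences are counted as ties and left out of the ranking, and Z uses the normal approximation without a tie correction, so this is a sketch rather than a reproduction of SPSS's figure.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank from 1 upwards; tied values share the mean of the ranks they occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def wilcoxon_summary(first, second):
    """Counts, rank sums and Z for a Wilcoxon signed-ranks comparison.

    Subjects with a missing score (None) in either variable are skipped.
    """
    # Rounding removes floating-point noise so equal differences count as ties.
    diffs = [round(b - a, 10) for a, b in zip(first, second)
             if a is not None and b is not None]
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])  # rank the absolute differences
    negative = [r for d, r in zip(nonzero, ranks) if d < 0]
    positive = [r for d, r in zip(nonzero, ranks) if d > 0]
    n = len(nonzero)
    t = min(sum(negative), sum(positive))
    z = (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return {
        "negative": (len(negative), sum(negative)),  # second < first
        "positive": (len(positive), sum(positive)),  # second > first
        "ties": len(diffs) - n,
        "Z": z,
    }


# Hypothetical before (first) and after (second) scores for five subjects
before = [1, 2, 3, 4, 5]
after = [2, 4, 3, 3, 9]
print(wilcoxon_summary(before, after))
```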




ANOVAS

This practical will involve familiarising students with the analysis of variance

(ANOVA). The ANOVAs used in this practical are used when you want to determine if there is a significant difference between three or more groups when you have only a

single independent variable.

One-way ANOVA for Independent Samples

In this case, we want to determine if there is a significant difference in the height to

weight ratio between the three age groups in the sample in family.sav - children,

adults and elderly. We also want to carry out a Tukey's post-hoc test to identify where

those differences lie, if any. The procedure is remarkably similar to carrying out an

unrelated samples t-test. Go: ANALYZE, COMPARE MEANS, ONE-WAY ANOVA

As you can see, the layout of the dialogue box is basically the same as the one for

unrelated t-tests from last week. First select your Dependent variable(s) - in this case

move the variable HWRATIO into the dependent list section. Your factor

(independent variable) is the variable AGEGRP. Press the Continue button.

Before running the analysis, press the POST HOC button and turn on Tukey's test.

Now press the Continue and Ok buttons and the analysis will be carried out.

OUTPUT

There are two sections to the results for the one-way ANOVA.

1. The first section indicates whether any significant differences exist between the different levels of the independent variable. The between-groups and within-groups

sums of squares are listed, along with the degrees of freedom, the F-ratio and the F-probability

(significance level). It is this last part that indicates significance. If the F-prob. is less than 0.05, then a significant difference exists. In this case, the F-prob.

is 0.000, so we can say that there is a statistically significant difference in height

to weight ratios between the three age groups.

2. The post-hoc test identifies where exactly those differences lie. The final part of the second section is a small table with the levels of the independent variable listed

down the side. Looking at the comparisons between these levels, we see that

children have a significantly higher mean height to weight ratio than adults and

the elderly (this is also indicated by the asterisks).

For the meantime, ignore the third table of the output.
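The F-ratio in the first section is the between-groups mean square divided by the within-groups mean square. A minimal Python sketch with a hypothetical function name and made-up scores follows. It does not include Tukey's test, which needs the studentized range distribution; recent versions of SciPy offer scipy.stats.tukey_hsd for that.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    k, n = len(groups), len(scores)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three independent groups
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```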



One-way ANOVA for Related Samples

The procedure for running this is very different from anything you've done before.

The first step is easy enough - you need to add a third height to weight ratio variable,

representing the ratios for the subjects some time after they stopped doing the exercise/diet plan. The data is below:

Variable Name: HWRATIO3

Variable Label: Height/Weight Ratio post-plan

Data: see table below

Subject Number   HWRATIO3 score
1                .42
2                .56
3                .42
4                .
5                .41
6                .40
7                .30
8                .78
9                .71
10               .30
11               .55
12               .64
13               .40
14               .49
15               .55
16               .39
17               .52
18               .54
19               .49
20               .60

The first step is to run a single factor ANOVA by going: ANALYZE, GENERAL

LINEAR MODEL, REPEATED MEASURES

The dialogue box is different from the usual format. The first step is to give a name to the factor being analysed, basically the thing the three variables have in common. All

three variables cover height to weight ratios, so:

in the Within-Subject Factor Name box, type RATIO

in the Number of Levels box, type 3 (representing the three variables)

press the ADD button, then the DEFINE button

The next dialogue box is a bit more familiar. In the right-hand column, there are three

question marks with a number beside each. Select each of the three variables to be

included in the analysis, and move them across with the arrow button. Notice how

each of the variables replaces one of the question marks, indicating to SPSS which



three variables represent the three levels of the factor RATIO. Then proceed by

clicking on OK.

OUTPUT

Firstly, you can ignore the sections of the output titled Multivariate Tests and Mauchly's Test of Sphericity.

You need to examine the section titled Tests of Within-Subjects Effects. This

section indicates whether any significant differences exist between the different levels

of the within subjects variable. The degrees of freedom and sums of squares are listed,

as well as the F-score and its significance level. If the significance level is less than

0.05, then a significant difference exists. In this case, it is 0.001 (look at the line for

sphericity assumed), so we can say that there is a statistically significant difference in

height to weight ratios between the three times when measurements were taken.

You can ignore the section titled Tests of Between-Subjects Effects. It is irrelevant here.
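The "sphericity assumed" F can be sketched in Python by splitting the total variation into a part due to the factor (RATIO), a part due to differences between subjects, and the error that is left over. The function name and data are hypothetical; as in SPSS, a subject with a missing score is dropped before the analysis.

```python
import statistics


def repeated_measures_anova(rows):
    """One-way repeated-measures ANOVA, sphericity assumed.

    rows: one list per subject, holding that subject's score at each level of
    the factor (e.g. [hwratio, hwratio2, hwratio3]). Subjects with any missing
    score (None) are dropped first. Returns (F, df_factor, df_error).
    """
    rows = [r for r in rows if None not in r]
    n, k = len(rows), len(rows[0])
    scores = [x for r in rows for x in r]
    grand = statistics.mean(scores)
    level_means = [statistics.mean([r[j] for r in rows]) for j in range(k)]
    subject_means = [statistics.mean(r) for r in rows]

    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_factor = n * sum((m - grand) ** 2 for m in level_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_factor - ss_subjects  # what is left after both effects

    df_factor, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_factor / df_factor) / (ss_error / df_error), df_factor, df_error


# Hypothetical scores at three time points; the last subject has a missing score
data = [[1, 3, 5], [2, 3, 7], [3, 6, 6], [4, None, 5]]
print(repeated_measures_anova(data))
```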

To do a post-hoc test to identify where the differences lie, the SPSS for Windows

made easy manual recommends doing Paired-Sample T-tests. In this case

HWRATIO & HWRATIO2

HWRATIO & HWRATIO3

HWRATIO2 & HWRATIO3

From these three T-tests, you can determine which of the height to weight ratios are

significantly different from each other.

Kruskal-Wallis ANOVA (KWANOVA - Unrelated)

This is the non-parametric equivalent of the independent samples one-way ANOVA, where ranks are used

instead of the actual scores. We will run the analysis on the same variables, so go

ANALYZE, NONPARAMETRIC TESTS, and K INDEPENDENT SAMPLES

As with the parametric test, move HWRATIO over to the test (dependent) variable list and AGEGRP over to the grouping (independent) variable list, and define the range

with a minimum of 1 and a maximum of 3. Click the OK button. Notice that the

non-parametric ANOVA doesn't have a post-hoc test. If you run this ANOVA, you'll

have to consult a statistics book as to how to do a post hoc on the results. One way

would be to run a series of non-parametric unrelated samples t-tests (Mann-Whitney) on all the combinations of the conditions.

OUTPUT

The first section gives you the mean ranks and the number of cases for each level of

the independent variable. The second section lists the Chi-Square value, degrees of

freedom and significance of the test.



Is there a significant difference between the three groups (remember you can't say

exactly what that difference is without a post hoc test)?
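The Chi-Square value SPSS reports for this test is the Kruskal-Wallis H statistic, compared against a chi-square distribution with (number of groups - 1) degrees of freedom. A minimal Python sketch (hypothetical names and data, no correction for ties) is below.

```python
from collections import Counter


def average_ranks(values):
    """Rank from 1 upwards; tied values share the mean of the ranks they occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis(groups):
    """Return (H, df, mean ranks per group); no correction for ties."""
    scores = [x for g in groups for x in g]
    ranks = average_ranks(scores)  # rank all groups together
    n = len(scores)
    total, mean_ranks, start = 0.0, [], 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]  # ranks belonging to this group
        start += len(g)
        total += sum(group_ranks) ** 2 / len(g)
        mean_ranks.append(sum(group_ranks) / len(g))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1, mean_ranks


# Hypothetical scores for three independent groups
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```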

Friedman's - Related ANOVA

This is the non-parametric equivalent of the related samples ANOVA, where ranks are used

instead of the actual scores. We will run the analysis on the same variables, so go

ANALYZE, NONPARAMETRIC TESTS, and K RELATED SAMPLES

This is much easier to run - just move the three variables (HWRATIO, HWRATIO2

and HWRATIO3) over to the right column and click OK.

OUTPUT

There is the Chi-square score, the d.f. and whether it's significant (as usual, p has to be

less than 0.05). Again, for post-hoc tests, you'll probably have to consult a statistics

book or possibly run three non-parametric related samples T-tests (Wilcoxon).
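Friedman's test ranks the scores within each subject rather than across the whole sample. A minimal Python sketch (hypothetical names and data, no correction for ties), which drops any subject with a missing score, is below.

```python
from collections import Counter


def average_ranks(values):
    """Rank from 1 upwards; tied values share the mean of the ranks they occupy."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def friedman(rows):
    """Return (chi-square, df, mean rank per condition); no correction for ties.

    rows: one list per subject with that subject's score in each condition.
    Subjects with a missing score (None) are dropped first.
    """
    rows = [r for r in rows if None not in r]
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for r in rows:
        for j, rank in enumerate(average_ranks(r)):  # rank within this subject only
            rank_sums[j] += rank
    chi_square = 12 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums) - 3 * n * (k + 1)
    return chi_square, k - 1, [s / n for s in rank_sums]


# Hypothetical scores in three conditions; the last subject has a missing score
data = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [4, None, 5]]
print(friedman(data))
```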



WEEK 5: October 30th

Study Week

WEEK 6: November 6th

No Practical



WEEK 7: November 14th

QUALITATIVE RESEARCH: STUDENT SEMINAR PRESENTATION PREPARATION

Students should use this time to prepare work for their presentations. Dr. Alison will be

available in his office for guidance if necessary.

WEEK 8: November 21st

QUALITATIVE RESEARCH: STUDENT SEMINAR

WEEK 9: November 28th

INTERVIEWING AND DISCOURSE ANALYSIS

Conducting interviews etc.

This period should be used to conduct interviews in preparation for the session on

content analysis. Students are expected to conduct interviews or sessions that result in

naturally occurring language. It is important that this material is transcribed in

preparation for week 11's session. Dr. Alison will be available for consultation.

WEEK 10: December 5th

WORKING WITH NATURALLY OCCURRING

LANGUAGE

PREPARATION

Students will use this period to work with their material gathered in the previous

sessions. They should use this time to prepare for presentations in the final practical

session (12th December).

WEEK 11: December 12th

WORKING WITH NATURALLY OCCURRING

LANGUAGE: STUDENT SEMINAR

Students are expected to organise their own seminar presentations in this session on

the results and methods employed regarding the content analysis of their material.



SECTION III

EXTRA MATERIAL



For the benefit of students who wish to follow up other procedures in their own time,

we have included the following section which gives you some opportunity to play

with graphics packages and explore some issues associated with regression in

preparation for next term. Try not to worry if this all sounds unfamiliar at first. This

section is simply to give you a running start when it comes to your work after

Christmas.

REGRESSION

Simple Regression

In simple regression, the values of one variable (the dependent variable, y) are estimated from those of another (the independent variable, x)

by a linear (straight line) equation of the general form:

$\hat{y} = b_0 + b_1 x$

where $\hat{y}$ is the estimated value of y, $b_1$ is the slope (known as the regression

coefficient) and $b_0$ is the intercept (known as the regression constant).

Multiple Regression

In multiple regression the values of one variable (the dependent variable, y) are

estimated from those of two or more variables (the independent variables $x_1, x_2, \dots, x_n$). This is achieved by the construction of a linear equation of the general

form:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

where the parameters $b_1, b_2, \dots, b_n$ are the partial regression coefficients and the

intercept $b_0$ is the regression constant.

Residuals

When a regression equation is used to estimate the values of a variable (y) from those

of one or more independent variables (x), the estimates ($\hat{y}$) will not be totally

accurate (i.e., the data points will not fall precisely on the straight line). The

discrepancies between y (the actual values) and $\hat{y}$ (the estimated values) are known

as residuals and are used as a measure of accuracy of the estimates and of the extent

to which the regression model gives a good account of the data in question.



The multiple correlation coefficient

One measure of the efficacy of regression for the prediction of y is the Pearson

correlation between the true values of the target variable y and the estimates $\hat{y}$

obtained by substituting the corresponding values of x into the regression equation.

The correlation between y and $\hat{y}$ is known as the multiple correlation coefficient, R (as opposed to r, Pearson's correlation between the target variable and any one

independent variable). In simple regression, R takes the absolute value of r between

the target variable and the independent variable (so if r = -0.87, then R = 0.87).
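The quantities defined above can be computed directly. This minimal Python sketch (hypothetical function name and made-up x and y values) fits $\hat{y} = b_0 + b_1 x$ by least squares, returns the residuals $y - \hat{y}$, and shows that in simple regression R is the absolute value of Pearson's r.

```python
import math
import statistics


def simple_regression(x, y):
    """Least-squares fit of y-hat = b0 + b1 * x.

    Returns (b0, b1, residuals, R), where residuals are y - y-hat and R is
    the multiple correlation coefficient (for one predictor, |Pearson's r|).
    """
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope: the regression coefficient
    b0 = mean_y - b1 * mean_x    # intercept: the regression constant
    predicted = [b0 + b1 * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    r = sxy / math.sqrt(sxx * syy)  # Pearson's r between x and y
    return b0, b1, residuals, abs(r)


# Hypothetical x and y values
x_values = [1, 2, 3, 4]
y_values = [2, 4, 5, 4]
print(simple_regression(x_values, y_values))
```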

Running Simple Regression

Using the family.sav file we want to look at how accurately we can estimate height to

weight ratios (HWRATIO) using the subjects age (AGE). To run a simple

regression, choose ANALYZE, REGRESSION and LINEAR.

As usual, the left column lists all the variables in your data file. There are two sections

for variables on the right. The Dependent box is where you move the dependent

variable. Move HWRATIO there. The Independent(s) box is where you move AGE.

Next click the STATISTICS button, and turn on the Descriptives option.

As already stated, a residual is the difference between the actual value of the dependent variable and its predicted value using the regression equation. Analysis

of the residuals gives a measure of how good the prediction is and whether there

are any cases that should be considered outliers and therefore dropped from the analysis. Click on Casewise diagnostics to obtain a listing of any exceptionally

large residuals.

Now click on CONTINUE.

Now click on the PLOTS button. Since systematic patterns between the predicted values and the residuals can indicate possible violations of the assumption of linearity, you should plot the standardised residuals against the standardised predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box, and then click CONTINUE.

Now click OK.

Output

The first thing to consider is whether your data contain any outliers. There are no outliers in this data. If there were, they would be listed in a table labelled Casewise Diagnostics, and the corresponding cases would have to be removed from your data file using the filter option you learned previously.

With that out of the way, the first table to look at (Descriptive Statistics) is right at the top. It gives the means and standard deviations for the two variables (e.g. the mean age is 31.77). The next table contains the (Pearson) correlation for the two variables, just as if you had run the correlation procedure. The coefficient is -0.57, so the relationship is fairly strong and negative (as one variable goes up, the other goes down).

For the moment, ignore the table labelled Variables Entered/Removed.

The next important table is Model Summary. The R and R-squared values for the equation are given here (0.571, as above, and 0.325). Don't worry too much about the other values in this table.
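For reference, the standard definition of R-squared is given below, using ŷ for the estimates and ȳ for the mean of the dependent variable. With a constant in the model, it is the proportion of the variance in the dependent variable accounted for by the regression. Here 0.571² ≈ 0.33, so AGE accounts for roughly a third of the variance in HWRATIO.

```latex
R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```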

The next table contains the regression ANOVA. This test indicates how good the model is, that is, whether there is some overall relationship between the dependent and independent variable(s). The key element is the F statistic. For this regression, F has an associated p value of 0.017, below the .05 cut-off. This indicates that the linear relationship is statistically significant. Note, however, that only an examination of the scatter plot of the variables can confirm that the relationship between two variables is actually linear.

The next table contains some really important information! The table is labelled

Coefficients and contains the regression equation. The regression coefficient and

constant are given in column B of the table. The equation therefore is:

Predicted height to weight ratio = -.00368(AGE) + .602
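To use the equation, substitute an age for AGE. A minimal Python sketch, with the coefficients copied from the Coefficients table (the function name predict_hwratio is our own, hypothetical):

```python
def predict_hwratio(age):
    """Predicted height-to-weight ratio from the simple regression equation."""
    return -0.00368 * age + 0.602


# Prediction at the mean age reported in the Descriptive Statistics table.
print(round(predict_hwratio(31.77), 3))  # → 0.485
```

A least-squares line always passes through the point (mean of x, mean of y), so the prediction at the mean age should be close to the mean HWRATIO shown in the Descriptive Statistics table, allowing for the rounding of the coefficients.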

The t value indicates whether each independent variable has a significant individual impact on the regression equation. In simple regression there is only one independent variable, and here it has a significant influence (a t score with an associated p value of 0.0168). Notice that this is the same p value as in the ANOVA: with a single predictor, F is simply t squared, so the two tests are equivalent.
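The equivalence of the two p values can be checked numerically. This minimal Python sketch, using hypothetical data rather than family.sav, computes the slope's t statistic and the regression ANOVA F by hand and shows that t² equals F in simple regression.

```python
from math import sqrt
from statistics import mean

# Hypothetical data (not from family.sav).
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [0.56, 0.50, 0.49, 0.43, 0.41]
n = len(x)

# Simple least-squares fit.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Regression ANOVA: F = (regression SS / 1) / residual mean square, with 1 and n - 2 df.
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ms_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)
F = ss_regression / 1 / ms_residual

# t for the slope: b1 divided by its standard error, with n - 2 df.
t = b1 / sqrt(ms_residual / sxx)

print(f"t = {t:.3f}, t squared = {t ** 2:.3f}, F = {F:.3f}")
```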

The next section begins with Residual Statistics. This gives means, SDs and other information about the unstandardised and standardised predicted values and residuals in the regression.

You could follow up the regression with a scatter plot. Look at your scatter plot: basically, all you need to know is that if the plot shows no obvious pattern, then the assumptions of linearity and homogeneity of variance appear to have been met. Where you get into trouble is if the points form a crescent or funnel shape. If this is the case, further screening of your data is necessary.
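For readers curious about what goes on the axes of the residual plot: as we understand SPSS's definitions, *ZPRED is each predicted value minus the mean predicted value, divided by the standard deviation of the predicted values, and *ZRESID is each residual divided by the standard error of the estimate (the square root of the residual mean square). The minimal Python sketch below assumes those definitions and uses hypothetical data; plotting zresid against zpred gives the kind of residual plot described above, where a patternless cloud is what you hope to see.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data (not from family.sav); k is the number of independent variables.
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [0.56, 0.50, 0.49, 0.43, 0.41]
n, k = len(x), 1

# Simple least-squares fit.
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Standard error of the estimate: square root of the residual mean square.
se_estimate = sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))

# *ZPRED: predicted values rescaled to mean 0 and standard deviation 1.
zpred = [(yh - mean(y_hat)) / stdev(y_hat) for yh in y_hat]
# *ZRESID: residuals divided by the standard error of the estimate.
zresid = [e / se_estimate for e in residuals]

for zp, zr in zip(zpred, zresid):
    print(f"ZPRED {zp:6.2f}   ZRESID {zr:6.2f}")
```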

Multiple Regression

Often, it is too simplistic to assume that a single independent variable is all that is

required to make some sort of prediction about the scores for a dependent variable.

This is where you have to run multiple regression.

For now, the regression will look at the impact of age (AGE), height-to-weight ratio post-plan (HWRATIO2) and height-to-weight ratio long after the plan (HWRATIO3) on the dependent variable, the subject's initial height-to-weight ratio (HWRATIO). To run the analysis, choose ANALYSE, REGRESSION and then LINEAR.



As before, move HWRATIO to the Dependent box. The Independent(s) box is where you move AGE, HWRATIO2 and HWRATIO3. The rest is as before:

Click the STATISTICS button, and turn on the Descriptive option.

Click on Case-wise diagnostics to obtain a listing of any exceptionally large residuals.

Now click on CONTINUE.

Now click on the PLOTS button. Since systematic patterns between the predicted values and the residuals can indicate possible violation of the assumption of linearity, you should plot the standardised residuals against the standardised predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box, and then click CONTINUE.

Now click OK.

Note: we are only doing a general, all-inclusive multiple regression (the default Enter method). There is a box located directly beneath the Independent(s) box called Method, which gives you a series of additional methods for running the analysis: stepwise, remove, forward and backward.

Output

Again, the first thing to look for is outliers. As before, there are none.

With that out of the way, the next section to look at is at the top. Everything that follows is read in the same way as for the simple regression. The first part gives the means and standard deviations for the four variables (e.g. the mean HWRATIO3 is .526). The next part gives the (Pearson) correlations for all of the variables. You can see that HWRATIO is strongly correlated with the two other height-to-weight ratio variables (both correlations are over .9).

The next section is under the heading Model Summary. The R and R-squared

values are given for the equation (.98 and .967).

An ANOVA is carried out that indicates how good the model is, that is, whether there is some overall relationship between the dependent variable and all of the independent variables together. The key element is the F statistic. Here F is significant (SPSS displays p = .000, which means p < .001), so there is a significant overall relationship, and with an R-squared of .967 a strong one.
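The overall F can be written in terms of R-squared, the number of independent variables k and the number of cases n: F = (R²/k) / ((1 − R²)/(n − k − 1)), with k and n − k − 1 degrees of freedom. A minimal Python sketch of that formula (f_from_r_squared is our own hypothetical helper); the sample size for family.sav is not given here, so n = 20 below is purely hypothetical.

```python
def f_from_r_squared(r2, k, n):
    """Overall regression F statistic from R-squared, k predictors and n cases."""
    df_regression, df_residual = k, n - k - 1
    return (r2 / df_regression) / ((1 - r2) / df_residual)


# R-squared of .967 with three predictors, as in the Model Summary;
# n = 20 is a hypothetical sample size, not taken from family.sav.
print(round(f_from_r_squared(0.967, 3, 20), 1))
```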

The next table (Coefficients) contains information that indicates the individual role of

each independent variable. The values in the column labelled B give the scores to put

into the regression equation:

$$y = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_0$$



For this regression, then, the regression equation is

HWRATIO = -.0009(AGE) + .99(HWRATIO2) -.135(HWRATIO3) + 0.063

Note that because the B value for HWRATIO3 is negative, the plus sign in the general equation becomes a minus sign.
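Using the multiple regression equation works in the same way as in the simple case: substitute a value for each independent variable. A minimal Python sketch with the B values copied from the Coefficients table; the function name and the example subject's scores are hypothetical.

```python
def predict_hwratio_multi(age, hwratio2, hwratio3):
    """Initial height-to-weight ratio predicted from the multiple regression equation."""
    return -0.0009 * age + 0.99 * hwratio2 - 0.135 * hwratio3 + 0.063


# Hypothetical subject: age 30, post-plan ratio .50, long-term ratio .52.
print(round(predict_hwratio_multi(30, 0.50, 0.52), 4))  # → 0.4608
```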

The t-tests indicate that AGE, as before, is a significant predictor, as is HWRATIO2, but that HWRATIO3 on its own has no significant influence (p > 0.05).

The next section is labelled Residual Statistics. This gives means, SDs and other information about the unstandardised and standardised predicted values and residuals in the regression. You should have been taught what, if anything, to do with them.

Scatter plots and Regression Lines

A regression line can easily be added to a scatter plot. As before, to create a scatterplot, go to GRAPH and SCATTER.

Leave the graph type as Simple and just click the DEFINE button.

Move HEIGHT into the X Axis box and WEIGHT into the Y Axis box. Now click the TITLE button. You can now put a title in the Line 1 box, and add a second title line and a sub-title if you want. Now press the CONTINUE button and then click the OK button. The graph should now appear. The window where all the graphs are stored is called the Chart Carousel, and it can be saved as a separate file. The extension for chart files is always .cht.

What is the line of best fit, and what does the value of R² tell you?



Chi-Square

There are two ways to run Chi-square. The first is when looking at differences in frequencies across the levels of one variable (a goodness-of-fit test). In this case, we want to see whether there are differences in the frequencies for the three levels of the variable AGEGRP (age groups): child, adult and elderly. You do this through:

Analyze

Nonparametric Tests

Chi-Square

To run a basic Chi-Square, just move the variable(s) you want to analyse across into the Test Variable List and click OK. In this case, move the variable AGEGRP over and run the analysis.

[NOTE: If you're interested in the various options, information about them can be found by pressing the Help button when you are in a dialogue box.]

OUTPUT

The results present the observed and expected frequencies for each of the three levels,

as well as the Chi-Square value, the degrees of freedom (d.f.) and the significance

level. Is there a difference between the three groups in terms of their observed

frequencies?
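By default, the Chi-Square dialogue assumes every level is equally likely (the All categories equal option), so each expected frequency is the total count divided by the number of levels. A minimal Python sketch of the calculation, using hypothetical counts rather than the family.sav data:

```python
from math import exp

# Hypothetical counts for the three AGEGRP levels (child, adult, elderly);
# these are not the family.sav frequencies.
observed = [10, 20, 30]

# Equal expected frequencies: the total count split evenly across the k levels.
k = len(observed)
expected = [sum(observed) / k] * k

# Chi-square statistic: sum of (observed - expected)^2 / expected over the levels.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# For df = 2 only, the upper-tail probability has the closed form exp(-x / 2);
# other df values need a general chi-square distribution routine.
p_value = exp(-chi_square / 2)
print(chi_square, df, round(p_value, 4))  # → 10.0 2 0.0067
```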

The second way to run a Chi-Square is when carrying out a crosstab. The only change is that before running the crosstab, you have to turn the Chi-Square option on.

So go to:

Analyze

Descriptive Statistics

Crosstabs

Move NSEX into the Column(s) box and NCARS into the Row(s) box. Make sure to turn on the Chi-Square option by clicking the Statistics button and ticking Chi-square. Press Continue and then OK to run the analysis.

OUTPUT

The crosstabulation table is displayed, along with a variety of results. The one to be concerned with is the significance level for the Pearson Chi-Square value.
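The Pearson Chi-Square compares each observed cell count with the count expected if the two variables were unrelated (row total × column total ÷ N). A minimal Python sketch with a hypothetical 2 × 2 table. For 2 × 2 tables SPSS also prints a Continuity Correction row; the value computed here is the uncorrected Pearson statistic.

```python
from math import erfc, sqrt

# Hypothetical 2 x 2 crosstab (rows = NCARS categories, columns = NSEX);
# these counts are made up for illustration.
table = [[10, 20],
         [30, 40]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / N.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)

# For df = 1 only, the upper-tail probability equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))
print(round(chi_square, 3), df, round(p_value, 3))
```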



Microsoft Word Exercises

This exercise shows you how to copy and format a document. To save time, here's one we prepared earlier. A cast list is given below. Your task is to format the document (top of page 80) into an organised piece of work (bottom of page 80). As you do this, note the different techniques you use; they will come in handy as the course progresses.

These are hands-on sessions, meaning that you should be discovering what to do yourself. Of course, if you have any difficulties, we are here to assist you. Good luck, and remember the Help facility.

The Help Facility

Normally you will want to go to the Help menu, then choose Contents and Index.

Click on Index and type in a relevant key word.

The Opening Screen

Word offers a number of ways of viewing the document. The most usual is Normal.

So, go to the View menu, and select Normal.

Alternatively, use the shortcut button at the bottom left of the screen. If you are not sure what a particular button does, hold the pointer over it for a second or two without pressing anything; Word will then give a short description of the button.

The other view often used is Page Layout, which shows how the page will be printed.

Using Zoom from the View menu will allow you to enlarge the screen.

Opening Files

We're going to be e-mailing you two documents entitled play.doc and actone.txt. Open the e-mail and then, one at a time, click the Word icons once with your right mouse button. Now save the documents by clicking on Save. Find the Msoffice icon and click on it. Now save your documents in the Msoffice folder under suitable names (e.g. play.doc and actone.txt).

Now go into Word. To open the file you just saved, go to File, Open, click on your M: drive and find your Msoffice folder. Click on the Msoffice folder and find play.doc. Double-click on play.doc.

Hidden Codes

Certain characters or text in Word are hidden. That is to say, they appear on the screen but not in the final printed version. To turn this option on and off, click on the reversed P button (¶) on the toolbar. That marker is the paragraph marker, denoting a new line (hard return). Turn the hidden codes off.



Now show the hidden codes. Can you spot the deliberate mistake? Yes, it's one of those errors in conversion. Double-click on {PRIVATE} and the whole word is selected (this is a handy trick worth noting). Delete this word.

Correct any other deliberate mistakes.

Page Layout

The original document has margins of 2 inches. Make sure the measurement units are in inches by going to Tools, Options, General, and selecting inches in the box called Measurement units, if this is not already set. Now go to File, Page Setup and increase the left and right margins to 2 inches from the Margins option. If it asks you whether you want your margins fixed, respond with yes. Also note that under Paper Size you can change the orientation of the paper. Briefly, portrait is upright (mainly for text) and landscape is horizontal (for graphs and pictures).

To change the justification, select everything by going to Edit, Select All, or by dragging the mouse over the whole document (only if it's a small document). Now click the right mouse button over any part of the selected area. Choose Paragraph from the menu that appears and choose Justified from the Alignment option. You can also do this from the toolbar. Centre alignment is useful for headings. Change The Play to centre alignment.

Formatting

Italicise What The Butler Saw by clicking on What and dragging over the other three words. Now use the toolbar to italicise by hitting the I button.

Highlight all of the text and change the font size to 12 pt.

Type in the other characters, leaving a space between the character and actor names.

Similarly, change the characters' names to small caps by selecting the name and using the right mouse button: from Font, choose the Small caps option. For the other character names, simply select the name and go to Edit, Repeat. Select the cast and add a tab into the ruler at two inches by double-clicking on the ruler at the two-inch mark. Place your cursor before Stanley. Go to Format and Tabs and add the leader option 2 (i.e. lots of full stops). Press OK and then press the Tab key. Do this for each cast member. For the director and designer, the tab is set at 1.5 inches with no leader.

Separate the pieces of text with two hard returns, and don't forget to save your work.

The Play



The first London performance of What The Butler Saw was given at the Queen's Theatre by Lewnstein-Delfont Ltd and H.M. Tennnant on 5th March, 1969, with the following cast in order of appearance.

Dr Prentice Stanley Baxter

Geraldine Barclay Julia Foster

Mrs Prentice Coral Browne

Directed by Robert Chetwyn

Designed by Hutchinson Scott

The final version should look something like this:

The Play

The first London performance of What The Butler Saw was given at the Queen's Theatre by Lewnstein-Delfont Ltd and H.M. Tennent Ltd. on 5th March, 1969, with the following cast in order of appearance:

DR PRENTICE .................. Stanley Baxter

GERALDINE BARCLAY ..... Julia Foster

MRS PRENTICE ................ Coral Browne

NICHOLAS BECKETT ........ Hayward Morse

DR RANGE ....................... Ralph Richardson

SERGEANT MATCH .......... Peter Bayliss

Directed by Robert Chetwyn

Designed by Hutchinson Scott
