Alison (2002) Research Methods & Statistics Handbook - Introduction to Spss


  • 8/2/2019 Alison (2002) Research Methods & Statistics Handbook - Introduction to Spss


    RESEARCH METHODS & STATISTICS HANDBOOK

    First Term

    Dr. Alison, Mr. Brent Snook


    Table of Contents

    Section I: Introduction

    Section II: Practicals

    Section III: Extra Material

    Appendix: Basic Statistics

    Timetable


    SECTION I

    INTRODUCTION


    Course Instructors

    The instructors for this year will be Brent Snook (Room 1.79), X, Y & Z. Our offices are on the second floor of the Eleanor Rathbone Building.

    Computing Systems

    The University Computing Services Help Desk (Brownlow Hill, phone extension 44567) offers a full advice and backup service should you need any information or help.

    Computing Environments

    Communication between computers and ourselves is mediated by operating systems that allow us to access the various programmes and packages in the University. These systems are known as environments, and the most common one is Windows, which is controlled mainly by pointing and clicking the mouse at icons on the screen. Another environment is UNIX, which is similar to MS DOS in that commands are typed rather than selected with the mouse.

    We discuss these different environments simply because the packages we will be using are stored in them.

    Computers and Networks

    Most computers can act both as stand-alone machines, capable of independent use, and as networked machines, which rely on a central server. Generally speaking, we in Psychology use networked machines for several reasons. Three main networks are used to access the packages in the different environments: the PC Managed Network Service, the NT Managed Network and the UNIX System.

    The Three Networks

    Access to the networks is gained by logging on with your user name and password. You then have access to your own personal disk space (M: drive) at a central location that only you can read. You have separate disk space for Windows 2000 and UNIX, so you can have two separate passwords to increase security, though your user name remains the same. At the end of a session, you must always log off.

    Usually, when a computer is booted, you have the option to log on to Windows 2000. Once on the network, you are in the MS DOS environment and can then use Windows or UNIX.

    Computer Terminals

    Virtually all computers upstairs in Psychology and in the Eleanor Rathbone Teaching Centre (ERTC) are networked to Windows 2000. There is also another suite of computers on the first floor of Psychology, in the Eleanor Rathbone Data Centre (ERDC). It has a printer, which you should use in preference to the ones in the department, though Computing Services does not look kindly on people sending huge printouts to their printers.

    INSTALLING APPLICATIONS

    There are a number of applications you need to install onto your account. On your screen you should have an icon labelled MNTS Applications. Double-click on this icon, then double-click on All, and you should get a screen full of icons. These are all the applications that you can install onto your account. Each application is installed simply by double-clicking on its icon.

    Install the following applications:

    1. Mulberry (e-mail)
    2. SPSS (version 10)
    3. Stanford Graphics (on L:INVPSY)
    4. Microsoft Office (Word, Excel)
    5. WS_FTP
    6. Netterm
    7. BR Journey Planner
    8. The various MDS packages (LIFA2000, UNIX SSA, MSA, POSA)
    9. Geographic packages (Dragnet)

    Within the limited timeframe, the purpose is to familiarise you with the software that

    is available and to encourage you to start using it.

    Registering on Windows 2000

    Computing Services have all their documentation accessible through the World Wide

    Web. You can print off any of the documents once you set up the appropriate printer.

    The Computing Services handout will take you through all the basics of Windows

    2000 including registering and changing your password.

    To register on Windows 2000, go to any computer terminal. There should be a Windows 2000 login screen. Type the word "register" in the username box and follow the instructions.

    Setting up the ERDC Printer

    You can print on a local printer (a printer that is actually attached to your machine) or a network printer (a printer that is attached to the network). Since we don't have enough printers for everyone, you will need to connect to a network printer.

    The printer which is probably most convenient is the one found in the Eleanor

    Rathbone Data Centre. The network printer queue for this printer is erdc-Queue.


    You are not restricted to this printer, but it is the closest and it is reserved for postgraduate students studying in the Eleanor Rathbone Building.

    To connect to one of the University's networked printers, do the following:

    From the Start menu choose Settings and Printers.

    Double-click on Add Printer.

    Highlight the option Network printer server and click on Next.

    Double-click on Netware Network.

    Double-click on Novell Directory Services.

    Double-click on Liv.

    Double-click on O=liv.

    Scroll down the list of options until you see OU=PRINT-QUEUES.

    Double-click on this option.

    From the list detailed in Figure 1, select the required printer queue and double-click on it (in this case, erdc-Queue).

    Choose OK to install the required printer driver on your local machine.

    From the list illustrated in Figure 1, select the required printer manufacturer and then select the printer from the list available (it should be an HP LaserJet 4Si/4Si MX PS). Click on OK.

    If this is the first time you have installed this particular type of printer, you will be asked for the location of the files to install.

    In the "Copy files from" box, replace D:\i386 (this might say A:\i386) with the path V:\NT40\i386 and click on OK.

    After a few seconds you will be asked if you wish to make this your default printer. Click on Yes.

    Click on Next.

    You will then receive a message that your printer has been successfully installed.

    Click on Finish.


    The printer driver will now be installed and connected to the specified printer queue.

    Remember, once you have connected and installed a network printer driver, you may need to check the printer settings to ensure that settings such as paper size and duplex printing are correct. For further information, see "Configuring the Printer Settings" on the Computing Services web page.

    Using UNIX

    A lot of your time this year may be spent in UNIX. We will go into more detail about this system in the section on UNIX. Three versions of the MDS procedures (SSA, MSA & POSA) are on UNIX.

    Double-click the Netterm icon. Log in with the same username as on the PCMNS. Your password is listed on the form Computing Services sent you. You can change the password by typing passwd.

    The versions of the MDS packages on the mainframe have a number of advantages over the two non-Windows PC versions. First, they are more powerful and therefore more effective. A second advantage is flexibility: in comparison to the mainframe SSA, ShyeSSA (a PC package) offers only two measures of association, Pearson's r and Guttman's mu. While PAP offers the widest variety of measures, the copy we have also tends to be the one that doesn't work. We'll go into the PC packages in more detail at a later date. Running the mainframe packages is fairly straightforward, but there are actually five parts to the whole process: preparing your data, uploading/downloading, using the UNIX system itself, using the ned editor and running (in this case) the SSA.

    The SSA package has an option for reading data as free fields (any space between numbers separates variables), so for UNIX SSA we can leave the data file as it is. For MSA and POSA, however, the fields must be fixed, so:

    a) you don't want any spaces in your rows

    b) each variable's score must start in the same column for every case. Otherwise, when the MDS programs read your data file, they won't read the variables properly: you tell the computer where each case's score for each variable is located by indicating the columns that variable occupies.

    Both of these will become apparent when we get to running the SSA. A simple

    example:

    12 42 131213

    13122 71111

    In this case, the second variable takes up three columns, but, obviously, the computer does not put a 0 in front of a score like 42; that's your job. The same holds true for the third variable, which requires two columns because some of its scores are over 9.


    The correct version of the two lines of data, with 0s added and spaces removed, should be:

    12042131213

    13122071111

    Here, columns 1-2 are variable 1, columns 3-5 are variable 2, and so on. After correcting the data, save the file again, making sure that it is still saved in plain ASCII (text) format.
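    The zero-padding rule above can be sketched in Python. This is purely an illustration, not part of the MDS packages or ned. It takes the layout from the example (variable 1 in columns 1-2, variable 2 in columns 3-5, variable 3 in columns 6-7) and assumes, hypothetically, that the remaining four variables occupy one column each; the helper name to_fixed_width is introduced here for illustration.

```python
def to_fixed_width(values, widths):
    """Zero-pad each score to its column width and join with no spaces.

    values: integer scores for one case, in variable order.
    widths: number of columns each variable occupies (hypothetical layout).
    """
    if len(values) != len(widths):
        raise ValueError("need exactly one width per variable")
    fields = []
    for value, width in zip(values, widths):
        field = str(value).zfill(width)  # e.g. 42 -> "042" in a 3-column field
        if len(field) > width:
            raise ValueError(f"{value} does not fit in {width} column(s)")
        fields.append(field)
    return "".join(fields)


# Assumed layout: columns 1-2, 3-5, 6-7, then four single-column variables.
WIDTHS = [2, 3, 2, 1, 1, 1, 1]

rows = [
    [12, 42, 13, 1, 2, 1, 3],
    [13, 122, 7, 1, 1, 1, 1],
]

for row in rows:
    print(to_fixed_width(row, WIDTHS))
```

    A value wider than its field would push every later variable out of its columns, which is why the sketch raises an error instead of writing the row.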

    Uploading/downloading files

    The best way to upload (transfer files from the PC/NT Managed Network to the mainframe) or download (vice versa) is to use the WS_FTP application (in the comms window). This is a simple package to use.

    1) When you first start it up, a window comes up asking for information about a host, that is, the location outside the PC that you will be accessing to transfer information. Under host name type UNIX. Under host type, select the option UNIX (standard). Under UserID put your user name. Under password enter your UNIX password.

    2) The layout is simple. On the left-hand side is the local host, through which you can change between directories and drives. The right-hand side is your remote host (UNIX account), which also has directories you can move through. At the very top of each side, your current drives/directories are listed. To transfer a file up, select it and click the right arrow; to transfer a file down, click the left arrow. The only trick is to make sure the right receiving directory is selected, e.g. the winword directory on your M: drive to receive the results file from an MSA that you've run. Now locate the coding.dat data file and move it from the M: drive to your UNIX account.

    3) When you've finished, click the exit button.

    The UNIX Operating System

    Imagine that UNIX is set up like the file manager in Windows, but that you have to type in commands rather than click the mouse to move up and down directories, copy files, delete files and so on. When you first log in, the information to the left of the dollar-sign prompt shows, from left to right, the user, the particular machine you're on (in brackets) and the directory you are in. Remember that UNIX is case-sensitive: an F is not the same thing as an f. I find it very useful to start all directory names with a capital letter and file names with a lower-case one, to tell them apart. The first thing to know is how to find help. This is done through the man command. If you know the particular command you want help for, just type man {command}. If you have an idea of what type of action you want the computer to carry out but don't know the specific command, type man -k {keyword}, where the keyword is something related to the command (e.g. password), to get a list of commands to do with passwords.


    Here is a short list of UNIX commands:

    cp {directory/filename} {newfilename} - copies a file

    rm {filename} - deletes a file

    mv {filename} {directory} - moves or renames a file

    ls - lists the contents of the current directory

    cd {directory} - changes directory (note that cd .. moves you up one directory)

    mkdir {directoryname} - creates a directory

    rmdir {directoryname} - removes a directory

    ned {filename} - starts the ned editor

    pine - e-mail program

    gopher - information source

    tin - newsgroup reader

    Note that * is a wildcard character, just like in Windows, for selecting multiple files

    for commands. All of this information is available in more detail on the WWW.
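    As an illustration of how * matching works (this is not the shell itself, whose rules differ in details such as hidden files beginning with a dot), Python's standard fnmatch module applies the same kind of pattern to a list of names. The file names below are hypothetical; fnmatchcase is used because UNIX file names are case-sensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in a UNIX directory listing.
files = ["coding.dat", "coding.out", "results.dat", "Coding.dat", "notes.txt"]

# "*" matches any run of characters (including none).
# fnmatchcase is case-sensitive, so "coding*" does not match "Coding.dat".
dat_files = [name for name in files if fnmatchcase(name, "*.dat")]
coding_files = [name for name in files if fnmatchcase(name, "coding*")]

print(dat_files)
print(coding_files)
```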

    The ned editor

    I mentioned this before (briefly), but I thought I'd go into it in more detail, as it's a useful tool for editing files when you are on UNIX. All of these details are in the document on the ned editor in the Computing Services on-line documentation, BTW.

    Essentially, it's a crude word processor, where all the function keys have various... functions... as do the Shift-function keys. None of this silly bolding or italics, no sir. You can type, you can move your cursor around and you can find and replace. Word processing for real men. Anyway, on to the lesson.

    To start the ned editor, you have to edit a file on your UNIX account. You do this by typing ned {filename}, so choose any filename and open the ned editor.

    A screen will come up with a brownish banner along the bottom listing the various function key options. All the basic keys work as in Word: the arrow keys move the cursor around, the Home key moves to the start of the line, and End to the end. Page Up/Down are also the same, as are Insert/Typeover, Delete, Backspace and so on.

    Right, now type out everything from "I mentioned this..." on to "right here."

    Hit the F1 key for help. To get information on one of the displayed topics, move the cursor to it and hit Enter; for a function key, hit that key. Ctrl-G exits help.

    Right, position your cursor at the start of the third line from the bottom, then hit the F2 key to insert a new line. Now hit F9, which will delete the line you just created. Now hit Shift-F9, and the line comes back. Next, hit the F4 key to mark the start location for cutting/copying and pasting. Move the cursor somewhere on the next couple of lines, then hit F6. Move to the end of the document, hit Enter a couple of times, then hit F5. All the text between the marking point and the cursor is copied to the new cursor location. The same process is used for cutting text, but you hit Shift-F6 instead of F6.


    To insert text from another file, hit Shift-F7. Ned will ask for the filename, and the text of that file will be inserted wherever you've put the cursor. Shift-F4 saves the file, while Shift-F3 saves the file and exits. F3 quits without saving changes. However, the most important feature of ned for data files that you've uploaded is the replace feature (F8). With this, you can change 1s to 0s and so on. So, let's change the letter e to i.

    Move to the top left of the document. Hit F8, then type the letter e [DON'T hit Enter now]. Hit F8 again, and type i. Hit F8 one more time. You'll be prompted to make a choice about the first e. If you hit the Y key, it will change it; N will make the computer jump to the next occurrence. A ! will cause ned to make all possible changes. You'll find this handy for changing numbers prior to doing analyses, for example, changing 0s to 1s and 1s to 2s.
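    One caution about the 0s-to-1s, 1s-to-2s example: if the two replacements are made one after the other in that order, the 0s that have just become 1s are then turned into 2s as well. Doing the 1s-to-2s replacement first avoids this. The Python sketch below is not ned itself; it uses a hypothetical line of 0/1 codes to show the wrong order, the safe order, and a single-pass alternative with str.translate.

```python
line = "0110100"  # hypothetical line of single-digit 0/1 codes

# Wrong order: 0 -> 1 first, then every 1 (old and new) -> 2.
wrong = line.replace("0", "1").replace("1", "2")

# Safe order: 1 -> 2 first, then 0 -> 1.
safe = line.replace("1", "2").replace("0", "1")

# Single pass: each character is mapped exactly once, so order does not matter.
single_pass = line.translate(str.maketrans({"0": "1", "1": "2"}))

print(wrong)
print(safe)
print(single_pass)
```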


    SECTION II

    PRACTICALS


    WEEK 1: Thursday October 3rd

    Introduction to SPSS

    SPSS is the primary package for running any statistical procedures outside of the MDS packages. In addition to providing output for various analyses, SPSS allows the user to manipulate the data in a variety of ways and to produce graphs and figures that can be added to documents.

    In this practical, you will be asked to open and search through a data matrix, and to enter and code data. For each exercise, you will go through the steps of the analysis using the data file family.sav.

    Where is Family.sav?

    The first thing you must do is copy family.sav from the N: drive on your computer to the M: drive (which is your own personal account). To do this you must create a folder on your M: drive into which the family.sav file will go. You should be looking at a screen with a number of icons on it. In the top left-hand corner is an icon called My Computer. Double-click on this icon.

    Find the M: drive and double-click on it. You should now see a window containing a number of folders. Go to FILE, then NEW and choose FOLDER. A new folder labelled New Folder should appear at the bottom of the window. Name your new folder Survey and press ENTER. After you have done this, go to FILE and then CLOSE.

    Now, within the same window, double-click on your N: drive. Within that drive you will see a folder titled SPSSEGS (standing for SPSS example files). Double-click on this folder. Within this folder there is a file labelled family.sav. This is the file you want to copy into your Survey folder on your M: drive. So, single-click on family.sav and go to EDIT and then COPY.

    Go back to your M: drive by closing the N: drive window (click on the X in the top right-hand corner of the window). Double-click on your M: drive and double-click on the folder Survey, which should be empty. Go to EDIT and then PASTE. Now you should see the file family.sav.

    Exploring the Data Editor Window

    Start SPSS for Windows by double-clicking on the SPSS icon. Once the program has opened, a window will appear in the middle of the screen with a number of options to choose from. You want to select OPEN AN EXISTING DATA SOURCE.

    Go to the directory Survey in your M: drive. Find the file family.sav and double-click on it. The values from the family.sav file should now appear in the Data Editor window. Click on the middle button in the top right-hand corner of the window to maximise the window. Once the file is open you will see two sheets at the bottom of the window. One is labelled DATA VIEW and the other is labelled VARIABLE VIEW. You want to stay on the data view sheet. Click on the VALUE LABELS button on your toolbar (it is 2nd from the right).

    This button toggles the display between the numeric codes and their value labels (words). Scroll through the data to answer the following questions:

    1. What is the name of the last variable in the data matrix?

    2. What is the case number of the last case?

    3. What is the value of IDNUM for the last case?

    4. What is Robert's date of birth?

    5. What is Jack's marital status?

    If you click on a cell when value labels are displayed in the DATA VIEW window, a drop-down list will appear showing the options (value labels) used in the coding framework. Using this feature, please answer the following questions:

    What are the labels for CAR?

    What are they for MORTGATE?

    What are they for NAME? Is there a problem with NAME? What is it?


    The variable view sheet

    To view how a variable has been defined in terms of its name, variable label, value labels and user-missing values, click on the VARIABLE VIEW sheet.

    Please answer the following questions. Do not forget to use the scroll bars on the

    bottom and on the right side of the variable view window to find your answers.

    What is the variable label for DATEBLT?

    What are the values and value labels for MARSTAT? (hint: click on the grey box)

    What is the user-missing value for NCARS?


    Coding and Entering Data

    Open a new Data Editor window by going to FILE, then NEW, then DATA, and save it to your M: drive. Below are a questionnaire about leisure activities and a coding scheme. Your task is to set up the Data Editor window and then enter the data below.

    Leisure Activity Questionnaire

    1. What is your first name?

    2. What is your sex? M = male, F = female

    3. What is your marital status? 1 = married 2 = cohabiting 3 = single 4 = widowed 5 = divorced 6 = separated

    4. Do you watch sports? 1 = yes 2 = no 3 = do not know

    5. Do you play sports? 1 = yes 2 = no 3 = do not know

    6. Do you visit the seaside? 1 = yes 2 = no 3 = do not know

    7. Do you go to films? 1 = yes 2 = no 3 = do not know

    8. Do you go to pop concerts? 1 = yes 2 = no 3 = do not know

    Coding Framework

    Variable Name | Format | Variable Label | Coding Details/Labels
    IDNUM | NUMERIC | IDENTITY NUMBER | Unique number for each person
    NAME | STRING | FIRST NAME | Enter first characters of name
    SEX | STRING | SEX | M = male, F = female
    AGE | NUMERIC | AGE IN YEARS | Enter age in years (-9 = missing)
    MARSTAT | NUMERIC | MARITAL STATUS | 1 = married, 2 = cohabiting, 3 = single, 4 = widowed, 5 = divorced, 6 = separated
    WATCHSP | NUMERIC | WATCHES SPORTS | 1 = yes, 2 = no, 3 = do not know
    PLAYSP | NUMERIC | PLAYS SPORTS | 1 = yes, 2 = no, 3 = do not know
    VISITSEA | NUMERIC | VISITS SEASIDE | 1 = yes, 2 = no, 3 = do not know
    GOTOFILM | NUMERIC | GOES TO FILMS | 1 = yes, 2 = no, 3 = do not know
    GOTOPOP | NUMERIC | GOES TO POP CONCERTS | 1 = yes, 2 = no, 3 = do not know

    Data

    IDNUM NAME SEX AGE MARSTAT WATCHSP PLAYSP VISITSEA GOTOFILM GOTOPOP

    101 MARGARET F 87 4

    201 JACK M 62 1 1 2 1 2 2

    202 JOSIE F 1 2 2 1 2 2

    301 NANCY F 60 5 1 2 1 2 2

    503 VICTORIA F 11 -9 2 1 1 1 3

    1002 JOHN M 31 2 1 3 1 1 1

    You should have a clean window in front of you (i.e., there should not be any data in the spreadsheet). You now have to set up each column of your data matrix so that you can eventually enter your data. The first column will hold IDNUM. To set up IDNUM for the data view sheet, you need to go to the VARIABLE VIEW window.


    In fact, defining and labelling all of your variables must be done in your variable view

    sheet.

    In the first row you can label and define your first variable, IDNUM. Using the coding framework above, enter the appropriate information. Type the variable name IDNUM under NAME. The TYPE of variable is NUMERIC (you are entering a number), and under DECIMALS, using the scroll bar, choose 0 decimal places. Under the heading LABEL, type in the definition of the variable. Make sure this definition clearly defines the variable to avoid confusion.

    Depending on the type of data you are measuring (i.e., nominal, ordinal, interval or ratio), you may have to add VALUES. In the case of IDNUM (identity number), each person has a unique number, so there are no value labels to define; under VALUES, you should choose None. However, in defining nominal data such as SEX (the third variable to enter), you would have to define M as male and F as female.

    For IDNUM there are no missing values, so under MISSING you choose None. The heading COLUMNS gives you the opportunity to define the width of your column. Choose a width of 6. The ALIGN value allows you to determine the positioning of your data in the cell: right, left or centred. The last column heading is MEASURE. This column allows you to define the type of data you are working with. With IDNUM you are working with scale data.

    When you define variables such as NAME (i.e., the name of the subject), you want the TYPE of variable to be STRING and the WIDTH to be 10 (the number of characters that can appear in the name). Using the coding framework above, define the variable NAME.

    When you define variables such as SEX (nominal data), you want to add value labels in the column called VALUES. If you click on the cell, a Value Labels window will appear. In the Value box type M, in the Value Label box type male, and then click on Add. Then enter F in the Value box and female in the Value Label box, click on Add again, and then click OK. Once you have made these changes, you can move back to the DATA VIEW window and view the changes.

    Return to the VARIABLE VIEW window and define the numeric variable AGE in the next row. It has no decimal places, and it requires a missing value of -9 to identify cases where a response is not given. To assign a user-missing value of -9, click on the MISSING column. A Missing Values window will appear. Click on Discrete missing values and enter -9 in the first box. Set up a variable label and a value label for -9 as shown in the coding scheme for your questionnaire. Now do the same for the numeric variable MARSTAT in the next row. This too is numeric with no decimal places, has a user-missing value of -9, and requires a variable label and several value labels as shown in the coding scheme.
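    Declaring -9 as user-missing matters because SPSS then leaves those cases out of statistics such as the mean, instead of treating -9 as a real age. The Python sketch below imitates that idea on hypothetical ages; it illustrates the principle only and is not how SPSS stores missing values internally.

```python
from statistics import mean

MISSING = -9  # user-missing code for AGE in the coding scheme

# Hypothetical AGE values; -9 marks a case where no age was given.
ages = [34, 45, -9, 52, 29]

naive_mean = mean(ages)  # wrongly counts -9 as if it were an age
valid_ages = [age for age in ages if age != MISSING]  # drop user-missing cases
valid_mean = mean(valid_ages)

print("Mean including -9:", naive_mean)
print("Mean excluding -9:", valid_mean)
```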

    The remaining five variables also need to be defined. Rather than defining each variable separately, define the first variable, WATCHSP, and then copy its cells to the remaining four below. To do this, go to the cell you want to repeat (e.g., the value labels), click on EDIT and COPY, then move to the cell where you want the same definition and go to EDIT and PASTE.

    When you have finished entering all of the data, save it as an SPSS file by selecting FILE, SAVE and clicking on the folder Survey in your M: drive. Save the file under any name you want (e.g., Person.sav). Exit from SPSS and log off.


    WEEK 2: October 10th

    Descriptive Statistics, Charts & Manipulating Data in the Matrix

    This practical is divided into two sections. The first section is intended to familiarise you with how to run commands to calculate descriptive statistics and to graph your data. The second section aims to show you how to compute, re-code, filter and delete your data.

    Section I: Descriptive Statistics & Charts

    We shall estimate descriptive statistics for the three variables: TYPACCM,

    DATEBLT, & NADULTS.

    Question: Are these variables nominal (non-ordered categories), ordinal (with ordered

    categories) or metrical (on a measure scale with well-defined differences between

    values)? Hint: The second variable is not so obvious.

    To run the descriptive statistics, click on ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES. In the left box there should be a list of all the variables in the spreadsheet. Highlight TYPACCM and click the arrow between the boxes to move it into the box labelled Variables. Repeat this for the other two variables. A shorter way to move variables into the Variables box is to double-click on them in the left box; variables can be removed in the same manner.

    After the three variables are in the Variables box, click on STATISTICS at the bottom of the dialog box. Within the Frequencies: Statistics box there are several options. Tick the boxes for MEAN, MEDIAN & MODE on the right-hand side. In addition, tick the boxes for STANDARD DEVIATION (Std. deviation) & RANGE. Then click on the Continue button, click on OK, and wait for the data to be processed and for the output window to appear.
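    To see what the ticked options compute, here is a sketch using Python's standard statistics module on a small set of hypothetical NADULTS values (not the values in the practical's data file). SPSS reports the sample standard deviation, with an n - 1 denominator, which corresponds to statistics.stdev; the range is simply the maximum minus the minimum.

```python
from statistics import mean, median, mode, stdev

# Hypothetical NADULTS values (number of adults per household).
nadults = [2, 1, 2, 3, 2, 4, 1, 2, 3, 2]

summary = {
    "mean": mean(nadults),
    "median": median(nadults),
    "mode": mode(nadults),
    "std_dev": stdev(nadults),  # sample SD, n - 1 denominator
    "range": max(nadults) - min(nadults),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```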

    Answer the following questions:

    What is the most useful measure of central tendency for each of the three variables?

    What are the sample values?

    What is the maximum value for NADULTS? Does this appear to be correct?

    Now, try re-estimating the descriptive statistics for NADULTS, only this time without the case with the unusual value. Select DATA and then SELECT CASES. Within the Select Cases box, make sure that Filtered is ticked under Unselected Cases. Then select the IF CONDITION IS SATISFIED option and click on the IF button. Move the variable NADULTS to the adjacent box either by double-clicking on it or by clicking on the variable and moving it across using the arrow.

    After the variable name, use the calculator provided to type less than (<) followed by a value smaller than the unusual value. After this, hit Continue and then OK to return to the spreadsheet.

    Answer the following questions:

    Has the case with the unusual value been barred off?

    Which case is it?

    Now, re-run the Frequencies command for NADULTS only and record the mean,

    median & mode with and without the case included.

    Which descriptive statistic is most affected by the unusual value?
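    The comparison the exercise asks for can be previewed with made-up numbers. In the hypothetical sketch below, one implausible value (20 adults in a household) is dropped with a condition analogous to typing NADULTS < 10 in Select Cases; both the data and the threshold of 10 are assumptions for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical NADULTS values with one implausible entry (20).
nadults = [2, 1, 2, 3, 2, 4, 1, 2, 3, 20]

# Analogue of Select Cases "If NADULTS < 10": keep only cases meeting the condition.
filtered = [n for n in nadults if n < 10]

for label, data in (("all cases", nadults), ("filtered", filtered)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

    With these invented values the mean changes noticeably while the median and mode do not, which is the general pattern to look for in your own output.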

    Graphing your Results

    Histograms

    Histograms are statistical diagrams that show the distribution of variables. In a

    histogram, values are grouped together in intervals and a bar is drawn for each

    interval whose area is proportional to the number of cases in the interval.
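    The grouping step behind a histogram can be sketched in a few lines of Python: each value is assigned to an interval of equal width, and the cases in each interval are counted. With equal-width intervals, bar height is proportional to the count, so the bar area is too. The heights and the 10 cm interval width below are hypothetical; SPSS chooses its own intervals.

```python
from collections import Counter

# Hypothetical HEIGHT values in centimetres.
heights = [158, 162, 165, 167, 170, 171, 173, 175, 178, 182, 185, 191]

BIN_WIDTH = 10  # assumed interval width (cm)

# Map each height to the lower bound of its interval, e.g. 167 -> 160.
counts = Counter((h // BIN_WIDTH) * BIN_WIDTH for h in heights)

# Print a crude text histogram, one row per interval.
for lower in sorted(counts):
    upper = lower + BIN_WIDTH
    print(f"{lower}-{upper - 1}: {'#' * counts[lower]} ({counts[lower]})")
```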

    To generate a histogram, select GRAPHS and HISTOGRAM. Then move the variable HEIGHT into the Variable box. In the same dialog box, tick the Display normal curve box and then click OK.

    Upon examining the graph in the output window, answer the following question:

    Do you think HEIGHT has a normal distribution, or would you run other tests?

    Go back to the Data Editor window, select GRAPHS and HISTOGRAM, and run the same command as before, but with WEIGHT instead of HEIGHT.

    From the histogram, would you say that the variable WEIGHT has a normal

    distribution or would you try other tests?

    Are there any differences between the two histograms?

    Scatter plots

    Scatter plots show the joint behaviour of two (or more) variables in a diagram. Values of one variable are plotted against values of another, the two variables usually being metric. A scatter plot usually shows much more about the behaviour of the variables than a single summary statistic such as a correlation coefficient.

    Scatter plots are also drawn using the GRAPHS menu. Click on GRAPHS, then SCATTERPLOT, then on the SIMPLE option, and then click on the DEFINE button. Select WEIGHT for the Y-axis and HEIGHT for the X-axis. In a scatter plot, if one of the variables is thought to depend on the other, it is plotted on the vertical Y-axis. Here, we think that weight depends on height; therefore, weight is plotted on the Y-axis.

    In addition, move SEX into the Set Markers by box. This will allow you to identify points on the scatter plot by sex, as males and females tend to have different heights and weights. Run the command and look at the scatter plot in the Chart Carousel window.

    Can you see any difference between the males and the females in terms of heights and

    weights?

    To edit the chart, simply double-click on it. Now we shall try fitting simple linear regression lines to the data. Select CHART, then OPTIONS; under FIT LINE select Subgroups, then click FIT OPTIONS. Make sure linear regression has been highlighted and then click on Continue. There should be two different lines, one for males and one for females.
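    Fitting a line to each subgroup amounts to running a separate least-squares regression of weight on height within each sex. A minimal Python sketch of that idea; the records are made up so that the slopes are easy to check by hand, and they are not the family.sav data.

```python
from collections import defaultdict

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (sex, height in inches, weight in pounds) records.
cases = [
    ("male", 66, 150), ("male", 68, 160), ("male", 70, 170), ("male", 72, 180),
    ("female", 60, 110), ("female", 62, 116), ("female", 64, 122), ("female", 66, 128),
]

# Split the cases by sex, as the subgroup fit lines do.
groups = defaultdict(lambda: ([], []))
for sex, height, weight in cases:
    groups[sex][0].append(height)
    groups[sex][1].append(weight)

fits = {sex: least_squares(xs, ys) for sex, (xs, ys) in groups.items()}
for sex, (slope, intercept) in fits.items():
    print(f"{sex}: weight = {intercept:.1f} + {slope:.2f} * height")
```

    Comparing the two slopes (5 pounds per inch versus 3 in this invented example) is exactly the comparison the next question asks you to make from the chart.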

    What can you say about the slopes of the two regression lines?

    Can you see any difference now between the males and the females in terms of

    heights and weights?

    The markers used to distinguish males and females are drawn in different colours, but the difference is not very clear, and it will become even less clear if you print out the scatter plot on a monochrome printer! Click on any marker in the plot: all markers of that sex become highlighted with small black squares. Then click on the icon depicting a crayon/pencil to change the colour of the marker/symbol. To change the symbol, simply click on FORMAT and then MARKER. There you should have several options for changing the type and size of the symbol. After making your changes, click Apply and then Close.

    Editing a High Resolution Chart

    Generate a high-resolution chart, a histogram, to try out some of the editing features. Histograms are used for metric or quantitative variables, like AGE, which take on values along a scale. There are generally too many distinct values to make it worth drawing a bar chart. Instead, the values are grouped into intervals or bands and a bar is drawn for each interval. The area of each bar is proportional to the number of cases with values in the interval.

    Still using family.sav, select GRAPHS and then HISTOGRAM. Move HWRATIO into the Variable box and click OK. A histogram for HWRATIO is added to the Chart Carousel window. The histogram also shows some descriptive statistics for the variable.

    What are the sample mean and standard deviation for HWRATIO?


    Double-click on the chart to move the histogram from the Chart Carousel Window to

    a Chart Window. The menu bar and tool bar change to show editing facilities.

    First, click on CHART, then OPTIONS and NORMAL CURVE, then click OK. The normal curve superimposed over the histogram is the one with the mean and standard deviation given above. Admittedly, it's difficult to make a decision with such a small sample, but does the curve appear to be a good fit to the histogram?
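    One way to make "a good fit" concrete is to compare the observed count in each interval with the count predicted by a normal distribution that has the sample mean and standard deviation. A minimal sketch with hypothetical scores and interval edges (SPSS draws the curve for you; this only shows what the curve represents). It needs Python 3.8 or later for statistics.NormalDist.

```python
from statistics import NormalDist, mean, stdev

def expected_counts(values, edges):
    """Expected cases per interval if values were normal with the sample mean and SD."""
    dist = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [n * (dist.cdf(hi) - dist.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

def observed_counts(values, edges):
    """Observed number of cases in each half-open interval [lo, hi)."""
    return [sum(lo <= v < hi for v in values) for lo, hi in zip(edges, edges[1:])]

# Hypothetical scores and interval edges.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]
edges = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
observed = observed_counts(scores, edges)
expected = expected_counts(scores, edges)
for (lo, hi), o, e in zip(zip(edges, edges[1:]), observed, expected):
    print(f"{lo}-{hi}: observed {o}, expected {e:.2f}")
```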

    Now, click on the Swap Axes icon. Does the histogram look better with vertical bars or horizontal bars?

    Now try some of the other icons and tools to change the chart. These changes require the appropriate part of the chart to have been selected. Click on any bar. The bars will become highlighted with small black squares at their corners. Then click on the Fill Pattern tool button (the rectangle with diagonal shading). To apply a pattern, click on it and then click on Apply. Once you have finished with the patterns, click on Close. Also try the Colour Palette tool button (the one with the pen) and the Bar Labels tool button (the one with the fingernails).

    You can also change the style of the line showing the Normal curve, and the fill

    pattern and colour of the background of the histogram. Once you have finished with

    your work, select FILE and then SAVE CHART. Save your histogram as

    artwork.chz

    To copy or move a chart into Word, click on EDIT and then select COPY. To move to Word, minimise SPSS and open Word; if Word is already open, press ALT + TAB to move between programs. Once in Word, select EDIT and then PASTE.

    Finally, exit from SPSS for Windows by selecting FILE and then EXIT.

    Section II: Manipulating the Data in the Matrix

    (Computing, Recoding, Filtering and Deleting Data)

    Computing Values

    Start off SPSS and open the file family.sav (you should find this file on your M:

    drive in the folder that you named survey). We shall use the COMPUTE command

    to build up a new variable that will be labelled BMI, which stands for body mass

    index. This is calculated as:

    $$\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2} = \frac{0.4536 \times \text{weight (lb)}}{(0.0254 \times \text{height (in)})^2}$$

    Select TRANSFORM and then COMPUTE and set the Target Variable to bmi. Click on Type & Label and enter the label "body mass index" in the Label box. Click Continue to return to the Compute Variable dialogue box. Using the source list on the left and the calculator pad in the centre, build up


    Weight * 0.4536 / (height * 0.0254) **2

    in the Numeric Expression box. Run the completed command. The new variable is added to the end of the data. We shall check the new variable by estimating a few descriptive statistics using FREQUENCIES (via ANALYZE, DESCRIPTIVE STATISTICS). (ANALYZE, DESCRIPTIVE STATISTICS, EXPLORE would be a better command, but Frequencies will do here.)

    Select ANALYZE, DESCRIPTIVE STATISTICS and then FREQUENCIES.

    Move body mass index (bmi) to the Variable(s) box. Since bmi is a metric variable with a potentially different value for every case in the data, suppress the frequency table by clearing the DISPLAY FREQUENCY TABLES check box. You will now get a message saying "You have turned off all output. Unless you request Display Frequency Tables, Statistics or Charts, Frequencies will generate no output."

    No worries, we will estimate descriptive statistics by clicking on STATISTICS and

    clicking on the check boxes for the following: MEAN, MEDIAN, MINIMUM and

    MAXIMUM. Run the command and look at the output.

    What are the sample values of the mean, median, minimum and maximum?

    (The mean should be around 25.0. Any values outside the range 15.0 to 35.0 should be queried.)

    Do the sample statistics satisfy these rough checks? If not, something is wrong!
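    The COMPUTE expression converts pounds to kilograms and inches to metres before dividing, which is why typical adult values come out near 25. A minimal Python sketch of the same arithmetic together with the rough 15.0 to 35.0 check; the two example cases are hypothetical, the second with a deliberately mistyped height.

```python
LB_TO_KG = 0.4536   # same conversion factor as in the COMPUTE expression
IN_TO_M = 0.0254

def bmi(weight_lb, height_in):
    """Body mass index in kg/m^2 from weight in pounds and height in inches."""
    return weight_lb * LB_TO_KG / (height_in * IN_TO_M) ** 2

def needs_query(value, low=15.0, high=35.0):
    """Flag a BMI outside the rough plausible range used in the handbook."""
    return not (low <= value <= high)

# Hypothetical (weight, height) pairs; the second height is mistyped.
for weight, height in [(154, 68), (154, 6.8)]:
    value = bmi(weight, height)
    print(f"{weight} lb, {height} in -> BMI {value:.1f}, query: {needs_query(value)}")
```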

    Conditionally Computing Values

    Now we shall use the IF sub-command (via TRANSFORM, COMPUTE) to set up a new variable. The sub-command computes the new variable only for cases that satisfy a stated condition. We want to set up a new variable AGEHOH for the age of the head of the household. In other words, if a person in the sample is head of the household, AGEHOH shall indicate that person's age.

    Select TRANSFORM and then COMPUTE and clear the previous settings by clicking on RESET. Set the Target Variable to AGEHOH and click on TYPE & LABEL to assign the label "age head of household". Click on Continue, and then set the Numeric Expression to AGE. We want this (i.e., the current age in years) to be applied when the case is head of household, which occurs when RELTOHOH is zero. (For the variable RELTOHOH, "relationship to head of household", the value 0 denotes that a person is head of household.) Select IF and INCLUDE IF CASE SATISFIES CONDITION. Set up the condition RELTOHOH = 0 in the large box and run the command. The variable AGEHOH should now be added to the end of the data. Have a look at the new variable. You should see ages set for some cases only.

    Let's check AGEHOH by moving it in the data matrix to the column after RELTOHOH so that we can see what happened more clearly.

    First we must make a space in the data matrix by inserting a new variable. Find RELTOHOH either by scrolling through the DATA EDITOR window or by selecting UTILITIES and VARIABLES, selecting RELTOHOH from the source list and then clicking on GO TO and CLOSE. Now click on any cell of the variable that is immediately to the right of RELTOHOH (this variable should be SEX). Then select DATA and then INSERT VARIABLE. Alternatively, you can click on the INSERT VARIABLE tool button (the sixth button from the right).

    A blank column headed var00001 containing system-missing values (dots) is now inserted before the selected variable. Move AGEHOH to this column by single-clicking on the AGEHOH column heading to highlight the column and then selecting EDIT and CUT. To paste it in the desired location, single-click on the heading of the blank column (var00001) and select EDIT and then PASTE.

    Look at the values in the DATA EDITOR window.

    Do all heads of household have AGEHOH set? If not, what might be the reason? (Hint: look at the variable that AGEHOH is derived from!)

    What value is set for cases who are not heads of household?
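    The conditional compute can be pictured as a function applied case by case. A minimal Python sketch, using None to stand in for SPSS's system-missing value; the cases are hypothetical. Note the fourth case: a head of household whose AGE is missing still ends up with a missing AGEHOH.

```python
def agehoh(age, reltohoh):
    """Sketch of COMPUTE agehoh = age with the condition reltohoh = 0.

    None stands in for SPSS's system-missing value.
    """
    if reltohoh == 0:
        return age   # head of household; None here if AGE itself is missing
    return None      # not head of household, or RELTOHOH is missing

# Hypothetical (AGE, RELTOHOH) pairs, not taken from family.sav.
cases = [(45, 0), (43, 1), (12, 2), (None, 0), (70, None)]
result = [agehoh(age, rel) for age, rel in cases]
print(result)
```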

    Re-coding Values

    The RECODE command in SPSS is very powerful and efficient but it can be a little

    tricky to set up due to the number of clicks required. We shall recode BMI into a new

    variable BMIGRP, which takes the values

    Value   Range                 Interpretation
    1       bmi < 25.0            Okay
    2       25.0 ≤ bmi < 30.0     Overweight
    3       bmi ≥ 30.0            Obese

    Select TRANSFORM, then RECODE and INTO DIFFERENT VARIABLES. Move BMI from the source list into the central INPUT VARIABLE -> OUTPUT VARIABLE box. Enter BMIGRP into the Name box and click on Change to complete the INPUT VARIABLE -> OUTPUT VARIABLE box. Also enter a suitable variable label for BMIGRP in the LABEL box (e.g., "categorical body mass index").

    To set up the recoding, click on OLD and NEW VALUES. We build up the recode specification for the third category of BMIGRP first. In the OLD VALUE box, select the RANGE ... THROUGH HIGHEST option and enter 30.0 in the box before THROUGH HIGHEST. In the NEW VALUE section, enter 3 into the VALUE box. Then click on ADD to copy the specification 30.0 THROUGH HIGHEST = 3 to the OLD -> NEW box. Build up the other two specifications in the order 25.0 THROUGH 30.0 = 2 and then LOWEST THROUGH 25.0 = 1. Now run the completed command.


    To finish, double-click on BMIGRP in the Data Editor window, and define suitable

    value labels (i.e., 1= okay, 2 = overweight, 3 = obese).
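    The order in which the specifications are entered matters because SPSS recodes each value using the first specification that matches it. Entering 30.0 THROUGH HIGHEST first sends a BMI of exactly 30.0 to category 3, and entering 25.0 THROUGH 30.0 before LOWEST THROUGH 25.0 sends exactly 25.0 to category 2, which reproduces the boundaries in the table. A minimal Python sketch of that first-match rule; the test values are hypothetical.

```python
def bmigrp(bmi):
    """Recode BMI into groups, checking specifications in the order they were entered."""
    if bmi is None:
        return None                      # missing stays missing
    specs = [                            # (low, high, new value)
        (30.0, float("inf"), 3),         # 30.0 THROUGH HIGHEST = 3
        (25.0, 30.0, 2),                 # 25.0 THROUGH 30.0 = 2
        (float("-inf"), 25.0, 1),        # LOWEST THROUGH 25.0 = 1
    ]
    for low, high, new in specs:
        if low <= bmi <= high:           # THROUGH ranges include both end points
            return new                   # the first matching specification wins
    return None

labels = {1: "okay", 2: "overweight", 3: "obese"}
for value in [22.4, 25.0, 29.9, 30.0, 41.2]:
    print(value, labels[bmigrp(value)])
```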

    Are the values of BMIGRP correct for the first ten cases?

    Filtering Cases

    In this example, we shall filter cases. The filtering option allows you to exclude

    certain cases from further analysis temporarily.

    Before filtering, generate a two-way frequency table for ownrent by typaccm by selecting ANALYZE, then DESCRIPTIVE STATISTICS and then CROSSTABS, and selecting ownrent for Row(s) and typaccm for Column(s). Run the command and look at the table in the output.

    1. What exactly does the frequency count in the first cell of the second table refer to? 6 what?

    We shall filter using the variable PERSNO, which is the number of persons in the

    household.

    2. What will be the effect of selecting cases satisfying the condition persno = 1? What is the impact on households?

    Now select DATA, then SELECT CASES and then IF CONDITION IS SATISFIED, and make sure that UNSELECTED CASES are FILTERED. (This is very important, as the alternative is DELETED, which we want to avoid for now!)

    Select IF.. and build up the condition persno = 1 in the large box. Run the

    completed command. Find persno in the data editor window.

    3. What appears in the status bar when filtering is in effect? (The status bar is at the bottom of the window.)

    4. What has happened to the case numbers of cases with persno ≠ 1?

    Rerun the CROSSTABS command (via ANALYZE, DESCRIPTIVE STATISTICS) and look at the new table in the output.

    5. What exactly does the frequency count in the first cell refer to now? 3 what?

    Go to the Data Editor Window and save the filtered data as familyf.sav. Then select

    DATA, SELECT CASES and then ALL CASES. Run the command.

    6. What happens to the status bar and the case numbers?
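    Filtering leaves every case in the data and only changes which cases the procedures use. A minimal Python sketch of that idea with hypothetical cases: the 0/1 list mirrors the filter variable (filter_$) that SPSS adds to the data while filtering is on, and the crosstab is simply a count of (ownrent, typaccm) pairs.

```python
from collections import Counter

# Hypothetical cases: (persno, ownrent, typaccm).
cases = [
    (1, "own", "house"), (2, "own", "house"), (3, "own", "house"),
    (1, "rent", "flat"), (2, "rent", "flat"),
    (1, "own", "house"),
]

def crosstab(rows):
    """Count cases in each (ownrent, typaccm) cell."""
    return Counter((ownrent, typaccm) for _, ownrent, typaccm in rows)

# Filtering keeps every case but marks whether it is selected (1) or not (0).
selected = [1 if persno == 1 else 0 for persno, _, _ in cases]
filtered_view = [case for case, keep in zip(cases, selected) if keep]

print("all cases: ", crosstab(cases))
print("persno = 1:", crosstab(filtered_view))
print("cases still in the data:", len(cases))
```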


    Deleting Cases

    Instead of filtering cases, we shall now delete the unselected cases. This does no harm to the data stored in system files on disk, because deletion affects only the working data in SPSS until you save it. Select DATA, SELECT CASES, IF CONDITION IS SATISFIED, which picks up the previous condition persno = 1. Then select UNSELECTED CASES are DELETED. Run the command and have a look at the Data Editor window.

    1. How many cases are left?

    2. What are the values of PERSNO?

    3. What are the values of HSEMO? What does that successfully show?

    Now, rerun the CROSSTABS command in the previous section and look at the

    output.

    4. Do the results agree with those obtained when cases are filtered?

    Return to the Data Editor Window and save the selected cases to a NEW system file

    named familyd.sav (after deleting cases you should do this as soon as possible to

    avoid overwriting your complete data file by accident).

    Finally, re-open familyf.sav, the filtered file you saved in the previous section.

    5. Is filtering still on?

    Exit from SPSS, saving the contents of the output window into output3.spo

    Open up family.sav that you saved to your survey folder.


    WEEK 3: October 17th

    T-Tests

    Section I: Parametric T-tests (related & unrelated)

    This practical will show you how to run a t-test so that you can look at the difference between the means of two sets of scores.

    Experimental designs can be of two basic types: within-subjects (dependent or related) and between-subjects (independent or unrelated). In a within-subjects design, all subjects take part in all conditions (e.g., testing reaction times before and after receiving a drug). In a between-subjects design, subjects are divided into independent groups, such as on the basis of gender, or into one group that receives a drug and a second that receives a placebo.

    DEPENDENT OR RELATED SAMPLES T-TEST

    First, a quick review of the test layouts.

    1. Related Samples: two variables, one for each condition of the experiment. As a result, each subject has two scores:

    Sub. No.   Variable 1   Variable 2
    1          10           30
    2          11           31
    3          12           32
    4          10           30
    5          9            29

    Variable 1 is the first set of scores for the subjects (e.g., reaction time before taking the drug); Variable 2 is the second set of scores (e.g., reaction time after taking the drug).

    2. Independent or Unrelated Samples: two variables. The first tells SPSS which condition EACH subject belongs to; the second is the actual score for that subject:

    Sub. No.            Variable 1 (condition)   Variable 2 (score)
    1 (control)         1                        subject 1's score
    2 (control)         1                        subject 2's score
    3 (experimental)    2                        etc.
    4 (experimental)    2                        etc.

    Variable 1 records the condition each subject belongs to (e.g., group 1 are the controls, group 2 receive the drug); Variable 2 is the actual score (e.g., each subject's reaction time).


    T-Test for Related Samples

    This is the parametric comparison of two related groups, for example, when you want to compare mean scores for subjects at some task before and after taking a drug. Each set of scores for the related t-test must be entered as a separate variable in SPSS. So, in the above example, all the subjects' scores for the task before taking the drug would be in one column and all the scores after taking the drug in another.

    First, open family.sav. The next step is to add a variable to the data file so that we can run the related t-test. In this case, the comparison will be between the subjects' height/weight ratios before and after they were put on a 4-week diet/exercise plan. The variable already in the data set, HWRATIO, is the measure before. At the end of the data file, add the variable HWRATIO2 to represent their measurements after the plan.

    Using what you learned in the first lesson about entering data, create the new variable

    using the information below:

    Variable Name: HWRATIO2

    Variable Label: Height/Weight Ratio after plan

    Data: see table 1 below

    To run the procedure, go ANALYZE, COMPARE MEANS and then PAIRED-

    SAMPLES T-TEST

    The usual dialogue box appears. The dialogue box has the two-column format. The

    only difference is that you must select pairs of variables and move them across, rather

    than just one variable at a time. To do this, you have to click on one variable, then

    locate the other variable and click on it. The two variables that you have requested

    should appear in the current selection box. After clicking on both, you then press the

    arrow button to move the pair across. SPSS will analyse each pair to determine if their

    means are significantly different statistically. In this case, select the variables

    HWRATIO and HWRATIO2 and move them across, then press the OK button.

    Table 1: Data for Height/Weight Ratio after a 4-week diet/exercise plan

    Subject Number   HWRATIO2 score
    1                .44
    2                .52
    3                .46
    4                .
    5                .44
    6                .42
    7                .33
    8                .74
    9                .80
    10               .32
    11               .60
    12               .65
    13               .40
    14               .50
    15               .57
    16               .41
    17               .60
    18               .55
    19               .49
    20               .60

    OUTPUT

    The results appear in three sections:

    The first section gives you a table called "Paired Samples Statistics" with the mean scores, standard deviations and standard errors of the mean for the two variables.

    The second section is a table called "Paired Samples Correlations" showing the correlation between the two variables and its level of significance.

    The third section is more important. The table called "Paired Samples Test" indicates the significance of the results. This includes the t-value, degrees of freedom (d.f.) and the two-tailed significance level.
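    The t-value in the Paired Samples Test table is the mean of the within-subject differences divided by the standard error of those differences, with n - 1 degrees of freedom. A minimal Python sketch of that calculation; the scores are hypothetical whole numbers rather than the HWRATIO data, and the p-value (which needs the t distribution) is left to SPSS. Cases missing either score are dropped, as subject 4 would be here.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired-samples t statistic and df; differences are first minus second."""
    diffs = [a - b for a, b in zip(first, second) if a is not None and b is not None]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical scores; the sixth subject has a missing second score.
before = [50, 55, 48, 60, 45, 52]
after = [44, 52, 46, 58, 42, None]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```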

    What is the t-value for the comparison between the height to weight ratio scores?

    Is there a significant difference between the scores before and after the diet/exercise

    plan? If so, which is the greater height/weight ratio?

    T-Test for Independent Samples

    This is the parametric t-test for two independent samples - a between-subjects design

    where, for example, subjects are randomly assigned to two separate test conditions

    (e.g. drug and control) and the mean scores (e.g. reaction time) are compared to

    determine if they are significantly different from each other.

    In this case, you want to test whether there is a statistically significant difference in height to weight ratios between the male and female subjects. The format for variables used in the independent t-test is different from that used in the related t-test. Instead of the scores being placed in two separate columns (variables), all of the scores are placed in a single column (variable). A second variable tells SPSS which of the two groups each score belongs to. So, in this case, HWRATIO2 is the dependent variable and NSEX is the independent variable.

    To run the analysis, go to ANALYZE, COMPARE MEANS and then

    INDEPENDENT-SAMPLES T-TEST. As usual, the left column lists all the

    variables in your data file. On the right, there are two boxes:

    The Test Variable(s) box is where you move the dependent variable(s) (e.g., HWRATIO2).

    The Grouping Variable box is where you move the variable that distinguishes between the two independent groups (e.g., NSEX).

    First, select the dependent variable HWRATIO2 and move it over to the Test Variable(s) section. Next, move NSEX over into the Grouping Variable section and press the DEFINE GROUPS button. Values from the grouping variable must be entered into the two boxes. In the case of the variable NSEX, where only two levels are recorded, you would just enter "1" in the top box for male subjects and "2" in the lower one for female subjects. Click the CONTINUE button, then the OK button.

    [Note: There may be times where you have a larger range of values, such as five

    different education levels, but only want to look at the difference between two of

    them. You would enter the two values you wish to compare.]

    OUTPUT

    There are two sections:

    The first section of the output gives you a table called "Group Statistics", which shows the number of cases, the mean score, etc. for each condition.

    The second section provides a table called "Independent Samples Test", which starts with Levene's Test for Equality of Variances. If Levene's test is significant (p < 0.05), the variances are unequal, so when you look at the t-test results you use the line starting "Equal variances not assumed". If it isn't significant, you use the line starting "Equal variances assumed". This table gives you the t-values, degrees of freedom and two-tailed significance levels.

    In this case, Levene's test is not significant (0.137), so we look at the "Equal variances assumed" line. The t-test is also not significant (two-tailed significance of .478), so we cannot reject the null hypothesis: there is no evidence of a difference between males and females in their height to weight ratios.
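    The two lines of the t-test table correspond to two formulas: "Equal variances assumed" pools the two sample variances, while "Equal variances not assumed" uses Welch's version with adjusted (usually fractional) degrees of freedom. A minimal Python sketch of both, with hypothetical scores chosen so the arithmetic is easy to follow; Levene's test and the p-values are not computed here.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """Return ((t, df) pooled, (t, df) Welch) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)     # sample variances (n - 1)
    # "Equal variances assumed": pool the two variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # "Equal variances not assumed": Welch's t with Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)

# Hypothetical scores for two groups.
group1 = [1, 2, 3, 4, 5]
group2 = [3, 4, 5, 6, 7]
equal, unequal = independent_t(group1, group2)
print("equal variances assumed:    ", equal)
print("equal variances not assumed:", unequal)
```

    With equal group sizes and equal variances, as here, the two lines agree; they diverge when the group variances differ.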

    Section II: Non-Parametric T-tests (Wilcoxon - related & Mann-

    Whitney - unrelated)

    All of the tests today can be found under ANALYZE, NONPARAMETRIC TESTS

    Mann-Whitney - Unrelated

    This is the non-parametric alternative to the t-test for two independent samples (a between-subjects design). To run the analysis, choose ANALYZE, NONPARAMETRIC TESTS, and 2 INDEPENDENT SAMPLES.

    As usual, the left column lists all the variables in your data file. On the right, there are

    two boxes:


    The Test Variable(s) box is where you move the dependent variable(s).

    The Grouping Variable box is where you move the variable that distinguishes between the two independent groups (e.g., the variable sex).

    So, move HWRATIO2 into the Test Variable box and NSEX into the Grouping Variable box. Now click the Define Groups button. Values from the grouping variable must be entered into the two boxes. For the variable NSEX, enter "1" in the top box for male subjects and "2" in the lower one for female subjects. Click the Continue button, then the OK button.

    OUTPUT

    SPSS divides the entire set of subjects into three groups:

    those with a score of 1 (male);

    those with a score of 2 (female);

    cases with missing data (which are excluded from the analysis).

    The first section gives the mean ranks for the two groups that are included, as well as the sums of the ranks and the numbers of cases.

    The second section gives the Mann-Whitney U statistic, together with the Z score and p-value for the test.
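    The mean ranks and rank sums come from ranking all scores together, giving tied scores the average of the ranks they occupy; U is then derived from one group's rank sum. A minimal Python sketch with hypothetical scores; it stops at U (the Z approximation and p-value are left to SPSS). Tables of critical values use the smaller of the two U values.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1   # mean of 1-based positions
        start = end + 1
    return ranks

def mann_whitney(group1, group2):
    """Mean ranks, rank sums and both U values for two independent groups."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2   # U computed from group 1's rank sum
    u2 = n1 * n2 - u1             # the complementary U for group 2
    return {"mean rank 1": r1 / n1, "mean rank 2": r2 / n2,
            "sum of ranks 1": r1, "sum of ranks 2": r2, "U1": u1, "U2": u2}

# Hypothetical ratios for two groups (note the tied value .44).
result = mann_whitney([0.41, 0.44, 0.44], [0.44, 0.52, 0.60])
print(result)
```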

    Is there a difference between males and females? How do these results compare with those of the independent-samples t-test in Section I?

    Wilcoxon - Related

    This is the non-parametric alternative to the related (repeated measures) t-test, for a within-subjects design. As with the parametric equivalent, we'll be comparing height to weight ratios for the sample before and after a four-week exercise/diet program. To run the analysis, choose ANALYZE, NONPARAMETRIC TESTS, and 2 RELATED SAMPLES.

    The dialogue box has the two-column format. The only difference is that you must select pairs of variables and move them across. SPSS will analyse each pair to determine whether the two sets of scores differ significantly. For this analysis, select the two variables HWRATIO and HWRATIO2, then click the OK button.

    OUTPUT

    The output for this procedure is quite different from that of the parametric test. The first section tells you how many cases have a score in one condition that is:

    less than (LT)

    greater than (GT)

    equal to (EQ)

    their score in the other condition. The number of cases at each level is given, along with the mean rank and sum of ranks for the LT and GT levels (tied cases are not ranked).

    The main results are underneath this table, where the Z value and the p-value are given. The usual significance criterion applies: the difference is significant if p is less than 0.05.
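    The LT/GT/EQ counts and rank sums come from the within-subject differences: ties (zero differences) are dropped, the remaining absolute differences are ranked, and the ranks are summed separately for negative and positive differences. A minimal Python sketch with hypothetical whole-number scores; taking the difference as second minus first is an assumption about sign convention, and Z and p are left to SPSS.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def wilcoxon_summary(first, second):
    """Counts and rank sums for the Wilcoxon signed-ranks test (difference = second - first)."""
    diffs = [s - f for f, s in zip(first, second)]
    lt = sum(d < 0 for d in diffs)    # second score lower than first
    gt = sum(d > 0 for d in diffs)    # second score higher than first
    eq = sum(d == 0 for d in diffs)   # ties, dropped from the ranking
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    neg = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    pos = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    return {"LT": lt, "GT": gt, "EQ": eq, "LT rank sum": neg, "GT rank sum": pos}

# Hypothetical before/after scores for six subjects.
summary = wilcoxon_summary([50, 55, 48, 60, 45, 52], [44, 52, 48, 62, 42, 49])
print(summary)
```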

    How many cases are there where HWRATIO is greater than HWRATIO2?

    Is there a significant difference between ranked height/weight ratios before and after

    the exercise/diet program?


    WEEK 4: October 24th

    ANOVAS

    This practical will familiarise you with the analysis of variance (ANOVA). The ANOVAs in this practical are used when you want to determine whether there is a significant difference between three or more groups (or conditions) on a single dependent variable.

    One-way ANOVA for Independent Samples

    In this case, we want to determine whether there is a significant difference in height to weight ratio between the three age groups in the family.sav sample: children, adults and the elderly. We also want to carry out a Tukey's post-hoc test to identify where any differences lie. The procedure is remarkably similar to carrying out an unrelated samples t-test. Go: ANALYZE, COMPARE MEANS, ONE-WAY ANOVA.

    As you can see, the layout of the dialogue box is basically the same as the one for unrelated t-tests from last week. First select your dependent variable(s): in this case, move the variable HWRATIO into the Dependent List section. Your factor (independent variable) is the variable AGEGRP; move it into the Factor box.

    Before running the analysis, press the Post Hoc button and turn on Tukey's test. Now press the Continue and OK buttons and the analysis will be carried out.

    OUTPUT

    There are two sections to the results for the one-way ANOVA.

    1. The first section indicates whether any significant differences exist between the different levels of the independent variable. The between-groups and within-groups sums of squares are listed, along with the degrees of freedom, the F-ratio and the F-probability (significance level). It is this last figure that indicates significance: if the F-probability is less than 0.05, then a significant difference exists. In this case, the F-probability is 0.000 (i.e., less than 0.0005), so we can say that there is a statistically significant difference in height to weight ratios between the three age groups.

    2. The post-hoc test identifies exactly where those differences lie. The final part of the second section is a small table with the levels of the independent variable listed down the side. Looking at the comparisons between these levels, we see that children have a significantly higher mean height to weight ratio than adults and the elderly (this is also indicated by the asterisks).

    For the meantime, ignore the third table of the output.
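    The F-ratio in the first section is the between-groups mean square divided by the within-groups mean square. A minimal Python sketch of that calculation with hypothetical scores for three groups; it does not reproduce Tukey's post-hoc test or the significance level.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for independent groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three age groups.
children, adults, elderly = [6, 7, 8], [4, 5, 6], [3, 4, 5]
f, df_b, df_w = one_way_anova(children, adults, elderly)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```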


    One-way ANOVA for Related Samples

    The procedure for running this is very different from anything you've done before. The first step is easy enough: you need to add a third height to weight ratio variable, representing the ratios for the subjects some time after they stopped doing the exercise/diet plan. The data are below:

    Variable Name: HWRATIO3

    Variable Label: Height/Weight Ratio post-plan

    Data: see table below

    Subject Number   HWRATIO3 score
    1                .42
    2                .56
    3                .42
    4                .
    5                .41
    6                .40
    7                .30
    8                .78
    9                .71
    10               .30
    11               .55
    12               .64
    13               .40
    14               .49
    15               .55
    16               .39
    17               .52
    18               .54
    19               .49
    20               .60

    The first step is to run a single factor ANOVA by going: ANALYZE, GENERAL

    LINEAR MODEL, REPEATED MEASURES

    The dialogue box is different from the usual format. The first step is to give a name to the factor being analysed, basically the thing the three variables have in common. All three variables cover height to weight ratios, so:

    in the Within-Subject Factor Name box, type RATIO;

    in the Number of Levels box, type 3 (representing the three variables);

    press the Add button, then the Define button.

    The next dialogue box is a bit more familiar. In the right-hand column, there are three

    question marks with a number beside each. Select each of the three variables to be

    included in the analysis, and move them across with the arrow button. Notice how

    each of the variables replaces one of the question marks, indicating to SPSS which


    three variables represent the three levels of the factor RATIO. Then proceed by

    clicking on OK.

    OUTPUT

    Firstly, you can ignore the sections of the output titled "Multivariate Tests" and "Mauchly's Test of Sphericity".

    You need to examine the section titled "Tests of Within-Subjects Effects". This section indicates whether any significant differences exist between the different levels of the within-subjects variable. The degrees of freedom and sums of squares are listed, as well as the F-score and its significance level. If the significance level is less than 0.05, then a significant difference exists. In this case, it is 0.001 (read the row labelled "Sphericity Assumed"), so we can say that there is a statistically significant difference in height to weight ratios between the three times when measurements were taken.

    You can ignore the section titled Tests of Between-Subjects Effects. It is irrelevant here.

    To do post-hoc tests that identify where the differences lie, the SPSS for Windows Made Easy manual recommends running paired-samples t-tests. In this case, compare:

    HWRATIO & HWRATIO2

    HWRATIO & HWRATIO3

    HWRATIO2 & HWRATIO3

    From these three t-tests, you can determine which of the height-to-weight ratios are significantly different from each other.
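The three post-hoc comparisons can be sketched in Python as shown below. This is our own illustration, not the manual's procedure. Each paired-samples t is the mean of the case-by-case differences divided by its standard error, on n - 1 degrees of freedom. The function names and the example scores are hypothetical. Because three tests are run, a common (though not the only) safeguard is a Bonferroni correction, which judges each result against 0.05 / 3, about 0.0167, rather than 0.05.

```python
import math
from itertools import combinations


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two related columns."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 in the denominator)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1


def all_pairwise_t(columns):
    """Run paired_t on every pair of named columns, e.g. the three HWRATIO variables."""
    return {(a, b): paired_t(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


# Hypothetical scores for four subjects, one list per variable
columns = {
    "HWRATIO": [0.60, 0.55, 0.62, 0.58],
    "HWRATIO2": [0.52, 0.50, 0.55, 0.51],
    "HWRATIO3": [0.53, 0.49, 0.56, 0.52],
}
for pair, (t, df) in all_pairwise_t(columns).items():
    print(pair, round(t, 2), df)
```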

    Kruskal-Wallis ANOVA (KWANOVA: Unrelated)

    This is the nonparametric equivalent of the independent (unrelated) ANOVA, in which ranks are used instead of the actual scores. We will run the analysis on the same variables, so go to ANALYZE, NONPARAMETRIC TESTS, and K INDEPENDENT SAMPLES.

    As with the parametric test, move HWRATIO over to the Test (dependent) Variable List and AGEGRP over to the Grouping (independent) Variable box, and define the groups with a minimum of 1 and a maximum of 3. Click the OK button. Notice that the nonparametric ANOVA doesn't have a post-hoc test. If you run this ANOVA, you'll have to consult a statistics book on how to do post-hoc tests on the results. One way would be to run a series of Mann-Whitney U tests (the nonparametric counterpart of the independent t-test) on all the pairs of conditions.
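To show what the Chi-Square value in the Kruskal-Wallis output measures, the Python sketch below pools all the scores and ranks them, with tied scores sharing the average of their ranks. It then compares the rank sums of the groups. The usual correction for tied scores is included. The function names and example groups are hypothetical, and this is an illustration rather than SPSS's own routine.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (groups - 1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    sum_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
    # Correction for ties: divide by 1 - sum(t^3 - t) / (N^3 - N)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    return h, len(groups) - 1


# Hypothetical scores for three age groups (no ties)
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)  # 7.2 2
```

H is judged against a chi-square distribution with groups - 1 degrees of freedom. This is why SPSS labels it Chi-Square.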

    OUTPUT

    The first section gives you the mean ranks and the number of cases for each level of

    the independent variable. The second section lists the Chi-Square value, degrees of

    freedom and significance of the test.


    Is there a significant difference between the three groups (remember you can't say exactly where that difference lies without a post-hoc test)?

    Friedman's ANOVA (Related)

    This is the nonparametric equivalent of the related-samples (repeated measures) ANOVA, in which ranks are used instead of the actual scores. We will run the analysis on the same variables, so go to ANALYZE, NONPARAMETRIC TESTS, and K RELATED SAMPLES.

    This is much easier to run: just move the three variables (HWRATIO, HWRATIO2 and HWRATIO3) over to the right column and click OK.

    OUTPUT

    The output gives the Chi-Square value, the d.f. and whether it is significant (as usual, the significance level has to be less than 0.05). Again, for post-hoc tests, you'll probably have to consult a statistics book, or possibly run three nonparametric related-samples tests (Wilcoxon signed-rank tests, under ANALYZE, NONPARAMETRIC TESTS, 2 RELATED SAMPLES).
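Friedman's statistic does not pool everyone's scores. Instead, it ranks each subject's own scores across the conditions and then compares the rank totals. Below is a minimal Python sketch, assuming one row per subject and no missing values. The function name and data are hypothetical. The sketch omits the correction for tied scores, so when ties are present its value may differ slightly from the one SPSS reports.

```python
def friedman(data):
    """Friedman chi-square (no tie correction) and degrees of freedom.

    data: one row per subject, one column per condition
    (e.g. HWRATIO, HWRATIO2, HWRATIO3), with no missing values.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank this subject's own k scores; tied scores share the mean rank
        for j, x in enumerate(row):
            below = sum(1 for y in row if y < x)
            equal = sum(1 for y in row if y == x)
            rank_sums[j] += below + (equal + 1) / 2
    chi2 = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi2, k - 1


# Hypothetical data: every subject scores lowest in condition 1, highest in 3
chi2, df = friedman([[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(chi2, df)  # 8.0 2
```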


    WEEK 5: October 30th

    Study Week

    WEEK 6: November 6th

    No Practical


    WEEK 7: November 14th

    QUALITATIVE RESEARCH: STUDENT SEMINAR

    PRESENTATION PREPARATION

    Students should use this time to prepare work for their presentations. Dr. Alison will be available in his office for guidance if necessary.

    WEEK 8: November 21st

    QUALITATIVE RESEARCH: STUDENT SEMINAR

    WEEK 9: November 28th

    INTERVIEWING AND DISCOURSE ANALYSIS

    CONDUCTING INTERVIEWS

    This period should be used to conduct interviews in preparation for the session on

    content analysis. Students are expected to conduct interviews or sessions that result in

    naturally occurring language. It is important that this material is transcribed in

    preparation for week 11's session. Dr. Alison will be available for consultation.

    WEEK 10: December 5th

    WORKING WITH NATURALLY OCCURRING LANGUAGE

    PREPARATION

    Students will use this period to work with their material gathered in the previous

    sessions. They should use this time to prepare for presentations in the final practical session (12th December).

    WEEK 11: December 12th

    WORKING WITH NATURALLY OCCURRING LANGUAGE: STUDENT SEMINAR

    Students are expected to organise their own seminar presentations in this session, covering the methods they used and the results of the content analysis of their material.


    SECTION III

    EXTRA MATERIAL


    For the benefit of students who wish to follow up other procedures in their own time,

    we have included the following section, which gives you some opportunity to play

    with graphics packages and explore some issues associated with regression in

    preparation for next term. Try not to worry if this all sounds unfamiliar at first. This

    section is simply to give you a running start when it comes to your work after

    Christmas.

    REGRESSION

    Simple Regression

    In simple regression, the values of one variable (the dependent variable, $y$) are estimated from those of another (the independent variable, $x$) by a linear (straight-line) equation of the general form:

    $$\hat{y} = b_0 + b_1 x$$

    where $\hat{y}$ is the estimated value of $y$, $b_1$ is the slope (known as the regression coefficient) and $b_0$ is the intercept (known as the regression constant).
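The least-squares values of the constant and the slope can be worked out by hand. In the minimal Python sketch below, the function name and data are hypothetical. The slope is the sum of cross-products of the deviations from the means divided by the sum of squared deviations of x. The constant then makes the line pass through the point (mean of x, mean of y).

```python
def simple_regression(x, y):
    """Least-squares constant b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x, both about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope: the regression coefficient
    b0 = mean_y - b1 * mean_x    # intercept: the regression constant
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x
b0, b1 = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```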

    Multiple Regression

    In multiple regression, the values of one variable (the dependent variable, $y$) are estimated from those of two or more variables (the independent variables, $x_1, x_2, \ldots, x_n$). This is achieved by constructing a linear equation of the general form:

    $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

    where the parameters $b_1, b_2, \ldots, b_n$ are the partial regression coefficients and the intercept $b_0$ is the regression constant.
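The same least-squares idea extends to several predictors. The Python sketch below is a hypothetical helper, not SPSS's algorithm. It adds a column of 1s for the constant and solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination. It assumes that no predictor is perfectly predictable from the others, because otherwise there is no unique solution.

```python
def multiple_regression(xs, y):
    """Least-squares [b0, b1, ..., bn] for y-hat = b0 + b1*x1 + ... + bn*xn.

    xs: one row per case, each row holding that case's n predictor values.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in xs]   # leading 1 gives the constant b0
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Clear this column from every other row
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]   # hypothetical x1, x2 values
y = [1, 3, 4, 6, 8]                              # exactly 1 + 2*x1 + 3*x2
print([round(b, 6) for b in multiple_regression(xs, y)])  # [1.0, 2.0, 3.0]
```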

    Residuals

    When a regression equation is used to estimate the values of a variable ($y$) from those of one or more independent variables ($x$), the estimates ($\hat{y}$) will not be totally accurate (i.e., the data points will not fall precisely on the straight line). The discrepancies $y - \hat{y}$ between $y$ (the actual values) and $\hat{y}$ (the estimated values) are known as residuals. They are used as a measure of the accuracy of the estimates and of the extent to which the regression model gives a good account of the data in question.
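Once the estimates are known, residuals are simple to compute. In this small Python sketch, the names and numbers are hypothetical. Each residual is the actual value minus the estimated value, and the sum of the squared residuals is the quantity that least-squares regression makes as small as possible.

```python
def residuals(y, y_hat):
    """Residual for each case: actual value minus estimated value."""
    return [actual - est for actual, est in zip(y, y_hat)]


def sum_of_squared_residuals(y, y_hat):
    """Overall misfit: least-squares regression chooses b0, b1, ... to minimise this."""
    return sum(e ** 2 for e in residuals(y, y_hat))


# Hypothetical line y-hat = 1 + 2x applied to three cases
x = [1, 2, 3]
y = [3, 6, 7]
y_hat = [1 + 2 * xi for xi in x]            # [3, 5, 7]
print(residuals(y, y_hat))                  # [0, 1, 0]
print(sum_of_squared_residuals(y, y_hat))   # 1
```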


    The multiple correlation coefficient

    One measure of the efficacy of regression for the prediction of $y$ is the Pearson correlation between the true values of the target variable $y$ and the estimates $\hat{y}$ obtained by substituting the corresponding values of $x$ into the regression equation. The correlation between $y$ and $\hat{y}$ is known as the multiple correlation coefficient, $R$. This is distinct from $r$, Pearson's correlation between the target variable and any one independent variable. In simple regression, $R$ equals the absolute value of $r$ between the target variable and the independent variable (so if $r = -0.87$, then $R = 0.87$).
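The relationship between R and r can be checked numerically. In this Python sketch, which uses hypothetical data and function names, the estimates from a least-squares line are correlated with the actual values. R comes out positive and equal in size to the negative r between x and y.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def fitted_values(x, y):
    """Estimates y-hat from the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
          / sum((u - mean_x) ** 2 for u in x))
    b0 = mean_y - b1 * mean_x
    return [b0 + b1 * u for u in x]


# Hypothetical data with a negative relationship between x and y
x = [1, 2, 3, 4, 5]
y = [5, 4, 4, 2, 1]
r = pearson_r(x, y)                     # Pearson's r: negative here
R = pearson_r(y, fitted_values(x, y))   # multiple correlation R
print(round(r, 3), round(R, 3))         # -0.962 0.962
```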

    Running Simple Regression

    Using the family.sav file, we want to look at how accurately we can estimate height-to-weight ratios (HWRATIO) using the subjects' age (AGE). To run a simple regression, choose ANALYZE, REGRESSION and LINEAR.

    As usual, the left column lists all the variables in your data file. There are two sections

    for variables on the right. The Dependent box is where you move the dependent

    variable. Move HWRATIO there. The Independent(s) box is where you move AGE.

    Next click the STATISTICS button, and turn on the Descriptives option.

    As already stated, a residual is the difference between the actual value of the dependent variable and its value predicted by the regression equation. Analysis of the residuals gives a measure of how good the prediction is and whether there are any cases that should be considered outliers and therefore dropped from the analysis. Click on Casewise diagnostics to obtain a listing of any exceptionally large residuals.

    Now click on CONTINUE.

    Now click on the PLOTS button. Systematic patterns between the predicted values and the residuals can indicate possible violations of the assumptions of linearity and homogeneity of variance, so you should plot the standardised residuals against the standardised predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box and then click CONTINUE.

    Now click OK.
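The two quantities in that plot can be sketched in Python as follows. *ZPRED converts each predicted value into a z-score. *ZRESID divides each residual by the standard error of the estimate, which with one predictor is the square root of the residual sum of squares over n - 2. The function names, the example data and the cut-off of 3 in `flag_large_residuals` are our own assumptions for illustration. The cut-off is chosen to mimic the idea behind Casewise diagnostics.

```python
import math


def standardised_plot_values(x, y):
    """(*ZPRED, *ZRESID) pairs for a simple regression of y on x.

    ZPRED: each predicted value as a z-score of the predicted values.
    ZRESID: each residual divided by the standard error of the estimate,
    sqrt(sum of squared residuals / (n - 2)) when there is one predictor.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
          / sum((u - mean_x) ** 2 for u in x))
    b0 = mean_y - b1 * mean_x
    pred = [b0 + b1 * u for u in x]
    resid = [v - p for v, p in zip(y, pred)]

    mean_pred = sum(pred) / n
    sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in pred) / (n - 1))
    se_estimate = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

    zpred = [(p - mean_pred) / sd_pred for p in pred]
    zresid = [e / se_estimate for e in resid]
    return list(zip(zpred, zresid))


def flag_large_residuals(pairs, threshold=3.0):
    """Positions of cases whose standardised residual is beyond +/- threshold."""
    return [i for i, (_, zr) in enumerate(pairs) if abs(zr) > threshold]


# Hypothetical data; plot zresid (Y) against zpred (X) with any charting tool
pairs = standardised_plot_values([1, 2, 3, 4, 5], [5, 4, 4, 2, 1])
print(flag_large_residuals(pairs))        # []
print(flag_large_residuals(pairs, 1.5))   # [2]
```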

    Output

    The first thing to consider is whether your data contain any outliers. There are no outliers in this data set. If there were, they would be listed in a table labelled Casewise Diagnostics, and the corresponding cases would have to be removed from your data file using the filter option you learned previously.

    With that out of the way, the first table to look at (Descriptive Statistics) is right at the top. It gives the means and standard deviations for the two variables (e.g. the mean age is 31.77). The next table contains the Pearson correlation for the two variables, just as if you had run the correlation procedure. The coefficient is -0.57, so it is fairly high and negative (as one goes up, the other goes down).

    For the moment, ignore the table labelled Variables Entered/Removed.

    The next important table is Model Summary. The R and R-squared values are

    given for the equation (0.571, as above, and 0.325). Don't worry too much about the

    other values in this table.

    The next table contains the regression ANOVA. This test indicates how good the

    model is - whether there is some overall relationship between the dependent and

    independent variable(s). The key element is the F score. For this regression, the F

    score has an associated p value of 0.017, below the .05 cut-off. This indicates that there is a statistically significant linear relationship. It should be noted, however, that only an examination of the scatter plot of the variables can confirm that the relationship between the two variables is linear.

    The next table contains some really important information! The table is labelled

    Coefficients and contains the regression equation. The regression coefficient and

    constant are given in column B of the table. The equation therefore is:

    Predicted height to weight ratio = -.00368(AGE) + .602
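As a worked illustration of using this equation, consider a subject aged 30. The age is only an example value, and the function name is hypothetical. The predicted ratio is .602 - (.00368 × 30) = .602 - .1104 = .4916. Predictions like this are only sensible within the range of ages actually present in the data.

```python
def predicted_hwratio(age):
    """Predicted height-to-weight ratio from the simple regression equation."""
    constant = 0.602         # column B, (Constant) row
    coefficient = -0.00368   # column B, AGE row
    return constant + coefficient * age


print(round(predicted_hwratio(30), 4))  # 0.4916
```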

    The t value indicates whether each independent variable has a significant individual

    impact on the regression equation. In simple regression, there is only one independent

    variable, and it has a significant influence (a t score with an associated p value of 0.0168). Notice that this is the same as the p value for the ANOVA, because in simple regression F equals t squared.

    The next section begins with Residual Statistics. This gives means, SDs and other

    information about the unstandardised and standardised predicted values and residuals in

    the regression.

    You could follow up the regression by examining a scatter plot, such as the plot of standardised residuals against standardised predicted values that you requested. Basically, all you need to know is that if the plot shows no obvious pattern, this suggests that the assumptions of linearity and homogeneity of variance have been met. You get into trouble if the points form a crescent (suggesting a non-linear relationship) or a funnel shape (suggesting unequal variance). If this is the case, further screening of your data is necessary.

    Multiple Regression

    Often, it is too simplistic to assume that a single independent variable is all that is

    required to make some sort of prediction about the scores for a dependent variable.

    This is where you have to run multiple regression.

    For now, the regression will look at the impact of age (AGE), height to weight ratio

    post-plan (HWRATIO2) and height to weight ratio long after the plan (HWRATIO3)

    on the dependent variable, the subjects' initial height-to-weight ratio (HWRATIO). To run the analysis, choose ANALYZE, REGRESSION and then LINEAR.


    As before, move HWRATIO to the Dependent box. The Independent(s) box is where you move AGE, HWRATIO2 and HWRATIO3. The rest is as before:

    Click the STATISTICS button, and turn on the Descriptives option.

    Click on Casewise diagnostics to obtain a listing of any exceptionally large residuals.

    Now click on CONTINUE.

    Now click on the PLOTS button. Systematic patterns between the predicted values and the residuals can indicate possible violations of the assumptions of linearity and homogeneity of variance, so you should plot the standardised residuals against the standardised predicted values. To do this, transfer *ZRESID into the Y: box and *ZPRED into the X: box and then click CONTINUE.

    Now click OK.

    Note: we are only doing a standard, all-inclusive multiple regression, in which all the independent variables are entered at once (the Enter method). There is a box called Method located directly beneath the Independent(s) box. It offers a series of alternative methods for running the analysis: stepwise, remove, forward and backward.

    Output

    Again, the first thing to look for is outliers. Again, there are none.

    With that out of the way, the next section to look at is at the top. Everything that

    follows is the same as for the simple regression. The first part gives the means and

    standard deviations for the four variables (e.g. the mean HWRATIO3 is .526). The

    next part gives the correlation (Pearson's) for all of the variables. You can see that

    HWRATIO is strongly correlated with the two other height-to-weight ratio variables

    (i.e., both over .9).

    The next section is under the heading Model Summary. The R and R-squared

    values are given for the equation (.98 and .967).

    An ANOVA is carried out that indicates how good the model is - whether there is some overall relationship between the dependent variable and all of the independent variables. The key element is the F score. The F score is significant (SPSS reports p = .000, which means p < .001), so there is a strong overall relationship.

    The next table (Coefficients) contains information that indicates the individual role of

    each independent variable. The values in the column labelled B give the coefficients to put into the regression equation:

    $$\hat{y} = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_0$$


    For this regression, then, the regression equation is

    Predicted HWRATIO = -.0009(AGE) + .99(HWRATIO2) - .135(HWRATIO3) + 0.063

    Note that since the B value for HWRATIO3 is negative, the plus sign turns into a minus sign.

    The t-tests indicate that AGE, as before, is a significant predictor, as is HWRATIO2. However, HWRATIO3 makes no significant unique contribution once the other predictors are taken into account (p > 0.05).

    The next section is labelled Residual Statistics. This gives means, SDs and other

    information about the unstandardised and standardised predicted values and residuals in

    the regression. You should have been taught what, if anything, to do with them.

    Scatter plots and Regression Lines

    A regression line can easily be added to a scatter plot. As before, to create a scatterplot go to GRAPHS and SCATTER.

    You want to leave the graph type as Simple, so just click the DEFINE button. Move HEIGHT into the X Axis box and WEIGHT into the Y Axis box. Now click the TITLES button. You can put a title in the Line 1 box, and add a second title line and a subtitle if you want. Now press the CONTINUE button and then click the OK button. The graph should now appear. The window where all the graphs are stored is called the Chart Carousel, and its contents can be saved as a separate file. The extension for chart files is always .cht.

    What is the line of best fit and what does the value of R-squared tell you?


    Chi-Square

    There are two ways to run Chi-square. The first is when looking at differences in

    frequencies across levels in one variable. In this case, we want to see if there are

    differences in the frequencies for the three levels of the variable AGEGRP (age

    groups) - child, adult and elderly. You do this through:

    Analyze

    Nonparametric Tests

    Chi-Square

    To run a basic Chi-Square, just move the variable(s) to analyse across and click OK. In

    this case, move the variable AGEGRP over and run the analysis.

    [NOTE: If you're interested in the various options, information about them can be found by pressing the Help button when you are in a dialogue box]

    OUTPUT

    The results present the observed and expected frequencies for each of the three levels,

    as well as the Chi-Square value, the degrees of freedom (d.f.) and the significance

    level. Is there a difference between the three groups in terms of their observed

    frequencies?
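The goodness-of-fit Chi-Square can be checked by hand. For each category, square the gap between the observed and expected counts, divide by the expected count, and add up the results. Below is a minimal Python sketch with a hypothetical function name and counts. Like SPSS's default setting, it expects all categories to be equally frequent unless told otherwise. The degrees of freedom are the number of categories minus one.

```python
def chi_square_goodness_of_fit(observed, expected=None):
    """Chi-square statistic and degrees of freedom for one categorical variable.

    If no expected counts are supplied, every category is expected to be
    equally frequent (the total split evenly across categories).
    """
    k = len(observed)
    if expected is None:
        expected = [sum(observed) / k] * k
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1


# Hypothetical counts for child, adult and elderly
print(chi_square_goodness_of_fit([10, 20, 30]))  # (10.0, 2)
```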

    The second way to run a Chi-Square is when carrying out a crosstab. The only change

    is that before running the crosstab, you have to turn the Chi-Square option on.

    So, go

    Analyze

    Descriptive Statistics

    Crosstabs

    Move the variable NSEX into the Column(s) box and NCARS into the Row(s) box. Make sure to turn on the Chi-Square option by clicking the Statistics button and ticking Chi-square. Press the Continue and OK buttons to run the analysis.

    OUTPUT

    The crosstabulation table is displayed, along with a variety of results. The one to be concerned with is the significance level for the Pearson Chi-Square value.
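The Pearson Chi-Square for a crosstab compares each cell with the count that would be expected if the two variables were unrelated (row total × column total ÷ grand total). The minimal Python sketch below uses a hypothetical function name and a hypothetical 2 × 2 table. As a common rule of thumb, the test becomes unreliable when many expected counts fall below 5. This is why SPSS adds a footnote giving the number of cells with an expected count less than 5.

```python
def chi_square_independence(table):
    """Pearson chi-square and degrees of freedom for a crosstab.

    table: list of rows of observed counts. The expected count for each
    cell is row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2 x 2 crosstab (rows: NCARS categories, columns: NSEX)
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df)  # 6.667 1
```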


    Microsoft Word Exercises

    This exercise shows you how to copy and format a document. To save time, here's one we prepared earlier. A cast list is given below. Your task is to format the document (top of page 80) into an organised piece of work (bottom of page 80). As you do this, note the different techniques you use - they will come in handy as the course progresses.

    These are hands-on sessions, meaning that you should be discovering what to do

    yourself. Of course, if you have any difficulties then we are here to assist you. Good

    luck, and remember the Help facility.

    The Help Facility

    Normally you will want to go to the Help menu, then choose Contents and Index.

    Click on Index and type in a relevant key word.

    The Opening Screen

    Word offers a number of ways of viewing the document. The most usual is Normal.

    So, go to the View menu, and select Normal.

    Alternatively, use the shortcut button at the bottom left of the screen. If you are not

    sure what a particular button does then you should hold the pointer arrow over the

    button for a second or two without pressing anything. Word will then give a short

    description of the button.

    The other view often used is Page Layout, which shows how the page will be printed.

    Using Zoom from the View menu will allow you to enlarge the screen.

    Opening Files

    We're going to be e-mailing you two documents entitled play.doc and actone.txt.

    Open up the e-mail, and then, one at a time, click the Word icons once with your right

    mouse button. Now save the documents by clicking on Save. Find the Msoffice icon

    and click on it. Now save your documents in the Msoffice folder under suitable

    names (e.g. play.doc and actone.txt).

    Now go into Word. To open the file you just saved, go to File, Open, click on your M:

    drive and find your Msoffice folder. Click on the Msoffice folder and find play.doc.

    Double click on play.doc.

    Hidden Codes

    Certain characters in Word are hidden. That is to say, they can be shown on the screen but never appear in the final printed version. To turn their display on and off, click on the reversed P (¶) button on the toolbar. The ¶ symbol itself is the paragraph marker, denoting a new line (hard return). Turn the hidden codes off.


    Now show the hidden codes. Can you spot the deliberate mistake? Yes, it's one of those errors in conversion. Double click on {PRIVATE} and the whole word is selected (this is a handy trick worth noting). Delete this word.

    Correct any other deliberate mistakes.

    Page Layout

    The original document has margins of 2 inches. Make sure the measurement units are

    in inches by going to Tools, Options, General, and clicking inches in the box called

    measurement units if this isn't already selected. Now go to File, Page Setup and

    increase left and right margins to 2 inches from the Margins option. If it asks you if

    you want your margins fixed respond with yes. Also note that under Paper Size you

    can change the orientation of the paper. Briefly, portrait is upright (for text mainly)

    and landscape is horizontal (for graphs and pictures).

    To change the justification, select everything by going to Edit, Select All, or by dragging the mouse over the whole document (only if it's a small document). Now,

    click the right mouse button over any part of the selected area. Choose Paragraph

    from the menu that appears and choose Justified from the Alignment option. You

    can also do this from the toolbar. Centre alignment is useful for headings. Change

    The Play to centre alignment.

    Formatting

    Italicise What The Butler Saw by clicking on What and dragging over the other

    three words. Now use the toolbar to italicise by hitting the I button.

    Highlight all of the text and change the font size to 12 pts.

    Type in the other characters, leaving a space between the character and actor names.

    Similarly, change the characters' names to small caps by selecting the name and using

    the right mouse button. From Font choose the Small caps option. For the other

    character names, simply select the name and go to Edit, Repeat. Select the cast and

    add a tab into the ruler at two inches by double clicking on the ruler at the two inch

    mark. Place your cursor before Stanley. Go to Format and Tabs and add the leader

    option 2 (i.e. lots of full stops). Press OK and then press the tab key. Do this for each

    cast member. For the director and designer, the tab is set at 1.5 inches with no leader.

    Separate the pieces of text with two hard returns, and don't forget to save your work.

    The Play


    The first London performance of What The Butler Saw was given at the Queen's Theatre by Lewnstein-Delfont Ltd and H.M. Tennnant on 5th March, 1969, with the following cast in order of

    appearance.

    Dr Prentice Stanley Baxter

    Geraldine Barclay Julia Foster

    Mrs Prentice Coral Browne

    Directed by Robert Chetwyn

    Designed by Hutchinson Scott

    The final version should look something like this:

    The Play

    The first London performance of What The Butler Saw was given at the Queen's Theatre by Lewnstein-Delfont Ltd and H.M. Tennent Ltd. on 5th March, 1969, with the following cast in order of appearance:

    DR PRENTICE .................. Stanley Baxter

    GERALDINE BARCLAY ..... Julia Foster

    MRS PRENTICE ................ Coral Browne

    NICHOLAS BECKETT ........ Hayward Morse

    DR RANGE ....................... Ralph Richardson

    SERGEANT MATCH .......... Peter Bayliss

    Directed by Robert Chetwyn

    Designed by Hutchinson Scott
