SPSS-SYNTAX

Page 1: SPSS-SYNTAX

2-DAY WORKSHOP ON SPSS SYNTAX (28th and 29th October, 2010)
Organized by: Indian Institute of Psychometry, Kolkata

Dr. Debdulal Dutta Roy, Ph.D.
Psychology Research Unit
Indian Statistical Institute, Kolkata

Dr. D. Dutta Roy, ISI., Kolkata

Page 2: SPSS-SYNTAX

What is SPSS?
SPSS originally stood for Statistical Package for the Social Sciences. Because many non-social scientists also use it, it is now regarded as general software for statistical data analysis. SPSS is now managed by IBM.

ICONS OF SPSS

Page 3: SPSS-SYNTAX

SPSS facilities
The software includes several facilities:

File management: creating a new file, opening an SPSS-formatted file, extracting a non-SPSS file, merging files, splitting files, transposing data.

Variable management: creating new variables, recoding variables.

Case management: adding cases, selecting cases, sorting cases.

Text data analysis or text analytics: text categorization, text clustering, concept/entity extraction, document summarization, and entity relation modeling (i.e., learning relations between named entities).

Numeric data analysis: describing the data, assessing data quality or fitting the data to statistical models, data association, data clustering, and data reliability and validity, using different statistical tools.

Page 4: SPSS-SYNTAX

SPSS WORKSHEET
Variable view
Data view

Create variables:
Name:
Type: String, Numeric, Comma and others
Width: length in digits
Decimals:
Label: meaning of the variable code name
Values: m=male, f=female or 1=male and 2=female
Missing: np / 9 / 99 / extreme values
Columns:
Align: left, right, center
Measure: nominal, ordinal, scale
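The Variable view settings listed above can also be set from syntax. The following is a minimal sketch, not part of the original slides: the tiny dataset, the variable names gender and age, the labels, and the missing-value code 9 are all illustrative assumptions.

```spss
* Made-up illustrative data: a gender code and an age for three cases.
DATA LIST FREE / gender age.
BEGIN DATA
1 25 2 31 9 40
END DATA.
* Label: meaning of the variable code name.
VARIABLE LABELS gender 'Gender of respondent'.
VARIABLE LABELS age 'Age in years'.
* Values: meaning of each code.
VALUE LABELS gender 1 'male' 2 'female'.
* Missing: treat code 9 as user-missing.
MISSING VALUES gender (9).
* Measure: nominal for gender, scale for age.
VARIABLE LEVEL gender (NOMINAL).
VARIABLE LEVEL age (SCALE).
* Width and decimals.
FORMATS gender (F1.0).
FORMATS age (F3.0).
* Show the resulting variable definitions.
DISPLAY DICTIONARY.
```

Keeping these definitions in syntax means the worksheet can be rebuilt exactly, which is harder to guarantee when the same settings are typed into the Variable view by hand.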

Page 5: SPSS-SYNTAX

Assignment
In the SPSS worksheet:

Prepare a worksheet with five variables: gender, first name, middle name, surname and age.

Prepare a list of names.

Examine their distribution using graphs and tables.

Retrieving data from Excel

Retrieving data from Notepad

Write each record in this form in Notepad: <Ms., Ratna, kumari, Roy, 25>. Retrieve the list using an SPSS command.
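One way to retrieve such a comma-separated Notepad file is the GET DATA command. This is a sketch under stated assumptions, not the workshop's own solution: the file path, the variable names, and the widths are hypothetical, and the file is assumed to hold one record per line with no header row.

```spss
* Hypothetical path; replace it with the location of your own text file.
GET DATA
  /TYPE=TXT
  /FILE='C:\data\names.txt'
  /DELCASE=LINE
  /DELIMITERS=","
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /VARIABLES=
    title A5
    firstname A20
    middlename A20
    surname A20
    age F3.0.
EXECUTE.
* Check what was read.
LIST.
```

FIRSTCASE=1 tells SPSS that data start on the first line; if the file had a header row, 2 would be used instead.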

Page 6: SPSS-SYNTAX

Assignment
Cross tabulation is useful for determining the association between two categorical variables.

Prepare an SPSS worksheet to compute the cross tabulation between gender and anxiety.

Use both text and numeric data.
Compute chi-square.
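A possible shape for this exercise in syntax is sketched below. The eight cases are made up purely for illustration, gender is stored as text (m/f) and anxiety as a numeric code, and the labels 'low' and 'high' are assumptions.

```spss
* Made-up illustrative data: gender as text, anxiety as a numeric code.
DATA LIST LIST / gender (A1) anxiety (F1.0).
BEGIN DATA
m 1
m 1
m 2
m 1
f 2
f 2
f 1
f 2
END DATA.
VALUE LABELS anxiety 1 'low' 2 'high'.
* Cross tabulation with chi-square, showing observed and expected counts.
CROSSTABS
  /TABLES=gender BY anxiety
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.
```

With so few cases the expected counts are small, so the chi-square value here only demonstrates the command; a real analysis needs a larger sample.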

Page 7: SPSS-SYNTAX

Solution


Page 8: SPSS-SYNTAX

Summary - 1
SPSS is useful software for the analysis of both text and numeric data.

The SPSS worksheet has two windows: the data view and the variable view. The latter is used to customize the variables.

Data saved in an SPSS file can be exported to Excel or text format.

Conversely, data saved in Excel or text format can be retrieved into the SPSS worksheet.
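Both directions of that transfer can be written in syntax. The sketch below assumes a dataset is already open for the export step; the file paths and the sheet name are hypothetical.

```spss
* Export the active dataset to a comma-separated text file (hypothetical path).
SAVE TRANSLATE OUTFILE='C:\data\names.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE.

* Read an Excel workbook into SPSS (hypothetical path and sheet name).
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\names.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
```

FIELDNAMES writes the variable names as the first row of the exported file, and READNAMES=ON reads the first row of the Excel sheet as variable names.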

Page 9: SPSS-SYNTAX

SPSS - SYNTAX


Page 10: SPSS-SYNTAX

What is SPSS-Syntax?
Syntax is a set of rules associated with a language or command. SPSS syntax is useful for data management and for archiving the procedure of data analysis. In a dissertation, including the syntax helps the examiner understand the procedure the researcher followed.

Syntax can be written in Notepad or in a Word document. SPSS syntax is the alternative to the point-and-click mode.

It is more user-friendly for repetitive tasks, because the user can rerun them from syntax and can see which procedures were followed in the data analysis.

Page 11: SPSS-SYNTAX

Problems of point and click
The point-and-click procedure produces a great deal of information, some of which is not relevant to the researcher. With syntax, the researcher can restrict the output to what is needed.

The point-and-click procedure varies across interfaces and versions of SPSS, whereas syntax works well in almost all versions.

A statistical tool that is not available in the SPSS menus, for example moderated regression analysis, can be developed in syntax if the author knows how to write it.
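As an illustration of the moderated regression mentioned above, one common approach is to compute a product term with COMPUTE and enter it in a second block. This is a sketch, not the workshop's own program: y (outcome), x (predictor) and z (moderator) are hypothetical variable names assumed to exist in the active dataset.

```spss
* y, x and z are hypothetical variable names in the active dataset.
* Product term carrying the interaction between predictor and moderator.
COMPUTE xz = x * z.
EXECUTE.
* Block 1 enters the main effects; block 2 adds the product term.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT y
  /METHOD=ENTER x z
  /METHOD=ENTER xz.
```

The CHANGE keyword reports the R-square change for the second block, which is the usual test of the moderating effect. Many analysts center x and z before forming the product; that step is omitted here for brevity.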

Page 12: SPSS-SYNTAX

Syntax error
A syntax error occurs when the person who wrote the code has not followed the rules of the language or the flow chart, causing the program to fail.

Common errors are a missing terminator and wrong columns for the command line. As a general rule, the first line of a command starts at the first column, and each continuation line starts at the second column or later.

Page 13: SPSS-SYNTAX

Syntax window

Command

Terminator


Page 14: SPSS-SYNTAX

ASSIGNMENT
Write the lines below in the syntax window and run the program.

DESCRIPTIVES VARIABLES = ABANY ABDEFECT ABHLTH ABNOMORE ABPOOR ABRAPE ABSINGLE ADULTS AGE
/STATISTICS=MEAN STDDEV.

Observation: Do you get your results? If not, what is missing?

Put terminators in both lines and run the program. What is your observation?

Can you find the continuation line?

Page 15: SPSS-SYNTAX

Summary - 2
Syntax rules guide the program in analyzing data according to the user's needs.

Statements are written systematically in the syntax window, following the syntax rules.

One can suppress unnecessary output by using syntax.

Page 16: SPSS-SYNTAX

FLOW CHART


Page 17: SPSS-SYNTAX

What is a flow chart?
A flowchart is a means of visually presenting the flow of data through an information processing system, the operations performed within the system, and the sequence in which they are performed.

Page 18: SPSS-SYNTAX

Standard symbols
Start or end of the program
Computational steps or processing function of a program
Input or output operation
Decision making and branching
Connector or joining of two parts of program

Page 19: SPSS-SYNTAX

Guidelines of flow charting
In drawing a proper flowchart, all necessary requirements should be listed out in logical order. The flowchart should be clear, neat and easy to follow.

There should not be any room for ambiguity in understanding the flowchart.

The usual direction of the flow of a procedure or system is from left to right or top to bottom.

Only one flow line should come out from a process symbol.

Only one flow line should enter a decision symbol, but two or three flow lines, one for each possible answer, should leave the decision symbol.

Only one flow line is used in conjunction with terminal symbol.

Write within standard symbols briefly. As necessary, you can use the annotation symbol to describe data or computational steps more clearly.

If the flowchart becomes complex, it is better to use connector symbols to reduce the number of flow lines. Avoid intersecting flow lines so that the flowchart communicates more effectively.

Ensure that the flowchart has a logical start and finish.

It is useful to test the validity of the flowchart by passing simple test data through it.

Reference: http://www.nos.org/htm/basic2.htm


Page 20: SPSS-SYNTAX

Flow chart of correlations
[Flow chart, in order: INPUT TWO SETS OF METRIC DATA -> IS THERE MISSING DATA? -> IS THERE OUTLIER? -> IS STANDARD DEVIATION = 0? -> DO CORRELATIONS. Each decision box has Y and N branches, and a DELETE box sits beside the missing-data decision.]
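The checks in this flow chart can be mirrored in syntax, one command per decision. The sketch below is only one possible translation: x and y are hypothetical metric variables assumed to exist in the active dataset, and what to do with any outlier found is left to the researcher.

```spss
* x and y are hypothetical metric variables in the active dataset.
* Step 1: the N column shows missing data; check that no SD equals 0.
DESCRIPTIVES VARIABLES=x y
  /STATISTICS=MEAN STDDEV MIN MAX.
* Step 2: look for outliers with box plots.
EXAMINE VARIABLES=x y
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.
* Step 3: correlate, leaving out cases with missing data.
CORRELATIONS
  /VARIABLES=x y
  /PRINT=TWOTAIL NOSIG
  /MISSING=LISTWISE.
```

A variable with a standard deviation of 0 has no variance, so a correlation with it cannot be computed; that is why the flow chart checks it before the final step.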

Page 21: SPSS-SYNTAX

Summary - 3
Use of any statistical tool requires a set of specific assumptions. A flow chart helps us incorporate all the assumptions systematically, which reduces errors in data analysis.

Therefore, the syntax writer should thoroughly study all the assumptions and their systematic use before selecting a statistical tool for analysis.

Page 22: SPSS-SYNTAX

SYNTAX RULES


Page 23: SPSS-SYNTAX

Command
Each command must begin in the first column of a new line.
Continuation lines must be indented at least one space.
The period at the end of the command is optional in batch mode; in interactive mode every command must end with a period.

If you generate command syntax by pasting dialog box choices into a syntax window, the format of the commands is suitable for any mode of operation.

Page 24: SPSS-SYNTAX

Variable names
Variable names ending in a period can cause errors in commands created by the dialog boxes. You cannot create such variable names in the dialog boxes, and you should generally avoid them.

SPSS command syntax is case insensitive, and three-letter abbreviations can be used for many command specifications. You can use as many lines as you want to specify a single command. You can add space or break lines at almost any point where a single blank is allowed, such as around slashes, parentheses, arithmetic operators, or between variable names. For example,

FREQUENCIES VARIABLES=JOBCAT GENDER
  /PERCENTILES=25 50 75
  /BARCHART.

and

freq var=jobcat gender
  /percent=25 50 75
  /bar.

are equivalent.

Page 25: SPSS-SYNTAX

Creating new variable
In some research situations a new variable has to be created. For example, you may want to add a constant or apply a weight to a variable, or to multiply two variables.

Use the COMPUTE command.

EXERCISE

* age2 is new variable.
COMPUTE age2=Age - 5.
EXECUTE.
DESCRIPTIVES VARIABLES=age, age2
  /STATISTICS=MEAN STDDEV MIN MAX.

Descriptive Statistics

                    N    Minimum  Maximum  Mean    Std. Deviation
Age                 542  7        15       9.54    1.117
age2                542  2        10       4.5406  1.11667
Valid N (listwise)  542
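The other two cases mentioned on this slide, applying a weight and multiplying two variables, follow the same pattern. The sketch below uses made-up data; the variable names score and hours and the weight 1.5 are illustrative assumptions.

```spss
* Made-up illustrative data: three cases with a score and a number of hours.
DATA LIST FREE / score hours.
BEGIN DATA
10 2 12 3 15 5
END DATA.
* Apply a hypothetical weight of 1.5 to score.
COMPUTE score_w = score * 1.5.
* Multiply two variables.
COMPUTE score_x_hours = score * hours.
EXECUTE.
* Inspect the new variables next to the originals.
LIST.
```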

Page 26: SPSS-SYNTAX

Finding a lost file
A researcher who opens files through the menus sometimes forgets where a file is located. When the path is recorded in syntax, the file can be reopened with the GET FILE command.

Open a syntax window: File > New > Syntax

Write the syntax below:

GET FILE='c:\windows\desktop\ddr.sav'.

Page 27: SPSS-SYNTAX

Check your file
You can check the validity of the retrieved file using the DISPLAY command. This will help you to get the variable names.

GET FILE='E:\ses_data_final.sav'.
* Display all variables.
DISPLAY.
/* Display data of all variables.
LIST.
/* Display data of single variable.
LIST VARIABLES = <var1>.

Here * begins a comment on its own line, and /* begins a comment placed in the middle of the program.
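DISPLAY and LIST both accept keywords that narrow what is shown. A small sketch with made-up data follows; the variable names and values are illustrative only.

```spss
* Made-up illustrative data.
DATA LIST LIST / name (A10) age (F2.0).
BEGIN DATA
Ratna 25
Amit 31
Mita 28
END DATA.
* Variable names only.
DISPLAY NAMES.
* Full dictionary information for one variable.
DISPLAY DICTIONARY /VARIABLES=age.
* Data for the first two cases only.
LIST VARIABLES=name age /CASES=FROM 1 TO 2.
```

Limiting LIST with CASES is helpful on large files, where listing every case would flood the output window.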

Page 28: SPSS-SYNTAX

Data checking by total score
Data checking is done with the IF command. Box 8.5 presents syntax for checking the data. The assumption here is that the total score should not be more than 10, so the command 'if(total>10) t2=9' is used to flag the offending cases. After the IF command, an EXECUTE command ending with a period (.) is necessary. Finally, the output file is saved in the specified location.

Exercise

GET FILE='E:\ses_data_final.sav'.
if(total>10) t2=9.
Execute.
LIST variables=name, total, t2.
save outfile='e:\sesout.sav'.

Output

NAME           total  t2
TANIA PARVIN   8      .00
BACCHU MONDAL  9      .00
HABIBUL ISLAM  9      .00
KARIM RAHAMAN  10     .00
AKTAR HUSSAIN  10     .00
LALTU MONDAL   10     .00
RAHIM RAHAMAN  10     .00
NOOR ALAM      10     .00
*****          11     9.00
SADIK JAMAL    12     9.00
TAJMIR KHATUN  8      .00
FIROJ MONDAL   .      .
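The output above shows .00 for cases that pass the check, which suggests the flag variable already held 0 before the IF command ran. A self-contained sketch of that pattern is given below; the three cases are a made-up subset, and the variable name flag is an assumption used in place of t2.

```spss
* Made-up illustrative data; names containing blanks are quoted.
DATA LIST LIST / name (A20) total (F2.0).
BEGIN DATA
"TANIA PARVIN" 8
"SADIK JAMAL" 12
"NOOR ALAM" 10
END DATA.
* Start every case at 0 so that unflagged cases are not system-missing.
COMPUTE flag = 0.
* Flag any total above the permitted maximum of 10.
IF (total > 10) flag = 9.
EXECUTE.
LIST VARIABLES=name total flag.
```

Without the initial COMPUTE, cases that fail the IF condition would keep a system-missing value in the flag variable instead of 0.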

Page 29: SPSS-SYNTAX

Is your data good for analysis?

Data entry error is a serious concern in data analysis. An extreme value, or outlier, is often assumed to be an error. The presence of an outlier sometimes changes the mean and standard deviation, and the SD may become higher than the mean. It is not necessary to delete the outlier at once, because an outlier sometimes provides valid information: it tells you about inequality in the distribution of the data. But finding the outlier is important. The box-whisker plot is useful for finding outliers.

Write this in the syntax window:

EXAMINE VARIABLES=abany abdefect
  /COMPARE VARIABLE
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL
  /MISSING=LISTWISE.

Another way is to study the frequencies of the variables.

Frequencies variables=abany.

Page 30: SPSS-SYNTAX

How can you find out case error?

The box-whisker plot sometimes cannot detect cases that have made a systematic error. Suppose you have collected job satisfaction data using a five-point rating scale of 20 items, of which 10 items are reverse-scored, and one case assigns 3 to all the items. The box plot cannot locate that case.

Under such conditions, you can transpose the data and compute the mean and SD for each case. A case error can be identified if the SD is 0.00 or is higher than the mean. The FLIP command transposes the data.

EXERCISE

FLIP VARIABLES=
DESCRIPTIVES VARIABLES=
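A small worked version of this idea is sketched below with made-up data: three cases answering four items, where the first case gives the same answer throughout. The item names are hypothetical, and the sketch assumes FLIP's default naming of the transposed variables (var001, var002, ...), since no ID variable is supplied.

```spss
* Made-up illustrative data: three cases, four items.
DATA LIST FREE / item1 item2 item3 item4.
BEGIN DATA
3 3 3 3
1 5 2 4
4 2 5 1
END DATA.
* Transpose: each original case becomes a variable.
FLIP VARIABLES=item1 item2 item3 item4.
* Mean and SD per original case; an SD of 0 marks a constant responder.
DESCRIPTIVES VARIABLES=var001 var002 var003
  /STATISTICS=MEAN STDDEV.
```

Here var001 corresponds to the first original case, which gave 3 on every item, so its SD is 0.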

Page 31: SPSS-SYNTAX

Relational operator
A relational operator is used to compare values. It is used with the IF command.

A relation is a logical expression that compares two values using a relational operator. In the command

IF (X EQ 0) Y=1

the variable X and 0 are expressions that yield the values to be compared by the EQ relational operator. The following are the relational operators:

Symbol                  Definition
EQ or =                 Equal to
NE or ~= or ¬= or <>    Not equal to
LT or <                 Less than
LE or <=                Less than or equal to
GT or >                 Greater than
GE or >=                Greater than or equal to

Page 32: SPSS-SYNTAX

Select case
When the researcher wants to compute specific statistics for specific cases, the SELECT IF command is useful.

SELECT IF (AGE=8).
DESCRIPTIVES VARIABLES=ACH.
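SELECT IF on its own removes the unselected cases from the active dataset for the rest of the session. Where the selection should apply to one procedure only, it can be preceded by TEMPORARY. The sketch below uses made-up data; the values are illustrative.

```spss
* Made-up illustrative data: age and an achievement score.
DATA LIST FREE / age ach.
BEGIN DATA
8 55 8 61 9 70 10 64
END DATA.
* The selection applies only to the next procedure.
TEMPORARY.
SELECT IF (age = 8).
DESCRIPTIVES VARIABLES=ach.
* All four cases are available again here.
DESCRIPTIVES VARIABLES=ach.
```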

Page 33: SPSS-SYNTAX

Command to filter variable
The researcher can analyze the data of a specific group. Box 8.2 shows syntax for descriptive statistics of age for the cases who are living in a specific block of the district (code=1).

USE ALL.
COMPUTE filter_$=(Block_code=1).
VARIABLE LABEL filter_$ 'Block_code=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
DATASET ACTIVATE DataSet1.
DESCRIPTIVES variables=age.

Page 34: SPSS-SYNTAX

Summary - 4
Syntax rules are important for writing programs in the syntax window.

By writing programs, one can import and export files, check a file, list variables, evaluate data entry errors, create new variables, select cases and filter variables.

Page 35: SPSS-SYNTAX

STATISTICAL ANALYSIS


Page 36: SPSS-SYNTAX

Item-item correlation of five point rating scale

GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
CORRELATIONS
  /VARIABLES=AW1 AW2 AW6 AW10 AW18 AW19
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

There are 6 items measuring awareness of environment. It is assumed that the 6 items are related to each other. One can also write AW1 TO AW19, which takes in every variable located between AW1 and AW19 in the file.

This program assesses the intercorrelations among the 6 items.

Missing data are deleted pairwise, and the level of significance is shown.

A two-tailed test is applicable when the direction of the relationship is not assumed in advance.

NOSIG is used to flag significant values.

Page 37: SPSS-SYNTAX

Item total correlations

GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
compute total=AW1+ AW2+ AW6 +AW10 +AW18+ AW19.
CORRELATIONS
  /VARIABLES=AW1 to AW19, total
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

The COMPUTE command is used to determine the total score, which is then used for the item-total correlation.

Page 38: SPSS-SYNTAX

Multiple regression

GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
compute total=AW1+ AW2+ AW6 +AW10 +AW18+ AW19.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT total
  /METHOD=ENTER AW1 AW2 AW6 AW10 AW18.

When running the program, select all the commands; otherwise the total score will not be computed and cannot be used.

In this model the total score is predicted by each item.

Page 39: SPSS-SYNTAX

Mean differences
When data are collected from two different groups, the command for the independent-samples t-test is

T-TEST GROUPS=IC3(3)
  /MISSING=LISTWISE
  /VARIABLES=total
  /CRITERIA=CI(.9500).

Here IC3 is the independent variable and total is the dependent variable.

IC3(3) indicates 3 as the cut-off point used to make two different groups.

IC3(1 2) indicates categorization based on the values 1 and 2.

Page 40: SPSS-SYNTAX

Chi-square statistics

CROSSTABS
  /TABLES=AW1 BY AW2
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT
  /COUNT ROUND CELL.

This examines the association between items. For multiple items the subcommand is

TABLES=AW1 BY AW2 AW10 AW18 AW19 AW6

In the above, AW1 is the row variable and the others are column variables.

Page 41: SPSS-SYNTAX

One-way ANOVA

ONEWAY total BY EXP
  /MISSING ANALYSIS.

Here total is the dependent variable and EXP is the independent variable.

Page 42: SPSS-SYNTAX

COMPUTE SIZE OF SAMPLE

/*-----------------------------GETTING INPUT FILE------------------------------------------  .
GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.

/*-----------------------------SIZE OF SAMPLE ------------------------------------------  .
compute n=0.
compute n=n+1.
descriptives n, AW1.

n=0 indicates initialization. n=n+1 then adds 1 for each case as SPSS loops through the cases, so every case carries n=1 and the N reported for n counts the cases.

DESCRIPTIVES <n, AW1> allows comparison between the N for the computed n and the N for AW1.

Here AW1 (numeric type and scale measure) is used to verify the computed N, that is, the size of the sample.
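If a running count across cases is wanted, the system variable $CASENUM offers an alternative. This is a sketch of a different technique from the one on the slide, using made-up data; the variable name aw1 merely echoes the slide.

```spss
* Made-up illustrative data: five cases of one item.
DATA LIST FREE / aw1.
BEGIN DATA
3 4 5 2 4
END DATA.
* $CASENUM holds the sequence number of each case.
COMPUTE n = $CASENUM.
EXECUTE.
* The maximum of n equals the number of cases in the file.
DESCRIPTIVES VARIABLES=n aw1
  /STATISTICS=MAX.
```

Unlike the N column of DESCRIPTIVES, which counts only cases with valid values, the maximum of n counts every case in the file, so the two can be compared to spot missing data.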

Page 43: SPSS-SYNTAX

Summary - 5
SPSS syntax makes the researcher more systematic in the analysis of data. By writing programs, the researcher can address all the assumptions of a statistical tool systematically.

The COMPUTE command is very powerful, as it helps the researcher write their own program for the analysis of data.