2-DAYS WORKSHOP ON SPSS SYNTAX (28th and 29th October, 2010)
Organized by: Indian Institute of Psychometry, Kolkata
Dr. Debdulal Dutta Roy, Ph.D.
Psychology Research Unit
Indian Statistical Institute, Kolkata
Dr. D. Dutta Roy, ISI., Kolkata
What is SPSS?
SPSS originally stood for Statistical Package for the Social Sciences. Because it is now used by many researchers outside the social sciences, it is regarded as general software for statistical data analysis. SPSS is now an IBM product.
ICONS OF SPSS
SPSS facilities
The software includes several facilities:
File management: creating a new file, opening an SPSS-formatted file, importing a non-SPSS file, merging files, splitting files, transposing data.
Variable management: creating new variables, recoding variables.
Case management: adding cases, selecting cases, sorting cases.
Text data analysis or text analytics: text categorization, text clustering, concept/entity extraction, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Numeric data analysis: describing the data, checking data quality, fitting the data to statistical models, and assessing data association, data clustering, data reliability and validity using different statistical tools.
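SPSS WORKSHEET
The worksheet has two views: Variable View and Data View.
Create variables in Variable View by setting:
Name: code name of the variable
Type: String, Numeric, Comma and others
Width: number of digits or characters
Decimals: number of decimal places
Label: meaning of the variable code name
Values: m=male, f=female, or 1=male and 2=female
Missing: np / 9 / 99 / extreme values
Columns: display width of the column in Data View
Align: left, right, center
Measure: nominal, ordinal, scale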
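The same variable properties can also be set from the syntax window. The lines below are a minimal sketch, assuming two hypothetical variables (gender coded 1/2 with 9 as missing, and age with 99 as missing) and made-up values; they are not taken from the workshop data.

```spss
* Sketch only: gender and age are hypothetical variables with made-up values.
DATA LIST FREE / gender age.
BEGIN DATA
1 25 2 31 9 99
END DATA.

* Label, Values, Missing, Measure and Width/Decimals, in that order.
VARIABLE LABELS gender 'Gender of respondent' /age 'Age in years'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES gender (9) age (99).
VARIABLE LEVEL gender (NOMINAL) /age (SCALE).
FORMATS gender (F1.0) age (F2.0).

* Show the resulting variable definitions.
DISPLAY DICTIONARY.
```

DISPLAY DICTIONARY lists the definitions, so the result can be compared with what Variable View shows.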
Assignment
In the SPSS worksheet:
Prepare a worksheet with five variables: gender, first name, middle name, surname and age.
Prepare a list of names.
Examine their distribution using graphs and tables.
Retrieve the data from Excel.
Retrieve the data from Notepad: write each record in the form <Ms., Ratna, kumari, Roy, 25> in Notepad, then retrieve the list using an SPSS command.
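One possible way to read comma-separated records of this kind is DATA LIST in LIST format with a comma delimiter. This is a sketch under assumptions: the variable names and widths are hypothetical, the records are typed inline instead of being read from a Notepad file, and the second record is made up for illustration.

```spss
* Sketch only: variable names, widths and the second record are made up.
DATA LIST LIST (",")
  / title (A5) firstname (A15) middlename (A15) surname (A15) age (F3.0).
BEGIN DATA
Ms.,Ratna,kumari,Roy,25
Mr.,Amit,kumar,Das,31
END DATA.

* Show the cases that were read, then a table and bar chart of the titles.
LIST.
FREQUENCIES VARIABLES=title /BARCHART.
```

To read a saved text file instead of inline records, DATA LIST accepts a FILE= keyword giving the file location, and the BEGIN DATA block is dropped.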
Assignment
Cross tabulation is useful for determining the association between two categorical variables.
Prepare an SPSS worksheet to compute a cross tabulation between gender and anxiety.
Use both text and numeric data.
Compute chi-square.
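A possible starting point for this assignment is sketched below. It assumes gender is stored as text (m/f) and anxiety as a numeric code (1 = low, 2 = high); the eight cases are made up, so the chi-square value is for illustration only.

```spss
* Sketch only: the eight cases below are made up for illustration.
DATA LIST LIST / gender (A1) anxiety.
BEGIN DATA
m 1
m 2
f 2
f 2
m 1
f 1
f 2
m 1
END DATA.

VALUE LABELS gender 'm' 'Male' 'f' 'Female' /anxiety 1 'Low' 2 'High'.

* Observed and expected counts with the chi-square test.
CROSSTABS /TABLES=gender BY anxiety /STATISTICS=CHISQ /CELLS=COUNT EXPECTED.
```

With so few cases the expected counts are small, so real data are needed before the test can be interpreted.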
Solution
Summary - 1
SPSS is useful software for the analysis of both text and numeric data.
The SPSS worksheet has two views: Data View and Variable View. The latter is used to define and customize the variables.
Data saved in an SPSS file can be exported to Excel or text format.
Conversely, data saved in Excel or text format can be retrieved into the SPSS worksheet.
SPSS - SYNTAX
What is SPSS syntax?
Syntax is the set of rules associated with a language or command. SPSS syntax is useful for data management and for archiving the procedure of data analysis. In a dissertation, the presence of syntax helps the examiner understand the procedure followed by the researcher.
Syntax can be written in Notepad or in a Word document. SPSS syntax is the alternative to the point-and-click mode.
It is more user friendly for repetitive tasks, because the same commands can be run again, and it lets the researcher see which procedures were followed in the data analysis.
Problems of point and click
The point-and-click procedure produces a lot of output, some of which is not relevant to the researcher. With syntax, the researcher can restrict the output to what is needed.
The point-and-click procedure varies across interfaces and versions of SPSS, whereas syntax works in almost all versions.
An analysis that is not available in the SPSS menus, for example moderated regression analysis, can be built with syntax if the researcher knows how to write it.
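As an illustration of the last point, one common way to set up a moderated regression is to compute a product term and enter it in a second block. The sketch below assumes hypothetical variables y (outcome), x (predictor) and m (moderator) with made-up values; it is one possible approach, not the procedure used in the workshop.

```spss
* Sketch only: y, x and m are hypothetical variables with made-up values.
DATA LIST FREE / y x m.
BEGIN DATA
3 1 0  4 2 0  5 3 0  5 4 0
4 1 1  6 2 1  9 3 1  11 4 1
END DATA.

* The product term carries the moderation (interaction) effect.
COMPUTE xm = x * m.

* Block 1 enters the main effects, block 2 adds the product term.
* CHANGE reports the increase in R square due to the product term.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT y
  /METHOD=ENTER x m
  /METHOD=ENTER xm.
```

Many researchers center x and m before computing the product; that step is left out here to keep the sketch short.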
Syntax error
A syntax error occurs when the researcher or individual who wrote the code has not followed the rules of the language or the flow chart, causing the program to fail.
The most common errors are a missing terminator (the period) and wrong column placement of the command lines. As a general rule, the first line of a command starts in the first column, and each following (continuation) line starts in the second column or later.
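The layout rule can be seen in a small command. This is a minimal sketch with hypothetical variables (age, score) and made-up values.

```spss
* Sketch only: age and score are hypothetical variables with made-up values.
DATA LIST FREE / age score.
BEGIN DATA
8 12 9 15 10 11
END DATA.

* The command starts in column 1, the continuation line is indented,
* and a single period after the last subcommand ends the command.
DESCRIPTIVES VARIABLES=age score
  /STATISTICS=MEAN STDDEV.
```

Putting a period at the end of the first line as well would end the command there, and the /STATISTICS line would then be read as a separate, invalid command.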
Syntax window (screenshot with two labels: Command and Terminator)
ASSIGNMENT
Write the commands below in the syntax window and run the program.
DESCRIPTIVES VARIABLES = ABANY ABDEFECT ABHLTH ABNOMORE ABPOOR ABRAPE ABSINGLE ADULTS AGE
/STATISTICS=MEAN STDDEV.
Observation: Do you get your results? If not, what is missing?
Put terminators in both lines and run the program. What is your observation?
Can you find the continuation line?
Summary - 2
Syntax rules guide the program to analyze the data according to the user's needs.
Statements are written systematically in the syntax window, following the syntax rules.
One can suppress unnecessary output by using syntax.
FLOW CHART
What is a flow chart?
A flowchart is a means of visually presenting the flow of data through an information processing system, the operations performed within the system, and the sequence in which they are performed.
Standard symbols
Start or end of the program
Computational steps or processing function of a program
Input or output operation
Decision making and branching
Connector or joining of two parts of program
Guidelines of flow charting
In drawing a proper flowchart, all necessary requirements should be listed in logical order.
The flowchart should be clear, neat and easy to follow.
There should not be any room for ambiguity in understanding the flowchart.
The usual direction of the flow of a procedure or system is from left to right or top to bottom.
Only one flow line should come out from a process symbol.
Only one flow line should enter a decision symbol, but two or three flow lines, one for each possible answer, should leave the decision symbol.
Only one flow line is used in conjunction with a terminal symbol.
Write within standard symbols briefly. As necessary, you can use the annotation symbol to describe data or computational steps more clearly.
If the flowchart becomes complex, it is better to use connector symbols to reduce the number of flow lines. Avoid intersecting flow lines, so that the flowchart remains an effective means of communication.
Ensure that the flowchart has a logical start and finish.
It is useful to test the validity of the flowchart by passing simple test data through it.
Reference: http://www.nos.org/htm/basic2.htm
Flow chart of correlations
[Figure: flow chart. INPUT TWO SETS OF METRIC DATA; decision IS THERE MISSING DATA? (Y/N), with a DELETE step; decision IS THERE OUTLIER? (Y/N); decision IS STANDARD DEVIATION = 0? (Y/N); DO CORRELATIONS.]
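The checks in the correlation flow chart can be sketched in syntax roughly as follows. The variables x and y and their values are hypothetical, and the decisions themselves (what to delete, whether to go on) remain with the researcher after reading each output.

```spss
* Sketch only: x and y are hypothetical metric variables with made-up values.
DATA LIST FREE / x y.
BEGIN DATA
2 3 4 5 5 4 7 8 9 9 6 7
END DATA.

* Missing data and zero standard deviation are both visible in this table.
DESCRIPTIVES VARIABLES=x y /STATISTICS=MEAN STDDEV MIN MAX.

* Outliers are marked on the box plots.
EXAMINE VARIABLES=x y /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.

* Correlate after the checks and drop any case with a missing value.
CORRELATIONS /VARIABLES=x y /PRINT=TWOTAIL NOSIG /MISSING=LISTWISE.
```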
Summary - 3
The use of any statistical tool requires a set of specific assumptions. A flow chart helps us to incorporate all the assumptions systematically, which reduces errors in data analysis.
Therefore, the syntax writer should thoroughly study all the assumptions, and the order in which to check them, before selecting a statistical tool for analysis.
SYNTAX RULES
Command
Each command must begin in the first column of a new line.
Continuation lines must be indented at least one space.
The period at the end of the command is optional in batch mode (for example, in files run with INCLUDE); in a syntax window run interactively, end each command with a period.
If you generate command syntax by pasting dialog box choices into a syntax window, the format of the commands is suitable for any mode of operation.
Variable names
Variable names ending in a period can cause errors in commands created by the dialog boxes. You cannot create such variable names in the dialog boxes, and you should generally avoid them.
SPSS command syntax is case insensitive, and three-letter abbreviations can be used for many command specifications. You can use as many lines as you want to specify a single command. You can add space or break lines at almost any point where a single blank is allowed, such as around slashes, parentheses, arithmetic operators, or between variable names. For example,
FREQUENCIES VARIABLES=JOBCAT GENDER /PERCENTILES=25 50 75 /BARCHART.
and
freq var=jobcat gender /percent=25 50 75 /bar.
are equivalent.
Creating new variable
There are situations in research where a new variable has to be created. For example, you may want to add a constant to a variable, multiply it by a weight, or multiply two variables.
Use the COMPUTE command.
EXERCISE
* age2 is new variable.
COMPUTE age2=Age - 5.
EXECUTE.
DESCRIPTIVES VARIABLES=age, age2
  /STATISTICS=MEAN STDDEV MIN MAX.
Descriptive Statistics
                     N     Minimum  Maximum  Mean     Std. Deviation
Age                  542   7        15       9.54     1.117
age2                 542   2        10       4.5406   1.11667
Valid N (listwise)   542
Finding out lost file
A researcher who opens files through the click menu sometimes forgets where a file is located. If the file was opened with GET FILE syntax, the saved syntax records its full location and the file can be opened again from there.
Get the file: open a syntax window (File > New > Syntax) and write the syntax below.
GET FILE='c:\windows\desktop\ddr.sav'.
Check your file
You can check the recovered file using the DISPLAY command, which lists the variable names.
GET FILE='E:\ses_data_final.sav'.
* Display all variables.
DISPLAY.
/* Display data of all variables
LIST.
/* Display data of single variable
LIST VARIABLES = <var1>.
Here * is used for a comment at the beginning, and must end with a period; /* is used for a comment in the middle.
Data checking by total score
Data checking is done using the IF command. Box 8.5 represents the syntax for checking the data. The assumption here is that the total score should not be more than 10. Therefore the command 'if(total>10) t2=9' is used. After the IF command, an EXECUTE command ending with a period (.) is necessary. Finally, the output file is saved in a specific location.
Exercise
GET FILE='E:\ses_data_final.sav'.
if(total>10) t2=9.
Execute.
LIST variables=name, total, t2.
save outfile='e:\sesout.sav'.
Output
NAME            total   t2
TANIA PARVIN    8       .00
BACCHU MONDAL   9       .00
HABIBUL ISLAM   9       .00
KARIM RAHAMAN   10      .00
AKTAR HUSSAIN   10      .00
LALTU MONDAL    10      .00
RAHIM RAHAMAN   10      .00
NOOR ALAM       10      .00
*****           11      9.00
SADIK JAMAL     12      9.00
TAJMIR KHATUN   8       .00
FIROJ MONDAL    .       .
Is your data good for analysis?
Data entry error is a serious concern in the analysis of data. An extreme value, or outlier, is treated as a possible error. The presence of outliers can change the mean and standard deviation; the SD may even become higher than the mean. It is not necessary to delete an outlier straight away, because outliers sometimes provide valid information, such as inequality in the distribution of the data. But finding the outliers is important. A box-and-whisker plot is useful for finding them.
Write this in the syntax window:
EXAMINE VARIABLES=abany abdefect /COMPARE VARIABLE /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL /MISSING=LISTWISE.
Another way is to study the frequencies of the variables.
FREQUENCIES VARIABLES=abany.
How can you find out case error?
A box-and-whisker plot sometimes cannot detect cases that have made a systematic error. Suppose you have collected job satisfaction data using a five-point rating scale of 20 items, 10 of which are reverse-scored, and one case assigns 3 to all the items. The box plot cannot locate that case.
In such a situation, you can transpose the data and compute the mean and SD for each case. A case error can be identified if the SD is 0.00 or is higher than the mean. The FLIP command transposes the data.
EXERCISE
FLIP VARIABLES=
DESCRIPTIVES VARIABLES=
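One way the exercise could be filled in is sketched below, assuming three made-up cases rated on four hypothetical items (item1 to item4). After FLIP, each original case becomes a column, which SPSS names var001, var002 and so on by default.

```spss
* Sketch only: three made-up cases rated on four hypothetical items.
DATA LIST FREE / item1 item2 item3 item4.
BEGIN DATA
1 4 2 5
3 3 3 3
5 2 4 1
END DATA.

* Transpose so that the rows (cases) become columns var001 to var003.
FLIP VARIABLES=item1 item2 item3 item4.

* Mean and SD of each original case, computed across its item scores.
DESCRIPTIVES VARIABLES=var001 var002 var003 /STATISTICS=MEAN STDDEV.
```

The second case gave the same rating to every item, so its column (var002) should show a standard deviation of zero.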
Relational operator
A relational operator is used to compare values. It is used with the IF command.
A relation is a logical expression that compares two values using a relational operator. In the command
IF (X EQ 0) Y=1
the variable X and 0 are expressions that yield the values to be compared by the EQ relational operator. The following are the relational operators:
Symbol                  Definition
EQ or =                 Equal to
NE or ~= or ¬= or <>    Not equal to
LT or <                 Less than
LE or <=                Less than or equal to
GT or >                 Greater than
GE or >=                Greater than or equal to
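The keyword and symbol forms in the table are interchangeable. A minimal sketch, assuming a hypothetical variable total with made-up values, flags the same cases both ways:

```spss
* Sketch only: total is a hypothetical score with made-up values.
DATA LIST FREE / total.
BEGIN DATA
8 10 11 12
END DATA.

* Start both flags at 0 so that unflagged cases are not left missing.
COMPUTE flag_kw = 0.
COMPUTE flag_sym = 0.

* Keyword form and symbol form of the same comparison.
IF (total GT 10) flag_kw = 1.
IF (total > 10) flag_sym = 1.

LIST.
```

The two flag columns in the listing should match case by case.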
Select case
When the researcher wants to compute statistics for specific cases only, the SELECT IF command is useful.
SELECT IF (AGE=8).
DESCRIPTIVES VARIABLES=ACH.
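SELECT IF removes the unselected cases from the working data for the rest of the session. When the selection is needed for one procedure only, TEMPORARY can be placed before it. The sketch below assumes hypothetical variables age and ach with made-up values.

```spss
* Sketch only: age and ach are hypothetical variables with made-up values.
DATA LIST FREE / age ach.
BEGIN DATA
8 40 8 55 9 60 10 45 8 50
END DATA.

* TEMPORARY limits the selection to the next procedure only.
TEMPORARY.
SELECT IF (age = 8).
DESCRIPTIVES VARIABLES=ach.

* All five cases are available again for this second run.
DESCRIPTIVES VARIABLES=ach.
```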
Command to filter variable
The researcher can analyze the data of a specific group. Box 8.2 shows the syntax for descriptive statistics of age for the cases living in a specific block of a district (code=1).
USE ALL.
COMPUTE filter_$=(Block_code=1).
VARIABLE LABEL filter_$ 'Block_code=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
DATASET ACTIVATE DataSet1.
DESCRIPTIVES variables=age.
Summary - 4
Syntax rules are important for writing programs in the syntax window.
By writing programs, one can import and export files, check files, list variables, evaluate data entry errors, create new variables, select cases and filter variables.
STATISTICAL ANALYSIS
Item-item correlation of five point rating scale
GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
CORRELATIONS /VARIABLES=AW1 AW2 AW6 AW10 AW18 AW19
  /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.
There are 6 items measuring awareness of environment. It is assumed that the 6 items are related to each other. One can also write AW1 TO AW19, which takes every variable positioned from AW1 to AW19 in the file.
This program assesses the intercorrelations among the 6 items.
Missing data are deleted pairwise, and the level of significance is shown.
A two-tailed test is applicable when the direction of the relationship is not assumed in advance.
NOSIG is used to flag significant values.
Item total correlations
GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
compute total=AW1+ AW2+ AW6 +AW10 +AW18+ AW19.
CORRELATIONS /VARIABLES=AW1 to AW19, total
  /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE.
The COMPUTE command is used to determine the total score, which is then used for the item-total correlations.
Multiple regression
GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
compute total=AW1+ AW2+ AW6 +AW10 +AW18+ AW19.
REGRESSION /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT total
  /METHOD=ENTER AW1 AW2 AW6 AW10 AW18.
Select all the commands before running; otherwise the total score will not be computed and cannot be used.
In this model the total score is predicted by the items entered.
Mean differences
When data are collected from two different groups, the command for the independent-samples t-test is
T-TEST GROUPS=IC3(3) /MISSING=LISTWISE /VARIABLES=total /CRITERIA=CI(.9500).
Here IC3 is the independent variable and total is the dependent variable.
IC3(3) indicates 3 as the cut-off point for forming two groups: values of 3 or more form one group and values below 3 form the other.
IC3(1 2) indicates grouping based on the values 1 and 2.
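The two ways of specifying groups can be compared on a small example. This is a sketch with a hypothetical grouping variable grp (coded 1, 2, 3) and made-up total scores.

```spss
* Sketch only: grp and total are hypothetical variables with made-up values.
DATA LIST FREE / grp total.
BEGIN DATA
1 8 1 9 1 7 2 10 2 12 2 11 3 13 3 12 3 14
END DATA.

* One value is a cut point and compares grp of 2 or more with grp below 2.
T-TEST GROUPS=grp(2) /VARIABLES=total.

* Two values name the groups and compare grp 1 with grp 3 only.
T-TEST GROUPS=grp(1 3) /VARIABLES=total.
```

In the second run the cases with grp = 2 are left out of the comparison.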
Chi-square statistics
CROSSTABS /TABLES=AW1 BY AW2 /FORMAT=AVALUE TABLES /STATISTICS=CHISQ PHI /CELLS=COUNT /COUNT ROUND CELL.
This examines the association between items. For multiple items the subcommand is
/TABLES=AW1 BY AW2 AW10 AW18 AW19 AW6
Here AW1 is the row variable and the others are column variables.
One-way ANOVA
ONEWAY total BY EXP /MISSING ANALYSIS.
Here total is the dependent variable and EXP is the independent variable.
COMPUTE SIZE OF SAMPLE
/*-----------------------------GETTING INPUT FILE------------------------------------------ .
GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
/*-----------------------------SIZE OF SAMPLE ------------------------------------------ .
compute n=0.
compute n=n+1.
descriptives n, AW1.
n=0 indicates initialization. n=n+1 adds 1 to it; because COMPUTE is applied to each case separately, n equals 1 in every case and does not accumulate across cases.
DESCRIPTIVES <n, AW1> reports N for n, which is the number of cases, next to N for AW1.
Here AW1 (numeric type, scale measure) is used to verify the computed N, or size of sample. The two N values differ only if AW1 has missing values.
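If a counter that actually runs from 1 up to the number of cases is wanted, one option is the system variable $CASENUM. This is a sketch with a hypothetical variable aw1 and made-up values, not the workshop's own program.

```spss
* Sketch only: aw1 is a hypothetical item with made-up values.
DATA LIST FREE / aw1.
BEGIN DATA
3 4 5 2 4
END DATA.

* $CASENUM holds the sequence number of each case as it is read.
COMPUTE n = $CASENUM.

* The maximum of n is the number of cases in the file.
DESCRIPTIVES VARIABLES=n aw1 /STATISTICS=MAX.
```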
Summary - 5
SPSS syntax makes the researcher more systematic in the analysis of data. By writing programs, the researcher can check all the assumptions of a statistical tool systematically.
The COMPUTE command is very powerful, as it helps the researcher write their own programs for the analysis of data.