Basics of writing SPSS syntax files Vince Gray DLI Boot Camp June 3, 2014


Session goals

Introduction to the basic parts of an SPSS syntax file to read in data
◦ Not intended to show how to analyze data, but how to make them available for analysis

Tips and tricks for preparing a syntax file

Cleaning up blatant problems with the data

A short exercise in coding an SPSS syntax file


Why know how to do this?

Older files may not have syntax available: documentation may exist on paper only

SPSS is not Statistics Canada's specialty: they don't do much work with it, and that can show in what you receive from them

Faculty members may wish to deposit old data with you


Sample of a print-only codebook

Household Income, Facilities and Equipment Micro Data File, 1971 Income: Survey of Consumer Finances, 1972 and Survey of Household Facilities and Equipment, 1972


SPSS foundational concepts

SPSS is generally case insensitive
◦ Commands and labels are capitalized for display purposes only
◦ On Unix computers, file specifications are case sensitive (/data/file.txt and /Data/File.txt are different files)
◦ Operations on string variables are case sensitive

SPSS commands end with a period

Recommendation: edit syntax files using a fixed-pitch font (e.g., Courier New)
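Here is a minimal sketch of these points. It assumes an open dataset that contains a numeric variable province and a string variable cityname, and both names are illustrative only. You can type commands and variable names in any case, but comparisons between string values are case sensitive:

```spss
* Commands and variable names are case insensitive: these two are equivalent.
frequencies variables=province.
FREQUENCIES VARIABLES=PROVINCE.

* Comparisons of string values are case sensitive, so Toronto and
 TORONTO do not match; upcase() puts both sides in the same case.
compute torflag = (upcase(cityname) = "TORONTO").
execute.
```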


SPSS foundational concepts

Comments (text that isn't a command) can be used to explain what you're doing
◦ May be placed at the start of a line, beginning with either the word comment or an asterisk and ending with a period
◦ May be placed within a command, or at the end of a line, enclosed within /* comment */

Data list … /* this is a fatuous comment */
 var001 1-4 …
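The following short sketch shows both comment styles. The file c:\data\example.dat and the column positions are hypothetical. The inline comments sit on the continuation lines of the data list command. The command's final period follows the last variable specification rather than a comment:

```spss
* A full-line comment starts with an asterisk and ends with a period.
comment A full-line comment can also start with the word comment.

* Inline comments can sit inside a command.
data list file='c:\data\example.dat' records=1 table
 /uniqueid 1-8 /* record identifier */
  province 9-10 /* two-digit province code */
  urbnrurl 11.
```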


Basics of an SPSS syntax file

Where is the file and what are its attributes?
What are the variables and what format?
What are the variable labels?
What values do you want to label?
Are any of the values missing (i.e., should they be ignored during analysis)?
Do you need to repair data?
Where & how to save files?
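Below is one way the questions above map onto commands, in the order they usually appear in a syntax file. It is a skeleton built on stated assumptions: the file name, column positions, variable names and codes are all hypothetical. Every comment ends with a period, because otherwise SPSS would treat the next command as part of the comment:

```spss
* Read the file and define the variables and their formats.
data list file='c:\data\example.dat' records=1 table
 /uniqueid 1-5 sex 6 age 7-8.

* Variable labels.
variable labels uniqueid "Unique record identifier"
 /sex "Sex of respondent"
 /age "Age of respondent".

* Value labels.
value labels sex 1 "Male" 2 "Female" 9 "Not stated".

* Missing values.
missing values sex (9) /age (99).

* Repairs (compute, recode, if) would go here.

* Save the data and list the dictionary.
save outfile='c:\data\example.sav'.
display dictionary.
```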


Where is the file and what are its attributes?

Usually done with data list:
data list file='drive:\directory\filename.ext' format records=# table / variable list / line 2 variable list ….
◦ format is fixed (the default), free or list

For very large files (record length 8192+), you need to define a file handle first:
file handle myhandle / name='drive:\directory\filename.ext' /recform=?? /lrecl=####.

data list file=myhandle format records=# table /variable list.
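The next example sketches the file handle approach. It assumes a hypothetical file c:\data\bigfile.dat whose records are 10,000 characters long. Following the slide, the handle declares the record length with lrecl. Recform is omitted on the assumption that the default suits an ordinary text file, so check the FILE HANDLE entry in the Command Syntax Reference for your version:

```spss
* Hypothetical file whose records are 10,000 characters long.
* The record length is declared on the handle, not on data list.
file handle bigfile /name='c:\data\bigfile.dat' /lrecl=10000.

data list file=bigfile records=1 table
 /uniqueid 1-8
  province 9-10
  lastvar 9995-10000.
```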


What are the variables and what format?

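Each variable being read in from the file must be described

Each must be assigned a variable name: see Variable Names in SPSS help (Syntax)
◦ Cannot be a reserved word (ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH)
◦ May be up to 64 characters long, with no spaces
◦ Must start with a letter (A-Z) or @; a leading # creates a scratch variable, and a leading $ is reserved for system variables such as $casenum
◦ May contain A-Z 0-9 _ . $ # @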
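This small illustration applies the naming rules. It assumes the income variables from the data list example later in this section (earnrmbd, invstmnt, govttran, miscincm, ttlincom) are already defined. The names hhldinc, #othr and ttlcheck are hypothetical. The scratch variable #othr is available to later transformations, but SPSS discards it rather than saving it:

```spss
* hhldinc is an ordinary (permanent) variable name.
compute hhldinc = earnrmbd + invstmnt.

* #othr is a scratch variable: usable in later transformations, never saved.
compute #othr = govttran + miscincm.

* Hypothetical consistency check: 0 if the parts add up to the total.
compute ttlcheck = hhldinc + #othr - ttlincom.
execute.

* Names such as to, 2ndjob or job title would be rejected: to is a
 reserved word, 2ndjob starts with a digit and job title contains a space.
```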


Thoughts on long variable names

Users of older (perpetual) versions of SPSS may not be able to use them

Variable names may wrap across lines

Being lazy: it's more typing

Can use rename variables syntax to retain long variable names

Recommendation: use 8 characters or fewer for variable names
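Here is a sketch of the rename approach: read the data with short names, then rename them for users who want long ones. The file, its column positions and the short name fmincdep are hypothetical. The long name farm_income_dependence is the one used in the data list example that follows:

```spss
* Read the data with a short name.
data list file='c:\data\example.dat' records=1 table
 /uniqueid 1-5 fmincdep 6.

* Then give it the long name for users who want one.
rename variables (fmincdep = farm_income_dependence).
```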


Defining variable format

Multiple ways to do it
◦ Specify columns and type: (3) and (2) give the number of implied decimal places, and (A) marks a string
 Uniqueid 1-8
 Recwght 9-15 (3)
 Cityname 16-45 (A)
 var001 46-50 var002 51-55 var003 56-60
 income gvttrans othrincm 61-87 (2) …
◦ Use FORTRAN-like format specifications
 Uniqueid (F8.0)
 Recwght (F7.3)
 Cityname (A30)
 var001 to var003 (3F5.0)
 income gvttrans othrincm (3F9.2) …
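This self-contained check tests the implied decimal in recwght 9-15 (3). It uses made-up inline data between begin data and end data instead of a file. The first record has no decimal point, so it should be read as 12.345. The second record contains an explicit decimal point, and SPSS is expected to use that point as written (4.5) rather than the implied one:

```spss
* Columns 1-8 hold uniqueid and columns 9-15 hold recwght with 3 implied decimals.
data list fixed /uniqueid 1-8 recwght 9-15 (3).
begin data
000000010012345
00000002  4.500
end data.
list.
```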


Defining variable format (cont'd)

Can combine various formats in a data list command

* Here we will declare variables in the file.
data list file=oldfile records=1 table /
 uniqueid 1-5
 province 6-7
 urbnrurl 8
 farmflag 9
 hhldwght 10-12
 numprsns nmadults nmchlt06 nmch0615 nmch1617 nmch1824 13-24
 hhldcomp (F1.0)
 farm_income_dependence 56
 mjsrcinc 57
 nmearner nmpsninc (2F2.0)
 earnrmbd invstmnt govttran miscincm ttlincom 62-91.

Note indentation of 1 space on each variable: used to be required, now more stylistic


Defining variable format (cont'd)

Don't define variables as strings unless the data contain non-numeric characters
◦ Can lose ordinal variable relationships (as strings, "10" sorts before "9")
◦ This may mean revising StatCan syntax files, which have been known to define non-interval variables as strings, regardless of the coding actually used for the variable


String variables (cont'd)
◦ In the worst case (and at your discretion, based on your comfort level), this means recoding variables (e.g., Discharge Abstract Database)
◦ If you convert to Stata, value labels won't carry over, since Stata can't assign value labels to string variables
◦ Recommendation: if the string requires a value label to be meaningful, convert it to a coded numeric value (therefore, leave place names, census tract numbers, etc. as strings). E.g., if "1" stands for "Male", read it as a non-string
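If the data were already read with a string definition, the NUMBER function offers one way to build a coded numeric copy. This minimal sketch assumes a hypothetical one-character string variable sexcode that holds "1" or "2":

```spss
* sexcode is a hypothetical A1 string holding 1 or 2.
* number() converts the text to a numeric value using the F1.0 format.
compute sexnum = number(sexcode, F1.0).
formats sexnum (F1.0).
variable labels sexnum "Sex of respondent (numeric version of sexcode)".
value labels sexnum 1 "Male" 2 "Female".
execute.
```

Recent versions also offer ALTER TYPE, which changes a variable's type in place. Check the Command Syntax Reference for your version before relying on it.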


What are the variable labels?

The purpose of variable labels is to give more descriptive information than the variable name can provide
◦ Sex
 Probably safe to guess that it is a gender variable, but not necessarily: Have you had sex in the past month?
 Recorded for whom? Respondent/spouse/1st-born?
 If any doubt might exist, try to remove it!

Do not use arbitrary contractions, especially if loading into a searchable metadata service


Variable labels (cont'd)

Sample code with arbitrary contractions

VARIABLE LABELS
 YEAR "Refyr - 1998"
 PUCPID26 "Cross-sect random pers ID - 1998"
 PUCHID25 "Cross-sect random hhld ID - 1998"
 D31CF26 "Census family ID - 1998"
 ICSWT26 "Int cross-sect weight - 1998"
 ECYOB26 "Ext YOB (cross-sect) - 1998"
 ECAGE26 "Ext age refyr (cross-sect) - 1998"
 ECSEX99 "Ext sex refyr (cross-sect) - 1998"
 MARST26 "Marital status refyr - 1998"
 MJACT26 "Major activity - 1998"
 MJIEH26 "Major inc earner Hhld - 1998"
 MJINE26 "Major inc earner EF - 1998"
 RMJIG26 "Rel maj inc earner grp EF - 1998"
 MJICE26 "Major inc earner CF - 1998".


Variable labels (cont'd)

Make sure that the label includes the most important information. In the variables below, StatCan omitted the key information: does it what? would you what? describe yourself as what?

HAL_Q150 "Does a physical condition or mental condition or health prob"

HAL_Q160 "Does a physical condition or mental condition or health prob"

HAL_Q170 "Does a physical condition or mental condition or health prob"

HAL_Q210 "Do you regularly have trouble going to sleep or staying asle"

MSS_Q110 "Thinking about the amount of stress in your life, would you"

MSS_Q120 "What is your main source of stress?"

HS_Q110 "Presently, would you describe yourself as:"


Variable labels (cont'd)

Meaningful labels

HAL_Q150 "Reduction of amount/kind of activity at home"

HAL_Q160 "Reduction of amount/kind of activity at work or school"

HAL_Q170 "Reduction of amount/kind of activity in other activities (transport/leisure)"

HAL_Q210 "Regularly have trouble going to sleep or staying asleep"

MSS_Q110 "Self-assessed amount of stress in respondent's life"

MSS_Q120 "What is your main source of stress"

HS_Q110 "Self-assessed happiness"


Variable labels (cont'd)

If labels are repeated, explain why (the variable names may not be intuitive):

SUDDLAI 'Any drug use (incl 1 time cann)'

SUDDLAE 'Any drug use (excl 1 time cann)'

SUDDLID 'Any drug use (excl cann) - life (D)'

SUDDYAI 'Any drug use (incl 1 time cann)'

SUDDYAE 'Any drug use (excl 1 time cann)'

is less useful than

SUDDLAI "Ever used drugs (including 1 time cannabis, derived)"

SUDDLAE "Ever used drugs (excluding 1 time cannabis, derived)"

SUDDLID "Ever used drugs (excluding cannabis, derived)"

SUDDYAI "Used any drugs in past 12 months (including 1 time cannabis, derived)"

SUDDYAE "Used any drugs in past 12 months (excluding 1 time cannabis, derived)"


Variable label formatting

Recommend placing all labels in double quotes rather than single quotes:
noanswr1 "Didn't answer: wasn't at home"
rather than
noanswr1 'Didn''t answer: wasn''t at home'
◦ Either works, but single quotes can lead to more mistakes through carelessness in data entry, since every apostrophe inside the label must be doubled

Variable labels can be up to 255 characters long, though not all of a label may be displayed (some procedures show only 40 characters)
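This sketch applies the quoting rules to the example variable above. It also shows one way to keep a long label readable, because SPSS joins quoted strings separated by +. The variable names come from the slides, and the split point in the last label is arbitrary:

```spss
* Double quotes: apostrophes inside the label need no special treatment.
variable labels noanswr1 "Didn't answer: wasn't at home".

* Single quotes: every apostrophe inside the label must be doubled.
variable labels noanswr1 'Didn''t answer: wasn''t at home'.

* Quoted pieces joined with + form one label.
variable labels hal_q170 "Reduction of amount/kind of activity in " +
 "other activities (transport/leisure)".
```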


What values do you want to label?

Nominal and ordinal variables are generally meaningless without value labels
◦ Gender: is 1 male and 0 female, or vice versa?
◦ Does a scale variable run from worse to better or from better to worse? (The value alone doesn't necessarily tell you.)
◦ What does value 3 in Agegroup represent?

Continuous variables may have key values
◦ E.g., income or age may be capped or floored

Missing values need to be declared
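For a continuous variable, you can label only the key values. This sketch assumes a hypothetical age variable that is top-coded at 80 and uses 99 for not stated:

```spss
* age is hypothetical: top-coded at 80, with 99 meaning not stated.
* Only the key values are labelled; ordinary ages need no labels.
value labels age 80 "80 years and over" 99 "Not stated".
missing values age (99).
```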


Value label formats

Do not use arbitrary contractions: up to 120 characters can be displayed

Recommend placing all labels in double quotes rather than single quotes:
6 "Don't know"
rather than
6 'Don''t know'

String values must be enclosed in quotes (e.g., "B" "Boston lettuce")
◦ but you won't be using string variables if you need value labels to make sense, right?


Value label formats (cont'd)

A single label declaration can be used for any and all variables using that coding, or separate declarations can be made

value labels

SUDDYO SUDDYOA SUDDYOD SUDFINT SUDFLAU SUDFLCA SUDFLCM SUDFLSU SUDFLTU

SUDFYCM SUDGLOTH SUD_87 SUI_01 SUI_02 SUI_03 TWD_1 TWD_3 TWD_5

1 "YES"

2 "NO"

6 "NOT APPLICABLE"

7 "DON'T KNOW"

8 "REFUSAL"

9 "NOT STATED"

/

Each declaration is separated from the previous with a /


Value label formats (cont'd)

Can explicitly identify variables to which no value labels are assigned

If consecutive variables use the same coding, use "to"

value labels

uniqueid hhldwght

/

SUDDYO to SUDGLOTH SUD_87 SUI_01 SUI_02 SUI_03 TWD_1 TWD_3 TWD_5

1 "YES"

2 "NO"

6 "NOT APPLICABLE"

7 "DON'T KNOW"

8 "REFUSAL"

9 "NOT STATED"

.

Repeated value labels for any variable are ignored: the first one found is used, and a warning is issued in the syntax window
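When labels for the same variables are spread over more than one command, the choice of command also matters. Value labels replaces every existing label for the variables named, while add value labels keeps the existing labels and adds or changes only the ones listed. Here is a sketch using SUD_87 from the example above:

```spss
* value labels replaces every existing label for the variables named.
value labels SUD_87 1 "YES" 2 "NO" 9 "NOT STATED".

* add value labels keeps those and adds (or changes) only the ones listed.
add value labels SUD_87 6 "NOT APPLICABLE" 7 "DON'T KNOW" 8 "REFUSAL".
```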


Missing values

Missing values get omitted from analysis: if you are looking for the average income of spouses, you don't include households that don't have spouses

Statistics Canada normally uses values ending in 6/7/8/9 as missing codes (i.e., not applicable, don't know, refusal, not stated), but often defines only the value 9 as missing in SPSS: this varies by Division


Missing values (cont'd)

Other values may be missing as well:

mthrplbr fthrplbr

1 "Born in Canada"

2 "Born outside of Canada - North America/Europe"

3 "Born outside of Canada - Other country"

4 "Country uncodeable"

8 "Not stated"

9 "Don't know"

/

◦ The value 4 might be considered missing – I would code it as missing!

◦ Check the codebook carefully!
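Following that advice, this sketch declares the missing values for the two birthplace variables above. The declaration uses all three discrete missing values that SPSS allows:

```spss
* Codes 4 (country uncodeable), 8 (not stated) and 9 (do not know)
 are all treated as missing.
missing values mthrplbr fthrplbr (4, 8, 9).
```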


Missing values format

SPSS allows up to three discrete values to be defined as missing, or a range (using thru, which includes all values within the range), or one discrete value and a range.

May explicitly declare that no values are missing for a variable.

Missing values

uniqueid () /* Can explicitly show no missings */

var001 to var028 (6,7,9)

var029 var031 (6 thru 9)

var030 (-1, 6 thru highest).


Missing values format (cont'd)

String and non-string missing values can't be declared in the same missing values statement.

Missing values

uniqueid ()

var001 to var028 (6,7,9)

var029 var031 (6 thru 9)

var030 (-1, 6 thru highest).

Missing values

stringv1

("ZZZZZZZ", "-1 ").

Missing values are dealt with immediately: be aware of the order of operations
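This sketch shows why the order matters. It uses wfagegrp from the coding problem later in this session, where 0 means there is no wife, plus a hypothetical flag variable nowife. Once 0 is declared missing, a comparison against 0 evaluates to missing rather than true, so the first if command never assigns a value:

```spss
* wfagegrp: 0 means there is no wife, and 0 is declared missing.
missing values wfagegrp (0).

* Once 0 is missing, wfagegrp = 0 evaluates to missing, not true,
 so this if command never assigns a value.
if (wfagegrp = 0) nowife = 1.

* Testing for missingness directly does what was intended.
if (missing(wfagegrp)) nowife = 1.
execute.
```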


Do you need to repair data?

Does each record have a unique record identifier (used to match variables from different files or subsets)?
◦ If not, create one:

compute uniqueid=($casenum).
variable labels uniqueid "Unique record identifier".
* The formats command specifies how many columns are reserved for the field: by default, new numeric variables are created as F8.2. No decimals are needed for this variable, and the width (#) depends on the number of records in the file.
formats uniqueid (F#.0).
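Here is a worked instance of the width rule. It assumes a hypothetical file of 17,500 records, where the largest identifier has five digits, so F5.0 is enough. Execute is included only to run the transformation immediately:

```spss
* Hypothetical file of 17,500 records: five digits, so F5.0 is wide enough.
compute uniqueid = $casenum.
variable labels uniqueid "Unique record identifier".
formats uniqueid (F5.0).
execute.
```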


Repairing data (cont'd)

If numerically coded variables are defined as string, change them to non-string.

Before:
Data list …
 uniqueid 1-8
 gender 9 (A)
 …
Value labels
 gender
 "1" "Male"
 "2" "Female"
 "9" "Not ascertained"
.

After:
Data list …
 uniqueid 1-8
 gender 9
 …
Value labels
 gender
 1 "Male"
 2 "Female"
 9 "Not ascertained"
.


Repairing data (cont'd)

If string variables require value labels to be meaningful, create non-string versions: this is case sensitive!

Value labels gradelvl
 "H" "Top 10% of the class" "M" "Middle 80% of the class"
 "L" "Bottom 10% of the class" " " "Rank in class not known".
Missing values gradelvl (" ").
* Create a non-string version of the variable.
* Numeric declares the new variable first: formats only works on existing variables.
Numeric newgrdlv (F1.0).
If (gradelvl="H") newgrdlv=1.
If (gradelvl="M") newgrdlv=2.
If (gradelvl="L") newgrdlv=3.
If (missing(gradelvl)) newgrdlv=9.
Value labels newgrdlv
 1 "Top 10% of the class" 2 "Middle 80% of the class"
 3 "Bottom 10% of the class" 9 "Rank in class not known".
Missing values newgrdlv (9).
Variable labels newgrdlv "Reformatted gradelvl: class placement".
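Recode offers a more compact alternative to the series of if commands, because it can map string codes to numeric values when the result goes into a new variable. There is one difference: (else=9) also sends any unexpected code, such as a lowercase "h", to 9. The if version would leave such a code system-missing:

```spss
* recode matches the string codes exactly, so it is case sensitive too.
recode gradelvl ("H"=1) ("M"=2) ("L"=3) (else=9) into newgrdlv.
formats newgrdlv (F1.0).
value labels newgrdlv
 1 "Top 10% of the class" 2 "Middle 80% of the class"
 3 "Bottom 10% of the class" 9 "Rank in class not known".
missing values newgrdlv (9).
execute.
```

Autorecode is another option. However, it numbers values in alphabetical order (H, L, M), which would not preserve the top/middle/bottom ordering.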


Repairing data (cont'd)

Repairing coding flaws is the most difficult, and possibly the most important, thing you can do for your users: do it if you're comfortable!


Solution to coding problem

* Find records where there is no wife.
* According to documentation, should use (hdmarsta=1)
 or (hdmarsta=8) or (hdmarsta=9) or (hdmarsta=10).
* Doing that results in 17,129 valid (non-missing) records.
* Defining 0 as missing for age gives 14,352 valid records.
* Since 0 is defined as a missing code for wfagegrp,
 you cannot use "wfagegrp=0" as the condition.
do if (missing(wfagegrp)).
* Reset values from 0 to a specified missing code.
+ compute wfincome=999999.
+ compute wfwkswrk=-1.
end if.

Try not to change the format of the variable when adding a value. wfincome has 6 columns, with valid entries from -99999 to +99999, so 999999 fits the field but is outside the valid range. For wfwkswrk, we could have used 99 as the missing code (the valid range is 0 to 52).


Solution to coding problem (cont'd)

Value labels are needed:

Value labels wfincome

999999 "Not applicable - no wife"

/

wfincsrc

1 "No income" 2 "Wages and salaries"

3 "Military pay and allowances"

4 "Net income from self-employment"

5 "Net income from roomers and boarders"

6 "Government transfer payments"

7 "Net income from investment"

8 "Retirement pensions, superannuation and annuities"

9 "Other money income" 0 "Not applicable - no wife"

/

wfagegrp

76 "Age 76 and over" 0 "Not applicable - no wife"

/ …


Solution to coding problem (cont'd)

Missing value declarations are needed to make having done this worthwhile:

Missing values
 wfincsrc wfagegrp (0)
 wfincome (999999).

The ripple effect of the change isn’t necessarily as simple as changing one piece of code: you have to track down the rest of the effects of the change and document them.
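One way to track down those effects is to check the repaired variables against the variable that defines "no wife". This sketch assumes the variables from this example are in the active dataset. The /missing=include subcommand keeps user-missing codes in the crosstabulation so you can inspect them:

```spss
* Every case with no wife should now carry the missing codes.
crosstabs /tables=hdmarsta by wfincsrc /missing=include.
frequencies variables=wfincome wfwkswrk
 /format=notable
 /statistics=minimum maximum.
```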


Where & how to save files?

Write: creates ASCII file (for preservation)
◦ Doesn't actually do anything until the program encounters an executable command (e.g., execute, save or a procedure)

write outfile='drive:\directory\filename.dat' table /all.

◦ The table parameter tells SPSS to include the format used in writing the ASCII file in the output; /all indicates to write out all variables on the file.
◦ Does not preserve variable/value labels or missing declarations in the ASCII file: you need syntax to read the file created by write back into SPSS.
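This minimal sketch illustrates the point above. Write only queues the output, and execute (or any command that reads the data, such as save) actually produces the file. The path is hypothetical:

```spss
* write only queues the output: nothing is written yet.
write outfile='c:\data\hife1972.dat' table /all.

* execute forces the pass through the data that actually writes the file.
execute.
```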


Where & how to save files?

Export: creates portable file
◦ No longer widely used: used to transport files between platforms or programs

export outfile='drive:\directory\filename.por' /keep=? /drop=? /map.

◦ Keep and drop allow you to include or exclude variables by naming them; map lists the variables written to the file
◦ Preserves variable/value labels and missing declarations: can be read back into SPSS
◦ Long variable names truncate to 8 characters
◦ Is an executable command (will force any pending write)


Where & how to save files?

Save: creates system file
◦ This is the native format of SPSS: files will load into SPSS and keep all variable/value labels, missing declarations and long variable names

save outfile='drive:\directory\filename.sav' /keep=? /drop=? /map.

◦ Keep and drop allow you to include or exclude variables by naming them; map lists the variables written to the file
◦ Is an executable command (will force any pending write)
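This sketch shows keep and drop with save. The variable names come from examples in this session and are assumed to be present in the active dataset. The output paths are hypothetical. Keep also sets the order of the variables in the new file:

```spss
* Keep: only the listed variables are saved, in the order listed.
save outfile='c:\data\hife1972_subset.sav'
 /keep=uniqueid province hhldwght ttlincom
 /map.

* Drop: every variable except the listed ones is saved.
save outfile='c:\data\hife1972_dropped.sav'
 /drop=wfincome wfincsrc wfagegrp wfwkswrk
 /map.
```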


Where & how to save files?

Syntax for saving data & metadata:

write outfile='j:\presentations\hife1972.dat' table /all.
save outfile='j:\presentations\hife1972.sav' /map.
display dictionary.

Display dictionary
◦ Writes information about the system file into the output: variable names, formats, labels, missing declarations, etc.

Save your output file, at least as a .spv file; better still, export it to text, which can ‘always’ be read (for preservation purposes!)
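This sketch saves and exports the output from syntax rather than through menus. It assumes a version of SPSS that supports the output save and output export commands, so check the Command Syntax Reference for yours. The file names are hypothetical:

```spss
display dictionary.

* Save the Viewer contents, then export them as plain text for preservation.
output save outfile='c:\data\hife1972_dictionary.spv'.
output export
 /contents export=all
 /text documentfile='c:\data\hife1972_dictionary.txt'.
```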


Exercise

Create syntax to read the 4 variables on the next page into SPSS, including:
◦ A data list command (c:\data\192_1972.dat)
◦ Variable labels
◦ Value labels
◦ Missing declarations
◦ Comments for any "fixups" that need to be done: reflect any fixups in value labels and missing declarations
◦ Saving your work


Exercise page


Good, better and horrible news

Good news
◦ You're done!

Better news
◦ You may never have to do this: if you can't locate a syntax file on the EFT site, ask on the DLI list whether other DLI reps have one they can provide!

Horrible news
◦ If a faculty member shows up with a file that he or she collected, no one else will have syntax: someone may have to do this!