36
14a. Accessing Data Files in SPSS ®

14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

Embed Size (px)

Citation preview

Page 1: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Page 2: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

2

14a. Accessing Data Files in SPSS®

Prerequisites• Recommended modules to complete before viewing

this module 1. Introduction to the NLTS2 Training Modules 2. NLTS2 Study Overview 3. NLTS2 Study Design and Sampling NLTS2 Data Sources, either

• 4. Parent and Youth Surveys or• 5. School Surveys, Student Assessments, and Transcripts

NLTS2 Documentation• 10. Overview• 11. Data Dictionaries• 12. Quick References

Page 3: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

3

14a. Accessing Data Files in SPSS®

Overview Purpose Open and view data files Limiting variables Subsetting cases Joining/combining data files Summary Closing Important information

Page 4: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

4

14a. Accessing Data Files in SPSS®

NLTS2 restricted-use data

• NLTS2 data are restricted.• Data used in these presentations are from a

randomly selected subset of the restricted-use NLTS2 data.

• Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.

Page 5: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

5

14a. Accessing Data Files in SPSS®

Purpose

• Learn to Open a data file See what is in a file (i.e., contents of the file) “Size” a data file for a perfect fit

• Reduce the number of variables• Reduce the number of cases (i.e., subset the data)

Combine information from multiple sources• Bring in data from another source or another wave• Join or combine files

Create a new file

Page 6: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Open and view data files

• SAS® and SPSS® data are in separate folders.• SPSS data have a “.sav” extension.• Note: Data files were developed in SAS, which

Allows 28 distinct missing values vs. SPSS’s 3 distinct values or a range of values.

Has associated user-defined value label formats stored separately in a format library vs. the SPSS convention of storing value labels with the variable.

See “Notes to SPSS Users” hyperlinked from the table of contents in the data documentation for details.

6

Page 7: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Open and view data files

• Files are either read from or written to.• Files have a name and a location where they are

stored.• SPSS needs to know the name of the file and where

to find it. The path describes the nesting of folders. Example:

• C:\myprojects\NLTS2\Data is a path or location.– i.e., the file is located on Drive “C”, in the folder “Data,” which

is nested inside the “myprojects” and “NLTS2” folders.

7These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 8: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

8

14a. Accessing Data Files in SPSS®

Open and view data files• Syntax

Open file from menu Open file command in SPSS syntax editor Submit a “GET FILE” command

GET FILE 'C':\myprojects\NLTS2\Data\n2w1tchr.sav'.• Menu command

From menu:• File: Open: Data [select file from browser]

• A window opens with spreadsheet type of display.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 9: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

9

14a. Accessing Data Files in SPSS®

Open and view data files

• The open file is the “active” dataset.• Select the “Variable View” tab for details about

variables. Rows list the variables. Columns contain descriptors and attributes about the

variables and values.• “Data View” has case-by-case values for each

variable. Each row holds data for a single respondent. Each column holds the data for a single variable.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 10: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

10

14a. Accessing Data Files in SPSS®

Open and view data files: Example

• Viewing files Open the Wave 1

Teacher file. Look at both the

data view and variable view.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 11: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Open and view data files: Example

11These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 12: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

12

14a. Accessing Data Files in SPSS®

Limiting variables

• How to reduce the number of variables in the file Large files with many cases and many variables are

unwieldy; simplify.• Fewer variables to search through • Fewer cases to process

Create files that are limited to just those variables needed for analysis.

You have the choice of drop or keep.• Which one is best? The one that requires less typing!• If you are dropping more variables than keeping, use “keep” and

vice versa.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 13: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

13

14a. Accessing Data Files in SPSS®

Limiting variables

• Note: When making changes to a file Use work files for temporary changes.

• Work files are files that are in existence only for the duration of the program or SPSS session.

• An active file is a work file unless it is saved. Save the results to a new data file.

• Usually it is best to create a new file rather than to modify the source file.

These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 14: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Limiting variables• In syntax editor

GET FILE = 'C:\MyProjects\NLTS2\Data\n2w1parent.sav' /KEEP= ID w1_DisHdr2001 w1_GendHdr2001 w1_IncomeHdr2001 w1_AgeHdr2001 np1Weight np1HealthProb np1GroupMember np1ProblemCount np1E2a np1B2a.

EXECUTE.SAVE OUTFILE='C:\MyProjects\NLTS2\Data\Par_w1_lmt_vars.sav'

/DROP= np1E2a np1GroupMember.• Notice that “KEEP” is on the “GET FILE” and “DROP” is on the “SAVE FILE”

commands.• Notice that SPSS statements end in a period.

14These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 15: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Limiting variables• Menu driven

Open a file. From “FILE” select “SAVE AS.” Click on “VARIABLES.” in next pop-up menu, click on “DROP ALL.” In the large box with the variable list, click in each little box in the

“KEEP” column to select only the variables needed from this file.• Little boxes with an “X” indicate the variables are kept; blank boxes

indicate the variables are dropped. Click “CONTINUE,” give the file a new name in the Browse window,

and click “SAVE.” Open a new file.

15These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 16: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Limiting variables: Example• Limiting variables

Create a file with fewer variables. Create a new file called “PrScores.sav” from n2w2dirassess.sav. Keep only the following variables:

• ID• ndacalc_pr• ndaPC_PR• ndasyn_pr• NDaF1_friend• na_age4• w2_dis12• w2_gend2• na_grade4• w2_incm3• wt_na.

Open and review the new file in variable view.

16These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 17: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Limiting variables

17These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 18: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Subsetting cases

• How to reduce the number of cases or records in the file. Often analysis is done on a subset; for example:

• Select only youth with visual impairment.• Select only youth who are out of secondary school.• Exclude younger students.

• All variables are available in the file; only cases are conditionally restricted.

18These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 19: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Subsetting cases• Example: Limit Wave 4 Parent/Youth interview data to those who are 21 or older,

excluding youth who are 19 or 20 (W4_Age2007 = 19 or 20).• Code in syntax editor to limit cases

USE ALL.COMPUTE filter_$=(W4_Age2007>20).VARIABLE LABEL filter_$ 'W4_Age2007>20 (FILTER)'.VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.FORMAT filter_$ (f1.0).FILTER BY filter_$.EXECUTE.

• Code to select all cases

FILTER OFF.USE ALL.EXECUTE.

19These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 20: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Subsetting cases• To limit cases from menu

Data: Select Cases Click “IF CONDITION IS SATISFIED” and “IF” button. Build logic condition. Click “CONTINUE” and “OK.”

• To select all cases from menu Data: Select Cases Select “ALL CASES” and click “OK.”

• To have the best of both worlds Click “PASTE” to save code. Select and run code from syntax editor. Toggle on and off as needed.

20These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 21: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Subsetting cases: Example• Subsetting cases

Create a small data set with a subset of cases. Open “PrScores.sav” created in previous example. Limit cases to those classified with hearing impairment

only, i.e., those with a value of “5” for “w2_dis12.” Look at “Variable View” and “Data View” in the data

editor.• Are there any visual clues that the filter is on?

Turn the filter off so all cases are selected.

21These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 22: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files

• How to bring in data from another file• Purpose

Learn to combine or join files• Bring in data from another source• Bring in data from another wave

Learn what to watch for• Number of cases in the combined file• How cases are joined

– Key variable, i.e., which variable to match on– Keyed file, i.e., which cases to keep

22These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 23: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files

• Why do this? Often it is necessary to combine information from different

files to perform comparative analyses, create new variables, or measure differences over time.

• For example, you may want to Create composite variables from multiple sources. Look at similar items at different points in time.

23These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 24: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• Example of a composite variable from multiple

sources Create a variable for “if parent attended a parent/teacher

conference” using Wave 2 teacher survey item nts2C8 and fill in with parent interview item np2E1a_d if teacher data are missing.

• Example of items at different points in time Create a variable to look at the pattern of employment between

Waves 2 and 3: employed both waves, either wave, or neither wave.• Set to employed both waves if np2HasPdJob (W2) and np3HasJob

(W3) are yes.• Else set to employed in either wave if np2HasPdJob or np3HasJob

are yes.• Else set to not employed if np2HasPdJob and np3HasJob are no.

24These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 25: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data filesHypothetical Example: Data Availability Across Instruments

YouthInterview Data W1

Assessment Data W2

Interview Data W2

Program Data W2

1 Yes Yes Yes No

2 Yes No Yes No

3 No No No Yes

4 Yes No No Yes

5 Yes No Yes No

6 Yes Yes Yes Yes

There will be missing records across files and missing items within files.If data look like this, ask which file is the main file being analyzed.

25These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 26: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• Data in all files must be sorted by the key variable.

The key variable matches files case by case. Key variable is “ID.”

• Files on CD should be sorted by key variable, but as you work with files they may become unsorted.

• Code in syntax editor to sort data.SORT CASES BYID (A).

• To sort data from menu Data: Sort Cases Select “SORT IN ASCENDING ORDER” radio button.

Select “ID” and move it to the “SORT BY” box by clicking the right-facing arrow. Click "OK" or “Paste" and run code from syntax editor.

26These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 27: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• Code in syntax editor to join files.

MATCH FILES /FILE=C:\MyProjects\NLTS2\Data\n2w1parent.sav‘ /TABLE='C:\MyProjects\NLTS2\Data\n2w2paryouth.sav' /TABLE='C:\MyProjects\NLTS2\Data\n2w3paryouth.sav‘ /TABLE='C:\MyProjects\NLTS2\Data\n2w4paryouth.sav' /BY ID /KEEP=ID np1i_3a_7 np2HasPdJob np3HasJob np4HasJob.

Execute.

• Why “FILE” or “TABLE”? FILE: All cases in the file are kept. TABLE: Keeps only the cases that match those found in

“FILE.”

27These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 28: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• To join data using menu-driven options

From active file, go to menu item “Data” and select “Merge Files” and “Add Variables.”

Select file from browser window and click “Open.” Note: All variables that appear in both files will automatically be

moved to the “Excluded Variables” box. Select “ID” in “Excluded Variables” box and do the following:

• Click “Match cases on key variables in sorted file.”• Select “External file is keyed table” radio button.

– In some versions of SPSS, “Non-active dataset is keyed table.”– Keyed table keeps only those cases that match those in the active file.

• Click left-facing arrow next to move “ID” to “Key Variables” box.

28These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 29: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• To join data using menu-driven options (cont’d)

Select variables to keep or drop • Active file variables are marked with "*."• Variables from the keyed or external file are marked with "+."• To drop items, move variables to "Excluded Variables" box from the "New

Active Dataset" box by highlighting each variable and clicking the left-facing arrow

If only selecting a few variables• Exclude all those marked with "+" using click/shift click on the first and last

variables.• Click left-facing arrow to move variables to the "Excluded Variables" box.• Select each variable to keep in the "Excluded Variables" box.• Click the right-facing arrow to move to the “New Active Dataset.”

Press “OK” to run or select “Paste” to run code from syntax editor.

29These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 30: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files• To save your work using menu-driven instructions

Select “Save as” from “File” menu and give the file a name. If a new file has already been created with a new name

using the steps outlined in the subsetting data files, select “Save” under the “File” menu.

• To save your work using the syntax editor A file can be saved using either an existing or new name. To save a file with the name “MyNewFile”:

SAVE OUTFILE= 'C:\MyProjects\NLTS2\Data\MyNewFile.sav'.

30These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 31: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files

• If bringing in data from more than one file, repeat this process for each file.

• Suggestion: Name files in a meaningful way, such as By date: AnFile_29July.sav By type of analysis: PI_CrossWave.sav By source: PI_W123.sav By sequence: File_5.sav.

31These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 32: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files:Example• Joining/combining data

Combine data from another file with an existing file.

Open and sort PrScores.sav by ID. Bring in np2HasPdJob from n2w2paryouth.sav. Bring in np3HasJob from n2w3paryouth. Save the file as PrScoresEmp.sav.

32These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 33: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Joining/combining data files:Example

33These results cannot be replicated with full dataset; all outputin modules generated with a random subset of the full data.

Page 34: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Closing

• Congratulations, you have learned to Open and view a file Create a new file Reduce the size of files by specifying the

• Variables needed• Cases needed

Join files using a key variable Save files with a new name.

34

Page 35: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Closing

• Topics discussed in this module Purpose Open and view data files Limiting variables Subsetting cases Joining/combining data files

• Next module: 15a. Accessing Data: Frequencies in SPSS

35

Page 36: 14a. Accessing Data Files in SPSS ®. 1 Prerequisites Recommended modules to complete before viewing this module 1. Introduction to the NLTS2 Training

14a. Accessing Data Files in SPSS®

Important information NLTS2 website contains reports, data tables, and other

project-related information http://nlts2.org/

Information about obtaining the NLTS2 database and documentation can be found on the NCES website http://nces.ed.gov/statprog/rudman/

General information about restricted data licenses can be found on the NCES website http://nces.ed.gov/statprog/instruct.asp

E-mail address: [email protected]

36