Introduction to SAS Programming and Applications
Module 1 : THE DATA STEP (1, 2, 3)
MARK CARPENTER, Ph.D.
Slide 1-1
Keywords : DATA, INFILE, INPUT, FILENAME, DATALINES
Procedures : PRINT
Pre-Lecture Preparation: create a directory on your local hard drive called “i:\stat6110\module1”. Download the SAS programs called “Ex1_1.sas” and “Ex1_2.sas” into this directory. In the same directory, create a raw text file called ex1_1.txt and a comma-delimited file called ex1_1.csv, each consisting of the following 3 data lines:
ex1_1.txt      ex1_1.csv
1 18 92        1,18,92
2 21 88        2,21,88
3 26 98        3,26,98
SAS Programs: module1_examples1.sas and module1_examples2.sas
SAS Documentation: http://support.sas.com/documentation/onlinedoc/base/index.html#base94
• DATA step: a programming language that you use to manipulate and manage your data.
• SAS procedures: software tools for data analysis and reporting.
• Macro facility: a tool for extending and customizing SAS software programs and for reducing text in your programs.
• DATA step debugger: a programming tool that helps you find logic problems in DATA step programs.
• Output Delivery System (ODS): a system that delivers output in a variety of easy-to-access formats, such as SAS data sets, procedure output files, or Hypertext Markup Language (HTML).
• SAS windowing environment: an interactive, graphical user interface that enables you to easily run and test your SAS programs.
The title of this session is “DATA STEP Programming (1, 2, 3)”. The “1, 2, 3” refers to the three basic elements required to produce a SAS data set with the DATA step:
(1) DATA step: begins with a DATA statement and the name of the new SAS data set.
(2) Data source: when importing data, the INFILE and FILENAME statements tell SAS the location and type of the data. A SET statement is used when the source is another SAS data set.
(3) Data structure: the INPUT, INFORMAT and FORMAT statements tell SAS the structure of the data. When the source is an existing SAS data set, the INPUT, INFILE, INFORMAT and FILENAME statements are not needed, because SAS already knows the structure.
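As a minimal sketch, the three elements can be labeled directly in a short DATA step. The data set name Skeleton is hypothetical; the data are the three lines used throughout this module.

```sas
/* (1) DATA step: the DATA statement names the new data set (hypothetical name) */
DATA Skeleton;
   /* (2) Data source: INFILE says where the raw data come from;    */
   /*     DATALINES means "the lines after the DATALINES statement" */
   INFILE DATALINES;
   /* (3) Data structure: INPUT names the variables in record order; */
   /*     without a $ they are read as numeric                       */
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;
```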
What is a DATA STEP?
• The DATA step is the primary method for creating a SAS data set with Base SAS software.
• A DATA step is a group of SAS language statements that begins with a DATA statement. The group contains other programming statements that manipulate existing SAS data sets or create SAS data sets from raw data files.
• A DATA step creates a SAS data set, which can be a SAS data file or a SAS view. A SAS data file stores data values, while a SAS view stores instructions for retrieving and processing data. Because a SAS view can be used like a SAS data file in most cases, these notes use the broader term SAS data set for both.
Some DATA Step Statements (sometimes optional)
DATA Statement: Begins a DATA step and provides names for any output SAS data sets, views, or programs.
INFILE Statement: Specifies an external file to read with an INPUT statement. Can be used with or without a FILENAME statement (see below).
INPUT Statement: Describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.
FILENAME Statement: Associates a SAS fileref with an external file or an output device, disassociates a fileref and external file, or lists attributes of external files. Can be used for either importing or exporting files.
DATALINES Statement: Specifies that data lines follow in the current DATA step. Useful for small data sets, and when a program is shared with other people, because the data travel inside the program.
INFORMAT/FORMAT Statements: described below.
DATA Step when a pre-existing SAS data set is used as input:
All of the above statements are used when importing data from an external source. When you use a SAS data set as input to a DATA step, the description of the data set is already available to SAS. In your DATA step, use a SET, MERGE, MODIFY, or UPDATE statement to read the SAS data set, then use SAS programming statements to process the data and create an output SAS data set.
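A minimal sketch of reading an existing SAS data set with SET and adding a variable with a programming statement. The data set name Curved, the variable Exam1_Curved and the 5-point curve are hypothetical; Example1_1a is rebuilt from data lines so the sketch runs on its own.

```sas
/* Rebuild Example1_1a so this sketch is self-contained */
DATA Example1_1a;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

/* No INFILE or INPUT: SET takes the variable names and types from Example1_1a */
DATA Curved;
   SET Example1_1a;
   Exam1_Curved = Exam1 + 5;   /* programming statement creates a new variable */
RUN;
```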
INFORMAT and FORMAT Statements (Some DATA Step Statements, cont.)
Data and the resulting variables come in many types: character, date, numeric, scientific notation. We refer to the format of incoming data from external sources as INFORMATS and the format of variables in SAS data sets as FORMATS. Sometimes it is necessary to specify these during DATA step processing.
INFORMAT Statement: tells SAS how to read incoming raw values during a DATA step. For example, incoming data may contain characters that SAS uses for other purposes ($, &, etc.), and dates may contain slashes or month names spelled out. The optional INFORMAT statement tells SAS what to expect for the incoming variables. Note: the format of the resulting data set does not necessarily reflect the incoming format.
FORMAT Statement: specifies how variable values are displayed in the data set produced by a DATA step. For example, a date might come in (INFORMAT) in the form February 15, 1963, but once the SAS data set is produced (the DATA step is completed), it can be displayed as 02/15/63. Internally, SAS stores a date as a number (days since January 1, 1960), so only its display changes.
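A minimal sketch of the read-versus-display distinction. To keep list input simple, it assumes the incoming date is written as 15FEB1963 (read with the DATE9. informat) rather than February 15, 1963; MMDDYY8. then displays it as 02/15/63. The data set name Birthdays and the variable BirthDate are hypothetical.

```sas
DATA Birthdays;
   INFORMAT BirthDate DATE9.;     /* how to READ the raw value: 15FEB1963 */
   FORMAT   BirthDate MMDDYY8.;   /* how to DISPLAY the stored value: 02/15/63 */
   INPUT ID BirthDate;
   DATALINES;
1 15FEB1963
;
RUN;

/* BirthDate is stored as a number; PROC PRINT shows it through the FORMAT */
PROC PRINT DATA=Birthdays;
RUN;
```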
Example 1.1: Simple DATA steps to import a raw data set containing 3 data lines and 3 variables. To demonstrate different methods, this is done with 7 different DATA steps, Example1_1a-g, as described below:
Example1_1a: imports from data lines using the DATALINES statement.
Example1_1b: imports from a raw external data file called “ex1_1.txt”.
Example1_1c: same as above, but the external file is located at a URL.
Example1_1d: same as above, but adds a FILENAME statement to create a fileref for the URL.
Example1_1e: uses FILENAME with a file on the local hard drive.
Example1_1f: uses a SET statement to create a SAS data set from an existing SAS data set.
Example1_1g: uses a SET statement to combine (concatenate) several existing SAS data sets into one.
Example 1.1.a: Simple DATA Step Using DATALINES
The following DATA step creates a SAS data set called “Example1_1a”. This data set contains three observations of three numeric variables, ID, Age and Exam1.
DATA Example1_1a;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
These lines make up the DATA step that produces the SAS data set called “Example1_1a”.
DATA Example1_1a; begins a DATA step and provides the name of the output SAS data set.
INFILE DATALINES; INFILE usually specifies an external file to read with an INPUT statement, but here it specifies the special case of data lines within the DATA step.
INPUT ID Age Exam1; describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.
DATALINES; specifies that data lines follow in the current DATA step.
The data lines (1 18 92, 2 21 88, 3 26 98) are the actual data to be placed in the final data set. The semicolon that ends the data must be on the line immediately following the last data line.
Note: if the data are contained in a file external to SAS or in an existing SAS data set, the DATALINES statement and data lines are not needed.
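PRINT is the procedure listed for this module; a minimal sketch of using it to check the result of Example 1.1.a. The TITLE text is illustrative only.

```sas
DATA Example1_1a;
   INFILE DATALINES;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;

/* List the observations to confirm the DATA step read all three lines */
PROC PRINT DATA=Example1_1a;
   TITLE 'Example1_1a: data read from DATALINES';
RUN;
```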
Example 1.1.b: Simple DATA Step from an External Data Source
The following DATA step creates a SAS data set called “Example1_1b”. It produces a data set identical to Example1_1a, but it reads the data from a text file located on the local hard drive.
DATA Example1_1b;
   INFILE 'i:\stat6110\module1\ex1_1.txt';
   INPUT ID Age Exam1;
RUN;
INFILE 'i:\stat6110\module1\ex1_1.txt'; the full path name to the file on the hard drive is in quotes.
Note: either single quotes or double quotes will work here. With single quotes, SAS treats the string literally and ignores special characters. With double quotes, SAS acts on special characters it finds in the string, for example & for a macro variable reference.
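A minimal sketch of the difference, assuming a hypothetical macro variable Dir that holds the folder from the pre-lecture preparation. PUT writes both strings to the log: the single-quoted one still contains &Dir, while the double-quoted one contains the resolved path.

```sas
/* Hypothetical macro variable holding the module folder */
%LET Dir = i:\stat6110\module1;

DATA _NULL_;
   Literal  = '&Dir\ex1_1.txt';   /* single quotes: &Dir is kept exactly as typed */
   Resolved = "&Dir\ex1_1.txt";   /* double quotes: &Dir is replaced by its value */
   PUT Literal= / Resolved=;
RUN;
```

The double-quoted form can then be used directly in a statement such as INFILE "&Dir\ex1_1.txt";.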
RUN; is not strictly necessary, but when SAS reads this statement it officially ends the DATA step statements and SAS begins to process the step. Without RUN, the step is processed when SAS reaches the next DATA or PROC statement.
Example 1.1.c: Simple DATA Step from a URL
The following DATA step creates a SAS data set called “Example1_1c”. It produces a data set identical to Example1_1a and b, but it reads the data from a text file located at the indicated URL.
DATA Example1_1c;
   INFILE 'http://www.auburn.edu/~carpedm/courses/notes/module1/ex1_1.txt'
          DEVICE=URL;
   INPUT ID Age Exam1;
RUN;
Notice that DEVICE=URL goes after the quoted string that points SAS to the data file.
Example 1.1.d: Simple DATA Step from a URL Using FILENAME
The following DATA step creates a SAS data set called “Example1_1d”. It produces a data set identical to Example1_1a, b and c, but it reads the data from a text file using the fileref created with the FILENAME statement. The FILENAME statement associates the name “FromWeb” with the file located at the indicated URL. Note: SAS must be told that the file is reached through a URL by including the keyword URL in the statement. By default, SAS assumes the file is located on a local or virtual hard drive.
FILENAME FromWeb URL 'http://www.auburn.edu/~carpedm/courses/stat6110/notes/module1/ex1_1.txt';
DATA Example1_1d;
   INFILE FromWeb;
   INPUT ID Age Exam1;
RUN;
Example 1.1.e: Using FILENAME for the Local Hard Drive
The following DATA step creates a SAS data set called “Example1_1e”. It produces a data set identical to Example1_1a, b, c and d, but it reads the data from a text file located on the hard drive at the path indicated in quotes. The DATA step uses the fileref “FromHD” created by the preceding FILENAME statement. Note: because the file is on a local hard drive, SAS does not need to be told about a special device type such as URL, as in the previous example.
FILENAME FromHD 'i:\stat6110\module1\ex1_1.txt';
DATA Example1_1e;
   INFILE FromHD;
   INPUT ID Age Exam1;
RUN;
Example 1.1.f & g: Simple DATA Step from an Existing SAS Data Set
The following DATA step creates a SAS data set called “Example1_1f” from the existing SAS data set Example1_1e using the SET statement. Example1_1g demonstrates how several SAS data sets can be combined (concatenated) by placing a list of SAS data sets in the SET statement.
DATA Example1_1f;
   SET Example1_1e;
RUN;
DATA Example1_1g;
   SET Example1_1a Example1_1b Example1_1c;
RUN;
The SET and MERGE statements will be covered in greater detail in Module 2: Combining and Sorting SAS Data Sets.
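A minimal, self-contained sketch of concatenation with SET, assuming two small hypothetical data sets Part1 and Part2 built from data lines (INFILE DATALINES is optional when DATALINES is present). The output holds all observations of Part1 followed by all observations of Part2.

```sas
DATA Part1;
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
;
RUN;

DATA Part2;
   INPUT ID Age Exam1;
   DATALINES;
3 26 98
;
RUN;

/* Observations are read from Part1 first, then from Part2 */
DATA Stacked;
   SET Part1 Part2;
RUN;

PROC PRINT DATA=Stacked;
RUN;
```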
DATA Statement Syntax: the SAS documentation gives several forms (1-6) of the DATA statement syntax for different situations, but we look only at the first form here (_null_ data sets, data views, stored programs, passwords, etc., will be discussed later).
Form 1: DATA <data-set-name-1 <(data-set-options-1)>> <...data-set-name-n <(data-set-options-n)>> </ <DEBUG> <NESTING> <STACK=stack-size>> <NOLIST>;
As Form 1 indicates, several data sets can be produced during one DATA step. To keep it simple for now, we examine the Revised Form 1 syntax:
Revised Form 1: DATA <data-set-name <(data-set-options)>>;
(data-set-options) specifies optional arguments that the DATA step applies when it writes observations to the output.
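A minimal sketch of Form 1 with data set options, assuming hypothetical output names Scores and Ages. Both data sets are listed in one DATA statement, and the KEEP= data set option controls which variables each one receives; without an explicit OUTPUT statement, every observation is written to both.

```sas
DATA Scores (KEEP=ID Exam1)   /* Scores gets only ID and Exam1 */
     Ages   (KEEP=ID Age);    /* Ages gets only ID and Age     */
   INPUT ID Age Exam1;
   DATALINES;
1 18 92
2 21 88
3 26 98
;
RUN;
```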
INFILE Statement
Specifies an external file to read with an INPUT statement.
Valid in: DATA step
Category: File-handling
Type: Executable
Operating environment: the INFILE statement contains operating environment-specific material. See the SAS documentation for your operating environment before using this statement.
See: INFILE Statement under Windows, UNIX, and z/OS
Syntax: INFILE file-specification <device-type> <options>; INFILE DBMS-specifications;
file-specification: identifies the source of the data, such as a quoted file location, a fileref, or DATALINES.
device-type: specifies the type of device or the access method that is used if the fileref points to an input or output device or location that is not a physical file: FTP, URL, socket, etc.
options: for example, DELIMITER=',' (or DLM=',') for comma-delimited data; there are many others.
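A minimal sketch of one more INFILE option, assuming a comma-delimited record with a missing value (the second line, 2,,88, is a hypothetical variation on the module data). DSD makes the comma the default delimiter and treats two commas in a row as a missing value. With DLM=',' alone, consecutive commas count as a single delimiter, so 88 would be read into Age and SAS would move to the next line for Exam1.

```sas
DATA CsvDemo;                /* hypothetical data set name */
   INFILE DATALINES DSD;     /* DSD: comma delimiter; ",," means a missing value */
   INPUT ID Age Exam1;
   DATALINES;
1,18,92
2,,88
3,26,98
;
RUN;
```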
INPUT Statement
Describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.
Valid in: DATA step
Category: File-handling
Type: Executable
Syntax: INPUT <specification(s)> <@ | @@>;
specification(s): a variable or list of variables, along with informats. The default is numeric (BEST12.).
$: specifies to store the variable value as a character value rather than as a numeric value. Tip: if the variable was previously defined as character, the $ is not necessary.
@ | @@: line-hold specifiers. A single trailing @ holds the input record for another INPUT statement in the same iteration of the DATA step; a double trailing @@ holds it across iterations, so several observations can be read from one data line.
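A minimal sketch of the $ modifier and the double trailing @@, using the module's three records placed on a single line. The data set name OneLine is hypothetical, and storing ID as character is an illustrative choice. @@ keeps SAS on the same line across iterations, so one line yields three observations.

```sas
DATA OneLine;                    /* hypothetical data set name */
   INPUT ID $ Age Exam1 @@;      /* $: ID stored as character; @@: hold the line */
   DATALINES;
1 18 92 2 21 88 3 26 98
;
RUN;
```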
FILENAME Statement
Associates a SAS fileref with an external file or an output device, disassociates a fileref and external file, or lists attributes of external files.
Valid in: Anywhere
Category: Data Access
See: FILENAME Statement under Windows, UNIX, and z/OS
Syntax:
FILENAME fileref <device-type> 'external-file' <ENCODING='encoding-value'> <options> <operating-environment-options>;
Simplified form used in this module:
FILENAME fileref <device-type> 'external-file';
fileref: any SAS name that you use when you assign a new fileref.
Tip: the association between a fileref and an external file lasts only for the duration of the SAS session, or until you change it or discontinue it by using another FILENAME statement. You can change the fileref for a file as often as you want.
device-type: specifies the type of device or the access method that is used if the fileref points to an input or output device or location that is not a physical file. Ex: URL, FTP, etc.
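A minimal sketch of the three uses of FILENAME named above (associate, list attributes, disassociate), reusing the FromHD fileref and path from Example 1.1.e. LIST writes the fileref's attributes to the log, and CLEAR ends the association before the session does.

```sas
FILENAME FromHD 'i:\stat6110\module1\ex1_1.txt';   /* associate fileref and file */
FILENAME FromHD LIST;                              /* list its attributes in the log */
FILENAME FromHD CLEAR;                             /* disassociate */
```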
Example 1_2 a-g: repeats of Example 1_1 a-g, but the files are comma delimited.
The following DATA step creates a SAS data set called “Example1_2a”. This data set contains three observations of three numeric variables, ID, Age and Exam1.
DATA Example1_2a;
   INFILE DATALINES DELIMITER=',';
   INPUT ID Age Exam1;
   DATALINES;
1,18,92
2,21,88
3,26,98
;
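As a sketch of the next step in the series, the comma-delimited file ex1_1.csv from the pre-lecture preparation can be read the same way as Example1_1b by adding the DLM= option (an alias of DELIMITER=). The name Example1_2b is an assumption following the module's naming pattern, not the course program itself.

```sas
DATA Example1_2b;
   INFILE 'i:\stat6110\module1\ex1_1.csv' DLM=',';   /* comma-delimited external file */
   INPUT ID Age Exam1;
RUN;
```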