7/27/2019 SAS Short Course Presentation 11-4-09
November 4, 2009
Introduction to SAS
LISA Short Course Series
Mark Seiss, Dept. of Statistics
Reference Material
The Little SAS Book, Delwiche and Slaughter
SAS Programming I: Essentials
SAS Programming II: Manipulating Data with the DATA Step
Presentation and Data
http://www.lisa.stat.vt.edu/?q=node/167
Presentation Outline
1. Introduction to the SAS Environment
2. Working With SAS Data Sets
3. Summary Procedures
4. Basic Statistical Analysis Procedures
Presentation Outline
Questions/Comments
Introduction to the SAS Environment
1. SAS Programs
2. SAS Data Sets and Data Libraries
3. Creating SAS Data Sets
SAS Programs
File extension: .sas
Editor window has four uses:
- Access and edit existing SAS programs
- Write new SAS programs
- Submit SAS programs for execution
- Save SAS programs
SAS program: a sequence of steps that the user submits for execution
Submitting SAS programs
- Entire program
- Selection of the program
SAS Programs
Syntax Rules for SAS statements
- Free-format: can use upper or lower case
- Usually begin with an identifying keyword
- Can span multiple lines
- Always end with a semicolon
- Multiple statements can be on the same line
Errors
- Misspelled keywords
- Missing or invalid punctuation (a missing semicolon is common)
- Invalid options
- Indicated in the Log window
SAS Programs
2 Basic steps in SAS programs:
Data Steps
- Typically used to create SAS data sets and manipulate data
- Begin with a DATA statement
Proc Steps
- Typically used to process SAS data sets
- Begin with a PROC statement
The end of a data or proc step is indicated by:
- RUN statement: most steps
- QUIT statement: some steps
- Beginning of another step (DATA or PROC statement)
SAS Programs
Output generated from a SAS program: 2 windows
SAS log
- Information about the processing of the SAS program
- Includes any warnings or error messages
- Accumulated in the order the data and procedure steps are submitted
SAS output
- Reports generated by the SAS procedures
- Accumulates output in the order it is generated
SAS Data Sets and Data Libraries
SAS Data Set
- Specifically structured file that contains data values
- File extension: .sas7bdat
- Rows and columns format, similar to Excel
- Columns: variables in the table corresponding to fields of data
- Rows: a single record or observation
Two types of variables
- Character: can contain any value (letters, numbers, symbols, etc.)
- Numeric: floating point numbers
Located in SAS Data Libraries
SAS Data Sets and Data Libraries
SAS Data Libraries
- Contain SAS data sets
- Identified by assigning a library reference name (libref)
Temporary
- Work library
- SAS data files are deleted when the session ends
- Library reference name not necessary
Permanent
- SAS data sets are saved after the session ends
- SASUSER library
- You can create and access your own libraries
SAS Data Sets and Data Libraries
SAS Data Libraries cont.
Assigning library references
Syntax
LIBNAME libref 'SAS-data-library';
Rules for Library References
- 8 characters or fewer
- Must begin with a letter or underscore
- Other characters are letters, numbers, or underscores
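As a minimal sketch of assigning and using a library reference: the libref mylib and the folder path below are hypothetical, and sashelp.class is assumed to be available (it is a sample data set shipped with most SAS installations).

```sas
/* Hypothetical folder: replace with a directory that exists on your machine */
LIBNAME mylib 'C:\sas_course\data';

/* Copy a sample data set into the permanent library */
DATA mylib.class_copy;
    SET sashelp.class;
RUN;
```

After this runs, class_copy.sas7bdat should remain in that folder once the session ends, unlike data sets stored in the Work library.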
SAS Data Sets and Data Libraries
SAS Data Libraries cont.
Identifying SAS data sets within SAS Data Libraries
libref.filename
Accessing SAS data sets within SAS Data Libraries
Example: DATA new_data_set;
set libref.filename;
run;
Creating SAS data sets within SAS Data Libraries
Example: DATA libref.filename;
set old_data_set;
run;
Creating SAS Data Sets
Creating a SAS data set from raw data: 4 methods
1. Importing existing raw data in SAS program
2. Manually entering raw data in SAS program
3. Importing existing data sets using Import menu option
4. Manually entering raw data using Table Editor
Creating SAS Data Sets
Importing existing raw data in SAS program
1. Start the Data step and name the SAS data set to be created
(include the SAS Data library it is to be stored in)
DATA libref.SAS-data-set;
2. Identify the file that contains the raw data (.dat file)
INFILE 'raw-data-filename';
3. Provide instructions on how to read data from the raw data file
INPUT input-specifications;
Creating SAS Data Sets
Input Specifications
- Specify the names of the SAS variables in the new data set
- Specify whether the SAS variables are character or numeric
- Identify the locations of the variables in the raw data file
List Input
Column Input
Formatted Input
Mixed Input
Creating SAS Data Sets
List Input
- Used when raw data is separated by spaces
- All data in a row must be read in
- All missing data must be indicated by a period
- Simple character data: no embedded spaces, no lengths greater than 8
INPUT statement
- Simply list variables after the INPUT keyword in the order they appear in the file.
- If a variable is character, place a $ after the variable name
Example: INPUT Name $ City $ Age Height Weight Sex $;
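A small sketch of list input using the INPUT statement above. The data values are made up for illustration, and the period in the second row stands for a missing Age.

```sas
DATA people;
    /* Variables are listed in the order they appear; $ marks character variables */
    INPUT Name $ City $ Age Height Weight Sex $;
    DATALINES;
Mark Boston 34 70 180 M
Anna Denver . 64 125 F
;
RUN;

PROC PRINT DATA=people;
RUN;
```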
Creating SAS Data Sets
Column Input
- Used when the raw data file does not have delimiters between values (large data sets)
- Each variable's values are found in the same columns in each row
- Numeric data must be standard: numbers, decimals, signs, and scientific notation only
Advantages
- No spaces required
- Missing values left blank
- Character data can have embedded spaces
- Ability to skip unwanted variables
Creating SAS Data Sets
Column Input cont.
INPUT Statement
- Numeric variables: list the variable name, then the column or range of columns where the variable is found in the raw data file
- Character variables: list the variable name, a dollar sign, and then the column or range of columns
Example: INPUT Name $ 1-10 Age 26-28 Sex $ 35;
Creating SAS Data Sets
Formatted Input
Appropriate for reading:
- Data in fixed columns
- Standard and nonstandard character and numeric data
- Calendar values to be converted to SAS date values
Read data in using SAS informats
- An informat is an instruction that SAS uses to read in data values
General forms
- Character: $informatw.
- Numeric: informatw.d
- Date: informatw.
Creating SAS Data Sets
Formatted Input cont.
Character Informats
- $w.: character string with a width of w, trims leading blanks
- $CHARw.: character string with a width of w, does not trim leading or trailing blanks
Numeric Informats
- w.d: standard numeric data with width w and d digits after the decimal
  Raw Data Value = 1234567, informat = 8.2, SAS Data Value = 12345.67
- COMMAw.d: numeric data with embedded commas
  Raw Data Value = 1,000,001, informat = COMMA10., SAS Data Value = 1000001
Creating SAS Data Sets
Formatted Input cont.
SAS date values
- Stored as special numeric data
- Number of days between January 1, 1960 and the specified date
- Informats are used to read and convert the dates
Raw Data Value   Informat
11/04/2009       MMDDYY10.
11/04/09         MMDDYY8.
04NOV2009        DATE9.
04/11/2009       DDMMYY10.
Creating SAS Data Sets
Formatted Input cont.
Columns read are determined by the starting point and width of the informat
Example:
INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;
- Name: character of length 10, columns 1-10
- Age: numeric with length 3, columns 11-13
- Height: numeric with length 5 (including the decimal point) and one decimal place (120.9 for instance), columns 14-18
- BirthDate: date in MMDDYY form (11-04-2009 for instance), columns 19-28
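The INPUT statement above can be tried with in-line data, as in this sketch. The two records are invented and laid out so that each field falls in the stated columns (1-10, 11-13, 14-18, 19-28); the FORMAT statement is an addition so that the stored date number displays as a date.

```sas
DATA members;
    INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;
    /* Without a format, BirthDate prints as days since January 1, 1960 */
    FORMAT BirthDate DATE9.;
    DATALINES;
Mary Smith 34120.911/04/2009
Bob Lee    52 98.501/15/1957
;
RUN;

PROC PRINT DATA=members;
RUN;
```

Because the raw Height values already contain a decimal point, the d part of the 5.1 informat is not applied to them; it only matters when the raw value has no decimal point.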
Creating SAS Data Sets
Formatted Input cont.
Pointer controls
- +n moves the pointer forward n positions
- @n moves the pointer to column n
Example:
INPUT Flight 3. +4 Date mmddyy8. @20 Destination $3.;
- Flight: numeric of length 3, columns 1 through 3
- Date: date in mmddyy form (11/04/09) of length 8, columns 8 through 15
- Destination: character of length 3, columns 20 through 22
Creating SAS Data Sets
Mixed Formatted Input Styles
Mix and match the previous 3 input styles
Example:
Raw Data: Great Smoky Mountains NC/TN 1926 520,269
INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
- ParkName: character of length 22, columns 1 through 22
- State: character, separated by spaces
- Year: numeric, separated by spaces
- Acreage: numeric with informat COMMA9., starts in column 40
Creating SAS Data Sets
Manually Entering Raw Data in SAS program
1. Start the Data step and name the SAS data set to be created
DATA library.SAS-data-set;
2. Provide instructions on how to read the raw data
INPUT input-specifications;
3. Manually enter the raw data
DATALINES;
Creating SAS Data Sets
Manually Entering Raw Data in SAS program
Example:
Data uspresidents;
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
Run;
Creating SAS Data Sets
Using the Import Data menu option
1. File > Import Data
2. Standard data source: select the file format
3. Specify the file location or Browse to select the file
4. Create a name for the new SAS data set and specify its location
Creating SAS Data Sets
Compatible file formats
- Microsoft Excel Spreadsheets
- Microsoft Access Databases
- Comma-Separated Files (.csv)
- Tab-Delimited Files (.txt)
- dBASE Files (.dbf)
- JMP data sets
- SPSS Files
- Lotus Spreadsheets
- Stata Files
- Paradox Files
Creating SAS Data Sets
Enter raw data directly into a SAS data set
1. Tools > Table Editor
2. Enter data manually into the table
- Observations in each row
- Variables in each column
3. Left Click Column > Column Attributes
- Variable Name, Variable Label, Type (Character/Numeric), Format, Informat
Note: Informats determine how raw data is read. Formats determine how a variable is displayed.
4. Close window > Save Changes > Yes
Specify the file name and directory
Introduction to the SAS Environment
Questions/Comments
Working With SAS Data Sets
1. Data Set Manipulation
2. Data Set Processing
3. Combining Data Sets
A. Concatenating/Appending
B. Merging
Data Set Manipulation
Create a new SAS data set using an existing SAS data set as input
- Specify the name of the new SAS data set in the DATA statement
- Use the SET statement to identify the SAS data set being read
Syntax:
DATA output_data_set;
SET input_data_set;
<additional SAS statements>;
RUN;
By default the SET statement reads all observations and variables from the input data set into the output data set.
Data Set Manipulation
Assignment Statements
- Evaluate an expression
- Assign the resulting value to a variable
General Form: variable = expression;
Example: miles_per_hour = distance/time;
SAS Functions
- Perform arithmetic functions, compute simple statistics, manipulate dates, etc.
General Form: variable = function_name(argument1, argument2, ...);
Example: Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
Data Set Manipulation
Selecting Variables
Use DROP and KEEP to determine which variables are written to the new SAS data set.
2 Ways
DROP and KEEP as statements (no equals sign; normally only one of the two is needed)
Form: DROP Variable1 Variable2;
KEEP Variable3 Variable4 Variable5;
DROP= and KEEP= data set options in the SET statement
Form: SET input_data_set (KEEP=Var1);
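Both ways of selecting variables can be sketched side by side. The data set hours and its values are invented for illustration.

```sas
DATA hours;
    INPUT Employee $ Day1 Day2 Day3;
    DATALINES;
Ann 8 7 8
Raj 6 8 9
;
RUN;

/* Statement form: Day1-Day3 are read and used, but not written out */
DATA totals;
    SET hours;
    Time_worked = SUM(Day1, Day2, Day3);
    DROP Day1 Day2 Day3;
RUN;

/* Data set option form: only Employee is read from the input data set */
DATA names_only;
    SET hours (KEEP=Employee);
RUN;
```

The practical difference is timing: with the statement form the dropped variables are still available to calculations inside the step, while with the option on SET they never enter the step at all.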
Data Set Manipulation
Conditional Processing
Uses IF-THEN-ELSE logic
General Form: IF <expression> THEN <statement>;
ELSE IF <expression> THEN <statement>;
ELSE <statement>;
<expression> is a true/false statement, such as:
- Day1=Day2, Day1 > Day2, Day1 < Day2
- Day1+Day2=10
- Sum(day1,day2)=10
- Day1=5 and Day2=5
Data Set Manipulation
Conditional Processing
Symbolic   Mnemonic   Example
=          EQ         IF region='Spain';
~= or ^=   NE         IF region ne 'Spain';
>          GT         IF rainfall > 20;
<          LT         IF rainfall lt 20;
>=         GE         IF rainfall ge 20;
Data Set Manipulation
Conditional Processing cont.
- If <expression> is true, <statement> is processed
- ELSE IF and ELSE are only processed if <expression> is false
- Only one statement can be specified using this form
- Use DO and END statements to execute a group of statements
General Form: IF <expression> THEN DO;
<statements>;
END;
ELSE DO;
<statements>;
END;
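A short sketch of IF-THEN DO / ELSE DO, reusing the rainfall comparison from the operator examples. The data set weather, its values, and the variables Climate and Wet_flag are hypothetical.

```sas
DATA weather;
    INPUT Region $ Rainfall;
    DATALINES;
Spain 12
Norway 35
;
RUN;

DATA flagged;
    SET weather;
    LENGTH Climate $ 3;          /* declare the character length before first use */
    IF Rainfall > 20 THEN DO;    /* several statements run when the condition is true */
        Climate = 'Wet';
        Wet_flag = 1;
    END;
    ELSE DO;
        Climate = 'Dry';
        Wet_flag = 0;
    END;
RUN;
```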
Data Set Manipulation
Subsetting Rows (Observations)
We will look at two ways
- Using the IF statement
- Using the WHERE option in the SET statement
IF statement
- Only writes observations to the new data set for which an expression is true
General Form: IF <expression>;
Example: IF career = 'Teacher';
IF sex ne 'M';
In the second example, only observations where sex is not equal to M will be written to the output data set
Data Set Manipulation
Subsetting Rows (Observations) cont.
WHERE option in the SET statement
- Use the option to read only the rows from the input data set for which the expression is true
General Form: SET input_data_set (where=(<expression>));
Example: SET vacation (where=(destination='Bermuda'));
Only observations where the destination equals Bermuda will be read from the input data set
Comparison
- The resulting output data sets are equivalent
- IF statement: all rows are read from the input data set
- WHERE option: only rows where the expression is true are read from the input data set
- The difference shows up in processing time when working with big data sets
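The two subsetting approaches can be compared on a toy data set, as sketched here. The vacation records are made up.

```sas
DATA vacation;
    INPUT Traveler $ Destination $;
    DATALINES;
Ann Bermuda
Raj Aruba
Lee Bermuda
;
RUN;

/* Subsetting IF: every row is read, only matching rows are written */
DATA bermuda_if;
    SET vacation;
    IF Destination = 'Bermuda';
RUN;

/* WHERE= option: only matching rows are read in the first place */
DATA bermuda_where;
    SET vacation (WHERE=(Destination='Bermuda'));
RUN;
```

Both output data sets should hold the same two observations; the log notes for each step report how many observations were read, which is where the difference appears.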
Data Set Manipulation
PROC SORT sorts data according to specified variables
General Form: PROC SORT DATA=input_data_set <options>;
BY Variable1 Variable2;
RUN;
Sorts data according to Variable1 and then Variable2
By default, SAS sorts data in ascending order
- Numbers low to high
- A to Z
Use the DESCENDING keyword before a variable in the BY statement for numbers high to low and letters Z to A
BY City DESCENDING Population;
SAS sorts data first by City A to Z and then by Population high to low
Data Set Manipulation
Some Options
NODUPKEY
- Eliminates observations that have the same values for the BY variables
OUT=output_data_set
- By default, PROC SORT replaces the input data set with the sorted data set
- Using this option, PROC SORT creates a newly sorted data set and the input data set remains unchanged
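A sketch combining the BY statement with both options. The cities data set is invented and contains one repeated City/Population pair so that NODUPKEY has something to remove.

```sas
DATA cities;
    INPUT City $ Population;
    DATALINES;
Salem 150000
Austin 790000
Salem 41000
Austin 790000
;
RUN;

/* cities is left unchanged; the sorted, de-duplicated rows go to cities_sorted */
PROC SORT DATA=cities OUT=cities_sorted NODUPKEY;
    BY City DESCENDING Population;
RUN;
```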
Data Set Processing
Data Set Processing
DATA steps read in data from existing data sets or raw data files one row at a time, like a loop
A DATA step reads data from the input data set in the following way:
1. Read the current row from the input data set into the Program Data Vector (PDV)
2. Process the SAS statements
3. Write the PDV to the output data set
4. Set the current row to the next row in the input data set
5. Return to Step 1
One row at a time is processed
Thus we cannot simply add the value of a variable in one row to the value in another row
Data Set Processing
Data Set Processing Example
Let the following be the input data set dfwlax:
Flight Date Dest FirstClass Economy
439 14955 LAX 20 137
921 14955 DFW 15 131
114 14956 LAX 15 85
982 14956 DFW 5 196
439 14957 LAX 14 116
982 14957 DFW 20 166
Data Set Processing
Data Set Processing Example
Consider the following submitted code:
DATA onboard;
SET dfwlax;
Total=FirstClass+Economy;
IF FirstClass=20 then FirstClassFull=1;
ELSE FirstClassFull=0;
RUN;
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
Current -> SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 . .
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
Current -> Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 .
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
           Total=FirstClass+Economy;
Current -> IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
Current -> RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Data Set Processing
Data Set Processing Example: Execution of the Data Step
Current -> DATA onboard;
           SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 . .
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
Current -> SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 . .
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
Current -> Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 .
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
Current -> ELSE FirstClassFull=0;
           RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 0
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
Data Set Processing
Data Set Processing Example: Execution of the Data Step
           DATA onboard;
           SET dfwlax;
           Total=FirstClass+Economy;
           IF FirstClass=20 then FirstClassFull=1;
           ELSE FirstClassFull=0;
Current -> RUN;
PDV
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 0
Onboard
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
921 14955 DFW 15 131 146 0
Combining Data Sets
Concatenating (or Appending)
- Stacks each data set upon the other
- If one data set does not have a variable that the other data sets do, the variable in the new data set is set to missing for the observations from that data set.
General Form: DATA output_data_set;
SET data1 data2;
run;
PROC APPEND may also be used
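A sketch of concatenation with the SET statement, using two made-up data sets that do not share all variables (PROC APPEND is not shown).

```sas
DATA data1;
    INPUT ID Score;
    DATALINES;
1 90
2 85
;
RUN;

DATA data2;
    INPUT ID Grade $;
    DATALINES;
3 A
;
RUN;

/* Rows of data1 come first, then rows of data2 */
DATA stacked;
    SET data1 data2;
RUN;

PROC PRINT DATA=stacked;
RUN;
```

In the printed result, Grade should be blank for the rows that came from data1 and Score should be missing (.) for the row from data2.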
Combining Data Sets
Merging Data Sets
One-to-One Match Merge
- A single record in a data set corresponds to a single record in all other data sets
- Example: Patient and Billing Information
One-to-Many Match Merge
- Matching one observation from one data set to multiple observations in other data sets
- Example: County and State Information
Note: Data must be sorted before merging can be done (PROC SORT)
Combining Data Sets
One-to-One Match Merge
- Usually need at least one common variable between data sets for matching purposes
- For the example, a patient ID would be needed
- Do not need a common variable if all data sets are in exactly the same order
General Form: DATA output_data_set;
MERGE input_data_set1 input_data_set2;
BY variable1 variable2;
RUN;
Combining Data Sets
One-to-One Match Merge Example:
Performance
Month Sales
1 8223
2 6034
3 4220
Goals
Month Goal
1 9000
2 6000
3 5000
Code:
DATA compare;
MERGE performance goals;
BY month;
difference=sales-goal;
RUN;
Combining Data Sets
One-to-One Match Merge Example cont.:
Compare
Month Sales Goal Difference
1 8223 9000 -777
2 6034 6000 34
3 4220 5000 -780
Combining Data Sets
One-to-Many Match Merge
- Requires at least one common variable in the data sets for matching purposes
- For the example, State information is in both the state and county files
- If two data sets have variables with the same name, the variable in the second data set will overwrite the variable in the first.
General Form: DATA output_data_set;
MERGE Data1 Data2 Data3;
BY Variable1 Variable2;
RUN;
Combining Data Sets
One-to-Many Match Merge Example:
Videos
Category Sales
Aerobics 12.99
Aerobics 13.99
Aerobics 13.99
Step 12.99
Step 12.99
Weights 15.99
Adjustment
Category Adjustment
Aerobics .20
Step .30
Weights .25
Code:
DATA prices;
MERGE videos adjustment;
BY category;
NewPrice=(1-adjustment)*sales;
RUN;
Combining Data Sets
One-to-Many Match Merge Example cont.:
Prices
Category Sales Adjustment NewPrice
Aerobics 12.99 .20 10.39
Aerobics 13.99 .20 11.19
Aerobics 13.99 .20 11.19
Step 12.99 .30 9.09
Step 12.99 .30 9.09
Weights 15.99 .25 11.99
Working With SAS Data Sets
Questions/Comments
Summary Procedures
1. Print Procedure
2. Plot Procedure
3. Univariate Procedure
4. Means Procedure
5. Freq Procedure
Print Procedure
PROC PRINT is used to print data to the output window
By default, prints all observations and variables in the SAS data set
General Form: PROC PRINT DATA=input_data_set <options>;
<optional SAS statements>;
RUN;
Some Options
- input_data_set (obs=n): specifies the number of observations to be printed in the output
- NOOBS: suppresses printing of the observation number
- LABEL: prints the labels instead of the variable names
Print Procedure
Optional SAS statements
BY variable1 variable2 variable3;
- Starts a new section of output for every new value of the BY variables
ID variable1 variable2 variable3;
- Prints ID variables on the left-hand side of the page and suppresses the printing of the observation numbers
SUM variable1 variable2 variable3;
- Prints the sum of the listed variables at the bottom of the output
VAR variable1 variable2 variable3;
- Prints only the listed variables in the output
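A sketch that puts several of these PROC PRINT options and statements together. It assumes the sample data set sashelp.class (variables Name, Sex, Age, Height, Weight) is available, as it is in most SAS installations; class_sorted is a hypothetical name. The sort comes first because BY-group processing expects sorted data.

```sas
PROC SORT DATA=sashelp.class OUT=class_sorted;
    BY Sex;
RUN;

/* First 10 observations, no observation numbers, one section per value of Sex */
PROC PRINT DATA=class_sorted (OBS=10) NOOBS LABEL;
    BY Sex;
    VAR Name Age Height;
    SUM Height;
RUN;
```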
Plot Procedure
Used to create basic scatter plots of the data
Use PROC GPLOT or PROC SGPLOT for more sophisticated plots
General Form: PROC PLOT DATA=input_data_set;
PLOT vertical_variable * horizontal_variable / <options>;
RUN;
By default, SAS uses letters to mark points on plots
- A for a single observation, B for two observations at the same point, etc.
To specify a different character to represent a point
PLOT vertical_variable * horizontal_variable = '*';
Plot Procedure
To specify a third variable to use to mark points
PLOT vertical_variable * horizontal_variable = third_variable;
To plot more than one variable on the vertical axis
PLOT vertical_variable1 * horizontal_variable = '2'
vertical_variable2 * horizontal_variable = '1' / OVERLAY;
Univariate Procedure
PROC UNIVARIATE is used to examine the distribution of data
- Produces summary statistics for a single variable
- Includes mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.
General Form: PROC UNIVARIATE DATA=input_data_set <options>;
VAR variable1 variable2 variable3;
RUN;
If the VAR statement is not used, summary statistics will be produced for all numeric variables in the input data set.
Univariate Procedure
Options include:
- PLOT: produces a stem-and-leaf plot, box plot, and normal probability plot
- NORMAL: produces tests of normality
Means Procedure
Similar to the Univariate procedure
General Form: PROC MEANS DATA=input_data_set <options>;
<optional SAS statements>;
RUN;
With no options or optional SAS statements, the Means procedure will print the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables in the input data set
Means Procedure Options
Statistics Available
Note: The default confidence level for confidence limits is 95% (ALPHA=0.05). Use the ALPHA= option to specify a different alpha level.
CLM              Two-Sided Confidence Limits    RANGE      Range
CSS              Corrected Sum of Squares       SKEWNESS   Skewness
CV               Coefficient of Variation       STDDEV     Standard Deviation
KURTOSIS         Kurtosis                       STDERR     Standard Error of Mean
LCLM             Lower Confidence Limit         SUM        Sum
MAX              Maximum Value                  SUMWGT     Sum of Weight Variables
MEAN             Mean                           UCLM       Upper Confidence Limit
MIN              Minimum Value                  USS        Uncorrected Sum of Squares
N                Number Non-missing Values      VAR        Variance
NMISS            Number Missing Values          PROBT      Probability for Student's t
MEDIAN (or P50)  Median                         T          Student's t
Q1 (P25)         25% Quantile                   Q3 (P75)   75% Quantile
P1               1% Quantile                    P5         5% Quantile
P10              10% Quantile                   P90        90% Quantile
P95              95% Quantile                   P99        99% Quantile
Means Procedure
Optional SAS Statements
VAR Variable1 Variable2;
- Specifies the numeric variables for which statistics will be produced
BY Variable1 Variable2;
- Calculates statistics for each combination of the BY variables
OUTPUT OUT=output_data_set;
- Creates a data set with the default statistics
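A sketch of PROC MEANS with a few requested statistics and the optional statements above. It assumes the sample data set sashelp.class is available; class_sorted and class_stats are hypothetical names, and ALPHA=0.10 is chosen only to show a non-default level (90% confidence limits).

```sas
PROC SORT DATA=sashelp.class OUT=class_sorted;
    BY Sex;
RUN;

PROC MEANS DATA=class_sorted N MEAN MEDIAN CLM ALPHA=0.10;
    VAR Height Weight;
    BY Sex;
    OUTPUT OUT=class_stats;   /* default statistics are written to class_stats */
RUN;
```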
FREQ Procedure
PROC FREQ is used to generate frequency tables
The most common usage is to create a table showing the distribution of categorical variables
General Form: PROC FREQ DATA=input_data_set;
TABLE variable1*variable2*variable3 / <options>;
RUN;
Options
- LIST: prints cross tabulations in list format rather than grid
- MISSING: specifies that missing values should be included in the tabulations
- OUT=output_data_set: creates a data set containing frequencies, in list format
- NOPRINT: suppresses printing in the output window
Use a BY statement to get percentages within each category of a variable
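A sketch of a two-way frequency table with some of these options, assuming the sample data set sashelp.class is available; sex_age_counts is a hypothetical output name.

```sas
PROC FREQ DATA=sashelp.class;
    /* Cross tabulation of Sex by Age, printed as a list and saved to a data set */
    TABLE Sex*Age / LIST MISSING OUT=sex_age_counts;
RUN;
```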
Summary Procedures
Questions/Comments
Statistical Analysis Procedures
1. Correlation: PROC CORR
2. Regression: PROC REG
3. Analysis of Variance: PROC ANOVA
4. Chi-square Test of Association: PROC FREQ
5. Generalized Linear Models: PROC GENMOD
CORR Procedure
PROC CORR is used to calculate the correlations between variables
The correlation coefficient measures the linear relationship between two variables
- Values range from -1 to 1
- Negative correlation: as one variable increases the other decreases
- Positive correlation: as one variable increases the other increases
- 0: no linear relationship between the two variables
- 1: perfect positive linear relationship
- -1: perfect negative linear relationship
General Form: PROC CORR DATA=input_data_set <options>;
VAR Variable1 Variable2;
WITH Variable3;
RUN;
CORR Procedure
If the VAR and WITH statements are not used, correlation is computed for all pairs of numeric variables
Options include
- SPEARMAN: computes Spearman's rank correlations
- KENDALL: computes Kendall's tau coefficients
- HOEFFDING: computes Hoeffding's D statistic
REG Procedure
PROC REG is used to fit linear regression models by least squares estimation
- One of many SAS procedures that can perform regression analysis
- Only continuous independent variables (use GENMOD for categorical variables)
General Form:
PROC REG DATA=input_data_set <options>;
MODEL dependent=independent1 independent2 / <options>;
<optional statements>;
RUN;
PROC REG statement options include
- PCOMIT=m: performs principal component estimation with m principal components
- CORR: displays the correlation matrix for the variables in the model
REG Procedure
MODEL statement options include
SELECTION=
- Specifies that a model selection procedure be conducted: FORWARD, BACKWARD, and STEPWISE
ADJRSQ: computes the adjusted R-square
MSE: computes the mean square error
COLLIN: performs collinearity analysis
CLB: computes confidence limits for parameter estimates
ALPHA=
- Sets the significance level for confidence and prediction intervals and tests
REG Procedure
Optional statements include
PLOT Dependent*Independent1: generates a plot of the data
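A sketch of a PROC REG call using a few of the MODEL statement options listed for this procedure. It assumes the sample data set sashelp.class is available; regressing Weight on Height and Age is only an illustration, not an analysis from the course.

```sas
PROC REG DATA=sashelp.class;
    /* CLB: confidence limits for the estimates; COLLIN: collinearity diagnostics */
    MODEL Weight = Height Age / CLB COLLIN ALPHA=0.10;
RUN;
QUIT;
```

PROC REG is interactive and stays active after RUN, so the QUIT statement is what ends the step here.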
ANOVA Procedure
PROC ANOVA performs analysis of variance
- Designed for balanced data (PROC GLM is used for unbalanced data)
- Can handle nested and crossed effects and repeated measures
General Form: PROC ANOVA DATA=input_data_set <options>;
CLASS independent1 independent2;
MODEL dependent=independent1 independent2;
<optional statements>;
RUN;
The CLASS statement must come before the MODEL statement; it is used to define classification variables
ANOVA Procedure
Useful PROC ANOVA statement option: OUTSTAT=output_data_set
- Generates an output data set that contains sums of squares, degrees of freedom, statistics, and p-values for each effect in the model
Useful optional statement: MEANS independent1 / <option>;
- Used to perform multiple comparisons analysis
- Set <option> to:
  TUKEY: Tukey's studentized range test
  BON: Bonferroni t test
  T: pairwise t tests
  DUNCAN: Duncan's multiple-range test
  SCHEFFE: Scheffe's multiple comparison procedure
FREQ Procedure
PROC FREQ can also be used to perform analysis with categorical data
General Form: PROC FREQ DATA=input_data_set;
TABLE variable1*variable2 / <options>;
RUN;
TABLE statement options include:
- AGREE: tests and measures of classification agreement including McNemar's test, Bowker's test, Cochran's Q test, and Kappa statistics
- CHISQ: chi-square test of homogeneity and measures of association
- MEASURES: measures of association including Pearson and Spearman correlation, gamma, Kendall's tau, Stuart's tau, Somers' D, lambda, odds ratios, risk ratios, and confidence intervals
GENMOD Procedure
PROC GENMOD is used to estimate linear models in which the response is not necessarily normal
Logistic and Poisson regression are examples of generalized linear models
General Form:
PROC GENMOD DATA=input_data_set;
CLASS independent1;
MODEL dependent = independent1 independent2 /
dist=<distribution>
link=<link function>;
run;
GENMOD Procedure
DIST=: specifies the distribution of the response variable
LINK=: specifies the link function from the linear predictor to the mean of the response
Example: Logistic Regression
DIST = binomial
LINK = logit
Example: Poisson Regression
DIST = poisson
LINK = log
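A sketch of the logistic regression case using grouped data. The data set dose and its counts are invented for illustration; the events/trials form of the response is one way PROC GENMOD accepts binomial data.

```sas
DATA dose;
    /* Events = number of responders out of Trials subjects at each dose level */
    INPUT Dose Events Trials;
    DATALINES;
1 2 20
2 5 20
3 9 20
4 14 20
;
RUN;

PROC GENMOD DATA=dose;
    MODEL Events/Trials = Dose / DIST=binomial LINK=logit;
RUN;
```

For a Poisson regression the same pattern would apply with a count response and DIST=poisson LINK=log.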
Statistical Analysis Procedures
Questions/Comments