
Page 1: Introduction to SAS Statistical Package - Departments · 1 Introduction to SAS Statistical Package Biostatistics 140.605.11 Lecture 1 2 Instructor: Lucy Meoni lmeoni@jhsph.edu Teaching

Introduction to SAS Statistical Package

Biostatistics 140.605.11
Lecture 1

Instructor: Lucy Meoni, lmeoni@jhsph.edu
Teaching Assistant: Sorina Eftim, [email protected]
Lecture/Lab: Room 3025

WEB site: www.biostat.jhsph.edu/bstcourse/bio632/default.htm

e-mail: [email protected] to submit exercises

Using the PC labs
SAS Version 9.0

• requires basic Windows skills
• set up a class folder on your media (thumb drive)
• download files from the website for lab into the class folder
• bring thumb drive to class

Text: ‘The Little SAS Book, 3rd edition’

Other References:

SAS online documentation

Online tutor

SAS system help

Many, many SAS manuals

SAS website www.sas.com

WHAT IS SAS?

Integrated system of software products

• began as a software package for statistical analysis
• data management
• reporting and graphics
• analytics
• etc.

COURSE OBJECTIVES

• to introduce and develop skills in SAS, a statistical package used in research data analysis
• to develop the skills necessary to create and modify a SAS data set and perform statistical analyses

COURSE TOPIC SEQUENCE

• introduction to SAS
• data definition and restructuring
• dates and functions
• file combination
• arrays and loops
• statistical procedures

REFERENCE

The Little SAS Book, Chapters 1 and 2

WEB site: www.biostat.jhsph.edu/bstcourse/bio632/default.htm

TOPICS:

• Introduction to SAS
• SAS windowing environment
• SAS tables (data sets)
• SAS libraries
• Temporary vs permanent files
• SAS programs
• Creating SAS tables
  − IMPORT wizard
  − StatTransfer

SAS

• use statements to write a series of instructions called a SAS program
• not command-line driven (unlike STATA)
• statements are written using the SAS language (a programming language that you use to manage your data)
• sequence of statements executed in order
• SAS procedures are software tools for data analysis and reporting

Introduction to SAS

• describe windowing environment
• introduce the HELP menu
• describe SAS data sets and libraries

SAS Windowing Environment

• windowing system for editing and executing SAS programs
• interactive full screen
• collection of windows for editing programs, executing programs and displaying results
• five basic SAS windows

Windowing Mode

Windowing mode is a facility that enables you to enter and execute SAS programs and view the results in an interactive environment. An interactive environment permits the program to be processed immediately when submitted for execution.

Navigating the SAS Windowing Environment

Starting SAS

• Start SAS from the START button, PROGRAMS, The SAS System for Windows V9 system

• interactive full screen

• Enhanced Editor window, LOG window, and Explorer windows appear

• activate window by clicking within the window or by using the task bar

SAS Windowing Environment

Interactive windows enable you to interface with SAS.

The SAS windowing environment is made up of a collection of windows.

There are three primary windows in the windowing environment.

Commands are used to navigate among the various windows of the SAS windowing environment and are used to execute a program.

Depending upon the operating environment, commands can be issued by

• selecting from a pull-down menu
• typing the command
• right-clicking in a window to bring up a menu
• clicking on a tool button
• using function keys (F1 - F12)

The Enhanced Program Editor window enables SAS program code to be

• entered from the keyboard OR
• read in from a file using the File, Open menu
• submitted for execution.

SAS program elements are color-coded, including procedures, keywords, numeric and string constants, and undefined keywords.

ENHANCED PROGRAM EDITOR

• text editor
• write and edit programs
• File, Open on menu to read in an existing SAS program file (lecture1.sas)
• submit programs (use SUBMIT icon)
• save program statements to a file with extension .sas

ENHANCED PROGRAM EDITOR

• initial Editor window is Editor-untitled1
• an asterisk (*) appears in the title bar to indicate the file has not been saved
• if you open a file or save contents to a file, the title changes to the file name and the asterisk disappears
• if the contents of the window are modified, the asterisk reappears to indicate the changes have not been saved

ENHANCED PROGRAM EDITOR

• an ASCII editor that uses visual aids to help you write and debug your SAS programs
• SAS program elements are color-coded, including procedures, keywords, dates, numeric and string constants and more
• multiple windows possible
• open a new window by clicking the NEW icon on the Toolbar or selecting EDIT, Clear All from the menu bar

The Log window displays

• the SAS program code submitted for execution

• messages about the SAS session; indicates the status of the program compilation and execution.

• New log is appended to the last log in window

LOG Window

• contains the compilation and execution results of DATA and PROC steps
• contains submitted program statements
• messages from SAS about compilation and execution: notes, warnings, errors
• save contents of this window to a file with extension .log (File, Save menu)
• clear window by clicking the NEW icon on the Toolbar or selecting EDIT, Clear All from the menu bar

The Output window

• displays reports generated by the SAS program.

• New output is appended to the last output in the window

OUTPUT WINDOW

• printable results from procedures
• automatically opens and moves to the front of the display when output is generated
• save contents of this window to a file with extension .lst
• if closed, it must be reopened to see output: select View, Output

OUTPUT WINDOW

• EMPTY if the program did not run: CHECK the LOG window for errors
• indexed in the RESULTS window
• clear window by clicking the NEW icon on the Toolbar or selecting EDIT, Clear All on the menu bar

RESULTS WINDOW

• used to manage the output window; acts as a table of contents
• lists each part of your results in outline form
• possible to save and/or print sections of results by right-clicking on a section
• view a section by double-clicking

EXPLORER WINDOW

• provides access to SAS files and libraries
• similar to Windows Explorer
• move, copy, delete files
• open SAS data sets to view

SAS Help and Documentation

SAS Help Menu

• Using This Window is task-oriented help for the active window.
• SAS Help and Documentation gives you access to SAS 9 Help, which is a complete guide to syntax, examples, procedures, concepts, and what's new for Version 9.
• Tutorials (such as Getting Started with SAS) are listed on the Help menu when they are available.
• Learning SAS Programming gives you access to SAS OnlineTutor if it is licensed at your site.
• SAS on the Web provides links to information on the web, including Technical Support and Frequently Asked Questions.

SAS Data Sets

• a data set (table) contains a descriptor portion and data values
• organized as a table of observations (rows) and variables (columns); may have an optional index, which enables SAS to locate records
• stored in a SAS library: a collection of files (in Windows, a group of SAS files in the same folder or directory)

SAS Data Set Terminology

SAS documentation and text in the SAS windowing environment use the following terms interchangeably:

SAS Data Set    SAS Table
Variable        Column
Observation     Row

SAS Data Sets

SAS data sets have a descriptor portion and a data portion.

Descriptor Portion

General data set information
* data set name
* data set label
* date/time created
* storage information
* number of observations

Information for each variable
* Name
* Type
* Length
* Position
* Format
* Informat
* Label

SAS Data Sets: Data Portion

The data portion of a SAS data set is a rectangular table of character and/or numeric data values.

Variable names:    lname        gender   BMI

Variable values:   Richardson   1        25.6558
                   Lowrey       2        32.8050
                   Tierney      1        32.8873
                   Sommers      2        25.2014
                   Kegan        2        23.3179

lname holds character values; gender and BMI hold numeric values.

SAS Data Libraries

Objectives

– Explain the concept of a SAS data library.
– State the difference between a permanent library and a temporary library.
– Explore libraries using the EXPLORER window.

SAS Data Libraries

A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

On directory-based systems, a SAS data library is a directory.

Windows: c:\mysasfiles
UNIX: /users/dept/mysasfiles

A SAS data set is a type of SAS file.

SAS Data Libraries

You can think of a SAS data library as a drawer in a filing cabinet and a SAS data set as one of the file folders in the drawer.

Assigning a Libref

Regardless of which host operating system you use, you identify SAS data libraries by assigning each a library reference name (libref).

SAS Data Libraries

When you invoke SAS, you automatically have access to a temporary and a permanent SAS data library.

work - temporary library
sasuser - permanent library
sashelp - permanent library

You can create and access your own permanent libraries.

SAS Libraries

• Sasuser - permanent library that contains SAS files in the Profile catalog that store your personal settings.

• Work - temporary library for files that do not need to be saved after the session.

• Sashelp - permanent library that contains sample data and other files that control how SAS works at your site. This is a read-only library

Assign Library

• On the toolbar, click the New Library tool. The New Library window opens.
• In the Name box, type MyLib. Library names
  - are limited to 8 characters
  - must start with a letter or underscore
  - can contain only letters, numerals, or underscores.
• Click Browse.
  - Select the default location or select another location in your operating environment.
  - Files that you save to the Mylib library will be saved in the directory or folder that you designate in the Path box.
  - Click OK.
• Click OK to close the New Library window.

Copy and Open a SAS Data Set

• Copy the PrdSale table in the SASHelp library to the mylib library using the EXPLORER window.
• View the data values: in the Explorer window, double-click the PrdSale table in the mylib library. The table opens in the VIEWTABLE window.

Descriptor Portion

• The descriptor portion of a SAS data set contains information about the data set, including
  - the name of the data set
  - the date and time the data set was created
  - the number of observations
  - the number of variables.
• You can see this information by viewing the general properties of a data set.
• In the Explorer window, right-click the PrdSale table and select Properties.

SAS DATA SET NAMES

• two-level names
• first level is the libref
  - refers to the location of the file
  - default: file is stored in the WORK library
• second level is the member name that identifies the data set within the library

Example:

work.sales - located in the WORK library
mylib.sales - located in the MYLIB library

Two-level SAS Filenames

Every SAS file has a two-level name: libref.filename

The first name (libref) refers to the library.

The second name (filename) refers to the file in the library.

The data set ia.sales is a SAS file in the ia library.

Temporary SAS Filename

• The libref work can be omitted when you refer to a file in the work library. The default libref is work if the libref is omitted, so work.employee and employee refer to the same file.
• Files in the work library are deleted when the SAS session ends.
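The default libref can be checked with a short sketch like the one below. It assumes a fresh SAS session; the data set name employee comes from the slide, while the variable name and its value are made up for illustration.

```sas
/* One-level name: SAS stores the data set in the temporary WORK library */
data employee;
  name = 'Smith';   /* hypothetical value, for illustration only */
run;

/* Two-level name: refers to the same data set created above */
proc print data=work.employee;
run;
```

Both steps refer to the same file, and it disappears when the session ends.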

Exercise I-III ONLY

Running SAS Programs

Objectives

– Invoke the SAS System and include a SAS program into your session.
– Submit a program and browse the results.
– Navigate the SAS windowing environment.

Starting SAS

• Start SAS from the START button, PROGRAMS, The SAS System for Windows V9 system

• interactive full screen

• Enhanced Editor window, LOG window, and Explorer windows appear

• activate window by clicking within the window

Class1_1.SAS

• Start SAS from the START button, PROGRAMS, The SAS System for Windows V9 system
• File, Open
• Class1_1.sas
• Submit by using the SUBMIT button on the Toolbar or Run, Submit on the Menu
• Note: submit sections of a program by highlighting them before submission

Assigning a Libref

• Use the LIBNAME statement to assign a libref to a SAS data library.
• Global statement - in effect for the entire SAS session until replaced

LIBNAME libref 'SAS-data-library' <options>;

Rules for naming a libref:
• must be 8 characters or less
• must begin with a letter or underscore
• remaining characters are letters, numbers, or underscores.

Making the Connection

• When you submit the LIBNAME statement, a connection is made between a libref in SAS and the physical location of files on your operating system (instead of using the menu).

Windows 'c:\workshop\winsas\prog1'
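As a minimal sketch of making the connection in code, the statements below assign a libref to the Windows folder shown on the slide and then list what the library contains. The libref name mylib is the one used elsewhere in this lecture; the folder is assumed to exist on your machine, so substitute your own class folder.

```sas
/* Assign the libref; the folder must already exist */
libname mylib 'c:\workshop\winsas\prog1';

/* List the SAS files in the library (NODS suppresses per-data-set detail) */
proc contents data=mylib._all_ nods;
run;

/* Remove the association when it is no longer needed */
libname mylib clear;
```

After the LIBNAME statement is submitted, check the LOG window for a note that the libref was assigned.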

SAS Program Execution

When you execute a SAS program, the output generated by SAS is divided into two major parts:

SAS log: contains information about the processing of the SAS program, including any warning and error messages.

SAS output: contains reports generated by SAS procedures and DATA steps.

Browsing the Descriptor Portion

• The descriptor portion of a SAS data set contains
  – general information about the SAS data set (such as data set name and number of observations)
  – variable attributes (name, type, length, position, informat, format, label).
• The CONTENTS procedure displays the descriptor portion of a SAS data set.

General form of the CONTENTS procedure:

PROC CONTENTS DATA=SAS-data-set;
RUN;

Example:

proc contents data=work.newclass;
run;

Partial PROC CONTENTS Output

The CONTENTS Procedure

Data Set Name   WORK.NEWCLASS                         Observations          5
Member Type     DATA                                  Variables             7
Engine          V9                                    Indexes               0
Created         Friday, March 18, 2005 04:02:40 PM    Observation Length    64
Last Modified   Friday, March 18, 2005 04:02:40 PM    Deleted Observations  0
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label

Alphabetic List of Variables and Attributes

#  Variable  Type  Len  Format   Informat
7  BMI       Num   8
2  baseage   Num   8    BEST12.  F12.
3  gender    Num   8    BEST12.  F12.
6  height    Num   8    BEST12.  F12.
1  lname     Char  15   $F15.    $F15.
4  race      Num   8    BEST12.  F12.
5  weight    Num   8    BEST12.  F12.

Browsing the Data Portion

• The PRINT procedure displays the data portion of a SAS data set.
• By default, PROC PRINT displays
  – all observations
  – all variables
  – an Obs column on the left side.

General form of the PRINT procedure:

PROC PRINT DATA=SAS-data-set;
RUN;

Example:

proc print data=work.newclass;
run;

PROC PRINT Output

The SAS System

Obs  lname       baseage  gender  race  weight  height  BMI
1    Richardson  34       1       2     189     72      25.6558
2    Lowrey      29       2       1     235     71      32.8050
3    Tierney     32       1       2     229     70      32.8873
4    Sommers     33       2       1     156     66      25.2014
5    Kegan       19       2       1     140     65      23.3179

Introduction to SAS Programs

SAS Programs

• manipulate data

• store and retrieve information

• perform statistical analysis

• create reports.

• composed of Data and Proc steps

SAS Programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).

[Figure: Raw Data, SAS Data Set, DATA Step, PROC Step, Report]

DATA Step

DATA steps typically create or modify SAS data sets, but they can also be used to produce custom-designed reports. For example, you can use DATA steps to

• put your data into a SAS data set
• compute the values for new variables
• check for and correct errors in your data
• produce new SAS data sets by subsetting, merging, and updating existing data sets

PROC Step

PROC (procedure) steps typically analyze and process data in the form of a SAS data set. They give access to a library of prewritten routines (procedures) that perform tasks on SAS data sets, such as listing, sorting, and summarizing. For example, you can use PROC steps to

• print a report
• produce descriptive statistics
• create a tabular report
• produce plots and charts.

SAS Program: Class1_1.SAS

DATA step:

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC steps:

PROC PRINT DATA=newclass;

PROC CONTENTS DATA=newclass;
RUN;
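Class1_1.SAS reads mylib.class, which is distributed with the lab files. For readers without those files, the following is a self-contained sketch of the same program. It rebuilds the input from the five observations shown in the PROC PRINT output slide and stores it in the WORK library under the name class; both choices are assumptions made for illustration, not part of the course program.

```sas
/* Build a stand-in for mylib.class from the values on the PROC PRINT slide */
data class;
  length lname $15;
  input lname $ baseage gender race weight height;
  datalines;
Richardson 34 1 2 189 72
Lowrey 29 2 1 235 71
Tierney 32 1 2 229 70
Sommers 33 2 1 156 66
Kegan 19 2 1 140 65
;
run;

/* DATA step: copy the data and add BMI (pounds and inches converted to kg and m) */
data newclass;
  set class;
  BMI=(weight*.454)/((height*.0254)**2);
run;

/* PROC steps: list the data portion, then the descriptor portion */
proc print data=newclass;
run;

proc contents data=newclass;
run;
```

The BMI column in the listing should match the values on the PROC PRINT output slide, for example 25.6558 for Richardson.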

Step Boundaries

SAS steps begin with a
• DATA statement
• PROC statement.

SAS detects the end of a step when it encounters
• a RUN statement (for most steps)
• a QUIT statement (for some procedures)
• the beginning of another step (DATA statement or PROC statement).

Step Boundaries: Class1_1.sas

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC CONTENTS DATA=newclass;
RUN;

Step Boundaries: Class1_1.sas

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);
RUN;   *optional;
PROC PRINT DATA=newclass;
RUN;   *optional;
PROC CONTENTS DATA=newclass;
RUN;   *required;

SAS Syntax Rules

SAS statements
• usually begin with an identifying keyword
• always end with a SEMICOLON !!!!!!!

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC CONTENTS DATA=newclass;
RUN;

SAS Syntax Rules

• SAS statements are free-format.
• One or more blanks or special characters can be used to separate words.
• They can begin and end in any column.
• A single statement can span multiple lines.
• Several statements can be on the same line.

Unconventional Spacing

DATA newclass; SET mylib.class;BMI=(weight*.454)/((height*.0254)**2);
PROC PRINT DATA=newclass;PROC CONTENTS DATA=newclass;RUN;

SAS Syntax Rules

Good spacing makes the program easier to read.

Conventional Spacing

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC CONTENTS DATA=newclass;
RUN;

A SAS program may consist of a DATA step, or a PROC step, or any combination of DATA and PROC steps.

The SAS Programming Process

• Define the Need
• Create a SAS Program
• Enter the SAS Program Code
• Process the SAS Program Code
• Review the Results
• Debug or Modify

Mastering Fundamental Concepts

Objectives

– Define a SAS variable.
– Identify a missing value and a SAS date value.
– State the naming conventions for SAS data sets and variables.

SAS Data Set and Variable Names

SAS names
• can be up to 32 characters long
• can be uppercase, lowercase, or mixed-case
• must start with a letter or underscore; subsequent characters can be letters, underscores, or numeric digits.

Valid SAS Names

Select the valid default SAS names.

data5mon
_5monthsdata
five months data
fivemonthsdata
data#5
five_month_data

Under the naming rules above, data5mon, _5monthsdata, fivemonthsdata, and five_month_data are valid. The name "five months data" is not valid because it contains blanks, and data#5 is not valid because it contains a special character.
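The naming rules can also be checked in code. The sketch below assumes your SAS release provides the NVALID function (it is documented for SAS 9) and feeds it the six candidate names from the slide; the variable names name and valid are made up for illustration.

```sas
data _null_;
  infile datalines truncover;
  input name $32.;
  /* NVALID returns 1 when the string is a valid name under the default (V7) rules */
  valid = nvalid(name, 'v7');
  put name= valid=;
  datalines;
data5mon
_5monthsdata
five months data
fivemonthsdata
data#5
five_month_data
;
run;
```

The log should show valid=1 for the four names that follow the rules and valid=0 for the two that contain a blank or a # character.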

SAS Variable Values

There are two types of variables:

Character: contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.

Numeric: stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits.

SAS Date Values

• SAS stores date values as numeric values.
• A SAS date value is stored as the number of days between January 1, 1960, and a specific date.

Date         Stored   Displayed
01JAN1959    -365     01/01/1959
01JAN1960    0        01/01/1960
01JAN1961    366      01/01/1961
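The stored values can be confirmed with a short DATA step. This is a sketch that writes to the log only; the variable names d1 to d3 are made up for illustration.

```sas
data _null_;
  /* Date constants: a quoted date followed by the letter d */
  d1 = '01JAN1959'd;
  d2 = '01JAN1960'd;
  d3 = '01JAN1961'd;

  /* Stored values: days relative to January 1, 1960 */
  put d1= d2= d3=;

  /* The same values displayed with a date format */
  put d1= mmddyy10. d2= mmddyy10. d3= mmddyy10.;
run;
```

The first PUT statement writes d1=-365 d2=0 d3=366 to the log. The value for 1961 is 366 rather than 365 because 1960 was a leap year.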

Missing Data Values

LastName   FirstName  JobTitle  Salary
TORRES     JAN        Pilot     50000
LANGKAMM   SARAH      Mechanic  80000
SMITH      MICHAEL    Mechanic  .
WAGSCHAL   NADJA      Pilot     77500
TOERMOEN   JOCHEN               65000

A value must exist for every variable for each observation. Missing values are valid values.

A numeric missing value is displayed as a period.

A character missing value is displayed as a blank.

SAS Comments

• Type * to begin a comment.
• Type your comment text.
• Type ; to end the comment.

* Create work.newclass data set and add bmi;
DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

* Produce listing report of newclass;
PROC PRINT DATA=newclass;
RUN;

EXERCISES IV-VII

Reading SAS Data Sets and Creating Variables

Objectives

− create a SAS data set using another SAS data set as input
− create SAS variables
− use operators and SAS functions to manipulate data values
− control which variables are included in a SAS data set

Reading a SAS Data Set: Class1_2.SAS

Create a temporary SAS data set named newclass2 from the permanent SAS data set named mylib.class2 and create a variable that represents the BMI.

Compute BMI from the variables Height and Weight.

mylib.class2 with the new variable BMI (Birthdate holds SAS date values):

Lname       Birthdate  Baseage  Weight  Height  BMI
Richardson  2225       34       189     72      25.6558
Lowrey      3734       29       235     71      32.8050
Tierney     .          32       229     70      32.8873
Sommers     -205       33       156     66      25.2014

Reading a SAS Data Set

To create a SAS data set using a SAS data set as input, you must use a
• DATA statement to start a DATA step and name the SAS data set being created (output data set: newclass2)
• SET statement to identify the SAS data set being read (input data set: mylib.class2).

By default, the SET statement reads all of the observations and variables in the input SAS data set.

DATA libref.filename1;
  SET libref.filename2;
  additional SAS statements
RUN;

SAS Program: Class1_2.SAS

libname mylib 'e:\summer2006\class1';

data newclass2;
  set mylib.class2;
  BMI=(weight*.454)/((height*.0254)**2);

proc print data=newclass2;
run;

Creating a New Variable

To create a variable, you must use an assignment statement WITHIN A DATA STEP. Here the statement uses the values of the variables Weight and Height and assigns the result of the calculation to the variable BMI.

Assignment Statements

An assignment statement evaluates an expression and assigns the resulting value to a variable.

General form of an assignment statement:

variable=expression;

Define the Variable

new_variable_name = expression;

SAS EVALUATES the expression to a value and then ASSIGNS that value to the new variable:

BMI=(weight*.454)/((height*.0254)**2);

EXAMPLES

• x=3; assigns 3 to x for all observations
• y=age/10; assigns the value of age divided by 10 to y for each observation
• Clinic='Boston'; assigns the character constant Boston to the variable clinic for each observation

NOTE: x, y, and bmi are numeric variables; clinic is a character variable
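The three example statements can be tried in one small DATA step. This is a sketch only: the data set name example and the two age values are made up for illustration.

```sas
data example;
  input age;
  x = 3;               /* numeric constant, the same for every observation */
  y = age/10;          /* computed from each observation's value of age */
  Clinic = 'Boston';   /* character constant; Clinic becomes a character variable */
  datalines;
34
29
;
run;

proc print data=example;
run;
```

Each observation in the listing carries its own y, while x and Clinic repeat unchanged.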

SAS Expressions

An expression contains operands and operators that form a set of instructions that produce a value.

Operands are
• variable names
• constants.

Operators are
• symbols that request arithmetic calculations
• SAS functions.

Using Operators

Selected operators for basic arithmetic calculations in an assignment statement:

Operator  Action           Example        Priority
+         Addition         Sum=x+y;       III
-         Subtraction      Diff=x-y;      III
*         Multiplication   Mult=x*y;      II
/         Division         Divide=x/y;    II
**        Exponentiation   Raise=x**y;    I
-         Negative prefix  Negative=-x;   I
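The effect of the Priority column can be seen with a few constants. The sketch below writes to the log only; the variable names a, b, and c are made up for illustration.

```sas
data _null_;
  a = 2 + 3 * 4;      /* multiplication (II) is done before addition (III) */
  b = (2 + 3) * 4;    /* parentheses are evaluated first */
  c = 2 * 3 ** 2;     /* exponentiation (I) is done before multiplication (II) */
  put a= b= c=;
run;
```

The log shows a=14 b=20 c=18. The BMI formula in this lecture uses parentheses the same way, to force the height conversion to happen before it is squared.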

Name the New Variable

Rules for naming SAS variables:
• 1 to 32 characters in length
• start with a letter (A through Z) or an underscore (_)
• continue with any combination of numbers, letters, or underscores
• can be stored in mixed-case.

DATA Step -- Two Phase Process

Phase 1 - Compile phase: corresponds to descriptor portion

Phase 2 - Execute phase: corresponds to data portion

Compilation

During compilation, SAS
• checks code for syntax errors
• translates code to machine code
• establishes an area of memory called the input buffer if reading raw data
• establishes an area of memory called the Program Data Vector (PDV)
• assigns required attributes to variables
• creates the descriptor portion of the new data set.

Execution

During the execution phase, SAS
• initializes the PDV to missing
• reads data values into the PDV
• carries out assignment statements and conditional processing
• writes the observation in the PDV to the output SAS data set at the end of the DATA step (by default)
• returns to the top of the DATA step
• initializes any variables that are not read from a SAS data set to missing (by default)
• repeats the process.
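One way to watch the execution phase is to write the contents of the PDV to the log at the top and bottom of each iteration. The following is a sketch under stated assumptions: it rebuilds three observations of class2 in the WORK library from the values shown on the slides (rather than reading mylib.class2), and the PUT statements are added purely for tracing.

```sas
/* Stand-in for mylib.class2, using values from the slides */
data class2;
  length lname $15;
  input lname $ birthdate baseage weight height;
  datalines;
Richardson 2225 34 189 72
Lowrey 3734 29 235 71
Tierney . 32 229 70
;
run;

data newclass2;
  /* Top of the step: show the PDV before SET reads an observation */
  put 'Before SET:';
  put _all_;
  set class2;
  BMI=(weight*.454)/((height*.0254)**2);
  /* Bottom of the step: show the PDV just before the automatic output */
  put 'After BMI:';
  put _all_;
run;
```

In the log, the automatic variable _N_ counts the iterations. On the second and third passes the "Before SET" line should show the previous observation's values still in the PDV with BMI reset to missing, which is the reinitialization rule in the list above.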

Compiling the DATA Step

libname mylib 'SAS-data-library';

data newclass2;
  set mylib.class2;
  BMI=(weight*.454)/((height*.0254)**2);
run;

When the SET statement is compiled, the variables in mylib.class2 are added to the PDV:

PDV
Lname  Birthdate  Baseage  Weight  Height
$ 15   N 8        N 8      N 8     N 8

When the assignment statement is compiled, BMI is added to the PDV:

PDV
Lname  Birthdate  Baseage  Weight  Height  BMI
$ 15   N 8        N 8      N 8     N 8     N 8

Executing the DATA Step

data newclass2;
  set mylib.class2;
  BMI=(weight*.454)/((height*.0254)**2);
run;

Mylib.class2 (input data set)

Lname       Birthdate  Baseage  Weight  Height
Richardson  2225       34       189     72
Lowrey      3734       29       235     71
Tierney     .          32       229     70

The PDV is initialized to missing; newclass2 is still empty.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
            .          .        .       .       .

The SET statement reads the first observation into the PDV.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
Richardson  2225       34       189     72      .

The assignment statement computes BMI.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
Richardson  2225       34       189     72      25.6558

Automatic output: at the end of the DATA step, the observation in the PDV is written to newclass2. Automatic return: SAS returns to the top of the DATA step.

newclass2
Lname       Birthdate  Baseage  Weight  Height  Bmi
Richardson  2225       34       189     72      25.6558

Reinitialize BMI to missing.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
Richardson  2225       34       189     72      .

The SET statement reads the second observation into the PDV.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
Lowrey      3734       29       235     71      .

The assignment statement computes BMI.

PDV
Lname       Birthdate  Baseage  Weight  Height  Bmi
Lowrey      3734       29       235     71      32.8050

Automatic output, then automatic return.

newclass2
Lname       Birthdate  Baseage  Weight  Height  Bmi
Richardson  2225       34       189     72      25.6558
Lowrey      3734       29       235     71      32.8050

Assignment Statements

proc print data=newclass2;
  format birthdate date9.;
  var lname birthdate height weight bmi;
run;

Obs  lname       birthdate  height  weight  BMI
1    Richardson  03FEB1966  72      189     25.6558
2    Lowrey      23MAR1970  71      235     32.8050
3    Tierney     .          70      229     32.8873
4    Sommers     10JUN1959  66      156     25.2014
5    Kegan       08AUG1980  .       140     .

Why is BMI missing in observation 5?
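Observation 5 has a missing height, and an arithmetic expression that uses a missing operand produces a missing result. A minimal sketch that reproduces this with the values from observation 5 (it writes to the log only):

```sas
data _null_;
  weight = 140;
  height = .;   /* missing, as in observation 5 */
  BMI = (weight*.454)/((height*.0254)**2);
  put height= BMI=;
run;
```

The log shows height=. BMI=. along with a note that missing values were generated as a result of performing an operation on missing values.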

Printing Dates

PROC PRINT data=example;
  Var id birthdate;
  Format birthdate mmddyy10.;   /* specify format */
Run;

With format:

Obs  id     birthdate
1    14632  03/04/1993
2    67456  05/28/1991

Without format:

Obs  id     birthdate
1    14632  12116
2    67456  11470

SAMPLE SAS DATE FORMATS

Print:       SAS format:
040197       MMDDYY6.
04/01/97     MMDDYY8.
04/01/1997   MMDDYY10.
010497       DDMMYY6.
01/04/97     DDMMYY8.
01/04/1997   DDMMYY10.
01APR97      DATE7.
01APR1997    DATE9.
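The sample formats can be reproduced by writing one date value to the log with each format in turn. This is a sketch; the variable name d is made up, and the date is the April 1, 1997 value used in the table.

```sas
data _null_;
  d = '01APR1997'd;
  put d mmddyy6.;
  put d mmddyy8.;
  put d mmddyy10.;
  put d ddmmyy6.;
  put d ddmmyy8.;
  put d ddmmyy10.;
  put d date7.;
  put d date9.;
  put d;   /* no format: the stored number of days since January 1, 1960 */
run;
```

The first eight log lines should match the Print column of the table, in order. The stored value never changes; only the format applied when it is displayed differs.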

EXERCISE VIII

Portions © Copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission from SAS Institute Inc., Cary, NC, USA.

SAS Data Libraries

• A SAS data library is a collection of SAS files that are recognized as a unit by SAS on your operating environment.

WORK - temporary library
SASUSER - permanent library
MYLIB - permanent library

You can create and access your own permanent libraries.

SAS FILE ORGANIZATION

File Type                      Extension
program file (EDITOR window)   .sas
SAS table (data set)           .sas7bdat
log file (LOG window)          .log
listing file (OUTPUT window)   .lst