22
Introduction to SAS BIO 226 – Spring 2011

INTRODUCTION TO SAS

Embed Size (px)

Citation preview

Page 1: INTRODUCTION TO SAS

Introduction to SAS

BIO 226 – Spring 2011

Page 2: INTRODUCTION TO SAS

2

Outline• Windows and common rules• Getting the data

– The PRINT and CONTENTS Procedures• Basic SAS procedures

– The SORT Procedure– The MEANS Procedure– The UNIVARIATE Procedure– The FREQ Procedure– The CORR Procedure – The PLOT Procedure

• Manipulating the data, e.g., creating new variables

• Libraries• Output in Word document• References• Practice

Slides 3-7Slides 8-10

Slide 9

Slide 13Slides 14-15

Slide 15Slide 16Slide 16Slide 17

Slide 11-12Slide 18Slide 19Slide 20

Slides 21-22

Page 3: INTRODUCTION TO SAS

3

The different SAS windows• Explorer: contains SAS files and libraries

• Editor: where you can open or type SAS programs

• Log: stores details about your SAS session (code run, dataset created, errors...)

• Results: table of contents for output of programs

• Output: printed results of SAS programs

Page 4: INTRODUCTION TO SAS

4

Basic SAS rules (1)• Variable names must:

– be one to 32 characters in length– begin with letter (A-Z) or underscore (_)– continue with any combination of number, letters or underscores

• A variable’s type is either character or numeric

• Missing values: – missing character data is left blank– missing numeric data is denoted by a period (.)

Page 5: INTRODUCTION TO SAS

5

Basic SAS rules (2)• Two ways to make comments:

– * write comment here;– /* write comment here */

• SAS is insensitive to case

Page 6: INTRODUCTION TO SAS

6

Basic programming rules (1)• SAS programs are composed of statements: these are

organized in DATA steps and PROC steps– DATA step: gives dataset a name, manipulates dataset– PROC step: procedure or analysis you want SAS to carry out

• SAS reads code line by line and the end of a line is marked by a semicolon.

• All SAS programs end with RUN;

• Quotes can be single or double.

Page 7: INTRODUCTION TO SAS

7

Basic programming rules (2)• SAS statements are free-format:

– Can begin and end in any column– One statement can continue over several lines– Several statements can be on one line

• To submit program, highlight the code to run and click on the submit button (running silhouette)

Page 8: INTRODUCTION TO SAS

8

Loading data• If you have SAS data set (sasintro.sas7bdat) you can double

click on it and it will load itself.

• If you don’t have SAS data set (sasintro.txt), and the first row of your dataset contains the variable names, you can import it using File > Import Data… and specify the directory.

• Or you can use the following code:

DATA mydata;INFILE ‘g:\shared\bio226\sasintro.txt’;INPUT weight bmi id age activity education smoking;RUN;

• Setting your current directory: on the bottom line of the main SAS window, you should see it set to C:\WINDOWS\system32. Double click on it to change it.

Page 9: INTRODUCTION TO SAS

9

How to view the loaded data?• Go in the Explorer window, double click on Libraries,

then Work and sasintro.sas7bdat

• To view general information about the data set, like variables’ name and type:

PROC CONTENTS DATA=mydata;RUN;

• Use the PRINT procedure to view the first 25 records:

PROC PRINT DATA=mydata (OBS=25);RUN;

Page 10: INTRODUCTION TO SAS

10

Variables from sasintro.txt# Variable Type Unit

5 activity Num kcal/week

4 age Num years

2 bmi Num kg/m2

6 education Num years

3 id Num

7 smoking Num 1:current smoker, 0:non-smoker1 weight Num lbs

Page 11: INTRODUCTION TO SAS

11

Manipulating data (1)• selecting a subset of rows

DATA mydata_s;SET mydata;IF smoking=1;RUN;

• deleting a column (or columns)

DATA mydata2;SET mydata;DROP weight education;RUN;

Page 12: INTRODUCTION TO SAS

12

Manipulating data (2)• adding a column (or columns)

DATA mydata3;SET mydata;weight_kg=weight*0.453;IF age <= 60 THEN agegroup=1;ELSE IF age<=70 THEN agegroup=2;ELSE agegroup=3;/*drop age;*/RUN;

Page 13: INTRODUCTION TO SAS

13

Sorting dataPROC SORT DATA=mydata OUT=mydata4;BY ID age weight;

PROC PRINT DATA=mydata (OBS=5);PROC PRINT DATA=mydata4 (OBS=5);RUN;

Page 14: INTRODUCTION TO SAS

14

Summarizing data (1)• Summarizing weight:

PROC MEANS DATA=mydata;VAR weight;RUN;

• Summarizing weight in the youngest agegroup:

PROC MEANS DATA=mydata3;VAR weight;WHERE agegroup=1;RUN;

Page 15: INTRODUCTION TO SAS

15

Summarizing data (2)• Summarizing weight by smoking status (two possible codes):

PROC SORT DATA=mydata OUT=mydata5;BY smoking;PROC MEANS DATA=mydata5;VAR weight;BY smoking;RUN;PROC MEANS DATA=mydata;CLASS smoking;VAR weight;RUN;

• All these summarizing measures can be obtained with PROC UNIVARIATE also.

Page 16: INTRODUCTION TO SAS

16

Categorical data and correlation• Summarizing categorical data

PROC FREQ DATA=mydata3;TABLES smoking*agegroup /chisq exact;RUN;

• Examining correlation

PROC CORR DATA=mydata;VAR weight;WITH bmi age;RUN;

Page 17: INTRODUCTION TO SAS

17

Basic procedures: plots• BarchartsPROC CHART DATA=mydata3;VBAR agegroup /DISCRETE;RUN;

• ScatterplotPROC PLOT DATA=mydata3;PLOT bmi*weight='*';RUN;

• Histogram, Boxplot, Normal Probability PlotPROC UNIVARIATE DATA=mydata3 PLOT;VAR weight;RUN;

Page 18: INTRODUCTION TO SAS

18

/* Libraries */• A library is the directory where your SAS dataset is stored.

• The default library is named Work and stores your SAS datasets temporarily: they will be deleted when you end your SAS session

• If you want to save your SAS datasets and use them again later, create your own library:

LIBNAME SAS_Lab 'p:\BIO226\SAS';DATA SAS_Lab.mydata;INFILE ‘g:\shared\bio226\sasintro.txt’;INPUT weight bmi id age activity education smoking;RUN;

Page 19: INTRODUCTION TO SAS

19

SAS output and Word• To send you SAS output to a Word document:

ODS RTF FILE=‘p:output.RTF’ style=minimal;PROC CORR DATA =mydata;

VAR weight;WITH bmi age;RUN;

ODS RTF CLOSE;

• Other styles: Journal, Analysis, Statistical

Page 20: INTRODUCTION TO SAS

20

For further references• SAS9 Documentation on the Web: http://support.sas.com/onlinedoc/913/docMainpage.jsp

• Applied Statistics and the SAS Programming Language (5th Edition) Ron P. Cody and Jeffrey K. Smith

• The Little SAS Book, L.D. Delwiche and S.J. Slaughter

• See SAS_help.doc on course website

Page 21: INTRODUCTION TO SAS

21

Try your own• Find the summary statistics (mean, mode, standard

deviation,…) for education with PROC UNIVARIATE, as well as a histogram for years of education.

• Create a new variable educ_group which breaks years of education into four groups (0-10, 10-15,15-18,18-25). Put this new variable in a new data set and drop the education variable, as well as weight, bmi and age.

• Find the number of smokers per education group.

• Find the mean physical activity in each education group.

Page 22: INTRODUCTION TO SAS

22

Data name Descriptionmydata original imported data

mydata_s only smokers

mydata2 dropped weight, education

mydata3 added weight_kg, agegroup, dropped age

mydata4 sorted original data by age and weight

mydata5 sorted original data by smoking status

Recap of different datasets created