
Copyright © 2012, SAS Institute Inc. All rights reserved.

Innovative Techniques: Doing More with Loops and Arrays SAS Talks April 12, 2012

PLEASE STAND BY

Today’s event will begin at 1:00pm EST.

The audio portion of the presentation will be heard through your computer speakers. This is an automatic setup and is preferred. There will also be a limited option to listen through the telephone to 250 lines.

If you would prefer to dial in, please call:
US Toll-Free: 1-888-682-4285
Toll/International: +1-973-368-0695
Conference Code: 4675179#

If you experience any technical difficulties, you may contact WebEx Technical Support at 866-229-3239.

#sastalks


Speakers

Stacy Hobson

Director, Customer Loyalty and Retention SAS Institute

Art Carpenter

Author and Senior Consultant California Occidental Consultants

4

Innovative Techniques: Doing More with Loops and Arrays

Arthur L. Carpenter California Occidental Consultants

10606 Ketch Circle Anchorage, AK 99515

(907) 865-9167 [email protected] www.caloxy.com

5

INTRODUCTION

Techniques involving DO loops and arrays give the programmer powerful and diverse methods for solving a wide range of problems. This presentation assumes that the audience has at least a passing understanding of:

• the types of DO loops and their basic syntax
• the ARRAY statement and its various forms

The primary objective of this presentation is to demonstrate the interaction of DO loops and arrays.

6

Shorthand Variable Naming (2.6.1)

A numbered range list names variables with a common prefix and a numeric suffix. This form of list can create variables.

  array vis {10} visit1 - visit10;

The colon operator can be used for name prefix lists.

  array vis {10} visit:;
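A minimal sketch of the difference between the two list types, assuming a hypothetical data set WORK.WIDE and made-up values. In the first step the numbered range list creates VISIT1-VISIT3; in the second step the name prefix list only picks up variables that already exist on the PDV.

```sas
/* WIDE, V, N, and TOTAL are hypothetical names used only for illustration. */
data wide;
  /* VISIT1-VISIT3 do not exist yet, so the numbered range list creates them. */
  array vis {3} visit1 - visit3;
  do i = 1 to 3;
    vis{i} = 10 * i;
  end;
  drop i;
run;

data _null_;
  set wide;
  /* The name prefix list refers to every existing variable that starts with VISIT. */
  array v {*} visit:;
  n     = dim(v);
  total = sum(of visit:);
  put n= total=;
run;
```

Because a name prefix list cannot create variables, an explicit dimension such as {10} has to match the number of variables that actually share the prefix.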

7

Shorthand Variable Naming (2.6.1)

Specialized name lists address variables by their type.

_CHARACTER_   All character variables
_NUMERIC_     All numeric variables
_ALL_         All variables on the PDV

Because each of these lists pertains to the current list of variables, it will not create variables. In each case the resulting list of variables will be in the same order as the variables appear on the Program Data Vector.

8

DO Loop Specifications (3.9.2)

Compound loop specifications

  do count=1 to 3, 5 to 20 by 5, 26, 33;
    . . .
  end;

COUNT: 1, 2, 3, 5, 10, 15, 20, 26, 33

Character index value specifications

  do month = 'Jan', 'Feb', 'Mar';
    . . .
  end;
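The two loop specifications can be tried in a DATA _NULL_ step that writes each index value to the log. This is only a sketch; the LENGTH statement is an addition that makes the character index variable's length explicit.

```sas
data _null_;
  length month $3;
  /* Compound specification: two ranges followed by two single values. */
  do count = 1 to 3, 5 to 20 by 5, 26, 33;
    put count=;
  end;
  /* Character index values are simply listed. */
  do month = 'Jan', 'Feb', 'Mar';
    put month=;
  end;
run;
```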

9

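Special DO Loop Forms (3.9.3)

  do count=1 to 3;
    . . .
  end;

COUNT exits the loop with a value of COUNT=4, because the index is incremented before the TO condition ends the loop.

  do count=1 to 3 until(count=3);
    . . .
  end;

COUNT exits the loop with a value of COUNT=3, because the UNTIL condition is checked at the bottom of the loop, before the index is incremented.

Incremented infinite loop with conditional exit

  do k=1 by 1 until(x=5);
    . . .
  end;

Notice there is no TO specification.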
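The exit values are easy to confirm in the log. In SAS an iterative DO increments the index before the TO test ends the loop, so a plain `1 to 3` loop exits with 4, while an UNTIL clause is tested at the bottom of the loop before the increment. A short sketch, with X as a hypothetical counter that drives the conditional exit:

```sas
data _null_;
  do count = 1 to 3;
  end;
  put 'Iterative DO:        ' count=;   /* count=4 */

  do count = 1 to 3 until(count = 3);
  end;
  put 'With UNTIL clause:   ' count=;   /* count=3 */

  x = 0;
  do k = 1 by 1 until(x = 5);
    x = x + 1;
  end;
  put 'No TO specification: ' k= x=;    /* k=5 x=5 */
run;
```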

10

ARRAY Statement Forms (3.10.1)

  array list {3} aa bb cc;                 /* array dimension = number of elements */
  array list {1:3} aa bb cc;               /* same dimension as {3} */
  array list {0:2} aa bb cc;               /* index range: LIST{1} references the variable BB */
  array vis {16} visit1-visit16;
  array vis {*} visit1-visit16;            /* dimension is calculated by SAS */
  array visit {16} ;
  array nvar {*} _numeric_;
  array nvar {*} _character_;
  array clist {3} $2 aa bb cc;
  array clist {3} $1 ('a', 'b', 'c');      /* see the two notes below */
  array clist {4:6} $1 ('a', 'b', 'c');

Notes on array clist {3} $1 ('a', 'b', 'c');
• The three array elements are initialized with these three character values.
• No variable list means that variables are created: CLIST1, CLIST2, CLIST3
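A small sketch that exercises two of these forms. The initial values (10 20 30) and the variables SECOND and N are assumptions added for illustration.

```sas
data _null_;
  array list {0:2} aa bb cc (10 20 30);   /* index runs from 0 to 2 */
  array clist {3} $1 ('a', 'b', 'c');     /* creates CLIST1-CLIST3 */
  second = list{1};                       /* element 1 of a 0-based array is BB */
  n = dim(list);
  put second= bb= n=;                     /* second=20 bb=20 n=3 */
  put clist1= clist2= clist3=;            /* clist1=a clist2=b clist3=c */
run;
```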

15

Temporary Arrays (3.10.2)

  array visdate {16} _temporary_;                 /* the dimension must be specified */
  array list {5} _temporary_ (11,12,13,14,15);    /* initial values are assigned as a list inside parentheses */
  array list {5} _temporary_ (11:15);
  array list {6} _temporary_ (6*3);               /* (3,3,3,3,3,3) */
  array list {6} _temporary_ (2*1:3);             /* (1,2,3,1,2,3) */

No permanent variables are created.
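As a quick check that temporary array elements hold values without becoming data set variables, the sketch below sums two initialized temporary arrays. The data set CHECK and the variables TOTAL_SAME and TOTAL_LIST are hypothetical.

```sas
data check;
  array same {6} _temporary_ (6*3);
  array list {5} _temporary_ (11,12,13,14,15);
  total_same = sum(of same{*});   /* six elements of 3 */
  total_list = sum(of list{*});   /* 11+12+13+14+15 */
  put total_same= total_list=;    /* total_same=18 total_list=65 */
run;
```

The output data set CHECK contains only TOTAL_SAME and TOTAL_LIST; the array elements themselves are never written.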

19

TRANSPOSING DATA (2.4)

This task is a simple introduction to the relationship between the DO loop and the array defined by the ARRAY statement.

20

TRANSPOSING DATA (2.4)

Most, but not all, SAS procedures prefer to operate against normalized data, which tends to be tall and narrow, and often contains classification variables that are used to identify individual rows. Data in non-normal form tends to have one column for each level of one of the classification variables.

Converting between the two forms can be accomplished using PROC TRANSPOSE or from within the DATA step. This is one of the 'classic' uses of arrays in conjunction with DO loops.

21

TRANSPOSING DATA (2.4)

Normal Form

Obs  SUBJECT  VISIT  sodium
  1    208       1    13.7
  2    208       2    14.1
  3    208       4    14.1
  4    208       5    14.1
  5    208       6    13.9
  6    208       7    13.9
  7    208       8    14.0
  8    208       9    14.0
  9    208      10    14.0
 10    209       1    14.0

. . . . portions of the table are not shown . . . .

22

Transposing Rows to Columns (2.4.2)

The normal form data become one row per subject, with one column per visit.

Obs  SUBJECT  visit1  visit2  visit3  visit4  visit5  visit6  visit7  visit8
  1    208     13.7    14.1     .      14.1    14.1    13.9    13.9    14
  2    209     14.0    14.0     .      13.9    14.2    14.5    13.8    14

Obs  SUBJECT  visit9  visit10  visit11  visit12  visit13  visit14  visit15  visit16
  1    208     14      14.0      .        .        .        .        .        .
  2    209      .      13.8     14       14.1     14.2     14.1     14       14.1

. . . . portions of the table are not shown . . . .

23

Transposing Rows to Columns (2.4.2)

Commonly the process of transposing will involve the use of an array and an iterative DO loop.

  data lab_nonnormal(keep=subject visit1-visit16);
    set lab_chemistry(keep=subject visit sodium);
    by subject;                          /* SUBJECT remains as a CLASS variable */
    retain visit1-visit16 ;
    array visits {16} visit1-visit16;    /* VISITS will become columns */
    if first.subject then do i = 1 to 16;
      visits{i} = .;
    end;
    visits{visit} = sodium;              /* assign the value */
    if last.subject then output lab_nonnormal;
  run;


25

Transposing Columns to Rows (2.4.2)

Assume that the visits are to become the classification variable.

  data lab_normal(keep=subject visit sodium);
    set lab_nonnormal(keep=subject visit:);
    by subject;
    array visits {16} visit1-visit16;    /* VISITS exist as columns */
    do visit = 1 to 16;
      sodium = visits{visit};            /* use visit number as the array index */
      output lab_normal;                 /* the OUTPUT statement is inside of the DO loop */
    end;
  run;

26

Transposing Columns to Rows (2.4.2)

Non-normal form (input)

Obs  SUBJECT  visit1  visit2  visit3  visit4  visit5  visit6  visit7  visit8
  1    208     13.7    14.1     .      14.1    14.1    13.9    13.9    14
  2    209     14.0    14.0     .      13.9    14.2    14.5    13.8    14

Obs  SUBJECT  visit9  visit10  visit11  visit12  visit13  visit14  visit15  visit16
  1    208     14      14.0      .        .        .        .        .        .
  2    209      .      13.8     14       14.1     14.2     14.1     14       14.1

. . . . portions of the table are not shown . . . .

Normal form (output)

Obs  SUBJECT  visit  sodium
  1    208       1    13.7
  2    208       2    14.1
  3    208       3      .
  4    208       4    14.1
  5    208       5    14.1
  6    208       6    13.9
  7    208       7    13.9
  8    208       8    14.0
  9    208       9    14.0
 10    208      10    14.0
 11    208      11      .
 12    208      12      .
 13    208      13      .
 14    208      14      .
 15    208      15      .
 16    208      16      .
 17    209       1    14.0

. . . . portions of the table are not shown . . . .
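Both directions of the transpose can be tried end to end on a few rows. The sketch below assumes a hypothetical miniature data set LAB_SMALL with two subjects and at most three visits; the data set names and values are made up for illustration.

```sas
/* Hypothetical miniature of the lab data, already sorted by SUBJECT. */
data lab_small;
  input subject visit sodium;
  datalines;
208 1 13.7
208 3 14.1
209 1 14.0
209 2 13.9
;
run;

/* Rows to columns: one observation per subject. */
data lab_wide(keep=subject visit1-visit3);
  set lab_small;
  by subject;
  retain visit1-visit3;
  array visits {3} visit1-visit3;
  if first.subject then call missing(of visits{*});
  visits{visit} = sodium;
  if last.subject then output;
run;

/* Columns to rows: one observation per subject and visit. */
data lab_tall(keep=subject visit sodium);
  set lab_wide;
  array visits {3} visit1-visit3;
  do visit = 1 to 3;
    sodium = visits{visit};
    output;
  end;
run;
```

LAB_TALL has six observations rather than the original four, because visits that were never recorded come back as rows with a missing SODIUM, just as visits 3 and 11 through 16 do for subject 208.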

27

ARRAY FUNCTIONS (3.10.3)

Three functions have been designed to work with arrays.

DIM: returns the dimension of the array

  data newchem(drop=i);
    set advrpt.lab_chemistry (drop=visit labdt);
    array chem {*} _numeric_;      /* the number of numeric variables determines the array dimension */
    do i=1 to dim(chem);           /* the DIM function returns the dimension */
      chem{i} = chem{i}/100;
    end;
  run;

Although it does in this example, the array index does not always range from 1 to the dimension.

29

ARRAY FUNCTIONS (3.10.3)

LBOUND and HBOUND: lower and upper array bounds

  data CloseHT;
    array heights {&lb:&hb} _temporary_;     /* array bounds unknown to the programmer */
    do until(done);                          /* the array HEIGHTS is loaded */
      set advrpt.demog(keep=subject ht) end=done;
      heights(subject)=ht;                   /* notice the use of ( ) instead of { }; don't use ( ) */
    end;
    done=0;
    do until(done);                          /* read the data again */
      set advrpt.demog(keep=subject ht) end=done;
      do Hsubj = lbound(heights) to hbound(heights);   /* step through the array one element at a time */
        closeHT = heights{hsubj};
        if (ht-1 le closeht le ht+1) & (subject ne hsubj) then output closeHT;
      end;
    end;
    stop;
  run;
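The three array functions can be compared on an array whose index does not start at 1. The bounds 201:205 and the height values below are invented for this sketch.

```sas
data _null_;
  /* Hypothetical subject numbers 201-205 serve directly as the array index. */
  array heights {201:205} _temporary_ (64 71 68 70 66);
  n  = dim(heights);      /* number of elements */
  lo = lbound(heights);   /* lowest index value */
  hi = hbound(heights);   /* highest index value */
  put n= lo= hi=;         /* n=5 lo=201 hi=205 */
  do subj = lbound(heights) to hbound(heights);
    ht = heights{subj};
    put subj= ht=;
  end;
run;
```

A loop written as `do subj = 1 to dim(heights)` would fail here with an out-of-range subscript, which is why LBOUND and HBOUND are the safer choice when the bounds are not known in advance.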

32

WORKING ACROSS OBSERVATIONS (3.1)

Because SAS reads one observation at a time into the PDV, it is difficult to ‘remember’ the values from an earlier observation (look-back) or to anticipate the values of a future observation (look-ahead). Without doing something extra, only the current observation is available for use.

33

Processing Within Groups (3.1.1)

The problems inherent with single observation processing are especially apparent when we need to work with our data in groups. The BY statement can be used to define groups, but the detection and handling of group boundaries is still an issue. Fortunately there is more than one approach to this type of processing.

34

BY Group Processing; First. and Last. (3.1.1)

Count clinics and patient visits within each region.

  data counter(keep=region clincnt patcnt);
    set regions(keep=region clinnum);
    by region clinnum;
    if first.region then do;             /* initialize the counters */
      clincnt=0;
      patcnt=0;
    end;
    if first.clinnum then clincnt + 1;   /* count items of interest (increment counters) */
    patcnt+1;
    if last.region then output;          /* write totals */
  run;

Because FIRST.CLINNUM takes on the values 0 and 1, the IF statement that increments CLINCNT can be replaced with:

  clincnt + first.clinnum;
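A self-contained sketch of the counting step, using a hypothetical six-row REGIONS data set that is already sorted by REGION and CLINNUM, and the `clincnt + first.clinnum` form of the counter.

```sas
/* Hypothetical data: region 1 has clinics A and B, region 2 has clinic C. */
data regions;
  input region $ clinnum $;
  datalines;
1 A
1 A
1 B
2 C
2 C
2 C
;
run;

data counter(keep=region clincnt patcnt);
  set regions;
  by region clinnum;
  if first.region then do;
    clincnt = 0;
    patcnt  = 0;
  end;
  clincnt + first.clinnum;   /* adds 1 only on the first row of each clinic */
  patcnt + 1;                /* every row is one patient visit */
  if last.region then output;
run;
```

With these rows COUNTER has two observations: region 1 with CLINCNT=2 and PATCNT=3, and region 2 with CLINCNT=1 and PATCNT=3.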

36

Transposing to Temporary Arrays (3.1.2)

Used to handle more than one observation at a time.

  data labvisits(keep=subject count meanlength);
    set advrpt.lab_chemistry;
    by subject;
    array Vdate {16} _temporary_;        /* temporary array to hold values of interest */
    retain totaldays count 0;
    if first.subject then do;            /* initialize the counters */
      totaldays=0;
      count = 0;
      do i = 1 to 16;
        vdate{i}=.;
      end;
    end;
    vdate{visit} = labdt;                /* load the array */
    if last.subject then do;
      do i = 1 to 15;                    /* process across values */
        between = vdate{i+1}-vdate{i};
        if between ne . then do;
          totaldays = totaldays+between;
          count = count+1;
        end;
      end;
      meanlength = totaldays/count;      /* write the mean for this subject */
      output;
    end;
  run;

38

Transposing to Arrays (3.1.2)

Compare two ways to clear an array of values.

  do i = 1 to 16;
    vdate{i}=.;
  end;

  call missing(of vdate{*});

The CALL MISSING routine assigns missing values to its arguments.

39

Building a FIFO Stack (3.1.7)

When processing across a series of observations for the calculation of statistics, such as running averages, a stack can be helpful. A stack is a collection of values that have automatic entrance and exit rules. Values tend to rotate through a stack.

Stacks come in two basic flavors: First-In-First-Out (FIFO) and Last-In-First-Out (LIFO). In a FIFO stack the oldest value in the stack is removed to make room for the newest value.

40

Building a FIFO Stack (3.1.7)

A three-day moving average of potassium levels is to be calculated for each subject.

  data Average(keep=subject visit labdt potassium Avg3day);
    set labdates;
    by subject;
    * dimension of array is number of
    * items to be averaged;
    retain visitcnt .;
    array stack {0:2} _temporary_;       /* index the array starting at 0 */
    if first.subject then do;
      call missing(of stack{*});         /* clear the stack for each subject */
      visitcnt=0;
    end;
    visitcnt+1;
    index = mod(visitcnt,3);             /* the MOD function determines the array INDEX; */
                                         /* 3 is the number of items in the stack        */
    stack{index} = potassium;
    avg3day = mean(of stack{*});
  run;
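The rotation through the stack is easier to see with a handful of rows. This sketch assumes a hypothetical LABDATES data set with made-up potassium values; AVG3 is the moving average over the last three visits (fewer at the start of each subject).

```sas
/* Hypothetical data, sorted by SUBJECT and VISIT. */
data labdates;
  input subject visit potassium;
  datalines;
1 1 4.0
1 2 5.0
1 3 6.0
1 4 7.0
2 1 3.0
2 2 5.0
;
run;

data average(keep=subject visit potassium avg3);
  set labdates;
  by subject;
  array stack {0:2} _temporary_;          /* three slots, indexed 0 to 2 */
  if first.subject then do;
    call missing(of stack{*});            /* empty the stack for a new subject */
    visitcnt = 0;
  end;
  visitcnt + 1;
  stack{mod(visitcnt, 3)} = potassium;    /* overwrite the oldest slot */
  avg3 = mean(of stack{*});               /* MEAN ignores still-empty slots */
run;
```

For subject 1 the averages are 4.0, 4.5, 5.0, and 6.0: on the fourth visit the value 7.0 lands in slot 1 and replaces 4.0, the oldest entry. For subject 2 they are 3.0 and 4.0.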

42

Using SET Statement Options (3.8)

Although a majority of DATA steps use the SET statement, few programmers take advantage of its full potential. The SET statement has options that can be used to control how the data are to be read.

END=        used to detect the last observation from the incoming data set(s)
KEY=        specifies an index to be used when reading
INDSNAME=   used to identify the current data source
NOBS=       number of observations
OPEN=       determines when to open a data set
POINT=      designates the next observation to read
UNIQUE      used with KEY= to read from the top of the index

43

Using POINT= and NOBS= (3.8.1)

The SET statement by default reads one observation after another, first observation to last. The POINT= option makes it possible to perform a non-sequential read.

The POINT= option identifies a temporary variable that indicates the number of the next observation to read. The NOBS= option also identifies a temporary variable, which after DATA step compilation will hold the number of observations on the incoming data set.

44

Using POINT= and NOBS= (3.8.1)

Randomly read a subset of observations from &DSN.

  %macro rand_wo(dsn=,pcnt=0);
  data rand_wo(drop=cnt totl);
    totl = ceil(&pcnt*obscnt);              /* total observations to read */
    array obsno {10000} _temporary_;
    do until(cnt = totl);
      point = ceil(ranuni(0)*obscnt);       /* randomly select the next observation to read */
      if obsno{point} ne 1 then do;
        set &dsn point=point nobs=obscnt;   /* the value of POINT is the next observation to read */
        output;
        obsno{point}=1;                     /* flag this observation, it has now been read */
        cnt+1;
      end;
    end;
    stop;
  run;
  %mend rand_wo;
  %rand_wo(dsn=advrpt.demog,pcnt=.3)
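A deterministic variation makes the mechanics of POINT= and NOBS= easier to follow than the random read. The sketch below reads every third observation from a hypothetical ten-row data set BIG; PTR and OBSCNT are illustrative names.

```sas
data big;
  do id = 1 to 10;
    output;
  end;
run;

data every_third;
  /* OBSCNT is filled in at compile time by NOBS=, so it can be used
     before the SET statement executes. */
  do ptr = 1 to obscnt by 3;
    set big point=ptr nobs=obscnt;
    output;
  end;
  stop;   /* a POINT= read never reaches end of file, so stop explicitly */
run;
```

EVERY_THIRD contains the observations with ID 1, 4, 7, and 10. Neither PTR nor OBSCNT is written to the output data set, because both are temporary variables.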

45

Using the DOW Loop (3.9.1)

The DOW loop, which is also known as the DO-W loop, was named for Ian Whitlock, who popularized the technique and first demonstrated its efficiencies.

The DATA step has an implied loop. The loop is executed once for each incoming observation.

  data implied;
    set big;
    output implied;
  run;

In the DOW loop the implicit loop is replaced with an explicit one (here a DO UNTIL).

  data dowloop;
    do until(eof);
      set big end=eof;
      output dowloop;
    end;
    stop;              /* terminate the step with a STOP */
  run;

46

Using the DOW Loop (3.9.1)

Merge a single observation summary data set onto the analysis data. Calculate percent change.

  proc summary data=advrpt.demog;
    var wt;
    output out=means mean=/autoname;
  run;

  data Diff1;
    if _n_=1 then set means(keep=wt_mean);   /* use the IF to control reading the summary data set */
    set advrpt.demog(keep=lname fname wt);
    diff = (wt-wt_mean)/wt_mean;
  run;

47

Using the DOW Loop (3.9.1)

Merge a single observation summary data set onto the analysis data. Calculate percent change, using a DOW loop to merge.

  data Diff2;
    set means(keep=wt_mean);                 /* the IF is not needed */
    do until(eof);
      set advrpt.demog(keep=lname fname wt) end=eof;
      diff = (wt-wt_mean)/wt_mean;
      output diff2;
    end;
    stop;                                    /* terminate the DATA step */
  run;
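The DOW-loop merge can be run without the ADVRPT library by substituting a small stand-in. The DEMOG rows and the output name PCTDIFF below are hypothetical.

```sas
/* Hypothetical stand-in for the analysis data. */
data demog;
  input name $ wt;
  datalines;
Ann 100
Bob 200
Cy 300
;
run;

proc summary data=demog;
  var wt;
  output out=means mean=/autoname;   /* creates WT_MEAN */
run;

data pctdiff;
  set means(keep=wt_mean);           /* executes once, before the loop */
  do until(eof);
    set demog end=eof;
    diff = (wt - wt_mean) / wt_mean;
    output;
  end;
  stop;
run;
```

WT_MEAN is 200 here, so DIFF is -0.5, 0, and 0.5 for the three rows. The summary value stays on the PDV for the whole loop because the step never returns to the top of the implied DATA step loop.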

48

Key Indexing: A Simple Hash (6.7.2)

Use an array to hold values. Neither data set needs to be sorted.

  data clinnames(keep=subject lname fname clinnum clinname);
    array chkname {999999} $35 _temporary_;
    do until(allnames);                        /* read and store the names, indexed by clinic number */
      set advrpt.clinicnames end=allnames;
      chkname{input(clinnum,6.)}=clinname;
    end;
    do until(alldemog);                        /* read a number and retrieve its corresponding name from the array */
      set advrpt.demog(keep=subject lname fname clinnum) end=alldemog;
      clinname = chkname{input(clinnum,6.)};
      output clinnames;
    end;
    stop;
  run;

For comparison, the equivalent MERGE, which requires both data sets to be sorted by CLINNUM:

  data clinnames;
    merge demog clinicnames;
    by clinnum;
  run;
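A scaled-down sketch of key indexing. It assumes hypothetical data sets in which CLINNUM is numeric and below 100, so no INPUT function is needed and the array can be small; neither data set is sorted by CLINNUM.

```sas
/* Hypothetical lookup table and primary data. */
data clinicnames;
  input clinnum clinname $;
  datalines;
31 Uptown
12 Harbor
;
run;

data demog;
  input subject clinnum;
  datalines;
101 12
102 31
103 12
;
run;

data clinnames(keep=subject clinnum clinname);
  array chkname {99} $8 _temporary_;   /* one slot per possible clinic number */
  do until(allnames);
    set clinicnames end=allnames;
    chkname{clinnum} = clinname;       /* the key is the array index */
  end;
  do until(alldemog);
    set demog end=alldemog;
    clinname = chkname{clinnum};       /* direct lookup, no search */
    output;
  end;
  stop;
run;
```

Subjects 101 and 103 pick up Harbor and subject 102 picks up Uptown. The trade-off is memory: the array needs one slot for every possible key value, whether or not that key occurs in the data.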

49

Use a Hash Object (6.7.2)

Hash objects also avoid the need to sort.

  data hashnames(keep=subject clinnum clinname lname fname);
    if 0 then set advrpt.clinicnames;
    /* load the names in the LOOKUP hash object, indexed by clinic number */
    declare hash lookup(dataset: 'advrpt.clinicnames', hashexp: 8);
    lookup.defineKey('clinnum');
    lookup.defineData('clinname');
    lookup.defineDone();
    * Read the primary data;
    do until(done);
      set advrpt.demog(keep=subject clinnum lname fname) end=done;
      if lookup.find() = 0 then output hashnames;   /* for each number retrieve the name */
    end;
    stop;
  run;

For comparison, the equivalent MERGE:

  data hashnames;
    merge demog clinicnames;
    by clinnum;
  run;
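The hash lookup can also be tried on small hypothetical data sets. In this sketch subject 103 refers to a clinic number that is not in the lookup table, which shows the effect of testing the return code of the FIND method.

```sas
/* Hypothetical lookup table and primary data. */
data clinicnames;
  input clinnum clinname $;
  datalines;
31 Uptown
12 Harbor
;
run;

data demog;
  input subject clinnum;
  datalines;
101 12
102 31
103 77
;
run;

data hashnames(keep=subject clinnum clinname);
  if 0 then set clinicnames;                   /* puts CLINNAME on the PDV without reading */
  declare hash lookup(dataset: 'clinicnames');
  lookup.defineKey('clinnum');
  lookup.defineData('clinname');
  lookup.defineDone();
  do until(done);
    set demog end=done;
    if lookup.find() = 0 then output;          /* 0 means the key was found */
  end;
  stop;
run;
```

HASHNAMES holds two observations, for subjects 101 and 102. Subject 103 is dropped because clinic 77 has no entry; writing the observation unconditionally would instead keep the row with whatever CLINNAME was last retrieved, so the return code check matters.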

Many of the examples in this talk are based on material found in the new SAS Press book Carpenter's Guide to Innovative SAS® Techniques by Art Carpenter and are used with permission of the author.

50


Q & A

53


Additional Resources

Carpenter's Guide to Innovative SAS Techniques

Upcoming Live Webinars
• April 25: Live at SAS Global Forum: Building Better Business Intelligence with SAS®
• May 10: Social Networks in Data Mining: Challenges and Applications

Upcoming Live Events
• 2012 SAS Global Forum, livestream of the conference: April 22 – 25, 2012

SAS Talks on support.sas.com

Follow along on Twitter using #sastalks
