
The Power of the BY Statement SVSUG 2009.06.25 Paul Choate, California Developmental Services (& Toby Dunn, U.S. Army Medical Department Center & School)


Page 1

The Power of the BY Statement

SVSUG 2009.06.25

Paul Choate, California Developmental Services

(& Toby Dunn, U.S. Army Medical Department Center & School)

Page 2

BY Statement Syntax and Usage

The BY statement is used in SAS to instruct the DATA step or procedures to process dataset observations in groups, rather than singly. It can be used whenever SAS data is ordered, or can be accessed in order through a SAS dataset index.

In the DATA step this allows observations to be summarized or reorganized according to a group structure. In PROC steps it allows SAS to process and present data in groups.

The basic syntax of the BY statement is the same throughout SAS, with the exception that the GROUPFORMAT option is only available in the DATA step.

BY <DESCENDING> var1 <...<DESCENDING> varn> <NOTSORTED> <GROUPFORMAT>;
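As a minimal sketch of the syntax above (not the presenters' own example), the following assumes the SASHELP.CLASS sample dataset is available; the WORK.CLASS output name is chosen only for illustration. The data are sorted first, then read in BY groups.

```sas
/* Sort a copy of the sample data; DESCENDING applies only to Age */
proc sort data=sashelp.class out=work.class;
  by Sex descending Age Name;
run;

/* The DATA step BY statement must match the sort order */
data _null_;
  set work.class;
  by Sex descending Age Name;
  /* Write a log line at the start of each Sex group */
  if first.Sex then put 'Start of group: ' Sex=;
run;
```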

Page 3

BY Statement Syntax and Usage

BY Sex Age Name;

NAME     SEX  AGE  HEIGHT  WEIGHT
Alice    F    13   56.5    84
Barbara  F    13   65.3    98
Carol    F    14   62.8    102.5
Judy     F    14   64.3    90
Jeffrey  M    13   62.5    84
Alfred   M    14   69      112.5
Ronald   M    15   67      133
Philip   M    16   72      150

Page 4

BY Statement Syntax and Usage

Sort order is platform dependent and is based on the internal ordering of the platform character set, called the collating sequence.

ASCII (PC) character set order:

..., 1, 2, 3, ... A, B, C, ... a, b, c ...

EBCDIC (MVS) character set order:

..., a, b, c, ... A, B, C, ... 1, 2, 3, ...

Page 5

BY Statement Syntax and Usage

BY Sex DESCENDING Age Name;

NAME     SEX  AGE  HEIGHT  WEIGHT
Janet    F    15   62.5    112.5
Carol    F    14   62.8    102.5
Alice    F    13   56.5    84
Barbara  F    13   65.3    98
Philip   M    16   72      150
Alfred   M    14   69      112.5
Henry    M    14   63.5    102.5

Page 6

BY Statement Syntax and Usage

BY Age NOTSORTED;

NAME     AGE  HEIGHT  WEIGHT
Carol    14   62.8    102.5
Judy     14   64.3    90
Janet    15   62.5    112.5
Ronald   15   67      133
Mary     15   66.5    112
Alice    13   56.5    84
Jeffrey  13   62.5    84
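A small sketch of NOTSORTED using the rows shown on this slide: the observations are grouped by Age but the groups are not in ascending order. The dataset names WORK.GROUPED and WORK.GROUP_COUNTS are illustrative assumptions.

```sas
/* Rows are grouped by Age (14, 15, 13) but not sorted by it */
data work.grouped;
  input Name $ Age Height Weight;
  datalines;
Carol 14 62.8 102.5
Judy 14 64.3 90
Janet 15 62.5 112.5
Ronald 15 67 133
Mary 15 66.5 112
Alice 13 56.5 84
Jeffrey 13 62.5 84
;
run;

/* NOTSORTED treats each run of equal Age values as one BY group */
data work.group_counts;
  set work.grouped;
  by Age notsorted;
  if first.Age then N = 0;
  N + 1;                      /* sum statement: N is retained */
  if last.Age then output;    /* one row per run of equal Age */
  keep Age N;
run;
```

Without NOTSORTED, the DATA step would stop with an error when Age drops from 15 to 13.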

Page 7

BY Statement Syntax and Usage

PROC FORMAT;
  VALUE $Initials 'A'-<'B'='A'
                  'B'-<'C'='B'
                  ...;
RUN;

DATA Class;
  SET Class;
  FORMAT Name $Initials.;
  BY Name GROUPFORMAT;
RUN;

Page 8

BY Statement Syntax and Usage

GROUPFORMAT (cont.)

NAME     AGE  HEIGHT  WEIGHT
Alice    13   56.5    84
Alfred   14   69      112.5
Judy     14   64.3    90
Janet    15   62.5    112.5
Jeffrey  13   62.5    84
William  15   66.5    112
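The following sketch shows one way GROUPFORMAT can drive FIRST. and LAST. processing on formatted values. To stay self-contained it swaps in the built-in $1. format (first character only) for the $Initials. format, and it assumes SASHELP.CLASS is available; WORK.INITIAL_COUNTS is an illustrative name.

```sas
/* Sorting by Name also groups the data by first initial */
proc sort data=sashelp.class out=work.class;
  by Name;
run;

/* GROUPFORMAT makes BY groups follow the formatted value of Name */
data work.initial_counts;
  set work.class;
  format Name $1.;            /* assumption: stand-in for $Initials. */
  by Name groupformat;
  if first.Name then Count = 0;
  Count + 1;
  if last.Name then output;   /* one row per initial */
  keep Name Count;
run;
```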

Page 9

Introduction to Data Structure

Variables may be divided into two classes:

• primary key variables, whose values may be combined uniquely to identify one observation or event, and

• non-primary keys, whose values cannot be combined to uniquely identify an observation.

The primary and non-primary keys are all related to each other in some fashion known as functional dependencies.

Primary keys must be unique or form unique combinations called composite keys.

Page 10

Introduction to Data Structure

The most fundamental rule is that no two rows shall have the same values for all primary key variables.

VEHICLETYPE  MODEL  MAKE   YEAR  COLOR
Truck        1500   Chevy  2008  Blue
Truck        1500   Chevy  2008  Blue

should be reduced to:

VEHICLETYPE  MODEL  MAKE   YEAR  COLOR  COUNT
Truck        1500   Chevy  2008  Blue   2

Page 11

Introduction to Data Structure

Each variable in the dataset should have atomic values.

VEHICLE  MODEL  YEAR  COLOR  NUMSOLD  PACKAGE
Truck    1500   2008  Blue   2        Sports, Standard
Truck    1500   2008  Gold   3        Sports, Sports, Standard

should be restructured as:

VEHICLE  MODEL  YEAR  COLOR  NUMSOLD  PACKAGE
Truck    1500   2008  Blue   1        Sports
Truck    1500   2008  Blue   1        Standard
Truck    1500   2008  Gold   2        Sports
Truck    1500   2008  Gold   1        Standard

This is called First Normal Form with Redundancies.

Page 12

BY Statement in the Data Step

The BY statement provides two automatic temporary variables for each BY variable: FIRST.variable and LAST.variable.

They indicate whether an observation is:

• the first in a BY group
• the last in a BY group
• neither the first nor the last in a BY group
• both first and last, as is the case when there is only one observation in a BY group.

Page 13

BY Statement in the Data Step

BY Sex Age;

SEX  AGE  FIRST.SEX  LAST.SEX  FIRST.AGE  LAST.AGE
F    13   1          0         1          0
F    13   0          0         0          1
F    15   0          0         1          0
F    15   0          1         0          1
M    13   1          0         1          1
M    14   0          0         1          0
M    14   0          0         0          1
M    16   0          1         1          1
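FIRST. and LAST. variables are temporary and are not written to the output dataset, so a table like the one on this slide has to be built by copying them into ordinary variables. A minimal sketch, assuming SASHELP.CLASS is available; the FirstSex, LastSex, FirstAge, and LastAge names are illustrative.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex Age;
run;

data work.flags;
  set work.class;
  by Sex Age;
  /* Copy the automatic variables so they survive in the output */
  FirstSex = first.Sex;
  LastSex  = last.Sex;
  FirstAge = first.Age;
  LastAge  = last.Age;
  keep Sex Age FirstSex LastSex FirstAge LastAge;
run;

proc print data=work.flags noobs;
run;
```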

Page 14

BY Statement in the Data Step

Sorted variables with unique values have all FIRST.variable and LAST.variable values set to 1.

Here Age is unique within Sex:

BY Sex Age;

SEX  AGE  FIRST.SEX  LAST.SEX  FIRST.AGE  LAST.AGE
F    13   1          0         1          1
F    15   0          1         1          1
M    13   1          0         1          1
M    14   0          0         1          1
M    16   0          1         1          1

Page 15

BY Statement in the Data Step

Examples:

• Unduplication example

• Counting records example
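The slide names the two examples without showing code. One possible sketch of each (not the presenters' original programs) is given here, assuming SASHELP.CLASS is available; the output dataset names are illustrative.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex Age;
run;

/* Unduplication: keep the first observation of each Sex*Age group */
data work.undup;
  set work.class;
  by Sex Age;
  if first.Age;
run;

/* Counting: one row per Sex*Age group with its record count */
data work.counts;
  set work.class;
  by Sex Age;
  if first.Age then Count = 0;   /* reset at the start of each group */
  Count + 1;                     /* sum statement retains Count */
  if last.Age then output;
  keep Sex Age Count;
run;
```

FIRST.Age is used rather than FIRST.Sex because FIRST.Age also turns on whenever Sex changes, so it marks the start of each Sex and Age combination.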

Page 16

Combining Datasets

In the DATA step the BY statement is used for combining data with:

• Interleaving with the SET statement
• Match-merging with the MERGE statement
• Updating with the UPDATE statement

and

• Modifying (beyond the scope of this presentation)

Page 17

Interleaving Datasets

When a BY statement is used with a SET statement that specifies two or more datasets, the DATA step reads the files simultaneously, alternating between the files based on the BY variable order. This maintains the sort order of the data from the datasets as they are processed.

For example, suppose there are two datasets, one for males and one for females, and both are sorted on Age. They can be interleaved into a single dataset sorted on Age and Gender.

Page 18

Interleaving Datasets

Example:

• SET statement interleaving example
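A sketch of the males and females interleave described earlier, assuming SASHELP.CLASS is available. The split into WORK.MALES and WORK.FEMALES is done here only to create two inputs for the illustration.

```sas
/* Build two datasets, one per sex */
data work.males work.females;
  set sashelp.class;
  if Sex = 'M' then output work.males;
  else output work.females;
run;

proc sort data=work.males;   by Age; run;
proc sort data=work.females; by Age; run;

/* SET with BY interleaves: within each Age, all rows from
   work.females are read before the rows from work.males */
data work.interleaved;
  set work.females work.males;
  by Age;
run;
```

Because the datasets are listed as females then males, the result is ordered by Age and, within Age, by Sex.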

Page 19

Interleaving Datasets

With interleaving, the sum of a variable for each by-group may be attached back to the original non-aggregated dataset.

This requires at least two passes of the data, but the efficiency and complexity may vary considerably based on the approach.
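One way to do this is to interleave a dataset with itself, so each BY group is read twice: once to accumulate the sum and once to write the detail rows. This is a sketch under the assumption that SASHELP.CLASS is available; FirstPass, SecondPass, and TotalWeight are illustrative names, and this is not necessarily the approach demonstrated in the talk.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex;
run;

data work.with_total;
  /* The same dataset is listed twice; within each Sex group all rows
     of the first copy are read before any row of the second copy */
  set work.class (in=FirstPass)
      work.class (in=SecondPass);
  by Sex;
  if first.Sex then TotalWeight = 0;
  if FirstPass then TotalWeight + Weight;   /* pass 1: accumulate */
  if SecondPass then output;                /* pass 2: write detail rows */
run;
```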

Page 20

Interleaving Datasets

Example:

• Howard Schreier look-ahead processing

Page 21

Wookie One-Liners

You?! It was your idea for Jar Jar?!

And Lando never suggested a flea bath again.

"I just need one head to finish my C3PO"

All right, all right. I promise: No more Colt-45 commercials!

What do you mean, "We're OUT of shampoo??!!!!"

Page 22

Match-Merging Datasets

When a BY statement is used with a MERGE statement, the SAS datasets are read simultaneously, merging observations based on matching BY variables.

When merging multiple datasets, usually at least all but one of the datasets should be unique on the BY variables.

The combined unique observations are merged with each matching observation in the non-unique dataset. The unique observations are duplicated across the non-unique observations.

Page 23

Match-Merging Datasets

Example:

• MERGE statement
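A minimal match-merge sketch, assuming SASHELP.CLASS is available. The lookup table WORK.SEX_LABELS and its SexLabel variable are hypothetical and exist only for this illustration; the lookup is unique on the BY variable while the class data are not.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex;
run;

/* Hypothetical lookup table, unique on Sex and already in Sex order */
data work.sex_labels;
  length Sex $1 SexLabel $6;   /* match the length of Sex in the class data */
  input Sex $ SexLabel $;
  datalines;
F Female
M Male
;
run;

/* Each lookup row is repeated across all matching class rows */
data work.merged;
  merge work.class (in=InClass) work.sex_labels;
  by Sex;
  if InClass;                  /* keep only rows present in the class data */
run;
```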

Page 24

Updating Datasets

The UPDATE statement only allows two datasets, a master dataset and a transaction dataset. The master dataset is specified first and the transaction dataset second, followed by a BY statement.

As with MERGE, the two datasets are read simultaneously, updating observations from the master dataset with observations from the transaction dataset based on the lowest level groupings of the BY variables.

When a transaction variable has a missing value, by default UPDATE does not overwrite the value in the master dataset, whereas the MERGE statement does.

Page 25

Updating Datasets

Examples:

• Updating prices in an inventory

• Flattening a dataset
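A sketch of the inventory price update with made-up data; the dataset names, items, and values are all hypothetical. It illustrates the default treatment of missing transaction values.

```sas
/* Hypothetical master dataset, in Item order */
data work.inventory;
  input Item $ Price Qty;
  datalines;
Bolt 0.25 100
Nut 0.10 250
Washer 0.05 500
;
run;

/* Hypothetical transaction dataset; a period is a missing value */
data work.price_changes;
  input Item $ Price Qty;
  datalines;
Bolt 0.30 .
Washer . 450
;
run;

/* Master first, transaction second, then the BY statement */
data work.inventory_new;
  update work.inventory work.price_changes;
  by Item;
run;
```

In the result, Bolt keeps its quantity of 100 and takes the new price of 0.30, Nut is unchanged, and Washer keeps its price of 0.05 and takes the new quantity of 450, because missing transaction values do not overwrite the master.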

Page 26

Do-Loop of Whitlock (DoW)

The SET statement may be wrapped inside a DO UNTIL loop with the BY statement controlling the loop.

DATA ...;
  <Stuff done before break-event>;
  DO <Index Specs> UNTIL (<Break-Event>);
    SET ...;
    BY ...;
    <Stuff done for each record>;
  END;
  <Stuff done after break-event...>;
RUN;

Page 27

Do-Loop of Whitlock (DoW)

The DoW works with the natural execution of the DATA step by isolating what happens between two consecutive break events.

Statements and functions are placed within the loop, and the implicit action of the DATA step resets calculated values to missing after each BY group.

In our example the break events are BY groups, but in other cases they could be anything that triggers the DO loop to stop.

Page 28

Do-Loop of Whitlock (DoW)

Examples:

• Standard DATA step

• Whitlock/Dorfman DoW

• Sequential DoWs
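A minimal DoW sketch, assuming SASHELP.CLASS is available; the names N, TotalWeight, and MeanWeight are illustrative. Each iteration of the implicit DATA step loop reads one whole BY group, so no RETAIN or explicit reset is needed.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex;
run;

data work.sex_stats;
  /* N counts the records read; the loop ends on the last row of a group */
  do N = 1 by 1 until (last.Sex);
    set work.class;
    by Sex;
    /* TotalWeight starts each group as missing; SUM() ignores missing */
    TotalWeight = sum(TotalWeight, Weight);
  end;
  /* Statements after the loop run once per BY group */
  MeanWeight = TotalWeight / N;
  keep Sex N TotalWeight MeanWeight;
run;                           /* implicit OUTPUT: one row per Sex */
```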

Page 29

The BY Statement in SAS Procedures

Nearly all SAS PROCs that process datasets allow for the BY statement.

The syntax is the same as in the DATA step, except for the GROUPFORMAT option, which is only available in the DATA step.

Procedures that produce printed output, such as PROC PRINT, format printed output into BY groups.

Procedures that summarize datasets, like PROC FREQ or PROC SUMMARY, process the data in groups, sometimes as an alternative to other statements such as TABLES or CLASS.

Page 30

The PRINT Procedure

PROC PRINT writes dataset values in columnar table form with the variable names or labels at the top of each column.

The BY statement, and the related PAGEBY and SUMBY statements, can be used with PROC PRINT.

Page 31

The PRINT Procedure

Examples:

• BY statement

• BY statement with ID statement

• PAGEBY statement

• SUMBY statement
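A sketch of how these statements might be combined, assuming SASHELP.CLASS is available; it is not the presenters' original example.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex Age;
run;

/* BY with ID: naming the same variable in both statements replaces
   the BY line with a leading identifier column */
proc print data=work.class;
  by Sex;
  id Sex;
  var Name Age Height Weight;
run;

/* PAGEBY starts a new page, and SUMBY prints subtotals,
   whenever the named BY variable changes */
proc print data=work.class;
  by Sex Age;
  pageby Sex;
  sumby Sex;
  sum Weight;
  var Name Height Weight;
run;
```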

Page 32

The FREQ Procedure

The FREQ procedure calculates frequencies and statistics on discrete variables. These can be printed or output.

Levels of a tabulation are requested with a TABLES statement, or for sorted variables with a BY statement.

PROC FREQ does not show rows or columns for missing categories of a variable in a BY group, but with the TABLES statement the row or column is zero filled.

The BY and TABLES statements produce different statistics for tabulation levels with missing categories.

Page 33

The FREQ Procedure

Examples:

• PROC FREQ with the TABLES statement

• PROC FREQ with the BY statement
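The two approaches can be compared with a sketch like the following, assuming SASHELP.CLASS is available.

```sas
proc sort data=sashelp.class out=work.class;
  by Sex;
run;

/* TABLES: one crosstabulation; an Age that occurs for only one Sex
   still gets a cell, filled with zero, for the other */
proc freq data=work.class;
  tables Sex*Age;
run;

/* BY: a separate one-way table per Sex; ages absent from a
   BY group are simply not listed */
proc freq data=work.class;
  by Sex;
  tables Age;
run;
```

Percentages also differ: in the BY version they are computed within each Sex group, while the crosstabulation reports percentages of the overall total alongside row and column percentages.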

Page 34

The SUMMARY or MEANS Procedure

In PROC SUMMARY and PROC MEANS the BY statement is an alternative to the CLASS statement.

All combinations of levels of CLASS variables are summarized. For three class variables A, B, and C, statistics are calculated for the overall data and all levels of A, B, C, A*B, A*C, B*C, and A*B*C.

Sorted variables may alternatively be specified in a BY statement, but only combinations including that variable will be summarized.

For example, if A is specified in the BY statement rather than the CLASS statement, then only statistics for A, A*B, A*C, and A*B*C are produced.

Page 35

The SUMMARY or MEANS Procedure

Examples:

• PROC SUMMARY with the CLASS statement

• PROC SUMMARY with the BY statement
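A sketch contrasting the two statements, assuming SASHELP.CLASS is available; the output dataset names are illustrative.

```sas
/* CLASS only: overall, Sex, Age, and Sex*Age summaries;
   no sorting is required */
proc summary data=sashelp.class;
  class Sex Age;
  var Height Weight;
  output out=work.by_class mean=;
run;

/* BY requires sorted (or indexed) input */
proc sort data=sashelp.class out=work.class;
  by Sex;
run;

/* Sex in BY, Age in CLASS: only the Sex and Sex*Age summaries */
proc summary data=work.class;
  by Sex;
  class Age;
  var Height Weight;
  output out=work.by_sex mean=;
run;
```

The automatic _TYPE_ variable in each output dataset identifies which combination of CLASS variables a row summarizes.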

Page 36

The SQL Procedure

PROC SQL performs actions similar both to the DATA step and to summarizing procedures such as SUMMARY, TABULATE, and UNIVARIATE.

PROC SQL has its own syntax, conforming to the SQL programming language.

The BY statement in PROC SQL is replaced by the GROUP BY clause.

In PROC SQL, if data are not sorted then the procedure will sort the data internally as needed by the GROUP BY clause.

Page 37

The SQL Procedure

Example:

• Aggregating grouped data with the PROC SQL GROUP BY clause
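A minimal sketch of grouped aggregation in PROC SQL, assuming SASHELP.CLASS is available; the table and column names created here are illustrative. No prior PROC SORT is needed.

```sas
proc sql;
  /* One output row per value of Sex */
  create table work.sex_summary as
  select Sex,
         count(*)     as N,
         mean(Weight) as MeanWeight,
         sum(Weight)  as TotalWeight
  from sashelp.class
  group by Sex;
quit;
```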

Page 38

Thanks to SVSUG Chair

Andrew Karp

Page 39

Contact Information

Your comments and questions are valued and encouraged.

Paul Choate, California Developmental Services
Phone: (916) 654-2160
E-mail: [email protected]