SAS: Introduction and Examples
Presented by the Economics Experimental Teaching Center, Business Data Mining Center (经济实验教学中心 商务数据挖掘中心)

Raw Data
→ Read in Data
→ Process Data (create new variables)
→ Output Data (create SAS dataset)
→ Analyze Data Using Statistical Procedures

Data Step: read in, process and output the data

PROCs: analyze the data using statistical procedures
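The flow above maps onto the two kinds of steps in a SAS program. A minimal sketch, assuming a made-up dataset `scores` with hypothetical variables `name`, `score` and `passed`, might look like this:

```sas
/* DATA step: read in, process and output the data.              */
/* The dataset scores and its variables are hypothetical.        */
DATA scores;
   INPUT name $ score;        /* read in: name is character, score is numeric */
   passed = (score >= 60);    /* process: new variable, 1 if true, 0 if false */
   DATALINES;
Ann 72
Bob 55
;
RUN;                          /* the finished dataset WORK.SCORES exists once the step ends */

/* PROC step: analyze the dataset that the DATA step created. */
PROC MEANS DATA=scores;
   VAR score;
RUN;
```

Here INFILE is omitted because, when the DATA step contains a DATALINES statement, INPUT reads from those lines by default.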

Structure of Data

• Made up of rows and columns

• Rows in SAS are called observations

• Columns in SAS are called variables

An observation is all the information for one entity (a patient, a patient visit, a clinical center, a county). SAS processes data one observation at a time.
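One way to watch the one-observation-at-a-time behavior is to write a log line on every pass through the DATA step. A sketch, with the dataset `loopdemo` and its variables as hypothetical names; `_N_` is SAS's automatic counter of passes through the step:

```sas
/* Every statement below runs once per observation.         */
/* loopdemo, id, x and doubled are hypothetical names.      */
DATA loopdemo;
   INPUT id x;
   doubled = 2 * x;                          /* computed for the current row only */
   PUT 'Processing ' _N_= id= x= doubled=;   /* writes one log line per observation */
   DATALINES;
1 10
2 20
3 30
;
RUN;
```

The log should show three "Processing" lines, one for each data line.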

Example of Data

12 observations and 5 variables

F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN

Example of Data

12 observations and 5 variables?

F23S15MN
F21S15WI
F22S09MN
F35M02MN
F22M13MN
F25S13WI
M20S13MN
M26M15WI
M27S05MN
M23S14IA
M21S14MN
M29M15MN

With no spaces between values, you need to know the starting and ending position of each variable.
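For packed data like this, column input lets the INPUT statement give each variable's starting and ending column. A sketch, assuming the layout is exactly as shown above (gender in column 1, age in columns 2-3, marital status in column 4, credits in columns 5-6, state in columns 7-8); the dataset name `demo_col` is hypothetical:

```sas
/* Column input: variable name, $ if character, then its column range. */
DATA demo_col;
   INPUT gender $ 1 age 2-3 marstat $ 4 credits 5-6 state $ 7-8;
   DATALINES;
F23S15MN
F21S15WI
F22S09MN
F35M02MN
F22M13MN
F25S13WI
M20S13MN
M26M15WI
M27S05MN
M23S14IA
M21S14MN
M29M15MN
;
RUN;
```

Because credits is numeric, a value such as 09 is stored as the number 9.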

Types of Data

• Numeric (e.g. age, blood pressure)

• Character (patient ID, diagnosis)

You need to tell SAS if a variable is character (in the INPUT statement, put a $ after its name). The default is numeric.
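A small sketch of the $ marker at work: it reads one character and one numeric variable, then lists the variable types with PROC CONTENTS. The dataset `types` and its variables are hypothetical names:

```sas
DATA types;
   INPUT id $ age;       /* id: character (marked with $); age: numeric (the default) */
   DATALINES;
A01 23
B02 35
;
RUN;

PROC CONTENTS DATA=types;   /* the Type column should show Char for id and Num for age */
RUN;
```

Without the $, SAS would try to read A01 as a number.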

Rules for SAS Statements and Variables

• SAS statements end with a semicolon (;)

• SAS statements can be entered in lowercase or uppercase

• Multiple SAS statements can appear on one line

• A SAS statement can span multiple lines

• Variable names can be 1 to 32 characters long, must begin with a letter (A-Z) or an underscore (_), and can otherwise contain only letters, digits and underscores
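The sketch below illustrates each rule; the dataset and variable names are hypothetical. A name such as 2024_score would be rejected because it starts with a digit.

```sas
/* Lowercase and uppercase keywords mixed; two statements on one line. */
data rules_demo; length student_name $ 12;
/* One INPUT statement spread over three lines; the semicolon ends it. */
INPUT _id
      student_name $
      score_2024;
DATALINES;
1 Ann 88
2 Bob 75
;
RUN;
```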

* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

DATA demo;
   INFILE DATALINES;
   INPUT gender $ age marstat $ credits state $ ;

   if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
   if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
   VAR gender age marstat credits fulltime state ;
RUN;

1 DATA demo; (create a SAS dataset called demo)

2 INFILE DATALINES; (where is the data?)

3 INPUT gender $ age marstat $ credits state $ ; (what are the variable names and types?)

4 if credits > 12 then fulltime = 'Y'; else fulltime = 'N';

5 if state = 'MN' then resid = 'Y'; else resid = 'N';

Statements 4 and 5 create 2 new variables, fulltime and resid.
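Because > is a strict comparison, a student with exactly 12 credits gets fulltime = 'N'. A sketch that applies statements 4 and 5 to three hypothetical rows and prints both new variables (the example's PROC PRINT lists fulltime but not resid); the dataset name `flags` is hypothetical:

```sas
DATA flags;
   INPUT credits state $;
   if credits > 12 then fulltime = 'Y'; else fulltime = 'N';   /* statement 4 */
   if state = 'MN' then resid = 'Y'; else resid = 'N';         /* statement 5 */
   DATALINES;
15 MN
12 WI
13 IA
;
RUN;

PROC PRINT DATA=flags;
   VAR credits fulltime state resid;
RUN;
```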

6 DATALINES; (tells SAS the data is coming)
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
; (tells SAS the data is ending)

7 RUN; (tells SAS to run the statements above)
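INFILE DATALINES says the data is inside the program. When the raw data lives in a separate text file, INFILE takes the file's path instead. A self-contained sketch, assuming a hypothetical file name students.txt in SAS's current working directory (which must be writable); the first step only writes the file so that the second step has something to read, and `demo_file` is a hypothetical dataset name:

```sas
/* Step 1: write a small raw data file (only so the sketch runs on its own). */
DATA _NULL_;
   FILE 'students.txt';
   PUT 'F 23 S 15 MN';
   PUT 'M 20 S 13 MN';
RUN;

/* Step 2: read it back; INFILE names the file instead of DATALINES. */
DATA demo_file;
   INFILE 'students.txt';
   INPUT gender $ age marstat $ credits state $ ;
RUN;
```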

Main SAS Windows (PC)

• Editor Window – where you type your program

• Log Window – lists the program statements processed, with notes, warnings and errors.

Always look at the log window! It tells you how SAS understood your program.

• Output Window – gives the output generated by the PROCs

Submit the program by clicking the Run icon.

PC SAS windows (screenshot; the output window is hidden)

Main SAS Files

• Program file – type your program in a text editor (fname.sas)

• Log file – lists the program statements processed, with notes, warnings and errors (fname.log)

• Output file – gives the output generated by the PROCs (fname.lst)

Submit the program from the command line by typing: sas fname.sas

Messages in SAS Log

• Notes – messages that may or may not be important

• Warnings – messages that are usually important

• Errors – fatal, in that the program will abort

(notes and warnings will not abort your program)
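A NOTE does not stop the program even when it flags a real data problem. In this sketch (the dataset `notes_demo` and its variables are hypothetical), x cannot be read as a number, so SAS writes an "Invalid data" note to the log, stores a missing value for age, and carries on:

```sas
DATA notes_demo;
   INPUT name $ age;
   DATALINES;
Ann 23
Bob x
;
RUN;

PROC PRINT DATA=notes_demo;   /* Bob's age prints as a period, SAS's missing value */
RUN;
```

This is one more reason to read the log: the step "succeeds", but the data is not what you expected.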

LOG WINDOW (or file)

NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site 0009012001.
NOTE: This session is executing on the WIN_NT platform.

NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

1    * This is a short example program to demonstrate what a
2    SAS program looks like. This is a comment statement because
3    it begins with a * and ends with a semi-colon ;
4
5    DATA demo;
6    INFILE DATALINES;
7    INPUT gender $ age marstat $ credits state $ ;
8
9    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
10   if state = 'MN' then resid = 'Y'; else resid = 'N';
11   DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds

25   RUN;
26   TITLE 'Running the Example Program';
27   PROC PRINT DATA=demo ;
28   VAR gender age marstat credits fulltime state ;
29   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

30   PROC MEANS DATA=demo N SUM MEAN;
31   VAR age credits ;
32   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

33   PROC FREQ DATA=demo; TABLES gender;
34   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds

OUTPUT WINDOW (OR LST FILE)

Running the Example Program

 Obs    gender    age    marstat    credits    fulltime    state
   1      F        23       S          15          Y         MN
   2      F        21       S          15          Y         WI
   3      F        22       S           9          N         MN
   4      F        35       M           2          N         MN
   5      F        22       M          13          Y         MN
   6      F        25       S          13          Y         WI
   7      M        20       S          13          Y         MN
   8      M        26       M          15          Y         WI
   9      M        27       S           5          N         MN
  10      M        23       S          14          Y         IA
  11      M        21       S          14          Y         MN
  12      M        29       M          15          Y         MN

The MEANS Procedure

Variable     N             Sum            Mean
----------------------------------------------
age         12     294.0000000      24.5000000
credits     12     143.0000000      11.9166667
----------------------------------------------

The FREQ Procedure

                                   Cumulative    Cumulative
gender    Frequency     Percent     Frequency       Percent
-----------------------------------------------------------
F                 6       50.00             6         50.00
M                 6       50.00            12         100.0

Some common procedures

• PROC PRINT – print out your data (always a good idea!)

• PROC MEANS – descriptive statistics for continuous data

• PROC FREQ – descriptive statistics for categorical data

• PROC UNIVARIATE – very detailed descriptive statistics for continuous data

• PROC TTEST – performs t-tests (continuous data)
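Each procedure in this list can be tried on the example data. A sketch that re-creates the data under the hypothetical name `demo2`, so that it runs on its own, and then calls each procedure; the CLASS statement in PROC TTEST compares the mean age of the F and M groups:

```sas
DATA demo2;
   INPUT gender $ age marstat $ credits state $ ;
   DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

PROC PRINT DATA=demo2; RUN;                          /* look at the data first */
PROC MEANS DATA=demo2 N SUM MEAN; VAR age credits; RUN;
PROC FREQ DATA=demo2; TABLES gender marstat; RUN;    /* categorical variables */
PROC UNIVARIATE DATA=demo2; VAR credits; RUN;        /* detailed statistics */
PROC TTEST DATA=demo2; CLASS gender; VAR age; RUN;   /* two-sample t-test, F vs M */
```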