IOWA STATE UNIVERSITY
Department of Animal Science

Using Basic Graphical and Statistical Procedures
(Chapter 8 in The Little SAS Book)

Animal Science 500
Lecture No. 7
September 21, 2010

SAS Graphical Capabilities

- SAS has extensive graphical capabilities
- It can graph your distribution with a normal distribution overlay
- It can produce various bar graphs
- However, it may not be as intuitive to use
- Various styles of graphs can be used

SAS Graphical Capabilities

- Many other programs are available that are easier to use and more intuitive
- Other programs with graphical capabilities interface more easily with word processing and other software

Assumptions of the Analysis of Variance

- The analysis of variance has basic assumptions:

1. Treatments are randomly applied to experimental units
2. Independence of residuals (ε_ij) within groups
3. Homogeneity of residual variances among groups
4. Observations within each treatment are normally distributed

Proc Univariate

- PROC UNIVARIATE can be used to request a variety of statistics that summarize the data distribution of each analysis variable:

1. Sample moments
2. Basic measures of location and variability
3. Confidence intervals for the mean, standard deviation, and variance
4. Tests for location
5. Tests for normality
6. Trimmed and Winsorized means
7. Robust estimates of scale
8. Quantiles and related confidence intervals
9. Extreme observations and extreme values
10. Frequency counts for observations
11. Missing values

Proc Univariate

- Using various options in the PROC UNIVARIATE statement, the user can do the following:

1. Specify the input data set to be analyzed
2. Specify a graphics catalog for saving traditional graphics output
3. Specify rounding units for variable values
4. Specify the definition used to calculate percentiles
5. Specify the divisor used to calculate variances and standard deviations
6. Request that plots be produced on line printers and define special printing characters used for features
7. Suppress tables
8. Save statistics in an output data set
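Several of the options listed above can be combined in a single call. The following is a minimal sketch, not the lecture's own program: the data set work.scores and its ten values are made up so that the step runs on its own, and the output data set name work.write_stats is likewise hypothetical.

```sas
/* Hypothetical data set: ten made-up writing scores, for illustration only. */
data work.scores;
   input write @@;
   datalines;
31 39 44 46 52 54 57 59 62 67
;
run;

/* DATA=     input data set to be analyzed
   ROUND=    rounding unit for variable values
   PCTLDEF=  definition used to calculate percentiles (5 is the default)
   VARDEF=   divisor for variances and standard deviations (DF means N-1)
   PLOTS     stem-and-leaf plot, box plot, and normal probability plot
   NORMAL    tests for normality */
proc univariate data=work.scores round=1 pctldef=5 vardef=df plots normal;
   var write;
   /* Save selected statistics in an output data set. */
   output out=work.write_stats n=n mean=mean std=std median=median;
run;

proc print data=work.write_stats;
run;
```

Adding NOPRINT to the PROC UNIVARIATE statement would suppress the printed tables while still creating the output data set.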

Proc Univariate Output

The UNIVARIATE Procedure
Variable: write (writing score)

                         Moments
N                        200    Sum Weights              200
Mean                  52.775    Sum Observations       10555
Std Deviation     9.47858602    Variance           89.843593
Skewness          -0.4820386    Kurtosis          -0.7502476
Uncorrected SS        574919    Corrected SS       17878.875
Coeff Variation   17.9603714    Std Error Mean    0.67023725

              Basic Statistical Measures
    Location                    Variability
Mean     52.77500     Std Deviation            9.47859
Median   54.00000     Variance                89.84359
Mode     59.00000     Range                   36.00000
                      Interquartile Range     14.50000

Proc Univariate Output meaning

a. Moments - Moments are statistical summaries of a distribution.

b. N - This is the number of valid observations for the variable. The total number of observations is the sum of N and the number of missing values. If there are missing values for the variable, PROC UNIVARIATE also reports statistics about them, such as the number and the percentage of missing values.

c. Mean - This is the arithmetic mean across the observations. It is the most widely used measure of central tendency and is commonly called the average. The mean is sensitive to extremely large or small values.

d. Std Deviation - Standard deviation is the square root of the variance. It measures the spread of a set of observations. The larger the standard deviation is, the more spread out the observations are.

e. Skewness - Skewness measures the degree and direction of asymmetry. A symmetric distribution such as a normal distribution has a skewness of 0, and a distribution that is skewed to the left, e.g. when the mean is less than the median, has a negative skewness.

f. Uncorrected SS - This is the sum of squared data values. The two summations, the sum of observations and the sum of squares, are related to the variance in the following way:

Variance = (sum of squares - (sum of observations)^2 / N) / (N - 1)

g. Coeff Variation - The coefficient of variation is another way of measuring variability. It is a unitless measure, defined as the ratio of the standard deviation to the mean and generally expressed as a percentage. It is useful for comparing variation between different variables.
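As a check on items f and g, the values printed in the Moments table can be substituted directly. The arithmetic below uses only numbers from that output (N = 200, sum of observations = 10555, uncorrected SS = 574919, standard deviation = 9.47858602, mean = 52.775); the symbols s and x-bar are just shorthand for the standard deviation and the mean.

```latex
\begin{align*}
s^2 &= \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / N}{N - 1}
     = \frac{574919 - 10555^2 / 200}{199}
     = \frac{574919 - 557040.125}{199}
     = \frac{17878.875}{199} \approx 89.8436 \\[4pt]
\text{CV} &= 100 \times \frac{s}{\bar{x}}
     = 100 \times \frac{9.47858602}{52.775} \approx 17.96\%
\end{align*}
```

The numerator 17878.875 is the Corrected SS in the table, and the results agree with the printed Variance (89.843593) and Coeff Variation (17.9603714).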

Proc Univariate Output meaning

h. Sum Weights - A numeric variable can be specified as a weight variable to weight the values of the analysis variable. By default, every observation has a weight of 1. This field is the sum of the values of the weight variable. In our case, since we didn't specify a weight variable, SAS uses the default weight, so the sum of weights is the same as the number of observations.

i. Sum Observations - This is the sum of the observation values. If a weight variable is specified, this field is the weighted sum. The mean for the variable is the sum of observations divided by the sum of weights.

j. Variance - The variance is a measure of variability. It is the sum of the squared distances of the data values from the mean divided by the variance divisor. The variance divisor is either N-1 or N, controlled by the VARDEF= option. The default is VARDEF=DF, which is N-1. The Corrected SS is the sum of squared distances of the data values from the mean, so the variance is the Corrected SS divided by N-1. We don't generally use variance as an index of spread because it is in squared units. Instead, we use standard deviation.

k. Kurtosis - Kurtosis is a measure of the heaviness of the tails of a distribution. In SAS, a normal distribution has kurtosis 0. Extremely nonnormal distributions may have high positive or negative kurtosis values, while nearly normal distributions will have kurtosis values close to 0. Kurtosis is positive if the tails are "heavier" than for a normal distribution and negative if the tails are "lighter" than for a normal distribution.

Proc Univariate Output meaning

l. Corrected SS - This is the sum of squared distances of the data values from the mean. This number divided by the number of observations minus one gives the variance.

m. Std Error Mean - This is the estimated standard deviation of the sample mean. If we drew repeated samples of size 200, we would expect the standard deviation of the sample means to be close to the standard error. The standard deviation of the distribution of the sample mean is estimated as the standard deviation of the sample divided by the square root of the sample size. This provides a measure of the variability of the sample mean. The Central Limit Theorem tells us that the sample means are approximately normally distributed when the sample size is large; a common rule of thumb is 30 or greater.
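The standard error in the Moments table can be reproduced from the printed standard deviation and sample size. This is only a worked check using numbers from that output:

```latex
\[
\text{SE}(\bar{x}) = \frac{s}{\sqrt{N}}
  = \frac{9.47858602}{\sqrt{200}}
  \approx \frac{9.47858602}{14.1421}
  \approx 0.6702
\]
```

This matches the printed Std Error Mean of 0.67023725.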

Proc Univariate Output meaning

Mean - This is the arithmetic mean across the observations. It is the most widely used measure of central tendency and is commonly called the average. The mean is sensitive to extremely large or small values.

Median - The median is a measure of central tendency. It is the middle number when the values are arranged in ascending (or descending) order. Sometimes the median is a better measure of central tendency than the mean, because it is less sensitive to extreme observations.

Mode - The mode is another measure of central tendency. It is the value that occurs most frequently in the variable. It is used most commonly when the variable is categorical.

Std Deviation - Standard deviation is the square root of the variance. It measures the spread of a set of observations. The larger the standard deviation is, the more spread out the observations are.

Variance - The variance is a measure of variability. It is the sum of the squared distances of the data values from the mean divided by the variance divisor. The variance divisor is either N-1 or N, controlled by the VARDEF= option. The default is VARDEF=DF, which is N-1. The Corrected SS is the sum of squared distances of the data values from the mean, so the variance is the Corrected SS divided by N-1. We don't generally use variance as an index of spread because it is in squared units. Instead, we use standard deviation.

Range - The range is a measure of the spread of a variable. It is equal to the difference between the largest and the smallest observations. It is easy to compute and easy to understand. However, because it depends only on the two most extreme observations, it is very sensitive to outliers and says little about how the remaining values vary.

Interquartile Range - The interquartile range is the difference between the upper and the lower quartiles. It measures the spread of a data set. It is robust to extreme observations.

Proc Univariate Output

           Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t   78.74077    Pr > |t|     <.0001
Sign           M        100    Pr >= |M|    <.0001
Signed Rank    S      10050    Pr >= |S|    <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max          67.0
99%               67.0
95%               65.0
90%               65.0
75% Q3            60.0
50% Median        54.0
25% Q1            45.5
10%               39.0
5%                35.5
1%                31.0
0% Min            31.0

Proc Univariate Output meaning

Test - This column lists the various tests that are provided.

Statistic - This column lists the values of the test statistics.

p Value - This column lists the p-values associated with the test statistics.

Student's t - The Student t-test is used to test the null hypothesis that the population mean equals Mu0. The default value in SAS for Mu0 is 0.

The t-statistic is defined as the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean.

The p-value is the two-tailed probability computed using a t distribution. If the p-value associated with the t-test is small (the threshold is usually set at p < 0.05), there is evidence to reject the null hypothesis in favor of the alternative. In other words, the mean is statistically significantly different from the hypothesized value. If the p-value associated with the t-test is not small (p > 0.05), the null hypothesis is not rejected. In our example, the t-value is 78.74077 and the corresponding p-value is less than 0.0001. We conclude that there is a statistically significant difference between the mean of the variable write and zero.
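The t-value in the Tests for Location table follows directly from the mean and standard error printed in the Moments table, with Mu0 = 0. A worked check using only those printed numbers:

```latex
\[
t = \frac{\bar{x} - \mu_0}{\text{SE}(\bar{x})}
  = \frac{52.775 - 0}{0.67023725}
  \approx 78.74
\]
```

A mean of zero is rarely a meaningful hypothesis for test scores, which is why the statistic is so large here; the default Mu0 = 0 simply happens to be what was tested.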

Proc Univariate Output meaning

Sign - The sign test is a simple nonparametric procedure to test the null hypothesis regarding the population median. It does not require that the sample be drawn from a normal distribution, so it is useful when we have a small sample from a nonnormal distribution. The statistic M is defined as M = (N+ - N-)/2, where N+ is the number of values that are greater than Mu0 and N- is the number of values that are less than Mu0. Values equal to Mu0 are discarded. Under the hypothesis that the population median is equal to Mu0, the sign test calculates the p-value for M using a binomial distribution. The interpretation of the p-value is the same as for the t-test. In our example, the M-statistic is 100 and the p-value is less than 0.0001. We conclude that the median of the variable write is significantly different from zero.

Signed Rank - The signed rank test is also known as the Wilcoxon test. It is used to test the null hypothesis that the population median equals Mu0. It assumes that the distribution of the population is symmetric. The Wilcoxon signed rank test statistic is computed from the rank sum and the numbers of observations that are either above or below Mu0. The interpretation of the p-value is the same as for the t-test. In our example, the S-statistic is 10050 and the p-value is less than 0.0001. We therefore conclude that the median of the variable write is significantly different from zero.
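Both statistics can be reproduced by hand for this example, because every one of the 200 values is greater than Mu0 = 0 (the minimum is 31). The sketch below assumes the signed rank statistic is centered as the sum of the positive ranks minus n(n+1)/4; that centering reproduces the printed value of S.

```latex
\begin{align*}
M &= \frac{N^{+} - N^{-}}{2} = \frac{200 - 0}{2} = 100 \\[4pt]
S &= \sum_{x_i > \mu_0} r_i^{+} - \frac{n(n+1)}{4}
   = \frac{200 \cdot 201}{2} - \frac{200 \cdot 201}{4}
   = 20100 - 10050 = 10050
\end{align*}
```

Since all values lie above Mu0, the positive ranks are simply 1 through 200, whose sum is 20100.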

Proc Univariate Output meaning

Quantile Meanings

100% Max - This is the maximum value of the variable. One hundred percent of all values are equal to or less than this value.

95% - Ninety-five percent of all values of the variable are equal to or less than this value.

75% Q3 - This is the third quartile. Seventy-five percent of all values are equal to or less than this value.

50% Median - This is the median. The median splits the distribution such that half of all values are above this value, and half are below.

25% Q1 - This is the first quartile. Twenty-five percent of all values of the variable are equal to or less than this value.

0% Min - This is the minimum value. Zero percent of values are less than this value.

Proc Univariate Output

      Extreme Observations
----Lowest----    ----Highest---
Value    Obs      Value    Obs
   31     89         67    118
   31     40         67    160
   31     39         67    177
   31     31         67    183
   33     70         67    185

Proc Univariate Output

Stem Leaf                        #   Boxplot
  66 0000000                     7      |
  64 0000000000000000           16      |
  62 0000000000000000000000     22      |
  60 00000000                    8   +-----+
  58 0000000000000000000000000  25   |     |
  56 000000000000               12   |     |
  54 00000000000000000000       20   *-----*
  52 0000000000000000           16   |  +  |
  50 00                          2   |     |
  48 00000000000                11   |     |
  46 00000000000                11   |     |
  44 0000000000000              13   +-----+
  42 000                         3      |
  40 0000000000000              13      |
  38 000000                      6      |
  36 00000                       5      |
  34 00                          2      |
  32 0000                        4      |
  30 0000                        4      |
     ----+----+----+----+----+

Proc Univariate Meaning

Extreme Observations - This is a list of the five lowest and five highest values of the variable.

Stem Leaf - The stem-leaf plot is used to visualize the overall distribution of a variable. In this display, the stem is the portion of the value to the left and the leaf is the part to the right. The number on the right is the number of leaves on each stem. For example, on the first line, the stem is 66, and there are seven 0's to the right of this stem, indicating that there are seven cases with a value of 66 or 67 for this variable.

Boxplot - The box plot is a graphical representation of the 5-number summary for a variable. It is based on the quartiles of a variable. The bottom and top edges of the rectangular box correspond to the lower quartile and the upper quartile. The line in the middle is the median. The plus sign in the middle is the mean. We can visually compare the lengths of the whiskers. If one is clearly longer than the other, the distribution may be skewed.

Proc Univariate Output Meaning

The features marked on the box plot, from top to bottom, are:

Top edge of the box (75% Q3) - This is the third quartile. Seventy-five percent of all values are equal to or less than this value.

Line across the box (50% Median) - This is the median. The median splits the distribution such that half of all values are above this value, and half are below.

Plus sign inside the box (Mean) - This is the arithmetic mean across the observations. It is the most widely used measure of central tendency and is commonly called the average. The mean is sensitive to extremely large or small values.

Bottom edge of the box (25% Q1) - This is the first quartile. Twenty-five percent of all values of the variable are equal to or less than this value.

Normal Probability Plot

[Line-printer normal probability plot of the variable write. The vertical axis runs from 31 to 67; asterisks (*) mark the data values and plus signs (+) mark the reference line.]

Proc Univariate Output Meaning

Normal Probability Plot - The normal probability plot is used to investigate whether the variable is normally distributed. The plus signs in the plot indicate where a normal distribution would fall, and they form a straight line. The asterisks show the data values. If our variable is close to a normal distribution, the asterisks will also lie close to a straight line and thus cover most of the plus signs. There are different types of departure from normality.
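The same check can be requested as graphics output rather than a line-printer plot. The following is a minimal sketch under stated assumptions: the data set work.scores and its values are made up so the step runs on its own, and ODS Graphics is assumed to be available in the SAS installation.

```sas
/* Hypothetical data: made-up scores used only to make the example runnable. */
data work.scores;
   input write @@;
   datalines;
31 35 39 41 44 46 49 52 54 54 57 59 59 60 62 63 65 65 67 67
;
run;

ods graphics on;

proc univariate data=work.scores normal;
   var write;
   /* Histogram of the data with a fitted normal density overlaid */
   histogram write / normal;
   /* Normal probability plot with a reference line based on the
      estimated mean and standard deviation */
   probplot write / normal(mu=est sigma=est);
run;

ods graphics off;
```

The NORMAL option on the PROC statement adds the tests for normality, which complement the visual impression from the plot.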

Proc Corr (Correlations)

- Is part of the Base SAS software and computes correlations
- A correlation measures the strength of the relationship between two variables
- Values can range from -1 to 1
- If two variables are completely uncorrelated, they have a correlation of 0
- If two variables are perfectly correlated, they have a correlation of either -1 or 1, depending on whether the correlation is negative or positive

Proc Corr (Correlations)

- SAS basic statement
  - PROC CORR;
    - Computes correlations between all numeric variables
  - Add a VAR statement (VAR variable-list;)
    - Computes correlations between the variables you have listed
  - Add a WITH statement along with the VAR statement
    - Computes correlations with the VAR variables across the top and the WITH variables down the side
  - Default
    - Computes Pearson product-moment correlation coefficients
  - Add options to the PROC statement to request nonparametric correlations

Proc Corr (Correlations)

- SAS basic statement
  - PROC CORR SPEARMAN;
  - The SPEARMAN option calculates Spearman’s rank correlations instead of Pearson’s correlations
  - Other options
    - HOEFFDING for Hoeffding’s D statistic
    - KENDALL for Kendall’s tau-b coefficient
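The statement forms described above can be sketched as follows. The data set work.fish and its six observations are made up for illustration (they are not the fish measurement data used in the lecture output), so only the syntax should be taken from this example.

```sas
/* Hypothetical fish measurements (made-up values). */
data work.fish;
   input weight length height width;
   datalines;
6.2 30.0 11.5 4.0
7.1 32.5 12.8 4.6
8.0 36.0 14.4 5.1
8.6 39.0 15.6 5.5
9.3 42.0 17.0 6.1
10.0 46.5 18.9 6.7
;
run;

/* Pearson correlations among all numeric variables */
proc corr data=work.fish;
run;

/* VAR variables across the top, WITH variables down the side */
proc corr data=work.fish;
   var length height width;
   with weight;
run;

/* Spearman rank correlations instead of Pearson correlations */
proc corr data=work.fish spearman;
   var weight length height width;
run;
```

Replacing SPEARMAN with KENDALL or HOEFFDING requests the other nonparametric measures listed above.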

Proc Corr (Correlations)

- By default, PROC CORR prints a report that includes descriptive statistics and correlation statistics for each variable. The descriptive statistics are:
  - Number of observations with nonmissing values
  - Mean
  - Standard deviation
  - Minimum
  - Maximum
- For each pair of variables, PROC CORR prints the correlation coefficient, the number of observations used to calculate the coefficient, and the p-value.
- If you specify the ALPHA option, PROC CORR prints Cronbach’s coefficient alpha, the correlation between each variable and the total of the remaining variables, and Cronbach’s coefficient alpha computed from the remaining variables, for both the raw and the standardized variables.

Proc Corr (Correlations)

- What does the p-value associated with each correlation mean?

Answer: a significant p-value just means that the correlation is different from zero.

- Remember that correlations do not imply cause and effect. The correlation only describes how two variables vary with each other.

Proc Corr Output

Fish Measurement Data

The CORR Procedure

4 Variables: Weight3 Length3 Height Width

                           Simple Statistics
Variable    N        Mean    Std Dev          Sum    Minimum     Maximum
Weight3    34     8.44751    0.97574    287.21524    6.23168    10.00000
Length3    34    38.38529    4.21628         1305   30.00000    46.50000
Height     34    15.22057    1.98159    517.49950   11.52000    18.95700
Width      34     5.43805    0.72967    184.89370    4.02000     6.74970

Proc Corr Output

Pearson Correlation Coefficients, N=34
Prob > |r| under H0: Rho=0

            Weight3    Length3     Height      Width
Weight3      1.0000    0.96523    0.98261    0.92789
                       <0.0001    <0.0001    <0.0001

Length3     0.96523     1.0000    0.95492    0.92171
            <0.0001               <0.0001    <0.0001

Height      0.98261    0.95492     1.0000    0.92632
            <0.0001    <0.0001               <0.0001

Width       0.92789    0.92171    0.92632     1.0000
            <0.0001    <0.0001    <0.0001

Proc Corr Options

ALPHA calculates and prints Cronbach’s coefficient alpha. PROC CORR computes separate coefficients using raw and standardized values (scaling the variables to a unit variance of 1). For each VAR statement variable, PROC CORR computes the correlation between the variable and the total of the remaining variables. It also computes Cronbach’s coefficient alpha by using only the remaining variables.

If a WITH statement is specified, the ALPHA option is invalid. When you specify the ALPHA option, the Pearson correlations will also be displayed. If you specify the OUTP= option, the output data set also contains observations with Cronbach’s coefficient alpha. If you use the PARTIAL statement, PROC CORR calculates Cronbach’s coefficient alpha for partialled variables.

BEST=n prints the n highest correlation coefficients for each variable. Correlations are ordered from highest to lowest in absolute value. Without this option, PROC CORR prints correlations in a rectangular table, using the variable names as row and column labels. If you specify the HOEFFDING option, PROC CORR displays the D statistics in order from highest to lowest.

COV displays the variance and covariance matrix. When you specify the COV option, the Pearson correlations will also be displayed. If you specify the OUTP= option, the output data set also contains the covariance matrix with the corresponding _TYPE_ variable value 'COV'. If you use the PARTIAL statement, PROC CORR computes a partial covariance matrix.

Only a few of the many options are shown here. Look up the option you need, or browse the full list to see what can be done!
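A short sketch of how the options described above might be requested. The data set work.fish is hypothetical (made-up values, not the lecture's fish data), and NOSIMPLE is an additional PROC CORR option, not mentioned in the lecture, that suppresses the simple statistics table.

```sas
/* Hypothetical fish measurements (made-up values). */
data work.fish;
   input weight length height width;
   datalines;
6.2 30.0 11.5 4.0
7.1 32.5 12.8 4.6
8.0 36.0 14.4 5.1
8.6 39.0 15.6 5.5
9.3 42.0 17.0 6.1
10.0 46.5 18.9 6.7
;
run;

/* COV adds the covariance matrix; BEST=2 prints only the two largest
   correlations (in absolute value) for each variable. */
proc corr data=work.fish cov best=2 nosimple;
   var weight length height width;
run;

/* ALPHA prints Cronbach's coefficient alpha; it cannot be combined
   with a WITH statement. */
proc corr data=work.fish alpha;
   var length height width;
run;
```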

PROC Reg

- The REG procedure fits linear regression models by least squares and is one of many SAS procedures that perform regression analyses
- REG is part of the SAS/STAT software, which is licensed separately from the Base SAS software
- Show linear regression
- PROC REG is capable of analyzing models with many regressor variables using a variety of model-selection methods

Proc Reg

- Selection methods available in Proc Reg
  - Stepwise regression
  - Forward selection
  - Backward elimination
- Other procedures (Procs) for:
  - Non-linear regression
  - Logistic regression
- Basic form
  - PROC REG;
    MODEL dependent = independent;
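A selection method is requested with the SELECTION= option on the MODEL statement. The sketch below is illustrative only: the data set work.growth, its variables (gain, feed, age, weight), and all values are made up, and the entry and stay significance levels of 0.15 are written out explicitly rather than taken from the lecture.

```sas
/* Hypothetical growth data (made-up values) so the step runs on its own. */
data work.growth;
   input gain feed age weight;
   datalines;
0.78 2.0 58 24.1
0.85 2.3 61 26.0
0.80 2.1 66 27.5
0.93 2.6 63 28.2
0.97 2.7 70 30.9
0.88 2.4 72 31.4
1.02 2.9 68 32.0
1.05 3.0 75 34.8
0.95 2.6 78 35.1
1.10 3.2 74 36.3
1.08 3.1 81 38.0
1.15 3.4 79 39.2
;
run;

/* Stepwise selection; SLENTRY= and SLSTAY= are the significance levels
   for a variable to enter and to stay in the model. */
proc reg data=work.growth;
   model gain = feed age weight / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

Using selection=forward or selection=backward in the same position requests forward selection or backward elimination.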

Proc Reg Example

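/* Regress science on four predictors; hsb2 is read from a file path. */
proc reg data = "d:\hsb2";
   /* CLB requests confidence limits for the parameter estimates. */
   model science = math female socst read / clb;
run;
quit;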

Proc Reg Output

                       Analysis of Variance
                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                4   9543.72074    2385.93019      46.69    <.0001
Error              195   9963.77926      51.09630
Corrected Total    199        19507

Root MSE           7.14817    R-Square    0.4892
Dependent Mean    51.85000    Adj R-Sq    0.4788
Coeff Var         13.78624
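The summary statistics under the Analysis of Variance table can be reproduced from the sums of squares and mean squares printed in it. The following is a worked check using only those printed numbers; the corrected total is taken as the sum of the model and error sums of squares (19507.5), which the output rounds to 19507.

```latex
\begin{align*}
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
   = \frac{2385.93019}{51.09630} \approx 46.69 \\[4pt]
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
   = \frac{9543.72074}{9543.72074 + 9963.77926}
   = \frac{9543.72074}{19507.5} \approx 0.4892 \\[4pt]
\text{Root MSE} &= \sqrt{MS_{\text{Error}}} = \sqrt{51.09630} \approx 7.148 \\[4pt]
\text{Coeff Var} &= 100 \times \frac{\text{Root MSE}}{\text{Dependent Mean}}
   = 100 \times \frac{7.14817}{51.85} \approx 13.79
\end{align*}
```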

Proc Reg Output

                              Parameter Estimates
                                        Parameter    Standard
Variable     Label                  DF   Estimate       Error    t Value    Pr > |t|
Intercept    Intercept               1   12.32529     3.19356       3.86      0.0002
math         math score              1    0.38931     0.07412       5.25      <.0001
female                               1   -2.00976     1.02272      -1.97      0.0508
socst        social studies score    1    0.04984     0.06223       0.80      0.4241
read         reading score           1    0.33530     0.07278       4.61      <.0001

                     Parameter Estimates
Variable     Label                  DF    95% Confidence Limits
Intercept    Intercept               1     6.02694     18.62364
math         math score              1     0.24312      0.53550
female                               1    -4.02677      0.00724
socst        social studies score    1    -0.07289      0.17258
read         reading score           1     0.19177      0.47883

PROC REG OUTPUT

Ypredicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

The column of estimates provides the values for b0, b1, b2, b3 and b4 for this equation.

math - The coefficient is .3893102. So for every unit increase in math, a 0.38931 unit increase in science is predicted, holding all other variables constant.

female - For every unit increase in female, we expect a 2.00976 unit decrease in the science score, holding all other variables constant. Since female is coded 0/1 (0=male, 1=female), the interpretation is simpler: for females, the predicted science score is about 2 points lower than for males.

socst - The coefficient for socst is .0498443. So for every unit increase in socst, we expect an approximately .05 point increase in the science score, holding all other variables constant.

read - The coefficient for read is .3352998. So for every unit increase in read, we expect a .34 point increase in the science score, holding all other variables constant.
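Substituting the printed estimates into the prediction equation gives the fitted model. The second line evaluates it for a hypothetical female student with scores of 50 on math, socst, and read; these input values are made up purely to illustrate the calculation.

```latex
\begin{align*}
\widehat{\text{science}} &= 12.32529 + 0.38931\,\text{math} - 2.00976\,\text{female}
   + 0.04984\,\text{socst} + 0.33530\,\text{read} \\[4pt]
\widehat{\text{science}} &= 12.32529 + 0.38931(50) - 2.00976(1) + 0.04984(50) + 0.33530(50) \\
   &= 12.32529 + 19.4655 - 2.00976 + 2.492 + 16.765 \approx 49.04
\end{align*}
```

For a male student with the same three scores, the female term drops out and the prediction is about 2 points higher, as described above.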

PROC REG OUTPUT

Standard Error - These are the standard errors associated with the coefficients.

t Value - These are the t-statistics used in testing whether a given coefficient is significantly different from zero.

Pr > |t| - This column shows the 2-tailed p-values used in testing the null hypothesis that the coefficient (parameter) is 0. Using an alpha of 0.05:

The coefficient for math is significantly different from 0 because its p-value (<.0001) is smaller than 0.05.

The coefficient for socst (.0498443) is not statistically significantly different from 0 because its p-value (0.4241) is larger than 0.05.

The coefficient for read (.3352998) is statistically significant because its p-value (<.0001) is less than 0.05.

PROC REG OUTPUT

The intercept is significantly different from 0 at the 0.05 alpha level.

95% Confidence Limits - These are the 95% confidence intervals for the coefficients.  The confidence intervals are related to the p-values: a coefficient is not statistically significant at the 0.05 level if its 95% confidence interval includes 0.  These confidence intervals help put the estimate into perspective by showing how much the value could vary.
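A minimal sketch of a PROC REG call that would produce output of the kind described above. The data set name hsb2 is an assumption made for illustration; the variable names are taken from the slides. The CLB option on the MODEL statement requests confidence limits for the parameter estimates (95% by default).

```sas
/* hsb2 is a hypothetical data set name; variables follow the slides */
PROC REG DATA=hsb2;
  /* CLB adds confidence limits for each coefficient to the output */
  MODEL science = math female socst read / CLB;
RUN;
QUIT;
```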

Creating Statistical Graphics with PROC REG

General form

ODS GRAPHICS ON;
PROC REG PLOTS(options) = (plot-list);
MODEL dependent = independent;
RUN;
QUIT;
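As an illustration of the general form, the sketch below fits a simple regression with ODS Graphics turned on and no PLOTS= option, so only the default plots are produced. The data set name hsb2 is assumed; science and math are the variable names used earlier in the slides.

```sas
ODS GRAPHICS ON;

/* hsb2 is a hypothetical data set name */
PROC REG DATA=hsb2;
  MODEL science = math;   /* one dependent, one independent variable */
RUN;
QUIT;

ODS GRAPHICS OFF;
```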

Creating Statistical Graphics with PROC REG

FITPLOT               scatter plot with regression line and confidence and prediction bands
RESIDUALS             residuals plotted against independent variable
DIAGNOSTICS           diagnostics panel including all of the following plots
COOKSD                Cook’s D statistic by observation number
OBSERVEDBYPREDICTED   dependent variable by predicted value
QQPLOT                normal quantile plot of residuals
RESIDUALBYPREDICTED   residuals by predicted values
RESIDUALHISTOGRAM     histogram of residuals
RFPLOT                residual fit plot
RSTUDENTBYLEVERAGE    studentized residuals by leverage
RSTUDENTBYPREDICTED   studentized residuals by predicted values
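To request individual plots rather than the defaults, the plot names can be listed in the PLOTS= option. The sketch below uses the ONLY global option to suppress the default plots; the data set name hsb2 is assumed, and the choice of three plots is arbitrary.

```sas
ODS GRAPHICS ON;

/* hsb2 is a hypothetical data set name */
PROC REG DATA=hsb2
         PLOTS(ONLY) = (QQPLOT RESIDUALBYPREDICTED COOKSD);
  MODEL science = math female socst read;
RUN;
QUIT;
```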

Default Options

• By default, the FITPLOT, RESIDUALS and DIAGNOSTICS plots are generated

Proc ANOVA

• One of many SAS procedures that can perform analysis of variance (ANOVA)

• Is part of SAS/STAT, which is licensed separately from the base SAS software

• Is designed for balanced data
  - Equal numbers of observations in each combination of the classification factors
  - The exception is one-way ANOVA, where the data need not be balanced

Proc ANOVA

• One-way analysis of variance
  - The null hypothesis tested by one-way ANOVA is that two or more population means are equal.
  - The question is whether (H0) the population means are equal for all groups and the observed differences in sample means are due to random sampling variation, or (Ha) the observed differences between sample means are due to actual differences in the population means.
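In symbols, the two hypotheses can be sketched as follows, where k (the number of groups) and the group means are notation introduced here for illustration:

```latex
\[
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_a:\ \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
\]
```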

Proc ANOVA

• Assumptions needed for the ANOVA
  1) random, independent sampling from some larger population;
  2) normal population distributions;
  3) equal variances across the populations.
  - Assumption 1 is crucial for any inferential statistic.
  - Assumptions 2 and 3 can be relaxed when large samples are used.
  - Assumption 3 can be relaxed, even for small samples, when the sample sizes are roughly the same for each group.

Proc ANOVA

• If you are not performing a one-way analysis of variance and your data are not balanced, you should use the General Linear Models procedure (PROC GLM) instead
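For comparison, a minimal sketch of a two-factor model in PROC GLM, which uses the same CLASS and MODEL statements as PROC ANOVA. Every name here (the data set trial and the variables diet, sex and gain) is hypothetical and chosen only for illustration.

```sas
/* trial, diet, sex and gain are hypothetical names */
PROC GLM DATA=trial;
  CLASS diet sex;                   /* classification variables */
  MODEL gain = diet sex diet*sex;   /* main effects plus interaction */
RUN;
QUIT;
```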

PROC ANOVA

• The ANOVA procedure performs analysis of variance (ANOVA)
  - It is designed for use with balanced data from a wide variety of experimental designs.

• In analysis of variance, a continuous response variable, known as a dependent variable, is measured under experimental conditions identified by classification variables, known as independent variables.

• The variation in the response is assumed to be due to effects in the classification, with random error accounting for the remaining variation.

PROC ANOVA

• General form

PROC ANOVA;
CLASS variable-list;
MODEL dependent = effects;

  - The two required statements are the CLASS and MODEL statements.
  - The CLASS statement MUST come before the MODEL statement.
  - For one-way ANOVA, only one variable is listed in the CLASS statement.
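A minimal sketch of a one-way ANOVA in this form. The variable names Team and Height are borrowed from the output example later in the lecture; the data set name heights is an assumption.

```sas
/* heights is a hypothetical data set name */
PROC ANOVA DATA=heights;
  CLASS Team;            /* must appear before MODEL */
  MODEL Height = Team;   /* one classification variable: one-way ANOVA */
RUN;
QUIT;
```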

PROC ANOVA

• Many options are available when using PROC ANOVA
  - MEANS: calculates means of the dependent variable for any of the main effects included in the MODEL statement
  - Several mean separation or comparison tests are available, including
    1. Bonferroni t tests (BON)
    2. Duncan’s multiple-range test (DUNCAN)
    3. Scheffe’s multiple-comparison procedure (SCHEFFE)
    4. Pairwise t tests (T)
    5. Tukey’s studentized range test (TUKEY)

PROC ANOVA

• Many options are available when using PROC ANOVA
  - General form: MEANS effects / options;
  - The effects can be any main effect in the MODEL statement
    - They cannot be crossed or nested effects
  - The options can be any of the comparison tests (DUNCAN or TUKEY, for example)
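As a sketch, a MEANS statement added to a one-way ANOVA might look like the following. It requests Scheffe's test, the test shown in the output example later in the lecture. The data set name heights is assumed.

```sas
/* heights is a hypothetical data set name */
PROC ANOVA DATA=heights;
  CLASS Team;
  MODEL Height = Team;
  MEANS Team / SCHEFFE;   /* group means plus Scheffe's comparisons */
RUN;
QUIT;
```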

PROC ANOVA

• If ODS Graphics is turned on, PROC ANOVA will produce a box plot of the response grouped by the effect variable for one-way ANOVA and for all effects in the MEANS statement

Proc ANOVA output

• The output from an ANOVA analysis has at least two parts
  1. A table providing information about the classification variables in the model
     a. Number of levels
     b. Values
     c. Number of observations
  2. An ANOVA table
  3. Output from options such as MEANS follows

Proc ANOVA output example

Girls’ Heights on Basketball Teams

The ANOVA Procedure

Class Level Information

Class   Levels   Values
Team    5        Blue gold gray pink red

Number of Observations   60

Proc ANOVA output example

Girls’ Heights on Basketball Teams

The ANOVA Procedure

Dependent Variable: Height

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4           228.00    57.00           4.14   0.0053
Error             55           758.00    13.7818182
Corrected Total   59           986.00

R-Square   Coeff Var   Root MSE   Height Mean
0.2312     7.279       3.712      51.00

Source   DF   Anova SS   Mean Square   F Value   Pr > F
Team      4   228.000    57.00         4.14      0.0053

Proc ANOVA output example

Source           source of variation
DF               degrees of freedom for the model, error, and total
Sum of Squares   sum of squares for the portion attributed to the model, error, and the total
Mean Square      mean square (sum of squares divided by the degrees of freedom)
F Value          F value (mean square for model divided by the mean square for error)
Pr > F           significance probability associated with the F statistic
R-Square         R-square (how predictive your model is: the proportion of the total sum of squares accounted for by the model)
Coeff Var        coefficient of variation (standard deviation divided by the mean, expressed as a percentage); how much variation you have relative to the mean
Root MSE         root mean square error, here the square root of the error mean square. The name comes from the fact that it is the square root of the mean of the squares of the values. It is a statistical measure of the magnitude of a varying quantity: it gives a sense of the typical size of the numbers, and the values are squared so that negative numbers do not cancel positive ones. The RMS is always greater than or equal to the average of the unsigned (absolute) values.
Height Mean      mean of the dependent variable, in this case height
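The definitions above can be checked against the example by hand. The following worked arithmetic uses only the sums of squares (228, 758, 986), the degrees of freedom (4, 55) and the mean height (51.00) from the ANOVA table; results are rounded.

```latex
\begin{align*}
MS_{\text{model}} &= \frac{228}{4} = 57.00 \\
MS_{\text{error}} &= \frac{758}{55} \approx 13.7818 \\
F &= \frac{MS_{\text{model}}}{MS_{\text{error}}} = \frac{57.00}{13.7818} \approx 4.14 \\
R^2 &= \frac{SS_{\text{model}}}{SS_{\text{total}}} = \frac{228}{986} \approx 0.2312 \\
\text{Root MSE} &= \sqrt{13.7818} \approx 3.712 \\
\text{Coeff Var} &= 100 \times \frac{3.712}{51.00} \approx 7.279
\end{align*}
```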

Proc ANOVA output example

Girls’ Heights on Basketball Teams

The ANOVA Procedure

Scheffe’s Test for Height

NOTE: This test controls the type I experimentwise error rate.

Alpha                            0.05
Error Degrees of Freedom         55
Error Mean Square                13.78182
Critical Value of F              2.53969
Minimum Significant Difference   4.8306

Means with the same letter are not significantly different

Scheffe Grouping   Mean     N    Team
    A              54.833   12   Pink
B   A              50.500   12   gold
B   A              50.333   12   gray
B                  49.833   12   blue
B                  49.500   12   red
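As a sketch of where the minimum significant difference comes from, the standard equal-sample-size form of Scheffe's criterion can be applied to the values printed above (k = 5 teams, n = 12 per team). The symbols k and n are notation introduced here, not part of the SAS output.

```latex
\begin{align*}
\text{MSD} &= \sqrt{(k-1)\,F_{\text{crit}}}\;\sqrt{\frac{2\,MS_{\text{error}}}{n}} \\
           &= \sqrt{4 \times 2.53969}\;\sqrt{\frac{2 \times 13.78182}{12}} \\
           &\approx 3.187 \times 1.516 \approx 4.83
\end{align*}
```

This agrees with the reported 4.8306 up to rounding. It also accounts for the letter groupings: pink exceeds blue by 54.833 - 49.833 = 5.000 and red by 54.833 - 49.500 = 5.333, both larger than 4.8306, while pink exceeds gold by only 4.333, which is not a significant difference.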