PASW-SPSS STATISTICS

David Yens, Ph.D. NYCOM


David P. Yens, Ph.D.
New York College of Osteopathic Medicine, NYIT
[email protected]

PRESENTATION 5: REVIEW OF ANOVA, CORRELATION, AND REGRESSION

2010


ANALYSIS OF VARIANCE

Simple
◦Used to determine whether there are differences in means among more than two groups

Factorial
◦Used to determine whether means differ on more than one dimension (independent variable)

◦Examples:
1. Compare blood pressures resulting from the use of three treatments.
2. Compare blood pressures resulting from the use of three treatments and between males and females.


TREATMENT             GROUP A      GROUP B      GROUP C
Mean Blood Pressure   Mean for A   Mean for B   Mean for C


              MALES        FEMALES
TREATMENT A                               Mean A
TREATMENT B                               Mean B
TREATMENT C                               Mean C
              Mean Males   Mean Females


ANOVA

Determining differences after ANOVA
◦Planned contrasts
◦Post-hoc analyses

hsbdataB: Effect of father's education on
◦Grades
◦Visualization test
◦Math achievement

ANOVA

ANALYZE > COMPARE MEANS > ONE-WAY ANOVA, OR
◦GENERAL LINEAR MODEL > UNIVARIATE

Several options are available

Length of stay in different hospitals

           HOSPITAL
PATIENT    A1    A2    A3    A4
1          2     3     5     10
2          3     6     8     11
3          4     7     9     13
4          3     5     10    8
5          4     4     4     9
6          2     5     6     9

Analysis of Variance

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
A1       6       18    3         0.8
A2       6       30    5         2
A3       6       42    7         5.6
A4       6       60    10        3.2

ANOVA
Source of Variation   SS      df   MS     F          P-value   F crit
Between Groups        160.5   3    53.5   18.44828   5.6E-06   3.0983912
Within Groups         58      20   2.9
Total                 218.5   23
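The sums of squares and F ratio in the table above can be checked by hand. The following is a minimal sketch in plain Python, not the SPSS or spreadsheet procedure itself; the function name one_way_anova and the dictionary layout are illustrative choices, and the data are the length-of-stay values from the hospital table.

```python
# Length of stay for six patients in each of four hospitals (values from the table above).
stays = {
    "A1": [2, 3, 4, 3, 4, 2],
    "A2": [3, 6, 7, 5, 4, 5],
    "A3": [5, 8, 9, 10, 4, 6],
    "A4": [10, 11, 13, 8, 9, 9],
}


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between groups: distance of each group mean from the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within groups: spread of each score around its own group mean.
        ss_within += sum((value - group_mean) ** 2 for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(stays.values()))
print(f"Between Groups: SS={ss_b:.1f}  df={df_b}  MS={ss_b / df_b:.1f}")
print(f"Within Groups:  SS={ss_w:.1f}  df={df_w}  MS={ss_w / df_w:.1f}")
print(f"F = {f_stat:.5f}")
```

The printed values should agree with the Between Groups and Within Groups rows of the ANOVA table (SS 160.5 and 58, F of about 18.448). The P-value and F crit columns require the F distribution and are not computed here.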


OTHER ANALYSIS OF VARIANCE METHODS

◦Repeated measures
◦Analysis of Covariance

Test statistic - F


STATISTICAL ANALYSES

ANALYSIS OF VARIANCE (Repeated measures)
◦Used to assess before and after measures on the same individuals exposed to two or more treatments.
◦Example: Assess the increase in blood pressure for two groups exposed to different treatments.

REPEATED MEASURES ANOVA


                      TREATMENT
                      TREATMENT A   TREATMENT B   TREATMENT C
SUBJECT 1             BP-A          BP-B          BP-C
SUBJECT 2
SUBJECT 3
SUBJECT 4
----
MEAN BLOOD PRESSURE   MEAN FOR A    MEAN FOR B    MEAN FOR C

REPEATED MEASURES ANOVA

Temperature over 4 days

           TEMPERATURE
PATIENT    A1      A2      A3      A4
1          101.2   100.3   99.8    98.7
2          102.3   100.7   100.1   99.1
3          103.2   101.1   100.2   99.1
4          102.2   100.6   99.2    98.5
5          104     102.1   100.1   99.2
6          103.2   101.5   100.3   99.3
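To show how a repeated measures analysis separates subject differences from the error term, here is a minimal sketch in plain Python using the temperature table above. It assumes that columns A1 to A4 are the four days and that a single within-subject factor is being tested; the function name repeated_measures_anova is illustrative, and no sphericity check or correction is included.

```python
# Temperatures for six patients (rows) on four occasions A1..A4 (columns), from the table above.
temps = [
    [101.2, 100.3, 99.8, 98.7],
    [102.3, 100.7, 100.1, 99.1],
    [103.2, 101.1, 100.2, 99.1],
    [102.2, 100.6, 99.2, 98.5],
    [104.0, 102.1, 100.1, 99.2],
    [103.2, 101.5, 100.3, 99.3],
]


def repeated_measures_anova(rows):
    """One within-subject factor: partition total SS into occasions, subjects, and error."""
    n = len(rows)       # number of subjects
    k = len(rows[0])    # number of repeated occasions
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    occasion_means = [sum(row[j] for row in rows) / n for j in range(k)]
    subject_means = [sum(row) / k for row in rows]

    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Subject-to-subject differences are removed from the error term.
    ss_error = ss_total - ss_occasions - ss_subjects

    df_occasions = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_occasions / df_occasions) / (ss_error / df_error)
    return {
        "occasion_means": occasion_means,
        "ss_occasions": ss_occasions,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
        "df": (df_occasions, df_error),
        "F": f_stat,
    }


result = repeated_measures_anova(temps)
print("Occasion means:", [round(m, 2) for m in result["occasion_means"]])
print("df:", result["df"], " F:", round(result["F"], 2))
```

Because each patient serves as his or her own control, the variation between patients is taken out before the F ratio is formed, which is the main difference from the simple one-way analysis.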





CORRELATION AND REGRESSION

Morgan, Chapt. 8

CORRELATION – Expresses relationship only

REGRESSION – Prediction of one variable from another. Implies direction of influence, does NOT prove causality

MULTIPLE REGRESSION – Prediction of a target variable from 2 or more predictors (independent variables)


CORRELATION

Correlation coefficient is a number between -1 and +1 whose sign is the same as the slope of the line and whose magnitude is related to the degree of linear association between two variables

r2 (r squared), the coefficient of determination, expresses the proportion of variance in the dependent variable explained by the independent variable
◦On a ratio scale; an r2 = .50 is twice as large as .25

Interpretation of values

ASSUMPTIONS FOR PEARSON CORRELATION & SIMPLE REGRESSION

Linear relationship
Scores normally distributed
Outliers can have a major impact

VARIABLES FOR CORRELATION

Grades   MathAchievement
4        9.00
5        10.33
6        7.67
3        5.00
3        -1.67
5        1.00
6        12.00
4        8.00
ETC.
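As an illustration of how the correlation coefficient is computed, here is a minimal sketch in plain Python using only the eight rows listed above. Because the full data set has 75 cases, the value obtained from these rows alone will differ from the SPSS output for the complete file; the function name pearson_r is an illustrative choice.

```python
import math

# The eight (grades, math achievement) pairs listed above; the full file has more cases.
grades = [4, 5, 6, 3, 3, 5, 6, 4]
math_achievement = [9.00, 10.33, 7.67, 5.00, -1.67, 1.00, 12.00, 8.00]


def pearson_r(x, y):
    """Pearson correlation: co-deviation of x and y scaled by their individual spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r = pearson_r(grades, math_achievement)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

The sign of r matches the direction of the relationship, and r squared gives the proportion of variance in one variable accounted for by the other in this small subset.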

EXAMPLE FROM TEXT

Check assumptions

OBTAINING A SCATTERPLOT

GRAPHS > LEGACY DIALOGS > SCATTER/DOT

SCATTERPLOT

ADDING REGRESSION LINE

Now double-click the output chart

USING CHART BUILDER

GRAPHS > CHART BUILDER > OK
SELECT “Gallery”
SELECT “Scatter/Dot”
With mouse, move “Simple Scatter” to Chart Preview
Find/move “math achievement test” to vertical axis box
Find/move “grades in h.s.” to horizontal axis box
Click OK

OUTPUT FROM CHART BUILDER

TO OBTAIN A FIT LINE

Double-click on chart
SELECT “Elements”
SELECT “Interpolation line”

FIT LINE

TO GET A CORRELATION BETWEEN THE 2 VARIABLES

ANALYZE > CORRELATE > BIVARIATE

CORRELATION OUTPUT

Descriptive Statistics
                        Mean      Std. Deviation   N
grades in h.s.          5.68      1.570            75
math achievement test   12.5645   6.67031          75

Correlations
                                              grades in h.s.   math achievement test
grades in h.s.          Pearson Correlation   1                .504**
                        Sig. (2-tailed)                        .000
                        N                     75               75
math achievement test   Pearson Correlation   .504**           1
                        Sig. (2-tailed)       .000
                        N                     75               75
**. Correlation is significant at the 0.01 level (2-tailed).

CORRELATION EXAMPLE

Dayya (2005) looked at predictors of obesity. In one example, he plotted percent of calories in carbs against BMI to see if there was a relationship, with the following result:

Dayya, D. Analysis of the CDC-NHANES Database to Identify Predictors Of Obesity in a Multiple Linear and Logistic Regression Model. New York Medical Journal, online, Dec. 2005.

CORRELATION EXAMPLE


REGRESSION

The simplest regression is y = a + bx, where y is the dependent variable (plotted on the vertical axis), x is the independent variable (plotted on the horizontal axis), a is the y intercept, and b is the slope of the line.

Refers to a mathematical equation that allows one variable (the target variable) to be predicted from another (the independent variable).

Implies a direction of influence; it does not prove causality.

From Greenhalgh, T. How to read a paper: statistics for the non-statistician. II. BMJ, 315 (7105)


Simple Regression

The regression line is the straight line passing through the data that minimizes the sum of squared differences between the original data and the fitted points
◦Least-squares analysis
◦This was the basis for ANOVA procedures

Intercept term is equivalent to the grand mean
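The least-squares idea can be sketched in a few lines of plain Python. This is an illustration only: it fits y = a + bx to the eight (grades, math achievement) pairs shown on the earlier variables slide rather than to the full 75-case file, so the coefficients will not match the SPSS regression output. The function names are illustrative choices.

```python
# Eight (grades, math achievement) pairs from the variables slide; the full file has more cases.
grades = [4, 5, 6, 3, 3, 5, 6, 4]
math_achievement = [9.00, 10.33, 7.67, 5.00, -1.67, 1.00, 12.00, 8.00]


def least_squares_line(x, y):
    """Return (a, b) for y = a + b*x that minimizes the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx        # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b


def sum_squared_residuals(a, b, x, y):
    """Sum of squared differences between observed y and fitted a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = least_squares_line(grades, math_achievement)
best = sum_squared_residuals(a, b, grades, math_achievement)
print(f"Fitted line: y = {a:.4f} + {b:.4f}x, sum of squared residuals = {best:.2f}")

# Any other line gives a larger sum of squared residuals.
print(sum_squared_residuals(a + 0.5, b, grades, math_achievement) > best)
print(sum_squared_residuals(a, b + 0.5, grades, math_achievement) > best)
```

The last two lines check the defining property of the regression line: shifting the intercept or the slope away from the fitted values increases the sum of squared residuals.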

QUESTION

Can we predict math achievement from grades in high school?

Using the same variables as before:

ANALYZE > REGRESSION > LINEAR

INITIAL OUTPUT TABLES

Descriptive Statistics
                        Mean      Std. Deviation   N
math achievement test   12.5645   6.67031          75
grades in h.s.          5.68      1.570            75

Correlations
                                              math achievement test   grades in h.s.
Pearson Correlation   math achievement test   1.000                   .504
                      grades in h.s.          .504                    1.000
Sig. (1-tailed)       math achievement test   .                       .000
                      grades in h.s.          .000                    .
N                     math achievement test   75                      75
                      grades in h.s.          75                      75

REGRESSION TABLES

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       grades in h.s.(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: math achievement test

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .504(a)   .254       .244                5.80018
a. Predictors: (Constant), grades in h.s.

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   836.606          1    836.606       24.868   .000(a)
    Residual     2455.875         73   33.642
    Total        3292.481         74
a. Predictors: (Constant), grades in h.s.
b. Dependent Variable: math achievement test

Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients                  Collinearity Statistics
Model                B       Std. Error            Beta                        t       Sig.   Tolerance   VIF
1   (Constant)       .397    2.530                                             .157    .876
    grades in h.s.   2.142   .430                  .504                        4.987   .000   1.000       1.000
a. Dependent Variable: math achievement test

Collinearity Diagnostics(a)
                                                   Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   grades in h.s.
1       1           1.964        1.000             .02          .02
        2           .036         7.421             .98          .98
a. Dependent Variable: math achievement test
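The entries in these tables are tied together by simple arithmetic. The sketch below, in plain Python, recomputes several of them from the reported sums of squares and coefficients and then uses the fitted equation for a prediction. The variable names and the function predict_math_achievement are illustrative, and small discrepancies are expected because the inputs are the rounded values printed in the output.

```python
import math

# Values copied from the ANOVA and Coefficients tables (rounded as printed).
ss_regression, ss_residual, ss_total = 836.606, 2455.875, 3292.481
df_regression, df_residual = 1, 73
n_cases, n_predictors = 75, 1
intercept, slope, slope_std_error = 0.397, 2.142, 0.430

# R Square: proportion of total variation explained by the regression.
r_squared = ss_regression / ss_total
# Adjusted R Square: penalizes R Square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)
# F: regression mean square divided by residual mean square.
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
# Std. Error of the Estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)
# t for the slope: coefficient divided by its standard error.
t_slope = slope / slope_std_error


def predict_math_achievement(grades_in_hs):
    """Predicted math achievement from the fitted line y = a + b*x."""
    return intercept + slope * grades_in_hs


print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error_estimate:.5f}")
print(f"t (slope) = {t_slope:.3f}")
print(f"Predicted math achievement for grades = 6: {predict_math_achievement(6):.3f}")
```

The recomputed t for the slope comes out slightly below the printed 4.987 because B and its standard error are rounded to three decimals in the table.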

REGRESSION EXAMPLE

We could look at the Dayya data again to predict BMI from percent calories in carbs. Do you think we could obtain an accurate prediction?

Other uses of regression might be to predict the number of fillings required during a 5-year period from the number of times teeth were brushed a week.

DAYYA DATA

Multiple Regression

A more complex mathematical equation that allows the target variable to be predicted from two or more independent variables (often known as co-variables).

EXAMPLE: predicting blood pressure from age, height, weight, and drug dosage.

SEE YOU IN ---


MULTIPLE REGRESSION

FINAL POINTS
◦Sample size – number of subjects at least 5 (preferably 10) times the number of variables
◦The multiple R should be at least .7
◦The change in R2 should be at least a few percent
◦A gradual fall off should be seen in the prediction of each successive variable
◦Fewer predictor variables are better than many; too many make interpretation difficult
◦Analyze the influence of outliers
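The first two rules of thumb above are easy to turn into a quick check. The sketch below is illustrative only: it assumes that "number of variables" means the number of predictors, the function names are hypothetical, and the subject counts and multiple R passed to meets_guidelines are made-up inputs rather than results from any data set.

```python
def sample_size_guideline(n_variables):
    """Return (minimum, preferred) subject counts: 5 and 10 times the number of variables."""
    return 5 * n_variables, 10 * n_variables


def meets_guidelines(n_subjects, n_variables, multiple_r):
    """True if the sample meets the 5x minimum and the multiple R is at least .7."""
    minimum, _preferred = sample_size_guideline(n_variables)
    return n_subjects >= minimum and multiple_r >= 0.7


# Predictors from the blood pressure example earlier in the presentation.
predictors = ["age", "height", "weight", "drug dosage"]
minimum, preferred = sample_size_guideline(len(predictors))
print(f"{len(predictors)} predictors: at least {minimum} subjects, preferably {preferred}")

# Hypothetical inputs, for illustration only.
print(meets_guidelines(n_subjects=40, n_variables=4, multiple_r=0.75))
print(meets_guidelines(n_subjects=15, n_variables=4, multiple_r=0.75))
```

The remaining points (change in R2, gradual fall off, outliers) depend on the fitted model and the data, so they are not reduced to a formula here.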
