David Yens, Ph.D. NYCOM
PASW-SPSS STATISTICS
David P. Yens, Ph.D.
New York College of Osteopathic Medicine, NYIT
[email protected]
PRESENTATION 5
REVIEW OF ANOVA, CORRELATION, AND REGRESSION
2010
ANALYSIS OF VARIANCE
Simple
◦ Used to determine whether there are differences in means among more than two groups
Factorial
◦ Used to determine whether there are differences in means on more than one dimension (independent variable)
Examples:
1. Compare blood pressures resulting from the use of three treatments.
2. Compare blood pressures resulting from the use of three treatments and between males and females.
D Yens, NYCOM 3 04/21/23
TREATMENT              GROUP A      GROUP B      GROUP C
Mean Blood Pressure    Mean for A   Mean for B   Mean for C
               MALES         FEMALES
TREATMENT A                                 Mean A
TREATMENT B                                 Mean B
TREATMENT C                                 Mean C
               Mean Males    Mean Females
ANOVA
Determining differences after ANOVA
◦ Planned contrasts
◦ Post-hoc analyses
hsbdataB
Effect of father's education on
◦ Grades
◦ Visualization test
◦ Math achievement
ANOVA
ANALYZE → COMPARE MEANS → ONE-WAY ANOVA
OR
◦ GENERAL LINEAR MODEL → UNIVARIATE
Several options are available
Length of stay in different hospitals

           HOSPITAL
PATIENT    A1   A2   A3   A4
1           2    3    5   10
2           3    6    8   11
3           4    7    9   13
4           3    5   10    8
5           4    4    4    9
6           2    5    6    9
Analysis of Variance
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
A1 6 18 3 0.8
A2 6 30 5 2
A3 6 42 7 5.6
A4 6 60 10 3.2
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 160.5 3 53.5 18.44828 5.6E-06 3.0983912
Within Groups 58 20 2.9
Total 218.5 23
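The single-factor table above can be reproduced by hand from the length-of-stay data. The following is a minimal sketch in plain Python (not SPSS or Excel output); the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Length of stay (days) for six patients in each of four hospitals,
# transcribed from the slide's data table.
stays = {
    "A1": [2, 3, 4, 3, 4, 2],
    "A2": [3, 6, 7, 5, 4, 5],
    "A3": [5, 8, 9, 10, 4, 6],
    "A4": [10, 11, 13, 8, 9, 9],
}


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between groups: distance of each group mean from the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within groups: spread of each score around its own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ss_b, ss_w, df_b, df_w, f = one_way_anova(stays)
print(f"Between groups: SS={ss_b:.1f}, df={df_b}, MS={ss_b / df_b:.1f}")
print(f"Within groups:  SS={ss_w:.1f}, df={df_w}, MS={ss_w / df_w:.1f}")
print(f"F = {f:.3f}")
```

The sums of squares, degrees of freedom, and F ratio should agree with the Between Groups and Within Groups rows of the table. The P-value and F crit columns need the F distribution and are left out of this sketch.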
OTHER ANALYSIS OF VARIANCE METHODS
◦ Repeated measures
◦ Analysis of Covariance
Test statistic - F
STATISTICAL ANALYSES
ANALYSIS OF VARIANCE (Repeated measures)
◦ Used to assess before and after measures on the same individuals exposed to two or more treatments.
◦ Example: Assess the increase in blood pressure for two groups exposed to different treatments.
REPEATED MEASURES ANOVA

                       TREATMENT
                       TREATMENT A   TREATMENT B   TREATMENT C
SUBJECT 1              BP-A          BP-B          BP-C
SUBJECT 2
SUBJECT 3
SUBJECT 4
----
MEAN BLOOD PRESSURE    MEAN FOR A    MEAN FOR B    MEAN FOR C
REPEATED MEASURES ANOVA
Temperature over 4 days

           TEMPERATURE
PATIENT    A1      A2      A3      A4
1          101.2   100.3    99.8    98.7
2          102.3   100.7   100.1    99.1
3          103.2   101.1   100.2    99.1
4          102.2   100.6    99.2    98.5
5          104.0   102.1   100.1    99.2
6          103.2   101.5   100.3    99.3
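For the temperature data, a one-way repeated measures ANOVA splits the total variation into a part due to days, a part due to stable differences between patients, and a residual error. The sketch below does that partition in plain Python. It assumes A1 to A4 are the four days, it does not test or correct for sphericity, and the function and key names are illustrative, not SPSS terms.

```python
# Temperatures for six patients (rows) on four days A1-A4 (columns),
# transcribed from the slide's data table.
temps = [
    [101.2, 100.3, 99.8, 98.7],
    [102.3, 100.7, 100.1, 99.1],
    [103.2, 101.1, 100.2, 99.1],
    [102.2, 100.6, 99.2, 98.5],
    [104.0, 102.1, 100.1, 99.2],
    [103.2, 101.5, 100.3, 99.3],
]


def repeated_measures_anova(rows):
    """One-way repeated measures ANOVA; rows = subjects, columns = conditions."""
    n = len(rows)       # number of subjects
    k = len(rows[0])    # number of repeated conditions (days)
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    condition_means = [sum(r[j] for r in rows) / n for j in range(k)]
    subject_means = [sum(r) / k for r in rows]

    ss_total = sum((x - grand_mean) ** 2 for r in rows for x in r)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Whatever is left after removing day and patient effects is error.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_conditions / df_conditions) / (ss_error / df_error)
    return {
        "condition_means": condition_means,
        "ss_total": ss_total,
        "ss_conditions": ss_conditions,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
        "df": (df_conditions, df_error),
        "f": f_ratio,
    }


result = repeated_measures_anova(temps)
print("Day means:", [round(m, 2) for m in result["condition_means"]])
print("df =", result["df"], " F =", round(result["f"], 2))
```

Because each patient serves as their own control, the between-patient variation is removed from the error term, which is why this design is more sensitive than treating the four days as independent groups.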
CORRELATION AND REGRESSION
Morgan, Chapt. 8
CORRELATION – Expresses relationship only
REGRESSION – Prediction of one variable from another. Implies direction of influence, does NOT prove causality
MULTIPLE REGRESSION – Prediction of a target variable from 2 or more predictors (independent variables)
CORRELATION
Correlation coefficient is a number between -1 and +1 whose sign is the same as the slope of the line and whose magnitude is related to the degree of linear association between two variables
r², the coefficient of determination, expresses the proportion of variance in the dependent variable explained by the independent variable
◦ On a ratio scale; an r² = .50 is twice as large as .25
Interpretation of values
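For reference, the Pearson correlation coefficient described above is usually written as follows. The symbols x_i, y_i, and the bars for sample means are standard notation, not taken from the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le +1
```

The ratio-scale point applies to r², not to r itself: r = .50 gives r² = .25, while doubling the explained variance to r² = .50 requires r ≈ .71.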
ASSUMPTIONS FOR PEARSON CORRELATION & SIMPLE REGRESSION
Linear relationship
Scores normally distributed
Outliers can have a major impact
VARIABLES FOR CORRELATION

Grades    MathAchievement
4          9.00
5         10.33
6          7.67
3          5.00
3         -1.67
5          1.00
6         12.00
4          8.00
ETC.
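To make the calculation concrete, here is a minimal Python sketch of the Pearson correlation applied to the eight rows listed above. These rows are only the visible part of the data set (the slide ends with "ETC."), so the result will not match the coefficient SPSS reports for all 75 cases. The function name `pearson_r` is an illustrative choice.

```python
from math import sqrt

# The eight example rows shown on the slide (not the full data set).
grades = [4, 5, 6, 3, 3, 5, 6, 4]
math_achievement = [9.00, 10.33, 7.67, 5.00, -1.67, 1.00, 12.00, 8.00]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(grades, math_achievement)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

With so few cases a single unusual score, such as the -1.67, can move r noticeably, which is the outlier concern raised under the assumptions.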
EXAMPLE FROM TEXT
Check assumptions
OBTAINING A SCATTERPLOT
GRAPHS → LEGACY DIALOGS → SCATTER/DOT
SCATTERPLOT
ADDING REGRESSION LINE
Now double-click the output chart
USING CHART BUILDER
GRAPHS → CHART BUILDER → OK
SELECT “Gallery”
SELECT “Scatter/Dot”
With mouse, move “Simple Scatter” to Chart Preview
Find/move “math achievement test” to vertical axis box
Find/move “grades in h.s.” to horizontal axis box
Click OK
OUTPUT FROM CHART BUILDER
TO OBTAIN A FIT LINE
Double-click on chart
SELECT “Elements”
SELECT “Interpolation line”
FIT LINE
TO GET A CORRELATION BETWEEN THE 2 VARIABLES
ANALYZE → CORRELATE → BIVARIATE
CORRELATION OUTPUT

Descriptive Statistics
                         Mean      Std. Deviation   N
grades in h.s.           5.68      1.570            75
math achievement test    12.5645   6.67031          75

Correlations
                                                grades in h.s.   math achievement test
grades in h.s.           Pearson Correlation    1                .504**
                         Sig. (2-tailed)                         .000
                         N                      75               75
math achievement test    Pearson Correlation    .504**           1
                         Sig. (2-tailed)        .000
                         N                      75               75
**. Correlation is significant at the 0.01 level (2-tailed).
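The significance reported for the correlation comes from a t test of the null hypothesis that the population correlation is zero. As a rough check, the t statistic can be recovered from r and N alone. The sketch below uses the rounded r = .504 from the output, so the result is approximate; the function name is illustrative.

```python
from math import sqrt


def t_for_correlation(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


n = 75
t = t_for_correlation(0.504, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

A t of about 4.99 on 73 degrees of freedom is far beyond the usual two-tailed critical values, which is consistent with the reported Sig. of .000.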
CORRELATION EXAMPLE
Dayya (2005) looked at predictors of obesity. In one example, he plotted percent of calories in carbs against BMI to see if there was a relationship, with the following result:
[Figure: percent of calories in carbs plotted against BMI]
Dayya, D. Analysis of the CDC-NHANES Database to Identify Predictors of Obesity in a Multiple Linear and Logistic Regression Model. New York Medical Journal, online, Dec. 2005.
REGRESSION
Refers to a mathematical equation that allows one variable (the target variable) to be predicted from another (the independent variable).
The simplest regression is y = a + bx, where y is the dependent variable (plotted on the vertical axis), x is the independent variable (plotted on the horizontal axis), a is the y intercept, and b is the slope.
Implies a direction of influence; it does not prove causality.
From Greenhalgh, T. How to read a paper: statistics for the non-statistician. II. BMJ, 315 (7105)
Simple Regression
The regression line is the straight line passing through the data that minimizes the sum of squared differences between the original data and the fitted points
◦ Least-squares analysis
◦ This was the basis for ANOVA procedures
Intercept term is equivalent to the grand mean
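The least-squares idea can be sketched in a few lines of Python. The example fits y = a + bx to the eight illustrative grades and math achievement rows shown earlier in the deck (not the full 75-case data set), then confirms that nudging a or b away from the fitted values increases the sum of squared errors. Function names are illustrative.

```python
# Eight example rows from the variables slide (not the full data set).
grades = [4, 5, 6, 3, 3, 5, 6, 4]
math_achievement = [9.00, 10.33, 7.67, 5.00, -1.67, 1.00, 12.00, 8.00]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared differences between observed y and fitted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(grades, math_achievement)
best = sum_squared_errors(a, b, grades, math_achievement)
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {best:.3f}")

# Any other line through the data has a larger sum of squared errors.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    worse = sum_squared_errors(a + da, b + db, grades, math_achievement)
    print(f"a{da:+.1f}, b{db:+.1f}: SSE = {worse:.3f}")
```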
QUESTION
Can we predict math achievement from grades in high school?
Using the same variables as before:
ANALYZE → REGRESSION → LINEAR
INITIAL OUTPUT TABLES

Descriptive Statistics
                         Mean      Std. Deviation   N
math achievement test    12.5645   6.67031          75
grades in h.s.           5.68      1.570            75

Correlations
                                                math achievement test   grades in h.s.
Pearson Correlation      math achievement test  1.000                   .504
                         grades in h.s.         .504                    1.000
Sig. (1-tailed)          math achievement test  .                       .000
                         grades in h.s.         .000                    .
N                        math achievement test  75                      75
                         grades in h.s.         75                      75
REGRESSION TABLES

Variables Entered/Removed(b)
Model   Variables Entered    Variables Removed   Method
1       grades in h.s.(a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: math achievement test

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .504(a)   .254       .244                5.80018
a. Predictors: (Constant), grades in h.s.

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    836.606          1    836.606       24.868   .000(a)
Residual      2455.875         73   33.642
Total         3292.481         74
a. Predictors: (Constant), grades in h.s.
b. Dependent Variable: math achievement test

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients                   Collinearity Statistics
Model 1           B        Std. Error           Beta                        t       Sig.    Tolerance   VIF
(Constant)        .397     2.530                                            .157    .876
grades in h.s.    2.142    .430                 .504                        4.987   .000    1.000       1.000
a. Dependent Variable: math achievement test

Collinearity Diagnostics(a)
                                                      Variance Proportions
Model   Dimension   Eigenvalue   Condition Index      (Constant)   grades in h.s.
1       1           1.964        1.000                .02          .02
        2           .036         7.421                .98          .98
a. Dependent Variable: math achievement test
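The entries in these tables are tied together by simple arithmetic. The sketch below recomputes several of them from the descriptive statistics and sums of squares shown above. It uses the rounded values printed in the output, so small discrepancies in the last digit are expected. Variable names are illustrative.

```python
from math import sqrt

# Values copied from the SPSS output tables (rounded as printed).
n = 75
r = 0.504
mean_x, sd_x = 5.68, 1.570          # grades in h.s. (predictor)
mean_y, sd_y = 12.5645, 6.67031     # math achievement test (target)
ss_regression, ss_residual, ss_total = 836.606, 2455.875, 3292.481
df_regression, df_residual = 1, 73

# Coefficients table: slope and intercept of y = a + b*x.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Model Summary: R Square, Adjusted R Square, Std. Error of the Estimate.
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
ms_residual = ss_residual / df_residual
std_error_estimate = sqrt(ms_residual)

# ANOVA table: F is the ratio of the two mean squares.
f_ratio = (ss_regression / df_regression) / ms_residual

print(f"slope b = {slope:.3f}, intercept a = {intercept:.3f}")
print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.5f}")
print(f"F = {f_ratio:.3f}")
```

Read this way, the fitted equation is predicted math achievement = .397 + 2.142 × grades. With one predictor, the standardized Beta equals r, and the t for the slope squared (4.987²) equals F up to rounding.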
REGRESSION EXAMPLE
We could look at the Dayya data again to predict BMI from percent calories in carbs. Do you think we could obtain an accurate prediction?
Other uses of regression might be to predict the number of fillings required during a 5-year period from the number of times teeth were brushed a week.
DAYYA DATA
Multiple Regression
A more complex mathematical equation that allows the target variable to be predicted from two or more independent variables (often known as co-variables).
EXAMPLE: predicting blood pressure from age, height, weight, and drug dosage.
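Extending the simple form y = a + bx, the multiple regression equation can be written with one coefficient per predictor. The subscripted symbols b_1 ... b_k and x_1 ... x_k are notation assumed here for illustration; the slides do not give them.

```latex
y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k
```

For the blood pressure example, k = 4:

```latex
\text{blood pressure} = a + b_1(\text{age}) + b_2(\text{height}) + b_3(\text{weight}) + b_4(\text{drug dosage})
```

Each b_j is the predicted change in y for a one-unit change in x_j with the other predictors held constant.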
SEE YOU IN ---
MULTIPLE REGRESSION
FINAL POINTS
◦ Sample size – number of subjects at least 5 (preferably 10) times the number of variables
◦ The multiple R should be at least .7
◦ The change in R² should be at least a few percent
◦ A gradual fall off should be seen in the prediction of each successive variable
◦ Fewer predictor variables are better than many; too many make interpretation difficult
◦ Analyze the influence of outliers
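The first two rules of thumb are easy to turn into a quick check. The sketch below is only an illustration of the arithmetic. It assumes "number of variables" means the number of predictors, which the slide does not spell out, and the function names are hypothetical.

```python
def recommended_sample_sizes(n_predictors):
    """Return (minimum, preferred) subjects: 5x and 10x the predictor count.

    Assumption: 'number of variables' on the slide refers to predictors.
    """
    return 5 * n_predictors, 10 * n_predictors


def meets_multiple_r_guideline(multiple_r, threshold=0.7):
    """True when the multiple R reaches the slide's suggested minimum of .7."""
    return multiple_r >= threshold


# Blood pressure example from the deck: age, height, weight, drug dosage.
minimum, preferred = recommended_sample_sizes(4)
print(f"4 predictors: at least {minimum} subjects, preferably {preferred}")

# The single-predictor model shown earlier had R = .504.
print("R = .504 meets the .7 guideline:", meets_multiple_r_guideline(0.504))
```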