CORRELATION: Correlation analysis Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables

CORRELATION:

Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables.

The analysis is concerned only with the strength of the relationship; hence no causal effect is implied.

A scatter plot (or scatter diagram) is used to show the relationship between two variables.

Scatter Plot Examples (figure not reproduced: plots of y against x showing linear relationships and curvilinear relationships)

(figure not reproduced: plots of y against x showing strong relationships and weak relationships)

(figure not reproduced: scatter plot showing no relationship)

Correlation coefficient:

• The population correlation coefficient ρ (rho) measures the strength of the linear association between the variables.

• The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

• The value of r varies from sample to sample. When ρ = 0, the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ follows a Student t distribution with n - 2 degrees of freedom.

Both ρ and r:

• Are unit free

• Range between -1 and 1

• The closer to -1, the stronger the negative linear relationship

• The closer to 1, the stronger the positive linear relationship

• The closer to 0, the weaker the linear relationship
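The properties listed for r can be checked numerically. The sketch below uses the standard Pearson formula, which the slides do not write out, and a small made-up data set; the function name `pearson_r` and all data values are assumptions for illustration only.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
height_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weight_kg = [52.0, 58.0, 63.0, 70.0, 79.0]

r_metric = pearson_r(height_cm, weight_kg)

# Changing units (cm -> inches, kg -> pounds) leaves r unchanged: r is unit free.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_imperial = pearson_r(height_in, weight_lb)

# Reversing the direction of one variable flips the sign but not the strength.
r_negated = pearson_r(height_cm, [-w for w in weight_kg])

print(round(r_metric, 4), round(r_imperial, 4), round(r_negated, 4))
```

Rescaling either variable by a positive constant cancels in the ratio, which is why the two unit systems give the same value.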

A general guideline on interpretation of correlation

Significance test for correlation:

Hypotheses tested are H0: ρ = 0 (no correlation)

H1: ρ ≠ 0 (correlation exists)

Test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2)$$

If the p-value is less than the level of significance (α), then there is evidence of a linear relationship between the two variables.
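The test can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the inputs (r = 0.60, n = 20) are invented, and the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed. A statistics package would normally supply the p-value.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| > |t|), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

def correlation_t_test(r, n):
    """Return (t, p) for H0: rho = 0, given sample correlation r and sample size n."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_sided_p(t, df)

# Hypothetical inputs: r = 0.60 observed in a sample of n = 20 pairs.
t_stat, p_value = correlation_t_test(0.60, 20)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The decision rule is then the one stated above: reject H0 when the p-value falls below the chosen α.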

Example:

Pincherle and Robinson (1974) note a marked inter-observer variation in blood pressure readings. They found that doctors who read high on systolic tended to read high on diastolic. The table below shows the mean systolic and diastolic blood pressure readings by 14 doctors.

Research question: Is the association between the two variables significant?

Scatter plot of blood pressure data:

Correlations

                                 SYSTOLIC   DIASTOLI
SYSTOLIC   Pearson Correlation   1          .418
           Sig. (2-tailed)       .          .136
           N                     14         14
DIASTOLI   Pearson Correlation   .418       1
           Sig. (2-tailed)       .136       .
           N                     14         14

r = 0.418: low positive correlation between systolic and diastolic blood pressure.

p-value = 0.136, which exceeds the usual 0.05 level of significance: there is not sufficient evidence to indicate an association between systolic and diastolic blood pressure.
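As a rough check on the reported output, the test statistic can be recomputed by hand from the rounded values r = 0.418 and n = 14. The critical value 2.179 used below is the standard two-sided 5% point of t(12); it comes from t tables, not from the output, and α = 0.05 is an assumption.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.418\sqrt{12}}{\sqrt{1-0.418^2}} \approx \frac{1.448}{0.908} \approx 1.59$$

Since |t| ≈ 1.59 is smaller than 2.179, H0: ρ = 0 is not rejected at the 5% level, which agrees with the reported p-value of 0.136.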

REGRESSION ANALYSIS:

Regression analysis is used to:

Predict the value of a dependent variable based on the value of at least one independent variable

Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain

Independent variable: the variable used to explain the dependent variable

SIMPLE LINEAR REGRESSION:

• Only one independent variable is used to explain the dependent variable

• Relationship between dependent and independent variables is described by a linear function

• Changes in dependent variable are assumed to be caused by changes in independent variable.

The model is of the form

$$y = \beta_0 + \beta_1 X + \varepsilon$$

The parameters $\beta_0$ and $\beta_1$ are called the regression coefficients; $\beta_0$ is the intercept and $\beta_1$ is the slope of the regression fit.

y is the dependent variable and X is the independent variable.

$\varepsilon$ is the error term; it introduces randomness into the model.

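Using sample information, the regression coefficients are estimated by

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2} = \frac{\sum_{i=1}^{n}x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^{n}x_i^2 - n\,\bar x^2}$$

$$\hat\beta_0 = \bar y - \hat\beta_1\,\bar x$$

And the estimated regression model fit is

$$\hat y = \hat\beta_0 + \hat\beta_1 X$$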
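The least-squares estimates can be computed directly from the sums involved. The sketch below is one possible way to do this; the function name `fit_simple_linear_regression` and the five data points are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Computational form: (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying close to the line y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 7.9, 11.2, 13.8, 17.0]

b0, b1 = fit_simple_linear_regression(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

With an intercept in the model, the residuals sum to zero up to floating-point error, which is a quick sanity check on any fit.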

Interpretation of regression coefficient:

$\hat\beta_1$ is the estimated change in the average value of y as a result of a one-unit change in x.

Example:

Research question: Is there a linear relationship between BP and age? Answer this question using information on 30 individuals.

BP is the dependent variable. Age is the independent variable.

The estimated regression fit is

$$\hat y = 98.715 + 0.971\,\text{age}$$

For every additional year in age, BP increases on average by 0.97 units.
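To illustrate how the fitted line is used, take a hypothetical individual aged 50 (this age is an assumption, not a value from the data set):

$$\hat y = 98.715 + 0.971 \times 50 = 98.715 + 48.55 = 147.265$$

Repeating the calculation for age 51 gives 148.236, which is exactly 0.971 higher; this is the slope interpretation stated above. Predictions are only trustworthy for ages within the range observed in the sample.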

Test for overall significance of fit:

The hypotheses for testing whether the fitted model is statistically significant are:

H0 : regression fit is not significant

H1 : regression fit is significant

Use the ANOVA table to reach a decision on the test.

ANOVA table

SSR is the explained variation, attributable to the relationship between the dependent and independent variables.
SSE is the unexplained variation; it occurs due to chance.
SST is the total variation.

Source of variation   d.f.   S.S.   M.S.              F-ratio        p-value
Regression            1      SSR    MSR = SSR/1       Fc = MSR/MSE   Pr(F > Fc)
Residual              n-2    SSE    MSE = SSE/(n-2)
Total                 n-1    SST

If the p-value is less than the level of significance, the fitted model is a significant fit.

ANOVA

Source of variation   S.S.        df   M.S.       F        Sig.
Regression            6394.023    1    6394.023   21.330   .000
Residual              8393.444    28   299.766
Total                 14787.467   29

The p-value is < 0.001; the fitted model is significant.
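The mean squares and F-ratio in this output follow from the sums of squares and degrees of freedom alone. The sketch below recomputes them; the function name `anova_simple_regression` is hypothetical, and the p-value is deliberately left out because it requires the F distribution.

```python
def anova_simple_regression(ssr, sse, n):
    """Mean squares and F-ratio for a simple linear regression ANOVA table."""
    df_regression = 1
    df_residual = n - 2
    msr = ssr / df_regression
    mse = sse / df_residual
    return {
        "SST": ssr + sse,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }

# Sums of squares and sample size taken from the ANOVA output above.
table = anova_simple_regression(ssr=6394.023, sse=8393.444, n=30)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values match the reported M.S. and F columns to three decimal places.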

Test for significance of predictor:

Hypotheses of the test are:

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0$$

Test statistic is

$$t = \frac{\hat\beta_1}{\text{s.e.}(\hat\beta_1)} \sim t(n-2)$$

If the p-value is less than the level of significance, the predictor is linearly associated with the response.

Coefficients:

            B        Std. Error   t       Sig.
Intercept   98.715   10.000       9.871   .000
age         .971     .210         4.618   .000

a. Dependent Variable: bp

The value of the test statistic is 4.618 and the p-value is < 0.001; age is linearly associated with BP.
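The t value is the ratio of the estimate to its standard error. Using the rounded figures printed in the output:

$$t = \frac{\hat\beta_1}{\text{s.e.}(\hat\beta_1)} \approx \frac{0.971}{0.210} \approx 4.62$$

The small gap from the reported 4.618 is presumably due to rounding of the displayed estimate and standard error. In simple linear regression the slope test and the overall F-test are equivalent, since $F = t^2$; here $4.618^2 \approx 21.33$, which matches the F-ratio of 21.330 in the ANOVA output.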

Coefficient of determination:

The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable.

The coefficient of determination is also called R-squared and is denoted as $R^2$.

$$R^2 = \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$

Graded interpretation: $R^2$ = 0.1-0.3 weak relationship; 0.4-0.7 moderate relationship; 0.8-1 strong relationship.
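For the blood pressure and age example, the coefficient of determination can be worked out from the sums of squares reported in the ANOVA output (SSR = 6394.023, SST = 14787.467):

$$R^2 = \frac{SSR}{SST} = \frac{6394.023}{14787.467} \approx 0.432$$

About 43% of the variation in BP is explained by age, which falls in the moderate band of the graded interpretation. In simple linear regression $R^2$ equals the square of the sample correlation, so this corresponds to $|r| \approx \sqrt{0.432} \approx 0.66$.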

MULTIPLE LINEAR REGRESSION:

Use two or more independent variables to explain the dependent variable

Multiple linear regression allows us to investigate the joint effect of several independent variables on the dependent variable

We relate a single outcome (dependent) variable to two or more independent variables simultaneously

The aims of fitting a regression model are to:

Identify independent variables that are associated with the dependent variable in order to promote understanding of the underlying process.

Determine the extent to which each independent variable is linearly related to the dependent variable after adjusting for other variables that may be related to it.

Predict the value of the dependent variable as accurately as possible from the predictor values.

The regression model is of the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + \varepsilon$$

where $X_1, X_2, \dots, X_p$ are the independent variables and $\beta_0, \beta_1, \dots, \beta_p$ are the regression coefficients.

Interpretation of regression coefficients:

The regression coefficient $\beta_k$, $k = 1, 2, \dots, p$, is the estimated change in the average value of the dependent variable for every unit increase in the corresponding predictor $X_k$, holding other factors in the model constant.

Each of the estimates is adjusted for the effects of all other predictors.

Inference on regression coefficients:

We can make inference on each regression coefficient $\beta_k$ by carrying out a statistical hypothesis test of

$$H_0: \beta_k = 0 \quad \text{vs.} \quad H_1: \beta_k \neq 0$$

The test statistic is

$$T = \frac{\hat\beta_k}{\text{s.e.}(\hat\beta_k)} \sim t(n-p-1)$$

If the p-value is less than the level of significance, the independent variable is linearly associated with the dependent variable after adjusting for all other independent variables.

Test for significance of model fit:

The analysis is the same as for the simple model; the focus is on the p-value in the ANOVA table. The inference is also the same: if the p-value < α, the fitted model is statistically significant.

Adjusted $R^2$ statistic: related to the coefficient of determination.

It also measures the proportion of variation in the dependent variable that is accounted for by the independent variables.

$$R^2_{adj} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}$$
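The formula is easy to evaluate once SSE, SST, n and p are known. The sketch below applies it to the sums of squares from the blood pressure ANOVA output earlier in the deck (one predictor, age); the function names are hypothetical, and the final line uses an invented p = 5 purely to show how the penalty grows with the number of predictors.

```python
def r_squared(sse, sst):
    """Coefficient of determination, written as 1 - SSE/SST (equal to SSR/SST)."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """1 - [SSE/(n-p-1)] / [SST/(n-1)] for a model with p predictors."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

# SSE, SST and n from the blood pressure ANOVA output; p = 1 predictor (age).
sse, sst, n = 8393.444, 14787.467, 30
r2 = r_squared(sse, sst)
r2_adj = adjusted_r_squared(sse, sst, n, p=1)

# Hypothetical: the same SSE achieved with 5 predictors gives a lower adjusted value.
r2_adj_p5 = adjusted_r_squared(sse, sst, n, p=5)

print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}, adjusted R2 (p=5) = {r2_adj_p5:.4f}")
```

Unlike $R^2$, the adjusted statistic does not automatically increase when predictors are added, which makes it the more useful of the two for comparing models of different size.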

Example:

A regression model is fitted to determine whether a linear relationship exists between patient satisfaction level and the patient's age (in years), severity of illness (an index) and anxiety level (an index). The data used were for 30 patients selected at random.

For the data collected, larger values of patient satisfaction, severity of illness and anxiety level are associated, respectively, with more satisfaction, increased severity of illness and more anxiety.

The estimated regression fit is

$$\hat y = 168.6078 - 1.2742\,\text{age} - 0.8473\,\text{sev.} - 6.0072\,\text{anx.}$$

Interpretation of regression coefficients:

Adjusting for severity of illness and anxiety level of patients, for every additional year in age the satisfaction level decreases on average by 1.27 units.

Adjusting for age and anxiety level, the satisfaction level decreases on average by 0.85 units for every unit increase in severity of illness.

Adjusting for age and severity of illness, the satisfaction level decreases on average by 6.01 units for every unit increase in anxiety level.
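The "holding other factors constant" reading of each coefficient can be seen by evaluating the fitted equation twice with only one predictor changed. In the sketch below the coefficients are copied from the estimated fit above; the function name and the patient's predictor values are hypothetical, since the slides do not give the ranges of the indices.

```python
# Coefficients copied from the estimated regression fit above.
INTERCEPT = 168.6078
COEF = {"age": -1.2742, "severity": -0.8473, "anxiety": -6.0072}

def predict_satisfaction(age, severity, anxiety):
    """Predicted satisfaction level from the fitted equation."""
    return (INTERCEPT
            + COEF["age"] * age
            + COEF["severity"] * severity
            + COEF["anxiety"] * anxiety)

# Hypothetical patient; these predictor values are invented for illustration.
base = predict_satisfaction(age=40, severity=50, anxiety=2.0)

# Same patient one year older, with severity and anxiety held fixed.
one_year_older = predict_satisfaction(age=41, severity=50, anxiety=2.0)
effect_of_age = one_year_older - base

print(f"prediction = {base:.2f}, change per extra year = {effect_of_age:.4f}")
```

The difference between the two predictions equals the age coefficient regardless of which values are chosen for the other two predictors.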

ANOVA table

Source of variation   d.f.   S.S.     M.S.       F-ratio   P-value
Regression            3      7256.3   2418.767   30.46     < 0.001
Residual              26     2063.2   79.4
Total                 29     9319.5

Overall the fit is significant since the p-value is < 0.001.
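The same sums of squares give the coefficient of determination and its adjusted version for this model (n = 30 patients, p = 3 predictors). These two values are not reported on the slide; they are worked out here from the table entries:

$$R^2 = \frac{SSR}{SST} = \frac{7256.3}{9319.5} \approx 0.779$$

$$R^2_{adj} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - \frac{2063.2/26}{9319.5/29} \approx 1 - \frac{79.35}{321.36} \approx 0.753$$

Roughly three quarters of the variation in satisfaction level is accounted for by age, severity of illness and anxiety level together.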

Parameter estimates:

            Estimate   s.e.     t        p-value
age         -1.2742    0.2406   -5.295   < 0.001
severity    -0.8473    0.4599   -1.842   0.077
anxiety     -6.0072    6.2042   -0.968   0.3418
intercept   168.6078

From the results above, age is a significant variable; i.e., controlling for anxiety level and severity of illness of patients, age is significantly associated with satisfaction level.
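Each test statistic in the table is the estimate divided by its standard error, referred to a t distribution with n - p - 1 = 26 degrees of freedom. The sketch below recomputes the ratios; the names are hypothetical, and the critical value 2.056 is the standard two-sided 5% point of t(26) taken from t tables, with α = 0.05 as an assumption.

```python
# (estimate, standard error) pairs copied from the parameter estimates table above.
ESTIMATES = {
    "age": (-1.2742, 0.2406),
    "severity": (-0.8473, 0.4599),
    "anxiety": (-6.0072, 6.2042),
}

N, P = 30, 3
DF = N - P - 1        # 26 degrees of freedom
T_CRITICAL = 2.056    # two-sided 5% point of t(26) from tables (assumes alpha = 0.05)

def t_ratio(estimate, std_error):
    """Test statistic for H0: beta_k = 0."""
    return estimate / std_error

results = {}
for name, (b, se) in ESTIMATES.items():
    t = t_ratio(b, se)
    significant = abs(t) > T_CRITICAL
    results[name] = (t, significant)
    print(f"{name:9s} t = {t:7.3f}  significant at 5%: {significant}")
```

Under these assumptions only age exceeds the critical value, in line with the reported p-values: severity of illness (p = 0.077) and anxiety level (p = 0.3418) are not significant at the 5% level after adjusting for the other predictors.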