Amsterdam Rehabilitation Research Center | Reade

Correlation and linear regression analysis

Martin van der Esch, PhD

Content

Correlation and linear regression analysis

- Association research
- However, also used in experimental studies

Correlation and regression

- Interested in relationship/association/correlation
- Direction and magnitude of relationship
- Dependent or independent variables
- Association does not imply a ‘cause and effect’ relationship


Correlation

Expressed as the product-moment correlation (Pearson coefficient, r) when data are not skewed, or as the rank-order correlation (Spearman, rs) when data are ordinal, skewed, or contain outliers.

- Dimensionless
- Range between +1 and –1 (0 = no correlation)
- Magnitude indicates how close the points are to a straight line (the strength of an association)
- +1 or –1: perfect correlation, all points lying on the line


r lies between –1 and +1.

[Figure: scatter plot of systolic blood pressure (Syst. bloeddruk, y-axis, 100 to 220) against age (Leeftijd, x-axis, 10 to 70); R Sq Linear = 0.432]

[Figure: scatter plot of systolic blood pressure (Syst. bloeddruk, y-axis, 100 to 220) against age (Leeftijd, x-axis, 10 to 70); R Sq Linear = 0.712]

Correlation coefficient

Range: -1 ≤ r ≤ 1.

In SPSS:

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      0.844(a)  0.712     0.702              9.563

a. Predictors: (Constant), Leeftijd (age)

Coefficients(a)

Model 1         B       Std. Error  Beta   t       Sig.
(Constant)      97.077  5.528              17.562  0.000
Leeftijd (age)  0.949   0.116       0.844  8.174   0.000

B and Std. Error: unstandardized coefficients. Beta: standardized coefficient.
a. Dependent Variable: Syst. bloeddruk (systolic blood pressure)
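With a single predictor, the standardized coefficient (Beta) equals the Pearson correlation r, and R Square equals r squared; in the SPSS output, 0.844 squared is about 0.712. The following is a minimal Python sketch of that relationship. The ages and blood pressures are made-up values for illustration, not the data behind the SPSS output.

```python
import math

# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [25, 32, 41, 47, 53, 60, 66]
sbp = [118, 125, 130, 142, 139, 155, 160]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(sbp) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in sbp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, sbp))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation
b = sxy / sxx                    # unstandardized slope (B)
a = mean_y - b * mean_x          # intercept (Constant)

# Standardized slope: B rescaled by the spread of x relative to y.
beta = b * math.sqrt(sxx / syy)

# R Square: share of the variation in y explained by the fitted line.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(age, sbp))
r_square = 1 - ss_residual / syy

print(f"r = {r:.3f}, Beta = {beta:.3f}, R Square = {r_square:.3f}")
```

This equality holds only for simple regression with one predictor; with several predictors the standardized coefficients no longer equal the pairwise correlations.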

Formula correlation

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
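The correlation formula can be written out term by term in a few lines of Python. The sketch below is illustrative only: the function names and the data are assumptions, and Spearman's rs is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, following the formula term by term."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = math.sqrt(
        sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
    )
    return numerator / denominator

def ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises steadily with x, but the last value is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 9, 60]

print(round(pearson_r(x, y), 3), round(spearman_rs(x, y), 3))
```

Because the outlier keeps its rank order, Spearman's rs is unaffected by it, whereas Pearson's r is pulled away from 1.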


Regression analysis

Statistical analysis

Data were analyzed with SPSS for Windows 16.0 (SPSS Inc). According to their distribution, the various parameters are expressed as mean (± standard deviation) or median (interquartile range). Data with a non-Gaussian distribution was log transformed for analysis if possible. To compare the groups, student’s T-test or Mann-Whitney U test was used when appropriate. Furthermore, correlations between variables were analyzed by using Pearson correlation or Spearman’s rho tests. Univariate linear regression analyses were performed on log-transformed data to investigate the influence of possible confounders (i.e. sex, smoking status, systolic blood pressure and body mass index (BMI)) on the results. Wilcoxon signed-rank test was used to investigate the differences in values at baseline and at 8 weeks in the prospectively followed subgroup of patients (n=9). P-values less than 0.05 were considered statistically significant.

I C van Eijk, M E Tushuizen, A Sturk, B A C Dijkmans, M Boers, A E Voskuyl, M Diamant, G J Wolbink, R Nieuwland and M T Nurmohamed. Circulating microparticles remain associated with complement activation despite intensive anti-inflammatory therapy in early rheumatoid arthritis. Ann Rheum Dis, published online 16 Nov 2009.


Typical association question

Research question: is there an association between age and pain in patients with …?

Hypothesis: pain increases in older patients

Y = a + bX + e

[Figure: scatter plot of pain against age]

Simple (uni) linear regression analysis

Difference with correlation analysis: prediction

- A line that gives the best description of the scatter plot: the best-fitting line
- It is difficult to draw this line by hand
- Solve the problem with a mathematical equation


Simple (uni) linear regression analysis

We use the ‘Method of Least Squares’ to fit the best line

The best line minimizes the sum of squared vertical distances between the data points and the fitted line.
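A minimal Python sketch of the Method of Least Squares for one predictor is given below. The function names and the age and pain values are hypothetical; the closed-form expressions for the slope and intercept are the standard least-squares solution.

```python
def least_squares_line(x, y):
    """Return intercept a and slope b of the line that minimizes
    the sum of squared vertical distances to the data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_of_squares(x, y, a, b):
    """Sum of squared vertical distances between the data and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: age (years) and pain score.
age = [30, 40, 50, 60, 70]
pain = [62, 66, 67, 71, 74]

a, b = least_squares_line(age, pain)
best = sum_of_squares(age, pain, a, b)

# Any other line, for example with a shifted intercept or a steeper slope,
# has a larger sum of squared distances.
shifted = sum_of_squares(age, pain, a + 1.0, b)
steeper = sum_of_squares(age, pain, a, b + 0.01)

print(f"pain = {a:.2f} + {b:.2f} * age")
```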

[Figure: scatter plot of pain against age]

Simple regression analysis

Pain = β0 + β1 * age

What is β0? β0 = pain at age 0

What is β1? β1 = Beta = b, the difference in pain between:

- age 0 and age 1
- age 1 and age 2
- ...
- age 30 and age 31


Mathematical equation to describe the relationship

y = a + b*x

y is called the dependent (outcome) variable

x is called the independent (predictor, explanatory) variable

a is the intercept: value of y when x=0

b (unstandardized beta) is the ‘slope’: it represents the amount by which y increases on average if we increase x by one unit

a and b are called regression coefficients

Simple regression analysis

The regression coefficient equals the difference in the outcome variable when the determinant changes by one unit.

[Figure: pain plotted against age, with a one-unit step marked on the axes]

Simple regression analysis

pain = –20 + 0.5 * age

What is –20? What is 0.5?
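One way to answer the two questions is to evaluate the equation at a few ages. The short Python sketch below does this; the function name is an assumption made for illustration.

```python
def predicted_pain(age):
    """Predicted pain from the slide's equation: pain = -20 + 0.5 * age."""
    return -20 + 0.5 * age

# The intercept is the predicted pain at age 0.
intercept = predicted_pain(0)

# The slope is the difference in predicted pain for a one-year difference in age.
slope = predicted_pain(61) - predicted_pain(60)

print(intercept, slope, predicted_pain(60))
```

The intercept is an extrapolation: a pain score of –20 at age 0 has no practical meaning, it only fixes the height of the line.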

You can also analyse the difference between two groups with simple regression analysis.

Back to 2 groups and analysis of pain

group       pre         after
medication  75.8 (6.8)  65.8 (10.1)
placebo     75.4 (7.1)  68.2 (9.0)

Group Statistics: VERSCHIL (difference)

groep (group)                      N    Mean      Std. Deviation
placebo                            100  -7.2000   3.75513
nieuwe medicatie (new medication)  100  -10.0000  6.81650

Independent Samples Test: t-test for Equality of Means

          t      df   Sig. (2-tailed)  Mean Difference  95% CI of the Difference (Lower, Upper)
VERSCHIL  3.598  198  0.000            2.8000           1.26530, 4.33470
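The t value in the Independent Samples Test can be reproduced from the Group Statistics alone. The Python sketch below assumes the equal-variances (pooled) version of the test, which is consistent with the reported 198 degrees of freedom; the variable names are illustrative.

```python
import math

# Summary statistics from the Group Statistics table (change in pain, VERSCHIL).
n_placebo, mean_placebo, sd_placebo = 100, -7.2, 3.75513
n_med, mean_med, sd_med = 100, -10.0, 6.81650

# Degrees of freedom for the pooled two-sample t-test.
df = n_placebo + n_med - 2

# Pooled variance and standard error of the difference in means.
pooled_var = ((n_placebo - 1) * sd_placebo ** 2 + (n_med - 1) * sd_med ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n_placebo + 1 / n_med))

mean_diff = mean_placebo - mean_med
t = mean_diff / se_diff

print(f"difference = {mean_diff:.1f}, SE = {se_diff:.3f}, t = {t:.3f}, df = {df}")
```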

Now analysed by simple regression analysis

[Figure: pain (continuous outcome) plotted for the placebo and medication 1 groups]

Simple regression analysis

The regression coefficient equals the difference in means between two comparable groups.

Simple (uni) linear regression analysis

Pain = β0 + β1 * group

placebo = 0; medication = 1

β0 = mean in the control group

β1 = mean difference between placebo and medication
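That a regression on a 0/1 group variable returns the control-group mean and the mean difference can be checked numerically. The Python sketch below uses made-up change scores and a hypothetical helper `fit_line`.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical change in pain per patient.
placebo = [-6.0, -9.0, -7.0, -5.0]
medication = [-11.0, -8.0, -12.0, -10.0]

# Code the groups as placebo = 0 and medication = 1.
group = [0] * len(placebo) + [1] * len(medication)
pain_change = placebo + medication

b0, b1 = fit_line(group, pain_change)

mean_placebo = sum(placebo) / len(placebo)
mean_medication = sum(medication) / len(medication)

print(f"b0 = {b0:.2f} (placebo mean {mean_placebo:.2f})")
print(f"b1 = {b1:.2f} (mean difference {mean_medication - mean_placebo:.2f})")
```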

Coefficients(a)

Model 1        B       Std. Error  t        Sig.
(Constant)     -7.200  0.550       -13.084  0.000   (β0)
groep (group)  -2.800  0.778       -3.598   0.000   (β1)

B and Std. Error: unstandardized coefficients.
a. Dependent Variable: VERSCHIL (difference)

Hypothesis test for β

$$ t = \frac{\beta}{SE(\beta)} $$

with N-2 degrees of freedom

P value?

$$ t = \frac{-2.800}{0.778} \approx -3.598 $$

Back to the example

- Experimental design
- Including another medicine
- Three comparable groups

Pain

group        T0          T1
medication1  75.8 (6.8)  65.8 (10.1)
medication2  76.8 (7.5)  61.9 (11.7)
placebo      75.4 (7.1)  68.2 (9.0)

[Figure: continuous outcome plotted for the placebo, medication1 and medication2 groups]

Coefficients(a)

Model 1        B       Std. Error  t        Sig.
(Constant)     -6.850  0.586       -11.681  0.000
groep (group)  -3.850  0.454       -8.475   0.000

B and Std. Error: unstandardized coefficients.
a. Dependent Variable: VERSCHIL (difference)

Group analysed as a continuous variable

But group isn’t a continuous variable: it is a categorical variable. Therefore it needs to be analysed with dummy variables.

Dummy variables

Categorical Variables Codings

                                                 Parameter coding
GROEP (group)                                    (1)    (2)
placebo                                          0.000  0.000
nieuwe medicatie (new medication)                1.000  0.000
alternatieve medicatie (alternative medication)  0.000  1.000

Dummy 1: new medication - placebo

Dummy 2: alt. medication - placebo

Placebo: control group

Simple regression analysis

Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2

What is β0? β0 = mean of the placebo group

What is β1? β1 = difference between placebo and medication1

What is β2? β2 = difference between placebo and medication2

[Figure: continuous outcome plotted for the placebo, medication1 and medication2 groups]

Coefficients(a)

Model 1     B       Std. Error  t        Sig.
(Constant)  -7.200  0.642       -11.222  0.000
DUMMIE1     -2.800  0.907       -3.086   0.002
DUMMIE2     -7.700  0.907       -8.487   0.000

B and Std. Error: unstandardized coefficients.
a. Dependent Variable: VERSCHIL (difference)

Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2
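The dummy coding can be tried out on a small example. The Python sketch below fits the two-dummy model by ordinary least squares using the normal equations; the change scores and the helper functions `solve` and `ols` are hypothetical and written only for this illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ols(predictors, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # add the constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical change in pain per patient in three groups.
placebo = [-5.0, -8.0, -7.0, -9.0]
medication1 = [-10.0, -9.0, -12.0, -11.0]
medication2 = [-14.0, -16.0, -15.0, -13.0]

# Dummy coding with placebo as the reference group: (dummy1, dummy2).
predictors = [(0, 0)] * 4 + [(1, 0)] * 4 + [(0, 1)] * 4
y = placebo + medication1 + medication2

b0, b1, b2 = ols(predictors, y)

def mean(values):
    return sum(values) / len(values)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Here b0 reproduces the placebo mean, and b1 and b2 reproduce the differences between each medication group and placebo, which is how the B column for DUMMIE1 and DUMMIE2 is read.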

Intermezzo

Little exercise:

• Is there a relationship between your height (cm) and shoe size (European size)?

• Estimate the correlation coefficient.

• What does that mean?

• Estimate the formula: Height = ? + ? * shoe size

• Group: men/women.

• Groups: occupational therapy, physiotherapy, other.

Assumptions of linear regression analysis

Linear relationship between x and y

• Scatter diagram

• Otherwise logarithmic transformation (next week)

For each value of x, there is a distribution of values of y in the population; this distribution is Normal

• Analysis of the residuals

Variability of the distribution of y values in the population is the same for all values of x, i.e. the variance is constant (s2 / sd)

• Analysis of the residuals
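A rough Python sketch of what an analysis of the residuals starts from is shown below. The data and the helper `fit_line` are hypothetical, and comparing the residual spread in the lower and upper half of x is only a crude illustration of the constant-variance idea, not a formal test; in practice residual plots and histograms are inspected.

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: age (years) and pain score.
age = [35, 42, 48, 55, 61, 67, 73, 80]
pain = [64, 67, 66, 70, 71, 72, 76, 75]

a, b = fit_line(age, pain)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(age, pain)]

# For a least-squares line with an intercept, the residuals average to zero.
mean_residual = statistics.mean(residuals)

# Crude look at constant variance: residual spread for low versus high ages.
ordered = [res for _, res in sorted(zip(age, residuals))]
half = len(ordered) // 2
sd_low = statistics.stdev(ordered[:half])
sd_high = statistics.stdev(ordered[half:])

print(f"mean residual = {mean_residual:.4f}, sd low = {sd_low:.2f}, sd high = {sd_high:.2f}")
```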


Checking for linearity

Scatterplot

Adding a quadratic term

Splitting exposure variable into groups (4-5)

Adding a quadratic term

[Figure: pain plotted against age]


Checking for linearity

Splitting exposure variable into groups

Splitting exposure variable into groups

[Figure: pain plotted against age, with age split into groups 1 to 4]

Example in SPSS

Examine the association between age and pain score at baseline.

- Scatterplot
- Linear regression analysis
- Checking for linearity

• Adding a quadratic term

• Splitting exposure variable into groups

Scatter plot

[Figure: scatter plot of pain (y-axis, 65 to 80) against age (x-axis, 40 to 90)]

Linear regression analysis

Pain (at baseline) = 56.2 + 0.23 * age

Coefficients(a)

Model 1     B       Std. Error  Beta   t       Sig.
(Constant)  56.239  2.131              26.394  0.000
age         0.234   0.033       0.523  7.005   0.000

B and Std. Error: unstandardized coefficients. Beta: standardized coefficient.
a. Dependent Variable: pain

Adding a quadratic term

Coefficients(a)

Model          B       Std. Error  Beta    t       Sig.
1  (Constant)  56.239  2.131               26.394  0.000
   age         0.234   0.033       0.523   7.005   0.000
2  (Constant)  49.128  13.116              3.746   0.000
   age         0.456   0.405       1.020   1.124   0.263
   age2        -0.002  0.003       -0.499  -0.549  0.584

B and Std. Error: unstandardized coefficients. Beta: standardized coefficient.
a. Dependent Variable: pain
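The idea behind adding a quadratic term can be sketched in Python: if the relationship is linear, the coefficient of the squared term is close to zero. Everything in the sketch is an assumption made for illustration: the data are made up, age is centred at 65 before squaring to keep the arithmetic stable, and the helpers `solve` and `ols` are hypothetical. SPSS additionally reports a standard error and a Sig. value for age2, which this sketch does not compute.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ols(predictors, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # add the constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

ages = [40, 50, 60, 70, 80, 90]
centred = [age - 65 for age in ages]  # centring is an assumption of this sketch

# Two hypothetical outcomes: one exactly linear in age, one with curvature.
linear_pain = [56 + 0.25 * age for age in ages]
curved_pain = [56 + 0.25 * age + 0.01 * (age - 65) ** 2 for age in ages]

def quadratic_fit(y):
    """Fit pain = b0 + b1 * age_c + b2 * age_c^2, with age_c the centred age."""
    return ols([(c, c ** 2) for c in centred], y)

b_linear = quadratic_fit(linear_pain)
b_curved = quadratic_fit(curved_pain)

print("linear data: b2 =", round(b_linear[2], 6))
print("curved data: b2 =", round(b_curved[2], 6))
```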

Splitting exposure variable into groups

- Produce categorical age variable
- Recode to dummy variables
- Perform linear regression analysis with dummies
- Are the B’s increasing in a linear order with comparable distance between the dummies?

Splitting exposure variable into groups

Coefficients(a)

Model 1     B       Std. Error  Beta   t        Sig.
(Constant)  68.382  0.535              127.936  0.000
dummy1      1.924   0.774       0.223  2.486    0.014
dummy2      3.346   0.750       0.404  4.459    0.000
dummy3      5.446   0.768       0.638  7.094    0.000

B and Std. Error: unstandardized coefficients. Beta: standardized coefficient.
a. Dependent Variable: pain
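Whether the B's increase in comparable steps can be read off by taking the difference between successive coefficients. The short Python sketch below does this with the B values from the table; it assumes that the reference group is the lowest age group and that dummy1 to dummy3 are ordered by increasing age, which the output itself does not state.

```python
# B coefficients from the Coefficients table. Each B is the difference in mean
# pain between an age group and the reference group, so the reference is 0.
# Assumption: the groups are ordered from youngest (reference) to oldest.
coefficients = {"reference": 0.0, "dummy1": 1.924, "dummy2": 3.346, "dummy3": 5.446}

values = list(coefficients.values())

# Step in mean pain from each age group to the next.
steps = [round(later - earlier, 3) for earlier, later in zip(values, values[1:])]

print(steps)
```

Steps of similar size in the same direction are what a linear relationship would produce; how much variation between the steps is acceptable remains a judgment call.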


Questions?
