7/28/16


Analysis of Dependent Variables: Correlation and Simple Regression

Zacariah Labby, PhD, DABR
Asst. Prof. (CHS), Dept. of Human Oncology
University of Wisconsin – Madison

Conflicts of Interest

None to disclose

Aug 1, 2016 Labby - AAPM 2016 2


Purpose
• Review basic statistics and identify appropriate use of statistics related to analyzing simple relationships between two variables:
  • Correlation statistics
  • Linear regression and model fitting


STATISTICS OF CORRELATION


Correlation: Review of Terminology
• Dependent vs. Independent Variables
  • Standard plot: X is Independent, Y is Dependent
• Linear vs. Monotonic
  • Linear: increase in X leads to proportional increase in Y
  • Monotonic: increase in X leads to some increase in Y

[Figure: plot of y versus x, both axes running from 0 to 1]

Correlation: Review of Terminology
• Variable Type
  • Continuous
    • Example: Ionization chamber charge collected vs. Dose delivered
  • Discrete
    • Example: Number of patients seen vs. Calendar year
  • Ordinal
    • Example: Severity of normal tissue toxicity vs. Prescription Level
  • Categorical
    • Example: RECIST response classification vs. Radiologist Observer


Correlation: Metrics of Interest
• Four big categories of data
  • Continuous
  • Discrete
  • Ordinal
  • Categorical
• Three major correlation metrics
  • Pearson’s r
  • Spearman’s ρ
  • Fleiss’ κ


Correlation: Pearson’s r
• “Linear” or “Product-Moment” correlation
• Applies only to continuous data
• Parametric correlation
• Tendency of dependent variable to increase linearly with the independent variable
• Key Point:
  • There is an assumed form to the relationship
    • Linear, and therefore also monotonic
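As a rough illustration of the product-moment idea, here is a minimal R sketch. The data are made up for this example (the vectors `x` and `y` are not from the talk); it compares a by-hand calculation with the built-in `cor()`.

```r
# Hypothetical data: a noisy linear relationship between x and y.
set.seed(1)
x <- seq(0, 1, length.out = 20)
y <- 0.5 + 2 * x + rnorm(20, sd = 0.1)

# Pearson's r from its definition: the sum of products of deviations,
# normalized by the spread of x and the spread of y.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent (method = "pearson" is the default).
r_builtin <- cor(x, y, method = "pearson")
```

The two values should agree to within floating-point precision.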


Correlation: Pearson’s r
[Figure: six scatter plots of y versus x, each labeled with its Pearson correlation. Panels 1 to 3: r = 1.00, r = 0.97, r = 0.76. Panels 4 to 6: r = 0.96, r = 0.96, r = 0.75.]


Correlation: Spearman’s ρ
• “Rank” correlation
• Applies to continuous, discrete, or ordinal data
• Non-parametric correlation
• Tendency of dependent variable to increase with the independent variable
• Key Point:
  • There is no assumed relationship, only monotonicity
• Math: Pearson’s r of rank-transformed data
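The rank-transform definition can be checked directly in R. This is a small sketch under assumed data: a monotonic but curved relationship, y = x², chosen for illustration and not taken from the slides.

```r
# Hypothetical monotonic but non-linear data: y = x^2 on 20 evenly spaced points.
x <- seq(0, 1, length.out = 20)
y <- x^2

# Pearson's r falls below 1 because the points do not lie on a straight line.
r_pearson <- cor(x, y, method = "pearson")

# Spearman's rho is 1 because y always increases when x increases.
rho <- cor(x, y, method = "spearman")

# Same quantity computed as Pearson's r of the rank-transformed data.
rho_manual <- cor(rank(x), rank(y))
```

Because only the ordering matters, any strictly increasing transformation of `y` would leave `rho` unchanged.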


Correlation: Spearman’s ρ
[Figure: scatter plot of y versus x (both 0.0 to 1.0) with the (X,Y) pairs annotated by raw value and rank: Raw (0, 0), Rank (1, 1); Raw (0.05, 0.0025), Rank (2, 2); and so on up to Raw (1, 1), Rank (20, 20).]
Pearson’s r of rank-transformed data: 1.00

Correlation: Spearman’s ρ
[Figure: six scatter plots of y versus x, each labeled with both metrics. Panels 1 to 3: r = 1.00, ρ = 1.00; r = 0.97, ρ = 0.97; r = 0.76, ρ = 0.90. Panels 4 to 6: r = 0.96, ρ = 1.00; r = 0.96, ρ = 0.99; r = 0.75, ρ = 0.89.]

Correlation: Which Metric?
Continuous variables; “When one goes up, does the other (reliably) go down?”
[Figure: relative change from baseline (−1.0 to 1.0) versus months after baseline (0 to 7), plotted for Disease Volume and Lung Volume. Z.E. Labby et al, J Thorac Oncol 8, (2013)]
Answer: Spearman’s ρ

Correlation: Fleiss’ κ
• Categorical correlation
• Applies only to categorical data
  • Categorical data could be inherently ordinal
• Non-parametric correlation
• How well do independent categories sort dependent categories?
• Math: number of dependent-independent pairs in agreement over the number expected by chance alone.
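One common way to write the "agreement beyond chance" calculation is the standard Fleiss formula, κ = (P_obs − P_exp) / (1 − P_exp). The R sketch below assumes that formulation; the function name `fleiss_kappa` and the small ratings matrix are hypothetical and are not data from the talk. It needs per-patient counts (how many observers chose each category for each patient), so it cannot be run on per-observer totals alone.

```r
# Fleiss' kappa from a subjects-by-categories count matrix.
# counts[i, j] = number of raters who put subject i in category j;
# every row must sum to the same number of raters.
fleiss_kappa <- function(counts) {
  n_raters <- sum(counts[1, ])
  stopifnot(all(rowSums(counts) == n_raters), n_raters > 1)
  # Observed agreement: fraction of agreeing rater pairs, averaged over subjects.
  p_obs <- mean((rowSums(counts^2) - n_raters) / (n_raters * (n_raters - 1)))
  # Chance agreement: based on the overall share of ratings in each category.
  p_cat <- colSums(counts) / sum(counts)
  p_exp <- sum(p_cat^2)
  (p_obs - p_exp) / (1 - p_exp)
}

# Hypothetical ratings: 4 patients, 5 observers, 3 response categories.
counts <- rbind(c(5, 0, 0),
                c(0, 5, 0),
                c(1, 4, 0),
                c(0, 1, 4))
kappa_hat <- fleiss_kappa(counts)
```

A value of 1 means every observer agreed on every patient; a value near 0 means agreement is no better than chance.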


Correlation: Fleiss’ κ
• Example:
  • 5 radiologists contour tumors in 31 patients
  • Response classification from baseline to post-chemo CT scans
    • Progressive Disease
    • Stable Disease
    • Partial Response
    • Complete Response

Number of patients assigned to each response category by each observer:

              Obs. 1   Obs. 2   Obs. 3   Obs. 4   Obs. 5
Progression      6       11        7       11       14
Stable          17       10       19       15        9
Partial          7       10        5        4        8
Complete         1        0        0        1        0

κ = 0.64

Landis and Koch, Biometrics 33, 159–174 (1977)

Correlation vs. Agreement
• Quick tangent…
• Important question: Do you already know that the two variables will be correlated?
• Example: Tumor volumes as assessed by Physician vs. Algorithm


Correlation vs. Agreement
• Especially with implicit independent variables (i.e., the true value remains unknown), correlation isn’t as meaningful
• Correlation is only the strength of a relationship between two variables
• Agreement is the actual 1:1 accuracy
[Figure: scatter plot of y (3.0 to 4.0) versus x (0.0 to 1.0)]

Bland and Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” Lancet 327, 307 (1986).

Correlation vs. Agreement
[Figure: Bland-Altman plot. Horizontal axis: Average of Physician and Algorithm. Vertical axis: Difference (Physician – Algorithm). Horizontal reference lines at Mean, Mean + 2SD, and Mean - 2SD.]

Bland and Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” Lancet 327, 307 (1986).
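A Bland-Altman plot of this kind can be sketched in a few lines of R. The paired volumes below are invented for illustration (they are not the speaker's data), and the variable names are assumptions.

```r
# Hypothetical paired tumor volumes (cm^3) from a physician and an algorithm.
physician <- c(12.1, 30.4, 8.7, 55.2, 21.9, 40.3, 15.6, 27.8)
algorithm <- c(11.5, 32.0, 9.1, 52.8, 23.0, 41.7, 14.9, 26.5)

avg <- (physician + algorithm) / 2   # horizontal axis: average of the two methods
dif <- physician - algorithm         # vertical axis: difference between methods

bias  <- mean(dif)                   # mean difference
upper <- bias + 2 * sd(dif)          # Mean + 2SD
lower <- bias - 2 * sd(dif)          # Mean - 2SD

plot(avg, dif,
     xlab = "Average of Physician and Algorithm",
     ylab = "Difference (Physician - Algorithm)")
abline(h = c(lower, bias, upper), lty = c(2, 1, 2))
```

Two methods can be highly correlated and still show a large `bias` or wide limits here, which is the distinction between correlation and agreement.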


Correlation vs. Agreement
• Absolute agreement vs. Relative agreement
  • Absolute: plot raw differences
  • Relative: plot log differences
• Get mean, SD of log-transformed data, then apply exponential to get relative agreement bounds


Bland and Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” Lancet 327, 307 (1986).

$$\ln\frac{x}{y} = \ln x - \ln y$$
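The log-difference recipe for relative agreement might look like the following in R. This is a sketch with made-up paired measurements; the variable names and the choice of ±2 SD bounds are assumptions for this example.

```r
# Hypothetical paired tumor volumes (cm^3) from a physician and an algorithm.
physician <- c(12.1, 30.4, 8.7, 55.2, 21.9, 40.3, 15.6, 27.8)
algorithm <- c(11.5, 32.0, 9.1, 52.8, 23.0, 41.7, 14.9, 26.5)

# Difference of logs equals the log of the ratio.
log_ratio <- log(physician) - log(algorithm)

m <- mean(log_ratio)
s <- sd(log_ratio)

# Exponentiate to express the bounds as ratios (physician / algorithm).
ratio_mean  <- exp(m)
ratio_lower <- exp(m - 2 * s)
ratio_upper <- exp(m + 2 * s)
```

A `ratio_mean` of 1 would indicate no systematic relative bias; the lower and upper values bound the typical multiplicative disagreement.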

SIMPLE MODELING


Correlation vs. Agreement vs. Modeling
• Correlation: Strength of relationship
• Agreement: Accuracy of 1:1 match
• Modeling: Quantifying the relationship
• Rules of Modeling:
  1. Prefer model with n-1 parameters to n
  2. Prefer model with k-1 independent variables to k
  3. Prefer linear model to curved model


Crawley, Statistics: An Introduction using R, Wiley (2005)

Simple Linear Regression
• Linear regression is linear in the coefficients, not necessarily in the independent variable
• Linear: $\mathbf{y} = \alpha + \beta\mathbf{x}$ and $\mathbf{y} = \alpha + \beta\mathbf{x}^2$
• Not Linear: $\mathbf{y} = \alpha + e^{\beta\mathbf{x}}$
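To see why a model with a squared predictor still counts as linear regression, here is a short R sketch on simulated data. The data and the true coefficients 1 and 3 are assumptions made for this example.

```r
# Hypothetical data generated from y = 1 + 3 * x^2 plus a little noise.
set.seed(2)
x <- seq(0, 1, length.out = 30)
y <- 1 + 3 * x^2 + rnorm(30, sd = 0.05)

# y = alpha + beta * x: linear in the coefficients and in x.
fit_line <- lm(y ~ x)

# y = alpha + beta * x^2: curved in x but still linear in alpha and beta,
# so ordinary least squares (lm) applies. I() protects the arithmetic.
fit_quad <- lm(y ~ I(x^2))

coef(fit_quad)

# By contrast, y = alpha + exp(beta * x) is not linear in beta and would
# need a nonlinear least-squares routine such as nls() instead of lm().
```

The fitted coefficients of `fit_quad` should land close to the values used to simulate the data.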


Simple Linear Regression
• I was going to put the math here, but…


Simple Linear Regression
• Sources of Variance in the data
  • Your model: $\mathbf{y} = \alpha + \beta\mathbf{x}$
  • Reality: $y_i = \alpha + \beta x_i + \epsilon_i$, where $\epsilon_i$ is the random (residual) error from the fit for each $x_i$
• Variance in y can be explained by
  • Variance in x
  • Residual uncertainty (called $\epsilon$)


Simple Linear Regression
• Sources of Variance in the data
  • Explained Sum of squared errors (ESS)
  • Residual Sum of squared errors (RSS)
  • Total sum of squared errors (TSS)
• Coefficient of Determination: ESS/TSS
  • Proportion of total variation in y explained by the model

$$\sum_i \left(f(x_i) - \bar{y}\right)^2 + \sum_i \left(y_i - f(x_i)\right)^2 = \sum_i \left(y_i - \bar{y}\right)^2$$
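The ESS + RSS = TSS decomposition can be verified numerically. The R sketch below uses simulated data (an assumption for this example) and takes f(x_i) to be the fitted values of a least-squares line with an intercept.

```r
# Hypothetical data: noisy straight line.
set.seed(3)
x <- seq(0, 1, length.out = 25)
y <- 0.2 + 1.5 * x + rnorm(25, sd = 0.2)

model <- lm(y ~ x)
f <- fitted(model)              # f(x_i): the model's prediction at each x_i

ess <- sum((f - mean(y))^2)     # explained sum of squares
rss <- sum((y - f)^2)           # residual sum of squares
tss <- sum((y - mean(y))^2)     # total sum of squares

r_squared <- ess / tss          # coefficient of determination
```

For a least-squares fit with an intercept, `ess + rss` matches `tss`, and `r_squared` matches the value reported by `summary(model)`.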

Simple Linear Regression
• Coefficient of Determination: ESS/TSS
  • Proportion of total variation in y explained by the model
• Has another name…R-squared!

$$R^2 = \frac{ESS}{TSS}$$

• Pearson’s correlation coefficient
  • $r = \sqrt{R^2}$ in magnitude; r takes the sign of the slope $\beta$
• Drive home: Correlation quantifies strength of relationship, not relationship itself
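The link between Pearson's r and R-squared in simple linear regression can be checked in R. The simulated data below are an assumption for this sketch; the slope is deliberately negative to show that r carries a sign while R-squared does not.

```r
# Hypothetical data with a negative slope.
set.seed(4)
x <- seq(0, 1, length.out = 25)
y <- 2 - 1.5 * x + rnorm(25, sd = 0.2)

model <- lm(y ~ x)
r_squared <- summary(model)$r.squared   # coefficient of determination
r <- cor(x, y)                          # Pearson's r, negative here

# Recover r from R-squared by attaching the sign of the fitted slope.
r_from_fit <- sign(coef(model)[["x"]]) * sqrt(r_squared)
```

Neither number says what the slope or intercept is; that information lives in `coef(model)`.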


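Simple Linear Regression
• Making predictions
  • From analysis, derive best-fit values of the fit parameters $\hat{\alpha}$, $\hat{\beta}$, etc.
  • Predict new values according to $y_{\text{new}} = \hat{\alpha} + \hat{\beta} x_{\text{new}}$
• However, models have uncertainty!
  • Variance estimates can be provided for $\hat{\alpha}$, $\hat{\beta}$, etc. (e.g., $\hat{\sigma}_{\alpha}$)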


Simple Linear Regression
• Confidence Bands
  • Variance associated with mean predicted response: $\mathrm{Var}(\hat{\alpha} + \hat{\beta} x_{\text{new}})$
• Prediction Bands
  • Variance associated with single new prediction: $\mathrm{Var}(\hat{\alpha} + \hat{\beta} x_{\text{new}}) + \hat{\sigma}_{\epsilon}^2$
  • Takes into account residual errors in linear model
(in some ways, this is like the difference between standard deviation and standard error)
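The two variances can be computed by hand and compared against `predict()`. This R sketch assumes simulated data and the textbook formula for the variance of the mean response in simple linear regression, Var = σ̂² (1/n + (x_new − x̄)² / Σ(x_i − x̄)²); the point `x_new = 0.8` is arbitrary.

```r
# Hypothetical data: noisy straight line.
set.seed(5)
x <- seq(0, 1, length.out = 20)
y <- 0.5 + 1.2 * x + rnorm(20, sd = 0.15)
model <- lm(y ~ x)

x_new <- 0.8
n     <- length(x)
s     <- summary(model)$sigma              # estimated residual SD
fit   <- sum(coef(model) * c(1, x_new))    # alpha_hat + beta_hat * x_new

# Standard error of the mean predicted response (confidence band).
se_mean <- s * sqrt(1 / n + (x_new - mean(x))^2 / sum((x - mean(x))^2))
# Standard error of a single new observation (prediction band):
# same variance plus the residual variance.
se_pred <- sqrt(se_mean^2 + s^2)

t_crit <- qt(0.975, df = n - 2)            # two-sided 95% critical value
conf_manual <- fit + c(-1, 1) * t_crit * se_mean
pred_manual <- fit + c(-1, 1) * t_crit * se_pred

# Built-in equivalents.
new <- data.frame(x = x_new)
conf_builtin <- predict(model, new, interval = "confidence")
pred_builtin <- predict(model, new, interval = "prediction")
```

The prediction interval is always wider than the confidence interval at the same point, because it adds the scatter of individual observations around the line.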


Simple Linear Regression
[Figure: y (0.0 to 2.0) versus x (0.0 to 1.0). Legend: Data Points, Linear Model, 95% Conf. Bands, 95% Pred. Bands.]

Example
• Task: Department administrator asks you to figure out the relationship between patient census and required RadTech hours.
• Question 1: what kind of relationship would we expect?
  • Probably linear with some residual uncertainty
• Question 2: which correlation metric would you use?
  • Pearson’s r
• Question 3: how would you quantify the relationship?
  • Simple linear regression


Example
• Regression model: $\text{StaffingRequirements} = \hat{\alpha} + \hat{\beta} \cdot \text{PatientWorkload}$
  • $\hat{\alpha}$ = Fixed Staff Overhead
  • $\hat{\beta}$ = Scalable Coefficient
• Predict tomorrow’s RadTech staffing level if you know the patient workload
  • Could staff at the upper 95% prediction band?
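The staffing question could be worked through in R roughly as follows. Every number here is invented for illustration (the census counts, the hours, and tomorrow's census of 36 are all assumptions), and the column names `census` and `hours` are hypothetical.

```r
# Hypothetical records: daily patient census and RadTech hours required.
staffing <- data.frame(
  census = c(22, 25, 28, 30, 33, 35, 38, 40, 42, 45),
  hours  = c(40, 43, 47, 48, 53, 54, 59, 60, 64, 66)
)

# StaffingRequirements = alpha_hat + beta_hat * PatientWorkload
model <- lm(hours ~ census, data = staffing)
coef(model)   # intercept: fixed staff overhead; slope: hours per extra patient

# Predict tomorrow's requirement with a 95% prediction interval.
tomorrow <- data.frame(census = 36)
pred <- predict(model, tomorrow, interval = "prediction", level = 0.95)

# Staffing at the upper 95% prediction band.
staff_hours <- pred[1, "upr"]
```

Staffing at `pred[1, "fit"]` would cover the expected workload; staffing at the upper band trades some extra cost for a lower chance of being short on a given day.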

A plug for “R”…
• R is a free software package for data analysis and is very common in the statistics community.
• Good text for learning R and basic stats:
  • Statistics: An Introduction using R by Michael J. Crawley, published 2005 by John Wiley & Sons, Ltd


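A plug for “R”…

# Fit a simple linear regression of y on x
model = lm(y ~ x)
# 95% confidence bands for the mean response at each observed x
conf.bands = predict(model, interval = 'confidence')
# 95% prediction bands for a single new observation at each observed x
pred.bands = predict(model, interval = 'prediction')

[Figure: plot of y (0.0 to 2.0) versus x (0.0 to 1.0)]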

Use your Biostatisticians
• Many large centers have at least one biostatistician on staff
• In many centers, free consultations for
  • Experimental design
  • Simple clinical trials
  • Data analysis questions
• Prevent headaches and the cost of rework and rejected papers
