Regression and Correlation

Jake Blanchard

Fall 2010

Introduction

We can use regression to find relationships between random variables

This does not necessarily imply causation

Correlation can be used to measure predictability

Regression with Constant Variance

Linear regression: $E(Y \mid X=x) = \alpha + \beta x$

In general, the variance is a function of $x$

If we assume the variance is constant, the analysis is simplified

Define the total error $\Delta^2$ as the sum of the squares of the errors

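Linear Regression

$$\Delta^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$

$$\frac{\partial \Delta^2}{\partial \alpha} = -2\sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0$$

$$\frac{\partial \Delta^2}{\partial \beta} = -2\sum_{i=1}^{n} x_i\,(y_i - \alpha - \beta x_i) = 0$$

Solve for the estimates:

$$\hat\alpha = \bar y - \hat\beta\,\bar x$$

$$\hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^{n} x_i^2 - n\,\bar x^2}$$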
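The closed-form least-squares estimates for the line $E(Y \mid X=x) = \alpha + \beta x$ can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_line` and the data values are made up for the example.

```python
def fit_line(x, y):
    """Least-squares estimates (alpha_hat, beta_hat) for E(Y|X=x) = alpha + beta*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # beta_hat = (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
    beta_hat = s_xy / s_xx
    # alpha_hat = y_bar - beta_hat*x_bar
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(x, y)
print(alpha_hat, beta_hat)
```

For this made-up data the slope comes out close to 2 and the intercept close to 0, which is what a glance at the numbers suggests.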

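Variance in Regression Analysis

Relevant variance is conditional: $\mathrm{Var}(Y \mid X=x)$

$$s_{Y|x}^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat\alpha - \hat\beta x_i)^2 = \frac{\Delta^2}{n-2}$$

$$s_{Y|x}^2 = \frac{1}{n-2}\left[\sum_{i=1}^{n} (y_i - \bar y)^2 - \hat\beta^2 \sum_{i=1}^{n} (x_i - \bar x)^2\right]$$

$$r^2 = 1 - \frac{s_{Y|x}^2}{s_Y^2}$$

where $s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2$ is the sample variance of $Y$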
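As a rough illustration of the conditional variance $s_{Y|x}^2 = \Delta^2/(n-2)$ and the ratio $r^2 = 1 - s_{Y|x}^2/s_Y^2$, the sketch below computes both from raw data. The function name `regression_variance` and the data are hypothetical and not taken from the slides.

```python
def regression_variance(x, y):
    """Return (s2_y_given_x, s2_y, r2) for a straight-line least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Delta^2: sum of squared residuals about the fitted line
    delta2 = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2_y_given_x = delta2 / (n - 2)   # conditional variance, n-2 degrees of freedom
    s2_y = s_yy / (n - 1)             # unconditional sample variance of Y
    r2 = 1.0 - s2_y_given_x / s2_y
    return s2_y_given_x, s2_y, r2


# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

s2_y_given_x, s2_y, r2 = regression_variance(x, y)
print(s2_y_given_x, s2_y, r2)
```

A value of `r2` near 1 means that conditioning on x removes most of the variance of Y.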

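Confidence Intervals

The estimated regression coefficients, once standardized, are t-distributed with n-2 degrees of freedom (dof)

The statistic below is thus t-distributed with n-2 dof

$$\frac{\hat Y_i - \mu_{Y|x_i}}{s_{Y|x}\sqrt{\dfrac{1}{n} + \dfrac{(x_i - \bar x)^2}{\sum_{j=1}^{n}(x_j - \bar x)^2}}}$$

And the confidence interval is

$$\langle \mu_{Y|x_i} \rangle_{1-\alpha} = \hat y_i \pm t_{1-\alpha/2,\,n-2}\; s_{Y|x}\sqrt{\frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j=1}^{n}(x_j - \bar x)^2}}$$

where $1-\alpha$ is the confidence level (this $\alpha$ is not the regression intercept)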

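Example

Example 8.1

Data for compressive strength (q) of stiff clay as a function of “blow counts” (N)

$$\bar N = 18.7, \qquad \bar q = 2.123$$

$$s_N^2 = \frac{1}{9}\left(\sum N_i^2 - n\,\bar N^2\right) = 95.12$$

$$s_q^2 = \frac{1}{9}\left(\sum q_i^2 - n\,\bar q^2\right) = 1.22$$

$$\hat\beta = \frac{\sum N_i q_i - n\,\bar N\,\bar q}{\sum N_i^2 - n\,\bar N^2} = 0.112$$

$$\hat\alpha = \bar q - \hat\beta\,\bar N = 0.029$$

$$s_{q|N}^2 = \frac{\Delta^2}{n-2} = \frac{0.305}{8} = 0.038$$

Confidence interval:

$$\langle \mu_{Y|x_i} \rangle_{1-\alpha} = \hat y_i \pm t_{1-\alpha/2,\,n-2}\; s_{Y|x}\sqrt{\frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j=1}^{n}(x_j - \bar x)^2}}$$

$$t_{0.975,\,8} = 2.306$$

At $N_i = 4$: $\hat y_i = 0.029 + 0.112 \times 4 = 0.477$

$$\langle \mu_{q|N} \rangle_{0.95} = 0.477 \pm 2.306\,\sqrt{0.038}\,\sqrt{\frac{1}{10} + \frac{(4 - 18.7)^2}{4353 - 10\,(18.7)^2}} = (0.21,\ 0.744)$$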
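The interval at N = 4 can be checked numerically from the summary values quoted in the example alone (n = 10, mean N = 18.7, sum of N squared = 4353, fitted line 0.029 + 0.112 N, conditional variance 0.038, t = 2.306). The sketch below is only a check of the arithmetic. The function name `mean_ci` is hypothetical, and the t quantile is entered by hand rather than computed.

```python
import math

# Summary values as quoted in Example 8.1
n = 10
N_bar = 18.7
sum_N_sq = 4353.0
alpha_hat = 0.029
beta_hat = 0.112
s2_q_given_N = 0.038
t_crit = 2.306  # t quantile at 0.975 with n-2 = 8 dof, taken from the example


def mean_ci(N_i):
    """95% confidence interval for the mean strength at blow count N_i."""
    y_hat = alpha_hat + beta_hat * N_i
    s_xx = sum_N_sq - n * N_bar ** 2  # equals the sum of (N_j - N_bar)^2
    half_width = (t_crit * math.sqrt(s2_q_given_N)
                  * math.sqrt(1.0 / n + (N_i - N_bar) ** 2 / s_xx))
    return y_hat - half_width, y_hat + half_width


lo, hi = mean_ci(4)
print(round(lo, 3), round(hi, 3))
```

The result is roughly 0.21 to 0.74, in line with the interval stated in the example. The interval widens as N moves away from the mean of 18.7.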

Plot

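Correlation Estimate

$$\hat\rho_{x,y} = \frac{1}{n-1}\,\frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{s_X\,s_Y}$$

$$\hat\rho_{x,y} = \frac{1}{n-1}\,\frac{\sum_{i=1}^{n} x_i y_i - n\,\bar x\,\bar y}{s_X\,s_Y}$$

$$\hat\rho_{x,y}^2 = 1 - \frac{n-2}{n-1}\,\frac{s_{Y|x}^2}{s_Y^2}$$

For large n, $\hat\rho_{x,y}^2 \approx 1 - \dfrac{s_{Y|x}^2}{s_Y^2} = r^2$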
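A minimal sketch of the sample correlation estimate, the sum of cross-products divided by (n-1) times the two sample standard deviations, is given below. The function name `sample_correlation` and the data are hypothetical.

```python
import math


def sample_correlation(x, y):
    """Estimate rho as sum((x_i - x_bar)(y_i - y_bar)) / ((n-1) * s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations with the n-1 divisor
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

rho_hat = sample_correlation(x, y)
print(rho_hat)
```

Values close to +1 or -1 indicate that Y is well predicted by a linear function of X, while values near 0 indicate little linear predictability.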

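Regression with Non-Constant Variance

Now relax the assumption of constant variance

Assume regions with large conditional variance are weighted less

$$\mathrm{Var}(Y \mid X=x) = \sigma^2 g^2(x)$$

$$E(Y \mid X=x) = \alpha + \beta x$$

Weights:

$$w_i' = \frac{1}{\mathrm{Var}(Y \mid X=x_i)} = \frac{1}{\sigma^2 g^2(x_i)}, \qquad w_i = \sigma^2 w_i' = \frac{1}{g^2(x_i)}$$

$$\Delta^2 = \sum_{i=1}^{n} w_i\,(y_i - \alpha - \beta x_i)^2$$

$$\hat\alpha = \frac{\sum_{i=1}^{n} w_i y_i - \hat\beta \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

$$\hat\beta = \frac{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i y_i - \sum_{i=1}^{n} w_i y_i \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i^2 - \left(\sum_{i=1}^{n} w_i x_i\right)^2}$$

$$s^2 = \frac{\sum_{i=1}^{n} w_i\,(y_i - \hat y_i)^2}{n-2}$$

$$s_{Y|x} = s\,g(x)$$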
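The weighted least-squares estimates can be sketched as follows, with weights $w_i = 1/g^2(x_i)$ so that points in regions of large conditional variance count less. This is an illustrative sketch only: the function name `fit_weighted_line`, the choice g(x) = x in the usage lines, and the data are assumptions made for the example.

```python
def fit_weighted_line(x, y, g):
    """Weighted least squares for E(Y|X=x) = alpha + beta*x with Var(Y|x) = sigma^2 * g(x)^2.

    Returns (alpha_hat, beta_hat, s2), where s2 estimates sigma^2.
    """
    n = len(x)
    w = [1.0 / g(xi) ** 2 for xi in x]  # larger variance gives smaller weight
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    beta_hat = (sw * swxy - swy * swx) / (sw * swxx - swx ** 2)
    alpha_hat = (swy - beta_hat * swx) / sw
    # Weighted sum of squared residuals divided by n-2
    s2 = sum(wi * (yi - alpha_hat - beta_hat * xi) ** 2
             for wi, xi, yi in zip(w, x, y)) / (n - 2)
    return alpha_hat, beta_hat, s2


# Hypothetical data whose scatter grows with x, invented for illustration only
x = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
y = [0.40, 0.65, 1.10, 1.20, 2.30, 2.20]

alpha_hat, beta_hat, s2 = fit_weighted_line(x, y, g=lambda t: t)
print(alpha_hat, beta_hat, s2)
```

Passing `g=lambda t: 1.0` gives every point the same weight and reduces the fit to ordinary least squares.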

Example (8.2)

Data for maximum settlement (x) of storage tanks and maximum differential settlement (y)

From looking at the data, assume g(x) = x (that is, the standard deviation of y increases linearly with x)

$$\mathrm{Var}(Y \mid X=x) = \sigma^2 x^2$$

$$w_i = \frac{1}{x_i^2}$$

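Example (8.2) continued

$$\bar x = 1.65, \quad \bar y = 1.11, \quad s_x = 0.923, \quad s_y = 0.627$$

$$\hat y = 0.045 + 0.65\,x$$

$$s^2 = 0.0589, \qquad s_{Y|x} = s\,x = 0.243\,x$$

$$\hat\rho = 0.96$$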
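To make the effect of g(x) = x concrete, the short sketch below evaluates the fitted mean and the conditional standard deviation at a few settlements. It assumes the values read from this example (intercept 0.045, slope 0.65, s = 0.243). The function name `predict_differential_settlement` and the chosen x values are hypothetical.

```python
# Values as read from Example 8.2 (assumed): y_hat = 0.045 + 0.65*x, s = 0.243, g(x) = x
ALPHA_HAT = 0.045
BETA_HAT = 0.65
S = 0.243


def predict_differential_settlement(x):
    """Return (fitted mean, conditional standard deviation) at maximum settlement x."""
    y_hat = ALPHA_HAT + BETA_HAT * x
    s_y_given_x = S * x  # standard deviation grows linearly with x because g(x) = x
    return y_hat, s_y_given_x


# Illustrative settlements, chosen arbitrarily
for x in (0.5, 1.0, 2.0):
    print(x, predict_differential_settlement(x))
```

Doubling x doubles the conditional standard deviation, so the scatter about the line is much wider for large settlements than for small ones.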

Multiple Regression

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$
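The slide gives only the model. One common way to estimate the coefficients is least squares via the normal equations $(X^T X)\,\beta = X^T y$, which is an assumption here rather than something stated in the slides. The sketch below uses plain Python with a small Gaussian elimination routine. The function names and data are hypothetical, and the data are generated exactly from y = 1 + 2 x1 + 3 x2 so the answer is known.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(xs, y):
    """xs: rows [x_i1, ..., x_ik]; returns [beta_0, beta_1, ..., beta_k]."""
    design = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1 for beta_0
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]

betas = fit_multiple(xs, y)
print(betas)
```

For real data the normal equations can be poorly conditioned, so a library least-squares routine is usually a safer choice than hand-written elimination.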

“Nonlinear” Regression

$$E(Y \mid x) = \alpha + \beta\,g(x)$$

Use LINEST in Excel
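Outside Excel, the same idea can be sketched in Python: because the model is still linear in its two coefficients, transforming x to z = g(x) and fitting a straight line of y on z is enough. The function name `fit_transformed`, the choice g(x) = ln x, and the data are assumptions for illustration only.

```python
import math


def fit_transformed(x, y, g):
    """Fit E(Y|x) = alpha + beta*g(x) by ordinary least squares on z = g(x)."""
    z = [g(xi) for xi in x]
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    beta_hat = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
                / sum((zi - z_bar) ** 2 for zi in z))
    alpha_hat = y_bar - beta_hat * z_bar
    return alpha_hat, beta_hat


# Hypothetical data generated exactly from y = 2 + 3*ln(x)
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

alpha_hat, beta_hat = fit_transformed(x, y, math.log)
print(alpha_hat, beta_hat)
```

The quotation marks around "Nonlinear" reflect this: the relationship is nonlinear in x but the fitting problem remains a linear regression.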
