
Page 1: Simple linear regression and correlation analysis

1. Regression
2. Correlation
3. Significance testing

Page 2: 1. Simple linear regression analysis

Simple regression describes the relationship between two variables.

Two variables, generally Y = f(X):
Y = dependent variable (regressand)
X = independent variable (regressor)

Page 3: Simple linear regression

y_i = f(x_i) + e_i

f(x) – regression equation
e_i – random error, residual deviation: an independent random quantity ~ N(0, σ²)

Page 4: Simple linear regression – straight line

y_i est = b0 + b1·x_i

b0 = constant
b1 = coefficient of regression

Page 5: Parameter estimates → least squares condition

The difference of the actual Y from the estimated Y est. is minimal:

d_i = y_i − y_i est = y_i − (b0 + b1·x_i)

hence

S = Σ_{i=1..n} (y_i − y_i est)² = Σ_{i=1..n} (y_i − b0 − b1·x_i)² → min

where n is the number of observations (y_i, x_i).

Adjustment by partial derivation of the function f(b0, b1) = Σ (y_i − b0 − b1·x_i)² according to the parameters b0, b1; the derivatives of the sum of squared deviations S are equated to zero:

∂f/∂b0 = −2·Σ (y_i − b0 − b1·x_i) = 0
∂f/∂b1 = −2·Σ (y_i − b0 − b1·x_i)·x_i = 0

Page 6: Two approaches to parameter estimates using the least squares condition (made for the straight line equation)

1. Normal equation system for the straight line:

n·b0 + b1·Σx_i = Σy_i
b0·Σx_i + b1·Σx_i² = Σx_i·y_i

2. Matrix computation approach:

y = X·b + ε
b = (Xᵀ·X)⁻¹·Xᵀ·y

y = dependent variable vector
X = independent variable matrix
b = vector of regression coefficients (straight line → b0 and b1)
ε = vector of random errors
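As an illustration of the matrix approach, a minimal sketch with made-up numbers (the five (x, y) pairs are invented for this example, not taken from the slides):

```python
import numpy as np

# Illustrative data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix X: a column of ones (for b0) and the regressor x (for b1)
X = np.column_stack([np.ones_like(x), x])

# b = (X^T X)^-1 X^T y, solved as a linear system
b = np.linalg.solve(X.T @ X, X.T @ y)

print(b)  # [b0, b1]
```

Solving the normal-equation system Xᵀ·X·b = Xᵀ·y with `np.linalg.solve` is numerically preferable to forming the inverse explicitly.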

Page 7: Simple linear regression

observations: y_i

smoothed values: y_i est (also written y_i′), where y_i est = b0 + b1·x_i

residual deviation: d_i = y_i − y_i est

residual sum of squares: S_r = Σ_{i=1..n} (y_i − y_i est)² = Σ_{i=1..n} d_i²

residual variance: s_r² = S_r / (n − k), where k is the number of estimated parameters (k = 2 for a straight line)
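A short sketch of these residual quantities; the data and the fitted parameters b0 = 2.2, b1 = 0.6 are illustrative only:

```python
# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 2               # k = 2 parameters (b0, b1) for a straight line

b0, b1 = 2.2, 0.6              # least-squares estimates for this toy data
y_est = [b0 + b1 * xi for xi in x]            # smoothed values y_i est
d = [yi - yei for yi, yei in zip(y, y_est)]   # residual deviations d_i
S_r = sum(di ** 2 for di in d)                # residual sum of squares
s_r2 = S_r / (n - k)                          # residual variance

print(round(S_r, 4), round(s_r2, 4))  # 2.4 0.8
```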

Page 8: Simple lin. reg. → dependence of Y on X

Straight line equation:

y_i est = b0yx + b1yx·x_i

Normal equation system:

n·b0yx + b1yx·Σx_i = Σy_i
b0yx·Σx_i + b1yx·Σx_i² = Σx_i·y_i

Parameter estimates – computational formulas:

b1yx = (n·Σx_i·y_i − Σx_i·Σy_i) / (n·Σx_i² − (Σx_i)²)
b0yx = ȳ − b1yx·x̄
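The computational formulas can be sketched directly from the sums; the data below are made up for illustration:

```python
# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sx2 = sum(xi ** 2 for xi in x)

# Computational formulas for the Y-on-X line
b1yx = (n * Sxy - Sx * Sy) / (n * Sx2 - Sx ** 2)
b0yx = Sy / n - b1yx * Sx / n   # b0 = y-bar - b1 * x-bar

print(round(b1yx, 4), round(b0yx, 4))  # 0.6 2.2
```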

Page 9: Simple lin. reg. → dependence of X on Y

Associated straight line equation:

x_i est = b0xy + b1xy·y_i

Parameter estimates – computational formulas:

b1xy = (n·Σx_i·y_i − Σx_i·Σy_i) / (n·Σy_i² − (Σy_i)²)
b0xy = x̄ − b1xy·ȳ

Page 10: 2. Correlation analysis

Correlation analysis measures the strength of dependence – coefficient of correlation r; |r| lies in <0; 1>.

|r| in <0; 0.33> – weak dependence
|r| in <0.34; 0.66> – medium strong dependence
|r| in <0.67; 1> – strong to very strong dependence

r² = coefficient of determination: the proportion (%) of the variance of Y that is caused by the effect of X.

r_yx = (n·Σx_i·y_i − Σx_i·Σy_i) / √[(n·Σx_i² − (Σx_i)²)·(n·Σy_i² − (Σy_i)²)]

r_yx = ±√(b1yx·b1xy)

b1yx = r·s_y/s_x
b1xy = r·s_x/s_y
r_yx = r_xy
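A minimal sketch of the correlation coefficient on made-up data:

```python
import math

# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sx2 = sum(xi ** 2 for xi in x)
Sy2 = sum(yi ** 2 for yi in y)

# Product-moment coefficient of correlation
r = (n * Sxy - Sx * Sy) / math.sqrt((n * Sx2 - Sx ** 2) * (n * Sy2 - Sy ** 2))
print(round(r, 4), round(r ** 2, 4))  # 0.7746 0.6
```

Here r ≈ 0.77 falls in the <0.67; 1> band, i.e. strong dependence, and r² = 0.6 means 60 % of the variance of Y is attributed to the effect of X.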

Page 11: 3. Significance testing in simple regression

Page 12: Significance test of parameter b1 (straight line)

H0: β1 = 0; H1: β1 ≠ 0 (two-sided)

test criterion:

t = b1 / s_b

estimate s_b for parameter b1:

s_b = (s_y / s_x) · √[(1 − r²) / (n − 2)]

table value (two-sided): t_α(n − k)

if test criterion > table value → H0 is rejected and H1 is valid; if α > p-value → H0 is rejected
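A sketch of this test criterion on toy values (r ≈ 0.7746, b1 = 0.6, and sample standard deviations s_x = √2.5, s_y = √1.5 are assumed for illustration, not taken from the slides):

```python
import math

# Illustrative values (not from the slides)
n, k = 5, 2
r = 0.7745966692414834
b1 = 0.6
s_x = math.sqrt(2.5)   # sample standard deviation of x
s_y = math.sqrt(1.5)   # sample standard deviation of y

# Standard error of b1 and the test criterion
s_b = (s_y / s_x) * math.sqrt((1 - r ** 2) / (n - 2))
t = b1 / s_b
print(round(t, 4))  # 2.1213
```

For two-sided α = 0.05 and n − k = 3 degrees of freedom the table value is about 3.182, so on this tiny illustrative sample H0 would not be rejected.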

Page 13: Coefficient of regression estimation

Interval estimate for the unknown β_i:

P(b_i − t_α(n − k)·s_b ≤ β_i ≤ b_i + t_α(n − k)·s_b) = 1 − α

Page 14: Significance test of coefficient of correlation r (straight line)

H0: ρ = 0; H1: ρ ≠ 0 (two-sided)

test criterion:

t = r·√(n − 2) / √(1 − r²)

table value (two-sided): t_α(n − k)

if test criterion > table value → H0 is rejected and H1 is valid; if α > p-value → H0 is rejected
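On a straight line this test is equivalent to the t-test of b1; a sketch with the same illustrative r:

```python
import math

# Illustrative values (same toy data as before): r ~ 0.7746, n = 5
n = 5
r = 0.7745966692414834

t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_r, 4))  # 2.1213
```

The value matches the t = b1/s_b criterion from the b1 test exactly, which is why for simple regression only one of the two tests needs to be carried out.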

Page 15: Coefficient of correlation estimation

For small samples and a not-normal distribution: Fisher Z-transformation. First, r is assigned to Z (by tables):

Z = ½·ln[(1 + r) / (1 − r)]

Interval estimate for the unknown ρ:

Z1 = Z − u_{1−α/2} · 1/√(n − 3)
Z2 = Z + u_{1−α/2} · 1/√(n − 3)

Last step: Z1 and Z2 are assigned back to r1 and r2.
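A sketch of the Fisher Z interval, assuming the illustrative r ≈ 0.7746, n = 5 and u_{0.975} = 1.96:

```python
import math

# Illustrative values (not from the slides)
r, n, u = 0.7745966692414834, 5, 1.96

Z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher Z-transformation (atanh)
half = u / math.sqrt(n - 3)
Z1, Z2 = Z - half, Z + half

# Back-transform Z1, Z2 to the interval bounds r1, r2
r1, r2 = math.tanh(Z1), math.tanh(Z2)
print(round(r1, 3), round(r2, 3))
```

Here the interval (r1, r2) ≈ (−0.34, 0.98) contains zero, consistent with not rejecting H0: ρ = 0 at α = 0.05 on such a small sample.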

Page 16: The summary ANOVA

Variation                      | Sum of squared deviations        | df    | Variance             | Test criterion
along the regression function  | S_1 = Σ(y_i est − ȳ)²            | k − 1 | s_1² = S_1 / (k − 1) | F = s_1² / s_r²
across the regression function | S_r = Σ(y_i − y_i est)² = Σ d_i² | n − k | s_r² = S_r / (n − k) |
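The ANOVA decomposition can be sketched on toy data (the fitted line y est = 2.2 + 0.6·x is illustrative):

```python
# Illustrative data (not from the slides); fitted line y_est = 2.2 + 0.6 x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 2

y_est = [2.2 + 0.6 * xi for xi in x]
y_bar = sum(y) / n

S1 = sum((ye - y_bar) ** 2 for ye in y_est)           # along the regression function
Sr = sum((yi - ye) ** 2 for yi, ye in zip(y, y_est))  # across the regression function

s1_2 = S1 / (k - 1)
sr_2 = Sr / (n - k)
F = s1_2 / sr_2
print(round(S1, 4), round(Sr, 4), round(F, 4))  # 3.6 2.4 4.5
```

Note F = 4.5 = 2.1213², i.e. F = t²: for a straight line the F-test and the t-test of b1 are equivalent. The table value F_{0.05}(1, 3) ≈ 10.13, so H0 is not rejected on this tiny sample.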

Page 17: The summary ANOVA (alternatively)

test criterion:

F = [R² / (k − 1)] / [(1 − R²) / (n − k)]

table value: F_α(k − 1, n − k)

Page 18: Multicollinearity

relationship between (among) independent variables

high multicollinearity: among the independent variables (X1, X2, …, XN) there is an almost perfect linear relationship

the relationships need to be analyzed before the model is formed

linear independence of the columns (variables) of X is disturbed

Page 19: Causes of multicollinearity

trends in time series, similar tendencies among variables (regressors)

inclusion of exogenous variables, lags

use of 0/1 coding in the sample

Page 20: Consequences of multicollinearity

wrong sampling: the null hypothesis of a zero regression coefficient is not rejected although it really should be

confidence intervals are wide

regression coefficient estimates are strongly influenced by changes in the data

regression coefficients can have the wrong sign

the regression equation is not suitable for prediction

Page 21: Testing of multicollinearity

Paired coefficients of correlation: t-test

Farrar–Glauber test

test criterion:

B = −[n − 1 − (2p + 5)/6] · ln|R|

(|R| = determinant of the correlation matrix of the regressors, p = number of independent variables)

table value: χ²_α with p(p − 1)/2 degrees of freedom

if test criterion > table value → H0 is rejected
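A sketch of the Farrar–Glauber criterion for p = 2 nearly collinear regressors (the numbers are made up):

```python
import math

# Illustrative: p = 2 nearly collinear regressors (made-up numbers)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
n, p = 5, 2

def corr(a, b):
    """Product-moment correlation of two equal-length lists."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

r12 = corr(x1, x2)
detR = 1 - r12 ** 2          # determinant of the 2x2 correlation matrix
B = -(n - 1 - (2 * p + 5) / 6) * math.log(detR)
print(round(B, 2))
```

The table value χ²_{0.05} with p(p − 1)/2 = 1 degree of freedom is about 3.84; B ≈ 12 exceeds it, so H0 is rejected and multicollinearity is present in this toy data.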

Page 22: Elimination of multicollinearity

excluding variables

getting a new sample

re-formulating and thinking the model out once again (chosen variables)

transforming variables – recounting the chosen variables (not total consumption, but consumption per capita, etc.)

Page 23: Regression diagnostics

Data quality for the chosen model

Suitable model for the chosen dataset

Method conditions

Page 24: Data quality evaluation

A) outlying observations in the "y" set: studentized residuals

|SR| > 2 → outlying observation

→ an outlying observation need not be influential (an influential one has a cardinal influence on the regression)

Page 25: Data quality evaluation

B) outlying observations in the "x" set: Hat Diag leverage

h_ii – diagonal values of the hat matrix H

H = X·(Xᵀ·X)⁻¹·Xᵀ

h_ii > 2p/n → outlying observation
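A sketch of the hat-matrix leverages for an intercept-plus-one-regressor design (the x values are made up):

```python
import numpy as np

# Illustrative design matrix: intercept column + one regressor (made-up x)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
h = np.diag(H)                          # leverages h_ii

threshold = 2 * p / n                   # the 2p/n rule from the slide
print(np.round(h, 4), threshold)
```

The trace of H always equals p. Here the leverages are (0.6, 0.3, 0.2, 0.3, 0.6) and none exceeds 2p/n = 0.8, so no observation is outlying in the "x" set.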

Page 26: Data quality evaluation

C) influential observations

Cook's D (an influential observation influences the whole equation):

D_i > 4/n → influential observation

Welsch–Kuh DFFITS distance (an influential observation influences its smoothed observation):

|DFFITS| > 2·√(p/n) → influential observation

Page 27: Method conditions

regression parameters can take any value in (−∞; +∞)

the regression model is linear in parameters (if not linear – data transformation)

independence of residuals

normal distribution of residuals N(0; σ²)