Simple regression and correlation

SIMPLE REGRESSION AND CORRELATION

Prepared by: WET SOCIETY :D

DEFINITION OF TERMS

CORRELATION: The term correlation is used when:

1) Both variables are random variables,

2) The end goal is simply to find a number that expresses the relation between the variables.

REGRESSION: The term regression is used when:

1) One of the variables is a fixed variable,

2) The end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable.

WET SOCIETY \m/

CORRELATION

Correlations range from -1 (perfect negative relation) through 0 (no linear relation) to +1 (perfect positive relation).
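The three landmark values can be illustrated with a small sketch. The helper `pearson_r` and the toy data are hypothetical, introduced only for this illustration; the helper uses the deviation-score form of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls by 2 for each step in x
r_zero = pearson_r(xs, [2, 1, 0, 1, 2])   # V shape: no linear trend
r_pos = pearson_r(xs, [3, 5, 7, 9, 11])   # y rises by 2 for each step in x
print(r_neg, r_zero, r_pos)
```

Note that the V-shaped data are clearly related, yet r is (up to rounding) zero, because r only measures the linear part of a relation.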

CORRELATION = -1.0

CORRELATION = 0.0

CORRELATION = +1.0

CALCULATING THE COVARIANCE:

The first step in calculating a correlation coefficient is to quantify the covariance between two variables.
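For reference, the usual definitional form of the sample covariance can be written as below. The N - 1 denominator and the symbols \bar{X} and \bar{Y} (the sample means) are assumptions here, chosen to match the standard sample estimate.

```latex
\operatorname{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}
```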

CALCULATING THE COVARIANCE:

Alternative formula:
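A minimal sketch of the two routes to the covariance, assuming the standard definitional form (sum of cross-products of deviations) and the standard computational form (raw sums), both divided by N - 1. The function names are hypothetical; the data are the height and weight values from Example 2 later in this document.

```python
heights = [69, 61, 68, 66, 66, 63, 72, 62, 62, 67, 66, 63]
weights = [108, 130, 135, 135, 120, 115, 150, 105, 115, 145, 132, 120]

def covariance_definitional(xs, ys):
    """Sum of (X - mean X)(Y - mean Y), divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_computational(xs, ys):
    """(Sum XY - (Sum X)(Sum Y) / N), divided by N - 1."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)

cov_def = covariance_definitional(heights, weights)
cov_comp = covariance_computational(heights, weights)
print(round(cov_def, 4), round(cov_comp, 4))
```

Both functions should agree up to floating-point rounding; the computational form is convenient when only the column sums are available.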

THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT (R)

The Pearson Product-Moment Correlation Coefficient, r, is computed simply by standardizing the covariance estimate, that is, by dividing it by the product of the standard deviations of X and Y:

r = cov_XY / (s_X · s_Y)

This results in r values ranging from -1.0 to +1.0, as discussed earlier.

THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT (R)

There is another way to represent this formula. It is:

r = SPXY / √(SSX · SSY)

where SPXY is the sum of the products of X and Y, SSX is the sum of squares for X, and SSY is the sum of squares for Y.

SUMS OF SQUARES AND SUMS OF PRODUCTS
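For reference, the standard computational forms of these quantities are sketched below. The notation SS_X, SS_Y and SP_{XY} corresponds to SSX, SSY and SPXY; taking them as sums of squared deviations and cross-products about the means is an assumption, though it is consistent with the raw sums (Sum(X), Sum(X2), Sum(XY)) tabulated in Example 2.

```latex
\begin{aligned}
SS_X    &= \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{N} \\
SS_Y    &= \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{N} \\
SP_{XY} &= \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{N}
\end{aligned}
```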

ADJUSTED R

EXAMPLE 1

In this class, height and ratings of physical attractiveness vary across individuals. What is the correlation between height and these ratings in our class?

Subject   Height   Phy
1         69       7
2         61       8
3         68       6
4         66       5
5         66       8
....
48        71       10

We can create a scatter plot of these data by simply plotting one variable against the other:

correlation = 0.146235, or about +0.15

EXAMPLE 2

Consider the height and weight variables from our class dataset ...

Subject   Height (X)   Weight (Y)
1         69           108
2         61           130
3         68           135
4         66           135
5         66           120
6         63           115
7         72           150
8         62           105
9         62           115
10        67           145
11        66           132
12        63           120
Mean      65.42        125.83

Sum(X) = 785        Sum(Y) = 1510

Sum(X²) = 51473     Sum(Y²) = 192238

Sum(XY) = 99064

So, based on the 12 subjects we examined, the correlation between height and weight was +0.55.
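The arithmetic behind this result can be sketched from the tabulated sums alone. The snippet assumes the computational forms SPXY = ΣXY - (ΣX)(ΣY)/N and SSX = ΣX² - (ΣX)²/N (likewise for Y); the variable names are chosen for this illustration.

```python
from math import sqrt

# Sums taken from the Example 2 table (N = 12 subjects).
n = 12
sum_x, sum_y = 785, 1510
sum_x2, sum_y2 = 51473, 192238
sum_xy = 99064

sp_xy = sum_xy - sum_x * sum_y / n   # sum of products, about 284.83
ss_x = sum_x2 - sum_x ** 2 / n       # sum of squares for X, about 120.92
ss_y = sum_y2 - sum_y ** 2 / n       # sum of squares for Y, about 2229.67

r = sp_xy / sqrt(ss_x * ss_y)
print(round(r, 2))  # 0.55
```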

Unfortunately, the r we measure using our sample is not an unbiased estimator of the population correlation coefficient (rho).

We can correct for this using the adjusted correlation coefficient, which is computed as follows:
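One commonly used form of the adjustment is sketched below. Treating this particular expression as the intended formula is an assumption; the substitution uses r = 0.55 and N = 12 from Example 2.

```latex
r_{adj} = \sqrt{1 - \frac{(1 - r^2)(N - 1)}{N - 2}}
        = \sqrt{1 - \frac{(1 - 0.55^2)(11)}{10}}
        = \sqrt{1 - 0.76725}
        \approx 0.48
```

Under this assumed formula the adjusted value is smaller than the sample r, and the gap shrinks as N grows.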

THE REGRESSION LINE

The regression line represents the best prediction of the variable on the Y axis for each point along the X axis.

COMPUTING THE REGRESSION LINE

Ŷ = bX + a

where Ŷ = the predicted value of Y

b = the slope of the line (the change in Y as a function of X)

X = the various values of X

a = the intercept of the line (the point where the line crosses the Y axis)

Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)

Intercept(a) = (ΣY - b(ΣX)) / N

where X and Y are the variables, and

N = Number of values or elements

X = First Score

Y = Second Score

ΣXY = Sum of the products of First and Second Scores

ΣX = Sum of First Scores

ΣY = Sum of Second Scores

ΣX² = Sum of squared First Scores
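The two formulas above translate directly into code. This is a minimal sketch; the function name `regression_line` and the error handling for constant X are additions made for illustration.

```python
def regression_line(xs, ys):
    """Return (slope b, intercept a) using the raw-sum formulas above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All X values are identical, so the slope is undefined.
        raise ValueError("X values must not all be equal")

    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return b, a

# Points lying exactly on y = 2x + 1 should recover slope 2, intercept 1.
b, a = regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b, a)
```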

REGRESSION EXAMPLE

To find the simple linear regression equation for the values below, we first find the slope and the intercept, then use them to form the regression equation.

X Values   Y Values
60         3.1
61         3.6
62         3.8
63         4
65         4.1

Step 1: Count the number of values.

N = 5

Step 2: Find XY and X² for each pair of values.

See the table below.

X Value   Y Value   X*Y                 X*X
60        3.1       60 * 3.1 = 186      60 * 60 = 3600
61        3.6       61 * 3.6 = 219.6    61 * 61 = 3721
62        3.8       62 * 3.8 = 235.6    62 * 62 = 3844
63        4         63 * 4 = 252        63 * 63 = 3969
65        4.1       65 * 4.1 = 266.5    65 * 65 = 4225

Step 3: Find ΣX, ΣY, ΣXY, ΣX².

ΣX = 311

ΣY = 18.6

ΣXY = 1159.7

ΣX² = 19359

Step 4: Substitute into the slope formula given above.

Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)

= ((5)(1159.7) - (311)(18.6)) / ((5)(19359) - (311)²)

= (5798.5 - 5784.6) / (96795 - 96721)

= 13.9 / 74

= 0.19 (rounded from 0.1878)

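Step 5: Now substitute into the intercept formula given above, using the rounded slope b = 0.19.

Intercept(a) = (ΣY - b(ΣX)) / N

= (18.6 - 0.19(311)) / 5

= (18.6 - 59.09) / 5

= -40.49 / 5

= -8.098

Because the slope was rounded before this step, the intercept is sensitive to that rounding: with the unrounded slope 0.1878 it comes out at about -7.96.

Step 6: Then substitute these values into the regression equation formula.

Regression Equation(y) = a + bx

= -8.098 + 0.19x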

Suppose we want to know the approximate y value for x = 64. We can substitute that value into the equation above.

Regression Equation(y) = a + bx

= -8.098 + 0.19(64)

= -8.098 + 12.16

= 4.062, or about 4.06

This example shows how to find the relationship between two variables by calculating the regression line with the steps above.
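The whole worked example can be checked with a short script. This is a sketch under the formulas given earlier; it computes the coefficients twice, once with full precision and once with the slope rounded to 0.19 as in Step 4, to show how much the early rounding matters. All variable names are chosen for this illustration.

```python
xs = [60, 61, 62, 63, 65]
ys = [3.1, 3.6, 3.8, 4, 4.1]

n = len(xs)
sum_x = sum(xs)                                # 311
sum_y = sum(ys)                                # 18.6
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 1159.7
sum_x2 = sum(x * x for x in xs)                # 19359

# Full-precision slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Same steps with the slope rounded to two decimals, as in the worked example.
b_rounded = round(b, 2)
a_rounded = (sum_y - b_rounded * sum_x) / n

x_new = 64
pred = a + b * x_new
pred_rounded = a_rounded + b_rounded * x_new

print(f"unrounded: b={b:.4f}, a={a:.4f}, y(64)={pred:.3f}")
print(f"rounded:   b={b_rounded:.2f}, a={a_rounded:.3f}, y(64)={pred_rounded:.3f}")
```

The two predictions at x = 64 agree to two decimal places, even though the intercepts differ noticeably, because the rounding errors in the slope and intercept largely cancel near the middle of the data.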
