Correlation & Regression


Name of Institution


CORRELATION & REGRESSION ANALYSIS


CORRELATION

• When the relationship between variables is quantitative, the statistical tool for discovering and measuring that relationship and expressing it in a brief formula is known as correlation.

• The measure of correlation called the coefficient of correlation indicates the strength & direction of relationship between two variables.

• The coefficient of correlation between two variables x and y is denoted by r, r_xy, or ρ.

• It lies between −1 and +1.

• If r = 0, the variables are said to be uncorrelated, i.e. they have no linear relationship. This alone does not imply that they are independent.


TYPES OF CORRELATION

I) Based on Direction:

• Positive Correlation: when an increase (or decrease) in the value of one variable is accompanied by a corresponding increase (or decrease) in the value of the other variable.

• Negative Correlation: when an increase (or decrease) in the value of one variable is accompanied by a corresponding decrease (or increase) in the value of the other variable.

II) Based on Degree:

• High

• Moderate

• Low


METHODS OF STUDYING CORRELATION

1) Scatter Diagram Method.

2) Karl Pearson’s Coefficient of Correlation.

3) Spearman’s Rank Correlation Coefficient.


SCATTER DIAGRAM

• The simplest method for studying correlation between two variables is a special type of dot chart called a dotogram or scatter diagram.

• In this method, the given data are plotted as dots, one for each pair of X and Y values.

• The more widely the plotted points scatter over the chart, the weaker the relationship between the two variables.

• The closer the points lie to a straight line, the higher the degree of relationship.

[Figure: four scatter diagrams of Y against X showing, in order, perfect negative correlation (r = −1), no correlation (r = 0), perfect positive correlation (r = +1), and no correlation (r = 0).]


Advantages:

1. It is readily comprehensible and enables us to form a rough idea of the nature of the relationship between the two variables x and y.

2. It is not affected by extreme observations.

Disadvantages:

1. It is not a suitable method if the number of observations is fairly large.

2. It is only a rough measure of correlation; the exact magnitude cannot be known.


KARL PEARSON COEFFICIENT OF CORRELATION

• Also known as Pearsonian Coefficient of Correlation.

• It describes the degree & direction of relationship between two variables X and Y.

• It is denoted by the symbol ‘r’.

• The value of Pearson’s coefficient of correlation lies between −1 and +1.

• If X and Y are independent variables then coefficient of correlation is zero.
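The last bullet runs in one direction only: independence gives r = 0, but r = 0 does not by itself guarantee independence. A minimal sketch with made-up illustrative data (not taken from these slides), in which y is completely determined by x and yet r comes out as zero; the helper name `pearson_r` is hypothetical:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative data only: y = x**2 on points symmetric about zero.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0: no linear relationship, although y depends entirely on x
```

The coefficient only measures the linear part of a relationship, which is why a perfectly curved pattern like this one can score zero.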

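PEARSON FORMULA

• The correlation coefficient, denoted by r, is given by any of the following equivalent forms:

First form:

$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} $$

Second form:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$

Third form:

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} $$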


Ques 1. Calculate Karl Pearson coefficient of correlation.

X Y

12 14

9 8

8 6

10 9

11 11

13 12

7 3
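One way to check Ques 1 is to evaluate all three forms of the Pearson formula on the data above and confirm that they agree. The sketch below is an illustration, not a prescribed solution method; it assumes population (divide-by-n) covariance and standard deviations for the first form, and all variable names are chosen here for readability.

```python
from math import sqrt

# Data from Ques 1.
X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# First form: Cov(x, y) / (sigma_x * sigma_y), with divide-by-n moments (assumption).
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / n
sigma_x = sqrt(sum((x - mean_x) ** 2 for x in X) / n)
sigma_y = sqrt(sum((y - mean_y) ** 2 for y in Y) / n)
r_first = cov_xy / (sigma_x * sigma_y)

# Second form: sums of deviations about the means.
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_dx2 = sum((x - mean_x) ** 2 for x in X)
sum_dy2 = sum((y - mean_y) ** 2 for y in Y)
r_second = sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Third form: raw sums only, convenient for hand calculation.
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
r_third = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

print(round(r_first, 4), round(r_second, 4), round(r_third, 4))  # 0.9485 0.9485 0.9485
```

The three results coincide because the factors of n in the first form cancel, and the third form is the second with the squared deviations expanded.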


Ques 2. A financial analyst wanted to find out whether inventory turnover influences a company’s earnings per share. A random sample of 7 companies listed on the stock exchange was selected and the following data were recorded for each. Find the correlation coefficient.

Company   Inventory turnover   Earnings per share (%)
A         4                    11
B         5                    9
C         7                    13
D         8                    7
E         6                    13
F         3                    8
G         5                    8
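As a worked check of Ques 2, one possible route is the raw-sums (third) form of the Pearson formula, taking x as inventory turnover and y as earnings per share. The sums below are computed from the table above; this is a sketch of the arithmetic rather than the only acceptable layout.

```latex
\begin{aligned}
n &= 7,\quad \sum x = 38,\quad \sum y = 69,\quad
\sum x^2 = 224,\quad \sum y^2 = 717,\quad \sum xy = 378 \\[4pt]
r &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}
          {\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]
                 \left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}}
   = \frac{378 - \dfrac{38 \times 69}{7}}
          {\sqrt{\left[224 - \dfrac{38^2}{7}\right]\left[717 - \dfrac{69^2}{7}\right]}} \\[4pt]
  &= \frac{24/7}{\sqrt{(124/7)(258/7)}}
   = \frac{24}{\sqrt{31992}} \approx 0.134
\end{aligned}
```

A value this close to zero suggests only a very weak positive linear relationship in this sample.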


Ques 3. The following table gives the indices of industrial production and number of registered unemployed people (in lakhs). Calculate Karl Pearson’s coefficient of correlation.

Index of production   No. of unemployed
100                   15
102                   12
104                   13
107                   11
105                   12
112                   12
103                   19
99                    26

SPEARMAN CORRELATION

• Rank X and Y separately.

• The largest value gets rank 1, the second largest rank 2, and so on.

• Formula:

$$ \rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \qquad \text{where } d = \text{Rank } X - \text{Rank } Y $$

• For tied ranks:

$$ \rho = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)} $$

Here m is the number of times a value is repeated.


Question 1) Calculate the coefficient of correlation for the following heights (in inches) of fathers (X) and sons (Y).

X Y

65 67

66 68

67 65

67 68

68 72

69 72

70 69

72 71


Question 2) Find rank correlation coefficient between x and y.

X Y

85 18.3

91 20.8

56 16.9

72 15.7

95 19.2

76 18.1

89 17.5

51 14.9

59 18.9

90 15.4
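Question 2 has no repeated values in either column, so the basic rank formula (one minus six times the sum of squared rank differences over n(n² − 1)) applies directly. A minimal sketch of the procedure, ranking the largest value as 1; the helper `ranks_desc` is a hypothetical name and assumes no ties:

```python
# Data from Question 2.
X = [85, 91, 56, 72, 95, 76, 89, 51, 59, 90]
Y = [18.3, 20.8, 16.9, 15.7, 19.2, 18.1, 17.5, 14.9, 18.9, 15.4]

def ranks_desc(values):
    """Rank 1 for the largest value; valid only when no value is repeated."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

rank_x = ranks_desc(X)
rank_y = ranks_desc(Y)
n = len(X)

# d = Rank X - Rank Y for each pair; the formula needs the sum of d squared.
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_x)                 # [5, 2, 9, 7, 1, 6, 4, 10, 8, 3]
print(rank_y)                 # [4, 1, 7, 8, 2, 5, 6, 10, 3, 9]
print(sum_d2, round(rho, 4))  # 74 0.5515
```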


Question 3) Obtain the rank correlation coefficient for the following data.

X Y

68 62

64 58

75 68

50 45

64 81

80 60

75 68

40 48

55 50

64 70
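Question 3 contains repeated values (64 three times and 75 twice in X, 68 twice in Y), so the tied-ranks formula is needed. The sketch below assumes the usual convention, not stated on the slides, that tied values share the average of the ranks they would otherwise occupy. The helper names are hypothetical.

```python
from collections import Counter

# Data from Question 3.
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

def average_ranks_desc(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m**3 - m) / 12 over every value that is repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

rank_x = average_ranks_desc(X)
rank_y = average_ranks_desc(Y)
n = len(X)

sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
correction = tie_correction(X) + tie_correction(Y)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print(sum_d2, correction, round(rho, 4))  # 72.0 3.0 0.5455
```

With no repeated values the correction term is zero and the expression reduces to the basic formula.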

REGRESSION

• Regression analysis provides a mathematical model of the relationship between two variables, in which one is independent and one is dependent.

• If X and Y are two variables, then we have two regression lines:-

(a) Regression line of X on Y.

(b) Regression line of Y on X.

Regression line of X on Y

The regression line of X on Y is given by:-

X= a + b Y

where b is called the regression coefficient of X on Y, denoted by b_xy.

Here, Y is the independent variable and X is dependent variable.

Normal equations to estimate a and b are:-

$$ \sum X = na + b \sum Y $$

$$ \sum XY = a \sum Y + b \sum Y^2 $$

Another form of the regression equation of X on Y is:

$$ X - \bar{X} = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y}), \qquad \text{where } b_{xy} = r \, \frac{\sigma_x}{\sigma_y} $$

Regression line of Y on X

The regression line of Y on X is given by:-

Y= a + b X

where b is called the regression coefficient of Y on X, denoted by b_yx.

Here, X is the independent variable and Y is dependent variable.

Normal equations to estimate a and b are:-

$$ \sum Y = na + b \sum X $$

$$ \sum XY = a \sum X + b \sum X^2 $$

Another form of the regression equation of Y on X is:

$$ Y - \bar{Y} = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X}), \qquad \text{where } b_{yx} = r \, \frac{\sigma_y}{\sigma_x} $$

Properties of regression lines and coefficients

• Both regression lines pass through the point $(\bar{x}, \bar{y})$.

• The correlation coefficient is the geometric mean of the two regression coefficients of X and Y, i.e. $r = \pm\sqrt{b_{xy} \, b_{yx}}$.

• If one of the regression coefficients is greater than 1, the other must be less than 1.

• b_xy, b_yx and the correlation coefficient (r) have the same sign. For example, if b_xy = −0.664 and b_yx = −0.234, then r = −(0.664 × 0.234)^(1/2) = −0.394.


QUESTION 1) You are given the following information about advertising expenditure and sales.

                        Advertisement (x)   Sales (y)
Arithmetic mean (A.M.)  10                  90
Std. deviation (S.D.)   3                   12

and r = 0.8.

(a) Obtain the two regression lines.

(b) Find the likely sales when the advertisement budget is Rs 15 lakhs.
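Because this question supplies only means, standard deviations and r, the mean-and-standard-deviation form of the regression equations is the natural route. A worked sketch under that reading, with the predicted sales expressed in the same units as the sales data:

```latex
\begin{aligned}
b_{yx} &= r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{12}{3} = 3.2,
\qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{3}{12} = 0.2 \\[4pt]
\text{Y on X:}\quad & Y - 90 = 3.2\,(X - 10) \;\Rightarrow\; Y = 3.2X + 58 \\[4pt]
\text{X on Y:}\quad & X - 10 = 0.2\,(Y - 90) \;\Rightarrow\; X = 0.2Y - 8 \\[4pt]
\text{At } X = 15:\quad & Y = 3.2 \times 15 + 58 = 106
\end{aligned}
```

As a consistency check, the product of the two regression coefficients is 3.2 × 0.2 = 0.64 = r².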


QUESTION 2) The two regression lines are given by:-

3 X + 12 Y = 19

9 X +3 Y = 46

And σx = 4.

Obtain:-

(a) Mean values of X and Y.

(b) The value of correlation coefficient.

(c) Standard deviation of y.
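A possible approach to this question relies on the properties of regression lines: both lines pass through the point of means, r² equals the product of the two regression coefficients (so that product cannot exceed 1), and b_yx = r·σy/σx. The sketch below follows that reasoning. Deciding which line is "Y on X" by testing the product is an assumption of this illustration, and the variable names are chosen here.

```python
from math import sqrt

# Lines from QUESTION 2, written as a1*X + b1*Y = c1 and a2*X + b2*Y = c2.
a1, b1, c1 = 3, 12, 19
a2, b2, c2 = 9, 3, 46
sigma_x = 4

# (a) Both regression lines pass through the means, so solve them
#     simultaneously (Cramer's rule).
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# (b) Try line 1 as "Y on X" and line 2 as "X on Y". Since r**2 = b_yx * b_xy,
#     the assignment is valid only if the product does not exceed 1.
b_yx = -a1 / b1   # slope when line 1 is solved for Y
b_xy = -b2 / a2   # slope when line 2 is solved for X
if b_yx * b_xy > 1:
    # Otherwise the roles of the two lines must be swapped.
    b_yx = -a2 / b2
    b_xy = -b1 / a1
sign = -1 if b_yx < 0 else 1   # r carries the sign of the regression coefficients
r = sign * sqrt(b_yx * b_xy)

# (c) From b_yx = r * sigma_y / sigma_x.
sigma_y = b_yx * sigma_x / r

print(mean_x, round(mean_y, 4), round(r, 4), round(sigma_y, 4))
# 5.0 0.3333 -0.2887 3.4641
```

Swapping the roles would give regression coefficients of −4 and −3, whose product of 12 would imply r² > 1, which is why that assignment is rejected.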


Question 3. For the following data, obtain the two regression equations and hence find the correlation coefficient.

X 1 2 3 4 5

Y 2 5 3 8 7
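Question 3 can be approached by solving the normal equations once for each regression line and then taking r as the geometric mean of the two regression coefficients. The sketch below does this in closed form; `fit_line` is a hypothetical helper name, and its closed-form solution comes from eliminating a between the two normal equations.

```python
from math import sqrt

# Data from Question 3.
X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]

def fit_line(u, v):
    """Solve the normal equations for v = a + b*u:
         sum(v)   = n*a      + b*sum(u)
         sum(u*v) = a*sum(u) + b*sum(u**2)
    """
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suu = sum(ui ** 2 for ui in u)
    b = (n * suv - su * sv) / (n * suu - su ** 2)
    a = (sv - b * su) / n
    return a, b

a_yx, b_yx = fit_line(X, Y)   # regression line of Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = fit_line(Y, X)   # regression line of X on Y: X = a_xy + b_xy * Y

# r is the geometric mean of the two coefficients and shares their sign.
sign = 1 if b_yx >= 0 else -1
r = sign * sqrt(b_yx * b_xy)

print(round(a_yx, 4), round(b_yx, 4))  # 1.1 1.3  ->  Y = 1.1 + 1.3 X
print(round(a_xy, 4), round(b_xy, 4))  # 0.5 0.5  ->  X = 0.5 + 0.5 Y
print(round(r, 4))                     # 0.8062
```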


Question 4. The following data gives the ages and blood pressure of 10 women.

(i) Find the correlation coefficient between age and blood pressure.

(ii) Determine the regression equation of blood pressure on age.

(iii) Estimate the blood pressure of a woman whose age is 45 years.

Age 56 42 36 47 49 42 60 72 63 55

B.P 147 125 118 128 145 140 155 160 149 150
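The three parts of Question 4 can be sketched together, taking age as X and blood pressure as Y. The code uses the deviation form for r and b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which is equivalent to r·σy/σx. Variable names are chosen here for illustration.

```python
from math import sqrt

# Data from Question 4.
age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]
n = len(age)
mean_age, mean_bp = sum(age) / n, sum(bp) / n

s_xy = sum((x - mean_age) * (y - mean_bp) for x, y in zip(age, bp))
s_xx = sum((x - mean_age) ** 2 for x in age)
s_yy = sum((y - mean_bp) ** 2 for y in bp)

# (i) Correlation coefficient between age and blood pressure.
r = s_xy / sqrt(s_xx * s_yy)

# (ii) Regression of blood pressure on age: BP = a + b_yx * age.
b_yx = s_xy / s_xx
a = mean_bp - b_yx * mean_age

# (iii) Estimated blood pressure at age 45.
estimate = a + b_yx * 45

print(round(r, 2), round(b_yx, 2), round(a, 2), round(estimate, 1))
# 0.89 1.11 83.76 133.7
```

Age 45 lies inside the observed range of 36 to 72 years, so the estimate is an interpolation rather than an extrapolation.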