umar-sheikh
UNIT III
REGRESSION
Meaning
The dictionary meaning of regression is “the act of returning or going back”;
The term was first used in 1877 by Francis Galton;
Regression is the statistical tool that lets us estimate (predict) the unknown values of one variable from the known values of another variable;
It helps to find the average probable change in one variable for a given amount of change in another;
Importance
Regression lines
For two variables X and Y, we will have two regression lines:
1. Regression line X on Y gives values of X for given values of Y;
2. Regression line Y on X gives values of Y for given values of X;
Regression Equation
Regression equations are algebraic expressions of regression lines;
Y on X
Regression equation expressed as
Y = a + bX
Y is the dependent variable
X is the independent variable
‘a’ & ‘b’ are constants/parameters of the line
‘a’ determines the level of the fitted line (i.e. distance of the line above or below the origin)
‘b’ determines the slope of the line (i.e. change in Y for unit change in X)
X on Y
Regression equation expressed as
X = a + bY
X is the dependent variable
Y is the independent variable
‘a’ & ‘b’ are constants/parameters of the line
‘a’ determines the level of the fitted line (i.e. distance of the line from the origin, measured along the X axis)
‘b’ determines the slope of the line (i.e. change in X for unit change in Y)
Method of Least Squares
Constants “a” & “b” can be calculated by the method of least squares;
The line should be drawn through the plotted points in such a manner that the sum of squares of the vertical deviations of actual Y values from estimated Y values is the least, i.e. ∑(Y − Ye)² should be minimum;
Such a line is known as the line of best fit;
Using algebra & calculus, minimising this sum gives the normal equations:
For Y on X
∑Y = Na + b∑X
∑XY = a∑X + b∑X²
For X on Y
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
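The two pairs of normal equations above can be solved directly for “a” & “b”. The following is a minimal sketch in Python, not a definitive implementation; the function name fit_line and the five data points are made up for illustration and do not come from the source.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the line y = a + b*x.

        sum(y)  = N*a + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Eliminating a from the two equations gives b, then a follows
    # from the first equation.
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


# Hypothetical sample data (illustration only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(X, Y)  # Y on X: Y = a + bX
a_xy, b_xy = fit_line(Y, X)  # X on Y: X = a + bY (roles swapped)

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f} Y")
```

For this sample data the sums are ∑X = 15, ∑Y = 20, ∑XY = 66, ∑X² = 55 and ∑Y² = 86, so the fitted lines work out to Y = 2.2 + 0.6X and X = −1 + 1.0Y. The two lines are different, which is why each direction of prediction needs its own regression equation.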
Multiple Regression
When we use more than one independent variable to estimate the dependent variable, in order to increase the accuracy of the estimate, the process is called multiple regression analysis.
It is based on the same assumptions & procedure that are encountered using simple regression.
The principal advantage of multiple regression is that it allows us to use more of the information available to us to estimate the dependent variable;
Estimating equation describing the relationship among three variables
Y = a + b₁X₁ + b₂X₂
where,
Y = estimated value corresponding to the dependent variable
a = Y intercept
b₁ and b₂ = slopes associated with X₁ and X₂, respectively
X₁ and X₂ = values of the two independent variables
Normal Equations:
We use three equations (which statisticians call the “normal equations”) to determine the values of the constants a, b₁ and b₂:
∑Y = Na + b₁∑X₁ + b₂∑X₂
∑X₁Y = a∑X₁ + b₁∑X₁² + b₂∑X₁X₂
∑X₂Y = a∑X₂ + b₁∑X₁X₂ + b₂∑X₂²
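The three normal equations form a 3×3 linear system in a, b1 and b2. The sketch below builds that system from the sums and solves it by Gaussian elimination. The helper names solve_linear and fit_two_predictors are hypothetical, and the data are constructed for illustration so that Y = 1 + 2·X1 + 3·X2 exactly; none of this comes from the source.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - s) / M[i][i]
    return sol


def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) for Y = a + b1*X1 + b2*X2 via the normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    A = [
        [n,  s1,  s2],   # sum(Y)    = N*a       + b1*sum(X1)    + b2*sum(X2)
        [s1, s11, s12],  # sum(X1*Y) = a*sum(X1) + b1*sum(X1^2)  + b2*sum(X1*X2)
        [s2, s12, s22],  # sum(X2*Y) = a*sum(X2) + b1*sum(X1*X2) + b2*sum(X2^2)
    ]
    return tuple(solve_linear(A, [sy, s1y, s2y]))


# Hypothetical data built so that Y = 1 + 2*X1 + 3*X2 holds exactly.
X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 6]
Y = [1 + 2 * u + 3 * v for u, v in zip(X1, X2)]

a, b1, b2 = fit_two_predictors(X1, X2, Y)
print(f"Y = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```

Because the sample data lie exactly on a plane, the solution recovers the constants used to build them (a = 1, b1 = 2, b2 = 3, up to floating-point rounding). With real data the same system gives the least-squares estimates instead.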
Difference between regression & correlation
Correlation
Correlation coefficient (r) between X & Y is a measure of the direction & degree of linear relationship between X & Y;
It does not imply a cause & effect relationship between the variables;
It indicates the degree of association.
Regression
bxy & byx are mathematical measures expressing the average relationship between the two variables;
It indicates the cause & effect relationship between the variables;
It is used to forecast the value of the dependent variable when the value of the independent variable is known.
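The link between the two measures can be checked numerically. The sketch below computes r, byx and bxy from deviations about the means, and uses the standard identity r² = byx × bxy, which the slides do not state. The function name and the data are hypothetical, for illustration only.

```python
from math import sqrt


def correlation_and_slopes(xs, ys):
    """Return (r, byx, bxy) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # degree & direction of linear association
    byx = sxy / sxx            # slope of Y on X
    bxy = sxy / syy            # slope of X on Y
    return r, byx, bxy


# Hypothetical sample data (illustration only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r, byx, bxy = correlation_and_slopes(X, Y)
print(f"r = {r:.3f}, byx = {byx:.2f}, bxy = {bxy:.2f}")
print(f"r^2 = {r * r:.3f}, byx * bxy = {byx * bxy:.3f}")
```

Here r is a single symmetric number, while byx and bxy depend on which variable is treated as dependent; only the regression coefficients can be used to predict one variable from the other.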