Simple linear regression


Reporters: Atty. Gener R. Gayam, CPA; Agapito "Pete" M. Cagampang, PM; Raymond B. Cabling, MD


A. The Scatter Diagram

In solving problems that concern estimation and forecasting, a scatter diagram can be used as a graphical approach. This technique consists of plotting the points corresponding to the paired scores of the independent and dependent variables, commonly represented by X and Y respectively, on the X-Y coordinate system.

Below is an illustration of a scatter diagram using the data in Table 6.1. This table shows the years of working experience and the income of eight employees in a big industrial corporation.

Table 6.1
Working Experience and Income of Eight Employees

Employee    Years of Working Experience (X)    Income in Thousand Pesos (Y)
A            2                                   8
B            8                                  10
C            4                                  11
D           11                                  13
E            5                                   9
F           13                                  17
G            4                                   8
H           15                                  14

ΣX = 62     ΣY = 90
X̄ = 7.75    Ȳ = 11.25
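The totals and means under Table 6.1 can be checked with a few lines of Python. This is only a sketch: the variable names `experience` and `income` are chosen here for illustration and do not come from the slides.

```python
# Table 6.1 data, typed in by hand.
# experience = years of working experience (X)
# income     = income in thousand pesos (Y)
experience = [2, 8, 4, 11, 5, 13, 4, 15]
income = [8, 10, 11, 13, 9, 17, 8, 14]

n = len(experience)          # number of employees
sum_x = sum(experience)      # sum of X
sum_y = sum(income)          # sum of Y
mean_x = sum_x / n           # mean of X
mean_y = sum_y / n           # mean of Y

print(f"n = {n}, sum of X = {sum_x}, sum of Y = {sum_y}")
print(f"mean of X = {mean_x}, mean of Y = {mean_y}")
```

The printed sums and means should match the ΣX, ΣY and mean values listed with the table.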

Figure 6.1 – A Scatter Diagram for Table 6.1 Data
[Figure: the eight (X, Y) points of Table 6.1; X axis from 0 to 16, Y axis from 7 to 17]

To roughly predict the value of the dependent variable, which is income, from the independent variable, which is years of working experience, your next step is to draw a trend line. This is a line passing through the series of points such that the total vertical distance of the points below the line is more or less equal to the total vertical distance of the points above the line. If this requirement is satisfied, you have drawn a correct trend line. The illustration is shown in Figure 6.2.

Figure 6.2 – A trend line drawn in the linear direction between working experience and income of eight employees
[Figure: the scatter diagram of Table 6.1 data with a trend line drawn through the points; X axis from 0 to 16, Y axis from 7 to 17]

Using the trend line drawn in Figure 6.2 above, the value estimated for Y when X is 16 is 18. Remember that if a straight line appears to describe the relationship, the algebraic approach called the regression formula can be used, as explained in the next topic.

B. The Least Squares Linear Regression Equation

The least squares linear regression equation can be understood through this formula known from algebra:

$$Y = a + bX$$

For instance, the line Y = a + bX for the data in Figure 6.1 is the line that gives the smallest sum of the squares of the vertical distances of the points from the line.

In solving the regression equation, you need to solve first for the slope b and then for the intercept a:

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$a = \bar{Y} - b\bar{X}$$
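The two formulas for b and a translate almost directly into code. The sketch below assumes plain Python lists as input; the function name `least_squares_fit` is a hypothetical helper introduced here, not something from the slides.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line Y = a + bX using the deviation formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator of b: sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of b: sum of squared deviations of X
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Table 6.1 data: years of experience (X) and income in thousand pesos (Y)
experience = [2, 8, 4, 11, 5, 13, 4, 15]
income = [8, 10, 11, 13, 9, 17, 8, 14]

a, b = least_squares_fit(experience, income)
print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
```

Note that this sketch carries the unrounded slope into the intercept, so its value of a comes out near 6.71. The worked example rounds b to 0.59 before computing a, which is why it reports 6.68.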

Example: Solve the least squares regression line for the data scores in Table 6.1.

Employee    X     Y     (X − X̄)    (Y − Ȳ)    (X − X̄)(Y − Ȳ)    (X − X̄)²
A            2     8     -5.75      -3.25       18.6875           33.0625
B            8    10      0.25      -1.25       -0.3125            0.0625
C            4    11     -3.75      -0.25        0.9375           14.0625
D           11    13      3.25       1.75        5.6875           10.5625
E            5     9     -2.75      -2.25        6.1875            7.5625
F           13    17      5.25       5.75       30.1875           27.5625
G            4     8     -3.75      -3.25       12.1875           14.0625
H           15    14      7.25       2.75       19.9375           52.5625

Σ(X − X̄)(Y − Ȳ) = 93.8125 − 0.3125 = 93.50
Σ(X − X̄)² = 159.50

ΣX = 62     ΣY = 90
X̄ = 7.75    Ȳ = 11.25

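Solution:

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{93.50}{159.50} = 0.5862068 \approx 0.59 \quad \text{(Answer)}$$

$$a = \bar{Y} - b\bar{X} = 11.25 - 0.59(7.75) = 11.25 - 4.57 = 6.68 \quad \text{(Answer)}$$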

After solving for the values of b and a, the regression equation obtained from Table 6.1 is:

Y = 6.68 + 0.59X

Now, letting X = 16, what is Y?

Solution:

Y = 6.68 + 0.59(16)
  = 6.68 + 9.44
  = 16.12
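The same substitution can be written as a small function. This is a sketch that assumes the rounded coefficients 6.68 and 0.59 from the worked example; the name `predict_income` is made up for illustration.

```python
def predict_income(years, a=6.68, b=0.59):
    """Evaluate the fitted line Y = a + bX at the given years of experience.

    The defaults are the rounded coefficients from the worked example.
    """
    return a + b * years


estimate = predict_income(16)
print(f"Estimated income at X = 16: {estimate:.2f} thousand pesos")
```

Keep in mind that X = 16 lies just beyond the largest experience value in Table 6.1 (15 years), so this estimate is a slight extrapolation of the fitted line.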

C. The Standard Error of Estimate

Now, we are interested in the distance of each observed value Yi from Ŷi, the corresponding ordinate of the regression line. We base our measure of dispersion or variation around the regression line on the squared distances (Yi − Ŷi)². This can be understood through the standard error of estimate formula given below.

$$S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}}$$

However, this formula entails a very tedious process of computing the standard error of estimate, so the formula by Basil P. Korin (1977), which is easier to solve, is suggested as follows:

$$S_e = \sqrt{\frac{\sum Y_i^2 - a\sum Y_i - b\sum X_i Y_i}{n - 2}}$$

Note: The symbols a and b stand for the intercept and the slope of the regression line.
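One way to see why the shorter formula agrees with the definition is sketched below. It assumes that a and b are the exact least squares values, so that the residuals e_i = Y_i − Ŷ_i satisfy Σe_i = 0 and Σe_iX_i = 0. The symbol e_i is introduced here only for this derivation.

```latex
\begin{aligned}
\sum (Y_i - \hat{Y}_i)^2
  &= \sum e_i \,(Y_i - a - bX_i) \\
  &= \sum e_i Y_i - a \sum e_i - b \sum e_i X_i \\
  &= \sum e_i Y_i \\
  &= \sum (Y_i - a - bX_i)\, Y_i \\
  &= \sum Y_i^2 - a \sum Y_i - b \sum X_i Y_i
\end{aligned}
```

Because the middle step relies on the two sums of residuals being exactly zero, the identity holds exactly only for unrounded a and b. With rounded coefficients the two formulas can give slightly different answers.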

Example: Solve the standard error of estimate for the regression line which was derived from the data in Table 6.1.

Solution 1:

$$S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}}$$

Y     X     Ŷ        (Y − Ŷ)    (Y − Ŷ)²
 8     2     7.86      0.14      0.0196
10     8    11.40     -1.40      1.9600
11     4     9.04      1.96      3.8416
13    11    13.17     -0.17      0.0289
 9     5     9.63     -0.63      0.3969
17    13    14.35      2.65      7.0225
 8     4     9.04     -1.04      1.0816
14    15    15.53     -1.53      2.3409

Σ(Y − Ŷ)² = 16.692

Step 1 – Compute the value of Ŷ at each of the X values.
Example: Ŷ = 6.68 + 0.59(2) = 6.68 + 1.18 = 7.86
Do the rest by following the same procedure.

Step 2 – Get the difference Yi − Ŷi.
Example: 8 − 7.86 = 0.14

Step 3 – Square each difference Yi − Ŷi.
Example: (0.14)² = 0.0196

Step 4 – Apply the formula.

$$S_e = \sqrt{\frac{16.692}{8 - 2}} = \sqrt{\frac{16.692}{6}} = \sqrt{2.782} = 1.67$$
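Steps 1 to 4 can be reproduced in a short loop. The sketch below assumes the rounded coefficients a = 6.68 and b = 0.59 used in the worked example; the variable names are illustrative only.

```python
import math

# Table 6.1 data: years of experience (X) and income in thousand pesos (Y)
experience = [2, 8, 4, 11, 5, 13, 4, 15]
income = [8, 10, 11, 13, 9, 17, 8, 14]

a, b = 6.68, 0.59  # rounded intercept and slope from the worked example

sum_sq_resid = 0.0
for x, y in zip(experience, income):
    fitted = a + b * x            # Step 1: value on the regression line
    residual = y - fitted         # Step 2: observed minus fitted
    sum_sq_resid += residual ** 2 # Step 3: square and accumulate
    print(f"Y={y:>2}  X={x:>2}  fitted={fitted:6.2f}  "
          f"residual={residual:6.2f}  squared={residual ** 2:7.4f}")

# Step 4: divide by n - 2 and take the square root
se = math.sqrt(sum_sq_resid / (len(income) - 2))
print(f"sum of squared residuals = {sum_sq_resid:.3f}, Se = {se:.2f}")
```

Each printed row should correspond to one row of the table for this solution.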

Solution 2:

$$S_e = \sqrt{\frac{\sum Y_i^2 - a\sum Y_i - b\sum X_i Y_i}{n - 2}}$$

Xi    Yi    Yi²    XiYi
 2     8     64     16
 8    10    100     80
 4    11    121     44
11    13    169    143
 5     9     81     45
13    17    289    221
 4     8     64     32
15    14    196    210

ΣY = 90    ΣY² = 1084    ΣXY = 791

Step 1 – Square each Yi.
Example: 8² = 64

Step 2 – Multiply Xi by Yi.
Example: 2 × 8 = 16

Step 3 – Get the sums of Yi² and XiYi.

Step 4 – Apply the formula.

$$
\begin{aligned}
S_e &= \sqrt{\frac{1084 - 6.68(90) - 0.59(791)}{8 - 2}} \\
    &= \sqrt{\frac{1084 - 601.2 - 466.69}{6}} \\
    &= \sqrt{\frac{1084 - 1067.89}{6}} \\
    &= \sqrt{\frac{16.11}{6}} \\
    &= \sqrt{2.685} \\
    &= 1.64
\end{aligned}
$$
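The two solutions give 1.67 and 1.64 for the same data. The sketch below suggests that this gap comes from rounding a and b rather than from the formulas themselves. It evaluates the shortcut formula twice: once with the rounded coefficients, and once with unrounded coefficients rebuilt from the sums 93.50 and 159.50 and the means 7.75 and 11.25 of the worked example. The function name `se_shortcut` is hypothetical.

```python
import math


def se_shortcut(xs, ys, a, b):
    """Standard error of estimate from the sums of Y squared, Y, and XY."""
    n = len(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Table 6.1 data: years of experience (X) and income in thousand pesos (Y)
experience = [2, 8, 4, 11, 5, 13, 4, 15]
income = [8, 10, 11, 13, 9, 17, 8, 14]

# Rounded coefficients, as used in the slides
se_rounded = se_shortcut(experience, income, a=6.68, b=0.59)

# Unrounded coefficients, rebuilt from the sums in the worked example
b_exact = 93.5 / 159.5
a_exact = 11.25 - b_exact * 7.75
se_exact = se_shortcut(experience, income, a=a_exact, b=b_exact)

print(f"Se with rounded a, b:   {se_rounded:.2f}")
print(f"Se with unrounded a, b: {se_exact:.2f}")
```

The shortcut subtracts large, nearly equal quantities, so small rounding in a and b shifts the result noticeably. Carrying more decimal places in the coefficients is a reasonable precaution when using it by hand.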

The standard error of estimate is interpreted in the same way as a standard deviation. For example, if we measure three standard errors vertically above and below the regression line, we will find that, for the same value of X, practically all values of Y fall between the upper and lower 3Se limits.

In the example above, where the standard error of estimate is 1.64, you come up with (3)(1.64) = 4.92 units above and below the regression line. This means that these "bounds" of 4.92 units above and below the regression line pertain to all observations taken for that particular sample. If you draw two parallel lines, each of them lying one Se from the regression line, you can expect about two thirds of the observations to fall between these bounds. See Figure 6.3 for the illustration of the data in Table 6.1.
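The claim about the bounds can be checked against the eight observations directly. This sketch assumes the rounded line Y = 6.68 + 0.59X and Se = 1.64 from the example above, and simply counts how many points fall inside each band.

```python
# Table 6.1 data: years of experience (X) and income in thousand pesos (Y)
experience = [2, 8, 4, 11, 5, 13, 4, 15]
income = [8, 10, 11, 13, 9, 17, 8, 14]

a, b = 6.68, 0.59   # rounded regression coefficients from the example
se = 1.64           # standard error of estimate from the example

# Vertical distance of each observation from the regression line
residuals = [y - (a + b * x) for x, y in zip(experience, income)]

within_1se = sum(abs(r) <= se for r in residuals)
within_3se = sum(abs(r) <= 3 * se for r in residuals)

n = len(residuals)
print(f"within 1 Se: {within_1se} of {n}")
print(f"within 3 Se: {within_3se} of {n}")
```

With only eight employees, the share inside the one Se band can only move in steps of one eighth, so it should be read as a rough match to the two-thirds rule rather than an exact one.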

Figure 6.3 – A Regression Line with One Standard Error Distance
[Figure: the regression line Y = 6.68 + 0.59X with parallel lines one Se above and below it; X axis from 0 to 16, Y axis from 7 to 17]
