CHAPTER 3
INTRODUCTORY LINEAR REGRESSION
Introduction
Linear regression is the study of the linear relationship between two variables. This is done by fitting a linear equation to the observed data.
The linear equation is then used to predict values for the data.
In a simple linear relationship, only two variables are involved:
a) X is the independent variable: the variable that is controlled.
b) Y is the dependent variable: the response variable. In other words, the value of y depends on the value of x.
Example
A nutritionist studying weight loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).
An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume.
a) X is the cost of packaging (independent variable).
b) Y is the sales volume (dependent variable).
Scatter plots
A scatter plot is a plot of the paired (x, y) values.
The purpose of constructing the plot is to examine the relationship between the two variables.
[Figure: scatter plots of Y against X. (a) Positive linear relationship.]
A linear regression equation is a mathematical equation that can be used to predict the values of one dependent variable from known values of an independent variable.
This equation represents a straight line so it is of the form, y=mx+c, where m is the slope and c is the y-intercept.
The true regression line, or the probabilistic model, is given by
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where,
$\beta_0$: intercept of the line.
$\beta_1$: slope of the line.
$\varepsilon$: random error.
We will call this model the simple linear regression model because it has only one independent variable.
This regression line is estimated from the collected data by fitting a straight line to the data set and obtaining the equation of that line,
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Least Squares Method
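The least squares method is the method most commonly used for estimating the regression coefficients, $\beta_0$ and $\beta_1$.
The straight line fitted to the data set is the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$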
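The least squares estimates can be computed directly from the sums of squares and cross-products. The following is a minimal Python sketch, not part of the original notes; the function name `least_squares_fit` and the small data set are made up for illustration (the points lie exactly on y = 1 + 2x, so the fit should recover that line).

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sum of cross-products and corrected sum of squares of x
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Hypothetical illustrative data (not from the chapter): points on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = least_squares_fit(x, y)
print(intercept, slope)
```

Because the made-up points are perfectly collinear, the estimated intercept and slope coincide with the line that generated them; with real data the fitted line only approximates the points.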
Example
1
To test for the existence of a linear relationship between two variables x and y, we test the hypotheses
$H_0: \beta_1 = 0$ (the slope is zero, meaning there is no linear relationship)
$H_1: \beta_1 \neq 0$ (the slope is not zero, meaning there exists a linear relationship)
Note: If $\hat{\beta}_1 = 0$, the fitted model reduces to $\hat{y} = \hat{\beta}_0$. This implies the values of x have no effect on y, that is, there is no relationship between the two variables.
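Testing procedure:
1. Set up the hypotheses:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
2. Calculate the test statistic:
$$t = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}} \quad \text{where} \quad Var(\hat{\beta}_1) = \frac{1}{n-2} \cdot \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{S_{xx}}$$
3. Determine the rejection region:
This is a two-tailed test, so reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on $n-2$ degrees of freedom.
[Figure: t distribution with rejection regions below $-t_{\alpha/2,\,n-2}$ and above $t_{\alpha/2,\,n-2}$.]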
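The testing procedure for the slope can be sketched in a few lines of Python. This is an illustrative sketch rather than the notes' own code: the function name `slope_t_test` is hypothetical, the critical value is passed in from a t table instead of being computed, and the data are the before/after scores from this chapter's worked example.

```python
import math


def slope_t_test(x, y, t_critical):
    """Two-tailed t test of H0: slope = 0. Returns (t statistic, reject H0?)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    slope = s_xy / s_xx
    # Estimated variance of the slope, on n - 2 degrees of freedom
    var_slope = (s_yy - slope * s_xy) / ((n - 2) * s_xx)
    t_stat = slope / math.sqrt(var_slope)
    return t_stat, abs(t_stat) > t_critical


before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
# 2.306 is the tabulated t value for alpha/2 = 0.025 and 8 degrees of freedom
t_stat, reject = slope_t_test(before, after, t_critical=2.306)
print(round(t_stat, 2), reject)
```

Carried at full precision the statistic comes out at about 7.91, slightly below the hand-calculated 7.926, which rounds the variance to 0.0108 before taking the square root. The conclusion (reject the null hypothesis) is the same either way.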
Example
The analysis of variance (ANOVA) method is an approach to test the significance of the regression. We can arrange the test procedure using this approach in an ANOVA table as shown below.

Source of variation | Sum of squares | Degrees of freedom | Mean square | $f_{test}$
Regression | $SSR = \hat{\beta}_1 S_{xy}$ | 1 | $MSR = SSR/1$ | $f_{test} = MSR/MSE$
Residual | $SSE = SST - SSR$ | $n-2$ | $MSE = SSE/(n-2)$ |
Total | $SST = S_{yy}$ | $n-1$ | |

The test hypotheses are
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
We will reject $H_0$ if $f_{test} > f_{\alpha,1,n-2}$ at the $\alpha$ level of significance. We then conclude that there exists a linear relationship between the two variables being investigated.
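As an illustration of how the ANOVA quantities fit together, here is a short Python sketch. It is an assumption-laden example, not the notes' own method: the function name `regression_anova` is hypothetical, the data are the before/after scores from this chapter's worked example, and the critical value 5.32 is the tabulated F value for a 0.05 significance level with 1 and 8 degrees of freedom, taken from a standard F table rather than from these notes.

```python
def regression_anova(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    slope = s_xy / s_xx
    ssr = slope * s_xy          # regression sum of squares, 1 degree of freedom
    sst = s_yy                  # total sum of squares, n - 1 degrees of freedom
    sse = sst - ssr             # residual sum of squares, n - 2 degrees of freedom
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "f": msr / mse}


before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
table = regression_anova(before, after)
f_critical = 5.32  # assumed value from a standard F table: alpha = 0.05, df = (1, 8)
for name, value in table.items():
    print(name, round(value, 4))
print("reject H0:", table["f"] > f_critical)
```

For simple linear regression the F statistic equals the square of the t statistic for the slope, so the ANOVA test and the two-tailed t test lead to the same decision.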
Correlation
Correlation measures the strength of a linear relationship between the two variables. One numerical measure is the Pearson product moment correlation coefficient, r.
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
Properties of r:
$-1 \leq r \leq 1$.
Values of r close to 1 imply there is a strong positive linear relationship between x and y.
Values of r close to -1 imply there is a strong negative linear relationship between x and y.
Values of r close to 0 imply little or no linear relationship between x and y.
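The Pearson coefficient can be computed from the same three sums used for the regression line. The sketch below is illustrative only; the function name `pearson_r` is hypothetical and the data are the before/after scores from this chapter's worked example.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of paired samples."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
r = pearson_r(before, after)
print(round(r, 4))
```

A value this close to 1 indicates a strong positive linear relationship between the two sets of scores.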
Before (x): 65 63 76 46 68 72 68 57 36 96
After (y): 68 66 86 48 65 66 71 57 42 87
Significance level: $\alpha = 0.05$.
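Solution
$n = 10$, $\sum xy = 44435$
$\sum x = 647$, $\sum x^2 = 44279$, $\bar{x} = 64.7$
$\sum y = 656$, $\sum y^2 = 44884$, $\bar{y} = 65.6$
$$S_{xy} = 44435 - \frac{(647)(656)}{10} = 1991.8$$
$$S_{xx} = 44279 - \frac{647^2}{10} = 2418.1$$
$$S_{yy} = 44884 - \frac{656^2}{10} = 1850.4$$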
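a) $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{1991.8}{2418.1} = 0.8237$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 65.6 - 0.8237(64.7) = 12.3063$
$\hat{y} = 12.3063 + 0.8237x$
b) For $x = 60$:
$\hat{y} = 12.3063 + 0.8237(60) = 61.7283$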
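c)
1. $H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. $t_{test} = \dfrac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}} = \dfrac{0.8237}{\sqrt{0.0108}} = 7.9260$
where $Var(\hat{\beta}_1) = \dfrac{1}{n-2} \cdot \dfrac{S_{yy} - \hat{\beta}_1 S_{xy}}{S_{xx}} = \dfrac{1}{8} \cdot \dfrac{1850.4 - 0.8237(1991.8)}{2418.1} = 0.0108$
3. $\alpha = 0.05$, $t_{0.025,8} = 2.306$
4. Since $t_{test} = 7.926 > t_{0.025,8} = 2.306$, we reject $H_0$: the scores before the trip are linearly related to the scores after the trip.
[Figure: t distribution with rejection regions below -2.306 and above 2.306.]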
d) $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{1991.8}{\sqrt{2418.1 \times 1850.4}} = 0.9416$
There is a strong positive linear relationship between the scores obtained before and after.
Suppose you wish to investigate the relationship between the number of hours students spent studying for an examination and the marks they achieved.
Student: A B C D E F G H
Number of hours (x): 5 8 9 10 10 12 13 15
Final mark (y): 49 60 55 72 65 80 82 85
The number of hours students spent studying for the examination (x, the independent variable) is taken to influence the mark they achieved (y, the dependent variable).
Linear regression model: $\hat{y} = 26.89 + 4.06x$
There is a strong positive linear correlation.
89.26% of the variation in the marks achieved is explained by variation in the number of hours students spent studying.
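The figures quoted for the study-hours example can be checked numerically. The Python sketch below is illustrative; the variable names are made up, and it assumes the 89.26% figure is the coefficient of determination, that is, the square of the correlation coefficient r.

```python
import math

hours = [5, 8, 9, 10, 10, 12, 13, 15]
marks = [49, 60, 55, 72, 65, 80, 82, 85]

n = len(hours)
sum_x, sum_y = sum(hours), sum(marks)
s_xy = sum(x * y for x, y in zip(hours, marks)) - sum_x * sum_y / n
s_xx = sum(x * x for x in hours) - sum_x ** 2 / n
s_yy = sum(y * y for y in marks) - sum_y ** 2 / n

slope = s_xy / s_xx                              # estimated slope
intercept = sum_y / n - slope * sum_x / n        # estimated intercept
r = s_xy / math.sqrt(s_xx * s_yy)                # correlation coefficient
r_squared = r ** 2                               # share of variation explained

print(round(intercept, 2), round(slope, 2))
print(round(r, 4), round(r_squared, 4))
```

The rounded intercept and slope agree with the fitted model quoted above, and the squared correlation matches the stated percentage to within rounding.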
This chapter introduces important methods (regression and correlation) for making inferences about the relationship between two variables and for describing that relationship with an equation that can be used to predict the value of one variable given the value of the other.