SADC Course in Statistics
Simple Linear Regression
(Session 02)
2To put your footer here go to View > Header and Footer
Learning Objectives
At the end of this session, you will be able to
• understand the meaning of a simple linear regression model, its aims and terminology
• determine the best fitting line describing the relationship between a quantitative response (y) and a quantitative explanatory variable (x)
• interpret the unknown parameters of the regression line
An illustrative example
The data on the next slide show the average number of cigarettes smoked per adult in 1930 and the death rate per million in 1952 for sixteen countries.
The question of interest is whether there is a relationship between the death rate (y) and level of smoking (x). Here both y and x are quantitative measurements.
The Data
Country              Cig. smoked (x)   Death rate (y)
England and Wales         1378              461
Finland                   1662              433
Austria                    960              380
Netherlands                632              276
Belgium                   1066              254
Switzerland                706              236
New Zealand                478              216
U.S.A.                    1296              202
Denmark                    465              179
Australia                  504              177
Canada                     760              176
France                     585              140
Italy                      455              110
Sweden                     388               89
Norway                     359               77
Japan                      723               40
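For later calculations it can help to hold the table in a small data structure. The following is only a sketch and not part of the original slides; the names `DATA`, `x`, `y` and `n` are assumptions chosen for illustration.

```python
# Data from the table above:
# (country, cigarettes smoked per adult in 1930, death rate per million in 1952).
# All variable names here are illustrative, not taken from the slides.
DATA = [
    ("England and Wales", 1378, 461), ("Finland", 1662, 433),
    ("Austria", 960, 380), ("Netherlands", 632, 276),
    ("Belgium", 1066, 254), ("Switzerland", 706, 236),
    ("New Zealand", 478, 216), ("U.S.A.", 1296, 202),
    ("Denmark", 465, 179), ("Australia", 504, 177),
    ("Canada", 760, 176), ("France", 585, 140),
    ("Italy", 455, 110), ("Sweden", 388, 89),
    ("Norway", 359, 77), ("Japan", 723, 40),
]

x = [cig for _, cig, _ in DATA]    # explanatory variable
y = [rate for _, _, rate in DATA]  # response variable
n = len(DATA)

print("number of countries:", n)
print("x ranges from", min(x), "to", max(x))
print("mean of x:", sum(x) / n, " mean of y:", sum(y) / n)
```

Note that the smallest x value is well above zero, which matters when the intercept is interpreted.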
Start by plotting - shows pattern
- a straight line relationship seems plausible here.
[Figure: scatter plot of death rate (y), axis 0 to 500, against cigarettes smoked (x), axis 0 to 2000]
Recall reasons for modelling
• To determine which of (often) several factors explain variability in the key response of interest;
• To summarise the relationship(s);
• For predictive purposes, e.g. predicting y for given x’s, or identifying x’s that optimise y in some way;
Note: Presence of an association between variables does not necessarily imply causation.
Describing the Regression Model
Describe variation in the response (here death rate) in terms of its relationship with the explanatory variable (here cigarette numbers).
Model: data = pattern + residual
– the pattern can be described as a + bx, if a straight line relationship seems reasonable
– the residual is unexplained variation, assumed to be random.
Simple Linear Regression Model
If there is only one explanatory variable, we have a Simple Linear Regression Model.
Here data = pattern + residual becomes:
$y = \alpha + \beta x + \varepsilon$
where $\alpha + \beta x$ = pattern and $\varepsilon$ = residual.
• $\alpha$ is called the intercept
• $\beta$ is called the slope
• the $\varepsilon$'s represent the departures of the observed values from the true line.
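A Diagrammatic Representation
[Figure: observed points (×) scattered about a straight line relating y to x, with the vertical departure of an individual point i from the line marked by braces]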
Parameters of Model & Assumptions
• $\alpha$ and $\beta$ are the unknown parameters in the model. They are estimated from the data.
• The random error, $\varepsilon$, is assumed to have a
– normal distribution
– with constant variance (whatever the value of x)
We shall return to these assumptions later.
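To make the assumptions concrete, data can be generated from a model of this form. This is a minimal sketch only: the function name `simulate` and the values chosen for the intercept, slope and error standard deviation are hypothetical and are not estimates from the slides.

```python
import random

def simulate(xs, alpha, beta, sigma, seed=0):
    """Generate y = alpha + beta * x + eps, with eps ~ Normal(0, sigma^2).

    The same sigma is used for every x, which is the constant
    variance assumption. All parameter values are supplied by the caller.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ys = []
    for x in xs:
        eps = rng.gauss(0.0, sigma)  # random error, independent of x
        ys.append(alpha + beta * x + eps)
    return ys

# Hypothetical parameter values, for illustration only.
xs = [400, 800, 1200, 1600]
ys = simulate(xs, alpha=30.0, beta=0.25, sigma=50.0)

# data = pattern + residual: subtracting the pattern recovers the errors.
residuals = [y - (30.0 + 0.25 * x) for x, y in zip(xs, ys)]
print(residuals)
```

Setting `sigma=0.0` returns points lying exactly on the line, which is the "pattern" with no residual.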
Results of model fitting
------------------------------------------------------
deathrate|  Coef.  Std.Err.    t    P>|t|  [95% Conf.Int.]
---------+--------------------------------------------
Cigars   |  .2410   .0544    4.43   0.001   .1245   .3577
Const.   |  28.31   46.92    0.60   0.556  -72.34  128.95
------------------------------------------------------
These are estimates of coefficients of the regression equation since this is a sample of data - precision quantified by standard errors
Estimated equation is: y = 28.31 + 0.241 * x
Note: The t and P>|t| columns will be discussed in the next session.
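The estimates and standard errors in the output can be checked directly from the sixteen data points. The sketch below is an illustration only, not the software that produced the slides. The standard error formulas are the usual textbook ones and are assumed here rather than taken from the slides; all variable names are illustrative.

```python
import math

# Cigarettes smoked (x) and death rate (y) for the sixteen countries.
x = [1378, 1662, 960, 632, 1066, 706, 478, 1296,
     465, 504, 760, 585, 455, 388, 359, 723]
y = [461, 433, 380, 276, 254, 236, 216, 202,
     179, 177, 176, 140, 110, 89, 77, 40]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and products.
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = Sxy / Sxx
intercept = y_bar - slope * x_bar

# Residual variance, estimated on n - 2 degrees of freedom.
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - 2)

# Assumed textbook standard errors for the two estimates.
se_slope = math.sqrt(s2 / Sxx)
se_intercept = math.sqrt(s2 * (1 / n + x_bar ** 2 / Sxx))

print(f"slope     {slope:.4f}  (s.e. {se_slope:.4f})")
print(f"intercept {intercept:.2f}  (s.e. {se_intercept:.2f})")
```

Run as is, the printed values should agree with the Coef. and Std.Err. columns of the table to the digits shown there.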
The fitted line
[Figure: death rate (y) and fitted values plotted against cigarettes smoked (x); y axis 0 to 500, x axis 0 to 2000]
Interpreting model parameters
• Slope (regression coefficient): For each increase of 1 unit in cigarettes smoked, the death rate is estimated to increase by 0.24 units on average. In other words, an increase of 100 units in cigarettes smoked corresponds to an estimated increase of 24 units in the death rate.
• Intercept of 28.31 only has meaning if the range of x values (cigarettes smoked) under study includes the value of zero. Here the observed x values run from 359 to 1662, so the estimated death rate of 28.3 at zero cigarettes smoked is an extrapolation beyond the data and should be read with caution.
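The two interpretations above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the coefficients are the rounded estimates quoted on the slides, the x range is read from the data table, and the names `fitted`, `change_per_100` and `zero_in_range` are illustrative.

```python
# Rounded estimates quoted on the slides.
INTERCEPT = 28.31
SLOPE = 0.241

def fitted(x):
    """Value of the estimated line at x."""
    return INTERCEPT + SLOPE * x

# Slope: difference in fitted death rate for 100 more cigarettes.
# The starting point (500) is arbitrary; any x gives the same difference.
change_per_100 = fitted(600) - fitted(500)

# Intercept: is x = 0 inside the range of the observed data?
x_min, x_max = 359, 1662  # smallest and largest x in the data table
zero_in_range = x_min <= 0 <= x_max

print(f"change in fitted y per 100 units of x: {change_per_100:.1f}")
print("x = 0 within observed range:", zero_in_range)
```

With the slope taken as 0.241 rather than 0.24, the change per 100 units comes out at about 24.1.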
Predictions from the line
The model equation can also be used to predict y at a given value of x.
Thus from y = 28.31 + 0.241 x, the predicted death rate ($\hat{y}$) in a country where the number of cigarettes smoked is x = 1000 is given by
$\hat{y} = 28.31 + 0.241(1000) = 269.3$
Note: Predictions will be discussed in greater detail in Session 9.
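The prediction at x = 1000 can be reproduced as follows. This is a minimal sketch using the rounded estimates from the slides; the function name `predict` is an assumption for illustration.

```python
# Rounded estimates quoted on the slides.
INTERCEPT = 28.31
SLOPE = 0.241

def predict(x):
    """Predicted death rate for a given number of cigarettes smoked, x."""
    return INTERCEPT + SLOPE * x

y_hat = predict(1000)
print(f"predicted death rate at x = 1000: {y_hat:.1f}")
```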
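Computation of model estimates (for reference only)
$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$, where $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and
$\hat{\beta} = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \dfrac{S_{xy}}{S_{xx}}$
Note: Can also write
$\dfrac{S_{xy}}{S_{xx}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$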
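The two ways of writing Sxy and Sxx give the same numbers, which can be confirmed on the data from this session. The sketch below is illustrative only; the variable names are assumptions and do not come from the slides.

```python
# Cigarettes smoked (x) and death rate (y) for the sixteen countries.
x = [1378, 1662, 960, 632, 1066, 706, 478, 1296,
     465, 504, 760, 585, 455, 388, 359, 723]
y = [461, 433, 380, 276, 254, 236, 216, 202,
     179, 177, 176, 140, 110, 89, 77, 40]
n = len(x)

# Form 1: raw sums, as in the first set of formulas.
Sxy_sums = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx_sums = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

# Form 2: deviations from the means, as in the note.
x_bar = sum(x) / n
y_bar = sum(y) / n
Sxy_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Sxx_dev = sum((xi - x_bar) ** 2 for xi in x)

# Slope and intercept estimates from Sxy / Sxx.
beta_hat = Sxy_sums / Sxx_sums
alpha_hat = y_bar - beta_hat * x_bar

print("Sxy:", Sxy_sums, Sxy_dev)
print("Sxx:", Sxx_sums, Sxx_dev)
print(f"beta_hat = {beta_hat:.4f}, alpha_hat = {alpha_hat:.2f}")
```

The raw-sums form needs only running totals, while the deviation form is usually less prone to rounding problems when the values are large.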
Practical work follows to ensure learning objectives are achieved…