2/30
Outline
The method of Least Squares
Inferences based on the Least Squares Estimators
Curvilinear Regression
Multiple Regression
3/30
11.1 The Method of Least Squares
We study the case where a dependent variable is to be predicted in terms of a single independent variable.
The random variable Y depends on a variable x.
Regression curve of Y on x: the relationship between x and the mean of the corresponding distribution of Y.
4/30
Linear regression
5/30
Linear regression
Linear regression: for any x, the mean of the distribution of the Y's is given by α + βx.
In general, Y will differ from this mean; we denote the difference by ε, so that Y = α + βx + ε.
ε is a random variable, and we can choose α so that the mean of the distribution of ε is equal to zero.
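A minimal simulation sketch of this model, with made-up parameters α and β (not from the slides): for each x, Y = α + βx + ε, where ε is zero-mean noise, so the average of the ε's should be near zero.

```python
import random

# Hypothetical parameters, chosen only for illustration
alpha, beta = 4.0, 15.0

random.seed(0)  # reproducible simulation
xs = [i % 12 + 1 for i in range(1000)]
# epsilon: zero-mean normal noise, so E[Y | x] = alpha + beta * x
eps = [random.gauss(0.0, 1.0) for _ in xs]
ys = [alpha + beta * x + e for x, e in zip(xs, eps)]

# The average of the simulated epsilons should be close to zero
mean_eps = sum(eps) / len(eps)
print(mean_eps)
```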
6/30
EX
x:  1   2   3   4   5   6    7    8    9   10   11   12
y: 16  35  45  64  86  96  106  124  134  156  164  182
7/30
Analysis
ŷ_i = a + b x_i (fitted value)
e_i = y_i − ŷ_i (residual)
We want the residuals e_i, i = 1, …, n, to be as close as possible to zero.
8/30
Principle of least squares
Choose a and b so that
Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − (a + b x_i))²
is minimum. The procedure of finding the equation of the line that best fits a given set of paired data is called the method of least squares. Some notation:
S_xx = Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n x_i² − (Σ_{i=1}^n x_i)²/n
S_yy = Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n y_i² − (Σ_{i=1}^n y_i)²/n
S_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^n x_i y_i − (Σ_{i=1}^n x_i)(Σ_{i=1}^n y_i)/n
9/30
Least squares estimators
b = S_xy / S_xx and a = ȳ − b x̄, where x̄, ȳ are the means of the x's and y's.
Fitted (or estimated) regression line: ŷ = a + b x
Residuals: observation − fitted value = y_i − (a + b x_i)
The minimum value of the sum of squares is called the residual sum of squares or error sum of squares. We will show that
SSE = Σ_{i=1}^n (y_i − a − b x_i)² = S_yy − S_xy²/S_xx
10/30
EX solution
Y = 14.8 X + 4.35
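The fitted line can be checked with a short least-squares sketch using the S-notation, applied to the 12-point data set from the earlier slide:

```python
# Data from the earlier example slide
xs = list(range(1, 13))
ys = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
# S_xx and S_xy via the computational formulas
S_xx = sum(x * x for x in xs) - sum_x ** 2 / n
S_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

b = S_xy / S_xx                 # slope
a = sum_y / n - b * sum_x / n   # intercept
print(round(b, 1), round(a, 2))  # → 14.8 4.35
```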
11/30
X-and-Y
X-axis Y-axis
independent dependent
predictor predicted
carrier response
input output
12/30
Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4
What is the relationship between sales & advertising?
13/30
[Scattergram: Sales vs. Advertising; x-axis Advertising 0–5, y-axis Sales 0–4]
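One way to answer the question posed with these data, using the same least-squares formulas (a sketch):

```python
ad = [1, 2, 3, 4, 5]        # Ad $
sales = [1, 1, 2, 2, 4]     # Sales (units)

n = len(ad)
x_bar = sum(ad) / n
y_bar = sum(sales) / n
S_xx = sum((x - x_bar) ** 2 for x in ad)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales))

b = S_xy / S_xx        # slope: 7/10 = 0.7
a = y_bar - b * x_bar  # intercept
print(b, round(a, 1))  # → 0.7 -0.1
```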
14/30
Inferences based on the Least Squares Estimators
15/30
11.2 Inference based on the Least Squares Estimators
We assume that the regression is linear in x and, furthermore, that the n random variables Y_i are independently normally distributed with means α + βx_i.
Statistical model for straight-line regression:
Y_i = α + β x_i + ε_i,  i = 1, …, n
where the ε_i are independent, normally distributed random variables having zero means and the common variance σ².
16/30
Standard error of estimate
The i-th deviation is e_i = y_i − (a + b x_i), and the estimate of σ² is
s_e² = (1/(n − 2)) Σ_{i=1}^n [y_i − (a + b x_i)]²
The estimate of σ² can also be written as
s_e² = (S_yy − S_xy²/S_xx)/(n − 2)
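A quick sketch, with made-up paired data, checking that the two expressions for s_e² agree:

```python
# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_yy = sum((y - y_bar) ** 2 for y in ys)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = S_xy / S_xx
a = y_bar - b * x_bar

# Definition: sum of squared deviations from the fitted line, over n - 2
se2_def = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
# Shortcut in terms of S_xx, S_yy, S_xy
se2_short = (S_yy - S_xy ** 2 / S_xx) / (n - 2)
print(se2_def, se2_short)
```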
17/30
Statistics for inferences: based on the assumptions made concerning the distribution of the values of Y, the following theorem holds.
Theorem. The statistics
t = ((a − α)/s_e) · √(n·S_xx / Σ_{i=1}^n x_i²)  and  t = (b − β)·√S_xx / s_e
are values of random variables having the t distribution with n − 2 degrees of freedom.
Confidence intervals:
α: a ± t_{α/2} · s_e · √(1/n + x̄²/S_xx)
β: b ± t_{α/2} · s_e / √S_xx
18/30
Example
The following data pertain to the number of computer jobs per day and the central processing unit (CPU) time required.
Number of jobs (x):  1  2  3  4   5
CPU time (y):        2  5  4  9  10
19/30
EX
1) Obtain the least squares fit of a line to the observations on CPU time.
b = S_xy / S_xx = 2,  a = ȳ − b x̄ = 0
Fitted line: ŷ = 2x
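A sketch reproducing this computation:

```python
jobs = [1, 2, 3, 4, 5]   # number of jobs x
cpu = [2, 5, 4, 9, 10]   # CPU time y

n = len(jobs)
x_bar, y_bar = sum(jobs) / n, sum(cpu) / n
S_xx = sum((x - x_bar) ** 2 for x in jobs)                        # 10
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(jobs, cpu))  # 20

b = S_xy / S_xx        # slope
a = y_bar - b * x_bar  # intercept
print(b, a)  # → 2.0 0.0
```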
20/30
Example
2) Construct a 95% confidence interval for α.
s_e² = (S_yy − S_xy²/S_xx)/(n − 2) = (46 − 400/10)/3 = 2
For the 95% confidence interval of α, t_{α/2} = t_{0.025} = 3.182 (n − 2 = 3 degrees of freedom):
a ± t_{α/2} · s_e · √(1/n + x̄²/S_xx) = 0 ± 3.182 · √2 · √(1/5 + 9/10) = ±4.72
so −4.72 < α < 4.72.
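The interval above can be reproduced numerically (the value 3.182 is taken from a t table with 3 degrees of freedom, as on the slide):

```python
import math

S_yy, S_xy, S_xx = 46, 20, 10   # from the CPU-time data
n, a, x_bar = 5, 0.0, 3.0
t_025 = 3.182                   # t table, 3 degrees of freedom

# s_e = sqrt((S_yy - S_xy^2/S_xx) / (n - 2)) = sqrt(2)
se = math.sqrt((S_yy - S_xy ** 2 / S_xx) / (n - 2))
half_width = t_025 * se * math.sqrt(1 / n + x_bar ** 2 / S_xx)
print(round(a - half_width, 2), round(a + half_width, 2))  # → -4.72 4.72
```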
21/30
Example
3) Test the null hypothesis β = 1 against the alternative hypothesis β > 1 at the 0.05 level of significance.
Solution: the t statistic is given by
t = (b − β)·√S_xx / s_e = (2 − 1)·√10 / √2 = 2.236
Criterion: reject if t > t_{0.05} = 2.353 (3 degrees of freedom)
Decision: since 2.236 < 2.353, we cannot reject the null hypothesis.
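The test statistic, sketched:

```python
import math

b, beta0 = 2.0, 1.0   # estimated slope and hypothesized value
S_xx = 10
se = math.sqrt(2)     # standard error of estimate from the same data

t = (b - beta0) * math.sqrt(S_xx) / se   # sqrt(5) ≈ 2.236
critical = 2.353                         # t_0.05 with 3 degrees of freedom
print(round(t, 3), t > critical)  # → 2.236 False
```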
22/30
11.3 Curvilinear Regression
The regression curve is nonlinear.
Polynomial regression: Y = β₀ + β₁x + β₂x² + … + β_p x^p
If the regression of Y on x is exponential, the mean of the distribution of values of Y is given by α·β^x.
Taking logarithms, we have log y = log α + x·log β.
Thus, we can estimate log α and log β from the pairs of values (x_i, log y_i).
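A sketch of this log-transform trick, with made-up data that lie exactly on y = α·β^x for α = 3, β = 2, so the fit should recover those values:

```python
import math

# Made-up data lying exactly on y = 3 * 2**x
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]

# Fit log y = log(alpha) + x * log(beta) by least squares
log_ys = [math.log(y) for y in ys]
n = len(xs)
x_bar = sum(xs) / n
ly_bar = sum(log_ys) / n
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_xy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))

log_beta = S_xy / S_xx
log_alpha = ly_bar - log_beta * x_bar
alpha, beta = math.exp(log_alpha), math.exp(log_beta)
print(round(alpha, 6), round(beta, 6))  # → 3.0 2.0
```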
23/30
Polynomial regression
If there is no clear indication about the functional form of the regression of Y on x, we assume it is a polynomial regression:
Y = a₀ + a₁x + a₂x² + … + a_k x^k
24/30
Polynomial Fitting
• Really just a generalization of the previous case
• Exact solution
• Just big matrices
25/30
11.4 Multiple Regression
The mean of Y on x is given by b₀ + b₁x₁ + b₂x₂ + … + b_k x_k
Minimize
Σ_{i=1}^n [y_i − (b₀ + b₁x_{i1} + … + b_k x_{ik})]²
When k = 2, we can solve by the following normal equations:
Σy = n·b₀ + b₁·Σx₁ + b₂·Σx₂
Σx₁y = b₀·Σx₁ + b₁·Σx₁² + b₂·Σx₁x₂
Σx₂y = b₀·Σx₂ + b₁·Σx₁x₂ + b₂·Σx₂²
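A sketch of these normal equations for k = 2, with made-up data generated from y = 1 + 2x₁ + 3x₂ so the known coefficients should be recovered; the 3×3 system is solved by a small Gauss-Jordan elimination:

```python
# Made-up data with exact relation y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
n = len(y)

# Normal equations as an augmented matrix [A | rhs]
A = [
    [n, sum(x1), sum(x2), sum(y)],
    [sum(x1), sum(a * a for a in x1),
     sum(a * b for a, b in zip(x1, x2)),
     sum(a * c for a, c in zip(x1, y))],
    [sum(x2), sum(a * b for a, b in zip(x1, x2)),
     sum(b * b for b in x2),
     sum(b * c for b, c in zip(x2, y))],
]

# Gauss-Jordan elimination: reduce A to the identity on the left
for i in range(3):
    pivot = A[i][i]
    A[i] = [v / pivot for v in A[i]]
    for j in range(3):
        if j != i:
            factor = A[j][i]
            A[j] = [vj - factor * vi for vj, vi in zip(A[j], A[i])]

b0, b1, b2 = A[0][3], A[1][3], A[2][3]
print(round(b0, 6), round(b1, 6), round(b2, 6))  # → 1.0 2.0 3.0
```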
26/30
Example
P365.
27/30
Multiple Linear Fitting
X1(x), …, XM(x) are arbitrary fixed functions of x (they can be nonlinear), called the basis functions.
The normal equations of the least squares problem can be put in matrix form and solved.
28/30
Correlation Models
1. How strong is the linear relationship between two variables?
2. The coefficient of correlation is used. The population correlation coefficient is denoted ρ; its values range from −1 to +1.
29/30
Correlation
Standardized observation: (observation − sample mean)/(sample standard deviation) = (x_i − x̄)/s_x
The sample correlation coefficient r:
r = (1/(n − 1)) Σ_{i=1}^n ((x_i − x̄)/s_x)·((y_i − ȳ)/s_y)
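A sketch computing r from standardized observations, checked against the equivalent S-notation form r = S_xy/√(S_xx·S_yy); the data are reused from the CPU-time example:

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 9, 10]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# r from the standardized observations
r = sum((x - x_bar) / s_x * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

# Equivalent form: r = S_xy / sqrt(S_xx * S_yy)
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_yy = sum((y - y_bar) ** 2 for y in ys)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r2 = S_xy / math.sqrt(S_xx * S_yy)
print(round(r, 4), round(r2, 4))  # → 0.9325 0.9325
```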
30/30
Coefficient of Correlation Values
[Scale from −1.0 through −0.5, 0, +0.5, to +1.0: values near 0 indicate no correlation; moving toward −1.0, increasing degree of negative correlation; moving toward +1.0, increasing degree of positive correlation]