
# Ch11 Curve Fitting

Dr. Deshi Ye ([email protected])



Outline

The method of Least Squares

Inferences based on the Least Squares Estimators

Curvilinear Regression

Multiple Regression


11.1 The Method of Least Squares

We study the case where a dependent variable is to be predicted in terms of a single independent variable.

The random variable Y depends on a random variable X.

The regression curve of Y on x describes the relationship between x and the mean of the corresponding distribution of Y.


Linear regression

Linear regression: for any x, the mean of the distribution of the Y's is given by $\alpha + \beta x$.

In general, Y will differ from this mean, and we denote the difference by $\varepsilon$:

$$Y = \alpha + \beta x + \varepsilon$$

$\varepsilon$ is a random variable, and we can choose $\alpha$ so that the mean of the distribution of this random variable is equal to zero.


EX

x: 1 2 3 4 5 6 7 8 9 10 11 12
y: 16 35 45 64 86 96 106 124 134 156 164 182


Analysis

$$\hat{y}_i = a + b x_i \qquad e_i = y_i - \hat{y}_i$$

We want the residuals $e_i$, $i = 1, \ldots, n$, to be as close as possible to zero.


Principle of least squares

Choose a and b so that

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - (a + bx_i))^2$$

is a minimum. The procedure of finding the equation of the line which best fits a given set of paired data is called the method of least squares. Some notations:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$


Least squares estimators

$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

where $\bar{x}$, $\bar{y}$ are the means of the x's and y's.

Fitted (or estimated) regression line: $\hat{y} = a + bx$

Residuals: observation − fitted value $= y_i - (a + bx_i)$

The minimum value of the sum of squares is called the residual sum of squares or error sum of squares. We will show that

$$SSE = \sum_{i=1}^{n} (y_i - a - bx_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$


EX solution

The fitted line is $\hat{y} = 14.8x + 4.35$.
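As an illustrative check (not part of the original slides), the formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ can be evaluated in plain Python for the data table above:

```python
# Least-squares fit for the tabulated data: b = Sxy/Sxx, a = ybar - b*xbar
x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b = Sxy / Sxx        # slope
a = ybar - b * xbar  # intercept
print(round(b, 2), round(a, 2))  # 14.82 4.35
```

This agrees with the slide's fitted line $\hat{y} = 14.8x + 4.35$ after rounding the slope.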


X-and-Y

| X-axis | Y-axis |
|---|---|
| independent | dependent |
| predictor | predicted |
| carrier | response |
| input | output |


Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

| Ad $ | Sales (Units) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

What is the relationship between sales & advertising?


[Scatter plot: Sales (units, 0–4) on the Y-axis versus advertising dollars (0–5) on the X-axis]


11.2 Inference based on the Least Squares Estimators

We assume that the regression is linear in x and, furthermore, that the n random variables $Y_i$ are independently normally distributed with means $\alpha + \beta x_i$.

Statistical model for straight-line regression:

$$Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \ldots, n$$

where the $\varepsilon_i$ are independent normally distributed random variables having zero means and common variance $\sigma^2$.


Standard error of estimate

The i-th deviation is $y_i - (a + bx_i)$, and the estimate of $\sigma^2$ is

$$s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} [y_i - (a + bx_i)]^2$$

The estimate of $\sigma^2$ can also be written as

$$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2}$$


Statistics for inferences: based on the assumptions made concerning the distribution of the values of Y, the following theorem holds.

Theorem. The statistics

$$t = \frac{a - \alpha}{s_e} \sqrt{\frac{n S_{xx}}{\sum_{i=1}^{n} x_i^2}} \qquad \text{and} \qquad t = \frac{b - \beta}{s_e} \sqrt{S_{xx}}$$

are values of random variables having the t distribution with n − 2 degrees of freedom.

Confidence intervals:

$$\alpha: \quad a \pm t_{\alpha/2} \, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$

$$\beta: \quad b \pm t_{\alpha/2} \, \frac{s_e}{\sqrt{S_{xx}}}$$


Example

The following data pertain to number of computer jobs per day and the central processing unit (CPU) time required.

| Number of jobs x | CPU time y |
|---|---|
| 1 | 2 |
| 2 | 5 |
| 3 | 4 |
| 4 | 9 |
| 5 | 10 |


EX

1) Obtain a least squares fit of a line to the observations on CPU time.

$$b = \frac{S_{xy}}{S_{xx}} = 2, \qquad a = \bar{y} - b\bar{x} = 0$$

so the fitted line is $\hat{y} = 2x$.
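A minimal Python sketch of this fit, as a check on the hand computation (not part of the original slides):

```python
# Fit of the CPU-time data: expect b = 2, a = 0, i.e. yhat = 2x
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)
Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                  # = 10
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # = 20
b = Sxy / Sxx
a = sum(y) / n - b * sum(x) / n
print(b, a)  # 2.0 0.0
```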


Example

2) Construct a 95% confidence interval for α

$$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2} = \frac{46 - 400/10}{3} = 2$$

For the 95% confidence interval of α, $t_{\alpha/2} = t_{0.025} = 3.182$ (with n − 2 = 3 degrees of freedom):

$$a \pm t_{\alpha/2} \, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 0 \pm 3.182 \cdot \sqrt{2} \cdot \sqrt{\frac{1}{5} + \frac{9}{10}} = \pm 4.72$$
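The same interval can be reproduced numerically; as a sketch, the critical value 3.182 is taken from the slide rather than computed:

```python
import math

# 95% CI half-width for alpha with the CPU-time data; t_{0.025} = 3.182 (3 df)
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)
Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                  # 10
Syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n                  # 46
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 20
se2 = (Syy - Sxy ** 2 / Sxx) / (n - 2)                            # 2.0
xbar = sum(x) / n
half_width = 3.182 * math.sqrt(se2) * math.sqrt(1 / n + xbar ** 2 / Sxx)
print(round(se2, 1), round(half_width, 2))  # 2.0 4.72
```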


Example

3) Test the null hypothesis β = 1 against the alternative hypothesis β > 1 at the 0.05 level of significance.

Solution: the t statistic is given by

$$t = \frac{b - \beta}{s_e} \sqrt{S_{xx}} = \frac{2 - 1}{\sqrt{2}} \sqrt{10} = 2.236$$

Criterion: reject if $t > t_{0.05} = 2.353$ (3 degrees of freedom). Since 2.236 < 2.353,

Decision: we cannot reject the null hypothesis.
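A quick numerical restatement of the test, plugging in the values b = 2, $s_e = \sqrt{2}$, and $S_{xx} = 10$ from this example:

```python
import math

# t statistic for H0: beta = 1 vs H1: beta > 1
b, beta0, se, Sxx = 2.0, 1.0, math.sqrt(2), 10.0
t = (b - beta0) / se * math.sqrt(Sxx)
print(round(t, 3), t < 2.353)  # 2.236 True -> cannot reject H0
```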


11.3 Curvilinear Regression

The regression curve is nonlinear.

Polynomial regression:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$$

If the regression of Y on x is exponential, the mean of the distribution of values of Y is given by

$$y = \alpha \beta^x$$

Taking logarithms, we have $\log y = \log \alpha + x \log \beta$.

Thus, we can estimate $\log \alpha$ and $\log \beta$ from the pairs of values $(x_i, \log y_i)$.
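A short sketch of the log-transform idea, using hypothetical data generated from $y = \alpha \beta^x$ with α = 2 and β = 3 (values chosen purely for illustration):

```python
import math

# Hypothetical exact data from y = alpha * beta**x with alpha = 2, beta = 3;
# fitting a line to (x, log y) should recover log(alpha) and log(beta)
x = [0, 1, 2, 3, 4]
y = [2 * 3 ** xi for xi in x]
n = len(x)
ly = [math.log10(yi) for yi in y]
Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
Sxl = sum(xi * li for xi, li in zip(x, ly)) - sum(x) * sum(ly) / n
log_beta = Sxl / Sxx                              # slope = log10(beta)
log_alpha = sum(ly) / n - log_beta * sum(x) / n   # intercept = log10(alpha)
alpha, beta = 10 ** log_alpha, 10 ** log_beta
print(round(alpha, 6), round(beta, 6))  # 2.0 3.0
```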


Polynomial regression

If there is no clear indication about the functional form of the regression of Y on x, we assume it is polynomial regression:

$$Y = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$$


Polynomial Fitting

• Really just a generalization of the previous case
• Exact solution
• Just big matrices
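One way to sketch the "big matrices" view is to build the polynomial design (Vandermonde) matrix and solve the least-squares problem directly; the quadratic data below is made up for illustration:

```python
import numpy as np

# Polynomial fitting as linear least squares on the Vandermonde matrix
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2                  # exact quadratic (illustrative)
A = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 6))  # [1. 2. 3.]
```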


11.4 Multiple Regression

The mean of Y on x is given by

$$b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Minimize

$$\sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{i1} + \cdots + b_k x_{ik})]^2$$

When k = 2, we can solve it by the following normal equations:

$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$

$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$

$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$
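The normal equations above can be assembled and solved directly; the data below is hypothetical, chosen to lie exactly on y = 1 + 2x₁ + 3x₂:

```python
import numpy as np

# Normal equations for k = 2 predictors on made-up exact data,
# so the solver should recover b0, b1, b2 = 1, 2, 3
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = 1 + 2 * x1 + 3 * x2
n = len(y)
A = np.array([
    [n,        x1.sum(),        x2.sum()],
    [x1.sum(), (x1 ** 2).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 ** 2).sum()],
])
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
b0, b1, b2 = np.linalg.solve(A, rhs)
print(np.round([b0, b1, b2], 6))  # [1. 2. 3.]
```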


Example

P365.


Multiple Linear Fitting

$X_1(x), \ldots, X_M(x)$ are arbitrary fixed functions of x (they can be nonlinear), called the basis functions.

Minimizing the sum of squared residuals yields the normal equations of the least squares problem.

These can be put in matrix form and solved.


Correlation Models

1. How strong is the linear relationship between 2 variables?

2. The coefficient of correlation is used. The population correlation coefficient is denoted by ρ; its values range from −1 to +1.


Correlation

Standardized observation:

$$\frac{\text{Observation} - \text{Sample mean}}{\text{Sample standard deviation}} = \frac{x_i - \bar{x}}{s_x}$$

The sample correlation coefficient r:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
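This formula can be sketched directly in Python; as an illustration (not from the slides) it is applied here to the CPU-time data from the earlier example:

```python
import math

# Sample correlation coefficient r via standardized observations
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))  # 0.9325
```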


Coefficient of Correlation Values

[Scale from −1.0 to +1.0: values near 0 indicate no correlation; values moving from 0 toward −1.0 indicate an increasing degree of negative correlation; values moving from 0 toward +1.0 indicate an increasing degree of positive correlation.]