ETC5410: Nonparametric smoothing methods

July 2008

Rob J Hyndman
http://www.robjhyndman.com/

Outline

1 Density estimation

2 Kernel regression

3 Splines

4 Additive models

5 Functional data analysis

4. Additive models

1 Penalized regression splines

2 Mixed model representation

3 Additive models

4 Case study: electricity demand

Penalized spline regression

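Recall linear version:

$$r(x) = b(x)\beta$$

where

$$b(x) = \left[1 \;\; x \;\; (x-\kappa_1)_+ \;\; \dots \;\; (x-\kappa_K)_+\right]$$

and $\beta = [\beta_0, \beta_1, u_1, \dots, u_K]'$ is chosen to minimize

$$\|y - B\beta\|^2 \quad \text{subject to} \quad \beta' D \beta \le C$$

with $B = [b'(x_1), \dots, b'(x_n)]'$ and

$$D = \begin{bmatrix} 0_{2\times 2} & 0_{2\times K} \\ 0_{K\times 2} & I_{K\times K} \end{bmatrix}.$$

Solution: $\hat\beta_\lambda = (B'B + \lambda^2 D)^{-1} B'y$.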
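The closed-form solution can be computed directly in a few lines of base R. The sketch below is illustrative only: the helper names `pspline_basis` and `pspline_fit`, the simulated data and the equally spaced knots are assumptions made for this example, not part of the course material.

```r
# Truncated-line basis: columns 1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+
pspline_basis <- function(x, knots) {
  cbind(1, x, outer(x, knots, function(a, k) pmax(a - k, 0)))
}

# Closed-form solution (B'B + lambda^2 D)^{-1} B'y,
# where D = diag(0, 0, 1, ..., 1) penalizes only the knot coefficients u
pspline_fit <- function(x, y, knots, lambda) {
  B <- pspline_basis(x, knots)
  D <- diag(c(0, 0, rep(1, length(knots))))
  beta <- solve(crossprod(B) + lambda^2 * D, crossprod(B, y))
  list(beta = drop(beta), fitted = drop(B %*% beta))
}

# Simulated data (hypothetical, for illustration only)
set.seed(1)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
knots <- seq(0.1, 0.9, by = 0.1)

fit_rough  <- pspline_fit(x, y, knots, lambda = 0)   # unpenalized least squares
fit_smooth <- pspline_fit(x, y, knots, lambda = 10)  # heavily penalized

# Squared norm of the knot coefficients u under each fit
u_norm_rough  <- sum(fit_rough$beta[-(1:2)]^2)
u_norm_smooth <- sum(fit_smooth$beta[-(1:2)]^2)
```

Setting lambda = 0 gives ordinary least squares on the spline basis. Increasing lambda shrinks the knot coefficients u towards zero, which pulls the fit towards a straight line at the cost of a larger residual sum of squares.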

Penalized regression splines

[Figure: shipments plotted against price (500 to 850), first as a scatterplot and then with the penalized regression spline fit overlaid for lambda = 10, 40, 70, 100, 130, 160, 190, 220, 250 and 280.]

Mixed model representation

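Split the B matrix in two:

$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} (x_1-\kappa_1)_+ & \cdots & (x_1-\kappa_K)_+ \\ \vdots & \ddots & \vdots \\ (x_n-\kappa_1)_+ & \cdots & (x_n-\kappa_K)_+ \end{bmatrix}$$

and let $\beta = [\beta_0, \beta_1]'$ and $u = [u_1, \dots, u_K]'$. Then we want to minimize

$$\|y - X\beta - Zu\|^2 + \lambda^2 \|u\|^2.$$

This is equivalent to finding the Best Linear Unbiased Predictor (BLUP) of the mixed model

$$y = X\beta + Zu + \varepsilon$$

where $u_i \sim N(0, \sigma_u^2)$ and $\varepsilon_j \sim N(0, \sigma_\varepsilon^2)$.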

Advantages

Automatic penalty selection: use REML.

Easy to fit using standard software.

Easy to develop Bayesian version

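Formulas

Let $\lambda = \sigma_\varepsilon / \sigma_u$ and $V = \mathrm{Cov}(y) = \sigma_u^2 ZZ' + \sigma_\varepsilon^2 I$. Then

$$\hat\beta = (X'V^{-1}X)^{-1} X'V^{-1}y,$$

$$\hat u = \sigma_u^2 Z'V^{-1}(y - X\hat\beta).$$

V is estimated using profile log-likelihood methods.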
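The equivalence between the mixed-model formulas and the penalized least squares solution can be checked numerically. The following is a minimal sketch in base R, assuming the variance components are known (the values of `sigma_u` and `sigma_e` are made up for the example and are not REML estimates) and using simulated data.

```r
set.seed(2)
n <- 60
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
knots <- seq(0.1, 0.9, by = 0.1)          # hypothetical knot positions

X <- cbind(1, x)                                          # fixed-effects design
Z <- outer(x, knots, function(a, k) pmax(a - k, 0))       # random-effects design

# Variance components treated as known (assumed values for illustration)
sigma_u <- 2
sigma_e <- 0.3
lambda  <- sigma_e / sigma_u

# Mixed-model formulas: GLS estimate of beta and BLUP of u
V        <- sigma_u^2 * tcrossprod(Z) + sigma_e^2 * diag(n)
Vinv     <- solve(V)
beta_hat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
u_hat    <- sigma_u^2 * t(Z) %*% Vinv %*% (y - X %*% beta_hat)

# Penalized least squares with the same lambda
B     <- cbind(X, Z)
D     <- diag(c(0, 0, rep(1, length(knots))))
theta <- drop(solve(crossprod(B) + lambda^2 * D, crossprod(B, y)))

# Largest absolute difference between the two sets of coefficients
max_diff <- max(abs(c(beta_hat, u_hat) - theta))
```

The two routes should agree up to rounding error, which is what makes it possible to hand the choice of lambda to standard mixed-model software.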

Choice of knots

Provided the set of knots is relatively dense with respect to the $\{x_i\}$, the result hardly changes.

Choose enough knots to model the structure, but not so many that they cause computational problems.

RWC recommend:

$\min(m/4, 35)$ knots, where m = number of unique observations.

$\kappa_j = \left(\frac{j+1}{K+2}\right)$th sample quantile of the unique $\{x_i\}$.
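A default knot rule of this kind is easy to code. The sketch below assumes the rule is K = min(m/4, 35) knots placed at the (j+1)/(K+2) sample quantiles of the unique x values. The function name `default_knots`, the rounding down of m/4 and the guard that keeps at least one knot are choices made for this example.

```r
# Default knots: K = min(m/4, 35), placed at the (j+1)/(K+2) sample quantiles
# of the unique x values, j = 1, ..., K
default_knots <- function(x) {
  ux <- unique(x)
  K <- max(1, min(floor(length(ux) / 4), 35))   # floor() and max(1, .) are assumptions
  probs <- (seq_len(K) + 1) / (K + 2)
  quantile(ux, probs = probs, names = FALSE)
}

# Example: 40 unique x values give K = 10 interior knots
x <- seq(500, 850, length.out = 40)
knots <- default_knots(x)
```

Because the knots sit at quantiles, they are denser where the data are denser, and none of them falls on the minimum or maximum of x.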

Example

[Figure: example plot with price (500 to 850) on the horizontal axis.]

Implementation in R

library(SemiPar)   # provides spm() and f()
fit <- spm(shipments ~ f(price))
plot(fit, col="red", lwd=2,
     shade.col="yellow", rug.col="blue")
points(price, shipments, col="blue")

Additive models

One way around the curse of dimensionality is to assume the surface is additive:

$$r(x) = r_0 + \sum_{i=1}^m r_i(x_i).$$

Restricts complexity of surfaces but still allows a much richer class of surfaces than parametric models.

Need to estimate m one-dimensional functions instead of one m-dimensional function.

Usually have m different bandwidths to select when fitting an additive model.

Generalization of the multiple regression model

$$Y = \beta_0 + \sum_{i=1}^m \beta_i x_i$$

which is also additive in its predictors. However, additive models do not assume the dependence is linear.

Estimated functions, $\hat r_i$, are analogues of coefficients in linear regression.

Interpretation is easy with additive structure.

Body fat prediction

siri      Percent body fat using Siri's equation
age       Age (yrs)
weight    Weight (lbs)
height    Height (inches)
adipos    Adiposity index = Weight/Height^2 (kg/m^2)
neck      Neck circumference (cm)
chest     Chest circumference (cm)
abdom     Abdomen circumference (cm) at the umbilicus and level with the iliac crest
hip       Hip circumference (cm)
thigh     Thigh circumference (cm)
knee      Knee circumference (cm)
ankle     Ankle circumference (cm)
biceps    Extended biceps circumference (cm)
forearm   Forearm circumference (cm)
wrist     Wrist circumference (cm) distal to the styloid processes

Implementation in R

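library(SemiPar)   # provides spm() and f()
# Drop cases with height <= 50 or weight >= 300 before fitting
fat <- fat[fat$height>50 & fat$weight<300,]
attach(fat)
fit <- spm(siri ~ f(age) + f(height) + f(weight) +
           f(abdom) + f(adipos) + f(neck) +
           f(chest) + f(hip) + f(thigh))
summary(fit)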

Additive models

Estimate each function using a univariate smoother.

Any of the functions $r_i$ can be fitted linearly by using the linear regression 'smoother' for that variable.

Categorical variables are easily incorporated by fitting a constant for each level of the variable.

Can allow interactions between two variables by fitting a bivariate surface to the partial residuals.

Backfitting algorithm

Estimate additive models by estimating each function using a univariate smoother.

Backfitting algorithm is an iterative procedure for fitting additive models.

Consider the conditional expectation

$$E\Big(Y - r_0 - \sum_{j \ne k} r_j(x_j) \,\Big|\, x_k\Big) = r_k(x_k).$$

True for $k = 1, \dots, m$.

Using this result and given estimates of $r_j$ for $j = 1, \dots, m$, we compute an improved estimate of $r_k$ as follows.

Let $e_{i|k} \leftarrow y_i - r_0 - \sum_{j \ne k} r_j(x_{ij})$, $i = 1, \dots, n$, where $x_{ij}$ is the ith observation of the jth predictor.

Then $r_k \leftarrow \text{smooth}\{(x_{ik}, e_{i|k})\}$.

These improved estimates are computed for $k = 1, \dots, m$. Then the whole cycle is repeated until the individual functions don't change.

Initialize $r_j(x) = 0$, $j = 1, \dots, m$.

Alternatively, initialize $r_j(x) = \beta_j x$ where $\beta_j x$ is the term from linear regression.
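The update cycle above can be sketched in base R using `smooth.spline()` as the univariate smoother. This is an illustration rather than a reference implementation: the function name `backfit`, the fixed `df = 4` for every smoother, the estimate of r0 by the mean of y, the centring of each fitted function and the simulated data are all assumptions made for the example.

```r
backfit <- function(X, y, df = 4, tol = 1e-6, max_iter = 50) {
  n <- nrow(X)
  m <- ncol(X)
  r0 <- mean(y)                 # assumed estimate of the constant r_0
  r  <- matrix(0, n, m)         # initialize r_j(x) = 0 for all j
  for (iter in seq_len(max_iter)) {
    r_old <- r
    for (k in seq_len(m)) {
      # Partial residuals: remove r_0 and every fitted function except the kth
      e_k <- y - r0 - rowSums(r[, -k, drop = FALSE])
      sm  <- smooth.spline(X[, k], e_k, df = df)
      r[, k] <- predict(sm, X[, k])$y
      r[, k] <- r[, k] - mean(r[, k])   # centre each function (assumption)
    }
    # Stop once no fitted value has moved by more than tol in a full cycle
    if (max(abs(r - r_old)) < tol) break
  }
  list(r0 = r0, r = r, iterations = iter)
}

# Simulated additive surface with two predictors (hypothetical data)
set.seed(3)
n  <- 300
X  <- cbind(runif(n, -2, 2), runif(n, -2, 2))
f1 <- sin(X[, 1])
f2 <- X[, 2]^2
y  <- 1 + f1 + f2 + rnorm(n, sd = 0.2)

fit <- backfit(X, y)
```

The columns of `fit$r` hold the fitted functions evaluated at the data, so plotting `fit$r[, k]` against `X[, k]` shows the estimated shape for predictor k. Centring is used because the functions are only identified up to additive constants once r0 is in the model.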

Convergence

Convergence of backfitting algorithm not always guaranteed.

Convergence proven in some cases, including smoothing splines.

Unproven for locally-weighted regressions.

Seems to work well in practice.

Coplots

Graphical representation of multidimensional surface.

Plot slices of surface by plotting value of surface against one of the explanatory variables while holding all other values fixed.

If slices all show similar shape (apart from a shift up or down), then no interaction between variables. Additive model might be preferable.

If slices show changes in shape, there is interaction. Additive model is not appropriate.

Inference for Additive Models

Each fitted function can be written as a linear smoother $\hat r_i = S_i y$ for some $n \times n$ matrix $S_i$.

Assuming iid errors, $\mathrm{Cov}(\hat r_i) = S_i S_i^T \sigma^2$ where $\sigma^2 = V(Y_j)$.

Since $\hat r(x)$ is simply a linear function of the individual functions $\hat r_i$, the additive model is also a linear smoother.

Denote smoothing matrix as S:

$$\hat r(x) = Sy = \hat r_0 \mathbf{1} + \sum_{j=1}^m S_j y$$

where $\mathbf{1} = [1, 1, \dots, 1]^T$. Then $S = \sum_{i=0}^m S_i$ where $S_0$ is such that $S_0 y = \hat r_0 \mathbf{1}$.

Thus all inference results for linear smoothers may be applied to the additive model.

Degrees of freedom of additive model

Need a measure of df associated with each predictor. Let $S_{(i)}$ denote the smoothing matrix that would be obtained if we omitted $x_i$ from the predictor space. Then df due to $x_i$ is defined to be

$$\mathrm{df}_i = \mathrm{tr}(2S - SS^T) - \mathrm{tr}(2S_{(i)} - S_{(i)} S_{(i)}^T).$$

Derive approximate F tests for the ith predictor in this way.
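The df definition can be illustrated with any additive smoother whose smoothing matrix is available in closed form. The sketch below assumes an additive model built from penalized truncated-line splines and fitted jointly by penalized least squares (not by backfitting), with a single lambda and K = 5 knots per predictor at equally spaced quantiles. The function names and the simulated predictors are hypothetical.

```r
trunc_basis <- function(x, knots) outer(x, knots, function(a, k) pmax(a - k, 0))

# Smoothing matrix S = C (C'C + lambda^2 D)^{-1} C', where C holds an intercept
# and, for each predictor, a linear term plus K truncated-line terms.
# Only the truncated-line coefficients are penalized.
smoother_matrix <- function(X, lambda, K = 5) {
  C   <- matrix(1, nrow(X), 1)
  pen <- 0
  for (j in seq_len(ncol(X))) {
    knots <- quantile(X[, j], probs = seq_len(K) / (K + 1), names = FALSE)
    C   <- cbind(C, X[, j], trunc_basis(X[, j], knots))
    pen <- c(pen, 0, rep(1, K))
  }
  C %*% solve(crossprod(C) + lambda^2 * diag(pen), t(C))
}

# tr(2S - SS^T)
df_model <- function(S) sum(diag(2 * S - tcrossprod(S)))

set.seed(4)
n <- 200
X <- cbind(runif(n), runif(n))

S    <- smoother_matrix(X, lambda = 1)                     # both predictors
S_1  <- smoother_matrix(X[, 2, drop = FALSE], lambda = 1)  # x_1 omitted
df_1 <- df_model(S) - df_model(S_1)                        # df due to x_1
```

Here `df_1` measures how much flexibility the first predictor adds to the fit. It lies between the 1 df of a purely linear term and the K + 1 = 6 df of an unpenalized spline only in typical cases, so it is best read as an effective, not exact, parameter count.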

Generalised additive models

What if Y is non-Gaussian (e.g. binary (0,1))? Want to extend additive models in the same way as linear models have been extended to generalised linear models.

A generalised additive model (GAM) is defined by specifying:

1 distribution of response variable

2 link function: $g(\mu) = r_0 + \sum_{j=1}^m r_j(x_j)$ where $\mu = E(Y \mid x_1, \dots, x_m)$

Examples:

Y binary and $g(\mu) = \log[\mu/(1-\mu)]$. This is a logistic additive model.

Y normal and $g(\mu) = \mu$. This is a standard additive model.

Estimation

Hastie and Tibshirani describe a method for fitting GAMs known as "local scoring", which is an extension of the Fisher scoring procedure.
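To give a feel for the idea, here is a minimal sketch of a local-scoring style loop for a logistic additive model with a single predictor. It replaces the weighted least squares step of Fisher scoring with a weighted smooth of the working response. It is a simplified illustration and not Hastie and Tibshirani's full algorithm: the use of `smooth.spline()` with a fixed `df = 5`, the stopping rule and the simulated data are assumptions made for the example.

```r
set.seed(5)
n <- 500
x <- runif(n, -2, 2)
p_true <- plogis(2 * sin(x))          # true P(Y = 1 | x), hypothetical
y <- rbinom(n, 1, p_true)

eta <- rep(qlogis(mean(y)), n)        # start from a constant on the logit scale
for (iter in 1:25) {
  mu <- plogis(eta)                   # inverse logit link
  w  <- mu * (1 - mu)                 # Fisher scoring weights
  z  <- eta + (y - mu) / w            # working response
  sm <- smooth.spline(x, z, w = w, df = 5)   # weighted smooth replaces WLS
  eta_new <- predict(sm, x)$y
  converged <- max(abs(eta_new - eta)) < 1e-6
  eta <- eta_new
  if (converged) break
}
mu_hat <- plogis(eta)                 # fitted probabilities
```

With several predictors, the single weighted smooth inside the loop would be replaced by a weighted backfitting cycle over all the $r_j$.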

Case study: electricity demand

Forecasting electricity demand

The problem

We want to forecast the peak electricity demand in a half-hour period in ten years' time.

We have ten years of half-hourly electricity data, temperature data and some economic and demographic data.

The location is South Australia: home to the most volatile electricity demand in the world.

Sounds impossible?

Demand data

[Figure: South Australian operational demand (GW), summer 06/07 (Nov 06 to Mar 07).]

[Figure: SA demand (GW) for the first 3 weeks of January 2007, by date in January.]

[Figure: Demand boxplots.]

[Figure: Temperature data.]

Demand drivers

calendar effects

prevailing weather conditions (and the timing of those conditions)

climate changes

economic and demographic changes

changing technology

Modelling framework

Semi-parametric additive models with correlated errors.

Each half-hour period modelled separately.

Variables selected to provide best out-of-sample predictions for 2005/06 summer.

Equations

$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^J c_j z_{j,t} + n_t$$

$y_{t,p}$ denotes demand at time t (measured in half-hourly intervals) during period p, $p = 1, \dots, 48$;

$h_p(t)$ models all calendar effects;

$f_p(w_{1,t}, w_{2,t})$ models all temperature effects, where $w_{1,t}$ is a vector of recent temperatures at Kent Town and $w_{2,t}$ is a vector of recent temperatures at Adelaide airport;

$z_{j,t}$ is a demographic or economic variable at time t;

$n_t$ denotes the model error at time t.

Equations

$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$

$h_p(t)$ handles annual, weekly and daily seasonal patterns as well as public holidays:

$$h_p(t) = \ell_p(t) + \alpha_{t,p} + \beta_{t,p} + \gamma_{t,p} + \delta_{t,p}$$

$\ell_p(t)$ is the "time of summer" effect (a regression spline);

$\alpha_{t,p}$ is the day of week effect;

$\beta_{t,p}$ is the "holiday" effect;

$\gamma_{t,p}$ is the New Year's Eve effect;

$\delta_{t,p}$ is the millennium effect.
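
To make the calendar terms concrete, here is a minimal sketch of how the categorical predictors behind the day of week, holiday and New Year's Eve effects might be derived from a date. The function names, the four holiday levels (Normal, Day before, Holiday, Day after), the precedence among them, and the one-date holiday list are all assumptions for illustration. The time-of-summer spline and the millennium effect are left out.

```python
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def holiday_status(day, holidays):
    """Classify a date as Holiday, Day before, Day after or Normal.

    Assumption: when several labels apply, they are checked in this order.
    """
    if day in holidays:
        return "Holiday"
    if day + timedelta(days=1) in holidays:
        return "Day before"
    if day - timedelta(days=1) in holidays:
        return "Day after"
    return "Normal"


def calendar_factors(day, holidays):
    """Return the categorical calendar predictors for one date."""
    return {
        "day_of_week": DAY_NAMES[day.weekday()],
        "holiday": holiday_status(day, holidays),
        "new_years_eve": day.month == 12 and day.day == 31,
    }


# Hypothetical holiday list containing a single date (Australia Day 2006).
holidays = {date(2006, 1, 26)}
for offset in range(-1, 3):
    d = date(2006, 1, 26) + timedelta(days=offset)
    print(d, calendar_factors(d, holidays))
```

In a fitted model each level of these factors would receive its own coefficient for every period $p$, which is what the subscripts on $\alpha_{t,p}$, $\beta_{t,p}$ and $\gamma_{t,p}$ indicate.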

Fitted results (3pm)

[Figure: three panels showing the effect on demand against day of summer (0 to 150), day of week (Mon to Sun), and holiday status (Normal, Day before, Holiday, Day after). Time: 3:00 pm.]

Equations

$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$

$$f_p(w_{1,t}, w_{2,t}) = \sum_{k=0}^{6} \left[ f_{k,p}(x_{t-k}) + g_{k,p}(d_{t-k}) \right] + q_p(x_t^+) + r_p(x_t^-) + s_p(\bar{x}_t) + \sum_{j=1}^{6} \left[ F_{j,p}(x_{t-48j}) + G_{j,p}(d_{t-48j}) \right]$$

$x_t$ is the average temperature across sites at time $t$;

$d_t$ is the temperature difference between sites at time $t$;

$x_t^+$ is the maximum of the $x_t$ values in the past 24 hours;

$x_t^-$ is the minimum of the $x_t$ values in the past 24 hours;

$\bar{x}_t$ is the average temperature in the past seven days.

Each function is smooth and estimated using regression splines.
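
The temperature inputs defined above can be sketched in a few lines. This is an illustrative construction, not the code behind the slides. The function name, the sign convention for the site difference (Kent Town minus airport), and the reading of "past 24 hours" and "past seven days" as the 48 and 336 half-hours ending at and including $t$ are assumptions.

```python
def temperature_predictors(kent_town, airport, t):
    """Build the temperature inputs of f_p for half-hour index t.

    Assumptions: both series are half-hourly lists of equal length,
    d_t is Kent Town minus airport, and "past 24 hours" / "past seven
    days" mean the 48 / 336 half-hours ending at (and including) t.
    """
    if t < 7 * 48 - 1:
        raise ValueError("need at least seven days of history before t")

    x = [(a + b) / 2 for a, b in zip(kent_town, airport)]  # average temp
    d = [a - b for a, b in zip(kent_town, airport)]        # site difference

    last_day = x[t - 47 : t + 1]
    last_week = x[t - 335 : t + 1]
    return {
        "x_lags": [x[t - k] for k in range(0, 7)],           # x_{t-k}, k = 0..6
        "d_lags": [d[t - k] for k in range(0, 7)],           # d_{t-k}, k = 0..6
        "x_day_lags": [x[t - 48 * j] for j in range(1, 7)],  # x_{t-48j}, j = 1..6
        "d_day_lags": [d[t - 48 * j] for j in range(1, 7)],  # d_{t-48j}, j = 1..6
        "x_max": max(last_day),                              # max over past 24 hours
        "x_min": min(last_day),                              # min over past 24 hours
        "x_bar": sum(last_week) / len(last_week),            # seven-day average
    }


# Made-up temperatures: a steadily rising series with a constant 2 degree gap.
kent_town = [float(i) for i in range(500)]
airport = [float(i) - 2.0 for i in range(500)]
print(temperature_predictors(kent_town, airport, t=400))
```

Each entry of the returned dictionary would be passed through its own smooth function, so the model has one regression spline per lag rather than a single spline of current temperature.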

Fitted results (3pm)

[Figure: grid of panels, each showing the effect on demand against one temperature predictor: Temperature; Lag 1 to Lag 5 temperature; Lag 1 day and Lag 2 day temperature; Last week average temp; Previous max temp; Previous min temp; Temperature differential; Lag 1 to Lag 6 temp differential; Lag 1 day temp differential. Time: 3:00 pm.]

Equations

$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$

Other variables described by linear relationships with coefficients $c_1, \dots, c_J$.

Estimation based on annual data.

Variable              Coefficient   Std. Error   t value    P value
Intercept                −0.13981      0.04338    −3.222   0.018094
Gross State Product       0.01684      0.00108    15.649   0.000004
Lag Price                −0.04957      0.00727    −6.818   0.000488
Cooling Degree Days       0.36300      0.01716    21.157   0.000001
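
As a quick consistency check on the table, each t value should equal the coefficient divided by its standard error. The sketch below recomputes the ratios from the printed figures. The dictionary layout is just a convenience for the example.

```python
# (coefficient, standard error, reported t value), copied from the table.
rows = {
    "Intercept": (-0.13981, 0.04338, -3.222),
    "Gross State Product": (0.01684, 0.00108, 15.649),
    "Lag Price": (-0.04957, 0.00727, -6.818),
    "Cooling Degree Days": (0.36300, 0.01716, 21.157),
}

# t value = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (coef, se, reported) in rows.items():
    print(f"{name:20s} computed t = {t_ratios[name]:8.3f}   reported t = {reported:8.3f}")
```

The recomputed ratios agree with the reported t values to within about 0.1. Small differences are expected because the printed coefficients and standard errors are rounded.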

Predictions

[Figure: annual demand by year, 1998 to 2006, actual versus fitted.]

[Figure: R-squared (%) by time of day, from 12 midnight through to 12 midnight.]

[Figure: two time series panels, actual demand and predicted demand, 1998 to 2006.]

[Figure: scatterplot of actual demand against predicted demand.]

Peak demand forecasting

$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$

Multiple alternative futures created by:

resampling residuals using a seasonal bootstrap;

generating simulations of future temperature patterns based on seasonally bootstrapping past temperatures (with some adjustment for extremes and climate change);

using assumed values for GSP and Price.
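
The residual-resampling step can be sketched as follows. The slides do not spell out the block structure of the seasonal bootstrap, so this example assumes one simple variant: whole days of residuals (48 half-hours) are drawn with replacement, which keeps the within-day correlation of the errors. The function name, the inputs and all numbers are made up for illustration, and the fitted values are held fixed rather than driven by simulated temperatures or assumed GSP and Price.

```python
import math
import random


def simulate_peaks(fitted_log, residual_days, n_sims, seed=0):
    """Simulate seasonal peak demand by resampling whole-day residual blocks.

    fitted_log    : list of days, each a list of 48 fitted log-demand values
    residual_days : list of past days, each a list of 48 model residuals
    Returns a list of n_sims simulated maxima on the original demand scale.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_sims):
        peak = 0.0  # demand is positive, so 0 is a safe starting value
        for day in fitted_log:
            block = rng.choice(residual_days)  # one past day, drawn with replacement
            for fit, res in zip(day, block):
                peak = max(peak, math.exp(fit + res))  # back-transform from logs
        peaks.append(peak)
    return peaks


# Made-up inputs: a 10-day "season" with a flat fitted level of 2 GW,
# and three past days of residuals on the log scale.
fitted_log = [[math.log(2.0)] * 48 for _ in range(10)]
residual_days = [[-0.05] * 48, [0.0] * 48, [0.10] * 48]

peaks = sorted(simulate_peaks(fitted_log, residual_days, n_sims=200))
print("median simulated peak (GW):", peaks[len(peaks) // 2])
```

The collection of simulated maxima is what gives a distribution for peak demand rather than a single point forecast.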

Peak demand distribution

[Figure: densities of demand (GW) under Low, Base and High scenarios, with one curve per year from 2007/2008 to 2017/2018.]

[Figure: densities of annual maximum demand (GW) under Low, Base and High scenarios, with one curve per year from 2007/2008 to 2017/2018.]

Peak demand distribution

[Figure: plot against year (2000 to 2015) with legend "Prob of exceedance in one year" at 90%, 50%, 10% and 2%, and points labelled "Recommended".]