Master's Thesis Slides

Approximate Dynamic Programming Methods for Residential Water Heating

A thesis submitted in partial fulfillment for the degree of Master of Science in the

Department of Electrical Engineering

by Matthew Motoki

December 3, 2015

Outline

1 Motivation

2 Problem Formulation
   System Variables
   State Dynamics
   Objective Function
   Problem Statement

3 Methodology
   Finite-Horizon DP
   Average Cost DP for Periodic MDPs
   Approximate Dynamic Programming
   Prescient Lower Bound (PLB)

4 Results
   Numerical Simulations Setup
   Set-Point Methods
   Temperature Aggregation
   Usage History Aggregation

5 Problem Extensions
   Solar Water Heating
   Automated Demand Response

6 Conclusion


Motivation

Why do we need a smarter water heater?

• Energy efficiency is important.
   - Electricity is expensive.
   - Burning fossil fuels is bad for the environment.

• Can we do better than water heaters with an adjustable set-point?
   - If so, are there any provable guarantees that can be made?
   - Theoretically, what is the best that we can do?

• The legacy grid is becoming obsolete.
   - Renewable energy sources are variable and distributed.
   - Energy storage capabilities of water heaters have been fully exploited.


Problem Formulation

State Variable

Define $t \in \{0, \Delta t, \ldots, (N-1)\Delta t\}$. Define $t_k = \operatorname{mod}(k, N)\,\Delta t$, where $k = 0, 1, \ldots$ is the simulation time stage.

The state $x := (T, h)$ summarizes the information needed to make a decision.

We require $T \in [T_{\mathrm{amb}}, T_{\max}]$. The temperature at $t_k$ is written $T_k$.

The hot water usage history is $h_k := \{(t_i, w_i) \mid 0 \le i < \operatorname{mod}(k, N)\}$, where $w_k$ is the intensity of the hot water draw at time $t_k$.


Decision Variable

The decision variable is

$$u_k := \begin{cases} 1, & \text{if the water heater is on} \\ 0, & \text{if the water heater is off.} \end{cases}$$

We assume that the decision $u_k$ is constant during the interval $[t_k, t_{k+1})$. A feasible decision $u_k \in \Omega_u$ is one that does not violate $T \in [T_{\mathrm{amb}}, T_{\max}]$. A policy $\mu$ is a mapping from a state into a feasible decision.


Hot Water Demand (Disturbance Variable) 1

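We model hot water demand as a cyclostationary random process $W(t)$ given by

$$W(t) := \text{specific heat} \cdot \sum_{\tau \in \Omega_\tau} \sum_{i=1}^{N_{\mathrm{people}}} \sum_{j=1}^{N_{\tau i}} F^{(j)}_{\tau,i} \cdot \left(T^{(j)}_{\tau,i} - T_{\mathrm{amb}}\right) \cdot I\left\{S^{(j)}_{\tau,i} \le t < S^{(j)}_{\tau,i} + D^{(j)}_{\tau,i}\right\},$$

where $\Omega_\tau := \{\text{shower}, \text{bath}, \ldots, \text{dishwasher}\}$ is the set of possible usage events, $N_{\mathrm{people}}$ is the number of people in a household, $N_{\tau i}$ is the number of events of type $\tau$ corresponding to the $i$-th person in the household, and the following are random variables:

$S^{(j)}_{\tau,i}$ := the start time of $E^{(j)}_{\tau,i}$,

$D^{(j)}_{\tau,i}$ := the duration of $E^{(j)}_{\tau,i}$,

$F^{(j)}_{\tau,i}$ := the flow rate of $E^{(j)}_{\tau,i}$,

$T^{(j)}_{\tau,i}$ := the desired temperature of $E^{(j)}_{\tau,i}$.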


Hot Water Demand (Disturbance Variable) 2

We can only observe $W(t)$ at pre-specified times $t \in \Omega_t$; therefore, we approximate $W(t)$ using the piecewise linear interpolation

$$\widetilde{W}(t) := W(t_k) + \frac{t - t_k}{\Delta t}\left[W(t_k + \Delta t) - W(t_k)\right]$$

for all $k = 0, 1, \ldots$ and $t \in [t_k, t_k + \Delta t)$. We define the discrete-time analog of $W(t)$ to be the average of $\widetilde{W}(t)$ over $t \in [t_k, t_k + \Delta t)$,

$$W_k := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \widetilde{W}(t)\, dt = \tfrac{1}{2}\left[W(t_k) + W(t_k + \Delta t)\right].$$

We denote particular realizations of $W(t)$ and $W_k$ using $w(t)$ and $w_k$, respectively. We discretize $w_k \in \{0, \Delta w, \ldots, w_{\max}\}$. We write the conditional probability mass function of $W_k$ given $h_k$ as $p_{W_k}(w_k \mid h_k)$.
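The averaging and discretization steps above can be sketched in a few lines of Python. This is a minimal sketch, not the thesis implementation: the function name `discretize_demand` is hypothetical, and snapping each average to the nearest grid level (with clipping to `w_max`) is an assumption, since the slide does not say how the discretization is carried out.

```python
def discretize_demand(samples, dw, w_max):
    """Turn samples W(t_0), W(t_1), ... into discretized stage demands w_k.

    Each w_k is the average of two adjacent samples (the exact integral of the
    piecewise linear interpolant over one step), snapped to the nearest level
    of the grid {0, dw, ..., w_max}. Nearest-level snapping is an assumption.
    """
    levels = []
    for w_left, w_right in zip(samples, samples[1:]):
        w_avg = 0.5 * (w_left + w_right)       # W_k = (W(t_k) + W(t_k + dt)) / 2
        level = round(w_avg / dw) * dw         # nearest grid level
        levels.append(min(max(level, 0.0), w_max))
    return levels
```

For example, the samples `[0.0, 2.0, 4.0, 0.0]` with `dw = 1.0` and `w_max = 3.0` give the stage demands `[1.0, 3.0, 2.0]`.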


State Equation

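The state equation maps the current state $x_k$, current decision $u_k$, and current disturbance $w_k$ into the next state $x_{k+1}$ according to

$$x_{k+1} = f(x_k, u_k, w_k) := \left(f_T(T_k, u_k, w_k),\; f_h(t_k, h_k, w_k)\right),$$

where

$$T_{k+1} = f_T(T_k, u_k, w_k) := \max\left\{T_k - r_{\mathrm{cool}} \Delta t\, (T_k - T_{\mathrm{amb}}) + r_{\mathrm{heat}} \Delta t\, u_k - r_{\mathrm{loss}} \Delta t\, w_k,\; T_{\mathrm{amb}}\right\},$$

$$h_{k+1} = f_h(t_k, h_k, w_k) := \begin{cases} \{(t_k, w_k)\} \cup h_k, & t_k \ne (N-1)\Delta t \\ \emptyset, & \text{otherwise,} \end{cases}$$

for all $k = 0, 1, \ldots$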
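As an illustration, the two components of the state equation can be written as plain Python functions. This is a sketch under stated assumptions: the function names and argument order are hypothetical, the usage history is stored as a tuple of `(time, draw)` pairs, and the end-of-cycle test uses the stage index `k` instead of comparing floating-point times.

```python
def f_T(T, u, w, dt, r_cool, r_heat, r_loss, T_amb):
    """Temperature update: cooling, heating, and draw losses, floored at T_amb."""
    T_next = T - r_cool * dt * (T - T_amb) + r_heat * dt * u - r_loss * dt * w
    return max(T_next, T_amb)


def f_h(k, h, w, N, dt):
    """Usage-history update: append (t_k, w_k), or reset at the last stage of a cycle."""
    if k % N == N - 1:          # equivalent to t_k == (N - 1) * dt
        return ()
    t_k = (k % N) * dt
    return h + ((t_k, w),)


def f(x, u, w, k, N, dt, **thermal):
    """Full state update x_{k+1} = (f_T(...), f_h(...)) for the state x = (T, h)."""
    T, h = x
    return f_T(T, u, w, dt, **thermal), f_h(k, h, w, N, dt)
```

The thermal coefficients are passed in by the caller; any numeric values used with this sketch are illustrative and not taken from the thesis.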


Objective Function

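The objective is to minimize, over all policies $\mu$, the following function

$$J_\mu(x_0) = \lim_{K \to \infty} E_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g(X_k, \mu(X_k), W_k; \theta) \,\middle|\, x_0\right] = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} E_{W_0, W_1, \ldots, W_k}\left[g(X_k, \mu(X_k), W_k; \theta) \,\middle|\, x_0\right],$$

where $X_0 = x_0$ is given and $X_k = f(X_{k-1}, \mu(X_{k-1}), W_{k-1})$ for all $k = 1, 2, \ldots$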


Stage Cost

The stage cost is

$$g(x_k, u_k, w_k; \theta) := \alpha\, g_{\mathrm{discomfort}}(x_k, u_k, w_k; T_{\min}) + (1 - \alpha)\, g_{\mathrm{operating}}(x_k, u_k),$$

where $\theta := \{\alpha, T_{\min}\}$ is a customer-defined parameter set, $\alpha \in [0, 1]$ is the relative weighting of the objectives, and $T_{\min}$ is the minimum desirable temperature during a hot water use.

Operating Cost

The operating cost is

$$g_{\mathrm{operating}}(x_k, u_k) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} C(t)\, \mathrm{rating}\; u_k\, dt,$$

where $C(t)$ is the cost of power and $\mathrm{rating}$ is the power rating of the water heater.


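Discomfort Cost

The discomfort cost is

$$g_{\mathrm{discomfort}}(x_k, u_k, w_k; T_{\min}) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \max\{T_{\min} - T(t),\, 0\} \cdot I\{w(t) > 0\}\, dt,$$

where

$$T(t) := T_k + \frac{t - t_k}{\Delta t}\left[f_T(T_k, u_k, w_k) - T_k\right],$$

for all $k = 0, 1, \ldots$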
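The discomfort integral has no draw-dependent closed form in general, so a numerical sketch may help. The code below approximates it with a midpoint rule along the linear temperature interpolation and combines it with the operating cost into the weighted stage cost. Two simplifying assumptions are made and are not from the slides: the draw indicator is taken to be on for the whole interval whenever `w_k > 0`, and the price of power is held constant over the interval. All function names are hypothetical.

```python
def discomfort_cost(T_k, T_next, w_k, T_min, n_sub=1000):
    """Average shortfall max(T_min - T(t), 0) over one stage, midpoint rule.

    Assumes the draw is active during the whole interval iff w_k > 0.
    """
    if w_k <= 0:
        return 0.0
    total = 0.0
    for j in range(n_sub):
        s = (j + 0.5) / n_sub                 # fraction of the interval elapsed
        T_t = T_k + s * (T_next - T_k)        # linear interpolation T(t)
        total += max(T_min - T_t, 0.0)
    return total / n_sub


def operating_cost(u_k, price, rating):
    """Average of C(t) * rating * u_k over one stage for a constant price."""
    return price * rating * u_k


def stage_cost(T_k, T_next, u_k, w_k, alpha, T_min, price, rating):
    """Weighted sum alpha * discomfort + (1 - alpha) * operating."""
    return (alpha * discomfort_cost(T_k, T_next, w_k, T_min)
            + (1.0 - alpha) * operating_cost(u_k, price, rating))
```

For instance, a stage that cools linearly from 40 to 30 with `T_min = 35` during a draw is below the threshold for half the interval, with an average shortfall of 1.25.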


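Problem Statement

Find a feasible on/off policy that minimizes an expected objective cost.

$$\begin{aligned} \underset{\mu}{\text{minimize}} \quad & \lim_{K \to \infty} E_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g(X_k, \mu(X_k), W_k; \theta) \,\middle|\, x_0\right] \\ \text{subject to} \quad & X_{k+1} = f(X_k, \mu(X_k), W_k), \quad \mu(x_k) \in \{0, 1\}, \\ & T_k \in [T_{\mathrm{amb}}, T_{\max}], \quad \text{for all } k = 0, 1, \ldots \end{aligned}$$

This is a discrete-time, average cost periodic Markov decision problem (MDP).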


Methodology

Finite-Horizon Dynamic Programming

The goal is to minimize, over all policies $\mu$, the following function

$$J_\mu(x_0) = E_W\left[g_{\mathrm{terminal}}(X_M) + \sum_{k=0}^{M-1} g(X_k, \mu_k(X_k), W_k; \theta) \,\middle|\, x_0\right],$$

where $M$ is the horizon and $g_{\mathrm{terminal}}$ is a terminal cost function.

The optimal policy $\mu^*$ attains the minimum in Bellman's equations

$$J^*(x_M) = g_{\mathrm{terminal}}(x_M),$$

$$J^*(x_k) = \min_{u_k \in \{0, 1\}} E_{W_k}\left[g(x_k, u_k, W_k; \theta) + J^*\big(f(x_k, u_k, W_k)\big) \,\middle|\, x_k\right],$$

where $J^*$ is known as the optimal cost-to-go function.
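Bellman's equations above translate directly into a backward recursion over a finite state set. The following is a minimal sketch on a hypothetical toy model (three temperature levels, a two-point demand distribution, and made-up costs); none of these numbers come from the thesis, and the demand distribution is taken to be independent of the state for brevity.

```python
def backward_induction(states, controls, demand_pmf, f, g, g_terminal, M):
    """Return (J_0, policy), where policy[k][x] is the minimizing decision at stage k."""
    J = {x: g_terminal(x) for x in states}          # J*(x_M) = g_terminal(x_M)
    policy = [None] * M
    for k in reversed(range(M)):
        J_k, mu_k = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls:
                # Expectation over the discretized demand W_k.
                val = sum(p * (g(x, u, w) + J[f(x, u, w)])
                          for w, p in demand_pmf.items())
                if val < best_val:
                    best_u, best_val = u, val
            J_k[x], mu_k[x] = best_val, best_u
        J, policy[k] = J_k, mu_k
    return J, policy


# Hypothetical toy model: heating raises the level by one, a draw lowers it by one.
def toy_f(x, u, w):
    return min(max(x + u - w, 0), 2)


def toy_g(x, u, w):
    # Unit energy cost when on, plus a penalty for a draw from a cold tank.
    return 1.0 * u + (10.0 if (w == 1 and x == 0) else 0.0)


J0, policy = backward_induction(
    states=[0, 1, 2], controls=[0, 1], demand_pmf={0: 0.5, 1: 0.5},
    f=toy_f, g=toy_g, g_terminal=lambda x: 0.0, M=2)
```

In this toy model the first-stage policy heats from the two lower levels (to avoid the cold-draw penalty at the next stage) and stays off at the top level, while the last-stage policy never heats because heating only pays off one stage later.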


Average Cost Dynamic Programming for Periodic MDPs

Relative value iteration (VI) can be used to solve average cost periodic MDPs.

1. Initialize $J$ and $\mu$ arbitrarily and fix a reference state $x_{\mathrm{ref}}$.

2. Calculate the new cost-to-go function $J'$ by solving an $N$-horizon MDP using $J(x_0)$ as the terminal cost function.

3. Update the current cost-to-go function using $J(x_k) \leftarrow J'(x_k) - J'(x_{\mathrm{ref}})$.

4. Repeat steps 2 and 3 until convergence is achieved.

The relative value iteration algorithm terminates with $J$ being a differential cost function, interpreted as the minimum expected $N$-stage costs relative to the reference state $x_{\mathrm{ref}}$; furthermore, the offset $J'(x_{\mathrm{ref}})$ subtracted in step 3 is interpreted as the average cost of completing a cycle.
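The four steps above can be sketched as follows. This is an illustrative sketch on a hypothetical two-state, two-stage toy model (cold/hot tank, cheap and expensive price periods), not the thesis code. The stage cost `g` takes the stage-in-cycle index `t` so that prices can vary within a cycle, and the function returns both the differential cost `J` and the offset `J'(x_ref)` subtracted in the last sweep.

```python
def relative_value_iteration(states, controls, demand_pmf, f, g, N, x_ref,
                             tol=1e-9, max_iter=1000):
    """Relative VI for a periodic MDP with cycle length N.

    Returns (J, gain): the differential cost over start-of-cycle states and
    the per-cycle offset J'(x_ref) from the final sweep.
    """
    J = {x: 0.0 for x in states}
    gain = 0.0
    for _ in range(max_iter):
        V = dict(J)                                   # terminal cost of the N-stage problem
        for t in reversed(range(N)):                  # one backward sweep over a cycle
            V = {x: min(sum(p * (g(t, x, u, w) + V[f(x, u, w)])
                            for w, p in demand_pmf.items())
                        for u in controls)
                 for x in states}
        gain = V[x_ref]
        J_new = {x: V[x] - gain for x in states}      # J <- J' - J'(x_ref)
        diff = max(abs(J_new[x] - J[x]) for x in states)
        J = J_new
        if diff < tol:
            break
    return J, gain


# Hypothetical toy model: state 0 = cold, 1 = hot.
def toy_f(x, u, w):
    # Heating makes the tank hot; an unheated draw makes it cold.
    return 1 if u == 1 else (0 if w == 1 else x)


PRICE = [1.0, 2.0]                                     # cheap stage, expensive stage


def toy_g(t, x, u, w):
    return PRICE[t] * u + (4.0 if (w == 1 and x == 0) else 0.0)


J, gain = relative_value_iteration(
    states=[0, 1], controls=[0, 1], demand_pmf={0: 0.5, 1: 0.5},
    f=toy_f, g=toy_g, N=2, x_ref=0)
```

On this toy model the iteration settles after two sweeps: the policy heats during the cheap stage and coasts through the expensive one, for an average cost of 2 per cycle, and starting a cycle hot is worth 2 cost units relative to starting cold.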


Approximate Dynamic Programming (ADP)

Exact dynamic programming is hard because of the large state space; in particular, $T_k$ is continuous and the dimension of $h_k$ increases at every stage (except the last stage of a cycle). Simplify the model to get a more tractable problem.

1. Temperature Aggregation

2. Usage History Aggregation

3. Approximate Transition Probabilities Using Density Estimation

4. Q-Learning


Temperature Aggregation

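Discretize temperature $T \in \{T_{\mathrm{amb}}, T_{\mathrm{amb}} + \Delta T, \ldots, T_{\mathrm{amb}} + (n-1)\Delta T\}$. Let $A(T)$ be the following random function of $T$:

$$A(T) := \begin{cases} \operatorname{sgn}(T - \bar{T})\,\Delta T, & \text{w.p. } |T - \bar{T}|/\Delta T \\ 0, & \text{w.p. } 1 - |T - \bar{T}|/\Delta T, \end{cases}$$

where $\bar{T} = \operatorname{round}(T/\Delta T)\,\Delta T$. The aggregate problem has the following modified thermodynamics

$$\tilde{T}_{k+1} = \tilde{f}_T(\tilde{T}_k, u_k, w_k) := \operatorname{round}(T_{k+1}/\Delta T)\,\Delta T + A(T_{k+1}),$$

where $T_{k+1} = f_T(\tilde{T}_k, u_k, w_k)$. Here $\bar{T}$ denotes the rounded temperature and tildes denote quantities of the aggregate problem.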
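The random function A(T) amounts to a randomized rounding: the temperature is rounded to the nearest grid level and then, with probability proportional to the rounding error, pushed one level in the direction of that error, so the aggregated temperature equals the true temperature in expectation. A minimal sketch follows; the function name and the `rng` argument are hypothetical.

```python
import math
import random


def aggregate_temperature(T, dT, rng=random):
    """Randomized rounding of T onto a grid with spacing dT.

    Returns round(T / dT) * dT + A(T), where A(T) moves one grid step toward T
    with probability |T - T_bar| / dT and is zero otherwise.
    """
    T_bar = round(T / dT) * dT
    gap = T - T_bar
    if rng.random() < abs(gap) / dT:
        return T_bar + math.copysign(dT, gap)
    return T_bar
```

For example, with `dT = 2.0` the temperature 40.5 maps to 40.0 with probability 0.75 and to 42.0 with probability 0.25, which averages to 40.5.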


Usage History Aggregation

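Here the goal is to find a low-dimensional feature vector $\phi_k$ such that $p_{W_k}(w_k \mid h_k) \approx p_{W_k}(w_k \mid \phi_k)$. We are interested in $\phi_k$ with simple update rules $\phi_{k+1} = f_\phi(\phi_k, w_k)$. For example,

$$\phi^{(1a)}_k = I\{w_{k-1} > 0\}, \qquad \phi^{(1a)}_{k+1} = I\{w_k > 0\},$$

$$\phi^{(2a)}_k = \sum_{i = i_{\mathrm{StartUse}}}^{k-1} I\{w_i > 0\}, \qquad \phi^{(2a)}_{k+1} = I\{w_k \ne 0\} \cdot \left(\phi^{(2a)}_k + I\{w_k > 0\}\right),$$

$$\phi^{(3a)}_k = \sum_{i = i_{\mathrm{StartCycle}}}^{k-1} I\{w_i > 0\}, \qquad \phi^{(3a)}_{k+1} = I\{\operatorname{mod}(k+1, N) \ne 0\} \cdot \left(\phi^{(3a)}_k + I\{w_k > 0\}\right).$$

The aggregate problem uses $\tilde{x}_k = (\tilde{T}_k, t_k, \phi_k)$ in place of $x_k$.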
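The three example features can be updated together in a single step. The sketch below is illustrative: the function name is hypothetical, and the cycle-count feature is assumed to reset whenever stage k + 1 begins a new cycle, which is what its definition as a sum starting at the first stage of the current cycle requires.

```python
def update_features(phi, w, k, N):
    """One-step update of (phi_1a, phi_2a, phi_3a) given the draw w at stage k.

    phi_1a: was hot water used at the previous stage?
    phi_2a: number of consecutive stages with use, ending at the previous stage.
    phi_3a: number of stages with use since the start of the current cycle
            (assumed to reset when stage k + 1 starts a new cycle).
    """
    used = 1 if w > 0 else 0
    phi_1a = used
    phi_2a = used * (phi[1] + used)
    phi_3a = 0 if (k + 1) % N == 0 else phi[2] + used
    return (phi_1a, phi_2a, phi_3a)
```

Starting from `(0, 0, 0)` with `N = 4` and the draws `0, 1, 1, 0, 1`, the features evolve as `(0, 0, 0)`, `(1, 1, 1)`, `(1, 2, 2)`, `(0, 0, 0)`, `(1, 1, 1)`.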


Approximate Transition Probabilities Using Density Estimation

• A closed-form expression for $p_W$ is hard to find.

• Use kernel density estimation to get an estimate of $p_{W_k}(w_k \mid h_k)$.

• Estimation of high-dimensional pdfs is difficult, so use usage history aggregation to estimate $p_{W_k}(w_k \mid \phi_k)$ instead.

• Use the estimate of $p_{W_k}(w_k \mid \phi_k)$ to calculate the transition probabilities $\Pr\left[\tilde{f}_T(\tilde{T}_k, u_k, W_k) = \tilde{T}_{k+1} \mid \tilde{T}_k, \phi_k, u_k\right]$ and $\Pr\left[f_\phi(\phi_k, W_k) = \phi_{k+1} \mid \phi_k\right]$.
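One simple way to turn a kernel density estimate into a conditional probability mass function on the demand grid is sketched below. It is an assumption-laden illustration, not the estimator used in the thesis: observations are grouped by their feature value, a Gaussian kernel sum is evaluated at each grid level, and the result is normalized to sum to one. The function name, the Gaussian kernel, and the fixed bandwidth are all choices made for this sketch.

```python
import math
from collections import defaultdict


def kde_conditional_pmf(pairs, grid, bandwidth):
    """Estimate p(w | phi) on a demand grid from observed (phi, w) pairs.

    For each feature value phi, evaluates a Gaussian kernel sum at every grid
    level and normalizes over the grid.
    """
    draws_by_phi = defaultdict(list)
    for phi, w in pairs:
        draws_by_phi[phi].append(w)

    pmf = {}
    for phi, draws in draws_by_phi.items():
        weights = [sum(math.exp(-0.5 * ((w_grid - w) / bandwidth) ** 2) for w in draws)
                   for w_grid in grid]
        total = sum(weights)
        pmf[phi] = {w_grid: wt / total for w_grid, wt in zip(grid, weights)}
    return pmf
```

Given such a table, the transition probabilities of the aggregate problem follow by summing the probabilities of all grid demands that lead to the same next state.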


Model-Free Q-Learning

• Model-Free Q-Learning involves learning from trajectories of the form $(x_0, u_0), (x_1, u_1), \ldots, (x_p, u_p)$, where $u_k = \mu(x_k)$.

• Q-factors are updated using the following formula

$$Q(x_k, u_k) \leftarrow (1 - \gamma)\, Q(x_k, u_k) + \gamma \left[g(x_k, u_k, w_k; \theta) + \min_{v_{k+1}} Q(x_{k+1}, v_{k+1})\right],$$

where $x_{k+1} = f(x_k, u_k, w_k)$ and $0 \le \gamma \le 1$ is the learning rate.

• The policy is updated using $\mu(x_k) \leftarrow I\{Q(x_k, 1) < Q(x_k, 0)\}$.

• Model-Free Q-Learning does not require knowledge of the transition probabilities, but it suffers from the problem of “exploration vs. exploitation”.

• An ε-greedy algorithm can be used to trade off between exploration and exploitation.
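The update rule, the policy update, and ε-greedy action selection from the bullets above can be sketched as three small functions. The names are hypothetical, Q-factors are kept in a dictionary keyed by `(state, decision)`, and the update follows the formula exactly as written on the slide, with γ as the learning rate.

```python
import random
from collections import defaultdict


def q_update(Q, x, u, cost, x_next, controls, gamma):
    """Q(x,u) <- (1 - gamma) Q(x,u) + gamma [cost + min_v Q(x_next, v)]."""
    target = cost + min(Q[(x_next, v)] for v in controls)
    Q[(x, u)] = (1.0 - gamma) * Q[(x, u)] + gamma * target


def greedy_policy(Q, x):
    """mu(x) = 1 if turning the heater on has the strictly smaller Q-factor."""
    return 1 if Q[(x, 1)] < Q[(x, 0)] else 0


def epsilon_greedy(Q, x, controls, eps, rng=random):
    """Explore with probability eps, otherwise pick the decision with the smallest Q-factor."""
    if rng.random() < eps:
        return rng.choice(list(controls))
    return min(controls, key=lambda v: Q[(x, v)])


# Q-factors default to zero for state-decision pairs that have not been visited.
Q = defaultdict(float)
```

A training loop would repeatedly pick `u = epsilon_greedy(...)`, observe the cost and next state, and call `q_update`; the choice of `eps` and of a schedule for `gamma` is left open here.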


Model-Based Q-Learning

• Model-Based Q-Learning involves learning from usage trajectories $w_0, w_1, \ldots, w_p$.

• The model of the system is used to obtain a family of state-decision pair trajectories corresponding to each usage trajectory.

• The Q-factors are updated using the same formula.

• Model-Based Q-Learning does not require knowledge of the transition probabilities and it does not have the problem of “exploration vs. exploitation”.



Prescient Lower Bound (PLB)

1. Generate/observe a series of usage trajectories.

2. Solve the finite-horizon problem corresponding to these trajectories exactly.

3. The average of the optimal costs is a lower bound for the objective function.

This lower bound represents the minimum possible objective cost, given that hot water usage is known.
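The three steps above can be sketched as a deterministic dynamic program per trajectory followed by an average. The toy model below (cold/hot tank, two price periods, a fixed cold-draw penalty) is hypothetical and only meant to show the mechanics; in particular it uses total cost per trajectory, without the per-stage normalization of the average cost objective.

```python
def prescient_cost(x0, draws, states, controls, f, g):
    """Minimum total cost from x0 when the whole draw sequence is known in advance."""
    J = {x: 0.0 for x in states}
    for k in reversed(range(len(draws))):
        w = draws[k]                                   # known draw, no expectation needed
        J = {x: min(g(k, x, u, w) + J[f(x, u, w)] for u in controls)
             for x in states}
    return J[x0]


def prescient_lower_bound(x0, trajectories, states, controls, f, g):
    """Average of the per-trajectory optimal costs."""
    costs = [prescient_cost(x0, draws, states, controls, f, g)
             for draws in trajectories]
    return sum(costs) / len(costs)


# Hypothetical toy model: state 0 = cold, 1 = hot; prices alternate 1.0, 2.0.
def toy_f(x, u, w):
    return 1 if u == 1 else (0 if w == 1 else x)


def toy_g(k, x, u, w):
    price = [1.0, 2.0][k % 2]
    return price * u + (4.0 if (w == 1 and x == 0) else 0.0)


plb = prescient_lower_bound(0, [[0, 1], [1, 0]], [0, 1], [0, 1], toy_f, toy_g)
```

With foresight, the first trajectory is handled by pre-heating during the cheap stage (cost 1), while the second incurs an unavoidable cold draw at the first stage (cost 4), for an average of 2.5.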


Results

Numerical Simulations Setup

Figure: Simulate Hot Water Usage Data



Figure: Hot Water Usage Probability Mass Function



Figure: Time-Varying Price of Power (vertical axis: Price of Power ($/kW), 0.18 to 0.3; horizontal axis ticks from 0 to 24)


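Set-Point Methods

The policy of a set-point water heater maps $(T_k, u_{k-1})$ to $u_k$:

$$\mu_{\text{set-point}}(T_k, u_{k-1}; \vartheta) := \begin{cases} 0, & \text{if } T_k > T_{\mathrm{set}}(t_k) + \delta(t_k) \\ 1, & \text{if } T_k < T_{\mathrm{set}}(t_k) - \delta(t_k) \\ u_{k-1}, & \text{otherwise} \end{cases}$$

for all $k = 0, 1, \ldots$, where $\vartheta := \{T_{\mathrm{set}}, \delta\}$.

A simple case occurs when $\delta(t_k) \equiv 0$:

$$\mu_{\mathrm{simple}}(T_k; T_{\mathrm{set}}) := I\{T_k < T_{\mathrm{set}}(t_k)\},$$

for all $k = 0, 1, \ldots$

Relative VI with state $x_k = T_k$ does no worse than simple set-points.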
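Both set-point policies are simple threshold rules, sketched below for reference. The function names are hypothetical, and the caller is assumed to pass the set-point and deadband values that apply at the current time of day.

```python
def set_point_policy(T_k, u_prev, T_set, delta):
    """Hysteresis thermostat: off above the band, on below it, otherwise keep the last decision."""
    if T_k > T_set + delta:
        return 0
    if T_k < T_set - delta:
        return 1
    return u_prev


def simple_set_point_policy(T_k, T_set):
    """Zero-deadband case: heat exactly when the temperature is below the set-point."""
    return 1 if T_k < T_set else 0
```

Inside the deadband the hysteresis policy simply repeats the previous decision, which avoids rapid on/off switching around the set-point.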


Simple Set-Point with HECO Pricing

Figure: Discomfort Cost (°C/use) versus Operating Cost ($/day) for the Simple Set-Point Solution, the Dynamic Programming Solution, and the Prescient Lower Bound; the Set-Point (°C) scale runs from 25 to 55.

Figure: Detail view with horizontal axis from 1.2 to 1.75 and Discomfort Cost (°C/use) from 0 to 1; the Set-Point (°C) scale runs from 25 to 55.


Simple Set-Point with Constant Pricing

Figure: Discomfort Cost (°C/use) over a horizontal axis from 0.1 to 1.6 for the Simple Set-Point Solution, the Dynamic Programming Solution, and the Prescient Lower Bound; the Set-Point (°C) scale runs from 25 to 55.

Figure: Detail view with horizontal axis from 1.2 to 1.6 and Discomfort Cost (°C/use) from 0 to 2.4; the Set-Point (°C) scale runs from 25 to 55.



Temperature Aggregation

Figure: Discomfort Cost (°C/use) over a horizontal axis from 0.1 to 1.6. Legend: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.

Figure: Detail view with horizontal axis from 1.2 to 1.6 and Discomfort Cost (°C/use) from 0 to 0.75. Legend: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.


Usage History Aggregation

Figure: Discomfort Cost (°C/use), from 0 to 0.75, over a horizontal axis from 1.2 to 1.6. Legend: ∅; φ(1a); φ(2a); φ(3a); PLB.


Problem Extensions

Solar Water Heating

Let $V_k$ be a random variable representing the solar irradiance at time $t_k$. In practice, we will have to estimate $v_k$ using forecasting methods. Let $\mathrm{efficiency}(v_k)$ convert irradiance into usable power. The modified temperature equation is

$$f_T(T_k, u_k, w_k, v_k) = \max\left\{T_k - r_{\mathrm{cool}} \Delta t\, (T_k - T_{\mathrm{amb}}) + r_{\mathrm{heat}} \Delta t\, u_k - r_{\mathrm{loss}} \Delta t\, w_k + r_{\mathrm{solar}} \Delta t \cdot \mathrm{efficiency}(v_k),\; T_{\mathrm{amb}}\right\},$$

where $r_{\mathrm{solar}}$ is a conversion factor from power to temperature.


Demand Response

Compensate customers for reducing/shifting electricity use.

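Diagram: blocks labeled Water Heater, KDE, DP, Expected Load, and Utility, connected by the signals $L$ (load), $p_W$ (estimated usage distribution), $\mu$ (policy), and $C$ (price). The Utility solves

$$\begin{aligned} \underset{C}{\text{minimize}} \quad & \sum_{k=1}^{n} \left(a L_k^2 + b L_k + c\right) \\ \text{subject to} \quad & L = f'(C), \\ & \frac{1}{n} \sum_{k=1}^{n} C(k) = C_{\mathrm{avg}}, \\ & C_{\min} \le C(k) \le C_{\max}. \end{aligned}$$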


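Automated Demand Response

Heuristic for Setting Price

Find $\beta_1, \beta_2 \ge 0$ such that

$$C = \beta_1 L + \beta_2, \qquad \frac{1}{N} \sum_{k=1}^{N} C(k) = C_{\mathrm{avg}}, \qquad C_{\min} \le C(k) \le C_{\max},$$

and $\beta_1$ is maximal. Substituting the averaging constraint gives $\beta_2 = C_{\mathrm{avg}} - \beta_1 L_{\mathrm{avg}}$, so the price bounds limit $\beta_1$ from above at the largest and smallest loads. The closed-form solution is

$$\beta_1^* = \min\left\{\frac{C_{\max} - C_{\mathrm{avg}}}{L_{\max} - L_{\mathrm{avg}}},\; \frac{C_{\min} - C_{\mathrm{avg}}}{L_{\min} - L_{\mathrm{avg}}}\right\} \quad \text{and} \quad \beta_2^* = C_{\mathrm{avg}} - \beta_1^* L_{\mathrm{avg}}.$$

The update is $C \leftarrow (1 - \eta)\, C + \eta\, (\beta_1^* L + \beta_2^*)$.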
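A short sketch of the pricing heuristic follows. It takes the smaller of the two ratios for the slope, because that is the largest slope for which every price stays within the bounds once the averaging constraint fixes the intercept. The function names are hypothetical, the requirement that the intercept be non-negative is not enforced, and a flat load profile is handled by falling back to a constant price.

```python
def fit_linear_price(load, c_avg, c_min, c_max):
    """Return (beta1, beta2) for the price C(k) = beta1 * L(k) + beta2.

    beta1 is the largest slope that keeps all prices in [c_min, c_max] while
    the mean price equals c_avg. beta2 >= 0 is not checked in this sketch.
    """
    l_avg = sum(load) / len(load)
    l_max, l_min = max(load), min(load)
    if l_max == l_min:                      # flat load: only a constant price fits
        return 0.0, c_avg
    beta1 = min((c_max - c_avg) / (l_max - l_avg),
                (c_min - c_avg) / (l_min - l_avg))
    beta2 = c_avg - beta1 * l_avg
    return beta1, beta2


def update_price(price, load, beta1, beta2, eta):
    """Smoothed update C <- (1 - eta) C + eta (beta1 L + beta2), element-wise."""
    return [(1.0 - eta) * c + eta * (beta1 * l + beta2)
            for c, l in zip(price, load)]
```

For the illustrative load profile `[1, 2, 3, 6]` with an average price of 0.25 and bounds 0.15 and 0.31, the slope is 0.02 and the intercept 0.19, giving prices `[0.21, 0.23, 0.25, 0.31]` that average 0.25 and touch the upper bound at the peak load.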


Automated Demand Response Simulation


Conclusion

Summary

• Formulated the problem of minimizing a weighted sum of operating and discomfort costs as an average cost MDP.

• Considered approximate DP methods such as aggregation, density estimation, and Q-Learning.

• Approximate DP is at least as good as simple set-points.

• Applications of water heaters optimized with approximate DP include solar water heating and automated demand response.

• A longer cycle (e.g., a week or a month) should be considered.

• Non-stationary usage patterns should be considered.


Thank You
