CHAPTER 15

SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS

• Organization of chapter in ISSO
  – Introduction to gradient estimation
  – Interchange of derivative and integral
  – Gradient estimation techniques
    • Likelihood ratio/score function (LR/SF)
    • Infinitesimal perturbation analysis (IPA)
  – Optimization with gradient estimates
  – Sample path method

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

15-2

Issues in Gradient Estimation

• Estimate the gradient of the loss function with respect to the parameters being optimized, using simulation outputs:

$$g(\theta) = \frac{\partial L(\theta)}{\partial \theta}$$

  where $L(\theta)$ is a scalar-valued loss function to minimize and $\theta$ is a p-dimensional vector of parameters

• Essential properties of gradient estimates
  – Unbiased: $E[\hat{g}(\theta)] = g(\theta)$
  – Small variance

15-3

Two Types of Parameters

$$L(\theta) = E[Q(\theta, V)] = \int Q(\theta_S, \upsilon)\, p_V(\upsilon \mid \theta_D)\, d\upsilon$$

where $V$ is the random effect in the system and $p_V(\upsilon \mid \theta_D)$ is the probability density function of $V$

• Distributional parameters $\theta_D$: elements of $\theta$ that enter via their effect on the probability distribution of $V$. For example, if scalar $V$ has distribution $N(\mu, \sigma^2)$, then $\mu$ and $\sigma^2$ are distributional parameters

• Structural parameters $\theta_S$: elements of $\theta$ that affect the loss function directly (via $Q$)

• Distinction not always obvious

15-4

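Interchange of Derivative and Integral

• Unbiased gradient estimation using only one simulation requires the interchange of derivative and integral:

$$\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \int Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)\, d\upsilon \overset{?}{=} \int \frac{\partial \left[ Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta) \right]}{\partial \theta}\, d\upsilon$$

• The interchange is not valid in general. Technical conditions needed for validity:
  – $Q \cdot p_V$ and $\partial (Q \cdot p_V)/\partial \theta$ are continuous
  – $|Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)| \le q_0(\upsilon)$, with $\int q_0(\upsilon)\, d\upsilon < \infty$
  – $\|\partial [Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)]/\partial \theta\| \le q_1(\upsilon)$, with $\int q_1(\upsilon)\, d\upsilon < \infty$

• These conditions have implications in practical applications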
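As a minimal illustration of how the interchange can fail (an example that is not taken from the slides and is chosen only because it can be checked by hand), assume $V \sim U(0,1)$ with a density that does not depend on $\theta$, and an indicator-valued $Q$:

```latex
Q(\theta, V) = \mathbf{1}\{V \le \theta\}, \quad 0 < \theta < 1, \qquad
L(\theta) = E[Q(\theta, V)] = P(V \le \theta) = \theta
\;\Rightarrow\; \frac{\partial L}{\partial \theta} = 1,
\qquad \text{but} \qquad
\frac{\partial Q(\theta, V)}{\partial \theta} = 0 \ \text{for all } V \ne \theta
\;\Rightarrow\; E\!\left[ \frac{\partial Q(\theta, V)}{\partial \theta} \right] = 0 \ne 1 .
```

In this sketch $Q$ is discontinuous in $\theta$, so the continuity condition is violated and differentiating inside the integral gives a biased gradient.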

15-5

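A General Form of Gradient Estimate

• Assume that all the conditions required for the exchange of derivative and integral are satisfied. Then

$$
\begin{aligned}
g(\theta) &= \int \left[ \frac{\partial p_V(\upsilon \mid \theta)}{\partial \theta} Q(\theta, \upsilon) + \frac{\partial Q(\theta, \upsilon)}{\partial \theta} p_V(\upsilon \mid \theta) \right] d\upsilon \\
&= \int \left[ \frac{1}{p_V(\upsilon \mid \theta)} \frac{\partial p_V(\upsilon \mid \theta)}{\partial \theta} Q(\theta, \upsilon) + \frac{\partial Q(\theta, \upsilon)}{\partial \theta} \right] p_V(\upsilon \mid \theta)\, d\upsilon \\
&= E\left[ \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} Q(\theta, V) + \frac{\partial Q(\theta, V)}{\partial \theta} \right]
\end{aligned}
$$

• Hence, an unbiased gradient estimate can be obtained as

$$\hat{g}(\theta) = \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} Q(\theta, V) + \frac{\partial Q(\theta, V)}{\partial \theta}$$

  (output from one simulation!)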

15-6

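Two Gradient Estimates: LR/SF and IPA

$$\hat{g}(\theta) = \underbrace{\frac{\partial \log p_V(V \mid \theta)}{\partial \theta} Q(\theta, V)}_{\text{pure LR/SF}} + \underbrace{\frac{\partial Q(\theta, V)}{\partial \theta}}_{\text{pure IPA}}$$

• Likelihood ratio/score function (LR/SF): only distributional parameters

$$\hat{g}_{LR/SF}(\theta) = \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} Q(\theta, V)$$

• Infinitesimal perturbation analysis (IPA): only structural parameters

$$\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta}$$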

15-7

Comparison of Pure LR/SF and IPA

• In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation:
  – LR/SF may require deriving a complex distribution function starting from U(0,1)
  – IPA may lead to an intractable $\partial Q/\partial \theta$ with a complex $Q(\theta, V)$

• Pure LR/SF gradient estimates tend to suffer from large variance (the variance can grow with the number of components in $V$)

• Pure IPA may result in a $Q(\theta, V)$ that fails to meet the conditions for valid interchange of derivative and integral, and hence can lead to a biased gradient estimate

• In many cases where IPA is feasible, it leads to a low-variance gradient estimate

15-8

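A Simple Example: Exponential Distribution

• Let $Z$ be an exponential random variable with mean $\theta$, that is, $p_Z(z \mid \theta) = e^{-z/\theta}/\theta$. Define $L \equiv E(Z) = \theta$. Then $\partial L/\partial \theta = 1$.
  – LR/SF estimate: $V = Z$; $Q(\theta, V) = V$.

$$\hat{g}_{LR/SF}(\theta) = V \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} = \frac{V}{\theta} \left( \frac{V}{\theta} - 1 \right)$$

  – IPA estimate: $V \sim U(0,1)$; $Q(\theta, V) = -\theta \log V$ (so that $Z = -\theta \log V$).

$$\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta} = -\log V$$

• Both the LR/SF and IPA estimators are unbiased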
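The two estimators in this example can be compared by Monte Carlo. The following is a minimal sketch, not part of the slides: the function names, the value `theta = 2.0`, the seed, and the number of runs are all assumptions made for illustration. Both sample means should be close to the true gradient 1, while the sample variances show how much noisier the LR/SF form is for this problem.

```python
import math
import random

def lr_sf_estimate(theta, rng):
    """One LR/SF draw: sample V = Z directly from the exponential with mean theta."""
    v = rng.expovariate(1.0 / theta)        # expovariate takes the rate, i.e. 1/mean
    return (v / theta) * (v / theta - 1.0)  # V * d log p_V(V | theta) / d theta

def ipa_estimate(theta, rng):
    """One IPA draw: V ~ U(0,1), Q = -theta * log V, so dQ/dtheta = -log V.

    The estimate does not depend on theta; the argument is kept only so that
    both estimators share the same signature.
    """
    v = 1.0 - rng.random()                  # lies in (0, 1], which avoids log(0)
    return -math.log(v)

def mean_and_variance(samples):
    """Sample mean and unbiased sample variance."""
    n = len(samples)
    m = sum(samples) / n
    return m, sum((s - m) ** 2 for s in samples) / (n - 1)

rng = random.Random(0)   # fixed seed (an arbitrary choice) for repeatability
theta = 2.0              # illustrative parameter value
n_runs = 200_000         # illustrative number of independent simulations

lr_mean, lr_var = mean_and_variance([lr_sf_estimate(theta, rng) for _ in range(n_runs)])
ipa_mean, ipa_var = mean_and_variance([ipa_estimate(theta, rng) for _ in range(n_runs)])

print(f"LR/SF: mean={lr_mean:.3f}, variance={lr_var:.2f}")
print(f"IPA:   mean={ipa_mean:.3f}, variance={ipa_var:.2f}")
```

For this particular example the variances can also be worked out by hand: with $X = V/\theta$ exponential with mean 1, the LR/SF estimate is $X^2 - X$ with variance 13, whereas $-\log V$ has variance 1.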

15-9

Stochastic Optimization with Gradient Estimate

• Use the gradient estimates in the root-finding stochastic approximation (SA) algorithm to minimize the loss function $L(\theta) = E[Q(\theta, V)]$: find $\theta^*$ such that $g(\theta^*) = 0$ based on simulation outputs

• A general root-finding SA algorithm:

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$$

  where $Y_k(\hat{\theta}_k)$ is an estimate of $g(\hat{\theta}_k)$ and $a_k$ is the step size with $a_k > 0$, $a_k \to 0$, $\sum_{k=0}^{\infty} a_k = \infty$

• If $Y_k$ is unbiased and has bounded variance (and other appropriate assumptions hold), then $\hat{\theta}_k \to \theta^*$ (a.s.)

15-10

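Simulation-Based Optimization

• Use the gradient estimate derived from one simulation run in each iteration of SA:

$$Y_k(\hat{\theta}_k) = \left[ \frac{\partial \log p_V(V_k \mid \theta)}{\partial \theta} Q(\theta, V_k) + \frac{\partial Q(\theta, V_k)}{\partial \theta} \right]_{\theta = \hat{\theta}_k}$$

  where $V_k$ is the realization of $V$ from a simulation run with the parameter set at $\hat{\theta}_k$

• Iteration cycle:
  – Run one simulation with $\theta = \hat{\theta}_k$ to obtain $V_k$
  – Derive the gradient estimate from $V_k$
  – Iterate SA with the gradient estimate to move from $\hat{\theta}_k$ to $\hat{\theta}_{k+1}$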
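The cycle of simulate, estimate the gradient, and update can be sketched in a few lines. The problem below is an assumption made for illustration and does not come from the slides: $Z$ is exponential with mean $\theta$, generated as $Z = -\theta \log U$, and $Q = (Z - 2)^2$, so that $L(\theta) = 2\theta^2 - 4\theta + 4$ has its minimum at $\theta^* = 1$. The gain sequence $a_k = a/(k + 1 + A)$, the projection bounds, and all function names are likewise hypothetical choices.

```python
import math
import random

def simulate(theta, rng):
    """One simulation run: returns Z = -theta * log(U), exponential with mean theta."""
    u = 1.0 - rng.random()                  # U in (0, 1], which avoids log(0)
    return -theta * math.log(u)

def ipa_gradient(theta, z, target):
    """dQ/dtheta for Q = (Z - target)^2 with Z = -theta*log(U), so dZ/dtheta = Z/theta."""
    return 2.0 * (z - target) * z / theta

def run_sa(theta0, n_iter, rng, target=2.0, a=0.5, A=10.0, lo=0.05, hi=10.0):
    """Root-finding SA driven by one simulation run per iteration."""
    theta = theta0
    for k in range(n_iter):
        z = simulate(theta, rng)            # run one simulation at the current estimate
        y = ipa_gradient(theta, z, target)  # gradient estimate from that single run
        a_k = a / (k + 1 + A)               # a_k > 0, a_k -> 0, sum of a_k diverges
        theta = theta - a_k * y             # SA update
        theta = min(max(theta, lo), hi)     # projection onto [lo, hi], an added safeguard
    return theta

theta_final = run_sa(theta0=3.0, n_iter=20_000, rng=random.Random(1))
print(f"final estimate: {theta_final:.3f} (minimizer of 2*theta^2 - 4*theta + 4 is 1)")
```

The projection step is not part of the basic recursion; it is included here only to keep the mean parameter positive when a noisy early step overshoots.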

15-11

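Example: Experimental Response (Examples 15.4 and 15.5 in ISSO)

• Let $\{V_k\}$ be i.i.d. randomly generated binary (on-off) stimuli with “on” probability $\lambda$. Assume $Q(\lambda, \beta, V_k)$ represents the negative of the specimen response, where $\beta$ is a design parameter. The objective is to design the experiment to maximize the response (i.e., minimize $Q$) by selecting values for $\lambda$ and $\beta$.

• Gradient estimate: $\theta = [\lambda, \beta]^T$; $p_V(\upsilon \mid \lambda) = \lambda^{\upsilon} (1 - \lambda)^{1 - \upsilon}$, $\upsilon = 0$ or $1$

$$Y_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{\hat{\lambda}_k - V_k}{\hat{\lambda}_k (\hat{\lambda}_k - 1)} Q(\hat{\theta}_k, V_k) + Q'_{\lambda}(\hat{\theta}_k, V_k) \\[2ex] Q'_{\beta}(\hat{\theta}_k, V_k) \end{bmatrix}$$

  where $\hat{\theta}_k = [\hat{\lambda}_k, \hat{\beta}_k]^T$ and $Q'_x$ denotes the derivative w.r.t. $x$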
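Because $V$ takes only the values 0 and 1, the unbiasedness of this gradient estimate can be checked exactly by summing over both outcomes. The sketch below does so for a hypothetical response function `q`, which is an assumption chosen purely for illustration and is not the response function of the ISSO examples; the test point `lam = 0.4`, `beta = -0.7` is also arbitrary. The expected value of the estimate is compared with a central finite-difference gradient of the loss.

```python
def q(lam, beta, v):
    """Hypothetical negative response, used only to exercise the formulas."""
    return beta ** 2 + lam * beta * v - (1 - v) * lam

def dq_dlam(lam, beta, v):
    """Partial derivative of the hypothetical q with respect to lam."""
    return beta * v - (1 - v)

def dq_dbeta(lam, beta, v):
    """Partial derivative of the hypothetical q with respect to beta."""
    return 2 * beta + lam * v

def p_v(v, lam):
    """Bernoulli probability mass: lam^v * (1 - lam)^(1 - v) for v in {0, 1}."""
    return lam ** v * (1 - lam) ** (1 - v)

def y_estimate(lam, beta, v):
    """Gradient estimate [score * Q + Q'_lam, Q'_beta] from one realization v."""
    score = (lam - v) / (lam * (lam - 1))   # d log p_V(v | lam) / d lam
    return (score * q(lam, beta, v) + dq_dlam(lam, beta, v), dq_dbeta(lam, beta, v))

def loss(lam, beta):
    """L(theta) = E[Q], computed exactly by summing over v = 0 and v = 1."""
    return sum(p_v(v, lam) * q(lam, beta, v) for v in (0, 1))

def expected_y(lam, beta):
    """Exact expectation of the gradient estimate over the two outcomes."""
    terms = [(p_v(v, lam), y_estimate(lam, beta, v)) for v in (0, 1)]
    return tuple(sum(p * y[i] for p, y in terms) for i in range(2))

def finite_diff_gradient(lam, beta, h=1e-6):
    """Central-difference approximation of the gradient of the loss."""
    return ((loss(lam + h, beta) - loss(lam - h, beta)) / (2 * h),
            (loss(lam, beta + h) - loss(lam, beta - h)) / (2 * h))

lam, beta = 0.4, -0.7
exact = expected_y(lam, beta)
approx = finite_diff_gradient(lam, beta)
print("E[Y]              :", exact)
print("finite differences:", approx)
```

The two printed vectors should agree to within finite-difference error, which is what unbiasedness of the combined LR/SF and IPA estimate means for this two-point distribution.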

15-12

Experimental Response (continued)

• Specific response function:

where $\beta$ is a structural parameter, but $\lambda$ is both a distributional and structural parameter. Then:

2, (1 )Q V V

$p_V(\upsilon \mid \lambda) = \lambda^{\upsilon} (1 - \lambda)^{1 - \upsilon}$, $\upsilon = 0$ or $1$.

2 2( ) ,L E Q V

1 3; 1 3

1 3

/( ) /

/L

15-13

Search Path in Experimental Response Problem

15-14

Sample Path Method

• Sample path method is based on reusing a fixed set of simulation runs
• Method is based on minimizing $\bar{L}_N(\theta)$ rather than $L(\theta)$
  – $\bar{L}_N(\theta)$ represents the sample mean of $N$ simulation runs
• If $N$ is large, then the minimum of $\bar{L}_N(\theta)$ is close to the minimum of $L(\theta)$ (under conditions)
• Optimization problem with $\bar{L}_N(\theta)$ is effectively deterministic
  – Can use standard nonlinear programming
  – IPA and/or LR/SF methods of gradient estimation still relevant
• Generally need to choose a fixed value of $\theta$ (reference value) to produce the $N$ simulation runs
• Choice of reference value has an impact on $\bar{L}_N(\theta)$ for finite $N$

15-15

Accuracy of Sample Path Method

• Interested in the accuracy of the sample path method in seeking the true optimum $\theta^*$ (minimum of $L(\theta)$)
• Let $\theta_N^*$ represent the minimum of the surrogate loss $\bar{L}_N(\theta)$
• Let $\hat{\theta}_N$ denote the final solution from the nonlinear programming method
• Hence, the error in the estimate is due to two sources:
  – Error in the nonlinear programming solution to finding $\theta_N^*$
  – Difference between $\theta_N^*$ and $\theta^*$
• Triangle inequality can be used to bound the overall error:

$$\|\hat{\theta}_N - \theta^*\| \le \|\hat{\theta}_N - \theta_N^*\| + \|\theta_N^* - \theta^*\|$$

• Sometimes numerical values can be assigned to the two right-hand terms in the triangle inequality
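The two error sources can be made concrete on a toy problem. The following is a sketch under stated assumptions rather than anything from the slides: the loss is $E[(Z - 2)^2]$ with $Z = -\theta \log U$ exponential with mean $\theta$ (true minimizer $\theta^* = 1$), the $N$ uniform draws are generated once and reused for every $\theta$, and a golden-section search stands in for "standard nonlinear programming". Because $\theta$ enters only structurally here, no reference value is needed to generate the runs; that would differ in an LR/SF formulation. All names, the seed, and $N$ are hypothetical.

```python
import math
import random

def make_sample_path_loss(n_runs, seed, target=2.0):
    """Draw the N random inputs once; return the surrogate loss built on them
    together with its closed-form minimizer (available for this toy problem)."""
    rng = random.Random(seed)
    e = [-math.log(1.0 - rng.random()) for _ in range(n_runs)]  # fixed Exp(1) draws

    def loss_bar(theta):
        # Sample mean of Q over the same N runs, for any theta
        return sum((theta * ei - target) ** 2 for ei in e) / n_runs

    exact_min = target * sum(e) / sum(ei * ei for ei in e)
    return loss_bar, exact_min

def golden_section(f, lo, hi, tol=1e-6):
    """Deterministic 1-D minimizer for a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d          # minimum lies in [lo, d]
        else:
            lo = c          # minimum lies in [c, hi]
        c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

theta_star = 1.0                                        # true minimizer of L
loss_bar, theta_n_star = make_sample_path_loss(n_runs=10_000, seed=2)
theta_hat_n = golden_section(loss_bar, 0.05, 10.0)      # solution returned by the optimizer

nlp_error = abs(theta_hat_n - theta_n_star)             # optimizer error on the surrogate
sampling_error = abs(theta_n_star - theta_star)         # surrogate minimum vs. true minimum
total_error = abs(theta_hat_n - theta_star)

print(f"optimizer error : {nlp_error:.2e}")
print(f"sampling error  : {sampling_error:.2e}")
print(f"total error     : {total_error:.2e}")
```

In this sketch the optimizer error is controlled by the search tolerance, while the sampling error shrinks only as $N$ grows, which is why the second term usually dominates the bound.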
