CHAPTER 15
SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS
• Organization of chapter in ISSO
– Introduction to gradient estimation
– Interchange of derivative and integral
– Gradient estimation techniques
   • Likelihood ratio/score function (LR/SF)
   • Infinitesimal perturbation analysis (IPA)
– Optimization with gradient estimates
– Sample path method
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
15-2
Issues in Gradient Estimation
• Estimate the gradient of the loss function with respect to the parameters being optimized, using simulation outputs:
$g(\theta) = \partial L(\theta) / \partial \theta$
where $L(\theta)$ is a scalar-valued loss function to minimize and $\theta$ is a p-dimensional vector of parameters
• Essential properties of gradient estimates
– Unbiased: $E[\hat{g}(\theta)] = g(\theta)$
– Small variance
15-3
Two Types of Parameters
$L(\theta) = E[Q(\theta, V)] = \int Q(\theta_S, \upsilon)\, p_V(\upsilon \mid \theta_D)\, d\upsilon$
where $V$ is the random effect in the system and $p_V(\cdot \mid \theta)$ is the probability density function of $V$
• Distributional parameters $\theta_D$: Elements of $\theta$ that enter via their effect on the probability distribution of $V$. For example, if scalar $V$ has distribution $N(\mu, \sigma^2)$, then $\mu$ and $\sigma^2$ are distributional parameters
• Structural parameters $\theta_S$: Elements of $\theta$ that affect the loss function directly (via $Q$)
• Distinction not always obvious
15-4
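Interchange of Derivative and Integral
• Unbiased gradient estimation using only one simulation requires the interchange of derivative and integral:
$\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \int Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)\, d\upsilon \overset{?}{=} \int \frac{\partial [Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)]}{\partial \theta}\, d\upsilon$
• The interchange is not valid in general. Technical conditions needed for validity:
– $Q \cdot p_V$ and $\partial (Q \cdot p_V)/\partial \theta$ are continuous
– $|Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)| \le q_0(\upsilon)$, with $\int q_0(\upsilon)\, d\upsilon < \infty$
– $\|\partial [Q(\theta, \upsilon)\, p_V(\upsilon \mid \theta)] / \partial \theta\| \le q_1(\upsilon)$, with $\int q_1(\upsilon)\, d\upsilon < \infty$
• These conditions have implications in practical applications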
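A standard textbook illustration (not taken from the slides) of how the interchange can fail: take a purely structural indicator loss, so that the continuity condition is violated. The choice of $Q$ and the uniform distribution are assumptions made only for this example.

```latex
% Assumed example: V ~ U(0,1), 0 < \theta < 1, Q(\theta, V) = 1\{V \le \theta\}
L(\theta) = E[Q(\theta, V)] = P(V \le \theta) = \theta
\quad\Longrightarrow\quad \frac{\partial L}{\partial \theta} = 1,
\qquad\text{but}\qquad
\frac{\partial Q(\theta, V)}{\partial \theta} = 0 \ \text{ for all } V \ne \theta
\quad\Longrightarrow\quad
E\!\left[\frac{\partial Q(\theta, V)}{\partial \theta}\right] = 0 \ne 1 .
```

Here $Q$ jumps at $V = \theta$, so differentiating inside the integral misses the entire sensitivity of $L$ to $\theta$.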
15-5
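A General Form of Gradient Estimate
• Assume that all the conditions required for the exchange of derivative and integral are satisfied. Then
$g(\theta) = \int \left[ Q(\theta, \upsilon) \frac{\partial p_V(\upsilon \mid \theta)}{\partial \theta} + p_V(\upsilon \mid \theta) \frac{\partial Q(\theta, \upsilon)}{\partial \theta} \right] d\upsilon$
$\phantom{g(\theta)} = \int \left[ Q(\theta, \upsilon) \frac{1}{p_V(\upsilon \mid \theta)} \frac{\partial p_V(\upsilon \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta, \upsilon)}{\partial \theta} \right] p_V(\upsilon \mid \theta)\, d\upsilon$
$\phantom{g(\theta)} = E\left[ Q(\theta, V) \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta, V)}{\partial \theta} \right]$
• Hence, an unbiased gradient estimate can be obtained as
$\hat{g}(\theta) = Q(\theta, V) \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta, V)}{\partial \theta}$
– Output from one simulation!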
15-6
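Two Gradient Estimates: LR/SF and IPA
$\hat{g}(\theta) = Q(\theta, V) \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta, V)}{\partial \theta}$
• Likelihood ratio/score function (LR/SF): only distributional parameters, so $\partial Q / \partial \theta = 0$ (pure LR/SF):
$\hat{g}_{LR/SF}(\theta) = Q(\theta, V) \frac{\partial \log p_V(V \mid \theta)}{\partial \theta}$
• Infinitesimal perturbation analysis (IPA): only structural parameters, so $\partial p_V / \partial \theta = 0$ (pure IPA):
$\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta}$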
15-7
Comparison of Pure LR/SF and IPA
• In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation:
– LR/SF may require deriving a complex distribution function starting from U(0,1)
– IPA may lead to an intractable $\partial Q / \partial \theta$ with a complex $Q(\theta, V)$
• Pure LR/SF gradient estimates tend to suffer from large variance (the variance can grow with the number of components in V)
• Pure IPA may result in a $Q(\theta, V)$ that fails to meet the conditions for valid interchange of derivative and integral, and hence can lead to a biased gradient estimate
• In many cases where IPA is feasible, it leads to a low-variance gradient estimate
15-8
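A Simple Example: Exponential Distribution
• Let Z be an exponential random variable with mean $\theta$. That is, $p_Z(z \mid \theta) = e^{-z/\theta} / \theta$. Define $L(\theta) = E(Z) = \theta$. Then $\partial L / \partial \theta = 1$.
– LR/SF estimate: $V = Z$; $Q(\theta, V) = V$.
$\hat{g}_{LR/SF}(\theta) = V \frac{\partial \log p_V(V \mid \theta)}{\partial \theta} = \frac{V}{\theta} \left( \frac{V}{\theta} - 1 \right)$
– IPA estimate: $V \sim U(0,1)$; $Q(\theta, V) = -\theta \log V$ (since $Z = -\theta \log V$).
$\hat{g}_{IPA}(\theta) = \frac{\partial Q(\theta, V)}{\partial \theta} = -\log V$
• Both the LR/SF and IPA estimators are unbiased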
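The two estimators in this example can be checked numerically. The sketch below is illustrative only: the function names, sample size, and seed are assumptions, not part of the slides. For the LR/SF estimator, writing $X = V/\theta$ (exponential with mean 1) gives $X^2 - X$, whose mean is $2 - 1 = 1$ and whose variance is $14 - 1 = 13$. The IPA estimator $-\log V$ has mean 1 and variance 1.

```python
import math
import random
import statistics


def lr_sf_estimate(theta, rng):
    """LR/SF: V = Z is exponential with mean theta and Q(theta, V) = V."""
    v = rng.expovariate(1.0 / theta)  # expovariate takes the rate, i.e. 1/mean
    return (v / theta) * (v / theta - 1.0)


def ipa_estimate(theta, rng):
    """IPA: V ~ U(0,1), Q(theta, V) = -theta*log(V), so dQ/dtheta = -log(V)."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return -math.log(u)     # happens not to depend on theta in this example


def compare(theta=2.0, n=200_000, seed=0):
    """Return ((mean, variance) for LR/SF, (mean, variance) for IPA)."""
    rng = random.Random(seed)
    lr = [lr_sf_estimate(theta, rng) for _ in range(n)]
    ipa = [ipa_estimate(theta, rng) for _ in range(n)]
    return (
        (statistics.fmean(lr), statistics.variance(lr)),
        (statistics.fmean(ipa), statistics.variance(ipa)),
    )


if __name__ == "__main__":
    (lr_mean, lr_var), (ipa_mean, ipa_var) = compare()
    print("LR/SF mean, variance:", lr_mean, lr_var)
    print("IPA   mean, variance:", ipa_mean, ipa_var)
```

Both sample means should be near the true derivative 1, while the LR/SF sample variance should be roughly an order of magnitude larger than the IPA one.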
15-9
Stochastic Optimization with Gradient Estimate
• Use the gradient estimates in the root-finding stochastic approximation (SA) algorithm to minimize the loss function $L(\theta) = E[Q(\theta, V)]$: find $\theta^*$ such that $g(\theta^*) = 0$ based on simulation outputs
• A general root-finding SA algorithm:
$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$
where $Y_k(\hat{\theta}_k)$ is an estimate of $g(\hat{\theta}_k)$ and $a_k$ is the step size with $a_k > 0$, $a_k \to 0$, $\sum_k a_k = \infty$
• If $Y_k$ is unbiased and has bounded variance (and other appropriate assumptions hold), then $\hat{\theta}_k \to \theta^*$ (a.s.)
15-10
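Simulation-Based Optimization
• Use the gradient estimate derived from one simulation run in the iteration of SA:
$Y_k(\hat{\theta}_k) = \left[ Q(\theta, V_k) \frac{\partial \log p_V(V_k \mid \theta)}{\partial \theta} + \frac{\partial Q(\theta, V_k)}{\partial \theta} \right]_{\theta = \hat{\theta}_k}$
where $V_k$ is the realization of V from a simulation run with the parameter set at $\hat{\theta}_k$
• Iteration cycle:
– Run one simulation with $\hat{\theta}_k$ to obtain $V_k$
– Derive the gradient estimate from $V_k$
– Iterate SA with the gradient estimate to obtain $\hat{\theta}_{k+1}$, then repeat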
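The simulate, estimate, update cycle can be sketched on a toy problem. Everything specific here is an assumption for illustration: the loss $Q(\theta, V) = V + C/\theta$ with $V$ exponential with mean $\theta$ (so $\theta$ is both distributional and structural), the constant `C`, the gain sequence $a_k = 1/(k+10)$, and the projection onto `[lo, hi]`, which is an added safeguard to keep $\theta$ positive. For this toy loss, $L(\theta) = \theta + C/\theta$, so the minimizer is $\sqrt{C}$.

```python
import random

C = 4.0  # hypothetical constant; L(theta) = theta + C/theta is minimized at sqrt(C) = 2


def gradient_estimate(theta, v):
    """One-run estimate Y_k = Q * dlog p_V/dtheta + dQ/dtheta."""
    q = v + C / theta                    # Q(theta, V)
    score = v / theta**2 - 1.0 / theta   # d/dtheta of log p_V(V | theta), exponential with mean theta
    dq = -C / theta**2                   # dQ/dtheta (structural part)
    return q * score + dq


def run_sa(theta0=1.0, n_iter=20_000, seed=0, lo=0.5, hi=5.0):
    """Root-finding SA driven by one simulation run per iteration."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(n_iter):
        v = rng.expovariate(1.0 / theta)  # one simulation run at the current theta
        y = gradient_estimate(theta, v)   # gradient estimate from that run
        a_k = 1.0 / (k + 10)              # a_k > 0, a_k -> 0, sum of a_k diverges
        theta = min(hi, max(lo, theta - a_k * y))  # SA step, then projection onto [lo, hi]
    return theta


if __name__ == "__main__":
    print("final estimate:", run_sa(), "(toy minimizer is 2.0)")
```

Taking expectations confirms that the estimate is unbiased for this toy loss: the score term contributes $E[V(V/\theta^2 - 1/\theta)] = 1$, the score itself has mean zero, and the structural term contributes $-C/\theta^2$, giving $1 - C/\theta^2 = \partial L/\partial \theta$.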
15-11
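Example: Experimental Response (Examples 15.4 and 15.5 in ISSO)
• Let $\{V_k\}$ be i.i.d. randomly generated binary (on-off) stimuli with “on” probability $\lambda$. Assume $Q(\lambda, \beta, V_k)$ represents the negative of the specimen response, where $\beta$ is a design parameter. The objective is to design the experiment to maximize the response (i.e., minimize Q) by selecting values for $\lambda$ and $\beta$.
$p_V(V_k \mid \lambda) = \lambda^{V_k} (1 - \lambda)^{1 - V_k}$, $V_k = 0$ or $1$
• Gradient estimate, with $\theta = [\lambda, \beta]^T$:
$Y_k(\hat{\theta}_k) = \begin{bmatrix} Q(\hat{\theta}_k, V_k) \dfrac{V_k - \hat{\lambda}_k}{\hat{\lambda}_k (1 - \hat{\lambda}_k)} + Q'_\lambda(\hat{\theta}_k, V_k) \\ Q'_\beta(\hat{\theta}_k, V_k) \end{bmatrix}$
where $\hat{\theta}_k = [\hat{\lambda}_k, \hat{\beta}_k]^T$ and $Q'_x$ denotes the derivative of Q w.r.t. x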
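The LR/SF factor multiplying Q in the $\lambda$ component is the score of the Bernoulli probability mass function. A short derivation, using only the "on" probability $\lambda$ defined for this example:

```latex
\log p_V(V \mid \lambda) = V \log \lambda + (1 - V) \log (1 - \lambda), \qquad V \in \{0, 1\}
\\[4pt]
\frac{\partial \log p_V(V \mid \lambda)}{\partial \lambda}
  = \frac{V}{\lambda} - \frac{1 - V}{1 - \lambda}
  = \frac{V - \lambda}{\lambda (1 - \lambda)}
\\[4pt]
E\!\left[ \frac{V - \lambda}{\lambda (1 - \lambda)} \right]
  = \frac{E[V] - \lambda}{\lambda (1 - \lambda)} = 0
```

The score has mean zero, as expected for a likelihood-based term. Because $\beta$ does not enter $p_V$, the $\beta$ component of the gradient estimate has no score term and reduces to the IPA form.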
15-12
Experimental Response (continued)
• Specific response function:
where $\beta$ is a structural parameter, but $\lambda$ is both a distributional and structural parameter. Then:
2, (1 )Q V V
$p_V(V \mid \lambda) = \lambda^{V} (1 - \lambda)^{1 - V}$, $V = 0$ or $1$.
2 2( ) ,L E Q V
1 3; 1 3
1 3
/( ) /
/L
15-13
Search Path in Experimental Response Problem
15-14
Sample Path Method
• Sample path method is based on reusing a fixed set of simulation runs
• Method based on minimizing $\bar{L}_N(\theta)$ rather than $L(\theta)$
– $\bar{L}_N(\theta)$ represents the sample mean of N simulation runs
• If N is large, then the minimum of $\bar{L}_N(\theta)$ is close to the minimum of $L(\theta)$ (under conditions)
• Optimization problem with $\bar{L}_N(\theta)$ is effectively deterministic
– Can use standard nonlinear programming
– IPA and/or LR/SF methods of gradient estimation still relevant
• Generally need to choose a fixed value of $\theta$ (reference value) to produce the N simulation runs
• Choice of reference value has impact on $\bar{L}_N(\theta)$ for finite N
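A minimal sketch of the sample path idea follows, under assumptions that are not from the slides: a hypothetical loss $Q(\theta, V) = (\theta e - 2)^2$ with $e = -\log U$ and $U \sim U(0,1)$, and a golden-section search standing in for a general nonlinear programming routine. Because the randomness is written in this structural form, the same N draws serve every $\theta$, and no reference value is needed in this particular sketch. The true loss here is $2\theta^2 - 4\theta + 4$, minimized at $\theta = 1$.

```python
import math
import random


def make_sample_path(n, seed=0):
    """Fixed exponential(1) draws e_i = -log(U_i), reused for every theta."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) for _ in range(n)]


def surrogate_loss(theta, draws, target=2.0):
    """Sample mean of Q(theta, V_i) over the fixed draws (hypothetical Q)."""
    return sum((theta * e - target) ** 2 for e in draws) / len(draws)


def golden_section_min(f, lo, hi, tol=1e-7):
    """Deterministic 1-D minimizer for a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)


def sample_path_solution(n=5000, seed=0):
    """Return (numerical minimizer, exact minimizer) of the surrogate loss."""
    draws = make_sample_path(n, seed)
    theta_hat = golden_section_min(lambda t: surrogate_loss(t, draws), 0.0, 5.0)
    # The surrogate is quadratic in theta, so its exact minimizer is available.
    theta_star_n = 2.0 * sum(draws) / sum(e * e for e in draws)
    return theta_hat, theta_star_n


if __name__ == "__main__":
    theta_hat, theta_star_n = sample_path_solution()
    print("optimizer error:", abs(theta_hat - theta_star_n))
    print("sampling error: ", abs(theta_star_n - 1.0))
```

The two printed quantities separate the error of the deterministic optimizer from the error caused by using a finite N. Only the second shrinks as N grows.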
15-15
Accuracy of Sample Path Method
• Interested in accuracy of sample path method in seeking the true optimum $\theta^*$ (minimum of $L(\theta)$)
• Let $\theta_N^*$ represent the minimum of the surrogate loss $\bar{L}_N(\theta)$
• Let $\hat{\theta}_N$ denote the final solution from the nonlinear programming method
• Hence, error in the estimate is due to two sources:
– Error in the nonlinear programming solution to finding $\theta_N^*$
– Difference between $\theta_N^*$ and $\theta^*$
• Triangle inequality can be used to bound the overall error:
$\|\hat{\theta}_N - \theta^*\| \le \|\hat{\theta}_N - \theta_N^*\| + \|\theta_N^* - \theta^*\|$
• Sometimes numerical values can be assigned to the two right-hand terms in the triangle inequality