Stochastic Trust Region Gradient-Free Method (STRONG)
A Response-Surface-Based Algorithm in Stochastic Optimization via Simulation
Kuo-Hao Chang
Advisor: Hong Wan
School of Industrial Engineering, Purdue University
Acknowledgement: This project was partially supported by a grant from the Naval Postgraduate School.
Outline
• Background
• Problem Statement
• Literature Review
• STRONG
• Preliminary Numerical Evaluations
• Future Research
Background
• Stochastic Optimization: the minimization (or maximization) of a function in the presence of randomness.
• Optimization via Simulation: no explicit form of the objective function (only observations from simulation); function evaluations are stochastic and usually computationally expensive.
• Applications: investment portfolio optimization, production planning, traffic control, etc.
Problem Statement (I)
• Consider the unconstrained continuous minimization problem
  $x^* = \arg\min_x L(x) = \arg\min_x E[Q(x, \omega)]$
• The response can only be observed by $Q(x, \omega) = L(x) + \epsilon_x$
  $\omega$: the randomness, defined on the probability space $(\Omega, \mathcal{F}, P)$
  $\epsilon_x$: the noise term, showing its dependence on $x$
Problem Statement (II)
• Given: a simulation oracle capable of generating $Q(x, \omega)$ such that the Strong Law of Large Numbers holds for every $x$
• Find: a local minimizer $x^*$; i.e., find $x^*$ having a neighborhood $V(x^*)$ such that every $x \in V(x^*)$ satisfies $L(x) \ge L(x^*)$
Problem Assumptions
• For $\epsilon_x$:
  1. $\epsilon_x \sim N(0, \sigma_x^2)$
  2. $\sigma_x^2$ is unknown and $\sup_x \sigma_x^2 < \infty$
• For the underlying function $L(x)$:
  1. $L(x)$ is bounded below and twice differentiable for every $x$
  2. there exist constants $\kappa_1, \eta_1 > 0$ such that $\|\nabla L(x)\| \le \kappa_1$ and $\|H(x)\| \le \eta_1$ for every $x$
Literature Review (Fu, 1994; Fu, 2002)

Methodology | Efficient | Convergent | Automated
Stochastic Approximation | Usually No | Yes | Human tuning
Sample-Path Optimization | Usually Yes | Yes | Yes
Response Surface Methodology (RSM) | Yes | No | No
Other Heuristic Methods (e.g., Genetic Algorithm, Tabu Search) | Yes | Usually No | Yes
Proposed Work
• An RSM-based method with a convergence property (combining the trust-region method for deterministic optimization with RSM)
• Does not require human involvement
• Appropriate DOE to handle high-dimensional problems (ongoing work)
Response Surface Methodology
Stage I
• Employ a proper experimental design
• Fit a first-order model
• Perform a line search
• Move to a better solution
Stage II (when close to the optimal solution)
• Employ a proper experimental design
• Fit a second-order model
• Find the optimal solution
RSM (Montgomery, 2001)
Deterministic Trust Region Framework (Conn et al., 2000)
Suppose we want to minimize a deterministic objective function $f(x)$.
• Step 0: Given an initial point $x_0$, an initial trust-region radius $\Delta_0$, and constants $0 < \eta_0 \le \eta_1 < 1$ and $0 < \gamma_1 \le \gamma_2 < 1$, set $k = 0$.
• Step 1: Compute a step $d_k$ with $\|d_k\| \le \Delta_k$ that "sufficiently reduces" the local model $m_k(d)$, constructed by Taylor expansion (to second order) around $x_k$.
• Step 2: Compute
  $\rho_k = \dfrac{f(x_k) - f(x_k + d_k)}{m_k(0) - m_k(d_k)}$
  If $\rho_k \ge \eta_0$, define $x_{k+1} = x_k + d_k$; otherwise define $x_{k+1} = x_k$.
• Step 3: Update the radius:
  $\Delta_{k+1} = \gamma_1 \Delta_k$ if $\rho_k < \eta_0$,
  $\Delta_{k+1} = \Delta_k$ if $\eta_0 \le \rho_k < \eta_1$,
  $\Delta_{k+1} \in [\Delta_k, \infty)$ if $\rho_k \ge \eta_1$.
  Increment $k$ by 1 and go to Step 1.
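To make the framework concrete, here is a minimal runnable sketch of such a deterministic trust-region loop (an illustration, not Conn et al.'s exact algorithm: the constants, the Newton/Cauchy step rule, and the fixed expansion factor are simplifying assumptions):

```python
import numpy as np

def tr_step(g, H, delta):
    """Take the Newton step if the model Hessian is positive definite and the
    step fits inside the region; otherwise fall back to the Cauchy point."""
    try:
        if np.all(np.linalg.eigvalsh(H) > 0):
            d = np.linalg.solve(H, -g)
            if np.linalg.norm(d) <= delta:
                return d
    except np.linalg.LinAlgError:
        pass
    gnorm, gHg = np.linalg.norm(g), g @ H @ g   # Cauchy point along -g
    tau = delta / gnorm if gHg <= 0 else min(gnorm**2 / gHg, delta / gnorm)
    return -tau * g

def trust_region(f, grad, hess, x0, delta0=1.0, eta0=0.1, eta1=0.75,
                 gamma1=0.5, gamma2=2.0, tol=1e-8, max_iter=500):
    """Basic TR loop: build the local model, take a step, compute the ratio
    rho_k, accept/reject the iterate, and update the radius."""
    x, delta = np.asarray(x0, float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        d = tr_step(g, H, delta)
        pred = -(g @ d + 0.5 * d @ H @ d)        # m_k(0) - m_k(d_k) > 0
        rho = (f(x) - f(x + d)) / pred
        if rho >= eta0:
            x = x + d                            # accept the iterate
        if rho < eta0:
            delta *= gamma1                      # shrink on failure
        elif rho >= eta1:
            delta *= gamma2                      # expand when very successful
    return x

# Example: the (deterministic) Rosenbrock function used later in the talk.
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0], 200.0]])
print(trust_region(f, grad, hess, [-1.2, 1.0]))  # converges to about (1, 1)
```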
Trust Region Method
[Figure: trust-region iterations approaching the local minimum $x^*$]
Comparison between RSM and TR
• Similarity: both build a local model to approximate the response function and use it to generate the search direction.
• Differences:
  TR: developed for deterministic optimization and has a nice convergence property; cannot handle the stochastic case; requires an explicit objective function, gradient, and Hessian matrix.
  RSM: can handle the stochastic case and has well-studied DOE techniques; human involvement is required and it has no convergence property.
• Idea: combine these two methods.
STRONG
• Stochastic TRust RegiON Gradient-Free Method
• "Gradient-Free": no direct gradient measurement; rather, the algorithm is based on an approximation to the gradient (Spall, 2003; Fu, 2005)
• Combines RSM and TR
• Consists of two algorithms:
  Main algorithm: approach the optimal solution (major framework)
  Sub-algorithm: obtain a satisfactory solution within the trust region
Stochastic Trust Region
• Use a "response surface" model to replace the Taylor expansion ($k$: iteration counter):
  $m_k(d) = L(x_k) + \nabla L(x_k)^T d + \frac{1}{2} d^T H(x_k) d$ (deterministic model)
  $r_k(d) = \hat{L}(x_k) + \hat{\nabla} L(x_k)^T d + \frac{1}{2} d^T \hat{H}(x_k) d$ (stochastic model)
• Use $\hat{\rho}_k = \dfrac{\bar{Q}(x_k) - \bar{Q}(x_{k+1})}{r_k(0) - r_k(d_k)}$ to replace $\rho_k = \dfrac{L(x_k) - L(x_{k+1})}{m_k(0) - m_k(d_k)}$
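Concretely, the estimated ratio compares the observed improvement (sample means of simulated responses) against the improvement predicted by the fitted surface; a minimal sketch, with illustrative variable names:

```python
import numpy as np

def estimated_ratio(obs_center, obs_candidate, r_at_zero, r_at_step):
    """rho_hat_k = (Qbar(x_k) - Qbar(x_{k+1})) / (r_k(0) - r_k(d_k)):
    observed decrease in sample means over the decrease predicted by the
    fitted response-surface model r_k."""
    return ((np.mean(obs_center) - np.mean(obs_candidate))
            / (r_at_zero - r_at_step))
```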
STRONG: Main Algorithm
[Flowchart]
Stage I: Initialization → fit a first-order model → perform a line search → reduction test. If the test passes, the iterate is accepted and Stage I repeats; once the iterates are close to the optimal solution, the algorithm moves to Stage II.
Stage II: Fit a second-order model → solve the subproblem → compute the ratio → check whether the iterate is accepted → run the sufficient reduction test → update the trust/sampling region. If the iterate is rejected or the test fails, the sub-algorithm is invoked.
Trust Region & Sampling Region
• Trust Region: $T_k = \{x \in R^n : \|x - x_k\| \le \Delta_k^T\}$, where $\Delta_k^T$ is the radius of the trust region at iteration $k$
• Sampling Region: $S_k = \{x \in R^n : \|x - x_k\| \le \Delta_k^S\}$, where $\Delta_k^S$ is the radius of the sampling region at iteration $k$
• The initial radii $\Delta_0^T$ and $\Delta_0^S$ are determined by the user in the initialization stage ($\Delta_0^S \le \Delta_0^T$); later they shrink/expand by the same ratio automatically
[Figure: the sampling region nested inside the trust region]
Select Appropriate DOE
• For constructing the first- and second-order models in Stage I and Stage II.
• Currently requires an orthogonal design for the second-order model to guarantee the consistency of the gradient estimator.
Estimation Method in STRONG
Given an appropriate design strategy and an initial sample size for the center point:
• Intercept estimator
  $\hat{L}(x_k) = \bar{Q}(x_k) = \dfrac{1}{N_k} \sum_{i=1}^{N_k} Q(x_k, \omega_i)$, where $Q(x_k, \omega_i)$ represents the $i$-th observation at the point $x_k$ and $N_k$ is determined by the algorithm.
• Gradient and Hessian estimators
  Suppose we have $n$ design points and the response values are $y_1, y_2, \ldots, y_n$, respectively, with $y_i = L(x_i) + \epsilon_i$. Let $X$ be the design matrix and $Y = (y_1, y_2, \ldots, y_n)^T$; then $\hat{\nabla} L(x_k)$ and $\hat{H}(x_k)$ are obtained from the least-squares estimate $(X^T X)^{-1} X^T Y$.
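As a small illustration of these least-squares estimators (a sketch: the 2^p factorial design, the toy linear response, and all names here are assumptions for illustration, not the paper's setup), fitting a first-order model around a center point:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_first_order(oracle, x_center, radius, reps=2):
    """Fit y = b0 + g.d by ordinary least squares on a 2^p factorial design
    around x_center; returns the intercept and gradient estimates, i.e. the
    components of (X'X)^{-1} X'Y with X = [1, d]."""
    p = len(x_center)
    # 2^p factorial design points at the corners of the sampling region.
    corners = np.array(np.meshgrid(*[[-radius, radius]] * p)).T.reshape(-1, p)
    D = np.repeat(corners, reps, axis=0)
    Y = np.array([oracle(x_center + d) for d in D])
    X = np.hstack([np.ones((len(D), 1)), D])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[0], beta[1:]          # L_hat(x_k), grad_hat L(x_k)

# Toy noisy response: true gradient is (2, -3) everywhere.
oracle = lambda x: 5 + 2 * x[0] - 3 * x[1] + rng.normal(0, 0.1)
b0, g = fit_first_order(oracle, np.array([1.0, 2.0]), radius=0.5)
print(b0, g)   # intercept near 5 + 2 - 6 = 1, gradient near (2, -3)
```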
Decide the Moving Direction and Step Size
• Definition (Subproblem)
  $d_k = \arg\min_{\|d\| \le \Delta_k^T} r_k(d) = \hat{L}(x_k) + \hat{\nabla} L(x_k)^T d + \frac{1}{2} d^T \hat{H}(x_k) d$
• Determine whether the new iterate solution is accepted:
  if $\hat{\rho}_k \le \eta_0$, the solution is rejected; if $\hat{\rho}_k > \eta_0$, the solution is accepted
• Definition (Reduction Test), for Stage I:
  $L(x_k) - L(x_k + d_k) > 0$
• Definition (Sufficient Reduction Test), for Stage II:
  $L(x_k) - L(x_k + d_k) \ge \dfrac{1}{2} \|\hat{\nabla} L(x_k)\| \min\left(\Delta_k^T, \dfrac{\|\hat{\nabla} L(x_k)\|}{\|\hat{H}(x_k)\|}\right)$
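A direct transcription of these two checks (a sketch; in practice the left-hand sides must themselves be estimated from simulation output, which is why the hypothesis-testing scheme in the backup slides is needed):

```python
import numpy as np

def accept_iterate(rho_hat, eta0=0.01):
    """Reject the candidate solution when the estimated ratio is too small."""
    return rho_hat > eta0

def sufficient_reduction(L_center, L_candidate, g_hat, H_hat, delta_t):
    """Stage-II test: the decrease must reach the classic Cauchy decrease
    0.5 * ||g|| * min(delta_T, ||g|| / ||H||)."""
    gnorm = np.linalg.norm(g_hat)
    hnorm = max(np.linalg.norm(H_hat, 2), 1e-12)   # guard a zero Hessian
    required = 0.5 * gnorm * min(delta_t, gnorm / hnorm)
    return (L_center - L_candidate) >= required
```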
Three Situations Where We Cannot Find a Satisfactory Solution
• The local approximation model is poor
• The step size is too large
• Sampling error in the observed responses at $x_k$ and $x_{k+1}$
Solutions
• Shrink the trust region and sampling region
• Increase the number of replications at the center point
• Add more design points
• Collect all the visited solutions within the trust region and increase the number of replications for each of them
STRONG: Sub-algorithm (Trust Region)
[Figure: the trust region shrinking across sub-algorithm iterations while the visited solutions $\{x_{k_0}, x_{k_1}, x_{k_2}, x_{k_3}, x_{k_4}\}$ accumulate]
Sub-algorithm (Sampling Region)
[Figure: the sampling region shrinking in step with the trust region]
STRONG: Sub-algorithm
[Flowchart] Employ a proper orthogonal design → construct the second-order model → solve the subproblem → compute the ratio → check whether the iterate is accepted → run the sufficient reduction test; if it passes, return to the main algorithm. If the iterate is rejected or the test fails, update the reject-solution set, update the trust/sampling region, update the best solution in the reject-solution set, and repeat.
Implementation Issues
• Initial solution
• Scaling problems
• Experimental designs
• Variance reduction techniques
• Timing to employ the “sufficient reduction” test
• Stopping rules
Advantages of STRONG
• Allows unequal variances
• Has the potential to solve high-dimensional problems with efficient DOE
• Fully automated
• Local convergence property
Limitations of STRONG
• Computationally intensive if the problem is large-scale
• Slow convergence if the variables are ill-scaled
Preliminary Numerical Evaluation (I)
• Rosenbrock test function: $Q(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + \epsilon$, with $\epsilon \sim$ i.i.d. $N(0, \sigma^2)$
• The minimal solution is located at (1, 1) and the minimal value of the objective function is 0
• Full factorial design for Stage I and central composite design for Stage II
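This oracle is easy to reproduce; a minimal sketch (assuming the noise variance of 10 used in the cases below):

```python
import numpy as np

rng = np.random.default_rng(1)

def rosenbrock_oracle(x, var=10.0):
    """Noisy Rosenbrock response Q(x) = 100(x2 - x1^2)^2 + (1 - x1)^2 + eps,
    eps ~ i.i.d. N(0, var); the underlying L(x) is minimized at (1, 1)."""
    x1, x2 = x
    return 100 * (x2 - x1**2)**2 + (1 - x1)**2 + rng.normal(0, np.sqrt(var))

# At the Case 1 starting point (30, -30), L(x) - L(x*) = 86490841,
# matching the first table entry below.
print(rosenbrock_oracle(np.array([30.0, -30.0])))
```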
The Performance of STRONG
• Case 1: initial solution (30, -30); variance of noise 10; sample size for each design point 2

[Figure: distance to optimum $L(x_k) - L(x^*)$ versus number of function evaluations]

# of observations | $L(x_k) - L(x^*)$
0 | 86490841
120 | 1680400
300 | 49351
420 | 11921
530 | 62.84
600 | 24.24
680 | 1.58
The Performance of FDSA
• Case 2: initial solution (30, -30); variance of noise 10; parameter bounds (-100, 100)

Number of observations | $L(x_k) - L(x^*)$
0 | 86490841
10 | 9801000000 (diverged)
20 | 9801000000 (diverged)
100 | 9801000000 (diverged)
The Performance of FDSA with a Good Starting Solution
• Case 3: initial solution (3, 3); variance of noise 10; parameter bounds (0, 5)

[Figure: distance to optimum $L(x_k) - L(x^*)$ versus number of function evaluations]

# of observations | $L(x_k) - L(x^*)$
0 | 3604
20 | 1566.7
40 | 52.02
60 | 1.01
80 | 0.82
100 | 0.77
Future Research
• Large-scale problems: design of experiments; variance reduction techniques
• Test on practical problems
• Ill-scaled problems: iteratively adapt the shape of the trust region
Thanks!
Questions?
Trust Region and Line Search
[Figure: contours of $f(x)$ and of the local model $r(x)$, comparing the trust-region step with the line-search step from $x_k$ to $x_{k+1}$ on the way to $x^*$]
Hypothesis Testing Scheme
• Hypothesis testing:
  $H_0$: $x_k + d_k$ cannot yield sufficient reduction
  $H_1$: $x_k + d_k$ can yield sufficient reduction
• The Type I error $\alpha_k$ is required to satisfy $\alpha_{k+1} \le \alpha_k$
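One plausible instantiation of this test (an assumption on my part; the deck does not spell out the test statistic) is a one-sided two-sample t-test on the simulated responses:

```python
import numpy as np
from scipy import stats

def sufficient_reduction_test(obs_center, obs_candidate, required, alpha):
    """Reject H0 ('x_k + d_k cannot yield sufficient reduction') when the
    candidate's mean response is significantly below the center's mean minus
    the required reduction, at Type I error level alpha."""
    shifted = np.asarray(obs_candidate) + required
    _, p = stats.ttest_ind(shifted, obs_center,
                           equal_var=False, alternative='less')
    return p < alpha      # True: the candidate passes the test
```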
Relevant Definitions in the Sub-algorithm
• Reject-Solution Set
  $\mathcal{X}_{k_i}$ denotes the reject-solution set, which collects all the solutions visited up to iteration $k_i$ in the sub-algorithm: $\mathcal{X}_{k_i} = \{x_{k_1}, x_{k_2}, \ldots, x_{k_i}\}$
• Simulation Allocation Rule (SAR) (Hong and Nelson, 2006)
  The SAR guarantees that $a_{k_i}(x) \ge 1$ (the additional observations allocated to $x$ at iteration $k_i$) if $x$ is a newly visited solution at iteration $k_i$, and that $\lim_{i \to \infty} N_{k_i}(x) = \infty$ for all visited solutions $x$
Features of the Sub-algorithm
• The trust region and sampling region keep shrinking
• The sample size at the center point $x_{k_0}$ keeps increasing
• Design points are accumulated
• The quality of the local model keeps improving
• The optimization step size becomes more conservative ($\Delta_{k_i}^T$ shrinks)
• The sampling error is reduced for each visited point in the set $\{x_{k_0}, x_{k_1}, \ldots, x_{k_i}\}$
Intuitive explanation: as the sub-algorithm iterates,
$\hat{\rho}_{k_i} = \dfrac{\bar{Q}(x_{k_0}) - \bar{Q}(x_{k_i})}{r_{k_i}(0) - r_{k_i}(d_{k_i})}$ approaches $\rho_{k_i} = \dfrac{L(x_{k_0}) - L(x_{k_i})}{m_{k_i}(0) - m_{k_i}(d_{k_i})}$
Significant Theorems in STRONG
• (Theorem 3.2.3) In the sub-algorithm, if $\hat{\nabla} Q(x_{k_0}) \ne 0$, then there exists $K$ such that when $k_i \ge K$, a satisfactory solution $d_{k_i}$ will eventually be found.
• (Corollary 3) In the sub-algorithm, if $\nabla L(x_{k_0}) \ne 0$, then $\lim_{i \to \infty} \Delta_{k_i}^T \ne 0$.
• (Theorem 3.2.4) For any initial point $x_0$, the algorithm generates a sequence of iterates $\{x_k\}$ with $\lim_{k \to \infty} \hat{\nabla} Q(x_k) = 0$.
Some Problems with TR Applied in the Stochastic Case
• TR is developed for deterministic optimization, where $L(x)$ and $\nabla L(x)$ are available
• Bias in the intercept and gradient estimation: is $\hat{Q}(x) = L(x)$? Is $\hat{\nabla} Q(x) = \nabla L(x)$?
• Ratio: the deterministic ratio
  $\rho_k = \dfrac{\text{Actual improvement}}{\text{Predicted improvement}} = \dfrac{L(x_k) - L(x_k + d_k)}{m_k(0) - m_k(d_k)}$
  must be replaced by the estimate $\hat{\rho}_k = \dfrac{\bar{Q}_{N_k}(x_k) - \bar{Q}_{N_k}(x_{k+1})}{r_k(0) - r_k(d_k)}$
• Inconsistent comparison basis. Notice: $m_k(0) = L(x_k)$, whereas $r_k(0) = \bar{Q}_{N_k}(x_k)$ is only an estimate of $L(x_k)$
General Properties of the Algorithm
1. $\hat{L}(x_k) \rightarrow L(x_k)$ a.s.
2. $\hat{\nabla} L(x_k) \rightarrow \nabla L(x_k)$ a.s.
3. If $\hat{\nabla} L(x_k) \ne 0$, then $r_k(0) - r_k(d_k) > 0$; therefore the algorithm won't get stuck at a nonstationary point
Algorithm Assumptions
• For $\epsilon_x$: the $\epsilon_x$'s are independent
• For the local approximation model: for every $x$, there exist constants $\kappa_2$ and $\eta_2$ such that $|\hat{Q}(x)| \le \kappa_2$ and $\|\hat{B}(x)\| \le \eta_2$ w.p.1
Literature Review (I)
• Stochastic Approximation
  Robbins-Monro (1951) algorithm: gradient-based
  Kiefer-Wolfowitz (1952) algorithm: uses finite differences as the gradient estimate
  The basic form of stochastic approximation is
  $x_{n+1} = x_n - a_n \hat{g}(x_n)$
  where $\hat{g}(x_n)$ is the finite-difference gradient estimate.
  Strength: under proper conditions, $x_n \rightarrow x^*$ a.s.
  Weaknesses:
  - The gain sequence $\{a_n\}$ needs to be tuned manually
  - Suffers from slow convergence in some cases (Andradottir, 1998)
  - When the objective function grows faster than quadratically, it can fail to converge (Andradottir, 1998)
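For reference, a minimal Kiefer-Wolfowitz FDSA sketch (the gain-sequence constants below are illustrative choices, underscoring the tuning issue; the box projection mirrors the parameter bounds used in the FDSA cases above):

```python
import numpy as np

def fdsa(oracle, x0, a=0.1, c=0.5, alpha=0.602, gamma=0.101,
         bounds=None, n_iter=100):
    """Kiefer-Wolfowitz finite-difference stochastic approximation:
    x_{n+1} = x_n - a_n * g_hat(x_n), with a central finite-difference
    gradient estimate g_hat built from 2p noisy evaluations."""
    x = np.asarray(x0, float)
    p = len(x)
    for n in range(1, n_iter + 1):
        a_n, c_n = a / n**alpha, c / n**gamma      # decaying gain sequences
        g_hat = np.zeros(p)
        for i in range(p):
            e = np.zeros(p); e[i] = c_n
            g_hat[i] = (oracle(x + e) - oracle(x - e)) / (2 * c_n)
        x = x - a_n * g_hat
        if bounds is not None:                     # project back into the box
            x = np.clip(x, bounds[0], bounds[1])
    return x

# e.g. the Case 2 setup: fdsa(rosenbrock_oracle, [30.0, -30.0], bounds=(-100, 100))
```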
Literature Review (II)
• Response Surface Methodology
  Proposed by Box and Wilson (1951): a sequential experimental procedure to determine the best input combination so as to maximize the output or yield rate.
  Strengths:
  - A general procedure
  - Powerful statistical tools such as design of experiments, regression analysis, and ANOVA are all at its disposal (Fu, 1994)
  Weaknesses:
  - No convergence guarantee
  - Human involvement needed
Literature Review (III)
• Other heuristic methods: Genetic Algorithm, Evolutionary Strategies, Simulated Annealing, Tabu Search, Nelder and Mead's Simplex Search
• Strength: can "usually" obtain a satisfactory solution
• Weakness: no general convergence theory
Literature Review (Fu, 1994; Fu, 2002)

Methodology | Strengths | Weaknesses
Stochastic Approximation | Various gradient estimation methods; converges under proper conditions | Converges slowly when the objective function is flat; fails to converge when the objective function is steep; the gain sequence sometimes needs manual tuning; uses only gradient information
Sample-Path Optimization | Easy to extend to situations where the objective function cannot be evaluated analytically | Needs excessive function evaluations
Response Surface Methodology (RSM) | Systematic and sequential procedure; efficient and effective; well-studied statistical tools as back-up | No convergence guarantee; not automated
Heuristic Methods | Usually obtain a satisfactory solution; simple and efficient | No general convergence guarantee