


IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-26, NO. 2, APRIL 1981

Fig. 4. Decomposition of a multistage problem.

η_L(t): x(0), …, x(t), u_L(0), …, u_L(t−1)

η_F(t): x(0), …, x(t), u_F(0), …, u_F(t−1),

i.e., each player has perfect memory of the state (x) and his own control history. The point here is that given x(t), u_L(t−1), and x(t−1), we can in general calculate u_F(t−1), or vice versa [4]. As shown in Fig. 4, the leader at time t can choose his decision u_L(t) based on u_F(t−1) (or the entire past decisions of the follower); thus L essentially imposes a kind of reversed information structure on F. Note that whoever gets to declare his strategy first becomes the leader. This approach requires separate treatments at t = T−1 and t = 0, by solving u_L(T−1) first, which has a permanently optimal solution, and by considering u_F(−1) to be fixed at zero, as is evident in the works of Basar and Tolwinski. Also, bear in mind that the distinction between closed-loop Stackelberg controls and Stackelberg feedback strategies [1] still exists. With this understanding, the closed-loop Stackelberg strategy for the linear-quadratic deterministic problem can be solved using the basic idea discussed in Section III. It is clear that many strategies γ_L are possible due to the enormous flexibility here. For example, L's strategy may punish a nonrational behavior of F for one stage only (as in [4]), two stages, etc., or for the rest of the game (as in [2]). It is thus possible that different γ_L may enjoy various advantages.

VIII. CONCLUSION

In this paper we have identified two reasons why L may be interested in the decision of F. First, knowing F's strategy and his decision may enable L to infer the states of nature [R1)]. Second, F's decision may directly affect L's payoff [R2)]. We then discussed mechanisms by which L can induce F to behave cooperatively. In case R1) the mechanism is to transform F's payoff function so that it looks like L's own. In case R2) it is to directly make any choice of F's strategy other than the cooperative one unpalatable. In either case the crucial requirement is that we have the reversed information structure as defined in Section III. It serves as a unifying ingredient in diverse applications.

ACKNOWLEDGMENT

The authors would like to thank Dr. G. J. Olsder for many insightful comments on this subject.

REFERENCES

[1] M. Simaan and J. B. Cruz, Jr., "Additional aspects of the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, pp. 613-626, June 1973.
[2] T. Basar and H. Selbuz, "Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems," IEEE Trans. Automat. Contr., vol. AC-24, pp. 166-179, Apr. 1979.
[3] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Nonclassical control problems and Stackelberg games," IEEE Trans. Automat. Contr., vol. AC-24, pp. 155-166, Apr. 1979.
[4] B. Tolwinski, "Closed-loop Stackelberg solution to multi-stage linear-quadratic game," J. Optimiz. Theory Appl., to be published.
[5] M. Spence, "Nonlinear pricing and welfare," J. Public Econ., vol. 8, pp. 1-18, Aug. 1977.
[6] T. Groves and M. Loeb, "Incentives in a divisionalized firm," Management Sci., vol. 25, pp. 221-230, Mar. 1979.
[7] H. von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952.
[8] C.-I Chen and J. B. Cruz, Jr., "Stackelberg solution for two-person games with biased information patterns," IEEE Trans. Automat. Contr., vol. AC-17, pp. 791-798, Dec. 1972.
[9] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, pp. 533-555, May 1973.
[10] T. Basar, "Information structures and equilibria in dynamic games," in New Trends in Dynamic System Theory and Economics, M. Aoki and A. Marzollo, Eds. New York: Academic, 1978.
[11] T. Basar and G. J. Olsder, "Team-optimal closed-loop Stackelberg strategies in hierarchical control problems," Memo. NR. 242, Dep. Appl. Math., Twente Univ. of Technol., The Netherlands, Feb. 1979, pp. 1-25.
[12] Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems, Part I," IEEE Trans. Automat. Contr., vol. AC-17, pp. 15-22, Feb. 1972.
[13] Y.-C. Ho, D. M. Chiu, and R. Muralidharan, "A model for optimal peak load pricing by electric utilities," in Proc. Lawrence Symp. on Syst. and Decision Sci., Berkeley, CA, Oct. 1977, pp. 16-23.
[14] I. Pressman, "A mathematical formulation of the peak-load pricing problem," Bell J. Econ. Management Sci., vol. 1, pp. 304-326, Autumn 1970.
[15] P. Dasgupta, P. Hammond, and E. Maskin, "The implementation of social choice rules: Some general results on incentive compatibility," Rev. Econ. Studies, vol. 46, pp. 185-216, Apr. 1979.
[16] T. Basar, "Hierarchical decisionmaking under uncertainty," in Dynamic Optimization and Mathematical Economics, P. T. Liu, Ed. New York: Plenum, 1980.
[17] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Sufficient conditions for Stackelberg and Nash strategies with memory," J. Optimiz. Theory Appl., Sept. 1980.
[18] Y.-C. Ho, P. B. Luh, and G. J. Olsder, "A control theoretic view on incentives," in Proc. 4th Int. Conf. on Analysis and Optimization of Systems, INRIA, Versailles, France, Dec. 1980; also Springer-Verlag Lecture Notes Series, to be published.

A New Computational Method for Stackelberg and Min-Max Problems by Use of a Penalty Method

KIYOTAKA SHIMIZU AND EITARO AIYOSHI

Abstract—This paper is concerned with the Stackelberg problem and the min-max problem in competitive systems. The Stackelberg approach is applied to the optimization of two-level systems in which the higher level determines the optimal value of its decision variables (parameters for the lower level) so as to minimize its objective, while the lower level minimizes its own objective with respect to the lower level decision variables under the given parameters. Meanwhile, the min-max problem is to determine a min-max solution such that a function maximized with respect to the maximizer's variables is minimized with respect to the minimizer's variables. This problem is also characterized by a parametric approach in a two-level scheme. New computational methods are proposed here; that is, a series of nonlinear programming problems approximating the original two-level problem by application of a penalty method to a constrained parametric problem in the lower level are solved iteratively. It is proved that a sequence of approximated solutions converges to the correct Stackelberg solution, or the min-max solution. Some numerical examples are presented to illustrate the algorithms.

I. INTRODUCTION

This paper is concerned with the Stackelberg problem and the min-max problem in competitive systems.

The Stackelberg solution [14], [12], [13], [8] is the most rational one to answer the question: what will be the best strategy for Player 1, who knows Player 2's objective function and has to choose his strategy first, while Player 2 chooses his strategy after the announcement of Player 1's strategy? A problem in the field of competitive economics is one such problem.

The min-max problem [3], [2], [4], [9], [6] is formulated so that a function, maximized with respect to the maximizer's variables, is minimized with respect to the minimizer's variables. The min-max solution is optimal for the minimizer against the worst possible case that might be taken by the opponent (the maximizer). Thus, the min-max concept plays an important role in game theory.

Many articles on equilibrium solutions, such as the Nash solution and the saddle-point solution, have been published. However, the Stackelberg and the min-max solutions differ from them in that Player 2 makes his decision after Player 1 does, i.e., a precedence in decision-making order exists. Therefore, both the Stackelberg and the min-max problems can be represented by hierarchical optimization problems, where part of the constraints in the upper level problem consists of a parameterized optimization problem in the lower level. Problems of this type can in principle be solved by a parametric approach. However, that approach also causes difficulties in developing a practical computational method. Some computational methods for the min-max problem were proposed in [4], [6], [9], [11], where a gradient-type algorithm was discussed [4], [6] and a relaxation method [9], [11]. In [15], a penalty method was applied to calculate a saddle-point solution. However, papers on the Stackelberg and the min-max solutions are comparatively few, most of them limited to separate-constraints cases where the constraints for each player do not depend on the other player's strategies.

Manuscript received September 4, 1979; revised February 18, 1980 and June 30, 1980. Paper recommended by A. J. Laub, Chairman of the Computational Methods and Discrete Systems Committee.

The authors are with the School of Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan.

0018-9286/81/0400-0460$00.75 © 1981 IEEE

In this paper we consider the most general problems with unseparate constraints, and propose new computational methods based on the theory of a barrier method (an interior penalty method [1], [5]). Our method is based on the idea that a series of nonlinear programming problems approximating the original hierarchical one are solved iteratively by applying the barrier method to the lower level problem. It is proved that a sequence of approximate solutions converges to the true solution of the original problem.

II. FORMULATION OF THE STACKELBERG PROBLEM

In a two-player game, Player 1 has all information about Player 2's objective function and constraints, while Player 2 knows nothing about Player 1 but the strategy announced by Player 1. Then, until Player 1 announces his own optimal strategy, Player 2 cannot solve his optimizing problem responding to Player 1's strategy. In such a situation, Player 1 has the leadership in playing the game and can decide the best strategy in anticipation of Player 2's act. The optimal solution for Player 1 with precedence in decision is called a Stackelberg solution with Player 1 as leader. The Stackelberg solution is the optimal strategy for the leader when the follower reacts by playing optimally.

Let x ∈ R^{n1}, f1 ∈ R, and g1 ∈ R^{m1} be the decision variable vector, the objective function, and the vector constraint function for Player 1, respectively, and similarly y ∈ R^{n2}, f2 ∈ R, and g2 ∈ R^{m2} for Player 2. The two functions f1(x, y) and f2(x, y) map R^{n1} × R^{n2} into the real line, and Player 1 wishes to minimize f1 while Player 2 wishes to minimize f2. Then Player 2's minimal solution ŷ(x) responding to a Player 1's strategy x satisfies the following relation:

f2(x, ŷ(x)) ≤ f2(x, y)  for all y ∈ Y satisfying g2(x, y) ≤ 0.   (1)

For such ŷ(x), if there exists x^{s1} such that

f1(x^{s1}, ŷ(x^{s1})) ≤ f1(x, ŷ(x))  for all x ∈ X satisfying g1(x, ŷ(x)) ≤ 0.   (2)

x^{s1} is called the Stackelberg solution for Player 1 as the leader, and y^{s1} = ŷ(x^{s1}) is an optimal solution for Player 2 in response to x^{s1}. In other words, the Stackelberg solution x^{s1} can be defined to be a solution to the following two-level decision problem:

min_x f1(x, ŷ(x))   (3a)

subject to  x ∈ X   (3b)

g1(x, ŷ(x)) ≤ 0   (3c)

f2(x, ŷ(x)) = min_y f2(x, y)   (3d)

subject to  y ∈ Y   (3e)

g2(x, y) ≤ 0   (3f)

where ŷ(x) denotes a parametric minimal solution to the lower level problem (3d)-(3f). Here we assume the existence of (x, ŷ(x)) satisfying (3b)-(3f). For simplicity of notation, we shall write x^{s1} and y^{s1} as x^s and y^s, respectively, in the following sections.

III. APPROXIMATE TECHNIQUE USING A BARRIER METHOD FOR THE STACKELBERG SOLUTION

Our approach to the problem (3) begins with replacing the lower level problem by an unconstrained problem based on the theory of a barrier method. The parametric minimization problem (3d)-(3f) is transformed into an unconstrained parametric minimization problem with the augmented objective function P, which combines the objective with the constraint functions. The parametric solution ŷ(x), regarded as a function of x, can then be approximated by an implicit function satisfying a stationarity condition for the resulting unconstrained lower level problem. A sequence of approximated problems will be solved by use of appropriate nonlinear programming techniques.

Let the feasible region Y be given, with the vector function h2 ∈ R^{q2}, as Y = {y | h2(y) ≤ 0}, and let

S(x) = {y | h2(y) ≤ 0, g2(x, y) ≤ 0}.

We begin with imposing the following assumption:

a) int S(x), the interior of S(x), which is given by

int S(x) = {y | h2(y) < 0, g2(x, y) < 0},

is not empty for any fixed x, and its closure coincides with S(x).

Let us define the augmented objective function on R^{n1} × int S(x), where x is any point in R^{n1}, as

P^r(x, y) = f2(x, y) + r φ(g2(x, y), h2(y))   (4)

where r > 0 and φ is a continuous function, defined on the negative domain, such that

φ(g2(x, y), h2(y)) remains finite  as long as y ∈ int S(x),
φ(g2(x, y), h2(y)) → +∞  as y → ∂S(x).

Here ∂S(x) denotes the boundary of S(x).

We consider an unconstrained minimization problem with the augmented objective function:

min_y P^r(x, y)   (5)

in place of the constrained minimization problem:

min_y f2(x, y)

subject to  y ∈ S(x).   (6)
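To make the replacement of (6) by (5) concrete, the following one-dimensional sketch (our own illustration, not the authors' code) uses an inverse barrier, one admissible choice of φ in the SUMT family, on the toy lower-level problem min_y (y − 3)² subject to y − 1 ≤ 0, whose constrained minimizer is y = 1; the helper name golden_min is ours.

```python
# Illustrative sketch (not from the paper): interior penalty convergence
# for a fixed upper-level decision.  Lower level: min_y f2(y) = (y - 3)^2
# subject to g(y) = y - 1 <= 0, so the constrained minimizer is y = 1.
# Inverse barrier: P_r(y) = (y - 3)^2 + r * (-1/g(y)) = (y - 3)^2 + r/(1 - y).

def golden_min(fun, lo, hi, iters=200):
    """Golden-section search for the minimizer of a unimodal fun on (lo, hi)."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if fun(c) < fun(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def y_r(r):
    """Minimizer of the barrier-augmented objective P_r on the interior y < 1."""
    P = lambda y: (y - 3.0) ** 2 + r / (1.0 - y)
    return golden_min(P, -5.0, 1.0 - 1e-12)

for r in (1.0, 1e-2, 1e-4, 1e-6):
    print(r, y_r(r))   # minimizers creep toward the constrained solution y = 1
```

As r decreases, the unconstrained minimizers y_r stay strictly interior but accumulate at the constrained solution, which is exactly the behavior the lemma below formalizes when x also varies.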

In order to apply the theory of a barrier method, we impose the following assumptions.

b) The functions f2(x, y) and g2(x, y) are continuous at any (x, y) ∈ R^{n1} × R^{n2}, and h2(y) is continuous at any y ∈ R^{n2}.

c) The set Y is compact. This implies that S(x) is also compact.

Then it is assured that the problems (5) and (6) have optimal solutions ŷ^r(x) ∈ int S(x) and ŷ(x) ∈ S(x), respectively, for any fixed x. Let us consider a sequence of the optimal solutions {ŷ^{r_k}(x)} for the problem (5) in response to a positive parameter sequence {r_k} strictly decreasing to zero. When the parameter x is fixed, it follows directly from the theory of a barrier method (an interior penalty function method [1], [5]) that any accumulation point of the sequence {ŷ^{r_k}(x)} is optimal for the problem (6). On the other hand, when the parameter x varies as a sequence {x^k}, the convergence of {ŷ^{r_k}(x^k)} is left unsettled. So we prepare the following lemma.

Lemma 1: Let {ŷ^{r_k}(x^k)} be a sequence of the optimal solutions to the problem (5) in response to a sequence {x^k} ⊂ X converging to x̄ and a positive sequence {r_k} strictly decreasing to zero. If assumptions a)-c)


become differentiable, whose gradient and Jacobian at (x, y) can be given as follows, respectively:

∇f̂1(x) = ∇_x f1(x, y) + [∇η(x)]^T ∇_y f1(x, y)
        = ∇_x f1(x, y) − [∇_{yx} P^r(x, y)]^T [∇_{yy} P^r(x, y)]^{-1} ∇_y f1(x, y)   (25)

∇ĝ1(x) = ∇_x g1(x, y) − ∇_y g1(x, y) [∇_{yy} P^r(x, y)]^{-1} ∇_{yx} P^r(x, y).   (26)

Then, the problem (22) is equivalent to the following problem under the relation (23):

min_x f1(x, η(x))   (27a)

subject to  h1(x) ≤ 0   (27b)

g1(x, η(x)) ≤ 0.   (27c)

From now on, let us try to solve the problem (27) instead of the problem (12). Note, however, that it is impossible to represent the function η(x) in an explicit form. This causes difficulties in directly applying nonlinear programming to the problem (27). In most iterative methods of NLP, however, the data needed for the computation are the values of the gradients of the objective and the constraint functions, as well as their function values at the current point. Let x^l be the current point. Then, in our case, the value y^l = η(x^l) coincides with the solution ŷ^r(x^l) to the problem (5) with x = x^l, which can be solved easily. By use of this value y^l, we can evaluate ∇f̂1(x^l) and ∇ĝ1(x^l) by (25) and (26) along with

f̂1(x^l) = f1(x^l, y^l),   ĝ1(x^l) = g1(x^l, y^l).

Only after the preparation of those data can we apply existing nonlinear programming methods, such as Zoutendijk's feasible direction method [16], to the problem (27).
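As a sanity check on the gradient formula (25), the sketch below (our own toy scalar instance, not from the paper; the functions f1, f2 and the inverse barrier are assumptions) computes the implicit derivative dŷ/dx = −P_yx/P_yy and compares the resulting chain-rule gradient of f̂1(x) = f1(x, ŷ^r(x)) with a central finite difference.

```python
# Numerical check (our own toy instance, not the paper's) of the implicit
# gradient (25).  Scalars: f1(x, y) = (x - 2)^2 + y^2,
# lower level f2(x, y) = (y - x)^2 s.t. y - 1 <= 0,
# inverse barrier P^r(x, y) = (y - x)^2 + r/(1 - y) on y < 1.
r = 0.1

def P_y(x, y):
    """dP^r/dy; strictly increasing in y on y < 1, so it has a unique root."""
    return 2.0 * (y - x) + r / (1.0 - y) ** 2

def y_hat(x):
    """Solve P_y(x, y) = 0 by bisection (P^r is strictly convex in y)."""
    lo, hi = -50.0, 1.0 - 1e-13
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if P_y(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def grad_formula(x):
    y = y_hat(x)
    P_yy = 2.0 + 2.0 * r / (1.0 - y) ** 3    # d^2 P^r / dy^2
    P_yx = -2.0                              # d^2 P^r / dy dx
    dy_dx = -P_yx / P_yy                     # implicit function theorem
    return 2.0 * (x - 2.0) + dy_dx * 2.0 * y # chain rule, cf. (25)

def grad_fd(x, h=1e-5):
    f = lambda x: (x - 2.0) ** 2 + y_hat(x) ** 2
    return (f(x + h) - f(x - h)) / (2.0 * h)

print(grad_formula(0.5), grad_fd(0.5))  # the two values agree closely
```

The agreement confirms that, once the lower-level barrier problem is solved accurately, (25)-(26) supply exactly the derivative information an upper-level NLP routine needs.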

IV. AN APPLICATION TO THE MIN-MAX PROBLEM

We consider the min-max problem as a special case of the Stackelberg problem (3). Setting f1 = −f2 = f in the problem (3), we have the following min-max problem:

min_x f(x, ŷ(x))   (28a)

subject to  x ∈ X   (28b)

g1(x, ŷ(x)) ≤ 0   (28c)

f(x, ŷ(x)) = max_y f(x, y)   (28d)

subject to  y ∈ Y   (28e)

g2(x, y) ≤ 0   (28f)

where Y = {y | h2(y) ≤ 0} and ŷ(x) is a parametric maximal solution to the lower level problem (28d)-(28f). The solution to the problem (28), which we call a min-max solution, is the best for Player 1 against the worst possible case that might be taken by Player 2, when Player 1 knows nothing about Player 2.

In a special case, when the constraints on x and y are independent (separate constraints), the min-max problem is formulated simply as

min_{x∈X} max_{y∈Y} f(x, y)

where X = {x | h1(x) ≤ 0} and Y = {y | h2(y) ≤ 0}.

Most studies on min-max problems have been limited to the separate type, not including the unseparate constraints (28c) and (28f). Danskin [3] presented the directional derivative of the maximal-valued function max_{y∈Y} f(x, y) = f(x, ŷ(x)). Using this, Bram [2] and Schmitendorf [10] presented optimality conditions for the min-max problem of the separate type in a form like the optimality conditions for usual nonlinear programming problems. As to computational methods, however, only a few have been proposed, based on a gradient method [4], [6] and a relaxation method [9], [11], and they are not applicable to min-max problems subject to the unseparate constraints. So we apply the solution method presented for the Stackelberg problem to the min-max problem (28).

Let us define the augmented objective function for the lower level problem as

P^r(x, y) = f(x, y) − r φ(g2(x, y), h2(y)),

and replace the constrained maximization problem (28d)-(28f) with the unconstrained maximization problem

max_y P^r(x, y).   (29)

Furthermore, since the objective function is common to both the upper and lower levels, this peculiarity makes it possible to transform the min-max problem (28) into the following problem:

min_x P^r(x, ŷ^r(x))   (30a)

subject to  x ∈ X   (30b)

g1(x, ŷ^r(x)) ≤ 0   (30c)

P^r(x, ŷ^r(x)) = max_y P^r(x, y).   (30d)

Notice that f(x, ŷ(x)) in (28a) is replaced with the maximal-valued function P^r(x, ŷ^r(x)) in (30a). This point distinguishes between the Stackelberg problem and the min-max problem.

Here we impose the following assumptions, corresponding to b) and d) in Section III.

b') The functions f(x, y) and g2(x, y) are continuous in (x, y) ∈ R^{n1} × R^{n2}, and h2(y) is continuous in y ∈ R^{n2}.

d') The function f(x, y) is strictly concave in y, and g2(x, y) and h2(y) are convex in y. Furthermore, φ(g2, h2) is an increasing convex function in (g2, h2).

Then the following lemma can be obtained.

Lemma 2: Let {ŷ^{r_k}(x^k)} be a sequence of the optimal solutions to the problem (29) in response to a sequence {x^k} ⊂ X converging to x̄ and a positive sequence {r_k} strictly decreasing to zero. If assumptions a), b'), c), and d') are satisfied, then it holds that

i) lim_{k→∞} ŷ^{r_k}(x^k) = ŷ(x̄)
ii) lim_{k→∞} P^{r_k}(x^k, ŷ^{r_k}(x^k)) = f(x̄, ŷ(x̄)).

Proof: i) Setting f2 = −f in Corollary 1 in Section III, we have this result immediately.

ii) Suppose that P^{r_k}(x^k, ŷ^{r_k}(x^k)) does not converge to f(x̄, ŷ(x̄));

then there exist a positive number ε and a positive integer K1 such that

|P^{r_k}(x^k, ŷ^{r_k}(x^k)) − f(x̄, ŷ(x̄))| ≥ 2ε  for all k > K1.   (31)

Here, consider an open ball B(ŷ(x̄); δ) ⊂ R^{n2} around ŷ(x̄); then the continuity of f at ŷ(x̄) implies the existence of a positive δ such that

|f(x̄, y) − f(x̄, ŷ(x̄))| < ε  for all y ∈ B(ŷ(x̄); δ).

Assumption a) enables us to choose another point y'' such that y'' ∈ int S(x̄) ∩ B(ŷ(x̄); δ). For such y'', it holds that

|f(x̄, y'') − f(x̄, ŷ(x̄))| < ε.   (32)

By the same reasoning as in the proof of Lemma 1, we have the existence of a sequence {y^k} ⊂ Y and a positive integer K2 such that

y^k ∈ S(x^k)  for all k > K2

and that y^k → y''. Since y'' ∈ int S(x̄), we have the existence of a positive integer K3 such that


|P^{r_k}(x^k, y^k) − f(x̄, y'')| < ε  for all k > K3   (33)

in a similar manner to the proof of Lemma 1. Set K = max(K1, K2, K3). Then, using (33), (32), and (31) in turn, we have the following relations for all k > K:

P^{r_k}(x^k, y^k) > f(x̄, y'') − ε
  > f(x̄, ŷ(x̄)) − 2ε ≥ P^{r_k}(x^k, ŷ^{r_k}(x^k)).

This contradicts the fact that ŷ^{r_k}(x^k) is a maximum solution to the problem (29) in response to x^k and r_k. Thus, we can conclude that P^{r_k}(x^k, ŷ^{r_k}(x^k)) converges to f(x̄, ŷ(x̄)). ∎

Let us impose the following assumption.

e') The functions f(x, y) and g1(x, y) are continuous in (x, y).

Then the following theorem can be obtained.

Theorem 2: Let {x^{r_k}} be the sequence of optimal solutions generated from the problem (30) in response to a positive sequence {r_k} strictly decreasing to zero. If assumptions a), b'), c), d'), e'), f), and g) are satisfied, then the sequence {x^{r_k}} has an accumulation point, any of which is optimal for the min-max problem (28).

Proof (Feasibility of the Accumulation Point): In a similar way to Theorem 1, we can prove feasibility.

(Optimality of the Accumulation Point): Suppose that x̄ does not solve the problem (28); then, in the same manner as in Theorem 1, we have the existence of x'' ∈ X such that

g1(x'', ŷ(x'')) < 0   (34)

and f(x'', ŷ(x'')) < f(x̄, ŷ(x̄)).

Here. let

/(f.p(i))-/(~".p(~"))=2-~. where-E>O. (35)

As x is fixed at x'', the theory of a barrier method yields that, as k → ∞,

P^{r_k}(x'', ŷ^{r_k}(x'')) → f(x'', ŷ(x'')).

That is, there exists a positive integer K'' such that

|P^{r_k}(x'', ŷ^{r_k}(x'')) − f(x'', ŷ(x''))| < ε  for all k > K''.   (36)

On the other hand, ii) in Lemma 2 implies the existence of a positive integer K̄ such that

|P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k})) − f(x̄, ŷ(x̄))| < ε  for all k > K̄.   (37)

Set K = max(K'', K̄). Then, using (36), (35), and (37) in turn, we have the following relations for all k > K:

P^{r_k}(x'', ŷ^{r_k}(x'')) < f(x'', ŷ(x'')) + ε
  = f(x̄, ŷ(x̄)) − ε < P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k})).

Since we find that (x'', ŷ^{r_k}(x'')) satisfies (30c) in a similar manner to the proof of Theorem 1, this relation contradicts the fact that x^{r_k} solves the problem (30). ∎

TABLE I

r_k     x^{r_k}   ŷ^{r_k}(x^{r_k})   f1(x^{r_k}, ŷ^{r_k}(x^{r_k}))   P^{r_k}(x^{r_k}, ŷ^{r_k}(x^{r_k}))
200     8.430     8.425              73.55                            126.9
50      8.992     8.988              81.88                            44.05
10      9.414     9.413              88.96                            13.63
1       9.730     9.729              94.74                            2.107
0.1     9.873     9.872              97.49                            0.5589
0.01    9.944     9.943              98.88                            0.1194

The computational aspect for the problem (30) is similar to that for the problem (12) in Section III, so we omit the computational consideration for the min-max problem.
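Although we omit the formal computational treatment, the scheme (30) is easy to exercise on a toy instance. The sketch below is our own example, not from the paper: it takes the strictly y-concave function f(x, y) = x² + xy − y² on Y = [−1, 1], whose min-max solution is the saddle point (0, 0), subtracts an inverse barrier as in (29), and minimizes the maximal-value surrogate P^r(x, ŷ^r(x)) of (30a) over a grid in x; the helper names are ours.

```python
# Toy min-max illustration of (30) (our own instance, not the paper's):
# f(x, y) = x^2 + x*y - y^2, Y = {-1 <= y <= 1}; min-max solution (0, 0).
# Inner maximization uses the subtracted barrier
# P^r(x, y) = f(x, y) - r * (1/(1 + y) + 1/(1 - y)) on -1 < y < 1.

def golden_max(fun, lo, hi, iters=120):
    """Golden-section search for the maximizer of a unimodal fun on (lo, hi)."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if fun(c) > fun(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def minmax(r, steps=400):
    f = lambda x, y: x * x + x * y - y * y
    P = lambda x, y: f(x, y) - r * (1.0 / (1.0 + y) + 1.0 / (1.0 - y))
    best = None
    for i in range(steps + 1):
        x = -2.0 + 4.0 * i / steps
        # inner level: P^r is concave in y, so golden section finds y_hat^r(x)
        y = golden_max(lambda y: P(x, y), -1.0 + 1e-9, 1.0 - 1e-9)
        val = P(x, y)                  # the surrogate objective of (30a)
        if best is None or val < best[0]:
            best = (val, x, y)
    return best

val, x, y = minmax(0.01)
print(x, y)   # both near the saddle point (0, 0)
```

Note that, per (30a), the outer loop compares the barrier value P^r rather than f itself; the recovered pair still approaches the saddle point as r shrinks.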

V. SOME NUMERICAL EXAMPLES

In order to illustrate the convergence properties, we shall give some examples for the Stackelberg problems.

Example 1: As a Stackelberg problem of the unseparate type, we present the following simple problem:

min_x x² + (ŷ(x) − 10)²

subject to  −x + ŷ(x) ≤ 0

0 ≤ x ≤ 15

(x + 2ŷ(x) − 30)² = min_y (x + 2y − 30)²

subject to  x + y − 20 ≤ 0

0 ≤ y ≤ 20.

The Stackelberg solution of this problem is x^s = 10 and ŷ(x^s) = 10, which can be found by geometrical consideration. Applying the algorithm presented in Section III, with the penalty function φ of SUMT [5], we obtain the computational results shown in Table I. It is confirmed that (x^{r_k}, ŷ^{r_k}(x^{r_k})) approaches the Stackelberg solution of the original problem gradually as r_k decreases.
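The convergence pattern of Table I can be probed in a few lines. The sketch below is our own reconstruction, not the authors' program: it substitutes the inverse barrier r(1/y + 1/(20 − y) + 1/(20 − x − y)) for the unspecified SUMT function and a brute-force grid on x for a feasible-direction method, so the digits will differ from Table I, but the drift of (x^{r_k}, ŷ^{r_k}) toward (10, 10) should be visible.

```python
# Example 1 revisited (illustrative reconstruction, not the authors' code).
# Lower level: min_y (x + 2y - 30)^2  s.t.  x + y <= 20, 0 <= y <= 20,
# handled with the inverse barrier r * (1/y + 1/(20 - y) + 1/(20 - x - y)).
# Upper level: min_x x^2 + (yhat(x) - 10)^2  s.t.  yhat(x) <= x, 0 <= x <= 15,
# handled here by a brute-force grid in place of a feasible-direction method.

def golden_min(fun, lo, hi, iters=120):
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if fun(c) < fun(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def y_hat(x, r):
    hi = 20.0 - x      # y must keep every barrier term finite: 0 < y < 20 - x
    P = lambda y: (x + 2*y - 30.0)**2 + r*(1/y + 1/(20.0 - y) + 1/(20.0 - x - y))
    return golden_min(P, 1e-9, hi - 1e-9)

def stackelberg(r, steps=1500):
    best = None
    for i in range(steps + 1):
        x = 15.0 * i / steps
        y = y_hat(x, r)
        if y - x > 0.0:            # upper-level constraint: -x + yhat(x) <= 0
            continue
        f1 = x * x + (y - 10.0) ** 2
        if best is None or f1 < best[0]:
            best = (f1, x, y)
    return best

for r in (10.0, 1.0, 0.01):
    f1, x, y = stackelberg(r)
    print(r, x, y, f1)   # (x, y) drifts toward the true solution (10, 10)
```

As in Table I, the approximate leader decision approaches x^s = 10 from below, because the barrier keeps the follower's response strictly inside the coupled constraint x + y ≤ 20.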

Example 2: As the Stackelberg problem of the separate type, we present the following problem with two-dimensional variables:

$$\min_x \; (x_1 - 30)^2 + (x_2 - 20)^2 - 20\hat y_1(x) + 20\hat y_2(x)$$

subject to

$x_1 + 2x_2 \ge 30,$
$x_1 + x_2 \le 25,$
$x_2 \le 15,$

where $(\hat y_1(x), \hat y_2(x))$ solves

$$(x_1 - \hat y_1(x))^2 + (x_2 - \hat y_2(x))^2 = \min_y \; (x_1 - y_1)^2 + (x_2 - y_2)^2$$

subject to

$0 \le y_1 \le 10,$
$0 \le y_2 \le 10.$

Our method demonstrates its power here, as no other appropriate way to calculate the Stackelberg solution exists even for the separate type. Table II illustrates the convergence property in this case, too.
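Here too the solution can be cross-checked (outside the paper's method) by exploiting the fact that the follower's problem is simply the projection of $x$ onto the box $[0, 10]^2$; the search box and grid steps below are our own assumptions:

```python
import numpy as np

# Follower: min_y (x1-y1)^2 + (x2-y2)^2 over 0 <= y1, y2 <= 10,
# whose solution is the projection of (x1, x2) onto the box [0, 10]^2.
def follower(x1, x2):
    return np.clip(x1, 0.0, 10.0), np.clip(x2, 0.0, 10.0)

best = None
for x1 in np.linspace(0.0, 25.0, 501):       # assumed search box, step 0.05
    for x2 in np.linspace(0.0, 15.0, 301):   # upper bound enforces x2 <= 15
        if x1 + 2.0 * x2 < 30.0 or x1 + x2 > 25.0:
            continue                          # leader's constraints
        y1, y2 = follower(x1, x2)
        f1 = (x1 - 30.0) ** 2 + (x2 - 20.0) ** 2 - 20.0 * y1 + 20.0 * y2
        if best is None or f1 < best[0]:
            best = (f1, x1, x2, y1, y2)

print(best)   # near (225.0, 20.0, 5.0, 10.0, 5.0), matching the true values in Table II
```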

TABLE II

$r_k$   $x_1^{r_k}$   $x_2^{r_k}$   $\hat y_1^{r_k}(x^{r_k})$   $\hat y_2^{r_k}(x^{r_k})$   $f_1(x^{r_k}, \hat y^{r_k}(x^{r_k}))$   $P^{r_k}(x^{r_k}, \hat y^{r_k}(x^{r_k}))$
500     18.44         6.563         6.339                       5.173                       291.0                                   564.0
200     19.03         5.968         7.288                       5.228                       276.0                                   319.8
50      20.00         5.000         8.553                       4.976                       253.4                                   191.4
10      20.00         5.000         9.326                       4.821                       234.9                                   133.9
1       20.00         5.000         9.775                       4.959                       228.7                                   109.5

True values: $x_1^s = 20.00$, $x_2^s = 5.000$, $\hat y_1(x^s) = 10.000$, $\hat y_2(x^s) = 5.000$, $f_1(x^s, \hat y(x^s)) = 225.0$, $f_2(x^s, \hat y(x^s)) = 100.0$.

Some computational experiments on min-max problems also confirmed the convergence property and the utility of our method.

VI. CONCLUSION

In this paper, we have presented two solution methods for the most general types of the Stackelberg and min-max problems, both of which are formulated as two-level optimization problems to be solved in principle by a parametric approach. Our methods apply the barrier method to transform the two-level problems into ordinary nonlinear programming problems. Some numerical examples were given to illustrate the proposed algorithms. In view of the fact that scarcely any solution methods have been available so far, we are convinced of the significance of our methods. Nevertheless, there are also difficulties in practical computations. For instance, when $r$ is set sufficiently small, the rapid increase or decrease of the function $P^r(x, y)$ in the vicinity of the boundary of the feasible region makes it hard to obtain an accurate solution to problem (5) or (29), and the increased sensitivity of $P^r(x, y)$ introduces roundoff errors in the calculation of the gradient of the implicit function. These difficulties seriously influence the computational results. It is hoped, however, that the above difficulties, which originate in disadvantages peculiar to the barrier method, can be overcome with the various existing techniques for improving the barrier method (SUMT).
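The boundary steepness described above can be seen already in a one-dimensional toy problem of our own (not from the paper): for $\min x$ subject to $x \ge 1$, the logarithmic barrier function $P_r(x) = x - r\ln(x - 1)$ has minimizer $x_r = 1 + r$, while the curvature at the minimizer is $P_r''(x_r) = r/(x_r - 1)^2 = 1/r$, which blows up as $r \to 0$:

```python
import math

def barrier_min(r, lo=1.0 + 1e-9, hi=10.0, iters=200):
    # Minimize P_r(x) = x - r*ln(x - 1) over (1, 10] by ternary search;
    # P_r is strictly convex on x > 1, so ternary search converges.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 - r * math.log(m1 - 1.0) < m2 - r * math.log(m2 - 1.0):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for r in [1.0, 0.1, 0.01, 0.001]:
    xr = barrier_min(r)
    # analytic minimizer is x_r = 1 + r; curvature P_r''(x_r) = 1/r
    print(r, xr, 1.0 / r)
```

The computed minimizers track $1 + r$, and the ever-sharper curvature $1/r$ is precisely the ill-conditioning that troubles gradient calculations for small $r$.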

ACKNOWLEDGMENT

The authors wish to thank the reviewers for their many constructive comments.

REFERENCES

[1] M. Avriel, Nonlinear Programming. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[2] J. Bram, "The Lagrange multiplier theorem for max-min with several constraints," SIAM J. Appl. Math., vol. 14, no. 4, pp. 665-667, 1966.
[3] J. M. Danskin, The Theory of Max-Min and Its Application to Weapons Allocation Problems. Berlin: Springer-Verlag, 1967.
[4] V. F. Demyanov, "Algorithms for some minimax problems," J. Comput. Syst. Sci., vol. 2, no. 4, pp. 342-380, 1968.
[5] A. V. Fiacco and G. P. McCormick, "The sequential unconstrained minimization technique for nonlinear programming," Management Sci., vol. 10, no. 2, pp. 601-617, 1964.
[6] J. E. Heller and J. B. Cruz, Jr., "An algorithm for minmax parameter optimization," Automatica, vol. 8, no. 3, pp. 325-335, 1972.
[7] W. W. Hogan, "Point-to-set maps in mathematical programming," SIAM Rev., vol. 15, no. 3, pp. 591-603, 1973.
[8] G. P. Papavassilopoulos and J. B. Cruz, Jr., "Nonclassical control problems and Stackelberg games," IEEE Trans. Automat. Contr., vol. AC-24, no. 2, pp. 155-166, 1979.
[9] D. M. Salmon, "Minimax controller design," IEEE Trans. Automat. Contr., vol. AC-13, no. 4, pp. 369-376, 1968.
[10] W. E. Schmitendorf, "Necessary conditions and sufficient conditions for static min-max problems," J. Math. Anal. Appl., vol. 57, no. 3, pp. 683-693, 1977.
[11] K. Shimizu and E. Aiyoshi, "Necessary conditions for min-max problems and algorithms by a relaxation procedure," IEEE Trans. Automat. Contr., vol. AC-25, no. 1, 1980.
[12] M. Simaan and J. B. Cruz, Jr., "On the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory Appl., vol. 11, no. 5, pp. 533-555, 1973.
[13] M. Simaan, "Stackelberg optimization of two-level systems," IEEE Trans. Syst., Man, Cybern., vol. SMC-7, no. 4, pp. 554-556, 1977.
[14] H. von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952.
[15] J. J. Strodiot and V. H. Nguyen, "An exponential penalty method for nondifferentiable minmax problems with general constraints," J. Optimiz. Theory Appl., vol. 21, no. 2, pp. 205-219, 1979.
[16] G. Zoutendijk, Methods of Feasible Directions. Amsterdam, The Netherlands: Elsevier, 1960.

Finite Chain Approximation for a Continuous Stochastic Control Problem

Abstract—A class of feedback policies called finitely switched (FS) policies is introduced. When a one-dimensional nonlinear stochastic system is controlled by FS policies there is a finite state Markov system which is equivalent to the original problem. The finite state problem can be solved by known Markov programming methods.

Manuscript received February 11, 1980; revised August 8, 1980. Paper recommended by S. I. Marcus, Chairman of the Stochastic Control Committee. This work was supported by the National Science Foundation under Grant 79-03879 and by the Joint Services Electronics Program under Contract F44620-76-C-0100.

The authors are with the Department of Electrical Engineering and Computer Sciences and the Electronics Research Laboratory, University of California, Berkeley, CA 94720.

I. INTRODUCTION

We consider the control of the one-dimensional system described by the Itô equation

$$dx_t = m(x_t, u_t)\,dt + \sigma(x_t, u_t)\,dw_t, \qquad t \ge 0 \tag{1.1}$$

in which $x_t$ is the state, $u_t$ is the control, and $(w_t)$ is standard Brownian motion. The control actions are to be selected in feedback form $u_t = \phi(x_t)$, where $\phi$ belongs to a specified family of feedback policies $\Phi$.

A standard approach to this situation is to treat it as an optimal control problem. One assumes first that $\Phi$ is the family of all feedback policies $\phi$ taking values in a specified control constraint set $U$. One then seeks $\phi^*$ in $\Phi$ which is optimal in the sense that it minimizes the expected value of a given cost function. $\phi^*$ can be obtained by solving the Bellman-Hamilton-Jacobi equation. This partial differential equation is usually difficult to solve and one is forced to resort to some approximation.

A different kind of approximation is proposed here. A class $\Phi$ of feedback policies called finitely switched (FS) policies is introduced. It has the property that the behavior of the state process $(x_t)$ when it is controlled by $\phi$ in $\Phi$ can be summarized by an "equivalent" finite Markov chain. The optimal $\phi^*$ in an FS family $\Phi$ can then be obtained by Markov programming techniques.

The basic idea of this approximation is readily traced to the work of Forestier and Varaiya [1]. The only difference is that there $(x_t)$ itself is a discrete-time finite Markov chain, whereas here it is an Itô process. This difference, of course, leads to changes in the technical argument. Nevertheless, following the discussion in [1], it is reasonable to view the "equivalent" chain obtained here as an "aggregation" of $(x_t)$, and to regard an FS policy as being implemented in a two-layer hierarchy. However, this approach does not generalize easily to vector-valued processes. For some related work, see [6].

FS policies are described in Section II and the equivalent Markov chain is exhibited in Section III. The calculation of the parameters of this chain occupies Section IV. An example is treated in Section V.

II. FINITELY SWITCHED POLICIES

An FS family $\Phi$ is defined through a pair $(S, \Psi)$, where $S = (x_1 < x_2 < \cdots < x_n)$ is a finite set of states called the switching set ("boundary" set in [1]), and $\Psi$ is any specified family of functions $\psi: x \mapsto \psi(x)$. A particular policy $\phi$ in $\Phi = \Phi(S, \Psi)$ is obtained by assigning to each $x_i \in S$ a function $\psi_i \in \Psi$. Then $\Phi = \Psi^n$ is simply the family of all assignments $\psi = (\psi_1, \ldots, \psi_n)$. To simplify notation, such a $\phi$ is usually denoted by $\psi = (\psi_1, \ldots, \psi_n)$.

In the following we suppose that a family $\Phi = \Phi(S, \Psi)$ is given in advance.

Fix a $\psi = (\psi_1, \ldots, \psi_n)$ in $\Phi$. The control action $(u_t, t \ge 0)$ indicated by $\phi$ and the resulting state $(x_t, t \ge 0)$ can now be described. Note that $(u_t)$ and $(x_t)$ are stochastic processes which depend on $\psi$.

For simplicity suppose that the initial state is a switching state, say $x_0 = x_i$ a.s. Then the control action is given by the feedback policy $u_t = \psi_i(x_t)$ for $0 \le t < T_1$, where $T_1$ is the first time that $x_t$ reaches another switching state, say $x_j \ne x_i$. The control action is now given by $u_t = \psi_j(x_t)$ for $T_1 \le t < T_2$, where $T_2$ is the first time after $T_1$ that $x_t$ reaches a new switching state $x_l \ne x_j$, when the feedback policy is switched to $u_t = \psi_l(x_t)$, and so on. In brief, as soon as $x_t$ encounters a switching state the control is switched to the corresponding feedback policy. More precisely, define the random variables


$T_0 = 0$ a.s.

$$T_n = \inf\{t > T_{n-1} : x_t \in S,\; x_t \ne x_{T_{n-1}}\}, \qquad n \ge 1. \tag{2.1}$$

