Journal of Monetary Economics 32 (1993) 169-183. North-Holland

Learning, experimentation, and monetary policy*

Graziella Bertocchi and Michael Spagat
Brown University, Providence, RI 02912, USA

Received October 1991, final version received June 1993

We present a model of monetary policy where the policymaker faces uncertainty about which he is learning in a Bayesian fashion. A fixed money supply rule is not optimal since the learning leads to adjustments in the monetary action. We present cases in which it is optimal to bear some cost in terms of current output performance in order to gain information that can be used in the formulation of future monetary policy: experimentation therefore pays. We also show that even passive learning without experimentation still leads to an activist monetary policy, i.e., one that is responsive to new information.

Key words: Monetary policy; Learning

1. Introduction

Monetary policy is inevitably made under conditions of uncertainty. It is possible that aspects of this uncertainty may be well understood in the sense that the probability distributions over certain parameters are known and the possibilities for gathering information that would refine these probability distributions have been fully exhausted. However, it will generally be the case that the economy is constantly generating information that can be used by agents, including the monetary authorities, to refine their understanding of the uncertainty that they face. In fact, it seems reasonable to state that practically whenever there is uncertainty in the economy there will be learning about that uncertainty. Therefore it is of particular interest to model the process of monetary policy formulation as one which involves constant learning about the impact of monetary policy on the economy and constant changes in policy in accordance with this information.

Correspondence to: Michael Spagat, Department of Economics, Brown University, Providence, RI 02912, USA.

*This paper was completed while the first author was visiting the Department of Economics at the Université Catholique de Louvain. The authors would like to thank Oded Galor, Herschel Grossman, William Poole, the editors, and workshop participants at Brown, West Virginia, Louvain, Cambridge, Modena, the European University Institute, the Catholic University of Milan, ICER, the Royal Economic Society Meeting in London, 1992, the Meeting of the Society for Economic Dynamics and Control in Montreal, 1992, and the Meeting of the International Economic Association in Moscow, 1992, for helpful comments. Special thanks are due to Bruce Mizrach, David Weil, and an anonymous referee. Of course all mistakes are our own.

0304-3932/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

In fact the Fed is such a major actor in the world economy that it has the potential to affect its own learning possibilities through its choice of monetary policies. It is therefore realistic to postulate that some monetary actions may be very costly in the short run, but could convey information that may improve monetary policy in the future. One could argue that the 1979-82 ‘Volcker experiment’ is an example of precisely such an experimental policy. Whether or not the Fed actually does sacrifice short-term goals to carry out experiments, we will show that it should, since the government should in general not behave myopically. While the degree to which it should sacrifice current reward in exchange for information depends on such factors as discount rates and the nature of the uncertainty faced, it is certainly true that large informational gains justify small sacrifices.

The theory of monetary policy under uncertainty has a long history. Theil (1964) shows that, when uncertainty is introduced through additive random shocks and the policymaker knows the structure of the economy, the optimal policy will display the ‘certainty equivalence’ property. More serious problems arise, however, when uncertainty is introduced in a multiplicative fashion; the resulting ‘instrument instability’ [see Brainard (1967)] weakens the case for the use of monetary policy for output stabilization, since intervention adds costly variability to the level of output. Along the same lines M. Friedman (1959, 1968) proposes a nonactivist fixed rule as an optimal monetary policy under uncertainty. The standard literature studies the case where the uncertainty in the economy cannot be resolved. Arguably, however, the fundamental problem for policy is that the structure of the economic system itself is unknown, but information about it can be gained through experience. Therefore in this paper we present a new approach to monetary policymaking under uncertainty, by introducing a learning process into a simple infinite-horizon macroeconomic framework. In this setup, the policymaker, in solving a problem of intertemporal minimization of output variability, optimally takes into account the information revealed by policy actions by using Bayes’ rule in order to update his knowledge of the structure of the economy. In this context, his initial beliefs are summarized by a prior distribution over the relevant parameters. At each stage of the process, there is a potential trade-off between minimization of output variability and the value of the information which can be obtained through an activist policy. The implications of learning for policy prescriptions are intuitively clear; monetary policy should be activist in the sense that it should be responsive to new information as it arises. In addition, there is a second sense in which monetary policy should be activist. It should actively seek to generate information even if it is costly to do so. Since policymakers are constantly learning, keeping the money supply constant, as recommended by M. Friedman, will only be optimal in highly unusual and special cases.

The relevance of informational factors in macroeconomics has already been widely recognized. The traditional literature views the private sector as learning about the nature of the economy or the nature of monetary policy.¹ In this paper we take a complementary approach by modelling instead the learning process of the monetary authority.² In studying the behavior of a central bank, it is reasonable to assume that the policymaker is conscious of the effect that his actions have on the generation of information and that he acts on this knowledge. Therefore a Bayesian learning model of the type studied in Prescott (1972), Grossman, Kihlstrom, and Mirman (1977), Easley and Kiefer (1988), and Kiefer and Nyarko (1989) is appropriate.

The structure of this paper is as follows. In section 2 we present a simple macroeconomic model where the policymaker is concerned with one target variable, the level of income, and has control over one policy variable, the money supply. Given a quadratic loss function, the policymaker’s goal is to minimize deviations from the full-employment level of output. The policymaker is uncertain over the structural relationship between output and the unanticipated component of the money supply. This uncertainty is described by a prior probability distribution over a set of possible linear relationships. In each period the policymaker selects a money supply level which leads stochastically to an output level. After observing output the policymaker updates his prior using Bayes’ rule and then chooses his next monetary action. We derive the optimal monetary policy and show how it involves a trade-off between current objectives and the value of information gathering. In section 3 we study in detail a series of special cases of the general model. The first two are reformulations of the models of Theil (1964) and Brainard (1967) that do not allow for Bayesian learning and are therefore essentially static. The third case allows for learning at no cost due to the fact that all myopically best monetary actions are always equally informative. Nevertheless the policymaker is always learning and hence monetary actions are always changing. In the last two cases learning is active in the sense that the policymaker will engage in costly experimentation; learning, however, will be discrete in case four, while it will be gradual in case five. Our major result is the following. In each scenario with learning, a fixed monetary rule is not optimal, since it is dominated by a feedback rule which takes into account the value of the available information and its evolution. In section 4 we draw some conclusions.

¹See Taylor (1975), Backus and Driffill (1985), Dotsey and King (1986), Baxter (1989), Marcet and Sargent (1989), and Woodford (1990). In particular, Backus and Driffill (1985) assume that agents learn in a Bayesian fashion about the authority’s type, while in Baxter (1989) Bayesian agents learn the parameters of the monetary policy rule.

²Related contributions where the authority is learning include Mizrach (1990), Balvers and Cosimano (1991), and Bertocchi (1993), who use a Bayesian learning approach similar to our own; see also the ‘information variable’ approach of B.M. Friedman (1977).

2. The basic model

Consider a simple dynamic macroeconomic model where the policymaker sets monetary policy to affect the level of output. The economy is described by the reduced-form equation

$$y_t = \bar{y} + a_t + b_t M_t + \varepsilon_t, \tag{1}$$

where $y_t$ is output at time $t$, $\bar{y}$ is a target level for output, $M_t$ can be interpreted as the unanticipated component of the money supply in period $t$, $a_t$ and $b_t > 0$ are random parameters, and $\varepsilon_t$ is an additive shock for period $t$ with mean 0 and finite variance $\sigma_\varepsilon^2$.³ This reduced form can be derived from a standard aggregate demand-aggregate supply framework as in the imperfect information model of Lucas (1973).⁴ Lucas formalizes the money-output relation by postulating an aggregate supply function which shows output as an increasing function of the price surprise; given a conventional aggregate demand function and the asymmetry of information in favor of the authority that we shall formulate below, the equilibrium level of output will respond to unanticipated changes of nominal money.⁵

Uncertainty in (1) is introduced through two sources. First, demand is affected by the shock $a_t$; second, the structure of the economy itself, i.e., its slope and intercept, may be unknown. In particular, $\varepsilon_t$ has distribution $F_\varepsilon$ with finite support which is known to the policymaker. $a_t$ and $b_t$ have joint distribution $F_{ab}$ for all $t$, and all random variables are independent of each other.⁶ It is the underlying joint distribution of $a_t$ and $b_t$ which is unknown to the policymaker and about which he learns.

³We could impose conditions on the supports of $a_t$, $b_t$, and $\varepsilon_t$ to ensure positive output for any $M_t$ that the policymaker would choose.

⁴Alternatively, a similar relationship between money and output can be obtained by introducing wage rigidities in the form, for example, of staggered contracts [see Fischer (1977) and Taylor (1980)].

⁵Dickinson, Driscoll, and Ford (1982) introduce uncertainty about the parameters into the Lucas model in order to show the nonneutrality of money. However, they do not allow the authority to learn.

⁶Our model is very similar to Kiefer and Nyarko (1989) but we allow the structural parameters to fluctuate randomly over time.


Let $\Theta$ denote the set of all possible joint distribution functions for $a_t$ and $b_t$, and let $\theta$ denote particular elements of $\Theta$. The policymaker has an initial belief over the true distribution of the parameters given by $\mu_0$, where $\mu_0(A)$ gives the prior probability that $F_{ab} \in A \subset \Theta$. In every period $t$ the policymaker sets $M_t$. After observing $y_t$ he updates his prior belief $\mu_{t-1}$ using Bayes’ rule so that his posterior belief is

$$\mu_t = B(y_t, M_t, \mu_{t-1}) \quad \text{for } t = 1, 2, \ldots, \tag{2}$$

where B is the Bayes operator. The macro welfare function assigned to the policymaker is defined in terms of deviations of output from a target level. In a multiperiod framework the authority’s loss function is given by

$$L = E \sum_{t=1}^{\infty} \beta^{t-1} (y_t - \bar{y})^2, \tag{3}$$

where $0 < \beta < 1$ is the social discount factor. The quadratic term in (3) can be interpreted as the welfare loss of deviations from full employment, which are treated symmetrically.

A history at time $t$ is denoted by $h_t$ and is given by all the previous levels of output $y_1, y_2, \ldots, y_{t-1}$ and unanticipated money $M_1, M_2, \ldots, M_{t-1}$. A monetary policy is a sequence of functions $\pi_1, \pi_2, \ldots$ that give levels of unanticipated money depending on history, i.e., $M_t = \pi_t(h_t) = \pi_t((y_1, M_1), (y_2, M_2), \ldots, (y_{t-1}, M_{t-1}))$.

The time-varying coefficients of the monetary policy rule, as summarized by the function $\pi_t$, will be determined by the authority optimally, taking into account learning considerations. We assume that the coefficients are not perfectly observable by the public and that the authority is aware of its informational advantage. As discussed in Taylor (1975), lack of public information about the decision-making process of the authority is a plausible assumption, which in our setup can be induced by imperfect knowledge of any of the following: the functional form of the social welfare function, the target level of output, the discount factor, or the policymaker’s initial belief. It is this differential information structure that leads to effects of unanticipated money on output.
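To make the Bayes operator $B$ in (2) concrete, the following is a minimal sketch under simplifying assumptions that are not part of the model as stated: $\Theta$ is replaced by a finite list of candidate distributions for the slope, the intercept is fixed at a known value, and the shock is uniform. The function names `uniform_pdf` and `bayes_update` and all numerical values are hypothetical.

```python
import numpy as np

def uniform_pdf(e, lo, hi):
    """Density of a shock that is uniform on [lo, hi] (assumed shock law)."""
    return np.where((e >= lo) & (e <= hi), 1.0 / (hi - lo), 0.0)

def bayes_update(prior, y, m, hypotheses, a_bar, y_bar, eps_lo, eps_hi):
    """One application of the Bayes operator: posterior = B(y, M, prior).

    `hypotheses` is a finite stand-in for Theta: each entry is a candidate
    distribution for the slope b_t, given as (slopes, probabilities).
    """
    likelihood = np.empty(len(hypotheses))
    for i, (slopes, probs) in enumerate(hypotheses):
        # Shock implied by each possible slope realisation, from equation (1).
        implied_eps = y - y_bar - a_bar - np.asarray(slopes) * m
        likelihood[i] = np.dot(probs, uniform_pdf(implied_eps, eps_lo, eps_hi))
    unnormalised = prior * likelihood
    total = unnormalised.sum()
    # If the observation is impossible under every hypothesis, keep the prior.
    return prior if total == 0 else unnormalised / total

# Two hypothetical candidates: slope 1.5 occurs with probability 0.8 or 0.2.
candidates = [([1.5, 1.0], [0.8, 0.2]), ([1.5, 1.0], [0.2, 0.8])]
posterior = bayes_update(np.array([0.5, 0.5]), y=0.55, m=1.0,
                         hypotheses=candidates, a_bar=-1.0, y_bar=0.0,
                         eps_lo=-0.1, eps_hi=0.1)
```

In this example the observed output is only consistent with the steeper slope, so the posterior shifts toward the candidate that assigns that slope the higher probability.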

The policymaker chooses a monetary policy to maximize an infinite sum of discounted expected rewards, according to the expression

We define the reward in period t by

(4)

(5)


where $\theta(dq)$ gives the marginal distribution on $a_t$ and $b_t$ if $\theta$ is the truth, $\mu_{t-1}(d\theta)$ gives the marginal distribution on $\theta$, and $dF_\varepsilon$ gives the marginal distribution on $\varepsilon$. Intuitively, expectations in (5) are taken over the values of the structural parameters, the beliefs on the distribution of the structural parameters, and the additive shock.
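The displayed expressions for (4) and (5) do not survive in this copy of the text. The sketch below is one reconstruction consistent with the loss function (3), the reduced form (1), and the three measures described in the paragraph above. It is offered as an assumption rather than as the authors' exact notation; in particular the outer expectation in (4) and the order of integration in (5) are guesses.

```latex
% (4), reconstructed: choose the policy {pi_t} to maximise discounted expected rewards
\max_{\{\pi_t\}} \; E \sum_{t=1}^{\infty} \beta^{t-1}\, r(M_t, \mu_{t-1}),
\qquad M_t = \pi_t(h_t)

% (5), reconstructed: expected one-period reward, using y_t - \bar{y} = a_t + b_t M_t + \varepsilon_t
r(M_t, \mu_{t-1}) = \int_{R} \int_{\Theta} \int_{R^2}
  -\,(a_t + b_t M_t + \varepsilon_t)^2 \;
  \theta(dq)\, \mu_{t-1}(d\theta)\, dF_\varepsilon
```

Under this reading, maximising (4) is the same problem as minimising the loss (3), since the reward is the negative of the expected squared deviation of output from target.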

Problem (4) can be solved using standard dynamic programming techniques. We denote the value of (4) by $V(\mu_0)$. It can be shown under standard assumptions that a monetary policy is optimal if and only if it is generated by the functional equation

$$V(\mu) = \max_{M \in R} \left\{ r(M, \mu) + \beta \int_R \int_\Theta \int_{R^2} V\big(B(\bar{y} + a + bM + \varepsilon, M, \mu)\big) \, \theta(dq) \, \mu(d\theta) \, dF_\varepsilon \right\}. \tag{6}$$

A careful examination of the two elements that appear in the right-hand side of (6) helps to clarify the nature of the intertemporal decision problem the policymaker is facing; the first term represents the current-period payoff, while the second incorporates the value of the information which can be extracted from observation of the results of the current period’s monetary policy. The policymaker can behave myopically by minimizing deviations from full employment within the current period and ignoring the value of information generated by monetary policy experiments. However, since future decisions will depend on the stock of information accumulated in previous experiments, the rational policymaker should not simply minimize short-run deviations. The higher the discount factor $\beta$, the higher the incentive to sacrifice some current reward in order to achieve a gain in information. Generally, there is a trade-off between current objectives and the value of experimentation.

The stochastic process of beliefs in Bayesian learning problems has been studied in the theoretical literature previously cited. The most important property is that under the right technical conditions the policymaker’s beliefs will converge to some limiting belief with probability one. However, the limiting belief will not necessarily coincide with the truth. This is intuitive since learning can require experimentation and experimentation can be costly in the short run. If learning is too costly, then the policymaker can get stuck forever taking monetary actions that are suboptimal given the truth, but which are optimal given the policymaker’s incomplete knowledge. For expositional simplicity we do not emphasize this point below. However, the convergence properties of these models do not affect their implications for short-run monetary actions, which are the focus of our analysis. For all of the cases in the next section that involve learning, the policymaker’s beliefs will eventually converge to the truth.


3. Policy implications of learning

3.1. Two cases without learning

In order to compare our results with the predictions of the standard literature, we first present two special cases that do not involve Bayesian learning.

Case 1 (Theil’s ‘Certainty Equivalence’)

Suppose that $a_t$ and $b_t$ are degenerate random variables equal to $\bar{a}$ and $\bar{b}$, respectively, and the policymaker knows $\bar{a}$ and $\bar{b}$ with certainty. If the structure of the economy is deterministic and the only source of uncertainty is the additive random disturbance $\varepsilon_t$, then (4) reduces to a series of static problems of the form

$$\max_{M_t \in R} \int_R -(\bar{a} + \bar{b} M_t + \varepsilon_t)^2 \, dF_\varepsilon. \tag{7}$$

In this case the optimal policy is the ‘certainty equivalence’ unanticipated money supply $M_t = -\bar{a}/\bar{b}$ in each period $t$; in other words, the policymaker should simply act on the basis of expected values, as shown by Theil (1964).

Case 2 (Brainard’s Multiplicative Uncertainty)

Let $a_t = \bar{a}$ with certainty, but let $b_t$ be random with mean $\bar{b}$, variance $\sigma_b^2$, and distribution $F_b$ such that $E[b_t \varepsilon_t] = 0$, which is known to the policymaker. In this case the randomness of the parameter $b_t$, which measures the effectiveness of monetary policy, introduces uncertainty over the impact of money on output. Again (4) reduces to a sequence of static problems expressed by

$$\max_{M_t \in R} \int_R \int_R -(\bar{a} + b_t M_t + \varepsilon_t)^2 \, dF_b \, dF_\varepsilon. \tag{8}$$

The optimal policy is to set $M_t = -\bar{a}\bar{b}/(\bar{b}^2 + \sigma_b^2)$ for each $t$. Thus, the optimal monetary action is constant when the mean and variance of $b_t$ are known. As $\sigma_b^2 \to 0$, the optimal policy approaches the certainty solution, and as $\sigma_b^2 \to \infty$, $b_t$ becomes so variable that it is best to set $M_t = 0$. The intuition is that under multiplicative uncertainty use of activist policy is always costly, since it results in excess output variability. Therefore a Friedmanite fixed money supply rule could be justified by a high degree of multiplicative uncertainty.
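The optimal action in Case 2 follows from a one-line first-order condition. The short derivation below is a sketch that uses only the assumptions stated in the text, namely a mean-zero shock and $E[b_t \varepsilon_t] = 0$; Case 1 is recovered by letting the variance of the slope go to zero.

```latex
% Expected loss in Case 2, using E[\varepsilon_t] = 0 and E[b_t \varepsilon_t] = 0
E\left[(\bar{a} + b_t M_t + \varepsilon_t)^2\right]
  = \bar{a}^2 + 2\bar{a}\bar{b}\, M_t + (\bar{b}^2 + \sigma_b^2)\, M_t^2 + \sigma_\varepsilon^2

% First-order condition in M_t
2\bar{a}\bar{b} + 2(\bar{b}^2 + \sigma_b^2)\, M_t = 0
\quad\Longrightarrow\quad
M_t = -\frac{\bar{a}\bar{b}}{\bar{b}^2 + \sigma_b^2}

% Limiting cases
\sigma_b^2 \to 0:\;\; M_t \to -\bar{a}/\bar{b}
\qquad\qquad
\sigma_b^2 \to \infty:\;\; M_t \to 0
```

The denominator shows the attenuation effect directly: each unit of policy adds $\sigma_b^2 M_t^2$ to expected loss, so a more uncertain slope pulls the action toward zero.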

3.2. Three cases with learning

We would like to stress again that in the two above cases the policymaker faces a static problem where knowledge of the relevant parameters is both time-invariant and unaffected by the policymaker’s actions. We now consider several examples where learning introduces an inherently dynamic dimension into the policymaker’s problem. While learning will be central in all three cases, each of them will focus on a particular facet of the issue.

Fig. 1. The response of the output gap to unanticipated money: the no-overlap case.

Case 3 (Passive Learning)

The problem presented in fig. 1 is another special case of the general model. Let $a_t = \bar{a}$ once again. In each period $t$, with an unknown probability $p$, the reduced-form equation is characterized by a slope $b^1$, and with unknown probability $1 - p$ it is characterized by a slope $b^2$. The structure itself of the economic system is therefore random. On the other hand, the distribution of each $\varepsilon_t$ is still known and it is assumed to have bounded support $[\underline{\varepsilon}, \bar{\varepsilon}]$.

For simplicity, let $\bar{a} < 0$ and $b^1 > b^2 > 0$. Define $\underline{M}$ and $\overline{M}$ by $\bar{a} + b^1 \underline{M} + \bar{\varepsilon} = 0$ and $\bar{a} + b^2 \overline{M} + \underline{\varepsilon} = 0$, and let $\underline{M} > 0$. Intuitively from fig. 1, any value of the unanticipated money supply below $\underline{M}$ implies with certainty a negative output gap; conversely, any value above $\overline{M}$ implies with certainty a positive output gap. Therefore, myopic reward can be maximized only for levels between $\underline{M}$ and $\overline{M}$. Suppose furthermore that $b^1 \underline{M} + \underline{\varepsilon} > b^2 \underline{M} + \bar{\varepsilon}$. Thus for any $M$ in $[\underline{M}, \overline{M}]$, the range of $\varepsilon$ is small enough to allow the policymaker to observe exactly which of the two structural equations is realized, because there never is any overlap in the two resulting ranges in terms of output deviations.⁷ Therefore all the monetary actions in the range that could maximize myopic reward also yield the maximum amount of information. Thus it is optimal for the policymaker to maximize the myopic reward given the prior belief in every period, and then update his beliefs.

In each period the policymaker will solve

$$\max_{M \in R} \int_R \int_0^1 -\left[\theta(\bar{a} + b^1 M + \varepsilon_t)^2 + (1 - \theta)(\bar{a} + b^2 M + \varepsilon_t)^2\right] \mu_{t-1}(d\theta) \, dF_\varepsilon, \tag{9}$$

where each $\theta \in [0, 1]$ is a possible value of $p$ and the beliefs are updated in a Bayesian fashion at each $t$. Solving (9) is equivalent to solving the Bellman equation (6) since the second term on the right-hand side of (6), which incorporates the value of information, is constant; the optimal policy will therefore not depend on this term. Learning is passive in the sense that it does not require costly experimentation; all monetary actions are equally informative.

Under standard conditions it can be shown that in the long run the policymaker will learn the truth and the unanticipated money supply levels will converge to the optimal level given the truth. Of course, since the structural equation for the economy is random, the policymaker will be able to select a value of $M$ that is only ex ante rather than ex post optimal.

The major point of this example is to show that even when learning is passive, optimal monetary policy is activist. The sequence of monetary actions will not be constant, since at each step the belief of the policymaker is updated on the basis of the new information. The monetarist policy recommendation is therefore reversed even under passive learning.
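The passive-learning logic of Case 3 can be illustrated with a short simulation. This is a sketch under assumptions that are not in the text: the prior over $p$ is taken to be Beta(1, 1), the parameter values are arbitrary, and the function names are hypothetical. Because the objective (9) is linear in $\theta$, only the mean of the current belief matters for the myopic action.

```python
import numpy as np

def myopic_money(p_hat, a_bar, b1, b2):
    """Action solving (9) when the current belief about p has mean p_hat."""
    mean_b = p_hat * b1 + (1 - p_hat) * b2
    mean_b_sq = p_hat * b1**2 + (1 - p_hat) * b2**2
    return -a_bar * mean_b / mean_b_sq

def simulate_passive_learning(true_p, periods, a_bar=-1.0, b1=1.5, b2=1.0, seed=0):
    """Path of monetary actions when each period reveals which slope occurred."""
    rng = np.random.default_rng(seed)
    alpha, beta_count = 1.0, 1.0  # assumed Beta(1, 1) prior over p
    path = []
    for _ in range(periods):
        p_hat = alpha / (alpha + beta_count)
        path.append(myopic_money(p_hat, a_bar, b1, b2))
        # No-overlap case: output reveals the realised slope at no cost.
        if rng.random() < true_p:
            alpha += 1
        else:
            beta_count += 1
    return np.array(path)

actions = simulate_passive_learning(true_p=0.7, periods=2000)
```

The simulated path moves from period to period as the belief is revised and settles near the action that would be chosen if $p$ were known, which is the sense in which policy is activist without any experimentation.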

Case 4 (Valuable Experimentation)

This is an example of active learning that is designed to display the trade-off between immediate reward and information as sharply as possible. Consider fig. 2. The model is identical to Case 3 except for the following changes. One of the two structural equations is the true equation, i.e., $p$ is either 1 or 0. In other words, the structure of the economy in this case is deterministic, but the policymaker does not know which of the two possible structures is the true one.⁸ $\varepsilon_t$ has a uniform distribution with support $[\underline{\varepsilon}, \bar{\varepsilon}]$. Also we assume that the support of $\varepsilon_t$ is wide enough such that for each $M \in [\underline{M}, \overline{M}]$, i.e., the range where myopic reward could be maximized, $b^1 M + \underline{\varepsilon} < b^2 M + \bar{\varepsilon}$. This condition implies that for $M \in [\underline{M}, \overline{M}]$ there will always be an overlap in the two ranges of output deviations that correspond to the two alternative structures. The overlap range is the shaded region in fig. 2. In this example learning is discrete; in every period either nothing is learned or the full truth is learned. For outcomes in the overlap range no information is obtained.⁹ The size of the overlap range is strictly decreasing in $M$ up until $M^*$, the level at which the two ranges just cease to overlap. The truth would be learned with certainty if any $M > M^*$ is chosen.

⁷If no overlap of output deviation ranges occurs at $\underline{M}$, then overlap will not occur for any $M > \underline{M}$.

⁸This case corresponds most closely to the standard linear model used in Kiefer and Nyarko (1989).

Fig. 2. The response of the output gap to unanticipated money: the overlap case.

Using the uniformity of $\varepsilon$, the size of the overlap range and the probability of learning nothing can be computed precisely. This probability is

$$P(M) = \frac{(b^2 M + \bar{\varepsilon}) - (b^1 M + \underline{\varepsilon})}{\bar{\varepsilon} - \underline{\varepsilon}} \tag{10}$$

for any $M < M^*$, where the numerator measures the size of the overlap and the denominator is the length of the support of $\varepsilon$. Conversely, with probability

$$Q(M) = 1 - \frac{(b^2 M + \bar{\varepsilon}) - (b^1 M + \underline{\varepsilon})}{\bar{\varepsilon} - \underline{\varepsilon}} \tag{11}$$

the true structural equation will be learned. Note that $Q(M^*) = 1$ and $P(0) = 1$, i.e., $M^*$ maximizes the probability of learning, while $M = 0$ is totally uninformative.

⁹This is a direct consequence of the uniformity of $\varepsilon$.
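Equations (10) and (11) are simple enough to compute directly. The sketch below uses hypothetical function names and illustrative parameter values; the clipping to $[0, 1]$ is an added convenience, assumed here so that the functions also return sensible values for $M \geq M^*$, and the formulas are only meant for $M \geq 0$.

```python
def overlap_prob(m, b1, b2, eps_lo, eps_hi):
    """P(M) from (10): probability that output falls in the overlap range."""
    raw = ((b2 * m + eps_hi) - (b1 * m + eps_lo)) / (eps_hi - eps_lo)
    return min(1.0, max(0.0, raw))  # clipping is an assumption, for M >= 0 only

def learn_prob(m, b1, b2, eps_lo, eps_hi):
    """Q(M) from (11): probability that the true equation is revealed."""
    return 1.0 - overlap_prob(m, b1, b2, eps_lo, eps_hi)

def full_learning_level(b1, b2, eps_lo, eps_hi):
    """M*, the smallest action at which the two output ranges stop overlapping."""
    return (eps_hi - eps_lo) / (b1 - b2)

# Illustrative values: slopes 1.5 and 1.0, shock uniform on [-0.5, 0.5].
m_star = full_learning_level(1.5, 1.0, -0.5, 0.5)
q_at_one = learn_prob(1.0, 1.5, 1.0, -0.5, 0.5)
```

With these values $M^* = 2$ and an action of $M = 1$ reveals the truth with probability one half, so holding that action fixed the truth would be learned after two periods on average.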

If everything is learned in period $t$, then the monetary action switches permanently to the ‘certainty equivalence’ level given the truth, because from that point on the problem will be essentially static. On the other hand, if nothing is learned in period $t$, then the policymaker will face the same problem in period $t + 1$, so if $M$ was optimal in period $t$, it will be optimal in period $t + 1$ and the probability of learning the truth will remain the same.

The implications for policy are again that a fixed rule is suboptimal, since at each stage there is a positive probability that the truth will be revealed and that the monetary action will switch. There will be exactly one change in the monetary regime at the time when the truth is learned. However, the policymaker does not know when the switch will occur and to which monetary action he will eventually shift. If it is optimal to choose an initial action for which the probability of learning in a given period is small, then with high probability it will take the policymaker a very long time to adjust his actions to the optimal level given the truth.

The Bellman equation for this example is

$$V(\mu) = \max_{M \in R} \left\{ \int_R -\left[\mu(1)(\bar{a} + b^1 M + \varepsilon)^2 + \mu(0)(\bar{a} + b^2 M + \varepsilon)^2\right] dF_\varepsilon + P(M)\,\beta V(\mu) + Q(M)\,\frac{\beta}{1 - \beta} \int_R -(\varepsilon)^2 \, dF_\varepsilon \right\}, \tag{12}$$

where $\mu(1)$ is the prior probability that $p = 1$ (i.e., equation one obtains) and $\mu(0)$ is the prior probability that $p = 0$ (i.e., equation two obtains). The intuition behind the second term in (12) is the following. With probability $P(M)$ nothing is learned, in which case the discounted future value to the policymaker is $\beta V(\mu)$. With probability $Q(M)$ the truth is learned, in which case the discounted future value to the policymaker is $(\beta/(1 - \beta)) \int_R -(\varepsilon)^2 \, dF_\varepsilon$, since from that point on he will always choose the ‘certainty equivalence’ money supply.
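Because the policymaker faces the same problem in every period until the truth is learned, (12) can be solved for the value of holding any given $M$ fixed and then maximized over $M$. The sketch below does this on a grid. It is an illustration under assumed parameter values, with a symmetric uniform shock and a hypothetical function name `case4_policy`, not a reproduction of the authors' computations.

```python
import numpy as np

def case4_policy(mu1, beta, a_bar=-1.0, b1=1.5, b2=1.0,
                 eps_lo=-0.5, eps_hi=0.5, grid_size=2001):
    """Myopic and fully optimal actions in Case 4, by grid search on [0, M*]."""
    width = eps_hi - eps_lo
    var_eps = width**2 / 12.0             # variance of the uniform shock
    m_star = width / (b1 - b2)
    m_grid = np.linspace(0.0, m_star, grid_size)

    # First term of (12): expected current reward (the shock has mean zero).
    gap_sq = mu1 * (a_bar + b1 * m_grid)**2 + (1 - mu1) * (a_bar + b2 * m_grid)**2
    reward = -(gap_sq + var_eps)

    p_none = 1.0 - (b1 - b2) * m_grid / width    # P(M) from (10)
    q_learn = 1.0 - p_none                       # Q(M) from (11)
    value_known = -var_eps * beta / (1 - beta)   # discounted value once the truth is known

    # With M held fixed, (12) reads V = reward + beta * P * V + Q * value_known.
    value = (reward + q_learn * value_known) / (1.0 - beta * p_none)

    m_myopic = m_grid[np.argmax(reward)]
    m_optimal = m_grid[np.argmax(value)]
    return m_myopic, m_optimal, value.max()

m_myopic, m_patient, _ = case4_policy(mu1=0.5, beta=0.95)
_, m_impatient, _ = case4_policy(mu1=0.5, beta=0.5)
```

In this parameterization the fully optimal action lies slightly above the myopic one, and the gap widens as the discount factor rises, in line with the trade-off described in the text.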

An important feature of this example, as one can see directly from fig. 2, is that the probability of learning the truth increases as the level of unanticipated money increases. It can be proven that when learning considerations are included, the action chosen will always be at least as large as the action that maximizes the reward myopically. It can even be optimal for the policymaker to choose a level above $\overline{M}$ for the purpose of learning. We should stress, however, that the direct relationship between learning and money supply levels is particular to this example and could be reversed in other cases.¹⁰ Therefore there is no general implication that loose monetary policy implies optimal information processing.

$P(M)$ is strictly less than one. Therefore $P(M)^t$, the probability that the truth has not been learned by period $t + 1$, approaches zero as $t$ tends to infinity. So the probability that the truth will never be learned is zero, i.e., complete learning will obtain in the long run. This is true for arbitrarily wide supports for $\varepsilon_t$.

Case 4 illustrates the trade-off between learning and myopic reward in a particularly simple manner. We can reintroduce gradual learning, as in Case 3, by removing the restriction that the $\varepsilon$’s are uniformly distributed: then in general output will not be fully revealing and beliefs about the truth will change gradually. Therefore it will pay to make continuous adjustments as the beliefs evolve. However, the uniformity of $\varepsilon$ turns out to be very convenient analytically. Therefore in Case 5 we reintroduce gradualism in a simpler way by letting $p$ take any value between 0 and 1 as in Case 3.

Case 5 (Gradual Learning)

This case coincides with Case 4 except that $p$ can now take any value between 0 and 1 as in Case 3. We show that this implies gradual learning, because in periods where the policymaker learns, he does not learn the whole truth. If $M$ is chosen in a given period, the policymaker observes this structural equation with probability $Q(M) > 0$ and does not observe it with probability $P(M)$. This implies in general that the sequence of monetary actions will be constantly adjusted as time goes on. The Bellman equation in this case is

$$V(\mu) = \max_{M \in R} \left\{ \int_R \int_0^1 -\left[\theta(\bar{a} + b^1 M + \varepsilon)^2 + (1 - \theta)(\bar{a} + b^2 M + \varepsilon)^2\right] \mu(d\theta) \, dF_\varepsilon + \beta \left[ \int_0^1 \left[\theta Q(M) V(\mu^1(\mu)) + (1 - \theta) Q(M) V(\mu^2(\mu))\right] \mu(d\theta) + P(M) V(\mu) \right] \right\}. \tag{13}$$

¹⁰See Bertocchi and Spagat (1991) for such an example and a proof.


The term in (13) which reflects the value of information contains the following elements. $\mu^1(\mu)$ gives the posterior belief if the prior is $\mu$ and structural equation one occurs and is observed, which happens with probability $\theta Q(M)$ if $\theta$ is the true $p$. $\mu^2(\mu)$ gives the posterior belief if the prior is $\mu$ and structural equation two occurs and is observed, which happens with probability $(1 - \theta) Q(M)$ if $\theta$ is the true $p$. With probability $P(M)$ nothing is learned, so the prior $\mu$ remains unchanged.
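The three branches of the belief transition described above can be simulated directly. The sketch below is deliberately partial: it holds $M$ fixed rather than solving (13) for the optimal action, it assumes a Beta(1, 1) prior over $p$, and the function name and parameter values are hypothetical. It is meant only to show how the choice of $M$ governs the speed of gradual learning.

```python
import numpy as np

def simulate_gradual_learning(true_p, m, periods, b1=1.5, b2=1.0,
                              eps_lo=-0.5, eps_hi=0.5, seed=0):
    """Posterior mean of p and number of informative periods for a fixed M >= 0."""
    rng = np.random.default_rng(seed)
    # Q(M) from (11), clipped to [0, 1]; valid for M >= 0 (assumption).
    q_learn = min(1.0, max(0.0, (b1 - b2) * m / (eps_hi - eps_lo)))
    alpha, beta_count = 1.0, 1.0   # assumed Beta(1, 1) prior over p
    informative = 0
    for _ in range(periods):
        slope_is_b1 = rng.random() < true_p
        if rng.random() < q_learn:
            # The realised equation is observed: move to mu^1 or mu^2.
            informative += 1
            if slope_is_b1:
                alpha += 1
            else:
                beta_count += 1
        # Otherwise output falls in the overlap range and the belief is unchanged.
    return alpha / (alpha + beta_count), informative

p_hat, n_obs = simulate_gradual_learning(true_p=0.7, m=1.0, periods=4000)
p_hat_idle, n_obs_idle = simulate_gradual_learning(true_p=0.7, m=0.0, periods=4000)
```

With $M = 1$ roughly half of the periods are informative and the posterior mean approaches the true $p$, whereas with $M = 0$ no period is informative and the belief never leaves the prior.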

Again we compare the myopically optimal monetary action with the fully optimal one. As in Case 4, the optimal level is at least as large as the myopically optimal level, because of the direct relationship between learning and unanticipated money supply levels.

It can be shown that with certainty the number of observations on p is infinite. This implies that the policymaker’s beliefs converge to the true p with probability 1. However, it could take a very long time for the policymaker to learn the truth.

To summarize, the last two cases do display the trade-off between current reward and information extraction that lies at the heart of the learning question. The actions of a fully optimizing policymaker will in part be experiments directed at learning the structure of the economy to improve future actions, even if these experiments are costly in the short run. Case 3, instead, involves learning but not experimentation, i.e., learning is passive since the process that generates information does not depend on the monetary actions chosen and the policymaker is free to optimize short-run results based on the current state of information.

For all three cases the implications for the optimal monetary policy are identical in the sense that fixed rules are dominated by feedback rules that take into account the value of the available information and its evolution. The time path of monetary actions, and the frequency and the size of the adjustment, will then depend on the parameters of the problem. In particular, one switch in regime will occur when learning is discrete, while there will be continuous adjustments when learning is gradual.
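Gradual learning can be illustrated with a short simulation. This is only a sketch under stated assumptions: the prior has a hypothetical two-point support, the monetary action is held fixed so that the observation probability is a constant `q_obs`, and the true parameter value and horizon are invented for the example.

```python
import numpy as np

def simulate_beliefs(true_theta, thetas, prior, q_obs, periods, seed=0):
    """Simulate the belief path when each period's structural equation is
    observed only with probability q_obs (the action is held fixed)."""
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas, dtype=float)
    belief = np.asarray(prior, dtype=float).copy()
    path = [belief.copy()]
    n_obs = 0
    for _ in range(periods):
        eq_one = rng.random() < true_theta   # which equation occurs
        observed = rng.random() < q_obs      # whether it is observed
        if observed:
            likelihood = thetas if eq_one else 1.0 - thetas
            belief = likelihood * belief
            belief = belief / belief.sum()   # Bayes' rule
            n_obs += 1
        # If nothing is observed, the belief is carried forward unchanged.
        path.append(belief.copy())
    return np.array(path), n_obs

# Hypothetical setup: the true theta is the second support point.
path, n_obs = simulate_beliefs(
    true_theta=0.75, thetas=[0.25, 0.75], prior=[0.5, 0.5],
    q_obs=0.3, periods=2000,
)
final_belief_on_truth = path[-1, 1]
```

In a run like this the belief moves in many small steps, one per observation, rather than in a single jump, and the weight on the true value approaches one as observations accumulate. Lowering `q_obs` slows the process, which mirrors the remark that learning the truth could take a very long time.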

4. Conclusions

The question of whether the money supply should be held constant or should instead respond to new information is an old one. We answer this old question with new tools and we conclude that, when the information content of policy actions is taken into account, a strong case is made for activist monetary policy. A fixed rule would in general turn out to be myopic and therefore suboptimal, because it does not allow any learning. This result holds for several different formulations of the economic structure. We also show that monetary policy should be activist not only in the sense that it should react to new information, but also in the stronger sense that it should actively seek to generate additional information. A certain degree of experimentation, therefore, may be called for in the implementation of monetary policy.

Finally, the question of how well the learning approach can perform as an explanation of the actual behavior of a monetary authority becomes an empirical issue. As a possible direction for future research, gains and losses from experimentation could be quantified, for example, for the specific episode of the Volcker disinflation. Even if in our model we have abstracted from considerations of inflation in the welfare function, our analysis could provide a rationale for the observed large costs, in terms of output and employment, which followed the 1979-82 ‘monetary policy experiment’.

References

Backus, D. and J. Driffill, 1985, Inflation and reputation, American Economic Review 75, 530-538.

Balvers, R. and T. Cosimano, 1991, Periodic learning about hidden state variables, Journal of Economic Dynamics and Control, forthcoming.

Baxter, M., 1989, Rational response to unprecedented policies: The 1979 change in Federal Reserve operating procedures, Carnegie-Rochester Conference Series on Public Policy 31, 247-296.

Bertocchi, G., 1993, A theory of public debt management with unobservable demand, Economic Journal 103, 960-974.

Bertocchi, G. and M. Spagat, 1991, Learning, experimentation and monetary policy, IRES discussion paper no. 9118 (Université Catholique de Louvain, Louvain).

Brainard, W., 1967, Uncertainty and the effectiveness of policy, American Economic Review 57, 411-425.

Dickinson, D.G., M.J. Driscoll, and J.L. Ford, 1982, Rational expectations, random parameters and the non-neutrality of money, Economica 49, 241-248.

Dotsey, M. and R.G. King, 1986, Informational implications of interest rate rules, American Economic Review 76, 33-42.

Easley, D. and N.M. Kiefer, 1988, Controlling a stochastic process with unknown parameters, Econometrica 56, 1045-1064.

Fischer, S., 1977, Long-term contracts, rational expectations and the optimal money supply rule, Journal of Political Economy 85, 163-190.

Friedman, B.M., 1977, The inefficiency of short-run monetary targets for monetary policy, Brookings Papers on Economic Activity 2, 293-335.

Friedman, B.M., 1984, Lessons from the 1979-82 monetary policy experiment, American Economic Review 74, 382-387.

Friedman, M., 1959, A program for monetary stability (Fordham University Press, New York, NY).

Friedman, M., 1968, The role of monetary policy, American Economic Review 58, 1-17.

Friedman, M., 1984, Lessons from the 1979-82 monetary policy experiment, American Economic Review 74, 397-400.

Grossman, S.J., R.E. Kihlstrom, and L.J. Mirman, 1977, A Bayesian approach to the production of information and learning by doing, Review of Economic Studies 44, 533-547.

Kiefer, N.M. and Y. Nyarko, 1989, Optimal control of an unknown linear process with learning, International Economic Review 30, 571-586.

Footnote: For a discussion of this ‘experiment’ see B.M. Friedman (1984), McCallum (1984), and M. Friedman (1984). Baxter (1989) evaluates the relevance of learning on the part of the public for the same episode.


Lucas, R.E., Jr., 1973, Some international evidence on output-inflation trade-offs, American Economic Review 63, 326-334.

Marcet, A. and T.J. Sargent, 1989, Convergence of least squares learning mechanisms in self-referential linear stochastic models, Journal of Economic Theory 48, 337-368.

McCallum, B.T., 1984, Monetarist rules in the light of recent experience, American Economic Review 74, 388-391.

Mizrach, B., 1990, Non-convergence to rational expectations and optimal monetary policy in models with learning, in: N. Christodoulakis, ed., Dynamic modelling and control of national economies 1989 (Pergamon Press, Oxford) 293-298.

Prescott, E., 1972, The multiperiod control problem under uncertainty, Econometrica 40, 1043-1058.

Taylor, J.B., 1975, Monetary policy during a transition to rational expectations, Journal of Political Economy 83, 1009-1021.

Taylor, J.B., 1980, Aggregate dynamics and staggered contracts, Journal of Political Economy 88, 1-24.

Theil, H., 1964, Optimal decision rules for government and industry (North-Holland, Amsterdam).

Woodford, M., 1990, Learning to believe in sunspots, Econometrica 58, 277-308.