

Journal of Forecasting, Vol. 9, 283-297 (1990)

Optimal Selection of Forecasts

LIAN CHEN, University of Virginia

G. ANANDALINGAM, University of Pennsylvania

ABSTRACT Many studies have shown that, in general, a combination of forecasts often outperforms the forecasts of a single model or expert. In this paper we postulate that obtaining forecasts is costly, and provide models for optimally selecting them. Based on normality assumptions, we derive a dynamic programming procedure for maximizing precision net of cost. We examine the solution for cases where the forecasters are independent, correlated and biased. We provide illustrative examples for each case.

KEY WORDS Forecast combination; Dynamic programming; Bayesian methods

INTRODUCTION

We present a methodology based on Bayesian analysis for the optimal selection of forecasts from experts or forecasting models. There is a considerable literature on combining forecasts (see the review by Genest and Zidek (1986) and the special issue of the Journal of Forecasting (1989)). The principal motivation for combining forecasts has been to avoid the a priori choice of a single forecasting model. In addition, studies have shown that a combination of forecasts often outperforms the forecasts of a single model or expert (see Mahmoud, 1984, for a summary of the results). As long as each forecasting model or expert provides new information, more reliable forecasts are obtained by combining them. When forecasters are independent, the weight assigned to each is usually inversely proportional to the variance of the past error of forecast (Winkler and Makridakis, 1983) or to the decision maker's perception of the forecaster's error-proneness (Anandalingam and Chen, 1989). The expression for the weight assigned to correlated forecasters is more complicated.

Implicit in all previous studies is the assumption that obtaining forecasts is costless, or at least the cost is not explicitly considered. What is more likely is that whether one obtains forecasts from models or from experts, there is a price to pay. Thus it is usual practice to hire just one expert or obtain inputs from one forecasting firm. This is reflected, for instance, in television and radio stations, where weatherpersons provide us with information from one, very costly, source.

0277-6693/90/030283-15$07.50 © 1990 by John Wiley & Sons, Ltd. Received December 1988. Revised July 1989.


Even if there are N relatively low-cost forecasters, as long as the cost is nonzero, choosing a subset of them may provide more cost-effective forecasts than choosing all of them. In this paper we develop models for optimally selecting forecasts to maximize the value of information. We will refer alternatively to choosing forecasts or forecasters, depending on the context; the two are equivalent.

We examine the problem of choosing correlated biased forecasters. Using normality assumptions and a Bayesian framework, we derive a dynamic programming procedure for solving the problem of optimal selection of unbiased forecasters. We then examine cases where the forecasters are independent and have a common correlation and a common variance. In each of these special cases we show that a fairly simple choice rule can be obtained from the dynamic programming formulation. All these cases are illustrated by examples. Finally, we show how to adapt the dynamic programming procedure for the case where the forecasters are biased. There is a related paper by Clemen and Winkler (1985) in which they have derived expressions for the relationship between the number of independent and dependent forecasters needed to ensure the same precision. They do not examine the forecast selection problem. The focus of their paper is on analyzing the impact of dependence on precision and value of information.

Although the developments in this paper are framed in terms of choosing forecasts optimally, they are also applicable to a wide range of decision models where the state space and decision space coincide. This class includes problems of statistical estimation and problems of setting targets in the context of management, planning or control.

The rest of the paper is organized as follows. In the next section we formalize the forecaster selection problem. In the third section we examine the optimal choice of independent forecasters when we maximize precision of the combined forecasts under a budget constraint. The fourth section examines the same problem when the forecasters are unbiased but correlated with each other and the decision maker acts as a supra-Bayesian. The objective function is to maximize net gain in precision. The fifth section provides some special cases of the forecaster selection problem and the sixth section provides illustrative examples of the methods derived in the fourth and fifth sections. We end the paper with some conclusions.

THE FORECASTER SELECTION PROBLEM

Consider a stochastic process $\{\theta_t : t = 1, 2, \ldots\}$, where $\theta_t \in R^1$ is a continuous random variable at time $t$ for which forecasts are being reported by $N$ experts or forecasters. The forecasts could also have been obtained from forecasting models. Suppose that at the beginning of each time period the decision maker (DM) has some prior probability distribution of $\theta_t$ and also obtains $N$ forecasts denoted by the vector

$$x_t = [x_{1t}, x_{2t}, \ldots, x_{Nt}]$$

The DM obtains a composite forecast of the unknown $\theta_t$ by combining the vector $x_t$ with his own prior. While it is clear from the literature (Mahmoud, 1984; Winkler and Makridakis, 1983) that the variance (precision) of the composite forecast is minimized (maximized) by including all forecasts $(x_{1t}, x_{2t}, \ldots, x_{Nt})$, the cost of doing so might be exorbitant. For instance, it is usual for forecasts to be obtained by hiring expert forecasters, or by building forecasting models, both of which have significant costs attached. Thus we need a methodology for choosing the optimal number of forecasts while satisfying budgetary constraints.

Page 3: Optimal selection of forecasts

L. Chen and G . Anandalingam Optimal Selection of Forecasts 285

We examine this problem for the static case, and thus drop the time subscript t for the remainder of this paper. The developments of the static case can easily be extended to the dynamic case. One option is to choose a different set of forecasts (or experts) for every time period. Another is to base the optimal choice of forecasters on statistics from past periods comparable to the future period for which the forecasts are needed.

Before we proceed, we need to make the following assumptions:

A1. The DM's prior probability distribution about $\theta$ is normal with mean $\mu_0$ and precision $\gamma_0$.

A2. The likelihood function of the $N$ forecasts, $f(x_1, x_2, \ldots, x_N \mid \theta)$, is proportional to a multivariate normal distribution with mean vector $\mu_N$ and a known covariance matrix $\Sigma_N$, where $\mu_N$ is an $N \times 1$ vector and $\Sigma_N$ is an $N \times N$ matrix such that

$$\mu_N = e_N \theta \qquad (1)$$

where $e_N$ is the unit vector of $N$ components.

In the most general case, assumption A2 suggests that the forecasters (experts) are correlated with each other. In the simple case, we can ignore the correlation, i.e. let $\sigma_{ij} = 0$ for $i \neq j$, and consider the problem of minimizing the total variance, or maximizing the total precision, where individual precision is the inverse of the variance:

$$\gamma_i = 1/\sigma_i^2 \qquad (2)$$

Implicit in ignoring the correlation is the assumption that the forecasters are independent of each other.

Note that all the procedures developed in this paper assume that the parameters of the multivariate normal process are known. If the covariance matrix and bias vector are unknown and must be estimated, the dynamic problem would become an adaptive control problem (see Zellner, 1971; Anandalingam and Chen, 1989). The DM might, for example, decide to select relatively unknown forecasters in order to obtain information about them, on the grounds that this information could improve later selections.

OPTIMAL SELECTION OF INDEPENDENT FORECASTS

Consider, as before, $N$ different forecasters, each of whom costs $c_i$ to hire, and each of whom produces a forecast of known precision $\gamma_i$. The total precision of the forecasts is increased by combining the forecasts, but there is a total budget, $b$, available for hiring the forecasters. Let $y = (y_1, \ldots, y_N)$ be the vector that represents the forecast selection decision; for any $i$, $y_i = 1$ if forecaster $i$ is selected and $y_i = 0$ if he or she is not. The problem of optimal selection of independent forecasters is formulated as follows:

P1:

$$\text{maximize} \quad \sum_{i=1}^{N} \gamma_i y_i \qquad (3)$$

$$\text{subject to} \quad \sum_{i=1}^{N} c_i y_i \le b \qquad (4)$$

$$y_i = 0 \text{ or } 1, \quad i = 1, \ldots, N \qquad (5)$$

Problem P1 is the well-known 0-1 knapsack problem (Bradley et al., 1977), which is NP-hard


and usually solved by branch-and-bound techniques, which take a fairly significant amount of computation time.
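As a concrete illustration, the following is a minimal sketch of an exact P1 solver that simply enumerates subsets; the function name and the data are ours (hypothetical), not from the paper, and a branch-and-bound or knapsack dynamic program would replace the enumeration for large N.

```python
from itertools import combinations

def solve_p1(precisions, costs, budget):
    """Exact solution of the 0-1 knapsack P1 by enumerating subsets.

    Maximizes sum(gamma_i * y_i) subject to sum(c_i * y_i) <= budget.
    Enumeration is exponential in N: fine for a handful of forecasters,
    but a branch-and-bound routine should replace it for large pools.
    """
    n = len(precisions)
    best_value, best_subset = 0.0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            cost = sum(costs[i] for i in subset)
            value = sum(precisions[i] for i in subset)
            if cost <= budget and value > best_value:
                best_value, best_subset = value, subset
    return best_subset, best_value

# Hypothetical data: precisions gamma_i = 1/sigma_i^2 and hiring costs.
gammas = [1 / 81.0, 1 / 16.0, 1 / 4.0, 1 / 1.5]
costs = [0.05, 0.15, 0.25, 0.04]
print(solve_p1(gammas, costs, budget=0.30))
```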

The problem as formulated in P1 does not take into account the decision maker's prior on the unknown $\theta$ or the correlation among the forecasters. Also, as is usual in the forecast-combination literature, the forecasters are weighted in proportion to the past precision of their forecasts. As shown in Anandalingam and Chen (1989), unless the biases of the forecasters are also taken into account before combining their forecasts, the decision maker could make significant errors. We deal with the optimal choice of biased forecasters later. In the next section we provide a Bayesian model and a dynamic programming solution procedure for the general forecast selection problem.

Example 1. Suppose there are four forecasters whose forecasts have the covariance matrix

$$\Sigma = \begin{bmatrix} 81.0 & 4.0 & 3.0 & 2.0 \\ 4.0 & 16.0 & 1.5 & 1.0 \\ 3.0 & 1.5 & 4.0 & 2.0 \\ 2.0 & 1.0 & 2.0 & 1.5 \end{bmatrix}$$

For the independent case only the diagonal entries (the individual variances 81, 16, 4 and 1.5) are relevant. Solving P1 for different budget levels yields the results in Table 1.

Table 1. Results of selecting independent forecasters

Budget constraint      Optimal forecasts        Average iterations
0 ≤ b < 0.04           None                     4.3
0.04 ≤ b < 0.09        x4                       6.4
0.09 ≤ b < 0.19        x1 and x4                10.6
0.19 ≤ b < 0.24        x2 and x4                12.0
0.24 ≤ b < 0.29        x1, x2 and x4            12.3
0.29 ≤ b < 0.34        x3 and x4                8.2
0.34 ≤ b < 0.49        x1, x3 and x4            8.0
0.49 ≤ b               All                      1.0


OPTIMAL SELECTION OF CORRELATED UNBIASED FORECASTERS

Overview
In this section we assume that the forecasters (experts) are unbiased but correlated with each other, and that assumption A2, corresponding to equation (1), holds true. Unlike in the previous section, the DM's prior information is an important input into the forecaster selection problem. We provide Bayesian models for the optimal selection of unbiased forecasters to minimize variance (or maximize precision) under budget constraints. Then we obtain some mathematical results in order to develop our algorithm. It must be pointed out that in the mathematical developments in the remainder of this paper we implicitly assume that the DM is independent of all forecasters. If the correlation between the DM and the forecasters is considered, the solution to the problem will be exactly the same, except that the covariance matrix given in assumption A2 will have dimension $(N+1) \times (N+1)$, and will include the variance of the DM's prior forecast and the correlations of the DM with the other forecasters.

Mathematical developments
First, we formally introduce the well-known mechanism for sequentially updating the posterior via Bayes' theorem.

Theorem 1. Let $\phi_0(\theta)$ denote the DM's prior probability distribution and $\phi_n(\theta)$ the DM's posterior probability distribution after $n \le N$ forecasts have been sequentially selected. Then $\phi_n(\theta)$ can be obtained as

$$\phi_n(\theta) \propto f(x_n \mid x_1, \ldots, x_{n-1}, \theta)\,\phi_{n-1}(\theta), \qquad n = 1, 2, \ldots, N$$

Theorem 1 provides a recursive formulation for updating the posterior probability distribution function of $\theta$ sequentially. In order to obtain $\phi_n(\theta)$ at the $n$th stage, i.e. after the $n$th forecast is used, we need to find the specific form of the conditional distribution function $f(x_n \mid x_1, \ldots, x_{n-1}, \theta)$ of $x_n$.

Theorem 2. The conditional pdf of $x_n$, $n \le N$, for fixed values of $(x_1, \ldots, x_{n-1}, \theta)$ has the following form:

$$f(x_n \mid x_1, \ldots, x_{n-1}, \theta) \sim N\!\left(\theta + \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}(X_{n-1} - \mu_{n-1}),\; \sigma_n^2 - \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}\Sigma_{(n-1)n}\right) \qquad (10)$$

where $\Sigma_{n-1}$ is the $(n-1) \times (n-1)$ covariance matrix associated with the forecast vector $X_{n-1} = (x_1, \ldots, x_{n-1})$, and $\Sigma_{n(n-1)}$ and $\Sigma_{(n-1)n}$ are the corresponding row and column of $\Sigma_n$, respectively, such that

$$\Sigma_{n(n-1)} = [\sigma_{n1}, \sigma_{n2}, \ldots, \sigma_{n(n-1)}]$$

and

$$\Sigma_{(n-1)n} = \Sigma_{n(n-1)}'$$


Proof. See Anderson (1958).

Recalling the definition of $\mu_{n-1}$ from assumption A2, i.e. $\mu_{n-1} = e_{n-1}\theta$, it is easy to see that

$$\theta + \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}(X_{n-1} - \mu_{n-1}) = (1 - \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}e_{n-1})\theta + \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}X_{n-1} \qquad (11)$$

where $e_{n-1}$ is the unit vector of $n-1$ components. Suppose the mean and variance in equation (10) are denoted by $\tilde{\mu}_n$ and $\tilde{\sigma}_n^2$, respectively. We then have

$$(x_n \mid x_1, \ldots, x_{n-1}, \theta) \sim N(\tilde{\mu}_n, \tilde{\sigma}_n^2) \qquad (12)$$

where

$$\tilde{\mu}_n = a_n\theta + b_n = a_n\theta + \sum_{i=1}^{n-1} \beta_i x_i \qquad (13)$$

and

$$\tilde{\sigma}_n^2 = \sigma_n^2 - \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}\Sigma_{(n-1)n} \qquad (14)$$

with

$$a_n = 1 - \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}e_{n-1} \qquad (15)$$

and

$$b_n = \Sigma_{n(n-1)}\Sigma_{n-1}^{-1}X_{n-1} \qquad (16)$$

where $\beta_i$ is the $i$th component of $\Sigma_{n(n-1)}\Sigma_{n-1}^{-1}$.

Should there be a time series of forecasts, the $\beta_i$'s, $i = 1, 2, \ldots, n-1$, can be considered regression coefficients on the $x_i$'s. Note that here we are not discussing weights for combining forecasts. Also note that if the DM is correlated with the forecasters, the index $i$ used in the above formulas will start from $i = 0$ instead of $i = 1$. The next theorem provides a way of recursively updating the means and variances of the posterior distribution. Note that, as before (equation (2)), the precision $\tilde{r}_n = 1/\tilde{\sigma}_n^2$, where $\tilde{\sigma}_n^2$ is given by equation (14).
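To make equations (13)-(16) concrete, here is a small numerical sketch (the function name is ours, not the paper's) that computes $a_n$, the regression weights $\beta$, and $\tilde{\sigma}_n^2$ from a given covariance matrix:

```python
import numpy as np

def conditional_moments(cov, n):
    """Parameters of f(x_n | x_1..x_{n-1}, theta) per equations (13)-(16).

    cov is the full covariance matrix Sigma_N; n is 1-based.  Returns
    a_n (the coefficient on theta), beta (weights such that
    b_n = beta @ x_prev) and the conditional variance sigma_tilde_n^2.
    """
    i = n - 1                                   # 0-based position of x_n
    if i == 0:
        # Nothing to condition on: a_1 = 1, b_1 = 0, variance sigma_1^2.
        return 1.0, np.zeros(0), cov[0, 0]
    beta = np.linalg.solve(cov[:i, :i], cov[i, :i])  # Sigma_n(n-1) Sigma_{n-1}^{-1}
    a_n = 1.0 - beta.sum()                           # equation (15)
    var = cov[i, i] - beta @ cov[i, :i]              # equation (14)
    return a_n, beta, var
```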

Theorem 3. If the posterior probability distribution $\phi_{n-1}(\theta)$ of $\theta$ given forecasts $x_1, \ldots, x_{n-1}$ is normal with mean $m_{n-1}^*$ and precision $r_{n-1}^*$, and the likelihood function $f(x_n \mid x_1, \ldots, x_{n-1}, \theta)$ is normal with mean $\tilde{\mu}_n = a_n\theta + b_n$ and precision $\tilde{r}_n$, then the posterior pdf $\phi_n(\theta)$ of $\theta$ is normal with mean $m_n^*$ and precision $r_n^*$ such that

$$m_n^* = \frac{r_{n-1}^* m_{n-1}^* + a_n \tilde{r}_n (x_n - b_n)}{r_{n-1}^* + a_n^2 \tilde{r}_n} \qquad (17)$$

and

$$r_n^* = r_{n-1}^* + a_n^2 \tilde{r}_n \qquad (18)$$

The initial values of $m_0$ and $r_0$ are the DM's prior mean and precision, respectively, i.e. $m_0 = \mu_0$ and $r_0 = \gamma_0$.


Proof. See Appendix.

Note that $r_n^*$ is the updated precision of the combined forecast after $n$ forecasters are chosen. It includes the correlations among all forecasters up to and including the $n$th one.
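A short sketch of the recursion (18), again with our own function name: starting from the prior precision $\gamma_0$, it folds the forecasts in one at a time and returns the sequence of posterior precisions $r_1^*, \ldots, r_N^*$.

```python
import numpy as np

def posterior_precisions(cov, gamma0):
    """Sequence r_n* from the recursion r_n* = r_{n-1}* + a_n^2 * r_tilde_n."""
    r, out = gamma0, []
    for i in range(cov.shape[0]):
        if i == 0:
            a_n, var = 1.0, cov[0, 0]
        else:
            beta = np.linalg.solve(cov[:i, :i], cov[i, :i])
            a_n = 1.0 - beta.sum()                 # equation (15)
            var = cov[i, i] - beta @ cov[i, :i]    # equation (14)
        r += a_n ** 2 / var                        # equation (18), r_tilde = 1/var
        out.append(r)
    return out
```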

Selection of forecasts to maximize net gain in precision
In this section we obtain a solution algorithm for optimally selecting a subset of the available forecasters to maximize the net gain in precision that results from using the forecasts. Folding the costs into the objective function, in a Lagrangian manner, yields an elegant solution algorithm.

As in the previous section, we assume that the joint distribution $f(x_1, \ldots, x_n \mid \theta)$ is multivariate normal with mean $e_n\theta$, where $e_n$ is an $n$-dimensional unit vector, and variance-covariance matrix $\Sigma_n$. The updated precision, after $n$ forecasters are chosen, is given by equation (18).

Let $J(x_1, \ldots, x_n)$ be the net gain from $n$ forecasts; then we have

$$J(x_1, \ldots, x_n) = K r_n^* - \sum_{i=1}^{n} c_i \qquad (19)$$

where $c_i$ is the cost of obtaining forecast $i$ and $K$ is a constant that translates precision into a cost unit (say, dollars).

Note that the larger the $K$, the more importance is given to maximizing precision. Without loss of generality, we will let $K = 1$ from now on. Next, we provide a dynamic programming procedure to solve the forecaster selection problem.


A dynamic programming procedure
Recall from the previous section that

$$\phi_n(\theta \mid x_1, \ldots, x_n) \propto f(x_n \mid x_1, \ldots, x_{n-1}, \theta)\,\phi_{n-1}(\theta)$$

where $f(x_n \mid x_1, \ldots, x_{n-1}, \theta)$ is a univariate normal distribution with mean $\tilde{\mu}_n$ and variance $\tilde{\sigma}_n^2$ given by equations (13) and (14), respectively. Also, if the posterior distribution $\phi_{n-1}(\theta \mid x_1, \ldots, x_{n-1})$ given forecasts $x_1, \ldots, x_{n-1}$ is normal with mean $m_{n-1}^*$ and precision $r_{n-1}^*$, then the posterior distribution $\phi_n(\theta \mid x_1, \ldots, x_n)$ is normal with mean $m_n^*$ and precision $r_n^*$ given by equations (17) and (18), respectively. We now establish some relationships for the net gain $J(\cdot)$.

Theorem 4. Let $J(x_1, \ldots, x_n)$ be the net gain from forecasts $x_1, \ldots, x_n$ and $J(x_n \mid x_1, \ldots, x_{n-1})$ the marginal gain contributed by forecast $x_n$, given that forecasts $x_1, \ldots, x_{n-1}$ have already been obtained. We then have

$$J(x_1, \ldots, x_n) = J(x_1, \ldots, x_{n-1}) + J(x_n \mid x_1, \ldots, x_{n-1}) \qquad (20)$$

where $J(x_n \mid x_1, \ldots, x_{n-1}) = a_n^2 \tilde{r}_n - c_n$.

Proof. As we have defined previously (equation (19) with $K = 1$),

$$J(x_1, \ldots, x_n) = r_n^* - \sum_{i=1}^{n} c_i$$


Using equation (18), this yields

$$J(x_1, \ldots, x_n) = r_{n-1}^* + a_n^2\tilde{r}_n - \sum_{i=1}^{n-1} c_i - c_n = J(x_1, \ldots, x_{n-1}) + J(x_n \mid x_1, \ldots, x_{n-1})$$

Theorem 4 allows us to formulate the problem as a dynamic program.

Define:

$S = \{x_1, \ldots, x_N\}$ is the set of forecasts available;
$Z_n = \{x_{(1)}, \ldots, x_{(n)}\}$ is the set of $n$ forecasts chosen at stage $n$, $n = 1, 2, \ldots, N$.

Note that the index $(i)$ is used to distinguish between $x_{(i)} \in Z_n$ and $x_i \in S$. If $S_n$ is the state space at stage $n$, the $n$th forecast chosen will be $x_{(n)} \in S_n$, where $S_n = S - Z_{n-1}$. Let $J^*(Z_n)$ be the maximum value of information when $n$ forecasts are chosen, and $Z_n^*$ the optimal set of forecasts at stage $n$. Then the optimal forecaster selection problem is formulated as follows:

P2:

$$J^*(Z_n) = \max_{x_{(n)} \in S_n} \{J(Z_{n-1}) + J(x_{(n)} \mid Z_{n-1})\}, \qquad n = 2, \ldots, N \qquad (21)$$

$$J^*(Z_1) = \max_{x_{(1)} \in S} \{J(x_{(1)})\} \qquad (22)$$

Note that, in general, $Z_{n-1}$ may not be the same as $Z_{n-1}^*$. For a general problem, the optimal set of forecasts is obtained by using the backward induction technique implied by the dynamic programming formulation given by equations (21) and (22). We illustrate this by examples later. In the next section we show that for some special cases there are much simpler solution procedures.
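The following sketch scores the net gain of every subset directly and so reaches the same optimum as backward induction for the small pools used in the examples. It relies on the standard identity, consistent with the recursion (18) for unbiased forecasters, that the posterior precision from a chosen subset $Z$ is $r^*(Z) = \gamma_0 + e'\Sigma_Z^{-1}e$. The function names and any data passed to them are ours, not the paper's.

```python
import numpy as np
from itertools import combinations

def net_gain(subset, cov, costs, gamma0, K=1.0):
    """J(Z) = K * r*(Z) - sum of costs, with r*(Z) = gamma0 + e' Sigma_Z^{-1} e."""
    if not subset:
        return K * gamma0
    idx = list(subset)
    sigma = cov[np.ix_(idx, idx)]          # covariance of the chosen forecasts
    e = np.ones(len(idx))
    r_star = gamma0 + e @ np.linalg.solve(sigma, e)
    return K * r_star - sum(costs[i] for i in idx)

def select_forecasters(cov, costs, gamma0, K=1.0):
    """Best subset of any size under the net-gain objective (19)."""
    n = len(costs)
    best = ((), net_gain((), cov, costs, gamma0, K))
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            j = net_gain(subset, cov, costs, gamma0, K)
            if j > best[1]:
                best = (subset, j)
    return best
```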

SOME SPECIAL CASES OF THE PROBLEM

In this section we discuss solutions for problems that have special structures.

Independent forecasts
We examined this case in the third section, where we concluded that it was a knapsack problem, which is NP-hard. For this simple problem the total precision of the combined forecast is directly additive. Thus we can easily adapt the dynamic programming procedure outlined above to solve it efficiently. Substituting equation (14) into equation (18), and recalling that $a_n = 1$, we obtain

$$r_n^* = r_{n-1}^* + 1/\sigma_n^2 \qquad (23)$$

Thus the value of information obtained from n forecasters is:

$$J(x_1, \ldots, x_n) = J(x_1, \ldots, x_{n-1}) + J(x_n) \qquad (24)$$

where

$$J(x_n) = 1/\sigma_n^2 - c_n$$


In such a case the selection rules become trivial. Suppose there is a budget constraint; then the selection rule is: select the forecasters from the available pool with the largest $J(x_n)$ values, and stop when the budget runs out. When there is no budget limitation, the forecasters are selected sequentially as long as $J(x_1, x_2, \ldots, x_n)$ continues to increase, i.e. as long as the marginal gain $J(x_n) > 0$.
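A sketch of this trivial rule (the function name is ours): rank forecasters by $J(x_i) = 1/\sigma_i^2 - c_i$ and keep adding while the marginal gain is positive and the budget, if any, allows.

```python
def greedy_independent(variances, costs, budget=float("inf")):
    """Select independent forecasters by descending J(x_i) = 1/var_i - c_i."""
    order = sorted(range(len(costs)),
                   key=lambda i: 1.0 / variances[i] - costs[i],
                   reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        gain = 1.0 / variances[i] - costs[i]
        if gain <= 0 or spent + costs[i] > budget:
            break              # stop: no positive gain left, or budget exhausted
        chosen.append(i)
        spent += costs[i]
    return chosen
```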

Common correlation
In this case we assume that the correlations among the forecasters are the same, i.e. $\rho_{ij} = \rho$ for all $i$ and $j$. Since the correlations are the same, the sequence of selected forecasters is based on the ratios of each forecaster's incremental precision to cost. Note that the incremental precision of a forecaster depends on both the variance of his forecast and its correlation with the other forecasts. Thus, if $Z_{n-1}^* = \{x_{(1)}, x_{(2)}, \ldots, x_{(n-1)}\}$ is the set of optimal forecasters chosen at stage $n-1$, it must be a subset of the optimal forecasters chosen at stage $n$. Let $x_{(n)}$ denote the $n$th forecaster selected and $Z_n^*$ the optimal set of forecasters chosen at stage $n$. We then have

$$Z_n^* = \{x_{(n)}\} \cup Z_{n-1}^* = \{x_{(1)}, x_{(2)}, \ldots, x_{(n)}\} \qquad (25)$$

where $x_{(n)}$ is obtained using the following rule:

$$J(x_{(n)} \mid x_{(1)}, \ldots, x_{(n-1)}) = \max_{x_j \in S_n} \{J(x_j \mid x_{(1)}, \ldots, x_{(n-1)}),\, 0\} \qquad (26)$$

Note that equations (25) and (26) together give a simple selection procedure, illustrated in the sketch below. When there is no budget constraint, the decision maker can select forecasters according to the selection rule given in equation (26) until $J(x_{(n)} \mid x_{(1)}, \ldots, x_{(n-1)}) < 0$.
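A sketch of the stagewise rule (25)-(26), in our own code and again using the subset-precision identity noted earlier: at each stage the remaining forecaster with the largest marginal gain $a_n^2\tilde{r}_n - c_n$ is added, and selection stops when the best marginal gain turns negative.

```python
import numpy as np

def greedy_select(cov, costs, gamma0=0.0):
    """Stagewise selection per equations (25)-(26)."""
    def precision(idx):
        # r*(Z) = gamma0 + e' Sigma_Z^{-1} e for a chosen index set Z.
        if not idx:
            return gamma0
        sigma = cov[np.ix_(idx, idx)]
        e = np.ones(len(idx))
        return gamma0 + e @ np.linalg.solve(sigma, e)

    chosen = []
    remaining = set(range(len(costs)))
    while remaining:
        r_old = precision(chosen)
        # Marginal gain a_n^2 * r_tilde_n - c_n of each candidate forecaster.
        gains = {j: precision(chosen + [j]) - r_old - costs[j] for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < 0:            # stopping rule of equation (26)
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen
```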

Common variances
For this special case the variances of the forecasters are assumed to be the same, but the correlations among them may differ. Here we have $\sigma_i^2 = \sigma_j^2 = \sigma^2$ for all $i$ and $j$. As in the second case above, the forecasters selected are determined by the ratios of their incremental precisions to their costs. Thus the relation given by equation (25) again applies, i.e. the optimal set of forecasters at stage $n$ is built on the optimal set obtained at stage $n-1$. This fact greatly reduces computation time and makes the selection problem easy to solve.

The selection rule for this special problem is exactly the same as in the second case, i.e. select forecasters sequentially following equation (26) until $J(x_{(n)} \mid x_{(1)}, \ldots, x_{(n-1)}) < 0$.

Discussion
Note that the solutions discussed in the above three cases share a common characteristic: the structure of the optimal set of forecasters chosen at each stage is similar. The optimal set obtained at stage $n-1$ also belongs to the optimal set at stage $n$. Thus the optimal value $J^*(\cdot)$ is a monotone increasing function of $n$ before the selection process stops. The solution procedure exploits this fact: forecasters are sequentially added to the optimal set until


we encounter a decrease in the value of $J^*$, at which point we stop. This solution procedure reduces computation time considerably compared to the backward induction method, which is used to solve the general type of problem. In the next section we use three examples to illustrate (1) the solution procedure for the general form of the forecaster selection problem and (2) the difference in the solution structures of the special problems.

ILLUSTRATIVE EXAMPLES

The forecaster selection problem becomes simple when the covariance matrix of the forecasters has one of the special structures discussed above. However, for a problem with a general covariance matrix the special relationship between the optimal sets of forecasters no longer exists. Thus the problem can be solved only by the backward induction technique using the dynamic programming formulation given in equations (21) and (22). In this section we illustrate the solution procedure for the general form of the problem with an example. We also provide examples for establishing optimal sets of forecasts for the special cases 2 and 3 discussed in the previous section.

Example 2
We illustrate the methodology described by problem P2 with an example. As in Example 1, suppose we have four forecasters $(x_1, x_2, x_3, x_4)$. Before the actual realization, their forecasts are considered to be unknown random variables. The covariance matrix and costs are the same as in Example 1, although in this case the correlations among the experts are not ignored.

The solution of the dynamic program given above in equations (21)-(22) yields the following steps:

n = 1    Z1                   J(Z1)
         {x1}                 0.061
         {x2}                 0.1
         {x3}
         {x4}                 0.793    → J*(Z1) = J(x4)

n = 2    Z2                   J(Z2)
         {x1, x2}             0.0505
         {x1, x3}             -0.1913
         {x1, x4}             0.9814   → J*(Z2) = J(x1, x4)
         {x2, x3}             -0.4
         {x2, x4}             0.00557
         {x3, x4}             0.21

n = 3    Z3                   J(Z3)
         {x1, x2, x3}         -0.2165
         {x1, x2, x4}
         {x1, x3, x4}         0.8155   → J*(Z3) = J(x1, x3, x4)
         {x2, x3, x4}         0.0716

n = 4    Z4                   J(Z4)
         {x1, x2, x3, x4}     -0.289


The optimal number of forecasters chosen should be $n^* = 2$ (forecasters 1 and 4), and the optimal value is

$$J^*(Z_2) = J(x_1, x_4) = 0.9814$$

with cost $c^*(Z_2) = 0.09$. Note that this solution is the same as that in Example 1 for a budget constraint $b$ in the range $0.09 \le b < 0.19$. Recall that maximizing precision and minimizing cost have equal weight (i.e. $K = 1$) for the illustration. As the constant of proportionality $K$ increases, i.e. as cost becomes less important, we can expect more of the forecasters to be chosen; conversely, as $K$ decreases, selecting only forecaster 4 would be optimal.

Example 3
Here we solve the same problem as in Examples 1 and 2, but assume that the correlations among the experts are the same. Suppose the variances of the forecasters are as in Example 2 and that the correlation coefficient is given by $\rho_{ij} = \rho = 0.2$ for all $i$ and $j$. We obtain the following variance-covariance matrix:

$$\Sigma = \begin{bmatrix} 81.0 & 7.2 & 3.6 & 2.20 \\ 7.2 & 16.0 & 1.6 & 0.98 \\ 3.6 & 1.6 & 4.0 & 0.49 \\ 2.20 & 0.98 & 0.49 & 1.5 \end{bmatrix}$$

Note that the forecasters in this example are more correlated with each other than in Examples 1 and 2. The cost vector of the forecasters is the same as before. The solution steps are given as:

n = 1    Z1                   J(Z1)
         {x1}                 0.061
         {x2}                 0.1
         {x3}                 0.25
         {x4}                 0.793    → J*(Z1) = J(x4)

n = 2    Z2                   J(Z2)
         {x1, x2}             -0.1116
         {x1, x3}             -0.4593
         {x1, x4}             0.8685   → J*(Z2) = J(x1, x4)
         {x2, x3}             0.1147
         {x2, x4}             0.6557
         {x3, x4}             0.7424

n = 3    Z3                   J(Z3)
         {x1, x2, x3}         -0.223
         {x1, x2, x4}         2.935
         {x1, x3, x4}         3.6946   → J*(Z3) = J(x1, x3, x4)
         {x2, x3, x4}         0.6090

n = 4    Z4                   J(Z4)
         {x1, x2, x3, x4}


The optimal number of forecasters in this example is $n^* = 3$, and the optimal value of information obtained is $J^*(x_1, x_3, x_4) = 3.6946$. The backward induction method (i.e. equation (25)) leads to the following recursive inclusion of the solutions:

$$Z_1^* = \{x_4\}$$

$$Z_2^* = \{x_1\} \cup Z_1^* = \{x_1, x_4\}$$

$$Z_3^* = \{x_3\} \cup Z_2^* = \{x_1, x_3, x_4\}$$

Thus, as the forecasters are more correlated with each other, the optimal solution is to choose more of them to maximize net precision. This result may seem odd on the surface. The reason could be that different variances lead to negative weights for some forecasts and very high posterior precision: the larger the correlation, the more likely this behaviour is to persist (see Bunn, 1985, for instance). This behaviour is highlighted in Table II.

Example 4
In this example we demonstrate the solutions for the problem of identical variances among forecasters. Now suppose the variances of the forecasters are $\sigma_i^2 = 16$ for all $i$. Let the correlation coefficients between forecasters be as in Examples 1 and 2. Note that, unlike in those examples, all forecasters have a very high variance. Using the same dynamic programming procedure yields:

n = 1    Z1                   J(Z1)
         {x1}                 0.2
         {x2}                 0.1
         {x3}                 0.0
         {x4}                 0.21     → J*(Z1) = J(x4)

n = 2    Z2                   J(Z2)
         {x1, x2}             0.0778
         {x1, x3}             -0.0727
         {x1, x4}             0.1023   → J*(Z2) = J(x1, x4)
         {x2, x3}             -0.0233
         {x2, x4}             -0.1727
         {x3, x4}             -0.1233

n = 3    Z3                   J(Z3)
         {x1, x2, x3}         -0.2185
         {x1, x2, x4}         -0.04456 → J*(Z3) = J(x1, x2, x4)
         {x1, x3, x4}         -0.1782
         {x2, x3, x4}         -0.2762

n = 4    Z4                   J(Z4)
         {x1, x2, x3, x4}     -0.3138

The optimal number of forecasters in this example is $n^* = 1$ and $Z_1^* = \{x_4\}$. The solutions are obtained recursively at each stage so as to satisfy the special relationship discussed in the previous section, i.e. we have

$$Z_1^* = \{x_4\}$$

$$Z_2^* = \{x_1\} \cup Z_1^* = \{x_1, x_4\}$$

$$Z_3^* = \{x_2\} \cup Z_2^* = \{x_1, x_2, x_4\}$$

Table II. Optimal number of forecasters for ranges of the common correlation coefficient: $< 0.17$, $0.17$-$0.69$ and $> 0.69$.

Note that if Examples 3 and 4 are solved by the method discussed in the fifth section, the selection procedure stops earlier than under the backward induction method. In that case Example 4 would stop after stage 1, because $J(x_j \mid Z_1^*) < 0$ for all $x_j \in S_2$, and Example 3 would stop after stage 3.

As the variances are the same, and all uniformly high, and the correlation among the forecasters is relatively low, only the forecaster with the lowest cost, $x_4$, is chosen. If all variances are increased further while keeping the correlations the same, only $x_4$ is still chosen. (This and other results are not shown for the sake of brevity.) Conversely, as the variances are reduced, first $x_2$, then $x_1$, and lastly $x_3$ are included in the set of optimal forecasters.

CONCLUSIONS

In this paper we have provided a dynamic programming procedure for selecting the optimal subset of a set of available forecasters. We examined the solution structure for cases where the forecasters are independent, correlated and biased. We have shown that, in the case of the illustrative example, as the correlation among the forecasters decreases, while individual variances remain unchanged, we would choose more of them in order to increase overall precision. Similarly, as the variance of the individual forecasts increases, while the correlation remains unchanged, the number of forecasts chosen would increase. The more correlated the forecasters are with each other, the less the information produced.

One of the limitations of the paper is the assumption that the likelihood function of the forecasts is normal. The assumption is fairly strong, and is necessary to derive the separable structure of the objective function (Theorem 4) and the associated dynamic programming procedure. For general likelihood distributions we would obtain a much more complex scheme for updating the objective function, although our intuition tells us that we could obtain a dynamic programming procedure even in this case.

Another drawback in this paper is the assumption that the parameters of the multivariate normal process and the biases are known. In an earlier paper we dealt with biased forecasters, and set up an adaptive mechanism to learn about the biases (Anandalingam and Chen, 1989). The selection problem under unknown parameters becomes considerably more complicated if the decision maker also has to take into account the informational value of observations from one time period to the next. This problem can also be handled using an adaptive control technique (e.g. Zellner, 1971, Chapter 11).


APPENDIX


Proof of Theorem 3
From Theorem 1 the posterior pdf is given by

$$\phi_n(\theta) \propto f(x_n \mid x_1, \ldots, x_{n-1}, \theta)\,\phi_{n-1}(\theta)$$

Also, from Theorem 2,

$$f(x_n \mid x_1, \ldots, x_{n-1}, \theta) = \sqrt{\frac{\tilde{r}_n}{2\pi}}\, \exp\!\left[-\frac{\tilde{r}_n}{2}\,(x_n - (a_n\theta + b_n))^2\right]$$

and

$$\phi_{n-1}(\theta) \propto \exp\!\left[-\frac{r_{n-1}^*}{2}\,(\theta - m_{n-1}^*)^2\right]$$

then

$$\phi_n(\theta) \propto \exp\!\left[-\frac{\tilde{r}_n}{2}(x_n - a_n\theta - b_n)^2 - \frac{r_{n-1}^*}{2}(\theta - m_{n-1}^*)^2\right]$$

We can simplify the exponent as follows: collecting terms in $\theta$ and completing the square,

$$-\frac{\tilde{r}_n}{2}(x_n - a_n\theta - b_n)^2 - \frac{r_{n-1}^*}{2}(\theta - m_{n-1}^*)^2 = -\frac{r_{n-1}^* + a_n^2\tilde{r}_n}{2}\left(\theta - \frac{r_{n-1}^* m_{n-1}^* + a_n\tilde{r}_n(x_n - b_n)}{r_{n-1}^* + a_n^2\tilde{r}_n}\right)^2 + \text{terms not involving } \theta$$

Then

$$\phi_n(\theta) \propto \exp\!\left[-\frac{r_n^*}{2}(\theta - m_n^*)^2\right]$$

i.e. $\phi_n(\theta)$ is normal with mean $m_n^*$ and precision $r_n^*$, with $m_n^*$ and $r_n^*$ as given in equations (17) and (18). QED

ACKNOWLEDGEMENTS

This forms part of the PhD dissertation of Lian Chen. We thank Don Brown, Julia Pet-Edwards, Bill Scherer, Chip White and an anonymous referee for comments and criticisms. All remaining errors are our responsibility.

REFERENCES

Anandalingam, G. and Chen, L., 'Linear combination of forecasts: a general Bayesian model', Journal of Forecasting, 8 (1989), 199-214.

Anderson, T. W., An Introduction to Multivariate Statistical Analysis, New York: John Wiley (1958).

Bradley, S., Hax, A. and Magnanti, T., Applied Mathematical Programming, Reading, MA: Addison-Wesley (1977).

Bunn, D. W., 'Statistical efficiency in linear combination forecasts', International Journal of Forecasting, 1 (1985), 153-63.

Clemen, R. T. and Winkler, R. L., 'Limits for the precision and value of information from dependent sources', Operations Research, 33 (1985), 427-42.

Genest, C. and Zidek, J. V., 'Combining probability distributions: a critique and an annotated bibliography', Statistical Science, 1 (1986), 114-48.

Lindley, D. V., 'Reconciliation of probability distributions', Operations Research, 31 (1983), 866-80.

Mahmoud, E., 'Accuracy in forecasting: a survey', Journal of Forecasting, 3 (1984), 139-59.

Winkler, R. L. and Makridakis, S., 'The combination of forecasts', Journal of the Royal Statistical Society, Series A, 146 (1983), 150-57.

Zellner, A., An Introduction to Bayesian Inference in Econometrics, New York: John Wiley (1971).

Authors' biographies:
Lian Chen has a BA (1977) from Huazhong University of Science and Technology in the People's Republic of China, and an MS (1984) in Industrial Engineering and Operations Research from Syracuse University. She is a PhD candidate in Systems Engineering at the University of Virginia.

G. Anandalingam has a BA (1975) from Cambridge University and a PhD (1981) from Harvard University. He is an Assistant Professor of Systems Engineering at the University of Pennsylvania. Prior to that he was an Assistant Professor at the University of Virginia and an Engineer-Economist at Brookhaven National Laboratory. His papers have appeared in Management Science, the European Journal of Operational Research and the Journal of the Operational Research Society, among others.

Authors’ addresses: Lian Chen, Department of Systems Engineering, University of Virginia.

G. Anandalingam, Department of Systems, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.