13
Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/ doi:10.5194/hess-15-3701-2011 © Author(s) 2011. CC Attribution 3.0 License. Hydrology and Earth System Sciences DREAM (D) : an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter estimation problems J. A. Vrugt 1,2 and C. J. F. TerBraak 3 1 Department of Civil and Environmental Engineering, University of California, Irvine, 4130 Engineering Gateway, Irvine, CA 92697-2175, USA 2 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands 3 Biometris, Wageningen University and Research Centre, P.O. Box 100, 6700 AC, Wageningen, The Netherlands Received: 19 April 2011 – Published in Hydrol. Earth Syst. Sci. Discuss.: 26 April 2011 Revised: 27 November 2011 – Accepted: 3 December 2011 – Published: 13 December 2011 Abstract. Formal and informal Bayesian approaches have found widespread implementation and use in environmental modeling to summarize parameter and predictive uncertainty. Successful implementation of these methods relies heavily on the availability of efficient sampling methods that ap- proximate, as closely and consistently as possible the (evolv- ing) posterior target distribution. Much of this work has fo- cused on continuous variables that can take on any value within their prior defined ranges. Here, we introduce the- ory and concepts of a discrete sampling method that resolves the parameter space at fixed points. This new code, enti- tled DREAM (D) uses the recently developed DREAM al- gorithm (Vrugt et al., 2008, 2009a,b) as its main building block but implements two novel proposal distributions to help solve discrete and combinatorial optimization problems. This novel MCMC sampler maintains detailed balance and ergodicity, and is especially designed to resolve the emerg- ing class of optimal experimental design problems. Three different case studies involving a Sudoku puzzle, soil water retention curve, and rainfall – runoff model calibration prob- lem are used to benchmark the performance of DREAM (D) . The theory and concepts developed herein can be easily inte- grated into other (adaptive) MCMC algorithms. Correspondence to: J. A. Vrugt ([email protected]) 1 Introduction Formal and informal Bayesian methods have found widespread application and use to summarize parameter and model predictive uncertainty in hydrologic modeling. These parameters generally represent model dynamics, but could also include rainfall multipliers (Kavetski et al., 2006; Kucz- era et al., 2006; Vrugt et al., 2008), error model variables (Smith et al., 2008; Schoups and Vrugt, 2010), and calibra- tion data measurement errors (Sorooshian and Dracup, 1980; Schaefli et al., 2007; Vrugt et al., 2008). Monte Carlo meth- ods are admirably suited to generate samples from the pos- terior parameter distribution, but generally inefficient when confronted with complex, multimodal, and high-dimensional model-data synthesis problems. This has stimulated the de- velopment of Markov Chain Monte Carlo (MCMC) meth- ods that generate a random walk through the search (pa- rameter) space and iteratively visit solutions with stable fre- quencies stemming from an invariant probability distribu- tion. If well designed, such MCMC methods should be more efficient than brute force Monte Carlo or importance sampling methods. To visit configurations with a stable frequency, an MCMC algorithm generates trial moves from the current position of the Markov chain x t to a new state z. The earliest MCMC approach is the random walk Metropolis (RWM) algorithm (Metropolis et al., 1953). Assume that we have already sam- pled points {x 0 ,..., x t } this algorithms proceeds in the fol- lowing three steps. First, a candidate point z is sampled from a proposal distribution q(·) that depends on the present Published by Copernicus Publications on behalf of the European Geosciences Union.

DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011www.hydrol-earth-syst-sci.net/15/3701/2011/doi:10.5194/hess-15-3701-2011© Author(s) 2011. CC Attribution 3.0 License.

Hydrology andEarth System

Sciences

DREAM (D): an adaptive Markov Chain Monte Carlo simulationalgorithm to solve discrete, noncontinuous, and combinatorialposterior parameter estimation problems

J. A. Vrugt 1,2 and C. J. F. Ter Braak3

1Department of Civil and Environmental Engineering, University of California, Irvine, 4130 Engineering Gateway,Irvine, CA 92697-2175, USA2Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands3Biometris, Wageningen University and Research Centre, P.O. Box 100, 6700 AC, Wageningen, The Netherlands

Received: 19 April 2011 – Published in Hydrol. Earth Syst. Sci. Discuss.: 26 April 2011Revised: 27 November 2011 – Accepted: 3 December 2011 – Published: 13 December 2011

Abstract. Formal and informal Bayesian approaches havefound widespread implementation and use in environmentalmodeling to summarize parameter and predictive uncertainty.Successful implementation of these methods relies heavilyon the availability of efficient sampling methods that ap-proximate, as closely and consistently as possible the (evolv-ing) posterior target distribution. Much of this work has fo-cused on continuous variables that can take on any valuewithin their prior defined ranges. Here, we introduce the-ory and concepts of a discrete sampling method that resolvesthe parameter space at fixed points. This new code, enti-tled DREAM(D) uses the recently developed DREAM al-gorithm (Vrugt et al., 2008, 2009a,b) as its main buildingblock but implements two novel proposal distributions tohelp solve discrete and combinatorial optimization problems.This novel MCMC sampler maintains detailed balance andergodicity, and is especially designed to resolve the emerg-ing class of optimal experimental design problems. Threedifferent case studies involving a Sudoku puzzle, soil waterretention curve, and rainfall – runoff model calibration prob-lem are used to benchmark the performance of DREAM(D).The theory and concepts developed herein can be easily inte-grated into other (adaptive) MCMC algorithms.

Correspondence to:J. A. Vrugt([email protected])

1 Introduction

Formal and informal Bayesian methods have foundwidespread application and use to summarize parameter andmodel predictive uncertainty in hydrologic modeling. Theseparameters generally represent model dynamics, but couldalso include rainfall multipliers (Kavetski et al., 2006; Kucz-era et al., 2006; Vrugt et al., 2008), error model variables(Smith et al., 2008; Schoups and Vrugt, 2010), and calibra-tion data measurement errors (Sorooshian and Dracup, 1980;Schaefli et al., 2007; Vrugt et al., 2008). Monte Carlo meth-ods are admirably suited to generate samples from the pos-terior parameter distribution, but generally inefficient whenconfronted with complex, multimodal, and high-dimensionalmodel-data synthesis problems. This has stimulated the de-velopment of Markov Chain Monte Carlo (MCMC) meth-ods that generate a random walk through the search (pa-rameter) space and iteratively visit solutions with stable fre-quencies stemming from an invariant probability distribu-tion. If well designed, such MCMC methods should bemore efficient than brute force Monte Carlo or importancesampling methods.

To visit configurations with a stable frequency, an MCMCalgorithm generates trial moves from the current position ofthe Markov chainxt to a new statez. The earliest MCMCapproach is the random walk Metropolis (RWM) algorithm(Metropolis et al., 1953). Assume that we have already sam-pled points{x0,...,xt } this algorithms proceeds in the fol-lowing three steps. First, a candidate pointz is sampledfrom a proposal distributionq(·) that depends on the present

Published by Copernicus Publications on behalf of the European Geosciences Union.

Page 2: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3702 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

locationxt . Next, the candidate point is accepted with accep-tance probability,α(z,xt ) (Metropolis et al., 1953; Hastings,1970):

α(z,xt )=1∧p(z)p(xt )

q(z→ xt )

q(xt→ z), (1)

wherep(·) represents the posterior density, andq(xt→ z)(q(z→ xt )) denotes the conditional probability of the for-ward (backward) jump. This last ratio cancels out if a sym-metric proposal distribution is used. Finally, if the proposalis accepted the chain moves toz, otherwise the chain re-mains at its current locationxt . Following a so called burn-inperiod (of say,l steps), the chain approaches its stationarydistribution and the vector{xl+1,...,xl+m} contains samplesfrom π(·). The desired summary of the posterior distribu-tion, π(x) is then obtained from this sample ofm points. InBayesian applications,π(·) is the distribution of partially un-known parameters given the data at hand, and is obtained bycombining the prior distribution and the data likelihood. Thedependence ofπ(·) on any fixed data is assumed throughout.

The standard RWM algorithm has been designed to main-tain detailed balance with respect toπ(·) at each individualstep in the chain:

π(xt )p(xt→ z)=π(z)p(z→ xt ) (2)

whereπ(xt ) (π(z)) denotes the probability of finding the sys-tem in statext (z), andp(xt→ z) (p(z→ xt )) denotes theconditional probability to perform a trial move fromxt to z(z to xt ). The detailed balance condition essentially ensuresthat the samples of{xl+1,...,xl+m} are exactly distributedaccording to the target distribution,π(x).

Existing theory and experiments prove convergence ofwell-constructed MCMC schemes to the appropriate limit-ing distribution under a variety of different conditions. Inpractice, this convergence is often observed to be impracti-cally slow. This deficiency is frequently caused by an inap-propriate selection ofq(·) used to generate trial moves in theMarkov Chain. This inspiredVrugt et al.(2008, 2009a,b) todevelop a simple adaptive RWM algorithm called Differen-tial Evolution Adaptive Metropolis (DREAM) that runs mul-tiple chains simultaneously for global exploration, and au-tomatically tunes the scale and orientation of the proposaldistribution during the evolution to the posterior distribution.This scheme is an adaptation of the Shuffled Complex Evolu-tion Metropolis (Vrugt et al., 2003) global optimization algo-rithm and has the advantage of maintaining detailed balanceand ergodicity while showing excellent efficiency on com-plex, highly nonlinear, and multimodal target distributions(Vrugt et al., 2008, 2009a,b).

In DREAM, N different Markov Chains are run simulta-neously in parallel. If the state of a single chain is given by asingled-dimensional vectorx, then at each generation theNchains in DREAM define a populationX, which correspondsto anN x d matrix, with each chain as a row. Jumps in eachchaini={1,...,N} are generated by adding a multiple of the

difference of the states of randomly chosen pairs of chains ofX to the current statexi :

zi= xi+(1d+ed)γ (δ,d ′)

(δ∑

j=1

xr1(j)−

δ∑n=1

xr2(n)

)+εd (3)

whereδ signifies the number of pairs used to generate theproposal,xr1(j) andxr2(n) are randomly selected without re-placement from the populationX−i

t (the population withoutxit ); r1(j),r2(n) ∈ {1,...,N} and r1(j) 6= r2(n). The val-

ues ofed and εd are drawn independently fromUd(−b,b)

andNd(0,b∗) with, typically,b=0.1 andb∗ small comparedto the width of the target distribution,b∗ = 10−12 say. Notall dimensions ofxi need to be updated in each step, so somedimensions ofzi may be reset to those ofxi . The value ofthe jump-size,γ , depends onδ andd ′, the number of dimen-sions updated jointly in the step. By comparison with RWM,a good choice forγ = 2.4/

√2δd ′ (Roberts and Rosenthal,

2001; Ter Braak, 2006). This choice is expected, for Gaus-sian and Student target distributions, to yield an acceptanceprobability of 0.44 ford’ = 1, 0.28 ford’ = 5 and 0.23 forlarged’. Every 5th generationγ = 1.0 to facilitate jumpingbetween disconnected posterior modes (Vrugt et al., 2008).

The difference vector in Eq. (3) contains the desired in-formation about the scale and orientation of the target dis-tribution, π(x). By accepting each jump with the Metropo-lis ratio min

{π(zi)/π(xi),1

}, a Markov chain is obtained,

the stationary or limiting distribution of which is the poste-rior distribution. The proof of this is given inTer Braak andVrugt (2008) andVrugt et al.(2008, 2009a,b). Because thejoint pdf of theN chains factorizes toπ(x1)× ...×π(xN ),the statesx1...xN of the individual chains are independentat any generation after DREAM has become independentof its initial value. After this burn-in period, the conver-gence of DREAM can thus be monitored with theR-statisticof Gelman and Rubin(1992). This convergence diagnos-tic compares the within and in-between variances of theN

different chains.Although much progress has been made in the past decade

towards the application of Bayesian methods to quantify andanalyze parameter, model structural, forcing, and calibrationdata uncertainty, virtually all this research involved continu-ous parameters that can take on any value within their priordefined ranges. Much less attention has been given hith-erto to posterior sampling problems involving discrete vari-ables. Such non-continuous parameter estimation problemsare not only of considerable theoretical interest, but also ofpractical significance. For instance, contributions are begin-ning to appear in the hydrologic literature that attempt todiscern optimal experimental design strategies that minimizecost, parameter and model predictive uncertainty, or combi-nations thereof. This involves selecting one or multiple dif-ferent measurement locations in time and/or space amongsta discrete set of possibilities. Various algorithms have beenproposed in the applied mathematics and computer science

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 3: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3703

literature that solve problems of this kind, yet their main fo-cus is on finding the optimal solution, without recourse to es-timating the underlying posterior uncertainty. This is of par-ticular importance in experimental design problems, whereone cannot convincingly claim that one set of measurementsis significantly better than another plausible combination ofobservations.

In this paper we present a discrete implementation ofDREAM, that is especially designed to efficiently retrieve theposterior distribution of noncontinuous and combinatorialsearch and optimization problems. This new code, hereafterreferred to as DREAM(D) uses DREAM as its main build-ing block, and implements three novel proposal distributionsto explicitly recognize differences in topology between dis-crete and Euclidean search spaces. This sampling methodmaintains detailed balance and ergodicity, and provides ex-plicit information about the posterior uncertainty of the op-timal solution. Within the context of optimal experimentaldesign problems this would provide very valuable informa-tion about which measurements are absolutely required, andwhich data are of less importance. The width of the marginalposterior distribution essentially conveys this information.

The remainder of this paper is organized as follows. Sec-tion 2 presents a short introduction to discrete optimizationproblems, followed by a detailed description of DREAM(D)

in Sect. 3. Section 4 demonstrates the performance ofDREAM(D) using three different case studies involving aSudoku puzzle, water retention curve and rainfall-runoffmodel calibration problem. These results illustrate the abil-ity of DREAM(D) to solve search and optimization prob-lems involving discrete, combinatorial, and experimental de-sign problems. Finally, in Sect. 5 we summarize the theory,concepts and applications presented herein.

2 Nonlinear optimization involving discrete variables

Discrete optimization problems are abundant in many fieldsof study, and have also started to begin to appear in the hy-drologic literature (Furman et al., 2004; Harmancioglu et al.,2004; Perrin et al., 2008; Kleidorfer et al., 2008; Neuman etal., 2011). Figure 1 illustrates a simple discrete parameter es-timation problem involving a tile puzzle. Each tile containsa different letter of the alphabet. The goal is to list the lettersin the appropriate order. The solution to this problem is obvi-ous to a human, but not immediately clear to a computer. Asearch algorithm is therefore required to solve this problem.

If we assign numbers to each letter, A = 1, B = 2 and soforth, we could measure the distance from our initial guessto the actual solution, and iteratively refine this solution bysampling from a (discrete) proposal distribution. Many al-gorithms have been developed in the past decades to effi-ciently resolve problems of this kind. Yet, the main trustof these algorithms is on finding a single optimum solu-tion, without recourse to estimating the underlying posterior

uncertainty. For example, within the context of the travel-ing salesman problem many different routes will exist thatonly deviate marginally from the optimal solution. Bruteforce sampling of the search space can be used to assess allthese plausible routes, but this seems rather inefficient. In-stead, a more intelligent search procedure is warranted thatmore efficiently samples the space of possible solutions. Thegoal of this paper therefore is to introduce theory and con-cepts of DREAM(D), a MCMC simulation algorithm thatis especially designed to efficiently retrieve the posteriordistribution of discrete and combinatorial search problems.

Figure 1a illustrates the application of DREAM(D) to thetile puzzle considered herein. The left panel depicts the ini-tial guess, whereas the remaining three panels (Fig. 1b–d) il-lustrate how DREAM(D) translates this random starting pointinto the final position of the letters. The next sections de-scribe the theory and concepts of DREAM(D) and presentthree different case studies.

3 DREAM (D)⇒ DiffeRential Evolution AdaptiveMetropolis with Discrete Sampling

We now describe our new code, entitled DREAM(D), whichuses DREAM as main building block. The algorithmic pa-rameters (with defaults in parentheses) areN (d) the numberof chains,δ (1) the number of pairs used to generate the pro-posal,CR the crossover probability (see later),K (10) thethinning rate in storing samples andTmax (106) the maximumnumber of generations, typically so large that DREAM(D)

automatically stops after convergence has been achieved.Let X be aN ×d-matrix with rowsxi , i = 1,...,N , that

contain the current states of theN different Markov chainsand that are iteratively updated in turn in the algorithm. Theinitial X = [Xt ;t = 0] is obtained by drawingN samplesfrom pd(x), the prior distribution. Also, letS be an exter-nal archive that stores the elements ofX at regular intervals.The initial population[Xt ;t = 0] is translated into a sam-ple from the posterior target distribution using the followingpseudo code:

1. Sett←1REPEATK times (POPULATION EVOLUTION)

FORi←1,...,N DO (CHAIN EVOLUTION)

(a) Draw the number of dimensions to be updated,d ′, fromthe binomial distribution with totald and probabilityCR.If d ′=0, setd ′←1.

(b) Generate a new proposal,zi , for chaini,

zi= xi+

∥∥∥∥∥∥(1d+ed )γ (δ,d ′)

δ∑j=1

Xr1(j)−

δ∑n=1

Xr2(n)

+εd

∥∥∥∥∥∥d

(4)

where the function‖·‖d rounds each elementj =1,...,d

of the jump vector to the nearest integer. The varioussymbols have been defined after Eq. (3).

(c) Sample without replacementd − d ′ numbers from1,...,d. Denote each selected number generically byj

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011

Page 4: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3704 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

E

A

C

FK

M

I

G

B

O

L

J

P

D

H N

J

O

M

NK

A

I

F

B

G

L

P

C

H

E D

J

O

K

PM

E

I

F

B

G

L

N

C

H

A D

(A) (B) (C) (D)

J

N

K

OM

F

I

E

B

G

L

P

C

H

A D

Fig. 1. A 4×4 square tile puzzle with 16 different letters. The goal is to list the letters in order of the alphabet. Each letter is assigned adifferent integer value, and the resulting inverse problem is solved using DREAM(D). The left panel(a) presents the initial state, whereasthe remaining three graphs(b–d) demonstrate how DREAM(D) iteratively solves this integer sampling problem.

and replace the corresponding proposal elementzij

byxij.

The proposalzi thus updatesd ′ elements ofxi only.

(d) Computep(zi) and accept the candidate points withMetropolis acceptance probability,α(xi ,zi),

α(zi ,xi)=1∧p(zi)

p(xi)(5)

(e) If accepted, move the chain to the candidate point,xi=

zi , otherwise remain at the old location,xi .

END FOR (CHAIN EVOLUTION)END REPEAT (POPULATION EVOLUTION)

2. AppendX to S.

3. Compute the Gelman-Rubin convergence diagnostic,Rj (Gel-man and Rubin, 1992) for each dimensionj = 1,...,d usingthe last 50 % of the samples ofS.

4. If Rj ≤ 1.2 for j = 1,...,d or t > Tmax, stop and go to step5, otherwise sett← t+K and go to POPULATION EVOLU-TION.

5. Summarize the posterior pdf usingS after discarding the ini-tial and burn-in samples.

In words, DREAM(D) runs multiple different chains inparallel, and generates jumps in each individual chain usinginformation from the location of the otherN−1 chains. Therounding operator,‖ · ‖d in Eq. (4) is used to enforce inte-ger values and accommodate discrete search problems. Tospeed up convergence to the target distribution, DREAM(D)

estimates a distribution ofCR values during burn-in that fa-vors large jumps over smaller ones in each of theN chains.Details can be found in previous studies (Vrugt et al., 2008,2009a,b, 2011b).

The transition kernel of DREAM(D) is especially designedfor integer variables. This appears to be a major limita-tion as many discrete search problems involve non-integervariables. For instance, consider the discrete variablex1 ∈

[0,0.2,0.4,...,5]. We cannot sample this parameter with theproposal distribution presented in Eq. (4). A simple lineartransformation of the search space,x1= 0.2j ;j ∈ {0,...,25}

suffices to facilitate inference with DREAM(D). If appro-priate, a different transformation can be used for each in-dividual parameter. An extreme case would simultaneouslyinvolve discrete and continuous variables. This would con-stitute some combination of jumps generated with DREAMand DREAM(D).

3.1 DREAM(D)⇒ detailed balance?

We are now left with a proof that DREAM(D) yields an in-variant distribution that is identical to the posterior target ofinterest. For this we need to demonstrate that the transi-tion kernel of DREAM maintains detailed balance, and thusresults in a reversible Markov chain. In other words, theforward p(xi

→ zi) and backwardp(zi→ xi) jump should

have equal probability at every single step in the chain. Thisis easy to proof for a standard RWM algorithm that uses afixed proposal distribution. Hence, the forward and back-ward jump will exhibit equal probability. Yet, the jump dis-tribution used in DREAM(D) continuously changes scale andorientation en route to the posterior target distribution. Thisadaptation significantly enhances search efficiency, but doesthis also ensure reversibility of the sampled Markov chains?

Previous manuscripts have provided formal proofs of con-vergence of DREAM (Vrugt et al., 2008), DREAM(ZS)

(Vrugt et al., 2011b), and MT-DREAM(ZS) (Laloy andVrugt, 2011) to the appropriate limiting distribution. Theseproofs appear rather simple, but might not be easy to under-stand for those that are not directly familiar with the underly-ing statistics and mathematics. We therefore resort to a sim-ple hypothetical example. Lets consider a two-dimensionaldiscrete sampling problem withN = 5 different chains, andthusγ = 2.4/

√2δd ′ = 2.4/

√2×1×2= 1.2. Without loss

of generality, lets further assume thatδ= 1, andej = εj = 0;j =1,...,2. We randomly sample each individual chain fromthe prior parameter distribution, and color code their ini-tial position in Fig. 2. We now use the transition kernel ofDREAM(ZS) to generate candidate points. Lets assume thatwe select the blue and green chain asr1 andr2 respectively.

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 5: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3705

0 1 2 3 4 5

0

1

2

3

4

5

6

6 7

7

r1

r2

x2

x1

N = 5 different chains

Fig. 2. Illustration of detailed balance: two-dimensional parameterestimation problem withx1 ∈ [0,7] andx2 ∈ [0,7]. The color dotsdenote the starting points of theN = 5 different chains. The redsquare signifies the candidate point of the first chain and is createdby selecting the blue chain asr1 and green chain asr2. One steplater, after moving to the red square, the backward jump has equalprobability, as there is an equal chance of drawingr1 (green) andr2(blue) in reversed order.

The candidate point of the first (purple) chain thus becomes:

zi=[2 2]+

∥∥∥1.2([

4 4]−[5 1])∥∥∥= [2 2

]+

∥∥∥[−1.2 3.6]∥∥∥= [1 6

](6)

This proposal point is indicated in Fig. 2 with the red square.For the sake of this illustration, lets now assume that we ac-cept this candidate point, and transition the first chain to thisnew state.

We now need to demonstrate that the reverse jump hasequal probability. This backward jump can be obtained byselecting the blue and green chain in the opposite order fromthe forward jump:

zi=[1 6]+

∥∥∥1.2([

5 1]−[4 4])∥∥∥= [1 6

]+

∥∥∥[1.2−3.6]∥∥∥= [2 2

](7)

This point is exactly similar to the initial state of chain oneprior to the forward jump. Apparently, the rounding operatorused to sample integer values does not destroy detailed bal-ance. By samplingr1 andr2 from the remainingN−1 chainswith a uniform random number generator enforces symmetryof the proposal distribution. Indeed, the chance to selectr1as chain two, andr2 as chain four is equal to drawingr2=4,andr2=2 in the reverse order. This also holds whened , andεd are drawn from their respective symmetric probability dis-tributions, and whenδ > 1 andd > 2. We leave this up to thereader. This concludes our proof of detailed balance.

A final remark is appropriate. Theoretically, it is possiblethat at least one of thed arguments of the rounding func-tion, ‖·‖d has a fractional part of.5. In this case, conventiondictates to round down to the nearest integer. This directedrounding introduces a possible bias in jumping direction andthus strictly speaking violates detailed balance. The chancethat this happens in practice is virtually zero. If nothing else,the stochastic nature ofed ∼Ud(−b,b), andεd ∼Nd(0,b∗)

will eliminate this possibility. But, in the rare event that anyargument of‖ · ‖d has a factional part of.5 we implementstochastic rounding to the nearest integer. This guarantees re-versibility of the Markov chains generated with DREAM(D).

3.2 Combinatorial search problems: proposaldistribution using position swapping

The parallel direction update of Eq. (4) works well for arange of discrete problems but is not necessary optimalfor combinatorial problems in which the values of thefinal solution are known a-priori but not their order in theparameter vector. The topology of such search problemsdiffers substantially from the Euclidean search problemsconsidered hitherto. We therefore introduce an alternativejump distribution that creates candidate points by randomlyswapping two coordinates in each individual chain.

PROPOSAL DISTRIBUTION: RANDOM SWAPFORi←1,...,N DO (CHAIN EVOLUTION)

1. Setzi= xi .

2. Randomly select two numbersj andk without replacementfrom {1,...,d}.

3. Swap thej th andkth coordinates ofzi ; yij= xi

kandyi

k= xi

j.

4. Computeπ(zi) and accept the candidate points with Metropo-lis acceptance probability,α(xi ,zi).

5. If accepted, move the chain to the candidate point,xi= zi ,

otherwise remain at the old location,xi .

END FOR (CHAIN EVOLUTION)END RANDOM SWAP

In words, if the current state of theith chain is givenby xi

=[xi

1,...,xij ,...,x

ik,...,x

id

]then the candidate point

becomes zi=[xi

1,...,xik,...,x

ij ,...,x

id

]where j,k are

discrete uniformly sampled from{1,...,d} without re-placement. It is straightforward to see that this proposaldistribution satisfies detailed balance as the forward andbackward jump have equal probability.

A swap update is admirably suited to solve the tileproblem considered previously in Fig. 1. Yet, the coordinateswaps are fully random and do not exploit any informationabout the topology of the solution encapsulated in theposition of the otherN −1 chains. Essentially, each chainevolves independently to the posterior target distribution.This appears rather inefficient, particularly for complicatedsearch problems. We therefore introduce a second andalternative swap rule that takes explicit information from

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011

Page 6: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3706 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

the dissimilarities in coordinates of theN chains running inparallel. We implement this idea as follows:

PROPOSAL DISTRIBUTION: DIRECTED SWAPDO (CHAIN EVOLUTION)

1. Randomly select two chains,Xr1 andXr2 without replacementfrom the populationX−r1,r2

t .

2. Find those coordinates ofxr1 andxr2 that are dissimilar andstore these locations inζ .

3. Randomly permuteζ ; setj = ζ1 andk= ζ2.

4. Copyzr1 = xr1 andzr2 = xr2.

5. Swap the elements ofzr1; zr1j= x

r1k

andzr1k= x

r1j

.

6. Swap the elements ofzr2; zr2j= x

r2k

andzr2k= x

r2j

.

7. Computeπ(zr1) andπ(zr2) .

8. Accept both candidate points with probability,p(zr1,zr2)

equal to the product of their respective Metropolis ratios,p(zr1,zr2)=α(xr1,zr1)α(xr2,zr2).

9. If accepted, move both chains to their respective candidatepoints,xr1 = zr1 andxr2 = zr2; otherwise remain at their oldlocation,xr1 andxr2.

END FOR (CHAIN EVOLUTION)END DIRECTED SWAP

This proposal distribution swaps locationj and k withintwo different chains,r1,r2 ∈ {1,...,N} from the differencesin their coordinates. This update needs to be carefullyimplemented so that all different chains are updated onlyonce. This requires that the number of chains,N > 1 andhence is easiest to implement ifN is an even number. Thisapproach considerably improves sampling efficiency, as willbe illustrated later with a Sudoku puzzle.

The swap move is fully Markovian, that is, it uses onlyinformation from the current time for proposal generation,and retains detailed balanced with respect toπ(·) becausethe reverse move is equally probable. If the swap is not fea-sible (less than two dissimilar coordinates), the current chainis simply sampled again. This is necessary to avoid com-plications with unequal probabilities of move types (Deni-son et al., 2002); the same trick is applied in reversible jumpMCMC (Green, 1995). The restriction of the update to thedissimilar coordinates does not destroy detailed balance inany way; it just selects a subspace to sample on the basis ofthe current state. Coordinate swapping is especially power-ful for combinatorial problems. Based on some preliminarystudies, we use a 90/10 % mix of directed and random swaps,respectively.

4 Case studies

We now present three different case studies with increasingcomplexity. The first study consists of a typical integer esti-mation problem, and involves a Sudoku puzzle. This puzzle

has become quite popular in the past 10 yr, and many news-papers, journals, and magazines around the world publishSudokus for entertainment. This synthetic study illustratesthe ability of DREAM(D) to help solve a relatively diffi-cult and high-dimensional combinatorial optimization prob-lem. The second study revisits a classical site characteriza-tion problem in vadose zone hydrology and involves infer-ence of the hydraulic properties of unsaturated soils fromlaboratory or in situ measured water retention data. Thisexample demonstrates how DREAM(D) can be utilized tosolve experimental design problems and guide measurementcollection. The third and final study resolves discrete pa-rameters in a parsimonious lumped watershed model usingobserved daily discharge data from the Guadalupe River inTexas. The posterior distribution derived with DREAM(D) iscompared against its counterpart derived with DREAM as-suming a continuous formulation of the model calibrationproblem.

4.1 The daily Sudoku

The first case study considers a Sudoku puzzle, a popular andwidely practised integer estimation problem. We use this ex-ample to illustrate the ability of DREAM(D) to successfullyfind the optimum of a discrete search problem. The objectiveis to fill a 9×9 grid with numbers so that each column, eachrow, and each of the nine 3×3 sub-grids that make up the to-tal square contains the values of 1 to 9. The same integer mayonly appear once in each column and row of the 9×9 play-ing board including in any of the nine 3×3 subregions. Eachpuzzle starts with a partially completed grid, and typicallyhas a single (unique) solution. The puzzle was popularizedin 1986 by the Japanese puzzle company Nikoli, under thename Sudoku, meaning single number (Hayes, 2006). Nowa-days, Sudoku puzzles are very popular, and widely practisedby many millions of people throughout the world.

We consider the Sudoku puzzle in Fig. 3, taken fromWikipedia (http://en.wikipedia.org/wiki/Sudoku). The initialgrid is depicted at the left-hand side, whereas the final solu-tion is presented at the right-hand side. In this particular puz-zle, the solution at 29 different cells is known, leaving us withd = 81−29= 52 parameter values to be estimated. Theseparameters can take on values between 1 to 9. Figure 4 il-lustrates the sampled values of DREAM(D) at various stagesduring the search. A total ofN = 20 different chains wereused to search the parameter space, and the initial sample wascreated using random permutation of the available numbers.This ensures that each value of 1 to 9 appears only nine timesin the grid. The log-likelihood function measures the sum ofthe horizontal, vertical and subgrid constraint violation.

The first grid, on the left-hand side of Fig. 4 depicts thestarting solution of one of the different chains. This ini-tial state is randomly drawn from the prior distribution, andappears rather poor. For example, the subregion at the bot-tom right hand corner contains three numbers eight, a severe

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 7: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3707

(A) (B)

From: Wikipedia

Fig. 3. The daily sudoku:(A) initial solution, and(B) final solution. The black numbers were given, whereas the solution numbers aremarked in red.

(A) (B)

(D)(C)

5 36

9 8

71 9 5

6316

36

28

847

64 1 9

8 975

2 8

124

1956753

7

6

2

4 32

95

7

1

74

7 574

21

935 3

2 4

6

91

81 4

3

2

25948

886

3

5 36

9 8

71 9 5

6316

36

28

847

64 1 9

8 975

2 8

427

1916235

6

3

2

4 87

95

4

1

91

7 585

34

921 2

3 7

6

45

51 7

8

7

22948

436

3

5 36

9 8

71 9 5

6316

36

28

847

64 1 9

8 975

2 8

247

1916235

6

3

8

4 27

95

4

1

93

1 575

84

923 2

3 7

6

52

21 4

8

7

75948

436

1

5 36

9 8

71 9 5

6316

36

28

847

64 1 9

8 975

2 8

427

1956231

6

3

8

4 27

95

1

4

96

1 575

84

923 2

3 7

6

54

41 2

8

7

72958

436

1

Fig. 4. Sudoku puzzle: evolution of the DREAM(D) sampled parameter space to the posterior distribution:(A) typical starting solution,(B) intermediate solution,(C) nearly final solution, and(D) final solution.

violation of the constraints. Figure 4b displays the state ofthe same chain after about 25 000 function evaluations. Thesolution has improved considerably, but still contains sev-eral noticeable deviations from the actual solution. The thirdpanel in Fig. 4c, derived after about 50 000 Sudoku evalu-ations is a further refinement of the solution presented inFig. 4b. A few important swaps have been introduced thatfurther reduce the constraint violation. Finally, after about100 000 samples, Fig. 4d illustrates that the puzzle has beensuccessfully solved.

The results of this study inspire confidence in the ability ofDREAM(D) to successfully locate the optimum solution of adiscrete search problem. This constitutes a necessary bench-mark to inspire confidence in the search capabilities and con-vergence properties of the algorithm. A branch and boundoptimization approach is about 2 orders of magnitude moreefficient than DREAM(D) in solving the Sudoku puzzle, butthis method violates detailed balance, and hence cannot beused to derive the posterior target distribution. This is themain purpose of DREAM(D) and this ability will be high-lighted in the remaining two case studies of this paper.

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011

Page 8: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3708 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

If our sole interest is in finding the optimum solution ofa discrete search problem, then significant efficiency gainscan be achieved with DREAM(D) by relaxing the assump-tion of detailed balance of the sampled Markov chains. Forinstance, if we modify the directed swap in DREAM(D) sothat each candidate point is accepted/rejected based on itsown Metropolis ratio independently of the associated moveof the corresponding chain, then far fewer function evalua-tions would be needed to solve the Sudoku puzzle. The con-sequence of such non-Markovian adaptation, however is thatthe algorithm no longer adequately samples the underlyingtarget distribution.

4.2 Optimal experimental design for soil hydrauliccharacterization

The second case study considers a common problem in va-dose zone hydrology and involves characterization of the hy-draulic properties of variably saturated soils. This serves todemonstrate the ability of DREAM(D) to guide experimentaldata collection and help determine which soil water retentionmeasurements to collect. We assume the capillary pressure-saturation relationship of (van Genuchten, 1980):

θ(h)= θr+(θs−θr)[1+(αh)n

]−m (8)

whereθ (cm3cm−3) is the volumetric water content,h (cm)denotes the soil water pressure head,θs (θr ) (cm3cm−3) sig-nifies the saturated (residual) water content,α (cm−1) andn

(-) are coefficients that determine the shape of the water re-tention function, andm= 1−1/n. A synthetic data set ofwater retention observations was created for a sandy soil byevaluating Eq. (8) for a given set of pressure head values.The soil water pressure head was discretized intoM = 501equidistant points betweenh=0 (saturation) andh=−1000(cm), and the correspondingθ(h) curve is plotted in Fig. 5with a solid black line. Then, the soil water pressure headobservations are corrupted with a normally distributed er-ror, h←N(h,2) to represent the combined effect of mea-surement error and soil inhomogeneity. The final data set isdepicted with the circles in Fig. 5.

From this data set ofM = 501 water retention measure-ments we now use DREAM(D) to find those four(h,θ(h))

observations that are most informative and best constrain thehydraulic properties,x= [θs,θr ,α,n] of the sandy soil underconsideration. For each selected combination of four differ-ent (h,θ(h)) measurements in DREAM(D) we estimate thecorresponding values ofx by nonlinear minimization usingthe Nelder-Mead Simplex algorithm. The hydraulic param-eters obtained this way are then used to predict the waterretention curve for the entire range of pressure head valuesconsidered in Fig. 5. A standard squared-deviation likelihoodfunction is subsequently used to calculateπ(x) in step (d) ofthe pseudo code of DREAM(D) and summarize the informa-tion content for each selected set of four(h,θ(h)) observa-tions. We use the standard settings of DREAM(D) and run

0 0.05 0.1 0.15 0.2 0.25 0.3 0.350

0.5

1

1.5

2

2.5

3

3.5

Volumetric water content, [cm3 cm-3 ]

Soil w

ater p

ressur

e head

, log(-

h) [cm

]

θ

Fig. 5. Synthetic soil water retention curve of the sandy soil, andthe effect of data error.

N = 10 different Markov chains in parallel to search the dis-crete measurement space. About 10 000 function evaluationswere required to reach convergence to a limiting distribution.

Figure 6 presents the results of our analysis and plotshistograms of the DREAM(D) derived posterior measure-ment samples after sorting each combination in increasingorder. The marginal posterior distributions appear clusteredaround distinctly different soil water pressure head values.This includes soil water pressure head measurements around(Fig. 6a)h= 0, (Fig. 6b)h=−40, (Fig. 6c)h=−220, and(Fig. 6d)h=−1000 cm respectively. Indeed, the most infor-mative (h,θ(h)) observations are found close to saturation,around the air-entry value of the sandy soil, at the inflec-tion point of the water retention function, and in the verydry moisture range. The location of these most informa-tive θ(h) measurements matches very well with our previ-ous findings (Vrugt et al., 2002) and are in agreement withthe dynamic behavior of the marginal sensitivity coefficients,∂θ/∂θs , ∂θ/∂α, ∂θ/∂n, and∂θ/∂θr derived from the par-tial derivatives of Eq. (8). A similar pattern is observed forloamy and clayey soils (not shown herein), but the second,third and last observation depicted in Fig. 4b–d move to-wards lower soil water pressure head values, consistent withthe increased water holding capacity of these more fine tex-tured soils. An accurate soil hydraulic characterization thusrequires water retention measurements that span the entirerange of soil moisture values.

It is interesting to observe the differences in the posteriorspread of the four different histograms. Whereas the firsttwo pressure head observations in Fig. 4b–d are very welldefined with relatively little dispersion, the other two mea-surements are considerably less well defined symptomatic

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 9: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3709

0 200 4000

0.1

0.2

0.3

0.4(A)

(h) observation

Mar

gina

l pos

terio

r den

sity

0 200 4000

0.1

0.2

0.3

0.4(B)

0 200 4000

0.1

0.2

0.3

0.4(C)

0 200 4000

0.1

0.2

0.3

0.4(D)

θ

Fig. 6. Histograms of the posterior samples generated with DREAM(D). The most informative soil water pressure head measurements aresomewhat clustered at different moisture values that range from saturation (left plot) to the dry end (right plot). Thex-axis indexes theobservations[1,...,501].

for a redundancy in information content. This is an in-teresting finding, and explains why the parametersn andθr are often found to be highly correlated in the fitting ofwater retention curves. The posterior spread derived withDREAM(D) is hence a useful byproduct to analyze measure-ment uniqueness, and determine the information content ofexperimental data.

4.3 Watershed model calibration using discreteparameter estimation

The third and final case study involves flood forecasting, andconsists of the calibration of a mildly complex lumped water-shed model using historical data from the Guadalupe River atSpring Branch, Texas. This is the driest of the 12 MOPEXbasins described in the study ofDuan et al.(2006). Themodel structure and hydrologic process representations arefound in Schoups and Vrugt(2010). The model transformsrainfall into runoff at the watershed outlet using explicit pro-cess descriptions of interception, throughfall, evaporation,runoff generation, percolation, and surface and subsurfacerouting. Table 1 summarizes the seven different model pa-rameters, and their prior uncertainty ranges. Each parame-ter is discretized equidistantly in 250 intervals with respec-tive step size listed in the last column at the right hand side.This gridding is necessary to create a non-continuous, dis-crete, parameter estimation problem. Unlike the previouscase study in which integer values are sampled only, this par-ticular study (mostly) involves non-integer values.

Daily discharge, mean areal precipitation, and mean arealpotential evapotranspiration were derived fromDuan et al.(2006) and used for model calibration. Details about thebasin, experimental data, and likelihood function can befound there, and will not be discussed herein. The samemodel and data was used in a previous study (Schoups andVrugt, 2010), and used to introduce a generalized likelihoodfunction for heteroscedastic, non-Gaussian, and correlated(streamflow) prediction errors.

Figure 7 presents histograms of the marginal distribu-tion of a few selected hydrologic model parameters usingfive years of observed daily discharge data. The top panelpresents the results of DREAM(D), whereas the bottom panelpresents the results for a continuous parameter space. Thesehistograms were derived by separately running DREAM forthe same data set and model. Notice the close agreementbetween the histograms derived with both MCMC methods.This is a testament to the ability of DREAM(D) to success-fully solve discrete posterior parameter estimation problems.The influence of gridding is hardly noticeable, but becomesapparent if we use at least 25 bins to represent the marginaldensity (not shown herein).

To better illustrate the effect of discretization, please con-sider Fig. 8 that presents two-dimensional scatter plots of theDREAM(D) derived posterior samples for a few selected pa-rameter pairs. The bottom panel shows similar plots but thenassuming continuity of the parameter space. The effect ofgridding is immediately apparent. Whereas the original bi-variate scatter plots sample the parameter space in a (blood-stain) spatter pattern, two-dimensional plots of the posteriorsamples derived with DREAM(D) exhibit an obvious gridpattern with horizontally and vertically aligned points. Theposterior samples take on discrete values with a distance be-tween subsequent points that is similar to the intervals listedin Table 1. Despite this difference in sampling pattern, theshape of the DREAM and DREAM(D) derived bivariate scat-ter plots are very similar, commensurate with the covariancestructure of the posterior distribution. The results presentedin Fig. 7 inspire confidence in the ability of DREAM(D) tosolve noncontinuous posterior sampling problems.

The excellent correspondence of the posterior parameterdistributions derived with DREAM and DREAM(D) resultsin marginal differences of the resulting streamflow predic-tions. We therefore do not show any times series plotsof model predictions, and corresponding 95 % uncertaintyranges. Such plots can be found inSchoups and Vrugt(2010)

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011

Page 10: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3710 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

Table 1. Prior Uncertainty Ranges of Hydrologic and Error Model Parameters.

Parameter Symbol Minimum Maximum Units Step size

Maximum interception Imax 0 10 mm 0.02Soil water storage capacity Smax 10 1000 mm 1.98Maximum percolation rate Qmax 0 100 mm d−1 0.20Evaporation parameter αE 0 100 – 0.20Runoff parameter αF −10 10 – 0.04Time constant, fast reservoir KF 0 10 days 0.02Time constant, slow reservoir KS 0 150 days 0.30

Heteroscedasticity intercept σ0 0 1 mm d−1 0.002Heteroscedasticity slope σ1 0 1 – 0.002Autocorrelation coefficient φ1 0 1 – 0.002Kurtosis parameter β −1 1 – 0.004

0 2 4 6

0.1

0.2

0.3

Post

erio

r den

sity

(A)

40 60 80

0.1

0.2

0.3(B)

0.84 0.86 0.88

0.1

0.2

0.3(C)

DREA

M(D

)

0 2 4 6

0.1

0.2

0.3

Post

erio

r den

sity

Imax

(D)

40 60 80

0.1

0.2

0.3

KS

(E)

0.84 0.86 0.88

0.1

0.2

0.3

1

(F)

DREA

M

φ

Fig. 7. Histogram of the DREAM(D) derived marginal posterior distributions of the(A) Imax, (B) KS , and(C) φ1 rainfall – runoff and errormodel parameters (in red). For convenience, only a few parameters are plotted. To benchmark the results of DREAM(D) the bottom panelillustrates the results for DREAM (in blue), with the common assumption of a continuous parameter space.

and details can be found in that publication. It is interest-ing to observe that the maximum log-likelihood value of 543found with DREAM(D) is somewhat larger than its coun-terpart estimated with DREAM (540). This difference wasconsistently observed for multiple different trials with bothMCMC algorithms.

To provide better insights into the efficiency ofDREAM(D), Fig. 9a plots the evolution of theR-statisticof Gelman and Rubin(1992) for each of the different pa-rameters. To benchmark these results, the bottom panel(Fig. 9b) illustrates the convergence results of DREAM as-suming a continuous parameter space. The presented traceplots represent an average of 25 different MCMC trials. The

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 11: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3711

Fig. 8. Two-dimensional scatter plots of the DREAM(D) derived posterior samples (top panel: red), and corresponding bivariate samplesestimated with DREAM (bottom panel: blue). Only a few parameter pairs are shown as this is sufficient to illustrate the similarity of thesampled distributions.

1

2

3

4

5

R-st

atis

tic

(A) DREAM (D)

10,000 20,000 30,000 40,000 50,0001

2

3

4

5

R-st

atis

tic

Number of MCMC function evaluations

(B) DREAM

Fig. 9. Simulated traces of theR-statistic of Gelman and Rubin (Gelman and Rubin, 1992) using the(A) DREAM(D) (top panel)(discretesampling), and(B) DREAM (bottom panel)(continuous sampling) MCMC algorithms. Each parameter is coded with a different color. Asimilar convergence speed is observed for both algorithms.

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011

Page 12: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

3712 J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation

convergence speed for both algorithms is strikingly similar.Both MCMC methods require about 30 000 model evalua-tions to converge to a limiting distribution. Although grid-ding significantly reduces the size of the feasible parameterspace, an approximately similar number of function evalua-tions remains necessary to explore the posterior target dis-tribution. Our experience with models involving a muchhigher parameter dimensionality demonstrate considerableenhancements in efficiency (sometimes dramatically) whensampling in the discretized rather than continuous space.Discretization might therefore provide a practical solution tospeed up search efficiency for insensitive parameters. Furtherresearch on this topic is warranted.

5 Conclusions

In the past decade much progress has been made in the devel-opment of sampling algorithms for statistical inference of theposterior parameter distribution. The typical assumption inthis work is that the parameters are continuous and can takeon any value within their upper and lower bounds. Unfor-tunately, such algorithms typically do not work for discreteparameter estimation problems. Such problems are abundantin many fields of study, and therefore of considerable the-oretical and practical interest. Examples include selectingamong different measurement locations in the design of op-timal experimental strategies, finding the best members of anensemble of predictors, and more generally discrete modelcalibration problems. Here, we have introduced a discreteMCMC simulation algorithm that is especially designed tosolve non-continuous and combinatorial posterior samplingproblems. This method, entitled DREAM(D) uses DREAMas its main building block, yet uses a modified proposal dis-tribution to facilitate solve discrete sampling problems. TheDREAM(D) algorithm maintains detailed balance and ergod-icity, and receives good performance across a range of prob-lems involving a Sudoku puzzle, soil water retention functionand discrete rainfall - runoff model calibration problem.

The theory developed herein is easily implementable inDREAM(ZS) (Vrugt et al., 2011b) and MT-DREAM(ZS)

(Laloy and Vrugt, 2011), which provides a venue to fur-ther increase the efficiency of MCMC simulation. TheDREAM(D) code is written in MATLAB and is availableupon request ([email protected]).

Acknowledgements.We gratefully acknowledge the helpful com-ments of Christian Robert, Bettina Schaefli and one anonymousreviewer. Computer support, provided by the SARA center for par-allel computing at the University of Amsterdam, The Netherlands,is highly appreciated.

Edited by: E. Morin

References

Box, G. E. P. and Tiao, G. C.: Bayesian Inference in StatisticalAnalyses Addison-Wesley-Longman, Reading, Massachusetts,1973.

Denison, D. G. T., Holmes, C. C., Mallick, B. K., and Smith,A. F. M.: Bayesian methods for nonlinear classification andregression, John Wiley & Sons, Chicester, 2002.

Duan, Q., Gupta, V. K., and Sorooshian, S.: Effective and efficientglobal optimization for conceptual rainfall-runoff models, WaterResour. Res., 28, 1015–1031, 1992.

Duan, Q., Schaake, J., Andreassian, V., Franks, S., Goteti, G.,Gupta, H. V., Gusev, Y. M., Habets, F., Hall, A., Hay, L., Hogue,T., Huang, M., Leavesley, G., Liang, X., Nasonova, O. N., Noil-han, J., Oudin, L., Sorooshian, S., Wagener, T., and Wood, E.F.: Model Parameter Estimation Experiment (MOPEX): Anoverview of science strategy and major results from the secondand third workshops, J. Hydrol., 320, 3–17, 2006.

Furman, A., Ferre, T. P. A., and Warrick, A. W.: Optimization ofERT surveys for monitoring transient hydrological events usingperturbation sensitivity and genetic algorithms, Vadose Zone J.,3, 1230–1239, 2004.

Gelfand, A. E. and Smith, A. F.: Sampling based approaches tocalculating marginal densities, J. Am. Stat. Assoc., 85, 398–409,1990.

Gelman, A. G. and Rubin, D. B.: Inference from iterative simula-tion using multiple sequences, Stat. Sci., 7, 457–472, 1992.

Gilks, W. R. and Roberts, G. O.: Strategies for improving MCMC,in: Markov Chain Monte Carlo in Practice, edited by: Gilks,W. R., Richardson, S., and Spiegelhalter, D. J., Chapman & Hall,London, 89–114, 1996.

Gilks, W. R., Roberts, G. O., and George, E. I.: Adaptive directionsampling, Statistician, 43, 179–189, 1994.

Green, P. J.: Reversible Jump Markov Chain Monte Carlo Compu-tation and Bayesian Model Determination, Biometrika, 82, 711–732, 1995.

Hastings, W. K.: Monte Carlo sampling methods using Markovchains and their applications, Biometrika, 57, 97–109, 1970.

Haario, H., Saksman, E., and Tamminen, J.: Adaptive proposal dis-tribution for random walk Metropolis algorithm, Computation.Stat., 14, 375–395, 1999.

Haario, H., Saksman, E., and Tamminen, J.: An adaptive Metropo-lis algorithm, Bernoulli, 7, 223–242, 2001.

Haario, H., Saksman, E., and Tamminen, J.: Componentwiseadaptation for high dimensional MCMC, Computation. Stat., 20,265–274, 2005.

Haario, H., Laine, M., Mira, A., and Saksman, E.: DRAM: Effi-cient adaptive MCMC, Stat. Comput., 16, 339–354, 2006.

Harmancioglu, N. B., Icaga, Y., and Gul, A.: The use of an opti-mization method in assessment of water quality sampling sites,European Water, 5/6, 25–35, 2004.

Hayes, B.: Unwed numbers, American Scientist, 94, 12–15,doi:10.1511/2006.1.12, 2006.

Kavetski, D., Kuczera, G., and Franks, S. W.: Bayesian analysis ofinput uncertainty in hydrologic modeling: 2. Application, WaterResour. Res., 42, W03408,doi:10.1029/2005WR004376, 2006.

Kleidorfer, M., Moderl, M., Fach, S., and Rauch, W.: Optimiza-tion of measurement campaigns for calibration of a hydrologicalmodel, Proceedings of the 11th International Conference on Ur-ban Drainage, Edinburgh, Scotland, UK, 2008.

Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011 www.hydrol-earth-syst-sci.net/15/3701/2011/

Page 13: DREAM D : an adaptive Markov Chain Monte Carlo …(D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter

J. A. Vrugt and C. J. F. Ter Braak: DREAM(D)→ discrete MCMC simulation 3713

Krink, T., Mittnik, S., and Paterlini, S.: Differential Evolution andCombinatorial Search for Constrained Index Tracking, Center forStudies in Banking and Finance, 09032, Universita di Modena eReggio Emilia, Facolt`di Economia “Marco Biagi”, 2009.

Kuczera, G., Kavetski, D., Franks, F. W., and Thyer, M. T.: To-wards a Bayesian total error analysis of conceptual rainfall-runoff models: Characterising model error using storm-dependent parameters, J. Hydrol., 331, 161–177, 2006.

Laloy, E. and Vrugt, J. A.: High-dimensional posterior explo-ration of hydrologic models using multi-try DREAM(ZS) andhigh performance computing, Water Resour. Res., in review,doi:2011WR010608RR, 2011.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H.,and Teller, E.: Equation of state calculations by fast computingmachines, J. Chem. Phys., 21, 1087–1092, 1953.

Neuman, S. P., Xue, L., Ye, M., and Lu, D.: Bayesian analy-sis of data-worth considering model and parameter uncertain-ties, Adv. Water Resour.,doi:10.1016/j.advwatres.2011.02.007,in press, 2011.

Perrin, C., Andreassian, V., Rojas Serna, C., Mathevet, T.,and Le Moine, N.: Discrete parameterization of hydro-logical models: Evaluating the use of parameter sets li-braries over 900 catchments, Water Resour. Res., 44, W08447,doi:10.1029/2007WR006579, 2008.

Price, K. V., Storn, R. M., and Lampinen, J. A.: Differential Evo-lution, A Practical Approach to Global Optimization, Springer,Berlin, 2005.

Roberts, G. O. and Gilks, W. R.: Convergence of adaptive directionsampling, J. Multivariate Anal., 49, 287–298, 1994.

Roberts, G. O. and Rosenthal, J. S.: Coupling and ergodicity ofadaptive Markov chain Monte Carlo algorithms, J. Appl. Probab.,44, 458–475, 2007.

Roberts, G. O. and Rosenthal, J. S.: Examplesof adaptive MCMC, online manucript, availableat: http://probability.ca/jeff/ftpdir/adaptex.pdf, 2008.

Roberts, G. O. and Rosenthal, J. S.: Optimal scaling for variousMetropolis-Hastings algorithms, Statist. Sci., 16, 351–367, 2001.

Schaefli, B., Talamba, D. B., and Musy, A.: Quantifying hydrolog-ical modeling errors through a mixture of normal distributions, J.Hydrol., 332, 303–315, 2007.

Schoups, G. and Vrugt, J. A.: A formal likelihood function for pa-rameter and predictive inference of hydrologic models with cor-related, heteroscedastic and non-gaussian errors, Water Resour.Res., 46, W10531,doi:10.1029/2009WR008933, 2010.

Smith, T. J. and Marshall, L. A.: Bayesian methods in hydro-logic modeling: A study of recent advancements in Markovchain Monte Carlo techniques, Water Resour. Res., 44, W00B05,doi:10.1029/2007WR006705, 2008.

Sorooshian, S. and Dracup, J. A.: Stochastic parameter estima-tion procedures for hydrologic rainfall-runoff models: correlatedand heteroscedastic error cases, Water Resour. Res., 16, 430–442,doi:10.1029/WR016i002p00430, 1980.

Storn, R. and Price, K.: Differential evolution a simple and efficientheuristic for global optimization over continuous spaces, GlobalOptimization, 11, pp. 341–359, 1997.

Ter Braak, C. J. F.: A Markov chain Monte Carlo version of the ge-netic algorithm differential evolution: easy Bayesian computingfor real parameter spaces, Stat. Comput., 16, 239–249, 2006.

Ter Braak, C. J. F. and Vrugt, J. A.: Differential evolution Markovchain with snooker updater and fewer chains, Stat. Comput., 16,239–249,doi:10.1007/s11222-008-9104-9, 2008.

van Genuchten, M. Th.: A closed-form equation for predicting thehydraulic conductivity of unsaturated soils, Soil Sci. Soc. Am. J.,44, 892–898, 1980.

Vrugt, J. A., Bouten, W., Gupta, H. V., and Sorooshian, S.: Towardimproved identifiability of hydrologic model parameters: the in-formation content of experimental data, Water Resour. Res., 38,1312,doi:10.1029/2001WR001118, 2002.

Vrugt, J. A., Gupta, H. V., Bouten, W., and Sorooshian, S.: Ashuffled complex evolution metropolis algorithm for optimiza-tion and uncertainty assessment of hydrologic model parame-ters, Water Resour. Res., 39, 1–14,doi:10.1029/2002WR001642,2003.

Vrugt, J. A., ter Braak, C. J. F., Clark, M. P., Hyman, J. M.,and Robinson, B. A.: Treatment of input uncertainty in hy-drologic modeling: Doing hydrology backward with Markovchain Monte Carlo simulation, Water Resour. Res., 44, W00B09,doi:10.1029/2007WR006720, 2008.

Vrugt, J. A., ter Braak, C. J. F., Diks, C. G. H., Higdon, D.,Robinson, B. A., and Hyman, J. M.: Accelerating Markovchain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, I. J. Nonlin. Sci. Num.,10, 271–288, 2009a.

Vrugt, J. A., ter Braak, C. J. F., Gupta, H. V., and Robinson, B. A.:Equifinality of formal (DREAM) and informal (GLUE) Bayesianapproaches in hydrologic modeling?, Stoch. Env. Res. Risk A.,23, 1011–1026,doi:10.1007/s00477-008-0274-y, 2009b.

Vrugt, J. A., Laloy, E., and ter Braak, C. J. F.: Differential evolu-tion adaptive Metropolis with sampling from past states, SIAMJ. Optimiz., in preparation, 2011.

www.hydrol-earth-syst-sci.net/15/3701/2011/ Hydrol. Earth Syst. Sci., 15, 3701–3713, 2011