Price Probabilities:
A class of Bayesian and non-Bayesian
prediction rules
Filippo Massari
School of Banking and Finance, Australian School of Business
February 19, 2016
Abstract
We use the standard machinery of dynamic general equilibrium models to generate
a rich class of probabilities and to discuss their properties. This class includes proba-
bilities consistent with Bayes’ rule and known non-Bayesian rules. If the prior support
is correctly specified, we prove that all members of this class are as good as Bayes’
rule in terms of likelihood. If it is misspecified, we demonstrate that those rules that
underreact to new information can significantly outperform Bayes’. Because under-
reaction is never worse and sometimes better than Bayes’, we question the common
opinion that Bayes’ rule is the only rational way to learn and propose a valid alternative.
Keywords: Prediction markets, Non-Bayesian Learning, Market Selection, MDL
1 Introduction
It has long been argued that financial markets aggregate the different opinions of their par-
ticipants efficiently. According to the market selection hypothesis, prices become accurate
because traders with incorrect beliefs eventually lose all their wealth to accurate traders
(Friedman (1953)). In models in which the selection hypothesis holds, standard economic
arguments imply that the risk neutral probabilities converge to the most accurate trader's
beliefs (Sandroni (2000), Blume and Easley (2006)). In this paper, we characterize this
dynamic.
Given traders’ beliefs and time zero consumption-shares distribution, different preferences
and normalization horizons determine different risk neutral probabilities. For example, if all
traders have log utility, risk neutral probabilities coincide with the probabilities obtained
via Bayes' rule from a consumption-weighted prior on the set of traders' beliefs (Rubinstein
(1974), Blume and Easley (1993), Blume and Easley (2009), Kets et al. (2014)).1
We define the Price Probability class to be the class of all probabilities that can be
represented as risk neutral probabilities of an economy with complete markets, no aggregate
risk and in which the market selection hypothesis holds. This class is rich; it includes
probabilities consistent with Bayes’ rule (BPD2) as well as other non-Bayesian rules which
were originally derived independently and with different objectives in mind such as NML3
and SNML4. Our unifying framework provides a common ground to compare these rules in
a novel and transparent way.
We compare price probabilities with Bayes' rule on the grounds of two of its characterizing
properties: exchangeability and prequentiality. Exchangeability is a fundamental property of
the Bayesian approach: De Finetti's theorem implies that a measure on infinite sequences
is exchangeable if and only if it is a Bayesian mixture of iid probabilities. It requires the
probability of every partial history to depend exclusively on the frequency of the outcomes,
not on their order. Prequentiality is a form of intertemporal consistency which is necessary
for Bayes’ rule and to avoid arbitrages (Grunwald (2007), Lehrer and Teper (2016)). It
1 More precisely, the risk neutral probabilities of the Arrow securities in a dynamic general equilibrium model with no aggregate risk, in which all traders have log utility, identical discount factor, time zero consumption-shares distribution C0 and set of traders' beliefs P, coincide with the probabilities obtained via Bayes' rule from the prior C0 on P (Corollary 1).
2 Bayesian Predictive Distribution, aka Bayesian Model Average or Bayesian Mixture.
3 Normalized Maximum Likelihood: Rissanen (1986), Shtar'kov (1987), Grunwald (2007).
4 Sequential Normalized Maximum Likelihood: Roos and Rissanen (2008).
requires the tree of unconditional probabilities to satisfy Kolmogorov’s third axiom.
Given the overwhelming experimental evidence showing that most agents are not Bayesian
(Rabin et al. (2000), Kahneman (2011)), it is natural to ask if these properties are funda-
mental for rational predictions. To answer this question, we take a pragmatic approach: we
fix an objective criterion of accuracy (based on asymptotic likelihood) and use it to evaluate
members of price probabilities against the Bayesian benchmark.
Because non-prequential members of price probabilities are, qualitatively, as accurate as
Bayes’ rule, we conclude that violations of prequentiality are rational (when used in settings
that do not allow arbitrages).
Rules that violate exchangeability are particularly interesting. If the prior support con-
tains the real model, we demonstrate that a mild form of (over)underreaction to empirical
evidence is not detrimental. Otherwise, if the real parameter does not belong to the prior
support (the model is misspecified), we prove that (over)underreacting rules can significantly
(under)outperform Bayes' rule. Because underreacting rules never underperform and can
significantly improve upon Bayes', a Pascal's wager argument (Pascal (1668)) suggests that underreacting rules should be pragmatically preferred to Bayes' unless we are absolutely certain
that our model is correctly specified.5
Given the growing awareness that, in most real life situations, an agent is likely to use
misspecified models ("All models are wrong, but some are useful", Box (1976)), our work
aims to stimulate interest in tractable non-Bayesian approaches to the prediction problem. Our results point to underreacting, non-exchangeable rules. Exchangeability greatly
simplifies the asymptotic analysis of statistical forecasting systems because it allows one to focus
exclusively on past average realizations (conditional averages in Markov models). However,
it comes at a cost: exchangeable models cannot do better than the best model in the prior
support. Because of the reasonable risk of using a misspecified model, we believe this limita-
tion to be excessively tight. In this paper, we propose a whole class of analytically tractable
5 If the model is correctly specified, the loss in the log-likelihood ratio we incur by underreacting is finite. If the model is misspecified, the gain in the log-likelihood ratio we might get is infinite. Thus, an arbitrarily small risk of misspecification is enough to pragmatically recommend an underreacting rule over Bayes'.
rules that are not exchangeable and improve upon Bayes’ in some cases of misspecification.
A crucial assumption we use to prove that an underreacting rule can improve on Bayes’ is
finite prior support.6 If we use Bayes’ rule to learn the best parameter of a given parametric
distribution, this requirement is typically not met (e.g. Bernoulli with uniform prior on
(0,1)). However, there are many practically relevant cases in which this requirement is
satisfied at a meta-model level (e.g., a Bayesian mixture model with prior, g, on a finite
set of linear regression models satisfies this assumption). Our result applies verbatim to
these cases. We chose to work on supports with finitely many parameters only for ease of
exposition.
1.1 Related literature
Our model of (over)underreaction can be seen as a special case of Epstein (2006) in which
(over)underreaction is path dependent. If there is a unique best model in the support, the
posterior concentrates on it and (over)underreaction vanishes, otherwise it persists. Our
asymptotic results extend Epstein et al. (2008)'s by including the analysis of the effect of
(over)underreaction on misspecified models. We prove that a mild form of underreaction can
deliver forecasts that are even more accurate than Bayesian when the model is misspecified.
Contrary to the axiomatic approach to learning in the economic literature (e.g., Epstein
and Le Breton (1993), Ghirardato (2002), Gilboa and Marinacci (2011)), our approach is
purely pragmatic (closer to Machine Learning’s point of view). We fix a criterion and are
interested in updating rules that perform well according to it. A rule is desirable not for
the axioms it satisfies but for its practical performance. We consider these points of view
as complementary. The former is well suited to describe personal decisions. The latter is
more relevant for those decisions that must satisfy an external criterion of performance (e.g.,
portfolio managers' performance and weather forecasters' predictions are evaluated according
to fixed criteria such as the Sharpe ratio and calibration).
6 For intuition of how underreacting rules can outperform Bayes', suppose we observe repeated tosses of a fair coin and we erroneously believe the probability of Heads to be either 1/3 or 2/3, with equal prior probability. Bayes' rule gives predictions that are most of the time arbitrarily close to either 1/3 or 2/3; an underreacting rule gives predictions that are closer to 1/2 than Bayes', thus more accurate (Section 6.3).
Our contribution to the behavioral literature is to question the conventional belief accord-
ing to which Bayes’ rule is the only rational way to learn. We propose simple non-Bayesian
rules that, according to prediction accuracy, cannot be judged irrational. We leave it to
future research to experimentally verify if agents that do not follow Bayes’ rule use other
members of price probabilities and are, after all, pragmatically rational.
Our results relate to the finding in the Computer Science literature that a slower learning rate (underreaction) can improve on Bayes' rule if the loss function is discontinuous, or
if the loss function differs from log and the model is misspecified (e.g. Vovk (1990), HEDGE
algorithm by Freund and Schapire (1997) and Safe Bayesian by Grunwald (2012)). We contribute to this line of research by proposing a unifying setting for rules previously believed to
be independent, introducing new prediction rules and showing that a slower learning rate
improves upon Bayes’ also with respect to log-loss.
Section 2 introduces notation and known statistical forecasting systems; we refer to these
rules in our discussion of price probabilities. Section 3 is about the economic derivation of
price probabilities; a reader who is not interested in the background of our non-Bayesian
rules can skip it and consider Propositions 1 and 2 as definitions. The rest of the paper is
about the properties (Section 5) and the asymptotic performance (Section 6) of members of
price probabilities. Several examples are provided; proofs are in the Appendix.
2 Environment
Time is discrete and begins at date 0. At each date, a random variable (the economy) can
be in one of S mutually exclusive states, S := {1, ..., S}, with Cartesian product S^t := ×^t S. The
set of all infinite sequences of states is S^∞ := ×^∞ S, with representative path σ = (σ_1, ...).
σ^t = (σ_1, ..., σ_t) denotes the partial history up to period t, and (σ^{t−1}, σ_t) is the concatenation of
σ^{t−1} and σ_t, i.e. the sequence whose first t−1 realizations coincide with σ^{t−1} and whose last element
is σ_t. C(σ^t) is the cylinder set with base σ^t, C(σ^t) = {σ ∈ S^∞ | σ = (σ^t, ...)}; F_t is the σ-algebra
generated by the cylinders, F_t = σ({C(σ^t), ∀σ^t ∈ S^t}); and F is the σ-algebra generated by
their union, F = σ(∪_t F_t). By construction {F_t} is a filtration. For the sake of notation,
we assume that past realizations constitute all of the relevant information, i.e. F_t := σ^t.
In what follows we introduce a number of variables of the form x_t(σ). These variables are
assumed to be measurable with respect to the natural filtration F_t.
2.1 Probabilities and statistical forecasting systems
This Section gives a brief overview of the standard definition of statistical forecasting sys-
tem, log-regret (which is needed to discuss MDL) and of the statistical forecasting systems
we refer to (for a more detailed discussion we recommend Foster and Vohra (1999) and
Grunwald (2007)). These notions are useful to understand the rules we propose. Although
these forecasting schemes were derived independently and with different objectives in mind,
in this paper we show that they share a common structure. Recognizing this common structure provides a unified framework that allows comparing these models in a novel and more
transparent way.
• Statistical forecasting system (Dawid (1984)). Given a reference set of probability
distributions P = {p1, ..., pI} on F , and a partial history σt−1, a statistical forecasting
system is any function p(.|σt−1) that uses P and past outcomes σt−1 to deliver a
probability distribution on St. WLOG (Dawid (1985)), we use the terms “statistical
forecasting system” and “probability” interchangeably.7
• Log-regret. Given a partial history σt and a reference set of probabilities P , the log-
regret is the log-likelihood ratio between the model in P with the highest likelihood on
σ^t (i.e. the best model with hindsight) and the statistical forecasting system adopted:

given σ^t, R(p; σ^t) = sup_{i∈P} ln [ p^i(σ^t) / p(σ^t) ].
Given σ^t, the log-regret is a measure of how well a statistical forecasting system performs against the best model in P with hindsight. Different sequences give different log-regrets. To avoid this dependence, it is customary to focus on the worst-case log-regret, that is, the log-regret calculated on the least favorable sequence of realizations: R(p; t) = sup_{σ^t} R(p; σ^t). A forecasting system with small worst-case log-regret is desirable because, in every sequence, it produces forecasts that are almost as good as those of the best model in P with hindsight.

7 Kolmogorov's extension theorem implies that every statistical forecasting system, {p(.|σ^{t−1})}, can be considered to be the sequence of conditional probabilities, p(σ_τ|σ^{τ−1}) = p(σ^τ)/p(σ^{τ−1}), obtained from the probability p(σ^t) := ∏_{τ=1}^t p(σ_τ|σ^{τ−1}). Conversely, any probability distribution p(σ^t) on the set S^t defines a statistical forecasting system induced by its conditional distributions p(σ_τ|σ^{τ−1}) = p(σ^τ)/p(σ^{τ−1}) for 0 ≤ τ < t.
• NML: Normalized Maximum Likelihood is the minmax probability with respect to
worst-case log-regret (Rissanen (1986) proved: pNML(.) = arg inf_p R(p; t)):

∀σ^t, pNML(σ^t) = max_{i∈P} p^i(σ^t) / Σ_{σ^t} max_{i∈P} p^i(σ^t)   (1)
NML has bounded worst-case log-regret (if the cardinality of P is finite), which makes
it desirable in data compression tasks; however, NML is hardly used in prediction tasks
because it cannot be calculated recursively and it does not uniquely define a set of
conditional probabilities.
• SNML: Sequential Normalized Maximum Likelihood is a strategy derived to obtain
a recursive version of NML (Roos and Rissanen (2008)). SNML’s period t predictions
coincide with the conditional probabilities that NML gives to σt, assuming that t is
the final horizon:
∀σ_t, pSNML(σ_t|σ^{t−1}) = pNML(σ^t) / Σ_{σ_t∈S} pNML(σ^{t−1}, σ_t).   (2)
SNML is also known as the "Follow the Leader Strategy" (Massari (2015a)): the strategy
that, in every period, prescribes using the prediction of the model in P that had the
highest likelihood in the past.8 SNML is consistent (if the real probability belongs to
P, SNML's predictions converge to it) and it can be calculated recursively. However,
unlike NML, SNML's regret can be unbounded even when the cardinality of P is finite.
8 This strategy differs from the FLS as defined by De Rooij et al. (2014) in that it prescribes mixing instead of randomizing when facing a non-unique leader.
• BPD: Bayesian Predictive Distribution is considered the “gold standard” among all
statistical forecasting systems. Given a Bayesian prior distribution C0 on a set of
probabilities P and a sequence of realizations σt−1, BPD produces a prediction for σt
on the basis of the conditional distribution of σ_t given σ^{t−1}:

∀σ_t, pBPD(σ_t|σ^{t−1}) = pBPD(σ^t) / pBPD(σ^{t−1}) = Σ_{i∈P} p^i(σ^t) c^i_0 / Σ_{i∈P} p^i(σ^{t−1}) c^i_0 = Σ_{i∈P} p^i(σ_t|σ^{t−1}) c^i_{t−1}(σ)   (3)

Where pBPD(σ^t) is the unconditional probability attached by the mixture model Σ_{i∈P} p^i(σ^t) c^i_0
to σ^t and, for every i, c^i_{t−1}(σ) = p^i(σ^{t−1}) c^i_0 / Σ_{i∈P} p^i(σ^{t−1}) c^i_0 are the weights of the prior distribution
obtained via Bayes' rule from C0.9 The prominence of BPD in most disciplines is due to
its sound axiomatic foundation, its good predictive performance, and its tractability.
BPD directly follows from Kolmogorov (1933)’s axioms (adopting the standard defini-
tion of conditional probability), and it is compatible with Savage (1954)’s axioms (e.g.
Ghirardato (2002)). Moreover, BPD is consistent, it has bounded worst-case log-regret
(if the cardinality of P is finite) and it can be calculated recursively.
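The three forecasting systems above can be compared directly on a small example. The sketch below computes BPD, NML and the SNML conditionals for a finite set of iid Bernoulli models, mirroring the two-model support used in the paper's examples; the code itself is an illustrative reconstruction, not taken from the paper:

```python
from itertools import product

THETAS = [1/3, 2/3]   # finite model set P (Bernoulli probabilities of outcome 1)
PRIOR  = [1/2, 1/2]   # prior / time-zero weights c_0

def lik(theta, seq):
    """Likelihood of a 0/1 sequence under an iid Bernoulli(theta) model."""
    p = 1.0
    for x in seq:
        p *= theta if x == 1 else 1 - theta
    return p

def p_bpd(seq):
    """Bayesian Predictive Distribution: prior-weighted mixture likelihood (Eq. 3)."""
    return sum(c * lik(th, seq) for th, c in zip(THETAS, PRIOR))

def p_nml(seq):
    """Normalized Maximum Likelihood at horizon t = len(seq) (Eq. 1)."""
    t = len(seq)
    num = max(lik(th, seq) for th in THETAS)
    den = sum(max(lik(th, s) for th in THETAS) for s in product([0, 1], repeat=t))
    return num / den

def p_snml_cond(prev, x):
    """SNML prediction of outcome x after history prev (Eq. 2)."""
    return p_nml(prev + (x,)) / sum(p_nml(prev + (y,)) for y in (0, 1))
```

On this support BPD is additive across horizons, while NML, being renormalized at every horizon, is not: p_nml((1,1,1)) + p_nml((1,1,0)) = 3/10 ≠ 1/3 = p_nml((1,1)).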
3 Price Probabilities
In this Section, we introduce the economic setting used to derive the equilibrium prices of
the Arrow securities used to construct price probabilities. A reader who is mostly interested
in the properties and performance of non-Bayesian updating rules can skip this Section and
consider Propositions 1 and 2 to be the definitions of the non-Bayesian rules we propose.
We model the market as an Arrow-Debreu exchange economy with complete markets.
The economy contains a finite set of traders I. Each trader, i, has consumption set R+. A
consumption plan c : S^∞ → ∏_{t=0}^∞ R_+ is a sequence of R_+-valued functions {c_t(σ)}_{t=0}^∞. Each
trader i is characterized by a payoff function u^i : R_+ → R over consumption, a discount
factor β^i ∈ (0, 1), an endowment stream {e^i_t(σ)}_{t=0}^∞ and a subjective probability p^i on S^∞,
9 The unusual notation "c^i_{t−1}(σ)" for the weights of the prior distribution is to ease the comparison between consumption-shares and probabilistic mass. In log-economies they coincide (Section 4.2).
his beliefs. We denote the set of traders' beliefs by P := {p^i : i ∈ I}. To ease the comparison
between prices and the Bayesian framework, we assume that traders' beliefs are iid.10 Each
trader, i, aims to solve:
max E_{p^i} Σ_{t=0}^∞ β^t u^i(c^i_t(σ))   s.t.   Σ_{t=0}^∞ Σ_{σ^t∈S^t} q(σ^t) ( c^i_t(σ) − e^i_t(σ) ) ≤ 0.
Where q(σ^t) is the price of a claim that pays a unit of consumption on the last realization
of σ^t in terms of consumption at time 0. Let q(σ_t|σ^{t−1}) be the price of a claim that pays a
unit of consumption at period/event σ_t in terms of consumption at period/event σ^{t−1}.
It is worth noting the similarity between the equilibrium relation between time 0 and recursive prices, q(σ_t|σ^{t−1}) = q(σ^t)/q(σ^{t−1}) (e.g. Ljungqvist and Sargent (2004)), and the relation between
unconditional and conditional probabilities, p(σ_t|σ^{t−1}) = p(σ^t)/p(σ^{t−1}). If the sum of next period
prices were one, equilibrium prices would define a probability distribution.
3.1 Assumptions
A competitive equilibrium is a sequence of prices and, for each trader, a consumption plan
that is affordable, preference maximal on the budget set and mutually feasible.
The following assumptions, together with no-short-sale (Araujo and Sandroni (1999)),
are sufficient for the existence of the competitive equilibrium (Peleg and Yaari (1970)) and
for the market selection hypothesis to hold (Sandroni (2000); Blume and Easley (2006)):
• A1: The payoff functions u^i : R_+ → [−∞, +∞] are C1, concave and strictly increasing
and satisfy the Inada condition at 0; that is, u^i′(c) → ∞ as c ↘ 0.
• A2: For all traders i, and for all finite sequences σ^t, p^i(σ^t) > 0 ⇔ P(σ^t) > 0.
• A3: The aggregate endowment equals 1 in every period: ∀σ, ∀t, Σ_{i∈I} e^i_t(σ) = 1.
• A4: All traders have identical discount factor: ∀i, βi = β.
10 This assumption is made for the sake of clarity. It ensures that the driving force that makes price probabilities accurate is exclusively market selection, not the learning that traders might make over time or from prices. Our results about efficiency apply verbatim to cases in which traders' beliefs evolve over time.
WLOG, because the second welfare theorem applies, we assume that the initial optimal
consumption choices are known and given by C0 = [c^1_0 ... c^I_0]′ >> 0. By A3, Σ_{i∈I} c^i_0 = 1 and
we can interpret the time zero consumption-shares as the weights that a hypothetical Bayesian
prior gives to probabilities in P. The absence of aggregate risk is needed to eliminate biases
in risk neutral probabilities due to aggregate consumption fluctuations.
3.2 The price probability class
In this Section, we introduce the price probability class. Members of price probabilities
are obtained by first, interpreting equilibrium prices of the Arrow securities as representing
relative likelihoods, and then using these relative likelihoods to construct probabilities via
normalization. Given the set of traders’ beliefs (P), different initial consumption-share dis-
tribution (C0), preferences (&) and normalization method (n) determine different probability
measures. We call the class of all such probability measures:
Definition 1. Price probabilities, Mq(P , C0,&, n), is the class of all the probabilities that
can be represented as normalized equilibrium prices of an economy that satisfies A1-A4.
In the rest of the paper, we focus on two normalization methods: NNL, in which time
zero prices are normalized at every horizon; and SNNL, in which next period prices are
normalized sequentially.
Definition 2. Normalized Normed Likelihood (NNL):

∀σ^t, pNNL(σ^t) = q(σ^t) / Σ_{σ^t} q(σ^t);   pNNL(σ_t|σ^{t−1}): not defined.
NNL is the only probability measure that preserves the relative likelihoods of time zero
prices at every horizon (a new normalization is done at every horizon). In economic terms,
NNL is the cost of moving a unit of consumption in period/event σt in terms of time zero
consumption, divided by the cost of moving a unit of consumption from time zero to time
t for sure. NNL (as NML) cannot be calculated recursively and it does not define a unique
set of marginal probabilities (unless all traders have log utility).
Definition 3. Sequential Normalized Normed Likelihood (SNNL):

∀σ^t, pSNNL(σ^t) = ∏_{τ=1}^t pSNNL(σ_τ|σ^{τ−1});   pSNNL(σ_t|σ^{t−1}) = q(σ_t|σ^{t−1}) / Σ_{σ_t} q(σ_t|σ^{t−1}).
SNNL is the only probability measure that preserves the relative likelihoods of next
period prices. It is the cost of moving a unit of consumption from period/event σt−1 one
period ahead in state σt, divided by the cost of moving a unit of consumption for sure (short
run risk neutral probabilities). SNNL (as SNML) can be calculated recursively.
The following Lemma highlights that the relation between NNL and SNNL is the same as
the one between NML and SNML: SNNL’s period t predictions coincide with the conditional
probabilities that NNL gives to σt, assuming that t is the final horizon:
Lemma 1.

∀σ_t, pSNNL(σ_t|σ^{t−1}) = pNNL(σ^t) / Σ_{σ_t∈S} pNNL(σ^{t−1}, σ_t).
4 Price probabilities in identical CRRA economies
If all traders have identical CRRA utility functions, members of price probabilities can be
analytically characterized. This setting is flexible enough to show that BPD, NML, and
SNML belong to price probabilities and to discuss relevant deviations from Bayes' rule. In
what follows we use the following notation:
Definition 4. pNNL_γ and pSNNL_γ denote the NNL and the SNNL probabilities obtained from an
economy that satisfies A2-A4 and in which all traders have identical CRRA utility functions
with parameter γ: ∀i ∈ I, u^i(c) = (c^{1−γ} − 1)/(1 − γ).11

11 As customary, we define ln 0 = −∞. Moreover, we use γ = 0 as a short notation for the limit equilibrium quantities of an iCRRA economy in which γ → 0 after the equilibrium quantities are calculated.
4.1 NNL in iCRRA-economies: pNNL_γ

Proposition 1. Given beliefs set P, prior C0 and parameter γ, pNNL_γ is given by:

∀σ^t, pNNL_γ(σ^t) = ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ / Σ_{σ^t∈S^t} ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ.   (4)
Equation 4 shows that pNNL_γ coincides with the normalized 1/γ-norm of the likelihoods of
members of P according to the measure C0. Because BPD and NML are the normalized L1
and L∞ norms, respectively, they both belong to price probabilities.
Corollary 1. Given beliefs set P and prior C0,

i) γ = 1 (log) ⇒ ∀σ^t, pNNL_1(σ^t) = pBPD(σ^t);

ii) γ = 0 (linear) ⇒ ∀σ^t, pNNL_0(σ^t) = pNML(σ^t).

Proof. i) Notice that, if γ = 1, the denominator of Eq. 4 equals one, and compare Eq. 4 with Eq. 3.
ii) Notice that lim_{γ→0} ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ = ||p^i(σ^t)||_∞, the sup norm; and compare Eq. 4 with Eq. 1.
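Proposition 1 and Corollary 1 can be verified numerically at small horizons. The sketch below implements Eq. 4 on an illustrative two-model Bernoulli support and checks that γ = 1 reproduces BPD while a small γ approximates NML (γ = 0.05 stands in for the γ → 0 limit; taking the limit exactly would require the sup norm):

```python
from itertools import product

THETAS = [1/3, 2/3]   # illustrative Bernoulli model set P
C0     = [1/2, 1/2]   # time-zero consumption shares / prior

def lik(theta, seq):
    p = 1.0
    for x in seq:
        p *= theta if x == 1 else 1 - theta
    return p

def norm_gamma(seq, gamma):
    """Inner term of Eq. 4: the C0-weighted 1/gamma-norm of the likelihoods."""
    return sum(c * lik(th, seq) ** (1 / gamma) for th, c in zip(THETAS, C0)) ** gamma

def p_nnl(seq, gamma):
    """Price probability pNNL_gamma (Eq. 4), normalized over all same-length sequences."""
    den = sum(norm_gamma(s, gamma) for s in product([0, 1], repeat=len(seq)))
    return norm_gamma(seq, gamma) / den

def p_bpd(seq):
    return sum(c * lik(th, seq) for th, c in zip(THETAS, C0))

def p_nml(seq):
    den = sum(max(lik(th, s) for th in THETAS)
              for s in product([0, 1], repeat=len(seq)))
    return max(lik(th, seq) for th in THETAS) / den
```

At γ = 1 the denominator of Eq. 4 equals one and pNNL_1 is exactly the Bayesian mixture; at γ = 0.05 the prior weights essentially cancel in the normalization and pNNL is already within numerical tolerance of pNML.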
Taking Bayes’ rule as a reference point, the effect of gamma on NNL is qualitatively as
follows. In a log-economy (γ = 1) NNL coincides with BPD and the interaction between prior
information (C0) and empirical evidence (σt) is regulated by Bayes’ rule. For γ = 0, NNL
coincides with NML, i.e. it is the optimal probability with respect to worst-case log-regret.
Given the explosive nature of the log-likelihood on sequences whose frequencies are close to
the boundary of the simplex, NML ignores the information of the prior (C0 plays no role),
and it assigns a relatively higher probability to those sequences whose frequency lies close
to the boundary of the simplex. For values of γ ≠ 1, NNL represents a compromise between
the minimum log-regret approach behind the NML distribution and the Bayesian attempt
to make the most out of the information in the prior. Compared to a BPD with the same
Uniform prior on P , NNL with γ < (>)1 assigns more probability to those sequences whose
frequency lies close to the boundary (center) of the simplex and penalizes those sequences
whose frequency lies close to the center (boundary) of the simplex.
4.2 SNNL in iCRRA-economies: pSNNL_γ
Proposition 2. Given beliefs set P, prior C0 and parameter γ, pSNNL_γ is given by:

∀σ_t, pSNNL_γ(σ_t|σ^{t−1}) = ( Σ_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ) )^γ / Σ_{σ_t∈S} ( Σ_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ) )^γ,   (5)

with c^i_{γ,t−1}(σ) = p^i(σ^{t−1})^{1/γ} c^i_0 / Σ_{i∈P} p^i(σ^{t−1})^{1/γ} c^i_0 (by Eq. 11).
By construction, Σ_{i∈P} c^i_{γ,t−1}(σ) = 1; thus each c^i_{γ,t−1}(σ) can be interpreted as
the weight attached to model p^i by a prior distribution C_{γ,t−1}(σ). The gamma parameter
affects the evolution of this distribution. If gamma equals one (log), C^i_{γ,t−1}(σ) coincides
with the Bayesian prior distribution obtained from C0 on σ^{t−1} (compare with Equation 3).
Taking Bayes' rule as our reference point, the effect of gamma on C_{γ,t−1}(σ) is qualitatively
as follows. If gamma is greater than one, C^i_{γ,t−1}(σ) is less concentrated around the model
with the highest likelihood: it gives less weight to empirical evidence than Bayes' (C^i_{1,t−1}(σ)).
Conversely, if gamma is lower than one, C^i_{γ,t−1}(σ) gives more weight to empirical evidence
than Bayes'. The normalizing component and the use of the 1/γ norm only mitigate this effect.
Proposition 3. Given beliefs set P and prior C0,

i) γ > 1 ⇔ pSNNL_γ underreacts to empirical evidence;

ii) γ = 1 (log) ⇔ pSNNL_γ coincides with Bayesian updating (pBPD);

iii) γ < 1 ⇔ pSNNL_γ overreacts to empirical evidence.
For intuition, suppose every probability in P is iid Bernoulli (∀i, ∀t, p^i(a_t) = i) and let t_a
and t_b be the number of a and b observations until period t−1, respectively. Substituting:

c^i_{γ,t−1}(σ) = p^i(σ^{t−1})^{1/γ} c^i_0 / Σ_{j∈P} p^j(σ^{t−1})^{1/γ} c^j_0 = i^{t_a/γ} (1−i)^{t_b/γ} / Σ_{j∈P} j^{t_a/γ} (1−j)^{t_b/γ}.   (6)
Equation 6 highlights the effect of gamma on the evolution of C^i_{γ,t−1}(σ). If gamma is smaller
than one, the model overreacts to empirical evidence: e.g., γ = 1/2 is equivalent to updating via
Bayes' rule "counting every past realization twice". If gamma is greater than one, the
model underreacts to empirical evidence: e.g., γ = 2 is equivalent to updating via Bayes'
rule "counting every past realization as half".12
The economic intuition is the following (see Massari (2015b)): c^i_{γ,t−1}(σ) is trader i's
consumption-share after trading for t−1 periods on path σ. Higher values of γ imply lower
risk tolerance, thus more conservative investment strategies. Therefore, if γ > (<)1, the
traders with incorrect beliefs lose consumption-shares to the traders with correct beliefs at
a slower (faster) rate than they would if γ = 1. Thus, the effect of incorrect beliefs on next
period equilibrium prices takes more (less) periods to disappear.
It is easy to verify that SNML belongs to price probabilities.
Corollary 2. Given beliefs set P and prior C0, γ = 0 (linear) ⇒ ∀σ^t, pSNNL_0(σ^t) = pSNML(σ^t).

Proof. ∀σ^{t−1}, pSNNL_0(σ_t|σ^{t−1}) =(Lem. 1) pNNL_0(σ^t) / Σ_{σ_t} pNNL_0(σ^{t−1}, σ_t) =(Cor. 1) pNML(σ^t) / Σ_{σ_t} pNML(σ^{t−1}, σ_t) =(Eq. 2) pSNML(σ_t|σ^{t−1}).
5 Properties of price probabilities
In this Section, we introduce two characterizing properties of Bayesian updating (on iid se-
quences) and discuss whether these properties are satisfied by members of price probabilities.
We demonstrate that price probabilities in iCRRA economies satisfy both properties if and
only if all traders have log utility (that is if they coincide with BPD).
Definition 5. A statistical forecasting system, p, is prequential if:

∀σ^{t−1}, Σ_{σ_t∈S} p(σ_t ∩ σ^{t−1}) = p( (∪_{σ_t∈S} σ_t) ∩ σ^{t−1} ) = p(σ^{t−1})
12 C^i_{γ,t−1}(σ) is a special case of the "Generalized Bayes' rule" introduced by Vovk (1990). The gamma parameter is often called the learning rate as it determines the convergence rate of the posterior. The choice of this parameter plays a fundamental role in both the HEDGE algorithm (Freund and Schapire (1997)) and the Safe Bayesian approach (Grunwald (2012)). SNNL differs from these algorithms because, instead of relying on the generalized prior, it directly depends on the sequential normalization of the 1/γ norm.
Prequentiality (see Grunwald (2007) for details) coincides with Kolmogorov’s 3rd axiom
(additivity). An agent with non-prequential beliefs believes that the sum of the probabil-
ities of disjoint events differs from the probability of their union. In the economic theory
literature, non-prequential beliefs are called time inconsistent because a trader with non-prequential beliefs can be put into arbitrage (see example and discussion in Section 6.2).
In the Behavioral literature, this type of violation is well documented and known as the
conjunction fallacy (Kahneman (2011)).
Definition 6. A statistical forecasting system, p, is exchangeable if, whenever two partial
histories σ^t, σ̃^t share the same frequency, p(σ^t) = p(σ̃^t).
Exchangeability captures the idea that the probability of a sequence of events does not
depend on the order of the realizations. This assumption is deeply connected to Bayes’
rule: De Finetti (1931)'s theorem implies that a forecasting system is Bayesian if and only
if it is exchangeable (or conditionally exchangeable). Exchangeability is an appealing criterion
whenever all the models in P are iid. For example, we expect a rational agent facing repeated
iid tosses from a coin with an unknown bias to attach the same probability to the sequences
of realizations {H,H, T} and {T,H,H}. In terms of conditional forecasts, an agent who
attaches less probability to {H,H,T} than to {T,H,H} will appear as either over-weighting
or under-weighting (relative to a Bayesian) the first two realizations.
Proposition 4.

• (a) If γ = 1 (log) and all the probabilities in P are iid, then pNNL_γ, pSNNL_γ and pBPD
coincide and are prequential and exchangeable.

• (b) If γ ≠ 1 and all the probabilities in P are iid:

i) pNNL_γ is exchangeable but not prequential;

ii) pSNNL_γ is prequential but not exchangeable.
The following example illustrates Proposition 4. It shows how different values of gamma
affect NNL and SNNL on sequences of length 3 (unconditional probabilities on the left-hand
tree, conditional probabilities on the right-hand tree).
5.1 Example
Consider the NNLs and SNNLs obtained from two iCRRA-economies E0, E1 with two states
(Left, Right), S = {L,R}; two traders with iid beliefs p^1(a) = 1/3 = p^2(b) (P = {p^1, p^2});
initial consumption C0 = [1/2 1/2]′; and CRRA parameters γ0 = 0 and γ1 = 1, respectively.
Log-economy: γ = 1, q_1(σ^t) = β^t ( Σ_{i∈I} p^i(σ^t) c^i_0 ).

Equilibrium prices coincide with the discounted probabilities of a Bayesian with prior C0.
Because the discount factor is independent of time, different normalization choices do not
affect risk neutral probabilities. Thus, all members of Mq(P, C0, γ = 1, n) coincide with
BPD, which is prequential (e.g. p({R,R,R}) + p({R,R,L}) = p({R,R})) and exchangeable
(e.g. p({L,L,R}) = p({L,R,L}) = p({R,L,L})).

[Tree figure (LOG): unconditional probabilities pNNL_1(σ^t) = pSNNL_1(σ^t) on the left-hand tree; conditional probabilities pNNL_1(σ_t|σ^{t−1}) = pSNNL_1(σ_t|σ^{t−1}) on the right-hand tree.]
Limit linear economy: γ → 0, q_0(σ^t) = β^t max_{i∈P} {p^i(σ^t)}.
• The Normalized Normed Likelihood, pNNL_0(σ^t) = max_i p^i(σ^t) / ∑_{σ^t} max_i p^i(σ^t), is exchangeable (e.g. pNNL_0({L,L,R}) = pNNL_0({L,R,L}) = pNNL_0({R,L,L})) but not prequential (e.g. pNNL_0(R,R,R) + pNNL_0(R,R,L) = 3/10 ≠ 1/3 = pNNL_0(R,R)). Comparing pNNL_0 with pBPD, we see that pNNL_0 attaches more probability to extreme sequences ({L,L,L} and {R,R,R}) than pBPD does.
LIN economy, pNNL_0(σ^t) (unconditional probabilities):
p(L) = p(R) = 1/2; p(LL) = p(RR) = 1/3; p(LR) = p(RL) = 1/6;
p(LLL) = p(RRR) = 1/5; all remaining length-3 sequences have probability 1/10.

LIN economy, pNNL_0(σ_t|σ^{t−1}): not defined, because pNNL_0 is not prequential.
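These pNNL_0 values, and the failure of prequentiality, can be verified with a short sketch (Python, exact arithmetic; the two models of Example 5.1 are hard-coded and `p_nnl0` is our own name for the γ → 0 rule):

```python
from itertools import product
from fractions import Fraction

# Two iid models over states {L, R}: p1(L) = 1/3, p2(L) = 2/3.
P = [{'L': Fraction(1, 3), 'R': Fraction(2, 3)},
     {'L': Fraction(2, 3), 'R': Fraction(1, 3)}]

def lik(model, seq):
    """iid likelihood of a sequence under one model."""
    out = Fraction(1)
    for s in seq:
        out *= model[s]
    return out

def p_nnl0(seq):
    """NNL with gamma -> 0: maximum likelihood, normalized over
    all sequences of the same length."""
    t = len(seq)
    z = sum(max(lik(m, s) for m in P) for s in product('LR', repeat=t))
    return max(lik(m, seq) for m in P) / z

# Exchangeable: permutations of a sequence get the same probability...
assert p_nnl0('LLR') == p_nnl0('LRL') == p_nnl0('RLL') == Fraction(1, 10)
# ...but not prequential: one-step-ahead marginals do not add up.
assert p_nnl0('RRR') + p_nnl0('RRL') == Fraction(3, 10) != p_nnl0('RR') == Fraction(1, 3)
```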
• The Sequential Normalized Normed Likelihood, pSNNL_0(σ_t|σ^{t−1}) = pNNL_0(σ^t) / ∑_{σ_t} pNNL_0(σ^{t−1}, σ_t), is prequential (e.g. pSNNL_0({R,R,R}) + pSNNL_0({R,R,L}) = pSNNL_0({R,R})) but not exchangeable (e.g. pSNNL_0({L,L,R}) = 1/9 ≠ 1/12 = pSNNL_0({L,R,L}) = pSNNL_0({R,L,L})).
The conditional probabilities show that pSNNL_0 “overreacts” to empirical evidence. Unlike pBPD, a single observation in favor of p^1 (p^2) suffices to make the conditional probability of pSNNL_0 coincide with p^1 (p^2) (e.g. pSNNL_0(L|L) = 2/3 = p^2(L) and pSNNL_0(R|R) = 2/3 = p^1(R)).
LIN economy, pSNNL_0(σ^t) (unconditional probabilities):
p(L) = p(R) = 1/2; p(LL) = p(RR) = 1/3; p(LR) = p(RL) = 1/6;
p(LLL) = p(RRR) = 2/9; p(LLR) = p(RRL) = 1/9; p(LRL) = p(LRR) = p(RLL) = p(RLR) = 1/12.

LIN economy, pSNNL_0(σ_t|σ^{t−1}) (conditional probabilities):
p(L) = p(R) = 1/2; p(L|L) = p(R|R) = 2/3; p(R|L) = p(L|R) = 1/3;
p(L|LL) = p(R|RR) = 2/3; p(R|LL) = p(L|RR) = 1/3; p(·|LR) = p(·|RL) = 1/2.
6 Asymptotic performance of price probabilities
6.1 The criterion
In this section, we introduce the efficiency criterion we use to characterize price probabilities' performance. Following an established tradition across fields, our criterion is based on asymptotic likelihood comparisons. On every sequence, we compare the likelihood of a statistical prediction strategy with beliefs set P against the likelihood of BPD with a regular prior¹³ on the same support. The comparison is done on every sequence because the real probability is unknown in most practical cases in which we want to use a prediction strategy (if we knew the real model, we would have no need to find the best predictor). The benchmark is chosen because Bayesian updating is widely known, applied, and appreciated.¹⁴ The criterion is asymptotic to eliminate the small-sample effect of the prior and because small-sample criteria can be misleading.¹⁵
Definition 7. Let pBPD(σt) be the likelihood of a BPD with regular prior on P,
¹³ A prior is regular if it attaches positive mass to every element of the prior support.
¹⁴ Because (if |P| < ∞) BPD has finite worst-case log-regret, a likelihood comparison against BPD is also a way to verify whether a statistical forecasting system possesses this fundamental property.
¹⁵ E.g., Massari (2013) shows that, given two statistical forecasting systems {p^a}, {p^b}, it is not true that if p^a's next-period predictions are infinitely often more accurate than p^b's and never less accurate, then p^a's predictions are more accurate on long sequences in terms of likelihood.
• a statistical forecasting system p with beliefs set P is universal-efficient if

∀σ ∈ S^∞, ln [pBPD(σ^t)/p(σ^t)] ≍ 1;¹⁶

• a statistical forecasting system p with beliefs set P is super-efficient if

∀P ∈ P, ln [pBPD(σ^t)/p(σ^t)] ≍ 1 P-a.s., and
∄P : ln [pBPD(σ^t)/p(σ^t)] → +∞ P-a.s.;
∃P : ln [pBPD(σ^t)/p(σ^t)] → −∞ P-a.s.;

• a statistical forecasting system p with beliefs set P is sub-efficient if

∀P ∈ P, ln [pBPD(σ^t)/p(σ^t)] ≍ 1 P-a.s., and
∃P : ln [pBPD(σ^t)/p(σ^t)] → +∞ P-a.s.;
∄P : ln [pBPD(σ^t)/p(σ^t)] → −∞ P-a.s.
Intuitively, p is universal-efficient if, on every sequence, it is as accurate as the prediction obtained using Bayes' rule with the same prior support. p is super-efficient if it does as well as Bayes' on every sequence and there are probabilities P for which it outperforms it P-a.s.; that is, it is guaranteed to do as well as Bayes' rule and there are cases (when the model is misspecified) in which it does better. p is sub-efficient if there are no sequences on which it is better than Bayes' rule, and there are cases of misspecification in which it is worse.
6.2 Asymptotic performance of NNL
Theorem 1. If the cardinality of P is finite, ∀γ ∈ [0,∞), pNNL_γ is universal-efficient.¹⁷

Theorem 1 tells us that, although not prequential, NNL is as good as BPD in terms of likelihood. If we are only concerned about accuracy, there is no reason to consider prequentiality a fundamental property of rational forecasts. However, non-prequential models are likely to be undesirable in certain economic settings, because an agent with non-prequential beliefs can be put in situations of dynamic arbitrage.
¹⁶ The notation f(x) ≍ g(x) abbreviates lim sup f(x)/g(x) < +∞ and lim inf f(x)/g(x) > 0.
¹⁷ More generally, NNL is universal-efficient in any economy that satisfies A1–A4 with |P| < ∞.
For example, a risk-neutral agent who does not discount the future and whose beliefs are pNNL_0(R,R) = 1/3, pNNL_0(R,R,R) = 2/10 and pNNL_0(R,R,L) = 1/10 (as in Example 5.1) is at time zero indifferent between:
• 1/3$ and a lottery, L1, that pays 1$ if {R,R} realizes, 0$ otherwise;
• 2/10$ and a lottery, L2, that pays 1$ if {R,R,R} realizes, 0$ otherwise;
• 1/10$ and a lottery, L3, that pays 1$ if {R,R,L} realizes, 0$ otherwise.
Selling him L1 for 1/3$ and buying from him L2 and L3 for a total of 3/10$ constitutes an arbitrage: if {R,R} does not realize, we make a profit of 1/3 − 3/10 > 0. If {R,R} does realize, we make the same profit, because we can use the market to pay the dollar we lose in t = 2 with the dollar we win for sure in t = 3 (either {R,R,R} or {R,R,L} happens for sure).
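The arbitrage can be verified path by path. A sketch (Python, exact arithmetic; `price` is our own name for the time-zero pNNL_0 prices of Example 5.1, computed from the two hard-coded iid models, and we assume the intermediate dollar can be financed against the sure t = 3 payoff, as in the text):

```python
from fractions import Fraction
from itertools import product

# Two iid models over {L, R}: p1(L) = 1/3, p2(L) = 2/3.
P = [{'L': Fraction(1, 3), 'R': Fraction(2, 3)},
     {'L': Fraction(2, 3), 'R': Fraction(1, 3)}]

def lik(m, seq):
    out = Fraction(1)
    for s in seq:
        out *= m[s]
    return out

def price(seq):
    """Time-zero price of $1 paid if `seq` realizes: pNNL0(seq) for a
    risk-neutral, non-discounting agent."""
    z = sum(max(lik(m, s) for m in P)
            for s in product('LR', repeat=len(seq)))
    return max(lik(m, seq) for m in P) / z

# Sell the t=2 claim on {R,R}; buy the t=3 claims on {R,R,L} and {R,R,R}.
upfront = price('RR') - price('RRL') - price('RRR')
assert upfront == Fraction(1, 30)      # 1/3 - 1/10 - 1/5 > 0

# In every path the claim payoffs cancel: the $1 owed at t=2 is exactly
# offset by the $1 collected at t=3, so the upfront profit is riskless.
for path in map(''.join, product('LR', repeat=3)):
    owed = path.startswith('RR')       # L1 pays out against us at t=2
    received = path in ('RRL', 'RRR')  # L2 or L3 pays us at t=3
    assert owed == received
```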
Importantly, this arbitrage opportunity arises only if NNL is used in markets that allow for both time-zero and sequential trading. An arbitrage can be constructed against a trader with NNL beliefs only because his beliefs correspond to a state of mind in which trade can occur only at time 0. If he knew his final horizon t and were given the possibility to trade sequentially, then he could use his NNL at t to construct a set of prequential conditional probabilities via backward induction and avoid arbitrages.
This procedure is equivalent to the “massaging” process described in Savage (1954) and Binmore (2008) as sufficient to deduce a Bayesian prior from the set of subjective relative likelihoods of an agent. Incidentally, Example 5.1 can be used to demonstrate that “massaging” is not sufficient to imply Bayes' rule. “Massaging” pNNL_0 forward, we obtain SNNL, which is prequential but not Bayesian (because not exchangeable). Fixing the final horizon and “massaging” pNNL_0 backward, the resulting measure is prequential and exchangeable, but still not Bayesian.¹⁸
¹⁸ Lemma: A prior that makes pNNL_0(σ^3) in Example 5.1 consistent with Bayes' rule does not exist. Proof: By symmetry of pNNL_0(σ^3), if this prior existed it would have to give the same weight to models p^1 and p^2. The unconditional probabilities obtained via Bayes' rule from this prior coincide with pNNL_1(σ^3) ≠ pNNL_0(σ^3), a contradiction.
The possibility of making this type of intertemporal arbitrage should not be too surprising. The price that both parties are willing to pay for an interest rate swap, which exchanges a long-term interest rate for a sequence of short-term interest rates, can be interpreted as representing this type of arbitrage.
6.3 Asymptotic performance of SNNL
Theorem 2. If the cardinality of P is finite,
i) γ > 1 ⇒ pSNNL_γ is super-efficient;
ii) γ = 1 ⇒ pSNNL_γ is universal-efficient;
iii) γ < 1 ⇒ pSNNL_γ is sub-efficient.
Theorem 2 tells us that a statistical forecasting system that is not exchangeable is as good as Bayes' rule whenever the model is correctly specified (SNNL is, at least, sub-efficient). Furthermore, it shows that there are cases in which a non-exchangeable model that underreacts to empirical evidence can significantly outperform Bayes' rule. If we are only concerned about accuracy, there is no reason to consider exchangeability a fundamental property of rational forecasts. Moreover, because underreacting rules never underperform and can significantly improve upon Bayes', a Pascal's-wager argument (Pascal (1668)) suggests that underreacting rules should be pragmatically preferred to Bayes' unless we are absolutely certain that our model is correctly specified.¹⁹
The relation between NNL and SNNL is qualitatively the same as that between NML and SNML. Each next-period forecast of SNNL corresponds to the last-period marginal distribution of the corresponding NNL probability. Thus, SNNL can be thought of as a compromise to make NNL recursive. This interpretation makes the super-efficiency part of Theorem 2 even more surprising. It shows that a forecaster can perform significantly better by precommitting to use a recursive method even when he knows the final horizon of his prediction task. Because a recursive method does not use the length of the sequence it is forecasting as an input, this result illustrates a case in which ignoring relevant information increases prediction accuracy.
¹⁹ If the model is correctly specified, the loss in the log-likelihood ratio we incur by underreacting is finite. If the model is misspecified, the gain in the log-likelihood ratio we might get is infinite. Thus, an arbitrarily small risk of misspecification is enough to recommend an underreacting rule over Bayes'.
The following example highlights the effect of γ on SNNL. The two cases illustrate that underreaction (γ > 1) can significantly outperform BPD while never underperforming it, whereas overreaction (γ < 1) can significantly underperform BPD while never outperforming it.
Example 1: consider the SNNLs obtained from three iCRRA-economies E0, E1, E2 with two states (S = {a, b}); two traders with iid beliefs p^1(a) = 1/3 = p^2(b), (P = {p^1, p^2}); initial consumption C_0 = [1/2, 1/2]′; and CRRA parameters γ_0 = 0, γ_1 = 1 and γ_2 = 2, respectively.
Case a: The real probability, P, is degenerate: it gives probability 1 to the alternating sequence {a, b, a, ...}. Because both models are equally (in)accurate, the best prediction is to give equal weight to both models in every period (as C_0 does). Thus, this is the most favorable case for forecasting systems that underreact to empirical evidence. By Eq. 5:
pSNNL_0(a_t|σ^{t−1}) = pNNL_0(σ^t) / ∑_{σ_t} pNNL_0(σ^{t−1}, σ_t) = 1/2 if t even, 2/3 if t odd;

pSNNL_1(a_t|σ^{t−1}) = ∑_{i∈I} p^i(a) p^i(σ^{t−1}) c^i_0 / ∑_{i∈I} p^i(σ^{t−1}) c^i_0 = 1/2 if t even, 5/9 if t odd;

pSNNL_2(a_t|σ^{t−1}) = [∑_{i∈I} p^i(a)^{1/2} p^i(σ^{t−1})^{1/2} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/2} c^i_0]^2 / ∑_{σ_t} [∑_{i∈I} p^i(σ_t)^{1/2} p^i(σ^{t−1})^{1/2} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/2} c^i_0]^2 = 1/2 if t even, 9/17 if t odd.
Thus, on {a, b, a, ...}, ∀α ∈ (0, 1) (up to a bounded factor accounting for odd t):

pSNNL_0(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (1/3)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] → 0;

pSNNL_1(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (4/9)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] ≍ 1;

pSNNL_2(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (8/17)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] → +∞.
Case a shows that, by underreacting, pSNNL_2 produces predictions that are closer to the empirical frequency than those of pSNNL_1 and pSNNL_0, and thus more accurate.
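The divergences in Case a can be illustrated numerically by chaining the SNNL conditionals along a long alternating sequence. A sketch (Python, floating point; the Example 1 setup is hard-coded, and the numeric thresholds are illustrative, not part of the paper's statements):

```python
import math

P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def log_snnl(seq, gamma):
    """log-likelihood of SNNL_gamma: chain the renormalized one-step
    conditionals of q(sigma^t) = (sum_i p^i(sigma^t)^(1/gamma) c^i_0)^gamma."""
    def q(s):
        return sum(c * lik(m, s) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma
    total = 0.0
    for t in range(1, len(seq) + 1):
        total += math.log(q(seq[:t]) / sum(q(seq[:t - 1] + x) for x in 'ab'))
    return total

def log_bpd(seq, alpha=0.5):
    return math.log(alpha * lik(P[0], seq) + (1 - alpha) * lik(P[1], seq))

alternating = 'ab' * 200               # Case a: the posterior cannot concentrate
def gap(g):
    return log_snnl(alternating, g) - log_bpd(alternating)

assert gap(2.0) > 5                    # underreaction: ratio diverges upward
assert gap(0.5) < -5                   # overreaction: ratio diverges downward
assert abs(gap(1.0)) < 1e-6            # gamma = 1 is exactly Bayes
```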
Case b: The real probability, P, is degenerate: it gives probability 1 to the sequence {a, a, a, ...}. Because p^2 is clearly the best model, case b is the most favorable for forecasting systems that overreact to empirical evidence.
On {a, a, a, ...}, ∀α ∈ (0, 1):

pSNNL_0(σ^t)/pBPD_α(σ^t) = (1/2)(2/3)^{t−1} / [α(2/3)^t + (1−α)(1/3)^t] ≍ 1;

pSNNL_1(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^t + (1/2)(1/3)^t] / [α(2/3)^t + (1−α)(1/3)^t] ≍ 1;

pSNNL_2(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^{t/γ} + (1/2)(1/3)^{t/γ}]^γ / [α(2/3)^t + (1−α)(1/3)^t] · e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))} ≍ 1.²⁰
Case b shows that, although pSNNL_0 immediately identifies the best model, the other rules converge to the best model fast enough not to compromise their asymptotic likelihood performance: pSNNL_2 does not perform significantly worse than pSNNL_0.
These examples suggest that non-concentration of the Bayesian posterior plays a special role in determining the (sub/super-)efficiency condition. This is indeed the case:
Proposition 5. For every regular prior C_0 on a finite support P and γ ∈ (0,∞) \ {1},
i) lim ln [pSNNL(σ^t)/pBPD(σ^t)] = ±∞ on every path on which pBPD's posterior does not concentrate;
ii) ln [pSNNL(σ^t)/pBPD(σ^t)] ≍ 1 on every path on which pBPD's posterior concentrates exponentially fast.
Proposition 5 generalizes the finding of Massari (2015a) about the inefficiency of the Follow the Leader Strategy (i.e. pSNNL with γ = 0). It shows that, in those cases in which the posterior does not concentrate (a condition weaker than the “weak leader” condition that determines the inefficiency of the FLS), any amount of overreaction is detrimental while any amount of underreaction is desirable. Known asymptotic results in Bayesian statistics²¹ allow us to use Proposition 5 to easily recognize the probabilities that determine the (sub/super-)efficiency condition.²² For example, Proposition 5 can be used to analyze:
²⁰ By Proposition 5, pSNNL_2(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^{t/γ} + (1/2)(1/3)^{t/γ}]^γ / [α(2/3)^t + (1−α)(1/3)^t] · e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))} ≍ 1.
²¹ If |P| < ∞, the Bayesian posterior does not concentrate if and only if there is more than one model with the same expected log-likelihood; otherwise, it concentrates exponentially fast in most standard settings (e.g. if members of P are either iid or conditionally iid).
²² Proposition 5 enormously simplifies this task. Even when traders' beliefs and the real measure are iid, the non-exchangeability of SNNL translates into a path-dependent dynamic of the posterior distribution of SNNL.
Case c: The real probability is iid: ∀t, P(a_t) = 1/2. Because p^1 and p^2 are equally (in)accurate, it is easy to show that the Bayesian posterior does not concentrate (e.g. Massari (2013)). Proposition 5 implies

pSNNL_0(σ^t)/pBPD(σ^t) → 0 P-a.s.,
pSNNL_1(σ^t)/pBPD(σ^t) ≍ 1 P-a.s.,
pSNNL_2(σ^t)/pBPD(σ^t) → +∞ P-a.s.;

the intuition goes as follows.
• pSNNL_1(σ_t|σ^{t−1}) coincides, in every period, with pBPD_{α=.5}(σ_t|σ^{t−1}); hence, it does as well as a Bayesian with a regular prior on P;
• pSNNL_2(σ_t|σ^{t−1}) smoothly oscillates between p^1 and p^2, but it spends more time “close to the middle” than pBPD_{α=.5}(σ_t|σ^{t−1}) does, because it underreacts to empirical evidence. Because the real distribution lies between p^1 and p^2, spending “more time close to the middle” makes pSNNL_2(σ_t|σ^{t−1})'s forecasts more accurate than Bayes';
• pSNNL_0(σ_t|σ^{t−1}) changes its forecasts discontinuously every time the model that performed best in the past changes. Thus, it spends less time “close to the middle” than pBPD does. Because the real distribution lies between p^1 and p^2, spending “less time close to the middle” makes pSNNL_0(σ_t|σ^{t−1})'s forecasts less accurate than Bayes'.
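The “closer to the middle” intuition is easy to verify on a single observation: the one-step forecast is monotonically pulled back toward 1/2 as γ grows. A sketch (Python, floating point; the Example 1 setup is hard-coded, and the grid of γ values is illustrative):

```python
import math

# Two iid models on {a, b} (Example 1): p1(a) = 1/3, p2(a) = 2/3.
P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def q(seq, gamma):
    # Eq. 12 with beta = 1
    return sum(c * lik(m, seq) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma

def snnl_cond(s, hist, gamma):
    return q(hist + s, gamma) / sum(q(hist + x, gamma) for x in 'ab')

# After a single 'a', every rule leans toward p2, but larger gamma
# (more underreaction) stays closer to 1/2, the real measure of Case c.
forecasts = [snnl_cond('a', 'a', g) for g in (0.5, 1.0, 2.0, 5.0)]
assert all(f > 0.5 for f in forecasts)
assert forecasts == sorted(forecasts, reverse=True)
assert math.isclose(forecasts[1], 5 / 9) and math.isclose(forecasts[2], 9 / 17)
```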
7 Conclusion
This paper uses the standard machinery of a dynamic general equilibrium model to generate
a rich class of probabilities and to discuss their properties. These rules are consistent with
known behavioral biases such as (over)underreaction and time inconsistency.
If the prior support contains a unique most accurate model, all price probabilities are mutually absolutely continuous with the probabilities obtained via Bayes' rule from a regular prior on the same support. However, if the prior support contains more than one most accurate model, a mild form of underreaction significantly improves the accuracy of the forecasts and outperforms the Bayesian standard. Our results suggest that underreaction not only is rational (because underreacting models are as good as Bayesian ones in terms of likelihood) but also constitutes a pragmatically valid alternative to Bayes' rule.
7.1 Future work
Our work is still preliminary. For these rules to constitute a real alternative to Bayes' rule, we should extend our analysis beyond the finite-prior-support case and study their performance in less dramatic cases of misspecification. Regarding this second point, Grunwald's (2012) findings on the hyper-compression that the Safe Bayesian can deliver when the misspecified model class is not convex (e.g. the family of linear regressions with k regressors, when the real data-generating process is heteroskedastic) suggest that the super-efficiency of SNNL might also hold in this relatively common case of misspecification. Finally, we should modify SNNL to search for its optimal learning rate (γ) on-line. It is easy to show that if the posterior does not concentrate, larger values of γ deliver more accurate forecasts.
However, this improvement in accuracy comes at a cost. Although, for every 1 < γ < ∞, the asymptotic log-likelihood ratio between pBPD and pSNNL is bounded above, this ratio increases monotonically in γ when the model is correctly specified (a slower learning rate implies a slower convergence rate). If the model is misspecified and the posterior does converge, the relative position of the projection of P on P determines whether larger values of γ improve or deteriorate SNNL's accuracy. In hindsight, we would like to use a small γ when past data clearly suggest the best model and a large one otherwise, and we would like to do this on-line. This line of reasoning is consistent with the Safe Bayesian approach (Grunwald (2012)) and the Flip-Flop algorithm (De Rooij et al. (2014)). The main difficulty is implementing this intuition within the equations governing SNNL.
A Appendix
The notation f(x) ≍ g(x) stands for lim sup f(x)/g(x) < +∞ and lim inf f(x)/g(x) > 0.
Lemma 2. In an economy that satisfies A1–A4, equilibrium prices are given by:

q(σ^t) = β^t [∑_{i∈I} p^i(σ^t) / u_i'(c^i_0)] / [∑_{j∈I} 1 / u_j'(c^j_t(σ))]   (7)

Proof. The Lagrangian associated with each trader's maximization problem is

L^i = E_{p^i}[∑_{t=0}^{∞} β^t u_i(c^i_t(σ))] + λ_i (∑_{t=0}^{∞} ∑_{σ^t∈S^t} q(σ^t)(c^i_t(σ) − e^i_t(σ))).

Setting the derivatives of this Lagrangian to zero gives, ∀σ, ∀t,

∂L^i/∂c^i_t(σ) = 0 ⇒ β^t p^i(σ^t) u_i'(c^i_t(σ)) = λ_i q(σ^t).   (8)

Letting q_0 = 1 (the price of one unit of consumption at t = 0 equals one), we find that λ_i = u_i'(c^i_0); the result follows by summing over traders and rearranging.
Proof of Propositions 1 and 2:

Proof. Substituting c^i_t(σ)^{−γ} for u_i'(c^i_t(σ)) in Equation 8, and recalling that λ_i = u_i'(c^i_0) = (c^i_0)^{−γ}:

β^t p^i(σ^t) c^i_t(σ)^{−γ} = (c^i_0)^{−γ} q(σ^t).   (9)

Taking the ratio of traders i and j's first-order conditions, β^t p^i(σ^t) c^i_t(σ)^{−γ} / [β^t p^j(σ^t) c^j_t(σ)^{−γ}] = (c^i_0)^{−γ} q(σ^t) / [(c^j_0)^{−γ} q(σ^t)], and solving for c^i_t(σ):

c^i_t(σ) = [p^i(σ^t)/p^j(σ^t)]^{1/γ} (c^i_0/c^j_0) c^j_t(σ).   (10)

Substituting Equation 10 into the market-clearing condition (which holds with equality because of the monotonicity of u_i), 1 = ∑_{i∈I} c^i_t(σ) = c^j_t(σ) ∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0 / [p^j(σ^t)^{1/γ} c^j_0], and solving for c^j_t(σ):

c^j_t(σ) = p^j(σ^t)^{1/γ} c^j_0 / ∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0.   (11)

Substituting c^j_t(σ) into Equation 9 and rearranging, we obtain

q(σ^t) = β^t (∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0)^γ.   (12)

The result follows by substituting Equation 12 into the definitions of NNL and SNNL, respectively.
Lemma 3. In an iCRRA-economy that satisfies A1–A4:

∀σ ∈ S^∞, ∀t ∈ {1, ..., +∞}, γ'' ≥ 1 ≥ γ' ⇒ ∑_{σ_t∈S} q_{γ'}(σ_t|σ^{t−1})/β ≥ 1 ≥ ∑_{σ_t∈S} q_{γ''}(σ_t|σ^{t−1})/β,

with equality if and only if γ = 1 or the consumption-share distribution is degenerate.
Proof. Using Equation 12 and the definition of q(σ_t|σ^{t−1}),

∑_{σ_t∈S} q_γ(σ_t|σ^{t−1})/β = ∑_{σ_t∈S} (∑_{i∈I} p^i(σ^{t−1}, σ_t)^{1/γ} c^i_0)^γ / (∑_{i∈I} p^i(σ^{t−1})^{1/γ} c^i_0)^γ
= ∑_{σ_t∈S} [∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} · p^i(σ^{t−1})^{1/γ} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/γ} c^i_0]^γ
= ∑_{σ_t∈S} [∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)]^γ   (by Eq. 11)
= ∑_{σ_t∈S} f(∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)),   with f(x) = x^γ.

Note that ∑_{i∈I} c^i_{γ,t−1}(σ) = 1, and f(·) is strictly concave ⇔ γ < 1, linear ⇔ γ = 1, and strictly convex ⇔ γ > 1. Let γ > 1; by Jensen's inequality:

∑_{σ_t∈S} f(∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)) ≤ ∑_{σ_t∈S} ∑_{i∈I} c^i_{γ,t−1}(σ) f(p^i(σ_t|σ^{t−1})^{1/γ})   (equality iff ∃i : c^i_{γ,t−1}(σ) = 1)
= ∑_{i∈I} c^i_{γ,t−1}(σ) ∑_{σ_t∈S} p^i(σ_t|σ^{t−1}) = 1,   because ∀i, ∑_{σ_t∈S} p^i(σ_t|σ^{t−1}) = 1.

The cases γ = 1 and γ < 1 can be proven using the same logic.
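Lemma 3 can be sanity-checked numerically in the two-model economy of Example 1 (a sketch in Python, floating point, with β = 1; the history `'a'` gives non-degenerate consumption shares, so the inequalities are strict):

```python
import math

P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def q(seq, gamma):
    # Eq. 12: q(sigma^t) = (sum_i p^i(sigma^t)^(1/gamma) c^i_0)^gamma
    return sum(c * lik(m, seq) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma

def cond_mass(hist, gamma):
    """Sum over the next state of q(sigma_t | sigma^(t-1)), beta = 1."""
    return sum(q(hist + s, gamma) for s in 'ab') / q(hist, gamma)

hist = 'a'  # non-degenerate consumption shares after one observation
assert cond_mass(hist, 0.5) > 1             # gamma < 1: super-additive
assert math.isclose(cond_mass(hist, 1.0), 1.0)  # gamma = 1: a probability
assert cond_mass(hist, 2.0) < 1             # gamma > 1: sub-additive
```

This is exactly why SNNL must renormalize the conditionals whenever γ ≠ 1.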
Proof of Lemma 1

Proof.

pSNNL(σ_t|σ^{t−1}) = q(σ_t|σ^{t−1}) / ∑_{σ_t} q(σ_t|σ^{t−1})
= q(σ_t|σ^{t−1}) · [q(σ^{t−1}) / ∑_{σ^t} q(σ^t)] · [∑_{σ^t} q(σ^t) / q(σ^{t−1})] · [1 / ∑_{σ_t} q(σ_t|σ^{t−1})]
= [q(σ^t) / ∑_{σ^t} q(σ^t)] · 1 / [∑_{σ_t} q(σ^{t−1}, σ_t) / ∑_{σ^t} q(σ^t)]
= pNNL(σ^t) / ∑_{σ_t} pNNL(σ^{t−1}, σ_t).
Proof of Proposition 3

Proof. By Eq. 11, the consumption of the trader with maximum likelihood on σ^t, i(σ^t), is given by c^{i(σ^t)}_t(σ) = p^{i(σ^t)}(σ^t)^{1/γ} c^i_0 / ∑_{j∈I} p^j(σ^t)^{1/γ} c^j_0. Claims i, ii, iii follow by noticing that pSNNL_{γ=1} = pBPD and that a higher γ corresponds to a lower weight on c^{i(σ^t)}_t:

∂c^{i(σ^t)}_t(σ)/∂γ = − c^i_0 p^i(σ^t)^{1/γ} ∑_{j≠i(σ^t)} c^j_0 p^j(σ^t)^{1/γ} ln[p^i(σ^t)/p^j(σ^t)] / [γ² (∑_{j∈I} p^j(σ^t)^{1/γ} c^j_0)²] < 0.
Proof of Proposition 4:

Proof. (a): In a log economy pNNL = pSNNL = pBPD, which is prequential and exchangeable.

(b): i) Exchangeable by construction, because the denominator in Eq. 7 is constant on every horizon. Not prequential: by contradiction, assume H0: pNNL is prequential for all γ ∈ [0,∞].

∀σ^t, q(σ^t)/q(σ^{t−1}) = q(σ_t|σ^{t−1})   (equilibrium condition)
⇔ ∀σ^t, pNNL(σ^t)/pNNL(σ^{t−1}) = [q(σ^t)/∑_{σ^t} q(σ^t)] / [q(σ^{t−1})/∑_{σ^{t−1}} q(σ^{t−1})] = q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ ∀σ^{t−1}, ∑_{σ_t} pNNL(σ^{t−1}, σ_t)/pNNL(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ (if H0 is true) ∀σ^{t−1}, 1 = ∑_{σ_t} q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ ∀σ^{t−1}, ∑_{σ^t} q(σ^t) / ∑_{σ^{t−1}} q(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ^{t−1})
⇔ ∀σ^{t−1}, σ̃^{t−1}, ∑_{σ_t} q(σ_t|σ^{t−1}) = ∑_{σ^t} q(σ^t) / ∑_{σ^{t−1}} q(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ̃^{t−1})
⇔ ∀σ^{t−1}, σ̃^{t−1}, ∑_{σ_t} [∑_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} p^i(σ^{t−1})^{1/γ} / ∑_{j∈P} p^j(σ^{t−1})^{1/γ}]^γ = ∑_{σ_t} [∑_{i∈P} p^i(σ_t|σ̃^{t−1})^{1/γ} p^i(σ̃^{t−1})^{1/γ} / ∑_{j∈P} p^j(σ̃^{t−1})^{1/γ}]^γ,

which holds if and only if γ = 1 (i.e. all traders have log utility), contradicting H0.

ii) Prequential by construction (because the measure is constructed recursively in a forward fashion). Not exchangeable: by De Finetti's (1931) theorem, a measure on infinite sequences is exchangeable if and only if it can be represented as a mixture of iid measures, if and only if there exists a prior such that it coincides with BPD. The (sub/super-)efficiency of SNNL proved in Theorem 2 implies that no such prior exists if γ ≠ 1; thus SNNL is not exchangeable.
Lemma 4. In an economy that satisfies A1–A4, ∀σ ∈ S^∞: ∑_{i∈I} 1/u_i'(c^i_t(σ)) ≍ 1.

Proof.
• ∀σ ∈ S^∞, lim sup ∑_{i∈I} 1/u_i'(c^i(σ)) ≤ max_{[c^1,...,c^I]} ∑_{i∈I} 1/u_i'(c^i) < |I| max_i 1/u_i'(1) < ∞, because market clearing implies max_i c^i ≤ 1, and A1 implies ∀i, max_{c≤1} 1/u_i'(c) = 1/u_i'(1) < ∞.
• ∀σ ∈ S^∞, lim inf ∑_{i∈I} 1/u_i'(c^i_t(σ)) ≥ min_{[c^1,...,c^I]} ∑_{i∈I} 1/u_i'(c^i) > 0, because ∑_{i∈I} 1/u_i'(c^i) = 0 if and only if ∀i, u_i'(c^i) = ∞ ⇔ (by A1) ∀i, c^i = 0, which violates market clearing (∀t, ∑_{i∈I} c^i = ∑_{i∈I} e^i = 1, by A3).
Proof of Theorem 1:

Proof. Substituting Equation 7 into the definition of NNL:

pNNL(σ^t) = q(σ^t) / ∑_{σ^t} q(σ^t) = [β^t ∑_{i∈I} p^i(σ^t) c^i_0 / ∑_{i∈I} 1/u_i'(c(σ^t))] / [β^t ∑_{σ^t∈S^t} (∑_{i∈I} p^i(σ^t) c^i_0 / ∑_{i∈I} 1/u_i'(c(σ^t)))].

Let a = lim inf ∑_{i∈I} 1/u_i'(c(σ^t)) and b = lim sup ∑_{i∈I} 1/u_i'(c(σ^t)); by Lemma 4, 0 < a ≤ b < ∞, thus

pNNL(σ^t) ∈ [ (β^t ∑_{i∈I} p^i(σ^t) c^i_0 / b) / (β^t ∑_{σ∈S^t} ∑_{i∈I} p^i(σ^t) c^i_0 / a), (β^t ∑_{i∈I} p^i(σ^t) c^i_0 / a) / (β^t ∑_{σ∈S^t} ∑_{i∈I} p^i(σ^t) c^i_0 / b) ]

⇒ pNNL(σ^t) ∈ [ (a/b) ∑_{i∈I} p^i(σ^t) c^i_0, (b/a) ∑_{i∈I} p^i(σ^t) c^i_0 ]

⇔ ln [pBPD(σ^t)/pNNL(σ^t)] ≍ 1.
Proof of Theorem 2:

Proof. Rewrite ln [pBPD(σ^t)/pSNNL(σ^t)] as follows:

ln pBPD(σ^t)/pSNNL(σ^t) = ∑_{τ=1}^{t} ∑_{σ_τ} I_{σ_τ} ln [ pBPD(σ_τ|σ^{τ−1}) / (q(σ_τ|σ^{τ−1}) / ∑_{σ_τ} q(σ_τ|σ^{τ−1})) ] = ln [β^t pBPD(σ^t)/q(σ^t)] + ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β).

The result follows from these two claims:

• Claim 1: ln [β^t pBPD(σ^t)/q(σ^t)] ≍ 1:
By Eq. 7, q(σ^t) = β^t [∑_{i∈I} p^i(σ^t)/u_i'(c^i_0)] / [∑_{j∈I} 1/u_j'(c^j(σ^t))] ≍ (by Lem. 4) β^t ∑_{i∈I} p^i(σ^t)/|I| ≍ β^t pBPD(σ^t).

• Claim 2: ∃P : γ < (>) 1 ⇒ ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → +(−)∞.
Note that:
i) ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → +(−)∞ iff the consumption shares C_{t,γ} do not concentrate on a unique trader fast enough.
Proof: By Lem. 3 (Jensen's inequality), γ < (>) 1 ⇔ ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) ≥ (≤) 0, with equality iff the consumption-share distribution is degenerate. Thus, given γ, all terms of the sum have the same sign.

ii) pBPD's posterior C_{t,γ=1} does not concentrate on a unique model iff, ∀γ ∈ (0,+∞), C_{t,γ} does not concentrate on a unique trader. Proof:²³

C_{t,γ=1} does not concentrate
⇔ ∃η > 0, ∃i, j ∈ P : lim sup p^i(σ^t)/∑_{i∈P} p^i(σ^t) > η and lim sup p^j(σ^t)/∑_{i∈P} p^i(σ^t) > η
⇔ ∃η_γ > 0 : lim sup p^i(σ^t)^{1/γ}/∑_{i∈P} p^i(σ^t)^{1/γ} = c^i_{γ,t}(σ) > η_γ and lim sup p^j(σ^t)^{1/γ}/∑_{i∈P} p^i(σ^t)^{1/γ} = c^j_{γ,t}(σ) > η_γ
⇔ C_{t,γ} does not concentrate.

iii) Because P has finitely many models, there exists a sequence on which the two most accurate models in P have comparable likelihoods (it can be constructed recursively by choosing the next realization to favor the model with the lower likelihood). Alternatively, a non-degenerate measure that satisfies this condition can be constructed using Chernoff's bound (Cover and Thomas (2012)).
Proof of Proposition 5:

Proof. As in the proof of Th. 2: ln [pBPD(σ^t)/pSNNL(σ^t)] = ln [β^t pBPD(σ^t)/q(σ^t)] + ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β).

By Claim 1 in the proof of Th. 2, ln [β^t pBPD(σ^t)/q(σ^t)] ≍ 1. For the second term we have:
• Part i) mimics the steps of Th. 2, except that non-concentration is now assumed.
• Part ii) follows because, if the concentration rate is exponential, a Taylor expansion ensures that ∃ 0 < a < b < 1 : |ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β)| ∈ [a^τ, b^τ], so that ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) ≍ 1.
For a practical example of the details, consider Example 1:

e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))}
= exp{ −∑_{τ=1}^{t} ln [ ((1/2)(2/3)^{τ/γ} + (1/2)(1/3)^{τ/γ})^γ + ((1/2)(1/3)^{1/γ}(2/3)^{(τ−1)/γ} + (1/2)(2/3)^{1/γ}(1/3)^{(τ−1)/γ})^γ ] / ((1/2)(2/3)^{(τ−1)/γ} + (1/2)(1/3)^{(τ−1)/γ})^γ }
= exp{ −∑_{τ=1}^{t} ln [ ((2/3)^{1/γ} + (1/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ + ((1/3)^{1/γ} + (2/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ ] / (1 + (1/2)^{(τ−1)/γ})^γ }.
²³ The proof differs slightly for γ = 0 because we need the stronger condition that the model with the highest likelihood changes infinitely often to ensure ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → ±∞ (see Massari (2015a)).
Taylor expanding the two terms in the numerator around (2/3)^{1/γ} and (1/3)^{1/γ}, and the term in the denominator around 1, respectively, it follows that ∃η ∈ (0, 1/2):

exp{ −∑_{τ=1}^{t} ln [ ((2/3)^{1/γ} + (1/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ + ((1/3)^{1/γ} + (2/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ ] / (1 + (1/2)^{(τ−1)/γ})^γ } ∈ [ e^{−∑_{τ=1}^{t}(1/2+η)^τ}, e^{−∑_{τ=1}^{t}(1/2−η)^τ} ] ≍ 1.
References
Araujo, A. and Sandroni, A. (1999). On the convergence to homogeneous expectations when markets arecomplete. Econometrica, 67(3):663–672.
Binmore, K. (2008). Rational decisions. Princeton University Press.
Blume, L. and Easley, D. (1993). Economic natural selection. Economics Letters, 42(2):281–289.
Blume, L. and Easley, D. (2006). If you’re so smart, why aren’t you rich? belief selection in complete andincomplete markets. Econometrica, 74(4):929–966.
Blume, L. and Easley, D. (2009). The market organism: long-run survival in markets with heterogeneoustraders. Journal of Economic Dynamics and Control, 33(5):1023–1035.
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356):791–799.
Cover, T. M. and Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.
Dawid, A. P. (1984). Present position and potential developments: Some personal views: Statistical theory:The prequential approach. Journal of the Royal Statistical Society. Series A (General), pages 278–292.
Dawid, A. P. (1985). Calibration-based empirical probability. The Annals of Statistics, pages 1251–1274.
De Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti del Congresso Internazionaledei matematici, Bologna.
De Rooij, S., Van Erven, T., Grunwald, P. D., and Koolen, W. M. (2014). Follow the leader if you can,hedge if you must. The Journal of Machine Learning Research, 15(1):1281–1316.
Epstein, L. G. (2006). An axiomatic model of non-Bayesian updating. Review of Economic Studies, pages 413–436.

Epstein, L. G. and Le Breton, M. (1993). Dynamically consistent beliefs must be Bayesian. Journal of Economic Theory, 61(1):1–22.

Epstein, L. G., Noor, J., and Sandroni, A. (2008). Non-Bayesian updating: a theoretical framework. Theoretical Economics, 3(2):193–229.
Foster, D. P. and Vohra, R. (1999). Regret in the on-line decision problem. Games and Economic Behavior,29(1):7–35.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an appli-cation to boosting. Journal of computer and system sciences, 55(1):119–139.
Friedman, M. (1953). Essays in positive economics, volume 231. University of Chicago Press.
Ghirardato, P. (2002). Revisiting savage in a conditional world. Economic Theory, 20(1):83–92.
Gilboa, I. and Marinacci, M. (2011). Ambiguity and the bayesian paradigm. Chapter, 7:179–242.
Grunwald, P. (2007). The minimum description length principle. MIT press.
Grunwald, P. (2012). The Safe Bayesian. In Algorithmic Learning Theory, pages 169–183. Springer.
Kahneman, D. (2011). Thinking, fast and slow. Macmillan.
Kets, W., Pennock, D. M., Sethi, R., and Shah, N. (2014). Betting strategies, market selection, and thewisdom of crowds. Market Selection, and the Wisdom of Crowds (February 20, 2014).
Kolmogorov, A. (1933). Grundbegriffe der Wahrscheinlichkeitstheorie. Ergebnisse Mathematische, 2.

Lehrer, E. and Teper, R. (2016). Who is a Bayesian? Working paper.
Ljungqvist, L. and Sargent, T. J. (2004). Recursive macroeconomic theory. MIT press.
Massari, F. (2013). Comment on if you’re so smart, why aren’t you rich? belief selection in complete andincomplete markets. Econometrica, 81(2):849–851.
Massari, F. (2015a). Do not follow a weak leader. Available at SSRN 2663223.
Massari, F. (2015b). Market selection in large economies: A matter of luck. Available at SSRN 2559468.
Pascal, B. (1668). Pascal’s Pensees (English translation by John Walker). Available online athttp://www.gutenberg.org/files/18269/18269-h/18269-h.htm.
Peleg, B. and Yaari, M. E. (1970). Markets with countably many commodities. International EconomicReview, 11(3):369–377.
Rabin, M. et al. (2000). Inference by believers in the law of small numbers. Institute of Business andEconomic Research.
Rissanen, J. (1986). Stochastic complexity and modeling. The Annals of Statistics, pages 1080–1100.
Roos, T. and Rissanen, J. (2008). On sequentially normalized maximum likelihood models. Workshop onInformation Theoretic Methods in Science and Engineering.
Rubinstein, M. (1974). An aggregation theorem for securities markets. Journal of Financial Economics,1(3):225–244.
Sandroni, A. (2000). Do markets favor agents able to make accurate predictions? Econometrica, 68(6):1303–1341.
Savage, L. J. (1954). The foundations of statistics. John Wiley & Sons.
Shtar’kov, Y. M. (1987). Universal sequential coding of single messages. Problemy Peredachi Informatsii,23(3):3–17.
Vovk, V. G. (1990). Aggregating strategies. In Proc. Third Workshop on Computational Learning Theory,pages 371–383. Morgan Kaufmann.