Price Probabilities:
A class of Bayesian and non-Bayesian
prediction rules
Filippo Massari
School of Banking and Finance, Australian School of Business
February 19, 2016
Abstract
We use the standard machinery of dynamic general equilibrium models to generate
a rich class of probabilities and to discuss their properties. This class includes proba-
bilities consistent with Bayes’ rule and known non-Bayesian rules. If the prior support
is correctly specified, we prove that all members of this class are as good as Bayes’
rule in terms of likelihood. If it is misspecified, we demonstrate that those rules that
underreact to new information can significantly outperform Bayes’. Because under-
reaction is never worse and sometimes better than Bayes’, we question the common
opinion that Bayes’ rule is the only rational way to learn and propose a valid alternative.
Keywords: Prediction markets, Non-Bayesian Learning, Market Selection, MDL
1 Introduction
It has long been argued that financial markets aggregate the different opinions of their par-
ticipants efficiently. According to the market selection hypothesis, prices become accurate
because traders with incorrect beliefs eventually lose all their wealth to accurate traders
(Friedman (1953)). In models in which the selection hypothesis holds, standard economic
arguments imply that the risk neutral probabilities converge to the most accurate trader's
beliefs (Sandroni (2000), Blume and Easley (2006)). In this paper, we characterize this
dynamic.
Given traders’ beliefs and time zero consumption-shares distribution, different preferences
and normalization horizons determine different risk neutral probabilities. For example, if all
traders have log utility, risk neutral probabilities coincide with the probabilities obtained
via Bayes' rule from a consumption-weighted prior on the set of traders' beliefs (Rubinstein
(1974), Blume and Easley (1993), Blume and Easley (2009), Kets et al. (2014)).1
We define the Price Probability class to be the class of all probabilities that can be
represented as risk neutral probabilities of an economy with complete markets, no aggregate
risk and in which the market selection hypothesis holds. This class is rich; it includes
probabilities consistent with Bayes’ rule (BPD2) as well as other non-Bayesian rules which
were originally derived independently and with different objectives in mind such as NML3
and SNML4. Our unifying framework provides a common ground to compare these rules in
a novel and transparent way.
We compare price probabilities with Bayes' rule on the grounds of two of its characterizing
properties: exchangeability and prequentiality. Exchangeability is a fundamental property of
the Bayesian approach: De Finetti's theorem implies that a measure on infinite sequences
is exchangeable if and only if it is a Bayesian mixture of iid probabilities. It requires the
probability of every partial history to depend exclusively on the frequency of the outcomes,
not on their order. Prequentiality is a form of intertemporal consistency which is necessary
for Bayes’ rule and to avoid arbitrages (Grunwald (2007), Lehrer and Teper (2016)). It
1 More precisely, the risk neutral probabilities of the Arrow securities in a dynamic general equilibrium model with no aggregate risk, in which all traders have log utility, identical discount factor, time zero consumption-shares distribution C0 and set of traders' beliefs P, coincide with the probabilities obtained via Bayes' rule from the prior C0 on P (Corollary 1).
2 Bayesian Predictive Distribution, aka Bayesian Model Average or Bayesian Mixture.
3 Normalized Maximum Likelihood: Rissanen (1986), Shtar'kov (1987), Grunwald (2007).
4 Sequential Normalized Maximum Likelihood: Roos and Rissanen (2008).
requires the tree of unconditional probabilities to satisfy Kolmogorov’s third axiom.
Given the overwhelming experimental evidence showing that most agents are not Bayesian
(Rabin et al. (2000), Kahneman (2011)), it is natural to ask if these properties are funda-
mental for rational predictions. To answer this question, we take a pragmatic approach: we
fix an objective criterion of accuracy (based on asymptotic likelihood) and use it to evaluate
members of price probabilities against the Bayesian benchmark.
Because non-prequential members of price probabilities are, qualitatively, as accurate as
Bayes’ rule, we conclude that violations of prequentiality are rational (when used in settings
that do not allow arbitrages).
Rules that violate exchangeability are particularly interesting. If the prior support con-
tains the real model, we demonstrate that a mild form of (over)underreaction to empirical
evidence is not detrimental. Otherwise, if the real parameter does not belong to the prior
support (the model is misspecified), we prove that (over)underreacting rules can significantly
(under)outperform Bayes' rule. Because underreacting rules never underperform and can
significantly improve upon Bayes', a Pascal's wager argument (Pascal (1668)) suggests that underreacting rules should be pragmatically preferred to Bayes' unless we are absolutely certain
that our model is correctly specified.5
Given the growing awareness that, in most real life situations, an agent is likely to use
misspecified models ("All models are wrong, but some are useful", Box (1976)), our work
aims to stimulate interest in tractable non-Bayesian approaches to the prediction problem. Our results point to underreacting, non-exchangeable rules. Exchangeability greatly
simplifies the asymptotic analysis of statistical forecasting systems because it allows one to focus
exclusively on past average realizations (conditional averages in Markov models). However,
it comes at a cost: exchangeable models cannot do better than the best model in the prior
support. Because of the reasonable risk of using a misspecified model, we believe this limita-
tion to be excessively tight. In this paper, we propose a whole class of analytically tractable
5 If the model is correctly specified, the loss in the log-likelihood ratio we incur by underreacting is finite. If the model is misspecified, the gain in the log-likelihood ratio we might get is infinite. Thus, an arbitrarily small risk of misspecification is enough to pragmatically recommend an underreacting rule over Bayes'.
rules that are not exchangeable and improve upon Bayes’ in some cases of misspecification.
A crucial assumption we use to prove that an underreacting rule can improve on Bayes’ is
finite prior support.6 If we use Bayes’ rule to learn the best parameter of a given parametric
distribution, this requirement is typically not met (e.g. Bernoulli with uniform prior on
(0,1)). However, there are many practically relevant cases in which this requirement is
satisfied at a meta-model level (e.g., a Bayesian mixture model with prior, g, on a finite
set of linear regression models satisfies this assumption). Our result applies verbatim to
these cases. We chose to work on supports with finitely many parameters only for ease of
exposition.
1.1 Related literature
Our model of (over)underreaction can be seen as a special case of Epstein (2006) in which
(over)underreaction is path dependent. If there is a unique best model in the support, the
posterior concentrates on it and (over)underreaction vanishes, otherwise it persists. Our
asymptotic results extend Epstein et al. (2008)'s by including the analysis of the effect of
(over)underreaction on misspecified models. We prove that a mild form of underreaction can
deliver forecasts that are even more accurate than Bayesian when the model is misspecified.
Contrary to the axiomatic approach to learning in the economic literature (e.g., Epstein
and Le Breton (1993), Ghirardato (2002), Gilboa and Marinacci (2011)), our approach is
purely pragmatic (closer to Machine Learning’s point of view). We fix a criterion and are
interested in updating rules that perform well according to it. A rule is desirable not for
the axioms it satisfies but for its practical performance. We consider these points of view
as complementary. The former is well suited to describe personal decisions. The latter is
more relevant for those decisions that must satisfy an external criterion of performance (e.g.,
portfolio managers' performance and weather forecasters' predictions are evaluated according
to fixed criteria such as the Sharpe ratio and calibration).
6 For intuition of how underreacting rules can outperform Bayes', suppose we observe repeated tosses of a fair coin and we erroneously believe the probability of Heads to be either 1/3 or 2/3, with equal prior probability. Bayes' rule gives predictions that are most of the time arbitrarily close to either 1/3 or 2/3; an underreacting rule gives predictions that are closer to 1/2 than Bayes', thus more accurate (Section 6.3).
Our contribution to the behavioral literature is to question the conventional belief accord-
ing to which Bayes’ rule is the only rational way to learn. We propose simple non-Bayesian
rules that, according to prediction accuracy, cannot be judged irrational. We leave it to
future research to experimentally verify if agents that do not follow Bayes’ rule use other
members of price probabilities and are, after all, pragmatically rational.
Our results relate to the finding in the Computer Science literature that a slower learning rate (underreaction) can improve on Bayes' rule if the loss function is discontinuous, or
if the loss function differs from log and the model is misspecified (e.g. Vovk (1990), HEDGE
algorithm by Freund and Schapire (1997) and Safe Bayesian by Grunwald (2012)). We contribute to this line of research by proposing a unifying setting for rules previously believed to
be independent, introducing new prediction rules and showing that a slower learning rate
improves upon Bayes’ also with respect to log-loss.
Section 2 introduces notation and known statistical forecasting systems; we refer to these
rules in our discussion of price probabilities. Section 3 is about the economic derivation of
price probabilities; a reader who is not interested in the background of our non-Bayesian
rules can skip it and consider Propositions 1 and 2 as definitions. The rest of the paper is
about the properties (Section 5) and the asymptotic performance (Section 6) of members of
price probabilities. Several examples are provided; proofs are in the Appendix.
2 Environment
Time is discrete and begins at date 0. At each date, a random variable (the economy) can
be in one of S mutually exclusive states, S := {1, ..., S}, with Cartesian product S^t := ×^t S. The
set of all infinite sequences of states is S^∞ := ×^∞ S, with representative path σ = (σ_1, ...).
σ^t = (σ_1, ..., σ_t) denotes the partial history up to period t, and (σ^{t−1}, σ_t) is the concatenation of
σ^{t−1} and σ_t, i.e. the sequence whose first t−1 realizations coincide with σ^{t−1} and whose last element
is σ_t. C(σ^t) is the cylinder set with base σ^t, C(σ^t) = {σ ∈ S^∞ | σ = (σ^t, ...)}; F_t is the σ-algebra
generated by the cylinders, F_t = σ({C(σ^t), ∀σ^t ∈ S^t}); and F is the σ-algebra generated by
their union, F = σ(∪_t F_t). By construction {F_t} is a filtration. For the sake of notation,
we assume that past realizations constitute all of the relevant information, i.e. F_t := σ^t.
In what follows we introduce a number of variables of the form x_t(σ). These variables are
assumed to be measurable with respect to the natural filtration F_t.
2.1 Probabilities and statistical forecasting systems
This Section gives a brief overview of the standard definition of statistical forecasting sys-
tem, log-regret (which is needed to discuss MDL) and of the statistical forecasting systems
we refer to (for a more detailed discussion we recommend Foster and Vohra (1999) and
Grunwald (2007)). These notions are useful to understand the rules we propose. Although
these forecasting schemes were derived independently and with different objectives in mind,
in this paper we show that they share a common structure. Recognizing this common structure provides a unified framework that allows comparing these models in a novel and more
transparent way.
• Statistical forecasting system (Dawid (1984)). Given a reference set of probability
distributions P = {p1, ..., pI} on F , and a partial history σt−1, a statistical forecasting
system is any function p(.|σt−1) that uses P and past outcomes σt−1 to deliver a
probability distribution on St. WLOG (Dawid (1985)), we use the terms “statistical
forecasting system” and “probability” interchangeably.7
• Log-regret. Given a partial history σt and a reference set of probabilities P , the log-
regret is the log-likelihood ratio between the model in P with the highest likelihood on
σ^t (i.e. the best model with hindsight) and the statistical forecasting system adopted:

given σ^t, R(p; σ^t) = sup_{i∈P} ln [ p^i(σ^t) / p(σ^t) ].
Given σ^t, the log-regret is a measure of how well a statistical forecasting system performs against the best model in P with hindsight. Different sequences give different log-regrets. To avoid this dependence, it is customary to focus on the worst-case log-regret, that is, the log-regret calculated on the least favorable sequence of realizations: R(p; t) = sup_{σ^t} R(p; σ^t). A forecasting system with small worst-case log-regret is desirable because, in every sequence, it produces forecasts that are almost as good as those of the best model in P with hindsight.

7 Kolmogorov's extension theorem implies that every statistical forecasting system, {p(.|σ^{t−1})}, can be considered to be the sequence of conditional probabilities, p(σ_τ|σ^{τ−1}) = p(σ^τ)/p(σ^{τ−1}), obtained from the probability p(σ^t) := ∏_{τ=1}^t p(σ_τ|σ^{τ−1}). Conversely, any probability distribution p(σ^t) on the set S^t defines a statistical forecasting system induced by its conditional distributions p(σ_τ|σ^{τ−1}) = p(σ^τ)/p(σ^{τ−1}) for 0 ≤ τ < t.
• NML: Normalized Maximum Likelihood is the minmax probability with respect to
worst-case log-regret (Rissanen (1986) proved: pNML(.) = arg inf_p R(p; t)):

∀σ^t, pNML(σ^t) = max_{i∈P} p^i(σ^t) / Σ_{σ^t} max_{i∈P} p^i(σ^t)   (1)
NML has bounded worst-case log-regret (if the cardinality of P is finite), which makes
it desirable in data compression tasks; however, NML is hardly used in prediction tasks
because it cannot be calculated recursively and it does not uniquely define a set of
conditional probabilities.
• SNML: Sequential Normalized Maximum Likelihood is a strategy derived to obtain
a recursive version of NML (Roos and Rissanen (2008)). SNML’s period t predictions
coincide with the conditional probabilities that NML gives to σt, assuming that t is
the final horizon:
∀σ_t, pSNML(σ_t|σ^{t−1}) = pNML(σ^t) / Σ_{σ_t∈S} pNML(σ^{t−1}, σ_t).   (2)
SNML is also known as the "Follow the Leader Strategy" (Massari (2015a)): the strategy
that, in every period, prescribes using the prediction of the model in P that had the
highest likelihood in the past.8 SNML is consistent (if the real probability belongs to
P, SNML's predictions converge to it) and it can be calculated recursively. However,
unlike NML, SNML's regret can be unbounded even when the cardinality of P is finite.
8 This strategy differs from the FLS as defined by De Rooij et al. (2014) in that it prescribes mixing instead of randomizing when facing a non-unique leader.
• BPD: Bayesian Predictive Distribution is considered the “gold standard” among all
statistical forecasting systems. Given a Bayesian prior distribution C0 on a set of
probabilities P and a sequence of realizations σt−1, BPD produces a prediction for σt
on the basis of the conditional distribution of σ_t given σ^{t−1}:

∀σ_t, pBPD(σ_t|σ^{t−1}) = pBPD(σ^t) / pBPD(σ^{t−1}) = Σ_{i∈P} p^i(σ^t) c^i_0 / Σ_{i∈P} p^i(σ^{t−1}) c^i_0 = Σ_{i∈P} p^i(σ_t|σ^{t−1}) c^i_{t−1}(σ)   (3)

Where pBPD(σ^t) is the unconditional probability attached by the mixture model Σ_{i∈P} p^i(σ^t) c^i_0
to σ^t and, for every i, c^i_{t−1}(σ) = p^i(σ^{t−1}) c^i_0 / Σ_{i∈P} p^i(σ^{t−1}) c^i_0 are the weights of the prior distribution
obtained via Bayes' rule from C0.9 The prominence of BPD in most disciplines is due to
its sound axiomatic foundation, its good predictive performance, and its tractability.
BPD directly follows from Kolmogorov (1933)’s axioms (adopting the standard defini-
tion of conditional probability), and it is compatible with Savage (1954)’s axioms (e.g.
Ghirardato (2002)). Moreover, BPD is consistent, it has bounded worst-case log-regret
(if the cardinality of P is finite) and it can be calculated recursively.
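The three forecasting systems above can be compared directly on a small example. The sketch below computes BPD, NML and the SNML conditionals for a finite set of iid Bernoulli models, mirroring the two-model support used in the paper's examples; the code itself is an illustrative reconstruction, not taken from the paper:

```python
from itertools import product

THETAS = [1/3, 2/3]   # finite model set P (Bernoulli probabilities of outcome 1)
PRIOR  = [1/2, 1/2]   # prior / time-zero weights c_0

def lik(theta, seq):
    """Likelihood of a 0/1 sequence under an iid Bernoulli(theta) model."""
    p = 1.0
    for x in seq:
        p *= theta if x == 1 else 1 - theta
    return p

def p_bpd(seq):
    """Bayesian Predictive Distribution: prior-weighted mixture likelihood (Eq. 3)."""
    return sum(c * lik(th, seq) for th, c in zip(THETAS, PRIOR))

def p_nml(seq):
    """Normalized Maximum Likelihood at horizon t = len(seq) (Eq. 1)."""
    t = len(seq)
    num = max(lik(th, seq) for th in THETAS)
    den = sum(max(lik(th, s) for th in THETAS) for s in product([0, 1], repeat=t))
    return num / den

def p_snml_cond(prev, x):
    """SNML prediction of outcome x after history prev (Eq. 2)."""
    return p_nml(prev + (x,)) / sum(p_nml(prev + (y,)) for y in (0, 1))
```

On this support BPD is additive across horizons, while NML, being renormalized at every horizon, is not: p_nml((1,1,1)) + p_nml((1,1,0)) = 3/10 ≠ 1/3 = p_nml((1,1)).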
3 Price Probabilities
In this Section, we introduce the economic setting used to derive the equilibrium prices of
the Arrow securities used to construct price probabilities. A reader who is mostly interested
in the properties and performance of non-Bayesian updating rules can skip this Section and
consider Propositions 1 and 2 to be the definitions of the non-Bayesian rules we propose.
We model the market as an Arrow-Debreu exchange economy with complete markets.
The economy contains a finite set of traders I. Each trader, i, has consumption set R+. A
consumption plan c : S^∞ → ∏_{t=0}^∞ R_+ is a sequence of R_+-valued functions {c_t(σ)}_{t=0}^∞. Each
trader i is characterized by a payoff function u^i : R_+ → R over consumption, a discount
factor β^i ∈ (0, 1), an endowment stream {e^i_t(σ)}_{t=0}^∞ and a subjective probability p^i on S^∞,
9 The unusual notation "c^i_{t−1}(σ)" for the weights of the prior distribution is to ease the comparison between consumption-shares and probabilistic mass. In log-economies they coincide (Section 4.2).
his beliefs. We denote the set of traders' beliefs by P := {p^i : i ∈ I}. To ease the comparison
between prices and the Bayesian framework, we assume that traders' beliefs are iid.10 Each
trader, i, aims to solve:
max E_{p^i} Σ_{t=0}^∞ β^t u^i(c^i_t(σ))   s.t.   Σ_{t=0}^∞ Σ_{σ^t∈S^t} q(σ^t) ( c^i_t(σ) − e^i_t(σ) ) ≤ 0.
Where q(σ^t) is the price of a claim that pays a unit of consumption on the last realization
of σ^t in terms of consumption at time 0. Let q(σ_t|σ^{t−1}) be the price of a claim that pays a
unit of consumption at period/event σ_t in terms of consumption at period/event σ^{t−1}.
It is worth noting the similarity between the equilibrium relation between time 0 and recursive prices, q(σ_t|σ^{t−1}) = q(σ^t)/q(σ^{t−1}) (e.g. Ljungqvist and Sargent (2004)), and the relation between
unconditional and conditional probabilities, p(σ_t|σ^{t−1}) = p(σ^t)/p(σ^{t−1}). If the sum of next period
prices were one, equilibrium prices would define a probability distribution.
3.1 Assumptions
A competitive equilibrium is a sequence of prices and, for each trader, a consumption plan
that is affordable, preference maximal on the budget set and mutually feasible.
The following assumptions, together with no-short-sale (Araujo and Sandroni (1999)),
are sufficient for the existence of the competitive equilibrium (Peleg and Yaari (1970)) and
for the market selection hypothesis to hold (Sandroni (2000); Blume and Easley (2006)):
• A1: The payoff functions u^i : R_+ → [−∞, +∞] are C1, concave and strictly increasing
and satisfy the Inada condition at 0; that is, u^i′(c) → ∞ as c ↘ 0.
• A2: For all traders i, and for all finite sequences σ^t, p^i(σ^t) > 0 ⇔ P(σ^t) > 0.
• A3: The aggregate endowment equals 1 in every period: ∀σ, ∀t, Σ_{i∈I} e^i_t(σ) = 1.
• A4: All traders have identical discount factor: ∀i, βi = β.
10 This assumption is made for the sake of clarity. It ensures that the driving force that makes price probabilities accurate is exclusively market selection, not the learning that traders might make over time or from prices. Our results about efficiency apply verbatim to cases in which traders' beliefs evolve over time.
WLOG, because the second welfare theorem applies, we assume that the initial optimal
consumption choices are known and given by C0 = [c^1_0 ... c^I_0]′ >> 0. By A3, Σ_{i∈I} c^i_0 = 1 and
we can interpret the time zero consumption-shares as the weights that a hypothetical Bayesian
prior gives to probabilities in P. The absence of aggregate risk is needed to eliminate biases
in risk neutral probabilities due to aggregate consumption fluctuations.
3.2 The price probability class
In this Section, we introduce the price probability class. Members of price probabilities
are obtained by first, interpreting equilibrium prices of the Arrow securities as representing
relative likelihoods, and then using these relative likelihoods to construct probabilities via
normalization. Given the set of traders’ beliefs (P), different initial consumption-share dis-
tribution (C0), preferences (&) and normalization method (n) determine different probability
measures. We call the class of all such probability measures:
Definition 1. Price probabilities, Mq(P , C0,&, n), is the class of all the probabilities that
can be represented as normalized equilibrium prices of an economy that satisfies A1-A4.
In the rest of the paper, we focus on two normalization methods: NNL, in which time
zero prices are normalized at every horizon; and SNNL, in which next period prices are
normalized sequentially.
Definition 2. Normalized Normed Likelihood (NNL):

∀σ^t, pNNL(σ^t) = q(σ^t) / Σ_{σ^t} q(σ^t);   pNNL(σ_t|σ^{t−1}): not defined.
NNL is the only probability measure that preserves the relative likelihoods of time zero
prices at every horizon (a new normalization is done at every horizon). In economic terms,
NNL is the cost of moving a unit of consumption in period/event σt in terms of time zero
consumption, divided by the cost of moving a unit of consumption from time zero to time
t for sure. NNL (as NML) cannot be calculated recursively and it does not define a unique
set of marginal probabilities (unless all traders have log utility).
Definition 3. Sequential Normalized Normed Likelihood (SNNL):

∀σ^t, pSNNL(σ^t) = ∏_{τ=1}^t pSNNL(σ_τ|σ^{τ−1});   pSNNL(σ_t|σ^{t−1}) = q(σ_t|σ^{t−1}) / Σ_{σ_t} q(σ_t|σ^{t−1}).
SNNL is the only probability measure that preserves the relative likelihoods of next
period prices. It is the cost of moving a unit of consumption from period/event σt−1 one
period ahead in state σt, divided by the cost of moving a unit of consumption for sure (short
run risk neutral probabilities). SNNL (as SNML) can be calculated recursively.
The following Lemma highlights that the relation between NNL and SNNL is the same as
the one between NML and SNML: SNNL’s period t predictions coincide with the conditional
probabilities that NNL gives to σt, assuming that t is the final horizon:
Lemma 1.

∀σ_t, pSNNL(σ_t|σ^{t−1}) = pNNL(σ^t) / Σ_{σ_t∈S} pNNL(σ^{t−1}, σ_t).
4 Price probabilities in identical CRRA economies
If all traders have identical CRRA utility functions, members of price probabilities can be
analytically characterized. This setting is flexible enough to show that BPD, NML, and
SNML belong to price probabilities and to discuss relevant deviations from Bayes' rule. In
what follows we use the following notation:
Definition 4. pNNL_γ and pSNNL_γ denote the NNL and the SNNL probabilities obtained from an
economy that satisfies A2-A4 and in which all traders have identical CRRA utility functions
with parameter γ: ∀i ∈ I, u^i(c) = (c^{1−γ} − 1)/(1 − γ).11

11 As customary, we define ln 0 = −∞. Moreover, we use γ = 0 as a short notation for the limit equilibrium quantities of an iCRRA economy in which γ → 0 after the equilibrium quantities are calculated.
4.1 NNL in iCRRA-economies: pNNL_γ

Proposition 1. Given beliefs set P, prior C0 and parameter γ, pNNL_γ is given by:

∀σ^t, pNNL_γ(σ^t) = ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ / Σ_{σ^t∈S^t} ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ.   (4)
Equation 4 shows that pNNL_γ coincides with the normalized 1/γ-norm of the likelihoods of
members of P according to the measure C0. Because BPD and NML are the normalized L1
and L∞ norms, respectively, they both belong to price probabilities.
Corollary 1. Given beliefs set P and prior C0,

i) γ = 1 (log) ⇒ ∀σ^t, pNNL_1(σ^t) = pBPD(σ^t);

ii) γ = 0 (linear) ⇒ ∀σ^t, pNNL_0(σ^t) = pNML(σ^t).

Proof. i) Notice that, if γ = 1, the denominator of Eq. 4 equals one, and compare Eq. 4 with Eq. 3.
ii) Notice that lim_{γ→0} ( Σ_{i∈P} p^i(σ^t)^{1/γ} c^i_0 )^γ = ||p^i(σ^t)||_∞, the sup norm; and compare Eq. 4 with Eq. 1.
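Proposition 1 and Corollary 1 can be verified numerically at small horizons. The sketch below implements Eq. 4 on an illustrative two-model Bernoulli support and checks that γ = 1 reproduces BPD while a small γ approximates NML (γ = 0.05 stands in for the γ → 0 limit; taking the limit exactly would require the sup norm):

```python
from itertools import product

THETAS = [1/3, 2/3]   # illustrative Bernoulli model set P
C0     = [1/2, 1/2]   # time-zero consumption shares / prior

def lik(theta, seq):
    p = 1.0
    for x in seq:
        p *= theta if x == 1 else 1 - theta
    return p

def norm_gamma(seq, gamma):
    """Inner term of Eq. 4: the C0-weighted 1/gamma-norm of the likelihoods."""
    return sum(c * lik(th, seq) ** (1 / gamma) for th, c in zip(THETAS, C0)) ** gamma

def p_nnl(seq, gamma):
    """Price probability pNNL_gamma (Eq. 4), normalized over all same-length sequences."""
    den = sum(norm_gamma(s, gamma) for s in product([0, 1], repeat=len(seq)))
    return norm_gamma(seq, gamma) / den

def p_bpd(seq):
    return sum(c * lik(th, seq) for th, c in zip(THETAS, C0))

def p_nml(seq):
    den = sum(max(lik(th, s) for th in THETAS)
              for s in product([0, 1], repeat=len(seq)))
    return max(lik(th, seq) for th in THETAS) / den
```

At γ = 1 the denominator of Eq. 4 equals one and pNNL_1 is exactly the Bayesian mixture; at γ = 0.05 the prior weights essentially cancel in the normalization and pNNL is already within numerical tolerance of pNML.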
Taking Bayes’ rule as a reference point, the effect of gamma on NNL is qualitatively as
follows. In a log-economy (γ = 1) NNL coincides with BPD and the interaction between prior
information (C0) and empirical evidence (σt) is regulated by Bayes’ rule. For γ = 0, NNL
coincides with NML, i.e. it is the optimal probability with respect to worst-case log-regret.
Given the explosive nature of the log-likelihood on sequences whose frequencies are close to
the boundary of the simplex, NML ignores the information of the prior (C0 plays no role),
and it assigns a relatively higher probability to those sequences whose frequency lies close
to the boundary of the simplex. For values of γ ≠ 1, NNL represents a compromise between
the minimum log-regret approach behind the NML distribution and the Bayesian attempt
to make the most out of the information in the prior. Compared to a BPD with the same
Uniform prior on P , NNL with γ < (>)1 assigns more probability to those sequences whose
frequency lies close to the boundary (center) of the simplex and penalizes those sequences
whose frequency lies close to the center (boundary) of the simplex.
4.2 SNNL in iCRRA-economies: pSNNL_γ
Proposition 2. Given beliefs set P, prior C0 and parameter γ, pSNNL_γ is given by:

∀σ_t, pSNNL_γ(σ_t|σ^{t−1}) = ( Σ_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ) )^γ / Σ_{σ_t∈S} ( Σ_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ) )^γ,   (5)

with c^i_{γ,t−1}(σ) = p^i(σ^{t−1})^{1/γ} c^i_0 / Σ_{i∈P} p^i(σ^{t−1})^{1/γ} c^i_0 (by Eq. 11).
By construction, Σ_{i∈P} c^i_{γ,t−1}(σ) = 1; thus each c^i_{γ,t−1}(σ) can be interpreted as
the weight attached to model p^i by a prior distribution C_{γ,t−1}(σ). The gamma parameter
affects the evolution of this distribution. If gamma equals one (log), C^i_{γ,t−1}(σ) coincides
with the Bayesian prior distribution obtained from C0 on σ^{t−1} (compare with Equation 3).
Taking Bayes' rule as our reference point, the effect of gamma on C_{γ,t−1}(σ) is qualitatively
as follows. If gamma is greater than one, C^i_{γ,t−1}(σ) is less concentrated around the model
with the highest likelihood: it gives less weight to empirical evidence than Bayes' (C^i_{1,t−1}(σ)).
Conversely, if gamma is lower than one, C^i_{γ,t−1}(σ) gives more weight to empirical evidence
than Bayes'. The normalizing component and the use of the 1/γ norm only mitigate this effect.
Proposition 3. Given beliefs set P and prior C0,

i) γ > 1 ⇔ pSNNL_γ underreacts to empirical evidence;

ii) γ = 1 (log) ⇔ pSNNL_γ coincides with Bayesian updating (pBPD);

iii) γ < 1 ⇔ pSNNL_γ overreacts to empirical evidence.
For intuition, suppose every probability in P is iid Bernoulli (∀i, ∀t, p^i(a_t) = i) and let t_a
and t_b be the number of a and b observations until period t−1, respectively. Substituting:

c^i_{γ,t−1}(σ) = p^i(σ^{t−1})^{1/γ} c^i_0 / Σ_{j∈P} p^j(σ^{t−1})^{1/γ} c^j_0 = i^{t_a/γ} (1−i)^{t_b/γ} / Σ_{j∈P} j^{t_a/γ} (1−j)^{t_b/γ}.   (6)
Equation 6 highlights the effect of gamma on the evolution of C^i_{γ,t−1}(σ). If gamma is smaller
than one, the model overreacts to empirical evidence: e.g., γ = 1/2 is equivalent to updating via
Bayes' rule "counting every past realization twice". If gamma is greater than one, the
model underreacts to empirical evidence: e.g., γ = 2 is equivalent to updating via Bayes'
rule "counting every past realization as half".12
The economic intuition is the following (see Massari (2015b)): c^i_{γ,t−1}(σ) is trader i's
consumption-share after trading for t−1 periods on path σ. Higher values of γ imply lower
risk tolerance, thus more conservative investment strategies. Therefore, if γ > (<)1, the
traders with incorrect beliefs lose consumption-shares to the traders with correct beliefs at
a slower (faster) rate than they would if γ = 1. Thus, the effect of incorrect beliefs on next
period equilibrium prices takes more (less) periods to disappear.
It is easy to verify that SNML belongs to price probabilities.
Corollary 2. Given beliefs set P and prior C0, γ = 0 (linear) ⇒ ∀σ^t, pSNNL_0(σ^t) = pSNML(σ^t).

Proof. ∀σ^{t−1}, pSNNL_0(σ_t|σ^{t−1}) =(Lem. 1) pNNL_0(σ^t) / Σ_{σ_t} pNNL_0(σ^{t−1}, σ_t) =(Cor. 1) pNML(σ^t) / Σ_{σ_t} pNML(σ^{t−1}, σ_t) =(Eq. 2) pSNML(σ_t|σ^{t−1}).
5 Properties of price probabilities
In this Section, we introduce two characterizing properties of Bayesian updating (on iid se-
quences) and discuss whether these properties are satisfied by members of price probabilities.
We demonstrate that price probabilities in iCRRA economies satisfy both properties if and
only if all traders have log utility (that is if they coincide with BPD).
Definition 5. A statistical forecasting system, p, is prequential if:

∀σ^{t−1}, Σ_{σ_t∈S} p(σ_t ∩ σ^{t−1}) = p( (∪_{σ_t∈S} σ_t) ∩ σ^{t−1} ) = p(σ^{t−1})
12 C^i_{γ,t−1}(σ) is a special case of the "Generalized Bayes' rule" introduced by Vovk (1990). The gamma parameter is often called the learning rate as it determines the convergence rate of the posterior. The choice of this parameter plays a fundamental role in both the HEDGE algorithm (Freund and Schapire (1997)) and the Safe Bayesian approach (Grunwald (2012)). SNNL differs from these algorithms because, instead of relying on the generalized prior, it directly depends on the sequential normalization of the 1/γ norm.
Prequentiality (see Grunwald (2007) for details) coincides with Kolmogorov’s 3rd axiom
(additivity). An agent with non-prequential beliefs believes that the sum of the probabil-
ities of disjoint events differs from the probability of their union. In the economic theory
literature, non-prequential beliefs are called time inconsistent because a trader with non-prequential beliefs can be put into arbitrage (see example and discussion in Section 6.2).
In the Behavioral literature, this type of violation is well documented and known as the
conjunction fallacy (Kahneman (2011)).
Definition 6. A statistical forecasting system, p, is exchangeable if, whenever two partial
histories σ^t, σ̃^t share the same frequency, p(σ^t) = p(σ̃^t).
Exchangeability captures the idea that the probability of a sequence of events does not
depend on the order of the realizations. This assumption is deeply connected to Bayes’
rule: De Finetti (1931)'s theorem implies that a forecasting system is Bayesian if and only
if it is exchangeable (or conditionally exchangeable). Exchangeability is an appealing criterion
whenever all the models in P are iid. For example, we expect a rational agent facing repeated
iid tosses from a coin with an unknown bias to attach the same probability to the sequences
of realizations {H,H, T} and {T,H,H}. In terms of conditional forecasts, an agent who
attaches less probability to {H,H,T} than to {T,H,H} will appear as either over-weighting
or under-weighting (relative to a Bayesian) the first two realizations.
Proposition 4.

• (a) If γ = 1 (log) and all the probabilities in P are iid, then pNNL_γ, pSNNL_γ and pBPD
coincide and are prequential and exchangeable.

• (b) If γ ≠ 1 and all the probabilities in P are iid:

i) pNNL_γ is exchangeable but not prequential;

ii) pSNNL_γ is prequential but not exchangeable.
The following example illustrates Proposition 4. It shows how different values of gamma
affect NNL and SNNL on sequences of length 3 (unconditional probabilities on the left-hand
tree, conditional probabilities on the right-hand tree).
5.1 Example
Consider the NNLs and SNNLs obtained from two iCRRA-economies E0, E1 with two states
(Left, Right), S = {L,R}; two traders with iid beliefs p^1(a) = 1/3 = p^2(b) (P = {p^1, p^2});
initial consumption C0 = [1/2 1/2]′; and CRRA parameters γ0 = 0 and γ1 = 1, respectively.
Log-economy: γ = 1, q_1(σ^t) = β^t ( Σ_{i∈I} p^i(σ^t) c^i_0 ).

Equilibrium prices coincide with the discounted probabilities of a Bayesian with prior C0.
Because the discount factor is independent of time, different normalization choices do not
affect risk neutral probabilities. Thus, all members of Mq(P, C0, γ = 1, n) coincide with
BPD, which is prequential (e.g. p({R,R,R}) + p({R,R,L}) = p({R,R})) and exchangeable
(e.g. p({L,L,R}) = p({L,R,L}) = p({R,L,L})).

[Tree figure (LOG): unconditional probabilities pNNL_1(σ^t) = pSNNL_1(σ^t) on the left-hand tree; conditional probabilities pNNL_1(σ_t|σ^{t−1}) = pSNNL_1(σ_t|σ^{t−1}) on the right-hand tree.]
Limit linear economy: γ → 0, q_0(σ^t) = β^t max_{i∈P} {p^i(σ^t)}.
• The Normalized Normed Likelihood, pNNL_0(σ^t) = max_i p^i(σ^t) / ∑_{σ^t} max_i p^i(σ^t), is exchangeable (e.g. pNNL_0({L,L,R}) = pNNL_0({L,R,L}) = pNNL_0({R,L,L})) but not prequential (e.g. pNNL_0(R,R,R) + pNNL_0(R,R,L) = 3/10 ≠ 1/3 = pNNL_0(R,R)). Comparing pNNL_0 with pBPD, we see that pNNL_0 attaches more probability to extreme sequences ({L,L,L} and {R,R,R}) than pBPD does.
LIN economy, pNNL_0(σ^t) (unconditional probabilities):
p(L) = p(R) = 1/2; p(LL) = p(RR) = 1/3; p(LR) = p(RL) = 1/6;
p(LLL) = p(RRR) = 1/5; all remaining length-3 sequences have probability 1/10.

LIN economy, pNNL_0(σ_t|σ^{t−1}): not defined, because pNNL_0 is not prequential.
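These pNNL_0 values, and the failure of prequentiality, can be verified with a short sketch (Python, exact arithmetic; the two models of Example 5.1 are hard-coded and `p_nnl0` is our own name for the γ → 0 rule):

```python
from itertools import product
from fractions import Fraction

# Two iid models over states {L, R}: p1(L) = 1/3, p2(L) = 2/3.
P = [{'L': Fraction(1, 3), 'R': Fraction(2, 3)},
     {'L': Fraction(2, 3), 'R': Fraction(1, 3)}]

def lik(model, seq):
    """iid likelihood of a sequence under one model."""
    out = Fraction(1)
    for s in seq:
        out *= model[s]
    return out

def p_nnl0(seq):
    """NNL with gamma -> 0: maximum likelihood, normalized over
    all sequences of the same length."""
    t = len(seq)
    z = sum(max(lik(m, s) for m in P) for s in product('LR', repeat=t))
    return max(lik(m, seq) for m in P) / z

# Exchangeable: permutations of a sequence get the same probability...
assert p_nnl0('LLR') == p_nnl0('LRL') == p_nnl0('RLL') == Fraction(1, 10)
# ...but not prequential: one-step-ahead marginals do not add up.
assert p_nnl0('RRR') + p_nnl0('RRL') == Fraction(3, 10) != p_nnl0('RR') == Fraction(1, 3)
```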
• The Sequential Normalized Normed Likelihood, pSNNL_0(σ_t|σ^{t−1}) = pNNL_0(σ^t) / ∑_{σ_t} pNNL_0(σ^{t−1}, σ_t), is prequential (e.g. pSNNL_0({R,R,R}) + pSNNL_0({R,R,L}) = pSNNL_0({R,R})) but not exchangeable (e.g. pSNNL_0({L,L,R}) = 1/9 ≠ 1/12 = pSNNL_0({L,R,L}) = pSNNL_0({R,L,L})).
The conditional probabilities show that pSNNL_0 “overreacts” to empirical evidence. Unlike pBPD, a single observation in favor of p^1 (p^2) suffices to make the conditional probability of pSNNL_0 coincide with p^1 (p^2) (e.g. pSNNL_0(L|L) = 2/3 = p^2(L) and pSNNL_0(R|R) = 2/3 = p^1(R)).
LIN economy, pSNNL_0(σ^t) (unconditional probabilities):
p(L) = p(R) = 1/2; p(LL) = p(RR) = 1/3; p(LR) = p(RL) = 1/6;
p(LLL) = p(RRR) = 2/9; p(LLR) = p(RRL) = 1/9; p(LRL) = p(LRR) = p(RLL) = p(RLR) = 1/12.

LIN economy, pSNNL_0(σ_t|σ^{t−1}) (conditional probabilities):
p(L) = p(R) = 1/2; p(L|L) = p(R|R) = 2/3; p(R|L) = p(L|R) = 1/3;
p(L|LL) = p(R|RR) = 2/3; p(R|LL) = p(L|RR) = 1/3; p(·|LR) = p(·|RL) = 1/2.
6 Asymptotic performance of price probabilities
6.1 The criterion
In this section, we introduce the efficiency criterion we use to characterize price probabilities' performance. Following an established tradition across fields, our criterion is based on asymptotic likelihood comparisons. On every sequence, we compare the likelihood of a statistical prediction strategy with beliefs set P against the likelihood of BPD with a regular prior¹³ on the same support. The comparison is done on every sequence because the real probability is unknown in most practical cases in which we want to use a prediction strategy (if we knew the real model, we would have no need to find the best predictor). The benchmark is chosen because Bayesian updating is widely known, applied, and appreciated.¹⁴ The criterion is asymptotic to eliminate the small-sample effect of the prior and because small-sample criteria can be misleading.¹⁵
Definition 7. Let pBPD(σt) be the likelihood of a BPD with regular prior on P,
¹³ A prior is regular if it attaches positive mass to every element of the prior support.
¹⁴ Because (if |P| < ∞) BPD has finite worst-case log-regret, a likelihood comparison against BPD is also a way to verify whether a statistical forecasting system possesses this fundamental property.
¹⁵ E.g., Massari (2013) shows that, given two statistical forecasting systems {p^a}, {p^b}, it is not true that if p^a's next-period predictions are infinitely often more accurate than p^b's and never less accurate, then p^a's predictions are more accurate on long sequences in terms of likelihood.
• a statistical forecasting system p with beliefs set P is universal-efficient if

∀σ ∈ S^∞, ln [pBPD(σ^t)/p(σ^t)] ≍ 1;¹⁶

• a statistical forecasting system p with beliefs set P is super-efficient if

∀P ∈ P, ln [pBPD(σ^t)/p(σ^t)] ≍ 1 P-a.s., and
∄P : ln [pBPD(σ^t)/p(σ^t)] → +∞ P-a.s.;
∃P : ln [pBPD(σ^t)/p(σ^t)] → −∞ P-a.s.;

• a statistical forecasting system p with beliefs set P is sub-efficient if

∀P ∈ P, ln [pBPD(σ^t)/p(σ^t)] ≍ 1 P-a.s., and
∃P : ln [pBPD(σ^t)/p(σ^t)] → +∞ P-a.s.;
∄P : ln [pBPD(σ^t)/p(σ^t)] → −∞ P-a.s.
Intuitively, p is universal-efficient if, on every sequence, it is as accurate as the prediction obtained using Bayes' rule with the same prior support. p is super-efficient if it does as well as Bayes' on every sequence and there are probabilities P for which it outperforms it P-a.s.; that is, it is guaranteed to do as well as Bayes' rule and there are cases (when the model is misspecified) in which it does better. p is sub-efficient if there are no sequences on which it is better than Bayes' rule, and there are cases of misspecification in which it is worse.
6.2 Asymptotic performance of NNL
Theorem 1. If the cardinality of P is finite, ∀γ ∈ [0,∞), pNNL_γ is universal-efficient.¹⁷

Theorem 1 tells us that, although not prequential, NNL is as good as BPD in terms of likelihood. If we are only concerned about accuracy, there is no reason to consider prequentiality a fundamental property of rational forecasts. However, non-prequential models are likely to be undesirable in certain economic settings, because an agent with non-prequential beliefs can be put in situations of dynamic arbitrage.
¹⁶ The notation f(x) ≍ g(x) abbreviates lim sup f(x)/g(x) < +∞ and lim inf f(x)/g(x) > 0.
¹⁷ More generally, NNL is universal-efficient in any economy that satisfies A1–A4 with |P| < ∞.
For example, a risk-neutral agent who does not discount the future and whose beliefs are pNNL_0(R,R) = 1/3, pNNL_0(R,R,R) = 2/10 and pNNL_0(R,R,L) = 1/10 (as in Example 5.1) is at time zero indifferent between:
• 1/3$ and a lottery, L1, that pays 1$ if {R,R} realizes, 0$ otherwise;
• 2/10$ and a lottery, L2, that pays 1$ if {R,R,R} realizes, 0$ otherwise;
• 1/10$ and a lottery, L3, that pays 1$ if {R,R,L} realizes, 0$ otherwise.
Selling him L1 for 1/3$ and buying from him L2 and L3 for a total of 3/10$ constitutes an arbitrage: if {R,R} does not realize, we make a profit of 1/3 − 3/10 > 0. If {R,R} does realize, we make the same profit, because we can use the market to pay the dollar we lose in t = 2 with the dollar we win for sure in t = 3 (either {R,R,R} or {R,R,L} happens for sure).
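The arbitrage can be verified path by path. A sketch (Python, exact arithmetic; `price` is our own name for the time-zero pNNL_0 prices of Example 5.1, computed from the two hard-coded iid models, and we assume the intermediate dollar can be financed against the sure t = 3 payoff, as in the text):

```python
from fractions import Fraction
from itertools import product

# Two iid models over {L, R}: p1(L) = 1/3, p2(L) = 2/3.
P = [{'L': Fraction(1, 3), 'R': Fraction(2, 3)},
     {'L': Fraction(2, 3), 'R': Fraction(1, 3)}]

def lik(m, seq):
    out = Fraction(1)
    for s in seq:
        out *= m[s]
    return out

def price(seq):
    """Time-zero price of $1 paid if `seq` realizes: pNNL0(seq) for a
    risk-neutral, non-discounting agent."""
    z = sum(max(lik(m, s) for m in P)
            for s in product('LR', repeat=len(seq)))
    return max(lik(m, seq) for m in P) / z

# Sell the t=2 claim on {R,R}; buy the t=3 claims on {R,R,L} and {R,R,R}.
upfront = price('RR') - price('RRL') - price('RRR')
assert upfront == Fraction(1, 30)      # 1/3 - 1/10 - 1/5 > 0

# In every path the claim payoffs cancel: the $1 owed at t=2 is exactly
# offset by the $1 collected at t=3, so the upfront profit is riskless.
for path in map(''.join, product('LR', repeat=3)):
    owed = path.startswith('RR')       # L1 pays out against us at t=2
    received = path in ('RRL', 'RRR')  # L2 or L3 pays us at t=3
    assert owed == received
```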
Importantly, this arbitrage opportunity arises only if NNL is used in markets that allow for both time-zero and sequential trading. An arbitrage can be constructed against a trader with NNL beliefs only because his beliefs correspond to a state of mind in which trade can occur only at time 0. If he knew his final horizon t and were given the possibility to trade sequentially, then he could use his NNL at t to construct a set of prequential conditional probabilities via backward induction and avoid arbitrages.
This procedure is equivalent to the “massaging” process described in Savage (1954) and Binmore (2008) as sufficient to deduce a Bayesian prior from the set of subjective relative likelihoods of an agent. Incidentally, Example 5.1 can be used to demonstrate that “massaging” is not sufficient to imply Bayes' rule. “Massaging” pNNL_0 forward, we obtain SNNL, which is prequential but not Bayesian (because not exchangeable). Fixing the final horizon and “massaging” pNNL_0 backward, the resulting measure is prequential and exchangeable, but still not Bayesian.¹⁸
¹⁸ Lemma: A prior that makes pNNL_0(σ^3) in Example 5.1 consistent with Bayes' rule does not exist. Proof: By symmetry of pNNL_0(σ^3), if this prior existed it would have to give the same weight to models p^1 and p^2. The unconditional probabilities obtained via Bayes' rule from this prior coincide with pNNL_1(σ^3) ≠ pNNL_0(σ^3), a contradiction.
The possibility of making this type of intertemporal arbitrage should not be too surprising. The price that both parties are willing to pay for an interest rate swap, which exchanges a long-term interest rate for a sequence of short-term interest rates, can be interpreted as representing this type of arbitrage.
6.3 Asymptotic performance of SNNL
Theorem 2. If the cardinality of P is finite,
i) γ > 1 ⇒ pSNNL_γ is super-efficient;
ii) γ = 1 ⇒ pSNNL_γ is universal-efficient;
iii) γ < 1 ⇒ pSNNL_γ is sub-efficient.
Theorem 2 tells us that a statistical forecasting system that is not exchangeable is as good as Bayes' rule whenever the model is correctly specified (SNNL is, at least, sub-efficient). Furthermore, it shows that there are cases in which a non-exchangeable model that underreacts to empirical evidence can significantly outperform Bayes' rule. If we are only concerned about accuracy, there is no reason to consider exchangeability a fundamental property of rational forecasts. Moreover, because underreacting rules never underperform and can significantly improve upon Bayes', a Pascal's-wager argument (Pascal (1668)) suggests that underreacting rules should be pragmatically preferred to Bayes' unless we are absolutely certain that our model is correctly specified.¹⁹
The relation between NNL and SNNL is qualitatively the same as that between NML and SNML. Each next-period forecast of SNNL corresponds to the last-period marginal distribution of the corresponding NNL probability. Thus, SNNL can be thought of as a compromise to make NNL recursive. This interpretation makes the super-efficiency part of Theorem 2 even more surprising. It shows that a forecaster can perform significantly better by precommitting to use a recursive method even when he knows the final horizon of his prediction task. Because a recursive method does not use the length of the sequence it is forecasting as an input, this result illustrates a case in which ignoring relevant information increases prediction accuracy.
¹⁹ If the model is correctly specified, the loss in the log-likelihood ratio we incur by underreacting is finite. If the model is misspecified, the gain in the log-likelihood ratio we might get is infinite. Thus, an arbitrarily small risk of misspecification is enough to recommend an underreacting rule over Bayes'.
The following example highlights the effect of γ on SNNL. The two cases illustrate that underreaction (γ > 1) can significantly outperform BPD while never underperforming it, whereas overreaction (γ < 1) can significantly underperform BPD while never outperforming it.
Example 1: consider the SNNLs obtained from three iCRRA-economies E0, E1, E2 with two states (S = {a, b}); two traders with iid beliefs p^1(a) = 1/3 = p^2(b), (P = {p^1, p^2}); initial consumption C_0 = [1/2, 1/2]′; and CRRA parameters γ_0 = 0, γ_1 = 1 and γ_2 = 2, respectively.
Case a: The real probability, P, is degenerate: it gives probability 1 to the alternating sequence {a, b, a, ...}. Because both models are equally (in)accurate, the best prediction is to give equal weight to both models in every period (as C_0 does). Thus, this is the most favorable case for forecasting systems that underreact to empirical evidence. By Eq. 5:
pSNNL_0(a_t|σ^{t−1}) = pNNL_0(σ^t) / ∑_{σ_t} pNNL_0(σ^{t−1}, σ_t) = 1/2 if t even, 2/3 if t odd;

pSNNL_1(a_t|σ^{t−1}) = ∑_{i∈I} p^i(a) p^i(σ^{t−1}) c^i_0 / ∑_{i∈I} p^i(σ^{t−1}) c^i_0 = 1/2 if t even, 5/9 if t odd;

pSNNL_2(a_t|σ^{t−1}) = [∑_{i∈I} p^i(a)^{1/2} p^i(σ^{t−1})^{1/2} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/2} c^i_0]^2 / ∑_{σ_t} [∑_{i∈I} p^i(σ_t)^{1/2} p^i(σ^{t−1})^{1/2} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/2} c^i_0]^2 = 1/2 if t even, 9/17 if t odd.
Thus, on {a, b, a, ...}, ∀α ∈ (0, 1) (up to a bounded factor accounting for odd t):

pSNNL_0(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (1/3)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] → 0;

pSNNL_1(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (4/9)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] ≍ 1;

pSNNL_2(σ^t)/pBPD_α(σ^t) = (1/2)^{t/2} (8/17)^{t/2} / [α (1/3)^{t/2} (2/3)^{t/2} + (1−α) (2/3)^{t/2} (1/3)^{t/2}] → +∞.
Case a shows that, by underreacting, pSNNL_2 produces predictions that are closer to the empirical frequency than those of pSNNL_1 and pSNNL_0, and thus more accurate.
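The divergences in Case a can be illustrated numerically by chaining the SNNL conditionals along a long alternating sequence. A sketch (Python, floating point; the Example 1 setup is hard-coded, and the numeric thresholds are illustrative, not part of the paper's statements):

```python
import math

P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def log_snnl(seq, gamma):
    """log-likelihood of SNNL_gamma: chain the renormalized one-step
    conditionals of q(sigma^t) = (sum_i p^i(sigma^t)^(1/gamma) c^i_0)^gamma."""
    def q(s):
        return sum(c * lik(m, s) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma
    total = 0.0
    for t in range(1, len(seq) + 1):
        total += math.log(q(seq[:t]) / sum(q(seq[:t - 1] + x) for x in 'ab'))
    return total

def log_bpd(seq, alpha=0.5):
    return math.log(alpha * lik(P[0], seq) + (1 - alpha) * lik(P[1], seq))

alternating = 'ab' * 200               # Case a: the posterior cannot concentrate
def gap(g):
    return log_snnl(alternating, g) - log_bpd(alternating)

assert gap(2.0) > 5                    # underreaction: ratio diverges upward
assert gap(0.5) < -5                   # overreaction: ratio diverges downward
assert abs(gap(1.0)) < 1e-6            # gamma = 1 is exactly Bayes
```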
Case b: The real probability, P, is degenerate: it gives probability 1 to the sequence {a, a, a, ...}. Because p^2 is clearly the best model, case b is the most favorable for forecasting systems that overreact to empirical evidence.
On {a, a, a, ...}, ∀α ∈ (0, 1):

pSNNL_0(σ^t)/pBPD_α(σ^t) = (1/2)(2/3)^{t−1} / [α(2/3)^t + (1−α)(1/3)^t] ≍ 1;

pSNNL_1(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^t + (1/2)(1/3)^t] / [α(2/3)^t + (1−α)(1/3)^t] ≍ 1;

pSNNL_2(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^{t/γ} + (1/2)(1/3)^{t/γ}]^γ / [α(2/3)^t + (1−α)(1/3)^t] · e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))} ≍ 1.²⁰
Case b shows that, although pSNNL_0 immediately identifies the best model, the other rules converge to the best model fast enough not to compromise their asymptotic likelihood performance: pSNNL_2 does not perform significantly worse than pSNNL_0.
These examples suggest that non-concentration of the Bayesian posterior plays a special role in determining the (sub/super-)efficiency condition. This is indeed the case:
Proposition 5. For every regular prior C_0 on a finite support P and γ ∈ (0,∞) \ {1},
i) lim ln [pSNNL(σ^t)/pBPD(σ^t)] = ±∞ on every path on which pBPD's posterior does not concentrate;
ii) ln [pSNNL(σ^t)/pBPD(σ^t)] ≍ 1 on every path on which pBPD's posterior concentrates exponentially fast.
Proposition 5 generalizes the finding of Massari (2015a) about the inefficiency of the Follow the Leader Strategy (i.e. pSNNL with γ = 0). It shows that, in those cases in which the posterior does not concentrate (a condition weaker than the “weak leader” condition that determines the inefficiency of the FLS), any amount of overreaction is detrimental while any amount of underreaction is desirable. Known asymptotic results in Bayesian statistics²¹ allow us to use Proposition 5 to easily recognize the probabilities that determine the (sub/super-)efficiency condition.²² For example, Proposition 5 can be used to analyze:
²⁰ By Proposition 5, pSNNL_2(σ^t)/pBPD_α(σ^t) = [(1/2)(2/3)^{t/γ} + (1/2)(1/3)^{t/γ}]^γ / [α(2/3)^t + (1−α)(1/3)^t] · e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))} ≍ 1.
²¹ If |P| < ∞, the Bayesian posterior does not concentrate if and only if there is more than one model with the same expected log-likelihood; otherwise, it concentrates exponentially fast in most standard settings (e.g. if members of P are either iid or conditionally iid).
²² Proposition 5 enormously simplifies this task. Even when traders' beliefs and the real measure are iid, the non-exchangeability of SNNL translates into a path-dependent dynamic of the posterior distribution of SNNL.
Case c: The real probability is iid: ∀t, P(a_t) = 1/2. Because p^1 and p^2 are equally (in)accurate, it is easy to show that the Bayesian posterior does not concentrate (e.g. Massari (2013)). Proposition 5 implies

pSNNL_0(σ^t)/pBPD(σ^t) → 0 P-a.s.,
pSNNL_1(σ^t)/pBPD(σ^t) ≍ 1 P-a.s.,
pSNNL_2(σ^t)/pBPD(σ^t) → +∞ P-a.s.;

the intuition goes as follows.
• pSNNL_1(σ_t|σ^{t−1}) coincides, in every period, with pBPD_{α=.5}(σ_t|σ^{t−1}); hence, it does as well as a Bayesian with a regular prior on P;
• pSNNL_2(σ_t|σ^{t−1}) smoothly oscillates between p^1 and p^2, but it spends more time “close to the middle” than pBPD_{α=.5}(σ_t|σ^{t−1}) does, because it underreacts to empirical evidence. Because the real distribution lies between p^1 and p^2, spending “more time close to the middle” makes pSNNL_2(σ_t|σ^{t−1})'s forecasts more accurate than Bayes';
• pSNNL_0(σ_t|σ^{t−1}) changes its forecasts discontinuously every time the model that performed best in the past changes. Thus, it spends less time “close to the middle” than pBPD does. Because the real distribution lies between p^1 and p^2, spending “less time close to the middle” makes pSNNL_0(σ_t|σ^{t−1})'s forecasts less accurate than Bayes'.
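The “closer to the middle” intuition is easy to verify on a single observation: the one-step forecast is monotonically pulled back toward 1/2 as γ grows. A sketch (Python, floating point; the Example 1 setup is hard-coded, and the grid of γ values is illustrative):

```python
import math

# Two iid models on {a, b} (Example 1): p1(a) = 1/3, p2(a) = 2/3.
P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def q(seq, gamma):
    # Eq. 12 with beta = 1
    return sum(c * lik(m, seq) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma

def snnl_cond(s, hist, gamma):
    return q(hist + s, gamma) / sum(q(hist + x, gamma) for x in 'ab')

# After a single 'a', every rule leans toward p2, but larger gamma
# (more underreaction) stays closer to 1/2, the real measure of Case c.
forecasts = [snnl_cond('a', 'a', g) for g in (0.5, 1.0, 2.0, 5.0)]
assert all(f > 0.5 for f in forecasts)
assert forecasts == sorted(forecasts, reverse=True)
assert math.isclose(forecasts[1], 5 / 9) and math.isclose(forecasts[2], 9 / 17)
```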
7 Conclusion
This paper uses the standard machinery of a dynamic general equilibrium model to generate
a rich class of probabilities and to discuss their properties. These rules are consistent with
known behavioral biases such as (over)underreaction and time inconsistency.
If the prior support contains a unique most accurate model, all price probabilities are mutually absolutely continuous with the probabilities obtained via Bayes' rule from a regular prior on the same support. However, if the prior support contains more than one most accurate model, a mild form of underreaction significantly improves the accuracy of the forecasts and outperforms the Bayesian standard. Our results suggest that underreaction not only is rational (because underreacting models are as good as Bayesian ones in terms of likelihood) but also constitutes a pragmatically valid alternative to Bayes' rule.
7.1 Future work
Our work is still preliminary. For these rules to constitute a real alternative to Bayes' rule, we should extend our analysis beyond the finite-prior-support case and study their performance in less dramatic cases of misspecification. Regarding this second point, Grunwald's (2012) findings on the hyper-compression that the Safe Bayesian can deliver when the misspecified model class is not convex (e.g. the family of linear regressions with k regressors, when the real data-generating process is heteroskedastic) suggest that the super-efficiency of SNNL might also hold in this relatively common case of misspecification. Finally, we should modify SNNL to search for its optimal learning rate (γ) on-line. It is easy to show that if the posterior does not concentrate, larger values of γ deliver more accurate forecasts.
However, this improvement in accuracy comes at a cost. Although, for every 1 < γ < ∞, the asymptotic log-likelihood ratio between pBPD and pSNNL is bounded above, this ratio increases monotonically in γ when the model is correctly specified (a slower learning rate implies a slower convergence rate). If the model is misspecified and the posterior does converge, the relative position of the projection of P on P determines whether larger values of γ improve or deteriorate SNNL's accuracy. In hindsight, we would like to use a small γ when past data clearly suggest the best model and a large one otherwise, and we would like to do this on-line. This line of reasoning is consistent with the Safe Bayesian approach (Grunwald (2012)) and the Flip-Flop algorithm (De Rooij et al. (2014)). The main difficulty is implementing this intuition within the equations governing SNNL.
A Appendix
The notation f(x) ≍ g(x) stands for lim sup f(x)/g(x) < +∞ and lim inf f(x)/g(x) > 0.
Lemma 2. In an economy that satisfies A1–A4, equilibrium prices are given by:

q(σ^t) = β^t [∑_{i∈I} p^i(σ^t) / u_i'(c^i_0)] / [∑_{j∈I} 1 / u_j'(c^j_t(σ))]   (7)

Proof. The Lagrangian associated with each trader's maximization problem is

L^i = E_{p^i}[∑_{t=0}^{∞} β^t u_i(c^i_t(σ))] + λ_i (∑_{t=0}^{∞} ∑_{σ^t∈S^t} q(σ^t)(c^i_t(σ) − e^i_t(σ))).

Setting the derivatives of this Lagrangian to zero gives, ∀σ, ∀t,

∂L^i/∂c^i_t(σ) = 0 ⇒ β^t p^i(σ^t) u_i'(c^i_t(σ)) = λ_i q(σ^t).   (8)

Letting q_0 = 1 (the price of one unit of consumption at t = 0 equals one), we find that λ_i = u_i'(c^i_0); the result follows by summing over traders and rearranging.
Proof of Propositions 1 and 2:

Proof. Substituting c^i_t(σ)^{−γ} for u_i'(c^i_t(σ)) in Equation 8, and recalling that λ_i = u_i'(c^i_0) = (c^i_0)^{−γ}:

β^t p^i(σ^t) c^i_t(σ)^{−γ} = (c^i_0)^{−γ} q(σ^t).   (9)

Taking the ratio of traders i and j's first-order conditions, β^t p^i(σ^t) c^i_t(σ)^{−γ} / [β^t p^j(σ^t) c^j_t(σ)^{−γ}] = (c^i_0)^{−γ} q(σ^t) / [(c^j_0)^{−γ} q(σ^t)], and solving for c^i_t(σ):

c^i_t(σ) = [p^i(σ^t)/p^j(σ^t)]^{1/γ} (c^i_0/c^j_0) c^j_t(σ).   (10)

Substituting Equation 10 into the market-clearing condition (which holds with equality because of the monotonicity of u_i), 1 = ∑_{i∈I} c^i_t(σ) = c^j_t(σ) ∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0 / [p^j(σ^t)^{1/γ} c^j_0], and solving for c^j_t(σ):

c^j_t(σ) = p^j(σ^t)^{1/γ} c^j_0 / ∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0.   (11)

Substituting c^j_t(σ) into Equation 9 and rearranging, we obtain

q(σ^t) = β^t (∑_{i∈I} p^i(σ^t)^{1/γ} c^i_0)^γ.   (12)

The result follows by substituting Equation 12 into the definitions of NNL and SNNL, respectively.
Lemma 3. In an iCRRA-economy that satisfies A1–A4:

∀σ ∈ S^∞, ∀t ∈ {1, ..., +∞}, γ'' ≥ 1 ≥ γ' ⇒ ∑_{σ_t∈S} q_{γ'}(σ_t|σ^{t−1})/β ≥ 1 ≥ ∑_{σ_t∈S} q_{γ''}(σ_t|σ^{t−1})/β,

with equality if and only if γ = 1 or the consumption-share distribution is degenerate.
Proof. Using Equation 12 and the definition of q(σ_t|σ^{t−1}),

∑_{σ_t∈S} q_γ(σ_t|σ^{t−1})/β = ∑_{σ_t∈S} (∑_{i∈I} p^i(σ^{t−1}, σ_t)^{1/γ} c^i_0)^γ / (∑_{i∈I} p^i(σ^{t−1})^{1/γ} c^i_0)^γ
= ∑_{σ_t∈S} [∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} · p^i(σ^{t−1})^{1/γ} c^i_0 / ∑_{i∈I} p^i(σ^{t−1})^{1/γ} c^i_0]^γ
= ∑_{σ_t∈S} [∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)]^γ   (by Eq. 11)
= ∑_{σ_t∈S} f(∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)),   with f(x) = x^γ.

Note that ∑_{i∈I} c^i_{γ,t−1}(σ) = 1, and f(·) is strictly concave ⇔ γ < 1, linear ⇔ γ = 1, and strictly convex ⇔ γ > 1. Let γ > 1; by Jensen's inequality:

∑_{σ_t∈S} f(∑_{i∈I} p^i(σ_t|σ^{t−1})^{1/γ} c^i_{γ,t−1}(σ)) ≤ ∑_{σ_t∈S} ∑_{i∈I} c^i_{γ,t−1}(σ) f(p^i(σ_t|σ^{t−1})^{1/γ})   (equality iff ∃i : c^i_{γ,t−1}(σ) = 1)
= ∑_{i∈I} c^i_{γ,t−1}(σ) ∑_{σ_t∈S} p^i(σ_t|σ^{t−1}) = 1,   because ∀i, ∑_{σ_t∈S} p^i(σ_t|σ^{t−1}) = 1.

The cases γ = 1 and γ < 1 can be proven using the same logic.
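Lemma 3 can be sanity-checked numerically in the two-model economy of Example 1 (a sketch in Python, floating point, with β = 1; the history `'a'` gives non-degenerate consumption shares, so the inequalities are strict):

```python
import math

P = [{'a': 1/3, 'b': 2/3}, {'a': 2/3, 'b': 1/3}]
C0 = [0.5, 0.5]

def lik(m, seq):
    out = 1.0
    for s in seq:
        out *= m[s]
    return out

def q(seq, gamma):
    # Eq. 12: q(sigma^t) = (sum_i p^i(sigma^t)^(1/gamma) c^i_0)^gamma
    return sum(c * lik(m, seq) ** (1 / gamma) for m, c in zip(P, C0)) ** gamma

def cond_mass(hist, gamma):
    """Sum over the next state of q(sigma_t | sigma^(t-1)), beta = 1."""
    return sum(q(hist + s, gamma) for s in 'ab') / q(hist, gamma)

hist = 'a'  # non-degenerate consumption shares after one observation
assert cond_mass(hist, 0.5) > 1             # gamma < 1: super-additive
assert math.isclose(cond_mass(hist, 1.0), 1.0)  # gamma = 1: a probability
assert cond_mass(hist, 2.0) < 1             # gamma > 1: sub-additive
```

This is exactly why SNNL must renormalize the conditionals whenever γ ≠ 1.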
Proof of Lemma 1

Proof.

pSNNL(σ_t|σ^{t−1}) = q(σ_t|σ^{t−1}) / ∑_{σ_t} q(σ_t|σ^{t−1})
= q(σ_t|σ^{t−1}) · [q(σ^{t−1}) / ∑_{σ^t} q(σ^t)] · [∑_{σ^t} q(σ^t) / q(σ^{t−1})] · [1 / ∑_{σ_t} q(σ_t|σ^{t−1})]
= [q(σ^t) / ∑_{σ^t} q(σ^t)] · 1 / [∑_{σ_t} q(σ^{t−1}, σ_t) / ∑_{σ^t} q(σ^t)]
= pNNL(σ^t) / ∑_{σ_t} pNNL(σ^{t−1}, σ_t).
Proof of Proposition 3

Proof. By Eq. 11, the consumption of the trader with maximum likelihood on σ^t, i(σ^t), is given by c^{i(σ^t)}_t(σ) = p^{i(σ^t)}(σ^t)^{1/γ} c^i_0 / ∑_{j∈I} p^j(σ^t)^{1/γ} c^j_0. Claims i, ii, iii follow by noticing that pSNNL_{γ=1} = pBPD and that a higher γ corresponds to a lower weight on c^{i(σ^t)}_t:

∂c^{i(σ^t)}_t(σ)/∂γ = − c^i_0 p^i(σ^t)^{1/γ} ∑_{j≠i(σ^t)} c^j_0 p^j(σ^t)^{1/γ} ln[p^i(σ^t)/p^j(σ^t)] / [γ² (∑_{j∈I} p^j(σ^t)^{1/γ} c^j_0)²] < 0.
Proof of Proposition 4:

Proof. (a): In a log economy pNNL = pSNNL = pBPD, which is prequential and exchangeable.

(b): i) Exchangeable by construction, because the denominator in Eq. 7 is constant on every horizon. Not prequential: by contradiction, assume H0: pNNL is prequential for all γ ∈ [0,∞].

∀σ^t, q(σ^t)/q(σ^{t−1}) = q(σ_t|σ^{t−1})   (equilibrium condition)
⇔ ∀σ^t, pNNL(σ^t)/pNNL(σ^{t−1}) = [q(σ^t)/∑_{σ^t} q(σ^t)] / [q(σ^{t−1})/∑_{σ^{t−1}} q(σ^{t−1})] = q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ ∀σ^{t−1}, ∑_{σ_t} pNNL(σ^{t−1}, σ_t)/pNNL(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ (if H0 is true) ∀σ^{t−1}, 1 = ∑_{σ_t} q(σ_t|σ^{t−1}) ∑_{σ^{t−1}} q(σ^{t−1}) / ∑_{σ^t} q(σ^t)
⇔ ∀σ^{t−1}, ∑_{σ^t} q(σ^t) / ∑_{σ^{t−1}} q(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ^{t−1})
⇔ ∀σ^{t−1}, σ̃^{t−1}, ∑_{σ_t} q(σ_t|σ^{t−1}) = ∑_{σ^t} q(σ^t) / ∑_{σ^{t−1}} q(σ^{t−1}) = ∑_{σ_t} q(σ_t|σ̃^{t−1})
⇔ ∀σ^{t−1}, σ̃^{t−1}, ∑_{σ_t} [∑_{i∈P} p^i(σ_t|σ^{t−1})^{1/γ} p^i(σ^{t−1})^{1/γ} / ∑_{j∈P} p^j(σ^{t−1})^{1/γ}]^γ = ∑_{σ_t} [∑_{i∈P} p^i(σ_t|σ̃^{t−1})^{1/γ} p^i(σ̃^{t−1})^{1/γ} / ∑_{j∈P} p^j(σ̃^{t−1})^{1/γ}]^γ,

which holds if and only if γ = 1 (i.e. all traders have log utility), contradicting H0.

ii) Prequential by construction (because the measure is constructed recursively in a forward fashion). Not exchangeable: by De Finetti's (1931) theorem, a measure on infinite sequences is exchangeable if and only if it can be represented as a mixture of iid measures, if and only if there exists a prior such that it coincides with BPD. The (sub/super-)efficiency of SNNL proved in Theorem 2 implies that no such prior exists if γ ≠ 1; thus SNNL is not exchangeable.
Lemma 4. In an economy that satisfies A1–A4, ∀σ ∈ S^∞: ∑_{i∈I} 1/u_i'(c^i_t(σ)) ≍ 1.

Proof.
• ∀σ ∈ S^∞, lim sup ∑_{i∈I} 1/u_i'(c^i(σ)) ≤ max_{[c^1,...,c^I]} ∑_{i∈I} 1/u_i'(c^i) < |I| max_i 1/u_i'(1) < ∞, because market clearing implies max_i c^i ≤ 1, and A1 implies ∀i, max_{c≤1} 1/u_i'(c) = 1/u_i'(1) < ∞.
• ∀σ ∈ S^∞, lim inf ∑_{i∈I} 1/u_i'(c^i_t(σ)) ≥ min_{[c^1,...,c^I]} ∑_{i∈I} 1/u_i'(c^i) > 0, because ∑_{i∈I} 1/u_i'(c^i) = 0 if and only if ∀i, u_i'(c^i) = ∞ ⇔ (by A1) ∀i, c^i = 0, which violates market clearing (∀t, ∑_{i∈I} c^i = ∑_{i∈I} e^i = 1, by A3).
Proof of Theorem 1:

Proof. Substituting Equation 7 into the definition of NNL:

pNNL(σ^t) = q(σ^t) / ∑_{σ^t} q(σ^t) = [β^t ∑_{i∈I} p^i(σ^t) c^i_0 / ∑_{i∈I} 1/u_i'(c(σ^t))] / [β^t ∑_{σ^t∈S^t} (∑_{i∈I} p^i(σ^t) c^i_0 / ∑_{i∈I} 1/u_i'(c(σ^t)))].

Let a = lim inf ∑_{i∈I} 1/u_i'(c(σ^t)) and b = lim sup ∑_{i∈I} 1/u_i'(c(σ^t)); by Lemma 4, 0 < a ≤ b < ∞, thus

pNNL(σ^t) ∈ [ (β^t ∑_{i∈I} p^i(σ^t) c^i_0 / b) / (β^t ∑_{σ∈S^t} ∑_{i∈I} p^i(σ^t) c^i_0 / a), (β^t ∑_{i∈I} p^i(σ^t) c^i_0 / a) / (β^t ∑_{σ∈S^t} ∑_{i∈I} p^i(σ^t) c^i_0 / b) ]

⇒ pNNL(σ^t) ∈ [ (a/b) ∑_{i∈I} p^i(σ^t) c^i_0, (b/a) ∑_{i∈I} p^i(σ^t) c^i_0 ]

⇔ ln [pBPD(σ^t)/pNNL(σ^t)] ≍ 1.
Proof of Theorem 2:

Proof. Rewrite ln [pBPD(σ^t)/pSNNL(σ^t)] as follows:

ln pBPD(σ^t)/pSNNL(σ^t) = ∑_{τ=1}^{t} ∑_{σ_τ} I_{σ_τ} ln [ pBPD(σ_τ|σ^{τ−1}) / (q(σ_τ|σ^{τ−1}) / ∑_{σ_τ} q(σ_τ|σ^{τ−1})) ] = ln [β^t pBPD(σ^t)/q(σ^t)] + ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β).

The result follows from these two claims:

• Claim 1: ln [β^t pBPD(σ^t)/q(σ^t)] ≍ 1:
By Eq. 7, q(σ^t) = β^t [∑_{i∈I} p^i(σ^t)/u_i'(c^i_0)] / [∑_{j∈I} 1/u_j'(c^j(σ^t))] ≍ (by Lem. 4) β^t ∑_{i∈I} p^i(σ^t)/|I| ≍ β^t pBPD(σ^t).

• Claim 2: ∃P : γ < (>) 1 ⇒ ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → +(−)∞.
Note that:
i) ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → +(−)∞ iff the consumption shares C_{t,γ} do not concentrate on a unique trader fast enough.
Proof: By Lem. 3 (Jensen's inequality), γ < (>) 1 ⇔ ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) ≥ (≤) 0, with equality iff the consumption-share distribution is degenerate. Thus, given γ, all terms of the sum have the same sign.

ii) pBPD's posterior C_{t,γ=1} does not concentrate on a unique model iff, ∀γ ∈ (0,+∞), C_{t,γ} does not concentrate on a unique trader. Proof:²³

C_{t,γ=1} does not concentrate
⇔ ∃η > 0, ∃i, j ∈ P : lim sup p^i(σ^t)/∑_{i∈P} p^i(σ^t) > η and lim sup p^j(σ^t)/∑_{i∈P} p^i(σ^t) > η
⇔ ∃η_γ > 0 : lim sup p^i(σ^t)^{1/γ}/∑_{i∈P} p^i(σ^t)^{1/γ} = c^i_{γ,t}(σ) > η_γ and lim sup p^j(σ^t)^{1/γ}/∑_{i∈P} p^i(σ^t)^{1/γ} = c^j_{γ,t}(σ) > η_γ
⇔ C_{t,γ} does not concentrate.

iii) Because P has finitely many models, there exists a sequence on which the two most accurate models in P have comparable likelihoods (it can be constructed recursively by choosing the next realization to favor the model with the lower likelihood). Alternatively, a non-degenerate measure that satisfies this condition can be constructed using Chernoff's bound (Cover and Thomas (2012)).
Proof of Proposition 5:

Proof. As in the proof of Th. 2: ln [pBPD(σ^t)/pSNNL(σ^t)] = ln [β^t pBPD(σ^t)/q(σ^t)] + ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β).

By Claim 1 in the proof of Th. 2, ln [β^t pBPD(σ^t)/q(σ^t)] ≍ 1. For the second term we have:
• Part i) mimics the steps of Th. 2, except that non-concentration is now assumed.
• Part ii) follows because, if the concentration rate is exponential, a Taylor expansion ensures that ∃ 0 < a < b < 1 : |ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β)| ∈ [a^τ, b^τ], so that ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) ≍ 1.
For a practical example of the details, consider Example 1:

e^{−∑_{τ=1}^{t} ln(q(a|σ^{τ−1}) + q(b|σ^{τ−1}))}
= exp{ −∑_{τ=1}^{t} ln [ ((1/2)(2/3)^{τ/γ} + (1/2)(1/3)^{τ/γ})^γ + ((1/2)(1/3)^{1/γ}(2/3)^{(τ−1)/γ} + (1/2)(2/3)^{1/γ}(1/3)^{(τ−1)/γ})^γ ] / ((1/2)(2/3)^{(τ−1)/γ} + (1/2)(1/3)^{(τ−1)/γ})^γ }
= exp{ −∑_{τ=1}^{t} ln [ ((2/3)^{1/γ} + (1/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ + ((1/3)^{1/γ} + (2/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ ] / (1 + (1/2)^{(τ−1)/γ})^γ }.
²³ The proof differs slightly for γ = 0 because we need the stronger condition that the model with the highest likelihood changes infinitely often to ensure ∑_{τ=1}^{t} ln (∑_{σ_τ} q(σ_τ|σ^{τ−1})/β) → ±∞ (see Massari (2015a)).
Taylor expanding the two terms in the numerator around (2/3)^{1/γ} and (1/3)^{1/γ}, and the term in the denominator around 1, respectively, it follows that ∃η ∈ (0, 1/2):

exp{ −∑_{τ=1}^{t} ln [ ((2/3)^{1/γ} + (1/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ + ((1/3)^{1/γ} + (2/3)^{1/γ}(1/2)^{(τ−1)/γ})^γ ] / (1 + (1/2)^{(τ−1)/γ})^γ } ∈ [ e^{−∑_{τ=1}^{t}(1/2+η)^τ}, e^{−∑_{τ=1}^{t}(1/2−η)^τ} ] ≍ 1.
References
Araujo, A. and Sandroni, A. (1999). On the convergence to homogeneous expectations when markets arecomplete. Econometrica, 67(3):663–672.
Binmore, K. (2008). Rational decisions. Princeton University Press.
Blume, L. and Easley, D. (1993). Economic natural selection. Economics Letters, 42(2):281–289.
Blume, L. and Easley, D. (2006). If you’re so smart, why aren’t you rich? belief selection in complete andincomplete markets. Econometrica, 74(4):929–966.
Blume, L. and Easley, D. (2009). The market organism: long-run survival in markets with heterogeneoustraders. Journal of Economic Dynamics and Control, 33(5):1023–1035.
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356):791–799.
Cover, T. M. and Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.
Dawid, A. P. (1984). Present position and potential developments: Some personal views: Statistical theory:The prequential approach. Journal of the Royal Statistical Society. Series A (General), pages 278–292.
Dawid, A. P. (1985). Calibration-based empirical probability. The Annals of Statistics, pages 1251–1274.
De Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti del Congresso Internazionaledei matematici, Bologna.
De Rooij, S., Van Erven, T., Grunwald, P. D., and Koolen, W. M. (2014). Follow the leader if you can,hedge if you must. The Journal of Machine Learning Research, 15(1):1281–1316.
Epstein, L. G. (2006). An axiomatic model of non-Bayesian updating. Review of Economic Studies, pages 413–436.

Epstein, L. G. and Le Breton, M. (1993). Dynamically consistent beliefs must be Bayesian. Journal of Economic Theory, 61(1):1–22.

Epstein, L. G., Noor, J., and Sandroni, A. (2008). Non-Bayesian updating: a theoretical framework. Theoretical Economics, 3(2):193–229.
Foster, D. P. and Vohra, R. (1999). Regret in the on-line decision problem. Games and Economic Behavior,29(1):7–35.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an appli-cation to boosting. Journal of computer and system sciences, 55(1):119–139.
Friedman, M. (1953). Essays in positive economics, volume 231. University of Chicago Press.
Ghirardato, P. (2002). Revisiting savage in a conditional world. Economic Theory, 20(1):83–92.
Gilboa, I. and Marinacci, M. (2011). Ambiguity and the bayesian paradigm. Chapter, 7:179–242.
Grunwald, P. (2007). The minimum description length principle. MIT press.
Grunwald, P. (2012). The Safe Bayesian. In Algorithmic Learning Theory, pages 169–183. Springer.
Kahneman, D. (2011). Thinking, fast and slow. Macmillan.
Kets, W., Pennock, D. M., Sethi, R., and Shah, N. (2014). Betting strategies, market selection, and thewisdom of crowds. Market Selection, and the Wisdom of Crowds (February 20, 2014).
Kolmogorov, A. (1933). Grundbegriffe der Wahrscheinlichkeitstheorie. Ergebnisse Mathematische, 2.

Lehrer, E. and Teper, R. (2016). Who is a Bayesian? Working paper.
Ljungqvist, L. and Sargent, T. J. (2004). Recursive macroeconomic theory. MIT press.
Massari, F. (2013). Comment on if you’re so smart, why aren’t you rich? belief selection in complete andincomplete markets. Econometrica, 81(2):849–851.
Massari, F. (2015a). Do not follow a weak leader. Available at SSRN 2663223.
Massari, F. (2015b). Market selection in large economies: A matter of luck. Available at SSRN 2559468.
Pascal, B. (1668). Pascal’s Pensees (English translation by John Walker). Available online athttp://www.gutenberg.org/files/18269/18269-h/18269-h.htm.
Peleg, B. and Yaari, M. E. (1970). Markets with countably many commodities. International EconomicReview, 11(3):369–377.
Rabin, M. et al. (2000). Inference by believers in the law of small numbers. Institute of Business andEconomic Research.
Rissanen, J. (1986). Stochastic complexity and modeling. The Annals of Statistics, pages 1080–1100.
Roos, T. and Rissanen, J. (2008). On sequentially normalized maximum likelihood models. Workshop onInformation Theoretic Methods in Science and Engineering.
Rubinstein, M. (1974). An aggregation theorem for securities markets. Journal of Financial Economics,1(3):225–244.
Sandroni, A. (2000). Do markets favor agents able to make accurate predictions? Econometrica, 68(6):1303–1341.
Savage, L. J. (1954). The foundations of statistics. John Wiley & Sons.
Shtar’kov, Y. M. (1987). Universal sequential coding of single messages. Problemy Peredachi Informatsii,23(3):3–17.
Vovk, V. G. (1990). Aggregating strategies. In Proc. Third Workshop on Computational Learning Theory,pages 371–383. Morgan Kaufmann.