Abstract
We offer a formal treatment of choice behaviour based on the premise that agents minimise the expected free energy of future outcomes. Crucially, the negative free energy or quality of a policy can be decomposed into extrinsic and epistemic (intrinsic) value. Minimising expected free energy is therefore equivalent to maximising extrinsic value or expected utility (defined in terms of prior preferences or goals), while maximising information gain or intrinsic value; i.e., reducing uncertainty about the causes of valuable outcomes. The resulting scheme resolves the exploration-exploitation dilemma: epistemic value is maximised until there is no further information gain, after which exploitation is assured through maximisation of extrinsic value. This is formally consistent with the Infomax principle, generalising formulations of active vision based upon salience (Bayesian surprise) and optimal decisions based on expected utility and risk-sensitive (KL) control. Furthermore, as with previous active inference formulations of discrete (Markovian) problems, ad hoc softmax parameters become the expected (Bayes-optimal) precision of beliefs about, or confidence in, policies. We focus on the basic theory, illustrating the minimisation of expected free energy using simulations. A key aspect of this minimisation is the similarity of precision updates and dopaminergic discharges observed in conditioning paradigms.
Active inference and epistemic value
Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys, Thomas FitzGerald and Giovanni Pezzulo
Premise
·All agents minimise free energy (under a generative model)
·All agents possess prior beliefs (preferences)
·Free energy is minimised when priors are (actively) realised
·All agents believe they will minimise (expected) free energy
Set-up and definitions: active inference
Definition: Active inference rests on the tuple $(\Omega, P, Q, R, S, A, U)$:

·A finite set of observations $\Omega$
·A finite set of actions $A$
·A finite set of hidden states $S$
·A finite set of control states $U$
·A generative process $R(\tilde{o},\tilde{s},\tilde{a}) = \Pr(\{o_0,\dots,o_t\}=\tilde{o},\ \{s_0,\dots,s_t\}=\tilde{s},\ \{a_0,\dots,a_t\}=\tilde{a})$ over observations $o \in \Omega$, hidden states $s \in S$ and action $a \in A$
·A generative model $P(\tilde{o},\tilde{s},\tilde{u} \mid m) = \Pr(\{o_0,\dots,o_T\}=\tilde{o},\ \{s_0,\dots,s_T\}=\tilde{s},\ \{u_t,\dots,u_T\}=\tilde{u})$ over observations $o \in \Omega$, hidden $s \in S$ and control $u \in U$ states, with parameters $\theta$
·An approximate posterior $Q(\tilde{s},\tilde{u}) = \Pr(\{s_0,\dots,s_T\}=\tilde{s},\ \{u_t,\dots,u_T\}=\tilde{u})$ over hidden and control states with sufficient statistics $(\tilde{s}^{\pi}, \pi)$, where $\pi \in \{1,\dots,K\}$ is a policy that indexes a sequence of control states $(u_t,\dots,u_T) = \pi$

Free energy

$$F(\tilde{o},\tilde{s},\pi) = E_Q[-\ln P(\tilde{o},\tilde{s},\tilde{u} \mid m)] - H[Q(\tilde{s},\tilde{u})] = -\ln P(\tilde{o} \mid m) + D_{KL}[Q(\tilde{s},\tilde{u}) \,\|\, P(\tilde{s},\tilde{u} \mid \tilde{o})]$$

Perception-action cycle (figure): the world generates outcomes through the generative process $R$, while the agent entails a generative model $P$ and an approximate posterior $Q$. Perception minimises free energy with respect to the sufficient statistics, $(\tilde{s},\pi) = \arg\min F(\tilde{o},\tilde{s},\pi)$, and action realises posterior beliefs about control, $\Pr(a_t = u_t) = Q(u_t)$, with $o_t \in \Omega$, $a_t \in A$ and $s_t \in S$.
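The free-energy identity above can be checked numerically on a toy discrete example; the joint distribution, observation and approximate posterior below are illustrative values, not taken from the slides.

```python
import numpy as np

P_joint = np.array([[0.3, 0.1],   # toy P(o, s): rows index o, columns index s
                    [0.2, 0.4]])
o = 0                              # observed outcome (row index)
Q = np.array([0.5, 0.5])           # approximate posterior over hidden states

# Free energy: F = E_Q[ln Q(s) - ln P(o, s)]
F = np.sum(Q * (np.log(Q) - np.log(P_joint[o])))

# Equivalent decomposition: F = -ln P(o) + KL[Q(s) || P(s | o)]
P_o = P_joint[o].sum()             # model evidence P(o)
P_post = P_joint[o] / P_o          # exact posterior P(s | o)
kl = np.sum(Q * (np.log(Q) - np.log(P_post)))

assert np.isclose(F, -np.log(P_o) + kl)
assert F >= -np.log(P_o)           # free energy bounds negative log evidence
```

Because the KL term is non-negative, free energy is an upper bound on surprise (negative log evidence), and is minimised when the approximate posterior matches the exact one.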
An example:

Hidden states: low offer, high offer, offer withdrawn, low offer accepted, high offer accepted. Control states: $u = 0$ (reject or stay) and $u = 1$ (accept or shift). Transitions among hidden states depend on the current control state, $P(s_{t+1} \mid s_t, u_t)$, while beliefs about control depend on the current hidden state, $P(u_{t+1} \mid u_t, s_t)$.

$$P(s_{t+1} \mid s_t, u_t = 0) = \begin{pmatrix} r & 0 & 0 & 0 & 0 \\ q & 0 & 0 & 0 & 0 \\ p & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad
P(s_{t+1} \mid s_t, u_t = 1) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$

where $p$ is the probability that the offer is withdrawn, $q$ the probability that a low offer becomes a high offer, and $r = 1 - p - q$.
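The transition matrices above can be sketched and sanity-checked numerically; the state ordering (low offer, high offer, withdrawn, accepted-low, accepted-high) and $r = 1 - p - q$ are my reading of the slide.

```python
import numpy as np

p, q = 0.1, 0.2          # withdrawal and low-to-high transition probabilities
r = 1 - p - q

B_stay = np.array([      # u = 0: reject / stay
    [r, 0, 0, 0, 0],
    [q, 0, 0, 0, 0],
    [p, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1]])

B_accept = np.array([    # u = 1: accept / shift
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0]])

# Each column is a distribution over next states, so columns must sum to one.
assert np.allclose(B_stay.sum(axis=0), 1)
assert np.allclose(B_accept.sum(axis=0), 1)
```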
The (normal form) generative model

$$P(\tilde{o},\tilde{s},\tilde{u},\gamma \mid \tilde{a}, m) = P(\tilde{o} \mid \tilde{s})\, P(\tilde{s} \mid \tilde{a})\, P(\tilde{u} \mid \gamma)\, P(\gamma \mid m)$$

Likelihood:
$$P(\tilde{o} \mid \tilde{s}) = P(o_t \mid s_t) \cdots P(o_1 \mid s_1)\, P(o_0 \mid s_0)$$

Empirical priors – hidden states:
$$P(\tilde{s} \mid \tilde{a}) = P(s_t \mid s_{t-1}, a_{t-1}) \cdots P(s_1 \mid s_0, a_0)\, P(s_0 \mid m)$$

Full priors – control states:
$$P(\tilde{u} \mid \gamma) = \sigma(\gamma \cdot \mathbf{Q}), \qquad \mathbf{Q}(\pi) = E_{\tilde{Q}}[\ln P(\tilde{o},\tilde{s} \mid \pi) - \ln Q(\tilde{s} \mid \pi)]$$

with parameters

$$P(o_t \mid s_t) = \mathbf{A}, \quad P(s_{t+1} \mid s_t, u_t) = \mathbf{B}(u_t), \quad P(o_\tau \mid m) = \mathbf{C}, \quad P(s_0 \mid m) = \mathbf{D}, \quad P(\gamma \mid m) = \Gamma(\alpha, \beta)$$

Figure: factor graph of the model, comprising hidden states $s_{t-1}, s_t, s_{t+1}$, outcomes $o_t, o_{t+1}$, past actions $a_0, \dots, a_{t-1}$, future control states $u_t, \dots, u_T$ and precision $\gamma$, connected through the parameters $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$.
Priors over policies

$$P(u_t, \dots, u_T = \pi \mid \gamma) = \sigma(\gamma \cdot \mathbf{Q}(\pi))$$

KL or risk-sensitive control
In the absence of ambiguity:
$$\mathbf{Q}(\pi) = E_{Q(s \mid \pi)}[\ln P(s \mid m) - \ln Q(s \mid \pi)] = -D_{KL}[Q(s \mid \pi) \,\|\, P(s \mid m)]$$

Expected utility theory
In the absence of posterior uncertainty or risk:
$$\mathbf{Q}(\pi) = E_{Q(o \mid \pi)}[\ln P(o \mid m)]$$
This is extrinsic value or expected utility.

Bayesian surprise and Infomax
In the absence of prior beliefs about outcomes:
$$\mathbf{Q}(\pi) = E_{Q(o \mid \pi)}\big[D_{KL}[Q(s \mid o, \pi) \,\|\, Q(s \mid \pi)]\big] = D_{KL}[Q(s, o \mid \pi) \,\|\, Q(s \mid \pi)\, Q(o \mid \pi)]$$
Expected Bayesian surprise is the predictive mutual information between hidden states and outcomes.
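The prior over policies is a softmax of quality scaled by precision, $\sigma(\gamma \cdot \mathbf{Q})$. A minimal sketch with made-up quality values, showing that higher precision sharpens the distribution toward the best policy:

```python
import numpy as np

def policy_prior(Q, gamma):
    """sigma(gamma * Q): softmax prior over policies."""
    z = gamma * (Q - Q.max())        # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

Q = np.array([-2.0, -1.0, -1.5])     # toy qualities (negative expected free energy)

low = policy_prior(Q, gamma=0.5)
high = policy_prior(Q, gamma=4.0)

assert np.isclose(low.sum(), 1) and np.isclose(high.sum(), 1)
assert high[1] > low[1]              # more precision: more deterministic choice
```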
Prior beliefs about policies

The quality of a policy corresponds to (negative) expected free energy:

$$\mathbf{Q}(\pi) = E_{\tilde{Q}(o,s \mid \pi)}[\ln \tilde{P}(o,s \mid \pi)] + H[Q(s \mid \pi)]$$

where beliefs about future outcomes and states can be specified as $\tilde{P}(o,s \mid \pi) = Q(s \mid o,\pi)\, P(o \mid m)$ or $\tilde{P}(o,s \mid \pi) = P(o \mid s)\, P(s \mid m)$. Under the first, the quality decomposes into extrinsic and epistemic value:

$$\mathbf{Q}(\pi) = E_{\tilde{Q}}[\ln Q(s \mid o,\pi) + \ln P(o \mid m) - \ln Q(s \mid \pi)] = \underbrace{E_{\tilde{Q}(o \mid \pi)}[\ln P(o \mid m)]}_{\text{extrinsic value}} + \underbrace{E_{\tilde{Q}(o \mid \pi)}\big[D_{KL}[Q(s \mid o,\pi) \,\|\, Q(s \mid \pi)]\big]}_{\text{epistemic value (Bayesian surprise, predicted mutual information)}}$$

Equivalently, in terms of predicted ambiguity and divergence:

$$\mathbf{Q}(\pi) = -\underbrace{E_{Q(s \mid \pi)}\big[H[P(o \mid s)]\big]}_{\text{predicted ambiguity}} - \underbrace{D_{KL}[Q(o \mid \pi) \,\|\, P(o \mid m)]}_{\text{predicted divergence}}$$

Prior preferences (goals) over future outcomes are defined by $C(o \mid m) = \ln P(o \mid m)$: a generative model of future states, as opposed to a future generative model of states.
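The extrinsic/epistemic decomposition can be illustrated numerically for a single future time step; the likelihood, predicted states and preferences below are my toy values, not from the slides.

```python
import numpy as np

A = np.array([[0.9, 0.1],     # P(o | s): rows are outcomes, columns states
              [0.1, 0.9]])
Qs = np.array([0.5, 0.5])     # predicted hidden states Q(s | pi)
lnC = np.log([0.7, 0.3])      # prior preferences ln P(o | m)

Qo = A @ Qs                   # predicted outcomes Q(o | pi)

# Extrinsic value: E_{Q(o)}[ln P(o | m)]
extrinsic = Qo @ lnC

# Epistemic value: E_{Q(o)}[ KL[Q(s | o, pi) || Q(s | pi)] ]
Qso = (A * Qs) / Qo[:, None]  # Q(s | o) by Bayes rule
epistemic = np.sum(Qo[:, None] * Qso * np.log(Qso / Qs))

quality = extrinsic + epistemic
assert epistemic >= 0         # information gain is never negative
```

With an uninformative prior over states, the epistemic term here is exactly the mutual information between outcomes and hidden states.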
The mean field partition

$$Q(\tilde{s}, \tilde{u}, \gamma) = Q(s_0 \mid \pi) \cdots Q(s_T \mid \pi)\, Q(u_t, \dots, u_T = \pi)\, Q(\gamma), \qquad Q(\gamma) = \Gamma(\alpha, \beta)$$

And variational updates: each marginal is proportional to the exponential of the expected log joint under the remaining factors,

$$Q(s_t \mid \pi) \propto \exp\big(E_{Q_{/s_t^{\pi}}}[\ln P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m)]\big)$$
$$Q(\pi) \propto \exp\big(E_{Q_{/\pi}}[\ln P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m)]\big)$$
$$Q(\gamma) \propto \exp\big(E_{Q_{/\gamma}}[\ln P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m)]\big)$$
Minimising free energy
Variational updates

Perception:
$$\vec{s}_t = \sigma\big(\ln \mathbf{A} \cdot o_t + \ln(\mathbf{B}(a_{t-1}) \cdot \vec{s}_{t-1})\big)$$

Policy selection:
$$\boldsymbol{\pi} = \sigma(\gamma \cdot \mathbf{Q})$$

Precision:
$$\gamma = \alpha / \beta_{\gamma}, \qquad \beta_{\gamma} = \beta - \boldsymbol{\pi} \cdot \mathbf{Q}$$

Action selection:
$$\Pr(a_t = u_t) = Q(u_t)$$

The quality of each policy accumulates predicted uncertainty and predicted divergence over future time points, using forward sweeps over future states, $s_{\tau+1}^{\pi} = \mathbf{B}(\pi_\tau)\, s_\tau^{\pi}$:

$$\mathbf{Q}(\pi) = \sum_{\tau = t}^{T} \Big[ \underbrace{s_\tau^{\pi} \cdot \mathrm{diag}(\mathbf{A}^T \ln \mathbf{A})}_{\text{predicted uncertainty}} - \underbrace{\mathbf{A} s_\tau^{\pi} \cdot \big(\ln(\mathbf{A} s_\tau^{\pi}) - \ln \mathbf{C}_\tau\big)}_{\text{predicted divergence}} \Big]$$

Functional anatomy (figure): the updates map onto reciprocal message passing between occipital cortex (perception of outcomes $o_t$ and states $s_t$), prefrontal cortex and hippocampus (forward sweeps over future states and policy evaluation), striatum and motor cortex (action selection $a_t$), and midbrain (precision $\gamma$).
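A minimal sketch of the perception update, $s_t = \sigma(\ln \mathbf{A} \cdot o_t + \ln(\mathbf{B}(a_{t-1}) \cdot s_{t-1}))$, with toy likelihood and transition matrices of my choosing:

```python
import numpy as np

def softmax(x):
    w = np.exp(x - x.max())
    return w / w.sum()

A = np.array([[0.8, 0.2],       # likelihood P(o | s)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],       # transitions P(s_t | s_{t-1}, a)
              [0.1, 0.9]])

s_prev = np.array([0.5, 0.5])   # previous posterior over states
o = np.array([1.0, 0.0])        # current observation (one-hot)

# Combine sensory evidence (ln A^T o) with the empirical prior (ln B s_prev)
s = softmax(np.log(A.T @ o) + np.log(B @ s_prev))

assert np.isclose(s.sum(), 1)
assert s[0] > s[1]              # evidence for state 0 dominates
```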
The (T-maze) problem

Figure: a T-maze with a central starting location, two arms (one baited with a reward), and a cue location whose stimulus discloses which arm is baited.

Generative model

Hidden states: four locations (centre, right arm, left arm, cue location) by two contexts (reward on the right or on the left), where context is conserved within a trial. Control states $u_t, \dots, u_T$: movements to each of the four locations.

Transitions act on location and conserve context; for example, for the control state that moves to the second location:

$$P(s_{t+1} \mid s_t, u_t) = \mathbf{B}(u_t) \otimes I_2, \qquad \mathbf{B}(u = 2) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Note that the arm locations are absorbing: once an arm has been visited, the agent stays there.
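The factorisation of T-maze transitions into location and (conserved) context can be sketched with a Kronecker product; the $u = 2$ location matrix is my reading of the slide.

```python
import numpy as np

# Location transitions for u = 2 ("go to the second location");
# arm locations (rows/columns 2 and 3) are absorbing.
B_loc = np.array([
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0]])

# Full transition over 8 = 4 x 2 hidden states: location changes, context
# is conserved, hence the Kronecker product with the 2 x 2 identity.
B = np.kron(B_loc, np.eye(2))

assert B.shape == (8, 8)
assert np.allclose(B.sum(axis=0), 1)   # column-stochastic
```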
Observations

Outcomes comprise the agent's location and the stimuli it encounters: an uninformative cue (CS) at the centre, a US (reward) or NS (no reward) in the arms, and a context-specific CS at the cue location. Schematically, with one observation block per location and columns indexing the two contexts:

$$P(o_t \mid s_t) = \mathbf{A} = [\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3, \mathbf{A}_4]$$

$$\mathbf{A}_1 = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix} \quad
\mathbf{A}_2 = \begin{pmatrix} a & 1-a \\ 1-a & a \end{pmatrix} \quad
\mathbf{A}_3 = \begin{pmatrix} 1-a & a \\ a & 1-a \end{pmatrix} \quad
\mathbf{A}_4 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

where $a$ is the probability of finding the reward in the baited arm.

Priors

Prior preferences over (final) outcomes, $\ln P(o_T \mid m) = \mathbf{C}$, assign a relative log-preference of $c$ to the US and $-c$ to its absence. Initial beliefs place the agent at the centre with equiprobable contexts:

$$P(s_0 \mid m) = \mathbf{D} = (1, 0, 0, 0)^T \otimes (\tfrac12, \tfrac12)^T$$

Priors over control follow the quality of policies, $P(\tilde{u} \mid \gamma) = \sigma(\gamma \cdot \mathbf{Q})$, where

$$\mathbf{Q}(\pi) = \underbrace{E_{Q(o \mid \pi)}[\ln P(o \mid m)]}_{\text{extrinsic value}} + \underbrace{E_{Q(o \mid \pi)}\big[D_{KL}[Q(s \mid o, \pi) \,\|\, Q(s \mid \pi)]\big]}_{\text{epistemic value}}$$
D
Prior beliefs about control
Posterior beliefs about states ( , | , )Q s u s
YYYYl so o o
location stimulusCS CS US NS
YYYYl cs s s
location context
YYYYlocation
Comparing different schemes

$$\mathbf{Q}(\pi) = E_{Q(o,s \mid \pi)}[\ln P(o \mid s) - \ln Q(o \mid \pi) + \ln P(o \mid m)]$$

Expected utility retains only the utility term $\ln P(o \mid m)$; KL control adds the divergence term $-\ln Q(o \mid \pi)$; expected free energy additionally includes the ambiguity term $\ln P(o \mid s)$.

Sensitive to risk or ambiguity: expected utility (−, −), KL control (+, −), expected free energy (+, +).

Figure: performance (success rate, %) as a function of prior preference, comparing expected free energy (FE), KL control (KL), expected utility (EU/RL) and DA.
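A toy numerical comparison of the three objectives for one policy's one-step predictions; the likelihood, predicted states and preferences are assumed values for illustration.

```python
import numpy as np

A = np.array([[0.6, 0.4],          # P(o | s): an ambiguous likelihood
              [0.4, 0.6]])
Qs = np.array([0.7, 0.3])          # predicted states under the policy
lnC = np.log([0.8, 0.2])           # prior preferences ln P(o | m)

Qo = A @ Qs                        # predicted outcomes

EU = Qo @ lnC                                        # expected utility
KL_quality = EU - Qo @ np.log(Qo)                    # -KL[Q(o|pi) || P(o|m)]
neg_ambiguity = -(Qs @ np.sum(-A * np.log(A), axis=0))  # -E_{Q(s)}[H[P(o|s)]]
EFE_quality = KL_quality + neg_ambiguity             # -(ambiguity + divergence)

# The divergence and ambiguity terms can only lower the quality of a policy.
assert EFE_quality <= KL_quality <= 0
```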
Simulating conditioned responses (in terms of precision updates)

Figure (four panels over peristimulus time): precision updates from the variational scheme; empirical dopamine responses from conditioning paradigms; simulated (CS & US) firing rates; and simulated (US only) firing rates, illustrating the similarity between precision updates and dopaminergic discharges.
Expected precision and value

Figure: expected precision plotted against expected value, with precision increasing as expected value increases. Changes in expected precision reflect changes in expected value: c.f. dopamine and reward prediction error.
Simulating conditioned responses

Figure: simulated (precision) responses over peristimulus time under increasing preference and decreasing uncertainty. Preference (utility) panels: $c = 0$, $c = 1$ and $c = 2$ with $a = 0.5$. Uncertainty panels: $a = 0.5$, $a = 0.7$ and $a = 0.9$ with $c = 2$.
Learning as inference

Hierarchical augmentation of state space: hidden states are augmented with the (conserved) identity of the maze, $s = s_m \otimes s_l \otimes s_c$ over maze, location and context, so that learning which maze is in play becomes inference on conserved hidden states. Each maze $j$ corresponds to a permutation of the four locations, e.g. $(1,2,3,4)$, $(1,3,2,4)$, $(1,2,4,3)$ and $(1,4,3,2)$, inducing maze-specific versions $\mathbf{B}^{(1)}_j$ of the first-level transition matrices, while the first-level likelihood $\mathbf{A}^{(1)}$ and preferences $\mathbf{C}^{(1)}$ are replicated over mazes.

Bayesian belief updating between trials (of conserved maze states): the prior over conserved states at the start of each trial is the posterior at the end of the previous one, subject to a small amount of forgetting:

$$P(s_0 \mid s_T, m) = \big((1 - e)\,\mathbf{I} + e\,\mathbf{E}\big) \cdot s_T$$

where $e$ is a small volatility parameter and $\mathbf{E}$ returns a uniform distribution.
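The between-trial update can be sketched as a mixture of the end-of-trial posterior with a uniform distribution; the mixing parameter $e$ and the four-state example are assumptions for illustration.

```python
import numpy as np

def between_trial_prior(s_T, e=0.1):
    """D = (1 - e) * s_T + e * uniform: partial forgetting between trials."""
    n = len(s_T)
    return (1 - e) * s_T + e * np.ones(n) / n

s_T = np.array([0.85, 0.05, 0.05, 0.05])   # posterior after one trial
D = between_trial_prior(s_T)

assert np.isclose(D.sum(), 1)
assert D[0] < s_T[0]                       # beliefs relax toward uniform
```

With $e = 0$ beliefs are carried over exactly; larger $e$ keeps the agent mildly uncertain, so epistemic (exploratory) behaviour can recur if the environment changes.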
Learning as inference

Figure: exploration and exploitation over eight trials, plotting performance and uncertainty (%) against the number of trials; uncertainty is resolved over early (epistemic) trials, after which performance rises (exploitation).

Figure: average dopaminergic (precision) responses over variational updates, together with simulated dopaminergic responses (spike rates).
Summary
·Optimal behaviour can be cast as a pure inference problem, in which valuable outcomes are defined in terms of prior beliefs about future states.
·Exact Bayesian inference (perfect rationality) cannot be realised physically, which means that optimal behaviour rests on approximate Bayesian inference (bounded rationality).
·Variational free energy provides a bound on Bayesian model evidence that is optimised by bounded rational behaviour.
·Bounded rational behaviour requires (approximate Bayesian) inference on both hidden states of the world and (future) control states. This mandates beliefs about action (control) that are distinct from action per se – beliefs that entail a precision.
·These beliefs can be cast in terms of minimising the expected free energy, given current beliefs about the state of the world and future choices.
·The ensuing quality of a policy entails epistemic value and expected utility that account for exploratory and exploitative behaviour respectively.
·Variational Bayes provides a formal account of how posterior expectations about hidden states of the world, policies and precision depend upon each other; and may provide a metaphor for message passing in the brain.
·Beliefs about choices depend upon expected precision while beliefs about precision depend upon the expected quality of choices.
·Variational Bayes induces distinct probabilistic representations (functional segregation) of hidden states, control states and precision – and highlights the role of reciprocal message passing. This may be particularly important for expected precision that is required for optimal inference about hidden states (perception) and control states (action selection).
·The dynamics of precision updates and their computational architecture are consistent with the physiology and anatomy of the dopaminergic system.