Bayesian Profiling with Learning
October 30, 2007
BESTNet/DIVERSITAS ecoSERVICES Workshop Economic and Ecological
Science and Management of Invasive Species
Research support: USDA Market and Trade Economics Division grants program
Michael Springborn
Donald Bren School of Environmental Science & Management
UCSB
Learning versus exploitation: Australian and Guatemalan oranges
Country      No. inspections   Infested shipments   Proportion infested
Australia    191               50                   0.26
Guatemala    4                 1                    0.25
Many significant environmental problems require taking action under uncertainty and involve common features:
• Management actions
  – In the presence of uncertain (and stochastic) outcomes/rewards
  – Provide opportunities to gain information (reduce uncertainty)
    • Make better future decisions
  – The number of actions/experiments is constrained (“effort budget”)
• GMOs, optimal pollution levels (e.g. CO2), optimal harvest
Application setting
• Cargo inspector’s mitigation-learning decision problem
  – Over 1B tons of U.S. imports to inspect yearly
  – Can inspect ~2% of shipments
  – Import sources with heterogeneous, unknown risk
• Data: APHIS database: outcomes of inspections of shipments of imported goods (1996-2006)
  – Over 7M inspections, 144 U.S. ports of entry, 190 exporting countries
    • 0.8% were found to have an “actionable pest” (62K shipments)
[Photo: Inspecting papaya for pests like Mediterranean fruit fly]
Adaptive Management (AM)
• active AM: intentional use of management actions as experiments to learn (Walters and Hilborn, 1978)
• “learning by doing”
• Constrained applications in practice [Sainsbury et al. (1997), Kaplan et al. (2003), Costello and Karp (2004)]
  – Small experiments (e.g. 2 options)
  – Discrete/limited experimental phase
  – Experiments (learning) and management actions are separate
• Reduced uncertainty means better decisions
The particular emphasis placed on learning is an important management control decision in its own right.
• Tradeoff:
  1. Exploitation: maximize immediate interceptions
  2. Exploration: learn to reduce uncertainty
• Need a methodology for capturing important features of resource management under uncertainty with learning
Bayesian Learning Model
• “Simple” Bayesian model of trade-related invasive species risk
  – Production techniques from a particular source result in a fixed fraction of shipments being infested.
    • p_j: probability that a shipment from source j is infested
    • can learn about the risk through inspection
    • beliefs over the true value of p_j modeled with a beta distribution.
Bayesian learning model
• s_0, f_0 : initial parameters of the beta distribution on p
  – Reflects all initially available risk information
• s_t, f_t : the updated parameters incorporating all observations up to time t.
  – p_t ~ beta(s_t, f_t)
  – E[p_t] = s_t / (s_t + f_t)
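The updating rule above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the class name `BetaBelief` and the uniform Beta(1, 1) prior are assumptions made for the example; the counts are the Australian orange figures from the table (191 inspections, 50 infested).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(s, f) belief over a source's infestation probability p."""
    s: float  # weight on infested shipments
    f: float  # weight on clean shipments

    def mean(self) -> float:
        # E[p] = s / (s + f), as stated on the slide
        return self.s / (self.s + self.f)

    def variance(self) -> float:
        # Standard variance of a beta distribution
        n = self.s + self.f
        return self.s * self.f / (n * n * (n + 1.0))

    def update(self, infested: int, clean: int) -> "BetaBelief":
        # Conjugate update: infested counts add to s, clean counts add to f
        return BetaBelief(self.s + infested, self.f + clean)


# Assumption for illustration: a uniform Beta(1, 1) prior (s_0 = f_0 = 1).
prior = BetaBelief(1.0, 1.0)
# Australian oranges: 191 inspections, 50 infested.
posterior = prior.update(infested=50, clean=191 - 50)
print(posterior, posterior.mean(), posterior.variance())
```

The posterior mean stays close to the raw proportion infested, while the variance shrinks as observations accumulate, which is the "reduced uncertainty" that inspections buy.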
beta distribution updating: Australian oranges
Simple example: 2 sources, 2 periods
• p_j: likelihood of infestation of shipment, j = A, G
• f(p_j): beta distribution of beliefs over true infection rate
• Assume:
  – Source A is riskier: E(p_A) > E(p_G)
  – Variance for G is greater: Var(p_A) < Var(p_G)
[Figure: belief densities f(p_j) plotted against p_j for sources G and A, with the updated density f(p_G | y_G1 = 1).]
• Optimal to inspect the less risky shipment (G) in period 1 when the expected benefit from learning (in period 2) >= the opportunity cost of learning experienced (in period 1).
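A worked version of the two-source, two-period comparison, as a sketch under stated assumptions: one inspection per period, no discounting, and beliefs set directly from the orange table (s = infested shipments, f = clean shipments), which amounts to assuming no prior information beyond the counts. The function names are illustrative.

```python
def mean(s, f):
    """Mean of a beta(s, f) belief."""
    return s / (s + f)


def two_period_value(first, beliefs):
    """Expected interceptions over two undiscounted periods.

    `first` is inspected in period 1; in period 2 the source with the
    higher updated mean is inspected.
    """
    s, f = beliefs[first]
    p = mean(s, f)
    total = p  # expected interception in period 1
    # Two possible outcomes: infested (prob p) or clean (prob 1 - p)
    for prob, (ds, df) in ((p, (1, 0)), (1.0 - p, (0, 1))):
        updated = dict(beliefs)
        updated[first] = (s + ds, f + df)
        total += prob * max(mean(*sf) for sf in updated.values())
    return total


# Assumed beliefs from the table: Australia 50 of 191, Guatemala 1 of 4.
beliefs = {"A": (50, 141), "G": (1, 3)}
value_A_first = two_period_value("A", beliefs)
value_G_first = two_period_value("G", beliefs)
print(value_A_first, value_G_first)
```

Under these assumptions, inspecting G first gives the higher two-period value: one more observation of A cannot change the period-2 choice, whereas one observation of G can, so the learning benefit outweighs the small period-1 opportunity cost.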
Single time scale learning model:
• fixed infestation rate, p_o
• beliefs: p ~ beta(s, f)
Short AND long-term learning model:
• There is no true p_o
• random infestation rate: each period (e.g. month) there is a random draw of a new p_t.
• Learning about both:
  (1) p_t for the current month
  (2) Distribution of p_t
[Figure: three panels, each plotting log(s_t + f_t) (increasing confidence) against log(s_t/f_t) (increasing risk).]
Decision model
• s_t^j, f_t^j : the updated parameters of the beta distribution over p_j
• x_t^j : the decision variable (number of times a shipment from source j is inspected in period t)
• J : the number of import sources.
• K : inspection budget, the total number of inspectors (i.e. servers or processors) employed simultaneously for inspection.
• Objective: allocate inspections across J sources to maximize expected interceptions (y_t^j):
  – Expected immediate rewards: number of inspections * E[p_t^j]
  – Discounted future rewards
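The immediate-reward term of the objective can be illustrated as follows. This sketch covers only that term, not the discounted future rewards, and it assumes the budget constraint takes the form sum over j of x_j <= K in each period. The function names and the example allocation are hypothetical.

```python
def expected_immediate_reward(x, beliefs):
    """Sum over sources of x_j * E[p_j], with E[p_j] = s_j / (s_j + f_j)."""
    return sum(x[j] * s / (s + f) for j, (s, f) in beliefs.items())


def within_budget(x, K):
    """Assumed constraint: nonnegative inspection counts summing to at most K."""
    return all(n >= 0 for n in x.values()) and sum(x.values()) <= K


# Beliefs taken from the orange table (s = infested, f = clean); K is hypothetical.
beliefs = {"A": (50, 141), "G": (1, 3)}
K = 4
allocation = {"A": 3, "G": 1}

assert within_budget(allocation, K)
reward = expected_immediate_reward(allocation, beliefs)
print(reward)
```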
Multi-armed bandit (MAB) model
• Introduced in WWII
• Regarded as formidable to the “point of hopelessness” through the late 1970s (Whittle, 1998)
• A deep theory for this problem due to Gittins and Jones (1972)
http://pintofstout.files.wordpress.com/2006/11/one-armed-bandit.jpg
MAB problem: Gittins Index approach
• There exists an index function γ(state_j), called the Gittins Index (GI), for each arm j, which depends only on the state of j
  – Index policy: in each period, pull the arm for which γ(state_j) is the highest (Gittins and Jones, 1972)
  – J-dimensional optimization problem → J independent one-dimensional problems
[Figure: Demand for inspections. Panels: Mexican tomatoes, E(p) = 0.20; Dutch tomatoes, E(p) = 0.15. Horizontal axis: x (0 to 10); vertical axis: hypothetical MC of inspection (0 to 1).]
Conclusions
• What have we got?
  – Inputs
    • Past history of management outcomes
  – Outputs
    • Bayesian estimate of outcome likelihood and explicit characterization of uncertainty
      – Extended to account for correlated observations
    • Decision rule for allocating resource budget
      – Internalizes the value of short- and long-term learning
Lagrangian Relaxation
• Estimate demand for inspections (in the current period) for each source individually and independently under a range of (shadow) prices imposed on each inspection conducted.
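The price-based decomposition can be sketched as a search over shadow prices. This is an illustration only: each source's demand curve here comes from a hypothetical list of decreasing marginal values per inspection, standing in for the per-source problem the slides describe. The numbers, the price grid, and the function names are all assumptions.

```python
def demand(marginal_values, price):
    """Inspections demanded at a shadow price: count of marginal values >= price."""
    return sum(1 for v in marginal_values if v >= price)


def clearing_price(sources, K, prices):
    """Lowest price on the grid at which total demand fits the budget K."""
    for price in sorted(prices):
        allocation = {j: demand(mv, price) for j, mv in sources.items()}
        if sum(allocation.values()) <= K:
            return price, allocation
    raise ValueError("no price on the grid brings demand within budget")


# Hypothetical marginal values of successive inspections for each source.
sources = {
    "Mexican tomatoes": [0.20, 0.16, 0.12, 0.08, 0.04],
    "Dutch tomatoes": [0.15, 0.11, 0.07, 0.03],
}
prices = [i / 100 for i in range(26)]  # shadow prices 0.00, 0.01, ..., 0.25

price, allocation = clearing_price(sources, K=4, prices=prices)
print(price, allocation)
```

Each source's demand is computed independently given the price, so the coupling across sources enters only through the single budget check.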
Conclusions
Contributions
Methodological
• Applied hierarchical Bayesian learning model
  – Handle correlated (non-independent) observations
  – Learning at two scales, allows managers to respond to aberrations (e.g. “bad” months)
• Multi-armed bandit (MAB) allocation strategy
  – Lagrangian relaxation approach to approximate a solution to intractable problem.
Empirical
• How much better can a manager do? Invasives port inspection example
Overview
1. Common features of big environmental problems
2. Example: Invasive species trade-inspection targeting problem
3. Bayesian learning model (hyper-
4. Empirical application – Australian oranges
5. Full decision model & multi-armed bandit (MAB) allocation strategy
Dynamic decision-making under uncertainty
• Adaptive management, resource allocation
  – Adaptive sampling, exploration or exploitation (Walters, 1986)
• Learning: stems from action which informs subsequent action (Stankey et al. 2005)
• Learning often passive (e.g. Kelly et al. (2005), Ulph and Ulph (1997)) or the value of learning is only subjectively assessed (e.g. McDaniels (1995)).
• Endogenous learning: Costello and Karp (2004), pollution regulator modifies instrument choice over time to learn about unknown cost parameter.
Invasive species inspection targeting problem
• Invasive species represent a significant bioeconomic threat
  – ~50,000 invasive species introduced to the U.S. (Pimentel, 2005)
  – Estimated yearly U.S. losses: $4B to $120B (OTA, 1993; Pimentel, 2005)
• International trade is a significant vector for unintended introductions
[Photo: Asian gypsy moth]
Objective
• Optimal endogenous learning: maximize long-run interceptions given the opportunity for learning (via inspections), balancing exploitation and exploration.
• Conceptual model of trade-related invasive species risk
  – Production techniques from a particular source result in a fixed fraction of shipments being infested.
    • p_j: probability of infestation
    • can learn about the risk through analysis and inspection
    • beliefs over true value of p_j modeled with a beta distribution.
MAB: Approximate solutions
• GI very difficult to calculate, so we turn to approximate solutions
  – Asymptotic approximations: Lai (1987), Bhulai and Koole (2000) and Brezzi and Lai (2002)
    • Brezzi and Lai (2002): use a diffusion approximation (roughly a random walk) to model state transitions
    • Resulting index policy is “asymptotically optimal”
  – Approximate dynamic programming
    • Lagrange multiplier relaxation
    • Linear programming relaxation (Bertsimas and Nino-Mora, 2000)
Asymptotic approximation of GI: Brezzi and Lai (2002)
[Equation: GI as an exploitation term plus an exploration/information term, with these labelled components:]
• E[p_j]: expected probability of infection
• V[p_j]: variance of the probability of infection
• Variance of a Bernoulli trial
• Discount factor-weighted ratio of variance terms
• Ψ( ): a given increasing concave function
• Exploitation
• Exploration/Information: decreasing convex function of (s + f), roughly the number of observations on which beliefs are based.
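To make the "mean + info" structure concrete, here is a sketch of an index built from the labelled components above. It is one reading of those labels, not the published formula: the way the terms are combined, the discount factor of 0.95, and in particular `psi_placeholder` (a simple increasing concave stand-in for Ψ) are assumptions. The specific Ψ should be taken from Brezzi and Lai (2002). Arm A uses (s, f) = (5, 95); arm B has mean 0.04 with s + f = n + 2, as in the two-arm comparison in this deck.

```python
import math


def beta_mean_var(s, f):
    """Mean and variance of a beta(s, f) belief."""
    n = s + f
    mean = s / n
    return mean, mean * (1.0 - mean) / (n + 1.0)


def psi_placeholder(ratio):
    # ASSUMPTION: increasing, concave stand-in; not the published function.
    return math.sqrt(ratio / 2.0)


def approx_index(s, f, discount=0.95, psi=psi_placeholder):
    """Return (index, mean, info) with index = mean + info."""
    mean, var = beta_mean_var(s, f)
    bernoulli_var = mean * (1.0 - mean)      # variance of a Bernoulli trial
    c = -math.log(discount)                  # discount-factor weight
    ratio = var / (c * bernoulli_var)        # weighted ratio of variance terms
    info = math.sqrt(var) * psi(ratio)       # exploration/information term
    return mean + info, mean, info


def arm_b(n_obs, mean=0.04):
    """Arm B beliefs with fixed mean and s + f = n_obs + 2."""
    total = n_obs + 2
    return mean * total, (1.0 - mean) * total


index_a, mean_a, info_a = approx_index(5, 95)
for n_obs in (0, 10, 48):
    index_b, mean_b, info_b = approx_index(*arm_b(n_obs))
    greedy = "A" if mean_a >= mean_b else "B"
    by_index = "A" if index_a >= index_b else "B"
    print(n_obs, "greedy:", greedy, "index:", by_index)
```

With these assumed inputs the greedy rule always picks A (higher mean), while the index rule picks B when B has few observations and switches to A once B's information term has shrunk.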
[Figure: Gittins Indices for two arms: A (high risk, low var.) & B (low risk). (Means constant, accumulating obs. of B.) Beta hyperparameters: (s_A, f_A) = (5, 95) ==> E(p_A) = 0.05; (s_B, f_B) s.t. E(p_B) = 0.04 and s_B + f_B = n + 2. Three panels against n_B, observations of B (0 to 50): variance, V(p_A) and V(p_B); mean and info component of GI, E(p_A), info_A, E(p_B), info_B; GI = mean + info, GI_A and GI_B.]
Australian oranges beta distribution updating with GI estimate
“Greedy” versus GI strategy
• E[p] is the current Bayesian estimate of infestation risk • I is the informational component of the GI
Incorporating the value of information for future decisions (GI strategy) can reverse the inspection decision reached based on expected likelihood of infection alone (greedy strategy).
Australian versus Guatemalan oranges.
Extensions to the MAB framework
• Multiple “pulls” in a time period (K>1).
• Different shipping rates for different arms (i.e. N_A type-A arms and N_B type-B arms available to pull)
• Restless bandits: underlying pj is not fixed but rather continues to evolve over time
--------------------------------------------------------------------------
• Approximate dynamic programming
  – Lagrange multiplier relaxation
  – Linear programming relaxation (Bertsimas and Nino-Mora, 2000)