International trade and the risk of biological invasions
(ecoservices.asu.edu/pdf/Springborn-Tempe.pdf)

Bayesian Profiling with Learning

October 30, 2007

BESTNet/DIVERSITAS ecoSERVICES Workshop: Economic and Ecological Science and Management of Invasive Species

Research support: USDA Market and Trade Economics Division grants program

Michael Springborn
Donald Bren School of Environmental Science & Management
UCSB

Learning versus exploitation: Australian and Guatemalan oranges

Country     No. inspections   Infested shipments   Proportion infested
Australia   191               50                   0.26
Guatemala   4                 1                    0.25

Many significant environmental problems require taking action under uncertainty and share common features:
• Management actions
– are taken in the presence of uncertain (and stochastic) outcomes/rewards;
– provide opportunities for information, which reduces uncertainty and supports better future decisions;
– are limited: the number of actions/experiments is constrained (an “effort budget”).
• Examples: GMOs, optimal pollution levels (e.g. CO2), optimal harvest

• Cargo inspector’s mitigation-learning decision problem
– Over 1B tons of U.S. imports to inspect yearly
– Can inspect ~2% of shipments
– Import sources with heterogeneous, unknown risk

• Data: APHIS database of outcomes of inspections of shipments of imported goods (1996-2006)
– Over 7M inspections, 144 U.S. ports of entry, 190 exporting countries
– 0.8% (62K shipments) were found to have an “actionable pest”

[Photo: Inspecting papaya for pests like Mediterranean fruit fly]

Application setting

Adaptive Management (AM)

• active AM: intentional use of management actions as experiments to learn (Walters and Hilborn, 1978)

• “Learning by doing”
• Constrained applications in practice [Sainsbury et al. (1997), Kaplan et al. (2003), Costello and Karp (2004)]
– Small experiments (e.g. 2 options)
– Discrete/limited experimental phase
– Experiments (learning) and management actions are separate

• Reduced uncertainty means better decisions, so the emphasis placed on learning is an important management control decision in its own right.

• Tradeoff:
1. Exploitation: maximize immediate interceptions
2. Exploration: learn to reduce uncertainty

• Need a methodology for capturing important features of resource management under uncertainty with learning

Bayesian Learning Model

• “Simple” Bayesian model of trade-related invasive species risk
– Production techniques from a particular source result in a fixed fraction of shipments being infested.

• $p_j$: probability that a shipment from source $j$ is infested

• Can learn about the risk through inspection
• Beliefs over the true value of $p_j$ are modeled with a beta distribution.

Bayesian learning model
• $s_0, f_0$: initial parameters of the beta distribution on $p$
– Reflect all initially available risk information

• $s_t, f_t$: the updated parameters incorporating all observations up to time $t$
– $p_t \sim \mathrm{beta}(s_t, f_t)$
– $E[p_t] = s_t/(s_t + f_t)$
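The slide gives the belief parameters but not the update step. Under the standard beta-binomial conjugacy, inspecting $x$ shipments and finding $y$ infested maps $(s_t, f_t)$ to $(s_t + y, f_t + x - y)$. The sketch below applies this to the orange counts. The presentation does not state the prior $(s_0, f_0)$, so the uniform prior beta(1, 1) used here is a hypothetical choice, as is the class name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(s, f) belief over a source's infestation probability p."""
    s: float  # pseudo-count of infested shipments
    f: float  # pseudo-count of clean shipments

    def mean(self) -> float:
        """E[p] = s / (s + f)."""
        return self.s / (self.s + self.f)

    def var(self) -> float:
        """Var[p] = s f / ((s + f)^2 (s + f + 1))."""
        n = self.s + self.f
        return self.s * self.f / (n * n * (n + 1))

    def update(self, inspected: int, infested: int) -> "BetaBelief":
        """Conjugate update after `inspected` inspections, `infested` of them infested."""
        if not 0 <= infested <= inspected:
            raise ValueError("need 0 <= infested <= inspected")
        return BetaBelief(self.s + infested, self.f + inspected - infested)


# Hypothetical uniform prior (s0, f0) = (1, 1); counts are from the oranges table.
prior = BetaBelief(1.0, 1.0)
aus = prior.update(inspected=191, infested=50)
gua = prior.update(inspected=4, infested=1)
print("Australia:", round(aus.mean(), 3), round(aus.var(), 5))
print("Guatemala:", round(gua.mean(), 3), round(gua.var(), 5))
```

With this prior, Australia's belief becomes beta(51, 142) and Guatemala's becomes beta(2, 4). The ranking of the means depends on the prior that is chosen. The robust point is that the Guatemalan belief is far more uncertain.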

beta distribution updating: Australian oranges


• $p_j$: probability of infestation of a shipment from source $j$, $j \in \{A, G\}$
• $f(p_j)$: beta distribution of beliefs over the true infestation rate
• Assume:
– Source A is riskier: $E(p_A) > E(p_G)$
– Variance for G is greater: $\mathrm{Var}(p_A) < \mathrm{Var}(p_G)$

[Figure: belief densities $f(p_j)$ over $p_j$ for sources A and G, with the updated density $f(p_G \mid y_{G1} = 1)$]

• It is optimal to inspect the less risky shipment (G) in period 1 when the expected benefit from learning (in period 2) ≥ the opportunity cost of learning experienced (in period 1).

Simple example: 2 sources, 2 periods
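The two-source, two-period condition can be checked by direct enumeration. In the sketch below there is one inspection per period, the period-1 outcome is observed, and period 2 inspects whichever source then has the higher posterior mean. The beta parameters are hypothetical. They were chosen to satisfy the slide's assumptions: A has the higher mean, and G has the higher variance. The discount factor `delta` is also an illustrative parameter.

```python
# Hypothetical beta parameters (s, f): A has the higher mean, G is far less certain.
PRIORS = {"A": (26.0, 74.0), "G": (1.0, 3.0)}


def beta_mean(s: float, f: float) -> float:
    """Posterior mean E[p] = s / (s + f)."""
    return s / (s + f)


def two_period_value(first: str, priors: dict, delta: float = 1.0) -> float:
    """Expected interceptions over two periods (one inspection each) if `first`
    is inspected in period 1 and the source with the highest updated mean is
    inspected in period 2. `delta` is a hypothetical discount factor."""
    s, f = priors[first]
    p = beta_mean(s, f)  # predictive probability that the inspected shipment is infested
    best_other = max(beta_mean(*sf) for j, sf in priors.items() if j != first)
    after_infested = max(beta_mean(s + 1, f), best_other)  # period-2 value if y = 1
    after_clean = max(beta_mean(s, f + 1), best_other)     # period-2 value if y = 0
    return p + delta * (p * after_infested + (1 - p) * after_clean)


for source in PRIORS:
    print(source, round(two_period_value(source, PRIORS), 4))
```

Inspecting A first yields 0.52 expected interceptions, and inspecting G first yields 0.545. Starting with G costs 0.26 − 0.25 = 0.01 in period 1. It raises the expected period-2 value from 0.26 to 0.295, a benefit of 0.035. So learning wins in this example.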

Single time scale learning model:

• fixed infestation rate, $p_0$
• beliefs: $p \sim \mathrm{beta}(s, f)$

Short AND long-term learning model:
• There is no true $p_0$
• Random infestation rate: each period (e.g. month) there is a random draw of a new $p_t$.
• Learning about both:
(1) $p_t$ for the current month
(2) the distribution of $p_t$
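The two-scale model can be sketched as a beta-binomial hierarchy. Each month's rate is drawn as $p_t \sim \mathrm{Beta}(a, b)$, and the hyperparameters $(a, b)$ are unknown. In the sketch below, learning about the distribution of $p_t$ is done with a flat prior over a coarse grid of $(a, b)$ values. This grid approximation, the function names, and the monthly counts are hypothetical illustrations, not the presentation's estimation method. The last month in the data is treated as the current, aberrant month.

```python
import math


def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(y: int, n: int, a: float, b: float) -> float:
    """log P(y infested of n inspected | p_t ~ Beta(a, b)), with p_t integrated out."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta_fn(a + y, b + n - y) - log_beta_fn(a, b)


def hyper_posterior(months, means, concentrations):
    """Posterior weights over a grid (a, b) = (m*c, (1-m)*c), flat prior on the grid."""
    grid, logw = [], []
    for m in means:
        for c in concentrations:
            a, b = m * c, (1 - m) * c
            grid.append((a, b))
            logw.append(sum(log_betabinom(y, n, a, b) for y, n in months))
    top = max(logw)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return grid, [x / total for x in w]


def long_run_mean(grid, weights) -> float:
    """Posterior mean of a / (a + b): the long-run (across-month) infestation rate."""
    return sum(wt * a / (a + b) for (a, b), wt in zip(grid, weights))


def current_month_mean(y_t, n_t, grid, weights) -> float:
    """E[p_t | all data]: conjugate Beta(a + y_t, b + n_t - y_t) mean, averaged over the grid."""
    return sum(wt * (a + y_t) / (a + b + n_t) for (a, b), wt in zip(grid, weights))


# Hypothetical monthly (infested, inspected) counts; the last entry is the current month.
months = [(2, 20), (1, 15), (3, 25), (0, 10), (9, 20)]
means = [k / 20 for k in range(1, 9)]  # grid of long-run means, 0.05 to 0.40
concentrations = [2, 5, 10, 20, 50, 100]  # grid of a + b values
grid, weights = hyper_posterior(months, means, concentrations)
y_t, n_t = months[-1]
print(round(long_run_mean(grid, weights), 3),
      round(current_month_mean(y_t, n_t, grid, weights), 3))
```

The current-month estimate lands between the long-run rate and the raw monthly proportion (9/20). This is the "respond to bad months without overreacting" behavior that the two-scale model is meant to provide.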

[Figure, three panels: belief states plotted with $\log(s_t/f_t)$ on the horizontal axis (increasing risk) and $\log(s_t + f_t)$ on the vertical axis (increasing confidence)]

Decision model
• $s_t^j, f_t^j$: the updated parameters of the beta distribution over $p_j$
• $x_t^j$: the decision variable (number of times a shipment from source $j$ is inspected in period $t$)
• $J$: the number of import sources
• $K$: inspection budget, i.e. the total number of inspectors (servers or processors) employed simultaneously for inspection
• Objective: allocate inspections across $J$ sources to maximize expected interceptions $y_t^j$:
– Expected immediate rewards: number of inspections × $E[p_t^j]$
– plus discounted future rewards
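One way to write this objective in recursive (Bellman) form is shown below. It is a sketch under stated assumptions, not the presentation's exact formulation. $\delta$ is a discount factor, a symbol introduced here. The state update is the conjugate update of the single-time-scale model. The expectation is taken over interceptions $y_t^j$, whose predictive distribution given $x_t^j$ inspections is beta-binomial with parameters $(s_t^j, f_t^j)$.

```latex
V\big(\{s_t^j, f_t^j\}_{j=1}^{J}\big)
  = \max_{\substack{x_t^1,\dots,x_t^J \ge 0 \\ \sum_{j=1}^{J} x_t^j \le K}}
    \Big\{ \underbrace{\sum_{j=1}^{J} x_t^j\,E[p_t^j]}_{\text{expected immediate rewards}}
      + \underbrace{\delta\,E\big[V\big(\{s_{t+1}^j, f_{t+1}^j\}_{j=1}^{J}\big)\big]}_{\text{discounted future rewards}} \Big\},
\qquad
s_{t+1}^j = s_t^j + y_t^j, \quad f_{t+1}^j = f_t^j + x_t^j - y_t^j
```

The state is the full set of $2J$ beta parameters, so solving this recursion directly becomes intractable as $J$ grows. That is the motivation for the index and relaxation approaches that follow.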

Multi-armed bandit (MAB) model

• Introduced in WWII

• Regarded as formidable to the “point of hopelessness” through the late 1970s (Whittle, 1998)

• A deep theory for this problem due to Gittins and Jones (1972)


MAB problem: Gittins Index approach
• There exists an index function $\gamma(\text{state}_j)$, called the Gittins Index (GI), for each arm $j$, which depends only on the state of $j$
– Index policy: in each period, pull the arm for which $\gamma(\text{state}_j)$ is the highest (Gittins and Jones, 1972)
– Reduces a $J$-dimensional optimization problem to $J$ independent one-dimensional problems
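An index policy separates cleanly into two parts. Each source's index is computed from that source's state alone, and then the sources are ranked. The sketch below takes the index function as a parameter and uses the greedy posterior mean as a stand-in, since the true GI is hard to compute. With `k = 1` this is the classical one-pull-per-period setting. Taking the top `k` for `k > 1` is a common heuristic and is not guaranteed optimal. The function names and belief values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Belief = Tuple[float, float]  # (s_j, f_j): beta parameters for source j


def index_policy(beliefs: Dict[str, Belief],
                 index: Callable[[Belief], float],
                 k: int = 1) -> List[str]:
    """Score each source from its own state only, then return the k highest.
    Ties are broken alphabetically by source name."""
    ranked = sorted(beliefs, key=lambda j: (-index(beliefs[j]), j))
    return ranked[:k]


def posterior_mean(belief: Belief) -> float:
    """Greedy stand-in index: E[p_j] = s_j / (s_j + f_j)."""
    s, f = belief
    return s / (s + f)


beliefs = {"A": (26.0, 74.0), "G": (1.0, 3.0), "M": (10.0, 40.0)}  # hypothetical
print(index_policy(beliefs, posterior_mean, k=2))
```

Swapping `posterior_mean` for an approximate GI changes which source is inspected without changing the policy code.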

Demand for inspections

[Figure, four panels (Mexican tomatoes, E(p)=0.20, three panels; Dutch tomatoes, E(p)=0.15): hypothetical marginal cost (MC) of inspection on the vertical axis (0 to 1) against the number of inspections x on the horizontal axis (0 to 10)]

Conclusions
• What have we got?
– Inputs: past history of management outcomes
– Outputs:
• Bayesian estimate of outcome likelihood and explicit characterization of uncertainty, extended to account for correlated observations
• Decision rule for allocating resource budget, which internalizes the value of short- and long-term learning

Lagrangian Relaxation

• Estimate demand for inspections (in the current period) for each source individually and independently under a range of (shadow) prices imposed on each inspection conducted.
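The relaxation step can be sketched with a shadow price. Each source independently "buys" every inspection whose marginal value exceeds the price $\lambda$, and $\lambda$ is raised until total demand fits the budget $K$. In the sketch below, each source's marginal values for its 1st, 2nd, ... inspection this period are given as decreasing lists. In the model they would come from a per-source index calculation. Here the numbers, names, and the bisection search are all hypothetical illustrations.

```python
from typing import Dict, List


def demand(marginal_values: List[float], price: float) -> int:
    """Inspections a source demands at shadow price `price`: marginal values above it."""
    return sum(1 for v in marginal_values if v > price)


def clearing_price(sources: Dict[str, List[float]], budget: int, tol: float = 1e-9) -> float:
    """Bisection for the smallest shadow price at which total demand fits the budget.
    Invariant: total demand at `lo` exceeds the budget; at `hi` it does not."""
    lo, hi = 0.0, max(max(v) for v in sources.values())
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(demand(v, mid) for v in sources.values()) > budget:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical decreasing marginal values of successive inspections, per source.
sources = {"A": [0.30, 0.22, 0.18, 0.12], "B": [0.25, 0.20, 0.10]}
lam = clearing_price(sources, budget=4)
alloc = {j: demand(v, lam) for j, v in sources.items()}
print(round(lam, 4), alloc)
```

Each source's demand curve is evaluated on its own, which is what makes the relaxation tractable. With strictly decreasing marginal values this amounts to allocating the K highest-value inspections. Ties at the clearing price can leave part of the budget unspent and need a tie-breaking rule.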

Conclusions

Contributions
Methodological
• Applied hierarchical Bayesian learning model
– Handles correlated (non-independent) observations
– Learning at two scales allows managers to respond to aberrations (e.g. “bad” months)
• Multi-armed bandit (MAB) allocation strategy
– Lagrangian relaxation approach to approximate a solution to an intractable problem

Empirical
• How much better can a manager do? Invasives port inspection example

Overview
1. Common features of big environmental problems
2. Example: Invasive species trade-inspection targeting problem
3. Bayesian learning model (hyper-
4. Empirical application: Australian oranges
5. Full decision model & multi-armed bandit (MAB) allocation strategy

Dynamic decision-making under uncertainty

• Adaptive management, resource allocation
– Adaptive sampling, exploration or exploitation (Walters, 1986)
• Learning: stems from action which informs subsequent action (Stankey et al. 2005)

• Learning is often passive (e.g. Kelly et al. (2005), Ulph and Ulph (1997)), or the value of learning is only subjectively assessed (e.g. McDaniels (1995)).

• Endogenous learning: Costello and Karp (2004), in which a pollution regulator modifies its instrument choice over time to learn about an unknown cost parameter.

Invasive species inspection targeting problem

• Invasive species represent a significant bioeconomic threat
– ~50,000 invasive species introduced to the U.S. (Pimentel, 2005)
– Estimated yearly U.S. losses: $4B to $120B (OTA, 1993; Pimentel, 2005)

• International trade is a significant vector for unintended introductions

[Photo: Asian gypsy moth]

Objective
• Optimal endogenous learning: maximize long-run interceptions given the opportunity for learning (via inspections), balancing exploitation and exploration.

• Conceptual model of trade-related invasive species risk
– Production techniques from a particular source result in a fixed fraction of shipments being infested.
• $p_j$: probability of infestation
• Can learn about the risk through analysis and inspection
• Beliefs over the true value of $p_j$ are modeled with a beta distribution.

MAB: approximate solutions
• The GI is very difficult to calculate, so we turn to approximate solutions
– Asymptotic approximations: Lai (1987), Bhulai and Koole (2000) and Brezzi and Lai (2002)
• Brezzi and Lai (2002) use a diffusion approximation (roughly a random walk) to model state transitions
• The resulting index policy is “asymptotically optimal”
– Approximate dynamic programming
• Lagrange multiplier relaxation
• Linear programming relaxation (Bertsimas and Niño-Mora, 2000)

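Asymptotic approximation of GI: Brezzi and Lai (2002)

$$GI_j \approx \underbrace{E[p_j]}_{\text{exploitation}} + \underbrace{\sqrt{V[p_j]}\;\Psi\!\left(\frac{V[p_j]}{c\,E[p_j]\,(1 - E[p_j])}\right)}_{\text{exploration/information}}, \qquad c = -\ln\delta$$

• $E[p_j]$: expected probability of infestation
• $V[p_j]$: variance of beliefs about the probability of infestation
• $E[p_j](1 - E[p_j])$: variance of a Bernoulli trial (a single inspection)
• $V[p_j] / \big(c\,E[p_j](1 - E[p_j])\big)$: discount-factor-weighted ratio of variance terms, with $\delta$ the discount factor
• $\Psi(\cdot)$: a given increasing concave function
• Exploitation term: $E[p_j]$. Exploration/information term: a decreasing convex function of $(s + f)$, roughly the number of observations on which beliefs are based.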

[Figure, three panels plotted against $n_B$, the number of observations of B (0 to 50): (1) variances $V(p_A)$ and $V(p_B)$; (2) mean and information components of the GI: $E(p_A)$, info$_A$, $E(p_B)$, info$_B$; (3) GI = mean + info, for $GI_A$ and $GI_B$. Beta hyperparameters: $(s_A, f_A) = (5, 95)$, so $E(p_A) = 0.05$; $(s_B, f_B)$ such that $E(p_B) = 0.04$ and $s_B + f_B = n + 2$.]

Gittins indices for two arms: A (high risk, low variance) and B (low risk). Means are held constant while observations of B accumulate.
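The figure's setup can be recomputed from the structure of the approximate index, mean plus $\sqrt{V}\,\Psi(\cdot)$. This is a sketch with two stated assumptions. First, the discount factor $\delta = 0.95$ is hypothetical. Second, $\Psi$ is passed in as a parameter, and the default is a hypothetical stand-in, $\Psi(x) = \sqrt{x/2}$. That function is increasing and concave as the slide requires, but to our recollection it matches only the small-argument branch of Brezzi and Lai's piecewise function. For real use, substitute the function from the paper. The numbers below will therefore not reproduce the figure exactly.

```python
import math
from typing import Callable, Tuple


def beta_mean_var(s: float, f: float) -> Tuple[float, float]:
    """Mean and variance of a beta(s, f) belief."""
    n = s + f
    mu = s / n
    return mu, mu * (1 - mu) / (n + 1)


def psi_placeholder(x: float) -> float:
    """Hypothetical stand-in for Psi: increasing and concave (not the paper's full function)."""
    return math.sqrt(x / 2)


def approx_gi(s: float, f: float, delta: float = 0.95,
              psi: Callable[[float], float] = psi_placeholder) -> Tuple[float, float, float]:
    """Return (mean, info, mean + info) for a beta(s, f) arm."""
    mu, var = beta_mean_var(s, f)
    c = -math.log(delta)           # discount weighting, c = -ln(delta)
    bernoulli_var = mu * (1 - mu)  # variance of a single inspection outcome
    info = math.sqrt(var) * psi(var / (c * bernoulli_var))
    return mu, info, mu + info


# Figure setup: arm A fixed at (5, 95); arm B has mean 0.04 and s_B + f_B = n + 2.
GI_A = approx_gi(5, 95)[2]


def arm_b(n: int) -> Tuple[float, float]:
    """Beta parameters of arm B after n observations (mean held at 0.04)."""
    s_b = 0.04 * (n + 2)
    return s_b, (n + 2) - s_b


for n in range(0, 51, 10):
    _, info_b, gi_b = approx_gi(*arm_b(n))
    print(n, round(info_b, 4), round(gi_b, 4), "inspect B" if gi_b > GI_A else "inspect A")
```

Under these assumptions, the lower-mean arm B has the higher index while it has few observations, so the GI strategy inspects B even though a greedy strategy would pick A. As B's information term shrinks with more observations, the decision switches back to A. This happens somewhere between n = 30 and n = 40 with these parameters.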

Australian oranges beta distribution updating with GI estimate

“Greedy” versus GI strategy

• E[p] is the current Bayesian estimate of infestation risk
• I is the informational component of the GI

Incorporating the value of information for future decisions (GI strategy) can reverse the inspection decision reached on the basis of the expected likelihood of infestation alone (greedy strategy).

Australian versus Guatemalan oranges.

Extensions to the MAB framework

• Multiple “pulls” in a time period (K > 1)

• Different shipping rates for different arms (i.e. $N_A$ type-A arms and $N_B$ type-B arms available to pull)

• Restless bandits: the underlying $p_j$ is not fixed but continues to evolve over time

• Approximate dynamic programming
– Lagrange multiplier relaxation
– Linear programming relaxation (Bertsimas and Niño-Mora, 2000)
