57
Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and Department of IEOR). Blanchet (Columbia U. and Stanford U.) 1 / 25

Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Optimal Transport in Risk Analysis

Jose Blanchet (based on work with Y. Kang and K. Murthy)

Stanford University (Management Science and Engineering), and Columbia University(Department of Statistics and Department of IEOR).

Blanchet (Columbia U. and Stanford U.) 1 / 25

Page 2: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Goal:

Goal: Present a comprehensive frameworkfor decision making under model uncertainty...

This presentation is an invitation to read these two papers:https://arxiv.org/abs/1604.01446https://arxiv.org/abs/1610.05627

Blanchet (Columbia U. and Stanford U.) 2 / 25

Page 3: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Example: Model Uncertainty in Ruin Probabilities

R (t) = the reserve (perhaps multiple lines) at time t.

Ruin probability (in finite time horizon T )

uT = Ptrue (R (t) ∈ B for some t ∈ [0,T ]) .

B is a set which models bankruptcy.

Problem: Model (Ptrue) may be complex, intractable or simplyunknown...

Blanchet (Columbia U. and Stanford U.) 3 / 25

Page 4: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Example: Model Uncertainty in Ruin Probabilities

R (t) = the reserve (perhaps multiple lines) at time t.

Ruin probability (in finite time horizon T )

uT = Ptrue (R (t) ∈ B for some t ∈ [0,T ]) .

B is a set which models bankruptcy.

Problem: Model (Ptrue) may be complex, intractable or simplyunknown...

Blanchet (Columbia U. and Stanford U.) 3 / 25

Page 5: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Example: Model Uncertainty in Ruin Probabilities

R (t) = the reserve (perhaps multiple lines) at time t.

Ruin probability (in finite time horizon T )

uT = Ptrue (R (t) ∈ B for some t ∈ [0,T ]) .

B is a set which models bankruptcy.

Problem: Model (Ptrue) may be complex, intractable or simplyunknown...

Blanchet (Columbia U. and Stanford U.) 3 / 25

Page 6: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Example: Model Uncertainty in Ruin Probabilities

R (t) = the reserve (perhaps multiple lines) at time t.

Ruin probability (in finite time horizon T )

uT = Ptrue (R (t) ∈ B for some t ∈ [0,T ]) .

B is a set which models bankruptcy.

Problem: Model (Ptrue) may be complex, intractable or simplyunknown...

Blanchet (Columbia U. and Stanford U.) 3 / 25

Page 7: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

A Distributionally Robust Risk Analysis Formulation

Our solution: Estimate uT by solving

supDc (P0,P )≤δ

P (R (t) ∈ B for some t ∈ [0,T ]) ,

where P0 is a suitable model.

P0 = proxy for Ptrue .

P0 right trade-off between fidelity and tractability.

δ is the distributional uncertainty size.

Dc (·) is the distributional uncertainty region.

Blanchet (Columbia U. and Stanford U.) 4 / 25

Page 8: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

A Distributionally Robust Risk Analysis Formulation

Our solution: Estimate uT by solving

supDc (P0,P )≤δ

P (R (t) ∈ B for some t ∈ [0,T ]) ,

where P0 is a suitable model.

P0 = proxy for Ptrue .

P0 right trade-off between fidelity and tractability.

δ is the distributional uncertainty size.

Dc (·) is the distributional uncertainty region.

Blanchet (Columbia U. and Stanford U.) 4 / 25

Page 9: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

A Distributionally Robust Risk Analysis Formulation

Our solution: Estimate uT by solving

supDc (P0,P )≤δ

P (R (t) ∈ B for some t ∈ [0,T ]) ,

where P0 is a suitable model.

P0 = proxy for Ptrue .

P0 right trade-off between fidelity and tractability.

δ is the distributional uncertainty size.

Dc (·) is the distributional uncertainty region.

Blanchet (Columbia U. and Stanford U.) 4 / 25

Page 10: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

A Distributionally Robust Risk Analysis Formulation

Our solution: Estimate uT by solving

supDc (P0,P )≤δ

P (R (t) ∈ B for some t ∈ [0,T ]) ,

where P0 is a suitable model.

P0 = proxy for Ptrue .

P0 right trade-off between fidelity and tractability.

δ is the distributional uncertainty size.

Dc (·) is the distributional uncertainty region.

Blanchet (Columbia U. and Stanford U.) 4 / 25

Page 11: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

A Distributionally Robust Risk Analysis Formulation

Our solution: Estimate uT by solving

supDc (P0,P )≤δ

P (R (t) ∈ B for some t ∈ [0,T ]) ,

where P0 is a suitable model.

P0 = proxy for Ptrue .

P0 right trade-off between fidelity and tractability.

δ is the distributional uncertainty size.

Dc (·) is the distributional uncertainty region.

Blanchet (Columbia U. and Stanford U.) 4 / 25

Page 12: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Desirable Elements of Distributionally Robust Formulation

Would like Dc (·) to have wide flexibility (even non-parametric).

Want optimization to be tractable.

Want to preserve advantages of using P0.

Want a way to estimate δ.

Blanchet (Columbia U. and Stanford U.) 5 / 25

Page 13: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Desirable Elements of Distributionally Robust Formulation

Would like Dc (·) to have wide flexibility (even non-parametric).Want optimization to be tractable.

Want to preserve advantages of using P0.

Want a way to estimate δ.

Blanchet (Columbia U. and Stanford U.) 5 / 25

Page 14: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Desirable Elements of Distributionally Robust Formulation

Would like Dc (·) to have wide flexibility (even non-parametric).Want optimization to be tractable.

Want to preserve advantages of using P0.

Want a way to estimate δ.

Blanchet (Columbia U. and Stanford U.) 5 / 25

Page 15: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Desirable Elements of Distributionally Robust Formulation

Would like Dc (·) to have wide flexibility (even non-parametric).Want optimization to be tractable.

Want to preserve advantages of using P0.

Want a way to estimate δ.

Blanchet (Columbia U. and Stanford U.) 5 / 25

Page 16: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Distributionally Robust Optimization

Standard choices based on divergence (such as Kullback-Leibler) -Hansen & Sargent (2016)

D (v ||µ) = Ev(log(dvdµ

)).

Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).

Big problem: Absolute continuity may typically be violated...Think of using Brownian motion as a proxy model for R (t)...

We advocate using optimal transport costs (e.g. Wassersteindistance).

Blanchet (Columbia U. and Stanford U.) 6 / 25

Page 17: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Distributionally Robust Optimization

Standard choices based on divergence (such as Kullback-Leibler) -Hansen & Sargent (2016)

D (v ||µ) = Ev(log(dvdµ

)).

Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).

Big problem: Absolute continuity may typically be violated...Think of using Brownian motion as a proxy model for R (t)...

We advocate using optimal transport costs (e.g. Wassersteindistance).

Blanchet (Columbia U. and Stanford U.) 6 / 25

Page 18: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Distributionally Robust Optimization

Standard choices based on divergence (such as Kullback-Leibler) -Hansen & Sargent (2016)

D (v ||µ) = Ev(log(dvdµ

)).

Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).

Big problem: Absolute continuity may typically be violated...

Think of using Brownian motion as a proxy model for R (t)...

We advocate using optimal transport costs (e.g. Wassersteindistance).

Blanchet (Columbia U. and Stanford U.) 6 / 25

Page 19: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Distributionally Robust Optimization

Standard choices based on divergence (such as Kullback-Leibler) -Hansen & Sargent (2016)

D (v ||µ) = Ev(log(dvdµ

)).

Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).

Big problem: Absolute continuity may typically be violated...Think of using Brownian motion as a proxy model for R (t)...

We advocate using optimal transport costs (e.g. Wassersteindistance).

Blanchet (Columbia U. and Stanford U.) 6 / 25

Page 20: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Distributionally Robust Optimization

Standard choices based on divergence (such as Kullback-Leibler) -Hansen & Sargent (2016)

D (v ||µ) = Ev(log(dvdµ

)).

Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).

Big problem: Absolute continuity may typically be violated...Think of using Brownian motion as a proxy model for R (t)...

We advocate using optimal transport costs (e.g. Wassersteindistance).

Blanchet (Columbia U. and Stanford U.) 6 / 25

Page 21: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Elements of Optimal Transport Costs

SX and SY be Polish spaces.

BSX and BSY be the associated Borel σ-fields.

c (·) : SX × SX → [0,∞) be lower semicontinuous.µ (·) and v (·) Borel probability measures defined on SX and SY .Given π a Borel prob. measure on SX × SY ,

πX (A) = π (A× SY ) and πY (C ) = π (SX × C ) .

Blanchet (Columbia U. and Stanford U.) 7 / 25

Page 22: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Elements of Optimal Transport Costs

SX and SY be Polish spaces.BSX and BSY be the associated Borel σ-fields.

c (·) : SX × SX → [0,∞) be lower semicontinuous.µ (·) and v (·) Borel probability measures defined on SX and SY .Given π a Borel prob. measure on SX × SY ,

πX (A) = π (A× SY ) and πY (C ) = π (SX × C ) .

Blanchet (Columbia U. and Stanford U.) 7 / 25

Page 23: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Elements of Optimal Transport Costs

SX and SY be Polish spaces.BSX and BSY be the associated Borel σ-fields.

c (·) : SX × SX → [0,∞) be lower semicontinuous.

µ (·) and v (·) Borel probability measures defined on SX and SY .Given π a Borel prob. measure on SX × SY ,

πX (A) = π (A× SY ) and πY (C ) = π (SX × C ) .

Blanchet (Columbia U. and Stanford U.) 7 / 25

Page 24: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Elements of Optimal Transport Costs

SX and SY be Polish spaces.BSX and BSY be the associated Borel σ-fields.

c (·) : SX × SX → [0,∞) be lower semicontinuous.µ (·) and v (·) Borel probability measures defined on SX and SY .

Given π a Borel prob. measure on SX × SY ,

πX (A) = π (A× SY ) and πY (C ) = π (SX × C ) .

Blanchet (Columbia U. and Stanford U.) 7 / 25

Page 25: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Elements of Optimal Transport Costs

SX and SY be Polish spaces.BSX and BSY be the associated Borel σ-fields.

c (·) : SX × SX → [0,∞) be lower semicontinuous.µ (·) and v (·) Borel probability measures defined on SX and SY .Given π a Borel prob. measure on SX × SY ,

πX (A) = π (A× SY ) and πY (C ) = π (SX × C ) .

Blanchet (Columbia U. and Stanford U.) 7 / 25

Page 26: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Definition of Optimal Transport Costs

Define

Dc (µ, v) = minπ{Eπ (c (X ,Y )) : πX = µ and πY = v}.

This is the so-called Kantorovich problem (see Villani (2008)).

If c (·) is a metric then Dc (µ, v) is a Wasserstein distance of order 1.If c (x , y) = 0 if and only if x = y then Dc (µ, v) = 0 if and only ifµ = v .

Kantorovich’s problem is a "nice" infinite dimensional linearprogramming problem.

Blanchet (Columbia U. and Stanford U.) 8 / 25

Page 27: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Definition of Optimal Transport Costs

Define

Dc (µ, v) = minπ{Eπ (c (X ,Y )) : πX = µ and πY = v}.

This is the so-called Kantorovich problem (see Villani (2008)).

If c (·) is a metric then Dc (µ, v) is a Wasserstein distance of order 1.If c (x , y) = 0 if and only if x = y then Dc (µ, v) = 0 if and only ifµ = v .

Kantorovich’s problem is a "nice" infinite dimensional linearprogramming problem.

Blanchet (Columbia U. and Stanford U.) 8 / 25

Page 28: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Definition of Optimal Transport Costs

Define

Dc (µ, v) = minπ{Eπ (c (X ,Y )) : πX = µ and πY = v}.

This is the so-called Kantorovich problem (see Villani (2008)).

If c (·) is a metric then Dc (µ, v) is a Wasserstein distance of order 1.

If c (x , y) = 0 if and only if x = y then Dc (µ, v) = 0 if and only ifµ = v .

Kantorovich’s problem is a "nice" infinite dimensional linearprogramming problem.

Blanchet (Columbia U. and Stanford U.) 8 / 25

Page 29: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Definition of Optimal Transport Costs

Define

Dc (µ, v) = minπ{Eπ (c (X ,Y )) : πX = µ and πY = v}.

This is the so-called Kantorovich problem (see Villani (2008)).

If c (·) is a metric then Dc (µ, v) is a Wasserstein distance of order 1.If c (x , y) = 0 if and only if x = y then Dc (µ, v) = 0 if and only ifµ = v .

Kantorovich’s problem is a "nice" infinite dimensional linearprogramming problem.

Blanchet (Columbia U. and Stanford U.) 8 / 25

Page 30: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Definition of Optimal Transport Costs

Define

Dc (µ, v) = minπ{Eπ (c (X ,Y )) : πX = µ and πY = v}.

This is the so-called Kantorovich problem (see Villani (2008)).

If c (·) is a metric then Dc (µ, v) is a Wasserstein distance of order 1.If c (x , y) = 0 if and only if x = y then Dc (µ, v) = 0 if and only ifµ = v .

Kantorovich’s problem is a "nice" infinite dimensional linearprogramming problem.

Blanchet (Columbia U. and Stanford U.) 8 / 25

Page 31: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Illustration of Optimal Transport Costs

Blanchet (Columbia U. and Stanford U.) 9 / 25

Page 32: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Fundamental Theorem of Robust Performance Analysis

Theorem (B. and Murthy (2016))

Suppose that c (·) is lower semicontinuous and that h (·) is uppersemicontinuous with EP0 |f (X )| < ∞. Then,

supDc (P0,P )≤δ

EP (f (Y )) = infλ≥0

EP0 [λδ+ supz{f (z)− λc (X , z)}].

Moreover, (π∗) and dual λ∗ are primal-dual solutions if and only if

f (y)− λ∗c (x , y) = supz{f (z)− λ∗c (x , z)} (x , y) -π∗ a.s.

λ∗ (Eπ∗ [c (X ,Y )− δ]) = 0.

Blanchet (Columbia U. and Stanford U.) 10 / 25

Page 33: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Implications for Risk Analysis

Theorem (B. and Murthy (2016))

Suppose that c (·) is lower semicontinuous and that B is a closed set. LetcB (x) = inf{c (x , y) : y ∈ B}, then

supDc (P0,P )≤δ

P (Y ∈ B) = P0 (cB (X ) ≤ 1/λ∗) ,

where λ∗ ≥ 0 satisfies (under mild assumptions on cB (X ))

δ = E0 [cB (X ) I (cB (X ) ≤ 1/λ∗)] .

Blanchet (Columbia U. and Stanford U.) 11 / 25

Page 34: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Back to Classical Risk Problem

Suppose that

c (x , y) = dJ (x (·) , y (·)) = Skorokhod J1 metric.= inf

φ(·) bijection{ supt∈[0,1]

|x (t)− y (φ (t))| , supt∈[0,1]

|φ (t)− t|}.

If R (t) = b− Z (t), then ruin during time interval [0, 1] is

Bb = {z (·) : b ≤ supt∈[0,1]

z (t)}.

Let P0 (·) be the Wiener measure want to compute

supDc (P0,P )≤δ

P (Z ∈ Bb) .

Blanchet (Columbia U. and Stanford U.) 12 / 25

Page 35: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Back to Classical Risk Problem

Suppose that

c (x , y) = dJ (x (·) , y (·)) = Skorokhod J1 metric.= inf

φ(·) bijection{ supt∈[0,1]

|x (t)− y (φ (t))| , supt∈[0,1]

|φ (t)− t|}.

If R (t) = b− Z (t), then ruin during time interval [0, 1] is

Bb = {z (·) : b ≤ supt∈[0,1]

z (t)}.

Let P0 (·) be the Wiener measure want to compute

supDc (P0,P )≤δ

P (Z ∈ Bb) .

Blanchet (Columbia U. and Stanford U.) 12 / 25

Page 36: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Back to Classical Risk Problem

Suppose that

c (x , y) = dJ (x (·) , y (·)) = Skorokhod J1 metric.= inf

φ(·) bijection{ supt∈[0,1]

|x (t)− y (φ (t))| , supt∈[0,1]

|φ (t)− t|}.

If R (t) = b− Z (t), then ruin during time interval [0, 1] is

Bb = {z (·) : b ≤ supt∈[0,1]

z (t)}.

Let P0 (·) be the Wiener measure want to compute

supDc (P0,P )≤δ

P (Z ∈ Bb) .

Blanchet (Columbia U. and Stanford U.) 12 / 25

Page 37: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Computing Distance to Bankruptcy

So: {cBb (z) ≤ 1/λ∗} = {supt∈[0,1] z (t) ≥ b− 1/λ∗}, and

supDc (P0,P )≤δ

P (Z ∈ Bb) = P0

(supt∈[0,1]

Z (t) ≥ b− 1/λ∗).

Blanchet (Columbia U. and Stanford U.) 13 / 25

Page 38: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Computing Uncertainty Size

Note any coupling π so that πX = P0 and πY = P satisfies

Dc (P0,P) ≤ Eπ [c (X ,Y )] ≈ δ.

So use any coupling between evidence and P0 or expert knowledge.

We discuss choosing δ non-parametrically in a moment

Blanchet (Columbia U. and Stanford U.) 14 / 25

Page 39: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Computing Uncertainty Size

Note any coupling π so that πX = P0 and πY = P satisfies

Dc (P0,P) ≤ Eπ [c (X ,Y )] ≈ δ.

So use any coupling between evidence and P0 or expert knowledge.

We discuss choosing δ non-parametrically in a moment

Blanchet (Columbia U. and Stanford U.) 14 / 25

Page 40: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Computing Uncertainty Size

Note any coupling π so that πX = P0 and πY = P satisfies

Dc (P0,P) ≤ Eπ [c (X ,Y )] ≈ δ.

So use any coupling between evidence and P0 or expert knowledge.

We discuss choosing δ non-parametrically in a moment

Blanchet (Columbia U. and Stanford U.) 14 / 25

Page 41: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Illustration of Coupling

Given arrivals and claim sizes let Z (t) = m−1/22 ∑

N (t)k=1 (Xk −m1)

Blanchet (Columbia U. and Stanford U.) 15 / 25

Page 42: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Coupling in Action

Blanchet (Columbia U. and Stanford U.) 16 / 25

Page 43: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Application 1: Numerical Example

Assume Poisson arrivals.

Pareto claim sizes with index 2.2 (P (V > t) = 1/(1+ t)2.2).Cost c (x , y) = dJ (x , y)

2 <—note power of 2.

Used Algorithm 1 to calibrate (estimating means and variances fromdata).

b P0(Ruin)Ptrue (Ruin)

P ∗robust (Ruin)Ptrue (Ruin)

100 1.07× 10−1 12.28150 2.52× 10−4 10.65200 5.35× 10−8 10.80250 1.15× 10−12 10.98

Blanchet (Columbia U. and Stanford U.) 17 / 25

Page 44: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Additional Applications: Multidimensional Ruin Problems

Paper: Quantifying Distributional Model Risk via Optimal Transport(B. & Murthy ’16) https://arxiv.org/abs/1604.01446 containsmore applicationsMultidimensional risk processes (explicit evaluation of cB (x) for dJmetric).Control: minθ supP :D (P ,P0)≤δ E [L (θ,Z )] <—robust optimalreinsurance.

Blanchet (Columbia U. and Stanford U.) 18 / 25

Page 45: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Machine Learning

Connection to machine learning helps further understandwhy optimal transport costs are sensible choices...

Paper:Robust Wasserstein Profile Inference and

Applications to Machine Learning (B., Murthy & Kang ’16)https://arxiv.org/abs/1610.05627

Blanchet (Columbia U. and Stanford U.) 19 / 25

Page 46: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Robust Performance Analysis in Machine Learning

Consider estimating β∗ ∈ Rm in linear regression

Yi = βXi + ei ,

where {(Yi ,Xi )}ni=1 are data points.

Optimal Least Squares approach consists in estimating β∗ via

MSE (β) = minβ

1n

n

∑i=1

(Yi − βTXi

)2.

Apply the distributionally robust estimator based on optimaltransport.

Blanchet (Columbia U. and Stanford U.) 20 / 25

Page 47: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Robust Performance Analysis in Machine Learning

Consider estimating β∗ ∈ Rm in linear regression

Yi = βXi + ei ,

where {(Yi ,Xi )}ni=1 are data points.Optimal Least Squares approach consists in estimating β∗ via

MSE (β) = minβ

1n

n

∑i=1

(Yi − βTXi

)2.

Apply the distributionally robust estimator based on optimaltransport.

Blanchet (Columbia U. and Stanford U.) 20 / 25

Page 48: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Robust Performance Analysis in Machine Learning

Consider estimating β∗ ∈ Rm in linear regression

Yi = βXi + ei ,

where {(Yi ,Xi )}ni=1 are data points.Optimal Least Squares approach consists in estimating β∗ via

MSE (β) = minβ

1n

n

∑i=1

(Yi − βTXi

)2.

Apply the distributionally robust estimator based on optimaltransport.

Blanchet (Columbia U. and Stanford U.) 20 / 25

Page 49: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connection to Sqrt-Lasso

Theorem (B., Kang, Murthy (2016)) Suppose that

c((x , y) ,

(x ′, y ′

))=

{‖x − x ′‖2q if y = y ′

∞ if y 6= y ′ .

Then, if 1/p + 1/q = 1

maxP :Dc (P ,Pn)≤δ

E 1/2P

((Y − βTX

)2)=√MSE (β) +

√δ ‖β‖2p .

Remark 1: This is sqrt-Lasso (Belloni et al. (2011)).Remark 2: Also representations for support vector machines, LAD lasso,group lasso, adaptive lasso, and more!

Blanchet (Columbia U. and Stanford U.) 21 / 25

Page 50: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

For Instance... Regularized Estimators

Theorem (B., Kang, Murthy (2016)) Suppose that

c((x , y) ,

(x ′, y ′

))=

{‖x − x ′‖q if y = y ′

∞ if y 6= y ′ .

Then,

supP : Dc (P ,Pn)≤δ

EP[log(1+ e−Y βTX )

]=1n

n

∑i=1log(1+ e−Yi β

TXi ) + δ ‖β‖p .

Remark 1: This is regularized logistic regression (see also Esfahani andKuhn 2015).

Blanchet (Columbia U. and Stanford U.) 22 / 25

Page 51: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Connections to Empirical Likelihood

Paper:https://arxiv.org/abs/1610.05627

Also chooses δ optimally introducingan extension of Empirical Likelihood

called "Robust Wasserstein Profile Inference".

Blanchet (Columbia U. and Stanford U.) 23 / 25

Page 52: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

The Robust Wasserstein Profile Function

Pick δ = 95% quantile of Rn(β∗) and we show that

nRn(β∗) ≈dE [e2]

E [e2]− (E |e|)2‖N(0,Cov (X ))‖2q .

Blanchet (Columbia U. and Stanford U.) 24 / 25

Page 53: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Conclusions

Presented systematic approach for quantifying model misspecification.

Approach based on optimal transport

supD (P ,P0)≤δ

P (X ∈ B) = P0 (cB (X ) ≤ 1/λ∗) .

Closed forms solutions, tractable in terms of P0, feasible calibration ofδ.

New statistical estimators, connections to machine learning &regularization.

Extensions of Empirical Likelihood & connections to optimalregularization.

Blanchet (Columbia U. and Stanford U.) 25 / 25

Page 54: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Conclusions

Presented systematic approach for quantifying model misspecification.

Approach based on optimal transport

supD (P ,P0)≤δ

P (X ∈ B) = P0 (cB (X ) ≤ 1/λ∗) .

Closed forms solutions, tractable in terms of P0, feasible calibration ofδ.

New statistical estimators, connections to machine learning &regularization.

Extensions of Empirical Likelihood & connections to optimalregularization.

Blanchet (Columbia U. and Stanford U.) 25 / 25

Page 55: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Conclusions

Presented systematic approach for quantifying model misspecification.

Approach based on optimal transport

supD (P ,P0)≤δ

P (X ∈ B) = P0 (cB (X ) ≤ 1/λ∗) .

Closed forms solutions, tractable in terms of P0, feasible calibration ofδ.

New statistical estimators, connections to machine learning &regularization.

Extensions of Empirical Likelihood & connections to optimalregularization.

Blanchet (Columbia U. and Stanford U.) 25 / 25

Page 56: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Conclusions

Presented systematic approach for quantifying model misspecification.

Approach based on optimal transport

supD (P ,P0)≤δ

P (X ∈ B) = P0 (cB (X ) ≤ 1/λ∗) .

Closed forms solutions, tractable in terms of P0, feasible calibration ofδ.

New statistical estimators, connections to machine learning &regularization.

Extensions of Empirical Likelihood & connections to optimalregularization.

Blanchet (Columbia U. and Stanford U.) 25 / 25

Page 57: Optimal Transport in Risk Analysisjblanche/presentations_slides/Blanchet_EV… · P (R (t) 2B for some t 2[0,T]), where P 0 is a suitable model. P 0 = proxy for P true. P 0 right

Conclusions

Presented systematic approach for quantifying model misspecification.

Approach based on optimal transport

supD (P ,P0)≤δ

P (X ∈ B) = P0 (cB (X ) ≤ 1/λ∗) .

Closed forms solutions, tractable in terms of P0, feasible calibration ofδ.

New statistical estimators, connections to machine learning &regularization.

Extensions of Empirical Likelihood & connections to optimalregularization.

Blanchet (Columbia U. and Stanford U.) 25 / 25