Universal Off-Policy Evaluation


Xuhui Liu

Nanjing University

June 18, 2021

Yash Chandak et al. "Universal Off-Policy Evaluation." CoRR, 2021.


Table of Contents

Introduction

Universal Off-Policy Evaluation

High-Confidence Bounds

Experiment Results

Analysis



Off-Policy Evaluation

Definition: The problem of evaluating a new strategy for behavior, or policy, using only observations collected during the execution of another policy.


Motivation

- We want to evaluate a new method without incurring the risk and cost of actually deploying the new method/policy.
- Existing logs contain huge amounts of historical data collected under existing policies.
- It makes economic sense, if possible, to use these logs.
- It makes economic sense, if possible, not to risk the losses of testing out a potentially bad new policy.
- Online ad placement is a good example.



Importance Sampling

Naive importance sampling:

$$J(\pi_\theta) = \mathbb{E}_{\tau\sim\pi_\beta}\left[\frac{\pi_\theta(\tau)}{\pi_\beta(\tau)}\sum_{t=0}^{H}\gamma^t r(s_t, a_t)\right] = \mathbb{E}_{\tau\sim\pi_\beta}\left[\left(\prod_{t=0}^{H}\frac{\pi_\theta(a_t\mid s_t)}{\pi_\beta(a_t\mid s_t)}\right)\sum_{t=0}^{H}\gamma^t r(s_t, a_t)\right] \approx \sum_{i=1}^{n} w_H^i \sum_{t=0}^{H}\gamma^t r_t^i,$$

where $w_t^i = \dfrac{1}{n}\prod_{t'=0}^{t}\dfrac{\pi_\theta(a_{t'}^i\mid s_{t'}^i)}{\pi_\beta(a_{t'}^i\mid s_{t'}^i)}$.

Weighted importance sampling normalizes by the sum of the per-trajectory ratios instead of by $n$:

$$w_t^i = \frac{1}{n}\prod_{t'=0}^{t}\frac{\pi_\theta(a_{t'}^i\mid s_{t'}^i)}{\pi_\beta(a_{t'}^i\mid s_{t'}^i)} \;\Longrightarrow\; w_t^i = \frac{\prod_{t'=0}^{t}\pi_\theta(a_{t'}^i\mid s_{t'}^i)\big/\pi_\beta(a_{t'}^i\mid s_{t'}^i)}{\sum_{j=1}^{n}\prod_{t'=0}^{t}\pi_\theta(a_{t'}^j\mid s_{t'}^j)\big/\pi_\beta(a_{t'}^j\mid s_{t'}^j)}$$
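The two weighting schemes above can be sketched in code; a minimal sketch, assuming per-step action probabilities are stored as `(n, H+1)` arrays (that layout and the function names are illustrative, not from the paper):

```python
import numpy as np

def is_weights(pi_probs, beta_probs, weighted=False):
    """Per-trajectory importance weights.

    pi_probs, beta_probs: arrays of shape (n, H+1) holding
    pi(a_t|s_t) and beta(a_t|s_t) for each of n logged trajectories.
    """
    ratios = np.prod(pi_probs / beta_probs, axis=1)  # product of per-step ratios
    if weighted:
        return ratios / ratios.sum()   # weighted (self-normalized) IS
    return ratios / len(ratios)        # naive IS: 1/n scaling

def is_estimate(returns, pi_probs, beta_probs, weighted=False):
    """Estimate J(pi) = sum_i w_i G_i from logged returns."""
    return float(np.dot(is_weights(pi_probs, beta_probs, weighted), returns))
```

When `pi_probs == beta_probs`, every ratio is one and both schemes reduce to the plain sample mean of the returns.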


Statistical Quantities

Consider a random variable $X$ with cumulative distribution function $F(x)$ and probability density function $p(x)$.

- Mean: $\mathbb{E}[X]$.
- Quantile: $\text{quantile}_\alpha(X) = F^{-1}(\alpha)$.
- Value at Risk (VaR): $\text{VaR}_\alpha(X) = \text{quantile}_\alpha(X)$.
- Conditional Value at Risk (CVaR): $\text{CVaR}_\alpha(X) = \mathbb{E}[X \mid X \le \text{quantile}_\alpha(X)]$.
- Variance: $\mathbb{E}[(X - \mathbb{E}[X])^2]$.
- Entropy: $H(X) = -\int p(x)\log p(x)\,dx$.



Limitations

- Safety-critical applications, e.g., automated healthcare. Risk-sensitive metrics: value at risk (VaR) and conditional value at risk (CVaR).
- Applications subject to noisy data, such as online recommendations. Robust metrics: median and other quantiles.
- Applications involving direct human-machine interaction, such as robotics and autonomous driving. Uncertainty metrics: variance and entropy.


How do we develop a universal off-policy method, one that can estimate any desired performance metric and can also provide high-confidence bounds that hold simultaneously, with high probability, for those metrics?



Notations

- Partially Observable Markov Decision Process (POMDP): $(\mathcal{S}, \mathcal{O}, \mathcal{A}, \mathcal{P}, \Omega, \mathcal{R}, \gamma, d_0)$.
- $\mathcal{D}$ is the data set $(H_i)_{i=1}^{n}$ collected using behavior policies $(\beta_i)_{i=1}^{n}$, where $H_i$ is the observed history $(O_0, A_0, \beta(A_0 \mid O_0), R_0, O_1, \dots)$.
- $G_i := \sum_{j=0}^{T} \gamma^j R_j$ is the return for $H_i$, where $T$ is the horizon length.
- $G^\pi$ and $H^\pi$ are the random variables for returns and complete trajectories under any policy $\pi$, respectively.
- $g(h)$ is the return for trajectory $h$.
- $\mathcal{H}_\pi$ is the set of all possible trajectories for policy $\pi$.


Assumptions

Assumption 1. The set $\mathcal{D}$ contains independent (not necessarily identically distributed) histories generated using $(\beta_i)_{i=1}^{n}$, along with the probabilities of the actions chosen, such that for some $\epsilon > 0$, $(\beta_i(a \mid o) < \epsilon) \implies (\pi(a \mid o) = 0)$ for all $o \in \mathcal{O}$, $a \in \mathcal{A}$, and $i \in (1, \dots, n)$.


Method

- First estimate the cumulative distribution $F_\pi$ of the return under $\pi$.
- Then use it to estimate the desired parameter $\psi(F_\pi)$.


Method

Represent $F_\pi$ with the return $G^\pi$:

$$F_\pi(\nu) = \Pr(G^\pi \le \nu) = \sum_{x\in\mathcal{X},\,x\le\nu}\Pr(G^\pi = x) = \sum_{x\in\mathcal{X},\,x\le\nu}\left(\sum_{h\in\mathcal{H}_\pi}\Pr(H^\pi = h)\,\mathbb{1}_{\{g(h)=x\}}\right)$$

Exchange the order of summation:

$$F_\pi(\nu) = \sum_{h\in\mathcal{H}_\pi}\Pr(H^\pi = h)\sum_{x\in\mathcal{X},\,x\le\nu}\mathbb{1}_{\{g(h)=x\}} = \sum_{h\in\mathcal{H}_\pi}\Pr(H^\pi = h)\,\mathbb{1}_{\{g(h)\le\nu\}}$$


Method

From Assumption 1, $\forall\beta$, $\mathcal{H}_\pi \subseteq \mathcal{H}_\beta$, so

$$F_\pi(\nu) = \sum_{h\in\mathcal{H}_\beta}\Pr(H^\pi = h)\,\mathbb{1}_{\{g(h)\le\nu\}} = \sum_{h\in\mathcal{H}_\beta}\Pr(H^\beta = h)\,\frac{\Pr(H^\pi = h)}{\Pr(H^\beta = h)}\,\mathbb{1}_{\{g(h)\le\nu\}}$$

Let $\rho_i := \prod_{j=0}^{T}\frac{\pi(A_j\mid O_j)}{\beta_i(A_j\mid O_j)}$, which equals $\Pr(H^\pi = h)/\Pr(H^\beta = h)$. This yields the estimator

$$\forall\nu\in\mathbb{R},\quad \hat{F}_n(\nu) := \frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}$$



Partially Observable Setting

- $\mathcal{O}$, $\bar{\mathcal{O}}$: observation sets for the behavior policy and the evaluation policy, respectively.
- If $\mathcal{O} = \bar{\mathcal{O}} = \mathcal{S}$, this reduces to the MDP setting.
- If $\mathcal{O} = \bar{\mathcal{O}}$, then since $\beta(a \mid o) = \beta(a \mid \bar{o})$, one can use density estimation on the available data $\mathcal{D}$ to construct an estimator $\hat{\beta}(a \mid \bar{o})$ of $\Pr(a \mid \bar{o}) = \beta(a \mid \bar{o})$.
- If $\mathcal{O} \ne \bar{\mathcal{O}}$, it is only possible to estimate $\Pr(a \mid \bar{o}) = \sum_{x} \beta(a \mid x)\Pr(x \mid \bar{o})$ through density estimation using the data $\mathcal{D}$.


Probability Distribution and Inverse CDF

Let $(G_{(i)})_{i=1}^{n}$ be the order statistics for the samples $(G_i)_{i=1}^{n}$, and let $G_{(0)} := G_{\min}$.

Inverse CDF:
$$\hat{F}_n^{-1}(\alpha) := \min\left\{g \in (G_{(i)})_{i=1}^{n} \;\middle|\; \hat{F}_n(g) \ge \alpha\right\}$$

Probability distribution:
$$d\hat{F}_n(G_{(i)}) := \hat{F}_n(G_{(i)}) - \hat{F}_n(G_{(i-1)})$$


Estimator

๐œ‡๐œ‹ ( ๐น๐‘›) โˆถ= โˆ‘๐‘›๐‘–=1 d ๐น๐‘› (๐บ(๐‘–)) ๐บ(๐‘–),

๐œŽ2๐œ‹ ( ๐น๐‘›) โˆถ= โˆ‘๐‘›

๐‘–=1 d ๐น๐‘› (๐บ(๐‘–)) (๐บ(๐‘–) โˆ’ ๐œ‡๐œ‹ ( ๐น๐‘›))2 ,๐‘„๐›ผ

๐œ‹ ( ๐น๐‘›) โˆถ= ๐น โˆ’1๐‘› (๐›ผ),

CVaR๐›ผ๐œ‹ ( ๐น๐‘›) โˆถ= 1

๐›ผ โˆ‘๐‘›๐‘–=1 d ๐น๐‘› (๐บ(๐‘–)) ๐บ(๐‘–)๐Ÿ™{๐บ(๐‘–)โ‰ค๐‘„๐›ผ๐œ‹ ( ๐น๐‘›) } .

โ€ข Definition of CVaR:CVaR๐›ผ

๐œ‹ (๐น๐œ‹) = ๐”ผ [๐บ๐œ‹ โˆฃ ๐บ๐œ‹ โ‰ค ๐น โˆ’1๐œ‹ (๐›ผ)]



High-Confidence Bounds

1. It is easy to obtain bounds for a single point.
2. It is hard to obtain bounds that hold over a whole interval.
3. The CDF is monotonically non-decreasing.

Let $(\kappa_i)_{i=1}^{K}$ be any $K$ "key points" at which we obtain confidence intervals for $(F_\pi(\kappa_i))_{i=1}^{K}$, then generalize to the whole interval based on these key points.


Let $\text{CI}^-(\kappa_i, \delta_i)$ and $\text{CI}^+(\kappa_i, \delta_i)$ be the lower and upper confidence bounds on $F_\pi(\kappa_i)$, respectively, such that

$$\forall i \in (1, \dots, K),\quad \Pr\left(\text{CI}^-(\kappa_i, \delta_i) \le F_\pi(\kappa_i) \le \text{CI}^+(\kappa_i, \delta_i)\right) \ge 1 - \delta_i$$

Based on this construction, we formulate a lower function $F^-$ and an upper function $F^+$:

$$F^-(\nu) := \begin{cases} 1 & \text{if } \nu > G_{\max} \\ \max_{\kappa_i \le \nu} \text{CI}^-(\kappa_i, \delta_i) & \text{otherwise} \end{cases}$$

$$F^+(\nu) := \begin{cases} 0 & \text{if } \nu < G_{\min} \\ \min_{\kappa_i \ge \nu} \text{CI}^+(\kappa_i, \delta_i) & \text{otherwise} \end{cases}$$
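The band construction can be sketched in plain code; key points and their per-point bounds are inputs, and the fallbacks to the trivial bounds $0$ and $1$ when no key point qualifies are an assumption of this sketch:

```python
def cdf_band(nu, kappas, ci_lo, ci_hi, g_min, g_max):
    """Extend key-point confidence bounds to every nu via CDF monotonicity:
    F^-(nu) = max over kappa_i <= nu of CI^-,
    F^+(nu) = min over kappa_i >= nu of CI^+."""
    if nu > g_max:
        lo = 1.0
    else:
        lo = max((l for k, l in zip(kappas, ci_lo) if k <= nu), default=0.0)
    if nu < g_min:
        hi = 0.0
    else:
        hi = min((h for k, h in zip(kappas, ci_hi) if k >= nu), default=1.0)
    return lo, hi
```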





Bootstrap Bounds1

1B. Efron and R. J. Tibshirani. "An Introduction to the Bootstrap." CRC Press, 1994.


Non-stationary

Assumption 3. For any $\nu$, $\exists w_\nu \in \mathbb{R}^d$ such that, $\forall i \in [1, L+\ell]$, $F^{(i)}_\pi(\nu) = \phi(i)^\top w_\nu$.

Estimating $F^{(i)}_\pi$ can now be seen as a time-series forecasting problem.

The wild bootstrap2 provides approximate CIs.

2E. Mammen. "Bootstrap and wild bootstrap for high dimensional linear models." The Annals of Statistics, pages 255–285, 1993.



Setting

- A simulated stationary and a non-stationary recommender-system domain, where the user's interest in a finite set of items is represented by the corresponding item's reward.
- The Type-1 Diabetes Mellitus Simulator (T1DMS) for the treatment of type-1 diabetes.
- A continuous-state Gridworld with partial observability (which also makes the domain non-Markovian in the observations), stochastic transitions, and eight discrete actions corresponding to up, down, left, right, and the four diagonal movements.


Experiments3

3P. Thomas, G. Theocharous, and M. Ghavamzadeh. "High confidence policy improvement." ICML, 2015.


Experiments4

4Y. Chandak, S. Shankar, and P. S. Thomas. "High confidence off-policy (or counterfactual) variance estimation." AAAI, 2021.




Theorem

Estimator:

$$\forall\nu\in\mathbb{R},\quad \hat{F}_n(\nu) := \frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}$$

Theoretical guarantee: under Assumption 1, $\hat{F}_n$ is an unbiased and uniformly consistent estimator of $F_\pi$:

$$\forall\nu\in\mathbb{R},\quad \mathbb{E}_{\mathcal{D}}\left[\hat{F}_n(\nu)\right] = F_\pi(\nu), \qquad \sup_{\nu\in\mathbb{R}}\left|\hat{F}_n(\nu) - F_\pi(\nu)\right| \xrightarrow{\text{a.s.}} 0$$


Part 1, Unbiasedness. Recall that

$$F_\pi(\nu) = \sum_{h\in\mathcal{H}_\beta}\Pr(H^\pi = h)\,\mathbb{1}_{\{g(h)\le\nu\}} = \sum_{h\in\mathcal{H}_\beta}\Pr(H^\beta = h)\,\frac{\Pr(H^\pi = h)}{\Pr(H^\beta = h)}\,\mathbb{1}_{\{g(h)\le\nu\}}$$

The probability of a trajectory under a policy $\pi$ with partial observations and non-Markovian structure is

$$\Pr(H^\pi = h) = \Pr(s_0)\Pr(o_0\mid s_0)\Pr(\bar{o}_0\mid o_0, s_0)\Pr(a_0\mid s_0, o_0, \bar{o}_0; \pi)$$
$$\times\prod_{i=0}^{T-1}\Big(\Pr(r_i\mid h_i)\Pr(s_{i+1}\mid h_i)\Pr(o_{i+1}\mid s_{i+1}, h_i)\Pr(\bar{o}_{i+1}\mid s_{i+1}, o_{i+1}, h_i)$$
$$\times\Pr(a_{i+1}\mid s_{i+1}, o_{i+1}, \bar{o}_{i+1}, h_i; \pi)\Big)\Pr(r_T\mid h_T)$$


The ratio can be written as

$$\frac{\Pr(H^\pi = h)}{\Pr(H^\beta = h)} = \frac{\Pr(a_0\mid s_0, o_0, \bar{o}_0; \pi)}{\Pr(a_0\mid s_0, o_0, \bar{o}_0; \beta)}\prod_{i=0}^{T-1}\frac{\Pr(a_{i+1}\mid s_{i+1}, o_{i+1}, \bar{o}_{i+1}, h_i; \pi)}{\Pr(a_{i+1}\mid s_{i+1}, o_{i+1}, \bar{o}_{i+1}, h_i; \beta)} = \prod_{i=0}^{T}\frac{\pi(a_i\mid \bar{o}_i)}{\beta(a_i\mid o_i)} = \rho(h)$$

Then we have

$$F_\pi(\nu) = \sum_{h\in\mathcal{H}_\beta}\Pr(H^\beta = h)\,\rho(h)\,\mathbb{1}_{\{g(h)\le\nu\}}.$$


๐”ผ๐’Ÿ [ ๐น๐‘›(๐œˆ)] = ๐”ผ๐’Ÿ [ 1๐‘›

๐‘›โˆ‘๐‘–=1

๐œŒ๐‘– (๐Ÿ™{๐บ๐‘–โ‰ค๐œˆ})]

= 1๐‘›

๐‘›โˆ‘๐‘–=1

๐”ผ๐’Ÿ [๐œŒ๐‘– (๐Ÿ™{๐บ๐‘–โ‰ค๐œˆ})]

= 1๐‘›

๐‘›โˆ‘๐‘–=1

โˆ‘โ„Žโˆˆโ„‹๐›ฝ๐‘–

Pr (๐ป๐›ฝ๐‘–= โ„Ž) ๐œŒ(โ„Ž) (๐Ÿ™{๐‘”(โ„Ž)โ‰ค๐œˆ})

(๐‘Ž)= 1๐‘›

๐‘›โˆ‘๐‘–=1

๐น๐œ‹(๐œˆ)

= ๐น๐œ‹(๐œˆ)


Part 2 Uniform Consistency

First we show pointwise consistency, i.e., for all $\nu$, $\hat{F}_n(\nu) \xrightarrow{\text{a.s.}} F_\pi(\nu)$.

Then we use this to establish uniform consistency.


Kolmogorov's Strong Law of Large Numbers

The strong law of large numbers states that the sample average converges almost surely to the expected value,

$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu \quad \text{as } n\to\infty,$$

if one of the following conditions is satisfied:
1. The random variables are identically distributed; or
2. for each $n$, the variance of $X_n$ is finite, and $\sum_{n=1}^{\infty} \frac{\text{Var}[X_n]}{n^2} < \infty$.


Let $X_i := \rho_i\,\mathbb{1}_{\{G_i\le\nu\}}$.

By Assumption 1, $\beta(a\mid o) \ge \epsilon$ whenever $\pi(a\mid o) > 0$. This implies the importance ratio is bounded above, and hence the $X_i$ are bounded above and have finite variance. By Kolmogorov's strong law of large numbers,

$$\hat{F}_n(\nu) = \frac{1}{n}\sum_{i=1}^{n}X_i \xrightarrow{\text{a.s.}} \mathbb{E}_{\mathcal{D}}\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right] = F_\pi(\nu)$$


Some extra notation to handle discontinuities in the CDF $F_\pi$:

$$F_\pi(\nu^-) := \Pr(G^\pi < \nu) = F_\pi(\nu) - \Pr(G^\pi = \nu), \qquad \hat{F}_n(\nu^-) := \frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i<\nu\}}$$

Similarly, we have $\hat{F}_n(\nu^-) \xrightarrow{\text{a.s.}} F_\pi(\nu^-)$.


Let ๐œ–1 > 0, and let ๐พ be any value more than 1/๐œ–1. Let (๐œ…๐‘–)๐พ๐‘–=0 be ๐พ key points,

๐บmin = ๐œ…0 < ๐œ…1 โ‰ค ๐œ…2 โ€ฆ โ‰ค ๐œ…๐พโˆ’1 < ๐œ…๐พ = ๐บmax

which create ๐พ intervals such that for all ๐‘– โˆˆ (1, โ€ฆ , ๐พ โˆ’ 1),

๐น๐œ‹ (๐œ…โˆ’๐‘– ) โ‰ค ๐‘–

๐พ โ‰ค ๐น๐œ‹ (๐œ…๐‘–)

Then by construction, if ๐œ…๐‘–โˆ’1 < ๐œ…๐‘–,

๐น๐œ‹ (๐œ…โˆ’๐‘– ) โˆ’ ๐น๐œ‹ (๐œ…๐‘–โˆ’1) โ‰ค ๐‘–

๐พ โˆ’ ๐‘– โˆ’ 1๐พ = 1

๐พ < ๐œ–1.


For any $\nu$, let $\kappa_{i-1}$ and $\kappa_i$ be such that $\kappa_{i-1} \le \nu < \kappa_i$. Then,

$$\hat{F}_n(\nu) - F_\pi(\nu) \le \hat{F}_n(\kappa_i^-) - F_\pi(\kappa_{i-1}) \le \hat{F}_n(\kappa_i^-) - F_\pi(\kappa_i^-) + \epsilon_1.$$

Similarly,

$$\hat{F}_n(\nu) - F_\pi(\nu) \ge \hat{F}_n(\kappa_{i-1}) - F_\pi(\kappa_i^-) \ge \hat{F}_n(\kappa_{i-1}) - F_\pi(\kappa_{i-1}) - \epsilon_1.$$


Then, $\forall\nu\in\mathbb{R}$,

$$\hat{F}_n(\kappa_{i-1}) - F_\pi(\kappa_{i-1}) - \epsilon_1 \le \hat{F}_n(\nu) - F_\pi(\nu) \le \hat{F}_n(\kappa_i^-) - F_\pi(\kappa_i^-) + \epsilon_1.$$

Let

$$\Delta_n := \max_{i\in(1,\dots,K-1)}\left\{\left|\hat{F}_n(\kappa_i) - F_\pi(\kappa_i)\right|,\; \left|\hat{F}_n(\kappa_i^-) - F_\pi(\kappa_i^-)\right|\right\}.$$

By pointwise convergence, $\Delta_n \xrightarrow{\text{a.s.}} 0$, and thus

$$\left|\hat{F}_n(\nu) - F_\pi(\nu)\right| \le \Delta_n + \epsilon_1.$$

Finally, since the inequality holds for all $\nu \in \mathbb{R}$ and is valid for any $\epsilon_1 > 0$, letting $\epsilon_1 \to 0$ gives the desired result:

$$\sup_{\nu\in\mathbb{R}}\left|\hat{F}_n(\nu) - F_\pi(\nu)\right| \xrightarrow{\text{a.s.}} 0$$


Variance Reduction

Inspired by weighted importance sampling:

$$\forall\nu\in\mathbb{R},\quad \tilde{F}_n(\nu) := \frac{1}{\sum_{j=1}^{n}\rho_j}\left(\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}\right).$$

Under Assumption 1, $\tilde{F}_n$ may be biased but is a uniformly consistent estimator of $F_\pi$:

$$\forall\nu\in\mathbb{R},\quad \mathbb{E}_{\mathcal{D}}\left[\tilde{F}_n(\nu)\right] \ne F_\pi(\nu), \qquad \sup_{\nu\in\mathbb{R}}\left|\tilde{F}_n(\nu) - F_\pi(\nu)\right| \xrightarrow{\text{a.s.}} 0$$
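A sketch of the self-normalized variant; unlike the unweighted estimate, its value always lies in $[0, 1]$:

```python
import numpy as np

def uno_cdf_weighted(nu, returns, rhos):
    """Self-normalized CDF estimate: sum_i rho_i 1{G_i <= nu} / sum_j rho_j."""
    returns, rhos = np.asarray(returns), np.asarray(rhos, dtype=float)
    return float(np.sum(rhos * (returns <= nu)) / np.sum(rhos))
```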


Part 1, Bias: We prove this using a counter-example. Let $n = 1$, so

$$\forall\nu\in\mathbb{R},\quad \mathbb{E}_{\mathcal{D}}\left[\tilde{F}_n(\nu)\right] = \mathbb{E}_{\mathcal{D}}\left[\frac{1}{\sum_{j=1}^{1}\rho_j}\left(\sum_{i=1}^{1}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}\right)\right] = \mathbb{E}_{\mathcal{D}}\left[\mathbb{1}_{\{G_1\le\nu\}}\right]$$
$$\overset{(a)}{=} \sum_{h\in\mathcal{H}_{\beta_1}}\Pr(H^{\beta_1} = h)\,\mathbb{1}_{\{g(h)\le\nu\}} = F_{\beta_1}(\nu) \ne F_\pi(\nu)$$


Part 2, Uniform Consistency: First, we establish pointwise consistency, i.e., for any $\nu$, $\tilde{F}_n(\nu) \xrightarrow{\text{a.s.}} F_\pi(\nu)$, and then we use this to establish uniform consistency, as required.

$$\forall\nu\in\mathbb{R},\quad \tilde{F}_n(\nu) = \frac{1}{\sum_{j=1}^{n}\rho_j}\left(\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}\right) = \left(\frac{1}{n}\sum_{j=1}^{n}\rho_j\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}\right).$$

If both $\left(\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\rho_j\right)^{-1}$ and $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}$ exist, then using Slutsky's theorem, $\forall\nu\in\mathbb{R}$,

$$\lim_{n\to\infty}\tilde{F}_n(\nu) = \left(\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\rho_j\right)^{-1}\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\rho_i\,\mathbb{1}_{\{G_i\le\nu\}}\right)$$


Notice, using Kolmogorov's strong law of large numbers, that the term in the first parentheses converges almost surely to the expected value of the importance ratios, which equals one. Similarly, the term in the second parentheses converges almost surely to $F_\pi(\nu)$. Therefore,

$$\forall\nu\in\mathbb{R},\quad \tilde{F}_n(\nu) \xrightarrow{\text{a.s.}} (1)^{-1}\,F_\pi(\nu) = F_\pi(\nu)$$
