
Concentration inequalities and tail bounds

John Duchi


Outline

I. Basics and motivation
   1. Law of large numbers
   2. Markov inequality
   3. Chernoff bounds

II. Sub-Gaussian random variables
   1. Definitions
   2. Examples
   3. Hoeffding inequalities

III. Sub-exponential random variables
   1. Definitions
   2. Examples
   3. Chernoff/Bernstein bounds


Motivation

- Often in this class, the goal is to argue that a sequence of random variables (or vectors) X_1, X_2, \ldots satisfies
\[
\frac{1}{n} \sum_{i=1}^{n} X_i \stackrel{p}{\longrightarrow} E[X].
\]

- Law of large numbers: if E[\|X\|] < \infty, then
\[
P\left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \neq E[X] \right) = 0.
\]
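As a quick sanity check of this convergence, here is a small NumPy simulation; the Exponential(1) population (so E[X] = 1), the seed, and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw i.i.d. Exponential(1) samples, so E[X] = 1.
n_max = 100_000
x = rng.exponential(scale=1.0, size=n_max)

# Running sample means (1/n) * sum_{i<=n} X_i for increasing n.
running_mean = np.cumsum(x) / np.arange(1, n_max + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7d}   sample mean = {running_mean[n - 1]:.4f}")
# The printed means settle toward E[X] = 1 as n grows.
```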


Markov inequalities

Theorem (Markov’s inequality)

Let X be a non-negative random variable. Then for t > 0,
\[
P(X \geq t) \leq \frac{E[X]}{t}.
\]
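To get a feel for how loose Markov's inequality can be, one can compare both sides by simulation; the sketch below uses an Exponential(1) variable purely as an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative, E[X] = 1

for t in (1.0, 2.0, 5.0, 10.0):
    empirical = np.mean(x >= t)    # estimate of P(X >= t)
    markov = x.mean() / t          # Markov bound E[X] / t
    print(f"t = {t:4.1f}   P(X >= t) ~ {empirical:.5f}   Markov bound = {markov:.5f}")
# The bound E[X]/t always dominates the empirical tail, but can be quite loose.
```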


Chebyshev inequalities

Theorem (Chebyshev’s inequality)

Let X be a real-valued random variable with E[X^2] < \infty. Then
\[
P\left( |X - E[X]| \geq t \right) \leq \frac{E\left[ (X - E[X])^2 \right]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}.
\]

Example: i.i.d. sampling
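One way to carry out the i.i.d. sampling example: if X_1, \ldots, X_n are i.i.d. copies of X with \mathrm{Var}(X) < \infty, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i has \mathrm{Var}(\bar{X}_n) = \mathrm{Var}(X)/n, so Chebyshev's inequality gives
\[
P\left( \left| \bar{X}_n - E[X] \right| \geq t \right) \leq \frac{\mathrm{Var}(X)}{n t^2},
\]
a polynomial 1/n rate, to be contrasted with the exponential rates from the Chernoff-type bounds that follow.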


Chernoff bounds

Moment generating function: for a random variable X, the MGF is
\[
M_X(\lambda) := E\left[ e^{\lambda X} \right].
\]

Example: Normally distributed random variables
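For the normal example, the MGF has a standard closed form (complete the square in the Gaussian integral): if X \sim N(\mu, \sigma^2), then
\[
M_X(\lambda) = E\left[ e^{\lambda X} \right] = \exp\left( \mu \lambda + \frac{\lambda^2 \sigma^2}{2} \right) \quad \text{for all } \lambda \in \mathbb{R}.
\]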


Chernoff bounds

Theorem (Chernoff bound)

For any random variable X and t \geq 0,
\[
P(X - E[X] \geq t) \leq \inf_{\lambda \geq 0} M_{X - E[X]}(\lambda) \, e^{-\lambda t}
= \inf_{\lambda \geq 0} E\left[ e^{\lambda (X - E[X])} \right] e^{-\lambda t}.
\]
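As a worked instance of the bound (anticipating the sub-Gaussian case defined next): if M_{X - E[X]}(\lambda) \leq \exp(\lambda^2 \sigma^2 / 2) for all \lambda \geq 0, then
\[
P(X - E[X] \geq t) \leq \inf_{\lambda \geq 0} \exp\left( \frac{\lambda^2 \sigma^2}{2} - \lambda t \right) = \exp\left( -\frac{t^2}{2 \sigma^2} \right),
\]
the infimum being attained at \lambda = t / \sigma^2.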


Sub-Gaussian random variables

Definition (Sub-Gaussianity)

A mean-zero random variable X is \sigma^2-sub-Gaussian if
\[
E\left[ e^{\lambda X} \right] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2} \right) \quad \text{for all } \lambda \in \mathbb{R}.
\]

Example: X \sim N(0, \sigma^2)
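Another standard example (beyond the Gaussian case on the slide) is a Rademacher variable X with P(X = \pm 1) = 1/2:
\[
E\left[ e^{\lambda X} \right] = \cosh(\lambda) = \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{(2k)!} \leq \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{2^k k!} = e^{\lambda^2 / 2},
\]
so X is 1-sub-Gaussian.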


Properties of sub-Gaussians

Proposition (sums of sub-Gaussians)

Let X_i be independent, mean-zero \sigma_i^2-sub-Gaussian random variables. Then \sum_{i=1}^{n} X_i is \left( \sum_{i=1}^{n} \sigma_i^2 \right)-sub-Gaussian.
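The proof is essentially one line: by independence the MGF factors, so for any \lambda \in \mathbb{R},
\[
E\left[ e^{\lambda \sum_{i=1}^{n} X_i} \right] = \prod_{i=1}^{n} E\left[ e^{\lambda X_i} \right] \leq \prod_{i=1}^{n} e^{\lambda^2 \sigma_i^2 / 2} = \exp\left( \frac{\lambda^2}{2} \sum_{i=1}^{n} \sigma_i^2 \right).
\]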


Concentration inequalities

Theorem

Let X be \sigma^2-sub-Gaussian. Then for t \geq 0,
\[
P(X - E[X] \geq t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right)
\quad \text{and} \quad
P(X - E[X] \leq -t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right).
\]
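A quick Monte Carlo check of the upper-tail bound; for illustration take X \sim N(0, 1), so \sigma^2 = 1 and E[X] = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
x = rng.normal(scale=np.sqrt(sigma2), size=1_000_000)  # sigma^2-sub-Gaussian

for t in (0.5, 1.0, 2.0, 3.0):
    empirical = np.mean(x >= t)              # estimate of P(X - E[X] >= t)
    bound = np.exp(-t**2 / (2 * sigma2))     # sub-Gaussian tail bound
    print(f"t = {t:3.1f}   P(X >= t) ~ {empirical:.5f}   bound = {bound:.5f}")
# The bound holds at every t; for the Gaussian the true tail is smaller,
# roughly by a factor 1/(t*sqrt(2*pi)) for large t, but the e^{-t^2/(2 sigma^2)}
# decay rate is captured.
```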


Concentration: convergence of an independent sum

Corollary

Let X_i be independent \sigma_i^2-sub-Gaussian. Then for t \geq 0,
\[
P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \geq t \right) \leq \exp\left( -\frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2} \right).
\]


Example: bounded random variables

Proposition

Let X \in [a, b] with E[X] = 0. Then
\[
E\left[ e^{\lambda X} \right] \leq \exp\left( \frac{\lambda^2 (b - a)^2}{8} \right).
\]


Maxima of sub-Gaussian random variables (in expectation)

For X_1, \ldots, X_n each \sigma^2-sub-Gaussian (not necessarily independent),
\[
E\left[ \max_{j \leq n} X_j \right] \leq \sqrt{2 \sigma^2 \log n}.
\]
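A short derivation via the MGF: for any \lambda > 0, Jensen's inequality and the bound e^{\lambda \max_j X_j} \leq \sum_j e^{\lambda X_j} give
\[
\exp\left( \lambda \, E\left[ \max_{j \leq n} X_j \right] \right) \leq E\left[ e^{\lambda \max_{j \leq n} X_j} \right] \leq \sum_{j=1}^{n} E\left[ e^{\lambda X_j} \right] \leq n \, e^{\lambda^2 \sigma^2 / 2},
\]
so E[\max_{j \leq n} X_j] \leq \frac{\log n}{\lambda} + \frac{\lambda \sigma^2}{2}; taking \lambda = \sqrt{2 \log n} / \sigma yields \sqrt{2 \sigma^2 \log n}.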


Maxima of sub-Gaussian random variables (in probability)

For X_1, \ldots, X_n each \sigma^2-sub-Gaussian (not necessarily independent) and t \geq 0,
\[
P\left( \max_{j \leq n} X_j \geq \sqrt{2 \sigma^2 (\log n + t)} \right) \leq e^{-t}.
\]


Hoeffding’s inequality

If the X_i are independent and bounded in [a_i, b_i], then for t \geq 0,
\[
P\left( \frac{1}{n} \sum_{i=1}^{n} \left( X_i - E[X_i] \right) \geq t \right) \leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^{n} (b_i - a_i)^2} \right)
\]
and
\[
P\left( \frac{1}{n} \sum_{i=1}^{n} \left( X_i - E[X_i] \right) \leq -t \right) \leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^{n} (b_i - a_i)^2} \right).
\]
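A numerical illustration of the one-sided bound; the Bernoulli(1/2) population and the values of n and t below are arbitrary illustrative choices (here a_i = 0, b_i = 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 200, 50_000, 0.1

# i.i.d. Bernoulli(1/2) samples: a_i = 0, b_i = 1, E[X_i] = 1/2.
x = rng.integers(0, 2, size=(trials, n))
deviations = x.mean(axis=1) - 0.5           # (1/n) sum (X_i - E[X_i])

empirical = np.mean(deviations >= t)        # estimate of the tail probability
hoeffding = np.exp(-2 * n * t**2)           # bound, since (b_i - a_i)^2 = 1 for all i
print(f"empirical P(mean deviation >= {t}) ~ {empirical:.5f}")
print(f"Hoeffding bound                     = {hoeffding:.5f}")
```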


Equivalent definitions of sub-Gaussianity

Theorem

The following are equivalent up to constants (the value of \sigma may change by a universal constant factor between statements):

i. E[\exp(X^2 / \sigma^2)] \leq e

ii. E[|X|^k]^{1/k} \leq \sigma \sqrt{k} for all k \geq 1

iii. P(|X| \geq t) \leq \exp\left( -\frac{t^2}{2 \sigma^2} \right) for all t \geq 0

If in addition X is mean-zero, then i–iii are also equivalent to

iv. X is \sigma^2-sub-Gaussian.


Sub-exponential random variables

Definition (Sub-exponential)

A mean-zero random variable X is (\tau^2, b)-sub-exponential if
\[
E\left[ \exp(\lambda X) \right] \leq \exp\left( \frac{\lambda^2 \tau^2}{2} \right) \quad \text{for } |\lambda| \leq \frac{1}{b}.
\]

Example: Exponential random variable, with density p(x) = \lambda e^{-\lambda x} for x \geq 0.
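To carry out the example (writing s for the MGF argument to avoid clashing with the rate \lambda): for X with density p(x) = \lambda e^{-\lambda x} on x \geq 0,
\[
E\left[ e^{s X} \right] = \int_0^{\infty} \lambda e^{-(\lambda - s) x} \, dx = \frac{\lambda}{\lambda - s} \quad \text{for } s < \lambda,
\]
and the MGF is infinite for s \geq \lambda. So the MGF exists only on a bounded interval, which is exactly the sub-exponential (rather than sub-Gaussian) behavior the definition allows.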


Sub-exponential random variables

Example: \chi^2 random variable. Let Z \sim N(0, \sigma^2) and X = Z^2. Then
\[
E\left[ e^{\lambda X} \right] = \frac{1}{\left[ 1 - 2 \lambda \sigma^2 \right]_{+}^{1/2}},
\]
which is finite only when \lambda < \frac{1}{2 \sigma^2}.
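A Monte Carlo sanity check of this MGF formula; \sigma^2 = 1 and the values of \lambda are illustrative choices, kept well below 1/(2\sigma^2) so the sample average is stable.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
z = rng.normal(scale=np.sqrt(sigma2), size=2_000_000)
x = z**2                               # squared Gaussian (chi-squared type)

for lam in (-0.5, -0.2, 0.1, 0.2):     # need lam < 1/(2 sigma^2) = 0.5
    mc = np.mean(np.exp(lam * x))      # Monte Carlo estimate of E[e^{lam X}]
    exact = 1.0 / np.sqrt(1.0 - 2.0 * lam * sigma2)
    print(f"lambda = {lam:+.1f}   MC = {mc:.4f}   1/sqrt(1 - 2*lambda*sigma^2) = {exact:.4f}")
```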


Concentration of sub-exponentials

Theorem

Let X be (\tau^2, b)-sub-exponential. Then
\[
P(X \geq E[X] + t) \leq
\begin{cases}
e^{-t^2/(2\tau^2)} & \text{if } 0 \leq t \leq \tau^2 / b \\
e^{-t/(2b)} & \text{if } t \geq \tau^2 / b
\end{cases}
= \max\left\{ e^{-t^2/(2\tau^2)},\; e^{-t/(2b)} \right\}.
\]


Sums of sub-exponential random variables

Let X_i be independent (\tau_i^2, b_i)-sub-exponential random variables. Then \sum_{i=1}^{n} X_i is \left( \sum_{i=1}^{n} \tau_i^2,\; b_* \right)-sub-exponential, where b_* = \max_i b_i.

Corollary: If the X_i satisfy the above, then
\[
P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \left( X_i - E[X_i] \right) \right| \geq t \right) \leq 2 \exp\left( -\min\left\{ \frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^{n} \tau_i^2},\; \frac{n t}{2 b_*} \right\} \right).
\]


Bernstein conditions and sub-exponentials

Suppose X is mean-zero with
\[
\left| E[X^k] \right| \leq \frac{1}{2} k! \, \sigma^2 b^{k-2} \quad \text{for } k = 2, 3, \ldots
\]
Then for |\lambda| < 1/b,
\[
E\left[ e^{\lambda X} \right] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2 (1 - b |\lambda|)} \right).
\]


Johnson-Lindenstrauss and high-dimensional embedding

Question: Let u_1, \ldots, u_m \in \mathbb{R}^d be arbitrary. Can we find a mapping F : \mathbb{R}^d \to \mathbb{R}^n, with n \ll d, such that
\[
(1 - \epsilon) \left\| u_i - u_j \right\|_2^2 \leq \left\| F(u_i) - F(u_j) \right\|_2^2 \leq (1 + \epsilon) \left\| u_i - u_j \right\|_2^2 \quad \text{for all pairs } i, j?
\]

Theorem (Johnson-Lindenstrauss embedding)

For n \gtrsim \frac{1}{\epsilon^2} \log m, such a mapping exists.
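One standard construction, sketched below under the assumption of a Gaussian random projection (as in the proof that follows): take F(u) = X u / \sqrt{n} with X an n \times d matrix of i.i.d. N(0, 1) entries. The dimensions, \epsilon, and the constant in the choice of n are illustrative, unoptimized choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, eps = 10_000, 50, 0.25
n = int(np.ceil(16 * np.log(m) / eps**2))   # n of order (1/eps^2) log m

U = rng.normal(size=(m, d))                 # m (arbitrary) points in R^d
X = rng.normal(size=(n, d))                 # Gaussian projection matrix
F = U @ X.T / np.sqrt(n)                    # F(u_i) = X u_i / sqrt(n)

# Compare squared pairwise distances before and after projection.
worst = 0.0
for i in range(m):
    for j in range(i + 1, m):
        orig = np.sum((U[i] - U[j])**2)
        proj = np.sum((F[i] - F[j])**2)
        worst = max(worst, abs(proj / orig - 1.0))
print(f"n = {n}, worst relative distortion = {worst:.3f} (eps = {eps})")
# With this n, the worst distortion is typically well below eps.
```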


Proof of Johnson-Lindenstrauss continued

\[
P\left( \left| \frac{\| X u \|_2^2}{n \| u \|_2^2} - 1 \right| \geq t \right) \leq 2 \exp\left( -\frac{n t^2}{8} \right) \quad \text{for } t \in [0, 1].
\]


Reading and bibliography

1. S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. Luxburg, and G. Ratsch, editors, Advanced Lectures in Machine Learning, pages 208–240. Springer, 2004.

2. V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.

3. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, 2001.

4. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

