

Regularization prescriptions and convex duality: density estimation and

Renyi entropies

Ivan Mizera

University of Alberta
Department of Mathematical and Statistical Sciences

Edmonton, Alberta, Canada

Linz, October 2008

joint work with Roger Koenker (University of Illinois at Urbana-Champaign)

Gratefully acknowledging the support of the

Natural Sciences and Engineering Research Council of Canada


Density estimation (say)

A useful heuristic: maximum likelihood

Given the data points $X_1, X_2, \ldots, X_n$, solve

$$\prod_{i=1}^{n} f(X_i) \to \max_f\,!$$

or equivalently

$$-\sum_{i=1}^{n} \log f(X_i) \to \min_f\,!$$

under the side conditions $f \geq 0$, $\int f = 1$

1
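The degeneracy announced on the next slide can be seen numerically: placing a vanishing-width component at a single data point sends the log-likelihood to +∞. A minimal sketch (mine, not the slides'; the two-component normal mixture is an illustrative assumption):

```python
import numpy as np

X = np.linspace(-2.0, 2.0, 21)  # any fixed sample

def loglik(sigma):
    # mixture 0.5*N(X[0], sigma^2) + 0.5*N(0, 1), log-likelihood over the data
    spike = np.exp(-0.5 * ((X - X[0]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    base = np.exp(-0.5 * X**2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(0.5 * spike + 0.5 * base))

vals = [loglik(s) for s in (1e-2, 1e-4, 1e-8)]
# shrinking the spike inflates the likelihood without bound: no maximizer exists
assert vals[0] < vals[1] < vals[2]
```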


Note that useful...

[Figure: the "useful" maximum-likelihood fit; x axis 0 to 25, y axis 0 to 1.6]

2


Dirac catastrophe!

3



Preventing the disaster for the general case

• Sieves (...)

• Regularization

$$-\sum_{i=1}^{n} \log f(X_i) \to \min_f\,! \qquad J(f) \leq \Lambda, \quad f \geq 0, \quad \int f = 1$$

4



Preventing the disaster for the general case

• Sieves (...)

• Regularization

$$-\sum_{i=1}^{n} \log f(X_i) + \lambda J(f) \to \min_f\,! \qquad f \geq 0, \quad \int f = 1$$

$J(\cdot)$ - penalty (penalizing complexity, lack of smoothness, etc.)

for instance, $J(f) = \int |(\log f)''| = \mathrm{TV}((\log f)')$

or also $J(f) = \int |(\log f)'''| = \mathrm{TV}((\log f)'')$

Good (1971), Good and Gaskins (1971), Silverman (1982), Leonard (1978), Gu (2002), Wahba, Lin, and Leng (2002)

See also: Eggermont and LaRiccia (2001); Ramsay and Silverman (2006); Hartigan (2000); Hartigan and Hartigan (1985); Davies and Kovac (2004)

4
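On a grid, the penalty $J(f) = \int |(\log f)''| = \mathrm{TV}((\log f)')$ is just the summed absolute second differences of $\log f$. A small sketch (the grid discretization is my assumption, not the slides' implementation):

```python
import numpy as np

def tv_log_deriv(f, dx):
    """Discrete J(f) = TV((log f)'): summed |increments| of the difference quotient of log f."""
    gprime = np.diff(np.log(f)) / dx        # (log f)' at grid midpoints
    return np.sum(np.abs(np.diff(gprime)))  # total variation of (log f)'

x = np.linspace(-4.0, 4.0, 801)
dx = x[1] - x[0]
normal = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
laplace = 0.5 * np.exp(-np.abs(x))

# normal: (log f)' = -x is linear, so its TV over [-4, 4] is 8 (up to O(dx) bias)
assert abs(tv_log_deriv(normal, dx) - 8.0) < 0.05
# Laplace: log f is piecewise linear; all variation sits at the kink at 0
assert abs(tv_log_deriv(laplace, dx) - 2.0) < 0.05
```

The penalty is zero exactly when $\log f$ is piecewise linear with no kinks, which is why TV regularization favors exponential-like pieces.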


See also in particular

Roger Koenker and Ivan Mizera (2007), Density estimation by total variation regularization

Roger Koenker and Ivan Mizera (2006), The alter egos of the regularized maximum likelihood density estimators: deregularized maximum-entropy, Shannon, Renyi, Simpson, Gini, and stretched strings

Roger Koenker, Ivan Mizera, and Jungmo Yoon (200?), What do kernel density estimators optimize?

Roger Koenker and Ivan Mizera (2008), Primal and dual formulations relevant for the numerical estimation of a probability density via regularization

Roger Koenker and Ivan Mizera (200?), Quasi-concave density estimation

http://www.stat.ualberta.ca/~mizera/

http://www.econ.uiuc.edu/~roger/

5



Preventing the disaster for special cases

• Shape constraint: monotonicity

$$-\sum_{i=1}^{n} \log f(X_i) \to \min_f\,! \qquad f \text{ decreasing}, \quad f \geq 0, \quad \int f = 1$$

Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001), ...

• Shape constraint: (strong) unimodality

$$-\sum_{i=1}^{n} \log f(X_i) \to \min_f\,! \qquad -\log f \text{ convex}, \quad f \geq 0, \quad \int f = 1$$

Eggermont and LaRiccia (2000), Walther (2000)

Rufibach and Dümbgen (2006)

Pal, Woodroofe, and Meyer (2006)

6



Note

Shape constraint: no regularization parameter to be set...

... but of course, we need to believe that the shape is plausible

Regularization via the TV penalty...

... vs the log-concavity shape constraint:

The differential operator is the same, only the constraint is somewhat different:
$$\int |(\log f)''| \leq \Lambda, \qquad \text{in the dual } |(\log f)''| \leq \Lambda$$

Log-concavity: $(\log f)'' \leq 0$

Only the functional analysis may be a bit more difficult...

... so let us do the shape-constrained case first

7



The hidden charm of log-concave distributions

A density f is called log-concave if −log f is convex.

(Usual conventions: −log 0 = ∞, convex where finite, ...)

Schoenberg 1940's, Karlin 1950's (monotone likelihood ratio)
Karlin (1968) - monograph about their mathematics
Barlow and Proschan (1975) - reliability
Flinn and Heckman (1975) - social choice
Caplin and Nalebuff (1991a,b) - voting theory
Devroye (1984) - how to simulate from them
Mizera (1994) - M-estimators

Uniform, Normal, Exponential, Logistic, Weibull, Gamma... - all log-concave

If f is log-concave, then
- it is unimodal ("strongly")
- its convolution with any unimodal density is unimodal
- its convolution with any log-concave density is log-concave
- $f = e^{-g}$, with g convex...

No heavy tails! t-distributions (finance!): not log-concave (!!)

8
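The contrast is easy to check numerically: $(\log f)'' \leq 0$ everywhere for the normal, while for a t density it turns positive in the tails. A small check of my own, not from the slides:

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 6001)

log_normal = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
nu = 3.0
log_t = -0.5 * (nu + 1.0) * np.log1p(x**2 / nu)  # log of the t_3 density, up to a constant

# second differences of log f: log-concavity means all are <= 0
assert np.diff(log_normal, 2).max() <= 1e-12  # normal: log-concave everywhere
assert np.diff(log_t, 2).max() > 0.0          # t_3: (log f)'' > 0 once x^2 > nu
```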


A convex problem

Let g = −log f; let K be the cone of convex functions.

The original problem is transformed:

$$\sum_{i=1}^{n} g(X_i) \to \min_g\,! \qquad g \in K, \quad \int e^{-g} = 1$$



A convex problem

Let g = −log f; let K be the cone of convex functions.

The original problem is transformed:

$$\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min_g\,! \qquad g \in K$$

and generalized: let ψ be convex and nonincreasing (like $e^{-x}$):

$$\sum_{i=1}^{n} g(X_i) + \int \psi(g) \to \min_g\,! \qquad g \in K$$

9
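The transformed problem can be sketched numerically: discretize g on a grid, impose convexity as nonnegative second differences, and minimize. The grid, the linear interpolation of g at the data points, and the SLSQP solver are all illustrative assumptions of this sketch, not the authors' implementation (their computations use conic solvers, as discussed on the Computation slide):

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

rng = np.random.default_rng(1)
X = np.sort(rng.normal(size=60))

m = 40
grid = np.linspace(X[0] - 0.5, X[-1] + 0.5, m)
dx = grid[1] - grid[0]

# W: linear-interpolation weights, so that (W @ g)[i] approximates g(X[i])
W = np.zeros((X.size, m))
idx = np.clip(np.searchsorted(grid, X) - 1, 0, m - 2)
t = (X - grid[idx]) / dx
W[np.arange(X.size), idx] = 1.0 - t
W[np.arange(X.size), idx + 1] = t
w = W.mean(axis=0)  # gradient of (1/n) * sum_i g(X_i)

def obj(g):   # (1/n) sum g(X_i) + integral of e^{-g} (rectangle rule)
    return w @ g + dx * np.exp(-g).sum()

def jac(g):
    return w - dx * np.exp(-g)

# convexity of g on the grid: nonnegative second differences
D2 = np.diff(np.eye(m), 2, axis=0)
convex = LinearConstraint(D2, 0.0, np.inf)

res = minimize(obj, 0.5 * grid**2, jac=jac, method="SLSQP",
               constraints=[convex], options={"maxiter": 500, "ftol": 1e-10})
f = np.exp(-res.x)
assert abs(dx * f.sum() - 1.0) < 0.1      # the density normalizes itself
assert np.diff(res.x, 2).min() > -1e-6    # g came out convex
```

The first assertion reflects the point of this reformulation: shifting g by a constant c changes the objective by roughly $c(1 - \int e^{-g})$, so stationarity forces $\int e^{-g} = 1$ without an explicit constraint.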


Primal and dual

Recall: K is the cone of convex functions; ψ is convex and nonincreasing

The strong Fenchel dual of

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) + \int \psi(g)\,dx \to \min_g\,! \qquad g \in K \qquad (P)$$

is

$$-\int \psi^*(-f)\,dx \to \max_f\,! \qquad f = \frac{d(P_n - G)}{dx}, \quad G \in K^* \qquad (D)$$

Extremal relation: $f = -\psi'(g)$.

For penalized estimation, in a discretized setting: Koenker and Mizera (2007b)

10


Remarks

$\psi^*(y) = \sup_{x \in \operatorname{dom}\psi} (yx - \psi(x))$ is the conjugate of ψ

if primal solutions g are sought in some space, then dual solutions G are sought in a dual space

for instance, if $g \in C(X)$, and X is compact, then $G \in C(X)^*$, the space of (signed) Radon measures on X.

The equality $f = \dfrac{d(P_n - G)}{dx}$ is thus a feasibility constraint (for other G, the dual objective is −∞)

$K^*$ is the dual cone to K - the collection of (signed) Radon measures G such that $\int g\,dG \geq 0$ for every convex g.

Dual: good for computation...

11
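The conjugate can be sanity-checked by brute force; for $\psi(x) = e^{-x}$ one gets $\psi^*(y) = -y\log(-y) + y$ for $y < 0$ and $+\infty$ for $y > 0$ (the grid search below is an illustrative assumption of mine):

```python
import numpy as np

xs = np.linspace(-10.0, 30.0, 400001)

def psi_star(y):
    """psi*(y) = sup_x (y*x - psi(x)) for psi(x) = exp(-x), by grid search."""
    return np.max(y * xs - np.exp(-xs))

for y in (-0.1, -0.5, -1.0, -2.0):
    closed = -y * np.log(-y) + y       # supremum attained at x = -log(-y)
    assert abs(psi_star(y) - closed) < 1e-6
# for y > 0 the sup is +infinity: y*x - exp(-x) grows without bound as x -> oo
assert psi_star(1.0) > 20.0
```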


Dual: good not only for computation

Couldn't we have heavy-tailed distributions here too?

... possibly going beyond log-concavity?

Recall: the strong Fenchel dual of

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) + \int \psi(g)\,dx \to \min_g\,! \qquad g \in K \qquad (P)$$

is

$$-\int \psi^*(-f)\,dx \to \max_f\,! \qquad f = \frac{d(P_n - G)}{dx}, \quad G \in K^* \qquad (D)$$

Extremal relation: $f = -\psi'(g)$.

12


Instance: maximum likelihood, α = 1

For $\psi(x) = e^{-x}$, we have

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min_g\,! \qquad g \in K \qquad (P)$$

$$-\int f \log f\,dx \to \max_f\,! \qquad f = \frac{d(P_n - G)}{dx}, \quad G \in K^* \qquad (D)$$

... a maximum entropy formulation

Extremal relation: $f = e^{-g}$

g required convex → f log-concave

How about entropies alternative to Shannon entropy?

13


Renyi system

Renyi (1961, 1965): entropies defined with the help of

$$(1-\alpha)^{-1} \log\left(\int f^\alpha(x)\,dx\right),$$

with Shannon entropy being the limiting form as α → 1.

Various entropies correspond to various known divergences:

α = 1: Shannon entropy, Kullback-Leibler divergence

α = 2: Renyi-Simpson-Gini entropy, Pearson's χ²

α = 1/2: Hellinger distance

α = 0: reversed Kullback-Leibler

New heuristic: MLE → Shannon dual → Renyi duals → ? primals

14
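The limiting claim is easy to verify numerically, say for the standard normal (my own check; the grid and the α values are arbitrary):

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # standard normal density

def renyi(alpha):
    # (1 - alpha)^{-1} * log(integral of f^alpha)
    return np.log(np.sum(f**alpha) * dx) / (1.0 - alpha)

shannon = -np.sum(f * np.log(f)) * dx  # Shannon entropy = 0.5*log(2*pi*e)
assert abs(shannon - 0.5 * np.log(2.0 * np.pi * np.e)) < 1e-6
# the Renyi entropy approaches the Shannon entropy as alpha -> 1 from both sides
assert abs(renyi(0.999) - shannon) < 1e-3
assert abs(renyi(1.001) - shannon) < 1e-3
```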


ψ and ψ∗ for various α

[Figure: graphs of ψ (left) and ψ* (right) for α = 2, 1, 1/2, 0]

15


Some properties for all α

The density estimators with Renyi entropies, as defined above, are:

• supported by the convex hull of the data

• such that the expected value of the estimated density equals the sample mean of the data

• such that the function g appearing in the primal is a polyhedral convex function (that is, it is determined by its values at the data points $X_i$, being the maximal convex function minorizing those values)

• well-defined: the minimum of the primal formulation is attained

16


Instance: α = 2

$$-\tfrac{1}{2}\int f^2(y)\,dy \to \max_f\,! \qquad f = \frac{d(P_n - G)}{dy}, \quad G \in K^* \qquad (D)$$

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) + \frac{1}{2}\int g^2\,dx \to \min_g\,! \qquad g \in K \qquad (P)$$

Minimum Pearson χ², maximum Renyi-Simpson-Gini entropy

Extremal relation: f = −g

g required convex → f concave

That yields a class more restrictive than log-concave - and thus is not of interest for us!

17


But perhaps for others...

Replacing g by −f gives

$$-\frac{1}{n}\sum_{i=1}^{n} f(X_i) + \frac{1}{2}\int f^2\,dx \to \min_f\,! \qquad \text{subject to } -f \in K$$

the objective function of the "least squares estimator" of Groeneboom, Jongbloed, and Wellner (2001)

A folk tune (in the penalized context): Aidu and Vapnik (1989), Terrell (1990)

... and more generally, the primal form for α > 1 is equivalent to the objective function of the "minimum density power divergence estimators" introduced by Basu, Harris, Hjort, and Jones (1998) in the context of parametric M-estimation.

18


De profundis: α = 0

Not explicitly a member of the Renyi family - nevertheless, a limit:

$$\int \log f\,dy \to \max_f\,! \qquad f = \frac{d(P_n - G)}{dy}, \quad G \in K^* \qquad (D)$$

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) - \int \log g\,dx \to \min_{g \in C(X)}\,! \qquad g \in K \qquad (P)$$

Empirical likelihood (Owen, 2001)

Extremal relation: g = 1/f

the primal thus estimates the "sparsity function"

g required convex → 1/f convex

- that would yield a very nice family of functions...

... but numerically still fragile.

19


The hierarchy of ρ-convex functions

Hardy, Littlewood, and Polya (1934): means of order ρ

Avriel (1972): ρ-convex functions

ρ < 0: $f^\rho$ convex

ρ = 0: log-concave

ρ > 0: $f^\rho$ concave

The class of ρ-convex densities grows with decreasing ρ: if $\rho_1 < \rho_2$, then every $\rho_2$-convex density is $\rho_1$-convex

Every ρ-convex density is quasi-concave: it has convex upper level sets

Our α corresponds to ρ = α − 1; that is, if we apply the estimating prescription whose dual involves the Renyi α-entropy, then the result is guaranteed to lie in the class of (α − 1)-convex functions

20
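For example, the Cauchy ($t_1$) density fails log-concavity (ρ = 0) but is −1/2-convex: $f^{-1/2} = \sqrt{\pi(1+x^2)}$ is convex. A quick numerical check (mine, not the slides'):

```python
import numpy as np

x = np.linspace(-50.0, 50.0, 10001)
cauchy = 1.0 / (np.pi * (1.0 + x**2))  # t_1 density

# not log-concave: (log f)'' becomes positive in the tails...
assert np.diff(np.log(cauchy), 2).max() > 0.0
# ...but -1/2-convex: f^(-1/2) = sqrt(pi*(1+x^2)) has nonnegative curvature
assert np.diff(cauchy**(-0.5), 2).min() > 0.0
```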


So the winner is: α = 1/2

"Moderate progress within the limits of law", the "Hellinger selector":

$$\int \sqrt{f}\,dx \to \max_f\,! \qquad \text{subject to } f = \frac{d(P_n - G)}{dx}, \quad G \in K^* \qquad (D)$$

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) + \int \frac{1}{g}\,dx \to \min_{g \in C(X)}\,! \qquad g \in K \qquad (P)$$

Extremal relation: $f = g^{-2}$

g required convex → $f^{-1/2}$ convex (f is −1/2-convex)

- all log-concave densities included

- the whole t family included

the primal thus estimates $f^{-1/2}$ (... "rootosparsity")

21


Weibull, n = 200; left Shannon, right Hellinger

[Figure: two density estimates on roughly (−4, 12), vertical scale 0 to 1.4]

22


Another Weibull, n = 200; left Shannon, right Hellinger

[Figure: two density estimates on roughly (−1, 3.5), vertical scale 0 to 3.5]

23


Four points at the vertices of the square

24


Student data on criminal fingers

[Figure: contour plot of the estimate over (−6, 6) × (−6, 6)]

25


Once again, but with logarithmic contours

[Figure: the same estimate with logarithmic contours, over (−6, 6) × (−6, 6)]

26


Simulated data: uniform distribution

[Figure: contour plot of the estimate over (−1.5, 1.5) × (−1.5, 1.5)]

27


A panoramic view

[Figure: 3-D surface view of the estimated density]

28


Computation

Main problem: enforcing convexity in the optimization

Easy in dimension 1; in dimension 2, the most promising way seems to be to employ a finite-difference scheme: estimate the Hessian, the matrix of second derivatives, by finite differences...

... and then enforce this matrix to be positive semidefinite

That means: semidefinite programming...

... but with a (slightly) nonlinear objective function.

In dimension two, one can express the semidefiniteness of the matrix by a rotated quadratic cone...

... and the reciprocal value can be handled by the same trick.

Thus: the Hellinger selector turns out to be computationally easier than (Shannon) maximum likelihood...

We acknowledge using Mosek, a Danish commercial implementation by Erling Andersen, and an open-source code by Michael Saunders

See also Cule, Samworth, and Stewart (2008)

29
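The 2 × 2 case the slide alludes to: $\begin{pmatrix} a & b \\ b & c \end{pmatrix} \succeq 0$ exactly when $a \geq 0$, $c \geq 0$ and $2ac \geq (\sqrt{2}\,b)^2$, which is a rotated-quadratic-cone constraint. A numerical check of the equivalence (the random sampling is an illustrative assumption):

```python
import numpy as np

def psd_2x2(a, b, c):
    """[[a, b], [b, c]] is PSD iff its eigenvalues are nonnegative."""
    return bool(np.all(np.linalg.eigvalsh(np.array([[a, b], [b, c]])) >= -1e-12))

def rotated_cone(a, b, c):
    """Rotated quadratic cone membership: a >= 0, c >= 0, 2*a*c >= (sqrt(2)*b)^2."""
    return a >= 0 and c >= 0 and 2 * a * c >= 2 * b**2

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b, c = rng.uniform(-1, 1, size=3)
    assert psd_2x2(a, b, c) == rotated_cone(a, b, c)
```

This is why the two-dimensional problems stay inside second-order-cone programming instead of requiring a general semidefinite solver.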


Summary

• We can estimate a density restricted to a broader domain than the log-concave one - including also heavy-tailed distributions.

• Generalizing the formulation dual to maximum likelihood within the family of Renyi entropies indexed by α, we obtain an interesting family of divergence-based primal/dual estimators.

• Each yields estimates in its corresponding ρ-convex class, in a natural way.

• Our choice is α = 1/2, which in the dual picks the feasible density closest to the uniform, on the convex hull of the data, in Hellinger distance.

• It yields −1/2-convex densities, which include all log-concave densities, but also the t family, that is, algebraic tails; seemingly all practically important quasi-concave densities.

• And in dimension 2 it is computationally somewhat more convenient than the other possibilities.

30


Duality heuristics

Recall: penalized estimation, discretized setting

Primal:

$$-\frac{1}{n}\sum_{i=1}^{n} g(x_i) + J(-Dg) + \int \psi(g) \to \min_g\,!$$

where (typically) $J(-Dg) = \lambda \int |g^{(k)}|^p$

Dual:

$$-\int \psi^*(f) - J^*(h) \to \max_{f,h}\,! \qquad f = \frac{d(P_n + D^*h)}{dx} \geq 0$$

where ψ* is again the conjugate of ψ

$J^*$ is the conjugate of J

$D^*$ is the operator adjoint to D

and strong duality yields $f = \psi'(g)$

31


Instances

Silverman (1982), Leonard (1978): p = 2, k = 3

Gu (2002), Wahba, Lin, and Leng (2002): p = 2, k = 2

Davies and Kovac (2004), Hartigan (2000), Hartigan and Hartigan (1985): p = 1, k = 1

Koenker and Mizera (2006a,b,c): p = 1, k = 1, 2, 3

Recall: the conjugate of a norm is the indicator of the unit ball in the dual norm. If $J(-Dg) = \lambda \int |g'|$, then the dual is equivalent to

$$-\int \psi^*(f) \to \max_{f,h}\,! \qquad f = \frac{d(P_n + D^*h)}{dx} \geq 0, \quad \|h\|_\infty \leq \lambda$$

If $\psi(u) = e^u$ (which means that $\psi^*(u) = u \log u - u$), then the primal is a maximum likelihood prescription penalized by $\int |(\log f)'| = \mathrm{TV}(\log f)$

And the dual means: stretch h, the antiderivative of f, in the $L_\infty$ neighborhood ("tube") of $P_n$... (and for other α as well!)

32
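The norm-conjugate fact can be checked coordinate-by-coordinate, since $J(x) = \lambda\|x\|_1$ is separable (the grid search over a box is my illustrative assumption):

```python
import numpy as np

lam = 0.7
xs = np.linspace(-100.0, 100.0, 20001)  # search box for each coordinate

def conj_value(y):
    # J(x) = lam*||x||_1 is separable, so its conjugate splits per coordinate:
    # J*(y) = sum_i sup_x (y_i*x - lam*|x|)
    return sum(np.max(yi * xs - lam * np.abs(xs)) for yi in y)

y_in = np.array([0.3, -0.6, 0.5])   # ||y||_inf <= lam
y_out = np.array([0.3, -1.2, 0.5])  # ||y||_inf > lam
assert conj_value(y_in) == 0.0      # inside the dual ball: indicator value 0
assert conj_value(y_out) >= 49.0    # outside: the sup grows with the search box
```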


Stretching (“tauting”) strings

[Figure: the taut string stretched through the tube around the empirical CDF; x from −5 to 5, y from −0.2 to 1.2]

Cumulative distribution function: tube with δ = 0.1

33


“tube” may be somewhat ambiguous...

[Figure: two renderings of the tube around the empirical CDF; x from −5 to 5, y from 0 to 1]

34


...but nevertheless, there is one that matches

[Figure: the matching tube and the taut string; x from −5 to 4, y from 0 to 0.25]

...and the density estimate is its derivative(Koenker and Mizera 2006b).

35