
Page 1: Learning and testing  k-modal distributions

Learning and testing k-modal distributions

Rocco A. Servedio, Columbia University

Joint work (in progress) with

Ilias Diakonikolas, UC Berkeley

Costis Daskalakis, MIT

Page 2: Learning and testing  k-modal distributions

What this talk is about

Probability distributions over [N] = {1,2,…,N}

Monotone increasing distribution: p(i) ≤ p(i+1) for all i < N

(Whole talk: “increasing” means “non-decreasing”)



Page 3: Learning and testing  k-modal distributions

k-modal distributions

k-modal: k peaks and valleys

A 3-modal distribution:

A unimodal distribution: Another one:

Monotone distribution: 0-modal
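As a concrete aside (not from the slides), one natural way to count the “peaks and valleys” of a pmf given as an explicit probability vector is to count its direction changes; the distribution is k-modal when this count is at most k. A minimal Python sketch:

    def count_modes(p, tol=0.0):
        """Count direction changes (peaks + valleys) of the pmf p.
        A monotone pmf gets 0; 'k-modal' means the count is at most k."""
        changes, direction = 0, 0      # direction: +1 rising, -1 falling, 0 flat so far
        for prev, cur in zip(p, p[1:]):
            if cur > prev + tol:
                step = +1
            elif cur < prev - tol:
                step = -1
            else:
                continue               # flat stretches do not change direction
            if direction != 0 and step != direction:
                changes += 1
            direction = step
        return changes

    # Example: up, down, up, down -> a 3-modal pmf.
    p = [0.05, 0.15, 0.10, 0.05, 0.20, 0.25, 0.10, 0.10]
    print(count_modes(p))   # -> 3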

Page 4: Learning and testing  k-modal distributions

The learning problem

Target distribution p is an unknown k-modal distribution over [N]

Algorithm gets samples from p

Goal: output a hypothesis h that is ε-close to p in total variation distance

Want an algorithm that uses few samples and is computationally efficient.
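For reference (illustrative, not from the slides), the total variation distance between two distributions on the same finite domain is half their ℓ1 distance:

    def total_variation(p, q):
        # d_TV(p, q) = (1/2) * sum_i |p(i) - q(i)|
        assert len(p) == len(q)
        return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

    print(total_variation([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))   # -> 0.5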


Page 5: Learning and testing  k-modal distributions

The testing problem

q is a known k-modal distribution over [N].

p is an unknown k-modal distribution over [N].

Algorithm gets samples from p.

Goal: output “yes” w.h.p. if p = q,

“no” w.h.p. if d_TV(p, q) ≥ ε.

Page 6: Learning and testing  k-modal distributions

Please note

Testing problem is not: given samples from an unknown distribution p, determine whether p is k-modal versus ε-far from every k-modal distribution.

This problem requires Ω(√N) samples, even for k = 0.

[Figure: hard to distinguish: uniform over [N] vs. uniform over a random subset of [N] of size N/2.]

Page 7: Learning and testing  k-modal distributions

Why study these questions?

• k-modal distributions seem natural

• would be nice if k-modal structure were exploitable by efficient learning / testing algorithms

• post hoc justification: solutions exhibit interesting connections between testing and learning

Page 8: Learning and testing  k-modal distributions

The general case: learning

If we drop k-modal assumption, learning problem becomes:

Learn an arbitrary distribution over [N] to total variation distance ε.

Θ(N/ε²) samples are necessary and sufficient.

Page 9: Learning and testing  k-modal distributions

The general case: testing

If we drop the k-modal assumption, the testing problem becomes:

q is a known, arbitrary distribution over [N].

p is an unknown, arbitrary distribution over [N].

Algorithm gets samples from p.

Goal: output “yes” if p = q, “no” if d_TV(p, q) ≥ ε.

Roughly √N · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08].

Page 10: Learning and testing  k-modal distributions

This work: main learning result

We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. Its sample complexity nearly matches the lower bound below, and it is computationally efficient (running time polynomial in the number of samples it draws).

Close to optimal: there is an Ω(k log(N/k)/ε³)-sample lower bound for any algorithm.

Page 11: Learning and testing  k-modal distributions

Main testing result

We give an algorithm that solves the k-modal testing problem over [N] to accuracy ε. Its sample complexity grows only like the square root of the k·log N factor in the learning bound (times poly(1/ε) factors), and it is computationally efficient.

Any testing algorithm must use a nearly comparable number of samples.

Testing is easier than learning!

Page 12: Learning and testing  k-modal distributions

Prior work

k = 0, 1: [BKR04] gave a sample-efficient algorithm for the testing problem (with p and q both available via sample access).

k = 0, 1: [Birgé87, Birgé87a] gave an O(log(N)/ε³)-sample efficient algorithm for learning, and a matching lower bound.

We’ll use this algorithm as a black box in our results.

Page 13: Learning and testing  k-modal distributions

Outline of rest of talk

• Background: some tools

• Learning k-modal distributions

• Testing k-modal distributions

Page 14: Learning and testing  k-modal distributions

First tool: Learning monotone distributions

Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log(N)/ε³) samples and runs in time linear in its input size.

[B87b] also gave a matching Ω(log(N)/ε³) lower bound for learning a monotone distribution.

Page 15: Learning and testing  k-modal distributions

Second tool: Learning a CDF – the Dvoretzky-Kiefer-Wolfowitz inequality

Theorem [DKW56]: Let p be any distribution over [N] with CDF F. Let F̂ be the empirical estimate of F obtained from m samples. Then sup_x |F̂(x) − F(x)| ≤ ε with probability ≥ 1 − 2e^(−2mε²); so m = O(log(1/δ)/ε²) samples give accuracy ε with probability 1 − δ.

Morally, this means you can partition [N] into O(1/ε) intervals, each of mass ≈ ε under p, using Õ(1/ε²) samples.

(Note: Õ(1/ε²) samples suffice for this, by an easy Chernoff-bound argument; the full strength of [DKW56] is not needed.)

[Figure: the true CDF and the empirical CDF.]
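To make this partitioning step concrete, here is an illustrative Python sketch (not from the talk; the sample size and constants are placeholder choices). It draws roughly 1/ε² samples, forms the empirical CDF, and cuts [N] into consecutive intervals of empirical mass roughly ε:

    import random
    from bisect import bisect_right

    def dkw_partition(sample_oracle, N, eps, m=None):
        """Partition {1,...,N} into consecutive intervals of empirical mass ~ eps.
        By DKW, m = O(1/eps^2) samples make the empirical CDF uniformly eps-accurate
        w.h.p., so each returned interval also has true mass ~ eps (up to O(eps))."""
        if m is None:
            m = int(4 / eps ** 2)                  # illustrative choice, constants not tuned
        samples = sorted(sample_oracle() for _ in range(m))

        def emp_cdf(x):                            # fraction of samples that are <= x
            return bisect_right(samples, x) / m

        intervals, start, target = [], 1, eps
        for x in range(1, N + 1):
            if emp_cdf(x) >= target or x == N:
                intervals.append((start, x))       # interval [start, x], empirical mass ~ eps
                start, target = x + 1, emp_cdf(x) + eps
        return intervals

    # Tiny usage example with a decreasing distribution over [100].
    N = 100
    pmf = [2 * (N - i) / (N * (N + 1)) for i in range(N)]
    cum = [sum(pmf[:i + 1]) for i in range(N)]
    oracle = lambda: min(N, 1 + bisect_right(cum, random.random()))
    print(dkw_partition(oracle, N, eps=0.1))

(A single point of unusually large mass simply becomes its own interval here; the talk notes that such heavy points are easy to detect and handle separately.)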

Page 16: Learning and testing  k-modal distributions

Learning k-modal distributions

Page 17: Learning and testing  k-modal distributions

The problem

Learn an unknown k-modal distribution over [N].


Page 18: Learning and testing  k-modal distributions

What should we shoot for?

Easy lower bound: need Ω(k log(N/k)/ε³) samples.

(Have to solve k separate monotone-distribution-learning problems, each over a domain of size ≈ N/k, to accuracy ε.)

Want an algorithm that uses roughly this many samples and runs in a comparable amount of time.

Page 19: Learning and testing  k-modal distributions

The problem, again

Goal: learn an unknown k-modal distribution over [N].

We know how to efficiently learn an unknown monotone distribution…

Would be easy if we knew the k peaks/valleys…

Guessing them exactly: infeasible

Guessing them approximately: not too great either


Page 20: Learning and testing  k-modal distributions

A first approach

Break up [N] into many light intervals, each of probability mass ≈ ε/k (so ≈ k/ε intervals in all):

p is non-monotone on at most k of the intervals.

So running monotone distribution learner on each interval will usually give a good answer.

Page 21: Learning and testing  k-modal distributions

First approach in more detail

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ (ℓ ≈ k/ε) and obtain estimates ŵ_j of their probability masses, with each interval’s mass ≈ ε/k.

(Assumes each point has mass at most ≈ ε/k or so; heavier points are easy to detect and deal with.)

2. Run the monotone distribution learner on each I_j to get a hypothesis h_j.

(Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one as h_j.)

3. Combine the hypotheses in the obvious way: the final hypothesis gives weight ŵ_j to interval I_j and distributes it within I_j according to h_j. (A schematic sketch follows below.)
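A schematic Python sketch of this first approach (illustrative only: partition_by_mass is a simplified stand-in for the [DKW] step, learn_conditional is a crude stand-in for the Birgé monotone learner, and the increasing-vs-decreasing hypothesis-testing step is omitted):

    from collections import Counter

    def partition_by_mass(samples, N, mass):
        """Cut [N] into consecutive intervals of empirical mass ~ `mass` (cf. the DKW slide)."""
        m = len(samples)
        counts = Counter(samples)
        intervals, start, acc, target = [], 1, 0.0, mass
        for x in range(1, N + 1):
            acc += counts.get(x, 0) / m
            if acc >= target or x == N:
                intervals.append((start, x))
                start, target = x + 1, acc + mass
        return intervals

    def learn_conditional(samples, interval):
        """Stand-in for the monotone learner: the empirical conditional distribution
        on `interval`. (The real algorithm runs the Birge learner twice, once per
        direction, and keeps the better hypothesis.)"""
        lo, hi = interval
        inside = Counter(x for x in samples if lo <= x <= hi)
        n = max(1, sum(inside.values()))
        return {x: inside.get(x, 0) / n for x in range(lo, hi + 1)}

    def first_approach(sample_oracle, N, eps, k):
        """Naive k-modal learner: ~k/eps light intervals, learn each interval's
        conditional distribution, glue with the estimated interval weights.
        p is non-monotone on at most k intervals of mass ~eps/k, so those
        intervals contribute only ~eps error in total."""
        m = int(16 * (k / eps) ** 2)               # enough for DKW at accuracy ~eps/k (illustrative)
        samples = [sample_oracle() for _ in range(m)]
        hypothesis = {}
        for lo, hi in partition_by_mass(samples, N, eps / k):
            w_hat = sum(1 for x in samples if lo <= x <= hi) / m   # interval weight estimate
            for x, q in learn_conditional(samples, (lo, hi)).items():
                hypothesis[x] = w_hat * q                          # weight w_hat, shape h_j
        return hypothesis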

Page 22: Learning and testing  k-modal distributions

Sketch of analysis

1. Use [DKW] to divide [N] into intervals and obtain the estimates ŵ_j. Takes ≈ (k/ε)² samples (DKW at accuracy ≈ ε/k).

2. Run the monotone distribution learner on each interval to get h_j. This is ≈ k/ε runs of the learner, which dominates the sample cost.

3. Combine the hypotheses in the obvious way.

Total error: ≈ ε from the (at most k) non-monotone intervals, each of mass ≈ ε/k, plus ≈ ε from the scaling factors ŵ_j, plus ≈ ε from the per-interval learning error on the monotone intervals.

Page 23: Learning and testing  k-modal distributions

Improving the approach

The extra cost came from running the monotone distribution learner ≈ k/ε times rather than just ≈ k times.

If we could somehow check, more cheaply than learning, whether an interval is monotone before running the learner, we could run the learner fewer times and save…

…this is a property testing problem!

More sophisticated algorithm: two new ingredients.

Page 24: Learning and testing  k-modal distributions

First ingredient: testing k-modal distributions for monotonicity

Consider the following property testing problem:

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

[Figure: without the k-modal promise, uniform over [N] vs. uniform over a random half of [N] are hard to distinguish.]

Note: the k-modal promise on p might save us from the Ω(√N) lower bound…

Page 25: Learning and testing  k-modal distributions

Efficiently testing k-modal distributions for monotonicity

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

Theorem: There is a tester for this problem whose sample complexity is poly(k, 1/ε), independent of N.

We’ll use this to identify sub-intervals of [N] on which p is close to monotone.

…Can we efficiently learn close-to-monotone distributions?

Page 26: Learning and testing  k-modal distributions

Second ingredient: agnostically learning monotone distributions

Consider the following “agnostic learning” problem:

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone, for some unknown opt ≥ 0.

Goal: output a hypothesis distribution h such that d_TV(h, p) ≤ opt + ε.

If opt=0, this is the original “learn a monotone distribution” problem

Want to handle general case as efficiently as opt=0 case

Page 27: Learning and testing  k-modal distributions

agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that d_TV(h, p) ≤ opt + ε.

Theorem: There is a computationally efficient learning algorithm for this problem that uses O(log(N)/ε³) samples.

Page 28: Learning and testing  k-modal distributions

Semi-agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that d_TV(h, p) ≤ C·opt + ε, for some absolute constant C ≥ 1.

Theorem: There is a computationally efficient learning algorithm for this semi-agnostic problem that uses O(log(N)/ε³) samples.

The [Birgé87] monotone distribution learner does the job.

We will take opt ≈ ε here, so “C·opt + ε” versus “opt + ε” doesn’t matter.

Page 29: Learning and testing  k-modal distributions

The learning algorithm: first phase

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ of mass ≈ ε/k each and obtain weight estimates ŵ_j, as before.

2. Run the monotonicity testers (once for increasing, once for decreasing) on I_1, then on I_1 ∪ I_2, then on I_1 ∪ I_2 ∪ I_3, and so on, until the first time both testers say “no”, say at I_1 ∪ … ∪ I_j. Mark I_j and continue from I_{j+1}.

This makes ≈ k/ε invocations of the tester in total.

(Alternative: use binary search over the breakpoints: ≈ k log(k/ε) invocations of the tester in total.)

Page 30: Learning and testing  k-modal distributions

The algorithm

2. Run the testers on growing unions of intervals, as on the previous slide, until the first time both say “no”; mark the last interval added and continue.

Each time an interval is marked,

• the block of unmarked intervals right before it is close-to-monotone; call this a superinterval

• (at least) one of the k peaks/valleys of p is “used up”

Page 31: Learning and testing  k-modal distributions

The learning algorithm: second phase

After this step, [N] is partitioned into
• superintervals (at most ≈ k of them), each close to monotone,
• “marked” intervals (at most k of them), each of weight ≈ ε/k.

Rest of algorithm:

3. Run the semi-agnostic monotone distribution learner on each superinterval to get an O(ε)-accurate hypothesis for p restricted to that superinterval.

4. Output the final hypothesis: glue the per-superinterval hypotheses (and the small marked intervals) together using the estimated weights, as before. (A sketch combining both phases appears below.)
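Putting the two phases together, a high-level Python sketch (structure only; is_close_to_monotone and learn_semi_agnostic_monotone are placeholders for the monotonicity tester and the semi-agnostic Birgé learner discussed above, and the bookkeeping is simplified relative to the real algorithm):

    def learn_k_modal(samples, intervals, is_close_to_monotone, learn_semi_agnostic_monotone):
        """Phase 1: grow a block of consecutive DKW intervals; when both the
        'increasing' and 'decreasing' testers reject the block, mark the last
        interval and start a fresh block. Phase 2: run the semi-agnostic monotone
        learner once per unmarked block ('superinterval') and glue the pieces
        together with their estimated weights."""
        m = len(samples)
        superintervals, marked, block = [], [], []
        for iv in intervals:
            block.append(iv)
            span = (block[0][0], block[-1][1])
            inc_ok = is_close_to_monotone(samples, span, direction="increasing")
            dec_ok = is_close_to_monotone(samples, span, direction="decreasing")
            if not inc_ok and not dec_ok:
                marked.append(block.pop())            # this interval "uses up" a peak/valley
                if block:
                    superintervals.append((block[0][0], block[-1][1]))
                block = []
        if block:
            superintervals.append((block[0][0], block[-1][1]))

        hypothesis = {}
        for lo, hi in superintervals:                 # at most ~k+1 runs of the costly learner
            w_hat = sum(1 for x in samples if lo <= x <= hi) / m
            for x, q in learn_semi_agnostic_monotone(samples, (lo, hi)).items():
                hypothesis[x] = w_hat * q
        for lo, hi in marked:                         # <= k marked intervals of mass ~eps/k:
            w_hat = sum(1 for x in samples if lo <= x <= hi) / m
            for x in range(lo, hi + 1):
                hypothesis[x] = w_hat / (hi - lo + 1) # spread their small weight uniformly
        return hypothesis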

Page 32: Learning and testing  k-modal distributions

Analysis of the algorithm

Sample complexity:

• ≈ k/ε (or, with binary search, ≈ k log(k/ε)) runs of the tester: each uses a number of samples depending only on k and ε, not on N.

• At most ≈ k runs of the semi-agnostic monotone learner: each uses ≈ log(N)/ε³ samples.

Error rate:

• ≈ ε error from the (at most k) marked intervals, each of weight ≈ ε/k.

• ≈ ε total error from the semi-agnostic learner’s hypotheses on the superintervals.

• ≈ ε total error from the scaling factors ŵ_j.

Page 33: Learning and testing  k-modal distributions

I owe you a tester

Theorem: There is a tester for this problem whose sample complexity is poly(k, 1/ε), independent of N.

Algorithm gets samples from unknown k-modal distribution p over [N].

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Page 34: Learning and testing  k-modal distributions

The testing algorithm

Algorithm:
• Run [DKW] with accuracy poly(ε/k); let p̂ be the resulting empirical estimate of p.
• If there are points a ≤ b < c such that the average value of p̂ over [a, b] exceeds the average value of p̂ over [b+1, c] by more than a suitable slack, then output “no”; otherwise output “yes”.

Completeness: if p is monotone increasing, then its average value over [a, b] never exceeds its average value over [b+1, c], so the test passes w.h.p.
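A minimal Python sketch of a tester of this flavor, assuming the rejection rule compares the average empirical mass of a left interval against that of a right interval (the breakpoint grid, the sample size, and the slack tau below are illustrative placeholders, not the talk’s parameters):

    from bisect import bisect_right

    def test_monotone_increasing(samples, N, eps, k):
        """Output False ("no") iff some earlier stretch [a, b] has noticeably larger
        average empirical mass per point than some later stretch [b+1, c], which
        witnesses that p is far from monotone increasing. For monotone increasing p
        the true averages never violate this, so the test passes w.h.p."""
        m = len(samples)
        srt = sorted(samples)

        def mass(a, b):                                  # empirical mass of [a, b]
            return (bisect_right(srt, b) - bisect_right(srt, a - 1)) / m

        # Check stretches whose endpoints form a coarse grid of ~k/eps breakpoints,
        # so the search is small and independent of N.
        step = max(1, int(eps / max(k, 1) * m))
        breakpoints = sorted(set([1, N] + srt[::step]))
        tau = eps / (4 * max(k, 1))                      # illustrative slack

        for i, a in enumerate(breakpoints):
            for b in breakpoints[i:]:
                for c in breakpoints:
                    if c <= b:
                        continue
                    left_avg = mass(a, b) / (b - a + 1)
                    right_avg = mass(b + 1, c) / (c - b)
                    if left_avg > right_avg + tau:
                        return False                     # "no": not monotone increasing
        return True                                      # "yes"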

Page 35: Learning and testing  k-modal distributions

Soundness

Soundness lemma: If p is k-modal and no such violating pair of intervals exists (up to the [DKW] accuracy), then p is O(ε)-close to monotone increasing.

Algorithm (repeated from the previous slide): run [DKW] to get p̂; output “no” iff some earlier interval has a noticeably larger average value of p̂ than some later interval.

To prove the soundness lemma: show that under the lemma’s hypothesis, we can “correct” each peak/valley of p by “spending” at most ≈ ε/k in variation distance.

Page 36: Learning and testing  k-modal distributions

Correcting a peak of p

Lemma: If p is k-modal and no such violating pair of intervals exists, then p is O(ε)-close to monotone increasing.

Consider a peak of p (a “hill” followed by a “valley” in a region where p should be increasing).

Draw a horizontal line at a height t such that

(mass of the “hill” above the line) = (missing mass of the “valley” below the line).

Correct the peak by bulldozing the hill into the valley: replace p by the constant value t throughout this region.
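To see the “line” numerically: the height t at which the hill’s excess mass equals the valley’s deficit is just the average of p over the region being corrected, and the variation-distance cost of the correction equals the hill’s mass. An illustrative Python sketch (operating on a single explicitly given segment, a simplification of the proof’s setup):

    def bulldoze(segment):
        """Replace the pmf values on `segment` by their average t, so the mass of the
        'hill' above the line (sum of (v - t)+) exactly fills the 'valley' below it.
        Returns the flattened values and the total-variation cost of the change."""
        t = sum(segment) / len(segment)                    # height of the line
        hill = sum(max(v - t, 0.0) for v in segment)       # mass bulldozed off the hill
        valley = sum(max(t - v, 0.0) for v in segment)     # deficit it fills
        assert abs(hill - valley) < 1e-12                  # they balance because t is the average
        return [t] * len(segment), 0.5 * (hill + valley)   # TV cost = hill mass

    # A small peak followed by a dip, inside an otherwise increasing pmf:
    flat, cost = bulldoze([0.08, 0.12, 0.06, 0.06])
    print(flat, cost)      # -> roughly [0.08, 0.08, 0.08, 0.08] and cost 0.04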

Page 37: Learning and testing  k-modal distributions

Why it works

[Figure: the peak before and after the correction.]

Lemma: If p is k-modal and no such violating pair of intervals exists, then p is O(ε)-close to monotone increasing.

Because the test found no violating pair of intervals, the mass of each hill (equivalently, of the valley it fills) is at most ≈ ε/k. So each correction moves at most ≈ ε/k of probability mass, and summing over the at most k peaks/valleys gives total variation cost O(ε).

Page 38: Learning and testing  k-modal distributions

Summary

Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N].

Upper bounds pretty close to lower bounds for these problems.

• Testing is easier than learning

• Learning algorithms have a testing component

Page 39: Learning and testing  k-modal distributions

Future work

More efficient algorithms for restricted classes of k-modal distributions?

• [DDS11]: any sum of n independent Bernoulli random variables (a special type of unimodal distribution, the “Poisson Binomial Distribution”) is learnable using poly(1/ε) samples, independent of n.

Page 40: Learning and testing  k-modal distributions

Thank you

Page 41: Learning and testing  k-modal distributions

Key ingredient: oblivious decomposition

Decompose [N] into consecutive intervals whose widths increase (roughly) as powers of (1 + ε); there are about log(N)/ε of them. Call these the oblivious buckets.
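A small Python sketch of one way to compute such a decomposition (the geometric growth rule and rounding below are illustrative; Birgé’s exact bucket boundaries may differ in the constants):

    def oblivious_buckets(N, eps):
        """Split {1,...,N} into consecutive buckets whose widths grow roughly like
        powers of (1 + eps); about log(N)/eps buckets in total, chosen without
        looking at the distribution (hence "oblivious")."""
        buckets, start, width = [], 1, 1.0
        while start <= N:
            end = min(N, start + int(width) - 1)
            buckets.append((start, end))
            start, width = end + 1, width * (1 + eps)
        return buckets

    print(oblivious_buckets(1000, eps=0.5))   # 16 buckets of widths 1, 1, 2, 3, 5, 7, ...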


Page 42: Learning and testing  k-modal distributions

Flattening a monotone distribution using the oblivious decomposition

Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p’s weight uniformly within each bucket of the oblivious decomposition.

Lemma [B87]: For any monotone decreasing distribution p, d_TV(p, p̄) ≤ O(ε).

[Figure: the true pdf and its flattened version.]

Page 43: Learning and testing  k-modal distributions

Learning monotone distributions using oblivious decomposition [B87]

Reduce: learning monotone distributions over [N] to accuracy ε reduces to learning an arbitrary distribution over an ≈ (log N)/ε-element set (one element per oblivious bucket) to accuracy O(ε).

• View p as an (essentially) arbitrary distribution over the ≈ (log N)/ε oblivious buckets.

Algorithm:
• Draw samples from p.
• Output as hypothesis the flattened empirical distribution (the empirical bucket weights, spread uniformly within each bucket).

Analysis: the flattened version of p is O(ε)-close to p (previous lemma), and learning a distribution over ≈ (log N)/ε buckets to accuracy ε takes O((log N)/ε³) samples. (A Python sketch follows below.)
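A compact Python sketch of this learner, reusing the bucket decomposition sketched earlier (sample size, constants, and the toy target distribution are illustrative; the actual guarantee is the O(log(N)/ε³) bound quoted above):

    import random
    from bisect import bisect_right

    def oblivious_buckets(N, eps):
        # Same geometric decomposition as in the earlier sketch.
        buckets, start, width = [], 1, 1.0
        while start <= N:
            end = min(N, start + int(width) - 1)
            buckets.append((start, end))
            start, width = end + 1, width * (1 + eps)
        return buckets

    def flattened_empirical(samples, N, eps):
        """The [B87]-style hypothesis: estimate each oblivious bucket's weight from the
        samples, then spread that weight uniformly over the bucket's points."""
        m = len(samples)
        hyp = {}
        for lo, hi in oblivious_buckets(N, eps):
            w_hat = sum(1 for x in samples if lo <= x <= hi) / m
            for x in range(lo, hi + 1):
                hyp[x] = w_hat / (hi - lo + 1)
        return hyp

    # Usage with a decreasing distribution over [200].
    N, eps = 200, 0.1
    pmf = [2 * (N - i) / (N * (N + 1)) for i in range(N)]      # pmf of point i+1, decreasing
    cum = [sum(pmf[:i + 1]) for i in range(N)]
    draw = lambda: min(N, 1 + bisect_right(cum, random.random()))
    h = flattened_empirical([draw() for _ in range(20000)], N, eps)
    print(0.5 * sum(abs(h[i + 1] - pmf[i]) for i in range(N))) # small total variation error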

Page 44: Learning and testing  k-modal distributions

Testing monotone distributions using oblivious decomposition

q: a known monotone distribution over [N].

p: an unknown monotone distribution over [N].

Can use the learning algorithm to get an O((log N)/ε³)-sample algorithm for this testing problem.

But we can do better by using the oblivious decomposition directly: testing equality of monotone distributions over [N] to accuracy ε reduces to testing equality of (arbitrary) distributions over the ≈ (log N)/ε oblivious buckets to accuracy O(ε). Under the reduction, q becomes a known distribution over the buckets and p an unknown one.

Using [BFFKRW02] on the reduced instance, we get a testing algorithm whose sample complexity is roughly the square root of (log N)/ε, times poly(1/ε) factors.

Can show a (nearly matching) lower bound for any tester. (A sketch of the reduction step appears below.)
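An illustrative Python sketch of the reduction step only (mapping the known q and the samples from p down to the oblivious buckets; the [BFFKRW02] identity tester that would then be run on the reduced instance is not reproduced here):

    from bisect import bisect_right

    def reduce_to_buckets(q, samples, buckets):
        """q: known distribution over [N] as a list (q[0] = q(1), ...).
        samples: draws from the unknown p over [N].
        buckets: the oblivious decomposition, e.g. from oblivious_buckets(N, eps).
        Returns q reduced to bucket weights, and each sample mapped to its bucket index;
        testing p = q over [N] then reduces to identity testing over the buckets."""
        right_ends = [hi for _, hi in buckets]

        def bucket_of(x):                                 # index of the bucket containing x
            return bisect_right(right_ends, x - 1)

        q_reduced = [0.0] * len(buckets)
        for x, qx in enumerate(q, start=1):
            q_reduced[bucket_of(x)] += qx
        return q_reduced, [bucket_of(x) for x in samples]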

Page 45: Learning and testing  k-modal distributions

[BKR04] implicitly gave a log²(N)·loglog(N)/ε⁵-sample algorithm for learning a monotone distribution.
