Learning and testing k-modal distributions
Rocco A. ServedioColumbia University
Joint work (in progress) with
Ilias DiakonikolasUC Berkeley
Costis DaskalakisMIT
What this talk is about
Probability distributions over [N] = {1,2,…,N}
Monotone increasing distribution: p(i) ≤ p(i+1) for all i < N
(Whole talk: “increasing” means “non-decreasing”)
k-modal distributions
k-modal: k peaks and valleys
A 3-modal distribution:
A unimodal distribution: Another one:
Monotone distribution: 0-modal
The learning problem
Target distribution p is an unknown k-modal distribution over [N]
Algorithm gets samples from p
Goal: output a hypothesis h that’s ε-close to p in total variation distance
Want algorithm that uses few samples &
is computationally efficient.
The testing problem
q is a known k-modal distribution over [N].
p is an unknown k-modal distribution over [N].
Algorithm gets samples from p.
Goal: output “yes” w.h.p. if p = q
“no” w.h.p. if d_TV(p, q) ≥ ε
Please note
Testing problem is not: given samples from an unknown distribution p, determine whether p is k-modal versus ε-far from every k-modal distribution.
This problem requires Ω(√N) samples, even for k = 0.
(Hard to distinguish: uniform over a random size-N/2 subset of [N] vs. uniform over [N].)
Why study these questions?
• k-modal distributions seem natural
• would be nice if k-modal structure were exploitable by efficient learning / testing algorithms
• post hoc justification: solutions exhibit interesting connections between testing and learning
The general case: learning
If we drop the k-modal assumption, the learning problem becomes:
Learn an arbitrary distribution over [N] to total variation distance ε.
Θ(N/ε²) samples are necessary and sufficient.
The general case: testing
If we drop the k-modal assumption, the testing problem becomes:
q is a known, arbitrary distribution over [N].
p is an unknown, arbitrary distribution over [N].
Algorithm gets samples from p.
Goal: output “yes” if p = q
“no” if d_TV(p, q) ≥ ε
Θ(√N) · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08]
This work: main learning result
We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. It uses
Õ(k log(N/k)/ε³) + poly(k/ε) samples
and runs in time polynomial in the number of samples drawn.
Close to optimal: there is an Ω(k log(N/k)/ε³)-sample lower bound for any algorithm.
Main testing result
We give an algorithm that solves the k-modal testing problem over [N] to accuracy ε. It uses
Õ(√(k log(N/k))) · poly(1/ε) samples
and runs in time polynomial in the number of samples drawn.
Any testing algorithm must use Ω(√(k log(N/k))) samples.
Testing is easier than learning!
Prior work
k = 0, 1: [BKR04] gave a poly(log N, 1/ε)-sample efficient algorithm for the testing problem (p, q both available via sample access)
k = 0, 1: [Birge87, Birge87a] gave a Θ(log N/ε³)-sample efficient algorithm for the learning problem, and a matching lower bound
We’ll use this learning algorithm as a black box in our results.
Outline of rest of talk
• Background: some tools
• Learning k-modal distributions
• Testing k-modal distributions
First tool: Learning monotone distributions
Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log N/ε³) samples and runs in time linear in its input size.
[B87b] also gave a matching Ω(log N/ε³) lower bound for learning a monotone distribution.
Second tool: Learning a CDF – the Dvoretzky–Kiefer–Wolfowitz inequality
Theorem [DKW56]: Let p be any distribution over [N] with CDF F.
Let F̂ be the empirical estimate of F obtained from m samples.
Then sup_x |F̂(x) − F(x)| ≤ ε with probability ≥ 1 − δ, once m = O(log(1/δ)/ε²).
Morally, this means you can partition [N] into O(1/ε) intervals, each of mass ≈ ε under p, using only O(1/ε²) samples.
Note: O(log N/ε²) samples suffice by an easy Chernoff bound + union bound argument; [DKW56] removes the log N.
(figure: true CDF vs. empirical CDF)
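The two claims above can be sketched in a few lines (illustrative Python, not from the talk; function names are mine): build the empirical CDF, check its maximum deviation from the true CDF, and greedily cut [N] into intervals of empirical mass ≈ ε.

```python
# Sketch: empirical CDF of samples from a distribution over [N], and the
# "morally" claim that it lets you partition [N] into intervals of mass ~eps.
import random

def empirical_cdf(samples, N):
    """Return F_hat as a list: F_hat[x-1] = fraction of samples <= x."""
    counts = [0] * N
    for s in samples:
        counts[s - 1] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(samples))
    return cdf

def mass_partition(cdf, eps):
    """Greedily cut [N] into intervals of empirical mass ~eps each."""
    cuts, last = [], 0.0
    for x, F in enumerate(cdf, start=1):
        if F - last >= eps:
            cuts.append(x)
            last = F
    return cuts

# Example: uniform distribution over [100]; the true CDF is x/N.
random.seed(0)
N = 100
samples = [random.randint(1, N) for _ in range(10000)]
cdf = empirical_cdf(samples, N)
max_dev = max(abs(cdf[x - 1] - x / N) for x in range(1, N + 1))
```

With 10,000 samples the empirical CDF tracks the true CDF to within a few percent everywhere, matching the DKW guarantee (here ε ≈ 0.02 already holds with high probability).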
Learning k-modal distributions
The problem
Learn an unknown k-modal distribution over [N].
What should we shoot for?
Easy lower bound: need Ω(k log(N/k)/ε³) samples.
(Any algorithm must essentially solve k separate monotone-distribution-learning problems, each over a domain of size N/k, to accuracy ε.)
Want an algorithm that uses roughly this many samples and takes time polynomial in the number of samples drawn.
The problem, again
Goal: learn an unknown k-modal distribution over [N].
We know how to efficiently learn an unknown monotone distribution…
Would be easy if we knew the k peaks/valleys…
Guessing them exactly: infeasible
Guessing them approximately: not too great either
A first approach
Break up [N] into many intervals:
p fails to be monotone on at most k of the intervals.
So running the monotone distribution learner on each interval will usually give a good answer.
First approach in more detail
1. Use [DKW] to divide [N] into t = O(k/ε) intervals I_1, …, I_t and obtain estimates p̂(I_1), …, p̂(I_t) such that each interval has mass ≈ ε/k and each estimate is accurate.
(Assumes each point has mass at most ≈ ε/k; heavier points are easy to detect and deal with separately.)
2. Run the monotone distribution learner on each I_j to get a hypothesis h_j for the conditional distribution of p on I_j.
(Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one as h_j.)
3. Combine the hypotheses in the obvious way: h = Σ_j p̂(I_j) · h_j.
Sketch of analysis
1. The [DKW] step of dividing [N] into t = O(k/ε) intervals and estimating their masses takes poly(k/ε) samples.
2. Running the monotone distribution learner on each of the t = O(k/ε) intervals takes O(k/ε) · O(log N/ε³) = O(k log N/ε⁴) samples.
3. Combining the hypotheses as h = Σ_j p̂(I_j) · h_j gives total error:
• O(ε), from the ≤ k non-monotone intervals (each has mass ≈ ε/k)
• O(ε), from the scaling factors p̂(I_j)
• O(ε), from estimating each conditional distribution with h_j
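The three-step first approach can be sketched as follows (illustrative Python; `learn_monotone` here is only a placeholder for the Birgé learner, returning the uniform distribution on its interval, so this is a skeleton of the combine step, not the real algorithm):

```python
# Sketch of the first approach: split [N] into intervals, learn each
# conditional distribution, and stitch the pieces back together as
# h = sum_j p_hat(I_j) * h_j.

def learn_monotone(samples_in_interval, interval):
    """Placeholder for the Birge monotone learner: just returns the
    uniform distribution on the interval (ignores its samples)."""
    a, b = interval
    width = b - a + 1
    return {x: 1.0 / width for x in range(a, b + 1)}

def first_approach(samples, intervals):
    """intervals: list of disjoint (a, b) pairs covering [N],
    e.g. produced by the DKW-based partition."""
    n = len(samples)
    h = {}
    for (a, b) in intervals:
        sub = [s for s in samples if a <= s <= b]
        weight = len(sub) / n            # estimate p_hat(I_j) from the samples
        h_j = learn_monotone(sub, (a, b))
        for x, q in h_j.items():         # scale the conditional hypothesis
            h[x] = weight * q
    return h

h = first_approach([1, 1, 2, 3, 3, 4, 5, 6, 6, 6], [(1, 3), (4, 6)])
```

The combined hypothesis is automatically a distribution: the interval weights sum to 1 and each h_j sums to 1 on its interval.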
Improving the approach
The O(k log N/ε⁴) came from running the monotone distribution learner O(k/ε) times rather than just ≈ k times.
If we could somehow check – more cheaply than learning – whether an interval is monotone before running the learner, we could run the learner fewer times and save…
…this is a property testing problem!
More sophisticated algorithm: two new ingredients.
First ingredient: testing k-modal distributions for monotonicity
Consider the following property testing problem:
Algorithm gets samples from an unknown k-modal distribution p over [N].
Goal: output “yes” w.h.p. if p is monotone increasing
“no” w.h.p. if p is ε-far from monotone increasing
Note: the k-modal promise on p might save us from the Ω(√N) lower bound that holds without it
(hard to distinguish: uniform over a random subset vs. uniform over [N])
Efficiently testing k-modal distributions for monotonicity
Goal: output “yes” w.h.p. if p is monotone increasing
“no” w.h.p. if p is ε-far from monotone increasing
Algorithm gets samples from unknown k-modal distribution p over [N].
Theorem: There is a poly(k/ε)-sample tester for this problem — note that the sample complexity is independent of N.
We’ll use this to identify sub-intervals of [N] where p is close to monotone…
…can we efficiently learn close-to-monotone distributions?
Second ingredient: agnostically learning monotone distributions
Consider the following “agnostic learning” problem:
Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.
Goal: output a hypothesis distribution h such that d_TV(h, p) ≤ O(opt) + ε
If opt=0, this is the original “learn a monotone distribution” problem
Want to handle general case as efficiently as opt=0 case
Semi-agnostically learning monotone distributions
Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.
Goal: output a hypothesis distribution h such that d_TV(h, p) ≤ O(opt) + ε
Theorem: There is a computationally efficient learning algorithm for this semi-agnostic problem that uses O(log N/ε³) samples.
(“Semi-” because the guarantee is O(opt) rather than opt.)
The [Birge87] monotone distribution learner does the job.
We will only invoke this with opt = O(ε), so O(opt) versus opt doesn’t matter.
The learning algorithm: first phase
1. Use [DKW] to divide [N] into t = O(k/ε) intervals I_1, …, I_t and obtain estimates p̂(I_1), …, p̂(I_t) as before.
2. Run the monotonicity testers (increasing and decreasing) on I_1, then on I_1 ∪ I_2, etc., until the first time both say “no”, at some I_1 ∪ … ∪ I_j. Mark I_j and continue from I_{j+1}.
This makes O(k/ε) invocations of the tester in total.
(Alternative: use binary search to find each marked interval: O(k log(k/ε)) invocations of the tester in total.)
The algorithm
Each time an interval is marked,
• the block of unmarked intervals right before it is close-to-monotone; call this block a superinterval
• (at least) one of the k peaks/valleys of p is “used up”, so at most k intervals are ever marked
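The grow-and-mark loop of the first phase can be sketched as follows (illustrative Python; `looks_monotone` is a hypothetical stub standing in for the pair of monotonicity testers):

```python
# Sketch of phase 1: grow a candidate superinterval one DKW-interval at a
# time; when both the "increasing" and "decreasing" testers reject, mark the
# last interval added and start a fresh superinterval after it.

def mark_intervals(intervals, looks_monotone):
    """looks_monotone(block, direction) stands in for the monotonicity
    tester; direction is 'inc' or 'dec'."""
    superintervals, marked, block = [], [], []
    for I in intervals:
        block.append(I)
        if not looks_monotone(block, 'inc') and not looks_monotone(block, 'dec'):
            marked.append(I)                       # uses up a peak/valley of p
            if block[:-1]:
                superintervals.append(block[:-1])  # close-to-monotone block
            block = []
    if block:
        superintervals.append(block)
    return superintervals, marked

def toy_tester(block, direction):
    """Toy stand-in: treat each 'interval' as a single mass value and just
    check whether the sequence is sorted in the given direction."""
    ordered = sorted(block, reverse=(direction == 'dec'))
    return block == ordered

superintervals, marked = mark_intervals([1, 2, 3, 1, 2], toy_tester)
```

On the toy input, the fourth interval breaks both orderings, so it is marked and [1, 2, 3] becomes the first superinterval.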
The learning algorithm: second phase
After the first phase, [N] is partitioned into
• ≤ k + 1 superintervals, each O(ε)-close to monotone
• ≤ k “marked” intervals, each of weight ≈ ε/k
Rest of algorithm:
3. Run the semi-agnostic monotone distribution learner on each superinterval S_i to get an O(ε)-accurate hypothesis h_i for the conditional distribution of p on S_i.
4. Output the final hypothesis h = Σ_i p̂(S_i) · h_i.
Analysis of the algorithm
Sample complexity:
• O(k log(k/ε)) runs of the tester: each uses poly(k/ε) samples
• O(k) runs of the semi-agnostic monotone learner: each uses O(log N/ε³) samples
Error rate:
• O(ε) error from the ≤ k marked intervals (each has weight ≈ ε/k)
• O(ε) total error from estimating the superinterval weights with the p̂’s
• O(ε) total error from the semi-agnostic learner’s hypotheses h_i
I owe you a tester
Theorem: There is a poly(k/ε)-sample tester for this problem.
Algorithm gets samples from unknown k-modal distribution p over [N].
Goal: output “yes” w.h.p. if p is monotone increasing
“no” w.h.p. if p is ε-far from monotone increasing
The testing algorithm
Algorithm:
• Run [DKW] with accuracy poly(ε/k). Let p̂ be the resulting empirical PDF.
• If there are intervals [a, b] and [c, d] with b < c such that
(average value of p̂ over [a, b]) > (average value of p̂ over [c, d]) + τ, for a threshold τ on the order of ε/k (adjusted for the [DKW] accuracy),
then output “no”; otherwise output “yes”.
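The interval-comparison check can be sketched as follows (illustrative Python: a simplified, zero-slack version applied to an exact pdf, with a brute-force scan over all interval pairs for clarity; function names are mine):

```python
# Sketch: find the worst "left interval's average height exceeds a later
# interval's average height" violation. For a monotone increasing pdf this
# is 0, since avg(a,b) <= p(b) <= p(c) <= avg(c,d) whenever b < c.

def worst_violation(p):
    n = len(p)
    pre = [0.0]                       # prefix sums for O(1) interval averages
    for x in p:
        pre.append(pre[-1] + x)

    def avg(a, b):
        return (pre[b + 1] - pre[a]) / (b - a + 1)

    best = 0.0
    for a in range(n):                # brute force over all pairs of
        for b in range(a, n):         # disjoint intervals [a,b], [c,d]
            for c in range(b + 1, n):
                for d in range(c, n):
                    best = max(best, avg(a, b) - avg(c, d))
    return best

def monotonicity_test(p_hat, tau):
    """'yes' iff no left interval's average exceeds a later one's by > tau."""
    return worst_violation(p_hat) <= tau
```

An increasing pdf like [0.1, 0.2, 0.3, 0.4] has violation 0, while a pdf with a peak-then-valley like [0.4, 0.1, 0.1, 0.4] has a large violation and is rejected.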
Completeness: if p is monotone increasing, then for any interval [a, b] lying entirely to the left of [c, d],
(average value of p over [a, b]) ≤ p(b) ≤ p(c) ≤ (average value of p over [c, d]),
so w.h.p. the empirical averages exhibit no large violation and the test passes.
Soundness
Soundness lemma: If p is k-modal and no pair of intervals [a, b], [c, d] with b < c has
(average value of p over [a, b]) > (average value of p over [c, d]) + τ,
then p is O(kτ)-close to monotone increasing.
To prove the soundness lemma: show that under the lemma’s hypothesis, we can “correct” each peak/valley of p by “spending” at most O(τ) in variation distance.
Correcting a peak of p
Lemma: If p is k-modal and no interval pair exhibits a violation larger than τ, then p is O(kτ)-close to monotone increasing.
Consider a peak of p.
Draw a line at height h such that
(mass of the “hill” above the line) = (missing mass of the “valley” below the line).
Correct the peak by bulldozing the hill into the valley: this removes one peak/valley pair.
Why it works
(figure: the correction)
Bulldozing moves exactly the mass of the hill above the line, so each correction costs that much in variation distance. The hill is an interval whose average height exceeds h, and the valley is a later interval whose average height is below h; by the lemma’s hypothesis this gap, and hence the mass moved, is O(τ). So the ≤ k corrections cost O(kτ) in total, and taking τ = Θ(ε/k) makes the corrected distribution monotone increasing and O(ε)-close to p.
Summary
Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N].
Upper bounds pretty close to lower bounds for these problems.
• Testing is easier than learning
• Learning algorithms have a testing component
Future work
More efficient algorithms for restricted classes of k-modal distributions?
• [DDS11]: any sum of n independent Bernoulli random variables is learnable using poly(1/ε) samples — independent of n
(a special type of unimodal distribution: the “Poisson Binomial Distribution”)
Thank you
Key ingredient: oblivious decomposition
Decompose [N] into ℓ = O(log N/ε) intervals whose widths increase as powers of (1 + ε). Call these the oblivious buckets.
Flattening a monotone distribution using the oblivious decomposition
Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p’s weight uniformly within each bucket of the oblivious decomposition.
Lemma [B87]: For any monotone decreasing distribution p, d_TV(p, p̄) ≤ ε.
(figure: true pdf vs. flattened version)
Learning monotone distributions using the oblivious decomposition [B87]
Reduce:
learning monotone distributions over [N] to accuracy ε
⟶ learning arbitrary distributions over an ℓ-element set, ℓ = O(log N/ε), to accuracy O(ε)
(view p̄ as an arbitrary distribution over the ℓ buckets)
Algorithm:
• Draw O(ℓ/ε²) = O(log N/ε³) samples from p.
• Output hypothesis: the flattened empirical distribution.
Analysis: d_TV(h, p) ≤ d_TV(h, p̄) + d_TV(p̄, p) ≤ O(ε).
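The whole reduction fits in a short sketch (illustrative Python; the power-of-(1+ε) bucketing follows the decomposition described above, and the 1/x target distribution is just a test case of mine):

```python
# Sketch: Birge-style learner -- bucket [N] obliviously, then output the
# flattened empirical distribution (uniform within each bucket, with bucket
# weights equal to the empirical bucket masses).
import random

def oblivious_buckets(N, eps):
    """Partition [1, N] into intervals with widths growing ~(1+eps)^j."""
    buckets, lo, j = [], 1, 0
    while lo <= N:
        width = max(1, int((1 + eps) ** j))
        hi = min(N, lo + width - 1)
        buckets.append((lo, hi))
        lo, j = hi + 1, j + 1
    return buckets

def flattened_empirical(samples, N, eps):
    buckets = oblivious_buckets(N, eps)
    m = len(samples)
    h = [0.0] * (N + 1)                     # h[x] for x in 1..N
    for (lo, hi) in buckets:
        mass = sum(1 for s in samples if lo <= s <= hi) / m
        for x in range(lo, hi + 1):
            h[x] = mass / (hi - lo + 1)     # spread bucket mass uniformly
    return h

random.seed(1)
# Monotone decreasing test target: p(x) proportional to 1/x over [1, 64].
N = 64
Z = sum(1.0 / x for x in range(1, N + 1))
p = [0.0] + [1.0 / (x * Z) for x in range(1, N + 1)]
cum = [sum(p[: x + 1]) for x in range(N + 1)]

def draw():
    u = random.random()
    return next((x for x in range(1, N + 1) if cum[x] >= u), N)

samples = [draw() for _ in range(5000)]
h = flattened_empirical(samples, N, 0.5)
tv = 0.5 * sum(abs(h[x] - p[x]) for x in range(1, N + 1))
```

Because 1/x varies by at most a (1 + ε) factor within each bucket, the flattening error is small, and a few thousand samples already give small total variation distance.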
Testing monotone distributions using the oblivious decomposition
Can use the learning algorithm to get an O(log N/ε³)-sample algorithm for the testing problem.
But can do better by using the oblivious decomposition directly. Reduce:
testing equality of monotone distributions over [N] to accuracy ε
⟶ testing equality of arbitrary distributions over [ℓ], ℓ = O(log N/ε), to accuracy O(ε)
(q: known monotone distribution over [N] ⟶ q̄: known distribution over [ℓ]; p: unknown monotone distribution over [N] ⟶ p̄: unknown distribution over [ℓ])
Using [BFFKRW02], get an Õ(√(log N)) · poly(1/ε)-sample testing algorithm.
Can show an Ω(√(log N)) lower bound for any tester.
Footnote: [BKR04] implicitly gave an O(log²(N) · log log(N)/ε⁵)-sample algorithm for learning a monotone distribution.