Mathematical Theories of Interaction with Oracles
Liu Yang, Carnegie Mellon University
© Liu Yang 2013


Thesis Committee

Avrim Blum (co-chair)
Jaime Carbonell (co-chair)
Manuel Blum
Sanjoy Dasgupta (UC San Diego)
Yishay Mansour (Tel Aviv University)
Joel Spencer (Courant Institute, NYU)

Outline

• Active Property Testing

- Do we need to imitate humans to advance AI?
- I see airplanes can fly without flapping their wings.

Property Testing

• Given access to a massive dataset: want to quickly determine whether a given fn f has some given property P or is far from having it

• Goal: test using a very small number of queries.

• One motivation: preprocessing step before learning

Property Testing

• Instance space X = R^n (distribution D over X)

• Tested function f : X -> {-1,1}

• A property P of Boolean fns is a subset of all Boolean fns h : X -> {-1,1} (e.g., LTFs)

• dist_D(f, P) := min_{g ∈ P} Pr_{x~D}[f(x) ≠ g(x)]

• Standard type of query: membership query (ask for f(x) at an arbitrary point x)
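The distance dist_D(f, P) defined above can be approximated from samples when P is small. A minimal sketch, assuming a finite list of candidate functions stands in for P and that D can be sampled; the names `empirical_distance`, `candidates`, and `sample_x` are illustrative and not from the slides.

```python
import random

def empirical_distance(f, candidates, sample_x, m=5000, seed=0):
    """Estimate dist_D(f, P) = min over g in P of Pr_{x~D}[f(x) != g(x)].

    `candidates` is a finite stand-in for the property P (an assumption),
    and `sample_x(rng)` draws one point from D.
    """
    rng = random.Random(seed)
    xs = [sample_x(rng) for _ in range(m)]
    fx = [f(x) for x in xs]
    # For each candidate g, count disagreements with f on the same sample.
    return min(
        sum(a != g(x) for a, x in zip(fx, xs)) / m
        for g in candidates
    )

# Example: D = uniform on [0, 1], P = thresholds at multiples of 0.1.
thresholds = [lambda x, t=t / 10: 1 if x >= t else -1 for t in range(11)]
f = lambda x: 1 if x >= 0.35 else -1
est = empirical_distance(f, thresholds, lambda rng: rng.random())
```

In this example the closest thresholds are 0.3 and 0.4, each disagreeing with f on an interval of mass 0.05, so the estimate should land near 0.05.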

Property Testing: An Example

• E.g. Union of d Intervals 0----++++----+++++++++-----++---+++--------1

- UINT4? Accept! UINT3? Depends on ε

- Model selection: testing can tell us how big d needs to be to be close to the target (double and guess, d = 2, 4, 8, 16, …)

If f ∈ P, should accept w/ prob ≥ 2/3

If dist(f,P) > ε, should reject w/ prob ≥ 2/3
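The "double and guess" idea can be sketched as a loop around any tester. This is an illustration only: `count_intervals` and `double_and_guess` are hypothetical helper names, and the stand-in tester below reads every label, which a real tester would not do.

```python
def count_intervals(labels):
    """Number of maximal runs of '+' in a string such as '--++--+'."""
    runs, prev = 0, '-'
    for c in labels:
        if c == '+' and prev != '+':
            runs += 1
        prev = c
    return runs

def double_and_guess(accepts, d_max=1 << 20):
    """Return the first d in 2, 4, 8, ... for which accepts(d) is True."""
    d = 2
    while d <= d_max:
        if accepts(d):
            return d
        d *= 2
    return None

labels = "----++++----+++++++++-----++---+++--------"
# Stand-in for a UINTd tester (assumption): exact check on the full labeling.
exact_tester = lambda d: count_intervals(labels) <= d
chosen_d = double_and_guess(exact_tester)
```

With an O(1)-query tester in place of `exact_tester`, the loop would make only logarithmically many tester calls in the final d.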

Property Testing and Learning: Motivation

• What is property testing for?
- Quickly tell if this is the right fn class to use
- Estimate the complexity of a fn without actually learning it

• Want to do it with fewer queries than learning

Standard Model Uses Membership Queries

• Results on testing basic Boolean fns using MQ:
• Constant QC for UINTd, dictator, LTF, …

However …

Membership Queries Are Unrealistic for ML Problems: An Object Recognition Example

Recognizing cat/dog? MQ gives …

Is this a dog or a cat?

An Example: Movie Reviews

Is this a positive or negative review?

Typical representation in ML (bag-of-words):

• {fell, holding, interest, movie, my, of, short, this}

The original review (what human labelers see):

• “This movie fell short of holding my interest.”

- The object a human expert labels has more structure than the internal representation used by the alg.
- MQs construct examples in the internal representation.
- It can be very difficult to order a constructed example's words so that a human can label it (esp. for long reviews).

Passive: Wastes Too Many Queries

• ML people move on

• Can we SAVE #queries?

• Passive model (sample from D): queried samples exist in NATURE, but quite wasteful (many examples are uninformative)

Active Testing

Pool of unlabeled data (poly-size)

Alg can ask for labels, but only of pts in the pool
Goal: small #queries
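The pool restriction can be made concrete with a small wrapper around the labeling function. A minimal sketch; the class `PoolOracle` and its `budget` parameter are assumptions introduced here for illustration, not part of the slides.

```python
import random

class PoolOracle:
    """Label oracle that only answers for points in a fixed unlabeled pool."""

    def __init__(self, f, pool, budget):
        self._f = f                # hidden labeling function
        self.pool = list(pool)     # unlabeled points drawn from D
        self.budget = budget       # maximum number of label requests
        self.queries = 0

    def label(self, index):
        # Unlike a membership query, the point must come from the pool.
        if not 0 <= index < len(self.pool):
            raise IndexError("can only query points in the pool")
        if self.queries >= self.budget:
            raise RuntimeError("label budget exhausted")
        self.queries += 1
        return self._f(self.pool[index])
```

An active tester would receive such an oracle and decide adaptively which pool indices to spend its budget on.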

Property Tester

• Definition. An s-sample, q-query ε-tester for P over the distribution D is a randomized algorithm A that draws s samples from D (cheap), sequentially queries for the value of f on q of those samples (expensive), and then

1. Accepts w.p. at least 2/3 when f ∈ P
2. Rejects w.p. at least 2/3 when dist_D(f,P) > ε

Active tester: s = poly(n)
Passive tester: s = q
MQ tester: s = ∞ (D = Unif)

Active Property Testing

• Testing as a preprocessing step of learning
• Need an example where active testing
- gets the same QC savings as MQ
- is better in QC than passive
- needs fewer queries than learning

• Union of d intervals: active testing helps!
0----++++----+++++++++-----++---+++--------1

- Testing tells how big d needs to be to be close to the target
- #Labels: active testing needs O(1), passive testing needs Θ(√d), active learning needs Θ(d)

Our Results

                        Active Testing   Passive Testing   Active Learning
Union of d Intervals    O(1)             Θ(d^{1/2})        Θ(d)
Dictator                Θ(log n)         Θ(log n)          Θ(log n)
Linear Threshold Fn     O(n^{1/2})       ~Θ(n^{1/2})       Θ(n)
Cluster Assumption      O(1)             Ω(N^{1/2})        Θ(N)

MQ-like on testing UINTd; passive-like on testing Dictator

NEW!!

Testing Unions of Intervals

0----++++----+++++++++-----++---+++--------1

• Theorem. Testing UINTd in the active testing model can be done using O(1) queries.

Recall: Learning requires Ω(d) examples.

Testing Unions of Intervals (cont.)

• Suppose uniform distribution
• Definition: Fix δ > 0. The local δ-noise sensitivity of fn f: [0, 1] -> {0, 1} at x ∈ [0, 1] is NS_δ(f, x) = Pr_y[f(x) ≠ f(y)], with y uniform on [x-δ, x+δ] ∩ [0, 1]. The noise sensitivity of f is NS_δ(f) = Pr_{x,y}[f(x) ≠ f(y)], with x uniform on [0, 1] and y drawn as above.

• Proposition (easy): Fix δ > 0. Let f: [0, 1] -> {0, 1} be a union of d intervals. Then NS_δ(f) ≤ dδ.

• Lemma (hard): Fix δ = ε²/(32d). Let f : [0, 1] -> {0, 1} be a fn with noise sensitivity bounded by NS_δ(f) ≤ dδ(1 + ε/4). Then f is ε-close to a union of d intervals.

Easy Lemma

• Lemma. If f is a union of ≤ d intervals, NS_δ(f) ≤ dδ.

Proof sketch:
- The probability that x lands within distance δ of any of the boundaries is at most 2d·2δ.
- The probability that y crosses a boundary, given that x is within distance δ of it, is 1/4.
- Pr(f(x) ≠ f(y) | |x-y| < δ) ≤ (2d·2δ)·(1/4) = dδ.
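The bound NS_δ(f) ≤ dδ can be checked numerically. A minimal Monte Carlo sketch, assuming x is uniform on [0, 1] and y is uniform on [x-δ, x+δ] clipped to [0, 1]; the function names and the particular intervals are illustrative choices.

```python
import random

def in_union(x, intervals):
    """1 if x lies in some interval [a, b] of the union, else 0."""
    return int(any(a <= x <= b for a, b in intervals))

def noise_sensitivity(f, delta, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr[f(x) != f(y)] for y within delta of x."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        x = rng.random()
        y = rng.uniform(max(0.0, x - delta), min(1.0, x + delta))
        flips += f(x) != f(y)
    return flips / trials

# d = 4 well-separated intervals, all boundaries far from 0 and 1.
intervals = [(0.1, 0.2), (0.3, 0.5), (0.6, 0.65), (0.7, 0.8)]
d, delta = len(intervals), 0.01
ns = noise_sensitivity(lambda x: in_union(x, intervals), delta)
```

For intervals separated by more than 2δ, each of the 2d boundaries contributes 2δ·(1/4), so the estimate should sit close to dδ = 0.04 here.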

Hard Lemma

• Lemma. Fix δ = ε²/(32d). If f is ε-far from a union of d intervals, then NS_δ(f) > (1+ε/4)dδ.

Proof strategy: If NS_δ(f) is small, do “self-correction”.

g(x) = E[f(y) | y ~ Unif[x-δ, x+δ]], f'(x) = round g(x) to 0 if ≤ τ or to 1 if ≥ 1-τ

Hard Lemma (cont.)

Proof strategy:
- Argue dist(f, f') ≤ ε/2.
- Show f' is a union of ≤ d(1 + ε/2) intervals.
- Implies dist(f', P) ≤ ε/2.

[Figure: the labeled line ----++++----+++++++++-----++---+++-------------++---+++-------- under the uniform distribution, with windows of width δ marked around a point z.]

Testing Unions of Intervals

• Theorem. Testing UINTd in the active testing model can be done using O(1) queries.

• If the distribution is non-uniform, use unlabeled data to stretch/squash the axis, making the distribution near-uniform

• Total number of unlabeled samples: O(d^{1/2}).

Testing Linear Threshold Fns

• Linear Threshold Functions (LTF):

f(x) = sign(⟨w,x⟩), for w, x ∈ R^n

Testing Linear Threshold Fns

• Theorem. We can efficiently test LTFs under the Gaussian distribution with Õ(n^{1/2}) labeled examples in both the active and passive testing models.

• We have lower bounds of ~Ω(n^{1/3}) for active testing and ~Ω(n^{1/2}) for passive testing.

• Learning LTFs needs Ω(n) labeled examples under the Gaussian distribution, so testing is better than learning in this case.

Testing Linear Threshold Fns

• [MORS'10] => suffices to estimate E[f(x) f(y) ⟨x,y⟩] up to ± poly(ε).

• Intuition: an LTF is characterized by a nice linear relation between the angle (⟨x,y⟩) and the probability of having the same label (f(x)f(y) = 1).

Testing Linear Threshold Fns

• [MORS'10] => suffices to estimate E[f(x) f(y) ⟨x,y⟩] up to ± poly(ε).
• Could take m random pairs and use the empirical average.
- But most pairs x,y would have |⟨x,y⟩| ≈ n^{1/2} (CLT), so would need m = Ω(n) to get within ± poly(ε).
• Solution: take O(n^{1/2}) random points and average f(x)f(y)⟨x,y⟩ over all O(n) pairs x,y.
- Concentration inequalities for U-statistics [Arcones, 95] imply this works.
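The all-pairs average can be sketched in a few lines. This is an illustration under stated assumptions, not the tester itself: the sample sizes, the helper names, and the two example functions are choices made here, and no acceptance threshold from [MORS'10] is implemented.

```python
import math
import random

def pairwise_statistic(f, points):
    """Average of f(x) f(y) dot(x, y) over all unordered pairs of distinct points.

    Uses sum_{i<j} dot(a_i, a_j) = (|sum_i a_i|^2 - sum_i |a_i|^2) / 2
    with a_i = f(x_i) x_i and f(x_i) in {-1, +1}, so the cost is O(m n).
    """
    m, n = len(points), len(points[0])
    total = [0.0] * n
    sq_norms = 0.0
    for x in points:
        s = f(x)
        for k in range(n):
            total[k] += s * x[k]
        sq_norms += sum(v * v for v in x)
    pair_sum = (sum(v * v for v in total) - sq_norms) / 2
    return pair_sum / (m * (m - 1) / 2)

def gaussian_points(m, n, seed=0):
    """m independent draws from the standard n-dimensional Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

pts = gaussian_points(400, 20)
ltf = lambda x: 1 if x[0] >= 0 else -1             # sign(dot(w, x)) with w = e_1
non_ltf = lambda x: 1 if x[0] * x[1] >= 0 else -1  # an even function, not an LTF
stat_ltf = pairwise_statistic(ltf, pts)
stat_non_ltf = pairwise_statistic(non_ltf, pts)
```

For a unit-norm w, E[f(x) x] = sqrt(2/π)·w, so the statistic for the LTF concentrates near 2/π, while the even function has E[f(x) x] = 0 and a statistic near 0.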

General Testing Dimension

• Testing dim characterizes (up to constant factors) the intrinsic number of label requests needed to test the given property w.r.t. the given distribution

• All our lower bounds are proved via testing dim

Minimax Argument

• min_Alg max_f P(Alg mistaken) = max_{π_0} min_Alg P(Alg mistaken)

• WLOG, π_0 = α π + (1-α) π', with π ∈ Π_0, π' ∈ Π_ε

• Let π_S, π'_S be the induced distributions on labels of S.

• For a given π_0, min_Alg P(Alg makes mistake | S) ≤ 1 - d_S(π, π')

Passive Testing Dim

• Define d_passive as the largest q ∈ N s.t. […]

• Theorem: Sample complexity of passive testing is Θ(d_passive).

Compare with VC-dimension: there we want a set S to exist s.t. all labelings occur at least once.

Active Testing Dim

• Fair(π, π', U): distribution of labeled (y; l): w.p. ½ choose y ~ π_U, l = 1; w.p. ½ choose y ~ π'_U, l = 0.

• err*(H; P): error of the optimal fn in H w.r.t. data drawn from distribution P over labeled examples.

• Given u = poly(n) unlabeled examples, d_active(u): largest q ∈ N s.t. […]

• Theorem: Active testing w/ failure prob 1/8 using u unlabeled examples needs Ω(d_active(u)) label queries; it can be done w/ O(u) unlabeled examples and O(d_active(u)) label queries.

Application: Dictator Fns

• Theorem: For dictator functions under the uniform distribution, d_active(u) = Θ(log n) (for any large-enough u = poly(n)).

• Corollary: Any class that contains dictator functions requires Ω(log n) queries to test in the active model, including poly-size decision trees, functions of low Fourier degree, juntas, DNFs, etc.

Application: Dictator Fns (cont.)

• π = unif over dictator fns
• π' = unif over all Boolean fns

Application: LTFs

• Theorem. For LTFs under the standard n-dim Gaussian distribution, d_passive = Ω((n/log n)^{1/2}) and d_active(u) = Ω((n/log n)^{1/3}) (for any u = poly(n)).

- π: distribution over LTFs obtained by choosing w ~ N(0, I_{n×n}) and outputting f(x) = sgn(w·x).
- π': uniform distribution over all functions.

- Obtain d_passive: bound tvd(distribution of Xw/√n, N(0, I_{q×q})).
- Obtain d_active: similar to the dictator LB, but relies on strong concentration bounds on the spectrum of random matrices

Open Problems

• Matching lb/ub for active testing of LTFs: √n?
• Tolerant testing: ε/2 vs. ε (UINTd, LTF)
• Testing LTFs under general distributions

Outline

• Learnability of DNF with Representation Specific Queries

- Liu: We do statistical learning for …
- Marvin: but we haven't done well at the fundamentals, e.g. knowledge representation.

Learning DNF formulas

• Poly-sized DNF: # terms = n^{O(1)}

e.g. f = (x1∧x2)∨(x1∧x4)

- Natural form of knowledge representation
- PAC-learning DNF appears to be very hard.

Your ticket:
n: number of variables
Concept space C: collection of fns h: {0,1}^n -> {0,1}
Unknown target fn f*: the true labeling fn
Err(h) = Pr_{x~D}[h(x) ≠ f*(x)] (distribution D over X)

Best known alg in the standard model is exponential over arbitrary distributions; over Unif, no known poly-time alg

New Models: Interaction with Oracles

Imagine …

Hi, Tim, do x and y have some term in common?

Yes!

- Boolean queries: K(x, y) = 1 if x and y share some term
- Numerical queries: K(x, y) = # terms x and y share

Query: Similarity about TYPE

What if we have similarity info about TYPE?
Fraud detection: are x and y fraudulent of the same type? YES! x and y share a term.

[Figure: fraud detection example. Examples x and y, with terms (Term 1 of x, Term 2 of x, Term 3 of x) corresponding to fraud types such as identity theft, stolen cards, BIN attack, and skimming.]

Type of query: given a pair of POSITIVE examples from a random dataset, the teacher says YES if they share some term, or reports how many terms they share.
Question: can we efficiently learn DNF with this type of query?

Warm Up: Disjoint DNF w/ Boolean Queries

• Use similarity queries to partition positive examples into t buckets, one per term.
• Separately learn a conjunction for each bucket (intersect the pos examples in it)
• OR the results
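The three warm-up steps can be sketched directly. A minimal illustration, assuming terms are represented as dicts mapping a variable index to its required bit and that the oracle is simulated from a known target; all function names here are hypothetical.

```python
from itertools import product

def satisfies(x, term):
    """True if example x agrees with every literal of the term."""
    return all(x[i] == v for i, v in term.items())

def evaluate(dnf, x):
    return any(satisfies(x, t) for t in dnf)

def make_oracle(dnf):
    """Boolean query K(x, y) = 1 iff x and y satisfy some term in common."""
    return lambda x, y: any(satisfies(x, t) and satisfies(y, t) for t in dnf)

def learn_disjoint_dnf(positives, K):
    # Step 1: bucket positive examples, one bucket per term.
    buckets = []
    for x in positives:
        for bucket in buckets:
            if K(bucket[0], x):
                bucket.append(x)
                break
        else:
            buckets.append([x])
    # Step 2: per bucket, keep the literals on which all examples agree.
    hypothesis = []
    for bucket in buckets:
        first = bucket[0]
        term = {i: first[i] for i in range(len(first))
                if all(x[i] == first[i] for x in bucket)}
        hypothesis.append(term)
    # Step 3: the hypothesis is the OR of the learned conjunctions.
    return hypothesis

n = 5
target = [{0: 1, 1: 1}, {0: 0, 2: 1, 3: 0}]   # disjoint: the terms disagree on x0
positives = [x for x in product((0, 1), repeat=n) if evaluate(target, x)]
hypothesis = learn_disjoint_dnf(positives, make_oracle(target))
```

Comparing each new example against one representative per bucket suffices here only because the terms are disjoint, so sharing a term is an equivalence relation on positive examples.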

Pos Result 1: Weakly Disjoint DNF w/ Boolean Queries

- Distinguishing example for T1: an example satisfying T1 & no other term
- Weakly disjoint: for each term, at least a 1/poly(n, 1/ε) fraction of random examples satisfy it & no other term.
- Neighbor-method: for a positive example, get all its neighbors in the graph and learn a conjunction.
- W.p. 1-δ, the neighbor-method produces an ε-accurate DNF if the target is weakly disjoint.

Graph:
- Nodes: pos examples
- Edge exists if K(.) = 1

Hardness Results: Boolean Queries

Thm. Learning DNF from random data under arbitrary distributions w/ Boolean queries is as hard as learning DNF from random data under arbitrary distributions w/ only labels (no queries).

- Group-learn: tell whether data comes from D+ or D-
- Reduction from group-learning DNF in the std. model to our model
- How to use our alg A to group-learn?
- Simulate the oracle by always saying yes whenever a query is made on two pos examples; given the output of A, we give a group-learn alg for the original problem

[Figure: giant examples built from m blocks of n variables each; K(giant 1, giant 2) = 1.]

Hardness Results: Approx Numerical Queries

Thm. Learning DNF from random data under arbitrary distributions w/ approx-numerical queries is as hard as learning DNF from random data under arbitrary distributions w/ only labels. Approx-numerical query: if C is the number of terms xi and xj satisfy in common, the oracle returns a value in [(1 – τ)C, (1 + τ)C].

Pos Result 3: Learn DNF w/ Numerical Queries

Thm. Under the uniform distribution, w/ numerical queries, can learn any poly(n)-term DNF.

- Sample m = O((t/ε) log(t/(εδ))) landmark points
- Landmark fn F_i(x) is a sum-of-monotone-terms fn (remove terms not satisfied by pos x_i): F_i(·) = K(x_i, ·), where K is the numerical query
- Use a subroutine to learn a hypothesis h_i(x) that is ε/(2m)-accurate w.r.t. F_i.
• Subroutine: learn a sum of t monotone terms over unif., f(x) = T1(x) + T2(x) + … + Tt(x), using time & samples poly(t, n, 1/ε).
- Combine all hypotheses h_i into h: h(x) = 0 if h_i(x) = 0 for all i, else h(x) = 1.

Learn Sum of Monotone Terms

[Figure: greedy search over variables x1 … x9. Starting from S = {x1}, estimate the Fourier coeff. of each candidate S and run the inclusion check (magnitude ≥ ε/(16t)?): on YES keep the candidate, otherwise discard it. Candidates shown: S = {x1, x2}, S = {x1, x3}, S = {x1, x3, x4}, S = {x1, x3, x4, x5}, …, S = {x1, x3, x4, x9}. Output: x1∧x3∧x4∧x9.]

Learn Sum of Monotone Terms: Greedy Alg

• Set θ = ε/(8t). Examine each parity fn of size 1 & estimate its Fourier coeff. (up to θ/4 accuracy).
• Place all coeffs of magnitude ≥ θ/2 into a list L_1.
• For j = 2, 3, ... repeat:
- For each parity fn Φ_S in list L_{j-1} and each x_i not in S, estimate the Fourier coeff. of Φ_{S ∪ {x_i}}
- Inclusion check: if the estimate has magnitude ≥ θ/2, add it to list L_j (if not already in)
- Maintain list L_j: size-j parity fns w/ coeff. magnitude ≥ θ.
• Construct fn g: weighted sum of parities for the identified coeffs.
• Output fn h(x) = [g(x)]
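The level-by-level search can be sketched on a small example. This is an illustration under simplifying assumptions: coefficients are computed exactly by enumerating {0,1}^n (feasible only for small n) instead of being estimated from samples, so a single threshold `theta` replaces the θ/2 check on estimates, and [g(x)] is read as rounding to the nearest integer. All function names are hypothetical.

```python
from itertools import product

def chi(S, x):
    """Parity character: +1 if x has an even number of ones on S, else -1."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coeff(f, S, n):
    """Exact coefficient E_x[f(x) chi_S(x)] over uniform {0,1}^n (small n only)."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) * chi(S, x) for x in points) / len(points)

def greedy_heavy_parities(f, n, theta):
    """Grow parity sets one variable at a time, keeping those with |coeff| >= theta."""
    found = {frozenset(): fourier_coeff(f, frozenset(), n)}
    level = [frozenset()]
    while level:
        next_level = []
        for S in level:
            for i in range(n):
                T = S | {i}
                if i in S or T in found:
                    continue
                c = fourier_coeff(f, T, n)
                if abs(c) >= theta:          # inclusion check
                    found[T] = c
                    next_level.append(T)
        level = next_level
    return found

def build_hypothesis(found):
    """h(x) = round(g(x)), where g is the weighted sum of the identified parities."""
    return lambda x: round(sum(c * chi(S, x) for S, c in found.items()))

terms = [(0, 2, 3), (1, 4)]                  # f = x0∧x2∧x3 + x1∧x4 over n = 6
f = lambda x: sum(all(x[i] for i in t) for t in terms)
found = greedy_heavy_parities(f, 6, theta=0.05)
h = build_hypothesis(found)
```

The search works for sums of monotone terms because every subset of a term has a coefficient of magnitude at least 2^{-|term|} and contributions from different terms share a sign, so a heavy set is always reachable through heavy subsets.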

Other Positive Results (columns: Binary, Numeric)

- O(log(n))-term DNF (any distrib.)
- 2-term DNF (any distrib.): ✔ ✔
- DNF with each var in at most O(log(n)) terms (Unif): ✔ ✔
- log(n)-Junta (Unif): ✔ ✔
- log(n)-Junta (any distrib.): ✔
- DNF having ≤ 2^{O(√log(n))} terms (Unif): ✔ ✔

Open problems:
- learn arbitrary DNF (unif, Boolean queries)?
- learn arbitrary DNF (any distrib., numerical queries)?

Outline

• Active Learning with a Drifting Distribution

- If not every poem has a proof, can we at least try to make every theorem proved beautiful like a poem?

Active Learning with a Drifting Distribution: Model

• Scenario:
- Unobservable sequence of distributions D_1, D_2, …, with each D_t in the distribution space
- Unobservable time-independent regular conditional distribution, represented by a fn η
- (X_1, Y_1), (X_2, Y_2), …: an infinite sequence of independent r.v.s s.t. X_t ~ D_t, and the conditional distribution of Y_t given X_t satisfies […]

• Active learning protocol: at each time t, the alg is presented with X_t and is required to predict a label Ŷ_t; then it may optionally request to see the true label value Y_t

• Interested in the cumulative #mistakes up to time T and the total #labels requested up to time T

[Figure: points x1, x2, x3, x4, …, xt in the data space, each drawn from the corresponding distribution D1, D2, D3, D4, …, Dt in the distribution space.]

Definition and Notations

• Instance space X = R^n
• Distribution space of distributions on X
• Concept space C of classifiers h: X -> {-1,1}
- Assume C has VC dimension vc < ∞
• D_t: data distribution on X at time t
• Unknown target fn h*: true labeling fn
• err_t(h) = Pr_{x~D_t}[h(x) ≠ h*(x)]
• In the realizable case, h* ∈ C and err_t(h*) = 0.
• For […]

Def: disagreement coefficient, tvd

• The disagreement coefficient of h* under a distribution P on X is defined as […] (r > 0)

• Total variation distance of probability measures P and Q on a sigma-algebra of subsets of the sample space is defined via ||P - Q|| = sup_A |P(A) - Q(A)|, with the sup over sets A in the sigma-algebra
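For distributions on a finite set, total variation distance sup_A |P(A) - Q(A)| equals half the L1 distance between the probability vectors. A small sketch checking the two forms against each other; the dict representation and function names are assumptions made for illustration.

```python
from itertools import combinations

def total_variation(p, q):
    """TV distance between two distributions given as dicts outcome -> probability."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)

def total_variation_sup(p, q):
    """Same quantity via sup over events A of |P(A) - Q(A)| (brute force)."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for r in range(len(support) + 1):
        for event in combinations(support, r):
            gap = abs(sum(p.get(w, 0.0) - q.get(w, 0.0) for w in event))
            best = max(best, gap)
    return best

p = {'a': 0.5, 'b': 0.5}
q = {'a': 0.25, 'b': 0.25, 'c': 0.5}
tv = total_variation(p, q)
```

The brute-force version enumerates all 2^k events, so it is only a sanity check for tiny supports.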

Assumptions

• Independence of the X_t variables
• VC-dim < ∞
• Assumption 1 (totally bounded): the distribution space is totally bounded (i.e., it has a finite ε-cover for every ε > 0)
- For each ε > 0, denote by […] a minimal subset of the distribution space s.t. […] (i.e., a minimal ε-cover of the distribution space)
• Assumption 2 (poly-covers): […], where c, m ≥ 0 are constants.

Realizable-case Active Learning: CAL
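CAL requests a label only when the current version space disagrees on the new point. A minimal sketch for one concrete class, thresholds on the real line in the realizable case; the choice of class, the drifting input sequence, and the function name are assumptions for illustration, not the general algorithm from the slide.

```python
import math
import random

def run_cal_thresholds(xs, target):
    """CAL for thresholds h_t(x) = +1 iff x >= t, realizable case.

    The version space is all thresholds in (lo, hi]; the region of
    disagreement is the open interval (lo, hi).
    """
    label = lambda x: 1 if x >= target else -1
    lo, hi = -math.inf, math.inf
    mistakes = queries = 0
    for x in xs:
        if x <= lo:
            prediction = -1      # every consistent threshold says -1
        elif x >= hi:
            prediction = 1       # every consistent threshold says +1
        else:
            prediction = 1       # arbitrary guess inside the disagreement region
            queries += 1         # request the true label Y_t
            if label(x) == 1:
                hi = x
            else:
                lo = x
        mistakes += prediction != label(x)
    return mistakes, queries, (lo, hi)

# Drifting distribution: uniform window whose center oscillates over time.
rng = random.Random(0)
xs = [0.5 + 0.2 * math.sin(t / 50) + rng.uniform(-0.3, 0.3) for t in range(2000)]
mistakes, queries, (lo, hi) = run_cal_thresholds(xs, target=0.5)
```

Outside the disagreement region every consistent hypothesis agrees with the target, so mistakes can only occur on rounds where a label was requested.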

Sublinear Result: Realizable Case

Theorem. If the distribution space is totally bounded, then CAL achieves an expected mistake bound of o(T). And if […], then CAL makes an expected number of queries of o(T).

[Proof Sketch]: Partition the distribution space into buckets of diameter < ε. Pick a time T_ε past all indices from the finite buckets, such that every infinite bucket has at least …

Number of Mistakes

• Alternative scenario:
- Let P_i be in bucket i
- Swap the L(ε) samples for bucket i with L(ε) samples from P_i
- L(ε) large enough so that E[diam(V)]_alternative < √ε.
- Note: E[diam(V)] ≤ E[diam(V)]_alternative + Σ_{L(ε) t values} ||P_i – D_t|| < √ε + L(ε)·ε.

So E[diam] -> 0 as T -> ∞
- E[#mistakes] […]
- Since […]

Number of Queries

• E[#queries] […]
• P(make query) = E[P(DIS(V_{t-1}))]
• Let […]; then […], and E[#queries] […]

Explicit Bound: Realizable Case

Theorem. If the poly-covers assumption is satisfied ([…]), then CAL achieves an expected mistake bound and E[#queries] such that […], where […].

[Proof Sketch] Fix any ε > 0, and enumerate […]. For t ∈ N, let K(t) be the index k of the cover element closest to D_t. Alternative data sequence: let […] be indep., with […]. This way, all samples corresponding to distributions in a given bucket come from the same distribution. Let V'_t be the corresponding version spaces.

E[#mistakes]

Classic PAC bound => […]

(#previous distributions in D_t's bucket)
So […]

(each bucket has at most T samples)
So E[#mistakes] […]. Take […] to get the stated theorem.

To bound E[#queries], again it is […]; just showed this is […]. So […]

Again, taking […] gives the stated result.

Learning with Noise: Noise Conditions

• Strictly benign noise condition: […]
• Special case: Tsybakov's noise conditions
• η satisfies the strictly benign noise condition and, for some c > 0 and α ≥ 0, […]
• Unif Tsybakov assumption: the Tsybakov assumption is satisfied for all […] with the same c and α values.

Agnostic CAL [DHM]

Based on subroutine: […]

Tsybakov Noise: Sublinear Results & Explicit Bound

Theorem. If the distribution space is totally bounded and η satisfies the strictly benign noise condition, then ACAL achieves an excess expected mistake bound […], and if additionally […], then ACAL makes an expected number of queries […].

Theorem. If the poly-covers assumption and the Unif Tsybakov assumption are satisfied, then ACAL achieves an expected excess number of mistakes and an expected #queries such that, for […]

Outline

• Transfer Learning

- Do not ask what Bayesians can do for Machine Learning, ask what Machine Learning can do for Bayesians

Transfer Learning

• Principle: solving a new learning problem is easier given that we've solved several already!
• How does it help?
- New task directly "related" to previous task [e.g., Ben-David & Schuller 03; Evgeniou, Micchelli, & Pontil 2005]
- Previous tasks give us useful sub-concepts [e.g., Thrun 96]
- Can gather statistical info on the variety of concepts [e.g., Baxter 97; Ando & Zhang 04]
• Example: Speech Recognition
- After training a few times, figured out the dialects.
- Next time, just identify the dialect.
- Much easier than training a recognizer from scratch

Model of Transfer Learning

Motivation: Learners often Not Too Altruistic

Layer 1: draw task i.i.d. from unknown prior

Layer 2: per task, draw data i.i.d. from target

[Figure: a prior generates targets h1*, h2*, …, hT*; Task t comes with labeled data (x^t_1, y^t_1), …, (x^t_k, y^t_k). Goal: better estimate of the prior.]

- Marvin: so you assume learning French is similar to learning English?

- Liu: It indeed seems many English words have a French counterpart …

Identifiability of priors from joint distribs

• Let prior π be any distribution on C
- example: (w, b) ~ multivariate normal
• Target h*_π ~ π
• Data X = (X_1, X_2, …) i.i.d. D, indep. of h*_π
• Z(π) = ((X_1, h*_π(X_1)), (X_2, h*_π(X_2)), …).
• Let [m] = {1, …, m}.
• Denote X_I = {X_i}_{i ∈ I} (I: subset of natural numbers)
• Z_I(π) = {(X_i, h*_π(X_i))}_{i ∈ I}

Theorem: Z_[VC](π1) =_d Z_[VC](π2) iff π1 = π2.

Identifiability of priors by VC-dim joint distri.

• Threshold:
- for two points x1, x2, if x1 < x2, then Pr(+,+) = Pr(+,.), Pr(-,-) = Pr(.,-), Pr(+,-) = 0. So Pr(-,+) = Pr(.,+) - Pr(+,+) = Pr(.,+) - Pr(+,.)
- for any k > 1 points, can directly reduce the number of labels in the joint prob from k to 1:

P(-----------(-+)+++++++++++++++++)
= P( (-+) )
= P( (.+) ) - P( (++) )
= P( (.+) ) - P( (+.) ) + P( (+-) ) (unrealized labeling !!)
= P( (.+) ) - P( (+.) )

[Figure: the interval from 0 to 1, labeled - to the left of the threshold and + to the right.]
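The two-point identity for thresholds can be verified exactly on a small discrete prior. A minimal sketch, assuming h(x) = + iff x ≥ t with t drawn from a finite prior; the prior values and the function name are illustrative.

```python
from fractions import Fraction

def joint_label_probs(prior, x1, x2):
    """Joint law of (h(x1), h(x2)) when h(x) = '+' iff x >= t and t ~ prior."""
    probs = {}
    for t, p in prior.items():
        key = ('+' if x1 >= t else '-', '+' if x2 >= t else '-')
        probs[key] = probs.get(key, Fraction(0)) + p
    return probs

prior = {
    Fraction(1, 10): Fraction(1, 5),
    Fraction(1, 2): Fraction(1, 2),
    Fraction(9, 10): Fraction(3, 10),
}
x1, x2 = Fraction(3, 10), Fraction(7, 10)      # x1 < x2
joint = joint_label_probs(prior, x1, x2)

# One-point marginals: Pr(+,.) and Pr(.,+).
pr_plus_first = sum(p for (a, _), p in joint.items() if a == '+')
pr_plus_second = sum(p for (_, b), p in joint.items() if b == '+')
```

Here Pr(-,+) = 1/2, which equals Pr(.,+) - Pr(+,.) = 7/10 - 1/5, and the labeling (+,-) never occurs.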

• Theorem: Z_[VC](π1) =_d Z_[VC](π2) iff π1 = π2.

Proof Sketch
• Let ρ_m(h,g) = (1/m) Σ_{i=1}^m 𝟙(h(X_i) ≠ g(X_i)). Then vc < ∞ implies that w.p. 1, for all h, g ∈ C with h ≠ g, lim_{m -> ∞} ρ_m(h,g) = ρ(h,g) > 0
• ρ is a metric on C by assumption, so w.p. 1 each h in C labels the ∞-sequence (X_1, X_2, …) distinctly: (h(X_1), h(X_2), …)
• => w.p. 1 the conditional distribution of the label sequence Z(π)|X identifies π
=> the distribution of Z(π) identifies π, i.e. Z_∞(π1) =_d Z_∞(π2) implies π1 = π2

Identifiability of Priors from Joint Distributions

[Figure: reduction to lower-dim conditional distributions; y' closer to ỹ.]

Transfer Learning Setting

• Collection Π of distributions on C. (known)
• Target distribution π* ∈ Π. (unknown)
• Indep. target fns h_1*, …, h_T* ~ π* (unknown)
• Indep. i.i.d. D data sets X^(t) = (X_1^(t), X_2^(t), …), t ∈ [T].
• Define Z^(t) = ((X_1^(t), h_t*(X_1^(t))), (X_2^(t), h_t*(X_2^(t))), …).
• Learning alg. “gets” Z^(1), then produces ĥ_1, then “gets” Z^(2), then produces ĥ_2, etc., in sequence.
• Interested in: values of ρ(ĥ_t, h_t*), and the number of h_t*(X_j^(t)) values the alg. needs to access.

Page 74: Mathematical Theories of Interaction with Oracles Liu Yang Carnegie Mellon University 1© Liu Yang 2013

Estimating the prior
• Principle: learning would be easier if we knew π*
• Fact: π* is identifiable by distrib of Z[VC](t)
• Strategy: Take samples Z[VC](i) from past tasks 1, …, t-1, use them to estimate distrib of Z[VC](i), convert that into an estimate π’t-1 of π*
• Use π’t-1 in a prior-dependent learning alg for new task ht*
• Assume Π is totally bounded in total variation
• Can estimate π* at a bounded rate: || π* - π’t || < δt, where δt converges to 0 (holds whp)


Transfer Learning
• Given a prior-dependent learning alg A(ε, π), with E[# labels accessed] = Λ(ε, π) and producing ĥ with E[ρ(ĥ, h*)] ≤ ε

For t = 1, …, T
  If δt-1 > ε/4, run prior-indep learning on Z[VC/ε](t) to get ĥt
  Else let π’’t = argmin π ∈ B(π’t-1, δt-1) Λ(ε/2, π) and run A(ε/2, π’’t) on Z(t) to get ĥt

Theorem: For all t, E[ρ(ĥt, ht*)] ≤ ε, and limsup T→∞ E[# labels accessed]/T ≤ Λ(ε/2, π*) + vc.
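The control flow of the loop above can be sketched as follows. This shows only the branching on δt-1 versus ε/4. Every callable argument (`delta`, `estimate_prior`, `pick_prior`, `learn_with_prior`, `learn_without_prior`) is a hypothetical placeholder supplied by the caller, and the value of `delta(0)` before any task has been seen is also the caller's assumption.

```python
def transfer_loop(tasks, eps, delta, estimate_prior, pick_prior,
                  learn_with_prior, learn_without_prior):
    """Run the per-task branching; all callables are hypothetical stand-ins.

    delta(t)                 -> bound on the prior-estimation error after t tasks
    estimate_prior(seen)     -> estimate of the prior from past task data
    pick_prior(est, d, e)    -> prior in the ball B(est, d) minimizing label cost at accuracy e
    learn_with_prior(z,e,p)  -> hypothesis from the prior-dependent learner A(e, p)
    learn_without_prior(z,e) -> hypothesis from a prior-independent learner
    """
    outputs = []
    seen = []          # data from past tasks, used to estimate the prior
    prior_hat = None   # no estimate exists before the first task
    for t, z in enumerate(tasks, start=1):
        if delta(t - 1) > eps / 4:
            # Prior estimate still too coarse: ignore it.
            h = learn_without_prior(z, eps)
        else:
            # Prior estimate is accurate enough: choose a nearby prior and use it.
            prior = pick_prior(prior_hat, delta(t - 1), eps / 2)
            h = learn_with_prior(z, eps / 2, prior)
        outputs.append(h)
        seen.append(z)
        prior_hat = estimate_prior(seen)
    return outputs
```

Passing the learners in as arguments keeps the sketch independent of any particular hypothesis class or estimator.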


- Yonatan: I’ll send you an email to summarize what we just discussed.

- Liu: Thank you, but I have now invented a model to transfer knowledge with provable guarantees, so I use that all the time.

- Yonatan: But that’s an asymptotic guarantee. My life span is finite. So I’m still gonna send you an email.


Outline

• Online Allocation and Pricing with Economies of Scale

- Jamie Dimon: Economies of scale are a good thing. If we didn't have them, we'd still be living in tents and eating buffalo.

Setting

• Christmas season

- Nov: customer survey

- Dec: purchasing and selling

• Buyers arrive online one at a time w/ val.s on items sampled iid from some unknown distri.

Thrifty Santa Claus

• Each shopper wants only one item, though it might prefer some items to others

• Minimize total cost to seller

• Buyers: binary valuation
• Goal of seller: satisfy everyone

Hardness: Set-Cover

• If costs decrease much more rapidly, then even if all customers' val.s were known up front, this would be (roughly) a set-cover problem, and one could not hope to achieve cost o(log n) times optimal.

• Natural case: for each good, cost (to the seller) for ordering T copies is sublinear in T.

[Figure: production cost and marginal cost plotted against #copies, for α = 0, α in (0, 1), and α = 1]

Thrifty Santa Claus: Results

• Mar-cost non-increasing: does an optimal strategy exist? - order items by some perm.; give each new buyer the earliest item it desires in the perm.

• What if n (#buyers) >> k (#items) AND mar-cost decreases not too rapidly? (rate 1/T^α for 0 ≤ α < 1)

- can efficiently perform allocation w/ cost at most a const. factor greater than OPT
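The permutation rule can be sketched as below. This assumes binary valuations represented as sets of desired items, and marginal cost 1/i^α for the i-th copy of an item. The names `production_cost`, `allocate_by_permutation`, and `total_cost`, and the sample buyers, are illustrative only.

```python
def production_cost(copies, alpha):
    """Cost of ordering `copies` units when the i-th copy costs 1 / i**alpha."""
    return sum(1.0 / i ** alpha for i in range(1, copies + 1))

def allocate_by_permutation(perm, buyers):
    """Give each arriving buyer the earliest item in `perm` that it desires.

    Assumes every buyer desires at least one item in `perm`;
    otherwise `next` raises StopIteration.
    """
    counts = {item: 0 for item in perm}
    for wanted in buyers:
        item = next(i for i in perm if i in wanted)
        counts[item] += 1
    return counts

def total_cost(counts, alpha):
    """Total production cost over all items, given how many copies of each are used."""
    return sum(production_cost(c, alpha) for c in counts.values())

# Made-up arrival sequence: each buyer is the set of items it would accept.
buyers = [{'a', 'b'}, {'b'}, {'b', 'c'}, {'a'}]
counts_b_first = allocate_by_permutation(['b', 'a', 'c'], buyers)
counts_a_first = allocate_by_permutation(['a', 'b', 'c'], buyers)
```

With α = 0 every copy costs 1, so any permutation gives the same total. With α in (0, 1), permutations that concentrate buyers on fewer items are cheaper.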


Algorithm

• Alg: use initial buyers to learn about the distri., then determine how best to allocate to new buyers.

• If cost fn c(x) = Σi=1..x 1/i^α, for α in [0,1)

- run greedy weighted set cover => total cost ≤ 1/(1-α) · OPT.

• Essentially a smooth variant of set-cover
• If ave-cost is within some factor of mar-cost, have a greedy alg w/ const. approx ratio
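One natural reading of the greedy step, as a sketch only: repeatedly pick the item with the lowest average cost per newly served buyer, and give it to every still-unserved buyer who desires it. This is an assumption about the rule, not necessarily the exact algorithm from the thesis, and the code makes no claim about the approximation ratio. `cost` and `greedy_allocation` are illustrative names.

```python
def cost(x, alpha):
    """c(x) = sum_{i=1}^{x} 1 / i**alpha."""
    return sum(1.0 / i ** alpha for i in range(1, x + 1))

def greedy_allocation(buyers, items, alpha):
    """Greedy set-cover style allocation for buyers with binary valuations.

    buyers: list of sets, buyers[b] = items buyer b would accept.
    Returns a dict mapping buyer index -> assigned item.
    """
    uncovered = set(range(len(buyers)))
    assignment = {}
    while uncovered:
        def avg_cost(item):
            # Average cost per buyer if `item` serves all uncovered buyers who want it.
            m = sum(1 for b in uncovered if item in buyers[b])
            return cost(m, alpha) / m if m else float('inf')

        best = min(items, key=avg_cost)
        chosen = {b for b in uncovered if best in buyers[b]}
        if not chosen:
            raise ValueError("some buyer desires none of the items")
        for b in chosen:
            assignment[b] = best
        uncovered -= chosen
    return assignment

# Made-up instance: four buyers, three items.
buyers = [{'a', 'b'}, {'b'}, {'b', 'c'}, {'a'}]
assignment = greedy_allocation(buyers, ['a', 'b', 'c'], alpha=0.5)
```

Because the average cost c(m)/m falls as m grows when α > 0, this rule favors the item wanted by the most unserved buyers, which is the set-cover flavor the slide refers to.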


Sample Complexity Analysis

• How complicated does the allocation rule need to be to achieve good perf.?

Theorem


Outline

• Factor Models for Correlated Auctions

The Problem

• Auctioneer sells a good to a group of n buyers.
• Seller wants to maximize his revenue.
• Each buyer maximizes his utility from getting the good: val. - price
• Seller doesn’t know exact val.s of players
• He knows distri D from which vec. of val.s (v1, …, vn) is drawn.

Our Contribution

• When D is a product distri.,
 - Myerson gives dominant strategy truthful auction
• General correlated distr.s, not known
 - how to create truthful auctions
 - how to use player j’s bid to capture info about player i.
• What if correlation between buyer val.s is driven by common factors?

Example
• Two firms produce same type of good
• Each firm’s “value”: production cost
• need to hire workers (W) & rent capital (Z)
• li: #workers firm i needs to produce one unit
• ki: amount of capital firm i needs
• εi: fixed costs unique to firm i.
• firm’s costs: Ci = liW + kiZ + εi
• firms’ costs correlated: hire workers & rent capital from the same pool.

The Factor Model

• Factor model as V = λF + U where
 - V: vec. of observations
 - λ: matrix of coefficients
 - F: vec. of factors
 - U: vec. of idiosyncratic components, ind. of each other & ind. of the factors
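A small simulation, as a sketch: it reads the model as V = λF + U, with λ the coefficient matrix listed above, and plugs in the two-firm cost example with factors F = (W, Z). All numeric values (coefficients, means, standard deviations, Gaussian shapes) are made up for illustration, and `sample_valuations` and `correlation` are illustrative names.

```python
import math
import random

def sample_valuations(lam, sample_factors, sample_noise, rng):
    """Draw one V = lam * F + U and return (V, U) as lists."""
    f = sample_factors(rng)                      # common factors, shared by all rows
    u = [sample_noise(rng) for _ in lam]         # idiosyncratic parts, one per row
    v = [sum(c * x for c, x in zip(row, f)) + u_i for row, u_i in zip(lam, u)]
    return v, u

def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
lam = [[1.0, 1.0], [2.0, 0.5]]   # row i = (l_i, k_i); hypothetical coefficients

draws = [
    sample_valuations(
        lam,
        lambda r: [r.gauss(10.0, 2.0), r.gauss(5.0, 1.0)],  # F = (W, Z), assumed Gaussian
        lambda r: r.gauss(0.0, 0.5),                         # eps_i, assumed Gaussian
        rng,
    )
    for _ in range(5000)
]

# Correlation between the two firms' costs, and between their idiosyncratic parts.
corr_v = correlation([v[0] for v, _ in draws], [v[1] for v, _ in draws])
corr_u = correlation([u[0] for _, u in draws], [u[1] for _, u in draws])
```

The costs come out strongly correlated while the idiosyncratic parts do not, so in this setup the correlation comes from the shared factors W and Z.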

Discussions

• Possible that:
 - Designer & bidders might not know common factors
 - Bidders might only know their own val.
 - Seller only knows joint distri. of bidders’ val.s

• Seller can RECOVER the factor model by making inferences over observed bids.

• Aggregate info.: common factors inferred from collective knowledge of all players.

The Auction

The Auction (cont.)

• Thm: When correlation follows this factor model, this auction is dominant strategy truthful, ex-post individually rational, and asymptotically optimal.

Dominant Strategy Truthfulness

• Toss a coin & choose between:
 - 2nd price auction: truthful
 - mechanism M, which estimates factors from a random set of bidders S: bidders in S receive utility 0 regardless of allocation & price output by M
• Players in S are incentivized to be truthful by the small incentive they get from participating in the 2nd price auction.
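The 2nd price branch is the standard sealed-bid second-price auction. The sketch below implements it and brute-forces the truthfulness property on a small integer grid. The tie-breaking rule (lowest index wins) and the names `second_price` and `utility` are assumptions made for the example.

```python
def second_price(bids):
    """Return (winner index, price paid); ties go to the lowest index.

    Requires at least two bids. The winner pays the highest competing bid.
    """
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def utility(i, value, bids):
    """Utility of bidder i with true value `value`: val. - price if i wins, else 0."""
    winner, price = second_price(bids)
    return value - price if winner == i else 0

def truthful_is_best(grid):
    """Check on a finite grid that bidding one's value is never worse than deviating."""
    return all(
        utility(0, v, [v, other]) >= utility(0, v, [dev, other])
        for v in grid for other in grid for dev in grid
    )

grid_check = truthful_is_best(range(11))
```

The grid check is only a finite sanity test. The general argument is that a bidder's own bid affects whether it wins but not the price it pays.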

Dominant Strategy Truthfulness (Cont.)

• Remaining bidders, the set R = {1, …, n} - S, receive incentives from both the 2nd price auction and mechanism M.

• M offers them allocation and price vec.s x(bR), p(bR) by running Myerson(bR, VR | f̂) on players' bids, and on cond. distri.s estimated for these players.

• No player in R can influence the estimated conditional distri. VR | f̂, and Myerson's optimal auction is truthful.


Thanks !


Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.