Three Assertions about Interactive Machine Learning
Xiaojin Zhu, Department of Computer Sciences, University of Wisconsin-Madison
[email protected]
2013


Page 1:

Three Assertions about Interactive Machine Learning

Xiaojin Zhu

Department of Computer Sciences, University of Wisconsin-Madison

[email protected]

Page 2:

Assertion 1: Humans can be modeled with statistical learning theory

- Unifying math behind cognitive science and machine learning

Page 3:

Example 1a: Human Rademacher Complexity
(grenade, A), (meadow, A), (skull, B), (conflict, B), (queen, B)

- "learning random labels" $(x_1, \sigma_1), \ldots, (x_n, \sigma_n)$
- Rademacher complexity (similar to VC dimension):

  $\mathrm{Rad}_n(F) \approx \left| \frac{2}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right|$ . . . of our mind!

- Larger Rademacher complexity → worse generalization error bound (overfitting) [ZRG NIPS09]

[Figure: two panels of estimated Rademacher complexity (y-axis, 0–2) vs. n (x-axis, 0–40); word stimuli ranged from "rape, killer, funeral" to "fun, laughter, joy"]


Page 5:

Example 1b: Human Semi-Supervised Learning

- Humans learn supervised first, then
- . . . the decision boundary shifts to the distribution trough in the test data
- Can be explained by a variety of semi-supervised machine learning models [GRZ ToCS13]

[Figure: left, a left-shifted Gaussian mixture p(x) over x ∈ [−3, 3] with range examples and test examples; right, percent class-2 responses vs. x ∈ [−1, 1] for test-1 (all subjects) and test-2 (L-subjects vs. R-subjects)]
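The trough-seeking boundary shift is easy to reproduce in simulation. Below is a minimal self-training sketch (an illustrative model with made-up parameters, not one of the specific models evaluated in [GRZ ToCS13]): two labeled points put the initial boundary at 0, but re-estimating class centers on left-shifted unlabeled data pulls the boundary into the density trough.

```python
import random
import statistics

random.seed(0)

# Two labeled examples put the initial decision boundary at their midpoint, 0.0.
labeled_neg, labeled_pos = -1.0, 1.0

# Unlabeled test items come from a LEFT-shifted mixture whose trough is near -0.5.
unlabeled = ([random.gauss(-1.5, 0.4) for _ in range(200)] +
             [random.gauss(0.5, 0.4) for _ in range(200)])

# Self-training / 2-means: assign each unlabeled point to the nearer class
# center, re-estimate the centers (seeded by the labeled points), repeat.
m0, m1 = labeled_neg, labeled_pos
for _ in range(20):
    c0 = [x for x in unlabeled if abs(x - m0) < abs(x - m1)] + [labeled_neg]
    c1 = [x for x in unlabeled if abs(x - m0) >= abs(x - m1)] + [labeled_pos]
    m0, m1 = statistics.mean(c0), statistics.mean(c1)

boundary = (m0 + m1) / 2
print(f"boundary moved from 0.00 to {boundary:.2f}")  # toward the trough near -0.5
```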

Page 6:

Example 1c: Human Active Learning

[CKNQRZ NIPS08]

Passive learning: $\inf_{\hat\theta_n} \sup_{\theta \in [0,1]} \mathbb{E}[|\hat\theta_n - \theta|] \ge \frac{1}{4}\left(\frac{1+2\epsilon}{1-2\epsilon}\right)^{2\epsilon} \frac{1}{n+1}$

Active learning: $\sup_{\theta \in [0,1]} \mathbb{E}[|\hat\theta_n - \theta|] \le 2\left(\frac{1}{2} + \sqrt{\epsilon(1-\epsilon)}\right)^{n}$

[Figure: human passive (top row) and human active (bottom row) learning trajectories, one panel per noise level ε ∈ {0, 0.05, 0.1, 0.2, 0.4}; x-axis 10–40 queries, y-axis −1 to 1]

Page 7:

Assertion 2: There is a theoretically optimal way to teach

Human teaches machine (interactive ML)
Machine teaches human (education)

Page 8:

Example 2: 1D threshold function

- Passive learning: $(x_i, y_i) \overset{iid}{\sim} p$, risk $\approx O(\frac{1}{n})$
- Active learning: risk $\approx \frac{1}{2^n}$
- Minimum teaching: n = 2 (teaching dimension)
- Alternatively: easy to hard (curriculum learning, fading, parentese)
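These rates can be checked with a quick simulation of the noiseless 1D threshold problem (a sketch with illustrative values, not code from the talk): passive learning brackets the threshold with iid samples, while active learning bisects the interval.

```python
import random

random.seed(1)
THETA = 0.637  # the unknown threshold in [0, 1]; label(x) = 1 iff x >= THETA

def passive_error(n):
    # Passive: n i.i.d. Uniform[0,1] queries; estimate is the midpoint of the
    # tightest bracket [largest negative x, smallest positive x] observed.
    xs = [random.random() for _ in range(n)]
    lo = max([x for x in xs if x < THETA], default=0.0)
    hi = min([x for x in xs if x >= THETA], default=1.0)
    return abs((lo + hi) / 2 - THETA)

def active_error(n):
    # Active: binary search; every query halves the interval containing THETA.
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid >= THETA:
            hi = mid
        else:
            lo = mid
    return abs((lo + hi) / 2 - THETA)

n = 20
print(f"passive error (~1/n):   {passive_error(n):.2e}")
print(f"active  error (~1/2^n): {active_error(n):.2e}")
```

With the same query budget, the active learner's bracket is exponentially tighter, which is the gap the slide's rates describe.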

Page 9:

A formula for optimal teaching

1. World: $p(x, y \mid \theta^*)$, loss function $\ell(f(x), y)$

2. Learner: makes prediction $f(x \mid \text{data})$

3. Teacher:
   - clairvoyant, knows everything above
   - can only teach by examples (x, y)
   - goal: choose the least-effort teaching set $D = (x, y)_{1:n}$ to minimize the learner's future loss (risk):

     $\min_D \; \mathbb{E}_{\theta^*}[\ell(f(x \mid D), y)] + \mathrm{effort}(D)$

   - if the future loss approaches the Bayes risk, D is a teaching set and n is the (generalized) teaching dimension

[KZM NIPS11, Z arXiv13]
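The objective above can be instantiated by brute force for the 1D threshold world. In the sketch below, the learner, the candidate pool, and the effort model are all illustrative assumptions for exposition, not the construction in [KZM NIPS11]:

```python
from itertools import combinations

# World: 1-D threshold at theta_star; label y = 1 iff x >= theta_star.
theta_star = 0.6
pool = [i / 10 for i in range(11)]            # candidate teaching items (assumed grid)
effort = lambda D: 0.01 * len(D)              # assumed effort: small cost per example

def learner(D):
    # The assumed learner: threshold midway between the innermost labeled pair.
    neg = [x for x, y in D if y == 0]
    pos = [x for x, y in D if y == 1]
    if not neg or not pos:
        return 0.5                            # uninformed default
    return (max(neg) + min(pos)) / 2

def risk(D):
    # Future 0/1 loss under uniform p(x) = |learned threshold - theta_star|.
    return abs(learner(D) - theta_star)

candidates = (tuple((x, int(x >= theta_star)) for x in c)
              for n in range(1, 4) for c in combinations(pool, n))
best = min(candidates, key=lambda D: risk(D) + effort(D))
print("optimal teaching set:", best)   # n = 2 examples straddling theta_star
```

The minimizer is a pair of oppositely labeled examples straddling the true threshold: any single example leaves the learner at its default, and a third example only adds effort, matching teaching dimension n = 2.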

Page 10:

Assertion 3: Even when human teachers are not optimal, they are not iid

. . . and machine learners should take advantage of that non-iidness.

Page 11:

Example 3: Feature Volunteering (Interactive ML)

[JZSR ICML13]

Page 12:

Example 3: Sampling with Reduced Replacement
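The slide's illustrations do not survive extraction, but the sampling scheme itself can be sketched. One plausible reading of the model in [JZSR ICML13]: each item has a size, items are drawn with probability proportional to size, and a drawn item's size is multiplied by a discount α ∈ [0, 1], so α = 0 recovers sampling without replacement and α = 1 ordinary sampling with replacement. The function and vocabulary below are illustrative:

```python
import random

random.seed(0)

def sample_reduced_replacement(sizes, n, alpha):
    """Draw n items proportional to size; multiply a drawn item's size by alpha.

    alpha = 0 -> sampling without replacement;
    alpha = 1 -> ordinary sampling with replacement.
    """
    sizes = dict(sizes)                     # work on a copy: item -> size
    drawn = []
    for _ in range(n):
        items, weights = zip(*sizes.items())
        pick = random.choices(items, weights=weights)[0]
        drawn.append(pick)
        sizes[pick] *= alpha                # "reduced replacement"
    return drawn

vocab = {"dog": 5.0, "cat": 4.0, "fish": 1.0, "newt": 0.5}
print(sample_reduced_replacement(vocab, 6, alpha=0.1))
```

With a small α, high-probability items still tend to come first but repeats are strongly suppressed, which is exactly the kind of non-iid structure in human-generated lists that Assertion 3 points at.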


Page 17:

References

R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, and X. Zhu. Human active learning. In Advances in Neural Information Processing Systems (NIPS) 22, 2008.

B. R. Gibson, T. T. Rogers, and X. Zhu. Human semi-supervised learning. Topics in Cognitive Science, 5(1):132–172, 2013.

K.-S. Jun, X. Zhu, B. Settles, and T. Rogers. Learning from human-generated lists. In The 30th International Conference on Machine Learning (ICML), 2013.

F. Khan, X. Zhu, and B. Mutlu. How do humans teach: On curriculum learning and teaching dimension. In Advances in Neural Information Processing Systems (NIPS) 25, 2011.

X. Zhu, T. T. Rogers, and B. Gibson. Human Rademacher complexity. In Advances in Neural Information Processing Systems (NIPS) 23, 2009.

Page 18:

Three Assertions

1. Humans can be modeled with statistical learning theory.

2. There is a theoretically optimal way to teach.

3. Even when human teachers are not optimal, they are not iid.

Page 19:

Capacity

VC-dimension

- $F$: a family of binary classifiers
- VC-dimension $VC(F)$: size of the largest set that $F$ can shatter
- With probability at least $1 - \delta$,

  $\sup_{f \in F} R(f) - R_n(f) \le 2\sqrt{\frac{2\,VC(F)\log n + VC(F)\log\frac{2e}{VC(F)} + \log\frac{2}{\delta}}{n}}.$

- $R(f)$: error of $f$ in the future
- $R_n(f)$: error of $f$ on a training set of size $n$

Page 20:

Capacity

Rademacher complexity

- $\sigma_1, \ldots, \sigma_n$: $P(\sigma_i = 1) = P(\sigma_i = -1) = \frac{1}{2}$
- Rademacher complexity:

  $\mathrm{Rad}_n(F) = \mathbb{E}_{\sigma,x}\left( \sup_{f \in F} \left| \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right| \right).$

- With probability at least $1 - \delta$,

  $\sup_{f \in F} |R_n(f) - R(f)| \le 2\,\mathrm{Rad}_n(F) + \sqrt{\frac{\log(2/\delta)}{2n}}.$
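Rademacher complexity can be estimated directly by Monte Carlo, following the expectation in its definition. A minimal sketch with an illustrative finite class F of 1-D threshold classifiers (the class and stimulus space are assumptions, not the human experiment's):

```python
import random

random.seed(0)

# Monte Carlo estimate of Rad_n(F) = E_{sigma,x} sup_{f in F} |(1/n) sum_i sigma_i f(x_i)|
# for an illustrative finite class F of 1-D threshold classifiers mapping to +/-1.
F = [lambda x, t=t: 1.0 if x >= t else -1.0 for t in (0.25, 0.5, 0.75)]

def rademacher(n, trials=2000):
    total = 0.0
    for _ in range(trials):
        xs = [random.random() for _ in range(n)]           # x ~ Uniform[0, 1]
        sigmas = [random.choice((-1, 1)) for _ in range(n)]
        total += max(abs(sum(s * f(x) for s, x in zip(sigmas, xs))) / n
                     for f in F)
    return total / trials

for n in (5, 10, 20, 40):
    print(f"n = {n:2d}  Rad_n(F) ~ {rademacher(n):.3f}")
# The estimate shrinks as n grows, mirroring the human curves in Example 1a.
```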

Page 21:

Machine learning → human learning

- $f$: you categorize $x$ by $f(x)$
- $F$: all the classifiers in your mind
- $R_n(f)$: how you did in class
- $R(f)$: how well you can do outside class
- Capacity: can we measure it in humans?
  - $VC(F)$: too brittle (find one dataset of size $n$) and combinatorial (verify shattering)
  - Others may behave better, e.g., $\mathrm{Rad}_n(F)$
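The "combinatorial" complaint about VC(F) can be made concrete: verifying shattering means checking every ± labeling of a candidate set against every classifier. A minimal sketch for 1-D thresholds over an illustrative grid of classifiers:

```python
from itertools import product

# Brute-force shattering check for 1-D threshold classifiers
# f_t(x) = 1 iff x >= t: every labeling of the point set must be realized.

def shatters(points, thresholds):
    for labels in product([0, 1], repeat=len(points)):
        realized = any(all((x >= t) == bool(y) for x, y in zip(points, labels))
                       for t in thresholds)
        if not realized:
            return False
    return True

thresholds = [i / 10 for i in range(-10, 21)]   # illustrative grid of classifiers
print(shatters([0.5], thresholds))        # True: a single point is shattered
print(shatters([0.3, 0.7], thresholds))   # False: labeling (1, 0) is impossible
# Hence the VC-dimension of 1-D thresholds is 1.
```

Even in this toy case the check enumerates $2^n$ labelings per candidate set, which is the brittleness the slide refers to.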

Page 22:

Overfitting indicator

[Figure: test-minus-training error (e − ê) vs. Rademacher complexity for the Shape and Word conditions at n = 5 and n = 40, with the theoretical bound and the observed values]

- $e$: test set error; $\hat e$: training set error
- the generalization error bound holds
- actual overfitting tracks the bound (nice, but not predicted by theory)

The study of capacity may
- constrain cognitive models
- help understand how groups differ in age, health, education, etc.

Page 23:

Human semi-supervised learning, the other way around
Human unsupervised learning first . . .

[Figure: four unsupervised-exposure conditions (trough, peak, uniform, converge), each shown as a density p(x) over x ∈ [−8, 8] and as the sequence of examples presented over time]

. . . influences a subsequent (identical) supervised learning task

[Figure: mean accuracy ± standard error on the supervised task by condition: trough, converge, uniform, peak]

Page 24:

Human teacher behaviors

strategy                  boundary   curriculum   linear   positive
"graspability" (n = 31)       0%         48%        42%        10%
"lines" (n = 32)             56%         19%        25%         0%