
Learning From Data
Lecture 4

Real Learning is Feasible

Real Learning vs. Verification
The Two Step Solution to Learning

Closer to Reality: Error and Noise

M. Magdon-Ismail
CSCI 4100/6100

recap: Verification

[Figure: a single hypothesis h on the Age–Income plane with its out-of-sample error Eout(h); below it, the data set D with in-sample error Ein(h) = 2/9.]

Hoeffding: Eout(g) ≈ Ein(g) (with high probability)

P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N).
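A minimal Monte Carlo sketch of this bound (in Python; the values of μ, N, and ε below are illustrative, not from the slides): treat a fixed h as a biased coin with true error Eout(h) = μ, draw Ein(h) over many data sets, and compare the empirical deviation rate to the Hoeffding bound.

```python
import numpy as np

# Monte Carlo check of the single-hypothesis Hoeffding bound. Model: a fixed
# h with true error Eout(h) = mu; Ein(h) is its error frequency on N i.i.d.
# points. mu, N, eps are illustrative values, not from the slides.
rng = np.random.default_rng(0)
mu, N, eps, trials = 0.3, 100, 0.1, 100_000

ein = rng.binomial(N, mu, size=trials) / N        # Ein over many data sets
empirical = np.mean(np.abs(ein - mu) > eps)       # P[|Ein - Eout| > eps]
bound = 2 * np.exp(-2 * eps**2 * N)               # Hoeffding bound

print(f"empirical {empirical:.4f} <= bound {bound:.4f}")
```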


recap: Selection Bias Illustrated with Coins

Coin tossing example:

• If we toss one coin N times and get no heads, it's very surprising: P = 1/2^N.

We expect the coin is biased: P[heads] ≈ 0.

• Toss 70 coins N times each and find one with no heads. Is it surprising? P = 1 − (1 − 1/2^N)^70.

Do we expect P[heads] ≈ 0 for the selected coin?

Similar to the “birthday problem”: among 30 people, two will likely share the same birthday.

• This is called selection bias. Selection bias is a very serious trap, for example in medical screening.

Search Causes Selection Bias
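For concreteness, the two probabilities above can be evaluated numerically; the sketch below assumes N = 10 tosses per coin (the slides leave N symbolic).

```python
# The two probabilities above, numerically. N = 10 tosses per coin is an
# assumed value; the slides leave N symbolic.
N = 10
p_one = (1 / 2) ** N                    # one fixed coin shows no heads
p_any = 1 - (1 - p_one) ** 70           # at least one of 70 coins does

print(f"one coin: {p_one:.5f}")         # ~0.001: very surprising
print(f"70 coins: {p_any:.5f}")         # ~0.066: much less surprising
```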


Real Learning – Finite Learning Models

[Figure: a finite model H = {h1, h2, h3, ..., hM}. Each hypothesis is shown on the Age–Income plane with its out-of-sample error Eout(h1), Eout(h2), Eout(h3), ..., Eout(hM); below, on the data set D, the in-sample errors are Ein(h1) = 2/9, Ein(h2) = 0, Ein(h3) = 5/9, ..., Ein(hM) = 6/9.]

Pick the hypothesis with minimum Ein; will Eout be small?
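As a sketch of this selection step: build a finite H, compute Ein for every hypothesis, and return the minimizer. Everything concrete below (the 9 data points, the target, the 20 threshold hypotheses) is hypothetical, chosen only to exercise the idea.

```python
import numpy as np

# Learning as selection over a finite model: pick g = argmin Ein(h) over H.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(9, 2))        # 9 hypothetical (age, income) points
y = np.sign(X[:, 0] + X[:, 1] - 1.0)      # hypothetical target labels

def ein(h):
    """In-sample error: fraction of training points h misclassifies."""
    return np.mean(h(X) != y)

# A finite H: 20 random linear threshold hypotheses (illustrative only).
H = [lambda X_, w=w, b=b: np.sign(X_ @ w - b)
     for w, b in zip(rng.normal(size=(20, 2)), rng.uniform(0, 1, 20))]

g = min(H, key=ein)                        # hypothesis with minimum Ein
print("Ein(g) =", ein(g))                  # small Ein -- but is Eout small?
```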



Hoeffding says that Ein(g) ≈ Eout(g) for Finite H

P[|Ein(g) − Eout(g)| > ε] ≤ 2|H|e^(−2ε²N), for any ε > 0.

P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − 2|H|e^(−2ε²N), for any ε > 0.

We don’t care how g was obtained, as long as it is from H

Some Basic Probability (events A, B):

Implication: if A =⇒ B (A ⊆ B), then P[A] ≤ P[B].

Union Bound: P[A or B] = P[A ∪ B] ≤ P[A] + P[B].

Bayes' Rule: P[A|B] = P[B|A] · P[A] / P[B].

Proof: Let M = |H|.

The event "|Ein(g) − Eout(g)| > ε" implies "|Ein(h1) − Eout(h1)| > ε" OR ... OR "|Ein(hM) − Eout(hM)| > ε".

So, by the implication and union bounds:

P[|Ein(g) − Eout(g)| > ε] ≤ P[ OR_{m=1}^{M} |Ein(hm) − Eout(hm)| > ε ]

≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]

≤ 2Me^(−2ε²N).

(The last inequality follows by applying the Hoeffding bound to each summand.)
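An empirical sanity check of this union bound (all parameter values below are illustrative): model each of the M hypotheses as having true error μ, and ask how often any of them deviates from μ by more than ε on a sample of size N.

```python
import numpy as np

# Empirical check of the union bound for finite H. Each of the M hypotheses
# has true error mu; each trial draws Ein for all of them and asks whether
# ANY deviates from mu by more than eps. Parameter values are illustrative.
rng = np.random.default_rng(2)
M, N, mu, eps, trials = 50, 500, 0.3, 0.07, 20_000

ein = rng.binomial(N, mu, size=(trials, M)) / N      # Ein for all h in H
empirical = np.mean(np.any(np.abs(ein - mu) > eps, axis=1))
bound = 2 * M * np.exp(-2 * eps**2 * N)

print(f"worst-over-H deviation rate:   {empirical:.4f}")   # ~0.03
print(f"union bound 2M e^(-2 eps^2 N): {bound:.4f}")       # ~0.74
```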


Interpreting the Hoeffding Bound for Finite |H|

P[|Ein(g) − Eout(g)| > ε] ≤ 2|H|e^(−2ε²N), for any ε > 0.

P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − 2|H|e^(−2ε²N), for any ε > 0.

Theorem. With probability at least 1 − δ,

Eout(g) ≤ Ein(g) + √((1/(2N)) log(2|H|/δ)).

We don’t care how g was obtained, as long as g ∈ H

Proof: Let δ = 2|H|e^(−2ε²N). Then

P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − δ.

In words, with probability at least 1 − δ,

|Ein(g) − Eout(g)| ≤ ε.

This implies

Eout(g) ≤ Ein(g) + ε.

From the definition of δ, solve for ε:

ε = √((1/(2N)) log(2|H|/δ)).
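The error bar is easy to compute; a small helper (with illustrative inputs) shows it shrinking like 1/√N while growing only logarithmically in |H|.

```python
import numpy as np

# The theorem's error bar: with probability at least 1 - delta,
#   Eout(g) <= Ein(g) + sqrt( log(2|H|/delta) / (2N) ).
def error_bar(N, M, delta):
    """Generalization error bar for a finite hypothesis set with |H| = M."""
    return np.sqrt(np.log(2 * M / delta) / (2 * N))

# Illustrative values (not from the slides): the bar shrinks like 1/sqrt(N)
# and grows only logarithmically in |H|.
for N in (100, 1_000, 10_000):
    print(N, round(error_bar(N, M=1_000, delta=0.05), 3))
```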


Ein Reaches Outside to Eout when |H| is Small

Eout(g) ≤ Ein(g) + √((1/(2N)) log(2|H|/δ)).

If N ≫ ln |H|, then Eout(g) ≈ Ein(g).

• Does not depend on X, P(x), f, or how g is found.

• Only requires that P(x) generate the data points independently, and the test point as well.

What about Eout ≈ 0?


The 2 Step Approach to Getting Eout ≈ 0:

(1) Eout(g) ≈ Ein(g).

(2) Ein(g) ≈ 0.

Together, these ensure Eout ≈ 0.

How do we verify (1), since we do not know Eout?

– We must ensure it theoretically: Hoeffding.

We can ensure (2) (for example with the PLA) – modulo being able to guarantee (1).

There is a tradeoff:

• Small |H| =⇒ Ein ≈ Eout

• Large |H| =⇒ Ein ≈ 0 is more likely.

[Figure: Error versus |H|. The in-sample error decreases as |H| grows; the model complexity term √((1/(2N)) log(2|H|/δ)) increases; their sum, the bound on the out-of-sample error, is minimized at |H|∗.]
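A hedged numerical sketch of this tradeoff: the slides do not specify how Ein falls with |H|, so the decay below is an assumed curve, used only to show that the bound Ein + √((1/(2N)) log(2|H|/δ)) is minimized at an intermediate |H|∗.

```python
import numpy as np

# Illustrative sketch of the tradeoff figure. The true Ein(|H|) depends on
# the problem; the decay below is ASSUMED, chosen only to show the bound
# being minimized at an intermediate |H|*.
N, delta = 100, 0.05
H_sizes = np.arange(1, 2001)

ein = 0.5 / np.sqrt(H_sizes)                                  # assumed Ein(|H|)
complexity = np.sqrt(np.log(2 * H_sizes / delta) / (2 * N))   # model complexity
bound = ein + complexity                                      # Eout bound

H_star = H_sizes[np.argmin(bound)]
print(f"|H|* = {H_star}, minimum bound = {bound.min():.3f}")
```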



Feasibility of Learning (Finite Models)

• No Free Lunch: can’t know anything outside D, for sure.

• We can "learn" with high probability if D is i.i.d. from P(x): Eout ≈ Ein (Ein can reach outside the data set to Eout).

• We want Eout ≈ 0.

• The two-step solution. We trade Eout ≈ 0 for two goals:

(i) Eout ≈ Ein;

(ii) Ein ≈ 0.

We know Ein, not Eout, but we can ensure (i) if |H| is small.

This is a big step!

• What about an infinite H, like the perceptron?


“Complex” Target Functions are Harder to Learn

What happened to the “difficulty” (complexity) of f?

• Simple f =⇒ can use small H to get Ein ≈ 0 (need smaller N).

• Complex f =⇒ need large H to get Ein ≈ 0 (need larger N).


Revising the Learning Problem – Adding in Probability

[Diagram: the learning setup with probability added. An unknown target function f : X → Y and an unknown input distribution P(x) generate the training examples (x1, y1), (x2, y2), ..., (xN, yN) with yn = f(xn). The learning algorithm A searches the hypothesis set H and produces the final hypothesis g, with the goal g(x) ≈ f(x).]


Error and Noise

Error Measure: How to quantify that h ≈ f .

Noise: yn ≠ f(xn).



Fingerprint Recognition

f(x) = +1: you
f(x) = −1: intruder

Two types of error:

           f = +1          f = −1
h = +1     no error        false accept
h = −1     false reject    no error

In any application you need to think about how to penalize each type of error.

Supermarket:

           f = +1   f = −1
h = +1        0        1
h = −1       10        0

CIA:

           f = +1   f = −1
h = +1        0      1000
h = −1        1         0
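A sketch of how such a cost matrix turns into a weighted error measure; the decisions and truths below are hypothetical, chosen only to exercise the two matrices.

```python
import numpy as np

# Cost-sensitive error using the matrices above: cost[(h, f)] is the penalty
# when the hypothesis outputs h and the truth is f (+1 = you, -1 = intruder).
supermarket = {(+1, +1): 0, (+1, -1): 1,    (-1, +1): 10, (-1, -1): 0}
cia         = {(+1, +1): 0, (+1, -1): 1000, (-1, +1): 1,  (-1, -1): 0}

def weighted_error(h_out, f_true, cost):
    """Average cost of the hypothesis's decisions against the truth."""
    return np.mean([cost[(hv, fv)] for hv, fv in zip(h_out, f_true)])

h_out  = [+1, -1, +1, +1, -1]      # hypothetical decisions
f_true = [+1, +1, -1, +1, -1]      # hypothetical truths
print(weighted_error(h_out, f_true, supermarket))   # false rejects cost 10
print(weighted_error(h_out, f_true, cia))           # false accepts cost 1000
```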

Take Away: the error measure is specified by the user.

If not, choose one that is:
– plausible (conceptually appealing)

– friendly (practically appealing)


Almost All Error Measures are Pointwise

Compare h and f on individual points x using a pointwise error e(h(x), f(x)):

Binary error: e(h(x), f(x)) = ⟦h(x) ≠ f(x)⟧ (classification)

Squared error: e(h(x), f(x)) = (h(x) − f(x))² (regression)

In-sample error:

Ein(h) = (1/N) Σ_{n=1}^{N} e(h(xn), f(xn)).

Out-of-sample error:

Eout(h) = E_x[e(h(x), f(x))].
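These definitions translate directly into code; in the sketch below, f, h, and P(x) are hypothetical placeholders, and Eout is estimated by Monte Carlo over fresh draws from P(x).

```python
import numpy as np

# Pointwise errors, and the in-/out-of-sample errors they induce. The target
# f, the hypothesis h, and P(x) below are hypothetical placeholders.
binary_error  = lambda hx, fx: (hx != fx).astype(float)   # classification
squared_error = lambda hx, fx: (hx - fx) ** 2             # regression

rng = np.random.default_rng(3)
P = lambda n: rng.uniform(0, 1, n)        # assumed input distribution
f = lambda x: np.sign(x - 0.50)           # assumed target
h = lambda x: np.sign(x - 0.45)           # assumed hypothesis

X = P(20)                                 # training inputs
x_test = P(100_000)                       # fresh points for the expectation

Ein  = binary_error(h(X), f(X)).mean()
Eout = binary_error(h(x_test), f(x_test)).mean()   # ~0.05: measure of [0.45, 0.5)
print(f"Ein = {Ein:.3f}, Eout = {Eout:.3f}")
```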


Noisy Targets

age            32 years
gender         male
salary         40,000
debt           26,000
years in job   1 year
years at home  3 years
...            ...

Approve for credit?

Consider two customers with the same credit data.

They can have different behaviors.

The target ‘function’ is not a deterministic function but a stochastic one:

‘f(x)’ = P(y | x)
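A minimal sketch of a noisy target: the same x can yield different y because y is sampled from P(y | x). The conditional probability used below is an assumed stand-in.

```python
import numpy as np

# With a noisy target, y is drawn from P(y | x) instead of being fixed by a
# deterministic f. The conditional probability below is an assumed stand-in
# (e.g. 'probability this applicant repays, given credit data x').
rng = np.random.default_rng(4)

def sample_y(x):
    p_plus = 1.0 / (1.0 + np.exp(-10 * (x - 0.5)))   # assumed P(y = +1 | x)
    return +1 if rng.random() < p_plus else -1

# Two 'customers' with identical data x can get different outcomes y:
print([sample_y(0.5) for _ in range(10)])
```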


Learning Setup with Error Measure and Noisy Targets

[Diagram: the learning setup with an error measure and a noisy target. An unknown target distribution P(y | x) (target function f plus noise) and an unknown input distribution P(x) generate the training examples (x1, y1), (x2, y2), ..., (xN, yN) with yn ∼ P(y | xn). The learning algorithm A, guided by the error measure, searches the hypothesis set H and produces the final hypothesis g, with the goal g(x) ≈ f(x).]
