
Page 1:

Gaussian Naive Bayes and Linear Regression

Instructor: Alan Ritter

Many Slides from Tom Mitchell

Page 2:

What if we have continuous Xi? E.g., image classification: Xi is the real-valued ith pixel.

Page 3:

What if we have continuous Xi? E.g., image classification: Xi is the real-valued ith pixel.

Naïve Bayes requires P(Xi | Y = yk), but Xi is real (continuous).

Common approach: assume P(Xi | Y = yk) follows a Normal (Gaussian) distribution.

Page 4:

Gaussian Distribution (also called “Normal”)

p(x) is a probability density function, whose integral (not sum) is 1.
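For reference, the density plotted on this slide is the standard Gaussian:

\[
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),
\qquad \int_{-\infty}^{\infty} p(x)\,dx = 1
\]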

Page 5:

Gaussian Naïve Bayes Algorithm – continuous Xi (but still discrete Y)

• Train Naïve Bayes (given examples): for each value yk, estimate* the prior P(Y = yk); for each attribute Xi, estimate the class-conditional mean μik and variance σik².
• Classify (Xnew): choose the yk maximizing P(Y = yk) ∏i P(Xi_new | Y = yk).

* probabilities must sum to 1, so we need to estimate only n-1 parameters...
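A minimal sketch of this algorithm in NumPy. The class interface and the small variance floor (1e-9, for numerical stability) are illustrative choices, not from the slides:

```python
import numpy as np

class GaussianNB:
    """Gaussian Naive Bayes: discrete Y, one Gaussian per (feature, class)."""

    def fit(self, X, y):
        # X: (n, d) real-valued features; y: (n,) discrete class labels
        self.classes = np.unique(y)
        self.priors = np.array([(y == c).mean() for c in self.classes])
        # Class-conditional MLE mean and variance for each feature
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # Score each class with log P(y_k) + sum_i log N(x_i; mu_ik, var_ik)
        scores = [
            np.log(self.priors[k])
            - 0.5 * (np.log(2 * np.pi * self.var[k])
                     + (X - self.mu[k]) ** 2 / self.var[k]).sum(axis=1)
            for k in range(len(self.classes))
        ]
        return self.classes[np.argmax(scores, axis=0)]
```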

Page 6:

Estimating Parameters: Y discrete, Xi continuous

Maximum likelihood estimates, with j indexing the training examples, i the feature, and k the class, and where δ(Yj = yk) = 1 if Yj = yk, else 0.
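The estimators are the standard per-class sample mean and variance:

\[
\hat\mu_{ik} = \frac{\sum_j X_i^j \,\delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)},
\qquad
\hat\sigma_{ik}^2 = \frac{\sum_j \bigl(X_i^j - \hat\mu_{ik}\bigr)^2 \,\delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}
\]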

Page 7:

GNB Example: Classify a person’s cognitive state based on a brain image

• reading a sentence or viewing a picture?
• reading the word describing a “Tool” or “Building”?
• answering the question, or getting confused?

Page 8:

Y is the mental state (reading “house” or “bottle”); the Xi are the voxel activities. This is a plot of the µ’s defining P(Xi | Y = “bottle”).

[Figure: brain map of the mean voxel activations over all training examples for Y = “bottle”; the fMRI activation color scale runs from below average to high.]

Page 9:

Classification task: is the person viewing a “tool” or “building”?

[Figure: classification accuracy for each participant (p2–p12); y-axis from 0.1 to 1.0, with accuracies marked as statistically significant at p < 0.05.]

Page 10:

Where is information encoded in the brain?

Accuracies of cubical 27-voxel classifiers centered at each significant voxel [0.7–0.8].

Page 11:

Naïve Bayes: What you should know

• Designing classifiers based on Bayes rule
• Conditional independence
  – What it is
  – Why it’s important
• Naïve Bayes assumption and its consequences
  – Which (and how many) parameters must be estimated under different generative models (different forms for P(X|Y)), and why this matters
• How to train Naïve Bayes classifiers
  – MLE and MAP estimates
  – with discrete and/or continuous inputs Xi

Page 12:

Naïve Bayes with Log Probabilities

Page 13:

What if we want to calculate posterior log-probabilities?

Page 14:

What if we want to calculate posterior log-probabilities?

Page 15:

What if we want to calculate posterior log-probabilities?

\[
\log P(c \mid x_1, \dots, x_n)
= \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c)
- \log\!\left[\sum_{c'} P(c') \prod_{i=1}^{n} P(x_i \mid c')\right]
\]

Page 16:

What if we want to calculate posterior log-probabilities?

\[
\log P(c \mid x_1, \dots, x_n)
= \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c)
- \log\!\left[\sum_{c'} P(c') \prod_{i=1}^{n} P(x_i \mid c')\right]
\]

Page 17:

Log-Sum-Exp Trick

• We have: a bunch of log probabilities
  – log(p1), log(p2), log(p3), …, log(pn)
• We want: log(p1 + p2 + p3 + … + pn)

Page 18:

Log-Sum-Exp Trick:
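The trick: shift by the maximum so every exponent is at most 0, which avoids overflow and limits underflow. With m = max_i log(p_i),

\[
\log\sum_{i=1}^{n} p_i = m + \log\sum_{i=1}^{n} \exp\bigl(\log p_i - m\bigr)
\]

A short Python sketch (the function name is mine, for illustration):

```python
import numpy as np

def log_sum_exp(log_ps):
    """Given log(p1), ..., log(pn), return log(p1 + ... + pn) stably."""
    m = np.max(log_ps)                    # shift by the max log-probability
    return m + np.log(np.sum(np.exp(log_ps - m)))

# Example: probabilities near e^-1000 would underflow to 0.0 if
# exponentiated directly; here the result is -1000 + log(2).
print(log_sum_exp(np.array([-1000.0, -1000.0])))
```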

Page 19:

What if we want to calculate posterior log-probabilities?

\[
\log P(c \mid x_1, \dots, x_n)
= \log P(c) + \sum_{i=1}^{n} \log P(x_i \mid c)
- \log\!\left[\sum_{c'} P(c') \prod_{i=1}^{n} P(x_i \mid c')\right]
\]

Page 20:

Linear Regression

Page 21:

Regression

So far, we’ve been interested in learning P(Y|X) where Y has discrete values (called ‘classification’).

What if Y is continuous? (called ‘regression’)
• predict weight from gender, height, age, …
• predict Google stock price today from Google, Yahoo, MSFT prices yesterday
• predict each pixel intensity in the robot’s current camera image, from the previous image and previous action

Page 22:

Regression

Wish to learn f : X → Y, where Y is real, given {<x1,y1>, …, <xn,yn>}.

Approach:
1. Choose some parameterized form for P(Y|X; θ) (θ is the vector of parameters)
2. Derive the learning algorithm as the MCLE or MAP estimate for θ

Page 23:

1. Choose parameterized form for P(Y|X; θ)

Assume Y is some deterministic f(X), plus random noise:

\[
Y = f(X) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]

Therefore Y is a random variable that follows the distribution

\[
P(Y = y \mid X = x) = \mathcal{N}\bigl(y;\, f(x), \sigma^2\bigr)
\]

and the expected value of Y for any given x is f(x).

[Figure: training points scattered in the (X, Y) plane around the regression curve f(X).]

Page 24:

Training Linear Regression

How can we learn W from the training data?

Page 25:

Maximum Likelihood Estimation Recipe

1. Use the log-likelihood
2. Differentiate with respect to the parameters
3. Equate to zero and solve*

* Often requires numerical approximation (no closed-form solution)

Page 26:

Training Linear Regression

How can we learn W from the training data? Learn the Maximum Conditional Likelihood Estimate!

\[
W_{\mathrm{MCLE}} = \arg\max_W \prod_l P(y^l \mid x^l; W),
\qquad \text{where } P(y \mid x; W) = \mathcal{N}\bigl(y;\, f(x; W), \sigma^2\bigr)
\]

Page 27:

Training Linear Regression

Learn the Maximum Conditional Likelihood Estimate:

\[
W_{\mathrm{MCLE}} = \arg\max_W \sum_l \ln P(y^l \mid x^l; W),
\qquad \text{where } \ln P(y^l \mid x^l; W) = -\frac{\bigl(y^l - f(x^l; W)\bigr)^2}{2\sigma^2} + \text{const}
\]

Page 28:

Training Linear Regression

Learn the Maximum Conditional Likelihood Estimate, where the conditional likelihood is Gaussian with mean f(x; W); so maximizing it amounts to minimizing the sum of squared errors:

\[
W_{\mathrm{MCLE}} = \arg\min_W \sum_l \bigl(y^l - f(x^l; W)\bigr)^2
\]

Page 29:

Page 30:

Gradient Descent

Batch gradient: use the error over the entire training set D. Do until satisfied:
1. Compute the gradient ∇E_D[W]
2. Update the vector of parameters: W ← W − η ∇E_D[W]

Stochastic gradient: use the error over single examples. Do until satisfied:
1. Choose (with replacement) a random training example <x^l, y^l>
2. Compute the gradient just for <x^l, y^l>: ∇E_l[W]
3. Update the vector of parameters: W ← W − η ∇E_l[W]

Stochastic approximates Batch arbitrarily closely as the learning rate η → 0. Stochastic can be much faster when D is very large. Intermediate approach: use the error over subsets of D (mini-batches).
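A minimal sketch of the stochastic variant for the squared-error objective. The learning rate eta, epoch count, and sampling scheme are illustrative choices, not from the slides:

```python
import numpy as np

def sgd_linear_regression(X, y, eta=0.01, epochs=100, seed=0):
    """Stochastic gradient descent for f(x; w) = w . x on squared error.
    X: (n, d) with a leading column of ones for the intercept; y: (n,)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for l in rng.integers(n, size=n):   # sample examples with replacement
            err = y[l] - X[l] @ w           # residual on the chosen example
            w += eta * err * X[l]           # step along -grad of (y_l - w.x_l)^2
    return w
```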

Page 31:

Training Linear Regression

Learn the Maximum Conditional Likelihood Estimate:

\[
W_{\mathrm{MCLE}} = \arg\min_W \sum_l \bigl(y^l - f(x^l; W)\bigr)^2
\]

Can we derive a gradient descent rule for training?
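For the linear form f(x; W) = w_0 + Σ_i w_i x_i, the rule follows from differentiating the sum of squared errors (the factor of 2 is typically absorbed into the step size η):

\[
\frac{\partial}{\partial w_i} \sum_l \bigl(y^l - f(x^l; W)\bigr)^2
= -2 \sum_l \bigl(y^l - f(x^l; W)\bigr)\, x_i^l
\quad\Longrightarrow\quad
w_i \leftarrow w_i + \eta \sum_l \bigl(y^l - f(x^l; W)\bigr)\, x_i^l
\]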

Page 32:

Normal Equation

\[
w^* = (X^T X)^{-1} X^T y
\]
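A quick NumPy check of the closed form on synthetic data (solving the linear system is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # bias + 2 features
true_w = np.array([1.0, 2.0, -0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# w* = (X^T X)^{-1} X^T y, computed as a linear solve
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)   # close to true_w
```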

Page 33:

How About MAP instead of MLE?

\[
w^* = \arg\max_W \sum_l \ln P(Y^l \mid X^l; W) + \ln \mathcal{N}(W \mid 0, I)
= \arg\max_W \sum_l \ln P(Y^l \mid X^l; W) - c \sum_i w_i^2
\]
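With the zero-mean Gaussian prior on W, this MAP objective is ridge regression, and the normal equation gains a regularization term (λ here corresponds to the slide's constant c, up to the noise variance):

\[
w^* = \bigl(X^T X + \lambda I\bigr)^{-1} X^T y
\]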

Page 34:

Regression - What you should know

• MLE → minimize the Sum of Squared Errors
• MAP (with a Gaussian prior on W) → minimize the Sum of Squared Errors plus a sum-of-squared-weights penalty
• Learning is an optimization problem once we choose an objective function
  – Maximize the data likelihood
  – Maximize the posterior probability of the weights
• Can use Gradient Descent as a general learning algorithm