
Naïve Bayes Classification - Brown University (cs.brown.edu/courses/cs195w/slides/naivebayes.pdf)


Theory Naïve Bayes in SQL

Naïve Bayes Classification

Nickolai Riabov, Kenneth Tiong

Brown University

Fall 2013

Nickolai Riabov, Kenneth Tiong Naïve Bayes Classification


Structure of the Talk

Theory of Naïve Bayes classification
Naïve Bayes in SQL


Notation

X – Set of features of the data
Y – Set of classes of the data


Bayes’ Theorem

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

P(y) – Prior probability of being in class y
P(x) – Probability of features x
P(x | y) – Likelihood of features x given class y
P(y | x) – Posterior probability of y


Maximum a posteriori estimate

Based on Bayes’ theorem, we can compute which of the classes y maximizes the posterior probability

$$
\begin{aligned}
y^* &= \arg\max_{y \in Y} P(y \mid x) \\
    &= \arg\max_{y \in Y} \frac{P(x \mid y)\,P(y)}{P(x)} \\
    &= \arg\max_{y \in Y} P(x \mid y)\,P(y)
\end{aligned}
$$

(Note: we can drop P(x) since it is common to all posteriors)
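
As a rough numerical illustration of why dropping P(x) is harmless, the sketch below scores two classes with and without the normalising term. The class names and all probabilities are made-up values chosen for this example, not figures from the slides.

```python
# Hypothetical toy numbers: two classes and one observed feature vector x.
prior = {"ham": 0.7, "spam": 0.3}        # P(y)
likelihood = {"ham": 0.1, "spam": 0.6}   # P(x | y) for the observed x

# Numerator of the posterior, P(x | y) * P(y), for each class.
joint = {y: likelihood[y] * prior[y] for y in prior}

# P(x) is the same for every class, so dividing by it only rescales the scores.
evidence = sum(joint.values())
posterior = {y: joint[y] / evidence for y in joint}

# The arg max is the same with or without the division by P(x).
y_star_unnormalised = max(joint, key=joint.get)
y_star_normalised = max(posterior, key=posterior.get)
print(y_star_unnormalised, y_star_normalised)
```

Both selections pick the same class because every score is divided by the same positive constant.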


Commonality with maximum likelihood

Assume that all classes are equally likely a priori:

$$P(y) = \frac{1}{|Y|} \quad \forall\, y \in Y$$

where |Y| is the number of elements in Y. Then,

$$y^* = \arg\max_{y \in Y} P(x \mid y)$$

That is, y∗ is the y that maximizes the likelihood function


Desirable Properties of the Bayes Classifier

Incrementality: Each new element of the training set leads to an update in the likelihood function. This makes the estimator robust
Combines prior knowledge and observed data
Outputs a probability distribution in addition to a classification
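
One way to picture the incrementality property is a count-based estimator that folds in one training example at a time. The sketch below is only an illustration under that assumption; the class name `IncrementalCounts` and its methods are hypothetical and do not come from the slides.

```python
from collections import Counter, defaultdict


class IncrementalCounts:
    """Hypothetical helper holding the counts behind P(y) and P(x_i | y)."""

    def __init__(self):
        self.class_counts = Counter()
        # label -> Counter keyed by (feature index, feature value)
        self.feature_counts = defaultdict(Counter)

    def update(self, features, label):
        """Fold a single training example into the counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[label][(i, value)] += 1

    def prior(self, label):
        """Class frequency in the examples seen so far."""
        return self.class_counts[label] / sum(self.class_counts.values())

    def likelihood(self, i, value, label):
        """Relative frequency of feature i taking `value` within `label`."""
        return self.feature_counts[label][(i, value)] / self.class_counts[label]


model = IncrementalCounts()
model.update((1, 0), "spam")
model.update((1, 1), "spam")
model.update((0, 1), "ham")
print(model.prior("spam"), model.likelihood(0, 1, "spam"))
```

No retraining pass is needed: each call to `update` changes only the counters touched by the new example.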


Bayes Classifier

Assumption: Training set consists of instances of different classes y that are functions of features x
(In this case, assume each point has k features, and there are n points in the training set)
Task: Classify a new point $x_{:,n+1}$ as belonging to a class $y_{n+1} \in Y$ on the basis of its features by using a MAP classifier

$$y^* \in \arg\max_{y_{n+1} \in Y} P(x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1} \mid y_{n+1})\,P(y_{n+1})$$


Bayes Classifier

P(y) can either be externally specified (i.e. it can actually be a prior), or can be estimated as the frequency of classes in the training set
$P(x_1, x_2, \dots, x_k \mid y)$ has $O(|X|^k |Y|)$ parameters – can only be estimated with a very large number of data points


Bayes Classifier

Can reduce the dimensionality of the problem by assuming that features are conditionally independent given the class (this is the Naïve Bayes Assumption)

$$P(x_1, x_2, \dots, x_k \mid y) = \prod_{i=1}^{k} P(x_i \mid y)$$

Now, there are only $O(k\,|X|\,|Y|)$ parameters to estimate
If the distribution of $x_1, \dots, x_k \mid y$ is continuous, this result is even more important
  Without the assumption, $P(x_1, x_2, \dots, x_k \mid y)$ has to be estimated nonparametrically; this method is very sensitive to high-dimensional problems
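
To give a feel for the size of the reduction, the sketch below counts table entries for the full joint likelihood and for the factorised one. It assumes every feature takes the same number of values (`num_values`, standing in for |X|); counting one table per feature gives k·|X|·|Y| entries, which is O(|X||Y|) when k is treated as a constant. The function names are hypothetical.

```python
def joint_table_size(num_values: int, k: int, num_classes: int) -> int:
    """Entries in the full table P(x_1, ..., x_k | y): |X|^k * |Y|."""
    return num_values ** k * num_classes


def naive_bayes_table_size(num_values: int, k: int, num_classes: int) -> int:
    """Entries in k per-feature tables P(x_i | y): k * |X| * |Y|."""
    return k * num_values * num_classes


# 20 binary features, 2 classes (illustrative values only).
full = joint_table_size(2, 20, 2)
factorised = naive_bayes_table_size(2, 20, 2)
print(full, factorised)
```

With these illustrative numbers the full table has over two million entries, while the factorised form needs 80.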


Bayes Classifier

Learning step consists of estimating $P(x_i \mid y)\ \forall i \in \{1, 2, \dots, k\}$
Data with unknown class is classified by computing the y∗ that maximizes the posterior

$$y^* \in \arg\max_{y_{n+1} \in Y} P(y_{n+1}) \prod_{i=1}^{k} P(x_{i,n+1} \mid y_{n+1})$$

Note: Due to underflow, the above is usually replaced with the numerically tractable expression

$$y^* \in \arg\max_{y_{n+1} \in Y} \left[ \ln P(y_{n+1}) + \sum_{i=1}^{k} \ln P(x_{i,n+1} \mid y_{n+1}) \right]$$
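
The underflow problem is easy to reproduce. In the sketch below, two hypothetical classes each have 2000 features with small per-feature likelihoods (values invented for the example). The direct product collapses to 0.0 for both classes, so they cannot be ranked, while the log form still separates them.

```python
import math


def product_score(prior, likelihoods):
    """P(y) * prod_i P(x_i | y), computed directly in floating point."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


def log_score(prior, likelihoods):
    """ln P(y) + sum_i ln P(x_i | y)."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


k = 2000  # number of features (illustrative)
classes = {
    "a": (0.5, [0.01] * k),  # (prior, per-feature likelihoods)
    "b": (0.5, [0.02] * k),
}

products = {y: product_score(*v) for y, v in classes.items()}
logs = {y: log_score(*v) for y, v in classes.items()}
y_star = max(logs, key=logs.get)
print(products, y_star)
```

Because the logarithm is monotone, the arg max over the log scores is the same class the exact product would select.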


Example

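Classifying emails into spam or ham

Training set: n tuples that contain the text of the email and its class

$$
x_{i,j} = \begin{cases} 1 & \text{if word } i \text{ in email } j \\ 0 & \text{otherwise} \end{cases}
\qquad
y_j = \begin{cases} 1 & \text{if ham} \\ 0 & \text{if spam} \end{cases}
$$

Calculate likelihood of each word by class:

$$P(x_i \mid y = 1) = \frac{\sum_{j=1}^{n} x_{i,j} \cdot y_j}{\sum_{j=1}^{n} y_j}$$

$$P(x_i \mid y = 0) = \frac{\sum_{j=1}^{n} x_{i,j} \cdot (1 - y_j)}{\sum_{j=1}^{n} (1 - y_j)}$$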
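
The two ratios above can be computed directly from a word-by-email indicator matrix. The sketch below uses a tiny invented training set (three words, five emails); the word list and labels are assumptions made for illustration.

```python
# Hypothetical training set: x[i][j] = 1 if word i occurs in email j.
words = ["offer", "meeting", "free"]
x = [
    [1, 0, 1, 0, 1],  # "offer"
    [0, 1, 0, 1, 0],  # "meeting"
    [1, 0, 1, 1, 1],  # "free"
]
y = [0, 1, 0, 1, 0]  # y[j] = 1 if email j is ham, 0 if spam

n = len(y)
n_ham = sum(y)
n_spam = sum(1 - yj for yj in y)

# Fraction of ham emails containing word i, and likewise for spam.
p_word_given_ham = [
    sum(x[i][j] * y[j] for j in range(n)) / n_ham for i in range(len(words))
]
p_word_given_spam = [
    sum(x[i][j] * (1 - y[j]) for j in range(n)) / n_spam for i in range(len(words))
]
print(p_word_given_ham, p_word_given_spam)
```

Estimates of exactly 0 or 1, as some are here, make the logarithmic form of the classifier undefined. Additive (Laplace) smoothing is a common remedy, though it is not part of these slides.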


Example

Define prior, calculate numerator of posterior probability:

$$P(y_{n+1} = 1 \mid x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1}) \propto P(y_{n+1} = 1) \prod_{i=1}^{k} P(x_{i,n+1} \mid y_{n+1} = 1)$$

$$P(y_{n+1} = 0 \mid x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1}) \propto P(y_{n+1} = 0) \prod_{i=1}^{k} P(x_{i,n+1} \mid y_{n+1} = 0)$$

If $P(y_{n+1} = 1 \mid \vec{x}_{n+1}) > P(y_{n+1} = 0 \mid \vec{x}_{n+1})$, classify as ham.

If $P(y_{n+1} = 1 \mid \vec{x}_{n+1}) < P(y_{n+1} = 0 \mid \vec{x}_{n+1})$, classify as spam.
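
A minimal sketch of this decision rule follows. It assumes per-word presence probabilities are already available (the numbers are invented) and treats an absent word as contributing 1 − P(word present | class), which is one common reading of the product above rather than something the slides spell out. The function names are hypothetical.

```python
# Hypothetical inputs: P(word i present | class) and the class priors.
p_word = {
    "ham": [0.1, 0.6, 0.3],
    "spam": [0.7, 0.1, 0.8],
}
prior = {"ham": 0.5, "spam": 0.5}


def posterior_numerator(x_new, label):
    """P(y) * prod_i P(x_i | y), using 1 - p for words that are absent."""
    score = prior[label]
    for present, p in zip(x_new, p_word[label]):
        score *= p if present else (1.0 - p)
    return score


def classify(x_new):
    ham = posterior_numerator(x_new, "ham")
    spam = posterior_numerator(x_new, "spam")
    # The rule does not cover ties; this sketch arbitrarily returns "ham".
    return "ham" if ham >= spam else "spam"


print(classify([1, 0, 1]))
print(classify([0, 1, 0]))
```

Only the numerators are compared, since both posteriors share the same denominator.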


Naive Bayes in SQL

Why SQL?
  Standard language in a DBMS
  Eliminates need to understand and modify internal source

Drawbacks
  Limitations in manipulating vectors and matrices
  More overhead than systems languages (e.g. C)


Efficient SQL implementations of Naïve Bayes

Numeric attributes
  Binning is required (create k uniform intervals between min and max, or take intervals around the mean based on multiples of std dev)
  Two passes over the data set to transform numerical attributes to discrete ones
    First pass for minimum, maximum and mean
    Second pass for variance (due to numerical issues)

Discrete attributes
  We can compute histograms on each attribute with SQL aggregations
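
The two passes and the histogram step might look roughly like the sketch below, which runs plain SQL aggregations through Python's built-in `sqlite3` module. The table `points`, its columns `a` (numeric attribute) and `g` (class), and the data are all hypothetical; the slides do not give a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, a REAL, g TEXT)")
rows = [(1, 0.0, "ham"), (2, 2.5, "ham"), (3, 5.0, "spam"),
        (4, 7.5, "spam"), (5, 10.0, "spam"), (6, 1.0, "ham")]
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)

# Pass 1: minimum, maximum and mean in one aggregation.
lo, hi, mean = conn.execute(
    "SELECT MIN(a), MAX(a), AVG(a) FROM points").fetchone()

# Pass 2: variance taken around the mean found in pass 1.
(variance,) = conn.execute(
    "SELECT AVG((a - ?) * (a - ?)) FROM points", (mean, mean)).fetchone()

# Bin into k uniform intervals between min and max, then count rows
# per (class, bin). The maximum value is clamped into the last bin.
k = 4
width = (hi - lo) / k
histogram = conn.execute(
    """
    SELECT g, MIN(CAST((a - ?) / ? AS INTEGER), ? - 1) AS bin, COUNT(*)
    FROM points
    GROUP BY g, bin
    ORDER BY g, bin
    """,
    (lo, width, k),
).fetchall()
print(lo, hi, mean, variance)
print(histogram)
```

The per-class bin counts are the raw material for the likelihood tables: dividing each count by its class total gives an estimate of P(bin | class).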


Generalisations of Naive-Bayes

Bayesian K-means (BKM) is a generalisation of Naïve Bayes (NB)
NB has 1 cluster per class, BKM has k > 1 clusters per class
The class decomposition is found by the K-Means algorithm


K-Means algorithm

K-Means algorithm finds k clusters by choosing k data points at random as initial cluster centers. Each data point is then assigned to the cluster with center that is closest to that point.
Each cluster center is then replaced by the mean of all data points that have been assigned to that cluster
This process is iterated until no data point is reassigned to a different cluster.
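
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation discussed in the talk: the function name `k_means`, the `seed` parameter, the use of squared Euclidean distance, and the handling of empty clusters are all assumptions of this sketch.

```python
import random


def k_means(points, k, seed=0):
    """Return (centers, assignment) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k data points chosen at random
    assignment = [None] * len(points)
    while True:
        # Assign each point to the cluster whose center is closest
        # (squared Euclidean distance; ties go to the lowest index).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        if new_assignment == assignment:  # no point changed cluster: stop
            return centers, assignment
        assignment = new_assignment
        # Replace each center by the mean of the points assigned to it.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

On this well-separated toy data the two groups of three points end up in different clusters; which cluster gets which index depends on the random initial centers.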


Tables needed for Bayesian K-means


Example SQL queries for K-Means algorithm

The following SQL statement computes k distances for each point, corresponding to the gth class.
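
The statement from the slide is not included in this text. As a stand-in, the sketch below shows one way such a query could be written, run through Python's `sqlite3` module. The vertical table layouts `x(i, h, val)` for points and `c(g, j, h, val)` for the cluster centers of class g, the column names, and the data are all hypothetical, and the query returns squared Euclidean distances; the original statement may differ on every one of these points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (point, dimension) and per
# (class, cluster, dimension).
conn.executescript("""
    CREATE TABLE x (i INTEGER, h INTEGER, val REAL);
    CREATE TABLE c (g TEXT, j INTEGER, h INTEGER, val REAL);
""")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(1, 1, 0.0), (1, 2, 0.0),    # point 1 = (0, 0)
     (2, 1, 3.0), (2, 2, 4.0)])   # point 2 = (3, 4)
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?, ?)",
    [("ham", 1, 1, 0.0), ("ham", 1, 2, 0.0),    # cluster 1 of "ham" = (0, 0)
     ("ham", 2, 1, 3.0), ("ham", 2, 2, 0.0)])   # cluster 2 of "ham" = (3, 0)

g = "ham"
# For every point i and every cluster j of class g, sum the squared
# per-dimension differences: k distances per point.
distances = conn.execute(
    """
    SELECT x.i, c.j, SUM((x.val - c.val) * (x.val - c.val)) AS dist
    FROM x JOIN c ON x.h = c.h
    WHERE c.g = ?
    GROUP BY x.i, c.j
    ORDER BY x.i, c.j
    """,
    (g,),
).fetchall()
print(distances)
```

Each point then joins the cluster with the smallest distance, which in SQL is typically a second aggregation over this result.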


Results

Experiment with 4 real data sets, comparing NB, BKM, and decision trees (DT)
Numeric and discrete versions of Naïve Bayes had similar accuracy
BKM was more accurate than NB and similar to decision trees in global accuracy. However, BKM is more accurate when computing a breakdown of accuracy per class


Results

Low numbers of clusters produced good results
Equivalent implementation of NB in SQL and C++: SQL is four times slower
SQL queries were faster than User-Defined functions (SQL optimisations are important!)
NB and BKM exhibited linear scalability in data set size and dimensionality.
