Theory Naïve Bayes in SQL
Naïve Bayes Classification
Nickolai Riabov, Kenneth Tiong
Brown University
Fall 2013
Nickolai Riabov, Kenneth Tiong Naïve Bayes Classification
Structure of the Talk
Theory of Naïve Bayes classification
Naïve Bayes in SQL
Notation
X – Set of features of the data
Y – Set of classes of the data
Bayes’ Theorem
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$
P(y) – Prior probability of being in class y
P(x) – Probability of features x
P(x|y) – Likelihood of features x given class y
P(y|x) – Posterior probability of y
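As a sanity check, the theorem can be evaluated on a small made-up example (all probabilities below are invented for illustration):

```python
# Worked check of Bayes' theorem with made-up numbers:
# 20% of emails are spam, and the word "offer" appears in 50% of
# spam emails and in 5% of ham emails.
p_spam = 0.2                 # prior P(y = spam)
p_word_given_spam = 0.5      # likelihood P(x | y = spam)
p_word_given_ham = 0.05      # likelihood P(x | y = ham)

# P(x) by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(y = spam | x) = P(x | y = spam) P(y = spam) / P(x)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)     # 0.1 / 0.14, roughly 0.71
```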
Maximum a posteriori estimate
Based on Bayes’ theorem, we can compute which of the classes y maximizes the posterior probability:
\begin{align*}
y^* &= \arg\max_{y \in Y} P(y \mid x) \\
    &= \arg\max_{y \in Y} \frac{P(x \mid y)\,P(y)}{P(x)} \\
    &= \arg\max_{y \in Y} P(x \mid y)\,P(y)
\end{align*}
(Note: we can drop P(x) since it is common to all posteriors)
Commonality with maximum likelihood
Assume that all classes are equally likely a priori:
$$P(y) = \frac{1}{|Y|} \quad \forall\, y \in Y$$
Then,
$$y^* = \arg\max_{y \in Y} P(x \mid y)$$
That is, $y^*$ is the y that maximizes the likelihood function.
Desirable Properties of the Bayes Classifier
- Incrementality: each new element of the training set leads to an update in the likelihood function. This makes the estimator robust.
- Combines prior knowledge and observed data.
- Outputs a probability distribution in addition to a classification.
Bayes Classifier
Assumption: the training set consists of instances of different classes y that are functions of features x. (In this case, assume each point has k features, and there are n points in the training set.)
Task: classify a new point $x_{:,n+1}$ as belonging to a class $y_{n+1} \in Y$ on the basis of its features by using a MAP classifier:
$$y^* \in \arg\max_{y_{n+1} \in Y} P(x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1} \mid y_{n+1})\,P(y_{n+1})$$
Bayes Classifier
- P(y) can either be externally specified (i.e. it can actually be a prior), or can be estimated as the frequency of classes in the training set.
- $P(x_1, x_2, \dots, x_k \mid y)$ has $O(|X|^k\,|Y|)$ parameters – it can only be estimated with a very large number of data points.
Bayes Classifier
We can reduce the dimensionality of the problem by assuming that features are conditionally independent given the class (this is the Naïve Bayes assumption):
$$P(x_1, x_2, \dots, x_k \mid y) = \prod_{i=1}^{k} P(x_i \mid y)$$
Now there are only $O(|X|\,|Y|)$ parameters to estimate.
If the distribution of $x_1, \dots, x_k \mid y$ is continuous, this result is even more important: $P(x_1, x_2, \dots, x_k \mid y)$ would have to be estimated nonparametrically, and nonparametric methods are very sensitive to high-dimensional problems.
Bayes Classifier
The learning step consists of estimating $P(x_i \mid y)$ for all $i \in \{1, 2, \dots, k\}$.
Data with an unknown class are classified by computing the $y^*$ that maximizes the posterior:
$$y^* \in \arg\max_{y_{n+1} \in Y} P(y_{n+1}) \prod_{i=1}^{k} P(x_{n+1,i} \mid y_{n+1})$$
Note: due to underflow, the above is usually replaced with the numerically tractable expression
$$y^* \in \arg\max_{y_{n+1} \in Y} \ln P(y_{n+1}) + \sum_{i=1}^{k} \ln P(x_{n+1,i} \mid y_{n+1})$$
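The log-space rule can be sketched in a few lines of plain Python (the priors and likelihood tables below are invented for illustration; in practice they come from the learning step):

```python
import math

# Hypothetical learned parameters for a 2-class, 3-feature problem.
# priors[y] = P(y); likelihood[y][i][v] = P(x_i = v | y) for binary features.
priors = {0: 0.5, 1: 0.5}
likelihood = {
    0: [{0: 0.9, 1: 0.1}, {0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2}],
    1: [{0: 0.2, 1: 0.8}, {0: 0.4, 1: 0.6}, {0: 0.3, 1: 0.7}],
}

def classify(x):
    """Return the class maximizing ln P(y) + sum_i ln P(x_i | y)."""
    def log_posterior(y):
        return math.log(priors[y]) + sum(
            math.log(likelihood[y][i][v]) for i, v in enumerate(x))
    return max(priors, key=log_posterior)

print(classify([1, 1, 1]))  # mostly-1 features favor class 1
print(classify([0, 0, 0]))  # mostly-0 features favor class 0
```

Summing logs instead of multiplying probabilities gives the same argmax while avoiding underflow when k is large.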
Example
Classifying emails into spam or ham
Training set: n tuples that contain the text of the email and its class.
$$x_{i,j} = \begin{cases} 1 & \text{if word } i \text{ appears in email } j \\ 0 & \text{otherwise} \end{cases} \qquad y_j = \begin{cases} 1 & \text{if ham} \\ 0 & \text{if spam} \end{cases}$$
Calculate likelihood of each word by class:
$$P(x_i \mid y = 1) = \frac{\sum_{j=1}^{n} x_{i,j}\, y_j}{\sum_{j=1}^{n} y_j} \qquad P(x_i \mid y = 0) = \frac{\sum_{j=1}^{n} x_{i,j}\,(1 - y_j)}{\sum_{j=1}^{n} (1 - y_j)}$$
Example
Define prior, calculate numerator of posterior probability:
$$P(y_{n+1} = 1 \mid x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1}) \propto P(y_{n+1} = 1) \prod_{i=1}^{k} P(x_{i,n+1} \mid y_{n+1} = 1)$$
$$P(y_{n+1} = 0 \mid x_{1,n+1}, x_{2,n+1}, \dots, x_{k,n+1}) \propto P(y_{n+1} = 0) \prod_{i=1}^{k} P(x_{i,n+1} \mid y_{n+1} = 0)$$
If $P(y_{n+1} = 1 \mid \vec{x}_{n+1}) > P(y_{n+1} = 0 \mid \vec{x}_{n+1})$, classify as ham.
If $P(y_{n+1} = 1 \mid \vec{x}_{n+1}) < P(y_{n+1} = 0 \mid \vec{x}_{n+1})$, classify as spam.
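The count formulas for the likelihoods can be sketched on a toy word matrix (the data below are invented for illustration; a real filter would also smooth zero counts, e.g. with Laplace smoothing):

```python
# Toy training set: x[i][j] = 1 if word i appears in email j, else 0;
# y[j] = 1 for ham, 0 for spam.
x = [
    [1, 1, 1, 0],   # word 0: both ham emails and one spam email
    [0, 0, 1, 1],   # word 1: only the two spam emails
]
y = [1, 1, 0, 0]    # first two emails ham, last two spam
n = len(y)

def likelihoods(i):
    """P(x_i | y=1) and P(x_i | y=0) via the count formulas above."""
    p_ham = sum(x[i][j] * y[j] for j in range(n)) / sum(y)
    p_spam = sum(x[i][j] * (1 - y[j]) for j in range(n)) / sum(1 - yj for yj in y)
    return p_ham, p_spam

print(likelihoods(0))  # word 0: in 2 of 2 ham, 1 of 2 spam -> (1.0, 0.5)
print(likelihoods(1))  # word 1: in 0 of 2 ham, 2 of 2 spam -> (0.0, 1.0)
```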
Naïve Bayes in SQL
Why SQL?
- Standard language in a DBMS
- Eliminates the need to understand and modify internal source code
Drawbacks:
- Limitations in manipulating vectors and matrices
- More overhead than systems languages (e.g. C)
Efficient SQL implementations of Naïve Bayes
Numeric attributes:
- Binning is required (create k uniform intervals between min and max, or take intervals around the mean based on multiples of the standard deviation)
- Two passes over the data set are needed to transform numerical attributes into discrete ones: a first pass for the minimum, maximum, and mean, and a second pass for the variance (due to numerical issues)
Discrete attributes:
- We can compute histograms on each attribute with SQL aggregations
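The discrete-attribute histogram can be sketched with a single SQL aggregation (sqlite3 in-memory database; the table and column names are made up for illustration):

```python
import sqlite3

# Hypothetical training table: one discrete attribute x1 plus the class label y.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (x1 TEXT, y INTEGER)")
con.executemany("INSERT INTO train VALUES (?, ?)", [
    ("a", 1), ("a", 1), ("b", 1), ("a", 0), ("b", 0), ("b", 0),
])

# Histogram of x1 per class; dividing each cell count by the per-class
# total turns the histogram into an estimate of P(x1 | y).
rows = con.execute("""
    SELECT t1.y, t1.x1, COUNT(*) * 1.0 / tot.n AS p
    FROM train t1
    JOIN (SELECT y, COUNT(*) AS n FROM train GROUP BY y) tot
      ON tot.y = t1.y
    GROUP BY t1.y, t1.x1, tot.n
    ORDER BY t1.y, t1.x1
""").fetchall()
print(rows)  # one (class, value, probability) row per histogram cell
```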
Generalisations of Naive-Bayes
- Bayesian K-means (BKM) is a generalisation of Naïve Bayes (NB)
- NB has 1 cluster per class; BKM has k > 1 clusters per class
- The class decomposition is found by the K-means algorithm
K-Means algorithm
The K-means algorithm finds k clusters by choosing k data points at random as initial cluster centers. Each data point is then assigned to the cluster whose center is closest to that point. Each cluster center is then replaced by the mean of all data points that have been assigned to that cluster. This process is iterated until no data point is reassigned to a different cluster.
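The steps above can be sketched in plain Python (1-D points and a fixed seed for reproducibility; a production implementation would vectorize the distance computations and cap the number of iterations):

```python
import random

def kmeans(points, k, seed=0):
    """Lloyd's algorithm as described above: assign, re-center, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k random points as initial centers
    assignment = None
    while True:
        # Assign each point to the cluster whose center is closest.
        new_assignment = [min(range(k), key=lambda c: abs(p - centers[c]))
                          for p in points]
        if new_assignment == assignment:     # no point changed cluster: stop
            return centers, assignment
        assignment = new_assignment
        # Replace each center by the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)

centers, labels = kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2)
print(sorted(centers))  # one center near 1, one near 9
```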
Tables needed for Bayesian K-means
Example SQL queries for K-Means algorithm
The following SQL statement computes k distances for each point, corresponding to the gth class.
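The slide's SQL statement is not preserved in this transcript; a sketch of such a distance computation, assuming hypothetical tables X (points) and C (the k cluster centers of each class g) in an sqlite3 in-memory database, might look like:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: X holds 2-D points; C holds k centers per class g.
con.execute("CREATE TABLE X (i INTEGER, x1 REAL, x2 REAL)")
con.execute("CREATE TABLE C (g INTEGER, j INTEGER, c1 REAL, c2 REAL)")
con.executemany("INSERT INTO X VALUES (?, ?, ?)",
                [(1, 0.0, 0.0), (2, 3.0, 4.0)])
con.executemany("INSERT INTO C VALUES (?, ?, ?, ?)",
                [(0, 1, 0.0, 0.0), (0, 2, 3.0, 0.0)])

# Squared Euclidean distance from every point to each of class g's k
# centers: the cross join yields one row per (point, center) pair.
rows = con.execute("""
    SELECT X.i, C.j,
           (X.x1 - C.c1)*(X.x1 - C.c1) + (X.x2 - C.c2)*(X.x2 - C.c2) AS d2
    FROM X, C
    WHERE C.g = 0
    ORDER BY X.i, C.j
""").fetchall()
print(rows)  # (point id, center id, squared distance) triples
```

Taking the argmin of d2 per point id then gives the cluster assignment step of K-means.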
Results
- Experiments with 4 real data sets, comparing NB, BKM, and decision trees (DT)
- Numeric and discrete versions of Naïve Bayes had similar accuracy
- BKM was more accurate than NB and similar to decision trees in global accuracy; however, BKM is more accurate when computing a breakdown of accuracy per class
Results
- Low numbers of clusters produced good results
- Equivalent implementations of NB in SQL and C++: SQL is four times slower
- SQL queries were faster than user-defined functions (SQL optimisations are important!)
- NB and BKM exhibited linear scalability in data set size and dimensionality