Classification2_DMM


Data Mining

[email protected]

Jalali.mshdiau.ac.ir


Data Mining

Bayesian Classification


Bayesian Learning (Naïve Bayesian Classifier)

A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.

Foundation: based on Bayes' theorem.

Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.

Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct.

When to use:
A moderate or large training set is available.
The attributes that describe instances are conditionally independent given the classification.

Successful applications:
Diagnosis
Classifying text documents


Bayes' Theorem: Basics

Let X be a data sample ("evidence"): its class label is unknown.

Let H be the hypothesis that X belongs to class C.

Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X.

P(H) (prior probability): the initial probability.
E.g., X will buy a computer, regardless of age, income, ...

P(X): the probability that the sample data is observed.

P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds.
E.g., given that X will buy a computer, the probability that X is 31..40 with medium income.


Bayes' Theorem

Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

Predict that X belongs to class Ci if the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes.

Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost.
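A quick numeric illustration of the formula; the numbers below are invented for illustration only and do not come from the slides:

```python
# Hypothetical values, only to illustrate Bayes' theorem
p_h = 0.3          # prior P(H)
p_x_given_h = 0.5  # likelihood P(X|H)
p_x = 0.4          # evidence P(X)

p_h_given_x = p_x_given_h * p_h / p_x  # Bayes' theorem
print(p_h_given_x)                     # posterior P(H|X) = 0.375
```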


Towards the Naïve Bayesian Classifier

Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).

Suppose there are m classes C1, C2, ..., Cm.

Classification is to derive the maximum a posteriori class, i.e., the Ci with maximal P(Ci|X).

This can be derived from Bayes' theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Since P(X) is constant for all classes, only

P(X|Ci) P(Ci)

needs to be maximized.


Derivation of the Naïve Bayes Classifier

A simplifying assumption: attributes are conditionally independent (i.e., there is no dependence relation between attributes given the class):

P(X|Ci) = ∏ P(xk|Ci), k = 1..n = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)

This greatly reduces the computation cost: only the class distribution needs to be counted.

If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D).

If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) σ)) exp( -(x - μ)² / (2σ²) )

and P(xk|Ci) is

P(xk|Ci) = g(xk, μ_Ci, σ_Ci)
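A minimal Python sketch of these two estimates; the attribute names and tuples below are illustrative and not taken from the slides:

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for continuous attributes."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def categorical_likelihood(value, attribute, class_tuples):
    """P(xk | Ci) for a categorical attribute: matching tuples / |Ci,D|."""
    matches = sum(1 for t in class_tuples if t[attribute] == value)
    return matches / len(class_tuples)

# Illustrative tuples belonging to one class Ci (made-up attribute names):
ci_tuples = [
    {"student": "yes", "age_years": 35},
    {"student": "no",  "age_years": 42},
    {"student": "yes", "age_years": 38},
]

# Categorical attribute: simple counting
print(categorical_likelihood("yes", "student", ci_tuples))   # 2/3

# Continuous attribute: fit a Gaussian to the class, then evaluate it
ages = [t["age_years"] for t in ci_tuples]
mu = sum(ages) / len(ages)
sigma = math.sqrt(sum((a - mu) ** 2 for a in ages) / len(ages))
print(gaussian(40, mu, sigma))   # P(age_years = 40 | Ci) under the Gaussian model
```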


Naïve Bayesian Classifier: Training Dataset

Classes:
C1: buys_computer = yes
C2: buys_computer = no

Data sample to classify:
X = (age <=30, income = medium, student = yes, credit_rating = fair)

Training data D (14 tuples):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no


Naïve Bayesian Classifier: An Example

P(Ci):
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no)  = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age <=30 | buys_computer = yes) = 2/9 = 0.222
P(age <=30 | buys_computer = no) = 3/5 = 0.600
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.400
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

For X = (age <=30, income = medium, student = yes, credit_rating = fair):
P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = no)  = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

P(X|Ci) × P(Ci):
P(X | yes) × P(yes) = 0.044 × 0.643 = 0.028
P(X | no)  × P(no)  = 0.019 × 0.357 = 0.007

Therefore, X belongs to the class buys_computer = yes.
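The same computation can be scripted; the sketch below assumes the 14-tuple training set shown earlier and is only one possible way to reproduce the numbers:

```python
from collections import Counter

# Training tuples as (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")   # sample to classify

priors = Counter(t[-1] for t in data)   # class counts: yes = 9, no = 5
scores = {}
for c, count in priors.items():
    tuples_c = [t for t in data if t[-1] == c]
    score = count / len(data)           # P(Ci)
    for k, value in enumerate(x):       # naive conditional independence
        score *= sum(1 for t in tuples_c if t[k] == value) / len(tuples_c)
    scores[c] = score

print(scores)                        # yes ≈ 0.028, no ≈ 0.007
print(max(scores, key=scores.get))   # 'yes' -> buys_computer = yes
```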


Avoiding the 0-Probability Problem

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero, since

P(X|Ci) = ∏ P(xk|Ci), k = 1..n

Example: suppose a dataset with 1000 tuples has income = low (0 tuples), income = medium (990 tuples), and income = high (10 tuples).

Use the Laplacian correction (or Laplace estimator): add 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
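A minimal sketch of the Laplacian correction, using the income counts from the example above:

```python
def laplace_smoothed_probs(counts, correction=1):
    """Add `correction` to each category count before normalizing,
    so no conditional probability can be exactly zero."""
    total = sum(counts.values()) + correction * len(counts)
    return {value: (c + correction) / total for value, c in counts.items()}

income_counts = {"low": 0, "medium": 990, "high": 10}
print(laplace_smoothed_probs(income_counts))
# low = 1/1003 ≈ 0.001, medium = 991/1003 ≈ 0.988, high = 11/1003 ≈ 0.011
```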


Naïve Bayesian Classifier: Comments

Advantages:
Easy to implement
Good results obtained in most of the cases

Disadvantages:
Assumes class-conditional independence, which can cause a loss of accuracy
In practice, dependencies exist among variables
E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
Dependencies among these cannot be modeled by the naïve Bayesian classifier

How to deal with these dependencies? Bayesian belief networks


Bayesian Belief Networks

A Bayesian belief network allows a subset of the variables to be conditionally independent. (Also called Bayes nets, directed graphical models, BNs, ...)

A graphical model of causal relationships:
Represents dependencies among the variables
Gives a specification of the joint probability distribution

[Example graph: arcs X → Z, Y → Z, Y → P]
Nodes: variables
Links: dependencies
X and Y are the parents of Z, and Y is the parent of P
There is no dependency between Z and P
The graph has no loops or cycles


Bayesian Belief Networks

Simple (naïve) Bayesian:
[Graph: concept C is the single parent of X1, X2, ..., Xn]
P(x1, x2, ..., xn, c) = P(c) P(x1|c) P(x2|c) ... P(xn|c)

Bayesian belief network:
[Graph: concept C is a parent of X1, X2, X3, X4; X1 and X2 are additional parents of X3]
P(x1, x2, x3, x4, c) = P(c) P(x1|c) P(x2|c) P(x3|x1, x2, c) P(x4|c)


Bayesian Belief Network: An Example

[Example network: FamilyHistory → LungCancer, Smoker → LungCancer, Smoker → Emphysema, LungCancer → PositiveXRay, LungCancer → Dyspnea, Emphysema → Dyspnea]

Nodes = variables
Arcs = dependencies; the direction of an arc represents causality

The conditional probability table (CPT) for the variable LungCancer:

      (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC      0.8      0.5       0.7       0.1
~LC     0.2      0.5       0.3       0.9

The CPT shows the conditional probability for each possible combination of values of its parents.

Derivation of the probability of a particular combination of values of X from the CPT:

P(x1, ..., xn) = ∏ P(xi | Parents(Yi)), i = 1..n
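A small sketch of the joint-probability derivation. Only the LungCancer CPT comes from the slide; the priors assumed for FamilyHistory and Smoker are made-up values for illustration:

```python
# Hypothetical priors (NOT from the slides) for the root nodes:
p_fh = 0.10   # P(FamilyHistory)
p_s  = 0.30   # P(Smoker)

# CPT for LungCancer, from the slide: P(LC | FH, S)
p_lc_given = {
    (True,  True):  0.8,
    (True,  False): 0.5,
    (False, True):  0.7,
    (False, False): 0.1,
}

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S),
    following the factorization P(x1, ..., xn) = prod_i P(xi | Parents(xi))."""
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    p_lc = p_lc_given[(fh, s)]
    return p * (p_lc if lc else 1 - p_lc)

# E.g., probability of "no family history, smoker, lung cancer":
print(joint(False, True, True))   # 0.9 * 0.3 * 0.7 = 0.189
```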





Text Classification

Each document is a vector of the frequencies of all the words in the document.

Word    sqrt  bit  chip  log  web  limit  cpu  equal  class
Doc1      42   25     7   56    2     66    1     89  Math
Doc2      10   28    45    2   69      8   56     40  Comp
Doc3      11   25    22    4   40      9   44     34  Comp
Doc4      33   40     8   48    3     40    4     77  Math
Doc5      28   32     9   60    6     29    7     60  Math
Doc6       8   22    30    1   29      3   40     33  Comp

Class attribute   Meaning
Math              Maths related
Comp              Computers related

New document to classify:
x = (sqrt = 25, bit = 30, chip = 25, Web = 15, limit = 20, CPU = 12, equal = 30)
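One possible way to work this exercise is a multinomial naïve Bayes model over the word frequencies. The choice of model, the add-one (Laplace) smoothing, and treating the missing "log" count in x as 0 are assumptions, not stated on the slide:

```python
import math

WORDS = ["sqrt", "bit", "chip", "log", "web", "limit", "cpu", "equal"]

DOCS = [  # word-frequency vectors from the table, plus the class label
    ([42, 25,  7, 56,  2, 66,  1, 89], "Math"),
    ([10, 28, 45,  2, 69,  8, 56, 40], "Comp"),
    ([11, 25, 22,  4, 40,  9, 44, 34], "Comp"),
    ([33, 40,  8, 48,  3, 40,  4, 77], "Math"),
    ([28, 32,  9, 60,  6, 29,  7, 60], "Math"),
    ([ 8, 22, 30,  1, 29,  3, 40, 33], "Comp"),
]

# New document; "log" is absent from x on the slide, so its count is taken as 0 here.
x = {"sqrt": 25, "bit": 30, "chip": 25, "web": 15, "limit": 20, "cpu": 12, "equal": 30}
x_counts = [x.get(w, 0) for w in WORDS]

def class_log_score(label):
    """log P(Ci) + sum_k count_k * log P(word_k | Ci), multinomial model
    with add-one smoothing on the per-class word totals."""
    class_docs = [v for v, c in DOCS if c == label]
    log_prior = math.log(len(class_docs) / len(DOCS))
    word_totals = [sum(v[k] for v in class_docs) for k in range(len(WORDS))]
    denom = sum(word_totals) + len(WORDS)   # +1 per word for smoothing
    return log_prior + sum(
        x_counts[k] * math.log((word_totals[k] + 1) / denom)
        for k in range(len(WORDS))
    )

scores = {c: class_log_score(c) for c in ("Math", "Comp")}
print(scores, max(scores, key=scores.get))   # higher log-score wins
```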