K2 Algorithm Presentation: Learning Bayes Networks from Data. Haipeng Guo. Friday, April 21, 2000. KDD Lab, CIS Department, KSU.




Page 1:

K2 Algorithm Presentation

Learning Bayes Networks from Data

Haipeng Guo, Friday, April 21, 2000

KDD Lab, CIS Department, KSU

Page 2:

Presentation Outline

• Bayes Networks Introduction

• What’s K2?

• Basic Model and the Score Function

• K2 algorithm

• Demo

Page 3:

• A Bayes network B = (Bs, Bp)

• A Bayes network structure Bs is a directed acyclic graph in which nodes represent random domain variables and arcs between nodes represent probabilistic dependence.

• Bs is augmented by conditional probabilities, Bp, to form a Bayes network B.

Bayes Networks Introduction

Page 4:

Bayes Networks Introduction

• Example (Sprinkler): Bs, the structure of the Bayes network

[Figure: a directed acyclic graph over five nodes, with x1 = Season, x2 = Sprinkler, x3 = Rain, x4 = Ground_moist, x5 = Ground_state]

Page 5:

Bayes Networks Introduction

- Bp of the Bayes network: the conditional probabilities

Season:
P(Spring)  P(Summer)  P(Fall)  P(Winter)
0.25       0.25       0.25     0.25

Sprinkler:
Season   P(on)  P(off)
Spring   0.75   0.25
Summer   1      0
Fall     0.75   0.25
Winter   0.25   0.75

(similar tables give the conditional probabilities of Rain, Ground_moist, and Ground_state)
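As a quick sanity check (a sketch; the variable names are mine, not from the slides), the two tables above are enough to marginalize Season out of the Sprinkler node:

```python
# The Season prior and the Sprinkler CPT from the tables above.
p_season = {"Spring": 0.25, "Summer": 0.25, "Fall": 0.25, "Winter": 0.25}
p_on_given_season = {"Spring": 0.75, "Summer": 1.0, "Fall": 0.75, "Winter": 0.25}

# P(Sprinkler = on) = sum over seasons of P(season) * P(on | season)
p_on = sum(p_season[s] * p_on_given_season[s] for s in p_season)
print(p_on)  # 0.6875
```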

Page 6:

What’s K2?

• K2 is an algorithm for constructing a Bayes Network from a database of records

• “A Bayesian Method for the Induction of Probabilistic Networks from Data”, Gregory F. Cooper and Edward Herskovits, Machine Learning 9, 1992

Page 7:

Basic Model

• The problem: to find the most probable Bayes network structure given a database

• D – a database of cases

• Z – the set of variables represented by D

• Bsi, Bsj – two Bayes network structures containing exactly those variables that are in Z

Page 8:

Basic Model

P(Bsi | D) / P(Bsj | D) = [P(Bsi, D) / P(D)] / [P(Bsj, D) / P(D)] = P(Bsi, D) / P(Bsj, D)

• By computing such ratios for pairs of Bayes network structures, we can rank-order a set of structures by their posterior probabilities.

• Based on four assumptions, the paper introduces an efficient formula for computing P(Bs, D). Let Bs represent an arbitrary Bayes network structure containing just the variables in D.
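Because P(D) cancels in the ratio, only the joint scores P(Bs, D) are needed to compare structures. A tiny sketch with hypothetical log-score values (the numbers are illustrative, not from the slides):

```python
from math import exp

# Hypothetical log P(Bs, D) values for two candidate structures.
log_p_bs1_d = -12.4
log_p_bs2_d = -15.1

# P(Bs1 | D) / P(Bs2 | D) = P(Bs1, D) / P(Bs2, D)
ratio = exp(log_p_bs1_d - log_p_bs2_d)
print(ratio)  # about 14.9, so Bs1 is roughly 15 times more probable a posteriori
```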

Page 9:

Computing P(Bs, D)

• Assumption 1: The database variables, which we denote as Z, are discrete.

• Assumption 2: Cases occur independently, given a Bayes network model.

• Assumption 3: There are no cases that have variables with missing values.

• Assumption 4: The density function f(Bp | Bs) is uniform. Bp is a vector whose values denote the conditional-probability assignments associated with structure Bs.

Page 10:

Computing P(Bs, D)

D – the dataset; it has m cases (records)
Z – a set of n discrete variables (x1, ..., xn)
ri – a variable xi in Z has ri possible value assignments: (vi1, ..., vi_ri)
Bs – a Bayes network structure containing just the variables in Z
πi – each variable xi in Bs has a set of parents, which we represent with a list of variables πi
qi – there are qi unique instantiations of πi
wij – the jth unique instantiation of πi relative to D
Nijk – the number of cases in D in which variable xi has the value vik and πi is instantiated as wij
Nij – Nij = Σ_{k=1}^{ri} Nijk

P(Bs, D) = P(Bs) ∏_{i=1}^{n} ∏_{j=1}^{qi} [ ((ri − 1)! / (Nij + ri − 1)!) ∏_{k=1}^{ri} Nijk! ]
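To make the formula concrete, here is one (i, j) factor worked out with toy counts of my own (not from the slides): a binary variable (ri = 2) whose counts under one parent instantiation are Nij1 = 3 and Nij2 = 2, so Nij = 5. P(Bs, D) is the product of such factors over all i and j, times the prior P(Bs):

```python
from math import factorial

# (ri - 1)! / (Nij + ri - 1)! * Nij1! * Nij2!  with ri = 2, Nij = 5
factor = factorial(2 - 1) / factorial(5 + 2 - 1) * factorial(3) * factorial(2)
print(factor)  # 12/720 = 0.016666...
```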

Page 11:

Decrease the computational complexity

Three more assumptions to decrease the computational complexity to polynomial time:

<1> There is an ordering on the nodes such that if xi precedes xj, then we do not allow structures in which there is an arc from xj to xi.

<2> There exists a sufficiently tight limit on the number of parents of any node.

<3> P(πi → xi) and P(πj → xj) are independent when i ≠ j.

max_{Bs} [P(Bs, D)] = ∏_{i=1}^{n} max_{πi} [ P(πi → xi) ∏_{j=1}^{qi} ((ri − 1)! / (Nij + ri − 1)!) ∏_{k=1}^{ri} Nijk! ]

Page 12:

K2 algorithm: a heuristic search method

Use the following functions:

g(i, πi) = ∏_{j=1}^{qi} ((ri − 1)! / (Nij + ri − 1)!) ∏_{k=1}^{ri} Nijk!

Where the Nijk are relative to πi being the parents of xi and relative to a database D

Pred(xi) = {x1, ..., xi−1}

It returns the set of nodes that precede xi in the node ordering
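As a sketch (the function and variable names are mine), g(i, πi) can be computed exactly from a database by counting the Nijk, here with exact rational arithmetic:

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial

def g(i, parents, data, r):
    """g(i, parents) per the formula above, computed exactly.
    data: list of cases (tuples of value indices 0..r[v]-1);
    r[v]: number of possible values of variable v."""
    counts = defaultdict(lambda: [0] * r[i])  # Nijk per parent instantiation j
    for case in data:
        j = tuple(case[p] for p in parents)
        counts[j][case[i]] += 1
    result = Fraction(1)
    for N_ijk in counts.values():
        N_ij = sum(N_ijk)
        result *= Fraction(factorial(r[i] - 1), factorial(N_ij + r[i] - 1))
        for n in N_ijk:
            result *= factorial(n)
    return result

# Five cases of a single binary variable with counts (3, 2):
print(g(0, [], [(0,), (0,), (1,), (0,), (1,)], [2]))  # 1/60
```

This matches the single factor (2 − 1)! · 3! · 2! / (5 + 2 − 1)! = 12/720 = 1/60.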

Page 13:

{Input: A set of nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases}

{Output: For each node, a printout of the parents of the node}

K2 algorithm: a heuristic search method

Page 14:

Procedure K2
For i := 1 to n do
    πi := ∅;
    Pold := g(i, πi);
    OKToProceed := true;
    while OKToProceed and |πi| < u do
        let z be the node in Pred(xi) − πi that maximizes g(i, πi ∪ {z});
        Pnew := g(i, πi ∪ {z});
        if Pnew > Pold then
            Pold := Pnew;
            πi := πi ∪ {z};
        else OKToProceed := false;
    end {while}
    write("Node: xi", "parents of this node: ", πi);
end {for}
end {K2}

K2 algorithm: a heuristic search method
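The procedure above can be sketched in Python (a minimal sketch; function names, the 0-based encoding, and the log-space scoring are my choices, with `lgamma(n + 1) == log(n!)` so large factorials never overflow). It is run here on the 10-case demo dataset from the slides, encoding present = 1 and absent = 0:

```python
from collections import defaultdict
from math import lgamma

def log_g(i, parents, data, r):
    """log g(i, parents): the slide's score in log space."""
    counts = defaultdict(lambda: [0] * r[i])  # Nijk per parent instantiation
    for case in data:
        counts[tuple(case[p] for p in parents)][case[i]] += 1
    total = 0.0
    for N_ijk in counts.values():
        N_ij = sum(N_ijk)
        total += lgamma(r[i]) - lgamma(N_ij + r[i])  # log[(ri-1)!/(Nij+ri-1)!]
        total += sum(lgamma(n + 1) for n in N_ijk)   # log[prod_k Nijk!]
    return total

def k2(order, u, data, r):
    """order: variable indices in the assumed node ordering;
    u: upper bound on the number of parents;
    data: list of cases (tuples of value indices);
    r[v]: number of values of variable v. Returns the parents per node."""
    parents = {i: [] for i in order}
    for pos, i in enumerate(order):
        p_old = log_g(i, parents[i], data, r)
        ok_to_proceed = True
        while ok_to_proceed and len(parents[i]) < u:
            candidates = [z for z in order[:pos] if z not in parents[i]]
            if not candidates:
                break
            z = max(candidates, key=lambda z: log_g(i, parents[i] + [z], data, r))
            p_new = log_g(i, parents[i] + [z], data, r)
            if p_new > p_old:
                p_old = p_new
                parents[i].append(z)
            else:
                ok_to_proceed = False
        print("Node:", i, "parents of this node:", parents[i])
    return parents

# The 10-case demo dataset from the slides (present = 1, absent = 0):
data = [(1, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 1), (0, 0, 0),
        (0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0)]
k2([0, 1, 2], 2, data, [2, 2, 2])  # recovers parents {0: [], 1: [0], 2: [1]}
```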

Page 15:

Conditional probabilities

• Let θijk denote the conditional probability P(xi = vik | πi = wij); that is, the probability that xi has value vik, for some k from 1 to ri, given that the parents of xi, represented by πi, are instantiated as wij. We call θijk a network conditional probability.

• Let ξ denote the four assumptions.

• The expected value of θijk:

E[θijk | D, Bs, ξ] = (Nijk + 1) / (Nij + ri)
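A one-line sketch of that estimator (the helper name and example counts are mine, not from the slides):

```python
def expected_theta(n_ijk, n_ij, r_i):
    """E[theta_ijk | D, Bs, xi (the four assumptions)] = (Nijk + 1) / (Nij + ri)."""
    return (n_ijk + 1) / (n_ij + r_i)

# e.g. a binary variable (ri = 2) seen in state k in 3 of 5 matching cases:
print(expected_theta(3, 5, 2))  # (3 + 1) / (5 + 2) = 4/7
```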

Page 16:

Demo Example

Input:

Case  x1       x2       x3
1     present  absent   absent
2     present  present  present
3     absent   absent   present
4     present  present  present
5     absent   absent   absent
6     absent   present  present
7     present  present  present
8     absent   absent   absent
9     present  present  present
10    absent   absent   absent

The dataset is generated from the following structure:

x1 → x2 → x3

Page 17:

Demo Example

Note:

-- use log[g(i, πi)] instead of g(i, πi) to save running time