An Introduction to Bayesian Networks: Concepts and Learning from Data
Jeong-Ho Chang, Seoul National University
[email protected]




  • Slide 2: Outline

    Introduction
    Basic Concepts of Bayesian Networks
    Learning Bayesian Networks
      Parameter Learning
      Structural Learning
    Applications
      DNA microarray
      Classification
      Dependency Analysis
    Summary

  • Slide 3: Bayesian Networks

    A compact representation of a joint probability distribution over variables, based on the concept of conditional independence.
    Qualitative part (graph theory): a directed acyclic graph.
      Nodes: variables. Edges: dependency or influence of one variable on another.
    Quantitative part (probability theory): a set of conditional probabilities, one for each variable.
    Naturally handles the problems of complexity and uncertainty.

  • Slide 4: Bayesian Networks (Cont'd)

    A BN encodes probabilistic relationships among a set of objects or variables.
    It is useful in that it:
      encodes the dependencies among all variables: a modular representation of knowledge;
      can be used for learning causal relationships, which is helpful in understanding a problem domain;
      has both a causal and a probabilistic semantics;
      can naturally combine prior knowledge and data;
      provides an efficient and principled approach for avoiding overfitting, in conjunction with Bayesian statistical methods.

  • Slide 5: BN as a Probabilistic Graphical Model

    Graphical models come in two flavors:
      directed graphs: Bayesian networks
      undirected graphs: Markov random fields
    [Figure: two graphs over the nodes A, B, C, D, E, one directed (a Bayesian network) and one undirected (a Markov random field).]

  • Slide 6: Representation of Joint Probability

    Markov random field (undirected): the joint probability factorizes over cliques $Z_1, Z_2, Z_3$ with potential functions $\varphi_i$ and a normalization constant $Z$:

    $$P(A,B,C,D,E) = \frac{1}{Z}\prod_i \varphi_i(Z_i) = \frac{1}{Z}\,\varphi_1(A,B)\,\varphi_2(B,C,D)\,\varphi_3(C,D,E)$$

    Bayesian network (directed): the joint probability factorizes into conditional probabilities, one per node given its parents:

    $$P(A,B,C,D,E) = P(A\,|\,\mathrm{Pa}(A))\,P(B\,|\,\mathrm{Pa}(B))\,P(C\,|\,\mathrm{Pa}(C))\,P(D\,|\,\mathrm{Pa}(D))\,P(E\,|\,\mathrm{Pa}(E))$$

  • Slide 7: Joint Probability as a Product of Conditional Probabilities

    Factoring the joint distribution into local conditional probabilities can dramatically reduce the number of parameters needed for data modeling in Bayesian networks.
    [Figure: the ALARM monitoring network over 37 variables (PCWP, CO, HRBP, HREKG, HRSAT, ..., BP); from the NIPS'01 tutorial by N. Friedman and D. Koller.]
    37 variables in total: the network needs only 509 parameters, rather than the roughly 2^54 entries of the full joint table.
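
    To make the parameter-count arithmetic concrete, here is a small Python sketch; the five-node graph and its cardinalities are made up for illustration (only the counting rule itself is general):

```python
# Sketch: counting free parameters of a BN versus the full joint table.
# The five-node graph and cardinalities below are made up for illustration;
# per node the count is (|X_i| - 1) * (number of parent configurations).
from math import prod

card = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 2}                    # |X_i|
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["B"], "E": ["C", "D"]}

bn = sum((card[x] - 1) * prod(card[p] for p in parents[x]) for x in card)
full = prod(card.values()) - 1          # unfactorized joint over all states
print(bn, "vs", full)                   # -> 18 vs 47 for this toy graph
```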

  • Slide 8: Real-World Applications of BNs

    Intelligent agents: the Microsoft Office assistant (Bayesian user modeling).
    Medical diagnosis: PATHFINDER (Heckerman, 1992), diagnosis of lymph-node disease, commercialized as INTELLIPATH (http://www.intellipath.com/).
    Control and decision support systems.
    Speech recognition (HMMs).
    Genome data analysis: gene expression, DNA sequences, combined analysis of heterogeneous data.
    Turbo codes (channel coding).

  • Slide 9: Outline

    Introduction
    Basic Concepts of Bayesian Networks
    Learning Bayesian Networks
      Parameter Learning
      Structural Learning
    Applications
      DNA microarray
      Classification
      Dependency Analysis
    Summary

  • Slide 10: Causal Networks

    Node: an event. Arc: a causal relationship between two nodes.
    A → B: A causes B.
    Causal network for the car-start problem [Jensen 01]:
    [Figure: Fuel → Fuel Meter Standing, Fuel → Start, Clean Spark Plugs → Start.]

  • Slide 11: Reasoning with Causal Networks

    My car does not start:
      increases the certainty of no fuel and of dirty spark plugs;
      increases the certainty that the fuel meter stands on empty.
    The fuel meter stands on half:
      decreases the certainty of no fuel;
      increases the certainty of dirty spark plugs.
    [Figure: the car-start network (Fuel, Fuel Meter Standing, Clean Spark Plugs, Start).]

  • Slide 12: d-separation

    Three types of connections in causal networks:
      serial: A → C → B
      diverging: A ← C → B
      converging: A → C ← B
    Definition [Jensen 01]: two nodes in a causal network are d-separated if, for every path between them, there is an intermediate node V such that either
      the connection at V is serial or diverging and the state of V is known, or
      the connection at V is converging and neither V nor any of V's descendants has received evidence.

  • Slide 13: d-separation (Cont'd)

    Serial (A → C → B) and diverging (A ← C → B) connections:
      C unobserved: A and B are marginally dependent.
      C observed: A and B are conditionally independent given C.
    Converging connection (A → C ← B):
      C (and its descendants) unobserved: A and B are marginally independent.
      C observed: A and B are conditionally dependent given C.
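
    The converging case can be checked numerically by brute-force enumeration. The sketch below uses A → C ← B with arbitrary illustrative CPTs and verifies that A and B are marginally independent but become dependent once C is observed:

```python
# Numeric check of the converging connection A -> C <- B with made-up CPTs:
# marginally P(A,B) = P(A)P(B), but conditioning on C couples A and B.
from itertools import product

pA = {0: 0.3, 1: 0.7}
pB = {0: 0.6, 1: 0.4}
pC = {(a, b): {0: 0.9 if a == b else 0.2, 1: 0.1 if a == b else 0.8}
      for a, b in product((0, 1), repeat=2)}     # arbitrary illustrative numbers

joint = {(a, b, c): pA[a] * pB[b] * pC[(a, b)][c]
         for a, b, c in product((0, 1), repeat=3)}

def p(pred):  # probability of an event under the full joint
    return sum(v for k, v in joint.items() if pred(*k))

# Marginal independence: P(A=1, B=1) equals P(A=1) * P(B=1).
print(p(lambda a, b, c: a == 1 and b == 1), pA[1] * pB[1])

# Conditional dependence given C=1: the product form now fails.
pc1 = p(lambda a, b, c: c == 1)
pa_c = p(lambda a, b, c: a == 1 and c == 1) / pc1
pb_c = p(lambda a, b, c: b == 1 and c == 1) / pc1
pab_c = p(lambda a, b, c: a == 1 and b == 1 and c == 1) / pc1
print(pab_c, "vs", pa_c * pb_c)   # differ: A and B are dependent given C
```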

  • Slide 14: d-separation: Car-Start Problem

    1. 'Start' and 'Fuel' are dependent on each other.
    2. 'Start' and 'Clean Spark Plugs' are dependent on each other.
    3. 'Fuel' and 'Fuel Meter Standing' are dependent on each other.
    4. 'Fuel' and 'Clean Spark Plugs' are conditionally dependent on each other given the value of 'Start'.
    5. 'Fuel Meter Standing' and 'Start' are conditionally independent given the value of 'Fuel'.
    [Figure: the car-start network.]

  • Slide 15: Bayesian Networks: Revisited

    Definition: a graphical model for the probabilistic relationships among a set of variables; a compact representation of a joint probability distribution on the basis of conditional probabilities.
    A Bayesian network consists of:
      Qualitative part: a set of n variables X = {X1, X2, ..., Xn} and a set of directed edges between them; the variables (nodes) and directed edges form a directed acyclic graph (DAG).
      Quantitative part: for each variable Xi with parents Pa(Xi), a conditional probability table for P(Xi | Pa(Xi)).
    Modeling of continuous variables is also possible.

  • Slide 16: Quantitative Specification by Probability Calculus

    Fundamentals.
    Conditional probability:
    $$P(B\,|\,A) = \frac{P(A,B)}{P(A)}$$
    Product rule:
    $$P(A,B) = P(B\,|\,A)\,P(A) = P(A\,|\,B)\,P(B)$$
    Chain rule (a successive application of the product rule):
    $$P(X_1,\ldots,X_n) = P(X_1,\ldots,X_{n-1})\,P(X_n\,|\,X_1,\ldots,X_{n-1}) = \cdots = P(X_1)\prod_{i=2}^{n} P(X_i\,|\,X_1,\ldots,X_{i-1})$$

  • Slide 17: Independence of Two Events

    Marginal independence:
    $$A \perp B \;\Leftrightarrow\; P(A\,|\,B) = P(A), \quad P(B\,|\,A) = P(B)$$
    Conditional independence:
    $$A \perp B \,|\, C \;\Leftrightarrow\; P(A\,|\,B,C) = P(A\,|\,C), \quad P(B\,|\,A,C) = P(B\,|\,C)$$
    [Figure: the serial, diverging, and converging connections over A, B, C.]

  • Slide 18: Quantitative Specification by Probability Calculus (Cont'd)

    X = {X1, X2, ..., X10}; Pa(X5): the parents of X5; Ch(X5): the children of X5; De(X5): the descendants of X5.
    [Figure: a DAG over X1, ..., X10 illustrating Pa(X5), Ch(X5), and De(X5).]
    $$P(X) = P(X_1, \ldots, X_{10}) = \prod_i P(X_i\,|\,\mathrm{Pa}(X_i))$$
    Topological sort of the $X_i$: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10.
    Applying the chain rule in reverse topological order, each variable depends only on its parents. Writing $X' = X\setminus\{X_{10}\}$ and $X'' = X'\setminus\{X_9\}$:
    $$P(X\setminus\{X_{10}\}, X_{10}) = P(X\setminus\{X_{10}\})\,P(X_{10}\,|\,X\setminus\{X_{10}\}) = P(X\setminus\{X_{10}\})\,P(X_{10}\,|\,X_8)$$
    $$P(X'\setminus\{X_9\}, X_9) = P(X'\setminus\{X_9\})\,P(X_9\,|\,X'\setminus\{X_9\}) = P(X'\setminus\{X_9\})\,P(X_9\,|\,X_7)$$
    $$P(X''\setminus\{X_8\}, X_8) = P(X''\setminus\{X_8\})\,P(X_8\,|\,X''\setminus\{X_8\}) = P(X''\setminus\{X_8\})\,P(X_8\,|\,X_5, X_6)$$


  • Slide 20: Bayesian Network for the Car-Start Problem

    [Figure: the car-start network with its conditional probability tables.]

    P(Fu = Yes) = 0.98        P(CSP = Yes) = 0.96

    P(FMS | Fu):
      Fu    FMS=Full   FMS=Half   FMS=Empty
      Yes   0.39       0.60       0.01
      No    0.001      0.001      0.998

    P(St | Fu, CSP):
      (Fu, CSP)    Start=Yes   Start=No
      (Yes, Yes)   0.99        0.01
      (Yes, No)    0.01        0.99
      (No, Yes)    0           1
      (No, No)     0           1
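
    A minimal inference-by-enumeration sketch over this network, using the CPTs above (variable and state names are ad hoc); it computes the posterior P(Fu = no | Start = no), which rises well above the 0.02 prior:

```python
# Inference by enumeration on the car-start network, using the CPTs above.
from itertools import product

p_fu = {"yes": 0.98, "no": 0.02}
p_csp = {"yes": 0.96, "no": 0.04}
p_fms = {"yes": {"full": 0.39, "half": 0.60, "empty": 0.01},
         "no":  {"full": 0.001, "half": 0.001, "empty": 0.998}}
p_st = {("yes", "yes"): 0.99, ("yes", "no"): 0.01,
        ("no", "yes"): 0.0, ("no", "no"): 0.0}   # P(Start=yes | Fu, CSP)

def joint(fu, csp, fms, st):
    p_start = p_st[(fu, csp)]
    return (p_fu[fu] * p_csp[csp] * p_fms[fu][fms]
            * (p_start if st == "yes" else 1.0 - p_start))

# Posterior P(Fu = no | Start = no): sum the joint over the hidden variables.
num = sum(joint("no", csp, fms, "no")
          for csp, fms in product(p_csp, p_fms["yes"]))
den = sum(joint(fu, csp, fms, "no")
          for fu, csp, fms in product(p_fu, p_csp, p_fms["yes"]))
print(num / den)   # evidence 'Start = no' raises P(Fu = no) above its prior 0.02
```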

  • Slide 21: An Illustration of Conditional Independence in a BN

    [Screenshot: demonstration in Netica (http://www.norsys.com).]

  • Slide 22: Main Issues in BNs

    Inference in Bayesian networks: given an assignment of a subset of variables (evidence), estimate the posterior distribution over another subset of unobserved variables of interest:
    $$P(\mathbf{x}_{unobs}\,|\,\mathbf{x}_{obs}) = \frac{P(\mathbf{x}_{unobs}, \mathbf{x}_{obs})}{P(\mathbf{x}_{obs})}$$
    Learning Bayesian networks from data:
      Parameter learning: given a data set, estimate the local probability distributions P(Xi | Pa(Xi)) for all variables (nodes) in the BN.
      Structure learning: given a data set, search for a network structure G (dependency structure) that is best, or at least plausible.
    This tutorial focuses on the learning issue.

  • Slide 23: Outline

    Introduction
    Basic Concepts of Bayesian Networks
    Learning Bayesian Networks
      Parameter Learning
      Structural Learning
    Applications
      DNA microarray
      Classification
      Dependency Analysis
    Summary

  • Slide 24: Learning Bayesian Networks

    [Diagram: data acquisition → preprocessing → Bayesian network learning (structure search, score metric, parameter learning), guided by prior knowledge.]
    BN = structure + local probability distributions.

  • Slide 25: Learning Bayesian Networks (Cont'd)

    Bayesian network learning consists of:
      structure learning (the DAG structure);
      parameter learning (the local probability distributions).
    Four situations:
      Known structure, complete data.
      Unknown structure, complete data.
      Known structure, incomplete data.
      Unknown structure, incomplete data.

  • Slide 26: Parameter Learning

    Task: given a network structure, estimate the parameters of the model from data.
    Data: M samples s1, ..., sM over the variables A, B, C, D, e.g.

      Sample   A   B   C   D
      s1       H   L   L   L
      s2       H   H   H   H
      ...
      sM       L   H   H   L

    Network: A → C, B → C, B → D, with local probability tables (illustrative values):

      P(A):  H 0.99, L 0.01
      P(B):  H 0.93, L 0.07
      P(C | A, B):
        (A, B) = (H, H):  C=H 0.4, C=L 0.6
        (A, B) = (H, L):  C=H 0.2, C=L 0.8
        (A, B) = (L, H):  C=H 0.3, C=L 0.7
        (A, B) = (L, L):  C=H 0.8, C=L 0.2
      P(D | B):
        B = H:  D=H 0.9, D=L 0.1
        B = L:  D=H 0.1, D=L 0.9

  • Slide 27: Parameter Learning (Cont'd)

    Key point: independence of parameter estimation.
    D = {s1, s2, ..., sM}, where si = (ai, bi, ci, di) is an instance of the random vector variable S = (A, B, C, D).
    Assumption: the samples si are independent and identically distributed (i.i.d.).
    $$L(\Theta_G; D) = P(D\,|\,\Theta_G) = \prod_{i=1}^{M} P(s_i\,|\,\Theta_G) = \prod_{i=1}^{M} P(a_i\,|\,\Theta_G)\,P(b_i\,|\,\Theta_G)\,P(c_i\,|\,a_i, b_i, \Theta_G)\,P(d_i\,|\,b_i, \Theta_G)$$
    $$= \left(\prod_{i=1}^{M} P(a_i\,|\,\Theta_G)\right)\left(\prod_{i=1}^{M} P(b_i\,|\,\Theta_G)\right)\left(\prod_{i=1}^{M} P(c_i\,|\,a_i, b_i, \Theta_G)\right)\left(\prod_{i=1}^{M} P(d_i\,|\,b_i, \Theta_G)\right)$$
    The likelihood factorizes by node, so the parameters can be estimated independently for each node (variable).

  • Slide 28: Parameter Learning (Cont'd)

    One can estimate the parameters for P(A), P(B), P(C|A, B), and P(D|B) in an independent manner: each local table needs only the data columns of its node and parents.
    If A, B, C, and D are all binary-valued, the number of parameters is reduced from 15 (= 2^4 - 1) for the full joint to 8 (= 1 + 1 + 4 + 2).
    P(A, B, C, D) = P(A) × P(B) × P(C|A, B) × P(D|B)
    [Figure: the data matrix (columns a, b, c, d over samples 1..M) split column-wise into the sufficient statistics for each local distribution.]
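
    A short sketch of this decomposed estimation, assuming the running A, B, C, D example; the data rows are made up, and each CPT is fitted only from the columns of its node and parents:

```python
# Sketch: decomposed maximum-likelihood estimation. Each CPT is fitted
# from the counts of its own (node, parents) columns only; the toy data
# below is made up for illustration.
from collections import Counter

data = [  # samples over (A, B, C, D), as in the running example
    ("H", "L", "L", "L"), ("H", "H", "H", "H"), ("L", "H", "H", "L"),
    ("H", "H", "L", "H"), ("L", "H", "H", "H"),
]
cols = {"A": 0, "B": 1, "C": 2, "D": 3}
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["B"]}

def fit_cpt(node):
    """MLE of P(node | parents): normalized counts per parent configuration."""
    counts = Counter()
    for s in data:
        pa = tuple(s[cols[p]] for p in parents[node])
        counts[(pa, s[cols[node]])] += 1
    totals = Counter()
    for (pa, _), n in counts.items():
        totals[pa] += n
    return {(pa, v): n / totals[pa] for (pa, v), n in counts.items()}

for node in cols:
    print(node, fit_cpt(node))
```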


  • Slide 30: Methods for Parameter Estimation

    Maximum likelihood estimation (MLE): choose the value of Θ that maximizes the likelihood of the observed data D:
    $$\hat{\Theta} = \arg\max_{\Theta} L(\Theta; D) = \arg\max_{\Theta} P(D\,|\,\Theta)$$
    Bayesian estimation: represent uncertainty about the parameters using a probability distribution over Θ; Θ is itself a random variable rather than a point value:
    $$P(\Theta\,|\,D) = \frac{P(\Theta)\,P(D\,|\,\Theta)}{P(D)} \propto P(\Theta)\,P(D\,|\,\Theta) \qquad \text{(posterior} \propto \text{prior} \times \text{likelihood)}$$

  • Slide 31: Bayes' Rule, MAP, and ML

    h: hypothesis (model or parameters); D: data.
    Bayes' rule:
    $$P(h\,|\,D) = \frac{P(D\,|\,h)\,P(h)}{P(D)}$$
    ML (maximum likelihood) estimation: $h^{*} = \arg\max_{h} P(D\,|\,h)$
    MAP (maximum a posteriori) estimation: $h^{*} = \arg\max_{h} P(h\,|\,D)$
    Bayesian learning: not a point estimate, but the full posterior distribution $P(h\,|\,D)$.
    (From the NIPS'99 tutorial by Ghahramani, Z.)

  • Slide 32: Bayesian Estimation (for a Multinomial Distribution)

    Dirichlet prior (prior knowledge, or pseudo-counts): $\theta \sim \mathrm{Dir}(\alpha_1, \alpha_2, \ldots, \alpha_K)$, i.e.
    $$P(\theta) \propto \prod_k \theta_k^{\alpha_k - 1}$$
    Posterior, with sufficient statistics $N_k$ (the count of outcome k among the samples s1, ..., sM):
    $$P(\theta\,|\,D) \propto P(\theta)\,P(D\,|\,\theta) \propto \prod_k \theta_k^{\alpha_k + N_k - 1}$$
    Predictive distribution for the next sample:
    $$P(S_{M+1} = k\,|\,D) = \int P(k\,|\,\theta)\,P(\theta\,|\,D)\,d\theta = E_{P(\theta|D)}[\theta_k] = \frac{N_k + \alpha_k}{M + \sum_{l}\alpha_l}$$
    This is a smoothed version of the MLE.


  • Slide 34: An Example: Coin Toss

    Data D: H H T H H H, so $(N_H, N_T) = (5, 1)$.
    Maximum likelihood estimation:
    $$P(D\,|\,\theta) = \theta\,\theta\,(1-\theta)\,\theta\,\theta\,\theta = \theta^{5}(1-\theta)$$
    $$\hat{\theta} = \arg\max_{\theta} P(D\,|\,\theta) = \frac{5}{5+1} = \frac{5}{6}, \qquad P(S = H) = \hat{\theta} \approx 0.833$$
    Bayesian inference with a uniform prior $\theta \sim \mathrm{Dir}(1, 1)$, i.e. $P(\theta) \propto \theta_H^{1-1}\,\theta_T^{1-1}$:
    $$P(\theta\,|\,D) \propto P(\theta)\,P(D\,|\,\theta) = \theta_H^{5}\,\theta_T^{1}$$
    $$P(H\,|\,D) = \frac{5+1}{(5+1)+(1+1)} = \frac{3}{4} = 0.75$$
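
    Both estimates can be reproduced in a few lines (the B(5, 5) line anticipates the table on the next slide):

```python
# Reproducing the coin-toss numbers: MLE vs. the Dir(1, 1) posterior mean.
data = "HHTHHH"
n_h, n_t = data.count("H"), data.count("T")

mle = n_h / (n_h + n_t)                      # 5/6 ~ 0.833
alpha_h = alpha_t = 1.0                      # uniform Dirichlet (Beta) prior
bayes = (n_h + alpha_h) / (n_h + n_t + alpha_h + alpha_t)   # 6/8 = 0.75
print(round(mle, 3), round(bayes, 3))

# Larger pseudo-counts pull the estimate harder toward 0.5, as in the
# table on the next slide, e.g. B(5, 5) on the same data:
print((n_h + 5) / (n_h + n_t + 10))          # 10/16 = 0.625 (the table's 0.63)
```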


  • Slide 36: Bayesian Estimation (Cont'd): Effect of the Dirichlet Prior on P(H)

      Data:         H     HH    HHT   HHTH  HHTHH  HHTHHH  HHTHHHTTHH  (100, 50)
      MLE           1.00  1.00  0.67  0.75  0.80   0.83    0.70        0.67
      B(0.5, 0.5)   0.75  0.83  0.63  0.70  0.75   0.79    0.68        0.67
      B(1, 1)       0.67  0.75  0.60  0.67  0.71   0.75    0.67        0.66
      B(2, 2)       0.60  0.67  0.57  0.63  0.67   0.70    0.64        0.66
      B(5, 5)       0.55  0.58  0.54  0.57  0.60   0.63    0.60        0.65

    As the data accumulate, the estimates under the different Dirichlet (Beta) priors converge toward the MLE.

  • Slide 37: Variation of the Posterior Distribution for the Parameter

    [Figure: posterior densities under the priors B(0.5, 0.5), B(1, 1), B(2, 2), and B(5, 5).]

  • Slide 38: Outline

    Introduction
    Basic Concepts of Bayesian Networks
    Learning Bayesian Networks
      Parameter Learning
      Structural Learning
        Scoring Metric
        Search Strategy
    Practical Applications
      DNA microarray
      Classification
      Dependency Analysis
    Summary

  • Slide 39: Structural Learning

    Task: given a data set, search for the most plausible network structure underlying the generation of the data.
    Metric (score)-based approach: a scoring metric measures how well a particular structure fits the observed set of cases, and a search strategy explores the space of structures.
    [Figure: a data matrix over A, B, C, D and three candidate network structures, each assigned a score.]

  • Slide 40: Scoring Metric

    $$P(G\,|\,D) = \frac{P(G)\,P(D\,|\,G)}{P(D)} \propto P(G)\,P(D\,|\,G)$$
    P(G): prior over network structures; P(D|G): marginal likelihood.
    Likelihood score:
    $$\mathrm{Score}(G; D) = \log P\bigl(D\,|\,G, \hat{\Theta}_{MLE}\bigr) \propto \sum_{i=1}^{N} I(X_i; \mathrm{Pa}_i) - \sum_{i=1}^{N} H(X_i)$$
    Nodes with high mutual information (dependency) with their parents get a higher score.
    Since $I(X;Y) \le I(X; \{Y, Z\})$, the fully connected network is obtained in the unrestricted case: the likelihood score is prone to overfitting.

  • Slide 41: Likelihood Score in Relation to Information Theory

    With $N_{ijk}$ the number of cases in which $X_i = x_k$ and $\mathrm{Pa}(X_i) = pa_j$, $N_{ij} = \sum_k N_{ijk}$, and M samples in total:
    $$\log L(\hat{\Theta}; D) = \sum_{i=1}^{N}\sum_{j=1}^{|\mathrm{Pa}_i|}\sum_{k=1}^{|X_i|} N_{ijk}\,\log\frac{N_{ijk}}{N_{ij}} = M\sum_{i=1}^{N}\sum_{j}\sum_{k} \frac{N_{ijk}}{M}\,\log\frac{N_{ijk}}{N_{ij}}$$
    $$= -M\sum_{i=1}^{N} H(X_i\,|\,\mathrm{Pa}_i) = -M\left(\sum_{i=1}^{N} H(X_i) - \sum_{i=1}^{N} I(X_i; \mathrm{Pa}_i)\right) \propto \sum_{i=1}^{N} I(X_i; \mathrm{Pa}_i) - \sum_{i=1}^{N} H(X_i)$$
    using the identity $H(X\,|\,Y) = H(X) - I(X;Y)$.
    [Figure: Venn diagram of H(X), H(Y), H(X|Y), H(Y|X), and I(X;Y); an empty network over A, B, C, D versus a connected one.]
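
    A sketch that evaluates this maximized log-likelihood directly as $-M \sum_i H(X_i\,|\,\mathrm{Pa}_i)$ from empirical counts, on the same toy data and graph used earlier:

```python
# Sketch: the maximized log-likelihood as -M * sum_i H(X_i | Pa_i),
# computed from empirical counts. Toy data and graph as before.
from collections import Counter
from math import log

data = [("H", "L", "L", "L"), ("H", "H", "H", "H"), ("L", "H", "H", "L"),
        ("H", "H", "L", "H"), ("L", "H", "H", "H")]
cols = {"A": 0, "B": 1, "C": 2, "D": 3}
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["B"]}
M = len(data)

def cond_entropy(node):
    """Empirical H(node | parents) in nats."""
    joint = Counter(tuple(s[cols[v]] for v in parents[node] + [node]) for s in data)
    pa = Counter(tuple(s[cols[v]] for v in parents[node]) for s in data)
    return -sum(n / M * log(n / pa[key[:-1]]) for key, n in joint.items())

log_lik = -M * sum(cond_entropy(x) for x in cols)
print(log_lik)   # adding any edge can only increase this score (overfitting)
```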

  • Slide 42: Bayesian Score

    Considers the uncertainty in parameter estimation in the Bayesian network:
    $$\mathrm{Score}(G; D) = P(G)\int P(D\,|\,G, \Theta)\,P(\Theta\,|\,G)\,d\Theta$$
    Assuming complete data and parameter independence, the marginal likelihood can be rewritten as a product of local marginal likelihoods, one for each pair $(X_i; \mathrm{Pa}(X_i))$:
    $$P(D\,|\,G) = \prod_{i=1}^{N}\left[\int P\bigl(D(X_i; \mathrm{Pa}(X_i))\,|\,\theta_i, G\bigr)\,P(\theta_i\,|\,G)\,d\theta_i\right]$$

  • Slide 43: Bayesian Dirichlet Score

    For the multinomial case, assuming a Dirichlet prior for each parameter (Heckerman, 1995), $\theta_{ij} \sim \mathrm{Dir}(\alpha_{ij1}, \alpha_{ij2}, \ldots, \alpha_{ij|X_i|})$:
    $$\int P\bigl(D(X_i; \mathrm{Pa}(X_i))\,|\,\theta_i, G\bigr)\,P(\theta_i\,|\,G)\,d\theta_i = \prod_{j=1}^{|\mathrm{Pa}_i|} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{|X_i|} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$
    where $\alpha_{ij} = \sum_k \alpha_{ijk}$, $N_{ij} = \sum_k N_{ijk}$, and $N_{ijk} = \#\{X_i = x_k,\ \mathrm{Pa}(X_i) = pa_j\}$.
    The Gamma function: $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$, with $\Gamma(n+1) = n\,\Gamma(n) = n!$ and $\Gamma(1) = 1$.
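
    A numerically stable way to evaluate this local score is through log-Gamma functions. The sketch below scores one family from its count table; with counts (5, 1) and a uniform Dir(1, 1) prior it gives log(1/42), the marginal likelihood of five heads and one tail:

```python
# Sketch: log marginal likelihood of one family (X_i, Pa(X_i)) under a
# Dirichlet prior, via the Gamma-function formula above (using lgamma
# for numerical stability). N[j][k] are counts; alpha is a pseudo-count.
from math import lgamma

def bd_family_score(N, alpha=1.0):
    """N: list of count lists, one per parent configuration j."""
    score = 0.0
    for counts in N:                      # one parent configuration
        a_ij = alpha * len(counts)        # sum_k alpha_ijk
        n_ij = sum(counts)
        score += lgamma(a_ij) - lgamma(a_ij + n_ij)
        for n_ijk in counts:
            score += lgamma(alpha + n_ijk) - lgamma(alpha)
    return score

# Coin example: one (empty) parent configuration, counts (N_H, N_T) = (5, 1),
# uniform prior alpha_H = alpha_T = 1:
print(bd_family_score([[5, 1]]))   # log(120/5040) = log(1/42) ~ -3.738
```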

  • Slide 44: Bayesian Dirichlet Score (Cont'd)

    Example: data D = H, H, T, H with prior $\theta \sim \mathrm{Dir}(\alpha_H, \alpha_T)$, $\alpha = \alpha_H + \alpha_T$. The sequential predictive probabilities are
    $$P(H\,|\,\emptyset) = \frac{\alpha_H}{\alpha}, \quad P(H\,|\,H) = \frac{\alpha_H + 1}{\alpha + 1}, \quad P(T\,|\,HH) = \frac{\alpha_T}{\alpha + 2}, \quad P(H\,|\,HHT) = \frac{\alpha_H + 2}{\alpha + 3}$$
    so the marginal likelihood is
    $$P(D) = \frac{\alpha_H(\alpha_H + 1)\,\alpha_T\,(\alpha_H + 2)}{\alpha(\alpha+1)(\alpha+2)(\alpha+3)} = \frac{\Gamma(\alpha)}{\Gamma(\alpha + 4)}\left[\frac{\Gamma(\alpha_H + 3)}{\Gamma(\alpha_H)} \times \frac{\Gamma(\alpha_T + 1)}{\Gamma(\alpha_T)}\right]$$


  • Slide 46: Bayesian Dirichlet Score (Cont'd)

    Combining the local terms gives the full marginal likelihood:
    $$P(D\,|\,G) = \prod_{i=1}^{N}\prod_{j=1}^{|\mathrm{Pa}_i|}\frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}\prod_{k=1}^{|X_i|}\frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$
    log P(D|G) in the Bayesian score is asymptotically equivalent to the BIC (Schwarz, 1978) and to minus the MDL criterion (Rissanen, 1987):
    $$\log P(D\,|\,G) \approx \mathrm{BIC}(G; D) = \log P(D\,|\,\hat{\Theta}, G) - \frac{\dim(G)}{2}\log M, \qquad \dim(G) = \text{number of parameters in } G$$

  • Slide 47: Structure Search

    Given a data set, a score metric, and a set of possible structures, find the network structure with maximal score: a discrete optimization problem.
    One can exploit the decomposability of the score into independent terms, one per pair $(X_i, \mathrm{Pa}(X_i))$:
    $$P(D\,|\,G) = \prod_{i=1}^{N} P\bigl(D(X_i; \mathrm{Pa}(X_i))\,|\,G\bigr)$$
    $$\mathrm{Score}(G; D) = \log P(D\,|\,G) = \sum_{i=1}^{N}\mathrm{Score}\bigl(X_i; \mathrm{Pa}(X_i)\bigr)$$

  • Slide 48: Tree-Structured Networks

    Definition: each node has at most one parent.
    An effective search algorithm exists, based on the decomposition
    $$\mathrm{Score}(G\,|\,D) = \sum_i \mathrm{Score}(X_i\,|\,\mathrm{Pa}_i) = \underbrace{\sum_i\bigl[\mathrm{Score}(X_i\,|\,\mathrm{Pa}_i) - \mathrm{Score}(X_i)\bigr]}_{\text{improvement over the empty network}} + \underbrace{\sum_i \mathrm{Score}(X_i)}_{\text{score of the empty network}}$$
    Chow and Liu (1968):
      1. Construct the complete undirected graph with edge weights $E(X_i, X_j) = I(X_i; X_j)$.
      2. Build a maximum weighted spanning tree.
      3. Transform it into a directed tree by choosing an arbitrary root node.

  • Slide 49: Tree-Structured Networks (Cont'd)

    [Figure: from the data matrix over A, B, C, D, the complete undirected graph is weighted with pairwise mutual information: I(A;B) = 0.0652, I(A;C) = 0.0880, I(A;D) = 0.0834, I(B;C) = 0.1913, I(B;D) = 0.1967, I(C;D) = 0.1980. The maximum spanning tree keeps the edges C-D, B-D, and A-C, and is then directed away from an arbitrary root.]
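
    The maximum-weight spanning tree step can be sketched with Kruskal's algorithm and a union-find, applied to the mutual-information weights from this slide:

```python
# Chow-Liu sketch: maximum-weight spanning tree (Kruskal with a union-find)
# over the mutual-information weights from this slide.
mi = {("A", "B"): 0.0652, ("A", "C"): 0.0880, ("A", "D"): 0.0834,
      ("B", "C"): 0.1913, ("B", "D"): 0.1967, ("C", "D"): 0.1980}

parent = {v: v for v in "ABCD"}          # union-find forest
def find(v):
    while parent[v] != v:
        v = parent[v]
    return v

tree = []
for (u, v), w in sorted(mi.items(), key=lambda e: -e[1]):
    ru, rv = find(u), find(v)
    if ru != rv:                          # keep the edge unless it closes a cycle
        parent[ru] = rv
        tree.append((u, v))
print(tree)   # [('C', 'D'), ('B', 'D'), ('A', 'C')]
# Directing the tree away from an arbitrary root (say A) gives
# A -> C, C -> D, D -> B.
```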

  • Slide 50: Search Strategies for General Bayesian Networks

    With more than one parent per node, learning the optimal structure is NP-hard (Chickering, 1996).
    Heuristic search methods are usually employed:
      greedy hill-climbing (local search),
      greedy hill-climbing with random restarts,
      simulated annealing,
      tabu search,
      ...

  • Slide 51: Greedy Local Search Algorithm

    INPUT: data set, scoring metric, set of possible structures.
    1. Initialize the structure (empty network, random network, tree-structured network, etc.).
    2. Apply the local edge operation (INSERT, DELETE, or REVERSE an edge) that yields the largest improvement in score among all possible operations.
    3. While Score(G_{t+1}; D) > Score(G_t; D), repeat step 2; otherwise stop with G_final = G_t.
    [Figure: the three edge operations applied to a network over A, B, C, D.]
    (A minimal code sketch follows below.)
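
    A compact sketch of the loop under stated assumptions: toy binary data, a BIC-style decomposable score, and exhaustive INSERT/DELETE/REVERSE moves with an acyclicity check (none of this is the lecture's exact setup):

```python
# Greedy local search sketch: hill-climb over INSERT / DELETE / REVERSE
# edge operations with a BIC-style decomposable score on toy binary data.
from collections import Counter
from itertools import permutations
from math import log
import random

random.seed(0)
nodes = ["A", "B", "C", "D"]
col = {v: i for i, v in enumerate(nodes)}
data = [tuple(random.choice("HL") for _ in nodes) for _ in range(200)]
M = len(data)

def family_score(x, pa):
    """Log-likelihood term for node x given parents pa, minus a BIC penalty."""
    joint = Counter(tuple(s[col[v]] for v in pa + [x]) for s in data)
    marg = Counter(tuple(s[col[v]] for v in pa) for s in data)
    ll = sum(n * log(n / marg[k[:-1]]) for k, n in joint.items())
    return ll - 0.5 * log(M) * (2 ** len(pa))      # binary variables assumed

def score(edges):
    return sum(family_score(v, sorted(u for u, w in edges if w == v))
               for v in nodes)

def acyclic(edges):
    pa = {v: [u for u, w in edges if w == v] for v in nodes}
    state = {}                                     # 1 = visiting, 2 = done
    def dfs(v):
        if state.get(v) == 2: return True
        if state.get(v) == 1: return False         # back edge: cycle
        state[v] = 1
        ok = all(dfs(u) for u in pa[v])
        state[v] = 2
        return ok
    return all(dfs(v) for v in nodes)

edges, best = set(), score(set())
while True:
    moves = []
    for u, v in permutations(nodes, 2):
        moves.append(edges | {(u, v)})             # INSERT
        moves.append(edges - {(u, v)})             # DELETE
        if (u, v) in edges:                        # REVERSE
            moves.append((edges - {(u, v)}) | {(v, u)})
    moves = [m for m in moves if m != edges and acyclic(m)]
    top = max(moves, key=score)
    if score(top) <= best:                         # no improving move: stop
        break
    edges, best = top, score(top)
# On independent random data the search typically stops at a (near-)empty network.
print(sorted(edges), best)
```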

  • Slide 52: Enhanced Search

    Greedy local search can get stuck in local maxima or on plateaux.
    Standard heuristics to escape them include:
      search with random restarts,
      simulated annealing,
      tabu search,
      genetic algorithms (population-based search).
    [Figure: score landscapes illustrating greedy search, greedy search with random restarts, simulated annealing, and population-based search (GA).]

  • Slide 53: Learning Bayesian Networks: Summary

    The joint (probability) distribution factorizes into a product of conditional probabilities, one per variable (node) (a sum in log representation).
    Parameter learning: estimation of the local probability distributions; MLE, Bayesian estimation.
    Structure learning:
      Score metrics: likelihood, Bayesian score, BIC, MDL.
      Search strategies: optimization in a discrete space to maximize the score; the key concept is the decomposability of the score; tree-structured and general Bayesian networks.

  • Slide 54: Outline

    Introduction
    Basic Concepts of Bayesian Networks
    Learning Bayesian Networks
      Parameter Learning
      Structural Learning
    Applications
      DNA microarray
      Classification (Naïve Bayes, TAN)
      Dependency Analysis
    Summary

  • Slide 55: DNA Microarray Data Analysis

    Classification: gene expression data of 72 leukemia patients; task: classify samples into AML or ALL based on expression patterns, using Bayesian networks.
    Combined analysis of gene expression and drug activity data: expression and drug activity data of 60 cancer samples; task: construct a dependency network of genes and drugs.
    Tools:
      WEKA (http://www.cs.waikato.ac.nz/~ml/weka/): a collection of machine learning algorithms for data mining tasks (implemented in Java, open-source software issued under the GNU General Public License); Bayesian network learning algorithms are included.
      The BNJ software (http://bnj.sourceforge.net/) is used for visualization when needed.

  • Slide 56: DNA Microarrays

    Recent developments in the technology for biological experiments have made it possible to produce massive biological data sets.
    Microarrays monitor thousands of gene expression levels simultaneously, in contrast to traditional one-gene-at-a-time experiments, giving a parallel view of the expression patterns of thousands of genes in a cell.
    Bayesian networks are a useful tool for identifying a variety of meaningful relationships among genes from such data.

  • Slide 57: Bayesian Networks in Microarray Data Analysis

    Expression profiles → Bayesian network construction →
      sample classification: disease diagnosis;
      gene-gene relation analysis: activation or inhibition between genes;
      gene regulatory network analysis: a global view of the relations among genes.
    Combined analysis with other biological data: DNA sequence data, drug activity data, and so on.
    (Microarray image: http://www.gene-chips.com/sample1.html)

  • Slide 58: DNA Microarray

    [Figure: a microarray image and its image analysis.]

  • Slide 59: Data Preparation for Data Mining

    [Figure: microarray image samples (Sample 1, Sample 2, ..., Sample M) are converted by image analysis into a numerical gene-by-sample data matrix for data mining.]

  • Slide 60: Example 1: Tumor Type Classification

    Task: classify leukemia samples into two classes based on gene expression patterns, using Bayesian networks.
    [Figure: DNA microarray data from two kinds of leukemia patients → selected genes and the target variable (Gene A, Gene B, Gene C, Gene D, Target) → learned Bayesian network.]

  • Slide 61: Data Sets

    72 samples in total: 25 AML (acute myeloid leukemia) samples + 47 ALL (acute lymphoblastic leukemia) samples (Golub et al., 1999).
    Each data sample consists of expression measurements for over 7,000 genes.
    Preprocessing:
      30 informative genes were selected by the P-metric:
      $$\mathrm{gscore}(g) = \frac{\mu_{AML} - \mu_{ALL}}{\sigma_{AML} + \sigma_{ALL}}$$
      Expression values were discretized into 2 levels, High and Low.
    The final data set consists of 72 samples, each with 'High' or 'Low' expression values for the 30 genes.
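
    A sketch of this gene-selection step; the data here are random stand-ins with the 25/47 class sizes, and ranking genes by the absolute score is one common convention (an assumption, not necessarily the lecture's exact procedure):

```python
# Sketch of the P-metric ranking: score each gene by the class-mean
# separation over the pooled spread, then keep the top genes.
# Rows = samples, columns = genes; the values here are random placeholders.
import random
from statistics import mean, stdev

random.seed(1)
n_genes = 100
aml = [[random.gauss(0.0, 1) for _ in range(n_genes)] for _ in range(25)]
all_ = [[random.gauss(0.2, 1) for _ in range(n_genes)] for _ in range(47)]

def gscore(g):
    x = [s[g] for s in aml]
    y = [s[g] for s in all_]
    return (mean(x) - mean(y)) / (stdev(x) + stdev(y))

top30 = sorted(range(n_genes), key=lambda g: abs(gscore(g)), reverse=True)[:30]
print(top30[:5])   # indices of the most class-discriminative genes
```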

  • Slide 62: Classification by Bayesian Networks

    Classification as inference for one variable (node) in a Bayesian network.
    Bayes optimal classifier:
    $$c^{*}(\mathbf{g}) = \arg\max_{c \in C} P(c\,|\,\mathbf{g}, D) = \arg\max_{c \in C}\sum_{h} P(c\,|\,\mathbf{g}, h)\,P(h\,|\,D)$$
    If $P(\hat{h}\,|\,D) = 1$, then $c^{*}(\mathbf{g}) = \arg\max_{c \in C} P(c\,|\,\mathbf{g}, \hat{h})$.
    $$P(c_k\,|\,\mathbf{g}) = \frac{P(c_k, \mathbf{g})}{P(\mathbf{g})} = \frac{P(c_k, \mathbf{g})}{\sum_{l} P(c_l, \mathbf{g})} \propto P(c_k, \mathbf{g})$$
    In the learned network, the joint over the class node ('type') and the gene nodes factorizes, e.g.
    $$P(type, \mathbf{g}) = P(type)\,P(\mathrm{mb\text{-}1}\,|\,type, \mathrm{fah})\,P(\mathrm{pic}\,|\,type)\,P(\mathrm{sltc4}\,|\,type)\,P(\mathrm{zyxin}\,|\,type)\,P(\mathrm{myb\_c}\,|\,type, \mathrm{pic})\,P(\mathrm{fah}\,|\,type, \mathrm{sltc4})$$


  • Slide 64: Naïve Bayes Classifier

    A very restricted form of Bayesian network: all variables (nodes) are conditionally independent given the value of the class variable.
    Though simple, it is still widely used in classification tasks.
    $$P(type, \mathbf{s}) = P(type)\,P(\mathbf{s}\,|\,type)$$
    $$P(\mathbf{s}\,|\,type) = P(\mathrm{myb\_c}\,|\,type)\,P(\mathrm{zyxin}\,|\,type)\,P(\mathrm{pic}\,|\,type)\,P(\mathrm{sltc4}\,|\,type)\,P(\mathrm{fah}\,|\,type)\,P(\mathrm{mb\_1}\,|\,type)$$
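
    A minimal naïve Bayes sketch over discretized expression levels, with Laplace (pseudo-count) smoothing; the gene names and training rows are placeholders, not the Golub genes:

```python
# Minimal naive Bayes over discretized expression values, with Laplace
# smoothing. Gene names and data are placeholders, not the Golub genes.
from collections import Counter, defaultdict
from math import log

train = [  # (class, {gene: level})
    ("AML", {"g1": "H", "g2": "L"}), ("AML", {"g1": "H", "g2": "H"}),
    ("ALL", {"g1": "L", "g2": "L"}), ("ALL", {"g1": "L", "g2": "H"}),
    ("ALL", {"g1": "L", "g2": "L"}),
]
classes = Counter(c for c, _ in train)
counts = defaultdict(Counter)          # (class, gene) -> level counts
for c, genes in train:
    for g, v in genes.items():
        counts[(c, g)][v] += 1

def predict(genes, alpha=1.0, levels=2):
    def logpost(c):
        lp = log(classes[c] / len(train))
        for g, v in genes.items():     # conditional independence given class
            lp += log((counts[(c, g)][v] + alpha) / (classes[c] + alpha * levels))
        return lp
    return max(classes, key=logpost)

print(predict({"g1": "H", "g2": "L"}))   # -> 'AML'
```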

  • Slide 65: Tree-Augmented Network (TAN) (Friedman et al., 1997)

    Naïve Bayes plus a tree-structured network over the variables other than the class node.
    Expresses dependencies between variables in a restricted way:
      'Class' is the root node;
      each Xi can have at most one parent besides the 'Class' node.
    Example: $P(\mathrm{zyxin}\,|\,type)$ in naïve Bayes becomes $P(\mathrm{zyxin}\,|\,type, \mathrm{fah})$ in a TAN.

  • Slide 66: Results

    Leave-one-out cross-validation: the Bayesian network is learned from 71 samples and tested on the remaining sample, iterated 72 times. Accuracy (correct/72):

                                    6 genes       30 genes
      NB                            66/72         68/72
      TAN                           69/72         68/72
      General BN (max #pa = 2)      69/72 (NB)    69/72 (NB)
      General BN (max #pa = 3)      70/72 (NB)    67/72 (NB)

    [Figure: learned NB and TAN structures; variant runs labeled 2_NO 65/72, 2_NB 69/72, 3_NB 67/72, 3_NO 66/72.]

  • Slide 67: Markov Blanket in a BN

    The Markov blanket of a node T: P(T | X\{T}) = P(T | MB(T)), where MB(T) = {Pa(T), Ch(T), Pa(Ch(T))\{T}}: the parents, the children, and the children's other parents.
    [Figures: a network over X1, ..., X10 and T with MB(T) highlighted; in the learned TAN, the Markov blanket of the class node 'type' (covering X3, X4, X6, X7, X8).]

  • Slide 68: Example 2: Gene-Drug Dependency Analysis

    Task: construct a Bayesian network for the combined analysis of gene expression data and drug activity data.
    [Figure: gene expression data and drug activity data → preprocessing (thresholding, discretization) → selected genes, drugs, and a cancer-type node → Bayesian network learning → learned Bayesian network, used for dependency analysis and probabilistic inference.]

  • Slide 69: Data Sets

    NCI60 data sets (Scherf et al., 2000): 9,703 genes and 1,400 chemical compounds measured on 60 human cancer samples.
    Preprocessing:
      12 genes and 4 drugs were selected based on the correlation analysis in (Scherf et al., 2000), considering the learning time and the visualization of the Bayesian network.
      Gene expression values and drug activity values were discretized into 3 levels: High, Mid, Low.
    Nodes in the Bayesian network: 12 genes, 4 drugs, and the cancer type.

  • Slide 70: Results

    [Figure: the learned network, visualized with the BNJ 3 software (http://bnj.sourceforge.net/).]
    A plausible dependency among 2 genes and 1 drug was identified.

  • Slide 71: Inferring Gene Regulation Models

    Reconstruction of regulatory relations among genes from genome data.
    One of the challenges in reconstructing gene regulatory networks with Bayesian networks is statistical robustness, which arises from the sparse nature of gene expression data:
      the number of samples is usually much smaller than the number of genes (attributes);
      this can produce unstable, spurious results.
    Remedies: bootstrap, model averaging, and incorporation of prior biological knowledge.

  • Slide 72: Bootstrap-Based Approach

    Learn multiple Bayesian networks from bootstrapped samples (Friedman, N. et al., 2000; Pe'er, D. et al., 2001):
      1. Resample the expression data with replacement into bootstrap data sets B1, B2, ..., BL.
      2. Learn a network Gi from each Bi.
      3. Estimate feature statistics (e.g., 'gene Y is in the Markov blanket of X') by the confidence
    $$\mathrm{conf}(f) = \frac{1}{L}\sum_{i=1}^{L} f(G_i)$$
      4. Identify significant pairwise relations (Friedman, 2000) or subnetworks (Pe'er, 2001).
    Applied to yeast cell-cycle data: 76 samples for 800 genes (Friedman et al., 2000).

  • Slide 73: Model Averaging-Based Approach

    Estimate the confidence of features by averaging over the posterior distribution of models (BNs) (Hartemink et al., 2002):
    $$\mathrm{conf}(f) = \sum_{i=1}^{L} P(G_i\,|\,D)\,P(f\,|\,D, G_i)$$
    Structure learning with simulated annealing over the expression data; location data serves as a prior.
    [Figure: candidate networks G1, G2, ..., GL sampled from the score landscape.]

  • Slide 74: Incorporating Prior Knowledge or Assumptions

    Impose biologically motivated assumptions on the network structure.
    Construction of a gene regulation model from gene expression data: inferring the regulator set (Pe'er, D., et al., 2002).
    Assumption: a relatively small set of genes is directly involved in transcriptional regulation.
      Limit the number of candidate regulators.
      Limit the number of parents for each node (gene) and the number of genes with outgoing edges.
    This significantly reduces the number of possible models and improves the statistical robustness of the identified regulator sets.

  • Slide 75: Inferring Gene Regulation Models (Cont'd)

    [Figure: 358 samples from combined data sets for S. cerevisiae → remove non-informative genes, discretization → 3,622 genes in total; a candidate regulator set (456 genes) → learning with structure constraints → a 2-level network in which regulators R1, R2, R3, ..., RM point to targets; Score(g; Pa_g) = I(g; Pa_g). (Pe'er et al., 2002)]

  • Slide 76: Summary

    Bayesian networks provide an efficient and effective framework for organizing a body of knowledge by encoding the probabilistic relationships among variables of interest.
    Graph theory + probability theory: a DAG + local probability distributions.
    Conditional independence and conditional probability are the keystones.
    A compact way to express complex systems through simpler probabilistic modules, and thus a natural framework for dealing with complexity and uncertainty.
    Two problems in learning Bayesian networks from data:
      Parameter estimation: MLE, MAP, Bayesian estimation.
      Structural learning: tree-structured networks; heuristic search for general Bayesian networks.

  • Slide 77: Summary (Cont'd)

    Important issues not covered in this tutorial:
      Probabilistic inference in general Bayesian networks: exact and approximate inference.
      Incomplete data (missing data, hidden variables), for both known and unknown structure; the Expectation-Maximization algorithm can be employed, as in latent variable models.
      Bayesian model averaging over structures as well as parameters.
      Dynamic Bayesian networks for temporal data.
    Many applications: text and web mining, medical diagnosis, intelligent agent systems, bioinformatics.

  • Slide 78: References: Bayesian Networks (Papers & Books)

    Chow, C. and Liu, C., "Approximating discrete probability distributions with dependence trees", IEEE Transactions on Information Theory, 14, pp. 462-467, 1968.
    Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988.
    Neapolitan, R. E., Probabilistic Reasoning in Expert Systems, 1990.
    Charniak, E., "Bayesian networks without tears", AI Magazine, 12(4), pp. 50-63, 1991.
    Heckerman, D., "Learning Bayesian networks: the combination of knowledge and statistical data", Machine Learning, 20, pp. 197-243, 1995.
    Jensen, F. V., An Introduction to Bayesian Networks, Springer-Verlag, 1996.
    Friedman, N., Geiger, D., and Goldszmidt, M., "Bayesian network classifiers", Machine Learning, 29(2), pp. 131-163, 1997.
    Frey, B. J., Graphical Models for Machine Learning and Digital Communication, MIT Press, 1998.
    Jordan, M. I. (ed.), Learning in Graphical Models, Kluwer Academic Publishers, 1998.
    Friedman, N. and Goldszmidt, M., "Learning Bayesian networks with local structure", in Learning in Graphical Models (ed. Jordan, M. I.), pp. 421-460, MIT Press, 1999.
    Heckerman, D., "A tutorial on learning with Bayesian networks", in Learning in Graphical Models (ed. Jordan, M. I.), pp. 301-354, MIT Press, 1999.
    Pearl, J. and Russell, S., "Bayesian networks", UCLA Cognitive Systems Laboratory, Technical Report R-277, November 2000.
    Spirtes, P., Glymour, C., and Scheines, R., Causation, Prediction, and Search, 2nd edition, MIT Press, 2000.
    Jensen, F. V., Bayesian Networks and Decision Graphs, Springer, 2001.
    Pearl, J., "Bayesian networks, causal inference and knowledge discovery", UCLA Cognitive Systems Laboratory, Technical Report R-281, March 2001.
    Korb, K. B. and Nicholson, A. E., Bayesian Artificial Intelligence, CRC Press, 2004.

  • Slide 79: Graphical Models and Bayesian Networks (Tutorials & Lectures)

    Friedman, N. and Koller, D., Learning Bayesian Networks from Data (http://www.cs.huji.ac.il/~nir/Nips01-Tutorial/)
    Murphy, K., A Brief Introduction to Graphical Models and Bayesian Networks (http://www.cs.ubc.ca/~murphyk/Bayes/bayes.html)
    Ghahramani, Z., Probabilistic Models for Unsupervised Learning (http://www.gatsby.ucl.ac.uk/~zoubin/NIPStutorial.html)
    Ghahramani, Z., Bayesian Methods for Machine Learning (http://www.gatsby.ucl.ac.uk/~zoubin/ICML04-tutorial.html)
    Moser, A., Probabilistic Independence Networks (I-II) (http://www.eecis.udel.edu/~bloch/eleg867/notes/PIN_I_rev2/index.htm, http://www.eecis.udel.edu/~bloch/eleg867/notes/PIN_II_rev2/index.htm)
    Poet, M., Special Topics: Belief Networks (http://www.stat.duke.edu/courses/Spring99/sta294/)

  • Slide 80: Bayesian Networks with Applications to Bioinformatics

    Murphy, K. and Mian, S., "Modeling gene expression data using dynamic Bayesian networks", Technical report, Computer Science Division, University of California, Berkeley, CA, 1999.
    Friedman, N., Linial, M., Nachman, I., and Pe'er, D., "Using Bayesian networks to analyze expression data", Journal of Computational Biology, 7(3/4), pp. 601-620, 2000.
    Cai, D., Delcher, A., Kao, B., and Kasif, S., "Modeling splice sites with Bayes networks", Bioinformatics, 16(2), pp. 152-158, 2000.
    Pe'er, D., Regev, A., Elidan, G., and Friedman, N., "Inferring subnetworks from perturbed expression profiles", Bioinformatics, 17(suppl. 1), pp. S215-S224, 2001.
    Pe'er, D., Regev, A., and Tanay, A., "Minreg: inferring an active regulator set", Bioinformatics, 18(suppl. 1), pp. 258-267, 2002.
    Hartemink, A. J., Gifford, D. K., Jaakkola, T. S., and Young, R. A., "Combining location and expression data for principled discovery of genetic regulatory network models", Pacific Symposium on Biocomputing, 7, pp. 437-449, 2002.
    Imoto, S., Kim, S., Goto, T., Aburatani, S., Tashiro, K., Kuhara, S., and Miyano, S., "Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network", Journal of Bioinformatics and Computational Biology, 1(2), pp. 231-252, 2003.
    Jansen, R., et al., "A Bayesian networks approach for predicting protein-protein interactions from genomic data", Science, 302, pp. 449-453, 2003.
    Troyanskaya, O. G., Dolinski, K., Owen, A. B., Altman, R. B., and Botstein, D., "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)", PNAS, 100(14), pp. 8348-8353, 2003.
    Beer, M. A. and Tavazoie, S., "Predicting gene expression from sequence", Cell, 117(2), pp. 185-198, 2004.
    Friedman, N., "Inferring cellular networks using probabilistic graphical models", Science, 303(5659), pp. 799-805, 2004.
    A comprehensive list is available at http://zlab.bu.edu/kasif/bayes-net.html

  • Slide 81: Bayesian Network Software Packages

    Bayes Net Toolbox (Kevin Murphy): http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
      A variety of algorithms for learning and inference in graphical models (written in MATLAB).
    WEKA: http://www.cs.waikato.ac.nz/~ml/weka/
      Bayesian network learning and classification modules are included in a collection of machine learning algorithms (written in Java).
    Detailed lists and comparisons: http://www.cs.ubc.ca/~murphyk/Bayes/bnsoft.html
    Bayesian network list in Google: http://directory.google.com/Top/Computers/Artificial_Intelligence/Belief_Networks/Software/
    Korb, K. B. and Nicholson, A. E., Bayesian Artificial Intelligence, CRC Press, 2004: http://www.csse.monash.edu.au/bai/book/appendix_b.pdf
