
1

Lecture 15
Bayesian Networks in Computer Vision

Gary Bradski
Sebastian Thrun

http://robots.stanford.edu/cs223b/index.html


2

What is a Bayesian Network?

Nodes are (random) variables. A conditional probability distribution quantifies the effects of the parents on each node. The graph is directed and acyclic.

A Bayesian network is a factored joint distribution and/or a causal diagram.

[Figure: a five-node network with root W, children C and A, F a child of C, and R a child of C and A. The arrows are causal links; the links encode dependencies. CPDs: P(W), P(C|W), P(A|W), P(F|C), P(R|C,A).]

A joint distribution, here p(W,C,A,R,F), is everything we can know about the problem, but it grows exponentially: for five binary variables, 2^5 - 1 = 31 free parameters. Factoring the distribution in a Bayes net decreases the number of parameters, here from 31 to 11 (probabilities sum to one, which decreases the number of parameters to be specified).
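To make the count concrete, here is a minimal sketch in Python of the 31-vs-11 calculation, assuming all five variables are binary as on the slide:

```python
# Parameter counting for the five-variable net above (W, C, A, F, R binary).
# Full joint: 2^5 entries, minus 1 because probabilities sum to one.
full_joint_params = 2**5 - 1                               # 31

# Factored: a node with k binary parents needs 2^k free parameters,
# since each row of its conditional probability table sums to one.
num_parents = {"W": 0, "C": 1, "A": 1, "F": 1, "R": 2}     # R has parents C and A
factored_params = sum(2**k for k in num_parents.values())  # 1+2+2+2+4 = 11

print(full_joint_params, factored_params)                  # 31 11
```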

3

Causality and Bayesian Nets

[Figure: a power-supply circuit diagram: mains, transformer, diodes, capacitor, ammeter, battery; some nodes observed, some unobserved.]

One can also think of Bayesian Networks as a “Circuit Diagram” of Probability Models

• The links indicate causal effect, not direction of information flow.
• Just as we can predict the effects of changes on the circuit diagram, we can predict the consequences of “operating” on our probability model diagram.

4

Inference

• Once we have a model, we need to make it consistent by “diffusing” the distributions around until they all agree with one another.

• Central algorithm for this:

Belief Propagation

5

Specifically:


Belief Propagation

Messages pass along the edges. Going down an arrow, sum out the parent: the “causal” message. Going up an arrow, apply Bayes’ law: the “diagnostic” message.

Bayes’ law:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

* some figures from: Peter Lucas’s BN lecture course

6

Belief Propagation

Bayes’ law:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Diagnostic message, passed against the arrow (an application of Bayes’ law):

$$\lambda_{V_j}(V_i) = \sum_{V_j} \lambda(V_j)\, P(V_j \mid V_i)$$

Causal message, passed with the arrow (sum out the parent):

$$\pi(V_j) = \sum_{V_i} P(V_j \mid V_i)\, \pi_{V_j}(V_i)$$

* some figures from: Peter Lucas’s BN lecture course
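A minimal numeric sketch of the two message types on a single edge A → B; the CPT values below are made-up illustrations, not values from the lecture:

```python
import numpy as np

P_A = np.array([0.7, 0.3])            # prior P(A); invented numbers
P_B_given_A = np.array([[0.9, 0.1],   # rows index A, columns index B
                        [0.2, 0.8]])

# Causal (pi) message, with the arrow: sum out the parent to predict B.
pi_B = P_A @ P_B_given_A              # P(B) = sum_a P(a) P(B|a) -> [0.69, 0.31]

# Diagnostic (lambda) message, against the arrow: observe B=1, apply Bayes' law.
likelihood = P_B_given_A[:, 1]                 # P(B=1 | A)
P_A_given_B1 = likelihood * P_A / pi_B[1]      # posterior P(A | B=1)

print(pi_B, P_A_given_B1)             # [0.69 0.31] [0.226 0.774] (approx.)
```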

7

Inference in general graphs

• Belief propagation is only guaranteed to be correct for trees.
• A general graph should be converted to a junction tree by clustering nodes.
• Computational complexity is exponential in the size of the resulting clusters (NP-hard).

8

Junction tree: BN → Junction Tree

Algorithm for turning a Bayesian network with loops into a junction tree (Lauritzen 96):

1. “Moralize” the graph by connecting parents.
2. Drop the arrows.
3. Triangulate (connect nodes if a loop of more than 3 exists).
4. Put in intersection variables.

[Figure: a six-node graph X1..X6, shown through stages (1)-(3) of moralization, arrow dropping, and triangulation, and the resulting junction tree. Image from Sam Roweis.]
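A sketch of steps 1-2 (moralization) in Python with networkx; since the figure did not survive extraction, the edge set for X1..X6 below is an assumption chosen only to make the example run:

```python
import itertools
import networkx as nx   # third-party graph library; assumed available

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Marry all pairs of parents of each node, then drop arrow directions."""
    g = dag.to_undirected()
    for node in dag:
        for p1, p2 in itertools.combinations(list(dag.predecessors(node)), 2):
            g.add_edge(p1, p2)   # connect co-parents ("moralize")
    return g

# Assumed six-node example; X4 and X5 are co-parents of X6 and get "married".
dag = nx.DiGraph([("X1", "X2"), ("X1", "X3"), ("X2", "X4"),
                  ("X3", "X5"), ("X4", "X6"), ("X5", "X6")])
print(sorted(moralize(dag).edges()))   # includes the moral edge ('X4', 'X5')
```

Triangulation and clique extraction (steps 3-4) would follow, e.g. via a node-elimination ordering.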

9

Global message passing: Two pass

[Figure: a junction tree traversed from a selected root clique: Collect inward to the root, then Distribute outward. Figure from P. Green.]

• Select one clique as the root

• Two pass message passing: first collect evidence, then distribute evidence.

10

Junction Tree Inference

[Figure: a worked junction tree inference example. Image from Cecil Huang.]

11

Global message passing: Parallel, distributed version

[Figure: a four-node tree X1..X4; Stage 1 and Stage 2 of parallel message passing.]

• All nodes can send messages out simultaneously, once they have received the messages from all their parents.
• Parallel processing (topology-level parallelism).

12

Details

Junction Tree Algorithm

13

Junction Tree Properties

An undirected graph whose vertices (clusters) are sets of variables, with three properties:

1. Singly connected property (only one path between any two clusters).
2. Potential property (all variables are represented).
3. Running intersection property (a variable in 2 nodes implies that all nodes on the path between them have the variable).

$$p(a,b,c,d,e) = \frac{1}{Z}\,\psi(a,b,c)\,\psi(c,d)\,\psi(c,e)$$

[Figure: a graph on {a, b, c, d, e}; its moralized, triangulated version; and the junction tree with cliques {a,b,c}, {c,d}, {c,e} joined through separator {c}. A collect and distribute pass is necessary for inference.]
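To illustrate, the factorization above can be checked numerically; the potentials here are random nonnegative tables, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
psi_abc = rng.random((2, 2, 2))   # psi(a, b, c), all variables binary
psi_cd = rng.random((2, 2))       # psi(c, d)
psi_ce = rng.random((2, 2))       # psi(c, e)

# p(a,b,c,d,e) = (1/Z) psi(a,b,c) psi(c,d) psi(c,e); axes are a,b,c,d,e.
joint = np.einsum("abc,cd,ce->abcde", psi_abc, psi_cd, psi_ce)
joint /= joint.sum()              # divide by Z so the table sums to one

print(joint.sum(axis=(0, 1, 3, 4)))   # the marginal p(c) on the separator
```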

14

Junction Tree 1

Image from Sam Roweis

15

Junction Tree 2

Image from Sam Roweis

16

Message Passing in a Junction Tree

• Potential
  – The space of U, Ω(U) (U a subset of the set of all nodes/vertices V), is the Cartesian product of the state sets of the nodes of U.
  – A discrete potential on U is a mapping from Ω(U) to the non-negative real numbers.
  – Each clique and separator in the junction tree has a potential (in fact, the marginalized joint distribution on the nodes in the clique/separator).

• Propagation/message passing between two adjacent cliques C1, C2 (S0 is their separator):
  – Marginalize C1’s potential to get a new potential for S0.
  – Update C2’s potential.
  – Update S0’s potential to its new potential.

$$\phi^{*}_{S_0} = \sum_{C_1 \setminus S_0} \phi_{C_1}$$

$$\phi^{*}_{C_2} = \phi_{C_2}\,\frac{\phi^{*}_{S_0}}{\phi_{S_0}}$$
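A minimal numeric sketch of one such update, assuming (for illustration only) binary variables with C1 = {a, c}, C2 = {c, e}, and separator S0 = {c}:

```python
import numpy as np

rng = np.random.default_rng(1)
phi_C1 = rng.random((2, 2))   # potential on C1 = (a, c)
phi_C2 = rng.random((2, 2))   # potential on C2 = (c, e)
phi_S0 = np.ones(2)           # separator potential on (c), initially uniform

phi_S0_new = phi_C1.sum(axis=0)            # marginalize C1 onto S0 (sum out a)
phi_C2 *= (phi_S0_new / phi_S0)[:, None]   # rescale C2 by the update ratio
phi_S0 = phi_S0_new                        # S0 takes its new potential
```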

17

Message Passing in General

• The Bayes net forms a tree:
  – Pearl’s algorithm is message passing first out from a given node and then back in.
• Not a tree (has loops):
  – Turn loops into cliques until the net is a tree, then use Pearl’s algorithm.
• The cliques turn out to be too big:
  – Exact computation is exponential in the size of the largest cliques.
  – Use approximation algorithms (there are many).

18

Towards Decisions

19

From Bayes’ Net to Decision/Influence Network

Start out with a causal Bayesian network; in this case, possible causes of leaf loss in an apple tree. We want to know what to do about this.

We duplicate the network because we are going to add an intervention: treating sickness. The intervention will cost us, but might help with our utility: making a profit when we harvest.

Given the cost, we can now infer the optimal treat/no-treat policy.

20

Influence Example

Replicate the cold net and add decision and cost/utility nodes.

• No fever means a cold is less likely => treat.
• No fever, no runny nose => healthy, don’t treat.
• No fever, runny nose => allergy => treat.
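The slides give no numbers, but a toy expected-utility calculation shows how the optimal policy falls out; every quantity below is invented for illustration:

```python
# Hypothetical posterior and utilities; none of these values come from the slides.
p_sick = 0.3                   # posterior that the tree is sick, from the BN
harvest_value = 100.0          # utility of a good harvest
treatment_cost = 10.0
p_harvest = {                  # P(good harvest | sick, treated)
    (True, True): 0.8, (True, False): 0.3,
    (False, True): 0.95, (False, False): 0.95,
}

def expected_utility(treat: bool) -> float:
    eu = -treatment_cost if treat else 0.0
    for sick, p in ((True, p_sick), (False, 1.0 - p_sick)):
        eu += p * p_harvest[(sick, treat)] * harvest_value
    return eu

print(expected_utility(True), expected_utility(False))   # 80.5 75.5 -> treat
```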

21

General

22

Probabilistic graphical models

Probabilistic models include graphical models, which divide into two families:

• Directed graphical models (Bayesian belief nets): alarm network, state-space models, HMMs, naïve Bayes classifier, PCA/ICA.
• Undirected graphical models (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models.

23

Graphical Models Taxonomy

24

Typical forms for the Conditional Probability Distributions (CPDs) at graph nodes

• For discrete-state nodes:
  – Tabular (CPT)
  – Decision tree
  – Deterministic CPD
  – SoftMax (logistic/sigmoid)
  – Noisy-OR (see the sketch below)
  – MLP
  – SVM?
• For continuous-state nodes:
  – Gaussian
  – Mixture of Gaussians
  – Linear Gaussian
  – Conditional Gaussian
  – Regression tree
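As a sketch of one compact discrete CPD, here is a noisy-OR, where each active parent independently fails to trigger the effect with some inhibition probability (the numbers are made up):

```python
def noisy_or(parent_states, q, leak=0.01):
    """P(effect = 1 | parents): each active parent i fails with probability q[i];
    the leak term lets the effect occur even with no active parents."""
    p_fail = 1.0 - leak
    for on, qi in zip(parent_states, q):
        if on:
            p_fail *= qi                 # parent i fails to cause the effect
    return 1.0 - p_fail

q = [0.2, 0.6]                                   # hypothetical inhibition probs
print(noisy_or([1, 0], q), noisy_or([1, 1], q))  # 0.802 0.8812
```

A full CPT over n binary parents needs 2^n rows; a noisy-OR needs only n inhibition parameters plus a leak.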

25

We can’t always compute exact inference. We then use Approximate Inference.

Avi’s categorization of approximate inference algorithms:

• Approximate computation on an exact model:
  – Sampling methods: importance sampling, MCMC (see the sketch below)
  – Search methods: beam search, A* search
  – Loopy propagation
  – Variational methods: mean field, expectation propagation
• Exact computation on an approximate model:
  – Minibuckets
  – Boyen-Koller method for DBNs
  – Projection
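A minimal sketch of the sampling branch: likelihood weighting (a simple form of importance sampling) on a made-up two-node net A → B, estimating P(A=1 | B=1):

```python
import random

P_A1 = 0.3                           # hypothetical prior P(A=1)
P_B1_given_A = {0: 0.1, 1: 0.8}      # hypothetical CPT P(B=1 | A)

def estimate_posterior(n=100_000):
    num = den = 0.0
    for _ in range(n):
        a = 1 if random.random() < P_A1 else 0   # sample the unobserved node
        w = P_B1_given_A[a]                      # weight by the evidence B=1
        num += w * a
        den += w
    return num / den

print(estimate_posterior())   # exact answer: 0.24 / 0.31 ~= 0.774
```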

26

Software

Libraries

27

Bayesian Net Software (Append A)

| Name | Authors | Src | API | Exec | Free | Inference | Comments |
|---|---|---|---|---|---|---|---|
| Bassist | U. Helsinki | C++ | Y | U | 0 | MH | Generates C++ for MCMC. |
| BayesiaLab | Bayesia Ltd | N | N | - | $ | jtree | Supervised and unsupervised learning, clustering, analysis toolbox, adaptive questionnaires, dynamic models. |
| BNT | Murphy (U.C. Berkeley) | Matlab/C | Y | WUM | 0 | Many | Also handles dynamic models, like HMMs and Kalman filters. |
| BNJ | Hsu (Kansas) | Java | - | - | 0 | jtree, IS | - |
| BUGS | MRC/Imperial College | N | N | WU | 0 | Gibbs | - |
| Deal | Bottcher et al. | R | - | - | 0 | None | Structure learning. |
| GDAGsim | Wilkinson (U. Newcastle) | C | Y | WUM | 0 | Exact | Bayesian analysis of large linear Gaussian directed models. |
| Genie | U. Pittsburgh | N | WU | WU | 0 | Jtree | - |
| GMRFsim | Rue (U. Trondheim) | C | Y | WUM | 0 | MCMC | Bayesian analysis of large linear Gaussian undirected models. |
| GMTk | Bilmes (UW), Zweig (IBM) | N | Y | U | 0 | Jtree | Designed for speech recognition. |
| Grappa | Green (Bristol) | R | - | - | 0 | Jtree | - |
| Hugin Expert | Hugin | N | Y | W | $ | Jtree | - |
| Hydra | Warnes (U. Wash.) | Java | - | - | 0 | MCMC | - |
| Java Bayes | Cozman (CMU) | Java | Y | WUM | 0 | Varelim, jtree | - |
| MIM | HyperGraph Software | N | N | W | $ | Jtree | Up to 52 variables. |
| MSBNx | Microsoft | N | Y | W | 0 | Jtree | - |
| Netica | Norsys | N | WUM | W | $ | jtree | - |
| PMT | Pavlovic (BU) | Matlab/C | - | - | 0 | special purpose | - |
| PNL | Eruhimov (Intel) | C++ | - | - | 0 | Many | A C++ version of BNT; will be released 12/03. |
| Pulcinella | IRIDIA | Lisp | Y | WUM | 0 | ? | Uses valuation systems for non-probabilistic calculi. |
| RISO | Dodier (U. Colorado) | Java | Y | WUM | 0 | Polytree | Distributed implementation. |
| Tetrad | CMU | N | N | WU | 0 | None | - |
| UnBBayes | ? | Java | - | - | 0 | jtree | K2 for structure learning. |
| Vibes | Winn & Bishop (U. Cambridge) | Java | Y | WU | 0 | Variational | Not yet available. |
| WinMine | Microsoft | N | N | W | 0 | None | Learns BN or dependency net structure. |
| XBAIES 2.0 | Cowell (City U.) | N | N | W | 0 | Jtree | - |

28

Compare All BayesNet Software

[Slides 28-31: screenshots of the full Bayes net software comparison table and its key. Append A]

32

BN Researchers

MAJOR RESEARCHERS

• Microsoft: http://www.research.microsoft.com/research/dtg/ Heckerman & Chickering are big there, currently pushing uses of dependency networks.
• Prof. Russell (Berkeley): http://http.cs.berkeley.edu/~russell/ Wants a more expressive probabilistic language. Currently pushing the Center for Intelligent Systems at Berkeley, http://www.eecs.berkeley.edu/CIS, which brings together a wide range of luminaries.
• Prof. Jordan (Berkeley): http://www.cs.berkeley.edu/~jordan/ Writing a book. Data retrieval, structure learning, clustering, variational methods, all of it.
• Yair Weiss (Berkeley => Hebrew U): http://www.cs.berkeley.edu/~yweiss/ Computationally tractable approximation. Vision; now at Hebrew U.
• Prof. Koller (Stanford): http://robotics.stanford.edu/~koller/courses.html Writing a book. Probabilistic relational models (PRMs), more expressive languages, all of it.
• Prof. Frey (Waterloo): http://www.cs.toronto.edu/~frey/ Vision models, machine learning reformulations.
• Prof. Pearl (UCLA): http://bayes.cs.ucla.edu/jp_home.html Founder. Causality theory.
• Bill Freeman (MIT, was MERL): http://www.ai.mit.edu/people/wtf/ Low-level vision, learning theory; now at MIT.
• Peter Spirtes (CMU, Tetrad project): http://hss.cmu.edu/HTML/departments/philosophy/people/directory/Peter_Spirtes.html
• Kevin Murphy (MIT, BN Toolkit): http://www.ai.mit.edu/~murphyk/ Toolboxes (BNT), computational speedups, tutorials.
• Jonathan Yedidia (MERL): http://www.merl.com/people/yedidia/ Learning theory.
• Pietro Perona (CalTech): http://www.erc.caltech.edu/ Vision. The Center for NeuroMorphic Information, http://www.erc.caltech.edu/, brings together machine learning, BN, vision, design, etc.
• Ron Parr (Duke University): http://www.cs.duke.edu/~parr/ Game theory, reinforcement, multi-agent.
• Nir Friedman (Hebrew U): http://www.cs.huji.ac.il/~nirf/ Computational biology, efficient inference.
• Avi Pfeffer (Harvard): http://www.eecs.harvard.edu/~avi/ Richer probabilistic expressibility, intelligent systems.
• Zoubin Ghahramani (Gatsby Institute, London): http://www.gatsby.ucl.ac.uk/~zoubin Variational Bayes.
• Finn Jensen (Hugin, Denmark): http://www.cs.auc.dk/~fvj Classical (expert-system style) BNs.
• Uffe Kjaerulff (Hugin, Denmark): http://www.cs.auc.dk/~uk Ditto.
• Eric Horvitz (Microsoft): http://research.microsoft.com/~horvitz/ Decision making, user interfaces.
• Tommi Jaakkola (MIT): http://www.ai.mit.edu/people/tommi/tommi.html Theory, structure learning from bio data.
• Ross Shachter (Stanford): http://www.stanford.edu/dept/MSandE/faculty/shachter/ Influence diagrams.
• David Spiegelhalter (Univ. College London): http://www.mrc-bsu.cam.ac.uk/BSUsite/AboutUs/People/davids.shtml Bayesian and medical BNs.
• Steffen Lauritzen (Europe): http://www.math.auc.dk/~steffen/ Statistical theory.
• Phil Dawid (Univ. College London): http://www.ucl.ac.uk/~ucak06d/ Statistical theory.
• Kathy Laskey (George Mason): http://www.ucl.ac.uk/~ucak06d/ Object-oriented BNs, military applications.
• Jeff Bilmes (U Washington): http://www.ee.washington.edu/faculty/bilmes/ DBNs for speech.
• Hagai Attias (Microsoft): http://research.microsoft.com/users/hagaia/ Variational and sampling methods for (acoustic) signal processing.

World-wide list of Bayesians (not just networks): http://bayes.stat.washington.edu/bayes_people.html

CONFERENCES
• UAI: http://robotics.stanford.edu/~uai01/
• NIPS: http://www.cs.cmu.edu/Groups/NIPS/

Append C

33

PNL vs. Other Graphical Models Libraries

| Name | Author | Src | Cost (.edu) | Cost (.com) | GUI | Un/dir | Utility | DBN | Gauss | Inference | Param learn | Struct learn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PNL | Intel | C++ | 0 | 0/$ | - | U,D* | + | + |  | Jtree, BP, Gibbs | + | + |
| BNT | Murphy | Matlab | 0 | 0 | - | D | + | + | + | Jtree, BP, Gibbs, varelim | + | + |
| GMTk | Bilmes | C++ | 0 | 0 | - | D | - | + | - | Jtree | + | + |
| Hugin | Hugin | - | $ | $ | + | D | + | - | + | Jtree | + | - |
| BUGS | MRC | - | 0 | ∞ | + | D | - | - | + | Gibbs | + | - |
| Genie | U. Pitt. | - | 0 | ∞ | + | D | + | - | - | Jtree | - | - |
| MSBN | Microsoft | - | 0 | $ | + | D | + | - | - | Jtree | - | - |
| WinMine | Microsoft | - | 0 | $ | + | U,D | - | - | - | - | + | + |
| JavaBayes | Cozman | Java | 0 | ∞ | + | D | - | - | - | Varelim | - | - |

Present library: PNL. The Intel library is much more comprehensive.

Append C

34

Examples of Use

Applications

35

Face Modeling and Recognition Using Bayesian Networks

Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski

System:

• Face feature finder (separate).
• Learn a Gabor filter “jet” at each point.
• Add a pose switching variable.

36

Face Modeling and Recognition Using Bayesian Networks

Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski

Results:

• BNPFR: Bayes net with pose
• BNFR: Bayes net without pose
• EHMM: embedded HMM
• EGM: Gabor jets

[Figure: recognition results for the four methods, including the pose case.]

37

Searching over all possible joint configurations J is computationally impractical. Therefore, segmentation takes place in two stages. First, we segment the head and torso, and determine the position of the neck. Then, we jointly segment the upper arms, forearms and hands, and determine the position of the remaining joints.

The Segmentation Problem

Segmentation seeks the MAP assignment of states and joints given the observations O:

$$(\hat{Q}, \hat{J}) = \arg\max_{Q,J} P(Q, J \mid O),$$

maximized as a product of per-site terms $P(q_{ij} \mid \cdot)$ in two steps: Step I over the head & torso terms, Step II over the arm terms, where

$Q^{A}, Q^{HT}$: state assignments for the arm and head & torso regions;
$J^{A}, J^{HT}$: joints for the arm and head & torso components.

38

Upper Body Model

[Figure: the upper body model. Components C: Head (H), Torso (T), Left/Right Upper Arm (Ul, Ur), Left/Right Forearm (Fl, Fr), Left/Right Hand (Hl, Hr). Joints J: Neck (N), Left/Right Shoulder (Sl, Sr), Left/Right Elbow (El, Er), Left/Right Wrist (Wl, Wr). Anthropological measurements A: head size (Shd), torso size (St), upper arm size (Sa), forearm size (Sf), hand size (Sh). Observations O = {Oij}. The model defines P({Oij}) as a product over sites ij of observation terms conditioned on component assignments qij, together with CPDs P(qij | J, A), P(J | A), and P(A).]

39

Body Tracking Results

40

Audio-Visual Continuous Speech Recognition: The Overall System

[Figure: the audio-video signal splits into two streams. Audio: acoustic features (MFCC). Video: face detection, mouth detection, mouth tracking, visual features. Both streams feed the AV model for training (Train) and recognition (Reco).]

41

Speaker-Independent AVCSR

A coupled HMM for audio-visual speech recognition:

• Audio observations of size 13, modeled with 3 states, 32 mixtures/state, diagonal covariance matrices (39 English phonemes).
• Visual observations of size 13, modeled with 3 states, 12 mixtures/state, diagonal covariance matrices (13 English visemes).

42

AVCSR Experimental Results

• WER obtained on the XM2VTS database: 300 speakers, 10-digit enumeration sentences.

The system improves the recognition rate of acoustic-only speech recognition by over 55% at 0 dB SNR!

43

MRFs for Hyper-Resolution

Bill Freeman (MIT AI Lab) created a simple model of early visual processing. He presented blurred images and trained on the sharp originals, then tested on new images.

[Figure: comparison of Input, Cubic Spline, Bayesian Net, and Actual images.]

44

MRFs for Shape from Shading

The illumination, which changes with each frame, is factored from the reflectance, which stays the same. This model is then used to insert graphics with proper lighting.

[Figure: frames over time; illumination factored from reflectance; graphics inserted with proper lighting.]

45

[Slides 45-48: image examples from Blei, Jordan & Malik.]

49

Example of learned models (from Frey)

50

Example of learned models (from Frey)
