Reasoning with Deterministic and Probabilistic graphical models …frossi/rina2.pdf · 2012. 5....

Preview:

Citation preview

Reasoning with Deterministic and Probabilistic graphical models

Class 2: Inference in Constraint Networks

Rina Dechter

class1 Padova

Road Map

n Graphical models

n Constraint networks Model n Inference

n Search

n Probabilistic Networks

class1 Padova

A B red green red yellow green red green yellow yellow green yellow red

Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (red, green, blue) Constraints:

C

A

B

D E

F

G

A Constraint Networks

A

B

E

G

D F

C

Constraint graph

class1 Padova

etc. ,ED D, AB,A ≠≠≠

Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (e.g., red, green, yellow) Constraints:

Constraint Satisfaction Tasks

Are the constraints consistent? Find a solution, find all solutions Count all solutions Find a good (optimal) solution

class1 Padova

etc. ,ED D, AB,A ≠≠≠

Constraint Network

n A constraint network is: R=(X,D,C)

n X variables

n D domain

n C constraints

n R expresses allowed tuples over scopes n A solution is an assignment to all variables that satisfies all constraints (join of all relations). n Tasks: consistency?, one or all solutions, counting, optimization

class1 Padova

},...,{ 1 nXXX =},...{},,...,{ 11 kin vvDDDD ==

),(},...{ 1

iii

t

RSCCCC

=

=

Crossword puzzle

n Variables: x1, …, x13 n Domains: letters n Constraints: words from

{HOSES, LASER, SHEET, SNAIL, STEER, ALSO, EARN, HIKE, IRON, SAME, EAT, LET, RUN, SUN, TEN, YES, BE, IT, NO, US}

class1 Padova

The Queen problem

The network has four variables, all with domains Di = {1, 2, 3, 4}. (a) The labeled chess board. (b) The constraints between variables.

class1 Padova

Constraint’s representations

n Relation: allowed tuples n Algebraic expression: n Propositional formula: n Semantics: by a relation

class1 Padova

YXYX ≠≤+ ,102

cba ¬→∨ )(

312231ZYX

Partial solutions

Not all consistent instantiations are part of a solution: (a) A consistent instantiation that is not part of a solution. (b) The placement of the queens corresponding to the solution (2, 4, 1, 3). (c) The placement of the queens corresponding to the solution (3, 1, 4, 2).

class1 Padova

Constraint Graphs (primal)

Queen problem

class1 Padova

Constraint Graphs: Primal, Dual and Hypergraphs

A (primal) constraint graph: a node per variable arcs connect constrained variables. A dual constraint graph: a node per constraint’s scope, an arc connect nodes sharing variables =hypergraph

class1 Padova

Graph Concepts Reviews: Primal, Hyper and Dual Graphs

n A hypergraph n Dual graphs n A primal graph

class1 Padova

Propositional Satisfiability

ϕ = {(¬C), (A v B v C), (¬A v B v E), (¬B v C v D)}.

class1 Padova

Constraint graphs of 3 instances of the Radio frequency assignment problem in CELAR’s benchmark

class1 Padova

Operations With Relations

n Intersection

n Union

n Difference

n Selection

n Projection

n Join

n Composition

class1 Padova

n Join : n Logical AND:

Local Functions

Combination

class1 Padova

=

gf

gf ∧

= ∧

Global View of the Problem

C1 C2 Global View

What about counting?

Number of true tuples Sum over all the tuples

true is 1 false is 0 logical AND?

TASK

class1 Padova

=

Road Map

n Graphical models

n Constraint networks Model n Inference

n Variable elimination: n Tree-clustering

n Constraint propagation

n Search

n Probabilistic Networks

class1 Padova

Bucket Elimination Adaptive Consistency (Dechter & Pearl, 1987)

Bucket E: E ≠ D, E ≠ C Bucket D: D ≠ A Bucket C: C ≠ B Bucket B: B ≠ A Bucket A:

A ≠ C

contradiction

=

D = C

B = A

= ≠

class1 Padova

widthinduced -*

*

w ))exp(w O(n :Complexity

The Idea of Elimination

3 value assignment

D

B

C RDBC

eliminating E

class1 Padova

project and join E variableEliminate

=∏ ECDBC EBEDDBC RRRR

A

E D C B

E

A

D C B

|| RDBE , || RE

|| RDB

|| RDCB || RACB || RAB

RA

RCBE

Bucket Elimination Adaptive Consistency (Dechter & Pearl, 1987)

class1 Padova

d ordering along widthinduced -(d) ,

*

*

w(d)))exp(w O(n :Complexity

≠≠

E

D

A

C

B

}2,1{

}2,1{}2,1{

}2,1{ }3,2,1{

:)(AB :)(BC :)(AD :)(

BE C,E D,E :)(

ABucketBBucketCBucketDBucketEBucket

≠≠≠

≠≠≠

:)(EB :)(

EC , BC :)(ED :)(

BA D,A :)(

EBucketBBucketCBucketDBucketABucket

≠≠≠

≠≠≠

Adaptive Consistency, Bucket-elimination

Initialize: partition constraints into For i=n down to 1 along d // process in reverse order

for all relations do // join all relations and “project-out”

If is not empty, add it to where k is the largest variable index in Else problem is unsatisfiable Return the set of all relations (old and new) in the buckets

class1 Padova

nbucketbucket ,...,1

im bucketRR ∈,...,1

) ()( jX jnew RRi

∏−←

iX

newR ,, ikbucketk <newR

Properties of bucket-elimination (adaptive consistency)

n Adaptive consistency generates a constraint network that is backtrack-free (can be solved without deadends). n The time and space complexity of adaptive consistency along ordering d is exponential in w* . n Therefore, problems having bounded induced width are tractable (solved in polynomial time).

n trees ( w*=1), n series-parallel networks ( w*=2 ), n and in general k-trees ( w*=k).

class1 Padova

Solving Trees (Mackworth and Freuder, 1985)

Adaptive consistency is linear for trees and equivalent to enforcing directional arc-consistency (recording only unary constraints)

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

1,2,3 1,2,3

1,2,3

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

1,2,3 1,2,3

1,2,3

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

1,2 1,2

1,2,3

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

1,2 1,2

1,2,3

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

1,2 1,2

1

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

1,2,3 1,2,3 1,2,3 1,2,3

2 2

1

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

3 3 3 3

2 2

1

<

<

Y

X

Z

T S R U

class1 Padova

Tree Solving is Easy

3 3 3 3

2 2

1

<

<

Y

X

Z

T S R U

Constraint propagation Solves trees in linear time

class1 Padova

Road Map

n Graphical models

n Constraint networks Model n Inference

n Variable elimination: n Tree-clustering

n Constraint propagation

n Search

n Probabilistic Networks

class1 Padova

Tree Decomposition

Each function in a cluster Satisfy running intersection property

G

E

F

C D

B

A A B C

R(a), R(b,a), R(c,a,b)

B C D F R(d,b), R(f,c,d)

B E F R(e,b,f)

E F G R(g,e,f)

2

4

1

3

EF

BC

BF

R(a), R(b,a), R(c,a,b) R(d,b), R(f,c,d) R(e,b,f), R(g,e,f)

class1 Padova

Cluster Tree Elimination

A B C R(a), R(b,a), R(c,a,b)

B C D F R(d,b), R(f,c,d),h(1,2)(b,c)

B E F R(e,b,f), h(2,3)(b,f)

E F G R(g,e,f)

2

4

1

3

EF

BC

BF sep(2,3)={B,F} elim(2,3)={C,D}

Computes the minimal domains

class1 Padova

),,(),()(),()2,1( bacRabRaRcbh a ⊗⊗=⇓

),(),,(),(),( )2,1(,)3,2( cbhdcfRbdRfbh dc ⊗⊗=⇓

CTE: Cluster Tree Elimination

ABC

2

4

1

3 BEF

EFG

EF

BF

BC

BCDF

Time: O ( exp(w*+1 )) Space: O ( exp(sep))

G

E

F

C D

B

A

Computes the minimal domains

class1 Padova

),,(),()(),()2,1( bacRabRaRcbh a ⊗⊗=⇓

),(),,(),(),( )2,3(,)1,2( fbhdcfRbdRcbh fd ⊗⊗=⇓

),(),,(),(),( )2,1(,)3,2( cbhdcfRbdRfbh dc ⊗⊗=⇓

),(),,(),( )3,4()2,3( fehfbeRfbh e ⊗=⇓

),(),,(),( )3,2()4,3( fbhfbeRfeh b ⊗=⇓

),,(),()3,4( fegRfeh g=⇓

Tree decompositions

A B C R(a), R(b,a), R(c,a,b)

B C D F R(d,b), R(f,c,d)

B E F R(e,b,f)

E F G R(g,e,f)

EF

BF

BC

G

E

F

C D

B A

Belief network Tree decomposition

class1 Padova

property) onintersecti (running subtree connected a forms set the variableeach For 2.

and thatsuch vertex oneexactly is there function each For 1.

:satisfying and x over vertelabelings are and and a tree is where,,, triple

a is for A

χ(v)}V|X{vXXχ(v))scope(Cψ(v)C

CCCψ(v)Xχ(v)Vv

ψχ(V,E)TTX,D,CRiondecomposit tree

ii

ii

i

∈∈∈

⊆∈

⊆⊆∈

=><

>=<

ψχ

Induced-width and Tree-width

EDCB

DCBA

DBE

ADB

CBE

Tree-width =3

Tree-width =2

Induced-width Of ordering

Tree-width of a graph = smallest cluster in a cluster-tree Path-width of a graph = smallest cluster in a cluster-path

class1 Padova

E

D

A

C

B

Road Map

n Graphical models

n Constraint networks Model n Inference

n Variable elimination: n Tree-clustering

n Constraint propagation

n Search

n Probabilistic Networks

class1 Padova

Sudoku – Approximation: Constraint Propagation

Each row, column and major block must be alldifferent “Well posed” if it has unique solution: 27 constraints

2 3 4 6 2

〉 Variables: empty slots 〉 Domains = {1,2,3,4,5,6,7,8,9} 〉 Constraints:

〉 27 all-different

〉 Constraint 〉 Propagation 〉 Inference

class1 Padova

Approximating Inference: Local Constraint Propagation

n Problem: bucket-elimination/tree-clustering algorithms are intractable when induced width is large

n Approximation: bound the size of recorded dependencies, i.e. perform local constraint propagation (local inference)

class1 Padova

From Global to Local Consistency

class1 Padova

Arc-consistency

A binary constraint R(X,Y) is arc-consistent w.r.t. X is every value In x’s domain has a match in y’s domain.

Only domains are reduced:

class1 Padova

YXRR YX <== constraint },3,2,1{ },3,2,1{

∏←X YXYX DRR

3 2, 1,

3 2, 1, 3 2, 1,

1 ≤ X, Y, Z, T ≤ 3 X < Y Y = Z T < Z X ≤ T

X Y

T Z

3 2, 1, <

=

<

Arc-consistency

class1 Padova

1 ≤ X, Y, Z, T ≤ 3 X < Y Y = Z T < Z X ≤ T

X Y

T Z

<

=

<

1 3

2 3

Arc-consistency

class1 Padova

∏←X YXYX DRR

From Arc-consistency to relational arc-consistency

n Sound n Incomplete n Always converges

(polynomial)

A B

C D

3 2 1 A

3 2 1 B

3 2 1 D

3 2 1 C

<

<

< =

class1 Padova

Relational Distributed Arc-Consistency

Primal Dual

AB

AD

A

A B

C D

3 2 1 A

3 2 1 B

3 2 1 D

3 2 1 C

DC

BC

D

C

B

class1 Padova

AC-3

Complexity: Best case O(ek), since each arc may be processed in O(2k)

class1 Padova

)( 3ekO

Arc-consistency Algorithms

n AC-1: brute-force, distributed

n AC-3, queue-based

n AC-4, context-based, optimal n AC-5,6,7,…. Good in special cases

n Important: applied at every node of search n (n number of variables, e=#constraints, k=domain size) n Mackworth and Freuder (1977,1983), Mohr and Anderson, (1985)…

class1 Padova

)( 3ekO)( 3nekO

)( 2ekO

Path-consistency

n A pair (x, y) is path-consistent relative to Z, if every consistent assignment (x, y) has a consistent extension to z.

class1 Padova

Example: path-consistency

class1 Padova

Path-consistency Algorithms

n Apply Revise-3 (O(k^3)) until no change

n Path-consistency (3-consistency) adds binary constraints. n PC-1:

n PC-2:

n PC-4 optimal:

class1 Padova

)( kjkikijijij RDRRR ⊗⊗∩← π

)( 55knO)( 53knO

)( 33knO

Local i-consistency

i-consistency: Any consistent assignment to any i-1 variables is consistent with at least one value of any i-th variable

class1 Padova

Gausian and Boolean Propagation, Resolution

n Linear inequalities n Boolean constraint propagation, unit resolution

class1 Padova

2,2 ≤≤ yx

)( CA ¬∨

⇒≥≤++ 13,15 zzyx

⇒¬¬∨∨ )(),( BCBA

Directional Resolution ó Adaptive Consistency

class1 Padova

))exp(( :space and timeDR))(exp(||

*

*

wnOwObucketi =

Directional i-consistency

A

E

C D

B

D

C B

E

D

C B

E

D

C B

E

Adaptive d-arc d-path

class1 Padova

DCBR

≠ ≠

≠ ≠≠≠

:AB A:BBC :C

AD C,D :DBE C,E D,E :E

≠≠

≠≠≠

DBDC RR ,CBR

DRCRDR

class1 Padova

Greedy Algorithms for Induced-Width

n Min-width ordering

n Min-induced-width ordering

n Max-cardinality ordering

n Min-fill ordering

n Chordal graphs

n Hypergraph partitionings (Project: present papers on induced-width, run algorithms for induced-width on new benchmarks…)

Min-width ordering

Proposition: algorithm min-width finds a min-width ordering of a graph Complexity:? O(e)

class1 Padova

Greedy orderings heuristics

min-induced-width (miw) input: a graph G = (V;E), V = {v1; :::; vn} output: A miw ordering of the nodes d = (v1; :::; vn). 1. for j = n to 1 by -1 do 2. r ß a node in V with smallest degree. 3. put r in position j. 4. connect r's neighbors: E ß E union {(vi; vj)| (vi; r) in E; (vj ; r) 2 in E}, 5. remove r from the resulting graph: V ßV - {r}. min-fill (min-fill) input: a graph G = (V;E), V = {v1; :::; vn} output: An ordering of the nodes d = (v1; :::; vn). 1. for j = n to 1 by -1 do 2. r ßa node in V with smallest fill edges for his parents. 3. put r in position j. 4. connect r's neighbors: E ßE union {(vi; vj)| (vi; r) 2 E; (vj ; r) in E}, 5. remove r from the resulting graph: V ßV –{r}.

Theorem: A graph is a tree iff it has both width and induced-width of 1.

class1 Padova

class1 Padova

Example

We see again that G in the Figure (a) is not chordal since the parents of A are not connected in the max-cardinality ordering in Figure (d). If we connect B and C, the resulting induced graph is chordal.

class1 Padova Cordal Graphs; Max-Cardinality Ordering

n A graph is cordal if every cycle of length at least 4 has a chord

n Finding w* over chordal graph is easy using the max-cardinality ordering

n If G* is an induced graph it is chordal n K-trees are special chordal graphs. n Finding the max-clique in chordal graphs is easy (just enumerate all cliques in a max-cardinality ordering

class1 Padova

Max-cardinality ordering

Graph Concepts: Hypergraphs and Dual Graphs

n A hypergraph is H = (V,S) , V= {v_1,..,v_n} and a set of subsets Hyperegdes: S={S_1, ..., S_l }. n Dual graphs of a hypergaph: The nodes are the hyperedges and a pair of nodes is connected if they share vertices in V. The arc is labeled by the shared vertices. n A primal (Markov, moral) graph of a hypergraph H = (V,S) has V as its nodes, and any two nodes are connected by an arc if they appear in the same hyperedge. n Factor Graphs. .

class1 Padova

Which greedy algorithm is best?

n MinFill, prefers a node who add the least number of fill-in arcs.

n Empirically, fill-in is the best among the greedy algorithms (MW,MIW,MF,MC)

n Complexity of greedy orderings?

n MW is O(?), MIW: O(?) MF (?) MC is O(mn)

class1 Padova

Road Map

n Graphical models

n Constraint networks Model n Inference

n Variable elimination: n Tree-clustering

n Constraint propagation

n Search

n Probabilistic Networks

class1 Padova

Recommended