
© 2007, Roman Schmidt, Distributed Information Systems Laboratory. Evergrow workshop, Jerusalem, Israel, February 19, 2007

Efficient implementation of BP in P2P networks

Roman Schmidt

School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Evergrow Loopy Belief Propagation Algorithm and Applications Workshop, Jerusalem, Israel, February 19-21, 2007


Motivation

• Users share (correlated) data in P2P systems

– currently mainly for retrieval

– but correlations hold hidden knowledge

• Exploit correlations for new services

– Distributed Knowledge Base (e.g., for software bugs)

– Structure/cluster data (e.g., for better search results)

– Recommendation system (e.g., for data annotation)

– etc.

• Distributed Inference System on top of a P2P system

• Current focus and contribution

– Message reduction


Outline

• Motivation

• Basic Concepts

– Belief Propagation

– The P-Grid Overlay

• P2P Belief Propagation

– Inference Architecture

– The Relaxation Algorithm

• Evaluation

• Conclusions


Belief Propagation

• Inference based on Bayesian networks

– models dependencies between variables

• Iterative message-passing algorithm

– computes marginal probabilities (“beliefs”)

– exact and efficient on trees; also applicable, as an approximation, to arbitrary (loopy) networks

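Example network: OS1 and Driver1 are the parents of App1.

Prior table (it appears twice on the slide, presumably once for OS1 and once for Driver1):

            True   False
Installed   0.2    0.8

Conditional table of App1:

OS1   Driver1   Runs   Error
T     T         0.9    0.1
T     F         0.4    0.6
F     T         0.0    1.0
F     F         0.0    1.0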
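As a worked example of what a belief is, the following sketch computes the marginal of App1 from the numbers on this slide by summing out both parents. It assumes that one prior table belongs to OS1 and the other to Driver1, and that the two parents are independent; the variable and function names are illustrative and not taken from the talk.

```python
# Minimal sketch: belief of App1 on the three-node example network.
# Assumption: OS1 and Driver1 are independent parents of App1.

# Prior vectors, ordered as [True, False] (installed / not installed).
prior_os1 = [0.2, 0.8]
prior_driver1 = [0.2, 0.8]

# Conditional table P(App1 | OS1, Driver1); each entry is [runs, error].
cpt_app1 = {
    (True, True): [0.9, 0.1],
    (True, False): [0.4, 0.6],
    (False, True): [0.0, 1.0],
    (False, False): [0.0, 1.0],
}

def belief_app1(msg_os1, msg_driver1, cpt):
    """Sum out both parents: b(app) = sum_{o,d} m(o) * m(d) * P(app | o, d)."""
    belief = [0.0, 0.0]
    for i, os_installed in enumerate((True, False)):
        for j, driver_installed in enumerate((True, False)):
            weight = msg_os1[i] * msg_driver1[j]
            for k in range(2):
                belief[k] += weight * cpt[(os_installed, driver_installed)][k]
    return belief

belief = belief_app1(prior_os1, prior_driver1, cpt_app1)
print(belief)  # approximately [0.1, 0.9]
```

Under these assumptions App1 runs with probability 0.2 · 0.2 · 0.9 + 0.2 · 0.8 · 0.4 = 0.1.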


The BP message-passing algorithm

• Sends messages across edges

– 2 messages per edge and iteration

– if all messages from previous iteration were received

• Beliefs are updated per iteration

– algorithm terminates if beliefs stabilize

• Messages are vectors

– length corresponds to the number of node states

• Computational complexity grows exponentially with the number of states of nodes
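The loop described on this slide can be sketched as follows. This is a simplified illustration, not the implementation from the talk: it uses a pairwise model on a three-node chain instead of a Bayesian network, and the names `run_bp`, `unary` and `pairwise` are assumptions made for the example. Each iteration recomputes two messages per edge from the messages of the previous iteration, updates the beliefs, and stops once the beliefs no longer change.

```python
# Sketch of synchronous sum-product BP on a pairwise model (an assumption:
# the slides use a Bayesian network; a pairwise chain keeps the example short).

def normalize(vec):
    total = sum(vec)
    return [v / total for v in vec]

def run_bp(unary, pairwise, edges, max_iters=50, tol=1e-9):
    """unary[n]: local evidence vector; pairwise[(u, v)][xu][xv]: edge potential."""
    neighbors = {n: [] for n in unary}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def psi(u, v, xu, xv):
        # Look the potential up in whichever direction the edge was stored.
        if (u, v) in pairwise:
            return pairwise[(u, v)][xu][xv]
        return pairwise[(v, u)][xv][xu]

    # Two directed messages per edge, each a vector over the receiver's states.
    msgs = {(u, v): [1.0] * len(unary[v]) for u in unary for v in neighbors[u]}
    beliefs = {n: normalize(unary[n]) for n in unary}
    iteration = 0

    for iteration in range(1, max_iters + 1):
        new_msgs = {}
        for (u, v) in msgs:
            out = []
            for xv in range(len(unary[v])):
                total = 0.0
                for xu in range(len(unary[u])):
                    incoming = 1.0
                    for w in neighbors[u]:
                        if w != v:  # exclude the message coming from the receiver
                            incoming *= msgs[(w, u)][xu]
                    total += unary[u][xu] * psi(u, v, xu, xv) * incoming
                out.append(total)
            new_msgs[(u, v)] = normalize(out)
        msgs = new_msgs

        new_beliefs = {}
        for n in unary:
            b = list(unary[n])
            for w in neighbors[n]:
                b = [b[x] * msgs[(w, n)][x] for x in range(len(b))]
            new_beliefs[n] = normalize(b)

        change = max(abs(new_beliefs[n][x] - beliefs[n][x])
                     for n in unary for x in range(len(unary[n])))
        beliefs = new_beliefs
        if change < tol:  # beliefs have stabilized
            break
    return beliefs, iteration, len(msgs)

# Illustrative chain a - b - c with binary states and an "agreement" potential.
unary = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.5, 0.5]}
agree = [[0.8, 0.2], [0.2, 0.8]]
pairwise = {("a", "b"): agree, ("b", "c"): agree}
beliefs, iterations, msgs_per_iter = run_bp(unary, pairwise, [("a", "b"), ("b", "c")])
```

With two edges the loop exchanges four messages per iteration, which is the per-iteration cost that the rest of the talk tries to reduce.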


The P-Grid Overlay

• Peers are organized in a binary trie structure

– one node for every common prefix

– trie is only virtual (exists only via routing tables)

– all nodes remain at the leaf-level (no hierarchy)

• Multiple peers per key space partition

• Multiple routing entries (random choice)

– per routing table level

• Logarithmic search complexity

– even for skewed data distributions


P-Grid routing

Virtual binary trie: 0* splits into 00* and 01*; 1* splits into 10* and 11*.

Peer   Routing table            Stores data with key prefix
A      1* : C, D    01* : B     00
F      1* : E       01* : B     00
B      1* : C, D    00* : F     01
C      0* : A, B    11* : E     10
D      0* : A, F    11* : E     10
E      0* : B, F    10* : D     11

Example shown: a query for ‘100’, which is resolved at a peer responsible for prefix 10.

• Keys are resolved by longest-prefix matching

– ensures logarithmic search cost even for skewed trees
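Prefix routing over the six example peers can be sketched as below. The routing entries are transcribed from the slide; assigning them to peers A, F, B, C, D, E in left-to-right order is an assumption, and `PEERS` and `route` are illustrative names rather than part of the P-Grid API. Keys are assumed to be at least as long as a peer's path.

```python
# Sketch of P-Grid prefix routing over the six example peers.
# Routing entries come from the slide; their assignment to peers is assumed.
import random

PEERS = {
    # peer: (path, [references at level 0, references at level 1])
    "A": ("00", [["C", "D"], ["B"]]),
    "F": ("00", [["E"], ["B"]]),
    "B": ("01", [["C", "D"], ["F"]]),
    "C": ("10", [["A", "B"], ["E"]]),
    "D": ("10", [["A", "F"], ["E"]]),
    "E": ("11", [["B", "F"], ["D"]]),
}

def route(key, start, rng=random):
    """Forward a query until it reaches a peer whose path is a prefix of key."""
    hops = [start]
    current = start
    while True:
        path, refs = PEERS[current]
        # First level at which the key leaves this peer's path, if any.
        level = next((i for i, bit in enumerate(path) if key[i] != bit), None)
        if level is None:
            return hops  # this peer is responsible for the key's partition
        # Random choice among the entries stored for that level.
        current = rng.choice(refs[level])
        hops.append(current)

print(route("100", "A"))  # ['A', 'C'] or ['A', 'D']
```

Every hop extends the matched prefix by at least one bit, so the number of hops is bounded by the path length, which is the intuition behind the logarithmic search cost.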


The Distributed Inference System

• P-Grid

– Bug reports, metadata, tags, etc.

– Bayesian network

    • Variables (spread over P-Grid nodes)

    • Dependencies between variables

    • Distributed learning

• Belief Propagation

– Distributed inference

    • Message-passing algorithm

• Identified problem

– high message cost


Spring Relaxation

• Bayesian network as spring network

– find minimum energy configuration (relax springs)

– energy is proportional to the distance between P-Grid nodes

– variables at the same node require no energy

– energy-optimal: all variables at one node (but this conflicts with load balancing)

• Decentralized algorithm

– nodes try to relax their springs

– move correlated variables close to each other

– optimally, at the same node (no physical message)

– considering load distribution
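The energy of a placement can be illustrated with a small sketch. The slides only state that energy is proportional to the distance between P-Grid nodes; measuring that distance as the number of trie levels below the common prefix is an assumption made here, as are the names `trie_distance` and `total_energy`. For brevity, peers of the same partition are treated as one node.

```python
# Sketch: total "spring energy" of a variable placement.
# Assumption: the distance between two peers is the number of trie levels
# below their common prefix (0 when both variables sit on the same path).

def trie_distance(path_a, path_b):
    """Levels remaining below the common prefix of two peer paths."""
    common = 0
    for bit_a, bit_b in zip(path_a, path_b):
        if bit_a != bit_b:
            break
        common += 1
    return max(len(path_a), len(path_b)) - common

def total_energy(placement, springs):
    """placement: variable -> peer path; springs: pairs of dependent variables."""
    return sum(trie_distance(placement[u], placement[v]) for u, v in springs)

# Illustrative example: three springs, four variables.
springs = [("a", "h"), ("a", "t"), ("h", "m")]
before = {"a": "00", "h": "01", "m": "01", "t": "11"}
after = dict(before, t="00")  # move t next to a

print(total_energy(before, springs))  # 3
print(total_energy(after, springs))   # 1
```

Moving t onto the node that holds a removes the longest spring, which is the kind of move the decentralized algorithm looks for.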


Spring relations in P-Grid

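Virtual binary trie: 0* splits into 00* and 01*; 1* splits into 10* and 11*.

Peer   Routing table            Local variables and their dependencies
A      1* : C, D    01* : B     a -> h, t;  f -> o, r;  …
F      1* : E       01* : B     a -> h, t;  f -> o, r;  …
B      1* : C, D    00* : F     h -> a, m;  m -> h, u;  …
C      0* : A, B    11* : E     o -> f;  r -> f, t;  …
D      0* : A, F    11* : E     o -> f;  r -> f, t;  …
E      0* : B, F    10* : D     t -> a, r;  u -> m;  …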


The Relaxation Algorithm (relax variables)

    currentLoad = length(localVars);
    overload = currentLoad - avgLoad / 2;
    IF (overload <= 0)
        return;
    ENDIF
    unidirVars = variables having a tension only at one level;
    WHILE ((overload > 0) AND (length(unidirVars) > 0))
        move variable to a peer from the level with the tension;
        removeFirst(unidirVars);
        overload = overload - 1;
    ENDWHILE
    …
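A runnable reading of this first phase, for a single peer, might look as follows. It is a sketch under assumptions: `tensions[var]` is taken to map a routing-table level (numbered from 1) to the number of springs pulling the variable toward that level, and `move(var, level)` is a hypothetical callback standing in for the message that transfers a variable. The operator precedence of the overload formula is kept exactly as written in the pseudocode.

```python
# Sketch of the first phase ("relax variables") for a single peer.
# Assumptions: tensions[var] maps level -> spring count; move() is hypothetical.

def relax_variables(local_vars, tensions, avg_load, move):
    """Move variables pulled toward exactly one level while the peer is overloaded."""
    current_load = len(local_vars)
    # Same precedence as the pseudocode: currentLoad - (avgLoad / 2).
    overload = current_load - avg_load / 2
    if overload <= 0:
        return []

    # Variables with a tension at exactly one routing-table level.
    unidir_vars = [v for v in local_vars
                   if sum(1 for t in tensions[v].values() if t > 0) == 1]
    moved = []
    while overload > 0 and unidir_vars:
        var = unidir_vars.pop(0)
        level = next(l for l, t in tensions[var].items() if t > 0)
        move(var, level)        # hand the variable to a peer at that level
        local_vars.remove(var)  # the variable is no longer stored locally
        moved.append((var, level))
        overload -= 1
    return moved

# Illustrative run: "f" and "x" each pull toward a single level, "a" toward two.
sent = []
local = ["a", "f", "x"]
tension_table = {"a": {1: 1, 2: 1}, "f": {1: 2, 2: 0}, "x": {1: 0, 2: 1}}
result = relax_variables(local, tension_table, avg_load=2,
                         move=lambda v, l: sent.append((v, l)))
print(result)  # [('f', 1), ('x', 2)]
```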


The Relaxation Algorithm (balance load)

    …
    multidirVars = vars having tensions at multiple levels;
    WHILE ((currentLoad > avgLoad) AND (length(multidirVars) > 0))
        FOR i = routingTable.levels TO 1
            IF (level i is underpopulated)
                cand = vars having a tension at level i;
                FOR j = 1 TO length(cand)
                    IF (cand(j).tension(i) >= max(cand(j).tension))
                        move variable to a peer from level i;
                        remove(multidirVars, cand(j));
                        currentLoad = currentLoad - 1;
                        IF (currentLoad <= avgLoad) break; ENDIF;
                    ENDIF;
                ENDFOR;
            ENDIF;
        ENDFOR;
    ENDWHILE
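The second phase can be sketched in the same style. Several details are assumptions because the slide leaves them open: `is_underpopulated(level)` and `move(var, level)` are hypothetical callbacks, candidates are drawn from the multi-directional variables only, and levels are numbered from 1. The sketch also departs from the pseudocode in two small ways that are worth naming: it stops scanning further levels once the load target is reached, and it ends the outer loop after a pass that moved nothing, so it cannot spin forever when no level is underpopulated.

```python
# Sketch of the second phase ("balance load") for a single peer.
# Assumptions: tensions[var] maps level -> spring count; both callbacks are
# hypothetical; the two early exits below are additions to the pseudocode.

def balance_load(local_vars, tensions, avg_load, num_levels,
                 is_underpopulated, move):
    current_load = len(local_vars)
    multidir_vars = [v for v in local_vars
                     if sum(1 for t in tensions[v].values() if t > 0) > 1]
    moved = []
    while current_load > avg_load and multidir_vars:
        moved_in_pass = False
        for level in range(num_levels, 0, -1):  # deepest level first
            if current_load <= avg_load:
                break  # added: load target reached, skip remaining levels
            if not is_underpopulated(level):
                continue
            candidates = [v for v in multidir_vars
                          if tensions[v].get(level, 0) > 0]
            for var in candidates:
                # Move only if this level carries the variable's strongest tension.
                if tensions[var][level] >= max(tensions[var].values()):
                    move(var, level)
                    multidir_vars.remove(var)
                    local_vars.remove(var)
                    moved.append((var, level))
                    current_load -= 1
                    moved_in_pass = True
                    if current_load <= avg_load:
                        break
        if not moved_in_pass:
            break  # added: nothing could be moved, avoid an endless loop
    return moved

# Illustrative run: only level 2 is underpopulated.
local = ["a", "b", "c", "d"]
tension_table = {"a": {1: 1, 2: 3}, "b": {1: 2, 2: 1},
                 "c": {1: 1}, "d": {1: 1, 2: 1}}
result = balance_load(local, tension_table, avg_load=2, num_levels=2,
                      is_underpopulated=lambda level: level == 2,
                      move=lambda var, level: None)
print(result)  # [('a', 2), ('d', 2)]
```

Variable b stays because its strongest tension points to level 1, which is not underpopulated in this run.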


Algorithm execution

• Executed at each node

– Iteratively

– Independently (evaluated simultaneously)

• Termination

– Max. number of iterations

– No free or multi-directional variables to move

– No tension reduction in last two iterations

• Effort

– Variable movements require only 1 message

– Trade-off against the message reduction achieved

– Dynamic variables require “remote” updates
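The three termination conditions listed above can be combined into a single check, sketched below. The function and parameter names are illustrative; in particular, `tension_history` is assumed to hold the peer's total tension as recorded after each completed iteration.

```python
# Sketch of the three termination checks; all names are illustrative.

def should_terminate(iteration, max_iterations, movable_vars, tension_history):
    """Return True if any of the three stopping conditions holds."""
    if iteration >= max_iterations:
        return True
    if not movable_vars:  # no free or multi-directional variables left
        return True
    if len(tension_history) >= 3:
        before, mid, last = tension_history[-3:]
        # No reduction in either of the last two iterations.
        if mid >= before and last >= mid:
            return True
    return False

print(should_terminate(3, 10, ["a"], [10, 8, 8, 8]))  # True
print(should_terminate(3, 10, ["a"], [10, 9, 8, 8]))  # False
```

Each node evaluates this locally, consistent with the independent execution described on the slide.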


Evaluation

• Matlab implementation

• Diverse Bayesian networks

– random, binary trees, scale-free

– up to 2048 Bayesian nodes

– up to 512 P-Grid nodes

• 10 repetitions

• 2 main evaluation criteria

– message reduction

– load balance


Random network

1024 nodes, average node degree 4


Binary tree network

1023 nodes


Scale-free network

1024 nodes, average node degree 4


Message reduction (random)

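[Figure: four plots, labeled 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4 and 512 / 2048 / 4 (apparently P-Grid nodes / Bayesian nodes / average node degree)]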


Message reduction (binary tree)

[Figure: four plots, labeled 64 / 2047, 128 / 2047, 256 / 2047 and 512 / 2047 (apparently P-Grid nodes / Bayesian nodes)]


Message reduction (scale-free)

[Figure: four plots, labeled 64 / 2048, 128 / 2048, 256 / 2048 and 512 / 2048 (apparently P-Grid nodes / Bayesian nodes)]


Load balancing (random)

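[Figure: four plots, labeled 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4 and 512 / 2048 / 4 (apparently P-Grid nodes / Bayesian nodes / average node degree)]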


Load balancing (binary tree)

[Figure: four plots, labeled 64 / 2047, 128 / 2047, 256 / 2047 and 512 / 2047 (apparently P-Grid nodes / Bayesian nodes)]


Load balancing (scale-free)

[Figure: four plots, labeled 64 / 2048, 128 / 2048, 128 / 2048 and 512 / 2048 (apparently P-Grid nodes / Bayesian nodes)]


Number of iterations

• Until termination of the relaxation algorithm

• Scale-free network (128 nodes, 1024 vars)

• 100 runs


Reduction effort (random)

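[Figure: four plots, labeled 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4 and 512 / 2048 / 4 (apparently P-Grid nodes / Bayesian nodes / average node degree)]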


Reduction effort (binary tree)

[Figure: four plots, labeled 64 / 2047, 128 / 2047, 256 / 2047 and 512 / 2047 (apparently P-Grid nodes / Bayesian nodes)]


Reduction effort (scale-free)

[Figure: four plots, labeled 64 / 2048, 128 / 2048, 128 / 2048 and 512 / 2048 (apparently P-Grid nodes / Bayesian nodes)]


Conclusions

• Decentralized relaxation algorithm

– Reduces message cost for Belief Propagation

– Considers load balance

• Several scenarios (Distributed Knowledge Base)

• First evaluation looks promising

• Intermediate steps are still missing

– Learning of Bayesian network


Thank you!

Questions?