
Page 1

Information Theoretic Approaches to Data Association and Fusion in Sensor Networks

John Fisher, Alexander Ihler, Jason Williams, Alan Willsky
MIT CSAIL/LIDS

Haixiao Cai, Sanjeev Kulkarni, Sergio Verdu
Princeton University

SensorWeb MURI Review Meeting, September 22, 2003

Page 2

Problem/Motivation

• Large number of simple, myopic sensors; need to perform local fusion to support global inference (Battlespace Awareness).
• Critical need to understand the statistical relationships between sensor outputs in the face of many modes of uncertainty (sensors, scene, geometry, etc.).

Page 3

Challenges

• Uncertainty in scene and sensor geometry
• Complex, dynamic environment
• Uncalibrated, multi-modal sensors
• Unknown joint sensor statistics
• Need for fast, low-complexity algorithms

Page 4

Activity and Accomplishments

Research
• Application of the data association method to the multi-modal (A/V) correspondence problem. A/V is a surrogate for other modalities, primarily because we can easily collect this data (vs. IR, EM, etc.).
• Extensions and empirical results for multi-modal feature-aided tracking.
• Generalization of data association to triangulated graphs.
• Improved K-L divergence/MI estimators.
• New developments in applied information-theoretic sensor management.

Page 5

Activity and Accomplishments

Tech Transition
• ARL visits; student (Ihler) on-site at ARL.
• Plans to transition the data association method to DARPA's CTS program (Ft. Belvoir installation).

Publications
• 4 conference publications: IPSN (2), ICME (invited), ICASSP (invited)
• 1 journal submission, accepted pending 2nd review
• 3 sensor network workshop panels: ARO, NSF, SAMSI

Page 6

A Common Thread

• Fusion and correspondence are difficult given the types of sensor uncertainties we are facing.
• Various information theoretic measures, and the need to estimate them, arise naturally in such problems.
• Exploiting sensor data subject to a common excitation provides a mechanism for estimating such quantities.

Page 7

Overview

• Estimating information theoretic measures from sensor data (MIT, Princeton)
• Applications: data association, multi-modal tracking, inferring group interactions, sensor management
• Future directions: information-driven sensor fusion

Page 8

Data Association (last year)

Measurements: separated signals, direction of arrival.
• 1 signal / 2 sensors: localize.
• >2 signals, 2 sensors: ambiguous.

(Figure: sensors A and B each measure bearings to signals A1, A2 and B1, B2; the pairing of the A measurements with the B measurements is ambiguous.)

Page 9

Association as a Hypothesis Test

Assuming independent sources, hypotheses are of the form

$$H_1:\ p(A_1,A_2,B_1,B_2) = p_{H_1}(A_1,B_1)\,p_{H_1}(A_2,B_2) \qquad (A_1 \leftrightarrow B_1,\ A_2 \leftrightarrow B_2)$$

$$H_2:\ p(A_1,A_2,B_1,B_2) = p_{H_2}(A_1,B_2)\,p_{H_2}(A_2,B_1) \qquad (A_1 \leftrightarrow B_2,\ A_2 \leftrightarrow B_1)$$

Asymptotic comparison of known models to those estimated from a single realization:

$$H_1:\ \hat p(A_1,B_1)\,\hat p(A_2,B_2) \qquad\text{vs.}\qquad H_2:\ \hat p(A_1,B_2)\,\hat p(A_2,B_1)$$
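For intuition, here is a minimal sketch of this test from a single realization, assuming Gaussian kernel density estimates (scipy's gaussian_kde) as a stand-in for the talk's nonparametric estimators:

```python
# A minimal sketch of the two-signal association test: estimate the joint
# densities for each hypothesized pairing and compare the averaged
# log-likelihood ratio. KDE is an assumed stand-in for the talk's estimators.
import numpy as np
from scipy.stats import gaussian_kde

def association_statistic(a1, a2, b1, b2):
    """(1/N) log LR between pairings {(A1,B1),(A2,B2)} and {(A1,B2),(A2,B1)}."""
    def avg_loglik(x, y):
        kde = gaussian_kde(np.vstack([x, y]))           # joint density estimate
        return np.mean(np.log(kde(np.vstack([x, y]))))  # average over the data
    h1 = avg_loglik(a1, b1) + avg_loglik(a2, b2)
    h2 = avg_loglik(a1, b2) + avg_loglik(a2, b1)
    return h1 - h2  # > 0 favors H1 (A1<->B1, A2<->B2)

# Toy check: A1/B1 share one common excitation, A2/B2 another.
rng = np.random.default_rng(0)
s, t = rng.standard_normal(500), rng.standard_normal(500)
a1, b1 = s + 0.1 * rng.standard_normal(500), s + 0.1 * rng.standard_normal(500)
a2, b2 = t + 0.1 * rng.standard_normal(500), t + 0.1 * rng.standard_normal(500)
print(association_statistic(a1, a2, b1, b2))  # positive under H1
```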

Page 10

Asymptotics of Likelihood Ratio

Decomposes into two sets of terms:
• Statistical dependencies (groupings)
• Differences in model parameterizations

$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_{k=1}^{N}\log\frac{p_{H_1}\big(A_1^k,B_1^k\big)\,p_{H_1}\big(A_2^k,B_2^k\big)}{p_{H_2}\big(A_1^k,B_2^k\big)\,p_{H_2}\big(A_2^k,B_1^k\big)}$$

Under $H_1$,

$$\frac{1}{N}\log L \;\longrightarrow\; \underbrace{I_{H_1}(A_1;B_1) + I_{H_1}(A_2;B_2)}_{\text{statistical dependence}} \;+\; \underbrace{D\Big(p_{H_1}(A_1)\,p_{H_1}(A_2)\,p_{H_1}(B_1)\,p_{H_1}(B_2)\;\Big\|\;p_{H_2}(A_1,A_2,B_1,B_2)\Big)}_{\text{model differences}}$$

with the corresponding terms $I_{H_2}(A_1;B_2) + I_{H_2}(A_2;B_1)$ and $D\big(p_{H_2}(A_1)\,p_{H_2}(A_2)\,p_{H_2}(B_1)\,p_{H_2}(B_2)\,\big\|\,p_{H_1}(A_1,A_2,B_1,B_2)\big)$ entering with opposite sign under $H_2$.

Page 11

Asymptotics of Likelihood Ratio

If we estimate the densities from a single realization:
• Statistical dependence terms remain
• Model divergences go away

$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_{k=1}^{N}\log\frac{\hat p\big(A_1^k,B_1^k\big)\,\hat p\big(A_2^k,B_2^k\big)}{\hat p\big(A_1^k,B_2^k\big)\,\hat p\big(A_2^k,B_1^k\big)} \;\longrightarrow\; I(A_1;B_1) + I(A_2;B_2) - I(A_1;B_2) - I(A_2;B_1)$$

Under $H_1$ the cross-pairing terms $I(A_1;B_2)$ and $I(A_2;B_1)$ are zero, so the statistic remains separable; only the model-difference divergences are lost.

Page 12

High Dimensional Data

Learn low-dimensional auxiliary variables which summarize the statistical dependency of the measurements:

$$f_1^k = f_1\big(A_1^k\big),\quad f_2^k = f_2\big(B_1^k\big),\quad f_3^k = f_3\big(A_2^k\big),\quad f_4^k = f_4\big(B_2^k\big)$$
$$g_1^k = g_1\big(A_1^k\big),\quad g_2^k = g_2\big(B_2^k\big),\quad g_3^k = g_3\big(A_2^k\big),\quad g_4^k = g_4\big(B_1^k\big)$$

$$\frac{1}{N}\log L(f,g) \;=\; \frac{1}{N}\sum_{k=1}^{N}\log\frac{\hat p\big(f_1^k,f_2^k\big)\,\hat p\big(f_3^k,f_4^k\big)}{\hat p\big(g_1^k,g_2^k\big)\,\hat p\big(g_3^k,g_4^k\big)}$$

$$\lim_{N\to\infty}\frac{1}{N}\log L_f \;=\; I(f_1;f_2) + I(f_3;f_4) \;\le\; I(A_1;B_1) + I(A_2;B_2)$$
$$\lim_{N\to\infty}\frac{1}{N}\log L_g \;=\; I(g_1;g_2) + I(g_3;g_4) \;\le\; I(A_1;B_2) + I(A_2;B_1)$$

The auxiliary variables are chosen to maximize the statistic:

$$f^*, g^* \;=\; \arg\max_{f\text{'s},\,g\text{'s}}\ \frac{1}{N}\log L(f,g)$$

Page 13

AV Association/Correspondence

New since last year: a direct application of the 2-sensor/multiple-source case.
• Unknown joint statistics
• High-dimensional data
• Varying scene parameters
• Surrogate for multi-modal sensors

(Figure: example video frames with consistent and inconsistent audio/video pairings.)

Page 14

AV Association/Correspondence

(Figure: association matrix for 8 subjects; example entries 0.68 and 0.61 versus 0.19 and 0.20.)


Page 16

General Structure Tests

Generalization to hypothesis tests over graphical structures: how are observations related to each other?

$$H_i:\ p(x_1,\ldots,x_d) \;=\; \prod_{j=1}^{M_i} p\big(S_j^i\big), \qquad\text{where } S_j^i \subset \{x_1,\ldots,x_d\},\quad \bigcup_j S_j^i = \{x_1,\ldots,x_d\},\quad S_j^i \cap S_k^i = \emptyset \text{ for } j \ne k$$

(Figure: three candidate groupings of the variables $x_1,\ldots,x_6$.)

Page 17

General Structure Tests

$$H_1:\quad S_1^1 = \{x_1,x_4\},\quad S_2^1 = \{x_2,x_3\},\quad S_3^1 = \{x_5,x_6\}$$
$$H_2:\quad S_1^2 = \{x_1,x_3,x_4\},\quad S_2^2 = \{x_2\},\quad S_3^2 = \{x_5,x_6\}$$

Intersection sets, the groupings on which the hypotheses agree:

$$S_{jk}^{12} \;=\; S_j^1 \cap S_k^2, \quad \forall\, j,k$$

$$\begin{aligned}
S_{11}^{12} &= \{x_1,x_4\}, & S_{12}^{12} &= \emptyset, & S_{13}^{12} &= \emptyset \\
S_{21}^{12} &= \{x_3\}, & S_{22}^{12} &= \{x_2\}, & S_{23}^{12} &= \emptyset \\
S_{31}^{12} &= \emptyset, & S_{32}^{12} &= \emptyset, & S_{33}^{12} &= \{x_5,x_6\}
\end{aligned}$$

Page 18

General Structure Tests

Asymptotics have a similar decomposition as in the 2-variable case (via the intersection sets):

$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_{t}\log\frac{\prod_j p_{H_1}\big(S_j^1(t)\big)}{\prod_j p_{H_2}\big(S_j^2(t)\big)}$$

$$\longrightarrow\; \underbrace{\sum_j D\Big(p_{H_1}\big(S_j^1\big)\,\Big\|\,\prod_k p\big(S_{jk}^{12}\big)\Big) \;-\; \sum_j D\Big(p_{H_2}\big(S_j^2\big)\,\Big\|\,\prod_k p\big(S_{kj}^{12}\big)\Big)}_{\text{statistical dependence}} \;+\; \underbrace{\sum_{j,k} D\Big(p_{H_1}\big(S_{jk}^{12}\big)\,\Big\|\,p_{H_2}\big(S_{jk}^{12}\big)\Big)}_{\text{model differences}}$$

Page 19

General Structure Tests

• Extension of the previous work on data association is straightforward for such tests.
• Estimation from a single realization incurs a reduction in separability only in the model-difference terms.
• The "curse of dimensionality" (with respect to density estimation) arises in 2 ways:
  • Individual measurements may be of high dimension (we could still design low-dimensional auxiliary variables).
  • The number of variables in a group.
• New results provide a solution.

Page 20

General Structure Tests

The test implies potentially 6 joint densities, but is simplified by looking at the intersection sets:

$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_t \log\frac{\hat p_{H_1}\big(x_1,x_4\big)\,\hat p_{H_1}\big(x_2,x_3\big)\,\hat p_{H_1}\big(x_5,x_6\big)}{\hat p_{H_2}\big(x_1,x_3,x_4\big)\,\hat p_{H_2}\big(x_2\big)\,\hat p_{H_2}\big(x_5,x_6\big)}$$

$$\longrightarrow\; D\big(p(x_2,x_3)\,\big\|\,p(x_2)\,p(x_3)\big) \;-\; D\big(p(x_1,x_3,x_4)\,\big\|\,p(x_3)\,p(x_1,x_4)\big)$$

The shared grouping $\{x_5,x_6\}$ appears under both hypotheses and cancels.
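A minimal sketch of this reduced statistic, under the same KDE assumption as the earlier two-signal example:

```python
# Sketch of the grouped test statistic for the 6-variable example. Only the
# groupings on which the hypotheses disagree contribute: the shared {x5, x6}
# grouping cancels in the ratio, so it never has to be estimated.
import numpy as np
from scipy.stats import gaussian_kde

def avg_loglik(*cols):
    """Average log of a KDE joint density fit to the stacked columns."""
    data = np.vstack(cols)
    kde = gaussian_kde(data)
    return np.mean(np.log(kde(data)))

def grouped_statistic(x):
    """x: dict mapping variable index (1..6) to a 1-D sample array."""
    h1 = avg_loglik(x[1], x[4]) + avg_loglik(x[2], x[3])  # H1 groupings
    h2 = avg_loglik(x[1], x[3], x[4]) + avg_loglik(x[2])  # H2 groupings
    return h1 - h2  # > 0 favors H1
```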

Page 21

General Structure Tests

High-dimensional variables: learning auxiliary variables reduces dimensionality in one aspect,

$$D\big(p(x_2,x_3)\,\big\|\,p(x_2)\,p(x_3)\big) \;\rightarrow\; D\big(p(f_2,f_3)\,\big\|\,p(f_2)\,p(f_3)\big)$$
$$D\big(p(x_1,x_3,x_4)\,\big\|\,p(x_3)\,p(x_1,x_4)\big) \;\rightarrow\; D\big(p(g_1,g_3,g_4)\,\big\|\,p(g_3)\,p(g_1,g_4)\big)$$

but we would still have to estimate a 3-dimensional density, and this only gets worse with larger groupings.

Page 22

K-L Divergence with Permutations

A simple idea which mitigates many of the dimensionality issues. It exploits the fact that the structures are distinguished by their groupings of variables.

Key ideas:
1. Permuting sample order between groupings maintains the statistical dependency structure within groupings while breaking it across them.
2. $D(X \| Y) \ge D(f(X) \| f(Y))$

This has the advantage that we can design a single (possibly vector-valued) function of all variables rather than one function for each variable. We are currently doing a comparative analysis (bias, variance) against the previous approach.

Page 23

K-L Divergence with Permutations

A single statistic $f(x_1,x_2,x_3)$ is computed on the original samples and on samples whose order has been permuted independently for each variable:

$$\hat p_f:\quad f\big(x_1^k, x_2^k, x_3^k\big), \qquad \big(x_1^k, x_2^k, x_3^k\big) \sim p(x_1,x_2,x_3)$$
$$\hat p_f^{\pi}:\quad f\big(x_1^{\pi_1(k)}, x_2^{\pi_2(k)}, x_3^{\pi_3(k)}\big), \qquad \text{permuted samples} \sim p(x_1)\,p(x_2)\,p(x_3)$$

$$D\big(p(x_1,x_2,x_3)\,\big\|\,p(x_1)\,p(x_2)\,p(x_3)\big) \;\ge\; D\big(\hat p_f\,\big\|\,\hat p_f^{\pi}\big)$$
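A minimal sketch of the permutation idea; the scalar statistic used below (a coordinate sum) is a hypothetical choice, whereas the talk designs $f$ by optimization:

```python
# Permutation sketch: independently permuting each variable's sample order
# preserves the marginals but destroys cross-variable dependence, giving
# samples from (approximately) the product of marginals. Any statistic f
# then yields a lower bound on the full K-L divergence via D(p_f || p_f^pi).
import numpy as np
from scipy.stats import gaussian_kde

def permutation_kl_bound(X, f, rng):
    """X: (N, d) samples from the joint; f: maps (N, d) -> (N,) scalars."""
    N, d = X.shape
    Xperm = np.column_stack([rng.permutation(X[:, j]) for j in range(d)])
    p = gaussian_kde(f(X))      # density of f under the joint
    q = gaussian_kde(f(Xperm))  # density of f under the product of marginals
    z = f(X)
    return np.mean(np.log(p(z)) - np.log(q(z)))  # plug-in D(p_f || p_f^pi)

# Example: a dependent triple, with f = sum of coordinates.
rng = np.random.default_rng(1)
s = rng.standard_normal(1000)
X = np.column_stack([s, s + 0.3 * rng.standard_normal(1000),
                     rng.standard_normal(1000)])
print(permutation_kl_bound(X, lambda X: X.sum(axis=1), rng))
```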

Page 24

More General Structures

• Analysis has been extended to comparisons between triangulated graphs.
• Can be expressed as sums and differences of product terms, as sketched below.
• Admits a wide class of Markov processes.
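For context, the "sums and differences of product terms" follows from the standard junction-tree factorization of a triangulated graph into clique and separator marginals:

$$p(x) \;=\; \frac{\prod_{C \in \mathcal{C}} p(x_C)}{\prod_{S \in \mathcal{S}} p(x_S)}$$

so a log-likelihood ratio between two triangulated structures again decomposes into sums and differences of terms over variable groupings.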

Page 25

Modeling Group Interactions

Object 3 tries to interpose itself between objects 1 and 2. The graph describes the state (position) dependency structure.

(Figure: dependency graph over the states $x_{1,t}, x_{2,t}, x_{3,t}$ and $x_{1,t+1}, x_{2,t+1}, x_{3,t+1}$.)

Page 26

Modeling Group Interactions

(Figure: three hypothesized dependency graphs $H_1$, $H_2$, $H_3$ over the states $x_{i,t}$ and $x_{i,t+1}$, and the tests $H_1$ vs. $H_2$ and $H_2$ vs. $H_3$.)

Page 27

Previous Work and Current Efforts (Princeton)

• Developed fast algorithms based on block sorting for entropy and divergence estimation for discrete sources.
• Simulations and text data show excellent results.
• Have provided analysis of the methods showing universal consistency.
• Have recently investigated estimation of mutual information.
• Currently analyzing performance for hidden Markov sources.
• Investigating extensions to continuous alphabet sources.
• Applications to various types of data.

Page 28

A "Distilled" Problem

The problem: how to estimate the entropy, divergence, and mutual information of two sources based only on one realization from each source?

Assumption: both are finite-alphabet, finite-memory, stationary sources.

Our goal: good estimates, fast convergence, and reasonable computational complexity.

Page 29

Two Approaches to Estimating Mutual Information

• Estimate mutual information via entropy: $I(X;Y) = H(X) + H(Y) - H(X,Y)$.
• Estimate mutual information via divergence: $I(X;Y) = D(p_{XY}\,\|\,p_X\,p_Y)$.
• We use our entropy and divergence estimators based on the Burrows-Wheeler block sorting transform.
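For intuition, here is a minimal plug-in version of the entropy route for i.i.d. finite-alphabet samples; the talk's actual estimators use the Burrows-Wheeler transform to handle sources with memory:

```python
# Plug-in illustration of I(X;Y) = H(X) + H(Y) - H(X,Y) for finite alphabets,
# assuming i.i.d. samples. A stand-in for the BWT-based estimators.
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in entropy (bits) from empirical frequencies."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# Toy check: y is a noisy copy of x, so I(X;Y) > 0.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10000)
y = np.where(rng.random(10000) < 0.9, x, 1 - x)
print(mutual_information(x.tolist(), y.tolist()))
```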

Pages 30-31

(Figures.)

Page 32

Estimating Mutual Information

• Analysis and simulations show that both approaches converge to the true value.
• The entropy approach appears better than the divergence approach.
• The divergence approach does not use the fact that the second distribution $p_X\,p_Y$ is a product of two marginal distributions.

Page 33

Hidden Markov Processes

• $X$ is the underlying Markov chain.
• $Y$ is a deterministic mapping of $X$, or $Y$ is $X$ observed through a discrete memoryless channel.
• Then $Y$ is a hidden Markov process (HMP).
• Useful in a wide range of applications.

Page 34

Entropy of HMP

To get the mutual information of the input and output of a DMC, we need the entropy of the output, which is an HMP if the input is Markov.

The entropy of an HMP can be approximated by an upper bound and a lower bound; these bounds can be calculated recursively:

$$H\big(Y_d \,\big|\, Y_{d-1},\ldots,Y_1, X_1\big) \;\le\; H(Y) \;\le\; H\big(Y_d \,\big|\, Y_{d-1},\ldots,Y_1\big)$$

$$\lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\ldots,Y_1, X_1\big) \;=\; H(Y) \;=\; \lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\ldots,Y_1\big)$$
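A small brute-force check of these bounds for a 2-state chain observed through a binary symmetric channel; the transition and emission values below are assumptions chosen for illustration, and enumeration replaces the recursive computation (fine for small d):

```python
# Evaluates H(Y_d|Y_{d-1..1}, X_1) <= H(Y) <= H(Y_d|Y_{d-1..1}) by enumerating
# all 2^d output sequences of a binary HMP.
import itertools
import numpy as np

A = np.array([[0.9, 0.1], [0.2, 0.8]])      # state transitions (assumed)
B = np.array([[0.95, 0.05], [0.05, 0.95]])  # BSC emissions, eps = 0.05
pi = np.array([2/3, 1/3])                   # stationary distribution of A

def seq_prob(y, init):
    """p(y_1..y_d) via the forward algorithm from an initial state dist."""
    alpha = init * B[:, y[0]]
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]
    return alpha.sum()

def block_entropy(d, init):
    """H(Y_1..Y_d) in bits, by enumerating all 2^d output sequences."""
    p = np.array([seq_prob(y, init) for y in
                  itertools.product([0, 1], repeat=d)])
    return -np.sum(p * np.log2(p))

d = 8
upper = block_entropy(d, pi) - block_entropy(d - 1, pi)
lower = sum(pi[x] * (block_entropy(d, np.eye(2)[x]) -
                     block_entropy(d - 1, np.eye(2)[x])) for x in range(2))
print(lower, "<= H(Y) <=", upper)
```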

Page 35

Estimating Entropy of HMP

(Figure.)

Page 36

MSE of Our Estimators

The MSE of our entropy estimator for i.i.d. sources satisfies

$$E\Big[\big(\hat H_n(q) - H(q)\big)^2\Big] \;=\; \frac{1}{n}\,\operatorname{var}\big[\log q(Z)\big] \;+\; O\!\left(\frac{1}{n^2}\right)$$

The MSE of our mutual information estimator for i.i.d. sources satisfies

$$E\Big[\big(\hat I_n(X;Y) - I(X;Y)\big)^2\Big] \;=\; \frac{1}{n}\,\operatorname{var}\!\left[\log\frac{q_{xy}(X,Y)}{q_x(X)\,q_y(Y)}\right] \;+\; O\!\left(\frac{1}{n^2}\right)$$

We also have convergence results for the divergence estimator and for Markov sources and stationary ergodic sources.
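A quick Monte Carlo illustration of the 1/n MSE decay, using a simple plug-in entropy estimator as a stand-in for the BWT-based one (the source distribution here is an arbitrary assumption):

```python
# Empirical check that the entropy estimate's MSE decays like 1/n for an
# i.i.d. source, consistent with the var[log q(Z)]/n leading term above.
import numpy as np

q = np.array([0.5, 0.3, 0.2])       # assumed i.i.d. source distribution
H = -np.sum(q * np.log2(q))          # true entropy

def plugin_entropy(samples, k=3):
    p = np.bincount(samples, minlength=k) / len(samples)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(2)
for n in (100, 1000, 10000):
    mse = np.mean([(plugin_entropy(rng.choice(3, n, p=q)) - H) ** 2
                   for _ in range(200)])
    print(n, mse)  # roughly a 10x MSE drop per 10x increase in n
```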

Page 37

MSE of Entropy Estimator for HMP

We can prove that $H(Y_d \,|\, Y_{d-1},\ldots,Y_1)$ converges to $H(Y)$ exponentially fast w.r.t. $d$, provided the hidden Markov process's mapping $\phi$ satisfies: there exists an $a \in \mathcal{Y}$ such that $\phi(i) = a$ for exactly one $i \in \mathcal{X}$.

We want to further establish the convergence rate of our entropy estimator for HMP.

Page 38

Association vs the Generative Model

The MI fusion approach is equivalent to learning a latent variable model of the audio/video measurements:

$$Y_k^v \;=\; \theta^v\,\alpha_k^v + \theta_v^{av}\,\alpha_{v,k}^{av} + n_k^v \qquad\qquad Y_k^a \;=\; \theta^a\,\alpha_k^a + \theta_a^{av}\,\alpha_{a,k}^{av} + n_k^a$$

Random variables: $\alpha_k^v,\ \alpha_{v,k}^{av},\ \alpha_k^a,\ \alpha_{a,k}^{av},\ n_k^v,\ n_k^a$

Parameters, appearance bases: $\theta^v,\ \theta_v^{av},\ \theta^a,\ \theta_a^{av}$

Simultaneously learn the statistics of the joint audio/video variables and the parameters, with

$$I\big(\alpha_v^{av};\ \alpha_a^{av}\big)$$

as the statistic of association (consistent with the theory).

Page 39

Incorporating Motion Parameters

• Extension of multi-modal fusion to include nuisance parameters.
• Audio is an indirect pointer to the object of interest.
• Combine the motion model (nuisance parameters) with the audio-video appearance model:

$$Y_k^v \;=\; T_k\big(\theta^v\,\alpha_k^v + \theta_v^{av}\,\alpha_{v,k}^{av}\big) + n_k^v \qquad\qquad Y_k^a \;=\; \theta^a\,\alpha_k^a + \theta_a^{av}\,\alpha_{a,k}^{av} + n_k^a$$

where $T_k$ is the motion (nuisance) transformation applied to the video appearance.

Page 40

Incorporating Motion Parameters

(Figure: example frames and the average image, without and with the motion model.)

Page 41

Information Theoretic Sensor Management

Following Zhao, Shin, and Reich (2002); Chu, Haussecker, and Zhao (2002); and Ertin, Fisher, and Potter (2003), we've started extending IT approaches to sensor management.

Specifically, consider the case where a subset of measurements over time has been incorporated into the belief state.
• When is it better to incorporate a measurement from the past versus a new measurement?
• How can we efficiently choose a set of measurements (avoiding the greedy approach)?

(Figure: state sequence $x_{k-M},\ldots,x_{k-1},x_k$ with candidate measurements $z^0,\ldots,z^N$ available at each time step.)

Page 42

Summary

• Applied the association method to multi-modal data.
• New MI/K-L divergence estimators based on the permutation approach: mitigates dimensionality issues and avoids some of the combinatorics.
• Extended the approach to triangulated graphs.
• New estimators for information measures (entropy, divergence, mutual information) based on the BWT (block sorting):
  • Don't require knowledge of the distribution or parameters of the sources.
  • Efficient algorithm, good estimates, fast convergence.
  • Significantly outperforms other algorithms tested.
• Investigating use in several applications, including as a component of correspondence and fusion algorithms.