Applications of Information Geometry to Hypothesis Testing ...math.bit.edu.cn/docs/2016-07/20160731013703069844.pdf · 1. Principles of Information Geometry Information geometry is

CMCAA 2016

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Yongqiang Cheng

National University of Defense Technology

July 2016

Outline

1. Principles of Information Geometry

2. Geometry of Hypothesis Testing

3. Matrix CFAR Detection on Manifold of Symmetric Positive-Definite Matrices

4. Geometry of Matrix CFAR Detection

1. Principles of Information Geometry

Important problems in statistics

Distribution (likelihood): $p(x \mid \theta)$, where

$x$ is a vector of data

$\theta$ is a vector of unknowns

1. How much does the data $x$ tell about the unknown $\theta$?

2. How good is an estimator $\hat{\theta}$?

3. How to measure the difference between two distributions?

4. What is the structure of a statistical model specified by a family of distributions?


What is information geometry?

Data processing    Statistics       Information Geometry
Data               Distributions    Manifold


Information geometry is the study of intrinsic properties of manifolds of probability distributions by way of differential geometry.

The main tenet of information geometry is that many important structures in information theory and statistics can be treated as structures in differential geometry by regarding a space of probabilities as a differentiable manifold endowed with a Riemannian metric and a family of affine connections.

Relationships with other subjects

Information Geometry is related to:

• Information Theory
• Statistics
• Probability Theory
• Differential Geometry
• Riemannian Geometry
• Systems Theory
• Physics


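Statistical manifold

$$S = \{\, p(x \mid \theta) \mid \theta \in \Theta \subseteq R^n \,\}$$

Riemannian metric

$$G(\theta) = [g_{ij}(\theta)], \qquad g_{ij}(\theta) = E\left[ \frac{\partial \log p(x \mid \theta)}{\partial \theta_i} \, \frac{\partial \log p(x \mid \theta)}{\partial \theta_j} \right]$$

Affine connections

$$\Gamma_{jim}^{(\alpha)}(\theta) = E_\theta\left[ \partial_j \partial_i l(x, \theta) \, \partial_m l(x, \theta) \right] + \frac{1 - \alpha}{2} E_\theta\left[ \partial_j l(x, \theta) \, \partial_i l(x, \theta) \, \partial_m l(x, \theta) \right]$$

where $l(x, \theta) = \log p(x \mid \theta)$.

Distance and geodesic

$$ds^2 = \sum_{i,j} g_{ij}(\theta) \, d\theta_i \, d\theta_j = d\theta^T G(\theta) \, d\theta$$

Curvatures

$$R_{ijk}^{l} = \frac{\partial \Gamma_{ik}^{l}}{\partial x_j} - \frac{\partial \Gamma_{ij}^{l}}{\partial x_k} + \Gamma_{js}^{l} \Gamma_{ik}^{s} - \Gamma_{ks}^{l} \Gamma_{ij}^{s}$$

where $x_j$ denotes the local coordinates.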
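As a worked illustration (added here, not taken from the slides), the Riemannian metric can be computed in closed form for the univariate Gaussian family with coordinates $\theta = (\mu, \sigma)$. The derivation below is a sketch under that single assumption; it uses only the definition of the Fisher information matrix and the Gaussian moments $E[(x-\mu)^2] = \sigma^2$ and $E[(x-\mu)^4] = 3\sigma^4$.

```latex
% Log-likelihood of N(mu, sigma^2)
l(x, \theta) = -\log \sigma - \tfrac{1}{2} \log 2\pi - \frac{(x - \mu)^2}{2\sigma^2}

% Score components
\partial_\mu l = \frac{x - \mu}{\sigma^2}, \qquad
\partial_\sigma l = -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}

% Entries of the Fisher information matrix
g_{\mu\mu} = E\left[ (\partial_\mu l)^2 \right] = \frac{1}{\sigma^2}, \qquad
g_{\mu\sigma} = E\left[ \partial_\mu l \, \partial_\sigma l \right] = 0, \qquad
g_{\sigma\sigma} = E\left[ (\partial_\sigma l)^2 \right]
  = \frac{1}{\sigma^2} - \frac{2}{\sigma^2} + \frac{3}{\sigma^2} = \frac{2}{\sigma^2}

% Metric and line element
G(\theta) = \frac{1}{\sigma^2} \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad
ds^2 = \frac{d\mu^2 + 2 \, d\sigma^2}{\sigma^2}
```

The line element shrinks as $\sigma$ grows: a fixed shift in $\mu$ separates two narrow Gaussians more than two wide ones, which is the kind of intrinsic structure a Euclidean distance on $(\mu, \sigma)$ does not capture.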


2. Geometry of Hypothesis Testing

1) Start from target detection

[Figure: target detection posed as hypothesis testing, with the observation $x$ distributed as $p(x \mid \theta_0)$ or $p(x \mid \theta_1)$]

2) Likelihood ratio test

• Principle: partition the observation space into decision regions.

• Basic method: the detector decides that a target is present when the likelihood ratio exceeds a threshold, where

$$L(x) = \frac{p(x \mid \theta_1)}{p(x \mid \theta_0)}$$

• Essentials of signal detection: discrimination between two distributions of the same family with different parameters.

[Figure: likelihood ratio test between $p(x \mid \theta_0)$ and $p(x \mid \theta_1)$ over the observation $x$]

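3) Geometric interpretation of hypothesis testing

Consider hypothesis testing from a geometric viewpoint.

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

$$H = \{ (\mu, \sigma) \in R^2 ; \ \sigma > 0 \}$$

[Figure: distributions $A$, $B$, $C$, $D$ from a family of distributions and the corresponding points on the statistical manifold, with distances $d_F(A, B)$ and $d_F(C, D)$]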


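• Equivalence between LRT and Kullback-Leibler divergence

Suppose $x_1, x_2, \ldots, x_N$ are i.i.d. observations from a distribution $q(x)$, and there are two models (hypotheses) for $q(x)$, denoted by $p_0(x)$ and $p_1(x)$. Then, the likelihood ratio is

$$L = \prod_{i=1}^{N} \frac{p_1(x_i)}{p_0(x_i)}$$

and the test can be written as a minimum distance detector:

$$D(q \| p_0) - D(q \| p_1) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{N} \ln \gamma$$

where $\gamma$ denotes the threshold and the Kullback-Leibler divergence (KLD) is

$$D(q \| p) = \int q(x) \ln \frac{q(x)}{p(x)} \, dx$$

• Error exponent:

$$K = -\lim_{N \to \infty} \frac{1}{N} \log P_M$$

• Stein's lemma:

$$K = D(p_0 \| p_1), \qquad P_M \doteq 2^{-NK}$$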



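[Figure: minimum distance detector. The data $x$ in the sample space $X$ give an estimated distribution $p(x \mid \theta)$ on the statistical manifold $S$; $d_0$ and $d_1$ are its distances to the hypotheses $p(x \mid \theta_0)$ and $p(x \mid \theta_1)$]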

The problem of hypothesis testing can be regarded as a discrimination problem where the decision is made by comparing distances from the signal distribution estimates to two hypotheses in the sense of the KL divergence, i.e., selecting the model that is “closer” to signal distribution estimates.
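The equivalence can be checked numerically on a small discrete example. The sketch below is illustrative only: the alphabet, the two models `p0` and `p1`, the sample string, and all function names are assumptions made for the example, and the decision uses a threshold of 1 on the likelihood ratio (0 on the KL difference). For a discrete alphabet, with $q$ taken as the empirical distribution of the samples, the normalized log-likelihood ratio equals $D(q \| p_0) - D(q \| p_1)$ exactly.

```python
import math
from collections import Counter


def kl_divergence(q, p):
    """D(q || p) = sum_a q(a) ln(q(a) / p(a)) for discrete distributions (dicts)."""
    return sum(qa * math.log(qa / p[a]) for a, qa in q.items() if qa > 0)


def normalized_log_lr(samples, p0, p1):
    """(1/N) ln L, with L = prod_i p1(x_i) / p0(x_i)."""
    return sum(math.log(p1[x] / p0[x]) for x in samples) / len(samples)


def empirical_distribution(samples):
    """Relative frequency of each symbol in the sample."""
    counts = Counter(samples)
    return {a: c / len(samples) for a, c in counts.items()}


# Hypothetical models and observations, chosen only for illustration.
p0 = {"a": 0.5, "b": 0.3, "c": 0.2}
p1 = {"a": 0.2, "b": 0.3, "c": 0.5}
samples = list("aabccbccac")

q_hat = empirical_distribution(samples)
lhs = normalized_log_lr(samples, p0, p1)
rhs = kl_divergence(q_hat, p0) - kl_divergence(q_hat, p1)

# Minimum distance decision: pick the model closer to q_hat in KL divergence.
decision = 1 if rhs > 0 else 0
print(lhs, rhs, decision)
```

Here the sample contains more "c" than "a", so the empirical distribution is closer to `p1` and the detector selects hypothesis 1.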


3. Matrix CFAR Detection

1) Constant false alarm rate detector

[Figure: classical CFAR detector. The reference cells $x_1, \ldots, x_{N/2}$ and $x_{N/2+1}, \ldots, x_N$ on either side of the detection cell $x_D$ are averaged (arithmetic mean) to set the threshold for the decision; 0: target absent, 1: target present]

Classical CFAR detector: the decision is made by comparing the content of the cell under test with an adaptive threshold given by the arithmetic mean of the reference cells, so as to achieve the desired constant probability of false alarm.
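The classical detector described above can be sketched as a cell-averaging CFAR. This is a minimal illustration, not the presenter's implementation: the function name `ca_cfar` and its parameters are hypothetical, no guard cells are used, and the scale factor `alpha = N * (pfa ** (-1 / N) - 1)` is the standard choice under the added assumption that the noise power in each cell is exponentially distributed.

```python
def ca_cfar(power, num_ref, pfa):
    """Cell-averaging CFAR on a 1-D list of power samples.

    power   : list of non-negative power values, one per range cell
    num_ref : total number N of reference cells (N/2 on each side)
    pfa     : desired probability of false alarm
    Returns a list of decisions (0: target absent, 1: target present).
    """
    half = num_ref // 2
    # Threshold multiplier; assumes exponentially distributed noise power.
    alpha = num_ref * (pfa ** (-1.0 / num_ref) - 1.0)

    decisions = [0] * len(power)
    # Only cells with a full reference window on both sides are tested.
    for d in range(half, len(power) - half):
        left = power[d - half:d]
        right = power[d + 1:d + 1 + half]
        # Arithmetic mean of the reference cells estimates the noise level.
        noise_level = (sum(left) + sum(right)) / (len(left) + len(right))
        if power[d] > alpha * noise_level:
            decisions[d] = 1
    return decisions


# Hypothetical data: flat noise floor with one strong target in cell 10.
power = [1.0] * 20
power[10] = 50.0
decisions = ca_cfar(power, num_ref=8, pfa=1e-3)
print(decisions)
```

Because the threshold scales with the local mean, multiplying every cell by the same factor leaves the decisions unchanged, which is what keeps the false alarm rate constant as the noise level varies.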

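2) Matrix CFAR detector

[Figure: matrix CFAR detector. The covariance matrices $R_1, \ldots, R_i, R_{i+1}, \ldots, R_N$ of the reference cells are combined into a center $\bar{R}$, which is compared with the covariance matrix $R_D$ of the cell under test]

In 2008, F. Barbaresco proposed a generalized CFAR technique based on the manifold of symmetric positive definite (SPD) matrices. The Riemannian distance-based detector has been shown to achieve better detection performance than the classical CFAR detector.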



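• Riemannian distance between two SPD matrices

$$d^2(R_1, R_2) = \left\| \ln\left( R_1^{-1/2} R_2 R_1^{-1/2} \right) \right\|^2 = \sum_{k=1}^{n} \ln^2 \lambda_k$$

where $\lambda_k$ are the eigenvalues of $R_1^{-1/2} R_2 R_1^{-1/2}$.

• Riemannian center of N SPD matrices

$$\bar{R} = \arg\min_{R} \sum_{i=1}^{N} w_i \, d^p(R, R_i)$$

where p=1 gives the median and p=2 gives the mean.

• The matrix CFAR detector

The detection statistic $d(\bar{R}, R_i)$ is compared with a threshold.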
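The Riemannian distance between two SPD matrices can be computed from eigenvalues alone. The sketch below assumes real symmetric inputs and NumPy; the function name `riemannian_distance` and the test matrices are illustrative. Instead of forming $R_1^{-1/2}$ explicitly, it whitens $R_2$ with the Cholesky factor of $R_1$, which yields a matrix with the same eigenvalues as $R_1^{-1/2} R_2 R_1^{-1/2}$.

```python
import numpy as np


def riemannian_distance(r1, r2):
    """d(R1, R2) = sqrt(sum_k ln^2(lambda_k)) for real SPD matrices.

    lambda_k are the eigenvalues of R1^{-1/2} R2 R1^{-1/2}, obtained here
    from the similar matrix L^{-1} R2 L^{-T}, where R1 = L L^T (Cholesky).
    """
    chol = np.linalg.cholesky(r1)
    tmp = np.linalg.solve(chol, r2)           # L^{-1} R2
    whitened = np.linalg.solve(chol, tmp.T)   # L^{-1} R2 L^{-T} (R2 symmetric)
    eigenvalues = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(eigenvalues) ** 2)))


# Hypothetical SPD matrices for illustration.
a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, 0.2], [0.2, 3.0]])

d_ab = riemannian_distance(a, b)
d_ba = riemannian_distance(b, a)
d_aa = riemannian_distance(a, a)
d_scaled = riemannian_distance(np.eye(3), np.e * np.eye(3))  # all lambda_k = e
print(d_ab, d_ba, d_aa, d_scaled)
```

The eigenvalue decomposition for every pair of matrices is what makes this distance, and the iterative mean or median built on it, comparatively expensive.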



[Figure panels: initial spectra of measurements; mean spectra of measurements; intensity; classical detector; geometric detector]

3) Robust matrix CFAR detector

• Two shortcomings of the Riemannian distance-based matrix CFAR detector

a) The computational cost is high because of the exponential operations needed to calculate the Riemannian distance and its average;

b) The Riemannian mean and median are not robust to outliers.



• Symmetrized Kullback-Leibler (sKL) divergence based matrix CFAR detector

• Total Kullback-Leibler (tKL) divergence based matrix CFAR detector

[Figure: detector block diagram. Sample data → covariance matrices $R_1, \ldots, R_i$, CUT, $R_{i+1}, \ldots, R_N$ → center $\bar{R}$ (sKL mean, sKL median, or tKL t center) → divergence computation → comparison with threshold]



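• sKL divergence between two SPD matrices

$$sKL(R_1, R_2) = tr\left( R_1^{-1} R_2 + R_2^{-1} R_1 - 2I \right)$$

• sKL mean of N SPD matrices

$$\bar{R} = \left[ \left( \frac{1}{N} \sum_{i=1}^{N} R_i^{-1} \right)^{-1} \left( \frac{1}{N} \sum_{k=1}^{N} R_k \right) \right]^{1/2}$$

• sKL median of N SPD matrices

$$\bar{R}_{k+1} = \left[ \left( \sum_{i=1}^{N} \frac{R_i^{-1}}{sKL(\bar{R}_k, R_i)} \right)^{-1} \left( \sum_{j=1}^{N} \frac{R_j}{sKL(\bar{R}_k, R_j)} \right) \right]^{1/2}$$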
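The sKL divergence needs only linear solves and traces, with no eigenvalue decomposition or matrix logarithm. The sketch below takes $sKL(R_1, R_2) = tr(R_1^{-1} R_2 + R_2^{-1} R_1 - 2I)$ as its definition (normalization conventions vary, so treat the missing factor of 1/2 as an assumption); the function name and test matrices are illustrative.

```python
import numpy as np


def skl_divergence(r1, r2):
    """sKL(R1, R2) = tr(R1^{-1} R2 + R2^{-1} R1 - 2 I) for SPD matrices."""
    n = r1.shape[0]
    r1_inv_r2 = np.linalg.solve(r1, r2)  # R1^{-1} R2 without explicit inverse
    r2_inv_r1 = np.linalg.solve(r2, r1)  # R2^{-1} R1
    return float(np.trace(r1_inv_r2) + np.trace(r2_inv_r1) - 2 * n)


# Hypothetical SPD matrices for illustration.
a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, 0.2], [0.2, 3.0]])

s_ab = skl_divergence(a, b)
s_ba = skl_divergence(b, a)
s_aa = skl_divergence(a, a)
s_scaled = skl_divergence(np.eye(2), 2.0 * np.eye(2))  # tr(2I + 0.5I - 2I) = 1
print(s_ab, s_ba, s_aa, s_scaled)
```

The divergence is symmetric in its arguments and vanishes when they coincide, but unlike the Riemannian distance it is not a metric.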



• The tKL divergence is a special case of the total Bregman divergence (tBD), which is invariant to linear transformation.

$$BD(x, y) = f(x) - f(y) - \langle x - y, \nabla f(y) \rangle$$

$$tBD(x, y) = \frac{f(x) - f(y) - \langle x - y, \nabla f(y) \rangle}{\sqrt{1 + \| \nabla f(y) \|^2}}$$

where $f$ is a convex and differentiable function. The tBD is more robust than the BD.

[Figure: geometric comparison of $BD(x, y)$ and $tBD(x, y)$]
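A scalar example makes the difference between the two divergences concrete. The sketch below assumes the convex function $f(x) = x^2$, chosen only for illustration, for which $BD(x, y) = (x - y)^2$ and $tBD(x, y) = (x - y)^2 / \sqrt{1 + 4y^2}$; the function names are hypothetical.

```python
import math


def bregman_divergence(f, grad_f, x, y):
    """BD(x, y) = f(x) - f(y) - (x - y) * f'(y) for scalar x, y."""
    return f(x) - f(y) - (x - y) * grad_f(y)


def total_bregman_divergence(f, grad_f, x, y):
    """tBD(x, y) = BD(x, y) / sqrt(1 + f'(y)^2) for scalar x, y."""
    return bregman_divergence(f, grad_f, x, y) / math.sqrt(1.0 + grad_f(y) ** 2)


def f(x):
    """Convex, differentiable generator chosen for the example."""
    return x * x


def grad_f(x):
    """Derivative of f."""
    return 2.0 * x


bd = bregman_divergence(f, grad_f, 3.0, 1.0)         # 9 - 1 - 2 * 2 = 4
tbd = total_bregman_divergence(f, grad_f, 3.0, 1.0)  # 4 / sqrt(5)

# Same gap |x - y| = 2, but a larger gradient at y shrinks the tBD.
tbd_far = total_bregman_divergence(f, grad_f, 12.0, 10.0)
print(bd, tbd, tbd_far)
```

The normalization by the gradient magnitude is what down-weights points where the generator is steep, which is the mechanism behind the robustness claimed for the tBD.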



• tKL divergence between two SPD matrices

$$tKL(R_1, R_2) = \frac{\log\det\left( R_1^{-1} R_2 \right) + tr\left( R_2^{-1} R_1 \right) - n}{2 \sqrt{c + \dfrac{(\log\det R_2)^2}{4} - \dfrac{n (1 + \log 2\pi)}{2} \log\det R_2}}$$

where $n$ is the matrix dimension and $c$ is a constant.

• tKL center of N SPD matrices

$$\bar{R} = \left( \sum_{i} w_i R_i^{-1} \right)^{-1}, \qquad w_i = \frac{\mu_i}{\sum_{j} \mu_j}$$

$$\text{where } \mu_i = \left( 2 \sqrt{c + \dfrac{(\log\det R_i)^2}{4} - \dfrac{n (1 + \log 2\pi)}{2} \log\det R_i} \right)^{-1}$$

The weight $w_i$ is inversely proportional to the magnitude of the gradient of the convex function that defines the divergence. Because noisy data and outliers have a larger gradient magnitude, they receive smaller weights, so the t center is robust to outliers and clutter.



• Comparison of dissimilarity measures: Riemannian distance, sKL divergence and tKL divergence

The signal-to-clutter ratio (SCR) is significantly improved by the mapping of tKL divergence.



• Comparison of detection performance: Riemannian distance, sKL divergence and tKL divergence

Among the three, the tKL divergence-based matrix CFAR detector has the best detection performance.



Table I. Time taken by different algorithms

Algorithm                     Time (s)
Riemannian mean detector      29.74
Riemannian median detector    41.66
sKL mean detector             0.09
sKL median detector           2.81
tKL t center detector         0.15


4. Geometry of Matrix CFAR Detection

Classical CFAR detector: Euclidean space, Euclidean distance measure

Matrix CFAR detector: matrix manifold, Riemannian distance measure, KL divergence, etc.

A good detector should

• Properly characterize the intrinsic structure of the measurement space

• Maximize the divergence between two hypotheses (clusters)

[Figure: covariance matrices $R_1, R_2, \ldots, R_i, R_{i+1}, \ldots, R_N$, their center $\bar{R}$, and $R_D$]

Future work

• Other divergences that better measure the dissimilarity between distributions

• Better approaches for clustering the distributions

• Detectors for heavy clutter

• Detectors for nonstationary clutter

• Detectors for few samples

Thank you for your attention!
