Multiple Instance Hidden Markov Model: Application to Landmine Detection in GPR Data
Jeremy Bolton, Seniha Yuksel, Paul Gader
CSI Laboratory University of Florida
CSI Laboratory
2010
Highlights
• Hidden Markov Models (HMMs) are useful tools for landmine detection in GPR imagery
• Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective
• Classification performance is improved when using the MI-HMM over a standard HMM
• Results further support the idea that explicitly accounting for the MI scenario may lead to improved learning under class-label uncertainty
Outline
I. HMMs for Landmine Detection in GPR
   I. Data
   II. Feature Extraction
   III. Training
II. MIL Scenario
III. MI-HMM
IV. Classification Results
HMMs for landmine detection
GPR Data
• GPR data
– 3-D image cube: down-track (dt), cross-track (xt), depth
– Subsurface objects are observed as hyperbolas
GPR Data Feature Extraction
• Many features extracted from GPR data measure the occurrence of an "edge"
– For the typical HMM algorithm (Gader et al.):
• Preprocessing techniques are used to emphasize edges
• Image morphology and structuring elements can be used to extract edges
(figure: image, preprocessed image, edge extraction)
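As a rough sketch of the morphology idea: subtracting a grayscale erosion taken along a diagonal structuring element from the image highlights diagonal (hyperbola-limb) edges. The structuring elements and window length below are illustrative assumptions, not the fielded algorithm's exact choices.

```python
import numpy as np

def diagonal_edge_strength(img, length=3):
    # Sketch: image minus a diagonal-window minimum (grayscale erosion)
    # responds strongly where a diagonal edge is present and is zero on
    # constant regions. "length" and the centered window are assumptions.
    half = length // 2
    padded = np.pad(img, half, mode="edge")
    H, W = img.shape
    rise = np.zeros((H, W))
    fall = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # window minima along the "\" (falling) and "/" (rising) diagonals
            f_min = min(padded[half + i + k, half + j + k]
                        for k in range(-half, half + 1))
            r_min = min(padded[half + i - k, half + j + k]
                        for k in range(-half, half + 1))
            fall[i, j] = img[i, j] - f_min
            rise[i, j] = img[i, j] - r_min
    return rise, fall
```

Because the window minimum never exceeds the center pixel, both responses are nonnegative and vanish on flat backgrounds.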
4-d Edge Features
Edge Extraction
Concept behind the HMM for GPR
• Using the extracted features (an observation sequence obtained when scanning from left to right in an image), we attempt to estimate some hidden states
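A minimal sketch of how such an observation sequence is scored under an HMM: the scaled forward algorithm below computes the sequence log-likelihood. The state count and symbol alphabet are placeholders, not the deck's actual configuration.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    # Scaled forward algorithm for a discrete-output HMM.
    # pi: (S,) initial state probabilities
    # A:  (S, S) transition matrix, A[i, j] = P(state j | state i)
    # B:  (S, K) emission matrix over K discrete symbols
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()                # scaling constant avoids underflow
    loglik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

With uniform parameters every length-T sequence has likelihood K^-T, which gives a quick sanity check.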
HMM Features
• Current AIM viewer by Smock
(figure: image, feature image, rising-edge feature, falling-edge feature)
Sampling HMM Summary
• Feature Calculation
– Dimensions (not always relevant whether a positive or negative diagonal is observed; simply that a diagonal is observed)
• HMMSamp: 2-d
– Down-sampling depth
• HMMSamp: 4
• HMM Models
– Number of states
• HMMSamp: 4
– Gaussian components per state (fewer total components for probability calculation)
• HMMSamp: 1 (recent observation)
Training the HMM
• Xuping Zhang proposed a Gibbs sampling algorithm for HMM learning
– But, given an image (or images), how do we choose the training sequences?
– Which sequence(s) do we choose from each image?
• There is an inherent problem in many image analysis settings due to class-label uncertainty per sequence
• That is, each image has a class label associated with it, but each image has multiple instances of samples or sequences. Which sample(s) are truly indicative of the target?
– Using standard training techniques, this translates to identifying the optimal training set within a set of sequences
– If an image has N sequences, this translates to a search over 2^N possibilities
Training Sample Selection Heuristic
• Currently, an MRF approach (Collins et al.) is used to bound the search to a localized area within the image rather than searching all sequences within the image.
– Reduces the search space, but the multiple instance problem still exists
(figure: TM46-MB at 1 in.; log-likelihood trace, % change in LL: 0.0017; H0 vs. H1 segmentation; original data and data with bounding box)
Multiple Instance Learning
Standard Learning vs. Multiple Instance Learning
• Standard supervised learning
– Optimize some model (or learn a target concept) given training samples and corresponding labels
• MIL
– Learn a target concept given multiple sets of samples and corresponding labels for the sets
– Interpretation: learning with uncertain labels / a noisy teacher

Standard: X = {x_1, ..., x_n}, Y = {y_1, ..., y_n}
MIL: X_i = {x_{i1}, ..., x_{in_i}}, Y_i = 1, with instance labels {y_{i1}, ..., y_{in_i}} = {?, ..., ?}
Multiple Instance Learning (MIL)
• Given:
– A set of I bags: B = {B_1, ..., B_i, ..., B_I}
– Each labeled + or -
– The ith bag is a set of J_i samples in some feature space: B_i = {x_{i1}, ..., x_{iJ_i}}
– Interpretation of labels:
B_i^+ : ∃ j such that label(x_{ij}) = 1
B_i^- : ∀ j, label(x_{ij}) = 0
• Goal: learn the concept
– What characteristic is common to the positive bags that is not observed in the negative bags?
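The interpretation of the labels above can be sketched directly: a bag is positive if and only if at least one of its instance labels is positive.

```python
def bag_label(instance_labels):
    # A bag is positive if and only if at least one instance is positive;
    # a negative bag implies every instance label is 0.
    return int(any(instance_labels))
```

This is the rule that makes instance labels in positive bags uncertain: observing a positive bag tells us only that some instance is positive, not which one.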
Standard learning doesn't always fit: GPR Example
• Standard Learning
– Each training sample (feature vector) must have a label
– But which ones, and how many, compose the optimal training set?
• Arduous task: many feature vectors per image and multiple images
• Difficult to label given GPR echoes, ground-truthing errors, etc.
• The label of each vector may not be known
(figure: EHD feature vectors x_1, ..., x_n with uncertain labels y_1 = ?, ..., y_n = ?)
Learning from Bags
• In MIL, a label is attached to a set of samples
• A bag is a set of samples
• A sample within a bag is called an instance
• A bag is labeled as positive if and only if at least one of its instances is positive
(figure: positive bags and negative bags; each bag is an image)
MI Learning: GPR Example
• Multiple Instance Learning
– Each training bag must have a label
– No need to label all feature vectors; just identify images (bags) where targets are present
– Implicitly accounts for class-label uncertainty
Y^+ = {x_1, x_2, x_3, x_4, x_5, ..., x_15} (EHD feature vectors)
Multiple Instance Learning HMM: MI-HMM
MI-HMM
• In the MI-HMM, instances are sequences
(figure: positive and negative bags of observation sequences; arrow indicates direction of movement)
MI-HMM
• Assuming independence between the bags, and assuming the Noisy-OR (Pearl) relationship between the sequences within each bag, the probability that bag B_i is positive is
P(L_i = 1 | B_i) = 1 - ∏_{s ∈ B_i} (1 - P(+ | s))
• where P(+ | s) is the probability the HMM assigns to sequence s being a target sequence
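Under these assumptions, the bag-level log-likelihood can be sketched as follows; the per-sequence probabilities P(+|s) are assumed to come from the HMM.

```python
import numpy as np

def bag_prob_positive(seq_probs):
    # Noisy-OR (Pearl): the bag is positive unless every sequence
    # fails to be positive.
    return 1.0 - np.prod(1.0 - np.asarray(seq_probs))

def mi_hmm_loglik(pos_bags, neg_bags):
    # Log-likelihood of the bag labels, assuming independent bags; each
    # bag is an array of per-sequence P(+ | sequence) values from the HMM.
    ll = sum(np.log(bag_prob_positive(b)) for b in pos_bags)
    ll += sum(np.log(1.0 - bag_prob_positive(b)) for b in neg_bags)
    return ll
```

For example, a bag of two sequences each with P(+|s) = 0.5 is positive with probability 1 - 0.5 * 0.5 = 0.75.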
MI-HMM learning
• Due to the cumbersome nature of the noisy-OR, the parameters of the HMM are learned using Metropolis-Hastings sampling.
Sampling
• HMM parameters are sampled from Dirichlet distributions
• A new state is accepted or rejected based on the ratio r at iteration t + 1
• where P is the noisy-OR model
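A sketch of one such accept/reject step. A symmetric proposal is assumed here, so the ratio r reduces to the likelihood ratio under the noisy-OR bag model; the deck's actual Dirichlet proposal may additionally require a proposal-density correction.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(theta, loglik, propose, loglik_fn):
    # One Metropolis-Hastings step for the HMM parameters.
    # `propose` draws a candidate parameter set (e.g. transition-matrix
    # rows resampled from a Dirichlet); `loglik_fn` evaluates the
    # noisy-OR bag log-likelihood.
    cand = propose(theta)
    cand_ll = loglik_fn(cand)
    r = np.exp(cand_ll - loglik)          # ratio r at iteration t + 1
    if rng.random() < min(1.0, r):        # accept with probability min(1, r)
        return cand, cand_ll
    return theta, loglik
```

A candidate with higher likelihood (r > 1) is always accepted; a worse candidate is accepted only occasionally, in proportion to r.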
Discrete Observations
• Note that since we have chosen a Metropolis-Hastings sampling scheme using Dirichlets, our observations must be discretized.
(figure: discretized feature image)
MI-HMM Summary
• Feature Calculation
– Dimensions
• HMMSamp: 2-d
• MI-HMM: 2-d features are discretized into 16 symbols
– Down-sampling depth
• HMMSamp: 4
• MI-HMM: 4
• HMM Models
– Number of states
• HMMSamp: 4
• MI-HMM: 4
– Components per state (fewer total components for probability calculation)
• HMMSamp: 1 Gaussian
• MI-HMM: discrete mixture over 16 symbols
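A hypothetical sketch of the discretization step: quantizing each of the two feature dimensions into four quantile bins yields 4 x 4 = 16 symbols. The deck does not specify the binning actually used, so the quantile scheme here is an assumption.

```python
import numpy as np

def discretize(features, n_bins=4):
    # Quantize each of the 2 feature dimensions into n_bins quantile
    # levels, yielding n_bins**2 = 16 discrete symbols for the
    # Dirichlet-based Metropolis-Hastings scheme.
    f = np.asarray(features, dtype=float)          # shape (T, 2)
    edges = [np.quantile(f[:, d], np.linspace(0, 1, n_bins + 1)[1:-1])
             for d in range(f.shape[1])]
    codes = [np.digitize(f[:, d], edges[d]) for d in range(f.shape[1])]
    return codes[0] * n_bins + codes[1]            # symbol in {0, ..., 15}
```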
Classification Results
MI-HMM vs. Sampling HMM
• Small Millbrook: HMM Samp (12,000) vs. MI-HMM (100)
(figure: results comparison)
What’s the deal with HMM Samp?
Concluding Remarks
Concluding Remarks
• Explicitly incorporating the Multiple Instance Learning (MIL) paradigm in HMM learning is intuitive and effective
• Classification performance is improved when using the MI-HMM over a standard HMM
– More effective and efficient
• Future Work
– Construct bags without using the MRF heuristic
– Apply to EMI data: spatial uncertainty
Back up Slides
MIL Application: Example GPR
• Collaboration: Frigui, Collins, Torrione
• Construction of bags
– Collect 15 EHD feature vectors from the 15 depth bins: {x_1, x_2, x_3, x_4, x_5, ..., x_15}
– Mine images = positive bags
– FA images = negative bags
Random Set Framework for Multiple Instance Learning
Random Set Brief
• A random variable is a measurable map from a probability space (Ω, σ(Ω), P) to the reals with the Borel σ-algebra, (ℝ, B(ℝ))
• A random set is the analogous map from (Ω, σ(Ω), P) into a family of sets equipped with a suitable σ-algebra
How can we use Random Sets for MIL?
• Random set for MIL: bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
• Sets have an empty-intersection or non-empty-intersection relationship
• Find commonality using the intersection operator
• The random set's governing functional is based on the intersection operator
– Capacity functional T, for X = {x_1, ..., x_n}:
T(X) = 1 - ∏_{x ∈ X} (1 - T({x}))
("It is NOT the case that EACH element is NOT the target concept")
A.K.A.: Noisy-OR gate (Pearl 1988)
Random Set Functionals
• Capacity functionals for intersection calculation: T(X) = P(Ξ ∩ X ≠ ∅)
• Use a germ-and-grain model for the random set
– Multiple (J) concepts: Ξ = ∪_{j=1}^{J} (μ_j ⊕ Ξ_j), with germ-and-grain parameters {μ_j, σ_j}
– Calculate the probability of intersection given X and the germ-and-grain pairs:
T(X) = 1 - ∏_{j} ∏_{x ∈ X} (1 - T_j({x}))
– Grains are governed by random radii with an assumed cumulative distribution:
T_j({x}) = P(R_j ≥ r_{x,j}) = 1 - P(R_j < r_{x,j}) = exp(-r_{x,j}² / σ_j²), where r_{x,j} = ||x - μ_j||
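A sketch of these functionals under an exponential distance form; the germ locations, scales, and squared-distance expression below are illustrative assumptions rather than the slide's exact parameterization.

```python
import numpy as np

def instance_prob(x, germ, scale):
    # Probability that the grain centered at `germ` covers instance x:
    # decays with squared distance from the germ (assumed form).
    r2 = np.sum((np.asarray(x) - np.asarray(germ)) ** 2)
    return np.exp(-r2 / scale ** 2)

def capacity(X, germs, scales):
    # T(X) = P(random set hits X): noisy-OR combination across
    # concepts j and instances x in the bag X.
    p_miss = 1.0
    for germ, s in zip(germs, scales):
        for x in X:
            p_miss *= 1.0 - instance_prob(x, germ, s)
    return 1.0 - p_miss
```

An instance sitting exactly on a germ is hit with probability 1, so the whole bag's capacity is 1; a bag far from every germ has capacity near 0.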
RSF-MIL: Germ and Grain Model
• Positive bags = blue
• Negative bags = orange
• Distinct shapes = distinct bags
(figure: instances (x) from positive and negative bags scattered around target-concept grains (T))
Multiple Instance Learning with Multiple Concepts
Multiple Concepts: Disjunction or Conjunction?
• Disjunction
– When you have multiple types of concepts
– When each instance can indicate the presence of a target
• Conjunction
– When you have a target type that is composed of multiple (necessary) concepts
– When each instance can indicate a concept, but not necessarily the composite target type
Conjunctive RSF-MIL
• Previously developed disjunctive RSF-MIL (RSF-MIL-d): noisy-OR combination across concepts and samples
T(X) = 1 - ∏_{j} ∏_{x ∈ X} (1 - T_j({x}))
• Conjunctive RSF-MIL (RSF-MIL-c): standard noisy-OR within each concept j, noisy-AND combination across concepts
T(X) = ∏_{j} [1 - ∏_{x ∈ X} (1 - T_j({x}))]
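The two combination rules can be sketched side by side; the per-concept instance probabilities T_j({x}) are assumed given.

```python
import numpy as np

def noisy_or(probs):
    # Noisy-OR: the event fires unless every cause fails.
    return 1.0 - np.prod(1.0 - np.asarray(probs))

def rsf_mil_d(instance_probs):
    # Disjunctive: noisy-OR across concepts AND samples -- any concept
    # firing on any instance makes the bag positive.
    return noisy_or([p for concept in instance_probs for p in concept])

def rsf_mil_c(instance_probs):
    # Conjunctive: noisy-OR within each concept, noisy-AND across
    # concepts -- every concept must fire somewhere in the bag.
    return float(np.prod([noisy_or(concept) for concept in instance_probs]))
```

With one concept fully present and one fully absent, the disjunctive score is 1 while the conjunctive score is 0, which is exactly the distinction the slide draws.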
Synthetic Data Experiments
• The Extreme Conjunct data set requires that a target bag exhibit two distinct concepts rather than one or none
(table: AUC, with AUC when initialized near the solution in parentheses)
Application to Remote Sensing
Disjunctive Target Concepts
• Using large overlapping bins (gross extraction), the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists
(diagram: a Noisy-OR over instances for each target concept type 1 through n, combined by an OR into "Target concept present?")
What if we want features with finer granularity?
• Fine extraction
– More detail about the image and more shape information, but we may lose the disjunctive nature between (multiple) instances
• Our features have more granularity, so our concepts may be constituents of a target rather than encapsulating the target concept
(diagram: a Noisy-OR per constituent concept, e.g. constituent concept 1 = top of hyperbola, constituent concept 2 = wings of hyperbola, combined by an AND into "Target concept present?")
GPR Experiments
• Extensive GPR data set
– ~800 targets
– ~5,000 non-targets
• Experimental design
– Run RSF-MIL-d (disjunctive) and RSF-MIL-c (conjunctive)
– Compare both feature extraction methods
• Gross extraction: bins large enough to encompass the target concept
• Fine extraction: non-overlapping bins
• Hypothesis
– RSF-MIL-d will perform well when using gross extraction, whereas RSF-MIL-c will perform well using fine extraction
Experimental Results
• Highlights
– RSF-MIL-d using gross extraction performed best
– RSF-MIL-c performed better than RSF-MIL-d when using fine extraction
– Other influencing factors: the optimization methods for RSF-MIL-d and RSF-MIL-c are not the same
(figures: results for gross extraction and fine extraction)
Future Work
• Implement a general form that can learn the disjunction or conjunction relationship from the data
• Implement a general form that can learn the number of concepts
• Incorporate spatial information
• Develop an improved optimization scheme for RSF-MIL-c
HMM Model Visualization (DTXT HMM)
(figure: initial probabilities; transition probabilities from state to state, red = high probability; Gaussian component means plotted in the rising-diagonal / falling-diagonal feature plane, colored by state index, states 1-3; the pattern characterized by the model)
Backup Slides
MIL Example (AHI Imagery)
• Robust learning tool
– MIL tools can learn a target signature with limited or incomplete ground truth
• Which spectral signature(s) should we use to train a target model or classifier?
1. Spectral mixing
2. Background signal
3. Ground truth not exact
MI-RVM
• Addition of set observations and inference using noisy-OR to an RVM model:
P(y = 1 | X) = 1 - ∏_{j=1}^{K} (1 - σ(w^T x_j)), where σ(z) = 1 / (1 + exp(-z))
• Prior on the weights w: p(w) = N(w | 0, A^{-1})
SVM review
• Classifier structure: y(x) = w^T φ(x) + b
• Optimization:
min_{w,b} (1/2)||w||² + C Σ_i ξ_i
s.t. t_i (w^T φ(x_i) + b) ≥ 1 - ξ_i, ξ_i ≥ 0
MI-SVM Discussion
• The RVM was altered to fit the MIL problem by changing the form of the target variable's posterior to model a noisy-OR gate.
• The SVM can be altered to fit the MIL problem by changing how the margin is calculated
– Boost the margin between the bag (rather than the samples) and the decision surface
– Look for the MI separating linear discriminant
• There is at least one sample from each positive bag in the positive half-space
mi-SVM
• Enforce the MI scenario using extra constraints:
min_{t_i} min_{w,b} (1/2)||w||² + C Σ_i ξ_i
s.t. t_i (w^T φ(x_i) + b) ≥ 1 - ξ_i, ξ_i ≥ 0, t_i ∈ {-1, 1}
Σ_{i ∈ I} (t_i + 1)/2 ≥ 1 for each positive bag I, and t_i = -1 for all i in negative bags
• Mixed integer program: must find the optimal hyperplane and the optimal labeling set
– At least one sample in each positive bag must have a label of 1
– All samples in each negative bag must have a label of -1
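One step of the usual alternating heuristic for this mixed integer program can be sketched as label imputation given the current decision values. The function and variable names below are hypothetical.

```python
import numpy as np

def update_labels(scores, bag_ids, bag_labels):
    # One label-imputation step of the mi-SVM heuristic: instances in
    # negative bags stay -1; instances in positive bags take the sign
    # of the current decision values, but each positive bag is forced
    # to keep at least one +1 (its highest-scoring instance).
    t = np.where(np.sign(scores) >= 0, 1, -1)
    for b, y in bag_labels.items():
        idx = np.flatnonzero(bag_ids == b)
        if y == -1:
            t[idx] = -1
        elif not np.any(t[idx] == 1):
            t[idx[np.argmax(scores[idx])]] = 1
    return t
```

The SVM would then be retrained on the imputed labels, alternating until the labeling stabilizes.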
Current Applications
I. Multiple Instance Learning
   I. MI Problem
   II. MI Applications
II. Multiple Instance Learning: Kernel Machines
   I. MI-RVM
   II. MI-SVM
III. Current Applications
   I. GPR imagery
   II. HSI imagery
HSI: Target Spectra Learning
• Given labeled areas of interest: learn the target signature
• Given test areas of interest: classify the set of samples
Overview of MI-RVM Optimization
• Two-step optimization
1. Estimate the optimal w, given the posterior of w
• There is no closed-form solution for the parameters of the posterior, so a gradient update method is used
• Iterate until convergence, then proceed to step 2
2. Update the parameter on the prior of w
• The distribution on the target variable has no specific parameters
• Until system convergence, continue at step 1
1) Optimization of w
• Optimize the posterior (Bayes' rule) of w:
ŵ_MAP = argmax_w [log P(X | w) + log p(w)]
• Update the weights using the Newton-Raphson method:
w^{t+1} = w^t - H^{-1} g
2) Optimization of the Prior
• Optimize the covariance of the prior:
Â = argmax_A P(X | A) = argmax_A ∫ P(X | w) p(w | A) dw
• Making a large number of assumptions, the diagonal elements of A can be estimated as:
a_i^new = 1 / (w_i² + [H^{-1}]_{ii})
Random Sets: Multiple Instance Learning
• Random set framework for multiple instance learning
– Bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
• Find commonality using the intersection operator
• The random set's governing functional is based on the intersection operator:
T(K) = P(Ξ ∩ K ≠ ∅)
MI issues
• MIL approaches
– Some approaches are biased to believe only one sample in each bag caused the target concept
– Some approaches can only label bags
– It is not clear whether anything is gained over supervised approaches
Side Note: Bayesian Networks
• Noisy-OR assumption
– Bayesian network representation of Noisy-OR
– Polytree: a singly connected DAG
Side Note
• A full Bayesian network may be intractable
– Occurrences of causal factors are rare (sparse co-occurrence)
• So assume a polytree
• So assume the result has a Boolean relationship with the causal factors
– Absorb I, X, and A into one node, governed by the randomness of I
• These assumptions greatly simplify the inference calculation
• Calculate Z based on probabilities rather than constructing a distribution using X:
P(Z = 1 | {X_1, X_2, X_3, X_4}) = 1 - ∏_j (1 - P(Z = 1 | X_j))
Diverse Density (DD)
• Probabilistic approach
– Goal:
• Standard statistical approaches identify areas in a feature space with a high density of target samples and a low density of non-target samples
• DD: identify areas in a feature space with a high "density" of samples from EACH of the positive bags ("diverse"), and a low density of samples from negative bags
– Identify attributes or characteristics similar to the positive bags and dissimilar to the negative bags
– Assume t is a target characterization
– Goal: argmax_t P(B_1^+, ..., B_n^+, B_1^-, ..., B_m^- | t)
– Assuming the bags are conditionally independent:
argmax_t ∏_i P(B_i^+ | t) ∏_j P(B_j^- | t)
Diverse Density
• Calculation (Noisy-OR model), with B_i = {x_{i1}, ..., x_{iJ_i}}:
P(t | B_i^+) = 1 - ∏_j (1 - P(t | B_ij^+))
("It is NOT the case that EACH element is NOT the target concept")
P(t | B_i^-) = ∏_j (1 - P(t | B_ij^-))
P(t | B_ij) = exp(-||x_ij - t||²)
• Optimization: argmax_t ∏_i P(t | B_i^+) ∏_j P(t | B_j^-)
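The DD objective can be sketched as follows for a single candidate t; the exhaustive search / gradient ascent over t is omitted.

```python
import numpy as np

def p_inst(t, x):
    # Instance-level probability: Gaussian-like bump centered at t.
    return np.exp(-np.sum((np.asarray(x) - np.asarray(t)) ** 2))

def diverse_density(t, pos_bags, neg_bags):
    # DD(t): every positive bag must contain something near t
    # ("diverse"), and no negative instance may be near t.
    dd = 1.0
    for bag in pos_bags:
        dd *= 1.0 - np.prod([1.0 - p_inst(t, x) for x in bag])
    for bag in neg_bags:
        dd *= np.prod([1.0 - p_inst(t, x) for x in bag])
    return dd
```

A candidate t sitting on a positive instance and far from all negative instances scores near 1; moving t away from any positive bag collapses the score.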
Random Set Functionals
• Capacity and avoidance functionals:
T(K) = P(Ξ ∩ K ≠ ∅), Q(K) = P(Ξ ∩ K = ∅), T(K) = 1 - Q(K)
• Given a germ-and-grain model Ξ = ∪_j (μ_j ⊕ Ξ_j) with assumed random radii:
T_j({x}) = P(R_j ≥ r_{x,j}) = 1 - P(R_j < r_{x,j}) = exp(-r_{x,j}² / σ_j²), where r_{x,j} = ||x - μ_j||
When disjunction makes sense
• Using large overlapping bins, the target concept can be encapsulated within one instance; therefore a disjunctive relationship exists
(diagram: OR into "Target concept present")
Theoretical and Developmental Progress
• Previous optimization: did not necessarily promote diverse density
• Current optimization: argmax_{μ,σ} ∏_i T(B_i^+) ∏_j Q(B_j^-)
– Better for context learning and MIL
• Previously no feature relevance or selection (hypersphere)
– Improvement: included learned weights on each feature dimension
• Previous TO DO list
• Improve existing code
– Develop joint optimization for context learning and MIL
• Apply MIL approaches (broad scale)
• Learn similarities between feature sets of mines
• Aid in training existing algorithms: find the "best" EHD features for training / testing
• Construct set-based classifiers?
How do we impose the MI scenario?: Diverse Density (Maron et al.)
• Calculation (Noisy-OR model), inherent in the random set formulation, with B_i = {x_{i1}, ..., x_{iJ_i}}:
P(t | B_i^+) = 1 - ∏_j (1 - P(t | B_ij^+))
("It is NOT the case that EACH element is NOT the target concept")
P(t | B_i^-) = ∏_j (1 - P(t | B_ij^-))
P(t | B_ij) = exp(-||x_ij - t||²)
• Optimization: argmax_t ∏_i P(t | B_i^+) ∏_j P(t | B_j^-)
– A combination of exhaustive search and gradient ascent
How can we use Random Sets for MIL?
• Random set for MIL: bags are sets
– The idea of finding the commonality of positive bags is inherent in the random set formulation
• Sets have an empty-intersection or non-empty-intersection relationship
• Find commonality using the intersection operator
• The random set's governing functional is based on the intersection operator
• Example:
Bags with target: {l,a,e,i,o,p,u,f}, {f,b,a,e,i,z,o,u}, {a,b,c,i,o,u,e,p,f}, {a,f,t,e,i,u,o,d,v}
– intersection: {a,e,i,o,u,f}
Bags without target: {s,r,n,m,p,l}, {z,s,w,t,g,n,c}, {f,p,k,r}, {q,x,z,c,v}, {p,l,f}
– union: {s,r,n,m,p,l,z,w,t,g,c,f,k,q,x,v}
Target concept = intersection \ union = {a,e,i,o,u}