Designing multiple biometric systems: Measure of ensemble effectiveness Allen Tang OPLab @ NTUIM


Page 1: Designing multiple biometric systems:  Measure of ensemble effectiveness

Designing multiple biometric systems:

Measure of ensemble effectiveness

Allen Tang, OPLab @ NTUIM


Agenda

Introduction

Measures of performance

Measures of ensemble effectiveness

Combination Rules

Experimental Results

Conclusion


INTRODUCTION


Introduction

Multimodal biometric systems generally outperform unimodal ones

Fuse the results of multiple biometric experts

Fusion at the matching-score level is easier than at other levels


Introduction

Which biometric experts shall we choose?

How to evaluate ensemble effectiveness?

Which measure gives the best result?


MEASURES OF PERFORMANCE


Measures of performance

Notation

E = {E1, …, Ej, …, EN}: a set of N experts

U = {ui}: the set of users

sj: the set of all scores produced by Ej for all users

sij: the score produced by Ej for user ui

fj(ui): the function by which Ej produces sij for ui

th: threshold; gen: genuine; imp: impostor


Measures of performance: Basic

False Rejection Rate (FRR) for expert Ej:

$$FRR_j(th) = \int_{-\infty}^{th} p(s_j \mid gen)\, ds_j = P(s_j \le th \mid gen) \quad (1)$$

False Acceptance Rate (FAR) for expert Ej:

$$FAR_j(th) = \int_{th}^{+\infty} p(s_j \mid imp)\, ds_j = P(s_j > th \mid imp) \quad (2)$$


Measures of performance: Basic

p(sj|gen): probability distribution of the scores produced by Ej for genuine users

p(sj|imp): probability distribution of the scores produced by Ej for impostor users

The threshold (th) is set according to the requirements of the application at hand
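
As a rough illustration of (1) and (2), both error rates can be estimated empirically from lists of scores. The sketch below is not from the slides: the function names `frr` and `far` and the sample scores are made up, and it assumes that higher scores mean a stronger match, so a user is rejected when the score is at or below th.

```python
def frr(genuine_scores, th):
    """Fraction of genuine scores at or below th (falsely rejected)."""
    return sum(s <= th for s in genuine_scores) / len(genuine_scores)


def far(impostor_scores, th):
    """Fraction of impostor scores above th (falsely accepted)."""
    return sum(s > th for s in impostor_scores) / len(impostor_scores)


# Hypothetical scores of one expert Ej
genuine = [0.9, 0.8, 0.75, 0.4]
impostor = [0.1, 0.3, 0.5, 0.2]

print(frr(genuine, 0.45), far(impostor, 0.45))
```

Raising th trades a lower FAR for a higher FRR, which is why the threshold depends on the application.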


Measures of performance

Area under the ROC curve (AUC)

Equal error rate (EER)

The “decidability” index d’


Measures of performance: AUC

Estimate the AUC with the Wilcoxon-Mann-Whitney (WMW) statistic:

$$AUC = \frac{1}{n_+ \, n_-} \sum_{p=1}^{n_+} \sum_{q=1}^{n_-} I(s^{gen}_{p,j}, s^{imp}_{q,j}) \quad (3)$$

This formulation of the AUC is also called the “probability of correct pair-wise ranking”, as it computes the probability $P(s^{gen}_{p,j} > s^{imp}_{q,j})$


Measures of performance: AUC

n+/n−: number of genuine/impostor users

$\{s^{gen}_{p,j}\}$: set of scores produced by Ej for genuine users

$\{s^{imp}_{q,j}\}$: set of scores produced by Ej for impostor users

$$I(s^{gen}_{p,j}, s^{imp}_{q,j}) = \begin{cases} 1 & : s^{gen}_{p,j} > s^{imp}_{q,j} \\ 0.5 & : s^{gen}_{p,j} = s^{imp}_{q,j} \\ 0 & : s^{gen}_{p,j} < s^{imp}_{q,j} \end{cases}$$


Measures of performance: AUC

Features of the AUC estimated by the WMW statistic:

Theoretically equivalent to the value obtained by integrating the ROC curve

Gives a more reliable estimate of the AUC in real cases (finite samples)

Divide all scores sij into 2 sets: $\{s^{gen}_{p,j}\}$ & $\{s^{imp}_{q,j}\}$
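
The WMW estimate in (3) can be sketched in a few lines. The helper names `pair_indicator` and `auc_wmw` and the sample scores are illustrative assumptions, not taken from the slides; the indicator follows the usual convention of 1 for a correctly ranked pair, 0.5 for a tie and 0 otherwise.

```python
def pair_indicator(s_gen, s_imp):
    """I(s_gen, s_imp): 1 if the genuine score ranks higher, 0.5 on a tie, else 0."""
    if s_gen > s_imp:
        return 1.0
    if s_gen == s_imp:
        return 0.5
    return 0.0


def auc_wmw(genuine_scores, impostor_scores):
    """Average the indicator over all n+ * n- (genuine, impostor) score pairs."""
    total = sum(pair_indicator(g, i)
                for g in genuine_scores
                for i in impostor_scores)
    return total / (len(genuine_scores) * len(impostor_scores))


# Hypothetical scores of one expert: 5 correct pairs and 1 tie out of 6
genuine = [0.9, 0.8, 0.5]
impostor = [0.5, 0.3]
print(auc_wmw(genuine, impostor))
```

The double loop costs n+ times n− comparisons, which is fine for a sketch; a rank-based computation would be preferable for large score sets.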


Measures of performance: EER

EER is the point of the ROC curve where FAR and FRR are equal

The lower the value of EER, the better the performance of a biometric system
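
With finite samples, FAR and FRR rarely coincide exactly, so the EER has to be approximated. The sketch below assumes a simple sweep over the observed scores, takes the threshold where FAR and FRR are closest, and reports their average. This is one common convention and not necessarily the one used in the experiments; the function name `eer` and the scores are hypothetical.

```python
def eer(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    best_gap, best_eer = None, None
    for th in sorted(set(genuine_scores) | set(impostor_scores)):
        # Higher score = stronger match: reject when score <= th
        frr = sum(s <= th for s in genuine_scores) / len(genuine_scores)
        far = sum(s > th for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores: one genuine and one impostor score overlap
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5]
print(eer(genuine, impostor))
```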


Measures of performance: d’

In biometrics, d’ measures the separability of the distributions of genuine and impostor scores:

$$d' = \frac{|\mu_{gen} - \mu_{imp}|}{\sqrt{\frac{\sigma_{gen}^2}{2} + \frac{\sigma_{imp}^2}{2}}}$$


Measures of performance: d’

μgen/μimp: mean of genuine/impostor score distribution

σgen/σimp: std. deviation of genuine/impostor score distribution

The larger the d’, the better the performance of a biometric system
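
A minimal sketch of the d’ computation from two score lists follows. It assumes the usual decidability formula (absolute difference of the means divided by the square root of the average of the two variances) and uses the population variance; the slides do not say which variance estimator was used, and the name `d_prime` and the scores are made up.

```python
from math import sqrt
from statistics import mean, pvariance


def d_prime(genuine_scores, impostor_scores):
    """Decidability index: distance between class means in pooled-std units."""
    mu_gen, mu_imp = mean(genuine_scores), mean(impostor_scores)
    # Assumption: population variance (divide by n), not the sample variance
    var_gen, var_imp = pvariance(genuine_scores), pvariance(impostor_scores)
    return abs(mu_gen - mu_imp) / sqrt((var_gen + var_imp) / 2)


# Hypothetical scores: means 0.8 and 0.2, both variances 0.01
genuine = [0.7, 0.9]
impostor = [0.1, 0.3]
print(d_prime(genuine, impostor))
```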


MEASURES OF ENSEMBLE EFFECTIVENESS


Measures of ensemble effectiveness

Four measures for estimating the effectiveness of an ensemble of biometric experts: AUC, EER, d’, and the Score Dissimilarity (SD) index

But we must take the difference in performance among the experts into consideration


Measures of ensemble effectiveness

Generic, weighted and normalized performance measure (pm) formulation:

pmδ = μpm ∙ (1 − tanh(σpm))

For AUC: AUCδ = μAUC ∙ (1 − tanh(σAUC))

The higher the average AUC, the better the performance of an ensemble of experts


Measures of ensemble effectiveness

For EER: EERδ = μEER ∙ (1 − tanh(σEER))

The lower the average EER, the better the performance of an ensemble of experts

For d’: since d’ can be much larger than 1, use the normalized D’ = logb(1 + d’) instead of d’, with base b = 10 chosen according to the values of d’ observed in the experiments

Thus D’δ = μD’ ∙ (1 − tanh(σD’)) is used
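
The weighted formulation can be tried out numerically. The sketch below assumes that μpm and σpm are the mean and the (population) standard deviation of the measure over the experts in the ensemble, which the slides imply but do not state; the names `pm_delta` and `normalized_d_prime` and the sample values are hypothetical.

```python
from math import log, tanh
from statistics import mean, pstdev


def pm_delta(values):
    """mu * (1 - tanh(sigma)) over the per-expert values of one measure."""
    return mean(values) * (1 - tanh(pstdev(values)))


def normalized_d_prime(d_prime, base=10):
    """D' = log_b(1 + d'), which compresses d' values much larger than 1."""
    return log(1 + d_prime, base)


# Two hypothetical pairs of experts with the same average AUC of 0.9
balanced = pm_delta([0.90, 0.90])    # no spread, so no penalty
unbalanced = pm_delta([0.99, 0.81])  # spread between experts is penalized
print(balanced, unbalanced)
print(normalized_d_prime(9))
```

The tanh term keeps the penalty between 0 and 1, so an ensemble of experts with very different performance scores lower than an equally good but homogeneous one.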


Measures of ensemble effectiveness: SD index

SD index is based on the WMW formulation of the AUC, and is designed to measure the amount of improvement in AUC of the combination of an ensemble of experts

SD index is a measure of the amount of AUC that can be “recovered” by exploiting the complementarity of the experts


Measures of ensemble effectiveness: SD index

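Consider 2 experts E1 & E2 and all possible score pairs $\{\{s^{gen}_{p,1}, s^{imp}_{q,1}\}, \{s^{gen}_{p,2}, s^{imp}_{q,2}\}\}$; divide these pairs into 4 subsets S00, S10, S01, S11, where Suv collects the pairs that E1 ranks correctly if and only if u = 1 and that E2 ranks correctly if and only if v = 1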


Measures of ensemble effectiveness: SD index

The AUCs of E1 & E2 are given below, where card(Suv) is the cardinality of the subset Suv:

$$AUC_1 = \frac{card(S_{11}) + card(S_{10})}{n_+ \, n_-}$$

$$AUC_2 = \frac{card(S_{11}) + card(S_{01})}{n_+ \, n_-}$$

The SD index is defined as:

$$SD = \frac{card(S_{10}) + card(S_{01})}{card(S_{11}) + card(S_{10}) + card(S_{01})} \quad (4)$$


Measures of ensemble effectiveness: SD index

The higher the value of SD, the higher the maximum AUC that could be obtained by the combined scores

But the actual increase in AUC depends on the combination method, and high SD values are usually associated with low-performance experts

Performance measure formulation for SD: SDδ=μSD∙ (1−tanh(σSD))
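
To make the SD index concrete, the sketch below counts the subsets for two experts and applies (4), read here as the share of pairs ranked correctly by exactly one expert among the pairs ranked correctly by at least one. It assumes that Suv holds the (genuine, impostor) pairs that E1 ranks correctly when u = 1 and E2 ranks correctly when v = 1, and it treats ties as incorrect rankings for simplicity; the function name `sd_index` and the scores are hypothetical.

```python
def sd_index(gen1, imp1, gen2, imp2):
    """SD for two experts; gen1[p]/gen2[p] are their scores for genuine attempt p,
    imp1[q]/imp2[q] their scores for impostor attempt q."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p in range(len(gen1)):
        for q in range(len(imp1)):
            u = int(gen1[p] > imp1[q])  # 1 if E1 ranks the pair correctly
            v = int(gen2[p] > imp2[q])  # 1 if E2 ranks the pair correctly
            counts[(u, v)] += 1
    # Pairs ranked correctly by exactly one expert
    only_one = counts[(1, 0)] + counts[(0, 1)]
    denom = counts[(1, 1)] + only_one
    return only_one / denom if denom else 0.0


# Hypothetical scores: two pairs land in S11, one in S10, one in S01
gen1, gen2 = [0.9, 0.4], [0.8, 0.7]
imp1, imp2 = [0.5, 0.1], [0.2, 0.75]
print(sd_index(gen1, imp1, gen2, imp2))
```

In this toy case both experts have an AUC of 0.75, yet every pair is ranked correctly by at least one of them, which is the complementarity the SD index is meant to capture.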


COMBINATION RULES


Combination Rules

Combination(Fusion) in this work is at the score level, as it is the most widely used and flexible combination level

Investigate the performance of 4 combination methods: mean rule, product rule, linear combination by LDA, and DSS

LDA & DSS require a training phase to estimate the parameters needed to perform the combination


Combination Rules: Mean Rule

The mean rule is applied directly to the matching scores produced by the set of N experts

$$S_{i,mean} = \frac{1}{N} \sum_{j=1}^{N} s_{ij}$$


Combination Rules: Product Rule

The product rule is applied directly to the matching scores produced by the set of N experts

$$S_{i,prod} = \prod_{j=1}^{N} s_{ij}$$
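
Both fixed rules need no training and reduce to a single line each. In the sketch below the product rule is written as the plain product of the N scores, which is an assumption about normalization (a constant factor would not change the ranking of users); the function names and the sample scores are illustrative only.

```python
from math import prod


def mean_rule(scores):
    """Fused score of user ui: average of the N expert scores sij."""
    return sum(scores) / len(scores)


def product_rule(scores):
    """Fused score of user ui: product of the N expert scores sij."""
    return prod(scores)


# Hypothetical scores given to one user by N = 2 experts
user_scores = [0.8, 0.5]
print(mean_rule(user_scores), product_rule(user_scores))
```

With scores in [0, 1], the product is pulled towards the lowest score, so a single pessimistic expert dominates the fused result far more than under the mean rule.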


Combination Rules: Linear Combination by LDA

Linear discriminant analysis(LDA) can be used to compute the weights of a linear combination of the scores

The goal is a fused score with minimum within-class variation and maximum between-class variation

$$S_{i,LDA} = W^t \cdot S_i$$


Combination Rules: Linear Combination by LDA

Wt (W): transformation vector, computed using a training set

Si: vector of the scores assigned to the user ui by all the experts

μgen/μimp: mean of genuine/impostor score distribution

Sw: within-class scatter matrix

$$W = S_w^{-1} (\mu_{gen} - \mu_{imp})$$
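
For a pair of experts the LDA weights can be computed by hand with a 2x2 matrix inverse. The sketch below assumes that Sw is the sum of the unnormalized scatter matrices of the two classes (the slides do not give the normalization, and it only rescales W); `lda_weights`, `lda_score` and the training scores are hypothetical.

```python
def lda_weights(genuine, impostor):
    """W = Sw^-1 (mu_gen - mu_imp) for lists of (s1, s2) training score vectors."""
    def mean_vec(rows):
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

    def scatter(rows, mu):
        m = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    m[a][b] += d[a] * d[b]
        return m

    mu_gen, mu_imp = mean_vec(genuine), mean_vec(impostor)
    s_gen, s_imp = scatter(genuine, mu_gen), scatter(impostor, mu_imp)
    # Assumption: within-class scatter = sum of the two class scatters
    sw = [[s_gen[a][b] + s_imp[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_gen[0] - mu_imp[0], mu_gen[1] - mu_imp[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def lda_score(w, s):
    """Fused score: inner product of the weight vector and the score vector."""
    return w[0] * s[0] + w[1] * s[1]


# Hypothetical training scores (E1 score, E2 score) per attempt
genuine = [(0.9, 0.6), (0.7, 0.8), (0.8, 0.9)]
impostor = [(0.2, 0.3), (0.4, 0.1), (0.3, 0.4)]
w = lda_weights(genuine, impostor)
print(w, [lda_score(w, s) for s in genuine + impostor])
```

The sketch fails with a division by zero if Sw is singular, for example when the two experts produce perfectly correlated scores.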


Combination Rules: DSS

Dynamic score selection(DSS) is to select one of the scores sij available for each user ui, instead of fusing them into a new score

The ideal selector is based on the knowledge of the state of nature of each user:

$$S_i^* = \begin{cases} \max_j \{s_{ij}\} & : \text{if } u_i \text{ is genuine} \\ \min_j \{s_{ij}\} & : \text{if } u_i \text{ is an impostor} \end{cases} \quad (5)$$


Combination Rules: DSS

DSS selects the scores according to an estimate of the state of nature of each user; the algorithm is based on a quadratic discriminant classifier (QDC)

For the estimation, a vector space is built where the vector components are the scores assigned to the user by the N experts


Combination Rules: DSS

Train a classifier on this vector space by using a training set related to genuine and impostor users

Use the classifier to estimate the state of nature of the user

Once the state of nature is estimated, select the user’s score according to (5).
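
The selection step in (5) can be sketched independently of the classifier. In the code below the trained QDC is replaced by a hypothetical stand-in, `toy_classifier`, that thresholds the mean score; this is only a placeholder so that the example runs, not the classifier used in the experiments, and all names and scores are made up.

```python
def dss_select(scores, is_genuine):
    """Rule (5): keep the highest score for a genuine user, the lowest for an impostor."""
    return max(scores) if is_genuine else min(scores)


def dss(scores, classifier):
    """Estimate the state of nature from the score vector, then select one score."""
    return dss_select(scores, classifier(scores))


def toy_classifier(scores):
    # Hypothetical stand-in for the trained QDC: genuine if the mean score > 0.5
    return sum(scores) / len(scores) > 0.5


print(dss([0.9, 0.6], toy_classifier))  # estimated genuine, keeps the max
print(dss([0.2, 0.4], toy_classifier))  # estimated impostor, keeps the min
```

Selecting the extreme score pushes genuine and impostor outputs apart, so the quality of DSS depends directly on how often the classifier's estimate is right.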


EXPERIMENTAL RESULTS


Experimental Results: Goal

Investigate the correlation between the measures of ensemble effectiveness and the final performance achieved by the combined experts

Identify the best measure


Experimental Results: Preparation

Score source: 41 experts and 4 databases from the Open category of the Third Fingerprint Verification Competition (FVC2004)

Number of scores: for each sensor and each expert, 7750 scores in total, 2800 from genuine attempts and 4950 from impostor attempts

For LDA & DSS training, divide the scores into 4 subsets of 700 genuine and about 1238 impostor scores each (4950/4 = 1237.5)


Experimental Results: Process

Number of expert pairs: 13,120 (41 × 40 × 2 × 4)

For each pair, compute the measures of effectiveness based on AUC, EER, d’ and the SD index

Combine the pairs using the 4 combination rules, then compute the resulting AUC and EER to show the performance

Use a graphical representation of the results of the experiments


Experimental Results: AUCδ plotted against AUC

[Figure slides 39-40: AUCδ plotted against AUC]

According to the graphs, AUCδ is not useful: it has no clear relationship with the AUC obtained by the combination rules

A high AUCδ yields a high AUC, but for lower AUCδ the AUC spans a wide range of values

A high AUCδ corresponds to a pair of experts with high performance and similar behavior

The mean rule gives the best result


Experimental Results: AUCδ plotted against EER

[Figure slides 42-43: AUCδ plotted against EER]

AUCδ is uncorrelated with the EER too

For any value of AUCδ, the EER spans a wide range of values

The performance of the combination in terms of EER cannot be predicted from AUCδ


Experimental Results: EERδ plotted against AUC

[Figure slides 45-46: EERδ plotted against AUC]

EERδ behaves better than AUCδ, but there is still no clear relationship between EERδ and AUC

The mean rule gives the best result here too


Experimental Results: EERδ plotted against EER

[Figure slides 48-49: EERδ plotted against EER]

No correlation between EERδ and EER

The graphs of AUCδ against EER and of EERδ against EER show similar results

So AUC and EER are not suitable for evaluating a combination of experts, even though they are widely used for unimodal biometric systems


Experimental Results: D’δ plotted against AUC

[Figure slides 51-52: D’δ plotted against AUC]

Higher values of D'δ guarantee smaller ranges of values of the performance of the combination

D'δ has a higher and clearer correlation with the performance of the combination

The mean rule gives the best result and the product rule the worst


Experimental Results: D’δ plotted against EER

[Figure slides 54-55: D’δ plotted against EER]

D'δ has better correlation with EER too

D'δ is much better than AUCδ and EERδ

D'δ is a good measure to evaluate the effectiveness of candidate ensembles of biometric experts


Experimental Results: SDδ plotted against AUC

[Figure slides 57-58: SDδ plotted against AUC]

SDδ does show some correlation with AUC, because SD is designed to predict the maximum improvement in AUC obtainable by combining experts, but the correlation is still not clear enough

Small values of SDδ guarantee high performance, especially for pairs of high-performance experts, because the higher the AUC of the individual experts, the smaller their complementarity


Experimental Results: SDδ plotted against EER

[Figure slides 60-61: SDδ plotted against EER]

SDδ correlates less well with EER than with AUC

The result for the product rule is still poor


CONCLUSION


Conclusion

For predicting performance improvement, the product rule performs worst and the mean rule best, with LDA & DSS not far behind the mean rule

Under mean rule, LDA & DSS have similar results

In general, the performance of the combined experts is not highly correlated with that of the individual experts


Conclusion

The best measure of ensemble effectiveness is D'δ, while AUCδ and EERδ are not good enough, and SDδ performs like AUCδ

Based on the above results, D'δ with the mean rule tops every other pairing of measure and combination rule, and is the most suitable measure of ensemble effectiveness


THANKS FOR LISTENING! It’s Q&A time!