
Visual Media Capture of shape and appearance of real objects and people Sign language recognition Preservation of cultural artefacts 3D Broadcast production


Visual Media

Capture of shape and appearance of real objects and people

Sign language recognition

Preservation of cultural artefacts

3D Broadcast production

Animation


Medical Imaging and Remote Sensing

3D MRI image analysis (brain tumour detection)

Alzheimer’s condition diagnosis (PET brain imaging)

2D-3D Elastic image matching

3D liver reconstruction

Microcalcification detection

Vascular reconstruction

Seismic

Pipeline detection


Robot Vision

Visual learning

Scene interpretation

Model selection

Control of perception

3D object recognition from 2D views

Vision based navigation

Target detection

Visual surveillance


Multimedia Signal Processing and Interpretation

VOICE

FACE

LIPS

Fusion

Biometrics

Image/Video Retrieval


Ensemble MLP Classifier Design

Terry Windeatt University of Surrey, UK


Introduction

Ensembles – Multiple Classifier Systems (MCS)

Ensemble Multi-layer Perceptron Architecture

Tuning Base Classifiers using measures & OOB estimate

Multi-class ECOC using OOB

Feature Selection and Feature Ranking

Face Recognition


SINGLE CLASSIFIER APPROACH

Goals:

  assign a pattern to one of several classes

  find the best possible feature set, training set, learning machine structure & parameters

  Task is especially difficult when

    number of classes is high

    classes are highly overlapped in feature space

    training samples are few and very noisy

Learning is an ill-posed problem & requires built-in assumptions


Multi-layer Perceptron (MLP)

Figure: MLP with input layer, hidden layer and output layer

Unstable base classifier: different random starting weights give different solutions

#hidden nodes controls complexity & #epochs controls degree of training


MCS Architecture

Figure: B parallel MLP classifiers (MLP Classifier 1, 2, …, B) whose outputs feed a combiner

Idea is to use multiple simple MLPs rather than a single complex MLP

Bias/variance decomposition for the 0/1 loss function is more complex than for regression

– the ensemble reduces variance & tuning the base classifier reduces bias
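The slides do not specify which rule the combiner uses. As a minimal sketch, assuming simple majority (plurality) voting over the B base-classifier labels, the combiner box could look like this; `majority_vote` and `ensemble_predict` are hypothetical helper names:

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the B base-classifier outputs.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the label that appears first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(all_predictions):
    """all_predictions[b][m] is the label given by classifier b to pattern m."""
    n_patterns = len(all_predictions[0])
    return [majority_vote([clf[m] for clf in all_predictions])
            for m in range(n_patterns)]


# Three base classifiers, four test patterns.
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 1]]
print(ensemble_predict(votes))  # [0, 1, 1, 0]
```

Majority voting is used here only because it is the simplest rule; another combiner (for example averaging soft MLP outputs) would slot into the same place.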


Multiple Classifiers (MCS)

MCS based upon:
• finding classifiers that perform well but diversely
• an appropriate combining strategy

Techniques:
• Different types of classifiers
• Different parameters, same classifier
• Different unstable base classifiers, e.g. MLP
• Different feature sets, e.g. Random Subspace
• Different training sets, e.g. Bagging/Boosting
• Different class labeling, e.g. ECOC

Measures of Diversity
• Accuracy/Diversity Dilemma


Base Classifier Parameter Tuning

Importance of parameter tuning: every researcher seems to get good results, but how?

Need to measure sensitivity to parameters; this helps understand the significance of results

Requires systematic change of parameters

How to set parameters? Alternatives to a validation set or cross-validation techniques


Out-of-Bootstrap (OOB)

Bootstrapping – sample with replacement

Promotes diversity among classifiers

OOB provides an alternative to validation

Base classifier OOB uses the training patterns left out of its bootstrap sample (approx. one third)

Ensemble OOB uses, for each training pattern, only the classifiers whose bootstrap sample left it out (approx. one third)
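The "approx. one third" figure follows from sampling with replacement: each of n patterns is missed by a bootstrap sample with probability (1 − 1/n)^n, which tends to e^(−1) ≈ 0.368. A small sketch using only Python's `random` module checks this empirically; the helper names are illustrative, not from the slides:

```python
import math
import random


def bootstrap_sample(n, rng):
    """Draw n pattern indices uniformly at random with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bootstrap(n, sample):
    """Indices of the patterns that the bootstrap sample never drew."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


rng = random.Random(0)
n = 10_000
sample = bootstrap_sample(n, rng)
oob = out_of_bootstrap(n, sample)

# Observed OOB fraction vs. (1 - 1/n)**n vs. its limit exp(-1).
print(round(len(oob) / n, 3), round((1 - 1 / n) ** n, 3), round(math.exp(-1), 3))
```

The OOB patterns of each base classifier act as its private validation set, which is why no separate validation split is needed.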


BINARY-TO-BINARY MAPPING

• 2-class problem

• B parallel base classifiers

• incompletely specified & noisy function

$\psi_m = f(X_m)$, where $X_m = (x_{m1}, x_{m2}, \ldots, x_{mB})$, $x_{mi} \in \{0,1\}$, $i = 1, \ldots, B$, and $m = 1, \ldots, \mu$, with $\mu$ the number of patterns


CLASS SEPARABILITY MEASURE

2-CLASS

calculated over pairs of patterns chosen from different classes

Example (outputs of B = 10 base classifiers for one pattern from each class; 1 indicates correct classification):

0 0 1 1 1 0 0 1 1 0   class 1

1 1 1 0 1 0 0 1 1 0   class 2


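CLASS SEPARABILITY MEASURE

calculated over pairs of patterns p & q chosen from different classes

$$\tilde{N}^{ab}_{pq} = \sum_{j=1}^{B} (x_{pj} = a)(x_{qj} = b), \qquad a, b \in \{0,1\}$$

$$\tilde{\sigma} = \frac{1}{K_1}\sum_{p}\sum_{q}\tilde{N}^{11}_{pq} - \frac{1}{K_2}\sum_{p}\sum_{q}\tilde{N}^{00}_{pq}, \qquad \forall\, p \in \text{class 1},\ q \in \text{class 2}$$

where each bracket $(x = a)$ is 1 if true and 0 otherwise, and $K_1$, $K_2$ are normalising constants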
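The pair counts behind the measure can be checked on the two example vectors given earlier. This is a minimal sketch, assuming x_pj = 1 means base classifier j classifies pattern p correctly; `pair_counts` and `mean_pair_counts` are hypothetical helper names, and the normalising constants are deliberately left out because the slide does not define them:

```python
def pair_counts(xp, xq):
    """N~^{ab}_{pq}: number of base classifiers j with x_pj == a and x_qj == b.

    xp, xq are 0/1 correctness vectors (length B) for a pattern p from one
    class and a pattern q from the other class.
    """
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(xp, xq):
        counts[(a, b)] += 1
    return counts


def mean_pair_counts(class1, class2):
    """Average each N~^{ab}_{pq} over all pairs (p in class1, q in class2)."""
    totals = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    n_pairs = 0
    for xp in class1:
        for xq in class2:
            for key, value in pair_counts(xp, xq).items():
                totals[key] += value
            n_pairs += 1
    return {key: value / n_pairs for key, value in totals.items()}


# The two example vectors from the slide (B = 10 base classifiers).
xp = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]  # class 1 pattern
xq = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]  # class 2 pattern
print(pair_counts(xp, xq))
# {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 1): 4}
```

For the example, 4 classifiers are correct on both patterns, 3 are wrong on both, 1 is correct only on the class 1 pattern and 2 are correct only on the class 2 pattern.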


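PAIR-WISE DIVERSITY MEASURES Q

Use counts between classifier pairs i, j, summed over all patterns m:

$$N^{ab}_{ij} = \sum_{m} (x_{mi} = a)(x_{mj} = b), \qquad a, b \in \{0,1\}$$

giving $N^{11}$, $N^{10}$, $N^{01}$, $N^{00}$

$$Q_{ij} = \frac{N^{11}N^{00} - N^{01}N^{10}}{N^{11}N^{00} + N^{01}N^{10}}$$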
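A direct Python sketch of the pair-wise Q statistic (Yule's Q computed from N^11, N^10, N^01, N^00), assuming each classifier is represented by its 0/1 correctness vector over the patterns. Returning 0 when the denominator is zero is an assumption made here to keep the function total; the slides do not cover that case:

```python
from itertools import combinations


def q_statistic(xi, xj):
    """Pair-wise Q for classifiers i and j from their 0/1 correctness vectors."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(xi, xj):
        n[(a, b)] += 1
    num = n[(1, 1)] * n[(0, 0)] - n[(0, 1)] * n[(1, 0)]
    den = n[(1, 1)] * n[(0, 0)] + n[(0, 1)] * n[(1, 0)]
    # Assumption: report 0 when the denominator vanishes (not covered on the slide).
    return num / den if den else 0.0


def mean_q(outputs):
    """Average Q over all B(B-1)/2 classifier pairs; outputs[i] is classifier i."""
    pairs = list(combinations(outputs, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)


print(q_statistic([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0  (identical: no diversity)
print(q_statistic([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0  (independent)
print(q_statistic([1, 0], [0, 1]))              # -1.0 (errors never coincide)
```

Lower (more negative) average Q indicates a more diverse ensemble.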


EXPERIMENTS 2-CLASS

100 single hidden-layer MLP base classifiers

Levenberg-Marquardt training, default parameters

Systematic variation of epochs and nodes

Different random starting weights + bootstrapping

Datasets: random 20/80 train/test split (10 runs), with added classification noise to encourage overfitting
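One way the data preparation might be sketched, assuming the classification noise is injected by flipping a fixed fraction of the training labels only (the slides do not say exactly how the noise is added); `split_with_label_noise` is a hypothetical name:

```python
import random


def split_with_label_noise(labels, train_frac, noise_frac, rng):
    """Random train/test split, then flip a fraction of the training labels.

    labels: list of 0/1 class labels. Returns (train_idx, test_idx,
    noisy_train_labels); the test labels are left untouched.
    """
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_train = round(train_frac * len(labels))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    noisy = [labels[i] for i in train_idx]
    # Flip exactly round(noise_frac * n_train) randomly chosen training labels.
    for k in rng.sample(range(n_train), round(noise_frac * n_train)):
        noisy[k] = 1 - noisy[k]
    return train_idx, test_idx, noisy


labels = [i % 2 for i in range(100)]
train_idx, test_idx, noisy = split_with_label_noise(labels, 0.2, 0.2, random.Random(1))
print(len(train_idx), len(test_idx))  # 20 80
```

Repeating the call with ten different seeds would give the ten runs.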


DATASET #patterns #classes #continuous #discrete

cancer 699 2 0 9

card 690 2 6 9

credita 690 2 3 11

diabetes 768 2 8 0

heart 920 2 5 30

ion 351 2 31 3

vote 435 2 0 16


Figure: Mean test error rates, OOB estimates, class separability and Q measures for Diabetes 20/80 with [2,4,8,16] hidden nodes


Figure: Mean test error, class separability and Q measures over seven 20/80 two-class datasets using 8 hidden-node bootstrapped base classifiers for [0,20,40] % noise


MULTI-CLASS ECOC

Coding step: – map training patterns into two super-classes according to the 1’s and 0’s in each column of ECOC matrix Z

Train a base classifier on each 2-class decomposition

Decoding step: – assign a test pattern according to the minimum distance between the base classifier outputs and each row of ECOC matrix Z


MULTI-CLASS ECOC CODE MATRIX

Example ECOC matrix (6 classes):

0 1 1 1 .............
1 0 0 0 .............
0 1 0 1 .............
1 0 1 0 .............
1 1 0 1 .............
0 0 1 0 .............

each row is a code word; each column defines two super-classes
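The coding and decoding steps can be sketched on the first four columns of the example matrix (the dotted columns are omitted). This is an illustrative sketch with hypothetical helper names; L1 decoding is shown because on 0/1 outputs it reduces to Hamming decoding:

```python
# First four columns of the 6-class example matrix (row k = code word of class k).
Z = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def superclass_labels(y, Z, column):
    """Coding step: relabel class labels y with the bit Z[y][column] (0 or 1)."""
    return [Z[label][column] for label in y]


def l1_decode(outputs, Z):
    """Decoding step: index of the code word nearest to the outputs in L1 distance.

    For 0/1 outputs this is the Hamming distance; soft outputs in [0, 1] also work.
    """
    distances = [sum(abs(o - z) for o, z in zip(outputs, row)) for row in Z]
    return min(range(len(Z)), key=lambda k: distances[k])


print(superclass_labels([0, 1, 2, 3, 4, 5], Z, 0))  # [0, 1, 0, 1, 1, 0]
print(l1_decode([0.1, 0.9, 0.8, 0.7], Z))           # 0
```

With only four columns the first and third code words differ in a single bit, so this truncated matrix cannot correct any errors; the full matrix with many more columns gives the row separation that makes decoding robust.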


Distance-based decoding rules (e.g. Hamming, L1)

Figure: pattern space → ECOC ensemble of MLPs → output vector (e.g. 10…1), compared with code words such as 10…1, 01…0 and 11…1 for the target classes 1, 2, 3

*** OOB uses only the classifiers that did not use the pattern in training
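The OOB rule for ECOC can be sketched as decoding each training pattern using only the columns (base classifiers) whose bootstrap sample excluded it. This is a sketch under that reading of the slide; the `in_bag` flags and the toy matrix `Z3` are hypothetical:

```python
def oob_decode(outputs, in_bag, Z):
    """Decode one training pattern using only its out-of-bootstrap columns.

    outputs[j]: output of base classifier j for this pattern.
    in_bag[j]:  True if the pattern was in classifier j's bootstrap sample.
    Returns the nearest class in L1 distance, or None if every classifier
    was trained on the pattern.
    """
    cols = [j for j, seen in enumerate(in_bag) if not seen]
    if not cols:
        return None
    distances = [sum(abs(outputs[j] - row[j]) for j in cols) for row in Z]
    return min(range(len(Z)), key=lambda k: distances[k])


def oob_error(all_outputs, all_in_bag, labels, Z):
    """Ensemble OOB error over the patterns that have at least one OOB classifier."""
    scored = [(oob_decode(o, b, Z), y)
              for o, b, y in zip(all_outputs, all_in_bag, labels)]
    scored = [(d, y) for d, y in scored if d is not None]
    if not scored:
        return None
    return sum(d != y for d, y in scored) / len(scored)


# Toy 3-class, 5-column code matrix (hypothetical).
Z3 = [[0, 0, 1, 1, 0],
      [1, 0, 0, 1, 1],
      [0, 1, 1, 0, 1]]
# Class 1 pattern whose only wrong bit comes from classifier 0, which saw it.
print(oob_decode([0, 0, 0, 1, 1], [True, False, False, False, False], Z3))  # 1
```

Because the estimate never uses a classifier on a pattern it was trained on, it can stand in for a separate validation set when choosing nodes and epochs.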


Experiments Multi-class

200 base classifiers

Random ECOC matrices

20/80 train/test split repeated 10 times

Levenberg-Marquardt training algorithm


DATASET #patterns #classes #continuous #discrete

dermatology 366 6 1 33

ecoli 336 8 5 2

glass 214 6 9 0

iris 150 3 4 0

segment 2310 7 19 0

soybean 683 19 0 35

thyroid 7200 3 6 15

vehicle 846 4 18 0

vowel 990 11 10 1

wave 5000 3 21 0

yeast 1484 10 7 1


Figure: Yeast, 2/4/8/16 hidden nodes, 1-69 epochs


Feature Ranking

Intended for large number of features – small sample

One vs multi-dimensional

Context of MCS – base classifier vs combiner

Simple one-dim methods

Sophisticated multi-dim search methods

Modulus of MLP weights – ‘product of weights’

$$w_i = \sum_{j} W_{1ij}\, W_{2j}$$

W1 is the first layer weight matrix and W2 is the output weight vector (i indexes input features, j hidden nodes)
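A minimal sketch of the product-of-weights ranking for a trained single-hidden-layer MLP, using plain Python lists. Taking the modulus of the sum, and averaging the scores over the base MLPs of an ensemble, are assumptions here rather than details given on the slide:

```python
def weight_product_scores(W1, W2):
    """|w_i| with w_i = sum_j W1[i][j] * W2[j], one score per input feature i.

    W1: n_inputs x n_hidden first-layer weights; W2: n_hidden output weights.
    """
    return [abs(sum(w1 * w2 for w1, w2 in zip(row, W2))) for row in W1]


def ensemble_scores(nets):
    """Assumption: average the per-network scores over the base MLPs."""
    per_net = [weight_product_scores(W1, W2) for W1, W2 in nets]
    return [sum(col) / len(per_net) for col in zip(*per_net)]


def rank_features(scores):
    """Feature indices, highest score (most relevant) first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


W1 = [[1.0, -2.0],   # feature 0
      [0.5, 0.5],    # feature 1
      [-3.0, 1.0]]   # feature 2
W2 = [1.0, 1.0]
print(weight_product_scores(W1, W2))  # [1.0, 1.0, 2.0]
print(rank_features(ensemble_scores([(W1, W2), (W1, W2)])))  # [2, 0, 1]
```

The score needs only the trained weights, so it adds almost no cost on top of training, which is what makes it attractive inside a recursive elimination loop.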


Recursive Feature Elimination

A simple algorithm for eliminating irrelevant features, which operates recursively as follows:

1) Rank the features according to a suitable feature ranking method

2) Identify and remove the r least-ranked features

3) Repeat from step 1 on the remaining features

If r > 1, which is usually desirable from an efficiency viewpoint, a ranking of feature subsets (rather than of single features) is obtained.
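The RFE loop can be written generically around any ranking method. In this sketch `score_fn` is a hypothetical callback that would retrain (for instance the MLP ensemble) on the active features and return a relevance score per feature:

```python
def rfe(features, score_fn, r):
    """Recursive feature elimination.

    features: feature ids to start from.
    score_fn(active): hypothetical callback that retrains on the active
        features and returns {feature: relevance score}.
    r: number of least-ranked features removed per iteration.
    Returns the removed groups in elimination order (least relevant first).
    """
    active = list(features)
    removed = []
    while active:
        scores = score_fn(active)               # step 1: rank remaining features
        active.sort(key=lambda f: scores[f])
        group, active = active[:r], active[r:]  # step 2: drop the r lowest
        removed.append(sorted(group))
    return removed


# Fixed scores stand in for a real ranking (e.g. MLP weight products).
fixed = {0: 0.5, 1: 0.1, 2: 0.9, 3: 0.3, 4: 0.7}
print(rfe(range(5), lambda active: fixed, r=2))  # [[1, 3], [0, 4], [2]]
```

With r = 1 the returned groups form a full ranking of individual features; with r > 1 they form the subset ranking mentioned above.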


Figure: Mean test error rates, bias and variance for RFE MLP ensemble over seven 2-class datasets, 20/80, 10/90, 5/95 train/test split


Figure: Yeast RFE for [20/80, 10/90, 5/95] train/test split


Face Recognition - ORL Database

400 images of forty faces – 40-class identification problem

Variation in lighting, facial hair, pose, …

Controlled background with subjects upright and frontal

No need for face detection, so comparison is fair

We use 40-dim PCA + 20-dim LDA

Random 50/50 train/test split

16 hidden-node MLP L-M base classifiers (x200)

Experiments repeated twenty times with 40 x 200 ECOC code matrix


ORL Database - Results

Figure: Test error, class separability and Q measures for ORL 50/50 database using 16 hidden-node base classifiers for [0,20,40] % classification noise


Facial Action Unit (FACS)

Difficult because appearance depends on age, ethnicity and gender, and on occlusions due to cosmetics, hair and glasses

FACS categorises deformation and motion into visual classes

Decouples interpretation from individual actions

Requires skilled practitioners

Small sample size problem

– large #features and small #training patterns


Cohn-Kanade Database

• frontal camera recordings of 100 university students

• contains posed (as opposed to the more difficult spontaneous) expression sequences

• only the last image of each sequence is AU coded

• combinations of AUs, in some cases non-additive

• Upper face AUs: au1, au2, au4, au5, au6, au7


Design Decisions

a) All image sequences of size 640 x 480 chosen from the database

b) Last image in sequence (no neutral) giving 424 images, 115 containing au1

c) Full image resolution, no compression

d) Manually located eye centres plus rotation/scaling into 2 common eye coordinates

e) Window extracted of size 150 x 75 pixels centred on eye coordinates

f) Forty Gabor filters [18], five spatial frequencies at eight orientations, with top 4 principal components for each Gabor filter, giving a 160-dimensional feature vector

g) Comparison of feature selection schemes described in Section 3

h) Comparison of MLP ensemble and Support Vector Classifier

i) Random training/test split of 90/10 and 50/50 repeated twenty times and averaged


ID           sc1    sc2    sc3     sc4    sc5    sc6
superclass   {}     1,2    1,2,5   4      6      1,4
#patterns    149    21     44      26     64     18

ID           sc7    sc8    sc9     sc10   sc11   sc12
superclass   1,4,7  4,7    4,6,7   6,7    1      1,2,4
#patterns    10     39     16      7      6      4

ECOC super-classes of action units and number of patterns


        2-class     2-class      ECOC        ECOC
        Error %     ROC          Error %     ROC
au1     8.0/16/28   0.97/16/36   9.0/4/36    0.94/4/17
au2     2.9/1/22    0.99/16/36   3.2/16/22   0.97/1/46
au4     8.5/16/36   0.95/16/28   9.0/1/28    0.95/4/36
au5     5.5/1/46    0.97/1/46    3.5/1/36    0.98/1/36
au6     10.3/4/36   0.94/4/28    12.5/4/28   0.92/1/28
au7     10.3/1/28   0.92/16/60   11.6/4/46   0.92/1/36
mean    7.6         0.96         8.1         0.95

Table 3: Mean best error rates (%) and area under ROC, showing #nodes/#features, for au classification 90/10 with optimized PCA features and MLP ensemble


Conclusion

Measures may be used to optimise base classifier parameters without validation

OOB estimate can select optimal features

– even for ensemble OOB

Multi-class uses OOB with ECOC

Modulus of MLP weights is a simple feature ranking that works well with RFE


THANK YOU


Feature ranking schemes compared

RFE with MLP weights

RFE with noisy bootstrap

– extends training set by resampling with noise

Boosting, single feature each iteration

One-dimensional class-separability

– $\mathrm{Trace}(S_W^{-1} S_B)$, with $S_W$ and $S_B$ the within- and between-class scatter

SFFS (Sequential Floating Forward Search)
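For a single feature the scatter matrices are scalars, so Trace(S_W^-1 S_B) reduces to the ratio S_B / S_W. A small sketch of this one-dimensional criterion, assuming unnormalised scatter (scaling both terms by the same constant leaves the ratio unchanged); the helper names are hypothetical:

```python
def one_dim_separability(values, labels):
    """Trace(S_W^-1 S_B) for one feature, which reduces to S_B / S_W.

    S_W: sum over classes of squared deviations from the class mean.
    S_B: sum over classes of n_c * (class mean - overall mean) ** 2.
    Raises ZeroDivisionError if the feature is constant within every class.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    overall = sum(values) / len(values)
    s_w = s_b = 0.0
    for members in groups.values():
        mean_c = sum(members) / len(members)
        s_w += sum((v - mean_c) ** 2 for v in members)
        s_b += len(members) * (mean_c - overall) ** 2
    return s_b / s_w


def rank_by_separability(columns, labels):
    """Feature indices sorted by decreasing one-dimensional separability."""
    scores = [one_dim_separability(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: -scores[i])


labels = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 7, 8, 9]   # well separated classes
feature_b = [1, 9, 2, 8, 3, 7]   # heavily overlapped classes
print(one_dim_separability(feature_a, labels))               # 13.5
print(rank_by_separability([feature_a, feature_b], labels))  # [0, 1]
```

Each feature is scored in isolation, which is what makes this the cheap "one-dimensional" baseline against the multi-dimensional SFFS search.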


perceptron-ensemble classifier

             rfenn   rfenb   1dim   SFFS   boost
Mean 20/80   15.1    14.6    14.2   15.4   15.4
Mean 10/90   16.3    16.3    16.6   18.0   17.6
Mean 5/95    18.4    18.5    20.0   21.3   21.3

Table: Mean best error rates (%) for seven two-class problems (20/80, 10/90, 5/95 train/test) with five feature-ranking schemes


The extended M2VTS (XM2VTS) database

• Contains 295 subjects

• Recorded in four separate sessions over 5 months

• Experimental protocol assigns 200 clients and 95 impostors.

• 3 training, 3 evaluation and 2 test images.

• Impostor set partitioned into 25 evaluation and 70 test impostors

• Features are extracted using PCA + 199-dim LDA


Distance based combination

Use ECOC with 200 x 512 matrix

To test whether a client claim is authentic, use the average distance (L1 norm) between the output vector y and the N elements of the set of class i:

$$d_i(\mathbf{y}) = \frac{1}{N}\sum_{l=1}^{N}\sum_{j=1}^{b}\left|y_j - y^l_j\right|$$

where $y_j$ is the jth binary classifier output, and $y^l_j$ is the jth classifier output for the lth member of class i.

The distance is checked against a decision threshold

False acceptance (FA) 1.3%, false rejection (FR) 0.8%
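A sketch of the verification rule, assuming the ECOC outputs for the claimed class's training members are available as lists. The threshold value and the use of "less than or equal" are assumptions; in this protocol the threshold would presumably be tuned on the evaluation set:

```python
def class_distance(y, class_outputs):
    """d_i(y): mean L1 distance between output vector y and the output
    vectors y^l of the N training members of the claimed class i."""
    total = sum(sum(abs(yj - ylj) for yj, ylj in zip(y, yl))
                for yl in class_outputs)
    return total / len(class_outputs)


def accept_claim(y, class_outputs, threshold):
    """Accept the identity claim when d_i(y) does not exceed the threshold."""
    return class_distance(y, class_outputs) <= threshold


members = [[1, 0, 1, 1],
           [1, 1, 1, 0]]   # ECOC outputs of two training images of client i
probe = [1, 0, 1, 0]       # ECOC outputs of the probe image
print(class_distance(probe, members))      # 1.0
print(accept_claim(probe, members, 0.5))   # False
```

Raising the threshold trades false rejections for false acceptances, which is how an operating point such as the FA/FR pair above is chosen.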


16 node MLP-ensemble classifier

rfenn     rfenb     1dim      SFFS       boost
10.0/28   10.9/43   10.9/43   12.3/104   11.9/43

Linear SVC classifier

rfesvc    rfenb-    1dim      SFFS       boost
11.6/28   12.1/28   11.9/67   13.9/67    12.4/43

Table: Mean best error rates (%)/number of Gabor features for au1 classification 90/10 with five feature ranking schemes


T. Windeatt and R. Ghaderi, Coding and Decoding Strategies for Multiclass Learning Problems, Information Fusion, 4(1), 2003, pp. 11-21.

T. Windeatt, Vote Counting Measures for Ensemble Classifiers, Pattern Recognition, 36(12), 2003, pp. 2743-2756.

J. Kittler, R. Ghaderi, T. Windeatt and J. Matas, Face Verification via Error Correcting Output Codes, Image and Vision Computing, 21(13-14), 1 December 2003, pp. 1163-1169.

T. Windeatt, Diversity Measures for Multiple Classifier System Analysis and Design, Information Fusion, 6(1), 2004, pp. 21-36.

T. Windeatt, Accuracy/Diversity and Ensemble Classifier Design, IEEE Trans. Neural Networks, 17(4), July 2006.

R. S. Smith and T. Windeatt, Decoding Rules for ECOC, Proc. 6th Int. Workshop Multiple Classifier Systems, eds. N. C. Oza, R. Polikar, J. Kittler, F. Roli, Seaside, Calif., USA, June 2005, Lecture Notes in Computer Science, Springer-Verlag, pp. 53-63.

M. Prior and T. Windeatt, Over-fitting in Ensembles of Neural Network Classifiers within ECOC Frameworks, Proc. 6th Int. Workshop Multiple Classifier Systems, eds. N. C. Oza, R. Polikar, J. Kittler, F. Roli, Seaside, Calif., USA, June 2005, Lecture Notes in Computer Science, Springer-Verlag, pp. 286-295.

T. Windeatt, Ensemble Neural Classifier Design for Face Recognition, European Symposium on Artificial Neural Networks, ESANN2007, Bruges, April 2007.

T. Windeatt and M. Prior, Stopping Criteria for Ensemble-based Feature Selection, Proc. 7th Int. Workshop Multiple Classifier Systems, Prague, May 2007, Lecture Notes in Computer Science, Springer-Verlag, pp

T. Windeatt, M. Prior, N. Effron and N. Intrator, Ensemble-based Feature Selection Criteria, Proc. Conference on Machine Learning Data Mining MLDM2007, Leipzig, July 2007.