Spring 2013, MIT 6.835 Intelligent Multimodal Interfaces

6.835 Project 4:

Multimodal Signal Fusion Using HMMs

Due: 6PM Wednesday, April 16, 2013

1 Introduction

The goal of this project is to explore various multimodal signal fusion techniques, using an HMM as the base learner. You will implement and compare four different fusion algorithms on an aircraft handling signals dataset containing two kinds of information: body joint information and hand shape information. As usual, the implementation will be in Matlab. Unlike previous mini projects, however, this mini project will take much longer to finish, mainly due to the heavy computational requirements of the algorithms you will be using. Writing the required code may not take much longer, but testing and debugging will be much slower because of the longer run-times. So start now!

2 Getting Started

Download the packaged .zip file from Stellar. After extracting the file, start Matlab (you need version 7 or later) and load the dataset.

>> load NATOPS6.mat

The dataset contains six aircraft handling signals used in routine practice on the deck of an aircraft carrier, performed in a laboratory setting [4]. The six types of gestures are shown in Figure 1. The dataset includes upper body postures and the shapes of both hands. The body features are 3D angular velocities of four body joints (left/right elbows and wrists), represented as a 12-dimensional feature vector. The hand feature vector contains probability estimates of four predefined hand shapes (opened/closed palm, and thumb up/down) for each hand, yielding an 8-dimensional feature vector.

Figure 1: Six aircraft handling signals from the NATOPS database [4] (#1: All Clear, #2: Not Clear, #3: Insert Chocks, #4: Remove Chocks, #5: Brakes On, #6: Brakes Off). Body movements are illustrated with yellow arrows and hand poses are illustrated with synthesized images of hands. Red rectangles indicate that hand poses are important in distinguishing the gesture pairs (#1-#2 and #5-#6).

Once you load the dataset, you will see a struct variable D that contains two cell-array variables, seqs and labels; seqs contains 2,400 gesture sequences and labels contains the corresponding labels. As detailed in [4], the NATOPS dataset was collected from 20 participants performing each gesture 20 times (400 samples per gesture). The 2,400 samples (6 gesture types x 400 samples per gesture) are compiled in subject-gesture-trial order: the first 120 samples are from the first subject, and the first 20 of those 120 samples are of gesture type #1 (All Clear).
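
For concreteness, here is a minimal sketch of that ordering in Matlab; the index arithmetic follows the layout just described, and the variable names are ours:

nGestures = 6; nTrials = 20;             % 120 samples per subject
subject = 1; gesture = 1;                % e.g., first subject, "All Clear"
offset = (subject-1)*nGestures*nTrials + (gesture-1)*nTrials;
idx = offset + (1:nTrials);              % indices of that subject's 20 trials
trials = D.seqs(idx);                    % 1-by-20 cell array of sequences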

Each sample in seqs is a 20-by-Ti matrix, where Ti is the length of the i-th gesture sequence (i.e., each column in the matrix represents the observation at one time frame). The table below lists the 20-dimensional features.

Features     Joint Data
x(1:3,:)     left elbow angular velocity (x,y,z)
x(4:6,:)     left wrist angular velocity (x,y,z)
x(7:9,:)     right elbow angular velocity (x,y,z)
x(10:12,:)   right wrist angular velocity (x,y,z)
x(13,:)      p(opened palm | left hand)
x(14,:)      p(closed palm | left hand)
x(15,:)      p(thumb up | left hand)
x(16,:)      p(thumb down | left hand)
x(17,:)      p(opened palm | right hand)
x(18,:)      p(closed palm | right hand)
x(19,:)      p(thumb up | right hand)
x(20,:)      p(thumb down | right hand)
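
As a quick illustration of the table, the two modalities can be pulled apart by row indexing (a minimal sketch; the variable names are ours):

x = D.seqs{1};           % 20-by-T matrix for the first sample
xBody = x(1:12,:);       % 3D angular velocities of elbows and wrists
xHand = x(13:20,:);      % hand-shape probability estimates for both hands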

We have provided a Matlab script run.m that will run through all the experiments you will perform in this mini project. You will experiment with four different fusion algorithms using the functions below:

• Early fusion HMM, experiment_early_hmm.m

• Late fusion HMM, experiment_late_hmm.m

• Coupled HMM, experiment_coupled_hmm.m

• Co-training HMM, experiment_cotrain_hmm.m

The above functions will train and test each fusion algorithm, using these functions:

• trainHMM.m

• testHMM.m

• testLateHMM.m

• trainCHMM.m

• testCHMM.m

• trainCoHMM.m

Four utility functions are provided:

• split_data.m divides a dataset into four splits: train, validate, test, and unlabeled. It takes as input labels, a 1-by-N cell array that contains the labels of N samples, and ratio, a 1-by-4 array that specifies the percentage of each split. It produces four arrays of indices, where the indices identify elements of the dataset (see the usage sketch after this list).

• get_best_model.m finds the best performing model based on the classification accuracy on the validation split. It takes as input R, a 1-by-K cell array that contains experimental results, and returns the index of the best performing model.

• build_confmat.m builds a confusion matrix. It takes as input prediction and groundtruth, each a 1-by-N array, and returns a K-by-K confusion matrix, where K is the number of classes.

• plot_confmat.m plots a confusion matrix. It takes as input confmat, a K-by-K matrix obtained from build_confmat. Values on the diagonal of the plotted confusion matrix show per-class classification accuracy; off-diagonal values show the percentage of times one label was confused with another.
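
To see how the utilities fit together, here is a hypothetical end-to-end sketch. The ratio value is illustrative, and the result fields Ystar and Ytrue are assumptions (they match the stat fields described in Section 4); the experiment functions above handle the actual training and testing.

ratio = [0.5 0.2 0.2 0.1];      % train/validate/test/unlabeled percentages
[trIdx, vaIdx, teIdx, unIdx] = split_data(D.labels, ratio);
% ... run an experiment function to fill R, a 1-by-K cell array of
%     results, one entry per parameter setting ...
best = get_best_model(R);                               % best validated model
confmat = build_confmat(R{best}.Ystar, R{best}.Ytrue);  % hypothetical fields
plot_confmat(confmat);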

We have also provided you with three files (fwdback.m, learn_params_dbn_em.m, and mixgauss_prob.m) that contain bug fixes for the FullBNT package. These will automatically override the existing BNT functions as long as they are in the same directory you are working in. In other words, you don't need to overwrite the existing files with the provided files; just keep them in your working directory.

3 Unimodal Gesture Recognition

In this first part of the project, we will experiment with a standard HMM using body or hand features alone. Follow the steps in run.m (Part 1a and 1b).

To use run.m, copy sections from it and paste them into Matlab for execution. Start by copying and pasting the initialization code: all the lines above "Part 1a". This sets a number of parameters. Then, to do Part 1a, copy the lines between Part 1a and Part 1b and paste them into the Matlab command line, and so on.

Question 1 [2 points]: From the experiments using body features (Part 1a) and hand features (Part 1b), what are the best classification accuracies you obtained? What parameter values did you validate, and what was your strategy for finding the best parameter value?

Question 2 [4 points]: For each experiment, the run.m script will automatically generate a confusion matrix for the best performing model. Submit and interpret the two confusion matrices you obtained: for each modality, explain which pairs of gestures are confused the most and why you think they were. Refer to the gesture pairs highlighted in Figure 1 in explaining why.

4 Early versus Late Fusion HMM

We will compare two simple modality fusion algorithms, early fusion and late fusion [1]. Follow the steps in run.m (Part 2a and 2b).

Question 3 [2 points]: Follow the steps in Part 2a. What is the best classification accuracy you obtained using the early fusion HMM? What parameter values did you try, and what was your strategy for finding the best parameter value?

Question 4 [8 points]: Implement testLateHMM.m to perform late fusion, and explain your implementation with pseudocode in your writeup. Note that you must follow the type signature of the function, as the function's input and output parameters are used in experiment_late_hmm.m.

Implementation detail: The input variables to the function are seqs, a 1-by-N cell array of test samples; labels, a 1-by-N cell array of test labels; hmm, a 1-by-2 cell array of trained HMMs, each of which is a set of HMMs from one modality; featureMap, a 1-by-2 cell array of indices defining each modality; and weightsMV, a 1-by-K cell array of modality weights to try out (these weights determine the relative weight given to each modality during fusion). To implement the function, use hmm{1} (and hmm{2}) to obtain estimates of the log-likelihoods of all samples using only body features (and only hand features). Then, for each weight value in weightsMV, compute the weighted average of the two log-likelihoods, obtain predicted labels by taking the max of the log-likelihoods for each sample, and finally compute the accuracy of the estimates. Your return variable stat must contain three fields: Ystar, the 1-by-N predicted labels; Ytrue, the 1-by-N ground-truth labels; and accuracy, the classification accuracy.
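
Here is a minimal sketch of the scoring loop just described. It assumes ll1 and ll2 are C-by-N matrices of per-class log-likelihoods already computed with hmm{1} and hmm{2} (computing them is the substance of the exercise), that each entry of weightsMV is a scalar w in [0,1] giving the body modality's share, and that stat collects one result per weight; adapt those assumptions to match experiment_late_hmm.m.

Ytrue = cell2mat(labels);            % 1-by-N ground-truth labels
for k = 1:numel(weightsMV)
    w = weightsMV{k};                % relative weight of the body modality
    ll = w*ll1 + (1-w)*ll2;          % weighted average of log-likelihoods
    [~, Ystar] = max(ll, [], 1);     % predicted class for each sample
    stat{k}.Ystar = Ystar;
    stat{k}.Ytrue = Ytrue;
    stat{k}.accuracy = mean(Ystar == Ytrue);
end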

Question 5 [2 points]: Follow the steps in Part 2b. What is the best classification accuracy you can get using the late fusion HMM? What parameter values did you validate, and what was your strategy for finding the best parameter value? Which weight value tends to give you the best performance in terms of classification accuracy? Why do you think that weight value gives the best result?

Question 6 [4 points]: Describe the differences between the early and late fusion algorithms in terms of their underlying assumptions and how the classifiers are trained and then used to test new samples.

Question 7 [4 points]: Submit and interpret the two confusion matrices you obtained (from early fusion and late fusion). Which approach (early versus late) performed better? Pick the confusion matrix that performed better and compare it to the two confusion matrices you obtained in Question 2. What differences do you see? Do you see better classification accuracy on the gesture pairs that were confused the most in the unimodal approach? Why do you think the performance has improved?

5 Coupled HMM for Mid-Level Fusion

The previous section compared two simple modality fusion approaches: early and late fusion. In this section you will implement and try out a more sophisticated model, a coupled HMM [2], that performs mid-level fusion. Figure 2 illustrates the difference between a standard HMM and a coupled HMM. To start, follow the steps in run.m (Part 3).

Figure 2: Graphical representation of the standard HMM (left) and a coupled HMM (right).

Question 8 [10 points]: Implement the function chmm = make_chmm(N,Q,X) (located inside trainCHMM.m) that generates the graph structure of a coupled HMM. Explain your implementation with pseudocode in your writeup. Note that you must follow the type signature of the function, as the function's input and output parameters are used in trainCHMM.m.

Hint 1: The FullBNT package provides documentation (located under ./bnt/docs). See usage_dbn.html to learn how to construct a graph structure, and type help mk_dbn at the Matlab command line to learn how to make a Dynamic Bayesian Network with that function.

Hint 2: For debugging purposes, we have provided you with a file chmm_2_[3,3]_[12,8].mat that contains a variable named chmm, the output of the function with parameter values chmm = make_chmm(2,[3,3],[12,8]). Your output must be the same as the provided variable when using the same parameter values.
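
As a scaffold (not the required implementation), here is a minimal sketch of the two-chain topology using mk_dbn, with per-slice node ordering [hidden1 hidden2 obs1 obs2]. It covers only the graph structure; the CPD initialization and equivalence classes that trainCHMM.m needs are omitted, and your output must still match the provided .mat file.

function chmm = make_chmm(N, Q, X)
% N: number of chains, Q: 1-by-N hidden state sizes, X: 1-by-N obs dims
ss = 2*N;                            % nodes per time slice
intra = zeros(ss); inter = zeros(ss);
for i = 1:N
    intra(i, N+i) = 1;               % hidden i -> its own observation
    inter(i, 1:N) = 1;               % hidden i -> every hidden at t+1 (coupling)
end
ns = [Q X];                          % node sizes, e.g., [3 3 12 8]
dnodes = 1:N;                        % hidden nodes are discrete
onodes = N+1:ss;                     % observation nodes are observed
chmm = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'observed', onodes);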

Question 9 [4 points]: Follow the steps in Part 3 and run an experiment using the coupled HMM. What is the best classification accuracy you obtained using the coupled HMM? What parameter values did you validate, and what was your strategy for finding the best parameter value?¹

¹You may notice that it takes much longer to train a coupled HMM than a standard HMM. This is mainly because the graph structure of a coupled HMM contains many loops, and the complexity of the inference algorithm we use (i.e., junction tree) increases exponentially as the clique size of the graph increases. You may get some speedup by using approximate inference algorithms, such as the Boyen-Koller algorithm implemented in bk_inf_engine.m.
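
If you do try approximate inference, a hedged example (assuming your trained DBN is in the variable chmm; 'ff' requests the fully factorized approximation):

>> engine = bk_inf_engine(chmm, 'clusters', 'ff');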

Question 10 [2 points]: Submit and interpret the confusion matrix you obtained, comparing it to the confusion matrices you obtained so far. Does the coupled HMM tend to perform better than the early/late fusion HMMs? Why do you think it did (or did not)?

Question 11 [4 points]: Describe the differences between the coupled HMM and the early/late fusion HMMs in terms of their underlying assumptions and how the classifiers are trained and then used to test new samples.

6 Co-training HMM

Question 12 [6 points]: What are the two assumptions that a co-training algorithm makes? In the context of the NATOPS dataset, do those assumptions make sense?

Question 13 [16 points]: Implement trainCoHMM.m, which performs co-training of HMMs. We have provided you with pseudocode for the co-training algorithm (Figure 3). Note that you must follow the type signature of the function, as the function's input and output parameters are used in experiment_cotrain_hmm.m.

Figure 3: Pseudocode of a co-training algorithm.

Implementation detail: The input variables to the function are seqs, a 1-by-N cell array of training samples; labels, a 1-by-N cell array of labels; useqs, a 1-by-M cell array of unlabeled samples; and params, which contains the training parameters. There are three parameters you need during co-training: params.maxiter_cotrain, which specifies the maximum number of co-training iterations; params.initN_cotrain, which specifies the initial pool size; and params.Ny_cotrain, which specifies the number of samples per class to be added by each classifier in each iteration. Your return variable cohmm is a 1-by-2 cell array of trained HMMs, each of which is a set of HMMs from one modality.
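
For orientation, here is a hypothetical sketch of the loop in Figure 3, not the required implementation. trainView and scoreView are placeholder function handles standing in for your trainHMM/testHMM calls on one modality, confidence is taken to be the winning class's log-likelihood, and the shared labeled pool is one of several reasonable variants (classic co-training feeds each view's picks to the other view).

function cohmm = cotrain_sketch(seqs, labels, useqs, params, trainView, scoreView)
% Hypothetical co-training loop; every helper name is a placeholder.
L  = seqs(1:params.initN_cotrain);        % initial labeled pool
Ly = labels(1:params.initN_cotrain);
U  = useqs;                               % unlabeled pool
cohmm = cell(1,2);
for iter = 1:params.maxiter_cotrain
    for v = 1:2                           % v=1: body view, v=2: hand view
        cohmm{v} = trainView(L, Ly, v);   % train one modality's HMMs
    end
    for v = 1:2                           % each view pseudo-labels samples
        if isempty(U), break; end
        [conf, yhat] = scoreView(cohmm{v}, U);  % confidence, predicted class
        pick = [];
        for c = unique(yhat)              % Ny_cotrain most confident per class
            idx = find(yhat == c);
            [~, ord] = sort(conf(idx), 'descend');
            pick = [pick, idx(ord(1:min(params.Ny_cotrain, numel(idx))))];
        end
        L  = [L, U(pick)];                % grow the labeled pool
        Ly = [Ly, num2cell(yhat(pick))];
        U(pick) = [];                     % remove from the unlabeled pool
    end
end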

Question 14 [2 points]: Follow the steps in Part 4 and run an experiment using the co-training HMM. What is the best classification accuracy you obtained using the co-training HMM? What parameter values did you try out, and what was your strategy for finding the best parameter value? Submit and interpret the confusion matrix you obtained.

Note: Please do NOT submit NATOPS6.mat file.

References

[1] Cees Snoek, Marcel Worring, Arnold W. M. Smeulders: Early versus late fusion in semantic video analysis. ACM Multimedia 2005: 399-402

[2] Matthew Brand, Nuria Oliver, Alex Pentland: Coupled hidden Markov models for complex action recognition. CVPR 1997: 994-999

[3] C. Mario Christoudias, Kate Saenko, Louis-Philippe Morency, Trevor Darrell:Co-Adaptation of audio-visual speech and gesture classifiers. ICMI 2006: 84-91

[4] Yale Song, David Demirdjian, Randall Davis: Tracking body and hands for gesture recognition: NATOPS aircraft handling signals database. FG 2011: 500-506
