Event Processing at Sensor Nodes in the
Cloud
Submitted by Lee Jun Hui A0067228B
Department of Electrical & Computer Engineering
In partial fulfilment of the requirements for the Degree of
Bachelor of Engineering National University of Singapore
ABSTRACT
Engineers are able to acquire large streams of environmental data, often from
scattered independent sensors. To properly make sense of the data however, there
needs to be a system that can handle the incoming streams as well as a
mathematical analysis to make sense of the data.
This project applies the event processing problem to the domain of occupancy
detection. Conventional occupancy detection approaches rely on multiple tolerance checks or simulation modelling.
The contribution of this project is the usage of the statistical properties of the data
through a Hidden Markov Model (HMM) to detect and forecast events emerging
from the hidden states of a multi-dimensional sensor stream. The underlying occupancy state of the environment is deduced, and short-term future occupancy is forecast. The system relies on a pre-trained HMM model and calculates in real-time. Decoded occupancy states, as well as interpolated and extrapolated states, are found to be accurate to within a 30% error margin on average.
ACKNOWLEDGEMENTS
The author would like to express his greatest gratitude towards his supervisor, Professor Tham Chen-Khong, for his guidance and support throughout the project.
He is grateful to be able to work on such an interesting statistical pattern analysis
project.
The author would also like to thank his examiner, Dr. Mohan Gurusamy, for the
time spent on the assessment of the project.
Finally, the author would like to show his appreciation to his graduate research
assistant, Li Qiang, for his technical guidance and encouragement throughout the
course of the project.
TABLE OF CONTENTS
ABSTRACT ............................................................................................................. i
ACKNOWLEDGEMENTS .................................................................................... ii
TABLE OF CONTENTS ....................................................................................... iii
LIST OF TABLES ................................................................................................. vi
LIST OF FIGURES .............................................................................................. vii
LIST OF SYMBOLS AND ABBREVIATIONS ................................................ viii
1. INTRODUCTION .............................................................................................. 1
1.1 Background & Motivation ............................................................................ 1
1.2 Objective of Thesis ....................................................................................... 1
1.3 Thesis Organisation ....................................................................................... 2
2. LITERATURE REVIEW.................................................................................... 3
2.1 Occupancy Detection .................................................................................... 3
2.1.1 Passive Infrared Sensors ........................................................................ 3
2.1.2 Simulation Modelling ............................................................................ 3
2.1.3 Hidden Markov Modelling ..................................................................... 4
2.2 Mathematical Tools ....................................................................................... 4
2.2.1 Regression Algorithms ........................................................................... 4
2.2.2 Clustering Algorithms ............................................................................ 5
2.2.3 Stochastic Classifier Algorithms ............................................................ 6
3. HIDDEN MARKOV MODELS ......................................................................... 7
3.1 Markov Chain ............................................................................................... 7
3.2 Hidden Markov Chain ................................................................................... 8
3.2.1 Problem 1 of HMM Evaluation......................................................... 10
3.2.2 Problem 2 of HMM Decoding .......................................................... 11
3.2.3 Problem 3 of HMM Learning (Estimation) ...................................... 12
3.2.4 Problem 3 of HMM Learning (Baum-Welch) .................................. 13
3.2.5 Problem 4 of HMM Generation ........................................................ 15
3.3 Continuous HMM ....................................................................................... 16
3.3.1 Multivariate CHMM ............................................................................ 18
4. HARDWARE AND SOFTWARE IMPLEMENTATION ............................... 20
4.1 Embedded Board ......................................................................................... 20
4.1.1 Comparison of Features ....................................................................... 21
4.2 Software Development Platform ................................................................. 23
4.3 Software Architecture ................................................................................. 24
4.4 Software Implementation ............................................................................ 26
4.4.1 Serialization of HMM .......................................................................... 26
4.4.2 Scheduling Processing and Server Mutex Functions ........................... 29
5. EXPERIMENTS & RESULTS ......................................................................... 33
5.1 Dataset ......................................................................................................... 33
5.1.1 Ground Truth Value ............................................................................. 34
5.2 Experimental Setup ..................................................................................... 35
5.3 Capability Test ............................................................................................ 35
5.3.1 Test on Evaluating Day of the Week ................................................... 36
5.4 Test on Occupancy Decoding ..................................................................... 37
5.5 Test on Occupancy Interpolation ................................................................ 38
5.6 Test on Occupancy Extrapolation ............................................................... 40
6. LIMITATIONS AND RECOMMENDATIONS .............................................. 42
6.1 Highly Correlated Data ............................................................................... 42
6.2 Dynamically Improving HMM ................................................................... 42
6.3 Decision Fusion ........................................................................................... 43
7. CONCLUSION ................................................................................................. 45
APPENDIX A: BIBLIOGRAPHY ......................................................................... A
APPENDIX B: TYPICAL AND ATYPICAL DAYS ............................................ B
APPENDIX C: SOURCE CODE SNIPPET ........................................................... C
LIST OF TABLES
Table 1 Comparison of features of 3 Embedded Boards ................................... 21
Table 2 Interpolation statistics for different Gap Sizes and Time Periods ........ 40
Table 3 Extrapolation statistics for different Gap Sizes and Time Periods ....... 41
LIST OF FIGURES
Figure 1 2-state Markov Chain ............................................................................ 7
Figure 2 2-state 3-emission Hidden Markov Chain ............................................. 8
Figure 3 Gaussian Mixture Model within a HMM ............................................ 17
Figure 4 A PandaBoard embedded device showing cable and connections ...... 23
Figure 5 UML Diagram of Software Architecture ............................................. 25
Figure 6 Directory of a serialized HMM model and training set ....................... 27
Figure 7 Serialized Contents of HMM Metadata ............................................... 28
Figure 8 Serialized Contents of HMM State Transition Matrix ........................ 28
Figure 9 Serialized Contents of a GMM emission mean (top) and covariance
(bottom) ................................................................................................................. 29
Figure 10 Layering Server and Processing functions into Network and
Application side logic ........................................................................................... 30
Figure 11 Process Flowchart for Mutex Access ................................................ 31
Figure 12 Sensor measurements across a 4-day period ..................................... 33
Figure 13 Occupancy inferred from power consumption .................................. 34
Figure 14 Log-likelihood of observation belonging to a Weekday Model ........ 37
Figure 15 Decoded Error % for Occupancy ....................................................... 38
Figure 16 Interpolation results for different Gap Sizes and Time Periods ........ 40
Figure 17 Extrapolation results for different Gap Sizes and Time Periods ....... 41
Figure 18 Data Fusion of streams into a vector before decision making ........... 43
Figure 19 Decision Fusion of streams' decisions upstream into a final decision 44
LIST OF SYMBOLS AND ABBREVIATIONS
PIR Passive Infrared [Sensor]
HVAC Heating, Ventilation, and Air-conditioning
HMM Hidden Markov Model
DHMM Discrete Hidden Markov Model
CHMM Continuous Hidden Markov Model
GMM Gaussian Mixture Model
K-Means K-Means Clustering Algorithm
1. INTRODUCTION
1.1 Background & Motivation
With the arrival of smart embedded systems capable of high processing loads, as well as lightweight sensors that can be cheaply deployed, engineers have gained the ability to monitor and analyse our environment in greater detail than ever before. Engineers can plant multiple sensor devices within the living environment that gather and relay measurements to a central node for further processing, detecting event changes in the environment as they occur.
With so many incoming data streams, there is a great opportunity to make use of
statistical and probability models to extract greater information and detect events
that may not be readily observable. Engineers can also make use of the temporal aspect of the data to place additional constraints on event detection, as well as to make inferences about the future. Such an event detection system could be used to detect abnormalities in the health of a patient, or to make sense of data through the detection of recurrent patterns.
1.2 Objective of Thesis
This thesis aims to demonstrate that statistical modelling and machine learning
can be used to effectively detect events when applied to the domain of occupancy
detection. The project attempts to model the occupancy state of a typical student
residential suite. It demonstrates that it is possible to deduce the occupancy of the
suite through secondary data provided by environmental sensors, and also make
short-term forecasts as well.
The scenario of occupancy detection and modelling has many practical purposes
as it allows a building facility manager to estimate human traffic loads in advance.
Such information would be valuable for safety precautions and can also be
exploited for sales and marketing purposes in shopping districts. On a smaller
scale, a facility manager of a small office can also monitor the occupancy of its
rooms and cubicles and tweak the heating, ventilation, and air conditioning
(HVAC) policies of the building to optimize energy consumption.
1.3 Thesis Organisation
For the report, Chapter 2 will be a literature review covering existing methods for
occupancy detection related to HVAC operations. The various possible
mathematical tools that may be used to help identify events are also mentioned. In
Chapter 3, readers will be presented with the basics of the HMM, which is the
statistical model that the project is using, as well as the variants of HMM that the
project has evaluated experimentally. Chapter 4 will discuss the hardware
components and the software platform that the demonstration program has been
deployed on, and also talk about the software architecture of the project. Chapter
5 presents the experimental results that have been obtained. Chapter 6 discusses
some of the issues that constrained the project and suggests further improvement
work. Finally, Chapter 7 will conclude the results and provide details on how
improvements can be made in the future.
2. LITERATURE REVIEW
2.1 Occupancy Detection
2.1.1 Passive Infrared Sensors
Commercially, the most popular means of occupancy detection is via the use of a
Passive Infrared (PIR) sensor. It measures the amount of infrared (IR) light that reaches its field of view; when there is a change in the IR radiation, a movement event is registered by the sensor, indicating the presence of an occupant. However, such a system often generates false negatives, as it assumes a non-idle occupant.
One way to improve the PIR sensor is to pair it up with a reed switch placed on a
doorway, which can detect whether the door is open or closed [1]. Oftentimes, by applying additional modes of measurement, further constraints are imposed on the detection, achieving greater accuracy.
2.1.2 Simulation Modelling
A paper from Liao and Barooah [2] made use of machine learning and simulation
modelling. To improve sensor readings, a crowd simulation was constructed from
past observation data. Due to the complexity of the simulation, it was run off-line,
and the reduced-order statistics of the simulation results were compiled. These
reduced-order statistics were then compared with present observational data to
estimate the occupancy level. However, such a method requires a non-trivial simulation model, as well as reliable room occupancy probabilities that have been estimated through long-term observations of said environment.
Page | 4
2.1.3 Hidden Markov Modelling
The idea of using Hidden Markov Models to model occupancy is suggested in a
smart thermostat project that uses a PIR sensor and a reed switch on the doorways
[3]. The smart thermostat helps to prepare a comfortable environment for its
house occupants by pre-empting their departure from and arrival home through
the predictive ability of a HMM. The accuracy of the HMM predictive model (88%) over reactive algorithms (78%) is convincing evidence that HMM models can be effectively applied to the domain of home occupancy.
2.2 Mathematical Tools
There are several mathematical tools that make use of stochastic processes,
sequence labelling, and clustering algorithms to help bring clarity to an otherwise
chaotic data collection.
2.2.1 Regression Algorithms
In regression algorithms, there are 2 notable variants of unsupervised regression
algorithms: the Independent Component Analysis (ICA) and the Principal
Component Analysis (PCA). These 2 algorithms attempt to separate a data set into
additive subcomponents, where each subcomponent is maximally independent or
has maximum variance, respectively.
An example of ICA utility is in electroencephalography (EEG). It can
automatically identify a number of channels that are statistically independent from
each other. White noise as well as EEG artefacts like ocular movement can be
identified and subtracted additively while preserving core data. However, ICA is not as relevant in this project, as it assumes that the identified channels are statistically independent from each other. This is not the case: environmental data in the domain of HVAC are often heavily correlated.
One of the applications of PCA is in the field of data compression and
visualisation. Given a collection of n-dimensional vectors, PCA decomposes it
into eigenvalue and eigenvector pairs. Eigenvectors associated with the largest eigenvalues, which capture the most variance, are kept; the rest are discarded. In that sense, a multi-dimensional, or multi-axial, graph has been remapped onto the new eigenvectors, having discarded insignificant axes of low variance. The result is a collection of vectors of lower dimensionality, which is still able to approximately reconstruct the original dataset.
Thus, the purpose of PCA is dimensionality reduction. While it is certainly useful
as a pre-processing step, the project does not require a lot of dimensionality (due
to limited sensor types). A simple cluster analysis, as described in section 2.2.2,
will suffice.
2.2.2 Clustering Algorithms
As mentioned in section 2.2.1, cluster analysis is a more appropriate tool to pre-process our data. This is because PCA operates on variable reduction, while cluster analysis works on observation reduction. Essentially, cluster analysis groups like observations together. It reduces the number of unique observations into a limited set, somewhat akin to an Analogue-to-Digital Converter (ADC) that quantizes continuous values into discrete buckets.
One of the more well-known clustering algorithms is the K-Means Clustering
Algorithm (K-Means). By making use of a metric between members, often the
Squared Euclidean Distance, a data set is grouped into k clusters where members
of each cluster have the nearest metric. This quantization process is crucial to the usage of Discrete Hidden Markov Models (DHMM), in order to minimize the number of unique observation vectors, as described in Chapter 3.
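The quantization step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's implementation; the function and variable names are invented here, and a 1-D case is shown so the squared Euclidean distance reduces to a simple squared difference.

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Naive 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance reduces to (p - c) ** 2 in 1-D
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Quantize continuous sensor-like readings into k = 2 discrete levels
readings = [0.10, 0.20, 0.15, 5.0, 5.2, 4.9]
levels = sorted(k_means(readings, k=2))
```

Each reading can then be replaced by the index of its nearest level, giving the discrete observation symbols a DHMM needs.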
2.2.3 Stochastic Classifier Algorithms
Stochastic classifiers attempt to put data into categorical labels based on their
stochastic attributes. Because the project is operating on data streams, there is the added dimension of time open for exploitation, which leads to the Markov Chain: a temporal process based on stochastic probabilities.
A Markov Chain is a system that is able to transition to one of several states
depending on a stochastic process. To classify things however, the concept needs
to be further extended to a Hidden Markov Chain (HMM). In a HMM, the states
of the system are not observable; one can only observe the system through its
emission observations, whose appearance is statistically dependent on the
underlying hidden state.
Through observations of the system over a period of time, it is possible to decipher the underlying state transitions that have led to the observations. It is also possible to match up observations against several HMM models; whichever model fits the observations best provides the label under which the observations are classified.
Because HMM is a robust and well-studied model that classifies the data on a
temporal dimension, the project chooses HMM to be its mathematical tool to
identify incoming events from the environmental sensor data.
3. HIDDEN MARKOV MODELS
In this Chapter, the report presents the concepts behind the Hidden Markov Model
(HMM) in greater detail, as it is the mathematical tool the project uses to identify
and process incoming events.
3.1 Markov Chain
A Markov Chain is a discrete-time system that transitions from one state to another via a random process. Each state has its own fixed transition probabilities, which are not affected by previous states.
Take, for instance, a Markov chain modelling the breakfast habits of an individual. It consists of 2 states representing what the individual had for breakfast: cereal or bread. Assuming that the system is memory-less, all that dictates the next breakfast is the set of transition probabilities of the current breakfast. This is represented by the diagram in Figure 1 below:
Figure 1 2-state Markov Chain
According to Figure 1, if the current breakfast is cereal, there is a 40% chance that the next breakfast is still cereal, and a 60% chance it might be bread. If the current breakfast is bread, there is a 30% chance the next breakfast is still bread, but a 70% chance it could be cereal. This series of probabilities can be represented as a transition matrix:

$$A = \begin{pmatrix} 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix}$$
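To make the mechanics concrete, the 2-state chain above can be simulated in a few lines of Python. This is an illustrative sketch only; the function names are invented, and the two-state case lets a single uniform draw decide the next state.

```python
import random

# Transition matrix from Figure 1: rows are the current state,
# columns the next state; state 0 = cereal, state 1 = bread
A = [[0.4, 0.6],
     [0.7, 0.3]]

def simulate(A, start, steps, rng):
    """Walk the chain: at each step, draw the next state from the row
    of A belonging to the current state."""
    state, path = start, [start]
    for _ in range(steps):
        state = 0 if rng.random() < A[state][0] else 1
        path.append(state)
    return path

# Ten breakfasts, starting from cereal
path = simulate(A, start=0, steps=10, rng=random.Random(42))
```

Over a long run, the fraction of time spent in each state converges to the chain's stationary distribution, regardless of the starting breakfast.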
3.2 Hidden Markov Chain
A Hidden Markov Chain (HMM) extends the concept of the Markov Chain. Now, the states of the system are hidden and cannot be observed. However, the outputs of each state, the emissions, are observable. Like the states, each emission belongs to an emission space that differs for each state, and has a random probability of emission as well.
Using the example in section 3.1, assume that the cereal and bread states each have 3 possible emissions: satiety, hunger, and bloated. Each state, however, has a different distribution function over the emissions. This is represented by the updated HMM in Figure 2 below:
Figure 2 2-state 3-emission Hidden Markov Chain
These emissions can be represented by the following emission probability matrix:

$$B = \begin{pmatrix} 0.70 & 0.08 & 0.22 \\ 0.35 & 0.65 & 0 \end{pmatrix}$$
Sometimes it is also useful to define an additional initial state distribution matrix.
The initial state distribution matrix will give the probability that the system begins
in a particular state. If for example, the system above has a 90% chance of starting
in the cereal state, then the matrix is as follows:
$$\pi = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix}$$

To illustrate the usefulness of HMM, it is helpful to refer to the 3 conventional problems that HMM can tackle, as famously described in Rabiner's 1989 paper [4]:
1. Evaluation: Given an observation sequence, find the likelihood that it was generated by a given HMM model. Useful for comparing different models' effectiveness in modelling a particular phenomenon, or for classifying phenomena according to known models.
2. Decoding: Given an observation sequence, infer the most probable hidden state transitions that led up to the given observations. Useful for uncovering the hidden states of a system.
3. Learning: Given an observation sequence, construct the HMM that is most likely to have generated it. Useful for creating a model from real-world data.
All 3 problems are relevant to the project and have been implemented. There is also a 4th problem that is implemented in the project but is not usually included in the list of 3 HMM problems in the literature:

4. Generation: Simulates future output by running the model, or fills gaps in observations. Useful for anticipating future changes in the system or for interpolating lost data packets.
The report will now go into detail on how each problem can be solved via HMM.
3.2.1 Problem 1 of HMM: Evaluation
As mentioned, evaluation solves the following problem: given an observation sequence, find the likelihood that it was generated by a HMM model $\lambda$.
Assume that the HMM being used is the same one as that in Figure 2. Now also assume that the hidden state sequence X is (cereal, bread, bread). If given an observation sequence O of (bloated, hunger, satiety), it is possible to calculate the likelihood of such a pairing, $P(O, X \mid \lambda)$. The formulas are:

$$P(O \mid X, \lambda) = b_{x_0}(o_0)\, b_{x_1}(o_1)\, b_{x_2}(o_2)$$

$$P(X \mid \lambda) = \pi_{x_0}\, a_{x_0 x_1}\, a_{x_1 x_2}$$

$$P(O, X \mid \lambda) = P(O \mid X, \lambda)\, P(X \mid \lambda)$$

Problem 1 of HMM, evaluation, demands to know the likelihood of an observation sequence O given a HMM $\lambda$. This is $P(O \mid \lambda)$, which is the sum of $P(O, X \mid \lambda)$ over all possible state sequences X:

$$P(O \mid \lambda) = \sum_{X} P(O, X \mid \lambda)$$
Through a technique called the forward-pass algorithm $\alpha_t(i)$, it is possible to optimize the computational time complexity of this operation, as detailed in Stamp's paper [5], reducing the equation to the following:

$$\alpha_t(i) = P(o_0, o_1, \ldots, o_t,\, x_t = q_i \mid \lambda) = \left[ \sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji} \right] b_i(o_t)$$

$$P(O \mid \lambda) = \sum_{i=0}^{N-1} \alpha_{T-1}(i)$$

The forward-pass algorithm $\alpha_t(i)$ essentially computes the probability that state i is reached at time t, given the partial observation sequence from 0 to time t.
Now that it is possible to evaluate the likelihood of an observation sequence, by pairing up the observation sequence O with different HMM models $\lambda$, it is possible to find the most fitting HMM model. The system's observation is therefore classified under the label of this particular model:

$$\arg\max_{\lambda} P(O \mid \lambda)$$
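As a concrete illustration, the forward pass can be written in a few lines of Python using the breakfast model from Figures 1 and 2. This is an illustrative sketch, not the project's implementation; the function name is invented, and no numerical scaling is applied, so it is only suitable for short observation sequences.

```python
def forward(pi, A, B, obs):
    """Forward pass: alpha[i] accumulates P(o_0..o_t, x_t = i | model);
    the final sum over states gives the likelihood P(O | model)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# Breakfast model: states 0 = cereal, 1 = bread;
# emissions 0 = satiety, 1 = hunger, 2 = bloated
pi = [0.9, 0.1]
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
likelihood = forward(pi, A, B, [2, 1, 0])
```

Evaluating the same observation sequence against several trained models and keeping the largest likelihood implements the classification rule above.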
3.2.2 Problem 2 of HMM: Decoding
Problem 2 of HMM, decoding, tries to find the most probable hidden state transitions that have led up to the given observations. The algorithm used is a Dynamic Programming algorithm, which works through all possible combinations of states at each instance of time. For example, at time t = 0, the formula is:
$$\delta_0(i) = \pi_i\, b_i(o_0)$$

Armed with calculations for every state, the time is advanced forward by one unit, to determine the previous state j that gives the highest likelihood in the new time instance:

$$\delta_1(i) = \max_j \left[ \delta_0(j)\, a_{ji}\, b_i(o_1) \right]$$

This can be generalized to:

$$\delta_t(i) = \max_j \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(o_t)$$

Consequently, by finding the maximum $\delta_{T-1}(i)$ and by recording the maximizing state j at every step of the process, the state sequence most probable in generating the given observation sequence can be found.
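The recursion and backtracking above can be sketched as follows. This is an illustrative sketch, not the project's implementation; the function name is invented, and as with the forward pass no numerical scaling is applied.

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding: delta[i] holds the best score of any state path
    ending in state i; back records the predecessor that achieved it."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor j for each state i, then the new scores
        prev = [max(range(n), key=lambda j: delta[j] * A[j][i])
                for i in range(n)]
        delta = [delta[prev[i]] * A[prev[i]][i] * B[i][o] for i in range(n)]
        back.append(prev)
    # Backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

# Decode the breakfast model against (bloated, hunger, satiety)
pi = [0.9, 0.1]
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
decoded = viterbi(pi, A, B, [2, 1, 0])
```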
3.2.3 Problem 3 of HMM: Learning (Estimation)
There are 2 ways to do HMM learning, which is the construction of a HMM based
on training observation samples. The 2 methods are estimation and Baum-Welch.
Both are used in the project.
Estimation-based learning is described in Blunsom's paper as a supervised approach to training [6]. The report, additionally, sees it as a way to build HMM models for solving Problem 2 of HMM. This is because the project requirement for decoding is to decode the hidden occupancy state of the system. In order to control the labelling process of the hidden states, the learning algorithm needs to be fed training observations that have been tagged with known hidden states. The estimation process can be fed tagged training observations; Baum-Welch cannot.
The theory behind estimation-based learning is simple enough. Assuming that the training observation sets are representative of the population, the frequency of occurrence of a particular state in the training set approximates that state's probability distribution, as described mathematically:

$$\hat{\pi}_i = \frac{\#(\text{sequences starting in state } q_i)}{\#(\text{training sequences})}$$

Many other attributes of the HMM can be estimated similarly:

$$\hat{a}_{ij} = P(x_{t+1} = q_j \mid x_t = q_i) = \frac{\#(\text{transitions from } q_i \text{ to } q_j)}{\#(\text{visits to } q_i)}$$

3.2.4 Problem 3 of HMM: Learning (Baum-Welch)
The second method of doing HMM learning is to make use of the Baum-Welch
algorithm. It randomly initializes the HMM parameters, and then uses expectation
maximization to adjust the parameters of the HMM model to a local maximum.
In this project, the Baum-Welch method is preferred over the estimation method for learning a HMM model to solve Problem 1 of HMM, evaluation. The Baum-Welch method is able to learn a much more precise HMM model. The downside is that such a model has illegible hidden states: they are not conveniently human-labelled hidden states mapped to occupancy, as Baum-Welch derives its states by local optimization. With illegible hidden states, the states are meaningless if decoded. However, this setback is not applicable to model evaluation (Problem 1), as only the evaluated likelihood is relevant, not the hidden states. Hence Baum-Welch learning is preferred over estimation learning for crafting HMM models for Problem 1.
What follows is an explanation of the Baum-Welch algorithm.
Note: The derivation of the Baum-Welch algorithm is rather complicated, and one
may wish to skip this section if desired.
Firstly, one needs to define 3 more parameters: the backward pass $\beta_t(i)$, the gamma $\gamma_t(i)$, and the xi $\xi_t(i, j)$.

The backward pass $\beta_t(i)$ is similar to the forward pass $\alpha_t(i)$, except that instead of using the partial observation sequence from 0 to time t, it uses the partial sequence from time t+1 to the final time T-1, given state i at time t:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_{T-1} \mid x_t = q_i, \lambda)$$

$$\beta_{T-1}(i) = 1, \qquad \beta_t(i) = \sum_{j=0}^{N-1} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 0 \le t < T-1$$

The gamma $\gamma_t(i)$ represents the probability of the current state being state i. If a state i gives the maximum value at time t, then that state is the most likely state at time t:

$$\gamma_t(i) = P(x_t = q_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)\, \beta_t(j)}$$

The final parameter to define is the xi $\xi_t(i, j)$, which is the probability that the current state is state i at time t and the next state is state j:

$$\xi_t(i, j) = P(x_t = q_i,\, x_{t+1} = q_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

$$\gamma_t(i) = \sum_{j=0}^{N-1} \xi_t(i, j)$$
From these 3 parameters, it is possible to obtain re-estimates of the HMM model:

$$\hat{\pi}_i = \gamma_0(i)$$

$$\hat{a}_{ij} = \frac{\sum_{t=0}^{T-2} \xi_t(i, j)}{\sum_{t=0}^{T-2} \gamma_t(i)}$$

$$\hat{b}_i(k) = \frac{\sum_{t \in \{0, 1, \ldots, T-2\},\; o_t = k} \gamma_t(i)}{\sum_{t=0}^{T-2} \gamma_t(i)}$$
The learning algorithm is thus implemented as follows:
1. Initialize random model parameters for $\pi$, A and B.
2. Compute the intermediate parameters $\alpha_t(i)$, $\beta_t(i)$, $\xi_t(i, j)$, $\gamma_t(i)$.
3. Re-estimate the model parameters $\pi$, A and B using the intermediate parameters.
4. Check the improvement of $P(O \mid \lambda)$. If it does not meet requirements, repeat from Step 2.
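One iteration of this loop can be sketched for the discrete-emission case as follows. This is an illustrative sketch, not the project's implementation; the function name is invented, and it follows Rabiner's re-estimation formulas, re-estimating B over all T observations (some presentations, including the sums above, stop at T-2).

```python
def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM.
    Computes alpha, beta, gamma and xi, then returns re-estimated
    parameters together with the likelihood under the OLD model."""
    n, T, m = len(pi), len(obs), len(B[0])

    # Forward pass: alpha[t][i] = P(o_0..o_t, x_t = i | model)
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for i in range(n):
            alpha[t][i] = sum(alpha[t - 1][j] * A[j][i]
                              for j in range(n)) * B[i][obs[t]]

    # Backward pass: beta[t][i] = P(o_{t+1}..o_{T-1} | x_t = i, model)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))

    total = sum(alpha[T - 1])  # P(O | model)

    # State and transition posteriors (gamma and xi)
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
            for j in range(n)] for i in range(n)] for t in range(T - 1)]

    # Re-estimation
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, total
```

Repeating the step until the returned likelihood stops improving implements Step 4 of the loop; EM guarantees the likelihood never decreases between iterations.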
3.2.5 Problem 4 of HMM: Generation
In the project's self-defined Problem 4 of HMM, a simulation of the HMM generates non-deterministic future states and emissions of the system. It can be used to extrapolate and simulate the future. Or it can be used to interpolate and fill gaps in the knowledge of past states of the system, for instance when data packets are lost or unrecoverable during network propagation.
For extrapolation, a Gaussian distribution is used to produce values from 0.0 to
1.0. Depending on the output, the system is advanced to the corresponding state
based on the transition matrix. Another Gaussian roll is used to determine the
emission from that state. This process continues until the desired forecasted length
is reached. A random approach is used for extrapolation to ensure that the process
is non-deterministic.
For interpolation, the start and the end hidden state are known. An exhaustive
permutation of all possible states in-between is conducted. The state sequence
with the highest likelihood of appearing is thus selected.
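The extrapolation roll can be sketched as follows. This is an illustrative sketch, not the project's implementation; the function names are invented, and where the text above describes a distribution roll producing values from 0.0 to 1.0, this sketch uses a uniform draw against the cumulative probabilities, a common way to sample a discrete distribution.

```python
import random

def extrapolate(A, B, state, steps, rng):
    """Roll the model forward: at each step, draw the next hidden state
    from the current row of A, then draw an emission from that state's
    row of B."""
    def draw(probs):
        # Sample an index from a discrete distribution via its CDF
        r, acc = rng.random(), 0.0
        for idx, p in enumerate(probs):
            acc += p
            if r < acc:
                return idx
        return len(probs) - 1

    trace = []
    for _ in range(steps):
        state = draw(A[state])
        trace.append((state, draw(B[state])))
    return trace

# Forecast 5 steps of the breakfast model, starting in state 0 (cereal)
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
forecast = extrapolate(A, B, state=0, steps=5, rng=random.Random(1))
```

Because each run draws fresh random numbers, repeated rolls produce different but statistically consistent futures, which is the non-deterministic behaviour the section describes.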
3.3 Continuous HMM
The HMM model described in the earlier sections of Chapter 3 was a Discrete HMM (DHMM) model. It is termed discrete because the emission symbols are allowed to take on only specific values, for instance "hunger" and "bloated". A continuous emission would, however, take on intermediate values, like a 0.5 feeling of hunger, or a 0.751 feeling of bloatedness.
This can be achieved by representing the emissions of each state not as a
collection of scalar values, but as a collection of Gaussian distributions, also
known as a Gaussian Mixture Model (GMM). Each GMM contains a vector of
weights, which dictates the weightage of each Gaussian distribution within.
Because the Gaussian distribution has a continuous-valued distribution function, it can represent a range of observations without needing to discretize them. Thus, a HMM that uses a GMM for its emission symbols is called a Continuous HMM (CHMM).
To determine the probability of emission of an observation, the observation is fed
into the GMM. The observation is fed into each Gaussian distribution in-turn, and
the resultant probability gathered using a weighted sum. That sum is the
probability of emission. This is illustrated in Figure 3 below:
Figure 3 Gaussian Mixture Model within a HMM
In Figure 3, assume an observation x arrives at the state and GMM illustrated. The probability of emission will be calculated as so:

$$b(x) = 0.2\, \mathcal{N}(x; \mu_1, \Sigma_1) + 0.5\, \mathcal{N}(x; \mu_2, \Sigma_2) + 0.3\, \mathcal{N}(x; \mu_3, \Sigma_3)$$

For example, an observation that closely matches Gaussian 2 will return a high density value from that component, but the GMM also dictates the probability of that component's occurrence, so its contribution is weighted by a factor of 0.5.
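The weighted sum above can be sketched for the 1-D case as follows. This is an illustrative sketch; only the weights follow Figure 3, while the means and variances below are hypothetical values chosen for the example.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_emission(x, weights, means, variances):
    """Emission probability of x: the weighted sum of the component
    Gaussian densities, as in Figure 3."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical 1-D mixture using the weights from Figure 3
p = gmm_emission(1.0, weights=[0.2, 0.5, 0.3],
                 means=[0.0, 1.0, 2.0], variances=[1.0, 1.0, 1.0])
```

Here x = 1.0 sits exactly on the mean of the second component, so that component dominates the sum, scaled by its 0.5 weight.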
3.3.1 Multivariate CHMM
One of the additional benefits of using CHMM is its ability to take in observation
vectors, observations with more than 1 dimension. This is because a Multivariate
Normal distribution can be utilized instead, as indicated in Jackson's HMM
tutorial [7]:
$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_{ik}|}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{-1}\,(x-\mu_{ik})}{2}\right)$
Conveniently, the multivariate case reduces to a single-variate distribution when
the number of dimensions is 1.
However, one of the great difficulties of using a multivariate normal distribution
is the presence of the inverse covariance matrix $\Sigma_{ik}^{-1}$ within the formula. A
matrix is non-invertible, or singular, when its rows are linearly dependent. This
can arise especially when the training data sets are insufficient or too highly
correlated.
A singular matrix is not invertible, so a pseudo-inverse matrix is used instead.
One well-known pseudo-inverse is the Moore–Penrose pseudo-inverse, obtained
using the Singular Value Decomposition (SVD) for solving linear equations. Here,
matrix A is decomposed into the following:

$A = U D V^{\mathsf{T}}, \qquad D = \begin{bmatrix} \mathrm{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{bmatrix}$

where A is the M×N matrix to decompose.
U is an M×M matrix whose columns are the eigenvectors of AAᵀ.
D is an M×N diagonal matrix whose diagonal entries are the singular values σᵢ of A, where σᵢ² is an eigenvalue of AAᵀ or AᵀA.
V is an N×N matrix whose columns are the eigenvectors of AᵀA.
If the diagonals of matrix D are not all non-zero, the matrix A is singular. The
pseudo-inverse is thus defined as:
$A^{+} = V \begin{bmatrix} \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & 0 \\ 0 & 0 \end{bmatrix} U^{\mathsf{T}}$

Conveniently, $A^{+} = A^{-1}$ if A is a non-singular matrix.

The pseudo-determinant also has to be defined for calculating the determinant of
the pseudo-inverse matrix. It is the product of all non-zero diagonal values in the
diagonal matrix D of the SVD:
$|A|_{+} = \prod_{i=1}^{r} \sigma_i$
Recall that the original multivariate Gaussian distribution was defined as:
$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_{ik}|}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{-1}\,(x-\mu_{ik})}{2}\right)$
Now, the modified distribution for a singular covariance matrix is:

$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}^{+}) = \frac{1}{\sqrt{|2\pi\,\Sigma_{ik}|_{+}}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{+}\,(x-\mu_{ik})}{2}\right)$
Again, the singular multivariate normal distribution conveniently gives the same
result as its non-singular counterpart when the covariance matrix is non-singular.
This is the theoretical reason why the project only starts learning a new HMM
model once a large number of observations has been collected for training, and
why learning fails when observations are insufficient. It also highlights that
HMM training may occasionally fail simply because the training observations
were, by chance, highly correlated.
This flaw can be mitigated through careful selection of training observation sets,
as this project has done.
4. HARDWARE AND SOFTWARE IMPLEMENTATION
This chapter describes the project's hardware selection choices and details its
software implementation.
4.1 Embedded Board
The project's objective is to acquire data from tiny, distributed sensors. One of the
possibilities for the future of the project is to explore building a networked
collection of processing nodes that does layer-by-layer event processing, and
relays the decision events up a hierarchical computing framework. Under such a
vision, the processing nodes are situated locally in close proximity to the sensor
nodes, and do real-time event processing at the source. As a result, there are
certain requirements for the hardware of this processing node.
Firstly, it should be an embedded platform: this reduces the hardware and
deployment costs, allowing a hypothetical project budget to buy in quantity and
improve the processing-node-to-sensor-node ratio, i.e. fewer sensor nodes per
processing node. A desktop PC would be impractical in comparison.
Secondly, the embedded board should have a powerful CPU as the HMM
algorithm can be quite computationally intensive.
Thirdly, the embedded board should be well-supported and backed by an active
and mature community, which would potentially allow it to interoperate with
more types of sensors.
4.1.1 Comparison of Features
Below are 3 different embedded boards that were under consideration in the
project. The core metrics here are CPU capability and sensor interoperability.
| | PandaBoard ES | BeagleBone Black | Raspberry Pi |
|---|---|---|---|
| CPU | Dual-core ARM Cortex-A9, up to 1.2 GHz each | ARM Cortex-A8, 1.0 GHz | ARM, 700 MHz |
| GPU | SGX540 with OpenGL ES 1.1/2.0, OpenVG 1.1, EGL 1.3 | SGX530 with 3D acceleration | Broadcom VideoCore IV, OpenGL ES 2.0 |
| Operating System | Ubuntu, Android | Ubuntu, Android | Custom Debian/Fedora, Android |
| A/V I/O | HDMI out, 3.5 mm audio out, stereo audio in | HDMI | HDMI out, 3.5 mm audio out, stereo audio in |
| Memory | 1 GB DDR2 RAM, SD/MMC | 512 MB RAM, 2 GB flash, SD/MMC | 512 MB RAM, SD/MMC |
| Connectivity | WiFi, Bluetooth | – | – |
| Ports | Ethernet, 3 USB 2.0 | Ethernet, 1 USB, 1 mini-USB | Ethernet, 2 USB, 1 mini-USB (power) |
| Power | 440–710 mA¹ | 210–460 mA @ 5V | 700 mA @ 5V |
| Cost | USD $182 | USD $45 | USD $35 |
Table 1 Comparison of features of 3 Embedded Boards
1 http://www.omappedia.org/wiki/Panda_Test_Data
In terms of CPU capability, the Raspberry Pi loses out because it does not have
enough computation power, while the PandaBoard ES is clearly superior in that
respect, boasting a dual-core processor with 1 GB of RAM.
In terms of interoperability, the Operating System (OS) is examined. An OS
derived from a major distribution is convenient for 3rd party libraries as well as
pre-built compatibility libraries offered by sensor manufacturers. Again, the
Raspberry Pi loses out, as it only supports Android and a custom derivation of
Debian and Fedora, unlike the other 2 boards, which support Ubuntu, a very
popular Linux distribution.
Another metric relevant to interoperability is the number of ports available on the
boards. The PandaBoard ES offers 3 USB ports, the most amongst the boards
listed.
As the PandaBoard ES consistently ranks ahead in all of the important metrics, it
is the project's embedded board of choice.
Figure 4 A PandaBoard embedded device showing cable and connections
In Figure 4 above, a PandaBoard is shown connected with the requisite hardware.
At the bottom is an SD card containing the Ubuntu OS. On the right is an RS-232
cable that allows a development PC to connect to the PandaBoard via a virtual
terminal in headless mode.
At the top, from the right, is the 5V power cable, followed by an Ethernet cable
which shares an Internet connection from the development PC via LAN. The
USB ports are located beneath it. Finally, a HDMI cable provides graphical
output to an attached monitor.
4.2 Software Development Platform
The PandaBoard ES offers Ubuntu and Android as operating systems, but the
project decided on Ubuntu as it is more full-featured and has a package manager
which makes it easy to pull and install software packages. The Ubuntu
version used was 12.04 Precise Pangolin LTS, the most recent pre-built version
available for PandaBoard.
Development work was done not on the PandaBoard ES, but on an Intel laptop
running Ubuntu 13.10 Saucy Salamander. Because both the PandaBoard ES and
the development platform are Linux systems, the source code is portable across
systems, and it is much faster to compile and test on the more powerful
processor. Similarly, although it is possible to run a POSIX environment on a
Windows laptop via the Cygwin compatibility layer, it is much more efficient to
work natively on Linux on the development laptop, where compilation times are
faster by an order of magnitude.
The language of choice was C++, as the application needs to be able to run fast on
the PandaBoard ES. Several 3rd party libraries were used. Firstly, Armadillo, a
C++ matrix and linear algebra library was used. Secondly, Boost, a C++ utility
library, was included. Boost is a requisite for Armadillo, but it also provided many
convenient container classes to supplement the traditional vector and dictionary
classes in the C++ Standard Template Library. Finally, mlpack, a C++ machine
learning library, was also imported, providing basic HMM algorithms, a
clustering algorithm and an implementation of a GMM.
4.3 Software Architecture
The software architecture has been separated into several different namespaces
and classes, as illustrated in Figure 5 below:
Figure 5 UML Diagram of Software Architecture
In Figure 5, the calc namespace holds two child namespaces, func and model. The
calc::model::hmm namespace contains classes that are used to define and model a
HMM as well as the distribution function, whereas the calc::model::map
namespace has several mappers that are used to discretize observations from
continuous-valued to specific discrete values. Meanwhile, in the calc::func::hmm
namespace, there is a helper class that serializes and de-serializes the HMM
classes between running memory and local file storage.
The util namespace contains several helper methods for basic operations not
present in the C++ API.
Finally, the test namespace contains the test and experiment routines used to
evaluate the performance of and demonstrate the capability of the system.
4.4 Software Implementation
4.4.1 Serialization of HMM
Serialization is the process of converting a data structure and its object state into a
format that can be easily transmitted and reconstructed. In the project,
serialization is used to save the HMM models locally on the SD card. By saving
the HMM models, the models do not need to be retrained on program startup
every time. Moreover, it becomes possible to train a comprehensive model offline
on a powerful computer, then deploy the model onto the target PandaBoard
instantly, reducing deployment time and improving detection accuracy.
A HMM model has several attributes: the state transition matrix and the emission
probability matrix. For a CHMM model, there is a state-specific GMM in place of
the emission probability matrix, i.e. one GMM per state.
Within each GMM, there are further attributes: the weight, the mean as well as the
covariance matrix of each Gaussian distribution. Finally, to enable successful
reconstruction, several metadata properties of the HMM are also saved.
On top of saving the HMM model, it is also possible to save a training observation
set. Oftentimes the training set is rather large, and it is tedious to keep retrieving
the entries manually from the CSV (an ASCII comma-separated text file) database.
By serializing the training set, it is possible to make it as portable as the HMM
model and allow operators to reinitialize/retrain the HMM model with a different
set of parameters.
One beneficial side-effect is that serialization produces a human-readable ASCII
format, which also makes it easy for a human operator to analyze the HMM
offline. A sample of each of the various serialized data is presented here to
demonstrate the concept better.
In Figure 6, the name of the system is "airconTester". As this is the 1st HMM
model in the system, it is given an index of 0, hence "airconTester0".
Figure 6 Directory of a serialized HMM model and training set
The training sets are serialized with a .train extension. Since there are 3
files: .train0, .train1, and .train2, it means that 3 windows of training observation
data are used to construct this HMM model. .trainMeta contains the metadata,
which is shown in Figure 7 below.
Figure 7 Serialized Contents of HMM Metadata
Because the model is to be trained to be used for Decoding, Estimation-based
HMM Learning is used. That means the observation values need to be tagged with
occupancy values, i.e. the ground truth. Here they are stored
in .trainStates0, .trainStates1, .trainStates2, one for each window of training
observation data.
As for the HMM model itself, the state transition matrix is stored in .hmmTrans
and is a 3x3 (3 state) matrix, as seen in Figure 8 below.
Figure 8 Serialized Contents of HMM State Transition Matrix
There are 3 states and 8 emissions per state. For instance, .hmmEmit0Covar1
represents the covariance matrix of the 2nd Gaussian of the 1st state;
.hmmEmit2Mean5 represents the mean vector of the 6th Gaussian of the 3rd
state. A preview of a serialized covariance matrix and mean vector is shown in
Figure 9 below.
Figure 9 Serialized Contents of a GMM emission mean (top) and covariance (bottom)
4.4.2 Scheduling Processing and Server Mutex Functions
The project application, as a server, has to be able to accept a continuous input
stream from more than one sensor input. At the same time, the application has to
perform time-consuming HMM calculations. These 2 processes should be
designed not to interrupt each other.
The way the project does this is to split the Server and the Processing aspects of
the application, essentially layering the network-side and application-side logic,
as seen in Figure 10 below. The server will concentrate on receiving the UDP packets
being transmitted from the sensors. The server is a Python UDP server that
continually flushes any received packets into a mutex-ed shared file. The HMM
application will periodically check into the mutex-ed shared file to see if any
additional data packets are received, and retrieve them if so.
Figure 10 Layering Server and Processing functions into Network and Application side logic
The mutex lock is achieved using the POSIX API flock(), which guarantees any
POSIX-compliant access to be mutually exclusive. Because flock() is also fed
with the LOCK_NB flag, the mutex request operation is non-blocking. This
workflow is further illustrated in Figure 11 below. Upon failure of the non-blocking
mutex request, the server will temporarily save the packet data in a file
buffer. But if the mutex request is successful, the data in the packet, as well as any
previous packet data in the file buffer, will be transferred together to the shared
file. This ensures that the server continues receiving packets even when the mutex
fails.
On the HMM application side, the application will continuously poll the shared
file for data. If it succeeds in the non-blocking mutex request, it obtains new
data and is able to calculate. The mutex is freed at the first possible instance.
However, if the mutex request fails, the HMM application simply continues polling.
Figure 11 Process Flowchart for Mutex Access
The project recognises a potential downside to this mutex process. If there are
many packet arrivals on the server, the server's mutex requests may crowd out the
HMM application; the mutex lock would be dominantly held by the server, and the
HMM application would not have time to access the shared file.
However, this issue is mitigated by the fact that packet arrivals are not that
frequent. Even with multiple sensors sending incoming packets, the mutex lock
will not be swamped. This is because each sensor, identified by MAC address
metadata attached in the UDP packet, is assigned a unique mutex shared file, as
seen in Figure 10. Hence no single mutex lock can be swamped: the server is a
single-threaded application that can only hold a single lock at any one time, and
the HMM application will have plenty of data from other sensors to process.
5. EXPERIMENTS & RESULTS
5.1 Dataset
The dataset (the "archive dataset") used in the experiments is environmental data
taken from a student residential suite. It consists of readings taken continuously
for 24 hours at minimum, spread across a total of 3 months.
At the end of the collection, 34 days' worth of environmental data was collected.
It contains measurements of temperature, humidity, luminosity and noise, sampled
once every 10 minutes. A small 4-day window of the dimensions is illustrated in
Figure 12 below.
Figure 12 Sensor measurements across a 4-day period
The various days are also labelled according to whether they were typical or
atypical days, which would impact occupancy. For instance, typical days were
days where it was not a public or school holiday, nor sandwiched between
atypical days. For more information, please refer to Appendix B.
5.1.1 Ground Truth Value
The sensor measurements contained no direct measurement of occupancy. Hence,
an alternative method of verification was sought.
Using the power consumption measurements, which came as the 5th modality in
the data set (but were not included among the HMM inputs), it is possible to infer
the Ground Truth Value of occupancy. Any increase in the power consumption of
the room is regarded as an indicator of occupancy. However, that only gives a
binary value of occupancy with multiple jitters due to the slow and discrete
movement of the power meter. To smooth out the jitters, a digital Gaussian
moving-average filter of window size 5 was applied forwards and backwards to
the power readings; the result is a power consumption reading of 3 quanta, as
illustrated in Figure 13 below.
Figure 13 Occupancy inferred from power consumption
5.2 Experimental Setup
A laptop is set up to simulate a wireless sensor. A Python application is run on the
laptop, streaming values from the archive dataset over a UDP connection. All 4
modalities of the dataset are streamed together. Attached in every packet is a
mock MAC address to identify the sensor.
A PandaBoard ES is set up on the other end, running a Python server that accepts
the incoming sensor observations. The HMM application is also running on the
PandaBoard ES to process in real-time the incoming packet data.
The HMM model that the application executes has been set up and trained
before the start of the experiment. For each experiment, the report will specify the
section of the dataset that was used for training as well as for testing.
The HMM application had its source code developed on an Ubuntu system
beforehand, before being transported over to the PandaBoard ES and compiled.
This is because the machine instruction set is different; the development machine
runs on Intel, while the PandaBoard ES runs on ARM.
5.3 Capability Test
In order to properly understand the capabilities of the HMM application, an
introductory test was applied. The HMM application was challenged to identify
the day of the week when fed a dataset; for example, was the day a Monday or a
Sunday? This test is indirectly related to the issue of occupancy detection, as the
day has an influence on the occupancy of a room.
5.3.1 Test on Evaluating Day of the Week
For this test, a 4-state, 4-emission DHMM model is used. Out of the 4 modalities,
only luminosity is considered. The model is fed 2 typical weekdays' worth of
observations; hence the model represents a typical weekday HMM. The model is
then matched against 24-hour observation data from 12 different days.
In Figure 14 below, the results of the test are presented. To interpret the graph,
note that the more negative the log-likelihood, the less probable it is that the
observation belongs to the weekday HMM model. The results illustrate that the
model is unable to differentiate between weekday and weekend observations. This
may be because the environmental conditions on a weekday and a weekend are
not very different in terms of luminosity.
However, the model does report a marked difference between typical and atypical
days. Atypical days were defined as days where special events such as exam
period, school holidays, or public holidays occurred. Those are days upon which
the occupancy level will be affected. This finding hence suggests that luminosity
is to a certain extent dependent on occupancy, but on its own the results are not
obvious, and that weekdays are similar to weekends generally.
Figure 14 Log-likelihood of observation belonging to a Weekday Model
5.4 Test on Occupancy Decoding
In this experiment, the accuracy of the HMM algorithm is tested to see if it can
correctly identify the occupancy of the system. The actual occupancy is known to
the author but not to the system; this ground truth is compared against the decoded
occupancy.
A CHMM model with 3 states and 9 emissions is used. The 3 states correspond to
fully-occupied, mildly-occupied, and unoccupied. The CHMM model needs to be
a representative sample of the system; hence it is trained with 10 days' worth of
data taken at 3-day intervals across all 34 days of archived sensor data. All 4
modalities are used: temperature, humidity, luminosity and noise. The result of
testing the CHMM model against every single day of sensor observation is shown
in Figure 15 below:
Figure 15 Decoded Error % for Occupancy
The light grey bars signify observations that have been used to train the CHMM
model. The dark grey bars signify novel observations that the CHMM has no
knowledge of.
The performance of the system is generally good: even for the worst-case
anomalous observations, the error rate never exceeds 65%. The mean error rate is
21.6% inclusive of the training set and 23.2% exclusive of it. The median error
rate, which is less sensitive to outlier values, is 17.1% inclusive and 17.8%
exclusive. The sample variance of the error percentage is 0.0268 inclusive and
0.0277 exclusive.
5.5 Test on Occupancy Interpolation
For this test, the HMM problem of Decoding is employed to decipher the
underlying hidden states of the system and interpolate between them. The actual
occupancy is known to the author but not to the system; this ground truth is
compared against the decoded occupancy.
The assumption is that the system has already successfully decoded the start and
end point of the interpolation sequence, but is missing a particular segment in
between, which could be due to packet data discarded due to errors, dropped
packet data, or the wireless sensors suffering a temporary hardware failure.
A CHMM model with 3 states and 7 emissions is used. The 3 states correspond to
fully-occupied, mildly-occupied, and unoccupied. Again, the CHMM model needs
to be a representative sample of the system; hence it is trained with 10 days' worth
of data taken at 3-day intervals across all 34 days of archived sensor data. All 4
modalities are used: temperature, humidity, luminosity and noise.
The variables in the experiment are: the gap size to be interpolated, as well as the
time period across which interpolation is done. Both results are illustrated in
Figure 16 and Table 2 below.
The experiment shows that the interpolation process is quite accurate. For
instance, for short gap sizes of 30 minutes to 1 hour, which are closer to the gap
sizes one would expect in a real-life scenario, the average and median error rates
never exceed 16.7%.
Figure 16 Interpolation results for different Gap Sizes and Time Periods
Table 2 Interpolation statistics for different Gap Sizes and Time Periods
5.6 Test on Occupancy Extrapolation
The test on occupancy extrapolation shows how effective the HMM model is at
predicting the future occupancy of the residential suite. This would make it
possible, for instance, for a building administrator to forecast the power
consumption of the building and reduce it.
Again, a CHMM model with 3 states and 7 emissions is used. The 3 states
correspond to fully-occupied, mildly-occupied, and unoccupied, trained using 10
days' worth of training data selected uniformly across the archived data set of 34
days. All 4
modalities are used: temperature, humidity, luminosity and noise.
Similar to the interpolation experiment, the variables in the experiment are: the
gap size to be extrapolated, as well as the time period across which extrapolation
is to be done. Results are illustrated in Figure 17 and Table 3 below:
Figure 17 Extrapolation results for different Gap Sizes and Time Periods
Table 3 Extrapolation statistics for different Gap Sizes and Time Periods
The results demonstrate that extrapolation generally works well, with an error
rate below 30% for varying window sizes.
6. LIMITATIONS AND RECOMMENDATIONS
6.1 Highly Correlated Data
As discussed in Section 3.3.1, one of the limitations is that the observation data
may be highly correlated, resulting in a non-invertible covariance matrix. That
would render the multivariate CHMM unusable, as it is no longer possible to
compute the probability density of the multivariate normal distribution.
Despite the usage of the pseudo-inverse derivation, occasionally the HMM
algorithm still fails to compute, as the rank of the matrix is simply too low. The
project sidesteps this limitation by re-learning the HMM model using the same
training data: due to the non-deterministic property of learning, it is often possible
to derive a valid covariance matrix within one or two retries.
Still, there is no guarantee that a successful CHMM model can be learnt on the
first try. Even more drastically, there is also no guarantee that with enough retries,
a successful CHMM can be relearnt. As a result, it is inevitable that human
operator intervention is necessary to make sure that a valid CHMM is prepared.
6.2 Dynamically Improving HMM
A possible improvement to the project would be to refine the HMM model
based on incoming readings. This improvement can take one of two forms:
1. Supplement the existing learning data set by including the latest
observation data
2. Replace old learning data with the latest observation data
The choice of method depends on whether the observed system is highly
dynamic. If changes in the system are gradual, then it makes sense to expand the
HMM model with more training data. If the system is an evolving one, old
training data would quickly become irrelevant, and the replacement method
should be used instead.
6.3 Decision Fusion
In Event Processing literature, what the project currently does is Data Fusion. In
Data Fusion, multiple modalities of data are fused together to form a single n-
dimension vector, as illustrated in Figure 18. So instead of multiple streams of
scalar data, there is a single stream of vector data. Data Fusion is attractive when
there are few sensor streams involved, as it gives the highest accuracy and
performance [8].
Figure 18 Data Fusion of streams into a vector before decision making
There is however, Decision Fusion as well, which could be a possible track of
investigation for future projects. Decision Fusion is when each stream of scalar
data has its own event processing node. This results in multiple streams of
decisions, each originating from a single scalar stream. The multiple streams of
decisions are then fused into a single stream of vector decisions and further
processed, as seen in Figure 19. At this point it resembles Data Fusion.
Decision Fusion is like a multi-tiered decision tree, where decisions are made
locally at the source, and then transmitted upstream where it is collated with other
decisions. It scales better in terms of computational and communications
complexity when more streams are added, as the transformation from data to
decision reduces the data complexity, akin to data compression.
Figure 19 Decision Fusion of streams decision upstream into a final decision
7. CONCLUSION
In this thesis, the mathematical derivation and problem-solving abilities of the
Hidden Markov Model have been explained, including the 3 conventional
scenarios of HMM: Evaluation, Decoding and Learning.
A well-trained HMM represents a physical system. Through HMM, it becomes
possible to categorically label an observation under a particular model. One can
also decode the underlying states that the system had gone through, based on the
observations given. Using a representative sample of observations, a HMM model
can also be constructed through likelihood maximization. And with a complete
HMM model, it becomes possible to simulate observations and forecast the future
state and observation of the system.
The thesis also demonstrates that when applied to the domain of occupancy
detection, the HMM algorithm is able to discover the underlying occupancy state
of the environment using 4 modalities: temperature, humidity, luminosity and noise.
It is able to give the correct underlying occupancy with an error rate of 23.2% on
average. For interpolating between gaps in known occupancy states, it can
minimize its error rate to approximately 25.2% for long gaps and 12.5% for short
gaps. For forecasting future occupancy, the HMM model is accurate to within a
28.9% error rate for 12 hour extrapolations.
APPENDIX A: BIBLIOGRAPHY
[1] Y. Agarwal, B. Balaji, R. Gupta, J. Lyles, M. Wei, and T. Weng, "Occupancy-driven energy management for smart building automation," in Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, 2010, pp. 1-6.
[2] C. Liao and P. Barooah, "An integrated approach to occupancy modeling and estimation in commercial buildings," in American Control Conference (ACC), 2010, pp. 3130-3135.
[3] J. Lu, T. Sookoor, V. Srinivasan, G. Gao, B. Holben, J. Stankovic, E. Field, and K. Whitehouse, "The smart thermostat: using occupancy sensors to save energy in homes," in Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, 2010, pp. 211-224.
[4] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[5] M. Stamp, "A revealing introduction to hidden Markov models," Dept. of Computer Science, San Jose State Univ., 2004.
[6] P. Blunsom, "Hidden Markov models," Lecture Notes, August 2004.
[7] Jackson, "HMM tutorial 4." [Online]. Available: http://www.ee.surrey.ac.uk/Personal/P.Jackson/tutorial/
[8] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, "Distributed target classification and tracking in sensor networks," Proc. IEEE, vol. 91, no. 8, pp. 1163-1171, Aug. 2003.
APPENDIX B: TYPICAL AND ATYPICAL DAYS
APPENDIX C: SOURCE CODE SNIPPET
Not all source code is provided here, as the total source code amounts to more
than 4000 lines of C++ and would be impractical to include in full.
// ======================================================== // HMMx.cpp // * implements HMM functions // ======================================================== #include "HMMx.hpp" #include #include #include "../../../util/timer.hpp" using std::cout; inline bool generateNewInterpolateState(arma::Col &states, int numStates); template void HMMx::interpolate(int guessLen, unsigned int prevState, unsigned int forwState, arma::mat &guessObservations, arma::Col &guessStates, bool isCHMM) { TM_START; /** * Brute-force checking * We will only permute up to 8 states */ if (guessLen > 8) { // Generate the excess int toFillIn = guessLen - 8; arma::mat frontDataSeq; // remove dim arma::Col frontStateSeq(toFillIn); if (isCHMM) { HMMx *chmm = (HMMx*)this; chmm->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState); } else this->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState); // Interpolate the rest arma::mat backDataSeq(guessObservations.n_rows, 8);
Page | D
arma::Col backStateSeq(8); this->interpolate(8, frontStateSeq(toFillIn-1), forwState, backDataSeq, backStateSeq, isCHMM); guessObservations.cols(0, toFillIn-1) = frontDataSeq; guessObservations.cols(toFillIn, guessLen-1) = backDataSeq; guessStates.rows(0, toFillIn-1) = frontStateSeq; guessStates.rows(toFillIn, guessLen-1) = backStateSeq; } else { double bestLikelihood = 0; double currLikelihood = 1; arma::Col bestTrial(guessLen, arma::fill::zeros); arma::Col currTrial(guessLen, arma::fill::zeros); int numStates = this->Transition().n_cols; while (true) { // Iterate through another state bool validity = generateNewInterpolateState(currTrial, numStates); if (!validity) // no more new states available break; // Evaluate probability currLikelihood = 1; for (int i=0; iTransition()(currTrial(i), prevState); else currLikelihood *= this->Transition()(currTrial(i), currTrial(i-1)); } currLikelihood *= this->Transition()(forwState, currTrial(guessLen-1)); // Evaluate probability (is it better?) if (currLikelihood > bestLikelihood) { bestLikelihood = currLikelihood; bestTrial = currTrial; } } guessStates = bestTrial; // generate emissions
Page | E
for (int i=0; iEmission().at(guessStates(i)); val = gmm.Random(); } else val = this->Emission().at(guessStates(i)).Random(); guessObservations.col(i) = val; } } TM_STOP; PRINTTIME; } inline bool generateNewInterpolateState(arma::Col &states, int numStates) { // backtracking int i = states.n_rows-1; while (true) { if ((int)states(i) != numStates-1) // if we still haven't iterated all for curr state index { states(i) ++; for (unsigned int j=i+1; j
// ========================================================
// HMMx.hpp
// * header file for HMMx.cpp
// ========================================================
#ifndef HMMX_HPP_
#define HMMX_HPP_

// (the names of the five library/system headers here were lost in
//  extraction; presumably mlpack's HMM/GMM headers, <cstdlib> and <ctime>)
#include <mlpack/core.hpp>
#include <mlpack/methods/hmm/hmm.hpp>
#include <mlpack/methods/gmm/gmm.hpp>
#include <cstdlib>
#include <ctime>
#include "distribution/DiscreteDistri.hpp"

using namespace mlpack::hmm;
using namespace mlpack::gmm;

/**
 * Changes are:
 * - Transition states by default are no longer homogeneous.
 */
template<typename Distribution>
class HMMx : public HMM<Distribution>
{
    bool isCHMM;

public:
    HMMx(const size_t states, const Distribution emissions, bool isCHMM,
            const double tolerance = 1e-5):
        HMM<Distribution>(states, emissions, tolerance)
    {
        this->isCHMM = isCHMM;
        double variance = this->Transition().at(0) * 0.1;
        srand(time(NULL));
        for (unsigned int i=0; i<this->Transition().size(); ++i)
        {
            if (rand()%2 == 0)
                this->Transition().at(i) += variance * rand() / RAND_MAX;
            else
                this->Transition().at(i) -= variance * rand() / RAND_MAX;
        }
        // normalise
        for (unsigned int i=0; i<this->Transition().n_cols; ++i)
        {
            double sum = accu(this->Transition().col(i));
            this->Transition().col(i) /= sum;
        }
    }

    HMMx(const arma::mat& transition, const std::vector<Distribution>& emission,
            bool isCHMM, const double tolerance = 1e-5):
        HMM<Distribution>(transition, emission, tolerance)
    {
        this->isCHMM = isCHMM;
    }

    /**
     * @return 1 if GMM, 0 if not.
     *
     * Abandoned. You need to uncast it from a pointer or you access invalid memory anyway
     */
    /*int distributionType() const
    {
        if (isCHMM)
            return 1;
        else
            return 0;
    }*/

    /**
     * Assuming you have a break in observation results, interpolate will reconstruct the missing bits for you.
     */
    void interpolate(int guessLen, unsigned int prevState, unsigned int forwState,
            arma::mat &guessObservations, arma::Col<size_t> &guessStates, bool isCHMM);

    void interpolate(int guessLen, const arma::mat &prevObservations,
            const arma::mat &forwObservations, arma::mat &guessObservations,
            arma::Col<size_t> &guessStates, bool isCHMM)
    {
        arma::Col<size_t> prevStates;
        this->Predict(prevObservations, prevStates);
        arma::Col<size_t> forwStates;
        this->Predict(forwObservations, forwStates);
        this->interpolate(guessLen, prevStates(prevStates.n_rows-1), forwStates(0),
                guessObservations, guessStates, isCHMM);
    }
};

#endif /* HMMX_HPP_ */
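The first HMMx constructor above breaks the homogeneous initialisation of the transition matrix by jittering each entry and then renormalising every column to sum to one (the column-stochastic convention, where entry (i, j) is the probability of moving to state i from state j). A self-contained sketch of that perturb-and-normalise step on a plain column-major matrix, with hypothetical names, no Armadillo, and `std::mt19937` in place of `rand()`:

```cpp
#include <vector>
#include <random>

// Jitter each entry of a column-stochastic matrix by up to ±10% of its value,
// then renormalise every column to sum to 1. Columns are stored as
// cols[c][r]. (Hypothetical standalone sketch of the HMMx constructor step.)
void perturbAndNormalise(std::vector<std::vector<double>> &cols,
                         std::mt19937 &rng)
{
    std::uniform_real_distribution<double> u(-0.1, 0.1);
    for (auto &col : cols)
    {
        double sum = 0.0;
        for (double &p : col) { p += p * u(rng); sum += p; }  // jitter
        for (double &p : col) p /= sum;                       // renormalise
    }
}
```

A seeded generator makes the perturbation reproducible, unlike the `srand(time(NULL))` call in the listing.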
// ========================================================
// HMMFunc.cpp
// * saves the HMM model to disk and loads it
// ========================================================
#include "HMMFunc.h"

// (the names of four system headers here were lost in extraction;
//  presumably <fstream>, <sstream>, <algorithm> and <cstdio> among them)
#include <fstream>
#include <sstream>
#include <algorithm>
#include <cstdio>
#include "../../../util/fileExists.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/map/mapper_kmeans.h"
#include "../../model/map/mapperMv_kmeans.h"

using std::string;
using std::vector;
using std::ifstream;
using std::ostringstream;

vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc);

// ---
// Properties
// ---

/** Get a list of all available models of HMM that we can load. */
vector<int> HMMFunc::getAvailableModels() const
{
    return getAvailable((char *) "hmm", this);
}

/** Get a list of all available training sets that we can use. */
vector<int> HMMFunc::getAvailableTrains() const
{
    return getAvailable((char *) "train0", this);
}

/** Get a list of all available stuff that we can load. */
vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc)
{
    vector<int> results;
    int lastResult = 0;
    for (int i=0; i < SEARCH_LIMIT; ++i) // (upper bound lost in extraction;
                                         //  SEARCH_LIMIT is a placeholder)
    {
        // (construction of fileURI was lost in extraction; presumably an
        //  ostringstream composing getFileSaveName(), the index i and searchStr)
        ostringstream fileURI;
        fileURI << hmmFunc->getFileSaveName() << i << "." << searchStr;

        if (file_exist(fileURI.str().c_str()))
        {
            results.push_back(i);
            lastResult = i;
        }
        // DEBUG
        /*else
            std::cout << ... (debug trace truncated in extraction) */
    }
    return results; // (reconstructed; the end of getAvailable and the opening
                    //  of HMMFunc::saveTrainingSet were lost in extraction)
}
// (we are inside HMMFunc::saveTrainingSet here; the function opening and the
//  loop that saves each data matrix were lost in extraction)

        // Will always be false unless there is one True
        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d", this->getFileSaveName().c_str(), trainIndex, i);
            result |= !(states->at(i).save(fileURI, arma::arma_ascii));
        }
    }

    // Searches for any file after this index and deletes it
    // Important because this is our indicator for vector termination
    sprintf(fileURI, "%s%d.train%d", this->getFileSaveName().c_str(), trainIndex, (int)data.size());
    if (file_exist(fileURI))
        remove(fileURI);

    // If there is no state info, we make sure there is no state file saved as well
    if (states == NULL)
    {
        sprintf(fileURI, "%s%d.trainStates0", this->getFileSaveName().c_str(), trainIndex);
        if (file_exist(fileURI))
            remove(fileURI);
    }

    std::cout << "..."; // (console message lost in extraction)
    return result;      // (reconstructed; the end of saveTrainingSet and the
                        //  opening of loadTrainingSet were lost in extraction)
}
// (we are inside HMMFunc::loadTrainingSet here; its opening lines were lost
//  in extraction)
    sprintf(fileURI, "%s%d.trainStates0", this->getFileSaveName().c_str(), trainIndex);
    if (file_exist(fileURI))
        states = new vector<arma::Col<size_t>>();

    for (int i=0; ; ++i)
    {
        sprintf(fileURI, "%s%d.train%d", this->getFileSaveName().c_str(), trainIndex, i);

        // Check if there is any matrices left to read
        if (!file_exist(fileURI))
            break;

        // Load it!
        arma::mat myMatrix;
        myMatrix.load(fileURI, arma::arma_ascii);
        data.push_back(myMatrix);

        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d", this->getFileSaveName().c_str(), trainIndex, i);
            arma::Col<size_t> colvec;
            colvec.load(fileURI, arma::arma_ascii);
            states->push_back(colvec);
        }
    }

    // Metadata loading
    HMM_meta metadata;
    {
        sprintf(fileURI, "%s%d.trainMeta", this->getFileSaveName().c_str(), trainIndex);
        FILE* fp = fopen(fileURI, "r");

        // obtain filesize
        fseek(fp, 0, SEEK_END);
        long lsize = ftell(fp);
        rewind(fp);

        char *buffer = new char[lsize+1];
        fread(buffer, sizeof(char), lsize, fp);
        buffer[lsize] = '\0'; // terminate before constructing the string
        fclose(fp);
        string text = string(buffer);
        metadata = HMM_meta::fromString(text);
        delete [] buffer;
    }
    return metadata;
}

// ---
// ---

/** Load HMM model from local file */
HMM_meta HMMFunc::load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const
{
    // Check if index exists
    vector<int> indices = this->getAvailableModels();
    if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
        std::cout << "..."; // (error message lost in extraction, along with
                            //  the following lines that load the metadata,
                            //  transition matrix and discrete emissions)
            mapper = new KMeansMapper(keysVal);
        }
        else // multi-variate
        {
            // load keysVal
            vector<arma::vec> keysVal;
            for (unsigned int i=0; ; ++i)
            // (the loop condition, its body and the lines up to the GMM
            //  covariance loading below were lost in extraction)
                sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
                covarSingle.load(fileURI);
                mean.push_back(meanSingle);
                covar.push_back(covarSingle);
            }
            GMM gmm(mean, covar, weight);
            emit.push_back(gmm);
        }
        hmm = (HMMx<DiscreteDistri>*) new HMMx<GMM>(transition, emit, true, metadata.tolerance);
            // (template arguments and the isCHMM flag on this line were lost
            //  in extraction; reconstructed)
        printf("[DEBUG] dimension is %d", hmm->Dimensionality());
    }
    return metadata;
}

/** Save HMM model to local file */
int HMMFunc::save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex) const
{
    // Find an index for it
    if (hmmIndex == -1)
    {
        vector<int> indices = this->getAvailableModels();
        while (1)
        {
            hmmIndex ++;
            if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
                break;
        }
    }

    char fileURI[999];
    string basicFileURI;
    {
        ostringstream oss;
        oss << this->getFileSaveName() << hmmIndex; // (tail of this block lost
                                                    //  in extraction; reconstructed)
        basicFileURI = oss.str();
    }
    // Save transition
    hmm->Transition().save((basicFileURI+".hmmTrans").c_str(), arma::arma_ascii);

    // Save emission
    if (metadata.isCHMM == false)
    {
        vector<DiscreteDistri> emit = hmm->Emission();
        for (unsigned int i=0; i<hmm->Transition().n_cols; ++i)
        {
            sprintf(fileURI, "%s.hmmEmit%d", basicFileURI.c_str(), i);
            emit[i].Probabilities().save(fileURI, arma::arma_ascii);
        }

        // Save mapper
        if (typeid(emit[0].getMapper()).name() == typeid(KMeansMapper).name())
        {
            KMeansMapper *mapper = (KMeansMapper*) &(emit[0].getMapper());
            vector<double> keysVal = mapper->getKeysVal();
            // Not doing keys because it is boost::unordered_map, very troublesome
            // Also can be derived from keysVal later anyway.

            // put keysVal into a row vector
            arma::rowvec *keyVal_mat = new arma::rowvec(keysVal.size());
            for (unsigned int j=0; j<keysVal.size(); ++j)
                keyVal_mat->at(j) = keysVal[j];

            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
        else
        {
            KMeansMvMapper *mapper = (KMeansMvMapper*) &(emit[0].getMapper());
            vector<arma::vec> keysVal = mapper->getKeysVal();

            // put keysVal into a matrix
            arma::mat *keyVal_mat = new arma::mat(mapper->get_dimensions(), keysVal.size());
            for (unsigned int j=0; j<keysVal.size(); ++j)
                keyVal_mat->col(j) = keysVal[j];
            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
    }
    else
    {
        vector<GMM> *emit = (vector<GMM>*) &(hmm->Emission());
        for (unsigned int i=0; i<emit->size(); ++i) // for each state
        {
            sprintf(fileURI, "%s.hmmEmit%dWeight", basicFileURI.c_str(), i);
            emit->at(i).Weights().save(fileURI, arma::arma_ascii);

            vector<arma::mat> covar = emit->at(i).Covariances();
            vector<arma::vec> mean = emit->at(i).Means();
            for (unsigned int j=0; j<emit->at(i).Gaussians(); ++j) // for each state there are Gaussians
            {
                sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
                covar[j].save(fileURI, arma::arma_ascii);
                sprintf(fileURI, "%s.hmmEmit%dMean%d", basicFileURI.c_str(), i, j);
                mean[j].save(fileURI, arma::arma_ascii);
            }
        }
    }
    return hmmIndex;
}
// ========================================================
// HMMFunc.h
// * header file for HMMFunc.cpp
// ========================================================
#ifndef HMMFUNC_H_
#define HMMFUNC_H_

// (the names of three system headers here were lost in extraction;
//  presumably <string> and <vector> among them)
#include <string>
#include <vector>
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"

using std::string;
using std::vector;

class HMMFunc
{
    string fileSaveName;

public:
    /**
     * @param _fileSaveName Assumed unique file access location for saving to
     */
    HMMFunc(string _fileSaveName):fileSaveName(_fileSaveName) {}

    // ---
    // Properties
    // ---
    string getFileSaveName() const { return fileSaveName; }

    /** Get a list of all available models of HMM that we can load. */
    vector<int> getAvailableModels() const;

    /** Get a list of all available training sets that we can use. */
    vector<int> getAvailableTrains() const;

    // ---
    // Methods (Training Set)
    // ---
    /** Save data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model saving
     * @param states If there is state info present, please put it in. Else leave as NULL.
     * @return True if saving was successful.
     */
    bool saveTrainingSet(HMM_meta metadata, const vector<arma::mat> &data,
            const vector<arma::Col<size_t>> *states, int trainIndex) const;

    /** Load data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model loading
     * @param states If there is state info present, it will be loaded into the pointer. Else it is NULL.
     */
    HMM_meta loadTrainingSet(vector<arma::mat> &data,
            vector<arma::Col<size_t>>* &states, int trainIndex) const;

    // ---
    // Methods (HMM Models)
    // ---
    /** Save HMM model to local file */
    int save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex=-1) const;

    /** Load HMM model from local file */
    HMM_meta load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const;
};

#endif /* HMMFUNC_H_ */
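HMMFunc locates saved artefacts purely by filename convention: model and training files share a base name from getFileSaveName(), followed by an integer index and a suffix, and getAvailable() simply probes the filesystem for each index in turn. A standalone sketch of that index scan, with hypothetical names and `std::ifstream` standing in for `file_exist()`:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Report which indices i in [0, maxIndex) have a file named
// <base><i>.<suffix> on disk, mirroring the index-scan convention used by
// HMMFunc::getAvailableModels() ("hmm" / "train0" suffixes). All names here
// are hypothetical standalone stand-ins.
std::vector<int> availableIndices(const std::string &base,
                                  const std::string &suffix, int maxIndex)
{
    std::vector<int> found;
    for (int i = 0; i < maxIndex; ++i)
    {
        std::string uri = base + std::to_string(i) + "." + suffix;
        if (std::ifstream(uri).good())   // plays the role of file_exist()
            found.push_back(i);
    }
    return found;
}
```

A deleted file leaves a hole in the returned list, which is why save() with hmmIndex == -1 searches for the first unused index rather than assuming the list is contiguous.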
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS
1. INTRODUCTION
   1.1 Background & Motivation
   1.2 Objective of Thesis
   1.3 Thesis Organisation
2. LITERATURE REVIEW
   2.1 Occupancy Detection
       2.1.1 Passive Infrared Sensors
       2.1.2 Simulation Modelling
       2.1.3 Hidden Markov Modelling
   2.2 Mathematical Tools
       2.2.1 Regression Algorithms
       2.2.2 Clustering Algorithms
       2.2.3 Stochastic Classifier Algorithms
3. HIDDEN MARKOV MODELS
   3.1 Markov Chain
   3.2 Hidden Markov Chain
       3.2.1 Problem 1 of HMM: Evaluation
       3.2.2 Problem 2 of HMM: Decoding
       3.2.3 Problem 3 of HMM: Learning (Estimation)
       3.2.4 Problem 3 of HMM: Learning (Baum-Welch)
       3.2.5 Problem 4 of HMM: Generation
   3.3 Continuous HMM
       3.3.1 Multivariate CHMM
4. HARDWARE AND SOFTWARE IMPLEMENTATION
   4.1 Embedded Board
       4.1.1 Comparison of Features
   4.2 Software Development Platform
   4.3 Software Architecture
   4.4 Software Implementation
       4.4.1 Serialization of HMM
       4.4.2 Scheduling Processing and Server Mutex Functions
5. EXPERIMENTS & RESULTS
   5.1 Dataset
       5.1.1 Ground Truth Value
   5.2 Experimental Setup
   5.3 Capability Test
       5.3.1 Test on Evaluating Day of the Week
   5.4 Test on Occupancy Decoding
   5.5 Test on Occupancy Interpolation
   5.6 Test on Occupancy Extrapolation
6. LIMITATIONS AND RECOMMENDATIONS
   6.1 Highly Correlated Data
   6.2 Dynamically Improving HMM
   6.3 Decision Fusion
7. CONCLUSION
APPENDIX A: BIBLIOGRAPHY
APPENDIX B: TYPICAL AND ATYPICAL DAYS
APPENDIX C: SOURCE CODE SNIPPET