Event Processing at Sensor Nodes in the
Cloud
Submitted by Lee Jun Hui A0067228B
Department of Electrical & Computer Engineering
In partial fulfilment of the requirements for the Degree of
Bachelor of Engineering National University of Singapore
ABSTRACT
Engineers are able to acquire large streams of environmental data, often from
scattered independent sensors. To properly make sense of the data however, there
needs to be a system that can handle the incoming streams as well as a
mathematical analysis to make sense of the data.
This project applies the event processing problem to the domain of occupancy
detection. Conventional occupancy detection approaches rely on multiple tolerance checks or simulation modelling.
The contribution of this project is the usage of the statistical properties of the data
through a Hidden Markov Model (HMM) to detect and forecast events emerging
from the hidden states of a multi-dimensional sensor stream. The underlying occupancy state of the environment is deduced, and short-term future occupancy is forecast. The system relies on a pre-trained HMM model and calculates in real-time. Decoded occupancy states, as well as interpolated and extrapolated states, are found to be accurate to within a 30% error margin on average.
ACKNOWLEDGEMENTS
The author would like to express his greatest gratitude towards his supervisor, Professor Tham Chen-Khong, for his guidance and support throughout the project.
He is grateful to be able to work on such an interesting statistical pattern analysis
project.
The author would also like to thank his examiner, Dr. Mohan Gurusamy, for the
time spent on the assessment of the project.
Finally, the author would like to show his appreciation to his graduate research
assistant, Li Qiang, for his technical guidance and encouragement throughout the
course of the project.
TABLE OF CONTENTS
ABSTRACT ............................................................................................................. i
ACKNOWLEDGEMENTS .................................................................................... ii
TABLE OF CONTENTS ....................................................................................... iii
LIST OF TABLES ................................................................................................. vi
LIST OF FIGURES .............................................................................................. vii
LIST OF SYMBOLS AND ABBREVIATIONS ................................................ viii
1. INTRODUCTION .............................................................................................. 1
1.1 Background & Motivation ............................................................................ 1
1.2 Objective of Thesis ....................................................................................... 1
1.3 Thesis Organisation ....................................................................................... 2
2. LITERATURE REVIEW.................................................................................... 3
2.1 Occupancy Detection .................................................................................... 3
2.1.1 Passive Infrared Sensors ........................................................................ 3
2.1.2 Simulation Modelling ............................................................................ 3
2.1.3 Hidden Markov Modelling ..................................................................... 4
2.2 Mathematical Tools ....................................................................................... 4
2.2.1 Regression Algorithms ........................................................................... 4
2.2.2 Clustering Algorithms ............................................................................ 5
2.2.3 Stochastic Classifier Algorithms ............................................................ 6
3. HIDDEN MARKOV MODELS ......................................................................... 7
3.1 Markov Chain ............................................................................................... 7
3.2 Hidden Markov Chain ................................................................................... 8
3.2.1 Problem 1 of HMM Evaluation......................................................... 10
3.2.2 Problem 2 of HMM Decoding .......................................................... 11
3.2.3 Problem 3 of HMM Learning (Estimation) ...................................... 12
3.2.4 Problem 3 of HMM Learning (Baum-Welch) .................................. 13
3.2.5 Problem 4 of HMM Generation ........................................................ 15
3.3 Continuous HMM ....................................................................................... 16
3.3.1 Multivariate CHMM ............................................................................ 18
4. HARDWARE AND SOFTWARE IMPLEMENTATION ............................... 20
4.1 Embedded Board ......................................................................................... 20
4.1.1 Comparison of Features ....................................................................... 21
4.2 Software Development Platform ................................................................. 23
4.3 Software Architecture ................................................................................. 24
4.4 Software Implementation ............................................................................ 26
4.4.1 Serialization of HMM .......................................................................... 26
4.4.2 Scheduling Processing and Server Mutex Functions ........................... 29
5. EXPERIMENTS & RESULTS ......................................................................... 33
5.1 Dataset ......................................................................................................... 33
5.1.1 Ground Truth Value ............................................................................. 34
5.2 Experimental Setup ..................................................................................... 35
5.3 Capability Test ............................................................................................ 35
5.3.1 Test on Evaluating Day of the Week ................................................... 36
5.4 Test on Occupancy Decoding ..................................................................... 37
5.5 Test on Occupancy Interpolation ................................................................ 38
5.6 Test on Occupancy Extrapolation ............................................................... 40
6. LIMITATIONS AND RECOMMENDATIONS .............................................. 42
6.1 Highly Correlated Data ............................................................................... 42
6.2 Dynamically Improving HMM ................................................................... 42
6.3 Decision Fusion ........................................................................................... 43
7. CONCLUSION ................................................................................................. 45
APPENDIX A: BIBLIOGRAPHY ......................................................................... A
APPENDIX B: TYPICAL AND ATYPICAL DAYS ............................................ B
APPENDIX C: SOURCE CODE SNIPPET ........................................................... C
LIST OF TABLES
Table 1 Comparison of features of 3 Embedded Boards ................................... 21
Table 2 Interpolation statistics for different Gap Sizes and Time Periods ........ 40
Table 3 Extrapolation statistics for different Gap Sizes and Time Periods ....... 41
LIST OF FIGURES
Figure 1 2-state Markov Chain ............................................................................ 7
Figure 2 2-state 3-emission Hidden Markov Chain ............................................. 8
Figure 3 Gaussian Mixture Model within a HMM ............................................ 17
Figure 4 A PandaBoard embedded device showing cable and connections ...... 23
Figure 5 UML Diagram of Software Architecture ............................................. 25
Figure 6 Directory of a serialized HMM model and training set ....................... 27
Figure 7 Serialized Contents of HMM Metadata ............................................... 28
Figure 8 Serialized Contents of HMM State Transition Matrix ........................ 28
Figure 9 Serialized Contents of a GMM emission mean (top) and covariance
(bottom) ................................................................................................................. 29
Figure 10 Layering Server and Processing functions into Network and
Application side logic ........................................................................................... 30
Figure 11 Process Flowchart for Mutex Access ................................................ 31
Figure 12 Sensor measurements across a 4-day period ..................................... 33
Figure 13 Occupancy inferred from power consumption .................................. 34
Figure 14 Log-likelihood of observation belonging to a Weekday Model ........ 37
Figure 15 Decoded Error % for Occupancy ....................................................... 38
Figure 16 Interpolation results for different Gap Sizes and Time Periods ........ 40
Figure 17 Extrapolation results for different Gap Sizes and Time Periods ....... 41
Figure 18 Data Fusion of streams into a vector before decision making ........... 43
Figure 19 Decision Fusion of streams' decisions upstream into a final decision 44
LIST OF SYMBOLS AND ABBREVIATIONS
PIR Passive Infrared [Sensor]
HVAC Heating, Ventilation, and Air-conditioning
HMM Hidden Markov Model
DHMM Discrete Hidden Markov Model
CHMM Continuous Hidden Markov Model
GMM Gaussian Mixture Model
K-Means K-Means Clustering Algorithm
1. INTRODUCTION
1.1 Background & Motivation
With the arrival of smart embedded systems capable of high processing loads, as well as lightweight sensors that can be cheaply deployed, engineers have gained the ability to monitor and analyse our environment in greater detail than ever before. Engineers can plant multiple sensor devices within the living environment that gather and relay measurements to a central node for further processing, detecting event changes in the environment as they occur.
With so many incoming data streams, there is a great opportunity to make use of
statistical and probability models to extract greater information and detect events
that may not be readily observable. Engineers can also make use of the temporal aspect of the data to place additional constraints on event detection, as well as to make inferences about the future. Such an event detection system could be used to detect abnormalities in the health of a patient, or to make sense of data through the detection of recurrent patterns.
1.2 Objective of Thesis
This thesis aims to demonstrate that statistical modelling and machine learning
can be used to effectively detect events when applied to the domain of occupancy
detection. The project attempts to model the occupancy state of a typical student
residential suite. It demonstrates that it is possible to deduce the occupancy of the
suite through secondary data provided by environmental sensors, and also make
short-term forecasts as well.
The scenario of occupancy detection and modelling has many practical purposes
as it allows a building facility manager to estimate human traffic loads in advance.
Such information would be valuable for safety precautions and can also be
exploited for sales and marketing purposes in shopping districts. On a smaller
scale, a facility manager of a small office can also monitor the occupancy of its
rooms and cubicles and tweak the heating, ventilation, and air conditioning
(HVAC) policies of the building to optimize energy consumption.
1.3 Thesis Organisation
For the report, Chapter 2 will be a literature review covering existing methods for
occupancy detection related to HVAC operations. The various possible
mathematical tools that may be used to help identify events are also mentioned. In
Chapter 3, readers will be presented with the basics of the HMM, which is the
statistical model that the project is using, as well as the variants of HMM that the
project has evaluated experimentally. Chapter 4 will discuss the hardware
components and the software platform that the demonstration program has been
deployed on, and also talk about the software architecture of the project. Chapter
5 presents the experimental results that have been obtained. Chapter 6 discusses
some of the issues that constrained the project and suggests further improvement
work. Finally, Chapter 7 will conclude the results and provide details on how
improvements can be made in the future.
2. LITERATURE REVIEW
2.1 Occupancy Detection
2.1.1 Passive Infrared Sensors
Commercially, the most popular means of occupancy detection is via the use of a
Passive Infrared (PIR) sensor. It measures the amount of infrared (IR) light that reaches its field of view; when there is a change in the IR radiation, a movement event is registered by the sensor, indicating the presence of an occupant. However, such a system often generates false negatives, as it assumes a non-idle occupant.
One way to improve the PIR sensor is to pair it up with a reed switch placed on a
doorway, which can detect whether the door is open or closed [1]. Oftentimes, by applying additional modes of measurement, further constraints are imposed on the detection, achieving greater accuracy.
2.1.2 Simulation Modelling
A paper from Liao and Barooah [2] made use of machine learning and simulation
modelling. To improve sensor readings, a crowd simulation was constructed from
past observation data. Due to the complexity of the simulation, it was run off-line,
and the reduced-order statistics of the simulation results were compiled. These
reduced-order statistics were then compared with present observational data to
estimate the occupancy level. However, such a method requires a non-trivial simulation model, as well as reliable room occupancy probabilities that have been estimated through long-term observations of said environment.
Page | 4
2.1.3 Hidden Markov Modelling
The idea of using Hidden Markov Models to model occupancy is suggested in a
smart thermostat project that uses a PIR sensor and a reed switch on the doorways
[3]. The smart thermostat helps to prepare a comfortable environment for its
house occupants by pre-empting their departure from and arrival home through
the predictive ability of a HMM. The accuracy of the HMM predictive model (88%) over reactive algorithms (78%) is convincing evidence that HMM models can be effectively applied to the domain of home occupancy.
2.2 Mathematical Tools
There are several mathematical tools that make use of stochastic processes,
sequence labelling, and clustering algorithms to help bring clarity to an otherwise
chaotic data collection.
2.2.1 Regression Algorithms
In regression algorithms, there are 2 notable variants of unsupervised regression
algorithms: the Independent Component Analysis (ICA) and the Principal
Component Analysis (PCA). These 2 algorithms attempt to separate a data set into
additive subcomponents, where each subcomponent is maximally independent or
has maximum variance, respectively.
An example of ICA utility is in electroencephalography (EEG). It can
automatically identify a number of channels that are statistically independent from
each other. White noise as well as EEG artefacts like ocular movement can be
identified and subtracted additively while preserving core data. However, ICA is not as relevant in this project, as it assumes that the identified channels are statistically independent from each other. This is not the case: environmental data in the domain of HVAC are often heavily correlated.
One of the applications of PCA is in the field of data compression and
visualisation. Given a collection of n-dimensional vectors, PCA decomposes it
into eigenvalue and eigenvector pairs. Eigenvectors associated with the largest eigenvalues, which capture the most variance, are kept; the rest are discarded. In that sense, a multi-dimensional, or multi-axial, graph has been remapped onto the new eigenvectors, having discarded insignificant axes of low variance. The result is a collection of vectors of lower dimensionality, which is still able to approximately reconstruct the original dataset.
Thus, the purpose of PCA is dimensionality reduction. While it is certainly useful
as a pre-processing step, the project does not require a lot of dimensionality (due
to limited sensor types). A simple cluster analysis, as described in section 2.2.2,
will suffice.
2.2.2 Clustering Algorithms
As mentioned in section 2.2.1, cluster analysis is a more appropriate tool to pre-process our data. This is because PCA operates on variable reduction, while cluster analysis works on observation reduction. Essentially, cluster analysis groups like observations together. It reduces the number of unique observations into a limited set, somewhat akin to an Analogue-to-Digital Converter (ADC) that quantizes continuous values into discrete buckets.
One of the more well-known clustering algorithms is the K-Means Clustering
Algorithm (K-Means). By making use of a metric between members, often the
Squared Euclidean Distance, a data set is grouped into k clusters where members
of each cluster have the nearest metric. This quantization process is crucial to the usage of Discrete Hidden Markov Models (DHMM), in order to minimize the number of unique observation vectors, as described in Chapter 3.
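The quantization step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's implementation; the function and variable names are invented here, and a 1-D case is shown so the squared Euclidean distance reduces to a simple squared difference.

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Naive 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance reduces to (p - c) ** 2 in 1-D
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Quantize continuous sensor-like readings into k = 2 discrete levels
readings = [0.10, 0.20, 0.15, 5.0, 5.2, 4.9]
levels = sorted(k_means(readings, k=2))
```

Each reading can then be replaced by the index of its nearest level, giving the discrete observation symbols a DHMM needs.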
2.2.3 Stochastic Classifier Algorithms
Stochastic classifiers attempt to put data into categorical labels based on their
stochastic attributes. Because the project is operating on data streams, there is the added dimension of time open for exploitation, which leads to the Markov Chain: a temporal process based on stochastic probabilities.
A Markov Chain is a system that is able to transition to one of several states
depending on a stochastic process. To classify things however, the concept needs
to be further extended to a Hidden Markov Chain (HMM). In a HMM, the states
of the system are not observable; one can only observe the system through its
emission observations, whose appearance is statistically dependent on the
underlying hidden state.
Through observations of the system over a period of time, it is possible to decipher the underlying state transitions that have led to the observations. It is also possible to match up observations against several HMM models; whichever model fits the observations best provides the label under which the observations are classified.
Because HMM is a robust and well-studied model that classifies the data on a
temporal dimension, the project chooses HMM to be its mathematical tool to
identify incoming events from the environmental sensor data.
3. HIDDEN MARKOV MODELS
In this Chapter, the report presents the concepts behind the Hidden Markov Model
(HMM) in greater detail, as it is the mathematical tool the project uses to identify
and process incoming events.
3.1 Markov Chain
A Markov Chain is a discrete-time system that transitions from one state to another via a random process. Each state has its own fixed transition probabilities, which are not affected by previous states.
Take, for instance, a Markov chain modelling the breakfast habits of an individual. It consists of 2 states representing what the individual had for breakfast: cereal or bread. Assuming that the system is memory-less, all that dictates the next breakfast is the set of transition probabilities of the current breakfast. This is represented by the diagram in Figure 1 below:
Figure 1 2-state Markov Chain
According to Figure 1, if the current breakfast is cereal, there is a 40% chance that the next breakfast is still cereal, and a 60% chance it might be bread. If the current breakfast is bread, there is a 30% chance the next breakfast is still bread, but a 70% chance it could be cereal. This series of probabilities can be represented as a transition matrix:

$$A = \begin{pmatrix} 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix}$$
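To make the mechanics concrete, the 2-state chain above can be simulated in a few lines of Python. This is an illustrative sketch only; the function names are invented, and the two-state case lets a single uniform draw decide the next state.

```python
import random

# Transition matrix from Figure 1: rows are the current state,
# columns the next state; state 0 = cereal, state 1 = bread
A = [[0.4, 0.6],
     [0.7, 0.3]]

def simulate(A, start, steps, rng):
    """Walk the chain: at each step, draw the next state from the row
    of A belonging to the current state."""
    state, path = start, [start]
    for _ in range(steps):
        state = 0 if rng.random() < A[state][0] else 1
        path.append(state)
    return path

# Ten breakfasts, starting from cereal
path = simulate(A, start=0, steps=10, rng=random.Random(42))
```

Over a long run, the fraction of time spent in each state converges to the chain's stationary distribution, regardless of the starting breakfast.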
3.2 Hidden Markov Chain
A Hidden Markov Chain (HMM) extends the concept of the Markov Chain. Now, the states of the system are hidden and cannot be observed. However, the outputs of each state, the emissions, are observable. Like the states, each emission belongs to an emission space that differs for each state, and has a random probability of emission as well.
Using the example in section 3.1, assume that the cereal and bread states each have 3 possible emissions: satiety, hunger, and bloated. Each state, however, has a different distribution function over the emissions. This is represented by the updated HMM in Figure 2 below:
Figure 2 2-state 3-emission Hidden Markov Chain
These emissions can be represented by the following emission probability matrix:

$$B = \begin{pmatrix} 0.70 & 0.08 & 0.22 \\ 0.35 & 0.65 & 0 \end{pmatrix}$$
Sometimes it is also useful to define an additional initial state distribution matrix.
The initial state distribution matrix will give the probability that the system begins
in a particular state. If for example, the system above has a 90% chance of starting
in the cereal state, then the matrix is as follows:
$$\pi = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix}$$

To illustrate the usefulness of HMM, it is helpful to refer to the 3 conventional problems that HMM can tackle, as famously described in Rabiner's 1989 paper [4]:
1. Evaluation: Given an observation sequence, find the likelihood that it was generated by a given HMM model. Useful for comparing different models' effectiveness in modelling a particular phenomenon, or for classifying phenomena according to known models.
2. Decoding: Given an observation sequence, infer the most probable hidden state transitions that led up to the given observations. Useful for uncovering the hidden states of a system.
3. Learning: Given an observation sequence, construct the HMM that is most likely to have generated it. Useful for creating a model from real-world data.
All 3 problems are relevant to the project and have been implemented. There is also a 4th problem that is implemented in the project but is not usually included in the list of 3 HMM problems in the literature:

4. Generation: Simulates future output by running the model, or fills gaps in observations. Useful for anticipating future changes in the system or for interpolating lost data packets.
The report will now go into detail on how each problem can be solved via HMM.
3.2.1 Problem 1 of HMM: Evaluation
As mentioned, evaluation solves the following problem: given an observation sequence, find the likelihood that it was generated by a HMM model $\lambda$.
Assume that the HMM being used is the same one as that in Figure 2. Now also assume that the hidden state sequence X is (cereal, bread, bread). If given an observation sequence O of (bloated, hunger, satiety), it is possible to calculate the likelihood of such a pairing, $P(O, X \mid \lambda)$. The formulas are:

$$P(O \mid X, \lambda) = b_{x_0}(o_0)\, b_{x_1}(o_1)\, b_{x_2}(o_2)$$

$$P(X \mid \lambda) = \pi_{x_0}\, a_{x_0 x_1}\, a_{x_1 x_2}$$

$$P(O, X \mid \lambda) = P(O \mid X, \lambda)\, P(X \mid \lambda)$$

Problem 1 of HMM, evaluation, demands to know the likelihood of an observation sequence O given a HMM $\lambda$. This is $P(O \mid \lambda)$, which is the sum of $P(O, X \mid \lambda)$ over all possible state sequences X:

$$P(O \mid \lambda) = \sum_{X} P(O, X \mid \lambda)$$
Through a technique called the forward-pass algorithm $\alpha_t(i)$, it is possible to optimize the computational time complexity of this operation, as detailed in Stamp's paper [5], reducing the equation to the following:

$$\alpha_t(i) = P(o_0, o_1, \ldots, o_t,\, x_t = q_i \mid \lambda) = \left[ \sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji} \right] b_i(o_t)$$

$$P(O \mid \lambda) = \sum_{i=0}^{N-1} \alpha_{T-1}(i)$$

The forward-pass algorithm $\alpha_t(i)$ essentially computes the probability that state i is reached at time t, given the partial observation sequence from 0 to time t.
Now that it is possible to evaluate the likelihood of an observation sequence, by pairing up the observation sequence O with different HMM models $\lambda$, it is possible to find the most fitting HMM model. The system's observation is therefore classified under the label of this particular model:

$$\arg\max_{\lambda} P(O \mid \lambda)$$
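As a concrete illustration, the forward pass can be written in a few lines of Python using the breakfast model from Figures 1 and 2. This is an illustrative sketch, not the project's implementation; the function name is invented, and no numerical scaling is applied, so it is only suitable for short observation sequences.

```python
def forward(pi, A, B, obs):
    """Forward pass: alpha[i] accumulates P(o_0..o_t, x_t = i | model);
    the final sum over states gives the likelihood P(O | model)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# Breakfast model: states 0 = cereal, 1 = bread;
# emissions 0 = satiety, 1 = hunger, 2 = bloated
pi = [0.9, 0.1]
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
likelihood = forward(pi, A, B, [2, 1, 0])
```

Evaluating the same observation sequence against several trained models and keeping the largest likelihood implements the classification rule above.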
3.2.2 Problem 2 of HMM: Decoding
Problem 2 of HMM, decoding, tries to find the most probable hidden state transitions that have led up to the given observations. The algorithm used is a Dynamic Programming algorithm, which works through all possible combinations of states at each instance of time. For example, at time t = 0, the formula is:
$$\delta_0(i) = \pi_i\, b_i(o_0)$$

Armed with calculations for every state, the time is advanced forward by one unit, to determine the previous state j that gives the highest likelihood in the new time instance:

$$\delta_1(i) = \max_j \left[ \delta_0(j)\, a_{ji}\, b_i(o_1) \right]$$

This can be generalized to:

$$\delta_t(i) = \max_j \left[ \delta_{t-1}(j)\, a_{ji} \right] b_i(o_t)$$

Consequently, by finding the maximum $\delta_{T-1}(i)$ and by recording the maximizing state j at every step of the process, the state sequence most probable in generating the given observation sequence can be found.
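The recursion and backtracking above can be sketched as follows. This is an illustrative sketch, not the project's implementation; the function name is invented, and as with the forward pass no numerical scaling is applied.

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding: delta[i] holds the best score of any state path
    ending in state i; back records the predecessor that achieved it."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor j for each state i, then the new scores
        prev = [max(range(n), key=lambda j: delta[j] * A[j][i])
                for i in range(n)]
        delta = [delta[prev[i]] * A[prev[i]][i] * B[i][o] for i in range(n)]
        back.append(prev)
    # Backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

# Decode the breakfast model against (bloated, hunger, satiety)
pi = [0.9, 0.1]
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
decoded = viterbi(pi, A, B, [2, 1, 0])
```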
3.2.3 Problem 3 of HMM: Learning (Estimation)
There are 2 ways to do HMM learning, which is the construction of a HMM based
on training observation samples. The 2 methods are estimation and Baum-Welch.
Both are used in the project.
Estimation-based learning is described in Blunsom's paper as a supervised approach to training [6]. The report, additionally, sees it as a way to build HMM models for solving Problem 2 of HMM. This is because the project requirement for decoding is to decode the hidden occupancy state of the system. In order to control the labelling process of the hidden states, the learning algorithm needs to be fed training observations that have been tagged with known hidden states. The estimation process can be fed tagged training observations; Baum-Welch cannot.
The theory behind estimation-based learning is simple enough. Assuming that the training observation sets are representative of the population, the frequency of occurrence of a particular state in the training set approximates that state's probability distribution, as described mathematically:

$$\hat{\pi}_i = \frac{\#(\text{sequences starting in state } q_i)}{\#(\text{training sequences})}$$

Many other attributes of the HMM can be estimated similarly:

$$\hat{a}_{ij} = P(x_{t+1} = q_j \mid x_t = q_i) = \frac{\#(\text{transitions from } q_i \text{ to } q_j)}{\#(\text{visits to } q_i)}$$

3.2.4 Problem 3 of HMM: Learning (Baum-Welch)
The second method of doing HMM learning is to make use of the Baum-Welch
algorithm. It randomly initializes the HMM parameters, and then uses expectation
maximization to adjust the parameters of the HMM model to a local maximum.
In this project, the Baum-Welch method is preferred over the estimation method for learning a HMM model to solve Problem 1 of HMM, evaluation. The Baum-Welch method is able to learn a much more precise HMM model. The downside is that such a model has illegible hidden states: they are not conveniently human-labelled hidden states mapped to occupancy, as Baum-Welch derives its states by local optimization. With illegible hidden states, the states are meaningless if decoded. However, this setback is not applicable to model evaluation (Problem 1), as only the evaluated likelihood is relevant, not the hidden states. Hence Baum-Welch learning is preferred over estimation learning for crafting HMM models for Problem 1.
What follows is an explanation of the Baum-Welch algorithm.
Note: The derivation of the Baum-Welch algorithm is rather complicated, and one
may wish to skip this section if desired.
Firstly, one needs to define 3 more parameters: the backward pass $\beta_t(i)$, the gamma $\gamma_t(i)$, and the xi $\xi_t(i, j)$.

The backward pass $\beta_t(i)$ is similar to the forward pass $\alpha_t(i)$, except that instead of using the partial observation sequence from 0 to time t, it uses the partial sequence from time t+1 to the final time T-1, given state i at time t:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_{T-1} \mid x_t = q_i, \lambda)$$

$$\beta_{T-1}(i) = 1, \qquad \beta_t(i) = \sum_{j=0}^{N-1} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 0 \le t < T-1$$

The gamma $\gamma_t(i)$ represents the probability of the current state being state i. If a state i gives the maximum value at time t, then that state is the most likely state at time t:

$$\gamma_t(i) = P(x_t = q_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)\, \beta_t(j)}$$

The final parameter to define is the xi $\xi_t(i, j)$, which is the probability that the current state is state i at time t and the next state is state j:

$$\xi_t(i, j) = P(x_t = q_i,\, x_{t+1} = q_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

$$\gamma_t(i) = \sum_{j=0}^{N-1} \xi_t(i, j)$$
From these 3 parameters, it is possible to obtain re-estimates of the HMM model:

$$\hat{\pi}_i = \gamma_0(i)$$

$$\hat{a}_{ij} = \frac{\sum_{t=0}^{T-2} \xi_t(i, j)}{\sum_{t=0}^{T-2} \gamma_t(i)}$$

$$\hat{b}_i(k) = \frac{\sum_{t \in \{0, 1, \ldots, T-2\},\; o_t = k} \gamma_t(i)}{\sum_{t=0}^{T-2} \gamma_t(i)}$$
The learning algorithm is thus implemented as follows:
1. Initialize random model parameters for $\pi$, A and B.
2. Compute the intermediate parameters $\alpha_t(i)$, $\beta_t(i)$, $\xi_t(i, j)$, $\gamma_t(i)$.
3. Re-estimate the model parameters $\pi$, A and B using the intermediate parameters.
4. Check the improvement of $P(O \mid \lambda)$. If it does not meet requirements, repeat from Step 2.
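One iteration of this loop can be sketched for the discrete-emission case as follows. This is an illustrative sketch, not the project's implementation; the function name is invented, and it follows Rabiner's re-estimation formulas, re-estimating B over all T observations (some presentations, including the sums above, stop at T-2).

```python
def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM.
    Computes alpha, beta, gamma and xi, then returns re-estimated
    parameters together with the likelihood under the OLD model."""
    n, T, m = len(pi), len(obs), len(B[0])

    # Forward pass: alpha[t][i] = P(o_0..o_t, x_t = i | model)
    alpha = [[0.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for i in range(n):
            alpha[t][i] = sum(alpha[t - 1][j] * A[j][i]
                              for j in range(n)) * B[i][obs[t]]

    # Backward pass: beta[t][i] = P(o_{t+1}..o_{T-1} | x_t = i, model)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))

    total = sum(alpha[T - 1])  # P(O | model)

    # State and transition posteriors (gamma and xi)
    gamma = [[alpha[t][i] * beta[t][i] / total for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
            for j in range(n)] for i in range(n)] for t in range(T - 1)]

    # Re-estimation
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, total
```

Repeating the step until the returned likelihood stops improving implements Step 4 of the loop; EM guarantees the likelihood never decreases between iterations.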
3.2.5 Problem 4 of HMM: Generation
In the project's self-defined Problem 4 of HMM, a simulation of the HMM generates non-deterministic future states and emissions of the system. It can be used to extrapolate and simulate the future. Or it can be used to interpolate and fill gaps in the knowledge of past states of the system, for instance when data packets are lost or unrecoverable during network propagation.
For extrapolation, a Gaussian distribution is used to produce values from 0.0 to
1.0. Depending on the output, the system is advanced to the corresponding state
based on the transition matrix. Another Gaussian roll is used to determine the
emission from that state. This process continues until the desired forecasted length
is reached. A random approach is used for extrapolation to ensure that the process
is non-deterministic.
For interpolation, the start and the end hidden state are known. An exhaustive
permutation of all possible states in-between is conducted. The state sequence
with the highest likelihood of appearing is thus selected.
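The extrapolation roll can be sketched as follows. This is an illustrative sketch, not the project's implementation; the function names are invented, and where the text above describes a distribution roll producing values from 0.0 to 1.0, this sketch uses a uniform draw against the cumulative probabilities, a common way to sample a discrete distribution.

```python
import random

def extrapolate(A, B, state, steps, rng):
    """Roll the model forward: at each step, draw the next hidden state
    from the current row of A, then draw an emission from that state's
    row of B."""
    def draw(probs):
        # Sample an index from a discrete distribution via its CDF
        r, acc = rng.random(), 0.0
        for idx, p in enumerate(probs):
            acc += p
            if r < acc:
                return idx
        return len(probs) - 1

    trace = []
    for _ in range(steps):
        state = draw(A[state])
        trace.append((state, draw(B[state])))
    return trace

# Forecast 5 steps of the breakfast model, starting in state 0 (cereal)
A = [[0.4, 0.6], [0.7, 0.3]]
B = [[0.70, 0.08, 0.22], [0.35, 0.65, 0.0]]
forecast = extrapolate(A, B, state=0, steps=5, rng=random.Random(1))
```

Because each run draws fresh random numbers, repeated rolls produce different but statistically consistent futures, which is the non-deterministic behaviour the section describes.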
3.3 Continuous HMM
The HMM model described in the earlier sections of Chapter 3 was a Discrete HMM (DHMM) model. It is termed discrete because the emission symbols are allowed to take on only specific values, for instance "hunger" and "bloated". A continuous emission would, however, take on intermediate values, like a 0.5 feeling of hunger, or a 0.751 feeling of bloatedness.
This can be achieved by representing the emissions of each state not as a
collection of scalar values, but as a collection of Gaussian distributions, also
known as a Gaussian Mixture Model (GMM). Each GMM contains a vector of
weights, which dictates the weightage of each Gaussian distribution within.
Because the Gaussian distribution has a continuous-valued distribution function, it can represent a range of observations without needing to discretize them. Thus, a HMM that uses a GMM for its emission symbols is called a Continuous HMM (CHMM).
To determine the probability of emission of an observation, the observation is fed
into the GMM. The observation is fed into each Gaussian distribution in-turn, and
the resultant probability gathered using a weighted sum. That sum is the
probability of emission. This is illustrated in Figure 3 below:
Figure 3 Gaussian Mixture Model within a HMM
In Figure 3, assume an observation x arrives at the state and GMM illustrated. The probability of emission will be calculated as so:

$$b(x) = 0.2\, \mathcal{N}(x; \mu_1, \Sigma_1) + 0.5\, \mathcal{N}(x; \mu_2, \Sigma_2) + 0.3\, \mathcal{N}(x; \mu_3, \Sigma_3)$$

For example, an observation that closely matches Gaussian 2 will return a high density value from that component, but the GMM also dictates the probability of that component's occurrence, so its contribution is weighted by a factor of 0.5.
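The weighted sum above can be sketched for the 1-D case as follows. This is an illustrative sketch; only the weights follow Figure 3, while the means and variances below are hypothetical values chosen for the example.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_emission(x, weights, means, variances):
    """Emission probability of x: the weighted sum of the component
    Gaussian densities, as in Figure 3."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical 1-D mixture using the weights from Figure 3
p = gmm_emission(1.0, weights=[0.2, 0.5, 0.3],
                 means=[0.0, 1.0, 2.0], variances=[1.0, 1.0, 1.0])
```

Here x = 1.0 sits exactly on the mean of the second component, so that component dominates the sum, scaled by its 0.5 weight.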
3.3.1 Multivariate CHMM
One of the additional benefits of using CHMM is its ability to take in observation
vectors, observations with more than 1 dimension. This is because a Multivariate
Normal distribution can be utilized instead, as indicated in Jackson's HMM
tutorial [7]:
$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_{ik}|}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{-1}\,(x-\mu_{ik})}{2}\right)$
Conveniently, the multivariate case reduces to a single-variate distribution when
the number of dimensions is 1.
However, one of the great difficulties of using a multivariate normal distribution
is the presence of the inverse covariance matrix $\Sigma_{ik}^{-1}$ within the formula. A
matrix is non-invertible, or singular, when its rows are linearly dependent. This
can arise especially when the training data sets are insufficient or too highly
correlated.
A singular matrix is not invertible, so a pseudo-inverse matrix is used instead.
One well-known pseudo-inverse is the Moore–Penrose pseudo-inverse, obtained
using the Singular Value Decomposition (SVD) for solving linear equations. Here,
matrix A is decomposed into the following:

$A = U D V^{\mathsf{T}}, \qquad D = \begin{bmatrix} \mathrm{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{bmatrix}$

where A is the M×N matrix to decompose.
U is an M×M matrix whose columns are the eigenvectors of AAᵀ.
D is an M×N diagonal matrix whose diagonal entries are the singular values σᵢ of A, where σᵢ² is an eigenvalue of AAᵀ or AᵀA.
V is an N×N matrix whose columns are the eigenvectors of AᵀA.
If the diagonals of matrix D are not all non-zero, the matrix A is singular. The
pseudo-inverse is thus defined as:
$A^{+} = V \begin{bmatrix} \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & 0 \\ 0 & 0 \end{bmatrix} U^{\mathsf{T}}$

Conveniently, $A^{+} = A^{-1}$ if A is a non-singular matrix.

The pseudo-determinant also has to be defined for calculating the determinant of
the pseudo-inverse matrix. It is the product of all non-zero diagonal values in the
diagonal matrix D of the SVD:
$|A|_{+} = \prod_{i=1}^{r} \sigma_i$
Recall that the original multivariate Gaussian distribution was defined as:
$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_{ik}|}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{-1}\,(x-\mu_{ik})}{2}\right)$
Now, the modified distribution for a singular covariance matrix is:

$\mathcal{N}(x \mid \mu_{ik}, \Sigma_{ik}^{+}) = \frac{1}{\sqrt{|2\pi\,\Sigma_{ik}|_{+}}} \exp\left(-\frac{(x-\mu_{ik})^{\mathsf{T}}\,\Sigma_{ik}^{+}\,(x-\mu_{ik})}{2}\right)$
Again, the singular multivariate normal distribution conveniently gives the same
result as its non-singular counterpart when the covariance matrix is non-singular.
This is the theoretical reason why the project only starts learning a new HMM
model once a large number of observations has been collected for training, and
why learning fails when observations are insufficient. It also highlights that
HMM training may occasionally fail simply because the training observations
were, by chance, highly correlated.
This flaw can be mitigated through careful selection of training observation sets,
as this project has done.
4. HARDWARE AND SOFTWARE IMPLEMENTATION
This chapter describes the project's hardware selection choices and details its
software implementation.
4.1 Embedded Board
The project's objective is to acquire data from tiny, distributed sensors. One of the
possibilities for the future of the project is to explore building a networked
collection of processing nodes that does layer-by-layer event processing, and
relays the decision events up a hierarchical computing framework. Under such a
vision, the processing nodes are situated locally in close proximity to the sensor
nodes, and do real-time event processing at the source. As a result, there are
certain requirements for the hardware of this processing node.
Firstly, it should be an embedded platform: this reduces the hardware and
deployment costs, allowing a hypothetical project budget to buy in quantity and
improve the processing-node-to-sensor-node ratio, i.e. fewer sensor nodes per
processing node. A desktop PC would be impractical in comparison.
Secondly, the embedded board should have a powerful CPU as the HMM
algorithm can be quite computationally intensive.
Thirdly, the embedded board should be well-supported and backed by an active
and mature community, which would potentially allow it to interoperate with
more types of sensors.
4.1.1 Comparison of Features
Below are 3 different embedded boards that were under consideration in the
project. The core metrics here are CPU capability and sensor interoperability.
| | PandaBoard ES | BeagleBone Black | Raspberry Pi |
|---|---|---|---|
| CPU | Dual-core ARM Cortex-A9, up to 1.2 GHz each | ARM Cortex-A8, 1.0 GHz | ARM, 700 MHz |
| GPU | SGX540 with OpenGL ES 1.1/2.0, OpenVG 1.1, EGL 1.3 | SGX530 with 3D acceleration | Broadcom VideoCore IV, OpenGL ES 2.0 |
| Operating System | Ubuntu, Android | Ubuntu, Android | Custom Debian/Fedora, Android |
| A/V I/O | HDMI out, 3.5 mm audio out, stereo audio in | HDMI | HDMI out, 3.5 mm audio out, stereo audio in |
| Memory | 1 GB DDR2 RAM, SD/MMC | 512 MB RAM, 2 GB flash, SD/MMC | 512 MB RAM, SD/MMC |
| Connectivity | WiFi, Bluetooth | – | – |
| Ports | Ethernet, 3 USB 2.0 | Ethernet, 1 USB, 1 mini-USB | Ethernet, 2 USB, 1 mini-USB (power) |
| Power | 440–710 mA¹ | 210–460 mA @ 5V | 700 mA @ 5V |
| Cost | USD $182 | USD $45 | USD $35 |
Table 1 Comparison of features of 3 Embedded Boards
1 http://www.omappedia.org/wiki/Panda_Test_Data
In terms of CPU capability, the Raspberry Pi loses out because it does not have
enough computation power, while the PandaBoard ES is clearly superior in that
respect, boasting a dual-core processor with 1 GB of RAM.
In terms of interoperability, the Operating System (OS) is examined. An OS
derived from a major distribution is convenient for 3rd party libraries as well as
pre-built compatibility libraries offered by sensor manufacturers. Again, the
Raspberry Pi loses out, as it only supports Android and a custom derivation of
Debian and Fedora, unlike the other 2 boards, which support Ubuntu, a very
popular Linux distribution.
Another metric relevant to interoperability is the number of ports available on the
boards. The PandaBoard ES offers 3 USB ports, the most amongst the boards
listed.
As the PandaBoard ES consistently ranks ahead in all of the important metrics, it
is the project's embedded board of choice.
Figure 4 A PandaBoard embedded device showing cable and connections
In Figure 4 above, a PandaBoard is shown connected with the requisite hardware.
At the bottom is an SD card containing the Ubuntu OS. On the right is an RS-232
cable that allows a development PC to connect to the PandaBoard via a virtual
terminal in headless mode.
At the top, from the right, is the 5V power cable, followed by an Ethernet cable
which shares an Internet connection from the development PC via LAN. The
USB ports are located beneath it. Finally, a HDMI cable provides graphical
output to an attached monitor.
4.2 Software Development Platform
The PandaBoard ES offers Ubuntu and Android as operating systems, but the
project decided on Ubuntu as it is more full-featured and has a package manager
which makes it easy to pull and install software packages. The Ubuntu
version used was 12.04 Precise Pangolin LTS, the most recent pre-built version
available for PandaBoard.
Development work was done not on the PandaBoard ES, but on an Intel laptop
running Ubuntu 13.10 Saucy Salamander. Because both the PandaBoard ES and
the development platform are Linux systems, the source code is portable across
systems, and it is much faster to compile and test on the more powerful
processor. Similarly, although it is possible to run a POSIX environment on a
Windows laptop via the Cygwin compatibility layer, it is much more efficient to
work natively on Linux on the development laptop, where compilation times are
faster by an order of magnitude.
The language of choice was C++, as the application needs to be able to run fast on
the PandaBoard ES. Several 3rd party libraries were used. Firstly, Armadillo, a
C++ matrix and linear algebra library was used. Secondly, Boost, a C++ utility
library, was included. Boost is a requisite for Armadillo, but it also provided many
convenient container classes to supplement the traditional vector and dictionary
classes in the C++ Standard Template Library. Finally, mlpack, a C++ machine
learning library, was also imported, providing basic HMM algorithms, a
clustering algorithm and an implementation of a GMM.
4.3 Software Architecture
The software architecture has been separated into several different namespaces
and classes, as illustrated in Figure 5 below:
Figure 5 UML Diagram of Software Architecture
In Figure 5, the calc namespace holds two child namespaces, func and model. The
calc::model::hmm namespace contains classes that are used to define and model a
HMM as well as the distribution function, whereas the calc::model::map
namespace has several mappers that are used to discretize observations from
continuous-valued to specific discrete values. Meanwhile, in the calc::func::hmm
namespace, there is a helper class that serializes and de-serializes the HMM
classes between running memory and local file storage.
The util namespace contains several helper methods for basic operations not
present in the C++ API.
Finally, the test namespace contains the test and experiment routines used to
evaluate the performance of and demonstrate the capability of the system.
4.4 Software Implementation
4.4.1 Serialization of HMM
Serialization is the process of converting a data structure and its object state into a
format that can be easily transmitted and reconstructed. In the project,
serialization is used to save the HMM models locally on the SD card. By saving
the HMM models, the models do not need to be retrained on program startup
every time. Moreover, it becomes possible to train a comprehensive model offline
on a powerful computer, then deploy the model onto the target PandaBoard
instantly, reducing deployment time and improving detection accuracy.
A HMM model has several attributes: the state transition matrix and the emission
probability matrix. For a CHMM model, there is a state-specific GMM in place of
the emission probability matrix, i.e. one GMM per state.
Within each GMM, there are further attributes: the weight, the mean as well as the
covariance matrix of each Gaussian distribution. Finally, to enable successful
reconstruction, several metadata properties of the HMM are also saved.
On top of saving the HMM model, it is also possible to save a training observation
set. Oftentimes the training set is rather large, and it is tedious to keep retrieving
the entries manually from the CSV (an ASCII comma-separated text file) database.
By serializing the training set, it is possible to make it as portable as the HMM
model and allow operators to reinitialize/retrain the HMM model with a different
set of parameters.
One beneficial side-effect is that serialization produces a human-readable ASCII
format, which also makes it easy for a human operator to analyze the HMM
offline. A sample of each of the various serialized data is presented here to
demonstrate the concept better.
In Figure 6, the name of the system is "airconTester". As this is the 1st HMM
model in the system, it is given an index of 0, hence "airconTester0".
Figure 6 Directory of a serialized HMM model and training set
The training sets are serialized with a .train extension. Since there are 3
files: .train0, .train1, and .train2, it means that 3 windows of training observation
data are used to construct this HMM model. .trainMeta contains the metadata,
which is shown in Figure 7 below.
Figure 7 Serialized Contents of HMM Metadata
Because the model is to be trained to be used for Decoding, Estimation-based
HMM Learning is used. That means the observation values need to be tagged with
occupancy values, i.e. the ground truth. Here they are stored
in .trainStates0, .trainStates1, .trainStates2, one for each window of training
observation data.
As for the HMM model itself, the state transition matrix is stored in .hmmTrans
and is a 3x3 (3 state) matrix, as seen in Figure 8 below.
Figure 8 Serialized Contents of HMM State Transition Matrix
There are 3 states and 8 emissions per state. For instance, .hmmEmit0Covar1
represents the covariance matrix of the 2nd Gaussian of the 1st state;
.hmmEmit2Mean5 represents the mean vector of the 6th Gaussian of the 3rd
state. A preview of a serialized covariance matrix and mean vector is shown in
Figure 9 below.
Figure 9 Serialized Contents of a GMM emission mean (top) and covariance (bottom)
4.4.2 Scheduling Processing and Server Mutex Functions
The project application, as a server, has to be able to accept a continuous input
stream from more than one sensor input. At the same time, the application has to
perform time-consuming HMM calculations. These 2 processes should be
designed not to interrupt each other.
The way the project does this is to split the Server and the Processing aspects of
the application, essentially layering the network-side and application-side logic,
as seen in Figure 10 below. The server will concentrate on receiving the UDP packets
being transmitted from the sensors. The server is a Python UDP server that
continually flushes any received packets into a mutex-ed shared file. The HMM
application will periodically check into the mutex-ed shared file to see if any
additional data packets are received, and retrieve them if so.
Figure 10 Layering Server and Processing functions into Network and Application side logic
The mutex lock is achieved using the POSIX API flock(), which guarantees any
POSIX-compliant access to be mutually exclusive. Because flock() is also fed
with the LOCK_NB flag, the mutex request operation is non-blocking. This
workflow is further illustrated in Figure 11 below. Upon failure of the non-blocking
mutex request, the server will temporarily save the packet data in a file
buffer. But if the mutex request is successful, the data in the packet, as well as any
previous packet data in the file buffer, will be transferred together to the shared
file. This ensures that the server continues receiving packets even when the mutex
fails.
On the HMM application side, the application will continuously poll the shared
file for data. If it succeeds in the non-blocking mutex request, it obtains new
data and is able to calculate. The mutex is freed at the first possible instance.
However, if the mutex request fails, the HMM application simply continues polling.
Figure 11 Process Flowchart for Mutex Access
The project recognises a potential downside to this mutex process. If there are
many packet arrivals on the server, the server's mutex requests may crowd out the
HMM application; the mutex lock would be dominantly held by the server, and the
HMM application would not have time to access the shared file.
However, this issue is mitigated by the fact that packet arrivals are not that
frequent. Even with multiple sensors sending incoming packets, the mutex lock
will not be swamped. This is because each sensor, identified by MAC address
metadata attached in the UDP packet, is assigned a unique mutex shared file, as
seen in Figure 10. Hence no single mutex lock can be swamped: the server is a
single-threaded application that can only hold a single lock at any one time, and
the HMM application will have plenty of data from other sensors to process.
5. EXPERIMENTS & RESULTS
5.1 Dataset
The dataset (the "archive dataset") used in the experiments is environmental data
taken from a student residential suite. It consists of readings taken continuously
for 24 hours at minimum, spread across a total of 3 months.
At the end of the collection, 34 days' worth of environmental data was collected.
It contains measurements of temperature, humidity, luminosity and noise, sampled
once every 10 minutes. A small 4-day window of the dimensions is illustrated in
Figure 12 below.
Figure 12 Sensor measurements across a 4-day period
The various days are also labelled according to whether they were typical or
atypical days, which would impact occupancy. For instance, typical days were
days where it was not a public or school holiday, nor sandwiched between
atypical days. For more information, please refer to Appendix B.
5.1.1 Ground Truth Value
The sensor measurements contained no direct measurement of occupancy. Hence,
an alternative method of verification was sought.
Using the power consumption measurements, which came as the 5th modality in
the data set (but were not included among the HMM inputs), it is possible to infer
the Ground Truth Value of occupancy. Any increase in the power consumption of
the room is regarded as an indicator of occupancy. However, that only gives a
binary value of occupancy with multiple jitters due to the slow and discrete
movement of the power meter. To smooth out the jitters, a digital Gaussian
moving-average filter of window size 5 was applied forwards and backwards to
the power readings; the result is a power consumption reading of 3 quanta, as
illustrated in Figure 13 below.
Figure 13 Occupancy inferred from power consumption
5.2 Experimental Setup
A laptop is set up to simulate a wireless sensor. A Python application is run on the
laptop, streaming values from the archive dataset over a UDP connection. All 4
modalities of the dataset are streamed together. Attached in every packet is a
mock MAC address to identify the sensor.
A PandaBoard ES is set up on the other end, running a Python server that accepts
the incoming sensor observations. The HMM application is also running on the
PandaBoard ES to process in real-time the incoming packet data.
The HMM model that the application executes has been set up and trained
before the start of the experiment. For each experiment, the report will specify the
section of the dataset that was used for training as well as for testing.
The HMM application had its source code developed on an Ubuntu system
beforehand, before being transported over to the PandaBoard ES and compiled.
This is because the machine instruction set is different; the development machine
runs on Intel, while the PandaBoard ES runs on ARM.
5.3 Capability Test
In order to properly understand the capabilities of the HMM application, an
introductory test was applied. The HMM application was challenged to identify
the day of the week when fed a dataset; for example, was the day a Monday or a
Sunday? This test is indirectly related to the issue of occupancy detection, as the
day has an influence on the occupancy of a room.
5.3.1 Test on Evaluating Day of the Week
For this test, a 4-state, 4-emission DHMM model is used. Out of the 4 modalities,
only luminosity is considered. The model is fed 2 typical weekdays' worth of
observations; hence the model represents a typical weekday HMM. The model is
then matched against 24-hour observation data from 12 different days.
In Figure 14 below, the results of the test are presented. To interpret the graph,
note that the more negative the log-likelihood, the less probable it is that the
observation belongs to the weekday HMM model. The results illustrate that the
model is unable to differentiate between weekday and weekend observations. This
may be because the environmental conditions on a weekday and a weekend are
not very different in terms of luminosity.
However, the model does report a marked difference between typical and atypical
days. Atypical days were defined as days where special events such as exam
period, school holidays, or public holidays occurred. Those are days upon which
the occupancy level will be affected. This finding hence suggests that luminosity
is to a certain extent dependent on occupancy, but on its own the results are not
obvious, and that weekdays are similar to weekends generally.
Figure 14 Log-likelihood of observation belonging to a Weekday Model
5.4 Test on Occupancy Decoding
In this experiment, the accuracy of the HMM algorithm is tested to see if it can
correctly identify the occupancy of the system. The actual occupancy is known to
the author but not to the system; this ground truth is compared against the decoded
occupancy.
A CHMM model with 3 states and 9 emissions is used. The 3 states correspond to
fully-occupied, mildly-occupied, and unoccupied. The CHMM model needs to be
a representative sample of the system; hence it is trained with 10 days' worth of
data taken at 3-day intervals across all 34 days of archived sensor data. All 4
modalities are used: temperature, humidity, luminosity and noise. The result of
testing the CHMM model against every single day of sensor observation is shown
in Figure 15 below:
Figure 15 Decoded Error % for Occupancy
The light grey bars signify observations that have been used to train the CHMM
model. The dark grey bars signify novel observations that the CHMM has no
knowledge of.
The performance of the system is generally good: even for the worst-case
anomalous observations, the error rate never exceeds 65%. The mean error rate is
21.6% inclusive of the training set and 23.2% exclusive of it. The median error
rate, which is less sensitive to outlier values, is 17.1% inclusive and 17.8%
exclusive. The sample variance of the error percentage is 0.0268 inclusive and
0.0277 exclusive.
5.5 Test on Occupancy Interpolation
For this test, the HMM problem of Decoding is employed to decipher the
underlying hidden states of the system and interpolate between them. The actual
occupancy is known to the author but not to the system; this ground truth is
compared against the decoded occupancy.
The assumption is that the system has already successfully decoded the start and
end point of the interpolation sequence, but is missing a particular segment in
between, which could be due to packet data discarded due to errors, dropped
packet data, or the wireless sensors suffering a temporary hardware failure.
A CHMM model with 3 states and 7 emissions is used. The 3 states correspond to
fully-occupied, mildly-occupied, and unoccupied. Again, the CHMM model needs
to be a representative sample of the system; hence it is trained with 10 days' worth
of data taken at 3-day intervals across all 34 days of archived sensor data. All 4
modalities are used: temperature, humidity, luminosity and noise.
The variables in the experiment are: the gap size to be interpolated, as well as the
time period across which interpolation is done. Both results are illustrated in
Figure 16 and Table 2 below.
The experiment shows that the interpolation process is quite accurate. For
instance, for short gap sizes of 30 minutes to 1 hour, which are closer to the gap
sizes one would expect in a real-life scenario, the average and median error rates
never exceed 16.7%.
Figure 16 Interpolation results for different Gap Sizes and Time Periods
Table 2 Interpolation statistics for different Gap Sizes and Time Periods
5.6 Test on Occupancy Extrapolation
The test on occupancy extrapolation shows how effective the HMM model is at
predicting the future occupancy of the residential suite. This would make it
possible, for instance, for a building administrator to forecast the power
consumption of the building and reduce it.
Again, a CHMM model with 3 states and 7 emissions is used. The 3 states
correspond to fully-occupied, mildly-occupied, and unoccupied, trained using 10
days' worth of training data selected uniformly across the archived data set of 34
days. All 4
modalities are used: temperature, humidity, luminosity and noise.
Similar to the interpolation experiment, the variables in the experiment are: the
gap size to be extrapolated, as well as the time period across which extrapolation
is to be done. Results are illustrated in Figure 17 and Table 3 below:
Figure 17 Extrapolation results for different Gap Sizes and Time Periods
Table 3 Extrapolation statistics for different Gap Sizes and Time Periods
The results demonstrate that extrapolation generally works well, with an error
rate below 30% for varying window sizes.
6. LIMITATIONS AND RECOMMENDATIONS
6.1 Highly Correlated Data
As discussed in Section 3.3.1, one of the limitations is that the observation data
may be highly correlated, resulting in a non-invertible covariance matrix. That
would render the multivariate CHMM unusable, as it is no longer possible to
compute the probability density of the multivariate normal distribution.
Despite the usage of the pseudo-inverse derivation, occasionally the HMM
algorithm still fails to compute, as the rank of the matrix is simply too low. The
project sidesteps this limitation by re-learning the HMM model using the same
training data: due to the non-deterministic property of learning, it is often possible
to derive a valid covariance matrix within one or two retries.
Still, there is no guarantee that a successful CHMM model can be learnt on the
first try. Even more drastically, there is also no guarantee that with enough retries,
a successful CHMM can be relearnt. As a result, it is inevitable that human
operator intervention is necessary to make sure that a valid CHMM is prepared.
6.2 Dynamically Improving HMM
A possible improvement to the project would be to refine the HMM model
based on incoming readings. This improvement can take one of two forms:
1. Supplement the existing learning data set by including the latest
observation data
2. Replace old learning data with the latest observation data
The choice of method depends on whether the observed system is highly
dynamic. If changes in the system are gradual, then it makes sense to expand the
HMM model with more training data. If the system is an evolving one, old
training data would quickly become irrelevant, and the replacement method
should be used instead.
6.3 Decision Fusion
In Event Processing literature, what the project currently does is Data Fusion. In
Data Fusion, multiple modalities of data are fused together to form a single n-
dimension vector, as illustrated in Figure 18. So instead of multiple streams of
scalar data, there is a single stream of vector data. Data Fusion is attractive when
there are few sensor streams involved, as it gives the highest accuracy and
performance [8].
Figure 18 Data Fusion of streams into a vector before decision making
There is however, Decision Fusion as well, which could be a possible track of
investigation for future projects. Decision Fusion is when each stream of scalar
data has its own event processing node. This results in multiple streams of
decisions, each originating from a single scalar stream. The multiple streams of
decisions are then fused into a single stream of vector decisions and further
processed, as seen in Figure 19. At this point it resembles Data Fusion.
Decision Fusion is like a multi-tiered decision tree, where decisions are made
locally at the source, and then transmitted upstream where it is collated with other
decisions. It scales better in terms of computational and communications
complexity when more streams are added, as the transformation from data to
decision reduces the data complexity, akin to data compression.
Figure 19 Decision Fusion of streams decision upstream into a final decision
7. CONCLUSION
In this thesis, the mathematical derivation and problem-solving abilities of the
Hidden Markov Model have been explained, including the 3 conventional
scenarios of HMM: Evaluation, Decoding and Learning.
A well-trained HMM represents a physical system. Through HMM, it becomes
possible to categorically label an observation under a particular model. One can
also decode the underlying states that the system had gone through, based on the
observations given. Using a representative sample of observations, a HMM model
can also be constructed through likelihood maximization. And with a complete
HMM model, it becomes possible to simulate observations and forecast the future
state and observation of the system.
The thesis also demonstrates that when applied to the domain of occupancy
detection, the HMM algorithm is able to discover the underlying occupancy state
of the environment using 4 modalities: temperature, humidity, luminosity and noise.
It is able to give the correct underlying occupancy with an error rate of 23.2% on
average. For interpolating between gaps in known occupancy states, it can
minimize its error rate to approximately 25.2% for long gaps and 12.5% for short
gaps. For forecasting future occupancy, the HMM model is accurate to within a
28.9% error rate for 12 hour extrapolations.
APPENDIX A: BIBLIOGRAPHY
[1] Y. Agarwal, B. Balaji, R. Gupta, J. Lyles, M. Wei, and T. Weng, "Occupancy-driven energy management for smart building automation," in Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, 2010, pp. 1-6.
[2] C. Liao and P. Barooah, "An integrated approach to occupancy modeling and estimation in commercial buildings," in American Control Conference (ACC), 2010, pp. 3130-3135.
[3] J. Lu, T. Sookoor, V. Srinivasan, G. Gao, B. Holben, J. Stankovic, E. Field, and K. Whitehouse, "The smart thermostat: using occupancy sensors to save energy in homes," in Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, 2010, pp. 211-224.
[4] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[5] M. Stamp, "A revealing introduction to hidden Markov models," Dept. of Computer Science, San Jose State Univ., 2004.
[6] P. Blunsom, "Hidden Markov models," Lecture Notes, August 2004.
[7] Jackson, "HMM tutorial 4." [Online]. Available: http://www.ee.surrey.ac.uk/Personal/P.Jackson/tutorial/
[8] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, "Distributed target classification and tracking in sensor networks," Proc. IEEE, vol. 91, no. 8, pp. 1163-1171, Aug. 2003.
APPENDIX B: TYPICAL AND ATYPICAL DAYS
APPENDIX C: SOURCE CODE SNIPPET
Not all source code is provided here, as the total source code amounts to more
than 4000 lines of C++ and would be impractical to include in full.
// ======================================================== // HMMx.cpp // * implements HMM functions // ======================================================== #include "HMMx.hpp" #include #include #include "../../../util/timer.hpp" using std::cout; inline bool generateNewInterpolateState(arma::Col &states, int numStates); template void HMMx::interpolate(int guessLen, unsigned int prevState, unsigned int forwState, arma::mat &guessObservations, arma::Col &guessStates, bool isCHMM) { TM_START; /** * Brute-force checking * We will only permute up to 8 states */ if (guessLen > 8) { // Generate the excess int toFillIn = guessLen - 8; arma::mat frontDataSeq; // remove dim arma::Col frontStateSeq(toFillIn); if (isCHMM) { HMMx *chmm = (HMMx*)this; chmm->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState); } else this->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState); // Interpolate the rest arma::mat backDataSeq(guessObservations.n_rows, 8);
Page | D
arma::Col backStateSeq(8); this->interpolate(8, frontStateSeq(toFillIn-1), forwState, backDataSeq, backStateSeq, isCHMM); guessObservations.cols(0, toFillIn-1) = frontDataSeq; guessObservations.cols(toFillIn, guessLen-1) = backDataSeq; guessStates.rows(0, toFillIn-1) = frontStateSeq; guessStates.rows(toFillIn, guessLen-1) = backStateSeq; } else { double bestLikelihood = 0; double currLikelihood = 1; arma::Col bestTrial(guessLen, arma::fill::zeros); arma::Col currTrial(guessLen, arma::fill::zeros); int numStates = this->Transition().n_cols; while (true) { // Iterate through another state bool validity = generateNewInterpolateState(currTrial, numStates); if (!validity) // no more new states available break; // Evaluate probability currLikelihood = 1; for (int i=0; iTransition()(currTrial(i), prevState); else currLikelihood *= this->Transition()(currTrial(i), currTrial(i-1)); } currLikelihood *= this->Transition()(forwState, currTrial(guessLen-1)); // Evaluate probability (is it better?) if (currLikelihood > bestLikelihood) { bestLikelihood = currLikelihood; bestTrial = currTrial; } } guessStates = bestTrial; // generate emissions
Page | E
for (int i=0; iEmission().at(guessStates(i)); val = gmm.Random(); } else val = this->Emission().at(guessStates(i)).Random(); guessObservations.col(i) = val; } } TM_STOP; PRINTTIME; } inline bool generateNewInterpolateState(arma::Col &states, int numStates) { // backtracking int i = states.n_rows-1; while (true) { if ((int)states(i) != numStates-1) // if we still haven't iterated all for curr state index { states(i) ++; for (unsigned int j=i+1; j
// ========================================================
// HMMx.hpp
// * header file for HMMx.cpp
// ========================================================
#ifndef HMMX_HPP_
#define HMMX_HPP_

// (the names of the five library/system headers here were lost in
//  extraction; presumably mlpack's HMM/GMM headers, <cstdlib> and <ctime>)
#include <mlpack/core.hpp>
#include <mlpack/methods/hmm/hmm.hpp>
#include <mlpack/methods/gmm/gmm.hpp>
#include <cstdlib>
#include <ctime>
#include "distribution/DiscreteDistri.hpp"

using namespace mlpack::hmm;
using namespace mlpack::gmm;

/**
 * Changes are:
 * - Transition states by default are no longer homogeneous.
 */
template<typename Distribution>
class HMMx : public HMM<Distribution>
{
    bool isCHMM;

public:
    HMMx(const size_t states, const Distribution emissions, bool isCHMM,
            const double tolerance = 1e-5):
        HMM<Distribution>(states, emissions, tolerance)
    {
        this->isCHMM = isCHMM;
        double variance = this->Transition().at(0) * 0.1;
        srand(time(NULL));
        for (unsigned int i=0; i<this->Transition().size(); ++i)
        {
            if (rand()%2 == 0)
                this->Transition().at(i) += variance * rand() / RAND_MAX;
            else
                this->Transition().at(i) -= variance * rand() / RAND_MAX;
        }
        // normalise
        for (unsigned int i=0; i<this->Transition().n_cols; ++i)
        {
            double sum = accu(this->Transition().col(i));
            this->Transition().col(i) /= sum;
        }
    }

    HMMx(const arma::mat& transition, const std::vector<Distribution>& emission,
            bool isCHMM, const double tolerance = 1e-5):
        HMM<Distribution>(transition, emission, tolerance)
    {
        this->isCHMM = isCHMM;
    }

    /**
     * @return 1 if GMM, 0 if not.
     *
     * Abandoned. You need to uncast it from a pointer or you access invalid memory anyway
     */
    /*int distributionType() const
    {
        if (isCHMM)
            return 1;
        else
            return 0;
    }*/

    /**
     * Assuming you have a break in observation results, interpolate will reconstruct the missing bits for you.
     */
    void interpolate(int guessLen, unsigned int prevState, unsigned int forwState,
            arma::mat &guessObservations, arma::Col<size_t> &guessStates, bool isCHMM);

    void interpolate(int guessLen, const arma::mat &prevObservations,
            const arma::mat &forwObservations, arma::mat &guessObservations,
            arma::Col<size_t> &guessStates, bool isCHMM)
    {
        arma::Col<size_t> prevStates;
        this->Predict(prevObservations, prevStates);
        arma::Col<size_t> forwStates;
        this->Predict(forwObservations, forwStates);
        this->interpolate(guessLen, prevStates(prevStates.n_rows-1), forwStates(0),
                guessObservations, guessStates, isCHMM);
    }
};

#endif /* HMMX_HPP_ */
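The first HMMx constructor above breaks the homogeneous initialisation of the transition matrix by jittering each entry and then renormalising every column to sum to one (the column-stochastic convention, where entry (i, j) is the probability of moving to state i from state j). A self-contained sketch of that perturb-and-normalise step on a plain column-major matrix, with hypothetical names, no Armadillo, and `std::mt19937` in place of `rand()`:

```cpp
#include <vector>
#include <random>

// Jitter each entry of a column-stochastic matrix by up to ±10% of its value,
// then renormalise every column to sum to 1. Columns are stored as
// cols[c][r]. (Hypothetical standalone sketch of the HMMx constructor step.)
void perturbAndNormalise(std::vector<std::vector<double>> &cols,
                         std::mt19937 &rng)
{
    std::uniform_real_distribution<double> u(-0.1, 0.1);
    for (auto &col : cols)
    {
        double sum = 0.0;
        for (double &p : col) { p += p * u(rng); sum += p; }  // jitter
        for (double &p : col) p /= sum;                       // renormalise
    }
}
```

A seeded generator makes the perturbation reproducible, unlike the `srand(time(NULL))` call in the listing.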
// ========================================================
// HMMFunc.cpp
// * saves the HMM model to disk and loads it
// ========================================================
#include "HMMFunc.h"

// (the names of four system headers here were lost in extraction;
//  presumably <fstream>, <sstream>, <algorithm> and <cstdio> among them)
#include <fstream>
#include <sstream>
#include <algorithm>
#include <cstdio>
#include "../../../util/fileExists.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/map/mapper_kmeans.h"
#include "../../model/map/mapperMv_kmeans.h"

using std::string;
using std::vector;
using std::ifstream;
using std::ostringstream;

vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc);

// ---
// Properties
// ---

/** Get a list of all available models of HMM that we can load. */
vector<int> HMMFunc::getAvailableModels() const
{
    return getAvailable((char *) "hmm", this);
}

/** Get a list of all available training sets that we can use. */
vector<int> HMMFunc::getAvailableTrains() const
{
    return getAvailable((char *) "train0", this);
}

/** Get a list of all available stuff that we can load. */
vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc)
{
    vector<int> results;
    int lastResult = 0;
    for (int i=0; i < SEARCH_LIMIT; ++i) // (upper bound lost in extraction;
                                         //  SEARCH_LIMIT is a placeholder)
    {
        // (construction of fileURI was lost in extraction; presumably an
        //  ostringstream composing getFileSaveName(), the index i and searchStr)
        ostringstream fileURI;
        fileURI << hmmFunc->getFileSaveName() << i << "." << searchStr;

        if (file_exist(fileURI.str().c_str()))
        {
            results.push_back(i);
            lastResult = i;
        }
        // DEBUG
        /*else
            std::cout << ... (debug trace truncated in extraction) */
    }
    return results; // (reconstructed; the end of getAvailable and the opening
                    //  of HMMFunc::saveTrainingSet were lost in extraction)
}
// (we are inside HMMFunc::saveTrainingSet here; the function opening and the
//  loop that saves each data matrix were lost in extraction)

        // Will always be false unless there is one True
        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d", this->getFileSaveName().c_str(), trainIndex, i);
            result |= !(states->at(i).save(fileURI, arma::arma_ascii));
        }
    }

    // Searches for any file after this index and deletes it
    // Important because this is our indicator for vector termination
    sprintf(fileURI, "%s%d.train%d", this->getFileSaveName().c_str(), trainIndex, (int)data.size());
    if (file_exist(fileURI))
        remove(fileURI);

    // If there is no state info, we make sure there is no state file saved as well
    if (states == NULL)
    {
        sprintf(fileURI, "%s%d.trainStates0", this->getFileSaveName().c_str(), trainIndex);
        if (file_exist(fileURI))
            remove(fileURI);
    }

    std::cout << "..."; // (console message lost in extraction)
    return result;      // (reconstructed; the end of saveTrainingSet and the
                        //  opening of loadTrainingSet were lost in extraction)
}
// (we are inside HMMFunc::loadTrainingSet here; its opening lines were lost
//  in extraction)
    sprintf(fileURI, "%s%d.trainStates0", this->getFileSaveName().c_str(), trainIndex);
    if (file_exist(fileURI))
        states = new vector<arma::Col<size_t>>();

    for (int i=0; ; ++i)
    {
        sprintf(fileURI, "%s%d.train%d", this->getFileSaveName().c_str(), trainIndex, i);

        // Check if there is any matrices left to read
        if (!file_exist(fileURI))
            break;

        // Load it!
        arma::mat myMatrix;
        myMatrix.load(fileURI, arma::arma_ascii);
        data.push_back(myMatrix);

        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d", this->getFileSaveName().c_str(), trainIndex, i);
            arma::Col<size_t> colvec;
            colvec.load(fileURI, arma::arma_ascii);
            states->push_back(colvec);
        }
    }

    // Metadata loading
    HMM_meta metadata;
    {
        sprintf(fileURI, "%s%d.trainMeta", this->getFileSaveName().c_str(), trainIndex);
        FILE* fp = fopen(fileURI, "r");

        // obtain filesize
        fseek(fp, 0, SEEK_END);
        long lsize = ftell(fp);
        rewind(fp);

        char *buffer = new char[lsize+1];
        fread(buffer, sizeof(char), lsize, fp);
        buffer[lsize] = '\0'; // terminate before constructing the string
        fclose(fp);
        string text = string(buffer);
        metadata = HMM_meta::fromString(text);
        delete [] buffer;
    }
    return metadata;
}

// ---
// ---

/** Load HMM model from local file */
HMM_meta HMMFunc::load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const
{
    // Check if index exists
    vector<int> indices = this->getAvailableModels();
    if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
        std::cout << "..."; // (error message lost in extraction, along with
                            //  the following lines that load the metadata,
                            //  transition matrix and discrete emissions)
            mapper = new KMeansMapper(keysVal);
        }
        else // multi-variate
        {
            // load keysVal
            vector<arma::vec> keysVal;
            for (unsigned int i=0; ; ++i)
            // (the loop condition, its body and the lines up to the GMM
            //  covariance loading below were lost in extraction)
                sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
                covarSingle.load(fileURI);
                mean.push_back(meanSingle);
                covar.push_back(covarSingle);
            }
            GMM gmm(mean, covar, weight);
            emit.push_back(gmm);
        }
        hmm = (HMMx<DiscreteDistri>*) new HMMx<GMM>(transition, emit, true, metadata.tolerance);
            // (template arguments and the isCHMM flag on this line were lost
            //  in extraction; reconstructed)
        printf("[DEBUG] dimension is %d", hmm->Dimensionality());
    }
    return metadata;
}

/** Save HMM model to local file */
int HMMFunc::save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex) const
{
    // Find an index for it
    if (hmmIndex == -1)
    {
        vector<int> indices = this->getAvailableModels();
        while (1)
        {
            hmmIndex ++;
            if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
                break;
        }
    }

    char fileURI[999];
    string basicFileURI;
    {
        ostringstream oss;
        oss << this->getFileSaveName() << hmmIndex; // (tail of this block lost
                                                    //  in extraction; reconstructed)
        basicFileURI = oss.str();
    }
    // Save transition
    hmm->Transition().save((basicFileURI+".hmmTrans").c_str(), arma::arma_ascii);

    // Save emission
    if (metadata.isCHMM == false)
    {
        vector<DiscreteDistri> emit = hmm->Emission();
        for (unsigned int i=0; i<hmm->Transition().n_cols; ++i)
        {
            sprintf(fileURI, "%s.hmmEmit%d", basicFileURI.c_str(), i);
            emit[i].Probabilities().save(fileURI, arma::arma_ascii);
        }

        // Save mapper
        if (typeid(emit[0].getMapper()).name() == typeid(KMeansMapper).name())
        {
            KMeansMapper *mapper = (KMeansMapper*) &(emit[0].getMapper());
            vector<double> keysVal = mapper->getKeysVal();
            // Not doing keys because it is boost::unordered_map, very troublesome
            // Also can be derived from keysVal later anyway.

            // put keysVal into a row vector
            arma::rowvec *keyVal_mat = new arma::rowvec(keysVal.size());
            for (unsigned int j=0; j<keysVal.size(); ++j)
                keyVal_mat->at(j) = keysVal[j];

            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
        else
        {
            KMeansMvMapper *mapper = (KMeansMvMapper*) &(emit[0].getMapper());
            vector<arma::vec> keysVal = mapper->getKeysVal();

            // put keysVal into a matrix
            arma::mat *keyVal_mat = new arma::mat(mapper->get_dimensions(), keysVal.size());
            for (unsigned int j=0; j<keysVal.size(); ++j)
                keyVal_mat->col(j) = keysVal[j];
            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
    }
    else
    {
        vector<GMM> *emit = (vector<GMM>*) &(hmm->Emission());
        for (unsigned int i=0; i<emit->size(); ++i) // for each state
        {
            sprintf(fileURI, "%s.hmmEmit%dWeight", basicFileURI.c_str(), i);
            emit->at(i).Weights().save(fileURI, arma::arma_ascii);

            vector<arma::mat> covar = emit->at(i).Covariances();
            vector<arma::vec> mean = emit->at(i).Means();
            for (unsigned int j=0; j<emit->at(i).Gaussians(); ++j) // for each state there are Gaussians
            {
                sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
                covar[j].save(fileURI, arma::arma_ascii);
                sprintf(fileURI, "%s.hmmEmit%dMean%d", basicFileURI.c_str(), i, j);
                mean[j].save(fileURI, arma::arma_ascii);
            }
        }
    }
    return hmmIndex;
}
// ========================================================
// HMMFunc.h
// * header file for HMMFunc.cpp
// ========================================================
#ifndef HMMFUNC_H_
#define HMMFUNC_H_

// (the names of three system headers here were lost in extraction;
//  presumably <string> and <vector> among them)
#include <string>
#include <vector>
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"

using std::string;
using std::vector;

class HMMFunc
{
    string fileSaveName;

public:
    /**
     * @param _fileSaveName Assumed unique file access location for saving to
     */
    HMMFunc(string _fileSaveName):fileSaveName(_fileSaveName) {}

    // ---
    // Properties
    // ---
    string getFileSaveName() const { return fileSaveName; }

    /** Get a list of all available models of HMM that we can load. */
    vector<int> getAvailableModels() const;

    /** Get a list of all available training sets that we can use. */
    vector<int> getAvailableTrains() const;

    // ---
    // Methods (Training Set)
    // ---
    /** Save data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model saving
     * @param states If there is state info present, please put it in. Else leave as NULL.
     * @return True if saving was successful.
     */
    bool saveTrainingSet(HMM_meta metadata, const vector<arma::mat> &data,
            const vector<arma::Col<size_t>> *states, int trainIndex) const;

    /** Load data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model loading
     * @param states If there is state info present, it will be loaded into the pointer. Else it is NULL.
     */
    HMM_meta loadTrainingSet(vector<arma::mat> &data,
            vector<arma::Col<size_t>>* &states, int trainIndex) const;

    // ---
    // Methods (HMM Models)
    // ---
    /** Save HMM model to local file */
    int save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex=-1) const;

    /** Load HMM model from local file */
    HMM_meta load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const;
};

#endif /* HMMFUNC_H_ */
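HMMFunc locates saved artefacts purely by filename convention: model and training files share a base name from getFileSaveName(), followed by an integer index and a suffix, and getAvailable() simply probes the filesystem for each index in turn. A standalone sketch of that index scan, with hypothetical names and `std::ifstream` standing in for `file_exist()`:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Report which indices i in [0, maxIndex) have a file named
// <base><i>.<suffix> on disk, mirroring the index-scan convention used by
// HMMFunc::getAvailableModels() ("hmm" / "train0" suffixes). All names here
// are hypothetical standalone stand-ins.
std::vector<int> availableIndices(const std::string &base,
                                  const std::string &suffix, int maxIndex)
{
    std::vector<int> found;
    for (int i = 0; i < maxIndex; ++i)
    {
        std::string uri = base + std::to_string(i) + "." + suffix;
        if (std::ifstream(uri).good())   // plays the role of file_exist()
            found.push_back(i);
    }
    return found;
}
```

A deleted file leaves a hole in the returned list, which is why save() with hmmIndex == -1 searches for the first unused index rather than assuming the list is contiguous.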
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS
1. INTRODUCTION
   1.1 Background & Motivation
   1.2 Objective of Thesis
   1.3 Thesis Organisation
2. LITERATURE REVIEW
   2.1 Occupancy Detection
       2.1.1 Passive Infrared Sensors
       2.1.2 Simulation Modelling
       2.1.3 Hidden Markov Modelling
   2.2 Mathematical Tools
       2.2.1 Regression Algorithms
       2.2.2 Clustering Algorithms
       2.2.3 Stochastic Classifier Algorithms
3. HIDDEN MARKOV MODELS
   3.1 Markov Chain
   3.2 Hidden Markov Chain
       3.2.1 Problem 1 of HMM: Evaluation
       3.2.2 Problem 2 of HMM: Decoding
       3.2.3 Problem 3 of HMM: Learning (Estimation)
       3.2.4 Problem 3 of HMM: Learning (Baum-Welch)
       3.2.5 Problem 4 of HMM: Generation
   3.3 Continuous HMM
       3.3.1 Multivariate CHMM
4. HARDWARE AND SOFTWARE IMPLEMENTATION
   4.1 Embedded Board
       4.1.1 Comparison of Features
   4.2 Software Development Platform
   4.3 Software Architecture
   4.4 Software Implementation
       4.4.1 Serialization of HMM
       4.4.2 Scheduling Processing and Server Mutex Functions
5. EXPERIMENTS & RESULTS
   5.1 Dataset
       5.1.1 Ground Truth Value
   5.2 Experimental Setup
   5.3 Capability Test
       5.3.1 Test on Evaluating Day of the Week
   5.4 Test on Occupancy Decoding
   5.5 Test on Occupancy Interpolation
   5.6 Test on Occupancy Extrapolation
6. LIMITATIONS AND RECOMMENDATIONS
   6.1 Highly Correlated Data
   6.2 Dynamically Improving HMM
   6.3 Decision Fusion
7. CONCLUSION
APPENDIX A: BIBLIOGRAPHY
APPENDIX B: TYPICAL AND ATYPICAL DAYS
APPENDIX C: SOURCE CODE SNIPPET