
  • Event Processing at Sensor Nodes in the Cloud

    Submitted by Lee Jun Hui A0067228B

    Department of Electrical & Computer Engineering

    In partial fulfilment of the requirements for the Degree of

Bachelor of Engineering, National University of Singapore

  • Page | i

    ABSTRACT

    Engineers are able to acquire large streams of environmental data, often from

    scattered independent sensors. To properly make sense of the data however, there

    needs to be a system that can handle the incoming streams as well as a

    mathematical analysis to make sense of the data.

    This project applies the event processing problem to the domain of occupancy

detection. Conventional occupancy detection approaches rely on multiple

tolerance checks or on simulation modelling.

    The contribution of this project is the usage of the statistical properties of the data

    through a Hidden Markov Model (HMM) to detect and forecast events emerging

    from the hidden states of a multi-dimensional sensor stream. The underlying

occupancy state of the environment is deduced, and the short-term future

occupancy is forecast. The system relies on a pre-trained HMM model and

performs its calculations in real time. Decoded occupancy states, as well as

interpolated and extrapolated states, are found to be accurate to within a 30% error margin on average.

  • Page | ii

    ACKNOWLEDGEMENTS

    The author would like to express his greatest gratitude towards his supervisor,

Professor Tham Chen-Khong, for his guidance and support towards his project.

    He is grateful to be able to work on such an interesting statistical pattern analysis

    project.

    The author would also like to thank his examiner, Dr. Mohan Gurusamy, for the

    time spent on the assessment of the project.

    Finally, the author would like to show his appreciation to his graduate research

    assistant, Li Qiang, for his technical guidance and encouragement throughout the

    course of the project.

  • Page | iii

    TABLE OF CONTENTS

    ABSTRACT ............................................................................................................. i

    ACKNOWLEDGEMENTS .................................................................................... ii

    TABLE OF CONTENTS ....................................................................................... iii

    LIST OF TABLES ................................................................................................. vi

    LIST OF FIGURES .............................................................................................. vii

    LIST OF SYMBOLS AND ABBREVIATIONS ................................................ viii

    1. INTRODUCTION .............................................................................................. 1

    1.1 Background & Motivation ............................................................................ 1

    1.2 Objective of Thesis ....................................................................................... 1

    1.3 Thesis Organisation ....................................................................................... 2

    2. LITERATURE REVIEW.................................................................................... 3

    2.1 Occupancy Detection .................................................................................... 3

    2.1.1 Passive Infrared Sensors ........................................................................ 3

    2.1.2 Simulation Modelling ............................................................................ 3

    2.1.3 Hidden Markov Modelling ..................................................................... 4

    2.2 Mathematical Tools ....................................................................................... 4

    2.2.1 Regression Algorithms ........................................................................... 4

    2.2.2 Clustering Algorithms ............................................................................ 5

    2.2.3 Stochastic Classifier Algorithms ............................................................ 6

    3. HIDDEN MARKOV MODELS ......................................................................... 7

  • Page | iv

    3.1 Markov Chain ............................................................................................... 7

    3.2 Hidden Markov Chain ................................................................................... 8

3.2.1 Problem 1 of HMM: Evaluation ........................................ 10

3.2.2 Problem 2 of HMM: Decoding .......................................... 11

3.2.3 Problem 3 of HMM: Learning (Estimation) ...................... 12

3.2.4 Problem 3 of HMM: Learning (Baum-Welch) .................. 13

3.2.5 Problem 4 of HMM: Generation ........................................ 15

    3.3 Continuous HMM ....................................................................................... 16

    3.3.1 Multivariate CHMM ............................................................................ 18

    4. HARDWARE AND SOFTWARE IMPLEMENTATION ............................... 20

    4.1 Embedded Board ......................................................................................... 20

    4.1.1 Comparison of Features ....................................................................... 21

    4.2 Software Development Platform ................................................................. 23

    4.3 Software Architecture ................................................................................. 24

    4.4 Software Implementation ............................................................................ 26

    4.4.1 Serialization of HMM .......................................................................... 26

    4.4.2 Scheduling Processing and Server Mutex Functions ........................... 29

    5. EXPERIMENTS & RESULTS ......................................................................... 33

    5.1 Dataset ......................................................................................................... 33

    5.1.1 Ground Truth Value ............................................................................. 34

    5.2 Experimental Setup ..................................................................................... 35

  • Page | v

    5.3 Capability Test ............................................................................................ 35

    5.3.1 Test on Evaluating Day of the Week ................................................... 36

    5.4 Test on Occupancy Decoding ..................................................................... 37

    5.5 Test on Occupancy Interpolation ................................................................ 38

    5.6 Test on Occupancy Extrapolation ............................................................... 40

    6. LIMITATIONS AND RECOMMENDATIONS .............................................. 42

    6.1 Highly Correlated Data ............................................................................... 42

    6.2 Dynamically Improving HMM ................................................................... 42

    6.3 Decision Fusion ........................................................................................... 43

    7. CONCLUSION ................................................................................................. 45

    APPENDIX A: BIBLIOGRAPHY ......................................................................... A

    APPENDIX B: TYPICAL AND ATYPICAL DAYS ............................................ B

    APPENDIX C: SOURCE CODE SNIPPET ........................................................... C

  • Page | vi

    LIST OF TABLES

    Table 1 Comparison of features of 3 Embedded Boards ................................... 21

    Table 2 Interpolation statistics for different Gap Sizes and Time Periods ........ 40

    Table 3 Extrapolation statistics for different Gap Sizes and Time Periods ....... 41

  • Page | vii

    LIST OF FIGURES

    Figure 1 2-state Markov Chain ............................................................................ 7

    Figure 2 2-state 3-emission Hidden Markov Chain ............................................. 8

    Figure 3 Gaussian Mixture Model within a HMM ............................................ 17

    Figure 4 A PandaBoard embedded device showing cable and connections ...... 23

    Figure 5 UML Diagram of Software Architecture ............................................. 25

    Figure 6 Directory of a serialized HMM model and training set ....................... 27

    Figure 7 Serialized Contents of HMM Metadata ............................................... 28

    Figure 8 Serialized Contents of HMM State Transition Matrix ........................ 28

    Figure 9 Serialized Contents of a GMM emission mean (top) and covariance

    (bottom) ................................................................................................................. 29

    Figure 10 Layering Server and Processing functions into Network and

    Application side logic ........................................................................................... 30

    Figure 11 Process Flowchart for Mutex Access ................................................ 31

    Figure 12 Sensor measurements across a 4-day period ..................................... 33

    Figure 13 Occupancy inferred from power consumption .................................. 34

    Figure 14 Log-likelihood of observation belonging to a Weekday Model ........ 37

    Figure 15 Decoded Error % for Occupancy ....................................................... 38

    Figure 16 Interpolation results for different Gap Sizes and Time Periods ........ 40

    Figure 17 Extrapolation results for different Gap Sizes and Time Periods ....... 41

    Figure 18 Data Fusion of streams into a vector before decision making ........... 43

    Figure 19 Decision Fusion of streams decision upstream into a final decision 44

  • Page | viii

    LIST OF SYMBOLS AND ABBREVIATIONS

    PIR Passive Infrared [Sensor]

    HVAC Heating, Ventilation, and Air-conditioning

    HMM Hidden Markov Model

    DHMM Discrete Hidden Markov Model

    CHMM Continuous Hidden Markov Model

    GMM Gaussian Mixture Model

    K-Means K-Means Clustering Algorithm

  • Page | 1

    1. INTRODUCTION

    1.1 Background & Motivation

    With the arrival of smart embedded systems that are capable of high processing

    load, as well as lightweight sensors that can be cheaply deployed, engineers have

    gained the ability to monitor and analyse our environment in greater detail than

    before. Engineers can plant multiple sensor devices within the living environment

    that will gather and relay measurements towards a central node for further

processing, detecting event changes in the environment as they occur.

    With so many incoming data streams, there is a great opportunity to make use of

    statistical and probability models to extract greater information and detect events

    that may not be readily observable. Engineers can also make use of the temporal

    aspect of the data to place additional constraints on event detection, as well as

make inferences about the future. Such an event detection system could be used to

detect things such as abnormalities in the health of a patient, or to make sense

of data through the detection of recurrent patterns.

    1.2 Objective of Thesis

    This thesis aims to demonstrate that statistical modelling and machine learning

    can be used to effectively detect events when applied to the domain of occupancy

    detection. The project attempts to model the occupancy state of a typical student

    residential suite. It demonstrates that it is possible to deduce the occupancy of the

    suite through secondary data provided by environmental sensors, and also make

    short-term forecasts as well.

  • Page | 2

    The scenario of occupancy detection and modelling has many practical purposes

    as it allows a building facility manager to estimate human traffic loads in advance.

    Such information would be valuable for safety precautions and can also be

    exploited for sales and marketing purposes in shopping districts. On a smaller

    scale, a facility manager of a small office can also monitor the occupancy of its

    rooms and cubicles and tweak the heating, ventilation, and air conditioning

    (HVAC) policies of the building to optimize energy consumption.

    1.3 Thesis Organisation

    For the report, Chapter 2 will be a literature review covering existing methods for

    occupancy detection related to HVAC operations. The various possible

    mathematical tools that may be used to help identify events are also mentioned. In

    Chapter 3, readers will be presented with the basics of the HMM, which is the

    statistical model that the project is using, as well as the variants of HMM that the

    project has evaluated experimentally. Chapter 4 will discuss the hardware

    components and the software platform that the demonstration program has been

    deployed on, and also talk about the software architecture of the project. Chapter

    5 presents the experimental results that have been obtained. Chapter 6 discusses

    some of the issues that constrained the project and suggests further improvement

    work. Finally, Chapter 7 will conclude the results and provide details on how

    improvements can be made in the future.

  • Page | 3

    2. LITERATURE REVIEW

    2.1 Occupancy Detection

    2.1.1 Passive Infrared Sensors

    Commercially, the most popular means of occupancy detection is via the use of a

Passive Infrared (PIR) sensor. It measures the amount of infrared (IR) radiation that

reaches its field of view; when there is a change in the IR radiation, a movement

event is registered by the sensor, indicating the presence of an occupant.

    However, such a system often generates false negatives as it assumes a non-idle

    occupant.

    One way to improve the PIR sensor is to pair it up with a reed switch placed on a

doorway, which can detect whether the door is open or closed [1]. Oftentimes,

applying additional modes of measurement imposes further constraints onto

the detection and achieves greater accuracy.

    2.1.2 Simulation Modelling

    A paper from Liao and Barooah [2] made use of machine learning and simulation

    modelling. To improve sensor readings, a crowd simulation was constructed from

    past observation data. Due to the complexity of the simulation, it was run off-line,

    and the reduced-order statistics of the simulation results were compiled. These

    reduced-order statistics were then compared with present observational data to

    estimate the occupancy level. However, such a method requires a non-trivial

    simulation model as well as reliable room occupancy probabilities that have been

estimated through long-term observations of said environment.

  • Page | 4

    2.1.3 Hidden Markov Modelling

    The idea of using Hidden Markov Models to model occupancy is suggested in a

    smart thermostat project that uses a PIR sensor and a reed switch on the doorways

    [3]. The smart thermostat helps to prepare a comfortable environment for its

    house occupants by pre-empting their departure from and arrival home through

the predictive ability of a HMM. The accuracy of the HMM predictive model

(88%) over reactive algorithms (78%) is convincing proof that HMMs

can be effectively applied to the domain of home occupancy.

    2.2 Mathematical Tools

    There are several mathematical tools that make use of stochastic processes,

    sequence labelling, and clustering algorithms to help bring clarity to an otherwise

    chaotic data collection.

    2.2.1 Regression Algorithms

    In regression algorithms, there are 2 notable variants of unsupervised regression

    algorithms: the Independent Component Analysis (ICA) and the Principal

    Component Analysis (PCA). These 2 algorithms attempt to separate a data set into

    additive subcomponents, where each subcomponent is maximally independent or

    has maximum variance, respectively.

    An example of ICA utility is in electroencephalography (EEG). It can

    automatically identify a number of channels that are statistically independent from

    each other. White noise as well as EEG artefacts like ocular movement can be

    identified and subtracted additively while preserving core data. However, ICA is

    not as relevant in this project as it assumes that the identified channels are

  • Page | 5

statistically independent from each other. This is not the case: environmental

data in the domain of HVAC are often heavily correlated.

    One of the applications of PCA is in the field of data compression and

    visualisation. Given a collection of n-dimensional vectors, PCA decomposes it

into eigenvalue and eigenvector pairs. Eigenvectors associated with the largest

eigenvalues, i.e. those capturing the most variance, are kept; the rest are discarded.

In that sense, a multi-dimensional, or multi-axial, graph has been remapped onto

new eigenvectors, having discarded the insignificant axes of low variance. The result

is a collection of vectors of lower dimensionality, which is nevertheless still able to approximately reconstruct the original dataset.

    Thus, the purpose of PCA is dimensionality reduction. While it is certainly useful

    as a pre-processing step, the project does not require a lot of dimensionality (due

    to limited sensor types). A simple cluster analysis, as described in section 2.2.2,

    will suffice.

    2.2.2 Clustering Algorithms

    As mentioned in section 2.2.1, cluster analysis is a more appropriate tool to pre-

    process our data. This is because PCA operates on variable reduction, while

    cluster analysis works on observation reduction. Essentially, cluster analysis

groups like observations. It reduces the number of unique observations to a

    limited set, somewhat akin to an Analogue-to-Digital Converter (ADC) that

    quantizes continuous values to discrete buckets.

    One of the more well-known clustering algorithms is the K-Means Clustering

    Algorithm (K-Means). By making use of a metric between members, often the

    Squared Euclidean Distance, a data set is grouped into k clusters where members

    of each cluster have the nearest metric. This quantization process is crucial to the

  • Page | 6

    usage of Discrete Hidden Markov Models (DHMM) in order to minimize the

    number of unique observation vectors, as described in Chapter 3.
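To make the quantization step concrete, the sketch below clusters a toy set of 2-dimensional observation vectors into 8 symbols and then maps each observation to its nearest centroid by squared Euclidean distance. It is only an illustration of the idea: the data, the choice of k, and the use of Armadillo's built-in kmeans() routine (rather than the project's actual clustering code) are assumptions.

```cpp
#include <armadillo>

// Quantize continuous observation vectors into k discrete symbols with
// K-Means, so they can be fed to a DHMM as emission symbols (cf. the ADC
// analogy above).
int main() {
    // Toy data: each column is one observation vector (e.g. two sensor modalities).
    arma::mat data(2, 200, arma::fill::randu);

    const arma::uword k = 8;            // number of clusters / emission symbols
    arma::mat centroids;
    if (!arma::kmeans(centroids, data, k, arma::random_subset, 20, false))
        return 1;                       // clustering failed

    // Map each observation to the index of its nearest centroid
    // (squared Euclidean distance), producing the discrete symbol sequence.
    arma::uvec symbols(data.n_cols);
    for (arma::uword n = 0; n < data.n_cols; ++n) {
        arma::vec d(k);
        for (arma::uword c = 0; c < k; ++c)
            d(c) = arma::accu(arma::square(data.col(n) - centroids.col(c)));
        symbols(n) = d.index_min();
    }
    symbols.head(10).print("first 10 quantized symbols:");
    return 0;
}
```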

    2.2.3 Stochastic Classifier Algorithms

    Stochastic classifiers attempt to put data into categorical labels based on their

    stochastic attributes. Because the project is operating on data streams, there is the

added dimension of time open for exploitation, which leads to the Markov Chain,

a temporal process based on stochastic transition probabilities.

    A Markov Chain is a system that is able to transition to one of several states

    depending on a stochastic process. To classify things however, the concept needs

    to be further extended to a Hidden Markov Chain (HMM). In a HMM, the states

    of the system are not observable; one can only observe the system through its

    emission observations, whose appearance is statistically dependent on the

    underlying hidden state.

    Through observations of the system over a period of time, it is possible to

    decipher the underlying state transitions that have led to the observations. It is also

possible to match up observations against several HMM models; whichever model

fits the observations best provides the label under which the observations are

classified.

    Because HMM is a robust and well-studied model that classifies the data on a

    temporal dimension, the project chooses HMM to be its mathematical tool to

    identify incoming events from the environmental sensor data.

  • Page | 7

    3. HIDDEN MARKOV MODELS

    In this Chapter, the report presents the concepts behind the Hidden Markov Model

    (HMM) in greater detail, as it is the mathematical tool the project uses to identify

    and process incoming events.

    3.1 Markov Chain

    A Markov Chain is a discrete-time system that transitions from one state to

another via a random process. Each state has its own fixed transition probabilities,

and the next transition is not affected by earlier states.

    Take for instance, a Markov chain modelling the breakfast habits of an individual.

It consists of 2 states which represent what the individual had for breakfast:

cereal or bread. Assuming that the system is memoryless, all that dictates the

    next breakfast is the transition probabilities of the current breakfast. This is

    represented by the diagram Figure 1 below:

    Figure 1 2-state Markov Chain

    According to Figure 1, if the current breakfast is cereal, there is a 40% chance that

    the next breakfast is still cereal, and a 60% chance it might be bread. If the current

    breakfast is bread, there is a 30% chance the next breakfast is still bread, but a 70%

  • Page | 8

    chance it could be cereal. This series of probabilities can be represented as a

    transition matrix:

$$A = \begin{pmatrix} 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix}$$
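As an illustration of how such a chain behaves, the following self-contained C++ sketch simulates a week of breakfasts by repeatedly drawing the next state from the current row of the transition matrix above. The seed and the loop length are arbitrary choices for the example.

```cpp
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
    // Transition matrix of the 2-state breakfast chain (rows sum to 1):
    // state 0 = cereal, state 1 = bread.
    const double A[2][2] = {{0.4, 0.6},
                            {0.7, 0.3}};
    const std::vector<std::string> names = {"cereal", "bread"};

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    int state = 0;                       // start with cereal
    for (int day = 0; day < 7; ++day) {  // simulate one week of breakfasts
        std::cout << "day " << day << ": " << names[state] << "\n";
        // Draw the next state from the current row of A.
        state = (uniform(rng) < A[state][0]) ? 0 : 1;
    }
    return 0;
}
```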

    3.2 Hidden Markov Chain

    A Hidden Markov Chain (HMM) extends the concept of the Markov Chain. Now,

    the states of the system are hidden and cannot be observed. However, the outputs

of the state, the emissions, are observable. Again, like the states, each emission

belongs to an emission space, and each state has its own emission

probability distribution.

Using the example in section 3.1, assume that the cereal and bread states each have 3

possible emissions: satiety, hunger, and bloated. Each state however has a different

    distribution function of emissions. This is represented by the updated HMM in

    Figure 2 below:

    Figure 2 2-state 3-emission Hidden Markov Chain

    These emissions can be represented by the following emission probability matrix:

  • Page | 9

$$B = \begin{pmatrix} 0.70 & 0.08 & 0.22 \\ 0.35 & 0.65 & 0 \end{pmatrix}$$

    Sometimes it is also useful to define an additional initial state distribution matrix.

    The initial state distribution matrix will give the probability that the system begins

    in a particular state. If for example, the system above has a 90% chance of starting

    in the cereal state, then the matrix is as follows:

$$\pi = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix}$$

To illustrate the usefulness of HMM, it is helpful to refer to the 3 conventional

problems that HMM can tackle, as famously described in Rabiner's 1989 paper

[4]:

1. Evaluation: Given an observation sequence, find the likelihood that it was

generated by this HMM model. Useful for comparing different models'

effectiveness in modelling a particular phenomenon, or for classifying

phenomena according to known models.

2. Decoding: Given an observation sequence, infer the most probable

hidden state transitions that have led up to the observations given. Useful

for uncovering the hidden states of a system.

3. Learning: Given an observation sequence, construct a HMM that is most

likely to have generated such an observation sequence. Useful for creating

a model using real-world data.

    All 3 problems are relevant to the project and have been implemented. There is

    also a 4th problem that is implemented in the project but is not usually included in

    the list of 3 HMM problems in literature:

  • Page | 10

4. Generation: Simulates future output by running the model, or fills up

gaps in observations. Useful for anticipating future changes in the system

or interpolating lost data packets.

    The report will now go into detail on how each problem can be solved via HMM.

3.2.1 Problem 1 of HMM: Evaluation

As mentioned, evaluation solves the problem where, given an observation

sequence, one has to find the likelihood that it was generated by a HMM model $\lambda$.

Assume that the HMM being used is the same one as that in Figure 2. Now also

assume that the hidden state sequence X is cereal, bread, bread. If given an

observation sequence O of bloated, hunger, satiety, it is possible to calculate the

likelihood of such a sequence appearing, $P(O, X \mid \lambda)$. The formula is:

$$P(O \mid X, \lambda) = P(o_1 \mid x_1)\, P(o_2 \mid x_2)\, P(o_3 \mid x_3) = b_{x_1}(o_1)\, b_{x_2}(o_2)\, b_{x_3}(o_3)$$

$$P(X \mid \lambda) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3}$$

$$P(O, X \mid \lambda) = P(O \mid X, \lambda)\, P(X \mid \lambda)$$

Problem 1 of HMM, evaluation, demands to know the likelihood of an

observation sequence O given a HMM $\lambda$. This is basically $P(O \mid \lambda)$, which is the sum of $P(O, X \mid \lambda)$ over all possible hidden state sequences X:

  • Page | 11

$$P(O \mid \lambda) = \sum_{X} P(O, X \mid \lambda)$$

Through a technique called the forward-pass algorithm $\alpha_t(i)$, it is possible to optimize the computational time complexity of this operation, as detailed in

Stamp's paper [5], reducing the equation to the following:

$$\alpha_t(i) = P(o_0, o_1, \ldots, o_t, x_t = q_i \mid \lambda) = \left[\sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji}\right] b_i(o_t)$$

$$P(O \mid \lambda) = \sum_{i=0}^{N-1} \alpha_{T-1}(i)$$

The forward-pass algorithm $\alpha_t(i)$ essentially computes the probability that state i is observed at time t, given the partial observation sequence from 0 to time t.

Now that it is possible to evaluate the likelihood of an observation sequence, by

pairing up the observation sequence O with different HMM models $\lambda$, it is

possible to find the most fitting HMM model. The system's observation is

therefore classified under the label of this particular model.

$$\lambda^{*} = \arg\max_{\lambda} P(O \mid \lambda)$$
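A minimal C++ sketch of the forward-pass evaluation described above follows; it assumes a discrete HMM whose parameters are held in plain nested vectors, which is an illustrative choice rather than the project's actual data structures. In practice the probabilities underflow quickly for long sequences, so a scaled or log-domain variant would be used.

```cpp
#include <cstddef>
#include <vector>

// Forward-pass (alpha) evaluation for a discrete HMM: returns P(O | lambda).
// A   : N x N transition matrix, A[i][j] = P(next state j | current state i)
// B   : N x M emission matrix,   B[i][k] = P(emission k | state i)
// pi  : initial state distribution
// obs : observation sequence as emission indices
double evaluate(const std::vector<std::vector<double>>& A,
                const std::vector<std::vector<double>>& B,
                const std::vector<double>& pi,
                const std::vector<std::size_t>& obs) {
    const std::size_t N = pi.size();
    std::vector<double> alpha(N), next(N);

    // Initialization: alpha_0(i) = pi_i * b_i(o_0)
    for (std::size_t i = 0; i < N; ++i)
        alpha[i] = pi[i] * B[i][obs[0]];

    // Induction: alpha_t(i) = [sum_j alpha_{t-1}(j) * a_ji] * b_i(o_t)
    for (std::size_t t = 1; t < obs.size(); ++t) {
        for (std::size_t i = 0; i < N; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < N; ++j)
                sum += alpha[j] * A[j][i];
            next[i] = sum * B[i][obs[t]];
        }
        alpha.swap(next);
    }

    // Termination: P(O | lambda) = sum_i alpha_{T-1}(i)
    double p = 0.0;
    for (double a : alpha) p += a;
    return p;
}
```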

3.2.2 Problem 2 of HMM: Decoding

    Problem 2 of HMM, which is decoding, tries to find the most probable hidden

    state transitions that have led up to the observations given. The algorithm used is a

    Dynamic Programming algorithm, which permutes through all possible

  • Page | 12

combinations of a state at each instance of time. For example, at time t = 0, the

formula is:

$$\delta_{t=0}(i) = \pi_i\, b_i(o_0)$$

Armed with calculations for every state, the time is advanced forward by one unit,

to determine the previous state i that gives the highest likelihood for each state j in the new

time instance:

$$\delta_{t=1}(j) = \max_i \left[\delta_{t=0}(i)\, a_{ij}\, b_j(o_1)\right]$$

This can be generalized to:

$$\delta_t(j) = \max_i \left[\delta_{t-1}(i)\, a_{ij}\, b_j(o_t)\right]$$

Consequently, by finding the maximum $\delta_{T-1}(j)$ and by recording the maximizing state at every step of the process, the state sequence most probable in generating the given

observation sequence can be found.
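The decoding recursion can be sketched as follows; again, the plain-vector representation is only for illustration, and a production version would work in the log domain to avoid underflow.

```cpp
#include <cstddef>
#include <vector>

// Viterbi decoding for a discrete HMM: returns the most probable hidden
// state sequence for the given observation sequence.
std::vector<std::size_t> decode(const std::vector<std::vector<double>>& A,
                                const std::vector<std::vector<double>>& B,
                                const std::vector<double>& pi,
                                const std::vector<std::size_t>& obs) {
    const std::size_t N = pi.size(), T = obs.size();
    std::vector<std::vector<double>> delta(T, std::vector<double>(N, 0.0));
    std::vector<std::vector<std::size_t>> back(T, std::vector<std::size_t>(N, 0));

    // Initialization: delta_0(i) = pi_i * b_i(o_0)
    for (std::size_t i = 0; i < N; ++i)
        delta[0][i] = pi[i] * B[i][obs[0]];

    // Recursion: delta_t(j) = max_i [delta_{t-1}(i) * a_ij] * b_j(o_t)
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            double best = -1.0; std::size_t arg = 0;
            for (std::size_t i = 0; i < N; ++i) {
                double v = delta[t - 1][i] * A[i][j];
                if (v > best) { best = v; arg = i; }
            }
            delta[t][j] = best * B[j][obs[t]];
            back[t][j]  = arg;   // remember which previous state was best
        }
    }

    // Termination and backtracking through the recorded best predecessors.
    std::vector<std::size_t> path(T);
    std::size_t last = 0;
    for (std::size_t j = 1; j < N; ++j)
        if (delta[T - 1][j] > delta[T - 1][last]) last = j;
    path[T - 1] = last;
    for (std::size_t t = T - 1; t > 0; --t)
        path[t - 1] = back[t][path[t]];
    return path;
}
```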

3.2.3 Problem 3 of HMM: Learning (Estimation)

    There are 2 ways to do HMM learning, which is the construction of a HMM based

    on training observation samples. The 2 methods are estimation and Baum-Welch.

    Both are used in the project.

Estimation-based learning is described in Blunsom's paper as a supervised

    approach to training [6]. The report, additionally, sees it as a way to build HMM

    models for solving Problem 2 of HMM. This is because the project requirement

    for decoding is to decode the hidden occupancy state of the system. In order to

    control the labeling process of the hidden states, the learning algorithm needs to

  • Page | 13

    be fed training observations that have been tagged with known hidden states.

The estimation process can be fed tagged training observations; Baum-Welch cannot.

The theory behind estimation-based learning is simple enough. Assuming that the

training observation sets are representative of the population, the frequency of

occurrence of a particular state in the training set approximates that state's

probability, as described mathematically:

$$P(q_i) = \frac{\#\ \text{occurrences of } q_i}{\#\ \text{observations}}$$

Many other attributes of the HMM can be estimated similarly, for instance the transition probabilities:

$$a_{ij} = P(q_j \mid q_i) = \frac{\#\ \text{transitions from } q_i \text{ to } q_j}{\#\ \text{occurrences of } q_i}$$
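As a sketch of this counting idea for the transition matrix (the initial distribution and the emission probabilities are estimated the same way), assuming the training sequences have already been tagged with hidden-state indices:

```cpp
#include <cstddef>
#include <vector>

// Estimate the transition matrix A from state sequences that have been
// tagged with known hidden states (supervised, estimation-based learning).
std::vector<std::vector<double>> estimate_transitions(
        const std::vector<std::vector<std::size_t>>& tagged_sequences,
        std::size_t num_states) {
    std::vector<std::vector<double>> counts(num_states,
                                            std::vector<double>(num_states, 0.0));
    // Count every observed transition q_i -> q_j in the training data.
    for (const auto& seq : tagged_sequences)
        for (std::size_t t = 0; t + 1 < seq.size(); ++t)
            counts[seq[t]][seq[t + 1]] += 1.0;

    // Normalise each row so that a_ij = (# q_i -> q_j) / (# occurrences of q_i).
    for (std::size_t i = 0; i < num_states; ++i) {
        double total = 0.0;
        for (double c : counts[i]) total += c;
        if (total > 0.0)
            for (double& c : counts[i]) c /= total;
    }
    return counts;
}
```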

3.2.4 Problem 3 of HMM: Learning (Baum-Welch)

    The second method of doing HMM learning is to make use of the Baum-Welch

    algorithm. It randomly initializes the HMM parameters, and then uses expectation

    maximization to adjust the parameters of the HMM model to a local maximum.

In this project, the Baum-Welch method is preferred over the estimation method

for learning a HMM model to solve Problem 1 of HMM (evaluation). The Baum-

Welch method is able to learn a much more precise HMM model. The downside

is that such a model has illegible hidden states: they are not conveniently

human-labelled hidden states mapped to occupancy, as Baum-Welch derives its

states by local optimization. Illegible hidden states are meaningless if

decoded. However, this setback is not applicable to model evaluation (Problem 1),

as only the evaluated likelihood is relevant, not the hidden states. Hence Baum-

  • Page | 14

    Welch learning is preferred over estimation learning for crafting HMM models for

    Problem 1.

    What follows is an explanation of the Baum-Welch algorithm.

    Note: The derivation of the Baum-Welch algorithm is rather complicated, and one

    may wish to skip this section if desired.

Firstly, one needs to define 3 more parameters: the backward pass $\beta_t(i)$, the gamma $\gamma_t(i)$, and the xi $\xi_t(i, j)$.

The backward pass $\beta_t(i)$ is similar to the forward pass $\alpha_t(i)$, except that instead of using the partial observation sequence from 0 to time t, it uses the partial

sequence from time t to the final time T-1 to find the probability of seeing state i

at time t:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_{T-1} \mid x_t = q_i, \lambda)$$

$$\beta_{T-1}(i) = 1, \qquad \beta_t(i) = \sum_{j=0}^{N-1} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \quad \text{for } 0 \le t < T-1$$

The gamma $\gamma_t(i)$ represents the probability that the current state is state i. If a state i gives the maximum value at time t, then that state is the most likely state at

time t.

$$\gamma_t(i) = P(x_t = q_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)\, \beta_t(j)}$$

The final parameter to define is the xi $\xi_t(i, j)$, which is the probability that the current state is state i and the next state is state j at time t.

  • Page | 15

$$\xi_t(i, j) = P(x_t = q_i,\, x_{t+1} = q_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

$$\gamma_t(i) = \sum_{j=0}^{N-1} \xi_t(i, j)$$

From these 3 parameters, it is possible to obtain re-estimates of the HMM model's parameters:

$$\pi_i = \gamma_0(i)$$

$$a_{ij} = \frac{\sum_{t=0}^{T-2} \xi_t(i, j)}{\sum_{t=0}^{T-2} \gamma_t(i)}$$

$$b_j(k) = \frac{\sum_{t \in \{0, 1, \ldots, T-2\},\; o_t = k} \gamma_t(j)}{\sum_{t=0}^{T-2} \gamma_t(j)}$$

    The learning algorithm is thus implemented as follows:

1. Initialize random model parameters for $\pi$, A and B.

2. Compute the intermediate parameters $\alpha_t(i)$, $\beta_t(i)$, $\xi_t(i, j)$ and $\gamma_t(i)$.

3. Re-estimate the model parameters $\pi$, A and B using the intermediate parameters.

4. Check the improvement of $P(O \mid \lambda)$. If it does not meet requirements, repeat from Step 2.

3.2.5 Problem 4 of HMM: Generation

In the project's self-defined Problem 4 of HMM, a simulation of the HMM

generates non-deterministic future states and emissions of the system. It can be

used to extrapolate and simulate the future. Or it can be used to interpolate

  • Page | 16

    and fill up gaps in the knowledge of past states of the system, for instance when

    data packets are lost or unrecoverable during network propagation.

    For extrapolation, a Gaussian distribution is used to produce values from 0.0 to

    1.0. Depending on the output, the system is advanced to the corresponding state

    based on the transition matrix. Another Gaussian roll is used to determine the

    emission from that state. This process continues until the desired forecasted length

    is reached. A random approach is used for extrapolation to ensure that the process

    is non-deterministic.

    For interpolation, the start and the end hidden state are known. An exhaustive

    permutation of all possible states in-between is conducted. The state sequence

    with the highest likelihood of appearing is thus selected.
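A compact sketch of the interpolation idea follows: it exhaustively enumerates every possible in-between state sequence and keeps the one with the highest transition likelihood. Emission terms are omitted since the observations in the gap are missing, and the cost grows as N to the power of the gap length, so this is only practical for short gaps; the function and parameter names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Recursive helper: extend the candidate in-between sequence one state at a
// time, tracking its transition probability, and keep the best completion.
static void search(const std::vector<std::vector<double>>& A,
                   std::size_t end_state, std::size_t gap_len,
                   std::size_t prev, double prob,
                   std::vector<std::size_t>& current,
                   std::vector<std::size_t>& best, double& best_prob) {
    if (current.size() == gap_len) {
        double total = prob * A[prev][end_state];   // close the gap onto the end state
        if (total > best_prob) { best_prob = total; best = current; }
        return;
    }
    for (std::size_t s = 0; s < A.size(); ++s) {
        current.push_back(s);
        search(A, end_state, gap_len, s, prob * A[prev][s], current, best, best_prob);
        current.pop_back();
    }
}

// Given the decoded state just before and just after a gap of gap_len missing
// samples, return the most probable sequence of hidden states filling the gap.
std::vector<std::size_t> interpolate(const std::vector<std::vector<double>>& A,
                                     std::size_t start_state, std::size_t end_state,
                                     std::size_t gap_len) {
    std::vector<std::size_t> current, best;
    double best_prob = -1.0;
    search(A, end_state, gap_len, start_state, 1.0, current, best, best_prob);
    return best;
}
```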

    3.3 Continuous HMM

    The previous HMM model described in earlier sections of Chapter 3 was that of a

Discrete HMM (DHMM) model. It is termed discrete because the emission

symbols are allowed to take on only specific values, for instance "hungry" and

"bloated". A continuous emission would, however, take on intermediate values like

a 0.5 feeling of hunger, or a 0.751 feeling of bloatedness.

    This can be achieved by representing the emissions of each state not as a

    collection of scalar values, but as a collection of Gaussian distributions, also

    known as a Gaussian Mixture Model (GMM). Each GMM contains a vector of

    weights, which dictates the weightage of each Gaussian distribution within.

    Because Gaussian distribution has a continuous-valued distribution function, it

    can represent a range of observations without needing to discretize it. Thus, a

  • Page | 17

    HMM that uses a GMM as its emission symbol is called a Continuous HMM

    (CHMM).

    To determine the probability of emission of an observation, the observation is fed

    into the GMM. The observation is fed into each Gaussian distribution in-turn, and

    the resultant probability gathered using a weighted sum. That sum is the

    probability of emission. This is illustrated in Figure 3 below:

    Figure 3 Gaussian Mixture Model within a HMM

In Figure 3, assume an observation x arrives at the state and GMM illustrated. The

probability of emission will be calculated as follows:

$$b_i(x) = 0.2\, \mathcal{N}(x, \mu_1, \Sigma_1) + 0.5\, \mathcal{N}(x, \mu_2, \Sigma_2) + 0.3\, \mathcal{N}(x, \mu_3, \Sigma_3)$$

For example, an observation that is highly similar to Gaussian 2 will return a near-

unitary value, but the GMM also dictates the probability of that value's occurrence,

so it is weighted by a factor of 0.5.
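The weighted-sum evaluation can be written down directly. The sketch below uses univariate Gaussians for readability; the multivariate case of Section 3.3.1 only changes the density function, and the component values are illustrative.

```cpp
#include <cmath>
#include <vector>

// One weighted Gaussian component of a (univariate) mixture.
struct Component { double weight, mean, variance; };

// Emission probability density of a scalar observation x under a GMM:
// b(x) = sum_k w_k * N(x; mu_k, sigma_k^2)
double gmm_density(const std::vector<Component>& gmm, double x) {
    const double two_pi = 2.0 * std::acos(-1.0);
    double density = 0.0;
    for (const Component& c : gmm) {
        double diff = x - c.mean;
        double n = std::exp(-diff * diff / (2.0 * c.variance)) /
                   std::sqrt(two_pi * c.variance);
        density += c.weight * n;   // weighted sum over the mixture components
    }
    return density;
}
```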

  • Page | 18

    3.3.1 Multivariate CHMM

    One of the additional benefits of using CHMM is its ability to take in observation

vectors, i.e. observations with more than 1 dimension. This is because a Multivariate

Normal distribution can be utilized instead, as indicated in Jackson's HMM

tutorial [7]:

$$\mathcal{N}(x, \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^{n}\, |\Sigma_{ik}|}} \exp\left(-\frac{(x - \mu_{ik})^{T}\, \Sigma_{ik}^{-1}\, (x - \mu_{ik})}{2}\right)$$

    Conveniently, the multivariate case reduces to a single-variate distribution when

    the number of dimensions is 1.

However, one of the great difficulties of using a multivariate normal distribution

is the presence of the inverse covariance matrix $\Sigma_{ik}^{-1}$ within the formula. A

matrix is non-invertible, or singular, when its rows are linearly dependent (correlated). This can arise

    especially when the training data sets are insufficient or too highly correlated.

A singular matrix is not invertible, so the pseudo-inverse matrix is used instead.

One well-known pseudo-inverse is the Moore-Penrose Pseudo-Inverse, obtained

using Singular Value Decomposition (SVD), commonly applied to solving linear equations. Here

matrix A is decomposed into the following:

$$A = U D V^{T} = \begin{bmatrix} u_1 & \cdots & u_M \end{bmatrix} \begin{bmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & 0 \end{bmatrix} \begin{bmatrix} v_1 & \cdots & v_N \end{bmatrix}^{T}$$

where A is the M×N matrix to decompose.

U is an M×M matrix; mathematically, U contains

columns of eigenvectors of $AA^{T}$.

D is an M×N diagonal matrix; mathematically, each diagonal entry $\sigma$ is a

  • Page | 19

singular value of A, and $\sigma^{2}$ is an eigenvalue of $AA^{T}$ or $A^{T}A$.

V is an N×N matrix; mathematically, V contains

columns of eigenvectors of $A^{T}A$.

If the diagonals of matrix D are not all non-zero, the matrix A is singular. The

pseudo-inverse is thus defined as:

$$A^{+} = V D^{+} U^{T}, \qquad D^{+} = \begin{bmatrix} \sigma_1^{-1} & & 0 \\ & \ddots & \\ 0 & & 0 \end{bmatrix}$$

where $D^{+}$ inverts only the non-zero singular values. Conveniently, $A^{+} = A^{-1}$ if A is a non-singular matrix.

The pseudo-determinant also has to be defined for calculating the determinant of

    the pseudo-inverse matrix. It is the product of all non-zero diagonal values in the

    diagonal matrix D of the SVD:

$$|A|_{+} = \prod_{i=1}^{r} \sigma_i, \qquad r = \text{number of non-zero singular values}$$

    Recall that the original multivariate Gaussian distribution was defined as:

$$\mathcal{N}(x, \mu_{ik}, \Sigma_{ik}) = \frac{1}{\sqrt{(2\pi)^{n}\, |\Sigma_{ik}|}} \exp\left(-\frac{(x - \mu_{ik})^{T}\, \Sigma_{ik}^{-1}\, (x - \mu_{ik})}{2}\right)$$

Now, the modified distribution for a singular covariance matrix is:

$$\mathcal{N}(x, \mu_{ik}, \Sigma_{ik}^{+}) = \frac{1}{\sqrt{|2\pi \Sigma_{ik}|_{+}}} \exp\left(-\frac{(x - \mu_{ik})^{T}\, \Sigma_{ik}^{+}\, (x - \mu_{ik})}{2}\right)$$

    Again, the singular multivariate normal distribution conveniently gives the same

    result as its non-singular counterpart when the covariance matrix is non-singular.
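A sketch of how the pseudo-inverse and pseudo-determinant can be computed with Armadillo's SVD is shown below; the tolerance used to decide which singular values count as zero is an assumption for illustration, and this is not presented as the project's exact routine.

```cpp
#include <armadillo>

// Pseudo-inverse and pseudo-determinant of a (possibly singular) covariance
// matrix via SVD, for use in the modified Gaussian density above.
void pseudo_inverse(const arma::mat& cov, arma::mat& cov_pinv, double& pdet) {
    arma::mat U, V;
    arma::vec s;                       // singular values, in descending order
    arma::svd(U, s, V, cov);

    const double tol = 1e-10;          // treat tiny singular values as zero
    arma::vec s_inv(s.n_elem, arma::fill::zeros);
    pdet = 1.0;
    for (arma::uword i = 0; i < s.n_elem; ++i) {
        if (s(i) > tol) {
            s_inv(i) = 1.0 / s(i);     // invert only the non-zero singular values
            pdet *= s(i);              // pseudo-determinant: product of non-zero values
        }
    }
    cov_pinv = V * arma::diagmat(s_inv) * U.t();   // A+ = V D+ U^T
}
```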

  • Page | 20

The underlying take-away is the theoretical reason why the project only

starts learning new HMM models when a large number of observations has been

collected for training, and why learning will fail if there are insufficient observations. It also

    highlights the fact that the HMM model may occasionally fail simply because the

    training observations were, by chance, highly correlated.

    This flaw can be mitigated through careful selection of training observation sets,

    as this project has done.

    4. HARDWARE AND SOFTWARE IMPLEMENTATION

    In this Chapter the project describes its hardware selection choices, and also

    details its software implementation.

    4.1 Embedded Board

The project's objective is to get data from tiny, distributed sensors. One of the

    possibilities for the future of the project is to explore building a networked

    collection of processing nodes that does layer-by-layer event processing, and

    relays the decision events up a hierarchical computing framework. Under such a

    vision, the processing nodes are situated locally in close proximity to the sensor

    nodes, and do real-time event processing at the source. As a result, there are

    certain requirements for the hardware of this processing node.

Firstly, it should be an embedded platform: this reduces the hardware and

deployment costs, allowing a hypothetical project budget to buy in quantity and

improve the processing-node-to-sensor-node ratio, i.e. fewer sensor nodes per

processing node. Purchasing desktop PCs would be impractical in comparison.

  • Page | 21

    Secondly, the embedded board should have a powerful CPU as the HMM

    algorithm can be quite computationally intense.

    Thirdly, the embedded board should be well-supported and backed by an active

    and mature community, which would potentially allow it to interoperate with

    more types of sensors.

    4.1.1 Comparison of Features

    Below are 3 different embedded boards that were under consideration in the

    project. The core metric here is CPU capability as well as sensor interoperability.

Feature          | PandaBoard ES                                   | BeagleBone Black               | Raspberry Pi
CPU              | Dual-core ARM Cortex-A9, up to 1.2 GHz each     | ARM Cortex-A8, 1.0 GHz         | ARM, 700 MHz
GPU              | SGX540 (OpenGL ES 1.1/2.0, OpenVG 1.1, EGL 1.3) | SGX530 with 3D acceleration    | Broadcom VideoCore IV, OpenGL ES 2.0
Operating System | Ubuntu, Android                                 | Ubuntu, Android                | Custom Debian/Fedora, Android
A/V I/O          | HDMI out, 3.5 mm audio out, stereo audio input  | HDMI                           | HDMI out, 3.5 mm audio out, stereo audio input
Memory           | 1 GB DDR2 RAM, SD/MMC                           | 512 MB RAM, 2 GB Flash, SD/MMC | 512 MB RAM, SD/MMC
Connectivity     | WiFi, Bluetooth                                 | -                              | -
Ports            | Ethernet, 3 USB 2.0                             | Ethernet, 1 USB, 1 Mini-USB    | Ethernet, 2 USB, 1 Mini-USB power
Power            | 440-710 mA (1)                                  | 210-460 mA @ 5 V               | 700 mA @ 5 V
Cost             | USD$182                                         | USD$45                         | USD$35

    Table 1 Comparison of features of 3 Embedded Boards

    1 http://www.omappedia.org/wiki/Panda_Test_Data

  • Page | 22

    In terms of CPU capability, the Raspberry Pi loses out because it does not have

    enough computation power, while the PandaBoard ES is clearly superior in that

respect, boasting a dual-core processor with 1 GB of RAM.

In terms of interoperability, the Operating System (OS) is examined. An OS

derived from a large, mainstream distribution would be convenient for 3rd party libraries as well as pre-built

compatibility libraries offered by the sensor manufacturers. Again, the Raspberry Pi

loses out, as it only supports Android and custom derivations of Debian and

Fedora, unlike the other 2 boards which have Ubuntu, a very popular Linux

distribution.

    Another metric relevant to interoperability is the number of ports available on the

    boards. PandaBoard ES supports 3 USB ports which is the highest amongst the

    boards listed.

    As the PandaBoard ES consistently ranks ahead in all of the important metrics, it

is the project's embedded board of choice.

  • Page | 23

    Figure 4 A PandaBoard embedded device showing cable and connections

In Figure 4 above, a PandaBoard is shown connected with the requisite hardware. At

the bottom is an SD card which contains the Ubuntu OS. On the right is an RS232

cable that allows a development PC to connect to the PandaBoard via a virtual

terminal in headless mode.

    At the top, from the right, is the 5V power cable, followed by an Ethernet cable

    which shares an Internet connection from the developmental PC via LAN. The

    USB ports are also located beneath it. And finally, a HDMI cable that provides

    graphical output to an attached monitor.

    4.2 Software Development Platform

    The PandaBoard ES offers Ubuntu and Android as operating systems, but the

    project decided on Ubuntu as it is more full-featured and has a software repository

    manager which makes it easy to pull and install software packages. The Ubuntu

  • Page | 24

    version used was 12.04 Precise Pangolin LTS, the most recent pre-built version

    available for PandaBoard.

Development work was done not on the PandaBoard ES, but on an Intel laptop

running Ubuntu 13.10 Saucy Salamander. Because both the PandaBoard ES and

the development platform are Linux systems, the source code is portable across

them, and it is much faster to compile and test on the more powerful processor.

    Similarly, although it is possible to run a POSIX environment on a Windows

    laptop via Cygwin compatibility layer, it is much more efficient to work natively

    on Linux on the development laptop, and compilation times are faster by an order

    of magnitude.

    The language of choice was C++, as the application needs to be able to run fast on

    the PandaBoard ES. Several 3rd party libraries were used. Firstly, Armadillo, a

    C++ matrix and linear algebra library was used. Secondly, Boost, a C++ utility

    library, was included. Boost is a requisite for Armadillo, but it also provided many

    convenient container classes to supplement the traditional vector and dictionary

    classes in the C++ Standard Template Library. Finally, mlpack++, a C++ machine

    learning library, was also imported, providing basic HMM algorithms, a

    clustering algorithm and an implementation of a GMM.

    4.3 Software Architecture

    The software architecture has been separated into several different namespaces

    and classes, as illustrated in Figure 5 below:

  • Page | 25

    Figure 5 UML Diagram of Software Architecture

    In Figure 5, the calc namespace holds two child namespaces, func and model. The

    calc::model::hmm namespace contains classes that are used to define and model a

    HMM as well as the distribution function, whereas the calc::model::map

    namespace has several mappers that are used to discretize observations from

    continuous-valued to specific discrete values. Meanwhile, in the calc::func::hmm

    namespace, there is a helper class that helps to serialize and de-serialize the HMM

    classes from running memory into local file storage.

  • Page | 26

    The util namespace contains several helper methods for basic operations not

    present in the C++ API.

    Finally, the test namespace contains the test and experiment routines used to

    evaluate the performance of and demonstrate the capability of the system.

    4.4 Software Implementation

    4.4.1 Serialization of HMM

    Serialization is the process of converting a data structure and its object state into a

    format that can be easily transmitted and reconstructed. In the project,

    serialization is used to save the HMM models locally on the SD card. By saving

    the HMM models, the models do not need to be retrained on program startup

    every time. Moreover, it becomes possible to train a comprehensive model offline

    on a powerful computer, then deploy the model onto the target PandaBoard

    instantly, reducing deployment time and improving detection accuracy.

    A HMM model has several attributes: the state transition matrix, and the emission

probability matrix. For a CHMM model, instead of the emission probability

matrix, there is a state-specific GMM, i.e. one GMM per state.

    Within each GMM, there are further attributes: the weight, the mean as well as the

    covariance matrix of each Gaussian distribution. Finally, to enable successful

    reconstruction, several metadata properties of the HMM are also saved.

    On top of saving the HMM model, it is also possible to save a training observation

    set. Oftentimes the training set is rather large, and it is tedious to keep retrieving

    the entries manually from the CSV (an ASCII comma-separated text file) database.

    By serializing the training set, it is possible to make it as portable as the HMM

  • Page | 27

    model and allow operators to reinitialize/retrain the HMM model with a different

    set of parameters.

    One beneficial side-effect is that serialization produces a human-readable ASCII

    format, which also makes it easy for a human operator to analyze the HMM

    offline. A sample of each of the various serialized data is presented here to

    demonstrate the concept better.
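Before walking through the samples, the following sketch shows the general idea of writing one model attribute to a human-readable ASCII file with Armadillo and reading it back. The file name mirrors the naming seen in Figure 6, but the matrix values are illustrative and this is not the project's actual serialization helper class.

```cpp
#include <armadillo>

// Save and reload one HMM attribute (here the state transition matrix) in a
// human-readable ASCII format, so the model can be inspected offline and
// deployed to the PandaBoard without retraining.
int main() {
    arma::mat trans = {{0.7, 0.2, 0.1},
                       {0.3, 0.5, 0.2},
                       {0.2, 0.2, 0.6}};

    // Write to disk; arma_ascii keeps a small header plus plain-text numbers.
    trans.save("airconTester0.hmmTrans", arma::arma_ascii);

    // Reconstruct the matrix later (e.g. at program startup on the board).
    arma::mat restored;
    restored.load("airconTester0.hmmTrans", arma::arma_ascii);
    restored.print("restored transition matrix:");
    return 0;
}
```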

In Figure 6, the name of the system is "airconTester". As this is the 1st HMM

model in the system, it is given an index of 0, hence "airconTester0".

    Figure 6 Directory of a serialized HMM model and training set

    The training sets are serialized with a .train extension. Since there are 3

    files: .train0, .train1, and .train2, it means that 3 windows of training observation

    data are used to construct this HMM model. .trainMeta contains the metadata,

    which is shown in Figure 7 below.

  • Page | 28

    Figure 7 Serialized Contents of HMM Metadata

    Because the model is to be trained to be used for Decoding, Estimation-based

HMM Learning is used. That means the observation values need to be tagged with

occupancy values, the ground truth. Here they are stored

    in .trainStates0, .trainStates1, .trainStates2, one for each window of training

    observation data.

    As for the HMM model itself, the state transition matrix is stored in .hmmTrans

    and is a 3x3 (3 state) matrix, as seen in Figure 8 below.

    Figure 8 Serialized Contents of HMM State Transition Matrix

    There are 3 states and 8 emissions per state. For instance, .hmmEmit0Covar1

    represents the covariance matrix of the 2nd Gaussian of the 1st state;

    .hmmEmit2Mean5 represents the mean vector of the 6th Gaussian of the 3rd

    state. A preview of a serialized covariance matrix and mean vector is shown in

    Figure 9 below.

  • Page | 29

    Figure 9 Serialized Contents of a GMM emission mean (top) and covariance (bottom)

    4.4.2 Scheduling Processing and Server Mutex Functions

    The project application, as a server, has to be able to accept a continuous input

    stream from more than one sensor input. At the same time, the application has to

    perform time-consuming HMM calculations. These 2 processes should be

    designed not to interrupt each other.

The way the project does this is to split the Server and the Processing aspects of the

    application, essentially layering the network and the application side logic, as seen

    in Figure 10 below. The server will concentrate on receiving the UDP packets

    being transmitted from the sensors. The server is a Python UDP server that

    continually flushes any received packets into a mutex-ed shared file. The HMM

    application will periodically check into the mutex-ed shared file to see if any

    additional data packets are received, and retrieve them if so.

  • Page | 30

    Figure 10 Layering Server and Processing functions into Network and Application side logic

The mutex lock is achieved using the POSIX API flock(), which guarantees mutually

exclusive access among cooperating processes that request the lock. Because flock() is also fed

with the LOCK_NB flag, the mutex request operation is non-blocking. This

workflow is further illustrated in Figure 11 below. Upon failure of the non-

    blocking mutex request, the server will temporarily save the packet data in a file

    buffer. But if the mutex request is successful, the data in the packet, as well as any

    previous packet data in the file buffer, will be transferred together to the shared

    file. This ensures that the server continues receiving packets even when the mutex

    fails.

    On the HMM application side, the application will continuously poll the shared

    file for data. If it is successfully in the non-blocking mutex request, it obtains new

  • Page | 31

    data and is able to calculate. The mutex is freed at the first possible instance.

    However, if the mutex request failed, the HMM will simply continue polling.

    Figure 11 Process Flowchart for Mutex Access
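The application-side half of this flowchart can be sketched in C++ as follows; the file path handling and the processing step are placeholders, and only the flock() calls with LOCK_NB reflect the mechanism described above.

```cpp
#include <fcntl.h>      // open()
#include <sys/file.h>   // flock(), LOCK_EX, LOCK_NB, LOCK_UN
#include <unistd.h>     // close()

// Non-blocking mutex request on the shared file: try to take an exclusive
// advisory lock; if another process (e.g. the UDP server) holds it, flock()
// returns immediately instead of blocking, and the caller simply polls again.
bool try_process_shared_file(const char* path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return false;              // lock held elsewhere: poll again later
    }

    // ... read the newly flushed sensor packets and run the HMM update ...

    flock(fd, LOCK_UN);            // free the mutex at the first possible instance
    close(fd);
    return true;
}
```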

The project recognises the potential downside to this mutex process. If there are a

lot of packet arrivals on the server, the server's mutex requests may swamp out the

HMM application; the mutex lock is dominantly held by the server and the HMM

application does not get a chance to acquire it.

  • Page | 32

However, this issue is mitigated by the fact that packet arrivals are not that frequent.

Even with multiple sensors sending incoming packets, they will not swamp the

mutex lock. This is because each sensor, identified by the MAC address metadata

attached in the UDP packet, is assigned a unique mutex-ed shared file, as seen in

Figure 10. Hence no single mutex lock can be swamped: the server is a single-

threaded application that can only hold a single lock at any one time, and the HMM

application will have plenty of data from other sensors to process.

  • Page | 33

    5. EXPERIMENTS & RESULTS

    5.1 Dataset

The dataset (the "archive dataset") used in the experiments is environmental data

taken from a student residential suite. It consists of readings taken continuously

for 24 hours at minimum, spread across a total of 3 months.

At the end of the collection, 34 days' worth of environmental data had been collected. It

    contains measurements of temperature, humidity, luminosity and noise, sampled

    once every 10 minutes. A small 4-day window of the dimensions is illustrated in

    Figure 12 below.

    Figure 12 Sensor measurements across a 4-day period

    The various days are also labelled according to whether they were typical or

    atypical days, which would impact occupancy. For instance, typical days were

    days where it was not a public or school holiday, nor sandwiched between

    atypical days. For more information, please refer to Appendix B.

  • Page | 34

    5.1.1 Ground Truth Value

    In the sensor measurements, there was no measurement of occupancy. Hence, an

    alternative method of verification was sought.

    Using the power consumption measurements which also came as the 5th modality

    in the data set (but were not included as the HMM inputs), it is possible to infer

the Ground Truth Value of occupancy. Any increase in the power consumption of

the room is regarded as an indicator of occupancy. However, that only gives a

binary value of occupancy with multiple jitters due to the slow and discrete

movement of the power meter. To smooth out the jitters, a digital Gaussian

moving-average filter of window size 5 was applied forwards and backwards to the

power readings, and the result is a power consumption reading of 3 quanta,

as illustrated in Figure 13 below.

    Figure 13 Occupancy inferred from power consumption
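A sketch of the forward-backward smoothing used to derive the ground truth is shown below; the kernel width (sigma) is an assumption, since only the window size of 5 is specified above, and the edge handling is one of several reasonable choices.

```cpp
#include <cmath>
#include <vector>

// Zero-phase smoothing of the power readings: convolve with a 5-tap Gaussian
// window in the forward direction, then again on the reversed signal, so the
// smoothed occupancy indicator is not shifted in time.
std::vector<double> gaussian_smooth(const std::vector<double>& x, double sigma = 1.0) {
    // Build and normalise the 5-tap Gaussian kernel.
    std::vector<double> k(5);
    double sum = 0.0;
    for (int i = 0; i < 5; ++i) {
        double d = i - 2.0;
        k[i] = std::exp(-d * d / (2.0 * sigma * sigma));
        sum += k[i];
    }
    for (double& v : k) v /= sum;

    auto pass = [&k](const std::vector<double>& in) {
        std::vector<double> out(in.size(), 0.0);
        for (std::size_t n = 0; n < in.size(); ++n) {
            double acc = 0.0, wsum = 0.0;
            for (int i = -2; i <= 2; ++i) {
                long idx = static_cast<long>(n) + i;
                if (idx < 0 || idx >= static_cast<long>(in.size())) continue;
                acc  += k[i + 2] * in[idx];
                wsum += k[i + 2];
            }
            out[n] = acc / wsum;   // renormalise near the edges
        }
        return out;
    };

    std::vector<double> fwd = pass(x);
    std::vector<double> rev(fwd.rbegin(), fwd.rend());
    std::vector<double> back = pass(rev);
    return std::vector<double>(back.rbegin(), back.rend());
}
```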

  • Page | 35

    5.2 Experimental Setup

    A laptop is setup to simulate a wireless sensor. A Python application is run on the

    laptop, streaming values from the archive dataset over a UDP connection. All 4

    modalities of the dataset are streamed together. Attached in every packet is a

    mock MAC address to identify the sensor.

    A PandaBoard ES is set up on the other end, running a Python server that accepts

    the incoming sensor observations. The HMM application is also running on the

    PandaBoard ES to process in real-time the incoming packet data.

The HMM model that the application executes has been set up and trained

before the start of the experiment. For each experiment, the report will specify the

section of the dataset that was used for training as well as for testing.

    The HMM application had its source code developed on an Ubuntu system

    beforehand, before being transported over to the PandaBoard ES and compiled.

    This is because the machine instruction set is different; the development machine

    runs on Intel, while the PandaBoard ES runs on ARM.

    5.3 Capability Test

    In order to properly understand the capabilities of the HMM application, an

introductory test was applied. The HMM application was challenged to identify

the day of the week when fed a dataset, for example, whether the day was a Monday or a

Sunday. This test is indirectly related to the issue of occupancy detection, as the

    day has an influence on the occupancy of a room.

  • Page | 36

    5.3.1 Test on Evaluating Day of the Week

For this test, a 4-state, 4-emission DHMM model is used. Out of the 4 modalities,

only luminosity is considered. The model is fed 2 typical weekdays' worth of

observations; hence the model represents a typical weekday HMM. The model is

then matched against 24-hour observation data from 12 different days.

In Figure 14 below, the results of the test are presented. To interpret the graph,

note that the more negative the log-likelihood, the less probable it is that the observation

belongs to the weekday HMM model. The results illustrate that the model is

unable to differentiate between weekday and weekend observations. This may be

    due to the fact that the environmental conditions between a weekday and a

    weekend are not very different in terms of luminosity.

    However, the model does report a marked difference between typical and atypical

    days. Atypical days were defined as days where special events such as exam

    period, school holidays, or public holidays occurred. Those are days upon which

the occupancy level will be affected. This finding hence suggests that luminosity

is to a certain extent dependent on occupancy, but that on its own it is not a strong

discriminator, and that weekdays are generally similar to weekends.

  • Page | 37

    Figure 14 Log-likelihood of observation belonging to a Weekday Model

    5.4 Test on Occupancy Decoding

    In this experiment, the accuracy of the HMM algorithm is tested to see if it can

    correctly identify the occupancy of the system. The actual occupancy of the

    system is known to the author, not to the system, but it would be good to compare

    this actual occupancy (ground truth) against the decoded occupancy.

A CHMM model with 3 states and 9 emissions is used. The 3 states correspond to fully-

occupied, mildly-occupied, and unoccupied. The CHMM model needs to be a

representative sample of the system; hence it is trained with 10 days' worth of

data taken at 3-day intervals across all 34 days of archived sensor data. All 4

    modalities are used: temperature, humidity, luminosity and noise. The result of

    testing the CHMM model against every single day of sensor observation is shown

    in Figure 15 below:

  • Page | 38

    Figure 15 Decoded Error % for Occupancy

    The light grey bars signify observations that have been used to train the CHMM

    model. The dark grey bars signify novel observations that the CHMM has no

    knowledge of.

The performance of the system is generally good: even for the worst-case

anomalous observations, the error rate never exceeds 65%. The mean error rate

    inclusive of the training set is 21.6%, exclusive is 23.2%. The median error rate,

    which is less sensitive to outlier values, is 17.1% inclusive, 17.8% exclusive. The

    sample variance of the error % is 0.0268 inclusive, 0.0277 exclusive.

    5.5 Test on Occupancy Interpolation

For this test, the HMM Decoding problem is used to decipher the underlying hidden states of the system and to interpolate between them. As before, the actual occupancy is known to the author but not to the system, and it serves as the ground truth against which the decoded occupancy is compared.

The assumption is that the system has already decoded the states at the start and end of the interpolation window but is missing a segment in between, for example because packets were discarded due to errors, dropped in transit, or because the wireless sensors suffered a temporary hardware failure.

A 3-state, 7-emission CHMM model is used, with the 3 states corresponding to fully-occupied, mildly-occupied and unoccupied. Again, the CHMM model needs to be a representative sample of the system, so it is trained on 10 days' worth of data taken at 3-day intervals across the 34 days of archived sensor data. All 4 modalities are used: temperature, humidity, luminosity and noise.

The experimental variables are the size of the gap to be interpolated and the time period across which the interpolation is performed. The results are presented in Figure 16 and Table 2 below.

The experiment shows that the interpolation process is quite accurate. For short gaps of 30 minutes to 1 hour, which are closer to the gap sizes one would expect in a real-life scenario, the average and median error rates never exceed 16.7%.


    Figure 16 Interpolation results for different Gap Sizes and Time Periods

    Table 2 Interpolation statistics for different Gap Sizes and Time Periods
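The gap filling itself follows the brute-force search listed in Appendix C: every candidate state sequence for the gap is scored by the transition probabilities linking it to the two known endpoint states, and the best-scoring sequence is kept. A simplified sketch of that scoring step, using the column-stochastic convention transition(to, from) adopted elsewhere in the project, is given below.

// Sketch: scoring one candidate state sequence for a gap between two decoded
// endpoint states, using transition probabilities alone (a simplified view of
// the brute-force interpolation in Appendix C).
#include <armadillo>

double gapSequenceLikelihood(const arma::mat& transition,        // transition(to, from)
                             const arma::Col<size_t>& candidate, // proposed states inside the gap
                             size_t prevState,                   // decoded state before the gap
                             size_t forwState)                   // decoded state after the gap
{
    double likelihood = transition(candidate(0), prevState);
    for (size_t i = 1; i < candidate.n_elem; ++i)
        likelihood *= transition(candidate(i), candidate(i - 1));
    likelihood *= transition(forwState, candidate(candidate.n_elem - 1));
    return likelihood;   // the candidate with the largest value is kept
}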

    5.6 Test on Occupancy Extrapolation

The occupancy extrapolation test shows how effective the HMM model is at predicting the future occupancy of the residential suite. Such a forecast would, for instance, allow a building administrator to anticipate the building's power consumption and take measures to reduce it.


Again, a 3-state, 7-emission CHMM model is used, with the 3 states corresponding to fully-occupied, mildly-occupied and unoccupied, trained on 10 days' worth of data selected uniformly across the 34-day archived data set. All 4 modalities are used: temperature, humidity, luminosity and noise.

As in the interpolation experiment, the experimental variables are the size of the gap to be extrapolated and the time period across which the extrapolation is performed. The results are presented in Figure 17 and Table 3 below:

    Figure 17 Extrapolation results for different Gap Sizes and Time Periods

    Table 3 Extrapolation statistics for different Gap Sizes and Time Periods


The results demonstrate that extrapolation generally works well, with an error rate below 30% across the window sizes tested.
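Extrapolation rests on the HMM Generation problem: the last decoded state is taken as the starting point for sampling a future trajectory of states, using the same Generate() call that the interpolation routine in Appendix C uses for long gaps. A minimal sketch is given below; the forecast horizon in samples depends on the sensor reporting interval and is left as a parameter here.

// Sketch: forecasting future occupancy by sampling forward from the last
// decoded state. Assumes the mlpack HMM<GMM> interface used by this project's
// CHMM classes (Appendix C).
#include <mlpack/core.hpp>
#include <mlpack/methods/hmm/hmm.hpp>
#include <mlpack/methods/gmm/gmm.hpp>

using namespace mlpack::hmm;
using namespace mlpack::gmm;

void forecastOccupancy(HMM<GMM>& chmm,
                       const arma::mat& observedSoFar,      // 4 x T observations up to "now"
                       size_t horizonSamples,               // forecast length in samples
                       arma::Col<size_t>& forecastStates)   // predicted occupancy states
{
    // Decode the observed prefix to find the most likely current state.
    arma::Col<size_t> decoded;
    chmm.Predict(observedSoFar, decoded);
    const size_t currentState = decoded(decoded.n_elem - 1);

    // Sample a future state/observation trajectory starting from that state.
    arma::mat simulatedObs;
    chmm.Generate(horizonSamples, simulatedObs, forecastStates, currentState);
}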

    6. LIMITATIONS AND RECOMMENDATIONS

    6.1 Highly Correlated Data

As discussed in Section 3.3.1, one limitation is that the observation data may be highly correlated, resulting in a non-invertible correlation matrix. This renders the multivariate CHMM unusable, as it is no longer possible to compute the probability density of the multivariate normal distribution.

Even with the pseudo-inverse derivation, the HMM algorithm occasionally still fails to compute because the rank of the matrix is simply too low. The project sidesteps this limitation by re-learning the HMM model from the same training data: since learning is non-deterministic, a valid correlation matrix can usually be obtained within one or two retries.

Still, there is no guarantee that a usable CHMM model will be learnt on the first attempt, nor, more drastically, that it will be learnt with any number of retries. Human operator intervention therefore remains necessary to ensure that a valid CHMM is prepared.
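One source of the non-determinism exploited here is the randomised initial transition matrix in the HMMx constructor (Appendix C). The retry strategy can therefore be wrapped as a simple loop; trainOnce() and correlationIsUsable() below are hypothetical stand-ins for the project's training routine and its invertibility check, and the retry count is illustrative.

// Sketch: retrying Baum-Welch a small number of times when the learnt
// correlation matrix is unusable. trainOnce() and correlationIsUsable() are
// hypothetical stand-ins for the project's own calls.
#include <stdexcept>
#include <vector>
#include <armadillo>

bool trainOnce(const std::vector<arma::mat>& trainingSeqs);   // hypothetical
bool correlationIsUsable();                                   // hypothetical

void trainWithRetries(const std::vector<arma::mat>& trainingSeqs, int maxRetries = 3)
{
    for (int attempt = 0; attempt < maxRetries; ++attempt)
    {
        // Learning is non-deterministic, so the same data can yield a usable
        // correlation matrix on a later attempt.
        if (trainOnce(trainingSeqs) && correlationIsUsable())
            return;
    }
    // As noted above, success is not guaranteed: fall back to the operator.
    throw std::runtime_error("CHMM training failed; operator intervention required");
}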

    6.2 Dynamically Improving HMM

A possible improvement to the project would be to refine the HMM model continuously using incoming readings. This can be done in one of two ways (both are sketched below):

1. Supplement the existing training data set with the latest observation data.

2. Replace the oldest training data with the latest observation data.

The choice between the two depends on how dynamic the observed system is. If changes in the system are gradual, it makes sense to expand the training set with more data. If the system is an evolving one, old training data quickly becomes irrelevant, and the replacement method should be used instead.
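In code, the two strategies reduce to how the training-set container is maintained between re-training runs; a minimal sketch follows, in which the 10-day window size is only illustrative.

// Sketch: the two training-set update strategies described above.
#include <cstddef>
#include <vector>
#include <armadillo>

// Method 1: supplement the existing training data with the latest observations,
// suitable when changes in the system are gradual.
void supplementTrainingSet(std::vector<arma::mat>& trainingSeqs, const arma::mat& latestDay)
{
    trainingSeqs.push_back(latestDay);
}

// Method 2: keep only the most recent maxDays days, so that an evolving system
// is not dragged back by stale observations.
void slideTrainingWindow(std::vector<arma::mat>& trainingSeqs, const arma::mat& latestDay,
                         std::size_t maxDays = 10)
{
    trainingSeqs.push_back(latestDay);
    while (trainingSeqs.size() > maxDays)
        trainingSeqs.erase(trainingSeqs.begin());
}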

    6.3 Decision Fusion

In the Event Processing literature, what the project currently performs is Data Fusion. In Data Fusion, multiple modalities of data are fused into a single n-dimensional vector, as illustrated in Figure 18: instead of multiple streams of scalar data, there is a single stream of vector data. Data Fusion is attractive when only a few sensor streams are involved, as it gives the highest accuracy and performance [8].

    Figure 18 Data Fusion of streams into a vector before decision making


There is, however, also Decision Fusion, which could be a possible track of investigation for future projects. In Decision Fusion, each stream of scalar data has its own event processing node, producing multiple streams of decisions, each originating from a single scalar stream. These decision streams are then fused into a single stream of vector decisions and processed further, as shown in Figure 19; at that point the arrangement resembles Data Fusion.

Decision Fusion works like a multi-tiered decision tree: decisions are made locally at the source and then transmitted upstream, where they are collated with other decisions. It scales better in computational and communications complexity as more streams are added, because the transformation from data to decision reduces the data volume, akin to data compression.

    Figure 19 Decision Fusion of streams decision upstream into a final decision
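To contrast the two schemes at the level of a single time slot: Data Fusion stacks the scalar readings into one vector for a single decision maker, whereas Decision Fusion lets each stream decide locally and fuses only the decisions. The per-stream decide() classifier and the majority-vote fusion rule in the sketch below are illustrative choices, not a prescribed design.

// Sketch: data fusion versus decision fusion for one time slot.
#include <cstddef>
#include <map>
#include <vector>
#include <armadillo>

// Data fusion: concatenate the scalar readings into one n-dimensional vector,
// which a single event-processing node (e.g. the CHMM) then decodes.
arma::vec fuseData(const std::vector<double>& scalarReadings)
{
    return arma::vec(scalarReadings);
}

// Decision fusion: every stream makes a local decision; only the decisions
// travel upstream, where they are collated into a final decision.
int decide(double reading)              // toy per-stream classifier (illustrative threshold)
{
    return reading > 0.5 ? 1 : 0;
}

int fuseDecisions(const std::vector<double>& scalarReadings)
{
    std::map<int, int> votes;
    for (std::size_t i = 0; i < scalarReadings.size(); ++i)
        ++votes[decide(scalarReadings[i])];

    int best = 0, bestCount = -1;       // majority vote over the local decisions
    for (std::map<int, int>::const_iterator it = votes.begin(); it != votes.end(); ++it)
        if (it->second > bestCount) { best = it->first; bestCount = it->second; }
    return best;
}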


    7. CONCLUSION

    In the thesis, the mathematical derivation and problem solving abilities of the

    Hidden Markov Model have been explained, including the 3 conventional

    scenarios of HMM: Evaluation, Decoding and Learning.

A well-trained HMM represents a physical system. Through the HMM it becomes possible to categorically label an observation as belonging to a particular model, and to decode the underlying states that the system has gone through based on the observations given. Using a representative sample of observations, an HMM can also be constructed through likelihood maximization, and with a complete HMM model it becomes possible to simulate observations and forecast the future states and observations of the system.

The thesis also demonstrates that, when applied to the domain of occupancy detection, the HMM algorithm is able to discover the underlying occupancy state of the environment using 4 modalities: temperature, humidity, luminosity and noise. It identifies the correct underlying occupancy with an error rate of 23.2% on average. When interpolating across gaps in known occupancy states, the error rate is approximately 25.2% for long gaps and 12.5% for short gaps, and when forecasting future occupancy the HMM model is accurate to within a 28.9% error rate for 12-hour extrapolations.


    APPENDIX A: BIBLIOGRAPHY

[1] Y. Agarwal, B. Balaji, R. Gupta, J. Lyles, M. Wei, and T. Weng, "Occupancy-driven energy management for smart building automation," in Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, 2010, pp. 1-6.

[2] C. Liao and P. Barooah, "An integrated approach to occupancy modeling and estimation in commercial buildings," in American Control Conference (ACC), 2010, pp. 3130-3135.

[3] J. Lu, T. Sookoor, V. Srinivasan, G. Gao, B. Holben, J. Stankovic, E. Field, and K. Whitehouse, "The smart thermostat: using occupancy sensors to save energy in homes," in Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, 2010, pp. 211-224.

[4] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[5] M. Stamp, "A revealing introduction to hidden Markov models," Dept. of Computer Science, San Jose State University, 2004.

[6] P. Blunsom, "Hidden Markov models," Lecture Notes, August 2004.

[7] P. Jackson, "HMM tutorial 4." [Online]. Available: http://www.ee.surrey.ac.uk/Personal/P.Jackson/tutorial/

[8] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, "Distributed target classification and tracking in sensor networks," Proc. IEEE, vol. 91, no. 8, pp. 1163-1171, Aug. 2003.


    APPENDIX B: TYPICAL AND ATYPICAL DAYS


    APPENDIX C: SOURCE CODE SNIPPET

Not all of the source code is reproduced here, as the complete code base amounts to more than 4000 lines of C++ and is too long to include in full.

// ========================================================
// HMMx.cpp
// * implements HMM functions
// ========================================================
// NOTE: parts of this listing were lost when the thesis was converted to text;
// the missing pieces are marked and trivially inferable fragments (loop bounds,
// template arguments, assumed standard headers) have been restored.
#include "HMMx.hpp"
#include <armadillo>   // assumed; original header name lost
#include <iostream>    // assumed; original header name lost
#include "../../../util/timer.hpp"

using std::cout;

inline bool generateNewInterpolateState(arma::Col<size_t> &states, int numStates);

template<typename Distribution>
void HMMx<Distribution>::interpolate(int guessLen, unsigned int prevState,
    unsigned int forwState, arma::mat &guessObservations,
    arma::Col<size_t> &guessStates, bool isCHMM)
{
    TM_START;

    /**
     * Brute-force checking
     * We will only permute up to 8 states
     */
    if (guessLen > 8)
    {
        // Generate the excess
        int toFillIn = guessLen - 8;
        arma::mat frontDataSeq; // remove dim
        arma::Col<size_t> frontStateSeq(toFillIn);
        if (isCHMM)
        {
            HMMx<GMM> *chmm = (HMMx<GMM>*) this;
            chmm->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState);
        }
        else
            this->Generate(toFillIn, frontDataSeq, frontStateSeq, prevState);

        // Interpolate the rest
        arma::mat backDataSeq(guessObservations.n_rows, 8);
        arma::Col<size_t> backStateSeq(8);
        this->interpolate(8, frontStateSeq(toFillIn-1), forwState,
                          backDataSeq, backStateSeq, isCHMM);

        guessObservations.cols(0, toFillIn-1) = frontDataSeq;
        guessObservations.cols(toFillIn, guessLen-1) = backDataSeq;
        guessStates.rows(0, toFillIn-1) = frontStateSeq;
        guessStates.rows(toFillIn, guessLen-1) = backStateSeq;
    }
    else
    {
        double bestLikelihood = 0;
        double currLikelihood = 1;
        arma::Col<size_t> bestTrial(guessLen, arma::fill::zeros);
        arma::Col<size_t> currTrial(guessLen, arma::fill::zeros);
        int numStates = this->Transition().n_cols;

        while (true)
        {
            // Iterate through another state
            bool validity = generateNewInterpolateState(currTrial, numStates);
            if (!validity) // no more new states available
                break;

            // Evaluate probability
            currLikelihood = 1;
            for (int i = 0; i < guessLen; ++i)
            {
                if (i == 0)
                    currLikelihood *= this->Transition()(currTrial(i), prevState);
                else
                    currLikelihood *= this->Transition()(currTrial(i), currTrial(i-1));
            }
            currLikelihood *= this->Transition()(forwState, currTrial(guessLen-1));

            // Evaluate probability (is it better?)
            if (currLikelihood > bestLikelihood)
            {
                bestLikelihood = currLikelihood;
                bestTrial = currTrial;
            }
        }
        guessStates = bestTrial;

        // generate emissions
        for (int i = 0; i < guessLen; ++i)
        {
            arma::vec val;   // (reconstructed: declaration lost in extraction)
            if (isCHMM)
            {
                GMM gmm = this->Emission().at(guessStates(i));
                val = gmm.Random();
            }
            else
                val = this->Emission().at(guessStates(i)).Random();
            guessObservations.col(i) = val;
        }
    }
    TM_STOP;
    PRINTTIME;
}

inline bool generateNewInterpolateState(arma::Col<size_t> &states, int numStates)
{
    // backtracking
    int i = states.n_rows - 1;
    while (true)
    {
        if ((int) states(i) != numStates-1) // if we still haven't iterated all for curr state index
        {
            states(i)++;
            // The remainder of this function was lost in extraction; the usual
            // backtracking completion is sketched here so the listing is readable.
            for (unsigned int j = i+1; j < states.n_rows; ++j)
                states(j) = 0;
            return true;
        }
        else
        {
            --i;
            if (i < 0)
                return false;   // all state combinations exhausted
        }
    }
}


// ========================================================
// HMMx.hpp
// * header file for HMMx.cpp
// ========================================================
#ifndef HMMX_HPP_
#define HMMX_HPP_

// NOTE: the original include names and template arguments were lost in
// extraction; the headers below and the <size_t>/<Distribution> arguments are
// restored where they could be inferred.
#include <cstdlib>
#include <ctime>
#include <vector>
#include <armadillo>
#include <mlpack/methods/hmm/hmm.hpp>
#include "distribution/DiscreteDistri.hpp"

using namespace mlpack::hmm;
using namespace mlpack::gmm;

/**
 * Changes are:
 * - Transition states by default are no longer homogeneous.
 */
template<typename Distribution>
class HMMx : public HMM<Distribution>
{
    bool isCHMM;

public:
    HMMx(const size_t states, const Distribution emissions, bool isCHMM,
         const double tolerance = 1e-5):
        HMM<Distribution>(states, emissions, tolerance)
    {
        this->isCHMM = isCHMM;
        double variance = this->Transition().at(0) * 0.1;
        srand(time(NULL));
        for (unsigned int i = 0; i < this->Transition().size(); ++i)
        {
            if (rand() % 2 == 0)
                this->Transition().at(i) += variance * rand() / RAND_MAX;
            else
                this->Transition().at(i) -= variance * rand() / RAND_MAX;
        }
        // normalise
        for (unsigned int i = 0; i < this->Transition().n_cols; ++i)
        {
            double sum = accu(this->Transition().col(i));
            this->Transition().col(i) /= sum;
        }
    }

    HMMx(const arma::mat& transition, const std::vector<Distribution>& emission,
         bool isCHMM, const double tolerance = 1e-5):
        HMM<Distribution>(transition, emission, tolerance)
    {
        this->isCHMM = isCHMM;
    }

    /**
     * @return 1 if GMM, 0 if not.
     *
     * Abandoned. You need to uncast it from a pointer or you access invalid memory anyway
     */
    /*int distributionType() const
    {
        if (isCHMM) return 1;
        else return 0;
    }*/

    /**
     * Assuming you have a break in observation results, interpolate will
     * reconstruct the missing bits for you.
     */
    void interpolate(int guessLen, unsigned int prevState, unsigned int forwState,
                     arma::mat &guessObservations, arma::Col<size_t> &guessStates,
                     bool isCHMM);

    void interpolate(int guessLen, const arma::mat &prevObservations,
                     const arma::mat &forwObservations, arma::mat &guessObservations,
                     arma::Col<size_t> &guessStates, bool isCHMM)
    {
        arma::Col<size_t> prevStates;
        this->Predict(prevObservations, prevStates);
        arma::Col<size_t> forwStates;
        this->Predict(forwObservations, forwStates);
        this->interpolate(guessLen, prevStates(prevStates.n_rows-1), forwStates(0),
                          guessObservations, guessStates, isCHMM);
    }
};

#endif /* HMMX_HPP_ */


// ========================================================
// HMMFunc.cpp
// * saves the HMM model to disk and loads it
// ========================================================
// NOTE: large parts of this listing were lost when the thesis was converted to
// text. Unrecoverable statements are marked with /* ... */ and trivially
// inferable fragments (template arguments such as vector<int>, assumed standard
// headers) have been restored.
#include "HMMFunc.h"
#include <string>    // assumed; original header name lost
#include <vector>    // assumed; original header name lost
#include <fstream>   // assumed; original header name lost
#include <sstream>   // assumed; original header name lost
#include "../../../util/fileExists.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/map/mapper_kmeans.h"
#include "../../model/map/mapperMv_kmeans.h"

using std::string;
using std::vector;
using std::ifstream;
using std::ostringstream;

vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc);

// ---
// Properties
// ---

/** Get a list of all available models of HMM that we can load. */
vector<int> HMMFunc::getAvailableModels() const
{
    return getAvailable((char *) "hmm", this);
}

/** Get a list of all available training sets that we can use. */
vector<int> HMMFunc::getAvailableTrains() const
{
    return getAvailable((char *) "train0", this);
}

/** Get a list of all available stuff that we can load. */
vector<int> getAvailable(char* searchStr, const HMMFunc *hmmFunc)
{
    vector<int> results;
    int lastResult = 0;
    for (int i = 0; /* loop condition and fileURI construction not recovered */ )
    {
        if (file_exist(fileURI.str().c_str()))
        {
            results.push_back(i);
            lastResult = i;
        }
        // DEBUG
        /* else std::cout << ... (debug output not recovered) */
    }
    /* ... remainder of getAvailable() and the start of saveTrainingSet()
       not recovered ... */

// (fragment: the middle of HMMFunc::saveTrainingSet resumes here)
        // Will always be false unless there is one True
        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d",
                    this->getFileSaveName().c_str(), trainIndex, i);
            result |= !(states->at(i).save(fileURI, arma::arma_ascii));
        }
    }

    // Searches for any file after this index and deletes it
    // Important because this is our indicator for vector termination
    sprintf(fileURI, "%s%d.train%d",
            this->getFileSaveName().c_str(), trainIndex, (int)data.size());
    if (file_exist(fileURI))
        remove(fileURI);

    // If there is no state info, we make sure there is no state file saved as well
    if (states == NULL)
    {
        sprintf(fileURI, "%s%d.trainStates0",
                this->getFileSaveName().c_str(), trainIndex);
        if (file_exist(fileURI))
            remove(fileURI);
    }
    std::cout /* << status message not recovered */;
    /* ... end of saveTrainingSet() and start of loadTrainingSet() not recovered ... */

// (fragment: the middle of HMMFunc::loadTrainingSet resumes here)
    sprintf(fileURI, "%s%d.trainStates0", this->getFileSaveName().c_str(), trainIndex);
    if (file_exist(fileURI))
        states = new vector<arma::Col<size_t> >();

    for (int i = 0; ; ++i)
    {
        sprintf(fileURI, "%s%d.train%d", this->getFileSaveName().c_str(), trainIndex, i);
        // Check if there is any matrices left to read
        if (!file_exist(fileURI))
            break;

        // Load it!
        arma::mat myMatrix;
        myMatrix.load(fileURI, arma::arma_ascii);
        data.push_back(myMatrix);

        // ---
        if (states != NULL)
        {
            sprintf(fileURI, "%s%d.trainStates%d", this->getFileSaveName().c_str(), trainIndex, i);
            arma::Col<size_t> colvec;
            colvec.load(fileURI, arma::arma_ascii);
            states->push_back(colvec);
        }
    }

    // Metadata loading
    HMM_meta metadata;
    {
        sprintf(fileURI, "%s%d.trainMeta", this->getFileSaveName().c_str(), trainIndex);
        FILE* fp = fopen(fileURI, "r");

        // obtain filesize
        fseek(fp, 0, SEEK_END);
        long lsize = ftell(fp);
        rewind(fp);

        char *buffer = new char[lsize];
        fread(buffer, sizeof(char), lsize, fp);
        string text = string(buffer);
        metadata = HMM_meta::fromString(text);
        delete [] buffer;
    }
    return metadata;
}

// ---
// ---

/** Load HMM model from local file */
HMM_meta HMMFunc::load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const
{
    // Check if index exists
    vector<int> indices = this->getAvailableModels();
    if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
        std::cout /* << error message not recovered */;

    /* ... loading of the metadata, transition matrix and the discrete-emission
       mapper keys not recovered ... */
        mapper = new KMeansMapper(keysVal);
    }
    else // multi-variate
    {
        // load keysVal
        vector<arma::vec> keysVal;
        for (unsigned int i = 0; /* ... remainder of this loop and the start of
                                    the GMM emission loading not recovered ... */ )

// (fragment: GMM emission loading resumes here)
            sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
            covarSingle.load(fileURI);
            mean.push_back(meanSingle);
            covar.push_back(covarSingle);
        }
        GMM gmm(mean, covar, weight);
        emit.push_back(gmm);
    }
    hmm = (HMMx<DiscreteDistri>*) new HMMx<GMM>(transition, emit, metadata.tolerance);
    printf("[DEBUG] dimension is %d", hmm->Dimensionality());
    }
    return metadata;
}

/** Save HMM model to local file */
int HMMFunc::save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex) const
{
    // Find an index for it
    if (hmmIndex == -1)
    {
        vector<int> indices = this->getAvailableModels();
        while (1)
        {
            hmmIndex++;
            if (std::find(indices.begin(), indices.end(), hmmIndex) == indices.end())
                break;
        }
    }

    char fileURI[999];
    string basicFileURI;
    {
        ostringstream oss;
        oss << this->getFileSaveName() /* ... remainder of this statement and the
              assignment to basicFileURI not recovered ... */;
    }

    // Save transition
    hmm->Transition().save((basicFileURI + ".hmmTrans").c_str(), arma::arma_ascii);

    // Save emission
    if (metadata.isCHMM == false)
    {
        vector<DiscreteDistri> emit = hmm->Emission();
        for (unsigned int i = 0; i < hmm->Transition().n_cols; ++i)
        {
            sprintf(fileURI, "%s.hmmEmit%d", basicFileURI.c_str(), i);
            emit[i].Probabilities().save(fileURI, arma::arma_ascii);
        }

        // Save mapper
        if (typeid(emit[0].getMapper()).name() == typeid(KMeansMapper).name())
        {
            KMeansMapper *mapper = (KMeansMapper*) &(emit[0].getMapper());
            vector<double> keysVal = mapper->getKeysVal();
            // Not doing keys because it is boost::unordered_map, very troublesome
            // Also can be derived from keysVal later anyway.

            // put keysVal into a row vector
            arma::rowvec *keyVal_mat = new arma::rowvec(keysVal.size());
            for (unsigned int j = 0; j < keysVal.size(); ++j)
                keyVal_mat->at(j) = keysVal[j];

            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
        else
        {
            KMeansMvMapper *mapper = (KMeansMvMapper*) &(emit[0].getMapper());
            vector<arma::vec> keysVal = mapper->getKeysVal();

            // put keysVal into a matrix
            arma::mat *keyVal_mat = new arma::mat(mapper->get_dimensions(), keysVal.size());
            for (unsigned int j = 0; j < keysVal.size(); ++j)
                keyVal_mat->col(j) = keysVal[j];

            // save
            sprintf(fileURI, "%s.hmmMap", basicFileURI.c_str());
            keyVal_mat->save(fileURI, arma::arma_ascii);

            // delete
            delete keyVal_mat;
        }
    }
    else
    {
        vector<GMM> *emit = (vector<GMM>*) &(hmm->Emission());
        for (unsigned int i = 0; i < emit->size(); ++i) // for each state
        {
            sprintf(fileURI, "%s.hmmEmit%dWeight", basicFileURI.c_str(), i);
            emit->at(i).Weights().save(fileURI, arma::arma_ascii);

            vector<arma::mat> covar = emit->at(i).Covariances();
            vector<arma::vec> mean = emit->at(i).Means();
            for (unsigned int j = 0; j < emit->at(i).Gaussians(); ++j) // for each state there are Gaussians
            {
                sprintf(fileURI, "%s.hmmEmit%dCovar%d", basicFileURI.c_str(), i, j);
                covar[j].save(fileURI, arma::arma_ascii);
                sprintf(fileURI, "%s.hmmEmit%dMean%d", basicFileURI.c_str(), i, j);
                mean[j].save(fileURI, arma::arma_ascii);
            }
        }
    }
    return hmmIndex;
}


// ========================================================
// HMMFunc.h
// * header file for HMMFunc.cpp
// ========================================================
#ifndef HMMFUNC_H_
#define HMMFUNC_H_

// NOTE: the original include names and template arguments were lost in
// extraction; they are restored below where they could be inferred from
// HMMFunc.cpp.
#include <string>
#include <vector>
#include <armadillo>
#include "../../model/hmm/distribution/DiscreteDistri.hpp"
#include "../../model/hmm/HMMx.hpp"
#include "../../model/hmm/metadata.h"

using std::string;
using std::vector;

class HMMFunc
{
    string fileSaveName;

public:
    /**
     * @param _fileSaveName Assumed unique file access location for saving to
     */
    HMMFunc(string _fileSaveName): fileSaveName(_fileSaveName) {}

    // ---
    // Properties
    // ---
    string getFileSaveName() const { return fileSaveName; }

    /** Get a list of all available models of HMM that we can load. */
    vector<int> getAvailableModels() const;

    /** Get a list of all available training sets that we can use. */
    vector<int> getAvailableTrains() const;

    // ---
    // Methods (Training Set)
    // ---
    /** Save data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model saving
     * @param states If there is state info present, please put it in. Else leave as NULL.
     * @return True if saving was successful.
     */
    bool saveTrainingSet(HMM_meta metadata, const vector<arma::mat> &data,
                         const vector<arma::Col<size_t> > *states, int trainIndex) const;

    /** Load data as training set for HMM
     * @param trainIndex Note that this is a different index from HMM model loading
     * @param states If there is state info present, it will be loaded into the pointer. Else it is NULL.
     */
    HMM_meta loadTrainingSet(vector<arma::mat> &data,
                             vector<arma::Col<size_t> >* &states, int trainIndex) const;

    // ---
    // Methods (HMM Models)
    // ---
    /** Save HMM model to local file */
    int save(HMM_meta metadata, const HMMx<DiscreteDistri>* hmm, int hmmIndex = -1) const;

    /** Load HMM model from local file */
    HMM_meta load(HMMx<DiscreteDistri>* &hmm, int hmmIndex) const;
};

#endif /* HMMFUNC_H_ */
