Can hidden Markov models be used for inference about operational risk?

Markus Holmgren & Hampus Pettersson

Master Thesis, 30 ECTS
Department of Mathematics and Mathematical Statistics
Umeå University
Spring 2018


Abstract

This thesis investigates whether hidden Markov models (HMMs) can be used for inference about operational risk, given financial time series data of Audit changes and Audit prices. The models tested vary in the number of possible states the underlying latent process can take. All models have been implemented in the R statistical software using the depmixS4 package. The evaluation showed a clear difference between the states of the final model, in terms of the types of observations they emitted. The thesis shows that the biggest factors affecting operational risk were the number of changes made to the trades and the time between those changes. It also showed that it was, in large part, the same trader who carried out all the trades as well as the changes, and only within the internal department. The final conclusion is therefore that HMMs are possible and appropriate to use for inference about operational risk, but that more labeled data are required to assess the models' predictive performance.


Sammanfattning

This thesis aims to investigate whether hidden Markov models (HMMs) can be used for inference about operational risk, given financial time series data of Audit changes and Audit prices. The models tested vary in the number of possible states the underlying latent process can take. All models have been implemented in the R statistical software together with the depmixS4 package. The evaluation of the work showed a clear difference between the states of the final model, in terms of the types of observations they emitted. The thesis also showed that the biggest factors driving operational risk were the number of changes made to the trades and the time between them. It further showed that it was largely the same trader who carried out all the trades as well as the changes to them, and then only within the internal department. The final conclusion is therefore that HMMs are possible and appropriate to use for inference about operational risk, but that more labeled data are required to make statements about the models' predictive performance.


Acknowledgements

It has been a pleasure to write our master thesis at Svenska Handelsbanken, and it has truly been inspiring to spend this time surrounded by so many passionate people who work closely together to provide state-of-the-art machine learning solutions. We would like to thank Richard Henricsson for introducing us to this interesting field and for his continuous help and support during this study. To Cecilia, Joachim, Pontus, and Thomas, for your insights and our ever so interesting conversations: not only have we learned a tremendous amount from all of you, but we have also created long-lasting relationships, which made this experience a lot more endurable, thank you. Finally, we would like to express our gratitude to our supervisor Professor Oleg Seleznjev for his considerable contribution to the overall improvement of this study, thank you.


Contents

1 Introduction
    1.1 Problem specification
    1.2 Background
    1.3 Aim
    1.4 Approach
    1.5 Delimitation
    1.6 Disposition

2 Theory
    2.1 Fundamental Theory and Definitions
        2.1.1 Multinomial distribution
        2.1.2 Maximum a posteriori estimation
    2.2 Hidden Markov Model
        2.2.1 Markov chain
        2.2.2 HMM
        2.2.3 The three problems of an HMM
    2.3 Expectation-Maximization algorithm
    2.4 Problem 1
        2.4.1 Forward algorithm
        2.4.2 Backward algorithm
    2.5 Problem 2
        2.5.1 Viterbi algorithm
    2.6 Problem 3
        2.6.1 Baum-Welch algorithm
    2.7 Model Selection
        2.7.1 Akaike information criterion
        2.7.2 Bayesian information criterion
        2.7.3 Integrated complete likelihood criterion

3 Data
    3.1 Description of the data
    3.2 Data inference

4 Method
    4.1 Package
        4.1.1 depmixS4
    4.2 Data Preprocessing
    4.3 Modeling
    4.4 Model Selection
    4.5 Model Evaluation

5 Results
    5.1 Model selection criterion
    5.2 Viterbi-decoded state sequence
    5.3 Feature distribution over the states

6 Discussion and Conclusion
    6.1 Model Evaluation
        6.1.1 Model selection criterion
        6.1.2 Viterbi-decoded state sequence
    6.2 Conclusion
    6.3 Further Studies


List of Figures

2.1 An N-state urn-and-ball model illustrating the general case of a discrete-symbol HMM, showing a possible sequence of observed colors, O (Rabiner, 1989).

5.1 Model selection criteria for the fitted HMMs, where the number of parameters each HMM had to estimate is shown on the nPar axis.
5.2 Viterbi-decoded state sequence for the 4-state HMM.
5.3 Viterbi-decoded state sequence for the 5-state HMM.
5.4 Viterbi-decoded state sequence for the 6-state HMM.
5.5 Viterbi-decoded state sequence for the 7-state HMM.
5.6 Viterbi-decoded state sequence for the 8-state HMM.
5.7 Viterbi-decoded state sequence for the 9-state HMM.
5.8 Feature distribution over the states of the 7-state HMM for Delta, Entry, Equal Trader and Update Number.
5.9 Feature distribution over the states of the 7-state HMM for Time between updates, Market value, Number of changes and Counterparty type.


List of Tables

2.1 Urn-and-ball example with clock time, urn (hidden) state and color (observation), where the clock time corresponds to a point in time at which a particular color (observation) was recorded from a specific (hidden) urn, illustrating a typical observation sequence of an HMM.

2.2 Descriptive formal model notation for a discrete observation HMM.

3.1 Features obtained after data inference, with information on the type of each feature, examples of the feature attributes and the corresponding levels of the categorical features.

4.1 Kurtosis and skewness values for the Price and Nominal features, describing how far from normally distributed the features are; the normal distribution has a kurtosis of 3 and a skewness of 0.

4.2 Features obtained after data preprocessing, with information on the type of each feature, examples of the feature attributes and the corresponding levels of the categorical features.

4.3 Model specification.

5.1 Numbers of unlabeled and labeled observations per state for the 4-state HMM.
5.2 Numbers of unlabeled and labeled observations per state for the 5-state HMM.
5.3 Numbers of unlabeled and labeled observations per state for the 6-state HMM.
5.4 Numbers of unlabeled and labeled observations per state for the 7-state HMM.
5.5 Numbers of unlabeled and labeled observations per state for the 8-state HMM.
5.6 Numbers of unlabeled and labeled observations per state for the 9-state HMM.


Chapter 1

Introduction

1.1 Problem specification

In this thesis, the possibility of using hidden Markov models (HMMs) for detecting operational risk (OpRisk) is examined by constructing and evaluating different types of HMMs. Following the Basel III regulation for banks, OpRisk is defined as the risk of loss resulting from inadequate or failed internal processes, people and systems, or from external events (Basel Committee on Banking Supervision, 2017). The department for Risk Control, Model Validation & Quantitative Analysis (HCRM) at Svenska Handelsbanken (SHB) has provided this thesis with a data set of 64770 observations of Audit changes and Audit prices, made by traders, where 63 of those observations have been labeled as OpRisk. This thesis therefore uses these labeled observations when evaluating the HMMs, to infer which state (or states) in the specified HMMs best captures the behavior of OpRisk.

HMMs have been applied successfully in many other machine learning fields, e.g., speech recognition (Rabiner, 1989) and stock market prediction (Gupta et al., 2012). However, this thesis deals with the unsupervised learning case of HMMs, which is a much less well-defined problem than the supervised learning (SL) case: we are not told what kind of patterns to look for, and there is no obvious error metric to use (as opposed to the SL case). This implies that the problem in this thesis has no single best solution for selecting which features to use for OpRisk inference, nor for how the corresponding HMMs should be designed and selected. This thesis therefore strictly aims to examine whether HMMs can be used to make inferences about OpRisk.

To the best of the authors' knowledge, no other study has implemented HMMs in this specific field.


1.2 Background

For many years, financial institutions were of the opinion that credit risk and market risk were the only two types of risk they had to deal with. However, in the second half of the twentieth century another type of risk emerged. Banks started to trade extensively for their customers as well as for their own accounts in many different markets (e.g., equities, fixed income, commodities). But in the last decade of the twentieth century, the industry witnessed the demise of several financial institutions. For example, in 1995, Nick Leeson, a rogue trader at Barings Bank in Singapore, lost 700 million dollars due to a number of unauthorized trades he made betting on the direction of the Japanese yen. This incident led to Barings going out of business (Xu et al., 2017). Furthermore, Xu et al. (2017) mention another example, when an equity trader at Mizuho Bank in 2005 made a data entry error, causing the bank to lose 350 million dollars. It was these types of events that brought the third type of risk, OpRisk, to light. It is fair to say that OpRisk is less visible (i.e., it cannot be explicitly observed) than other risks, e.g., market risk and credit risk, which caused OpRisk to be neglected for a very long time.

Because it is difficult to make inferences about OpRisk, identification of OpRisk is one of the most important areas in managing these types of risks; failure to identify them will most certainly mean that no action can be taken towards managing them. A common method used in the identification of OpRisk is to hold workshops to brainstorm new ways of identifying and classifying such risks (CIMA, 2008).

Nowadays, financial institutions are aware of OpRisk, which has turned out to be just as important as credit risk and market risk. Svenska Handelsbanken (2018) today collects data to apply different kinds of risk indicators in order to identify and warn of heightened OpRisk, and categorizes the risk according to seven types of events:

• Execution, delivery and process management;

• Business disruption and system failures;

• Clients, products and business practices;

• External crime;

• Damage to physical assets;

• Employment practices and workplace safety;

• Internal fraud.

It should be clear from the above examples and categories of OpRisk that it is a concept that cannot be explicitly observed. However, OpRisk, at least in terms of the observed Audit changes and Audit prices, can be assumed to be of Markovian nature, i.e., given that the present is known, the future does not depend on the past. Therefore, one could assume that HMMs are able to model OpRisk as an underlying (latent) process generating an observable sequence of observations. This assumption leads to the interesting question: can HMMs be used to make inferences about OpRisk given the collected time series data of Audit changes and Audit prices?

One example of how such a model could be used is to help identify OpRisk in the trading process and possibly to classify different levels of risk. If some sequence of Audit changes belongs to a state that can be said to represent high OpRisk, one could automatically flag the trade in the system and notify compliance for further investigation and, if necessary, take appropriate action towards minimizing the risk.

1.3 Aim

The aim of this thesis is to investigate whether HMMs are capable of providing inference about OpRisk given data of Audit changes and Audit prices. Furthermore, this thesis will serve as a first study and as guidance for SHB for a future implementation of HMMs for OpRisk detection or inference.

1.4 Approach

In this thesis, nine different HMMs are constructed and evaluated, each with its own unique number of hidden states. The HMMs are implemented in R statistical software version 3.4.4 with the package depmixS4 version 1.3-3 by Visser and Speekenbrink. This choice is motivated by the aim of the thesis, together with the usefulness of R for data manipulation and the easy and intuitive way of specifying HMMs that the depmixS4 package provides. Fitting several models is further motivated by the fact that we do not know beforehand how many hidden states the latent process of OpRisk consists of (Rabiner, 1989). It makes it possible to investigate each of the constructed HMMs and how the labeled observations are distributed over the states, and thereby to infer which of the HMMs best seems to mimic the latent process of OpRisk.

1.5 Delimitation

In this section, the delimitations that have been made are defined, all naturally motivated by the aim of the thesis, as follows:


1. The number of HMMs modeled is limited to 9, and the number of states for each model is unique, ranging from 2 to 10. This is motivated by the fact that, in general, the more states that are included in an HMM, the harder it gets to assign an OpRisk meaning to each state.

2. No comprehensive feature selection method will be used, as it is not justified by the aim of the thesis.

3. Due to the aim of this thesis (together with the fact that the true state sequence of OpRisk is unknown), no comprehensive measurement other than the most pragmatic one will be used to evaluate the predictive performance of the HMMs.

4. The labeled observations will only be used for inference about which state (or states) in the HMMs captures the behavior of OpRisk.

5. As a last delimitation, no automated HMM algorithm will be developed for SHB, since it is not justified by the aim of this thesis.

1.6 Disposition

In this thesis, the disposition is as follows. Chapter 2 presents the relevant theory behind the methods and algorithms used in this thesis. It begins by describing distribution and probability theory before moving on to define and describe Markov chains and HMMs, explaining their properties, the implications of the model assumptions and the three problems of an HMM, followed by an extensive account of the algorithms used to solve those problems. Lastly, a description of the three model selection criteria used for HMM selection is presented.

Chapter 3 describes and displays the data that have been used in this thesis. It starts by describing the original data set of Audit changes and Audit prices and ends with a description of the subset of the data later used for preprocessing.

Chapter 4 presents the method used in this thesis and consists of a number of sections. It begins with Section 4.1, describing the package that has been used to implement the HMMs, and moves on to Section 3.2, which details how the data inference was done. Then follows Section 4.2, which details how the data preprocessing was conducted. Section 4.3 carefully explains how the HMMs were specified and fitted using the previously mentioned package, and Section 4.4 describes how the model selection was conducted. Lastly, Section 4.5 details the model evaluation method for the HMMs.

Chapter 5 presents the results of the thesis, and in Chapter 6 we discuss those results and draw the final conclusions. Last, but not least, we present some ideas for further studies.


Chapter 2

Theory

2.1 Fundamental Theory and Definitions

2.1.1 Multinomial distribution

The multinomial distribution is a multidimensional generalization of the binomial distribution. Consider a trial that can result in only one of k possible distinct outcomes, labeled A_i, where i = 1, ..., k. Outcome A_i occurs with probability p_i, where \sum_{i=1}^{k} p_i = 1. The multinomial distribution relates to a set of n independent trials of this type. The multinomial variate M is a vector of random variables M_i, where M = (M_1, M_2, ..., M_k) and \sum_{i=1}^{k} M_i = n. M_i is the random variable "number of times event A_i occurs", i = 1, ..., k. The outcome of M is a vector x = (x_1, ..., x_k). For the multinomial variate, x_i is the outcome of M_i and is the number of times event A_i occurs in the n trials. The joint probability function f(x_1, ..., x_k) is the probability that each event A_i occurs x_i times, i = 1, ..., k, in the n trials (Evans et al., 2000). This probability function is given by

f(x_1, \ldots, x_k) = n! \prod_{i=1}^{k} \frac{p_i^{x_i}}{x_i!}.

2.1.2 Maximum a posteriori estimation

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution. The MAP estimate can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.
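For concreteness (this formulation is standard but not spelled out in the text above), the MAP estimate maximizes the posterior, which by Bayes' theorem is proportional to the likelihood times the prior:

    \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} p(\theta \mid x)
                                = \arg\max_{\theta} p(x \mid \theta)\, p(\theta),

so dropping the prior p(θ) recovers the ML estimate, which is the sense in which MAP estimation regularizes ML estimation.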

2.2 Hidden Markov Model

An HMM is a type of stochastic signal model built on a Markov chain; it models a generative sequence that can be characterized by an underlying (latent) process generating an observable sequence. Furthermore, this observable sequence can be characterized as signals which can be discrete or continuous in nature (Rabiner, 1986), i.e., produced in general by real-world processes. The Markov chain and the Markov property are presented in Section 2.2.1, followed by a description of the discrete type of HMM together with a simple example. Finally, the problems of HMMs that need to be dealt with are presented in Section 2.2.3.

2.2.1 Markov chain

The easiest way to treat sequential (time series) data is simply to ignore the sequential aspects and treat the observations as independent and identically distributed (i.i.d.). One downside of such an approach, however, is the failure to exploit the sequential patterns in the data, such as correlations between observations that are close in the sequence. Suppose, for instance, that a binary variable is observed. Given a time series of recent observations of this variable, one wishes to predict the outcome of the next variable in the sequence. But if the data are treated as i.i.d., then the only information that can be extracted is the relative frequency of the observations. In practice, sequential data often exhibit trends, so observing the most recent variable helps significantly in predicting the next value (Bishop, 2006). There exists another approach, however, in which the i.i.d. assumptions are relaxed to capture such effects in a probabilistic model, using the Markov property.

The Markov property refers to the memoryless property of a stochastic process. A stochastic process has the Markov property if the conditional probability distribution of future observations of the process (conditional on both past and present observations) depends only upon the present observation, not on the sequence of events that preceded it. A process with this property is called a Markov process, e.g., the first-order Markov chain described below.

To fix the idea around the Markov property, consider the Markov model

P(q_1, \ldots, q_N) = \prod_{n=1}^{N} P(q_n \mid q_1, \ldots, q_{n-1}),   (2.2.1)

where q_1, \ldots, q_N is the set of possible observed values. If we now assume that each of the conditional probability distributions on the right-hand side of Equation (2.2.1) is independent of all previous observations except the most recent one, then we obtain the first-order Markov chain.

The joint distribution for a sequence of N observations under this model is given by

P(q_1, \ldots, q_N) = P(q_1) \prod_{n=2}^{N} P(q_n \mid q_{n-1}),

and thus the first-order Markov chain fulfills the Markov property.
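To make the first-order Markov chain concrete, the following R snippet (our own sketch, not thesis code, with made-up numbers) simulates a two-state chain in which each new state depends only on the current one.

    # Simulate a two-state first-order Markov chain.
    set.seed(1)
    A <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)  # A[i, j] = P(state j | state i)
    init    <- c(0.5, 0.5)                            # initial distribution
    n_steps <- 100
    q       <- integer(n_steps)
    q[1] <- sample(1:2, 1, prob = init)
    for (n in 2:n_steps) q[n] <- sample(1:2, 1, prob = A[q[n - 1], ])
    table(q)  # empirical time spent in each state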

2.2.2 HMM

Rabiner (1989) formally defines the elements and the mechanism of the type of HMM that we use in this thesis:

1. There is a finite number, say N, of states in the model. The states are not rigorously defined as to what a state is; instead, we say that within a state the signal possesses some measurable, distinctive properties.

2. At each clock time, t, a new state is entered based upon a transition probability distribution which depends on the previous state (the Markov chain property, see Section 2.2.1). (Note that the transition may be such that the process remains in the previous state.)

3. After each transition is made, an observation output is produced according to a probability distribution which depends on the current state. This probability distribution is held fixed for the state regardless of when and how the state is entered. There are thus N such observation probability distributions which, of course, represent random variables or stochastic processes.


Figure 2.1: An N-state urn-and-ball model illustrating the general case of a discrete-symbol HMM, showing a possible sequence of observed colors, O (Rabiner, 1989).

To fix the ideas, Rabiner (1989) presents an "urn and ball" model example, illustrated in Figure 2.1. There are N urns, each filled with a large number of colored balls, and there are M possible colors for each ball. An observation sequence is generated by initially choosing one of the N urns (according to an initial probability distribution), selecting a ball from that urn, recording its color, replacing the ball, and then choosing a new urn according to a transition probability distribution associated with the current urn. A typical observation sequence that could occur is shown in Table 2.1.

Table 2.1: Urn-and-ball example with clock time, urn (hidden) state and color (observation). The clock time corresponds to a point in time at which a particular color (observation) was recorded from a specific (hidden) urn, illustrating a typical observation sequence of an HMM.

    Clock time:           1, 2, ..., T
    Urn (hidden) state:   q3, q1, q1, q2, q1, ..., qN−2
    Color (observation):  R, B, Y, Y, ..., R

We can now, from the urn example illustrated above, formally define the following model notation for a discrete observation HMM:


Table 2.2: Descriptive formal model notation for a discrete observation HMM.

    Notation   Description
    T          length of the observation sequence (total number of clock times)
    N          number of states in the model
    M          number of observation symbols
    S          S_1, ..., S_N, the individual states
    Q          {q_1, q_2, ..., q_T}, the state sequence, where q_t is the state at time t
    V          {v_1, v_2, ..., v_M}, the discrete set of possible observation symbols
    A          {a_ij}, a_ij = P(q_j at t+1 | q_i at t), the state transition probability distribution
    B          {b_j(k)}, b_j(k) = P(v_k at t | q_j at t), the observation symbol probability distribution in state j
    π          {π_i}, π_i = P(q_i at t = 1), the initial state probabilities

Using the model, an observation sequence O = {O_1, O_2, ..., O_T} is generated as follows:

(a) Randomly choose an initial state, q_1 = S_i, according to the initial state distribution π.

(b) Set t = 1.

(c) Randomly choose O_t according to b_i(k), the symbol probability distribution in state S_i.

(d) Transit to a new state q_{t+1} = S_j according to the state transition probability distribution for state S_i, i.e., a_ij.

(e) Set t = t + 1 and return to step (c) if t < T; otherwise terminate the procedure.

4. It can be seen from the above that a complete specification of an HMM requires specification of two model parameters (N and M), specification of the observation symbols, and specification of the three probability measures A, B and π. For convenience, we use the compact notation λ = (A, B, π) to indicate the complete parameter set of the model.
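Steps (a)-(e) translate directly into a few lines of R. The sketch below is our own illustration (not thesis code), with made-up matrices, purely to show the generation mechanism of a discrete HMM λ = (A, B, π).

    # Generate an observation sequence from a discrete HMM lambda = (A, B, pi).
    # A: N x N transition matrix, B: N x M emission matrix, pi: length-N vector.
    simulate_hmm <- function(A, B, pi, len) {
      N <- nrow(A)
      q <- integer(len)  # hidden states
      O <- integer(len)  # observed symbols
      q[1] <- sample(1:N, 1, prob = pi)                           # step (a)
      for (t in 1:len) {
        O[t] <- sample(1:ncol(B), 1, prob = B[q[t], ])            # step (c): emit via b_i(k)
        if (t < len) q[t + 1] <- sample(1:N, 1, prob = A[q[t], ]) # step (d): move via a_ij
      }
      list(states = q, observations = O)
    }

    # Example with N = 2 states and M = 3 symbols (illustrative values only):
    A   <- matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE)
    B   <- matrix(c(0.5, 0.4, 0.1, 0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
    pi0 <- c(0.6, 0.4)
    sim <- simulate_hmm(A, B, pi0, len = 10)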

2.2.3 The three problems of an HMM

According to Rabiner and Juang (1986), there are three problems of interest that one must consider and solve before an HMM can be considered useful in any real-world application. To clarify these problems, consider the observation sequence O = {O_1, O_2, ..., O_T} and the HMM λ = (A, B, π). The three problems are defined as follows:

1. Evaluation problem: How do we (efficiently) compute P(O|λ), i.e., the probability that the observed sequence O was produced by the model λ?

2. Uncovering problem: How do we pick a state sequence Q = {q_1, q_2, ..., q_T} that corresponds to the observation sequence O and the model λ in some meaningful sense? As there is no true or correct state sequence for all but the trivial cases, some optimality criterion needs to be selected and imposed for practical solutions.

3. Training problem: How do we select the model parameters A, B and π to maximize P(O|λ) and best describe how a given sequence of observations came about?

In the following sections, the theory needed to solve each of the above problems is presented.

2.3 Expectation-Maximization algorithm

The Expectation-Maximization (EM) algorithm, first introduced by Dempster et al. (1977), is a general method for finding maximum likelihood estimates (MLE) of parameters in probabilistic models over incomplete data sets, i.e., data sets containing unobserved (latent) variables, such as the HMM.

Suppose we have x = (x_1, x_2, \ldots, x_n) containing the observed variables and z = (z_1, z_2, \ldots, z_n) containing the unobserved (latent) variables. Furthermore, let λ denote the unknown parameters, and denote by

\mathcal{L}(\lambda; x, z) = P(x, z \mid \lambda)

the complete-data likelihood function for λ. The parameters can now be estimated by maximizing the marginal likelihood

\mathcal{L}(\lambda; x) = P(x \mid \lambda) = \sum_{z} P(x, z \mid \lambda),   (2.3.1)

which is the sum of the likelihood function over all possible values of the latent variables z.

In practice, the EM algorithm alternates the following steps:

1. Initialization: set up an initial guess of the parameters λ.

2. Expectation: calculate the expected value of the logarithm of the likelihood function \mathcal{L}(\lambda; x, z), with respect to the conditional distribution of z given x under the current estimate of the parameters λ_t:

Q(\lambda \mid \lambda_t) = E_{z \mid x, \lambda_t}\left[\log \mathcal{L}(\lambda; x, z)\right].   (2.3.2)

3. Maximization: update the estimate of the unknown parameters by maximizing the conditional expectation of the likelihood function, i.e.,

\lambda_{t+1} = \arg\max_{\lambda} Q(\lambda \mid \lambda_t).   (2.3.3)

4. Evaluation: estimate the likelihood value.

5. Repeat steps 2 to 4 until the likelihood value converges.

2.4 Problem 1

The straightforward approach to solving problem 1 (see Section 2.2.3) is to enumerate every possible state sequence of the same length as the number of observations, T. If we consider a fixed state sequence Q = {q_1, q_2, ..., q_T}, the probability of observing the sequence O under model λ is, by the assumption of independence between observations, expressed as

P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t),   (2.4.1)

where b_{q_t}(O_t) denotes the probability of observation O_t in state q_t. We then express the probability of the state sequence Q as

P(Q \mid \lambda) = \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1}, q_t},   (2.4.2)

where π is the initial state distribution and a is the state transition probability distribution.

The probability that O and Q occur simultaneously is given by the product of Equations (2.4.1) and (2.4.2), and the solution to the problem is given by summing the joint probability over all state sequences:

P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)
                  = \sum_{\text{all } Q} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T).   (2.4.3)


However, evaluating the above expression is a very time-consuming task, as it involves on the order of 2T·N^T calculations. The forward-backward algorithm can instead be used to simplify this task.

2.4.1 Forward algorithm

Denote by q_t(i) the event that the hidden state at time t is S_i. Then the forward variable at time t for state i, α_t(i), expresses the probability of the partial observation sequence {O_1, O_2, ..., O_t} and state q_t(i), given the model λ (Rabiner, 1989). Formally, we can write

\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t(i) \mid \lambda).

Initialize the computation as

\alpha_1(i) = \pi_i b_i(O_1), \quad 1 \le i \le N,

and solve for the forward variables inductively:

\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N.

Terminate once we reach α_T(i), since

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i),

as by definition α_T(i) = P(O_1, O_2, \ldots, O_T, q_T(i) \mid \lambda); summing these forward variables is thus equivalent to solving Equation (2.4.3). Going this route conveniently requires on the order of N²T calculations instead of the previous 2T·N^T.

2.4.2 Backward algorithm

In a similar manner to the forward variable, we can also define the backward variable β_t(i). This procedure is not required for solving problem 1 of the HMM, but it will be used in the process of solving problem 3. Let β_t(i) be the probability of the partial observation sequence from t+1 to T (Rabiner, 1989),

\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t(i), \lambda).   (2.4.4)

Arbitrarily set β_T(i) to be 1 for all i, and use induction to solve for β_t(i):

\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j).   (2.4.5)

2.5 Problem 2

No unique solution can be given for problem 2, as there are many different approaches to the problem, depending on the definition of what is "optimal". One approach is to select the individually most likely state q_t for each time step t as the optimality criterion. Using this approach, we can define a new variable γ_t(i) that represents the probability of being in state S_i given the observation sequence O and the model λ:

\gamma_t(i) = P(q_t(i) \mid O, \lambda).   (2.5.1)

We can then express γ_t(i) in terms of the forward and backward variables from Equations (2.4.1) and (2.4.5), since the forward variable expresses the probability of the observation sequence O_1 to O_t and state S_i at time t, while the backward variable accounts for the remaining observations O_{t+1} to O_T given state S_i at time t:

\gamma_t(i) = \frac{\alpha_t(i)\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}.   (2.5.2)

Using Equation (2.5.2), we can then solve for the most likely state at each time:

q_t = \arg\max_{1 \le i \le N} \left[\gamma_t(i)\right].   (2.5.3)

This approach may however be problematic, since it solves for the individually most likely state at each time without accounting for sequences of states; i.e., if a state transition has probability 0 (a_{ij} = 0), the resulting state sequence may not even be reachable. It may therefore be more feasible to instead maximize P(Q, O | λ) using the Viterbi algorithm (Rabiner, 1989).


2.5.1 Viterbi algorithm

The Viterbi algorithm tracks the state of a stochastic process recursively to find the optimal solution to the problem of estimating the state sequence or, in a more general form, the solution to the maximum a posteriori probability (MAP) estimation problem, described in Section 2.1.2, for a discrete-time, finite-state Markov process observed in memoryless noise. The Viterbi algorithm utilizes dynamic programming to solve the MAP sequence estimation in a similar fashion to solving the shortest-route problem through a certain graph (Forney, 1973).

To find the optimal state sequence Q for the given observation sequence O, we define the quantity

\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} \left[P(q_1, q_2, \ldots, q_t(i), O_1, O_2, \ldots, O_t \mid \lambda)\right],   (2.5.4)

i.e., the highest possible probability along a single path at time t that ends up in state S_i. We can then solve for δ_{t+1}(j) by induction:

\delta_{t+1}(j) = \left[\max_{i} \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}).   (2.5.5)

We initialize the recursion by

\delta_1(i) = \pi_i b_i(O_1), \quad 1 \le i \le N,   (2.5.6)

and

\psi_1(i) = 0,   (2.5.7)

where the array ψ_t(j) is used to track the maximizing argument of each iteration. After initializing, we recursively solve

\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(O_t),   (2.5.8)

\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right],   (2.5.9)

for all j, up until t = T. At that stage we terminate the procedure with

P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right],   (2.5.10)

q_T^* = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right].   (2.5.11)


We then backtrack to find the optimal state sequence:

q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1.   (2.5.12)
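Put together, the recursion and backtracking look as follows in R. This is our own sketch (not thesis code); it works in log space, which is equivalent to maximizing the products above while avoiding numerical underflow.

    # Viterbi algorithm: most likely state sequence for a discrete HMM.
    # A: N x N transitions, B: N x M emissions, pi: initial distribution,
    # O: observation symbols coded 1..M.
    viterbi <- function(A, B, pi, O) {
      N <- nrow(A); Tn <- length(O)
      delta <- matrix(-Inf, Tn, N)  # log delta_t(j)
      psi   <- matrix(0L,  Tn, N)   # backpointers psi_t(j)
      delta[1, ] <- log(pi) + log(B[, O[1]])              # Eq. (2.5.6)
      for (t in 2:Tn) {
        for (j in 1:N) {
          s <- delta[t - 1, ] + log(A[, j])
          psi[t, j]   <- which.max(s)                     # Eq. (2.5.9)
          delta[t, j] <- max(s) + log(B[j, O[t]])         # Eq. (2.5.8)
        }
      }
      q <- integer(Tn)
      q[Tn] <- which.max(delta[Tn, ])                     # Eq. (2.5.11)
      for (t in (Tn - 1):1) q[t] <- psi[t + 1, q[t + 1]]  # Eq. (2.5.12)
      q
    }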

2.6 Problem 3

There is no known way to solve problem 3, presented in Section 2.2.3, analytically. However, we can choose λ = (A, B, π) such that P(O|λ) is locally maximized. This can be achieved with an application of the EM algorithm, described in Section 2.3, based on the work of Baum and his colleagues: the Baum-Welch algorithm (Rabiner, 1989).

2.6.1 Baum-Welch algorithm

To illustrate the iterative process of the Baum-Welch algorithm, used to optimize the model parameters, consider the following.

Define ξ_t(i, j) as the probability of being in state S_i at time t and in state S_j at time t+1, given the model and the observation sequence:

\xi_t(i, j) = P(q_t(i), q_{t+1}(j) \mid O, \lambda).   (2.6.1)

By making use of the forward and backward variables defined in Equations (2.4.1) and (2.4.5) respectively, ξ_t(i, j) can be rewritten as

\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
            = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)},   (2.6.2)

which yields the desired probability measure.

In Equation (2.5.2), we defined γ_t(i) as the probability of being in state S_i at time t, given the observation sequence O and the model λ. Hence, it is possible to relate γ_t(i) to ξ_t(i, j), defined in Equation (2.6.2), by summing over j:

\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j).   (2.6.3)


Furthermore, by summing γ_t(i) over the time index t, a quantity is obtained which can be interpreted as the expected (over time) number of transitions made from state S_i (excluding the time slot t = T from the summation). Similarly, summing ξ_t(i, j) in the same fashion can be interpreted as the expected number of transitions from state S_i to state S_j (Rabiner, 1989). This yields

\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from } S_i   (2.6.4)

and

\sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from } S_i \text{ to } S_j.   (2.6.5)

From Equations (2.6.4) and (2.6.5), a method for re-estimating the parameters of an HMM can be developed from the following set of formulas for π, A and B:

\bar{\pi}_i = \gamma_1(i),   (2.6.6)

\bar{a}_{ij} = \frac{E[\text{transitions from } S_i \text{ to } S_j]}{E[\text{transitions from } S_i]}
             = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)},   (2.6.7)

\bar{b}_j(k) = \frac{E[\text{times of visiting } S_j \text{ and observing symbol } v_k]}{E[\text{times of visiting } S_j]}
             = \frac{\sum_{t=1,\, \text{s.t. } O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.   (2.6.8)

Finally, by using the current model λ = (A, B, π) to calculate the right-hand sides of Equations (2.6.6)-(2.6.8), the re-estimated model is defined as \bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi}).
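To make the re-estimation concrete, the sketch below (our own illustration, not thesis code; in this thesis the depmixS4 package performs these computations) carries out one unscaled Baum-Welch iteration from Equations (2.6.6)-(2.6.8). Production code would scale α and β at each time step to avoid underflow.

    # One Baum-Welch re-estimation step for a discrete HMM (unscaled sketch).
    baum_welch_step <- function(A, B, pi, O) {
      N <- nrow(A); M <- ncol(B); Tn <- length(O)
      # Forward and backward passes (Sections 2.4.1 and 2.4.2).
      alpha <- matrix(0, Tn, N); beta <- matrix(0, Tn, N)
      alpha[1, ] <- pi * B[, O[1]]
      for (t in 1:(Tn - 1)) alpha[t + 1, ] <- (alpha[t, ] %*% A) * B[, O[t + 1]]
      beta[Tn, ] <- 1
      for (t in (Tn - 1):1) beta[t, ] <- A %*% (B[, O[t + 1]] * beta[t + 1, ])
      PO <- sum(alpha[Tn, ])                              # P(O | lambda)
      gamma <- alpha * beta / PO                          # gamma_t(i), Eq. (2.5.2)
      xi <- array(0, c(Tn - 1, N, N))                     # xi_t(i, j), Eq. (2.6.2)
      for (t in 1:(Tn - 1))
        xi[t, , ] <- (alpha[t, ] %o% (B[, O[t + 1]] * beta[t + 1, ])) * A / PO
      pi_new <- gamma[1, ]                                # Eq. (2.6.6)
      A_new  <- apply(xi, c(2, 3), sum) /
                colSums(gamma[1:(Tn - 1), , drop = FALSE])  # Eq. (2.6.7)
      B_new  <- matrix(0, N, M)
      for (k in 1:M)
        B_new[, k] <- colSums(gamma[O == k, , drop = FALSE])
      B_new <- B_new / colSums(gamma)                     # Eq. (2.6.8)
      list(A = A_new, B = B_new, pi = pi_new, loglik = log(PO))
    }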

2.7 Model Selection

In this section, the theory behind the model selection criteria used in this thesis is presented. The model with the minimum criterion value is considered the most appropriate.


2.7.1 Akaike information criterion

The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models; thus, AIC provides a means for model selection. When using the AIC, the focus lies on out-of-sample predictive accuracy. Given a model fitted using maximum likelihood, with corresponding estimate \hat{\theta} for the parameter vector θ, the AIC is defined as

AIC = -2 \log \mathcal{L}(\hat{\theta} \mid O) + 2p,

where \mathcal{L}(\cdot \mid O) is the likelihood function given the observed sequence O = {O_1, ..., O_T} and p is the number of model parameters (Pohle et al., 2017).

2.7.2 Bayesian information criterion

In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function and is closely related to the AIC, see Section 2.7.1. The BIC is defined as

BIC = -2 \log \mathcal{L}(\hat{\theta} \mid O) + p \log(T)

and differs from the AIC only in its larger penalty term (for T ≥ 8). It is derived from a Bayesian point of view and aims to identify the model that is most likely to be true, rather than to maximize predictive accuracy (Pohle et al., 2017).

2.7.3 Integrated complete likelihood criterion

The integrated completed likelihood (ICL) criterion takes model evidence into account, similarly to the BIC, but it additionally considers the relevance of the partitions of the data into distinct states, as obtained under the model (Pohle et al., 2017).

The ICL criterion approximates the integrated complete-data likelihood, which is the joint likelihood of the observed values O = {O_1, ..., O_T} and the associated underlying state sequence Q = {q_1, ..., q_T}, using a BIC-like approximation. As the true state sequence is unknown, it is replaced with the Viterbi-decoded state sequence \hat{Q}, described in Section 2.5, i.e., the most probable state sequence under the model considered. With \mathcal{L}_c(\cdot \mid O, \hat{Q}) denoting the (approximate) complete-data likelihood, the ICL criterion is defined as


ICL = -2 \log \mathcal{L}_c(\hat{\theta} \mid O, \hat{Q}) + p \log(T).


Chapter 3

Data

In this chapter, the financial time series (sequential) data are presented, divided into two sections: Description of the data, where the data provided by the HCRM department are loosely described, and Data inference, which gives a detailed description of the data to be used for preprocessing.

3.1 Description of the data

In this thesis, financial time series data containing Audit changes and Audit prices from FX instruments are used. The data set contains 43 features and 64770 observations, ranging from 2014-07-01 to 2017-10-02. Furthermore, HCRM has provided us with a feature for the entire data set containing labels for whether or not an observation is seen as belonging to a state of operational risk. However, it should be noted that only 63 of the 64770 observations are labeled as belonging to a state of OpRisk.

3.2 Data inference

Based on the domain knowledge at HCRM, some of the features in the original data set contain meaningless or superfluous information and are not of importance, since they are not correlated with OpRisk. Therefore, the domain knowledge within the HCRM department is utilized when selecting the features from the original data set on which further data preprocessing is performed. Employees at the department have found that certain features seem to be highly correlated with OpRisk. Based on this knowledge, 13 features are selected for further data preprocessing, see Table 3.1.


Table 3.1: Features obtained after data inference, with information on the type of each feature, examples of the feature attributes and the corresponding levels of the categorical features.

    Feature            Feature Type            Sample of Feature Attributes                      Levels
    Trade Number       Integer                 12345, 23456, 34567                               -
    Create Time        Time                    YYYY-MM-DD HH:MM                                  -
    Update Time        Time                    YYYY-MM-DD HH:MM                                  -
    Trader             Categorical             ZXC01, QWE02, ASD00                               -
    Create User        Categorical             ZXC01, QWE02, ASD00                               -
    Update User        Categorical             ZXC01, QWE02, ASD00                               -
    Price              Numerical               9.13, 11.5, 0.2                                   -
    Nominal            Numerical               30000, -40000, 50000                              -
    Second Nominal     Numerical               -273900, 460000, -10000                           -
    Delta              Integer                 0, 1, 133                                         -
    Entry              Categorical             Update, Delete                                    2
    Counterparty Type  Categorical             Intern Department, Counterparty, Broker, Market   4
    Trade changes      Categorical/Numerical   Specific changes made to an observation           36

However, the data seen in the above table need to be preprocessed before they can be used for modeling.


Chapter 4

Method

This chapter is divided into six sections: (1) Package, which details the package used to specify and analyze the HMMs; (2) Data inference, the method used to extract the relevant information from the raw data; (3) Data preprocessing, which explains how the data are processed in order to specify the HMMs; (4) Modeling, which details how the HMMs are specified and fitted; (5) Model selection, which describes the different methods used to select HMMs for further analysis; (6) Model evaluation, which describes how the evaluation of the HMMs is done.

4.1 Package

4.1.1 depmixS4

The depmixS4 package is an R package for HMMs. It implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models (see Section 2.2.1), HMMs (see Section 2.2), and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data.

Model fitting is done in two steps with the depmixS4 package: first, the HMM is specified through the depmix function; second, the model is fitted using the fit function.

The standard output of the fitted model includes the following: the optimized parameters estimated by the EM algorithm (see Section 2.3) based on the Baum-Welch algorithm (see Section 2.6); and the posterior densities for the states and the optimal state sequence, both of which are computed via the Viterbi algorithm (see Section 2.5), which by default utilizes the forward-backward algorithm (see Section 2.4) to calculate the log-likelihood and the smoothed state and transition probabilities (Visser & Speekenbrink, 2016).
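A minimal sketch of this two-step workflow is given below. The data frame df, its column names and the ntimes values are hypothetical placeholders, not the thesis data; depmix, fit and posterior are the actual depmixS4 functions.

    library(depmixS4)

    # Step 1: specify a 4-state HMM with two categorical (multinomial) responses.
    # df, Entry, EqualTrader and the ntimes vector are hypothetical.
    mod <- depmix(response = list(Entry ~ 1, EqualTrader ~ 1),
                  data     = df,
                  nstates  = 4,
                  family   = list(multinomial("identity"), multinomial("identity")),
                  ntimes   = c(1200, 950, 800))  # lengths of the independent series

    # Step 2: fit by EM (Baum-Welch) and inspect the result.
    fm <- fit(mod)
    summary(fm)
    head(posterior(fm))  # smoothed state probabilities and decoded states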


4.2 Data Preprocessing

In order to use the data for modeling, some preprocessing is required, due to the problem of missing data and the fact that some of the features hold different types of information that are believed to be correlated with the latent process of OpRisk. Hence, the features are preprocessed in the following ten steps.

• Step 1. Missing value replacement:

If the feature is of categorical form, a missing value is replaced with a new category, "missing".

• Step 2. Trade changes feature:

In the second step, the Trade changes feature is converted into a numeric value representing the number of fields (unique attributes, e.g., interest rate, counterparty) updated for the observation. This value is then entered into 11 bins, ranging from no updated fields to 10+ updated fields. This way of transforming the feature is motivated by the extremely large amount of information contained in it: if it were expanded to include all information, over 1000 new variables of both categorical and numerical information would be required, which is not feasible. Alternative approaches are discussed in the Further Studies section.

• Step 3. Delta feature:

The integer value in the Delta feature represents the time difference in days between the trade and the date and time of the update. The feature is transformed into 10 different bins depending on its size, with quantiles used as cut-off points. This approach is used to reduce the number of different values in the feature.

• Step 4. Update Number feature:

A new feature called Update Number is created. The Update Number feature represents how many times each unique Trade Number has been changed or updated: it takes the value 1 the first time the trade number occurs, 2 the second time it occurs, and so on. The feature is entered into 5 bins, ranging from 1 update to 5+ updates.

• Step 5. Equal trader feature:

A new feature called Equal Trader is created. Each observation in the data set is tagged with a specific trader, create user and update user. Equal Trader is a binary feature that represents whether the same user was the trader, the create user and the update user for the observed Audit change. This feature is included because this behavior is believed to be linked with the OpRisk process.

• Step 6. Time between updates feature:

A new feature called Time between updates is created. If a trade has been updated on more than one occasion, this feature represents the time difference between the observed trade change and the previously observed trade change. The feature value is 0 for all trades that have been updated only once, as well as for all first-time observations. The feature is then entered into 10 bins, using quantiles as cut-off points. It is included because waiting a while between sequential updates of a trade is believed to possibly be linked with a cover-up process.

• Steps 7-8. Counter party feature & Intern dept feature:

The original Counterparty type feature contains very infrequent Broker and Market observations, which seem unlikely to be linked with the underlying latent process of OpRisk. We therefore remove the original feature and create two new features. The Counter party feature is a binary variable, where 1 represents that the trade was made with a counterparty and 0 that it was not. The Intern dept feature is a binary variable, where 1 represents that the trade was made with an internal department and 0 that it was not.

• Step 9. Entry feature:

Entry is a binary categorical feature that takes the value Update or Delete, depending on whether the observation is an update or a delete in SKNT.

• Step 10. Numerical data:

Table 4.1: Kurtosis and skewness values for the Price and Nominal features, describing how far from normally distributed the features are. The normal distribution has a kurtosis of 3 and a skewness of 0.

    Feature    Kurtosis   Skewness
    Price      5901.9     72.4
    Nominal    6545.4     -49.9

The skewness of these features, seen in Table 4.1, leads to problems in parameter estimation: one usually wishes to find the best finite local maximum of the likelihood, and infinite spikes of the likelihood function are missed by the maximization algorithm (Zucchini et al., 2016). Hence, to avoid some data points in these features giving an infinite contribution to the log-likelihood in the EM algorithm, we choose to transform them into their market value through the following equation:

Market value = Price × |Nominal|.   (4.2.1)

The resulting feature obtained from Equation (4.2.1) is then transformed into 10 bins, using quantiles as cut-off points, to reduce the effects of minor observation errors.
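A small R sketch of this step is given below; the data frame df and its column names are hypothetical placeholders, and unique() guards against duplicate quantile breaks when the distribution is highly concentrated.

    # Market value transform, Eq. (4.2.1), followed by quantile binning.
    df$MarketValue <- df$Price * abs(df$Nominal)
    breaks <- unique(quantile(df$MarketValue, probs = seq(0, 1, by = 0.1),
                              na.rm = TRUE))
    df$MarketValueBin <- cut(df$MarketValue, breaks = breaks,
                             include.lowest = TRUE)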


The resulting data set after preprocessing is presented in Table 4.2 below. The Trade Number feature is kept in the data set as a key to the labeled observations. The Update Time feature is used for the time series structure, and the Trader feature is used for the component series structure.

Table 4.2: Features obtained after data preprocessing, with information on the type of each feature, examples of the feature attributes and the corresponding levels of the categorical features.

    Feature                 Feature Type   Sample of Feature Attributes   Levels
    Trade Number            Integer        12345                          -
    Update Time             Time           YYYY-MM-DD HH:MM               -
    Trader                  Categorical    ZXC01, QWE02, ASD00            -
    Trade Changes           Categorical    0, 1, 2                        11
    Delta                   Categorical    [0-1], [1-2], [2-3]            10
    Entry                   Categorical    Update, Delete                 2
    Update Number           Categorical    1, 2, 3                        5
    Equal Trader            Binary         0, 1                           2
    Time between updates    Categorical    [0-1], [1-2], [2-3]            10
    Counter party           Binary         0, 1                           2
    Internal Department     Binary         0, 1                           2
    Market Value            Categorical    [0-1], [1-2], [2-3]            10

How each of these features, seen in Table 4.2, will be used is discussed in Section 4.3.

4.3 Modeling

The following sections give a description of the method that is tested when specifying and fitting HMMs. The method is implemented in the R statistical software with the main package depmixS4, see Section 4.1.1. Furthermore, as described in Section 4.1.1, the model fitting is done in two steps using the depmixS4 package: first, the HMM is specified through the depmix function; second, the model is fitted by the fit function.

For all models, the data are structured as longitudinal data (also called component series). Such data consist of K time series of the same type of observation on each of K subjects; an example of such data would be disease status for each of K individuals or, in the context of this thesis, observed Audit change behavior for each of the K traders. For the application of HMMs to component series, we suppose that the same type of model is used to describe each of the K series, but that the parameters of the model (λ) may vary across the subjects. It is assumed that the component series are independent, each with its own underlying sequence of states (Zucchini, 2016). An argument ntimes


is therefore used in the modeling; it represents a vector specifying the lengths of the individual, independent time series, as given in the depmixS4 package description by Visser & Speekenbrink (2016). We let each unique trader represent an independent time series, ordered by the update time of each trade, and let the ntimes argument represent the lengths of these time series. All the labeled observations belong to one specific trader; hence, we train each HMM on the K-1 independent time series corresponding to all other traders and predict the behavior of the trader with the labeled observations. We are interested in investigating the behavior of this trader, to see if we can find a model that generalizes well to the underlying (latent) process of OpRisk for this individual.
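A minimal sketch of this setup, assuming a data frame dat with the illustrative columns Trader and UpdateTime, could look as follows:

```r
# Order the data so that each trader forms one contiguous, time-ordered
# component series (column names are illustrative assumptions)
dat <- dat[order(dat$Trader, dat$UpdateTime), ]

# ntimes: the length of each trader's independent series, in the order
# in which the traders appear in the data
n_per_trader <- as.vector(table(factor(dat$Trader, levels = unique(dat$Trader))))
```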

Furthermore, the specification of the HMM with the depmixS4 package is done in two steps: first, a list of the response features to be included in the model is created, where each response variable in the list is assumed to be independent of every other response variable as well as conditionally independent given the underlying state; second, a list specifying the distribution family of each response variable is created.

Since the true number of states in the latent process is unknown (Rabiner, 1989), domain knowledge at the HCRM department is used to decide how many HMMs should be specified, with regard to how many states the underlying latent process of OpRisk could realistically consist of. Based on this knowledge, nine HMMs are specified, each with its own unique number of states, N = 2, 3, ..., 10, where N corresponds to the number of states in the HMM.

First, the response variables in the list are modeled with a GLM formula that fits only an intercept, so that no dependencies between different features are included. Second, the distribution of each response variable is specified in a list, where each of the categorical variables is specified as belonging to the multinomial distribution, see Table 4.2. The initial guess for the parameters of the multinomial distribution of each response variable starts off as being equally divided between the different outcomes, and is later optimized by the EM algorithm, see Section 2.3.

The specified HMMs are then fitted separately using the fit function, where each model with an increased number of states, with respect to its predecessor, can be interpreted as a zoom-in onto the latent process (as the model formulation could neglect different aspects of the data-generating process). This may to some extent capture additional hidden states which describe the characteristics of the latent process (OpRisk) at a higher level of detail (Pohle et al., 2017).

All the features described in Section 4.2, except Trade Number, Update Time and Trader, are added as conditionally independent (given the states) multinomially distributed response variables in the model, and no covariates are included, see Table 4.3. The Trade Number feature is kept to keep track of each unique trade, the Update Time is used as the time line for each component series, and the Trader as the key to each component series.
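The specification in Table 4.3 can be sketched with depmixS4 roughly as follows; the response names are illustrative stand-ins for the preprocessed features, and the 7-state case is shown as an example:

```r
library(depmixS4)

# Intercept-only formulas: no dependencies between features, and each
# response is conditionally independent given the hidden state
resp <- list(Delta ~ 1, Entry ~ 1, EqualTrader ~ 1, UpdateNumber ~ 1,
             TimeBetweenUpdates ~ 1, MarketValue ~ 1, NumberOfChanges ~ 1,
             CounterParty ~ 1, InternDept ~ 1)
fam <- replicate(length(resp), multinomial("identity"), simplify = FALSE)

# Step one: specify the HMM; step two: fit it with the EM algorithm
mod <- depmix(response = resp, data = dat, nstates = 7,
              family = fam, ntimes = n_per_trader)
fm <- fit(mod)
```

Varying nstates from 2 to 10 in the call above yields the nine candidate models.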


Table 4.3: Model specification

Feature                 Response   Feature distribution (Family)
Delta                   True       Multinomial
Entry                   True       Multinomial
Equal Trader            True       Multinomial
Update Number           True       Multinomial
Time between updates    True       Multinomial
Market Value            True       Multinomial
Number of Changes       True       Multinomial
Counterparty            True       Multinomial
Intern Dept             True       Multinomial

4.4 Model Selection

In the process of model selection among the specified and fitted models from Section 4.3, the R statistical software is used to compute the AIC, BIC and ICL for each model.

In practice, model selection is a notoriously difficult task for HMMs. When the maximum likelihood approach is taken, it is relatively straightforward to use information criteria like BIC or AIC (see Sections 2.7.1 and 2.7.2), or variations thereof, to select between models with different numbers of states (Pohle et al., 2017). Furthermore, Pohle et al. (2017) state that when observing more than one individual (corresponding to traders in this thesis), it is natural to assume that the individuals differ in their characteristics.

In the context of the financial time series data used in this thesis, one could imagine that a trade associated with operational risk may show up in the patterns in the way the trader creates and updates that specific trade, e.g., the trader creates a trade, waits for the trade to be registered by the system, and then updates the trade again at a later point in time to hide a fraudulent trade. Pohle et al. (2017) further state that not accounting for such individual heterogeneity within the model formulation could lead to information criteria favoring models with more states, where the resulting states may capture, in our case, the premeditated behavior of the trader, while another state may very well be associated with some other malicious behavior of the trader in relation to operational risk.

In this thesis, we’re dealing with the same kind of problems due to way the HMMs arespecified, described in Section 4.3, and the complexity of the financial time series dataitself, leading to AIC and BIC favoring models with a large number of states. Further-more Pohle et al. (2017) states that in general, the more states that are included, themore difficult it becomes to assign the associated meaning to the states, which in ourcase is the operational risk meaning of the states. Because of all of the outlined prob-lems, together with the aim of the thesis, the following pragmatic step-by-step processbased on the suggestions by Pohle et al. are used for selecting the number of states:


• Step 1. Decide a priori on the candidate models, in particular the minimum and the maximum number of states that seem plausible, and fit the corresponding range of models.

• Step 2. Consider model selection criteria for guidance, i.e. AIC, BIC, ICL and log-likelihood (LogLik), as to how much of an improvement, if any, is obtained for each increment in the number of states, and select the range of models that seems appropriate.

• Step 3. Closely inspect each of the chosen models from the previous step, by considering their Viterbi-decoded state sequences.

• Step 4. Make a pragmatic choice of the number of states, taking into account the findings from Steps 2-3, but also the study aim, expert knowledge and computational considerations.

• Step 5. In cases where there is no strong reason to prefer one particular model over another (or several other) candidate model(s), results for each of the models should be reported and discussed.

The model selection is done according to the step-by-step approach outlined above. Step 1 is described in Section 4.3. In Step 2, the AIC, BIC, ICL and LogLik are computed for each of the fitted HMMs and plotted together, see Figure 5.1, from which six HMMs are selected for further investigation. In Step 3, the selected HMMs' Viterbi-decoded state sequences are plotted with the labeled observations marked, see Figures 5.2-5.7. In Step 4, as a last step, a pragmatic choice of the number of states is made, taking into account the findings from the previous steps together with the study aim and expert knowledge.
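Assuming the objects from the earlier sketches, Steps 1-3 could be carried out roughly as below; note that ICL is not provided by depmixS4 directly and is computed separately.

```r
# Step 1: fit the candidate models with N = 2, ..., 10 states
models <- lapply(2:10, function(N)
  fit(depmix(response = resp, data = dat, nstates = N,
             family = fam, ntimes = n_per_trader)))

# Step 2: compare log-likelihood, AIC and BIC across the fitted models
crit <- data.frame(states = 2:10,
                   logLik = sapply(models, logLik),
                   AIC    = sapply(models, AIC),
                   BIC    = sapply(models, BIC))

# Step 3: Viterbi-decode a candidate, e.g. the 7-state HMM (models[[6]]);
# the state column of posterior() holds the decoded state sequence
viterbi_states <- posterior(models[[6]])$state
```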

4.5 Model Evaluation

The most intuitive way to investigate what type of behavior (or characteristics) the states of the selected HMMs consist of is to look closer at how each of the features is distributed over the states. The selected HMM's feature distributions over the states are therefore plotted in R utilizing the ggplot2 package, see Figures 5.8 and 5.9.
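A minimal ggplot2 sketch of one such panel, assuming the Viterbi-decoded states from the previous sketch have been attached to the data (names are illustrative):

```r
library(ggplot2)

# One panel of Figures 5.8-5.9: the within-state emission distribution of a
# (factor-valued) feature, shown as stacked proportions per decoded state
dat$state <- factor(paste0("q", viterbi_states))
ggplot(dat, aes(x = state, fill = Delta)) +
  geom_bar(position = "fill") +
  labs(y = "Percentage", title = "Delta by state")
```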


Chapter 5

Results

In this chapter, the results of the thesis are presented. The data used to obtain the results are a preprocessed subset of the original data set, see Section 4.2. Furthermore, all of the HMMs are specified in the same fashion, except for the number of hidden states, as described in Section 4.3.

5.1 Model selection criterion

[Plot: log-likelihood (LogLik) and the AIC, BIC and ICL criteria against the number of states, 2-10; the number of estimated parameters (nPar) for the 2- to 10-state models is 114, 174, 236, 300, 366, 434, 504, 576 and 650.]

Figure 5.1: Model selection criteria for the fitted HMMs, where the number of parameters each HMM was required to estimate is shown on the nPar axis.


The results in Figure 5.1 indicate that the information criteria reach their minimum (maximum for LogLik) at 7 states for BIC, 8 for ICL and 9 for AIC, and that the LogLik value has converged almost completely at 9 states.

5.2 Viterbi-decoded state sequence

[Plot: Viterbi-decoded state sequence, state (1-4) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.2: Viterbi-decoded state sequence for the 4-state HMM.

Table 5.1: The 4-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      455                      40
q2      0                        0
q3      1268                     23
q4      27                       0

Figure 5.2 and Table 5.1 indicate that the Viterbi-decoded state sequence for the 4-state HMM is dominated by states q1 and q3, which consist of 455 and 1268 unlabeled observations as well as 40 and 23 labeled observations, respectively. The q2 state is an empty state to which none of the labeled or unlabeled observations belong, and the q4 state is a small state consisting of only 27 unlabeled observations.


[Plot: Viterbi-decoded state sequence, state (1-5) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.3: Viterbi-decoded state sequence for the 5-state HMM.

Table 5.2: The 5-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      0                        0
q2      610                      5
q3      455                      40
q4      27                       0
q5      658                      18

Figure 5.3 and Table 5.2 indicate that the Viterbi-decoded state sequence for the 5-state HMM is dominated by the three large states q2, q3 and q5, which consist of 610, 455 and 658 unlabeled observations together with 5, 40 and 18 labeled observations, respectively. The q1 state and the q4 state consist of 0 and 27 observations, respectively, with no labeled observations. It is clear that the q3 state is the most interesting, since it consists of 40 out of 63 labeled observations. However, the proportion of total observations in this state is quite high with respect to the total number of observations over all the states.


[Plot: Viterbi-decoded state sequence, state (1-6) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.4: Viterbi-decoded state sequence for the 6-state HMM.

Table 5.3: The 6-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      207                      30
q2      365                      5
q3      772                      18
q4      131                      0
q5      21                       0
q6      254                      10

Figure 5.4 and Table 5.3 indicate that the Viterbi-decoded state sequence for the 6-state HMM is dominated by state q3, which consists of 772 unlabeled and 18 labeled observations. Furthermore, we can see that states q1, q2 and q6 consist of 207, 365 and 254 unlabeled and 30, 5 and 10 labeled observations, respectively. The table indicates, when looking at the proportion of labeled to unlabeled observations, that the most favorable state is q1. However, the labeled observations are spread out between states q1, q2, q3 and q6, which could indicate that the HMM captures different characteristics of the latent process of OpRisk in different states, which is not something that we desire.


[Plot: Viterbi-decoded state sequence, state (1-7) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.5: Viterbi-decoded state sequence for the 7-state HMM.

Table 5.4: The 7-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      0                        0
q2      248                      0
q3      290                      40
q4      783                      23
q5      178                      0
q6      237                      0
q7      14                       0

Figure 5.5 and Table 5.4 indicate that the Viterbi-decoded state sequence of the 7-state HMM is dominated by state q4, which consists of 783 unlabeled observations and 23 labeled observations. Furthermore, there are four other fairly large states, q2, q3, q5 and q6, with 248, 290, 178 and 237 unlabeled observations, respectively. However, state q3 also contains 40 out of the 63 labeled observations. The last state, q7, contains only 14 unlabeled observations. The most prominent state for capturing the OpRisk latent process is q3, as it captures a fair proportion of the labeled observations while not being overwhelmingly large.


[Plot: Viterbi-decoded state sequence, state (1-8) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.6: Viterbi-decoded state sequence for the 8-state HMM.

Table 5.5: The 8-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      225                      1
q2      102                      0
q3      633                      17
q4      0                        0
q5      189                      0
q6      0                        0
q7      293                      40
q8      308                      5

Figure 5.6 and Table 5.5 indicate that the Viterbi-decoded state sequence of the 8-state HMM is very similar to that of the 7-state HMM. It can be seen that the 8-state HMM is dominated by the q3 state, which consists of 633 unlabeled and 17 labeled observations. States q4 and q6 are both empty. Furthermore, it is clear that the most promising state is q7, which consists of 293 unlabeled and 40 labeled observations, indicating that this state best captures the behavior of OpRisk for this HMM.


[Plot: Viterbi-decoded state sequence, state (1-9) against time from 2014-07-03 to 2017-02-07, with the labeled observations marked.]

Figure 5.7: Viterbi-decoded state sequence for the 9-state HMM.

Table 5.6: The 9-state HMM, showing the number of unlabeled and labeled observations in each state.

State   Unlabeled observations   Labeled observations
q1      14                       0
q2      208                      8
q3      0                        0
q4      370                      10
q5      154                      0
q6      468                      40
q7      310                      5
q8      0                        0
q9      226                      0

Figure 5.7 and Table 5.6 indicate that the Viterbi-decoded state sequence of the 9-state HMM is dominated by the q6 state, which consists of 468 unlabeled and 40 labeled observations. The labeled observations are spread out between states q2, q4, q6 and q7, consisting of 8, 10, 40 and 5 labeled observations, respectively.


5.3 Feature distribution over the states

[Stacked bar plots of the emission distributions over states q2-q7: Delta by state (Bins 1-10), Entry by state (Delete, Update), Equal trader by state (0, 1) and Update number by state (1-5+); the y-axis shows the within-state percentage.]

Figure 5.8: The 7-state HMM and its feature distribution over the states for Delta, Entry, Equal Trader and Update number.

[Stacked bar plots of the emission distributions over states q2-q7: Time between updates by state (Bins 1-10), Market value by state (Bins 1-10), Number of changes by state (0-10+) and Counter party type by state (Counter party, Intern Dept); the y-axis shows the within-state percentage.]

Figure 5.9: The 7-state HMM and its feature distribution over the states for Time between updates, Market value, Number of changes and Counter party type.


Figures 5.8 and 5.9 visualize how the emitted features are distributed over the different states for the 7-state HMM.

The two most interesting states, q3 and q4, favor the lower bins of the Delta feature, which means that the trades are usually updated only a few days after being made. For the Entry feature, state q4 is heavily in favor of emitting Update compared to Delete, while for q3 an emitted Update or Delete is about equally likely. For the Equal trader feature, both q3 and q4, as well as q2, have a high probability of emitting 1, corresponding to a true statement. Looking at the Update number feature, we can see that q3 and q4 are the only states with a probability of emitting all of the outcomes, indicating that they manage to capture longer sequential audit change behavior.

In the Time between updates feature we can see a similar behavior for q3 and q4 as for the Update number feature, in that these are the only states with a probability of emitting all of the categories. The large proportion of Bin 1 emissions over all states, for both the Update number and Time between updates features, is due to the first bin corresponding to first-time observations. For the Market Value feature, all states seem to have a similar composition of emitted categories, except q7. This was at first a bit unexpected, since the belief prior to modeling was that a high market value would be a big driving factor of the underlying OpRisk process. However, it may make sense, since very large losses may be more difficult to cover up than many medium or small losses. In the Number of changes feature we can see that q3 and q4 once again are the only states with a probability of emitting all bins of the number of fields changed per observation, while the other states mainly emit one field change. The last feature plot, the Counter party type feature, is a combination of the two counter party type features, and we can see that each state only emits one type of observation. In particular, it is interesting that state q3 only emits Internal department while q4 only emits Counter party.


Chapter 6

Discussion and Conclusion

In this chapter, the results are discussed and the final conclusions are drawn. Furthermore, some suggestions for further studies are presented.

6.1 Model Evaluation

In this section, an evaluation of the models is made based on the results in Chapter 5. The evaluation compares the results attained from the models and the selected HMM's feature distributions. The states that capture the behavior of the latent process of OpRisk are then discussed in depth, in order to derive what types of emitted observations are connected to OpRisk.

6.1.1 Model selection criterion

The results of the model selection criteria are presented in Section 5.1. Typically, one would want to select the HMM where the model selection criterion stops decreasing and starts to increase (due to the information gained being less than the penalty of increasing the number of parameters). We choose to take a closer look at the HMMs in the interval of 4-9 states. The upper limit of 9 states is motivated by the minimum AIC value, while the lower bound of 4 is motivated by Pohle et al. (2017), who state that the absolute minimum value of the information criterion (especially BIC) should be regarded as an upper limit and that the true number of states is usually lower.


6.1.2 Viterbi-decoded state sequence

The Viterbi-decoded state sequences, together with the updates that are labeled as risky, are presented in Section 5.2. Based on the assumption that OpRisk from Audit changes is a relatively infrequent event for the traders, what one would expect to see in these figures is that the majority of the labeled updates belong to the same state, while the proportion of the observations in that specific state is relatively small with respect to the total number of updates in the state sequence.

Out of the 9 models presented in the previous section, we believe that the 7-state model best represents the latent risk process. This is based on the fact that state q3 in this model has the smallest number of total observations while still containing a fair proportion of the labeled observations, see Figure 5.5 and Table 5.4. Furthermore, the fact that the BIC value is the smallest for this particular model also suggests that this HMM is the best of the presented models. We therefore choose to investigate this model further, by looking at the feature distributions over the states, to analyze whether there are any clear patterns or interesting differences between the states in the model.

6.2 Conclusion

This thesis presents a solid theoretical framework while also addressing the problems of HMMs that need to be dealt with in order to model them. The 7-state HMM seems to perform quite well when it comes to separating observations into different states, with a majority of the labeled observations in the same state. This indicates that the HMM manages to capture the underlying (latent) process of OpRisk. Furthermore, it can be concluded from the results and the discussion regarding the feature distribution over the states for the 7-state HMM that the driving factors of OpRisk were the number of fields changed for each observation (Number of changes), the time between the next update and the last update (Time between updates), and whether the same person was the trader, the creator and the updater of the trade. The latent process also seems to be perfectly split into different states depending on whether the trade was made with an internal department or a counter party.

The final conclusion is therefore that HMMs are a suitable model and that they can be used for inference about OpRisk. To be more specific, one could, e.g., follow a trade over time, flag it if it reaches a state of OpRisk, and then take the required action. However, there are some problems when it comes to the interpretation of the HMMs, as the number of labeled observations is very small. This essentially means that it is impossible to express exactly how well the model performs, which is something that should be addressed in further studies, with a more rigorous feature selection method for the response features.


6.3 Further Studies

While working with this thesis, many ideas for further studies were discovered which could potentially improve the performance of the HMMs significantly. The ideas for further studies are listed below.

• Conduct a field study or survey looking into how the traders would conduct fraudulent trading or cover up mistakes in the audit change system. The study is suggested because it could lead to better feature selection and a better understanding of how the Audit changes and Audit prices are related to OpRisk.

• Further collection of labeled data. This is required in order to create a more robust validation method for the HMMs. One way to do this is to start a workshop or project group, consisting of people from HCRM and traders, and create toy examples of trades and audit changes that mimic risky behavior. This set can then be used for validation of the models.

• More data, including the whole range of trades and the changes to those trades.

• It is possible to include covariates on both the class membership and the transition probability matrix in the model specification. This is suggested to be investigated further, as it could increase the performance of the HMM.

• As the Trade changes seem to be a driving factor of the risk process, one could look to include more specific information in the model about what type of information was changed and how. With a more complete set of labeled data, or better knowledge of the data or process (the previously mentioned field study may come in handy), one could measure the importance of such features in terms of inducing operational risk. Some of the most important features could then be included in the model and flagged if they deviate from the norm.


Bibliography

Basel Committee on Banking Supervision (2017). Basel III: Finalising post-crisis reforms.

Bishop, C. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). New York, NY: Springer.

Colombi, R., Giordano, S. (2015). Multiple hidden Markov models for categorical time series. Journal of Multivariate Analysis, 140, pp. 19-30.

Chartered Institute of Management Accountants (CIMA) (2008). Operational Risk. Topic Gateway Series No. 51. Prepared by Helen Matthews and the Technical Information Service.

Dionne, G., Saissi Hassani, S. (2017). Hidden Markov Regimes in Operational Loss Data: Application to the Recent Financial Crisis. Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation. CIRRELT-2015-29.

Evans, M., Hastings, N., Peacock, B. (2000). Statistical Distributions (3rd ed.). New York: Wiley, pp. 134-136.

Gupta, A., Dhingra, B. (2012). Stock market prediction using hidden Markov models. In: Engineering and Systems (SCES), 2012 Students Conference on. IEEE, pp. 1-4.

Li, J., Pedrycz, W., Jamal, I. (2017). Multivariate time series anomaly detection: A framework of Hidden Markov Models. Applied Soft Computing, 60, pp. 229-240.

Xu, Y., Pinedo, M., Xue, M. (2017). Operational Risk in Financial Services: A Review and New Research Opportunities. Production and Operations Management, Vol. 26, pp. 426-445.

Pohle, J., Langrock, R., van Beest, F. M., Schmidt, N. M. (2017). Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated using Animal Movement. Journal of Agricultural, Biological, and Environmental Statistics, Vol. 22, pp. 270-293.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, pp. 257-286.

Rabiner, L. R., Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), pp. 4-16.

Visser, I. (2011). Seven things to remember about hidden Markov models: A tutorial on Markovian models for time series. Journal of Mathematical Psychology, 55(6), pp. 403-415.

Visser, I., Speekenbrink, M. (2016). Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4. pp. 2-43.

Zucchini, W., MacDonald, I., Langrock, R. (2016). Hidden Markov Models for Time Series: An Introduction Using R. Second Edition.
