False Alarm Reduction in Maritime Anomaly Detection with Contextual Verification

by Aungon Nag Radon
B.Sc., Bangladesh University of Engineering & Technology, 2011

Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in the School of Computing Science, Faculty of Applied Sciences

© Aungon Nag Radon 2015
SIMON FRASER UNIVERSITY
Summer 2015

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for "Fair Dealing." Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately.



Approval

Name: Aungon Nag Radon

Degree: Master of Science (Computing Science)

Title: False Alarm Reduction in Maritime Anomaly Detection with Contextual Verification

Examining Committee:
  Dr. William Sumner (chair), Assistant Professor, Computing Science, Simon Fraser University
  Dr. Ke Wang, Senior Supervisor, Professor, Computing Science, Simon Fraser University
  Dr. Uwe Glässer, Supervisor, Professor, Computing Science, Simon Fraser University
  Dr. Hans Wehn, Internal Examiner, Professor (Adjunct), Computing Science, Simon Fraser University

Date Defended: 18 August, 2015


Abstract

Automated vessel anomaly detection is immensely important for preventing and reducing illegal activities (e.g., drug dealing) and for effective emergency response and rescue in a country's territorial waters. A major limitation of previously proposed vessel anomaly detection techniques is the high rate of false alarms, as these methods mainly consider vessel kinematic information, which is generally obtained from AIS data. In many cases, an anomalous vessel in terms of kinematic data can be completely normal and legitimate if the "context" at the location and time (e.g., weather and sea conditions) of the vessel is factored in. We propose a novel anomalous vessel detection framework that utilizes such contextual information to reduce false alarms through "contextual verification". We evaluate our proposed framework for vessel anomaly detection using real-life AIS data sets obtained from the U.S. Coast Guard.

Keywords: Anomaly Detection; Contextual Verification; False Alarm Reduction; Data Warehouse; Maritime Domain


Dedication

    This work is dedicated to my parents.


Acknowledgements

At the very outset, I would like to express my heartfelt gratitude to my senior supervisor, Professor Dr. Ke Wang, for his support and guidance in carrying out this research work. It has been a great learning experience for me to work under the supervision of a leading data mining researcher like Dr. Ke Wang.

I would also like to thank my supervisor, Professor Dr. Uwe Glässer, for providing me every opportunity to collaborate with the research analysts of MDA Systems Ltd. I must thank Hamed Yaghoubi Shahir for sharing his experience in industrial projects, which helped me tremendously.

It has been a pleasure to work with the following distinguished people, whose advice helped me through different phases of this research work.

    • Dr. Hans Wehn (MDA Systems Ltd.)

    • Dr. Andrew Westwell-Roper (MDA Systems Ltd.)

    • Shawn McCann (Amazon.com, Inc.)

Last but not least, I would like to thank my lab mates in the Database and Data Mining Lab of the Computing Science department of Simon Fraser University.


Table of Contents

    Approval ii

    Abstract iii

    Dedication iv

    Acknowledgements v

    Table of Contents vi

    List of Tables ix

    List of Figures x

1 Introduction 1
    1.1 Importance of Contextual Information 1
    1.2 Challenges 2
    1.3 Contribution 2
    1.4 Thesis Organization 3

2 Preliminaries & Problem Statement 4
    2.1 Preliminaries 4
    2.2 Problem Statement 6

3 Related Works 8
    3.1 Anomaly Detection Approaches 8
        3.1.1 Classification Based Methods 8
        3.1.2 Statistical Techniques 9
        3.1.3 Nearest Neighbour Based Techniques 9
        3.1.4 Clustering Based Techniques 9
    3.2 Trajectory Anomaly Detection Approaches 10
        3.2.1 Clustering Based Trajectory Anomaly Detection 11
        3.2.2 Non-clustering Based Trajectory Anomaly Detection 11
    3.3 Maritime Anomaly Detection Approaches 12
        3.3.1 Knowledge Driven Techniques 12
        3.3.2 Data Driven Techniques 12
        3.3.3 Hybrid Approaches 12

4 Framework Overview 14
    4.1 Architecture 14
    4.2 Our Approach 15
        4.2.1 Phase 1: Normal Pattern Extraction 15
        4.2.2 Phase 2: Anomaly Detection 16

5 Maritime Data Warehouse 17
    5.1 Motivation for Data Warehouse 17
    5.2 Data Sources for Maritime Anomaly Detection 18
        5.2.1 Open Data Sources 18
        5.2.2 Restricted Data Sources 18
    5.3 Schema Design of Data Warehouse 18
        5.3.1 Traditional Schema Design Approaches 18
        5.3.2 Proposed Schema Design Approach 19
    5.4 Data Pre-processing for Maritime Data Warehouse 21

6 Normal Pattern Extractor 23
    6.1 Partitioning Vessel Tracks 23
    6.2 TSC: Track Segment Clustering 25

7 Anomaly Detector 26
    7.1 Potential Anomaly Detection 26
    7.2 Contextual Verification 27
    7.3 Real-Time Detection 28

8 Empirical Studies 29
    8.1 Data Set Description 29
    8.2 Evaluation 31
        8.2.1 False Alarm Reduction 32
        8.2.2 F Measure 32
        8.2.3 Execution Time 33
        8.2.4 Quantitative Comparison 33
        8.2.5 Anomaly Detection in Testing Set 34
    8.3 Other Contextual Information 35
    8.4 Discussion 36
        8.4.1 Variation in Segment Size 36
        8.4.2 Irregular Sampling Rate of AIS Data 37
    8.5 Prototype Development 37

9 Software Documentation 39
    9.1 Required Software 39
    9.2 Execution of Anomaly Detection Process 39
    9.3 Execution of Normal Pattern Extraction Process 41
    9.4 Execution of Ground Truth Detection Process 42

10 Conclusion 44

Bibliography 45

List of Tables

Table 8.1 Tracks Partitioned into Groups (by voyage duration) and Segments (PartitionWindow = 2 hr) for Training Set and Testing Set 30
Table 8.2 Comparison of F Measure in the 6 Groups of Testing Tracks 33
Table 8.3 Kinematic & Contextual Features of 2 Potential Anomalies 34

List of Figures

Figure 2.1 AIS Message Structure 4
Figure 2.2 Example Vessel Tracks 5
Figure 2.3 Example Track Segments 6
Figure 4.1 Overview of MADCV Framework 14
Figure 4.2 High Level Steps of Normal Pattern Extraction 15
Figure 4.3 High Level Steps of Anomaly Detection 16
Figure 5.1 Representation of the U.S. Coast Guard AIS Data for UTM Zone 10 of January 2009 in Data Warehouse 20
Figure 5.2 Representation of Tracks and Normal Patterns for UTM Zone 10 in Data Warehouse 20
Figure 5.3 Representation of Weather Information of UTM Zone 10 in Data Warehouse 21
Figure 5.4 Data Pre-processing Work Flow for Storing AIS Data in Maritime Data Warehouse 22
Figure 6.1 Set of Vessel Tracks T 24
Figure 6.2 4 Segments of Vessel Tracks T 24
Figure 8.1 Tracks between Origin A to Destination Vancouver of Cargo Ships 30
Figure 8.2 No. of False Alarms (Without CV vs With CV) in Different Segments of the 6 Groups of Testing Tracks for 2 hr PartitionWindow 32
Figure 8.3 A Case Study of Anomaly Detection 34
Figure 8.4 No. of False Alarms (Without CV vs With CV) in Different Segments of the 6 Groups of Testing Tracks for 1 hr PartitionWindow 36
Figure 8.5 No. of False Alarms (Without CV vs With CV) in 6 Groups of Testing Tracks for 1 hr and 2 hr PartitionWindow 37
Figure 9.1 Interface for Anomaly Detection 40
Figure 9.2 Input Selection for Anomaly Detection 41
Figure 9.3 Input Selection for Normal Pattern Extraction 42
Figure 9.4 Input Selection for Ground Truth Detection 43

Chapter 1

    Introduction

Maritime transportation represents approximately 90% of global trade by volume, placing safety and security challenges as a high priority for nations across the globe [11]. According to the U.S. Department of Homeland Security, anomaly detection is one of many enabling technologies for Maritime Domain Awareness (MDA) that impact the security, safety, environment and economy of a country [9]. Anomalous vessel detection (i.e. finding abnormal vessel movement) is immensely important for protecting sea lanes, ports, harbours, fisheries and infrastructure against threats and illegal activities, including contraband smuggling, drug, weapon and human trafficking, piracy, and terrorism. Early anomaly detection is also critical for emergency response and rescue at sea.

Automatic Identification System (AIS) technology1 provides a vast amount of near real-time vessel movement (i.e. kinematic) information. As an example, the Centre for Maritime Research and Experimentation (CMRE) is currently receiving an average of 600 million AIS messages per month from multiple sources, and the rate is increasing [11]. The AIS messages observed over time for a particular vessel render a trajectory for that vessel. This vast amount of vessel trajectory data calls for an ever-increasing degree of automation to extract meaningful information in support of operational decision makers. Automated anomaly detection, which extracts normal vessel routes from vast amounts of historical vessel trajectory data and sends real-time alerts for possible movement deviations of a particular vessel, is a promising big data mining research direction.

    1.1 Importance of Contextual Information

So far the main source of data for automated anomaly detection in the maritime domain has been AIS data. A major limitation of this approach is the high number of false alarms, as contextual information is ignored in the detection process [17]. As an example, approximately 20% false alarms were generated from the real world AIS messages (approximately 28 million) off the western coast of Sweden during January 2008 [23]. As a result, filtering out the large number of false alarms is one of the most important tasks for the Coast Guard in big data applications, because it is a time consuming process. This situation motivates us to consider "contextual information" in the automated vessel anomaly detection process in order to reduce false alarms.

1 http://www.imo.org/OurWork/Safety/Navigation/Pages/AIS.aspx

In many cases, an anomalous vessel in terms of kinematic data can be completely normal and legitimate if the context at the location and time is taken into account. In principle, the context could include any factor that potentially impacts the movement of a vessel, including marine currents, waves, weather conditions, oil prices, change or cancellation of contracts, change of destinations and routes, and seaport maintenance, to name a few. For example, high waves or poor weather conditions could be the cause of an abnormally slow vessel; a hike in oil prices could be the reason for taking a shorter, but more dangerous, route; and a severe disaster, such as a hurricane, could cause a vessel to deviate significantly from its normal route [12]. In this thesis, the term "contextual information" refers to information about events or factors that are external to the kinematic data but have an impact on the movement of vessels. We propose to use contextual information to reduce false alarms.

    1.2 Challenges

Reducing the high number of false alarms is extremely important for vessel anomaly detection. However, reducing false alarms is a big challenge. First, it requires considering external information sources apart from AIS data, while the cost of false negatives remains high for the safety and security of a country. Second, the challenge is compounded by the fact that the collected AIS data is often incomplete to start with. For example, vessels engaging in illegal activities may turn off broadcasting AIS data for some time. In addition, vessel trajectories are long (over many hours) and densely sampled (typically 1 min). Resampling the entire trajectory at a coarser time resolution may miss an anomalous movement if such movement occurs during the time between sampled points. Third, the required data for anomaly detection may come from diverse sources (e.g., AIS data and weather centers), and often certain information is not available at all, calling for an efficient data cleaning technique. Fourth, real-time anomaly notification through analyzing a big AIS data stream is a challenge if the system is expected to adapt to changes and new contextual knowledge.

    1.3 Contribution

• We propose a novel vessel anomaly detection framework for minimizing false alarms in the maritime domain with the help of contextual information. To our knowledge, we are the first to combine both vessel kinematic information and contextual information in an automated single vessel anomaly detection process to support maritime situation analysis.

• Our framework is able to extract normal vessel movement patterns from long vessel trajectories even if the trajectories are incomplete. We have written scripts for cleaning the data that comes from diverse sources, following the conventions of the maritime domain.

• Our anomaly detection method can incorporate new contextual knowledge provided by domain experts or obtained from external sources, which is a major criterion for false alarm reduction. We evaluate the proposed framework through empirical studies on real life AIS data sets obtained from the U.S. Coast Guard, which suggest that the framework is able to reduce false alarms.

• Our developed prototype provides visualization of normal patterns and anomalies for user support through Google Earth.

    1.4 Thesis Organization

• We describe the preliminaries and the high level problem statement of the thesis in Chapter 2.

• In Chapter 3, we review research related to maritime anomaly detection and discuss how our work differs from other works in the literature.

• We provide an overview of our proposed framework with a high level approach for anomaly detection in Chapter 4.

• The design of the scalable Maritime Data Warehouse of the proposed framework for maritime anomaly detection is illustrated in Chapter 5.

• The core components of the framework, i.e. the Normal Pattern Extractor and the Anomaly Detector, are described in Chapters 6 and 7, respectively.

    • Empirical studies that include evaluation of the framework are provided in Chapter 8.

    • We provide software documentation of the developed prototype in Chapter 9.

    • Concluding remarks of the thesis are provided in Chapter 10.


Chapter 2

Preliminaries & Problem Statement

We first describe several frequently used terms in this thesis and then define the problem that we study, with a discussion of how it differs from other maritime anomaly detection problems studied in the literature.

    2.1 Preliminaries

Voyage Table: VoyageID, Destination, ETA, StartTime, EndTime, MMSI

Broadcast Features: Longitude, Latitude, SpeedOverGround (SOG), CourseOverGround (COG), Heading, RateOfTurn (ROT), BaseDateTime, Status, MMSI, VoyageID

Vessel Table: MMSI, IMO, CallSign, VesselName, VesselType, Length, Width

Figure 2.1: AIS Message Structure

Vessel Position Report. A Vessel Position Report (also called an AIS data point) provides vessel kinematic information at a certain time interval (typically 1 min in our data set; AIS transponders send data more frequently, but the data set we are using has been pre-processed). The four kinematic features Longitude, Latitude, SpeedOverGround (SOG) and CourseOverGround (COG) form a vessel position report at each time instant BaseDateTime. These values are extracted from the Broadcast Features table of an AIS Message (see Fig. 2.1). The position of a vessel can be tracked from the Longitude and Latitude attributes, whereas velocity and direction can be obtained from SOG and COG, respectively. The Voyage Table and Vessel Table of the AIS Message provide voyage and vessel related information, respectively. For example, the Destination attribute of the Voyage Table provides port of destination information and the VesselType attribute of the Vessel Table provides information about the type of vessel (cargo, tanker, passenger ship, etc.).

Vessel Track. We represent a vessel's movement by a track (also called a trajectory) Ti = <t1, t2, t3, ..., tn> from a port of origin to a port of destination, where ti denotes the Vessel Position Report received at the ith time point into the voyage. Notice that the absolute time at the ith point is not required to be the same for different tracks. For example, if one voyage from Vancouver to Seattle starts at 1 p.m. and another starts at 3 p.m., t1 for the first voyage corresponds to the data at 1 p.m. and t1 for the second voyage corresponds to the data at 3 p.m. Figure 2.2 shows 3 vessel tracks T1, T2 and T3, where the X-axis denotes time into the voyage. In the Broadcast Features table, each vessel track is associated with a unique VoyageID, whereas each vessel is associated with a 9-digit MMSI.

Figure 2.2: Example Vessel Tracks (three tracks T1, T2 and T3 from Origin to Destination; each point ti at time index i = 1, ..., 9 carries Longitude, Latitude, SOG and COG)

Track Segment. A track segment is formed by consecutive Vessel Position Reports that are a subset of the full vessel track. For example, a vessel going from Vancouver to Seattle may be completely normal for most parts, but deviate from normal movement patterns only for a short <t1, t2, t3, t4> segment of the full track Ti. If the entire track is considered, it is difficult to detect the anomalous movement, because typically the anomalous behaviour of a vessel at sea is observed for a short period of time. This anomalous behaviour can be efficiently detected if the system analyzes the observed track segment instead of the full vessel track, as a large portion of the track generally behaves normally. Fig. 2.3 shows two track segments <t1, t2, t3, t4> and <t4, t5, t6, t7> of T2 overlapping at t4.


Figure 2.3: Example Track Segments (two overlapping segments <t1, t2, t3, t4> and <t4, t5, t6, t7> of track T2, sharing the boundary point t4)
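The overlap-at-the-boundary structure of Fig. 2.3 can be sketched as a simple partitioning routine. This is an illustration only; the thesis partitions tracks by a time-based PartitionWindow (described in Chapter 6), not by a fixed point count.

```python
def overlapping_segments(track, size):
    """Split a track into consecutive segments of `size` points,
    where each segment shares its last point with the next one."""
    segments = []
    step = size - 1  # overlap of one point at the boundary
    for start in range(0, len(track) - 1, step):
        seg = track[start:start + size]
        if len(seg) >= 2:  # a segment needs at least two points
            segments.append(seg)
    return segments

# 9 time points t1..t9, segments of 4 points, as in Fig. 2.3:
print(overlapping_segments(list(range(1, 10)), 4))
# [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9]]
```

Analyzing each short segment independently lets a brief deviation stand out instead of being averaged away over the whole voyage.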

Normal Movement Pattern. A normal vessel movement pattern is discovered by clustering the set of historical vessel tracks between a particular origin and destination. Each normal pattern represents a typical route that is followed by many vessels. Note that the origin and destination do not necessarily refer to the actual port of origin and port of destination of the tracks. Instead, the operator can select any geographically bounded polygons (via visualization tools) of interest to specify the origin and destination of the tracks for mining the movement patterns. In general, such tracks can be a part of the entire journey of a vessel, in which case the actual departure port and destination port may not be needed.

Potential Anomaly. A vessel track segment is a potential anomaly if the movement of the track segment deviates from the normal movement patterns. This deviation can be in one or more of the four kinematic features.

Anomaly. A potentially anomalous vessel is signaled as an anomaly if the contextual features are within the normal range for the duration of that vessel's movement. For example, if the wind speed at the location and time of the movement deviation is within the normal range in the direction of the deviation, then wind speed is not a factor in the deviation, so the potential anomaly is confirmed.
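Taken together, the two definitions yield a simple decision rule: a kinematic deviation is confirmed as an anomaly only when every contextual feature is within its normal range, so the context cannot explain the deviation. A minimal sketch; the feature names and normal ranges are invented for illustration:

```python
def confirm_anomaly(is_potential_anomaly, context, normal_ranges):
    """Contextual verification: confirm a potential anomaly only if every
    contextual feature lies within its normal range (so the context
    cannot explain the kinematic deviation)."""
    if not is_potential_anomaly:
        return False
    return all(lo <= context[f] <= hi for f, (lo, hi) in normal_ranges.items())

# Hypothetical normal ranges for two contextual features.
ranges = {"wind_speed_kt": (0, 25), "wave_height_m": (0.0, 3.0)}

# Deviation with a calm context -> confirmed anomaly.
print(confirm_anomaly(True, {"wind_speed_kt": 10, "wave_height_m": 1.2}, ranges))  # True
# Deviation during a storm -> the context explains it, so suppress the alarm.
print(confirm_anomaly(True, {"wind_speed_kt": 40, "wave_height_m": 5.0}, ranges))  # False
```

The second call is exactly the false-alarm case the framework targets: kinematically abnormal, but contextually explained.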

    2.2 Problem Statement

Our objective is to detect anomalous track segments in real-time within the operator's geographical area of interest, from a received AIS data stream and on-demand contextual information, with the focus on reducing false alarms.

Our problem significantly differs from other studied anomaly detection problems in the maritime domain in the following two aspects.


• How to minimize the high number of false alarms, which is a major concern for end users (e.g., the Coast Guard)?

• How to combine both AIS data and contextual information in the automated vessel anomaly detection process?

Note that not all anomalies necessarily exhibit kinematic deviation. For example, vessels may perform illegal activities at sea by turning off the AIS broadcast. In this thesis, we focus on those vessel anomalies that have observable kinematic deviations. A taxonomy of 16 different types of anomalies due to kinematic deviation is listed in [17].


Chapter 3

    Related Works

We start our literature review with a discussion of general anomaly detection approaches in Section 3.1. In Section 3.2 we provide a high level overview of traditional trajectory anomaly detection approaches. We conclude this chapter with a discussion of anomaly detection approaches in the maritime domain in Section 3.3.

    3.1 Anomaly Detection Approaches

In the data mining community, "anomalies are patterns in data that do not conform to a well defined notion of normal behaviour" [4]. Traditionally, normal behaviour is obtained by a normal pattern extractor model from the training data. The obtained normal model is then used for detecting anomalies in the testing data. Although different approaches for anomaly detection exist in the literature (see [4]), most of them are domain specific, such as the following.

    • Cyber-Intrusion Detection

    • Medical Anomaly Detection

    • Textual Anomaly Detection

Chandola et al. [4] categorize anomaly detection algorithms as belonging to one or more of the following major classes: classification based techniques, statistical techniques, nearest neighbour based techniques and clustering based techniques. We provide a high level overview of each of these anomaly detection methods in the following subsections.

    3.1.1 Classification Based Methods

The primary assumption of classification based methods is that a decision boundary in feature space that separates normal and anomalous data points can be obtained from a training data set. Classification based methods can be divided into two main categories: multi-class and one-class anomaly detection techniques.

Multi-class techniques assume that the training data includes class labels from multiple normal classes. New data points are typically either classified as belonging to one of the normal classes, or classified as anomalous if they do not fit into any of them. One-class techniques, on the other hand, assume that all data points in the training set have the same class label. One-class methods typically learn a discriminating boundary around the training data, which is used to check whether new data points from the testing data are anomalous or not. Note that most of the classification based methods for anomaly detection are based on neural networks [18], support vector machines (SVM) [28] or rule-based methods.

    3.1.2 Statistical Techniques

The assumption behind statistical methods for anomaly detection is as follows: "normal data instances occur in high probability regions of a stochastic model, while anomalies occur in the low probability regions of the stochastic model" [4]. Generally, it is assumed that normal data points constitute independent and identically distributed samples from a stationary probability distribution which can be estimated from sample data. For this reason, statistical methods are typically based on semi-supervised learning.
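A minimal instance of this assumption is to fit a single Gaussian to training samples of one feature (say, speed over ground) and flag test values in its low-probability tails. This is only a sketch of the general idea, not the method used in this thesis, and the sample values are invented:

```python
import math

def fit_gaussian(samples):
    """Estimate mean and standard deviation from the training sample."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)

def is_anomalous(x, mean, std, k=3.0):
    """Flag points more than k standard deviations from the mean,
    i.e. points lying in the low probability region of the model."""
    return abs(x - mean) > k * std

train = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1]  # e.g. SOG samples of normal traffic
mean, std = fit_gaussian(train)
print(is_anomalous(10.3, mean, std))  # False: near the mean
print(is_anomalous(25.0, mean, std))  # True: deep in the tail
```

The semi-supervised flavour is visible here: only normal samples are used to fit the model, and test points are scored against it.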

    3.1.3 Nearest Neighbour Based Techniques

Nearest-neighbour methods for anomaly detection assume that "normal data points occur in dense neighbourhoods, while anomalies occur far from their closest neighbours" [4]. In order to apply this anomaly detection method, a distance or similarity measure between data points is required. Traditionally, Euclidean distance (ED) is a popular distance measure that has been applied in many nearest neighbour based anomaly detection methods. However, other distance measures have also been proposed (see [4]).

Different algorithms for calculating anomaly scores based on the nearest neighbour principle have been proposed in the literature. A typical algorithm is based on the following principle: "the anomaly score of a data point is defined as its distance to its kth nearest neighbour in a given data set". Other variants of this principle calculate the anomaly score as the sum of the distances to the k nearest neighbours.
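The quoted principle translates directly into code. A sketch for 2-D points using Euclidean distance (our own illustration, with invented data):

```python
import math

def knn_anomaly_score(point, data, k):
    """Anomaly score of `point` = distance to its k-th nearest
    neighbour in `data` (a larger score means more anomalous)."""
    dists = sorted(math.dist(point, q) for q in data)
    return dists[k - 1]

data = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(knn_anomaly_score((0.5, 0.5), data, k=2))  # small: dense neighbourhood
print(knn_anomaly_score((10, 10), data, k=2))    # large: far from all points
```

Replacing `dists[k - 1]` with `sum(dists[:k])` gives the sum-of-distances variant mentioned above.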

    3.1.4 Clustering Based Techniques

Clustering is an unsupervised learning technique that groups similar data points into clusters. Chandola et al. [4] categorize clustering based techniques for anomaly detection into three categories.


The first category assumes that "normal data points belong to a cluster in the data, while anomalies do not belong to any cluster". These techniques group the data points into clusters; data points that are not found to belong to any cluster are classified as anomalies. According to Chandola et al. [4], these methods are designed for finding clusters rather than anomalies.

The second group of clustering based anomaly detection techniques is more focused on finding anomalies, where the assumption is as follows: "normal data points lie close to their closest cluster centroid, while anomalies are far away from their closest cluster centroid" [4]. Typically, these methods first cluster the data and then assign an anomaly score to each data point based on the distance to its nearest cluster centroid. Some of the traditional clustering algorithms include k-means clustering [16], self-organising maps (SOM) [13] and expectation-maximisation (EM) [8]. Techniques from the second group can also operate in a semi-supervised mode, where a new test data point is compared to a cluster model already obtained from the training data.

Anomalies may, of course, form clusters by themselves. In that case, algorithms based on the previous two clustering based assumptions will not be able to detect those anomalies. To address this issue, algorithms have been proposed that rely on the following assumption: "normal data points belong to large and dense clusters, while anomalies either belong to small or sparse clusters" [4]. These algorithms assign an anomaly score which reflects the size or density of the cluster to which the corresponding data point belongs. An example of such methods is the Cluster-Based Local Outlier Factor [4].
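A minimal sketch of the second category's scoring rule, with the cluster centroids given rather than learned (in practice they would come from, e.g., k-means on the training data; all values here are invented):

```python
import math

def centroid_anomaly_score(point, centroids):
    """Distance from `point` to its nearest cluster centroid;
    points far from every centroid get a high anomaly score."""
    return min(math.dist(point, c) for c in centroids)

centroids = [(0.0, 0.0), (10.0, 10.0)]  # e.g. learned beforehand by k-means
print(centroid_anomaly_score((0.5, 0.2), centroids))  # low: inside a cluster
print(centroid_anomaly_score((5.0, 5.0), centroids))  # high: between clusters
```

Note that this scoring would miss a tight cluster of anomalies sitting near a centroid, which is exactly the weakness the third category addresses.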

    3.2 Trajectory Anomaly Detection Approaches

Our problem particularly relates to the general anomalous trajectory detection problem, where the goal is to find the outlier trajectories that significantly differ from the normal trajectories in the underlying data. Research on automated anomaly detection in trajectory data has attracted a lot of attention in recent years due to the increasing amounts of historical and real-time trajectory data obtained from sensors on moving objects, such as people, vehicles, vessels and animals. As an example, the use of video cameras for public surveillance and safety has increased dramatically since the beginning of the 21st century, and a significant research effort has gone into developing object tracking algorithms.

For an overview of solution strategies for the anomalous trajectory detection problem, the reader is referred to [1]. Trajectory anomaly detection approaches can be classified into two categories: clustering based methods and non-clustering based methods. We briefly discuss these two categories below.

    3.2.1 Clustering Based Trajectory Anomaly Detection

    The objective of trajectory clustering is to learn the underlying routes in data by grouping similar trajectories together in a cluster. All clustering algorithms require that an appropriate similarity measure, also known as a distance measure, is defined which constitutes a valid metric. Euclidean distance (ED) is the simplest and most intuitive similarity measure (e.g., Piciarelli et al. [22]), but it requires that the preprocessed trajectories are of equal length, and ED performs poorly if they are not properly aligned (see [19]). Other similarity measures and modifications of ED have been proposed in the literature for relaxing the alignment and length constraints, such as Dynamic Time Warping (DTW) [19]. Longest Common Sub-Sequence (LCSS) is another similarity measure appropriate for trajectories that are of unequal length and/or are not well-aligned [29].
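DTW relaxes the equal-length and alignment constraints by allowing elastic matching between sequence points. A minimal sketch over 1-D sequences (illustrative only; trajectory DTW would operate on 2-D positions, and this thesis itself uses a different measure):

```python
# Minimal Dynamic Time Warping sketch between two sequences of
# possibly unequal length, via the classic dynamic program.
import math

def dtw(a, b):
    """DTW distance between sequences a and b (absolute-difference cost)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of match, insertion, deletion
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]

# A duplicated point is absorbed by warping, so the distance is 0
print(dtw([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # → 0.0
```

The same warping that absorbs the duplicated point here is what makes DTW tolerant of trajectories sampled at different rates.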

    Typically, anomaly detection is carried out after the clustering is done on training data. First, we determine the cluster that best explains the new trajectory, i.e., the one with the minimal distance to the new trajectory among all clusters. The corresponding distance is then usually compared to a pre-defined anomaly threshold for deciding whether the new trajectory is an anomaly or not.

    3.2.2 Non-clustering Based Trajectory Anomaly Detection

    Other approaches to trajectory anomaly detection do not involve clustering of trajectories. We discuss a few of these approaches below.

    Owens et al. [21] proposed an algorithm based on self-organising maps (SOM) that is appropriate for sequential (online) anomaly detection in trajectory data. Each data point from a trajectory is represented by a fixed-length feature vector encompassing the current location, velocity, and acceleration. The SOM is trained using a set of feature vectors as input. During sequential anomaly detection, the feature vector corresponding to each new data point is submitted to the SOM for finding the winning neuron. If the distance to the winning neuron exceeds a predefined threshold, the corresponding trajectory is classified as an anomaly.

    Lee et al. [15] proposed a partition-and-detect framework for the detection of anomalous sub-trajectories in a trajectory database in a two-step process. First, trajectories are partitioned into a number of line segments. Next, anomalous trajectory partitions, i.e., line segments, are detected using density-based analysis. In order to measure the distance between two line segments, a combination of the spatial distance and the angular distance between line segments is used.
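A combined segment distance of this kind can be sketched as a weighted sum of a spatial term and an angular term. This is our own simplified illustration; the actual measure in [15] combines perpendicular, parallel and angular distances with different definitions.

```python
# Illustrative combined line-segment distance: midpoint distance plus
# an angular penalty. Weights and components are our own choices.
import math

def seg_distance(s1, s2, w_spatial=1.0, w_angular=1.0):
    """Weighted sum of midpoint (spatial) and angular distance."""
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    mid1 = ((x1 + x2) / 2, (y1 + y2) / 2)
    mid2 = ((x3 + x4) / 2, (y3 + y4) / 2)
    spatial = math.dist(mid1, mid2)
    ang1 = math.atan2(y2 - y1, x2 - x1)
    ang2 = math.atan2(y4 - y3, x4 - x3)
    angular = abs(math.sin(ang1 - ang2))  # 0 when parallel, 1 when orthogonal
    return w_spatial * spatial + w_angular * angular

# Parallel but offset segments pay a spatial cost; crossing segments
# with coincident midpoints pay only the angular cost.
print(seg_distance(((0, 0), (1, 0)), ((0, 1), (1, 1))))
print(seg_distance(((0, 0), (1, 0)), ((0.5, -0.5), (0.5, 0.5))))
```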

    3.3 Maritime Anomaly Detection Approaches

    Detecting anomalous vessel tracks in the maritime domain has unique challenges, as discussed in Section 1.2, and has been studied mostly in the context of homeland security and maritime surveillance. In this thesis, we confine our literature review to single-vessel anomaly detection approaches in the maritime domain. Anomaly detection approaches in the maritime domain can be categorized into three classes: data driven, knowledge driven, and hybrid approaches. Mainly, data generated from AIS sensors installed in vessels has been used in these anomaly detection techniques. We provide a brief high level overview of each of these classes of maritime anomaly detection methods in the following subsections.

    3.3.1 Knowledge Driven Techniques

    Knowledge driven techniques construct rules based on maritime experts’ knowledge regarding suspicious behaviour of vessels at sea [20, 26]. Whenever a vessel violates any pre-defined rule stored in a database, the system alerts the operator about that vessel. A drawback of this technique is that the detection process is fully dependent on static rules. Over the course of time, it will become necessary to incorporate new rules and update old ones, but making sure that the resulting rules are consistent is non-trivial. Moreover, this approach does not take into account the contextual information at the vessels’ specific location and time. For example, a vessel may be forced to violate certain rules because of uncontrollable situations, such as storms or hurricanes, but the system does not have such weather information to distinguish these forced violations from those resulting from illegal activities.
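The static-rule character of this approach can be made concrete with a small sketch. The two rules below are invented for illustration, not taken from [20, 26]; the point is that each rule is a fixed predicate over the vessel state, with no access to context such as weather.

```python
# Hedged sketch of a knowledge-driven rule base; rule names and
# thresholds are hypothetical illustrations.
RULES = [
    ("speed_in_harbour", lambda v: v["in_harbour"] and v["sog"] > 8),
    ("ais_gap",          lambda v: v["minutes_since_last_report"] > 30),
]

def violated_rules(vessel_state):
    """Return the names of all static rules the vessel violates."""
    return [name for name, rule in RULES if rule(vessel_state)]

state = {"in_harbour": True, "sog": 12.5, "minutes_since_last_report": 5}
print(violated_rules(state))  # → ['speed_in_harbour']
```

A storm that forces high speed would still trigger `speed_in_harbour` here, which is exactly the false-alarm problem the contextual verification of this thesis targets.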

    3.3.2 Data Driven Techniques

    Data driven methods build normal patterns from historical vessel track data, and if any vessel deviates from the normal patterns, the system alerts the operator that the vessel is a potential anomaly [6, 14]. Many instances of suspicious behaviour may not be detected using only data driven methods. Including expert knowledge would be beneficial for detecting the vast majority of anomalies which are not detected using only data driven approaches [25].

    3.3.3 Hybrid Approaches

    Hybrid approaches are a combination of both knowledge driven and data driven methods. In [24], a normal model of vessel behaviour was built from AIS data using a Self-Organizing Map and a Gaussian mixture model, and expert knowledge was incorporated through IF-THEN rules. Any deviation from the constructed rules and the normal model is signaled as an anomaly.

    Though we detect potential anomalies following the traditional convention of data driven techniques, potential anomalies are filtered through the proposed contextual verification technique, which can be plugged into any trajectory anomaly detection method (e.g., the TN-Opt algorithm [30]) for reducing false alarms. In a recent work, Hamed et al. incorporated static information, such as vessel type and location of port, in multi-vessel interaction detection [27]. Unlike such static information or rules, the contextual information we consider refers to real-time information from external sources that is specific to the time and location of vessels, such as the sea and weather conditions experienced by the vessel under anomaly detection, which is not previously stored in a data storage used by the system. Moreover, a major limitation of automated anomaly detection approaches in the maritime domain is that the number of false alarms is often too high for human filtering. Reducing false alarms using contextual information while not missing true anomalies is an important and challenging task. To our knowledge, there is very little work in this direction.


Chapter 4

    Framework Overview

    We start this chapter with an overview of the architecture of our proposed framework in Section 4.1. We conclude with a high level discussion of our proposed maritime anomaly detection approach in Section 4.2.

    4.1 Architecture

    [Figure 4.1 (architecture diagram): a Graphical User Interface on top; the Data Preprocessor, Normal Pattern Extractor, and Anomaly Detector in the middle; the Maritime Data Warehouse at the bottom, storing vessel tracks, normal movement patterns, and contextual information; inputs are the AIS data stream and contextual data from websites, blogs, etc.]

    Figure 4.1: Overview of MADCV Framework

    Our anomaly detection framework is called MADCV (shown in Fig. 4.1), for Maritime Anomaly Detection with Contextual Verification. Our framework design is based on a 3-layer architecture, where each layer works as an independent building block.

    • The upper layer provides a graphical interface to the user for interacting with the system.

    • The middle layer consists of the Data Preprocessor, Normal Pattern Extractor, and Anomaly Detector.

    • In the bottom layer, we have the Maritime Data Warehouse unit as data storage.


    The Data Preprocessor unit mainly does data preprocessing: data is received from heterogeneous sources (such as AIS and contextual data) and is transferred into the Maritime Data Warehouse for storage. Apart from storing AIS data streams, the Maritime Data Warehouse also stores vessel tracks, normal movement patterns from an origin to a destination, and contextual information at a given time and location. Though contextual information can be obtained from social media (e.g., Facebook, Twitter, LinkedIn), websites, and blogs, to name a few, for our current study and implementation we are only considering the weather information obtained from the National Data Buoy Center (http://www.ndbc.noaa.gov) as the contextual information. The primary task of the Anomaly Detector unit is to detect anomalous vessel track segments and provide supporting information about the detection to the user. The user can interact with the Normal Pattern Extractor unit for extracting normal vessel movement patterns from historical vessel tracks.

    4.2 Our Approach

    We have two phases in the anomaly detection process: Normal Pattern Extraction and Anomaly Detection. In the first phase, we extract normal movement patterns from historical vessel tracks between a particular origin and destination stored in the Maritime Data Warehouse, and in the second phase, we detect anomalous track segments from a received AIS data stream within the operator’s geographical area of interest. We provide a high level overview of the two phases below.

    [Figure 4.2 (two-step diagram): input: a set of vessel tracks; Step 1: partition into segments; Step 2: cluster track segments; output: normal movement patterns, stored in the Maritime Data Warehouse.]

    Figure 4.2: High Level Steps of Normal Pattern Extraction

    4.2.1 Phase 1: Normal Pattern Extraction

    The system has to know normal vessel movement patterns in order to detect anomalous movement in real time. The user can ask the Normal Pattern Extractor to extract normal patterns from an origin to a destination, and then store them in the Maritime Data Warehouse. This task is usually done offline. The user also has the option to visualize normal movement patterns using Google Earth. We extract normal movement patterns in a two-step process (see Fig. 4.2). In step 1, we partition the given vessel tracks into different segments, and in step 2, we cluster the track segments. Details of the normal pattern extraction process are discussed in Chapter 6.

    4.2.2 Phase 2: Anomaly Detection

    Anomaly detection is carried out through a two-step process: Potential Anomaly Detection and Contextual Verification (see Fig. 4.3). A given track segment is compared with the stored normal track segments. In case of any deviation, the particular segment is detected as a potential anomaly in this step. The contextual verification process is carried out only when a potential anomaly is detected and requires further verification with additional contextual information. The user is notified whenever an instance of anomalous behaviour is confirmed after the contextual verification step. Otherwise, a detected potential anomaly is deemed to be a false alarm. Contextual information that supports the decision is also output in the interface. The user makes the final decision regarding whether the track is an anomaly based on various sources of information, including the information output by the Anomaly Detector. We elaborate on the anomaly detection process in Chapter 7.

    [Figure 4.3 (two-step diagram): input: a track segment; Step 1: potential anomaly detection against the Maritime Data Warehouse; Step 2: contextual verification; output: notify if anomaly.]

    Figure 4.3: High Level Steps of Anomaly Detection


Chapter 5

    Maritime Data Warehouse

    First, we discuss the motivation for using a data warehouse to support the anomaly detection process in Section 5.1. Then we describe possible data sources for inclusion into the data warehouse for anomaly detection purposes in Section 5.2. After that, we discuss the schema design technique of the Maritime Data Warehouse in Section 5.3. We conclude this chapter with a discussion on data pre-processing for the Maritime Data Warehouse in Section 5.4.

    5.1 Motivation for Data Warehouse

    A data warehouse is a centralized repository that stores data from heterogeneous sources and transforms them into a common, multidimensional data model for efficient querying and further analysis. A typical online transaction processing (OLTP) system is not suitable for real-time decision support; data warehouses came into existence in order to allow it. According to William H. Inmon, “A data warehouse is a subject oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.” A traditional database uses a query driven approach for heterogeneous database integration. This requires complex queries to be executed on local sites, which is inefficient for a high performance driven system. A data warehouse, on the other hand, uses an update driven approach, where information from heterogeneous sources is integrated in advance and stored in the data warehouse for direct querying and analysis.

    In a maritime anomaly detection (MAD) system, we typically obtain data from heterogeneous sources that include both structured and unstructured data. As a result, we have millions of records of historical data as well as a huge amount of new incoming data. We can call these data big data, as they possess volume, velocity and variety. A traditional OLTP system is not suitable for efficiently storing this huge amount of data to provide decision support. As a result, a data warehouse becomes an obvious choice for decision support (e.g., anomaly detection) in the maritime domain.


5.2 Data Sources for Maritime Anomaly Detection

    Anomaly detection (AD) in the maritime domain is a challenging task because of the huge data inflow from different sources. Generally, data received from sensors are used for AD, but there are a number of additional data sources regarding maritime activities (i.e., contextual data) that can be useful for this purpose. We categorize these data sources into the following two groups and provide a brief overview of each below.

    • Open Data Sources (publicly accessible)

    • Restricted Data Sources (accessible to marine authorities)

    5.2.1 Open Data Sources

    We call those data sources open which are publicly available on the internet and are free to access. These data sources consist of vessel traffic data and reports or news related to the maritime domain, which can be found in different blogs, websites or social networks. Additionally, publicly available weather data is also included in this group. The International Maritime Organization (IMO) has provided an information resources document. By investigating this document, it is possible to make a chart of applicable open data sources for an AD system.

    5.2.2 Restricted Data Sources

    We can divide the restricted data sources into two categories. The first category consists of sensors, which provide kinematic data for each object in their coverage area. The second category includes the authorized databases which contain information about vessels, cargoes, crews, etc. AIS is the main data source for an AD system and is included in both categories of restricted data sources.

    5.3 Schema Design of Data Warehouse

    A data warehouse schema is basically a representation of the data structure for holding data in the warehouse. First, we describe traditional approaches for designing a data warehouse schema in Section 5.3.1, and then, in Section 5.3.2, we describe our proposed scalable schema design technique for developing a data warehouse in the maritime domain.

    5.3.1 Traditional Schema Design Approaches

    Theoretically, we can follow any of the following data warehouse design techniques for designing the schema of the Maritime Data Warehouse.


    Star Schema: The star schema design approach is the simplest data warehouse schema design approach. It is called a star schema because the diagram resembles a star, with points originating from a center. The center of the star consists of the fact table, and the points of the star are the dimension tables.

    A fact table typically has two types of columns: foreign keys to dimension tables and measures that contain numeric facts. A fact table can contain data related to the fact at a detail or aggregated level. A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. The primary keys of the dimension tables are part of the composite primary key of the fact table. Dimensional attributes help to describe the dimensional value; they are normally descriptive, textual values. Dimension tables are generally smaller in size than the fact table.

    Snowflake Schema: A snowflake schema design resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimension tables. In the snowflake schema, however, dimensions are normalized into multiple related tables, whereas the dimensions of a star schema are denormalized, with each dimension being represented by a single table.

    Fact Constellation Schema: For each star schema it is possible to construct a fact constellation schema (for example, by splitting the original star schema into multiple star schemas, where each of them describes facts at another level of the dimension hierarchies). The fact constellation design contains multiple fact tables that share many dimension tables. The main drawback of the fact constellation schema is the complexity of the design. Moreover, dimension tables in this design are very large.

    5.3.2 Proposed Schema Design Approach

    We propose a scalable star schema design for representing the AIS data obtained from the U.S. Coast Guard in our Maritime Data Warehouse. Traditionally, in a star schema design, there must be a fact table surrounded by dimension tables. Fig. 5.1 shows the star schema representation of the AIS data for UTM Zone 10 of January 2009 obtained from the U.S. Coast Guard, where Table UTM-10-200901-processed is the fact table. Tables Vessel-10-200901-processed and Voyage-10-200901-processed represent the dimension tables in this representation.

    We argue that this representation is scalable because for each month of the year in each UTM Zone, we follow the same schema representation. For example, for representing the AIS data of UTM Zone 9 for the month of February 2009, we will have the following three tables.

    • UTM-9-200902-processed


    Figure 5.1: Representation of the U.S. Coast Guard AIS Data for UTM Zone 10 of January 2009 in Data Warehouse

    • Vessel-9-200902-processed

    • Voyage-9-200902-processed
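The per-zone, per-month naming convention above can be sketched with a small helper. The function is our own illustration; the thesis does not define such a helper, only the naming pattern.

```python
# Sketch of the table-naming convention: one fact table (UTM-...) and
# two dimension tables (Vessel-..., Voyage-...) per zone and month.
def table_names(utm_zone, year, month):
    """Return the three table names for a given UTM zone and month."""
    suffix = f"{utm_zone}-{year}{month:02d}-processed"
    return [f"UTM-{suffix}", f"Vessel-{suffix}", f"Voyage-{suffix}"]

print(table_names(9, 2009, 2))
# → ['UTM-9-200902-processed', 'Vessel-9-200902-processed', 'Voyage-9-200902-processed']
```

Because the pattern is uniform, adding a new month or zone only adds three tables with predictable names, which is what makes the design scalable.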

    Note that the raw U.S. Coast Guard AIS data we obtained is also partitioned by month in each UTM Zone. At present, we have stored both processed and raw AIS data of UTM Zone 1 to UTM Zone 11, covering the entire West Coast of North America, for the year 2009 in our Maritime Data Warehouse.

    Figure 5.2: Representation of Tracks and Normal Patterns for UTM Zone 10 in Data Warehouse

    We also store tracks and normal movement patterns in our data warehouse. Fig. 5.2 shows the representation of the tracks of UTM Zone 10 for the year 2009 in Table Track-10-2009 and of the normal patterns in Table Pattern-10.

    Figure 5.3: Representation of Weather Information of UTM Zone 10 in Data Warehouse

    Our Maritime Data Warehouse also supports storing weather information as the “contextual information”. Fig. 5.3 shows the representation of the Tables WeatherStation and Weather-10, which contain the information related to the weather stations, and the wind and sea conditions at a specific time and location, respectively, in UTM Zone 10. For each UTM Zone, we follow the same representation for storing weather information.

    5.4 Data Pre-processing for Maritime Data Warehouse

    Typically, data pre-processing in a data warehouse is done using an ETL (Extract, Transform and Load) tool. ETL is a process that is responsible for pulling data out of the source systems, performing the data transformation, and placing the data into the data warehouse. ETL involves the following tasks.

    Extract: The extraction task extracts data from the different source systems so that it can be converted into one consolidated data warehouse format.

    Transform: Data transformation is usually done on the data extracted from the different sources. It includes applying business rules, cleaning, filtering, merging data from multiple sources, etc.

    Load: The loading task follows after the transformation task is finished. Its main objective is to load the transformed data into suitable storage (typically an RDBMS).

    An ETL tool can extract data from text files, geo databases, XML files, an RDBMS, etc. Though different ETL tools are available, for our current implementation we have used SQL Server Integration Services (SSIS) as the ETL tool for loading the bulk data (AIS and weather data) into the data warehouse. Fig. 5.4 shows the work flow of storing AIS data in the Maritime Data Warehouse.

    [Figure 5.4 (work flow): AIS data (gdb format) → ArcGIS → AIS data (text format) → SSIS bulk load → Data Warehouse → cleaning script.]

    Figure 5.4: Data Pre-processing Work Flow for Storing AIS Data in Maritime Data Warehouse

    First, using the ArcGIS software, we transform the raw AIS data set for each month in each UTM Zone from gdb format to text format. Then we load the AIS data in text format into the data warehouse (SQL Server 2014) using SSIS. Finally, we run our cleaning script, written in the C# programming language, for cleaning the loaded AIS data. The cleaning script follows the rules below.

    • MMSI should have a 9-digit number format.

    • SOG should lie between 0 knots and 102 knots.

    • COG should lie between 0 degrees and 359 degrees.

    • Interpolate out-of-range values of SOG and COG.

    • VoyageID should be unique in each month of voyage data in each UTM Zone.

    • MMSI should be unique for each month of vessel static data in each UTM Zone.
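The per-record rules above can be sketched as follows. This is a hedged illustration: the field names are ours, and the actual script is written in C# against SQL Server, not Python.

```python
# Hedged sketch of the per-record cleaning rules listed above
# (field names are hypothetical; the real script is in C#).
def is_valid_report(r):
    """Validate one AIS record against the MMSI/SOG/COG range rules."""
    mmsi_ok = r["mmsi"].isdigit() and len(r["mmsi"]) == 9
    sog_ok = 0 <= r["sog"] <= 102     # speed over ground, in knots
    cog_ok = 0 <= r["cog"] <= 359     # course over ground, in degrees
    return mmsi_ok and sog_ok and cog_ok

def interpolate(prev_val, next_val):
    """Replace one out-of-range SOG/COG value by linear interpolation."""
    return (prev_val + next_val) / 2

print(is_valid_report({"mmsi": "316001245", "sog": 14.2, "cog": 87}))  # → True
print(is_valid_report({"mmsi": "12345", "sog": 14.2, "cog": 87}))      # → False
```

The uniqueness rules for VoyageID and MMSI operate across a whole month of records rather than per record, so they would be checked at the table level (e.g., with a GROUP BY query) rather than in a per-record function like this.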


Chapter 6

    Normal Pattern Extractor

    Input: A set of historical vessel tracks T = {T1, . . . , Tn} between an origin and a destination, where each Ti denotes a vessel track.
    Output: The normal movement patterns for T, to be stored in the Maritime Data Warehouse.

    Normal movement patterns are extracted by clustering the tracks in T. However, clustering the tracks at full length is computationally difficult and challenging (see Section 1.2). In fact, full length clustering is undesirable in terms of detecting anomalous movement. Typically, an instance of anomalous movement is confined to a small portion of a track, while a large portion of the voyage is similar to other normal tracks. In this case, full length clustering may tend to treat an anomalous track as a normal track. A better approach is to partition each vessel track into shorter segments, i.e., subsets of consecutive Vessel Position Reports, and to extract normal patterns and detect anomalous movement within the shorter segments.

    We describe the partitioning of T into segments in Section 6.1, followed by an illustration of the proposed clustering algorithm TSC: Track Segment Clustering in Section 6.2. TSC is used to extract normal patterns within each segment of T.

    6.1 Partitioning Vessel Tracks

    Though different criteria and techniques could be adopted for partitioning tracks (e.g., see [15]), from a practical point of view the Coast Guard would be interested in observing vessel movement patterns in different time phases from the beginning of the voyage. For this reason, we partition the tracks in T with the same voyage duration into s segments by time, denoted S1(T), . . . , Ss(T), where Si(T) contains the ith segment of each track in T and s is the number of segments. Though the voyage duration of each track must be the same, it is not necessary for each track to have identical start or finish times.

    [Figure 6.1: three example tracks T1, T2, T3 from Origin to Destination plotted against time instants i = 1, . . . , 9, where each point ti carries (Longitude, Latitude, SOG, COG).]

    Figure 6.1: Set of Vessel Tracks T

    [Figure 6.2: the same three tracks partitioned by vertical lines into 4 segments.]

    Figure 6.2: 4 Segments of Vessel Tracks T

    Let us consider the set of three vessel tracks T = {T1, T2, T3} depicted in Fig. 6.1, where the x-axis represents the time instant indexed by i, and ti represents the Vessel Position Report at time instant i. Suppose that we partition the tracks into segments of PartitionWindow units of time, where PartitionWindow can be specified by the end user. For example, each track is partitioned into four segments by the vertical lines (in red) as shown in Fig. 6.2. The first segment of each track corresponds to the data at the first three time instants 1, 2, 3, and the second segment corresponds to the data at the next three time instants 3, 4, 5 (with 3 overlapping), and so on. That means the set of the ith track segments Si(T) = {Si(T1), Si(T2), Si(T3)}, where 1 ≤ i ≤ 4.
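The partitioning just described can be sketched as follows, assuming (as in the example above) that consecutive windows share their boundary point; the function name and the one-point overlap convention are taken from the worked example, not from a formal definition in the text.

```python
# Sketch of the time-based partitioning: windows of PartitionWindow
# points that share one boundary point with the next window.
def partition(track, window):
    """Split a track (list of position reports) into overlapping segments."""
    segments, start = [], 0
    while start < len(track) - 1:
        segments.append(track[start:start + window])
        start += window - 1   # boundary point is shared with the next segment
    return segments

track = list(range(1, 10))   # time instants 1..9 as in Fig. 6.1
print(partition(track, 3))
# → [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]]
```

With 9 time instants and a window of 3, this reproduces the four segments of Fig. 6.2, with instant 3 (and 5, 7) appearing in two adjacent segments.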

    Though it is not straightforward to find the optimal PartitionWindow, the following criteria could be of interest to the user. If the chosen PartitionWindow is too big (the worst case being that it is the total voyage duration of the entire track), we may miss unusual vessel movement, since anomalous events at sea generally do not occur over long periods of time. On the other hand, if the chosen PartitionWindow is too small (the worst case being that it is a single time series point), clustering each segment would require enormous computation, since we would have a very large number of segments. Aside from the computational overhead, if the PartitionWindow is too small, there would not be enough AIS data points to form the track segments for anomalous movement detection.

    6.2 TSC: Track Segment Clustering

    In this step, we extract the normal movement patterns from Si(T), 1 ≤ i ≤ s. To find normal movement patterns for Si(T), the idea is to partition Si(T) into several clusters. Our choice of clustering algorithm has two considerations. First, we need a distance measure for determining the distance between two track segments that can tolerate missing values. Second, the clusters may have arbitrary shapes, because the routes of vessels are subject to sea lanes, which could have irregular shapes.

    To address the requirement for the distance measure, we use the distance measure proposed in [2]. This distance measure has previously been applied to cluster incomplete car tracks, which are quite similar to vessel tracks, using OPTICS [3]. In order to address the arbitrary cluster shape requirement, we propose the TSC: Track Segment Clustering algorithm, a density-based clustering algorithm for finding clusters in a set of track segments Si(T), which uses concepts similar to those of DBSCAN [10] and Line Segment Clustering [15]. Note that DBSCAN and Line Segment Clustering can be applied to find clusters in a set of points and a set of line segments, respectively, whereas our input is a set of track segments. Therefore, we cannot directly apply DBSCAN or Line Segment Clustering for our purposes.

    Treating each track segment Si(Ti) ∈ Si(T) as a point, our clustering approach is similar to DBSCAN. TSC: Track Segment Clustering requires two input parameters, Epsilon and MinTrs. Note that we renamed the input parameters of DBSCAN (i.e., Eps and MinPts) to Epsilon and MinTrs, respectively. Epsilon is the radius of the neighborhood region of a track segment, and MinTrs is the minimum threshold for the number of track segments within the Epsilon neighborhood region of a track segment. Unlike the circular neighborhood region of a point in DBSCAN, the shape of the neighborhood region for a track segment is a polygon. The user can set the value of MinTrs, which depends on the experimental data set. The value of Epsilon can be obtained following the approach for finding the value of Eps in DBSCAN. At the end of the clustering process, the normal vessel tracks of each segment Si(T), with the corresponding Epsilon of that segment, are stored in the Maritime Data Warehouse.
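The density-based expansion that TSC borrows from DBSCAN can be sketched as follows. This is our own simplification, not the TSC algorithm itself: each "item" stands in for a track segment, and `dist` stands in for the segment distance measure of [2].

```python
# Minimal DBSCAN-style sketch in the spirit of TSC. Items with fewer
# than min_pts neighbors within epsilon, and not reachable from any
# core item, end up labeled -1 (noise).
def dbscan_like(items, dist, epsilon, min_pts):
    """Return a cluster label per item; -1 marks noise."""
    labels = [None] * len(items)
    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(len(items)) if dist(items[i], items[j]) <= epsilon]
        if len(neigh) < min_pts:
            labels[i] = -1           # noise (may be relabeled as border later)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neigh)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border item: absorbed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = [k for k in range(len(items)) if dist(items[j], items[k]) <= epsilon]
            if len(j_neigh) >= min_pts:
                seeds.extend(j_neigh)  # expand only from core items
    return labels

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 20.0]
print(dbscan_like(pts, lambda a, b: abs(a - b), epsilon=0.5, min_pts=2))
# → [0, 0, 0, 1, 1, -1]
```

In TSC the Epsilon neighborhood of a track segment is a polygon rather than a ball, but the core/border/noise expansion logic is the same in spirit.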


Chapter 7

    Anomaly Detector

    Input: A segment of a vessel track Tq, i.e., Si(Tq), within the operator’s geographical area of interest.
    Output: A decision whether Si(Tq) is an anomaly with respect to the known normal movement patterns within that particular geographical area of interest.

    Assuming we have the full track Tq within the geographical area of interest, we first describe the two main steps in the anomaly detection process, potential anomaly detection and contextual verification, in Sections 7.1 and 7.2, respectively, for detecting anomalous track segments of Tq. After that, in Section 7.3, we discuss how this detection process can be adapted to real time, when the input is an individual track segment Si(Tq) instead of the full track Tq.

    7.1 Potential Anomaly Detection

    We assume that the Normal Pattern Extraction process described in Chapter 6 has been applied to extract the normal patterns for Si(T) within the geographical area of interest, where each normal pattern for Si(T) is represented by a cluster Cl. For 1 ≤ i ≤ s, we check if Si(Tq) belongs to some cluster Cl of Si(T), that is, if Si(Tq) is within the Epsilon distance of any track in the cluster. In that case, Cl is called the reference cluster for Si(Tq). The distance measure is the same one used by the clustering algorithm in Section 6.2. If Si(Tq) does not belong to any cluster Cl of Si(T), then it is considered a potential anomaly. The Anomaly Detector collects all potential anomalies Si(Tq), 1 ≤ i ≤ s, for contextual verification, which is carried out immediately after this Potential Anomaly Detection step if any potential anomalies are detected.
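The membership test just described can be sketched as follows, with `dist` standing in for the segment distance of Section 6.2 and toy one-dimensional "segments" for illustration.

```python
# Sketch of the reference-cluster membership test: a query segment is a
# potential anomaly if it is not within epsilon of any track segment in
# any cluster of normal patterns.
def potential_anomaly(query_seg, clusters, dist, epsilon):
    """clusters: list of clusters, each a list of normal track segments."""
    for cluster in clusters:
        if any(dist(query_seg, seg) <= epsilon for seg in cluster):
            return False   # the cluster is a reference cluster for the query
    return True            # no reference cluster: potential anomaly

clusters = [[1.0, 1.2], [8.0, 8.3]]   # toy 1-D "segments"
d = lambda a, b: abs(a - b)
print(potential_anomaly(1.1, clusters, d, epsilon=0.5))  # → False
print(potential_anomaly(4.0, clusters, d, epsilon=0.5))  # → True
```

Only segments for which this test returns True are passed on to contextual verification, which keeps the more expensive context lookups off the common case.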


7.2 Contextual Verification

    Input: A potential anomaly Si(Tq).
    Output: For every potential anomaly Si(Tq), a decision whether Si(Tq) is an anomaly, and the contextual information that supports the decision.

    The main objective of contextual verification is to verify, using contextual features CFi, whether a potential anomaly Si(Tq) detected by the method in Section 7.1 is an anomaly, where CFi represents the features of contextual information at the ith segment of Tq. In principle, contextual information refers to any factor that could potentially impact the normal movement of a vessel. Most such information comes from external sources, such as weather, oil prices, etc. In the current implementation, CFi contains WindDirection, WindSpeed, GustSpeed, and WaveHeight, but it can be extended to include any known factors that might contribute to anomalous vessel movement. WindDirection, WindSpeed, GustSpeed and WaveHeight represent the wind direction, wind speed, gust speed and wave height at the location and time at which the instance of anomalous vessel behaviour was detected.

    Essentially, CFi is used to explain the anomalous behaviour of the vessel track segment Si(Tq). To this end, a match function f(CFi, Si(Tq)) can be specified to measure the degree of match between CFi and Si(Tq), such that a better match represents a better explanation of the anomalous behaviour of the vessel by the factors captured in CFi. The choice of CFi and f is application dependent, because they reflect domain knowledge about possible factors behind anomalous vessel behaviour.

    In our current experiment, we followed a rule-based strategy for specifying the match function f, illustrated by the following example. Suppose that the potential anomaly Si(Tq) is sailing at a lower than usual average speed (SOG). If WindDirection is opposite to the movement direction (COG) of the vessel during the movement, and WindSpeed exceeds a specified range (provided by experts or obtained from external sources), the deviation in speed is likely caused by the wind. Otherwise, the anomalous deviation remains unexplained and, therefore, the segment would be tagged as an anomaly. In both cases, the operator is informed of the result.
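The example rule can be sketched as a small boolean match function. The 160-degree opposition band and the field names below are our own illustrative choices; the actual thresholds would come from experts or external sources, as stated above.

```python
# Hedged sketch of the rule-based match function f for the wind example;
# the angle band and field names are hypothetical.
def wind_explains_slowdown(cf, seg, max_wind_speed):
    """True if a slow segment is plausibly explained by opposing wind."""
    # smallest angle between wind direction and course, in [0, 180]
    heading_diff = abs((cf["wind_direction"] - seg["cog"] + 180) % 360 - 180)
    opposing = heading_diff > 160          # wind roughly against the course
    return opposing and cf["wind_speed"] > max_wind_speed

cf = {"wind_direction": 270.0, "wind_speed": 35.0}
slow_segment = {"cog": 90.0, "sog": 4.0}
print(wind_explains_slowdown(cf, slow_segment, max_wind_speed=25.0))  # → True
```

When the function returns True, the potential anomaly is explained by context and dismissed as a false alarm; when it returns False, the segment is tagged as an anomaly.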

    Unlike the knowledge driven approach discussed in Section 3.3.1, the values of CFi depend on the time and location of the query track segment Si(Tq) and have to be obtained from external sources through an on-demand query. Such CFi cannot be stored as static rules, as in the knowledge driven approach, because queries are not known in advance.


    The flexibility of this method can be seen through the addition of new domain-specific contextual features (see Section 8.3): we just need to add or update the rules for explaining the match between CFi and Si(Tq). For example, if oil price becomes a new piece of contextual information for anomalous movement, a new handler is created to extract such information from relevant sources, and the match function f is extended with a rule for this feature.

    7.3 Real-Time Detection

    In real time, we do not have the entire vessel track Tq in advance, but rather one segment Si(Tq) of Tq at a time as the vessel moves, where the segment is determined by the chosen PartitionWindow. We collect one segment of Tq at a time as it becomes available. In fact, our detection methods in Sections 7.1 and 7.2 are applied to a single segment and thus do not require the entire track Tq to be available.
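The segment-at-a-time collection can be sketched as a small buffer that emits a completed segment whenever the PartitionWindow elapses. This is an illustrative Python sketch, not the prototype's C# implementation; the point format and the one-minute timestamps are assumptions.

```python
class SegmentBuffer:
    """Collect incoming AIS points into PartitionWindow-sized segments."""

    def __init__(self, partition_window_sec=7200):  # e.g. 2 hr window
        self.window = partition_window_sec
        self.points = []
        self.start = None

    def add(self, timestamp, point):
        """Buffer one point; return a completed segment when the window is full,
        otherwise None. Points on the boundary start the next segment."""
        if self.start is None:
            self.start = timestamp
        if timestamp - self.start >= self.window:
            segment, self.points = self.points, [point]
            self.start = timestamp
            return segment  # hand this completed segment to the detector
        self.points.append(point)
        return None
```

Each completed segment can then be passed directly to the detection and contextual verification steps of Sections 7.1 and 7.2.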


Chapter 8

    Empirical Studies

    We first describe our experimental data set in Section 8.1. Then we quantitatively evaluate our proposed framework in Section 8.2. After that, we discuss other contextual information for possible inclusion in the Maritime Data Warehouse in Section 8.3. In Section 8.4 we discuss the effect of varying the segment size and handling the irregular sampling rate of AIS data. We conclude our empirical studies with a discussion of our current prototype development in Section 8.5.

    8.1 Data Set Description

    To evaluate the proposed approach, we experimented with an AIS data set obtained from the U.S. Coast Guard (http://marinecadastre.gov/data/) for the year 2009 in UTM Zone 10 (including the west coast of British Columbia, Canada, and Washington State, U.S.), containing 1047 tracks of cargo ships from origin A to destination Vancouver (shown in Fig. 8.1) with a sampling rate of 1 AIS data point per minute for each track. This is the data set used in the evaluation, where MinTrs = 8. We used the default value of Epsilon for each segment that was stored in the Maritime Data Warehouse during normal pattern extraction.

    We divided this data set into the training set (560 tracks) and the testing set (487 tracks), corresponding to the first 7 months (January–July) and the last 5 months (August–December) of data, respectively. The training set is used to extract normal patterns and the testing set is used to evaluate the performance of the framework. Due to the variation in voyage duration of vessels, we divided the tracks in both training and testing sets into 6 groups according to the total voyage duration (8 to 13 hours) and further partitioned the tracks in each group into segments according to different PartitionWindow values of 1, 2, 3 and 4 hours, as described in Section 6.1. We evaluate the framework for the 2 hour PartitionWindow in this thesis and provide a discussion on the effect of changing



    Figure 8.1: Tracks of Cargo Ships from Origin A to Destination Vancouver

    Group No.            Segment  Training  Testing  Ground Truth
    1 (8 hr duration)    1        17        20       None
                         2        17        20       None
                         3        17        20       None
                         4        17        20       15696
    2 (9 hr duration)    1        16        17       21262
                         2        16        17       21262
                         3        16        17       None
                         4        16        17       None
                         5        16        17       None
    3 (10 hr duration)   1        58        43       None
                         2        58        43       78878
                         3        58        43       78878
                         4        58        43       7299
                         5        58        43       None
    4 (11 hr duration)   1        193       160      12733, 79294
                         2        193       160      None
                         3        193       160      None
                         4        193       160      None
                         5        193       160      None
                         6        193       160      None
    5 (12 hr duration)   1        211       181      50314, 52437
                         2        211       181      582
                         3        211       181      50314, 18299, 73458
                         4        211       181      None
                         5        211       181      None
                         6        211       181      None
    6 (13 hr duration)   1        65        66       None
                         2        65        66       None
                         3        65        66       13391
                         4        65        66       34262, 5552
                         5        65        66       None
                         6        65        66       None
                         7        65        66       None

    Table 8.1: Tracks Partitioned into Groups (by Voyage Duration) and Segments (PartitionWindow = 2 hr) for the Training Set and Testing Set

    the PartitionWindow, e.g., the effect of choosing PartitionWindow = 1 hr (see Section 8.4).

    The statistics of the different segments of the 6 groups of tracks are shown in Table 8.1 for the 2 hr PartitionWindow. The columns “Group No.” and “Segment” contain the different


    groups of tracks (in terms of total voyage duration) and the different segments in each group, respectively. For example, the number of segments in Group 1 is 4, as the total voyage duration is 8 hours and PartitionWindow = 2 hours. The columns “Training” and “Testing” contain the number of track segments in the training and testing sets.

    For evaluation, the “ground truth” of anomalous segments in the testing set is needed. The candidate anomalous segments detected by our method are then compared with this ground truth for accuracy evaluation. Our AIS data set from the U.S. Coast Guard did not provide any ground truth information. Typically, information about anomalous vessels that are indeed associated with illegal activities is limited to domain experts, e.g., Coast Guard and defence personnel. Due to this sensitivity, it is difficult to obtain ground truth for real-life AIS data sets.

    In the literature, a visualization approach has been used for constructing ground truth in unlabeled data [7]. The idea is to identify objects that visually appear different from the rest of the objects. We use the following two steps for constructing ground truth.

    • First, we visualized the track segments using Google Earth and flagged as “Potential Ground Truth” those track segments that visually seemed to exhibit unusual movement.

    • Then we factor in the contextual information (e.g., wind speed, wave height, etc.) at the location and time of each potential ground truth. If the contextual features are within the specified ranges (provided by experts), we mark the potential ground truth as “Ground Truth”. This is how we obtained the numbers in the “Ground Truth” column of Table 8.1, which represent the VoyageIDs of true anomalies in a segment of the testing set. “None” means that the segment did not contain any true anomalies.
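The second step amounts to a simple range check, which can be sketched as follows. This is an illustrative Python sketch; the ranges shown are the wind, gust and wave-height ranges used elsewhere in this chapter, and the feature names are assumptions.

```python
# Expert-provided normal ranges (as used later in Table 8.3):
# wind and gust 0-10.7 m/s, wave height 0-2.5 m.
NORMAL_RANGES = {
    "wind_speed": (0.0, 10.7),
    "gust_speed": (0.0, 10.7),
    "wave_height": (0.0, 2.5),
}

def is_ground_truth(contextual_features, ranges=NORMAL_RANGES):
    """A potential ground truth is promoted to "Ground Truth" only when all
    contextual features lie inside their normal ranges, i.e. the unusual
    movement cannot be explained by the conditions at that time and place."""
    return all(lo <= contextual_features[name] <= hi
               for name, (lo, hi) in ranges.items())
```

A potential ground truth observed under calm conditions is kept as ground truth, while one observed under extreme conditions is discarded, since the conditions themselves explain the unusual movement.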

    In a similar fashion, we can obtain the statistics of the different segments of the 6 groups of tracks for the 1 hr PartitionWindow. Note that the number of segments doubles in each group of tracks for the 1 hr PartitionWindow. Moreover, for different segment sizes (e.g., 1 hr and 2 hr), the ground truth has to be determined separately, though the strategy for ground truth detection described above remains the same for each segment size.
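The partitioning of a track into PartitionWindow-sized segments (Section 6.1) can be sketched as follows. This is an illustrative Python sketch; representing a track as time-sorted (timestamp_hours, point) pairs is an assumption.

```python
def partition_track(track, partition_window_hr=2.0):
    """Split one track into consecutive segments of PartitionWindow hours.

    track: list of (timestamp_hours, point) pairs, sorted by time.
    Returns a list of segments, each a list of points.
    """
    if not track:
        return []
    t0 = track[0][0]  # voyage start time
    segments = {}
    for t, point in track:
        # index of the window this point falls into, relative to the start
        segments.setdefault(int((t - t0) // partition_window_hr), []).append(point)
    return [segments[i] for i in sorted(segments)]
```

For example, an 8-hour voyage partitioned with a 2 hr window yields 4 segments, matching the segment count of Group 1 in Table 8.1.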

    8.2 Evaluation

    We quantitatively evaluate MADCV on the 487 testing tracks in terms of the number of false alarms reduced, the F Measure, and the execution time of anomaly detection. In Section 8.2.4, we quantitatively compare the performance of MADCV with other existing anomaly detection prototypes. We conclude the evaluation of MADCV with a case study on anomaly detection within the fourth segment of Group 6 of the testing set in Section 8.2.5.


8.2.1 False Alarm Reduction

    [Six bar charts, one per group, plotting the number of false alarms per segment, Without CV vs With CV.]

    Figure 8.2: No. of False Alarms (Without CV vs With CV) in Different Segments of the 6 Groups of Testing Tracks for the 2 hr PartitionWindow

    The term “false alarm” refers to a detection by MADCV that is not a true anomaly (i.e., the detection does not match the ground truth listed in the “Ground Truth” column of Table 8.1). Fig. 8.2 shows a comparison of the number of false alarms without contextual verification (Without CV) and with contextual verification (With CV) in the different segments of the 6 groups of testing tracks for the 2 hr PartitionWindow. Recall that the value of Epsilon used for each segment was obtained during Normal Pattern Extraction. Observe that with contextual verification, false alarms are reduced in every segment except Segment 6 of Group 6. Inclusion of other contextual information (e.g., port information, oil price, etc.) could further reduce false alarms in this segment.

    8.2.2 F Measure

    F Measure is the harmonic mean of Precision and Recall. We used Eq. 8.1 and Eq. 8.2 for calculating Precision and Recall, respectively. The symbols N_TD, N_FA and N_FN represent the number of true detections, the number of false alarms and the number of false negatives, respectively. Table 8.2 shows a comparison of the F Measure for anomaly detection without contextual verification and with contextual verification in the 6 groups of testing tracks.

    Precision = N_TD / (N_TD + N_FA)        (8.1)


    Recall = N_TD / (N_TD + N_FN)        (8.2)

    As the Recall is 100% for each group of testing tracks, we do not report it in Table 8.2. While all ground truth anomalies were correctly detected both with and without contextual verification (i.e., a Recall of 100%), the Precision without contextual verification is significantly lower than the Precision with contextual verification, due to the high number of false alarms.

                Without CV              With CV
    Group No.   Precision  F Measure   Precision  F Measure
    1           0.50       0.67        1.00       1.00
    2           0.20       0.33        0.28       0.44
    3           0.60       0.75        1.00       1.00
    4           0.33       0.49        1.00       1.00
    5           0.32       0.48        0.46       0.63
    6           0.23       0.37        0.43       0.60

    Table 8.2: Comparison of F Measure in the 6 Groups of Testing Tracks
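The values in Table 8.2 follow directly from Eqs. 8.1 and 8.2 and the harmonic-mean definition of F; with Recall fixed at 1.0, F depends only on Precision. A small Python sketch of the computation:

```python
def precision(n_td, n_fa):
    """Eq. 8.1: true detections over all detections."""
    return n_td / (n_td + n_fa)

def recall(n_td, n_fn):
    """Eq. 8.2: true detections over all actual anomalies."""
    return n_td / (n_td + n_fn)

def f_measure(p, r):
    """Harmonic mean of Precision and Recall."""
    return 2 * p * r / (p + r)
```

For example, Group 1 without CV has a Precision of 0.50 and a Recall of 1.0, giving `f_measure(0.50, 1.0)` ≈ 0.67, as reported in Table 8.2.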

    8.2.3 Execution Time

    All experiments were conducted on a 3.4 GHz Intel Core i7 processor with 16 GB of memory. It took 1 hour to extract normal patterns from the training set, and 0.18 seconds to detect anomalies in each track of the testing set, assuming that normal patterns are stored in the Maritime Data Warehouse. Note that normal pattern extraction is done offline in batch mode, whereas anomaly detection is done in real time on all vessel tracks in the geographical area of interest. Consequently, the efficiency of anomaly detection is far more critical than that of normal pattern extraction.

    8.2.4 Quantitative Comparison

    Though different approaches to detecting anomalies can be found in the maritime domain literature (see Section 3.3), most are not supported by quantitative evaluation with real-life data, as noted in [17]. A few prototype systems (e.g., SeeCoast, LEPER) reported in [17] do not provide much quantitative evaluation either, even though real-life AIS data was used in developing these prototypes. However, LEPER (see [17]) is reported to detect 100% of anomalies, i.e., 100% Recall (comparable to our prototype in terms of Recall), though no quantitative information is provided regarding the generated false alarms.


8.2.5 Anomaly Detection in Testing Set

    To explain how our approach reduces false alarms, let us consider the track segments with VoyageIDs 34262 and 88063 (see Fig. 8.3), which were detected as potential anomalies within the fourth segment of Group 6 of the testing set due to kinematic deviation. Notice that VoyageID 34262 is a ground truth anomaly (see the “Ground Truth” column of Table 8.1) but VoyageID 88063 is not. Table 8.3 shows the observed average of the kinematic features (SOG

    [Diagram: tracks heading east from origin A (west) to destination Vancouver (east) against a westward wind. VoyageID 34262: Anomaly both Without CV and With CV. VoyageID 88063: Anomaly Without CV, but a False Alarm With CV.]

    Figure 8.3: A Case Study of Anomaly Detection

    and COG) of the 2 vessels and the contextual features (WindDirection, WindSpeed, GustSpeed and WaveHeight) at the occurrence location and time. The last row of Table 8.3 contains

    VoyageID        Avg. Kinematic Features        Contextual Features
                    SOG (knots)  COG (degrees)     WindDirection (degrees)  WindSpeed (m/s)  GustSpeed (m/s)  WaveHeight (m)
    34262           6.62         129.4 (W→E)       26 (E→W)                 3.2              3.9              1.24
    88063           9.16         109.43 (W→E)      62 (E→W)                 14.8             18.4             3.48
    Normal Ranges   10.1         71.49 (W→E)       Not Applicable           0 – 10.7         0 – 10.7         0 – 2.5

    Table 8.3: Kinematic & Contextual Features of 2 Potential Anomalies

    the average of the kinematic features of the reference cluster (under the column “Avg. Kinematic Features”) along with the normal ranges of the contextual features (under the column “Contextual Features”). In general, the normal ranges of WindSpeed, GustSpeed and WaveHeight can be provided by experts. In our experiments, they were obtained from the “Beaufort Scale and Probable Wave Height” website 1. Note that the normal range of WindDirection is not required for the contextual verification process, as WindDirection is mainly used to match the vessel movement direction.

    The track segments with VoyageIDs 34262 and 88063 were moving from origin A to destination Vancouver into the wind (both the track movement direction and the wind direction are

    1http://www.peardrop.co.uk/beaufort.htm


    shown in Fig. 8.3) and deviated in both average speed and average movement direction from the respective normal average speed of 10.1 knots and normal average movement direction of 71.49 degrees (see Table 8.3). As a result, they were detected as potential anomalies. Typically, sailing with the wind may help a vessel (e.g., a boat) to increase its speed, whereas sailing into the wind may force a vessel to lower its speed. More detailed information can be obtained from this site 2.

    However, with contextual verification, VoyageID 88063 is discarded as an anomaly: the vessel was moving into the wind, and the observed WindSpeed (14.8 m/s), GustSpeed (18.4 m/s) and WaveHeight (3.48 m) were above the normal ranges of WindSpeed (0 – 10.7 m/s), GustSpeed (0 – 10.7 m/s) and WaveHeight (0 – 2.5 m) at the relevant region and time, which caused the vessel to sail at a lower-than-usual average speed and to deviate from the usual movement direction. On the other hand, VoyageID 34262 remains an anomaly after contextual verification, because its significant deviations in average SOG (6.62 knots) and COG (129.4 degrees) from the normal averages cannot be explained by the normal wind and sea conditions at the relevant time and location.

    8.3 Other Contextual Information

    So far, we have mainly used weather information (wind direction, wind speed, gust speed and wave height) as the contextual information to improve the precision of anomaly detection through the reduction of false alarms. The precision can be further improved by including other contextual information such as crew information, oil price, and seaport data. This only requires extending the contextual features CFi for segment i and modifying the match function f between CFi and segment i of a query track; the rest of the approach remains unchanged. Thus, our approach is highly adaptive to new background knowledge.
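One way to realize this extensibility is a rule registry, where each contextual feature contributes its own explanation rule and f succeeds if any rule fires. This is an illustrative Python sketch (the prototype is in C#); the `oil_price` rule and its `oil_price_spike` feature are purely hypothetical, and the 10.7 m/s threshold is the wind-speed range used above.

```python
RULES = {}

def rule(name):
    """Decorator registering an explanation rule for one contextual feature."""
    def register(fn):
        RULES[name] = fn
        return fn
    return register

@rule("wind")
def wind_rule(segment, cf):
    # explained if the wind speed exceeds the expert-provided normal maximum
    return cf.get("wind_speed", 0.0) > 10.7

@rule("oil_price")  # hypothetical new contextual feature
def oil_price_rule(segment, cf):
    # e.g., an unusually slow passage coinciding with a fuel-price spike
    return cf.get("oil_price_spike", False)

def explained(segment, cf):
    """Match function f(CF_i, S_i(T_q)): True if any rule explains the deviation."""
    return any(r(segment, cf) for r in RULES.values())
```

Adding a new contextual feature then amounts to registering one more rule; `explained` and the rest of the pipeline remain unchanged.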

    Though we consider contextual information in a structured form, contextual information can also be unstructured, such as text. For example, consider the tweet by @IHS4Maritime on 27th March, 2015: “Nigeria closes borders for election period: Nigeria’s sea and land borders are currently closed as the country...”, where the key information is that the sea border is closed in Nigeria. If a vessel does not arrive at a seaport in Nigeria within the scheduled time, the system should not flag this vessel as an anomaly, because the sea border is closed. Note that structured queries can be performed to extract structured information from unstructured text, e.g., tweets, as mentioned in [5], and the results can be stored in a relational database such as the Maritime Data Warehouse for contextual verification. While it would be highly interesting to consider unstructured contextual information, that topic is beyond the scope of this thesis.

    2http://newt.phys.unsw.edu.au/~jw/sailing.html


8.4 Discussion

    In this section we discuss the following two issues.

    • How does the variation in segment size (i.e., a different choice of PartitionWindow) affect the performance of the framework?

    • Can our framework detect anomalies in AIS data with an irregular sampling rate?

    8.4.1 Variation in Segment Size

    [Six bar charts, one per group, plotting the number of false alarms per segment, Without CV vs With CV.]

    Figure 8.4: No. of False Alarms (Without CV vs With CV) in Different Segments of the 6 Groups of Testing Tracks for the 1 hr PartitionWindow

    In Section 8.2 we evaluated the MADCV framework for a segment size of 2 hr (PartitionWindow = 2 hr). It is interesting to analyze the performance of the framework as the segment size varies (in particular, for PartitionWindow = 1 hr). We obtained Fig. 8.4 following the same strategy discussed in Section 8.2.1. Observe that with contextual verification, false alarms are reduced in most of the segments. Inclusion of other contextual information (e.g., port information, oil price, etc.) could further reduce false alarms in those segments where the number of false alarms with contextual verification remained the same.

    Fig. 8.5 shows the total number of false alarms (Without CV vs With CV) in all 6 groups of testing tracks for two different PartitionWindow settings, i.e., the 1 hr and 2 hr


    PartitionWindow. It is interesting to observe that more false alarms are generated for the 1 hr PartitionWindow than for the 2 hr PartitionWindow. So we can conclude that reducing the segment size tends to produce more false alarms.

    [Bar chart of the total number of false alarms, Without CV vs With CV, for PartitionWindow = 1 hr and 2 hr.]

    Figure 8.5: No. of False Alarms (Without CV vs With CV) in the 6 Groups of Testing Tracks for the 1 hr and 2 hr PartitionWindow

    8.4.2 Irregular Sampling Rate of AIS Data

    In our experimental data set the sampling rate of the AIS data was 1 data point per minute, and on average the rate of missing data was around 2.5%. Note that our AIS data set covered a coastal region. According to this website 3, the sampling interval of AIS data can be under 10 seconds, depending on the speed of the vessel. On the other hand, in the open ocean the sampling rate of AIS data can be much lower than in coastal regions. However, our method would still be applicable for detecting anomalies in the open ocean, as the distance measure (see Section 6.2) that we used for normal pattern extraction and anomaly detection can handle the irregular sampling rate of AIS data points. Note that if too many points are missing, the detection performance may degrade.
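For illustration only, one generic way a distance measure can tolerate irregular sampling is to linearly interpolate each track onto a shared time grid before comparing positions. This Python sketch is a common technique stated under that assumption, not necessarily the actual measure of Section 6.2.

```python
def resample(track, grid):
    """Linearly interpolate an irregularly sampled signal onto a time grid.

    track: time-sorted list of (t, value) pairs, e.g. one coordinate of a
           vessel position; grid: increasing times within the track's span.
    Returns the interpolated values at the grid times.
    """
    out = []
    j = 0
    for t in grid:
        # advance to the sample interval containing t
        while j + 1 < len(track) and track[j + 1][0] < t:
            j += 1
        (t0, v0), (t1, v1) = track[j], track[min(j + 1, len(track) - 1)]
        if t1 == t0:
            out.append(v0)  # single sample: no interpolation possible
        else:
            w = (t - t0) / (t1 - t0)
            out.append(v0 + w * (v1 - v0))
    return out
```

Once two tracks are resampled onto the same grid, any point-wise distance (e.g., mean positional deviation) can be computed regardless of the original sampling rates.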

    8.5 Prototype Development

    As a proof of concept, we have built a prototype of the proposed framework for solving the vessel anomaly detection problem discussed in this thesis. The potential end users of a fully developed system could be coast guards, defence departments, and other private

    3https://en.wikipedia.org/wiki/Automatic_Identification_System


    and public sector organizations concerned with the safety and security of a country. To research this real-world problem, we have received generous financial support from Canada’s Natural Sciences and Engineering Research Council (NSERC) and our industrial partner MDA (http://www.mdacorporation.com) under the Collaborative Research and Development Grants. MDA is a global communications and information company providing operational solutions to commercial and government organizations worldwide. Our current work addresses maritime surveillance problems identified by MDA. In the current phase of the project, we are evaluating the developed prototype on different test case scenarios.

    We have implemented the major components of the different layers of the framework using the C# programming language. The bottom layer of the framework, i.e., the Maritime Data Warehouse, has been developed using Microsoft SQL Server 2014. In order to provide visualization support to the user through Google Earth, we have written scripts in Matlab using the Google Earth toolbox for Matlab 4.

    4http://www.mathworks.com/matlabcentral/fileexchange/12954-google-earth-toolbox


Chapter 9

    Software Documentation

    We start with a discussion of the software required for executing our developed prototype application in Section 9.1. Then we describe how to execute the anomaly detection process, the normal pattern extraction process and the ground truth detection process in Sections 9.2, 9.3 and 9.4, respectively.

    9.1 Required Software

    In order to execute the developed prototype, the following software must be installed on the machine.

    • Windows operating system (version 7 or later).

    • Microsoft .NET Framework 4 or a later version.

    • ArcGIS 10.2 or a later version.

    • Google Earth.

    • Matlab R2014a or a newer version.

    • Microsoft SQL Server 2014 (storage for the data warehouse).

    9.2 Execution of Anomaly Detection Process

    First, the user executes the file named SVAAPP.exe (the executable of the developed prototype), which opens the interface (shown in Fig. 9.1) for anomaly detection. The interface is divided into the following three panels.

    • input panel

    • output panel


    • visualization panel

    Figure 9.1: Interface for Anomaly Detection

    The input panel has the following input fields.

    • UTMZone

    • Source

    • Destination

    • VesselType

    • StartDate (Training)

    • EndDate (Training)

    • StartDate (Testing)

    • EndDate (Testing)


  • PartitionWindow (hr)

    • Epsilon (%)

    • NumberofTracks (%)

    The user has to provide the inputs as shown in Fig. 9.2 and then click the button named Start Detection to execute the anomaly detection process.

    In the output panel, information about anomalies, with explanations, is shown for the final user decision. The visualization panel shows the end user the actual tracks matching the input constraints.

    Figure 9.2: Input Selection for Anomaly Detection

    9.3 Execution of Normal Pattern Extraction Process

    The user can follow a strategy similar to that in Section 9.2, with slight differences in the input constraints, to execute the normal pattern extraction process. The user has to provide the inputs as


    shown in Fig. 9.3 and then click the button named Extract Pattern to execute the normal pattern extraction process. In the output panel, the system provides messages to the user showing the execution status.

    Figure 9.3: Input Selection for Normal Pattern Extraction

    9.4 Execution of Ground Truth Detection Process

    As our AIS data set did not provide any information regarding “Ground Truth”, we followed the strategy mentioned in Section 8.1 for detecting ground truth. The user has to provide inputs as sh