Big Data Infrastructure for heterogeneous sources and Data fusion services applied to Maritime Surveillance

Giuseppe Vella (a), Giovanni Barone (a), Viviana Latino (b), Domenico Messina (a), Vito Morreale (a)

(a) Engineering Ingegneria Informatica, P.le dell’Agricoltura 24, 00144 – Rome (Italy); (b) Eka Srl, Via Monteroni s.n., C/O Edificio Dhitech - Ecotekne, 73100 Lecce

ABSTRACT

The large amount of data produced by the different sensors involved in Maritime Surveillance is often not usable by maritime security systems: the data are not accessible at the same time, they are frequently not interoperable, and their usage is regulated by different national and transnational policies, which makes it difficult for public authorities to share information. The main challenge addressed in this paper is to overcome these difficulties by providing a Big Data Infrastructure (BDI) that acts as the information hub for all the Data Fusion Services of the partners involved in the MARISA (Maritime Integrated Surveillance Awareness) project. The BDI allows the Data Fusion Services to store and retrieve different kinds of data coming from different sensors, such as the Automatic Identification System (AIS) or Synthetic Aperture Radar (SAR). Once the data are stored according to a project-specific data model that extends the Common Information Sharing Environment (CISE) model, the information fused at the different levels is presented in a unified, multi-layered Maritime Situational Awareness application.

Keywords: Big Data Infrastructure, heterogeneous source integration, Data Fusion Services, CISE data model

1. INTRODUCTION

Europe has a coastline of almost 68,000 km, and the maritime area under the jurisdiction of European Union (EU) Member States is larger than the total land area of the EU. On the one hand, the sea is vital for the European economy and for the development of commercial activities such as tourism, fishery, transport, mineral extraction and wind farms. On the other hand, it can be used by criminals and terrorists, which raises threats such as piracy, drug trafficking, irregular immigration, smuggling, illegal fishing and environmental crimes, in addition to maritime accidents and disasters. Europe therefore needs to enhance cross-border and cross-sectoral cooperation in order to deliver maritime security, to optimize the information exchange between the legacy systems used by the National Coordination Centers, to secure the maritime ecosystem, to reduce risks to maritime traffic and human lives, and to prevent and react to crimes committed at sea or near the coasts. Countries need a common operating framework, a common format for the information to be shared, and an interactive environment that provides not only Maritime Situational Awareness for a real operational environment but also a toolkit that supports end users in observing the current situation at sea, comprehending potential threats, determining whether vessels match predefined criminal behaviors, and reacting whenever a crime occurs. The Big Data Infrastructure described in this paper integrates data coming from heterogeneous sources and makes them available to the machine learning algorithms developed in the MARISA project, which the MARISA Toolkit identifies as Data Fusion Services (nearly 30), so that they can identify suspicious and recurring criminal behaviors of vessels and reduce risks at sea.

2. BACKGROUND

Large amounts of “raw” data are nowadays collected at an unprecedented scale. They come from different sources and different kinds of assets operated by different EU Member States, as well as from the Internet and social networks; they are gathered for different security purposes and in a variety of formats. These data are available but not necessarily exploitable, because they are neither accessible at the same time nor interoperable until they are “fused” and made “understandable” to all the systems supporting information exchange, situational awareness, decision-making and reaction capability at the EU external maritime borders. In [1], the improper use of the same MMSI (or IMO number, call sign, or destination extracted from AIS or Long Range Identification and Tracking) by more than one vessel is handled by a data fusion approach that recognizes and maintains each ship track in three stages. (1) Gating: track candidates are selected using geometric and kinematic considerations. (2) Data Association: the incoming message is associated with an existing track through a nearest-neighbor approach. (3) Track Management: track initiation, confirmation and deletion are implemented [2][3].

In [4], the authors explain how different sensors, such as optical and microwave sensors on platforms like satellites and airplanes, can be used to monitor and track targets and how their detections can be fused into a single target. This avoids the limitations of the individual sensors but introduces limitations due to the platforms. The most limiting factor is interrupted data availability, since no airplane is able to stay in the air constantly throughout the year and in all weather conditions, while satellites orbiting the Earth are over the zone of interest for a limited time only. Combining rule-based architectures for mining AIS data streams and statistical models with Big Data infrastructures and frameworks largely remains to be investigated [5]. Some progress has been made in the extraction of maritime Patterns of Life through an unsupervised approach that has been tested on extensive datasets [6].

We propose a Big Data Infrastructure that orchestrates the flow of data coming from the CISE network, from the sensors (AIS, radar) used to reconstruct maritime Patterns of Life, and from Meteo Oceanographic Conditions sources. The BDI gives the different Data Fusion Services access to these data, which they process in several steps, and stores the newly produced information in the appropriate type of storage. This allows the rule-based behavior analysis Data Fusion Services to detect anomalies and risks, which are displayed in a multi-layered Maritime Situational Picture.
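The gating, data association and track initiation stages attributed to [1] in this section can be illustrated with a minimal Python sketch. The text gives no implementation details, so everything concrete here is an assumption made for illustration: the planar coordinates in metres, the constant-velocity prediction, the circular gate of 500 m, and the names `Track`, `Report`, `associate` and `update_tracks`. Track confirmation and deletion are left out.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    track_id: int
    x: float   # east position in metres (local planar frame, assumed)
    y: float   # north position in metres
    vx: float  # east velocity in m/s
    vy: float  # north velocity in m/s
    t: float   # time of the last update in seconds


@dataclass
class Report:
    x: float
    y: float
    t: float


def predict(track: Track, t: float) -> tuple:
    """Constant-velocity prediction of the track position at time t."""
    dt = t - track.t
    return track.x + track.vx * dt, track.y + track.vy * dt


def associate(report: Report, tracks: List[Track],
              gate_radius_m: float = 500.0) -> Optional[Track]:
    """Gating plus nearest neighbor: return the closest track whose
    predicted position lies inside the gate, or None if there is none."""
    best, best_dist = None, gate_radius_m
    for track in tracks:
        px, py = predict(track, report.t)
        dist = math.hypot(report.x - px, report.y - py)
        if dist <= best_dist:  # inside the gate and closest so far
            best, best_dist = track, dist
    return best


def update_tracks(report: Report, tracks: List[Track],
                  gate_radius_m: float = 500.0) -> Track:
    """Update the associated track, or initiate a new one."""
    track = associate(report, tracks, gate_radius_m)
    if track is None:
        # Track initiation: no existing track explains this report.
        new_id = max((t.track_id for t in tracks), default=0) + 1
        track = Track(new_id, report.x, report.y, 0.0, 0.0, report.t)
        tracks.append(track)
        return track
    dt = report.t - track.t
    if dt > 0:
        # Re-estimate velocity from the last two positions.
        track.vx = (report.x - track.x) / dt
        track.vy = (report.y - track.y) / dt
    track.x, track.y, track.t = report.x, report.y, report.t
    return track
```

A report that falls outside every gate starts a new track, which is how two vessels broadcasting the same identifier from distant positions would end up on separate tracks in this simplified setting.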

3. METHOD

3.1 Overview

The MARISA Toolkit architecture is composed of different layers that allow all the modules to exchange data (detected vessels, alerts, routes, documents and Open Source INTelligence data) from the data ingestion phase through to the visualization phase. In particular, we focus on the anatomy of the BDI, which is the knowledge hub for the whole toolkit and stores all the information coming from the external data sources.

Figure 1. Overall MARISA Architecture.

The BDI allows the Data Fusion Services (DFS) to retrieve data, process them and create data products. It also allows the Representational State Transfer (REST) hub layer, the Data Fusion Distribution Services (DFDS), to distribute all the information produced by the DFS to any external system. Finally, through the BDI REST Application Programming Interface (API) and the GeoServer API, the BDI allows the information to be presented to the end users. The following paragraphs introduce the different layers that interact with the BDI.

Interfaces for external data sources - this layer allows the different adapters to ingest data from external sources: Inter-VTS Exchange Format (IVEF), CISE, satellite, OSINT, weather data and external GIS servers. The GIS servers expose Web Feature Service (WFS) layers, which are fused by the different Data Fusion Services, and Web Map Service (WMS) layers, which are displayed in the multi-layered Maritime Situational Picture. All the data coming from these sources are adapted to the MARISA data model and stored through the streaming platform based on Apache Kafka [7] and Spring Cloud Data Flow [8]. The BDI manages two kinds of processes: real-time data, which are ingested by means of streaming services and stored temporarily on Apache Kafka and on Redis [9] for further queries; and offline ingestion of data, which are stored in big-table databases such as Cassandra [10] or on GeoServer [11] in the case of data related to satellite, AIS or Meteo Oceanographic conditions.
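The role of an adapter in this layer can be sketched in Python as a function that maps a source-specific message to one common record and serializes it as a key/value pair of the kind a Kafka producer accepts. This is only an illustration under stated assumptions: `VesselObservation` is a hypothetical record and not the MARISA or CISE data model, and the input field names (`mmsi`, `lat`, `lon`, `ts`, `latitude`, `longitude`, `time`) are invented for the example. No Kafka client is used, so the sketch runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class VesselObservation:
    """Hypothetical common record; NOT the actual MARISA/CISE schema."""
    source: str           # e.g. "AIS" or "RADAR"
    mmsi: Optional[str]   # None when the sensor cannot identify the vessel
    lat: float
    lon: float
    timestamp: str        # ISO 8601, UTC


def from_ais(msg: dict) -> VesselObservation:
    # Field names are assumptions about a decoded AIS position report.
    return VesselObservation("AIS", str(msg["mmsi"]), float(msg["lat"]),
                             float(msg["lon"]), msg["ts"])


def from_radar(plot: dict) -> VesselObservation:
    # A radar plot carries a position but no vessel identity.
    return VesselObservation("RADAR", None, float(plot["latitude"]),
                             float(plot["longitude"]), plot["time"])


def to_record(obs: VesselObservation) -> Tuple[Optional[bytes], bytes]:
    """Serialize an observation to a (key, value) pair of bytes."""
    key = obs.mmsi.encode("utf-8") if obs.mmsi is not None else None
    value = json.dumps(asdict(obs), sort_keys=True).encode("utf-8")
    return key, value


if __name__ == "__main__":
    ais = {"mmsi": 247000001, "lat": 41.9, "lon": 12.5,
           "ts": "2019-06-01T10:00:00Z"}
    print(to_record(from_ais(ais)))
```

Keying by vessel identifier is a design choice of the sketch: in Kafka, records with the same key go to the same partition, which preserves per-vessel ordering.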

Data Fusion Service layer - once ingestion is complete, the Data Fusion Services of the different levels (observation, comprehension and projection of future states) consume data through the Kafka API. A service can either produce, in a single step, information that is stored and made available as a data fusion product, or produce new intermediate data that are processed in several further steps by other Data Fusion Services until the resulting information is ready to be visualized.

User interactions layer - this layer presents the Data Products of the Data Fusion Services according to the configuration set for each service. It accesses the BDI via Kafka consumers. Moreover, the UI layer retrieves textual data produced by some DFS through the REST API, and maps with geographical features by directly accessing the BDI GeoServer, which aggregates data coming from the partners' proprietary GeoServers (e.g. WMS and WFS layers).

Data Fusion Distribution Services - this layer interacts with the MARISA BDI REST API, the GeoServer API and the Data Fusion Services API in order to retrieve the information for each data product provided by each Data Fusion Service, so that external systems can query it.

3.2 Specific approach

The core of the BDI is the streaming platform, which is based on Apache Kafka and Spring Cloud Data Flow. The processes of the streaming platform are modeled with a Domain Specific Language (DSL) by creating a stream that contains: the name of the topic (the MARISA entity), which holds data coming from sensors or processed information output by the Data Fusion Services; the source of the data, with direct access to external GeoServers as data providers; and the sink, i.e. the component that stores information in real time on Redis or on Cassandra. Moreover, external Data Fusion Services can store previously analyzed information directly on Postgres. When external GeoServers produce daily data by processing AIS or Synthetic Aperture Radar data and inject them into the centralized GeoServer, the BDI retrieves this information from the GeoServer by means of an authenticated HTTP Client Source component, which passes it through the streaming platform on Kafka with sinks on Redis and Cassandra for Business Intelligence purposes.
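The three parts of a stream named in this section (topic, source, sink) can be pictured with a small in-memory Python model. It is not the Spring Cloud Data Flow DSL and uses no Kafka, Redis or Cassandra client: `StreamDefinition`, `LiveStore` and `HistoryStore` are hypothetical stand-ins, and the idea that the live store keeps only the latest record per identifier is an assumption about how a "live situation" store might behave.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Sink = Callable[[str, Record], None]


class LiveStore:
    """Stand-in for Redis: keeps only the latest record per (topic, id)."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[str, object], Record] = {}

    def __call__(self, topic: str, record: Record) -> None:
        self.latest[(topic, record["id"])] = record


class HistoryStore:
    """Stand-in for Cassandra: appends every record for later analysis."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, Record]] = []

    def __call__(self, topic: str, record: Record) -> None:
        self.rows.append((topic, record))


@dataclass
class StreamDefinition:
    topic: str                              # the entity the stream carries
    source: Callable[[], Iterable[Record]]  # where records come from
    sinks: List[Sink]                       # where records are stored

    def run(self) -> int:
        """Deliver every source record to every sink; return the count."""
        delivered = 0
        for record in self.source():
            for sink in self.sinks:
                sink(self.topic, record)
            delivered += 1
        return delivered


def sample_source() -> Iterable[Record]:
    # Hypothetical records; a real source would poll an HTTP endpoint.
    yield {"id": "v1", "lat": 41.90, "lon": 12.50}
    yield {"id": "v2", "lat": 38.10, "lon": 13.40}
    yield {"id": "v1", "lat": 41.91, "lon": 12.52}


if __name__ == "__main__":
    live, history = LiveStore(), HistoryStore()
    stream = StreamDefinition("vessel", sample_source, [live, history])
    print(stream.run(), len(live.latest), len(history.rows))
```

Fanning the same record out to two sinks mirrors the split described in the text between a short-lived real-time view and a longer-lived store used for Business Intelligence.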

Figure 2. The Big Data Infrastructure

3.3 Data Fusion results

Once the information has been stored according to the processes described in the previous section, it is displayed in different layers of the MARISA Situational Picture. This enables further analysis of the situation through the rule-based Behavioral Analysis services, which detect anomalies starting from the detected vessels fused by the DFS. The output of the Meteo Oceanographic Condition service is not only displayed to the end users of the MARISA project (Marina Militare, Hellenic Ministry of Defence, Guardia Civil, Portuguese Navy, Dutch Coast Guard) but is also used by other prediction services to forecast routes or to plan missions to resolve critical situations.

Figure 3. The multi layered Maritime Situational Picture and the real time Alarms console.

4. BIG DATA INFRASTRUCTURE RESULTS

The BDI is distributed on a cluster of 4 nodes. Every node runs CentOS Linux 7.5.1804 and is equipped with 4 high-frequency single-core Intel Xeon Gold processors, 47 GiB of RAM and 200 GiB of storage. The components of the infrastructure are orchestrated by Kubernetes: the Kubernetes master runs on the master node, while the other three nodes are used to distribute the workload.

Data stored in the BDI are managed by three main technological components: Apache Kafka, Redis and Apache Cassandra. The Kafka deployment consists of a single instance each of Apache ZooKeeper and Apache Kafka, and handles on average between 20 and 50 messages per second, with an average size of 1500 bytes per message. The daily volume of data managed by Kafka is nearly 1.7 GiB, taking into account traffic data in the network. Redis manages up to 8 GiB of data; it represents the live streaming situation and stores up to 3 days of data. Apache Cassandra is deployed as a single data center of three nodes, using SimpleStrategy as the replication strategy with a replication factor of 3. This configuration provides high reliability and fault tolerance. Cassandra makes it possible to store up to 2 months of data. The volume of data that these components can manage will grow on a more powerful infrastructure as cluster nodes are added, as discussed in the next section.
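As a rough cross-check of the figures in this section, the Python sketch below converts a sustained message rate into a daily volume, converts a daily volume back into an equivalent sustained rate, and computes the Cassandra quorum size for a given replication factor. The function names are illustrative. The quorum formula (half the replicas, rounded down, plus one) is the standard Cassandra definition; the text does not state which consistency level the project uses, so its relevance here is an assumption.

```python
GIB = 1024 ** 3
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gib(messages_per_second: float,
                     bytes_per_message: int = 1500) -> float:
    """Daily volume in GiB if the rate were sustained for 24 hours."""
    return messages_per_second * bytes_per_message * SECONDS_PER_DAY / GIB


def equivalent_rate(daily_gib: float, bytes_per_message: int = 1500) -> float:
    """Sustained messages per second that would yield the daily volume."""
    return daily_gib * GIB / (bytes_per_message * SECONDS_PER_DAY)


def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM read or write in Cassandra."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    print(round(daily_volume_gib(20), 2))   # about 2.41 GiB per day
    print(round(daily_volume_gib(50), 2))   # about 6.04 GiB per day
    print(round(equivalent_rate(1.7), 1))   # about 14.1 messages per second
    print(quorum(3))                        # 2
```

Under these assumptions, 20 to 50 messages per second sustained over a full day would amount to roughly 2.4 to 6.0 GiB, so the reported 1.7 GiB per day corresponds to about 14 messages per second on a 24-hour basis. The text does not explain the difference; the quoted rates may refer to periods of activity rather than to the whole day. With a replication factor of 3 on three nodes, a quorum of 2 means that quorum reads and writes would still succeed with one node unavailable.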

4.1 Historical data elaboration on Cassandra via Spark

In order to test the scalability and the performance of the BDI, we ran some tests on data stored in Cassandra. Data are processed and extracted with the Apache Spark [12] distributed processing engine. In particular, for the purposes of the MARISA project, we tested Spark with one driver (master node) and two workers (executors), running it in cluster mode via Kubernetes.

The table below gives an example of the response time of a request to process the data stored on Cassandra, where Spark processes and queries the data using 1 executor (worker) or 2 executors.

Table 1. Overview of the messages managed by Cassandra and the related response time of Apache Spark analysis

# of records on Cassandra    Response time (s), 1 worker    Response time (s), 2 workers
1,000                        28                             29
10,000                       28                             39
100,000                      78                             62
1,000,000                    92                             64

From the tests above we can conclude that, for larger datasets, Apache Spark scales horizontally as nodes are added. On a dataset of 1,000 records the response time is almost the same with one or two workers (28 s and 29 s), and on 10,000 records two workers are slower (39 s against 28 s). As the dataset grows, the response time improves with the number of workers: from 78 s to 62 s on 100,000 records and from 92 s to 64 s on 1,000,000 records. These times include the overhead of data traffic on the network and the time needed to deploy the driver and the workers, which may explain the lack of improvement on the smaller datasets.
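The measurements in Table 1 can be summarized as speedup (one-worker time divided by two-worker time) and parallel efficiency (speedup divided by the number of workers). The short Python sketch below computes both from the reported response times; the dictionary and function names are illustrative, and the numbers are copied from the table.

```python
# Response times in seconds from Table 1: records -> (1 worker, 2 workers)
RESPONSE_TIMES_S = {
    1_000: (28, 29),
    10_000: (28, 39),
    100_000: (78, 62),
    1_000_000: (92, 64),
}


def speedup(t_one_worker: float, t_two_workers: float) -> float:
    """Values above 1 mean the two-worker run was faster."""
    return t_one_worker / t_two_workers


def efficiency(t_one_worker: float, t_two_workers: float,
               workers: int = 2) -> float:
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup(t_one_worker, t_two_workers) / workers


if __name__ == "__main__":
    for records, (t1, t2) in RESPONSE_TIMES_S.items():
        print(f"{records:>9,} records: speedup {speedup(t1, t2):.2f}, "
              f"efficiency {efficiency(t1, t2):.2f}")
```

For 1,000,000 records this gives a speedup of about 1.44 and an efficiency of about 0.72, while for 1,000 and 10,000 records the speedup is below 1, in line with the fixed deployment and network overhead mentioned in the text.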

ACKNOWLEDGEMENTS

The MARISA Project is coordinated by Leonardo SpA. The MARISA project has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No 740698.

REFERENCES

[1] Fabio Mazzarella, Alfredo Alessandrini, Harm Greidanus, Marlene Alvarez, Pietro Argentieri, Domenico Nappo, Lukasz Ziemba, “Data Fusion for Wide-Area Maritime Surveillance”, MOVE Workshop on Moving Objects at Sea, Brest, France (2003).

[2] Monica Posada, Harm Greidanus, Marlene Alvarez, Michele Vespe, Tulay Cokacar, Silvia Falchetti, “Maritime awareness for counter-piracy in the Gulf of Aden”, IGARSS, 249-252 (2011).

[3] Silvia Falchetti, Marlene Alvarez Alvarez, Tulay Cokacar, Harm Greidanus, Michele Vespe, “Improving Co-operative Vessel Tracking for a Maritime Integrated Surveillance Platform”, Proc. NATO RTO SCI-247 Symposium on “Port and Regional Maritime Security”, Lerici, Italy (21-23 May 2012).

[4] Dejan Nikolic, Nikola Stojkovic, Zdravko Popovic, Nikola Tosic, Nikola Lekic, Zoran Stankovic, Nebojsa Doncov, “Maritime Over the Horizon Sensor Integration: HFSWR Data Fusion Algorithm”, Remote Sensing 11(7), 852 (2019).

[5] Ronan Fablet, Nicolas Bellec, Laetitia Chapel, Chloé Friguet, René Garello, Pierre Gloaguen, Guillaume Hajduch, Sébastien Lefèvre, François Merciol, Pascal Morillon, et al., “Next Step for Big Data Infrastructure and Analytics for the Surveillance of the Maritime Traffic from AIS Sentinel Satellite Data Streams”, Big Data for Space Conference, BiDS'2017, Toulouse, France, 1-4 (November 2017).

[6] Nicola Forti, Leonardo Millefiori, Paolo Braca, “Unsupervised Extraction of Maritime Patterns of Life from Automatic Identification System Data”, Proc. of the OCEANS 2019 MTS/IEEE Conference (2019).

[7] https://kafka.apache.org/

[8] https://spring.io/projects/spring-cloud-dataflow

[9] https://redis.io/

[10] http://cassandra.apache.org/

[11] http://geoserver.org/

[12] https://spark.apache.org/