Toward An Architecture for Processing Spatial Big Data

Andreas Abecker 1, Torsten Brauer 1, Johannes Kutterer 1, Jens Nimis 2, Patrick Wiener 2

1 Disy Informationssysteme GmbH (www.disy.net)
2 Hochschule Karlsruhe - Technik und Wirtschaft

2016, May 25th, Geospatial World Forum, Rotterdam (NL)




Most characteristic big data dimensions


In the coming years, the availability of spatial data will explode

• Cheaper, more easily accessible, and more detailed satellite data (incl. micro and nano satellites)
• More and more application ideas for Unmanned Aerial Vehicles (UAV)
• Cheaper and more powerful in-situ sensors with real-time remote data transfer
• Cheaper and more powerful mobile sensors with real-time remote data transfer, mounted on vehicles, coupled to smartphones, etc.
• Internet of Things, Industry 4.0, etc.
• Volunteered Geographic Information
• Georeferenced social media content

[Figure: the four big data dimensions Volume, Velocity, Veracity, and Variety]

Analytics of large, heterogeneous, and high-frequency spatial data will drive promising application scenarios

• Precision agriculture
• Smart city monitoring and control
• Disaster management
• Smart energy
• Context-specific marketing and information services
• …


The BigGIS Research Project

• Project: BigGIS: Prescriptive and Predictive GIS Based on High-Dimensional Spatio-Temporal Data Structures
• Duration: April 2015 – March 2018
• Funded by: German Federal Ministry of Education and Research (BMBF)
• Consortium roles shown on the slide:
  - 3 application partners
  - Remote sensing SME
  - Spatial analytics SME
  - In-memory DB SME
  - Data-integration researcher
  - Data-mining researcher
  - Decision-support researcher
  - Data-visualization researcher
  - Infrastructure researcher


BigGIS Pilot Application 1: Urban Heat Islands

• Context: Urban micro-climate depends on weather, pollution, land use, urban green, architecture, …
• Goal: More exact assessment of the actual situation and short-term, fine-grained prediction of urban micro-climate (temperature, ozone, PM10, …)
• Approach: New measurements plus inter-/extrapolation of measurement data
• Applications:
  - Routing people with minimum heat exposure (cf. OpenSense project)
  - Targeted warnings for high-risk groups
  - Warnings for kindergartens, old-age homes, etc.
• Predominant big data characteristics: variety, veracity

Data sources:

• Official topographic and cadastral data
• Thermography aerial survey Karlsruhe
• Normalized Difference Vegetation Index (EnviSAT, Landsat)
• 3D model of Karlsruhe (level of detail 2)
• Sensors of meteorological service and environment agency (DWD, LUBW)
• Climate data of Karlsruhe University
• Planned: Mobile sensors
• Planned: Participatory sensing
• Planned: Radar data
• Planned: Social media analysis
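
The "inter-/extrapolation of measurement data" step can be illustrated with a small example. The slides do not name a method, so the sketch below assumes plain inverse-distance weighting (IDW) as one common choice. The function name, the station coordinates, and the temperature values are all hypothetical.

```python
import math

def idw_interpolate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples by inverse-distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # query point coincides with a measurement site
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total

# Hypothetical air-temperature readings: (x in m, y in m, temperature in deg C).
stations = [(0.0, 0.0, 30.0), (100.0, 0.0, 34.0), (0.0, 100.0, 32.0)]
estimate = idw_interpolate(stations, 50.0, 0.0)
```

IDW never leaves the range of the measured values, so it covers interpolation between sites only. Real extrapolation, or the use of covariates such as land use or NDVI, would need a different model.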


BigGIS Pilot Application 2: Disaster/Emergency Management

• Context: Disaster management (floods, (wild)fires, chemical accidents, terrorist attacks, …)
• Goal: Within 15 min after the event, have an emergency map for fire brigades, plus continuous updates, plus predictions about further evolution (e.g., movement of a cloud of poisonous gas)
• Approach: Combine UAV remote sensing data with background knowledge and in-situ observations; data focus and dimension reduction is key
• Predominant big data characteristics: volume, variety, veracity, (velocity)

Data sources:

• Micro Rapid Mapping: micro flight robot (AiD MC8 Octocopter) with sensors such as RGB camera (Sony Smart Shot ILCE-QX1), thermal camera (FLIR Quark 2), hyperspectral camera (Cubert UHD 185 Firefly), RTK GPS
• Official topographic + cadastral data: critical infrastructures, endangered population, protected sites, …
• Crowdmapping + social media content

BigGIS Pilot Application 3: Invasive Species

• Context: Invasive species may cause serious economic damage or health problems
• Example: Drosophila suzukii
• Goal: Understand and predict the distribution patterns and dynamics of invasive species depending on vegetation, weather, etc.
• Approach: Learn distribution mechanisms from historic data
• Predominant big data characteristics: variety, veracity

Data sources:

• Land-use and land-cover data (as fine-grained as possible)
• Official observation data of species (collected by environment agencies, e.g. by traps)
• Weather observations and weather forecasts (as fine-grained as possible)
• Crowdmapping for some species

Toward a pipeline architecture for processing geo data (streams)

[Figure: pipeline architecture diagram.
Stage labels: Source (Events), Collector (Producer), Ingestion/Queueing (Broker), Pre-Analytics/Storage (Consumer), Analytics (Decider), Delivery (Endpoint), Dashboard.
Components: sources S1 … Sn with API1 … APIn; messaging system (Kafka) with Broker1 … Brokern; λ-architecture (batch, stream); HDFS; EFTAS; in-memory DB (EXASolution); data mining; predictive analytics; integration (R, Java, ...); Cadenza; target systems Tm (web, mobile, ...); semantics / metadata; system management (docker).
Legend: S = source; T = target system (e.g. visualisation); arrows = data flow; Rimpl, Rexpl = implicit/explicit raster; Vec = vector; primitives.]
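
To make the stage names on this slide more concrete, here is a minimal sketch of one event passing from source to dashboard. It is only an illustration under stated assumptions. The `InMemoryBroker` class is a hypothetical stand-in for the Kafka messaging system, not the Kafka API. The topic name, event fields, and threshold are also invented.

```python
import json
from collections import deque

class InMemoryBroker:
    """Hypothetical stand-in for the messaging system: one FIFO queue per topic."""
    def __init__(self):
        self._topics = {}

    def publish(self, topic, message):
        self._topics.setdefault(topic, deque()).append(message)

    def poll(self, topic):
        queue = self._topics.get(topic)
        return queue.popleft() if queue else None

def collect(broker, source_id, reading):
    """Collector/producer: wrap a raw source event and hand it to the broker."""
    broker.publish("raw", json.dumps({"source": source_id, **reading}))

def pre_analytics(broker, store):
    """Consumer: drain the raw topic, drop events without position or value, store the rest."""
    while (raw := broker.poll("raw")) is not None:
        event = json.loads(raw)
        if "lon" in event and "lat" in event and "value" in event:
            store.append(event)

def analyse(store, threshold):
    """Analytics/decider: select stored events whose value exceeds a threshold."""
    return [e for e in store if e["value"] > threshold]

def deliver(alerts):
    """Delivery/endpoint: render alerts as text lines for a target system such as a dashboard."""
    return [f'{a["source"]}: {a["value"]} at ({a["lon"]}, {a["lat"]})' for a in alerts]

broker, store = InMemoryBroker(), []
collect(broker, "S1", {"lon": 8.40, "lat": 49.01, "value": 36.5})
collect(broker, "S2", {"lon": 8.41, "lat": 49.00, "value": 29.0})
collect(broker, "S3", {"value": 40.0})  # malformed: no coordinates
pre_analytics(broker, store)
dashboard_lines = deliver(analyse(store, threshold=35.0))
```

The broker decouples producers from consumers. A batch path and a stream path, as in the λ-architecture box, could then both read the same topic independently.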


• Embedding into existing Spatial Data Infrastructures seems to be mandatory (cf. standardization / OGC / domain-specific standards …)

[Figure: the pipeline diagram again, in this version with an additional Triple-Store component.]

• In-memory DB technology for spatial analytics is inevitable (experiments with EXASolution, SAP HANA, Oracle Spatial, …)


• Special treatment of raw data from remote sensing seems to be indispensable (in many respects)


• Not treated here: system parts with severe resource limitations (mobile devices, hardware on fire-brigade car, hardware on-board UAV, …) as well as network limitations
  - What shall be moved?
  - Data or code?
  - Raw data or processed data?


• Not yet shown here: user-feedback loops
• There are many novel and useful big (or smart) data applications that are:
  - not so much data-driven
  - not so much real-time


• Not yet shown here: semantic harmonization, geocoding, …
  - Ideally, done automatically based on semantic metadata about sources and algorithms
• Three most important big data dimensions in our experience: variety, variety, variety
  - So, the "no ETL" approach seems to be questionable
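
The idea of harmonization driven by metadata about sources can be sketched as follows. This is a toy example, not the project's implementation. The source names, field mappings, and unit table are hypothetical, and real semantic metadata would more plausibly live in an ontology or triple store than in a Python dictionary.

```python
# Hypothetical per-source metadata: how each source names its fields and which unit it uses.
SOURCE_METADATA = {
    "agency_station": {
        "fields": {"temp_c": "temperature", "x": "lon", "y": "lat"},
        "temperature_unit": "celsius",
    },
    "mobile_sensor": {
        "fields": {"tempF": "temperature", "longitude": "lon", "latitude": "lat"},
        "temperature_unit": "fahrenheit",
    },
}

# Conversions into the unit of the common schema (degrees Celsius).
TO_CELSIUS = {
    "celsius": lambda v: v,
    "fahrenheit": lambda v: (v - 32.0) * 5.0 / 9.0,
    "kelvin": lambda v: v - 273.15,
}

def harmonize(source, record):
    """Map a source-specific record onto the common schema (lon, lat, temperature in deg C)."""
    meta = SOURCE_METADATA[source]
    # Rename known fields; fields the metadata does not describe are dropped.
    common = {meta["fields"][name]: value
              for name, value in record.items() if name in meta["fields"]}
    common["temperature"] = TO_CELSIUS[meta["temperature_unit"]](common["temperature"])
    common["source"] = source
    return common

a = harmonize("agency_station", {"temp_c": 31.0, "x": 8.40, "y": 49.01})
b = harmonize("mobile_sensor",
              {"tempF": 87.8, "longitude": 8.41, "latitude": 49.00, "battery": 0.7})
```

Adding a new source then means adding a metadata entry rather than writing new transformation code. This is a lightweight form of the ETL step that the slide argues is hard to avoid.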


Some concluding and some additional remarks

• In our experience, variety is currently the key dimension
• Volume comes with remote-sensing raw data
• Velocity will come with more and more sensors
• Nevertheless, today's applications are already pretty demanding!
• Big data technology already offers valuable bits and pieces (in-memory DB, distributed storage and processing, virtualization)
• But embedding spatial big data applications optimally into legacy hardware/software landscapes still requires some ideas and experience
• Overall, there is a huge "usability gap" between raw data / domain-expert knowledge and the machine-learning / decision-support level
• Security and privacy may be significant blockers
• Some working areas for the technical guys:
  - Machine learning with dynamically changing spatial aggregations
  - Spatial Complex-Event Processing
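
As a rough illustration of spatial complex-event processing, the sketch below derives a "hotspot" event when enough distinct sensors close to each other report high values within a time window. The slides do not define such a rule. The class name, parameters, and sample events are hypothetical, and events are assumed to arrive in time order.

```python
import math
from collections import deque

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    earth_radius_m = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

class HotspotDetector:
    """Emit a derived event when enough distinct nearby sensors exceed a threshold in a time window."""
    def __init__(self, threshold, radius_m, window_s, min_sensors):
        self.threshold = threshold
        self.radius_m = radius_m
        self.window_s = window_s
        self.min_sensors = min_sensors
        self._recent = deque()  # (sensor_id, time, lon, lat) of recent exceedances

    def on_event(self, sensor_id, t, lon, lat, value):
        # Expire exceedances that fell out of the time window (assumes time-ordered input).
        while self._recent and t - self._recent[0][1] > self.window_s:
            self._recent.popleft()
        if value <= self.threshold:
            return None
        self._recent.append((sensor_id, t, lon, lat))
        # Spatial condition: distinct sensors within radius_m of the current event.
        nearby = {sid for sid, _, elon, elat in self._recent
                  if haversine_m(lon, lat, elon, elat) <= self.radius_m}
        if len(nearby) >= self.min_sensors:
            return {"type": "hotspot", "time": t, "sensors": sorted(nearby)}
        return None

detector = HotspotDetector(threshold=35.0, radius_m=300.0, window_s=600.0, min_sensors=3)
events = [
    ("A", 0, 8.400, 49.010, 36.0),
    ("B", 60, 8.401, 49.010, 37.0),
    ("D", 90, 8.500, 49.000, 38.0),   # hot, but several kilometres away
    ("C", 120, 8.402, 49.011, 35.5),
]
alerts = [detector.on_event(*e) for e in events]
```

The linear scan over the window is only workable for small sensor counts. At higher velocity a spatial index or a stream-processing engine would be the more realistic choice.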


Thank you!

Dr. Andreas Abecker Dipl.-Inform.
Head of Innovation Management
Disy Informationssysteme GmbH
Ludwig-Erhard-Allee 6
76131 Karlsruhe, Germany
www.disy.net
Tel. +49 721 16006-256
Fax +49 721 16006-05