BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH Big Data Solution Architectures 29.9.2016 DOAG 2016 Big Data Days Guido Schmutz Trivadis

Big Data Architectures



Guido Schmutz

• Working for Trivadis for more than 19 years
• Oracle ACE Director for Fusion Middleware and SOA
• Co-author of different books
• Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
• Member of Trivadis Architecture Board
• Technology Manager @ Trivadis

More than 25 years of software development experience

Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz

Agenda

1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream-Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
3. Big Data Ecosystem – many choices sorted!

Introduction


Why talk about Big Data Architectures?

Choosing the right architecture is key for any (big data) project

Big Data is still a rather young field and therefore a “moving target”

No standard architectures are available that have been in use for years

In recent years, some architectures and best practices have emerged

Know your use cases before choosing your architecture / technologies

Having a reference architecture in place helps in choosing the matching technologies

How to do Big Data? Why is a structure / architecture important?

Big Data Ecosystem – many choices sorted!

Important Properties for Choosing a (Big) Data Architecture

Latency

Keep raw and un-interpreted data “forever”?

Volume, Velocity, Variety, Veracity

Ad-hoc query capabilities needed?

Robustness & Fault Tolerance

Scalability

Big Data Reference Architectures – Traditional Big Data

“Traditional Architecture” for Big Data

[Diagram: building blocks of the “Traditional Architecture”. Stages: Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer. Data Sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Data Ingestion: Pushing Ingestion, Pulling Ingestion, Channel. Processing and storage: Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine. Data Consumer: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]

“Traditional Architecture” for Big Data – Hadoop Technology Mapping

[Diagram: the “Traditional Architecture” building blocks (Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer) with Hadoop technologies mapped onto them.]

“Traditional Architecture” for Big Data – Spark Technology Mapping

[Diagram: the “Traditional Architecture” building blocks (Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer) with Spark technologies mapped onto them.]

“Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams

[Diagram: the “Traditional Architecture” building blocks (Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer), annotated with two question marks.]

Traditional Architecture for Big Data

• Batch Processing - “Data at Rest”

• Not for low-latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data

• Spark Core is a faster alternative to Hadoop MapReduce, but it is still batch processing

• The Spark ecosystem offers many additional advanced analytic capabilities (machine learning, graph processing, …)
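The batch idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `RAW_EVENTS`, its field names and `batch_compute` are hypothetical and stand in for a raw data reservoir and a batch job. It is not Hadoop or Spark code.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical raw data reservoir: append-only event records ("data at rest").
RAW_EVENTS = [
    {"sensor": "s1", "day": "2016-09-28", "value": 21.5},
    {"sensor": "s2", "day": "2016-09-28", "value": 19.0},
    {"sensor": "s1", "day": "2016-09-29", "value": 22.5},
]


def batch_compute(raw_events: Iterable[dict]) -> Dict[str, float]:
    """Recompute the average value per sensor by scanning all raw events."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for event in raw_events:
        counts[event["sensor"]] += 1
        totals[event["sensor"]] += event["value"]
    return {sensor: totals[sensor] / counts[sensor] for sensor in counts}


# The result store is only as fresh as the last batch run.
result_store = batch_compute(RAW_EVENTS)
print(result_store)
```

Every run scans the complete data set, which is why answers arrive “after the fact”: an event becomes visible to consumers only once the next batch run has finished.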

Big Data Reference Architectures – Event / Stream Processing

Event / Stream Processing – “Data in Motion”

“Data in motion”

Events are analyzed and processed in real time as they arrive

Decisions are timely, contextual and based on fresh data

Decision latency is eliminated


Event / Stream Processing Architecture

[Diagram: building blocks of the Event / Stream Processing Architecture. Stages: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data Sources: Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine. Components: Channel, Messaging, Stream/Event Processing, Batch compute, Result Store. Data Consumer: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]

Continuous Ingestion

[Diagram: DB Source, File Source, Log, Social and IoT Sensor sources; IoT GW, MQTT GW, CDC GW, Dataflow GW, REST, Connect, CDC, Log CDC and Native connectors; an Event Hub with Topics and a Queue; Stream Processing and Big Data.]

Challenges for Ingesting Sensor Data


Multitude of sensors

Real-Time Streaming

Multiple Firmware versions

Bad Data from damaged sensors

Regulatory Constraints

Data Quality
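Two of these challenges, differing firmware versions and bad data from damaged sensors, are often handled by a normalization step at ingestion. The following is a minimal sketch under invented assumptions: the field names (`fw`, `temp`, `temperature_c`), the unit difference between firmware versions and the plausible value range are all hypothetical.

```python
from typing import Optional

# Hypothetical plausible range for a temperature sensor, in degrees Celsius.
VALID_RANGE = (-40.0, 85.0)


def normalize(reading: dict) -> Optional[dict]:
    """Return a cleaned reading, or None if the reading should be discarded."""
    # Assumed firmware difference: version 1 reports tenths of a degree in
    # "temp", version 2 reports degrees in "temperature_c".
    if reading.get("fw") == 1:
        raw = reading.get("temp")
        value = raw / 10.0 if raw is not None else None
    elif reading.get("fw") == 2:
        value = reading.get("temperature_c")
    else:
        return None  # unknown firmware version

    # Missing or implausible values, e.g. from a damaged sensor, are dropped.
    if value is None or not (VALID_RANGE[0] <= value <= VALID_RANGE[1]):
        return None
    return {"sensor": reading["sensor"], "temperature_c": float(value)}


print(normalize({"sensor": "a", "fw": 1, "temp": 215}))
print(normalize({"sensor": "b", "fw": 2, "temperature_c": 999.0}))
```

Whether rejected readings are dropped or routed to a separate channel for inspection is a design choice that the regulatory constraints may dictate.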

Enabling Continuous Data Ingestion

SQL Polling

Change Data Capture (CDC)

File Stream (File Tailing)

File Stream (Streaming Appender)

Sensor Stream
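Of these options, SQL polling is the simplest to illustrate. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns and the `poll_new_rows` helper are hypothetical, and the high-water-mark approach assumes a monotonically increasing `id`.

```python
import sqlite3


def poll_new_rows(conn, last_seen_id):
    """Return rows added since the last poll and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last_seen = rows[-1][0] if rows else last_seen_id
    return rows, new_last_seen


# Hypothetical source table, filled with two rows before the first poll.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("a",), ("b",)])

first_batch, last_seen = poll_new_rows(conn, 0)

# A new row arrives between two polls.
conn.execute("INSERT INTO orders (payload) VALUES (?)", ("c",))
second_batch, last_seen = poll_new_rows(conn, last_seen)

print(first_batch, second_batch, last_seen)
```

A query like this only notices newly inserted rows. Updates and deletes stay invisible to it, which is one reason to consider Change Data Capture instead.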

Event / Stream Processing Architecture – Open Source Technology Mapping

[Diagram: the Event / Stream Processing building blocks (Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer) with open source technologies mapped onto them.]

Event / Stream Processing Architecture – Oracle Technology Mapping

[Diagram: the Event / Stream Processing building blocks (Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer) with Oracle technologies mapped onto them.]

Event / Stream Processing Architecture

The solution for low latency use cases

Process each event separately => low latency

Process events in micro-batches => increases latency but offers better reliability

Previously known as “Complex Event Processing”

Keep the data moving / Data in Motion instead of Data at Rest => raw events are not stored
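The difference between per-event processing and micro-batching can be sketched with plain Python iterators. The function names are hypothetical, and the sketch groups events by count for simplicity, whereas engines such as Spark Streaming form micro-batches by time interval.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def process_per_event(events: Iterable[int], handle: Callable[[int], None]) -> None:
    """Hand each event to the handler as soon as it arrives (lowest latency)."""
    for event in events:
        handle(event)


def micro_batches(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group the stream into small batches; an event waits until its batch is cut."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


seen: List[int] = []
process_per_event([1, 2, 3, 4, 5], seen.append)
batches = list(micro_batches([1, 2, 3, 4, 5], size=2))
print(seen, batches)
```

In the micro-batch variant, an event is only handled once its batch is complete, which is where the additional latency comes from.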


Event / Stream Processing Architecture - Keep raw event data

[Diagram: the Event / Stream Processing building blocks (Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing with Messaging and Stream/Event Processing, Result Store, Data Consumer), extended by (Analytical) Batch Data Processing with a Raw Data (Reservoir). Legend: Data in Motion, Data at Rest.]

Big Data Reference Architectures – Lambda Architecture for Big Data

“Lambda Architecture” for Big Data

[Diagram: building blocks of the “Lambda Architecture”. Stages: Data Sources, Data Ingestion, (Analytical) Batch Data Processing, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data Sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Data Ingestion: Pulling Ingestion, Channel. Batch side: Raw Data (Reservoir), Batch compute, Computed Information. Real-time side: Messaging, Stream/Event Processing, Batch compute. Serving: Result Store, Query Engine. Data Consumer: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]


Lambda Architecture for Big Data

Combines (Big) Data at Rest with (Fast) Data in Motion

Closes the latency gap left by high-latency batch processing

Keeps the raw information forever

Makes it possible to rerun analytics operations on the whole data set if necessary
• because the old run had an error, or
• because we have found a better algorithm we want to apply

Have to implement functionality twice
• Once for batch
• Once for real-time streaming
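The combination of a batch view and a real-time view, and the duplicated logic it causes, can be illustrated as follows. This is a simplified sketch. The page-view events and the names `batch_view`, `SpeedView` and `query` are hypothetical, and a real implementation would also expire real-time results once a batch run has covered them.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch side: recompute page-view counts from the complete raw data set."""
    return Counter(event["page"] for event in master_dataset)


class SpeedView:
    """Real-time side: the same counting logic, applied event by event."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1


def query(page, batch, speed):
    """Serving side: merge the (stale) batch view with the real-time view."""
    return batch[page] + speed.counts[page]


# Events already covered by the last batch run.
master = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
batch = batch_view(master)

# Events that arrived after the last batch run.
speed = SpeedView()
for event in [{"page": "/home"}, {"page": "/pricing"}]:
    speed.on_event(event)

print(query("/home", batch, speed), query("/pricing", batch, speed))
```

The counting rule appears twice, once in `batch_view` and once in `SpeedView.on_event`. Both copies have to be kept consistent, which is the main cost listed above.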

Big Data Reference Architectures – “Kappa” Architecture

“Kappa Architecture” for Big Data

[Diagram: building blocks of the “Kappa Architecture”. Stages: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data Sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Components: Messaging (labelled “Raw Data Reservoir”), Stream/Event Processing, Batch compute, Raw Data (Reservoir), Computed Information, Result Store. Data Consumer: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
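The slide labels the messaging layer as the “Raw Data Reservoir”. A common reading of the Kappa architecture, taken here as an assumption because the slides do not spell it out, is that the messaging layer retains the events and a result view is rebuilt by replaying them through a single stream-processing code path. The `EventLog` class and `build_view` function below are hypothetical stand-ins, not the API of any messaging product.

```python
class EventLog:
    """Hypothetical stand-in for a messaging layer that retains its events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset=0):
        """Replay all retained events starting at the given offset."""
        return iter(self._events[offset:])


def build_view(log, transform):
    """Build a result view by replaying the log through one processing path."""
    view = {}
    for event in log.read_from(0):
        key, value = transform(event)
        view[key] = view.get(key, 0) + value
    return view


log = EventLog()
for event in [("s1", 2), ("s2", 5), ("s1", 3)]:
    log.append(event)

# First version of the processing logic.
view_v1 = build_view(log, lambda e: (e[0], e[1]))

# Improved logic: replay the same log into a new view, then switch consumers over.
view_v2 = build_view(log, lambda e: (e[0], e[1] * 10))

print(view_v1, view_v2)
```

Under this reading there is no separate batch implementation: reprocessing means running the streaming logic again over the retained events.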

Big Data Reference Architectures – “Unified” Architecture

“Unified Architecture” for Big Data

[Diagram: building blocks of the “Unified Architecture”. Stages: Data Sources, Data Ingestion, (Analytical) Batch Data Processing (calculate models of incoming data), (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data Sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Batch side: Raw Data (Reservoir), Batch compute, Computed Information, Prediction Models. Real-time side: Channel, Messaging, Stream/Event Processing, Batch compute. Serving: Result Store, Query Engine. Data Consumer: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
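The diagram indicates that batch processing calculates models from incoming data and that these prediction models are then used on the real-time side. A minimal sketch of that interplay follows. The “model” is a deliberately simple mean and standard deviation, and the names `train_model` and `score` as well as the threshold are hypothetical.

```python
from statistics import mean, pstdev


def train_model(history):
    """Batch side: derive a simple model (mean and spread) from historical values."""
    return {"mean": mean(history), "std": pstdev(history)}


def score(model, value, threshold=3.0):
    """Real-time side: flag a value that deviates strongly from the model."""
    if model["std"] == 0:
        return value != model["mean"]
    return abs(value - model["mean"]) / model["std"] > threshold


# Periodic batch run over the raw data reservoir.
model = train_model([10.0, 12.0, 11.0, 9.0, 13.0])

# The stream processor applies the latest model to each incoming event.
alerts = [value for value in [11.0, 30.0, 10.5] if score(model, value)]
print(model, alerts)
```

The model is refreshed at batch speed while scoring happens per event, so the two sides carry out different work instead of duplicating the same logic.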

Big Data Ecosystem – many choices sorted!


Building Blocks for (Big) Data Processing

• Data Acquisition
• Format
• File System
• Stream Processing
• Batch SQL
• Graph DBMS
• Document DBMS
• Relational DBMS
• Visualization
• IoT
• Messaging
• Analytics
• OLAP DBMS
• Query Federation
• Table-Style DBMS
• Key Value DBMS
• Batch Processing
• In-Memory


NoSQL Datastores


Organizing NoSQL Datastores – Different Types

Key-value store
Key | Value
K1 | V1
K2 | V2
K3 | V3

Document store
{k1: v1, k2: v2, k3: [v1, v2, v3]}

Wide-column store
Row key | Columns
RK1 | CK1: V1, CK2: V2, CK3: V3, CK4: V4, …
RK2 | CK1: V1, CK4: V4, CK6: V6, …
… | CK3: V3, …

Graph store
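The four data models can be mimicked with plain Python data structures to make the differences concrete. The keys and values follow the slide. The graph example (two nodes and a `KNOWS` edge) and the `neighbours` helper are invented for illustration, since the slide shows the graph store only as a picture.

```python
# Key-value store: an opaque value per key, looked up by key only.
key_value_store = {"K1": "V1", "K2": "V2", "K3": "V3"}

# Document store: the value is a structured, possibly nested document.
document = {"k1": "v1", "k2": "v2", "k3": ["v1", "v2", "v3"]}

# Wide-column store: each row key has its own, possibly different, set of columns.
wide_column_store = {
    "RK1": {"CK1": "V1", "CK2": "V2", "CK3": "V3", "CK4": "V4"},
    "RK2": {"CK1": "V1", "CK4": "V4", "CK6": "V6"},
}

# Graph store (hypothetical content): nodes plus typed relationships between them.
graph_store = {
    "nodes": {"n1": {"label": "Person"}, "n2": {"label": "Person"}},
    "edges": [("n1", "KNOWS", "n2")],
}


def neighbours(graph, node):
    """Follow outgoing edges from a node, the typical graph-store access path."""
    return [dst for src, _rel, dst in graph["edges"] if src == node]


print(wide_column_store["RK2"].get("CK2"))  # column absent for this row
print(neighbours(graph_store, "n1"))
```

The lookup on `RK2` shows the sparse nature of a wide-column row: a column that exists for one row key does not have to exist for another.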

Organizing NoSQL Datastores – and the Products

[Diagram: products grouped by key-value store, wide-column store, document store and graph store.]

Guido Schmutz
Technology Manager

[email protected]