Big Data Solution Architectures
29.9.2016 – DOAG 2016 Big Data Days
Guido Schmutz
Trivadis
Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-author of different books
Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: [email protected]
http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream-Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
3. Big Data Ecosystem – many choices sorted!
Why talk about Big Data architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
No standard architectures are available that have been in use for years
In recent years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right/matching technologies
How to do Big Data? Why is a structure / architecture important?
Important Properties for Choosing a (Big) Data Architecture
Latency
Keep raw and uninterpreted data “forever”?
Volume, Velocity, Variety, Veracity
Ad-hoc query capabilities needed?
Robustness & Fault Tolerance
Scalability
…
“Traditional Architecture” for Big Data
[Diagram: traditional Big Data architecture. Layers: Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer. Data sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Components: Pulling Ingestion, Pushing Ingestion, Channel, Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
“Traditional Architecture” for Big Data – Hadoop Technology Mapping
[Diagram: the traditional architecture with Hadoop technologies mapped onto its components. Layers: Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer. Components: Pulling Ingestion, Pushing Ingestion, Channel, Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine. Legend: Data in Motion, Data at Rest.]
“Traditional Architecture” for Big Data – Spark Technology Mapping
[Diagram: the traditional architecture with Spark technologies mapped onto its components. Layers: Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer. Components: Pulling Ingestion, Pushing Ingestion, Channel, Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine. Legend: Data in Motion, Data at Rest.]
“Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams
[Diagram: the traditional architecture with high-volume event streams as input. Layers: Data Sources, Data Ingestion, (Analytical) Data Processing, Result Store, Data Consumer. Components: Pulling Ingestion, Pushing Ingestion, Channel, Raw Data (Reservoir), Batch compute, Computed Information, Result Store, Query Engine. Two places in the diagram are marked “?”. Legend: Data in Motion, Data at Rest.]
Traditional Architecture for Big Data
• Batch Processing - “Data at Rest”
• Not for low-latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• Spark Core is a faster alternative to Hadoop MapReduce, but it is still batch processing
• The Spark ecosystem offers many additional advanced analytics capabilities (machine learning, graph processing, …)
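To make the batch, “data at rest” model concrete, here is a minimal sketch in plain Python (deliberately not Hadoop or Spark). Raw events sit in a reservoir and a batch job recomputes the result store from the whole data set. The event fields (`sensor`, `value`) and the name `batch_compute` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical raw data reservoir: events stored "at rest".
RAW_EVENTS = [
    {"sensor": "s1", "value": 10.0},
    {"sensor": "s2", "value": 4.0},
    {"sensor": "s1", "value": 20.0},
]

def batch_compute(raw_events):
    """Recompute the average value per sensor over the whole data set."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in raw_events:
        # Group by key and accumulate, as a map/reduce job would.
        totals[event["sensor"]] += event["value"]
        counts[event["sensor"]] += 1
    # One aggregated entry per key, to be written to the result store.
    return {sensor: totals[sensor] / counts[sensor] for sensor in totals}

result_store = batch_compute(RAW_EVENTS)
print(result_store)
```

The result store only reflects events that were in the reservoir when the job ran, which is where the latency of this architecture comes from.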
Big Data Reference Architectures – Event / Stream Processing
Event / Stream Processing – “Data in Motion”
“Data in motion”
Events are analyzed and processed in real time as they arrive
Decisions are timely, contextual and based on fresh data
Decision latency is eliminated
Event / Stream Processing Architecture
[Diagram: event / stream processing architecture. Layers: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data sources: Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine. Components: Channel, Messaging, Stream/Event Processing, Batch compute, Result Store. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
Continuous Ingestion
[Diagram: continuous ingestion into an Event Hub. Sources: DB Source, File Source, Log, Social, IoT Sensor. Ingestion paths: CDC GW, Connect CDC, Log CDC, Native, REST, IoT GW, MQTT GW, Dataflow GW, Dataflow. Event Hub: Topics and a Queue. Consumers: Stream Processing, Big Data.]
Challenges for Ingesting Sensor Data
Multitude of sensors
Real-Time Streaming
Multiple Firmware versions
Bad Data from damaged sensors
Regulatory Constraints
Data Quality
Enabling Continuous Data Ingestion
SQL Polling
Change Data Capture (CDC)
File Stream (File Tailing)
File Stream (Streaming Appender)
Sensor Stream
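As an illustration of the first option, SQL polling, here is a small sketch using Python's built-in `sqlite3` module. The table name `orders`, its columns, and the helper `poll_new_rows` are hypothetical. The poller remembers the highest `id` it has seen (a high-water mark) and fetches only newer rows on each call.

```python
import sqlite3

def poll_new_rows(conn, last_seen_id):
    """Return rows added since last_seen_id and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark

# Hypothetical source database with two existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("a",), ("b",)])

first_batch, mark = poll_new_rows(conn, 0)      # picks up the existing rows
conn.execute("INSERT INTO orders (payload) VALUES (?)", ("c",))
second_batch, mark = poll_new_rows(conn, mark)  # picks up only the new row
```

A poller keyed on an increasing `id` like this one sees inserts only; it misses updates and deletes, which is one reason to consider Change Data Capture instead.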
Event / Stream Processing Architecture – Open Source Technology Mapping
[Diagram: the event / stream processing architecture with open source technologies mapped onto its components. Layers: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Components: Channel, Messaging, Stream/Event Processing, Batch compute, Result Store. Legend: Data in Motion, Data at Rest.]
Event / Stream Processing Architecture – Oracle Technology Mapping
[Diagram: the event / stream processing architecture with Oracle technologies mapped onto its components. Layers: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Components: Channel, Messaging, Stream/Event Processing, Batch compute, Result Store. Legend: Data in Motion, Data at Rest.]
Event / Stream Processing Architecture
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better reliability
Previously known as “Complex Event Processing”
Keep the data moving: Data in Motion instead of Data at Rest => raw events are not stored
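The difference between the two processing styles above can be sketched in a few lines of Python. This is a simplified illustration, not the API of any stream processing engine. The function names are made up, and the micro-batches here are cut by size, whereas real engines typically cut them by time interval.

```python
def process_per_event(events, handle):
    """Hand each event to the handler as soon as it arrives."""
    for event in events:
        handle([event])

def process_micro_batches(events, handle, batch_size):
    """Buffer events and hand them to the handler in small groups."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle(batch)
            batch = []
    if batch:
        # Flush the incomplete final batch.
        handle(batch)

per_event_calls = []
process_per_event(range(5), per_event_calls.append)

micro_batch_calls = []
process_micro_batches(range(5), micro_batch_calls.append, batch_size=2)
```

In the micro-batch variant an event waits until its batch is full before it is handled, which is the added latency the slide refers to.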
Event / Stream Processing Architecture - Keep raw event data
[Diagram: event / stream processing architecture that also keeps the raw event data. Layers: Data Sources, Data Ingestion, (Analytical) Real-Time Data Processing, (Analytical) Batch Data Processing, Result Store, Data Consumer. Data sources: Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine. Components: Channel, Messaging, Stream/Event Processing, Batch compute, Raw Data (Reservoir), Result Store. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
Big Data Reference Architectures – Lambda Architecture for Big Data
“Lambda Architecture” for Big Data
[Diagram: Lambda architecture. Layers: Data Sources, Data Ingestion, (Analytical) Batch Data Processing, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Components: Channel, Pulling Ingestion, Messaging, Raw Data (Reservoir), Batch compute, Computed Information, Stream/Event Processing, Result Store, Query Engine. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the latency gap left by high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on the whole data set if necessary
• because the old run had an error, or
• because we have found a better algorithm we want to apply
Functionality has to be implemented twice
• Once for batch
• Once for real-time streaming
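A toy sketch of these points in Python, assuming a simple page-view count as the analytic. The function names and the `page` field are illustrative and not tied to any framework. A batch view is recomputed from all raw data, a real-time view covers events that arrived after the last batch run, and a query merges the two.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch side: recompute counts per page from all raw events."""
    return Counter(event["page"] for event in master_dataset)

def update_speed_view(speed_view, event):
    """Real-time side: count one event not yet covered by the batch run."""
    speed_view[event["page"]] += 1

def query(batch, speed, page):
    """Merge the batch result and the real-time result for one page."""
    return batch[page] + speed[page]

# Raw data kept "forever"; the batch view can be rebuilt from it at any time.
master = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
batch = batch_view(master)

speed = Counter()
update_speed_view(speed, {"page": "home"})  # arrived after the batch run

print(query(batch, speed, "home"))
```

Note that the counting logic exists twice, once in `batch_view` and once in `update_speed_view`, which is the duplicated implementation effort listed above.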
“Kappa Architecture” for Big Data
[Diagram: Kappa architecture. Layers: Data Sources, Data Ingestion, “Raw Data Reservoir”, (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Components: Messaging, Raw Data (Reservoir), Stream/Event Processing, Batch compute, Computed Information, Result Store. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
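The diagram labels the messaging layer as the “Raw Data Reservoir”. One common reading of the Kappa idea, assumed here, is that raw events are retained in an append-only log and reprocessing means replaying that log through a new version of the same streaming code. The `Log` class and both processing functions below are hypothetical stand-ins, not a real messaging API.

```python
class Log:
    """Hypothetical append-only log standing in for the messaging layer."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read_from(self, offset):
        """Return all records starting at the given offset."""
        return self._records[offset:]

def total_v1(records):
    """First version of the streaming logic: sum all readings."""
    return sum(records)

def total_v2(records):
    """Improved version: ignore negative readings."""
    return sum(r for r in records if r >= 0)

log = Log()
for reading in [5, -1, 7]:
    log.append(reading)

result_v1 = total_v1(log.read_from(0))
# Reprocess by replaying the retained log from offset 0 with the new logic.
result_v2 = total_v2(log.read_from(0))
```

Under this assumption there is only one code path to maintain, in contrast to the separate batch and streaming implementations of the Lambda architecture.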
“Unified Architecture” for Big Data
[Diagram: Unified architecture. Layers: Data Sources, Data Ingestion, (Analytical) Batch Data Processing (calculate models of incoming data), (Analytical) Real-Time Data Processing, Result Store, Data Consumer. Data sources: Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine. Components: Channel, Messaging, Raw Data (Reservoir), Batch compute, Computed Information, Prediction Models, Stream/Event Processing, Result Store, Query Engine. Data consumers: Reports, Service, Analytic Tools, Alerting Tools. Legend: Data in Motion, Data at Rest.]
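The diagram shows batch processing calculating models of incoming data and prediction models feeding the real-time side. A minimal sketch of that interplay, under the assumption that the model is a simple mean and standard deviation used for anomaly detection (the slides do not specify the model type, and all names are illustrative):

```python
from statistics import mean, stdev

def train_model(history):
    """Batch side: derive a simple threshold model from historical values."""
    return {"mean": mean(history), "stdev": stdev(history)}

def is_anomaly(model, value, k=3.0):
    """Streaming side: score one incoming event against the batch model."""
    return abs(value - model["mean"]) > k * model["stdev"]

# Historical values from the raw data reservoir (made-up numbers).
history = [10.0, 11.0, 9.0, 10.0, 10.0]
model = train_model(history)

# Incoming events are scored one by one against the precomputed model.
incoming = [10.5, 25.0, 9.5]
alerts = [value for value in incoming if is_anomaly(model, value)]
```

The expensive part (training) runs on data at rest, while the cheap part (scoring) runs on data in motion.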
Building Blocks for (Big) Data Processing
Data Acquisition
Format
File System
Stream Processing
Batch SQL
Graph DBMS
Document DBMS
Relational DBMS
Visualization
IoT
Messaging
Analytics
OLAP DBMS
Query Federation
Table-Style DBMS
Key Value DBMS
Batch Processing
In-Memory
Organizing NoSQL Datastores – Different Types

Key-value store
  Key   Value
  K1    V1
  K2    V2
  K3    V3

Document store
  {k1: v1, k2: v2, k3: [v1, v2, v3]}

Wide-column store
  Row key   Columns
  RK1       CK1=V1, CK2=V2, CK3=V3, CK4=V4, …
  RK2       CK1=V1, CK3=V3, CK4=V4, CK6=V6, …
  …         …

Graph store
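The data models on this slide can be mirrored with plain Python structures. This is only an analogy for the shape of the data, not how any product stores it. The keys and values are those shown on the slide (with CK3 assumed to belong to row RK2); the graph example is made up, since the slide gives no sample data for it.

```python
# Key-value store: opaque values looked up by key.
key_value = {"K1": "V1", "K2": "V2", "K3": "V3"}

# Document store: each value is a nested, self-describing document.
document = {"k1": "v1", "k2": "v2", "k3": ["v1", "v2", "v3"]}

# Wide-column store: row key -> sparse set of column key/value pairs.
wide_column = {
    "RK1": {"CK1": "V1", "CK2": "V2", "CK3": "V3", "CK4": "V4"},
    "RK2": {"CK1": "V1", "CK3": "V3", "CK4": "V4", "CK6": "V6"},
}

# Graph store: nodes plus labeled edges (hypothetical example data).
graph = {"nodes": {"A", "B"}, "edges": [("A", "knows", "B")]}

def columns(row_key):
    """Return the column keys of one row, sorted by name."""
    return sorted(wide_column[row_key])
```

Calling `columns("RK1")` and `columns("RK2")` returns different column sets, showing that rows in a wide-column store need not share a schema.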
Organizing NoSQL Datastores – and the Products
Key-value store
Wide-column store
Document store
Graph store