Apache Kafka + Machine Learning
Analytic Models Applied to Real Time Stream Processing
Kai Waehner, Technology Evangelist
@KaiWaehner
www.kai-waehner.de
2Apache Kafka and Machine Learning
Agenda
1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models
Machine Learning
... allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
Spam Detection
Search Results + Product Recommendation
Picture Detection (Friends, Locations, Products)
Your Company
The Next Disruption: Google Beats Go Champion
Leverage Machine Learning to Analyze and Act on Critical Business Moments
Windows of Opportunity: Seconds, Minutes, Hours
Price Optimization
Predictive Maintenance
Fraud Detection
Cross Selling
Transportation Rerouting
Customer Service
Inventory Management
How to realize these use cases?
Big Data Analytics
Volume (terabytes, petabytes)
Variety (social networks, blog posts, logs, sensors, etc.)
Velocity ("real time")
Value
Big Data Analytics for Actionable Insights
From Insight to Action
(continuously closed loop)
Streaming Platform and Big Data Analytics (architecture diagram)
1) Data Producer: Database, IoT Device, Streaming Producer, ...
2) Analytics Platform (Batch): DWH, Data Lake, Data Integration, Model Building
3) Streaming Platform (Real Time): Connect, Stream Processing, Model, REST Interface, Schema Registry / Governance
4) Data Consumer: IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application
2) Building an Analytic Model
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Writing source code is not the time-consuming task!
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
Data Access
Find insights to create added business value by correlating various data sources!
Data Preparation
http://www.slideshare.net/odsc/feature-engineering
Exploratory Data Analysis
© Copyright 2000-2017 TIBCO Software Inc.
• Scripting
• Visual Analytics
• Machine Learning
Model Building
A model is a simplification of the truth that helps you with decision making.
Model Execution (Coding)
Apply Model to New Data
Model Execution (Tooling)
Apply Model to New Data
Model Validation
https://genome.tugraz.at/proclassify/help/pages/XV.html
Cross-Validation Procedure
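The cross-validation procedure named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration of k-fold splitting, not the code behind the slide: the helper names `k_fold_indices` and `cross_validate`, and the `fit` / `score` callbacks, are assumptions made for this example.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs: Sequence, ys: Sequence, k: int,
                   fit: Callable, score: Callable) -> float:
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)
```

Every sample is used for testing exactly once and for training k-1 times, which is why the averaged score is a more stable estimate than a single train/test split.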
Frameworks and Tooling?
Languages, Frameworks and Tools
Many more ….
Portable Format for Analytics (PFA)
Live Demos with Open Source Technologies
Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
Live Demo
Use Case: Customer Churn Prediction
Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression
Technology: Open Source R
Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees
Technology: H2O.ai
Live Demo
Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks)
Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders
Technology: TensorFlow + Python API
Live Demo
Use Case: Classification (Prediction of Titanic Survivors)
Deep Learning Algorithm: Recurrent Neural Networks (RNN)
Technology: RapidMiner
3) Applying an Analytic Model in Real Time
Definition of Stream Processing
Data at Rest vs. Data in Motion
Key Concepts
Stream Processing
Use Cases
• Real Time Applications
• Stateful Streaming Analytics
• Stateless "Real Time ETL"
Event Processing Windows
Various Options for Windowing (Fixed, Sliding, Session, …)
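To make the windowing options above concrete, here is a small sketch in plain Python that assigns timestamped events to fixed, sliding, and session windows. It only illustrates the grouping logic and is not the windowing API of any stream processing framework; the function names and the `(timestamp, payload)` event shape are assumptions for this example. The sliding variant shown advances in fixed steps, which Kafka Streams calls a hopping window.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload)


def fixed_windows(events: List[Event], size: int) -> Dict[int, List[str]]:
    """Put each event into exactly one non-overlapping window, keyed by window start."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        windows[ts - ts % size].append(payload)
    return dict(windows)


def sliding_windows(events: List[Event], size: int, advance: int) -> Dict[int, List[str]]:
    """Overlapping windows of length `size` that start every `advance` seconds."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        start = ts - ts % advance          # latest window start that is <= ts
        while start > ts - size:           # window [start, start + size) still covers ts
            if start >= 0:
                windows[start].append(payload)
            start -= advance
    return dict(windows)


def session_windows(events: List[Event], gap: int) -> List[List[Event]]:
    """Group events into sessions; a new session starts after more than `gap` idle seconds."""
    sessions: List[List[Event]] = []
    for ts, payload in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= gap:
            sessions[-1].append((ts, payload))
        else:
            sessions.append([(ts, payload)])
    return sessions
```

Fixed windows suit periodic reports, sliding windows suit moving averages, and session windows suit user-activity analysis where the window length depends on the data itself.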
How to apply analytic models to real time processing without redevelopment?
Application of Analytic Models to Real Time without Redevelopment
Stream Processing
H2O.ai
R
Python
Spark ML
MATLAB
SAS
PMML
Streaming Analytics - Processing Pipeline
Stream Ingest: APIs, Adapters / Channels, Integration, Messaging
Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering
Stream Analytics:
• Contextual Rules
• Windowing
• Patterns
• Analytics
• Machine Learning
• …
Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting
Index / Search, Normalization
Applying an Analytic Model is just a piece of the puzzle!
Frameworks and Tooling?
Frameworks and Products
Open Source vs. Closed Source
Product vs. Framework
Microsoft Azure Stream Analytics
When to use Kafka Streams for Stream Processing?
No need for a Big Data cluster
Deploy in your existing infrastructure
Kafka manages scalability / fail-over
Focus on development of business logic in your department
Kafka Streams
Map, filter, aggregate, apply analytic model, "any business logic"
Input Stream (Kafka Topic), Kafka Cluster
Stream Processing Microservice (Kafka Streams)
Output Stream (Kafka Topic), Kafka Cluster
Deployed anywhere: Docker, Kubernetes, Mesos, Java App, …
A complete streaming microservice, ready for production at large scale
WordCount
App configuration
Define processing (here: WordCount)
Start processing
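The code on this slide did not survive extraction. As a stand-in, the sketch below mirrors the three annotated steps (configuration, processing definition, start) in plain Python. It is not the Kafka Streams API: a list stands in for the input topic, and the config key `input.topic` and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# 1) App configuration. "input.topic" is a hypothetical key used only in this sketch.
config = {"application.id": "wordcount-sketch", "input.topic": "text-lines"}


# 2) Define processing: split each line into words and keep a running count per word.
def word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    counts: Counter = Counter()  # state that survives from one event to the next
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]  # emit every updated count, like a changelog


# 3) Start processing: consume the input and keep the latest count per word.
def run(lines: Iterable[str]) -> Dict[str, int]:
    latest: Dict[str, int] = {}
    for word, count in word_count(lines):
        latest[word] = count
    return latest
```

WordCount is a stateful operation: the result for each event depends on all earlier events, which is the part a streaming framework has to make fault tolerant.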
Confluent Platform: the Free, Open-Source Streaming Platform
Legend: Open Source, Commercial, External
Data sources: Database Changes, Log Events, IoT Data, Web Events, …
Apache Kafka: Kafka Core, Kafka Connect, Kafka Streams
Confluent Platform: Supported Connectors, Clients, Schema Registry, REST Proxy, Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support
Data Integration: CRM, Data Warehouse, Database, Hadoop, …
Real-time Applications: Monitoring, Analytics, Custom Apps, Transformations, …
Streaming Platform and Big Data Analytics (architecture diagram with example technologies)
1) Data Producer: Oracle DB, CoAP IoT, Kafka Java Client, ...
2) Analytics Platform (Batch): HP Vertica, Data Integration, Flume, Hive, Hadoop; H2O.ai, Spark, TensorFlow
3) Streaming Platform (Real Time): Kafka, Kafka Connect, Confluent REST Proxy, Confluent Schema Registry (Avro); Kafka Streams + H2O.ai on Mesos; Kafka Streams + TensorFlow on Kubernetes
4) Data Consumer: MQTT IoT, iPhone App, Kafka Go Client, Grafana, Java EE Web App
Live Demos with Open Source Technologies
Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Any! (in our example, H2O.ai GBM)
Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
H2O.ai Model + Kafka Streams
Filter
Map
1) Create H2O ML model
2) Configure Kafka Streams Application
3) Apply H2O ML model to Streaming Data
4) Start Kafka Streams App
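The four steps above can be sketched without any Kafka or H2O dependency. The example below is an assumption-laden illustration in plain Python: a hand-written rule stands in for the trained H2O model, a list stands in for the input topic, and the field names (`flight`, `dep_delay_min`, `delayed`) are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]


def load_model() -> Callable[[Record], str]:
    """Step 1 stand-in: a fixed rule instead of a trained H2O model."""
    def predict(record: Record) -> str:
        return "YES" if record["dep_delay_min"] > 15 else "NO"
    return predict


def build_topology(model: Callable[[Record], str]) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Steps 2 and 3: describe the processing and embed the model in it."""
    def process(stream: Iterable[Record]) -> Iterator[Record]:
        # Filter: drop records that lack the feature the model needs.
        valid = (r for r in stream if "dep_delay_min" in r)
        # Map: enrich each remaining record with the model's prediction.
        for record in valid:
            yield {**record, "delayed": model(record)}
    return process


# Step 4: start the app by feeding it events (here: an in-memory list).
if __name__ == "__main__":
    app = build_topology(load_model())
    for result in app([{"flight": "A1", "dep_delay_min": 30}, {"flight": "B2"}]):
        print(result)
```

The point of this structure is that the model is just a function called inside the map step, so the scoring logic can change without touching the surrounding filter and routing code.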
End-to-End Stream Monitoring and Alerting
Confluent Control Center
Data Stream Monitoring and Alerting
Multi-cluster monitoring and management
Kafka Connect Configuration
• Message delivery?
• Delays?
• Where did it get stuck?
• Lost messages?
• Broker issues?
• Performance?
http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html
4) Online Training of Models
Let's improve the analytic model continuously…
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
Online Training
Continuously train and improve the model with every new event
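One way to "train with every new event" is stochastic gradient descent on a single example at a time. The sketch below shows this for logistic regression in plain Python. It is an illustration of the idea, not the approach used in the talk's demos; the class name, method names, and the learning rate are assumptions.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: Sequence[float], y: int) -> None:
        """Update the model from a single labeled event (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
```

Because each update only needs the current event, the model never has to revisit historical data, at the cost of being sensitive to noisy or mislabeled events.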
Online Model Training of Analytic Models
How to improve models?
1. Manual Update
2. Automated Batch
3. Real Time
Streaming Platform and Big Data Analytics (online training loop)
Analytics Platform: Flume, Hive, Hadoop; H2O.ai, Spark, TensorFlow
Streaming Platform: Kafka, Confluent Schema Registry (Avro); Kafka Streams + H2O.ai on Mesos; Kafka Streams + TensorFlow on Kubernetes
1) Get new Input Event via Kafka Topic
2) Improve Model in Big Data Cluster
3) Update deployed Model via Kafka Topic
4) Leverage Improved Model for new Events
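The loop in the four numbered steps can be sketched as a scorer that reads both input events and model updates and swaps its model when an update arrives. This is a plain-Python simulation under stated assumptions: the topic names `input-events` and `model-updates` are hypothetical, a threshold function stands in for a real model, and step 2 (retraining in the big data cluster) happens outside this code.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Model = Callable[[float], str]
Message = Tuple[str, Any]  # (topic name, value)


def make_threshold_model(threshold: float) -> Model:
    """Stand-in for a trained model: alert when the value exceeds a threshold."""
    return lambda value: "ALERT" if value > threshold else "OK"


def score_with_updates(messages: Iterable[Message], initial_model: Model) -> Iterator[str]:
    """Score input events and hot-swap the model whenever an update message arrives."""
    model = initial_model
    for topic, value in messages:
        if topic == "model-updates":
            model = value          # 3) update the deployed model
        elif topic == "input-events":
            yield model(value)     # 1) new event in, 4) scored by the current model
```

Distributing the model through a topic means every scoring instance picks up the new version in order, without a redeploy.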
Caveats for Online Model Training
• Processes and infrastructure not ready
• Validation needed before production
• Slows down the system
• Only a few ML implementations supported
• Many use cases do not need it
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time
Questions? Feedback? Please contact me!
Kai Waehner, Technology Evangelist
@KaiWaehner
www.kai-waehner.de