Journey To Becoming a Data-Driven Enterprise:
Pivotal Big Data Suite Technical Overview
Feras Alsamawi, Senior Field Engineer, Pivotal
© Copyright 2015 Pivotal. All rights reserved.
Agenda
• Big Data Challenges
• The Value of Data
• Pivotal Big Data Suite
• The Open Data Platform
• Big Data Architectures
THE POWER OF 1%
Driving Outcomes That Matter
One Percent Improvement Equals
• Rail (Increasing Freight Utilization): $27B Industry Value by Reducing System Inefficiency
• Healthcare (Predictive Maintenance): $63B Industry Value by Reducing Process Inefficiency
• Power (Predictive Diagnostics): $66B Industry Value with Efficiency Improvements in Gas-fired Power Plant Fleets
Source: General Electric
BIG DATA CHASM
• 70% of data generated by customers
• 80% of data stored
• 3% prepared for analysis
• 0.5% being analyzed
• <0.5% being operationalized
THE DATA DIVIDE
SOFTWARE IS EATING THE WORLD
Data Is Fueling Software
JOURNEY TO A DATA-DRIVEN ENTERPRISE
• Deploy analytic apps and automate at scale
• Perform advanced analytics
• Discover insights
• Modernize data infrastructure
The Value of Data
[Figure: value of information plotted against time (µs, ms, s, hour, day, month, year, yr+)]
The Value of Data
[Figure: value of information plotted against time (µs, ms, s, hour, day, month, year, yr+), annotated with Traditional Systems]
Pivotal’s λ Architecture
[Figure: value of information plotted against time (µs, ms, s, hour, day, month, year, yr+), annotated with Traditional Systems and “Big Data”]
• Layers: Speed Layer, Serving Layer, Batch Layer
• Products shown: Spring XD, Pivotal GemFire, Pivotal HD
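The speed, batch and serving layers named on this slide can be illustrated with a minimal Python sketch. It is not Pivotal code: the class `LambdaView` and its method names are hypothetical, and plain in-memory counters stand in for real batch and real-time stores. It only shows the generic lambda pattern, in which a batch view is rebuilt from the full history, a real-time view is updated per event, and a query merges the two.

```python
from collections import Counter


class LambdaView:
    """Toy lambda architecture: batch view + real-time view, merged at query time."""

    def __init__(self):
        self.master_log = []            # append-only history (input to the batch layer)
        self.batch_view = Counter()     # recomputed periodically from the full log
        self.realtime_view = Counter()  # incremental counts since the last batch run

    def ingest(self, key):
        """Speed layer: record the event and update the real-time view immediately."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Batch layer: rebuild the batch view from scratch, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key):
        """Serving layer: merge the batch result with the not-yet-batched events."""
        return self.batch_view[key] + self.realtime_view[key]


view = LambdaView()
for event in ["sensor-a", "sensor-b", "sensor-a"]:
    view.ingest(event)
view.run_batch()          # the batch view now covers the first three events
view.ingest("sensor-a")   # arrives after the batch run, visible only via the speed layer
print(view.query("sensor-a"))  # 3 = 2 (batch) + 1 (real time)
```

The design choice worth noting is that the batch view is always recomputed from the complete log, so an error in the incremental path is corrected at the next batch run.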
WHY PIVOTAL FOR BIG DATA?
• Complete platform
• SQL on Hadoop leadership
• Deployment options
• Open source
• Flexible licensing
• Advanced data services
Pivotal Data Engineering, Pivotal Labs, Pivotal Data Science
Big Data Suite Components
Pro-Active, Self-Improving, Machine Learning Systems
• Components: Data Stream Pipeline, In-Memory Real-Time Data, Expert System / Machine Learning, HDFS Data Lake
• Multiple Data Sources, Real-Time Processing, Store Everything
• Continuous Learning, Continuous Improvement, Continuous Adapting
Data Stream Needs an Agile, Scalable and Fast Solution
• Ingest, Transform, Sink
• Spring XD, GemFire
• Data Lake: HAWQ, GPDB
Spring XD Orchestrates and Automates All the Steps of Data Stream Pipelining
• Ingest, Transform, Sink: Spring XD
• Distributed Computing
• In-Memory Real-Time Data
• Expert System / Machine Learning
• Data Lake: HAWQ, GPDB
• Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable
Spring XD: State of the Art Data Pipeline Automation
Ingest / Sink
• No coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring
Process
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-box configurable jobs for batch processing
Analyze
• Import and invoke PMML jobs easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges
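The ingest, filter, transform and sink steps listed on this slide can be pictured with a small Python sketch. This is not Spring XD and does not use its API: the function names, the `value` and `celsius` fields and the sample records are all hypothetical, a plain list stands in for the sink target, and a `Counter` stands in for the built-in counters the slide mentions.

```python
import json
from collections import Counter

counters = Counter()  # stands in for the built-in counters mentioned on the slide


def source(lines):
    """Ingest: parse each raw line as JSON and count what arrived."""
    for line in lines:
        counters["ingested"] += 1
        yield json.loads(line)


def keep_valid(records):
    """Filter: drop records that have no numeric 'value' field."""
    for record in records:
        if isinstance(record.get("value"), (int, float)):
            yield record
        else:
            counters["filtered_out"] += 1


def to_celsius(records):
    """Transform: add a derived field without mutating the input record."""
    for record in records:
        yield {**record, "celsius": round((record["value"] - 32) * 5 / 9, 1)}


def sink(records, store):
    """Sink: append to a list that stands in for HDFS or another target."""
    for record in records:
        store.append(record)
        counters["written"] += 1


raw = ['{"id": 1, "value": 212}', '{"id": 2}', '{"id": 3, "value": 32}']
store = []
sink(to_celsius(keep_valid(source(raw))), store)
print(store)
print(dict(counters))
```

Because each stage is a generator, records flow through one at a time, which is the same streaming shape the slide describes, only on a single process.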
GemFire Provides Scalable, Low-Latency Data Access, Storage and Event Processing
• Ingest, Transform, Sink: Spring XD
• Distributed Computing
• GemFire
• Expert System / Machine Learning
• Data Lake: HAWQ, GPDB
• Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable
GemFire: In-Memory Real-Time Data
• In-Memory Enterprise Data Grid
• Horizontally Scalable, Consistent, Highly Available
• Event handling
• Continuous Queries
• Enterprise Data Geo Distribution
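To make the event handling and continuous query bullets concrete, here is a minimal single-process Python sketch. It does not use the GemFire API, which is a separate product interface; the `Region` class, `register_cq` method and the trade records are hypothetical names chosen for illustration. The idea shown is that a query is registered once and its callback then fires for every later write that matches.

```python
class Region:
    """Toy in-memory key-value region with continuous-query callbacks."""

    def __init__(self):
        self._data = {}
        self._queries = []  # (predicate, callback) pairs

    def register_cq(self, predicate, callback):
        """Register a continuous query: callback fires for every matching put."""
        self._queries.append((predicate, callback))

    def put(self, key, value):
        """Store the value, then notify every query whose predicate matches it."""
        self._data[key] = value
        for predicate, callback in self._queries:
            if predicate(value):
                callback(key, value)

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)


alerts = []
trades = Region()
trades.register_cq(lambda t: t["amount"] > 10_000,
                   lambda key, t: alerts.append(key))

trades.put("t1", {"amount": 500})
trades.put("t2", {"amount": 25_000})
print(alerts)  # ['t2']
```

A real data grid would also partition and replicate the data across nodes; the sketch leaves that out and keeps only the push-on-match behaviour.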
Pivotal Provides SQL-Based Advanced Analytics
• Ingest, Transform, Sink: Spring XD
• Distributed Computing
• GemFire
• Data Lake: HAWQ, GPDB
• Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable
HAWQ: Advanced SQL Analytics in Hadoop
• Massively Parallel Processing RDBMS on Hadoop
• ANSI SQL on Hadoop
• Extremely high performance for analytics (not like Hive)
• Stores all data directly on HDFS
• Functions in MADlib, R, Python, Java, Perl, PL/pgSQL or C
Combining SQL with Hadoop is key for analytics
SQL remains #1 choice for Data Science
Data Streaming Reference Architecture
• Data Feeds, Transactional Apps, Analytic Apps
• Data Stream Pipeline
• Distributed Computing, Real-Time Data
• Expert Systems & Machine Learning
• Advanced Analytics
• HDFS Data Lake
Data Streaming Reference Architecture
• Data Feeds, Transactional Apps, Analytic Apps
• Data Stream Pipeline: Spring XD
• GemFire, HAWQ, GPDB
• HDFS Data Lake
BUILT FOR THE SPEED OF BUSINESS
Financial Compliance
BUSINESS PROBLEM
• Ensure compliance with Dodd-Frank and Basel Committee regulations
• Identify underlying risk and fraud while easing the load on the overburdened compliance department
Financial compliance Data Lake
• Data: Emails, Chats, Trades, Transactions, Policy, Securities, Phone Calls, Watch Lists, …
• Steps: Data integration, Data cleanup, Modeling, Classification and ranking, Analyst user interfaces, Feedback
Analytics
• Data integration: e.g., append trade information with email and chat communications
• Data cleanup: e.g., identify newsletters and spam emails
• Modeling: predictive modeling to flag messages and trades; graph and cohort analysis
• Analyst feedback: reviewed fraud instances included in periodic model refreshes
SOLUTION
• A data lake platform coupled with cutting edge data science techniques
• Flexible user interface to promote an adaptive, continuously learning compliance framework
Pivotal Topic & Sentiment Analysis Engine
• Twitter Decahose (~55 million tweets/day)
• Spring XD (Source: http, Sink: hdfs)
• HDFS
• PXF External Tables
• HAWQ
• Parallel Parsing of JSON (PL/Python)
• Nightly Cron Jobs
• Topic Analysis through MADlib pLDA
• Unsupervised Sentiment Analysis (PL/Python)
• D3.js
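The JSON parsing step named on this slide runs as PL/Python inside the database; the sketch below shows only the kind of Python logic such a function body might contain, written as ordinary Python so it can run anywhere. The function name `parse_tweet`, the choice of the `id` and `text` fields, and the sample lines are assumptions made for illustration, not details taken from the engine.

```python
import json


def parse_tweet(raw_line):
    """Return (id, text) from one JSON-encoded tweet, or None if the line is unusable."""
    try:
        tweet = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(tweet, dict) or "id" not in tweet or "text" not in tweet:
        return None
    # Collapse runs of whitespace so downstream tokenisation sees clean text.
    return tweet["id"], " ".join(tweet["text"].split())


lines = [
    '{"id": 1, "text": "Loving the new   release!"}',
    'not json at all',
    '{"id": 2}',
]
parsed = [row for row in map(parse_tweet, lines) if row is not None]
print(parsed)
```

Returning `None` for malformed input, rather than raising, lets a bulk load skip bad records instead of aborting, which matters at the volume quoted on the slide.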
IoT Architectures – The Connected Car
The Connected Car Architecture
• INGESTION: JSON / HTTP
• STREAM PROCESSING: Spring XD (Transform, Enrich, Sink)
• DATA LAKE: Pivotal HD
• ADVANCED ANALYTICS: HAWQ
• REAL-TIME DATA INSIGHTS: GemFire
• MOBILE SERVICES, MICROSERVICES: Pivotal CF (Dashboard, Analytics, App Simulator)
• IoT APPS: RabbitMQ, PUSH
Horizontally Scalable, Fault Tolerant, Extensible, Open-Source
• CAR SENSOR, MOBILE APP: JSON
• STREAM PROCESSING: Spring XD, RabbitMQ (Sink, Tap)
• ENRICHER: + Timestamp & GUID
• PREDICTIVE ANALYTICS: + MPG, range & route
• DATA LAKE: Pivotal HD
• ADVANCED ANALYTICS: HAWQ
• REAL-TIME DATA INSIGHTS: GemFire
• DASHBOARD
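The enricher step on this slide adds a timestamp and a GUID to each incoming JSON message. A minimal Python sketch of that one step follows; the function name `enrich`, the output field names `timestamp` and `guid`, and the sample sensor fields (`vin`, `speed_kmh`, `fuel_level`) are hypothetical, since the slide does not show the actual message schema.

```python
import json
import uuid
from datetime import datetime, timezone


def enrich(raw_message):
    """Return a copy of the JSON message with 'timestamp' and 'guid' fields added."""
    reading = json.loads(raw_message)
    enriched = dict(reading)  # copy, so the parsed input is left untouched
    enriched["timestamp"] = datetime.now(timezone.utc).isoformat()
    enriched["guid"] = str(uuid.uuid4())
    return enriched


message = '{"vin": "TEST-VIN-0001", "speed_kmh": 88, "fuel_level": 0.42}'
result = enrich(message)
print(json.dumps(result, indent=2))
```

Stamping the message at the enrichment step, instead of trusting a device clock, gives every downstream consumer one consistent time base and a unique key for de-duplication.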
FOR FURTHER INFO…
• Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data
• Pivotal Blog @ http://blog.pivotal.io
• Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal
• Pivotal Academy @ https://pivotal.biglms.com
• Or reach out to your local Pivotal Account Executive…
Digital Transformation Forum
Disrupt or Be Disrupted 19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH