Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Fast Data Open Source
An Overview
Chuck Scyphers
Big Data Lead
East Coast
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Chuck Scyphers
• 25+ years experience with highly available, highly scalable, high throughput, globally-spanning enterprise class systems.
• 7 startups (2 wins, 2 break-evens, 3 losses); last two built on Hadoop and NoSQL systems (resume analytics and behavioral analysis of network traffic)
• Chief Data Architect, US-Visit (Department of State) and US Department of Energy SLD Project
Agenda
Fast Data Definition
Popular Open Source Platforms
What Do We Want To Be When We Grow Up?
Refreshments
What Is Fast Data?
“Fast data is the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value. The goal of fast data is to quickly gather and mine structured and unstructured data so that action can be taken.”
New Concepts for a Modern Data Platform Architecture
• Polyglot: Fit for Purpose Data
• Lambda: Data Sources feed a Speed Layer and a Batch Layer, which feed Data Services
• Kappa: Data Sources feed a single Data Pipeline, which feeds Data Services
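The difference between the Lambda and Kappa layouts can be sketched in a few lines of Python. This is only an illustration under a toy assumption (counting events by key); the function names are hypothetical, and a real deployment would use a batch engine and a stream processor rather than in-process counters.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the view from the full, immutable data set."""
    return Counter(event["key"] for event in master_dataset)

def speed_view(recent_events):
    """Speed layer: cover only events the last batch run has not seen yet."""
    return Counter(event["key"] for event in recent_events)

def lambda_query(master_dataset, recent_events, key):
    """Lambda: the serving side merges the batch view and the speed view."""
    return batch_view(master_dataset)[key] + speed_view(recent_events)[key]

def kappa_query(event_log, key):
    """Kappa: one pipeline; the view is built by replaying the whole log."""
    view = Counter()
    for event in event_log:
        view[event["key"]] += 1
    return view[key]

# Hypothetical sample data
history = [{"key": "login"}, {"key": "login"}, {"key": "purchase"}]
recent = [{"key": "login"}]
```

Both queries give the same answer for the same events; the trade-off is that Lambda maintains two code paths while Kappa depends on being able to replay the log.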
How Data Impacts The Organization
• 67% of executives say drawing intelligence from data is a top priority
• 89% of executives would grade themselves C or lower in preparedness
• 93% believe their organization is losing revenue as a result of not being able to fully leverage information
Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
Velocity Matters
• 53% of executives say too much critical information is delivered too late
Source: Aberdeen Group – January 2012, survey of 247 executives - Data Management for BI – Big Data, Bigger Insight, Superior Performance
Why Do We Care?
• It’s about getting more from in-flight data
• It’s about faster action, faster insights
• It’s about visibility and predictability
• It’s about running your business in real-time
Key Value Drivers of Timely Accurate Action
Delivering Tangible Results With Fast Data
• Higher Quality In Operations
• Improved Efficiency
• New Services
• Better Customer Experience
Fast Data Is Universal Across Industries
• Financial Services
• Transportation & Logistics
• Telecommunications
• Manufacturing & Retail
• Utilities & Oil and Gas
• Health care
• Public Sector
Fast Data Characteristics
• FILTER & CORRELATE
• MOVE & TRANSFORM
• ANALYZE
• ACT
Complete In-Flight Event Processing
• Eliminate, Consolidate, Correlate, And/Or Filter Data While In Flight
• Analyze Data Streams
• Enrich Data For More Accurate Decisions
• Process Data In The Stream To Free Up Back End Resources
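As a minimal sketch of what filtering, enriching, and correlating in flight can look like, the generator below processes events one at a time without storing the raw stream. The field names (`customer`, `amount`, `tier`), the threshold, and the sample data are assumptions made for illustration only.

```python
def in_flight(events, customers, min_amount=100):
    """Filter, enrich, and correlate events as they stream past."""
    totals = {}  # running per-customer totals (the correlation state)
    for event in events:
        if event["amount"] < min_amount:
            continue  # filter: drop small events before they reach the back end
        # enrich: attach reference data so later decisions have more context
        enriched = dict(event, tier=customers.get(event["customer"], "unknown"))
        # correlate: relate this event to earlier events from the same customer
        totals[event["customer"]] = totals.get(event["customer"], 0) + event["amount"]
        enriched["running_total"] = totals[event["customer"]]
        yield enriched

# Hypothetical sample stream and reference data
stream = [
    {"customer": "a", "amount": 50},
    {"customer": "a", "amount": 150},
    {"customer": "b", "amount": 300},
    {"customer": "a", "amount": 200},
]
results = list(in_flight(stream, {"a": "gold"}))
```

Only three of the four events reach the consumer, each already carrying its customer tier and running total, which is the sense in which stream-side processing frees up back-end resources.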
Work With The Stream
• Apply Basic Filtering At Capture
• Improve Trusted Quality Of Information
• Move Data (duh)
Speed Up The OODA Loop
• Get Actionable Insights
• Predict Outcomes
Make Decisions That Matter Faster
• Deliver Real-Time Decisions And Recommendations To Customers/Employees
• Automatically Render Decisions Within A Process With Tailored Messaging
• Integrate Human Workflow, Process Management, Activity Monitoring
Fast Data Customers
From Oracle (naturally)
Fast Data And Financial Services
Improving Customer Experiences
• Improve Customer Experience: The goal is to connect all data about the customers to improve customer service experience and to lower the burden of hiring new representatives.
• Reduce Staffing Demands: For customers calling to discuss a claim or their coverage, it means fewer annoying waits as an agent accesses data from any of dozens of different places.
• Consolidate information in real-time: All a customer’s transactions: claims, records, status, possible cross-sell information (e.g., if someone lives in an apartment and might need renter’s insurance)
Fast Data And Retail
• Price optimization - leveraging analytics to price goods and services on the fly based on real-time metrics such as competitor pricing, supply chain and inventory data, market data and consumer behavior data.
• Product placement analysis - processing video data to identify shopping trends and assess the effectiveness of displays, in order to improve store layouts and product placements.
• Staffing - The largest retailers are analyzing weather forecasts, promotional campaigns and dates to effectively meet staffing requirements on holidays all year round.
Fast Data And Public Sector
LA City Planning And Traffic Analysis
• Dynamic Pricing for Toll Lanes: if a driver is paying to drive in the HOT (high-occupancy tolling) lane, he’s guaranteed a consistent speed of 45 miles per hour. If traffic starts backing up, prices for individual cars will rise to discourage them from entering, saving the lanes for high-occupancy vehicles
• Express Park: It’s not enough to know how to set the price, you have to make sure that data gets to users in real time. Drivers also need to know parking spots will still be there when they arrive in 40 minutes.
• Combining M2M: The answer lies in combining information from other sources, such as mass-transit systems, toll highways, traffic sensors and weather data to paint a real-time picture of what traffic actually looks like
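The toll-lane bullet describes a feedback rule: when measured speed drops below the 45 mph target, the price rises to discourage entry. A hedged sketch of such a rule follows. Only the 45 mph target comes from the slide; the base price, step, and cap are invented values, and the actual pricing algorithm used in Los Angeles is not described in the source.

```python
TARGET_MPH = 45  # speed target stated on the slide

def toll_price(avg_speed_mph, base=1.00, step=0.25, cap=10.00):
    """Raise the toll as observed speed falls below the target speed.

    base, step, and cap are hypothetical tuning values in dollars.
    """
    shortfall = max(0.0, TARGET_MPH - avg_speed_mph)
    return round(min(cap, base + step * shortfall), 2)
```

For example, `toll_price(50)` stays at the base price, while `toll_price(35)` adds ten steps for the 10 mph shortfall. The real-time requirement sits in the input: the rule is only useful if `avg_speed_mph` reflects current sensor readings.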
Fast Data And Telecom
Location Based Mobile Billboard Advertising at Turkcell
• Processing over 800,000 subscriber-related events per second (with 1.5 billion events daily)
• Provided and executed over 50 simultaneous campaigns
• Ensured customer responsiveness, with response times under 1 second, on a scalable architecture ready to expand on demand
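A quick back-of-the-envelope check relates the two throughput figures. The reading that 800,000 events per second is a peak rate rather than a sustained one is an assumption; the slide does not say.

```python
EVENTS_PER_DAY = 1_500_000_000   # from the slide
PEAK_PER_SECOND = 800_000        # from the slide, assumed here to be a peak rate
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate implied by the daily total
average_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# How far the stated per-second figure sits above that average
peak_to_average = PEAK_PER_SECOND / average_per_second
```

The daily total works out to roughly 17,000 events per second on average, so the per-second figure is about 46 times the average. Sizing for bursts rather than for the mean is the kind of headroom the "ready to expand on demand" bullet points at.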
Open Source Platforms
General Classifications
HDFS Based
• Spark
• HBase
• Impala
• H2O
• Apex
Other Based
• Druid
• Flink
• Storm
• Kafka
• Samza
• ElasticSearch
• Lucene/Solr
Composite
• SMACK
• PANCAKE
Spark
HDFS Based
• In-Memory Distributed Processing Framework
• Will Spill To Disk As Needed
• Handles Streaming Data Through Micro-batching
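Micro-batching means the stream is cut into small fixed-length intervals and each interval is processed as an ordinary batch. The sketch below illustrates the idea in plain Python. It is not the Spark API; the function name is hypothetical, and it assumes records arrive as `(timestamp, value)` pairs in timestamp order.

```python
def micro_batches(timestamped_records, interval):
    """Group (timestamp, value) records into fixed-length batch intervals."""
    batch, batch_end = [], None
    for ts, value in timestamped_records:
        if batch_end is None:
            batch_end = ts + interval   # first interval starts at the first record
        while ts >= batch_end:          # close every interval that has elapsed
            yield batch
            batch, batch_end = [], batch_end + interval
        batch.append(value)
    if batch:
        yield batch                     # flush the final, partially filled interval

# Hypothetical sample stream
records = [(0.0, "a"), (0.4, "b"), (1.1, "c"), (3.2, "d")]
batches = list(micro_batches(records, interval=1.0))
```

An interval with no records still produces an (empty) batch, and no result can arrive sooner than the interval length, which is why micro-batch latency is usually quoted in seconds rather than milliseconds.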
HBase
HDFS Based
• A NoSQL Columnar Store Built On Top Of HDFS
• Provides A Bigtable-esque Processing Model
– Compression
– In-memory
– Bloom Filters By Column
• Offers Both Real Time Read/Write Access And Random Access To HDFS
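A Bloom filter answers "definitely not present" or "possibly present" for a key using a small bit array, which lets a store skip files that cannot contain the requested key. The class below is a generic textbook sketch, not HBase's implementation; the class name, sizes, and hashing scheme are assumptions chosen for brevity.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the key was never added; True means it may have been
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A reader would call `might_contain(row_key)` before opening a file and skip the disk read whenever it returns `False`.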
Impala
HDFS Based
• Real-time SQL queries over data stored in HDFS or HBase
– No MapReduce processing
• Uses an MPP query engine on the Hadoop cluster
• Utilizes Hive metastore for metadata repository
• Leveraged by numerous BI tools and applications
• Not ANSI SQL
H2O
HDFS Based
Apex
HDFS Based
Open Source Platforms
Other Based
• Druid
• Flink
• Storm
• Kafka
• Samza
• ElasticSearch
• Lucene/Solr
Druid
Other Based
(Diagram; surviving labels: MySQL, Zookeeper)
Flink
Other Based
Storm
Other Based
Kafka
Other Based
Samza
Other Based
ElasticSearch
Search Based
Lucene/Solr
Search Based
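A Quick Comparison
Spark
• Guarantee: Exactly Once
• Throughput: 100k+ records/sec
• Fault Tolerance Overhead: Low
• Computation Model: Microbatches
• Windowing: Time Based
• Memory Management: Moving towards automatic
• DAG Based: Yes
• Batch Support: Yes
• Latency: Seconds
• Stateful Operations: Yes
Flink
• Guarantee: Exactly Once
• Fault Tolerance Overhead: Low
• Computation Model: Continuous Flow Operation
• Windowing: Record Based / User Defined
• Memory Management: Automatic
• DAG Based: Yes
• Latency: Milliseconds
Storm
• Guarantee: At Least Once / Exactly Once (+ Trident)
• Throughput: 100k+ records/sec
• Computation Model: Continuous Flow Operation
• DAG Based: Yes
• Batch Support: No (unless paired with Trident)
• Latency: Milliseconds
• Stateful Operations: No (unless with Trident)
Samza
• Guarantee: At Least Once
• Throughput: 10k+ records/sec
• Computation Model: Continuous Flow Operation
• Latency: Milliseconds
• Stateful Operations: Yes
Hadoop
• Throughput: Lower
• Fault Tolerance Overhead: High
• Computation Model: Batch Only
• Windowing: Nope
• Memory Management: YARN is helping
• DAG Based: No
• Batch Support: Only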
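The Windowing attribute in the comparison distinguishes time-based windows from record-based ones. The sketch below shows the difference on the same input; the function names and sample data are hypothetical, and neither function corresponds to a specific engine's API.

```python
def time_windows(records, width):
    """Time-based tumbling windows: bucket (timestamp, value) pairs by time."""
    windows = {}
    for ts, value in records:
        windows.setdefault(int(ts // width), []).append(value)
    return [windows[k] for k in sorted(windows)]

def count_windows(records, size):
    """Record-based windows: close a window after every `size` records."""
    values = [value for _, value in records]
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical sample stream of (timestamp_seconds, value)
records = [(0.2, 1), (0.9, 2), (1.5, 3), (4.0, 4), (4.1, 5)]
by_time = time_windows(records, width=1.0)
by_count = count_windows(records, size=2)
```

With time-based windows the window sizes vary with the arrival rate; with record-based windows the sizes are fixed but the time each window spans varies.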
Open Source Platforms
Composite
• SMACK
• PANCAKE
SMACK Stack
• Spark
• Mesos
• Akka
• Cassandra
• Kafka
PANCAKE Pile
• Presto
• Arrow
• NiFi
• Cassandra
• AirFlow
• Kafka
• ElasticSearch
PANCAKE STACK
• Presto
• Arrow
• NiFi
• Cassandra
• AirFlow
• Kafka
• ElasticSearch
• Spark
• TensorFlow
• Algebird
• CoreNLP
• Kibana
What Do We Want This Meetup To Be?
• How Often Do We Want To Meet?
• Where? (other than here)
• From Whom Do We Want To Hear?
– Vendors?
– Never Vendors?
• Demos & Code?
• Sponsors?
• Who’s Hiring? Who’s Looking?
Refreshments
Reston Town Center
1888 Explorer St
Reston VA