
Fast Data Overview


Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Fast Data Open Source: An Overview

Chuck Scyphers, Big Data Lead, East Coast


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Chuck Scyphers

• 25+ years of experience with highly available, highly scalable, high-throughput, globally spanning enterprise-class systems.

• 7 startups (2 wins, 2 break-evens, 3 losses); the last two were built on Hadoop and NoSQL systems (resume analytics and behavioral analysis of network traffic).

• Chief Data Architect, US-Visit (Department of State) and US Department of Energy SLD Project


Agenda

Fast Data Definition

Popular Open Source Platforms

What Do We Want To Be When We Grow Up?

Refreshments


What Is Fast Data?

“Fast data is the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value. The goal of fast data is to quickly gather and mine structured and unstructured data so that action can be taken.”


New Concepts for a Modern Data Platform Architecture

[Diagram: three architecture patterns. Polyglot: fit-for-purpose data. Lambda: data sources, batch layer, speed layer, data services. Kappa: data sources, data pipeline, data services.]
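Lambda and Kappa, named in the diagram above, are two common ways to combine historical and real-time processing. The Python sketch below is a minimal illustration under stated assumptions, not any product's design: events are hypothetical (key, value) pairs, the "view" is a per-key sum, and all function names are invented for illustration. In Lambda, a batch view and a speed-layer view are merged at query time; in Kappa, a single streaming path handles everything, and reprocessing means replaying the log.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, int]  # (key, value): a hypothetical event shape


def aggregate(events: List[Event]) -> Counter:
    """Recompute a per-key sum from scratch (what a batch layer does)."""
    view: Counter = Counter()
    for key, value in events:
        view[key] += value
    return view


def lambda_query(key: str, master_dataset: List[Event], recent: List[Event]) -> int:
    # Batch layer view (rebuilt here for brevity; normally precomputed periodically).
    batch_view = aggregate(master_dataset)
    # Speed layer: covers only events that arrived after the last batch run.
    speed_view = aggregate(recent)
    # Serving layer: merge both views at query time.
    return batch_view[key] + speed_view[key]


def kappa_query(key: str, log: List[Event]) -> int:
    # Kappa: one streaming path that updates state one event at a time;
    # reprocessing means replaying the retained log through this same code.
    state: Counter = Counter()
    for event_key, value in log:
        state[event_key] += value
    return state[key]


historical = [("a", 1), ("b", 2)]
recent = [("a", 5)]
print(lambda_query("a", historical, recent))  # 6
print(kappa_query("a", historical + recent))  # 6
```

Even at this scale the trade-off is visible: Lambda maintains two code paths (batch and speed) that must agree, while Kappa maintains one path but depends on retaining a replayable log.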


How Data Impacts The Organization

67% of executives say drawing intelligence from data is a top priority


89% of executives would grade themselves C or lower in preparedness


93% believe their organization is losing revenue because it cannot fully leverage its information

Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012


Velocity Matters

53% of executives say too much critical information is delivered too late

Source: Aberdeen Group, Data Management for BI: Big Data, Bigger Insight, Superior Performance, January 2012 (survey of 247 executives)


Why Do We Care?

• It’s about getting more from in-flight data
• It’s about faster action, faster insights
• It’s about visibility and predictability
• It’s about running your business in real time


Key Value Drivers of Timely, Accurate Action: Delivering Tangible Results With Fast Data

• Higher Quality In Operations
• Improved Efficiency
• New Services
• Better Customer Experience


Fast Data Is Universal Across Industries

• Financial Services
• Transportation & Logistics
• Telecommunications
• Manufacturing & Retail
• Utilities & Oil and Gas
• Health Care
• Public Sector


Fast Data Characteristics

• FILTER & CORRELATE
• MOVE & TRANSFORM
• ANALYZE
• ACT


Complete In-Flight Event Processing

• Eliminate, Consolidate, Correlate, And/Or Filter Data While In Flight
• Analyze Data Streams
• Enrich Data For More Accurate Decisions
• Process Data In The Stream To Free Up Back End Resources
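The in-flight steps listed above can be pictured as a chain of generators that filter, enrich, and correlate each event as it passes, before anything reaches back-end storage. This is a minimal Python sketch; the event fields, the `USER_REGION` lookup table, and the "three failed logins within 60 seconds" rule are hypothetical examples, not taken from any Oracle product.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Iterator

# Hypothetical reference data used for enrichment.
USER_REGION = {"u1": "east", "u2": "west"}


def filter_events(events: Iterable[dict]) -> Iterator[dict]:
    """Drop malformed or irrelevant events while they are still in flight."""
    for e in events:
        if "user" in e and e.get("type") == "login_failed":
            yield e


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach reference data so downstream decisions have more context."""
    for e in events:
        yield {**e, "region": USER_REGION.get(e["user"], "unknown")}


def correlate(events: Iterable[dict], threshold: int = 3, window_s: int = 60) -> Iterator[dict]:
    """Emit one alert when a user fails `threshold` logins within `window_s` seconds."""
    recent: Dict[str, deque] = defaultdict(deque)
    for e in events:
        times = recent[e["user"]]
        times.append(e["ts"])
        # Evict timestamps that have fallen out of the sliding window.
        while e["ts"] - times[0] > window_s:
            times.popleft()
        if len(times) >= threshold:
            yield {"alert": "possible_brute_force", "user": e["user"], "region": e["region"]}
            times.clear()  # start counting afresh after alerting


stream = [
    {"user": "u1", "type": "login_failed", "ts": 0},
    {"user": "u1", "type": "login_ok", "ts": 5},
    {"user": "u1", "type": "login_failed", "ts": 10},
    {"type": "heartbeat", "ts": 12},
    {"user": "u1", "type": "login_failed", "ts": 20},
    {"user": "u2", "type": "login_failed", "ts": 25},
]
alerts = list(correlate(enrich(filter_events(stream))))
print(alerts)
```

Because each stage is a generator, events flow through one at a time, and only the single alert (not the raw login stream) needs to reach back-end systems.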


Work With The Stream

• Apply Basic Filtering At Capture
• Improve Trusted Quality Of Information
• Move Data (duh)


Speed Up The OODA (Observe, Orient, Decide, Act) Loop

• Get Actionable Insights

• Predict Outcomes


Make Decisions That Matter Faster

• Deliver Real-Time Decisions And Recommendations To Customers/Employees

• Automatically Render Decisions Within A Process With Tailored Messaging

• Integrate Human Workflow, Process Management, Activity Monitoring


Fast Data Customers: From Oracle (naturally)


Fast Data And Financial Services: Improving Customer Experiences

• Improve Customer Experience: The goal is to connect all data about each customer to improve the customer service experience and to reduce the burden of hiring new representatives.

• Reduce Staffing Demands: For customers calling to discuss a claim or their coverage, it means fewer frustrating waits while an agent pulls data from any of dozens of different places.

• Consolidate Information In Real Time: Bring together all of a customer’s transactions (claims, records, status) along with possible cross-sell information (e.g., someone who lives in an apartment might need renter’s insurance).


Fast Data And Retail

• Price Optimization: Leveraging analytics to price goods and services on the fly based on real-time metrics such as competitor pricing, supply chain and inventory data, market data, and consumer behavior data.

• Product Placement Analysis: Processing video data to identify shopping trends and assess the effectiveness of displays, in order to improve store layouts and product placements.

• Staffing: The largest retailers analyze weather forecasts, promotional campaigns, and dates to meet staffing requirements on holidays throughout the year.


Fast Data And Public Sector: LA City Planning And Traffic Analysis

• Dynamic Pricing for Toll Lanes: A driver who pays to use the HOT (high-occupancy toll) lane is guaranteed a consistent speed of 45 miles per hour. If traffic starts backing up, prices for individual cars rise to discourage them from entering, saving the lanes for high-occupancy vehicles.

• Express Park: It’s not enough to know how to set the price; that data has to reach users in real time. Drivers also need to know that parking spots will still be there when they arrive in 40 minutes.

• Combining M2M Data: The answer lies in combining machine-to-machine (M2M) information from other sources, such as mass-transit systems, toll highways, traffic sensors, and weather data, to paint a real-time picture of what traffic actually looks like.
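The HOT-lane pricing described above is a feedback loop in which measured lane speed drives the toll. The sketch below assumes a simple step controller. Apart from the 45 mph target, every number (starting price, step size, dead band, price bounds) is a hypothetical choice, and the source does not describe the actual LA pricing algorithm.

```python
from dataclasses import dataclass


@dataclass
class TollController:
    """Feedback loop: measured lane speed drives the toll. All numbers are
    hypothetical except the 45 mph target taken from the slide."""
    target_mph: float = 45.0
    price: float = 1.00        # starting toll in dollars
    step: float = 0.25         # adjustment per measurement interval
    slack_mph: float = 10.0    # dead band above target before the price is eased
    min_price: float = 0.25
    max_price: float = 15.00

    def update(self, measured_mph: float) -> float:
        if measured_mph < self.target_mph:
            # Lane is slowing: raise the price to discourage new entrants.
            self.price = min(self.max_price, self.price + self.step)
        elif measured_mph > self.target_mph + self.slack_mph:
            # Plenty of headroom: lower the price to attract more paying drivers.
            self.price = max(self.min_price, self.price - self.step)
        return round(self.price, 2)


controller = TollController()
prices = [controller.update(mph) for mph in (60, 50, 44, 40, 43, 58)]
print(prices)  # [0.75, 0.75, 1.0, 1.25, 1.5, 1.25]
```

The dead band keeps the price from oscillating when the measured speed hovers just above the 45 mph target.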


Fast Data And Telecom: Location-Based Mobile Billboard Advertising at Turkcell

• Processed over 800,000 subscriber-related events per second (1.5 billion events daily)

• Provided and executed over 50 simultaneous campaigns

• Ensured customer responsiveness with sub-second response times, using a scalable architecture ready to expand on demand
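The two Turkcell throughput figures measure different things. Dividing the daily total by the number of seconds in a day gives the average rate, which suggests the 800,000 events per second figure describes peak or provisioned capacity rather than a sustained daily average; the source does not say which.

```latex
\frac{1.5 \times 10^{9}\ \text{events/day}}{86{,}400\ \text{s/day}} \approx 17{,}361\ \text{events/s}
\qquad
800{,}000\ \text{events/s} \times 86{,}400\ \text{s/day} \approx 6.9 \times 10^{10}\ \text{events/day}
```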

Popular Open Source Platforms


Open Source Platforms: General Classifications

HDFS Based
• Spark
• HBase
• Impala
• H2O
• Apex

Other Based
• Druid
• Flink
• Storm
• Kafka
• Samza
• Elasticsearch
• Lucene/Solr

Composite
• SMACK
• PANCAKE


Spark (HDFS Based)

• In-Memory Distributed Processing Framework
• Will Spill To Disk As Needed
• Handles Streaming Data Through Micro-batching
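Micro-batching, as listed above, means grouping incoming records into small batches (for example, by arrival interval) and running an ordinary batch computation on each one. The following is a minimal pure-Python sketch of that idea, not Spark's API; the one-second interval, the record shape, and the word-count job are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(records: Iterable[Record], interval_s: float = 1.0) -> Iterator[List[str]]:
    """Group time-ordered records into consecutive fixed-length intervals."""
    batch: List[str] = []
    batch_end: Optional[float] = None
    for ts, payload in records:
        if batch_end is None:
            batch_end = ts + interval_s
        # Close every interval that ended before this record arrived; intervals
        # with no records come out as empty batches, as on a fixed schedule.
        while ts >= batch_end:
            yield batch
            batch = []
            batch_end += interval_s
        batch.append(payload)
    if batch:
        yield batch


def word_count(batch: List[str]) -> Counter:
    """An ordinary batch computation, run once per micro-batch."""
    return Counter(word for line in batch for word in line.split())


stream = [(0.1, "a b"), (0.5, "a"), (1.2, "b"), (3.4, "c")]
for batch in micro_batches(stream):
    print(batch, word_count(batch))
```

Because a batch is processed only after its interval closes, end-to-end latency in this model is bounded below by the batch interval.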


HBase (HDFS Based)

• A NoSQL Columnar Store Built On Top Of HDFS
• Provides A Bigtable-esque Processing Model
– Compression
– In-memory
– Bloom Filters By Column
• Offers Both Real-Time Read/Write Access And Random Access To HDFS
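Bloom filters, listed above among HBase's features, are compact bit arrays that answer "definitely not present" or "possibly present", which lets a store skip reading files that cannot contain a requested row or column. Below is a generic minimal sketch in Python; HBase's own implementation, hash functions, and sizing differ, and the bit-array size and hash count here are arbitrary.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting one cryptographic hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter()
for row_key in ("row1", "row2"):
    bf.add(row_key)
print(bf.might_contain("row1"))    # True: added keys are always reported
print(bf.might_contain("row999"))  # usually False; True would be a false positive
```

A per-column variant could add and look up a combined key such as `"row1:cf:qualifier"`; that keying scheme is an assumption for illustration only.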


Impala (HDFS Based)

• Real-time SQL queries over data stored in HDFS or HBase
• No MapReduce processing
• Uses an MPP (massively parallel processing) query engine on the Hadoop cluster
• Uses the Hive metastore as its metadata repository
• Leveraged by numerous BI tools and applications
• Not fully ANSI SQL compliant
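The MPP point above means a query is split across many nodes that each scan and pre-aggregate their local data, after which a coordinator merges the partial results, with no MapReduce job in between. Below is a minimal Python sketch of that scatter-gather pattern for a `GROUP BY` with `SUM`; it illustrates the general technique rather than Impala's planner, and the table schema and partition contents are made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, sales): a hypothetical table schema

# Each list stands in for the data blocks local to one worker node.
PARTITIONS: List[List[Row]] = [
    [("east", 10), ("west", 5)],
    [("east", 7)],
    [("west", 3), ("north", 1)],
]


def local_aggregate(rows: List[Row]) -> Counter:
    """Runs on each node: scan local rows and pre-aggregate (partial SUM per group)."""
    partial: Counter = Counter()
    for region, sales in rows:
        partial[region] += sales
    return partial


def coordinator_query(partitions: List[List[Row]]) -> Dict[str, int]:
    """SELECT region, SUM(sales) ... GROUP BY region, executed scatter-gather style."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))  # scatter
    total: Counter = Counter()
    for p in partials:  # gather: merge the partial aggregates
        total.update(p)
    return dict(total)


print(coordinator_query(PARTITIONS))
```

Pre-aggregating on each node means only one small partial result per node crosses the network, instead of every matching row.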


H2O (HDFS Based)


Apex (HDFS Based)


Open Source Platforms: Other Based

• Druid
• Flink
• Storm
• Kafka
• Samza
• Elasticsearch
• Lucene/Solr


Druid (Other Based)

[Diagram: Druid architecture; labeled components include MySQL and ZooKeeper]


Flink (Other Based)


Storm (Other Based)


Kafka (Other Based)


Samza (Other Based)


Elasticsearch (Search Based)


Lucene/Solr (Search Based)


A Quick Comparison

| | Spark | Flink | Storm | Samza | Hadoop |
|---|---|---|---|---|---|
| Guarantee | Exactly Once | Exactly Once | At Least Once / Exactly Once (+ Trident) | At Least Once | |
| Throughput | 100k+ records/sec | | 100k+ records/sec | 10k+ records/sec | Lower |
| Fault Tolerance Overhead | Low | Low | | | High |
| Computation Model | Microbatches | Continuous Flow Operation | Continuous Flow Operation | Continuous Flow Operation | Batch Only |
| Windowing | Time Based | Record Based / User Defined | | | Nope |
| Memory Management | Moving Towards Automatic | Automatic | | | YARN Is Helping |
| DAG Based | Yes | Yes | Yes | | No |
| Batch Support | Yes | | No (unless paired with Trident) | | Only |
| Latency | Seconds | Milliseconds | Milliseconds | Milliseconds | |
| Stateful Operations | Yes | | No (unless with Trident) | Yes | |
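The Windowing row contrasts time-based windows (Spark) with record-based or user-defined windows (Flink). The following minimal Python sketch shows the difference using tumbling (non-overlapping) windows over the same events; the window sizes are arbitrary, and neither function reflects either engine's API.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (event time in seconds, value)


def tumbling_time_windows(events: List[Event], size_s: int) -> Dict[int, int]:
    """Sum values per fixed, non-overlapping interval [start, start + size_s)."""
    sums: Dict[int, int] = {}
    for ts, value in events:
        start = ts - ts % size_s  # start of the window this timestamp falls into
        sums[start] = sums.get(start, 0) + value
    return sums


def tumbling_count_windows(events: List[Event], size_n: int) -> List[int]:
    """Sum values per group of size_n consecutive records, regardless of timing.
    The final group may be partial; it is emitted here for simplicity."""
    return [
        sum(value for _, value in events[i:i + size_n])
        for i in range(0, len(events), size_n)
    ]


events = [(0, 1), (1, 2), (4, 3), (5, 4), (11, 5)]
print(tumbling_time_windows(events, size_s=5))   # {0: 6, 5: 4, 10: 5}
print(tumbling_count_windows(events, size_n=2))  # [3, 7, 5]
```

The same five events produce different groupings: time windows follow the clock (three events land in the first five seconds), while count windows follow arrival order (always two records per window).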


Open Source Platforms: Composite

• SMACK
• PANCAKE


SMACK Stack

Spark, Mesos, Akka, Cassandra, Kafka


PANCAKE Pile

Presto, Arrow, NiFi, Cassandra, Airflow, Kafka, Elasticsearch


PANCAKE STACK

Presto, Arrow, NiFi, Cassandra, Airflow, Kafka, Elasticsearch, Spark, TensorFlow, Algebird, CoreNLP, Kibana

What Do We Want To Be When We Grow Up?


What Do We Want This Meetup To Be?

• How Often Do We Want To Meet?
• Where? (other than here)
• From Whom Do We Want To Hear?
– Vendors?
– Never Vendors?
• Demos & Code?
• Sponsors?
• Who’s Hiring? Who’s Looking?


Refreshments

Reston Town Center, 1888 Explorer St, Reston, VA
