Grab some coffee and enjoy the pre-show banter before the top of the hour!

Time Difference: How Tomorrow's Companies Will Outpace Today's

The Briefing Room

Time Difference: How a New Architecture Changes the Game

Twitter Tag: #briefr The Briefing Room

Welcome

Host: Eric Kavanagh

[email protected] @eric_kavanagh

Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Topics

February: DATA IN MOTION

March: BI/ANALYTICS

April: BIG DATA

Parmenides and the Truth of Now

"Parmenides". Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Parmenides.jpg#mediaviewer/File:Parmenides.jpg

There is no tomorrow

There is no yesterday

There is only today

There is only now

Analyst: Mark Madsen

Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor to Forbes Online and on the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net

WebAction

WebAction offers real-time data-driven apps and the underlying enterprise platform

  The platform captures structured and unstructured data from a wide variety of data sources and allows users to correlate and enrich data streams

WebAction leverages in-memory data processing and is architected to scale up and scale out

Guest: Sami Akbay

Sami Akbay is a founder of WebAction. Prior to WebAction, he served as the CEO of Altibase, Inc., an in-memory RDBMS company with customers in financial services, utilities, and telecommunications. Sami was Vice President of Marketing and Product Management for GoldenGate Software from 2004 through its acquisition by Oracle. Prior to GoldenGate, he served in senior product marketing and business development roles at Embarcadero and AltoWeb. He spent his earlier career in technical and consulting roles working at Rabobank Nederlands, Hearst New Media, American Stock Exchange, MediaMetrix, OneMain.com (Earthlink), and ALK Associates. He is a graduate of Rutgers University.

High-Velocity Big Data Analytics February 2015

PROPRIETARY & CONFIDENTIAL

Because actionable insights come from combining analyzed history and what is happening right now.

•  Insights come from analyzing historic data:

–  What are the average hourly sales for our Boston store on a typical weekday in February?

–  Who are my top 1% passengers by revenue for 2014?

–  How many dropped calls does my average subscriber experience before cancelling service if they have a 2 year contract and $250 cancellation penalty?

•  Events without context are not very meaningful

–  In the last 30 minutes, we had a revenue of $8,000 in our Boston store.

–  Mark Madsen will miss his connection from ORD to EWR because his flight departed late from SFO

–  Sami Akbay dropped calls 3 times in the last 30 minutes

•  Actionable insights combine analyzed history with realtime event streams:

–  We typically sell $3,000 per hour on a weekday in February at our Boston store. In the last 30 minutes we sold $8,000. Alert the store manager and require ID check at checkout.

–  Mark Madsen is a top 1% passenger by revenue. Have an agent meet him at the gate and deliver his boarding pass for the next flight.

–  A subscriber typically drops 8 calls before becoming a churn risk. Sami has dropped only 3, so don’t give him a service discount as an incentive if he calls 611.
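The Boston store example can be sketched in a few lines of Python. This is only an illustration of combining a historical baseline with a live window, not WebAction's implementation: the `Baseline` class, the function names, and the `ratio_threshold` of 3.0 are all assumptions made for the sketch. Only the $3,000 per hour and $8,000 in 30 minutes figures come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Historical average from analyzed history (hypothetical structure)."""
    hourly_sales: float  # typical sales per hour for one store


def expected_sales(baseline: Baseline, window_minutes: float) -> float:
    """Scale the hourly baseline to the length of the live window."""
    return baseline.hourly_sales * window_minutes / 60.0


def should_alert(baseline: Baseline, window_sales: float,
                 window_minutes: float, ratio_threshold: float = 3.0) -> bool:
    """Alert when live sales reach ratio_threshold times the expected amount.

    ratio_threshold is an assumed tuning knob, not a value from the slides.
    """
    return window_sales >= ratio_threshold * expected_sales(baseline, window_minutes)


boston = Baseline(hourly_sales=3000.0)    # from history
print(expected_sales(boston, 30))         # 1500.0
print(should_alert(boston, 8000.0, 30))   # True
```

The live event ($8,000 in 30 minutes) means little by itself, and neither does the history. The alert only exists once the two are compared.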

[Diagram labels: Client, Server, OLTP, Application Server, Data Warehouse, Transaction Data]

[Diagram: data architecture with two paths, Batch / High-Latency and Realtime / Low-Latency.
Sources: Device Data, Industry Data, Social Feeds, Transaction Data, System/IT Data.
Components: Data Warehouse, EDW, Hadoop (Pig, Hive), ETL, (Existing) ETL, WebAction.
Consumers: Realtime Applications, Legacy Applications, Map/Reduce Applications, Users.]

WebAction® delivers the most comprehensive Realtime Stream Analytics Platform, enabling tailored, enterprise-scale Big Data applications for the agile enterprise

[Diagram: the realtime barrier.
Batch, reactive: Acquire, Store, Process. Sources: Structured Data, Machine Data, Location, Click Stream. Targets: BI / Analytics, RDBMS, EDW.
Realtime, proactive: Acquire, Process in Memory, Deliver. Sources: Structured Data, Machine Data, Location, Click Stream. Outputs: Data Driven Apps, Visualizations, Alerts, Store, Integrate.]

Anomaly and Pattern Detection in Real-time

Acquire: Structured and unstructured data

Process: Distributed, in-memory, as data is created

Deliver: Correlated, enriched, and filtered real-time big data records

Acquire: Structured and unstructured data

•  Data from transactional sources is acquired via redo or transaction logs
•  Structured and unstructured data
•  No production impact
•  No application changes

TYPE                          EXAMPLE
Common File Format            CSV, JSON, XML
Social Feeds                  Facebook, Twitter
System/IT Data                Syslogs, weblogs, Netflow
Device Data                   SmartMeter, Medical Device, RFID
Industry Data                 SWIFT, HL7, FIX
Real-Time Transaction Data    Oracle, DB2, SQLServer, MySQL, HP NonStop

COMPLEXITY values shown on the slide: SIMPLE, VERY HIGH, SIMPLE TO MEDIUM, MEDIUM, MEDIUM, HIGH

Process: Distributed, in-memory, as data is created

•  Enrich live Big Data with historical data sources

•  Process Big Data faster using partitioned streams, caches, and additional nodes

•  Execute SQL-like queries of in-memory Big Data

•  Alert in real-time based on predictive analytic model results
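The processing steps listed above (enrich, partition, query, alert) can be illustrated with a small in-memory sketch. This is not the WebAction API: `HISTORY_CACHE`, `enrich`, `partition`, `churn_alerts`, and the subscriber ids are hypothetical names invented for the example. The simple count threshold stands in for a predictive model result.

```python
import zlib
from collections import Counter, defaultdict

# Hypothetical in-memory cache of historical facts, keyed by subscriber id.
HISTORY_CACHE = {
    "sub-a": {"churn_risk_drops": 8},
    "sub-b": {"churn_risk_drops": 2},
}


def enrich(events, cache):
    """Join each live event with its cached historical record, if any."""
    for event in events:
        yield {**event, **cache.get(event["subscriber"], {})}


def partition_of(key, num_partitions):
    """Stable partition index: one subscriber always maps to one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(events, num_partitions):
    """Split a stream by subscriber so partitions can be processed in parallel."""
    parts = defaultdict(list)
    for event in events:
        parts[partition_of(event["subscriber"], num_partitions)].append(event)
    return dict(parts)


def churn_alerts(events):
    """Roughly: SELECT subscriber ... GROUP BY subscriber
    HAVING COUNT(*) >= churn_risk_drops."""
    drops = Counter(e["subscriber"] for e in events)
    limits = {e["subscriber"]: e["churn_risk_drops"]
              for e in events if "churn_risk_drops" in e}
    return sorted(s for s, n in drops.items() if s in limits and n >= limits[s])


live = ([{"subscriber": "sub-a", "type": "dropped_call"}] * 3
        + [{"subscriber": "sub-b", "type": "dropped_call"}] * 2)
enriched = list(enrich(live, HISTORY_CACHE))
print(churn_alerts(enriched))  # ['sub-b']
```

Partitioning on the subscriber key keeps all events for one subscriber together, so each partition can be counted independently on a separate node.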

Deliver: Correlated, enriched, and filtered real-time big data records

•  Continuous Big Data Records
•  Real-Time Dashboards
•  Predictive Alerts
•  Business Trends
•  Data Patterns
•  Outliers

[Diagram: platform architecture.
Platform components: High Speed Data Acquisition, Distributed DIM Processor, Distributed WAction Cache, WActionStore, Tungsten Visualization, Metadata.
Sources: Device Data, Industry Data, Social Feeds, Transaction Data, System/IT Data.
Targets: Big Data Infrastructure, Enterprise Applications, Enterprise Data Warehouse, RDBMS, Data Driven Apps.]

•  How is it different from
–  CEP?
–  ETL?
–  Messaging?
–  in-memory database?

Perceptions & Questions

Analyst: Mark Madsen

Copyright  Third  Nature,  Inc.  

We are in a transitional phase in IT architecture

                  Then           State of Practice            Now, forward
Architecture      Timeshare      Client/server                Cloud
Data              Core TXs       All TXs, some events, docs   All data
Rate of change    Slow           Rapid                        Continuous
Uses              Few            Many                         Everything
Latency           Daily+++       < daily to minutes           Immediate
Data platform     Uniprocessor   SMP, cluster                 Shared nothing

Majority use of computing over time

1930s-1950s: Calculate
1960s-1980s: Automate
1990s-2010s: Informate
2010s+: Analyze and Actuate

Computing technology has become a tool of observation and actuation, not just a recipient of human-entered data

Rising organizational complexity

The data warehouse vs business agility

All the data

Ready-to-use common, typed, tabular data

The bottleneck is you

Polling is not streaming, minutes is not real time

[Figure: two timelines on a 0 to 7 minute axis comparing a streaming model and a polling model against an alert threshold. Annotations: Something broke; 1st bad event detected; Problem gets worse; Action taken.]

Streaming model:
–  The problem is visible here 4 seconds after the first bad event
–  Reaction takes 3 minutes…
–  Action taken after 3 minutes, at 3.5 minutes
–  Problem completely resolved at 4 minutes

Polling model:
–  Events recorded, processed, stored in DB and ready after 2.5 minutes
–  The problem is visible here after 2.5 minutes, at the earliest
–  Reaction takes 3 minutes…
–  Action taken after 3 minutes, at 6 minutes
–  Problem completely resolved at 6.5 minutes
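The timeline on this slide can be reproduced with a little arithmetic. The sketch below assumes the failure starts at 0.5 minutes and that the fix takes 0.5 minutes once action is taken. Both values are inferred from the slide's numbers rather than stated on it, and the `timeline` function is a hypothetical helper. The 3 minute reaction time, the 4 second streaming delay, and the 2.5 minute polling delay come from the slide.

```python
def timeline(break_at, detection_delay, reaction, fix_duration):
    """Return (visible_at, action_at, resolved_at) in minutes."""
    visible_at = break_at + detection_delay   # when the problem can be seen
    action_at = visible_at + reaction         # humans react after it is visible
    resolved_at = action_at + fix_duration    # fix completes
    return visible_at, action_at, resolved_at


BREAK_AT = 0.5        # assumed: inferred from the slide's timeline
REACTION = 3.0        # from the slide: "Reaction takes 3 minutes"
FIX_DURATION = 0.5    # assumed: gap between "action taken" and "resolved"

streaming = timeline(BREAK_AT, 4 / 60, REACTION, FIX_DURATION)  # 4 second delay
polling = timeline(BREAK_AT, 2.5, REACTION, FIX_DURATION)       # 2.5 minute delay

print(polling)                    # (3.0, 6.0, 6.5)
print(polling[2] - streaming[2])  # minutes saved by streaming
```

Under these assumptions the streaming action lands a few seconds after the 3.5 minute mark, which the slide rounds to 3.5. The reaction time is identical in both models, so the whole difference (just under 2.5 minutes) comes from detection latency.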

The data warehouse is not designed for real time

A polling architecture does not work well for event data
▪  Introduces latency
▪  Polling creates performance and scaling problems

The DW can’t handle real-time ingest
▪  One of the original DW design assumptions: solve for conflicting workloads by separating them in time
▪  Workload management has limits
▪  Scalability problem for event streams
▪  Spiky flow patterns and dynamic scaling

Static schema:
▪  What happens first, upstream change or data model change?
▪  What is your reaction time?

The problem of dropped packets

The creation and flow of data is different for transactions and machine-generated events

The process for most human-entered data; human speed:
Data entry, Extract, Cleanse, Load, Use

The process for machine-generated data; machine speed:
[Diagram labels: Data Generation, Program, Cleanse, Store, Store, Use, Use]

Real-time monitoring is not polling

Real-time monitoring often needs to access history

The data in motion and the data at rest is the same data.

Therefore:

Real time (in motion) and persistence (at rest) must be supported by the same architecture

Real time isn’t either-or, it’s part of the architecture

Flowing: Sliding window of “now” (ESB)
Unloaded: Persisted but not yet loaded into DB (Cache/Queue)
Queryable history: Stored in database / datastore (Database)

A DB can get you to within minutes (at large scale) but it won’t be easy or cheap

Streaming SQL, stream engines, CEP may be used for these

Real-time monitoring doesn’t use only real-time data: windows, restarts, detecting deviation, so the above boundaries are crossed.
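A minimal sketch of why monitoring crosses these boundaries: detecting a deviation requires both the sliding window of "now" and the history it is compared against. The `SlidingWindow` class is hypothetical, and its plain list stands in for a persisted store. Timestamps are assumed to arrive in non-decreasing order, and the `factor` of 2.0 is an arbitrary illustrative threshold.

```python
from collections import deque


class SlidingWindow:
    """Keeps recent events in memory and moves expired ones to history."""

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.flowing = deque()  # (timestamp, value) pairs inside the window
        self.history = []       # expired events; stands in for a datastore

    def add(self, timestamp, value):
        """Append an event and evict everything older than the window span."""
        self.flowing.append((timestamp, value))
        cutoff = timestamp - self.span
        while self.flowing and self.flowing[0][0] <= cutoff:
            self.history.append(self.flowing.popleft())

    def window_mean(self):
        return sum(v for _, v in self.flowing) / len(self.flowing)

    def history_mean(self):
        return sum(v for _, v in self.history) / len(self.history)

    def deviates(self, factor=2.0):
        """True when the current window is at least factor times the history."""
        if not self.flowing or not self.history:
            return False  # nothing to compare yet
        return self.window_mean() >= factor * self.history_mean()


w = SlidingWindow(span_seconds=60)
for ts, value in [(0, 10), (30, 10), (60, 10), (90, 50), (120, 50)]:
    w.add(ts, value)
print(w.window_mean(), w.history_mean(), w.deviates())  # 50.0 10.0 True
```

After a restart the in-memory window is empty, so a real system would have to rebuild it from persisted data, which is another reason the flowing and stored layers cannot be treated as separate systems.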

This implies a new DW architecture, data modeling approach

Decouple the data architecture layers

[Diagram labels: Ingest, Store, Manage, Refine, Deliver, Analyze, Use]

If you want to do real time and still manage your data effectively then you need this data architecture

[Diagram labels: Stream; Collect, Refine, Manage, Deliver; Flowing, Persisted, Managed history; Metadata?]

Flow, persisted, managed define different storage and retrieval requirements

Questions

–  Why an integrated product rather than other alternatives like a RT streaming engine or a streaming SQL database?
–  What do you do at the metadata layer to expose data that is a message, a table, or both?
–  What mechanisms does it use to scale?
–  How does one deploy the user interface portion of an application?
–  What happens if there’s a reader / writer lag or failure?
–  How do you handle recovery in the event of a stream failure (one stream, correlated stream)?
–  Can you / how do you persist data that you calculate and display?
–  What types of streaming functions do you support (e.g., windows: sliding / jump time, count, time series alignment)?
–  How complex of a calculation can you create?

Upcoming Topics

www.insideanalysis.com

February: DATA IN MOTION

March: BI/ANALYTICS

April: BIG DATA

THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons and Wikipedia