Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

No Time Like the Present – The Case for Streaming Analytics

Twitter Tag: #briefr


Welcome

Host: Eric Kavanagh

[email protected]


Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!


Topics

This Month: ANALYTICS

February: BIG DATA

March: CLOUD

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room


Analytics

What do you MEAN you need your data NOW?


Analyst: John Myers

John Myers is Research Director of Business Intelligence at Enterprise Management Associates


SQLstream

•  SQLstream is an enterprise software company focused on making businesses responsive to real-time Big Data assets
•  Its platform provides a relational stream for analyzing large volumes of service, sensor, and machine and log file data
•  SQL queries in SQLstream generate results continuously as data becomes available


Guests: Damian Black & Christian Lees

Damian Black, CEO, SQLstream
•  Career in high tech, real-time software sector, with senior positions at HP, XACCT (now Amdocs) and Followap (now Neustar)
•  Holds 11 US patents
•  Finalist in the 1995 International Management Challenge

Christian Lees, CTO, InfoArmor
•  Over 15 years of information security, network security and intrusion detection experience
•  CTO of InfoArmor, with previous experience at Level 3 Communications and Trustwave, and as owner of Sage Technologies

Copyright © 2014 | +1 877 571 5775 | [email protected]

SQLstream: Real-time Big Data Platform

facts

o  Launched 2009

o  Deployments across many industries

o  Real world benchmarks

capabilities

o  Unstructured and structured data

o  Accelerates and extends Hadoop & RDBMS

o  Not only SQL

innovations

o  Massively scalable streaming data platform

o  Only standard SQL streaming engine

o  Five patents for stream processing

Streaming Analytics from High-velocity Machine Data

Selected Customers & Partners

Internet of Things & Sensors

Intelligent Transportation

IT Operations

Telecommunications

Security Intelligence

Smarter Internet

Selected Strategic Partners

Bridging The Chasm

“As we move toward a real-time business environment, the capability to process data flows swiftly and flexibly will become increasingly important. SQLstream leads the industry in this kind of capability.”
Robin Bloor, Chief Analyst for Bloor Group

Business Intelligence: Post-hoc Analysis, Data Warehousing, Strategic insights

Operations: Transaction Processing, Machine Data, Everyday business

Operational Intelligence integrates Operations and BI


Operational Intelligence
•  Optimizes tactical decisions from real-time actionable insights
•  Combines operations data with BI data continuously
•  Provides a real-time integrated view of the business and operations

Security, Compliance, Fraud, Quality, Promotion, Advertising, Cross-selling


The Information Value Chain

What is happening?

What might happen?

What just happened?

Make it happen!

STREAMING ANALYTICS

Analytics previously meant High-latency

Current architectures
o  Multi-stage processing
o  Batch ETL
o  Interim operational data stores

IMPACT
o  High Cost of Ownership
o  Delays to internal customers and consumers
o  Delays to external customers and partners

WAREHOUSE

Near-term data storage

PLATFORMS

ETL

Streaming Analytics
Massively parallel with incremental evaluation

Enhancing with historical information

Storage of intermediate & final query results

Operational Intelligence

Logs

Sensors

Mobile

Networks

Wireless

Radio

M2M

Internet

Security gateways

¤  Continuous queries on unstructured & structured streaming data
¤  Incremental query results
¤  Predictive analytics & automated actions
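To make "incremental query results" concrete, here is a minimal Python sketch of incremental evaluation: a per-key running average that is updated as each event arrives, without rescanning earlier events. It is an illustration only, not SQLstream's engine; the function name `running_mean` and the `(key, value)` event shape are assumptions introduced for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_mean(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Yield (key, mean so far) after every event, without rescanning history."""
    count: Dict[str, int] = defaultdict(int)
    mean: Dict[str, float] = defaultdict(float)
    for key, value in events:
        count[key] += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        mean[key] += (value - mean[key]) / count[key]
        yield key, mean[key]


# Hypothetical sensor readings; each arrival produces an updated result.
events = [("sensor-a", 10.0), ("sensor-b", 4.0), ("sensor-a", 20.0), ("sensor-a", 30.0)]
results = list(running_mean(events))
```

Each event costs constant work per key, which is the property that lets a continuous query emit results as data arrives rather than in batches.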

SQL: Where is the intelligence?

Transaction Log Details
TRANS,2013-02-17-15:30:22,3458783,2347897953,128.56.0.253,STATUS:-15, DE69975, 4157588342

Twitter
{"created_at:Thu Feb 17 15:30:55 +0000 2013,id:304612775055998976,id_str:304612775055998976,text:@MyServiceProvider today sucks, keeps dropped!,source:u006ca href=http:www.url.com rel=nofollow,followers_count:147,friends_count:10142, location: San Francisco, time_zone: Pacific, geo_enabled:true, location:u00dcT: -6.1987552,106.8661953, screen_name:APerson

Device Locations
<id>1597831220</id><deviceid>0198873465</deviceid><lat>lat=47.643957</lat><lon>lon= -122.3269</lon><time>2013-02-17T15:37:26Z</time><bearing>223.4535</bearing>
<id>1597865781</id><deviceid>0198873465</deviceid><lat>lat=47.645982</lat><lon>lon=-122.327500</lon><time>2013-02-17T15:37:26Z</time><bearing>200.6138</bearing>
<id>1597940125</id><deviceid>0198873465</deviceid><lat>lat=47.647381</lat><lon>lon=-122.326501</lon><time>2013-02-17T15:37:26Z</time><bearing>87.4357</bearing>

Web Server Logs
[Sun Feb 17 15:30:49 2013] [notice] srv-sfo-08 caught SIGTERM, shutting down
[Sun Feb 17 15:30:49 2013] [notice] Apache/2.2.21 -- resuming normal operations

CDRs
TERMINATE,ctl09gsx,01299796304,GMT-08:00,02-17-13,15:21:00,9,387,64ms,02-17-13,15:30:55,0005, IP-TO-IP,4157588342,8775715775,1,0,4157588342,RD_AXY_NN0_001,SFR01AAG34,40.50.245.60, 234.234.60.75,65678,411,399,SIP,SANFRANCISCO,0x4B1698,0x0005E,0x49768,4157588342,0198873465


Timestamp (highlighted in each of the five sources)

Mobile # Customer

Mobile # Device ID Term Reason

Device ID Location

Location

Service Provider

Fail Code

Server
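The sample records on this slide each carry a timestamp in a different layout. As a rough illustration of the first step toward correlating them, the Python sketch below pulls the first timestamp out of a raw record. The layout table, the name `extract_timestamp`, and the decision to ignore time zones (all values are treated as readings of the same clock) are assumptions made for this example, not part of the original deck.

```python
import re
from datetime import datetime
from typing import Optional

# One (regex, strptime format) pair per timestamp layout seen in the samples.
# The weekday name is deliberately left out of the patterns.
LAYOUTS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"), "%Y-%m-%dT%H:%M:%SZ"),    # device XML
    (re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}"), "%Y-%m-%d-%H:%M:%S"),      # transaction log
    (re.compile(r"[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4} \d{4}"),
     "%b %d %H:%M:%S %z %Y"),                                                      # tweet
    (re.compile(r"[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}"), "%b %d %H:%M:%S %Y"),  # web server log
    (re.compile(r"\d{2}-\d{2}-\d{2},\d{2}:\d{2}:\d{2}"), "%m-%d-%y,%H:%M:%S"),      # CDR
]


def extract_timestamp(record: str) -> Optional[datetime]:
    """Return the first recognizable timestamp in a raw record, or None."""
    for pattern, fmt in LAYOUTS:
        match = pattern.search(record)
        if match:
            # Assumption: drop any UTC offset so all results are comparable.
            return datetime.strptime(match.group(0), fmt).replace(tzinfo=None)
    return None


trans = extract_timestamp("TRANS,2013-02-17-15:30:22,3458783,2347897953,128.56.0.253")
web = extract_timestamp("[Sun Feb 17 15:30:49 2013] [notice] srv-sfo-08 caught SIGTERM, shutting down")
cdr = extract_timestamp("TERMINATE,ctl09gsx,01299796304,GMT-08:00,02-17-13,15:21:00,9,387,64ms")
```

A production pipeline would also need to reconcile the time zones that appear in the samples (`+0000`, `GMT-08:00`, `Z`); this sketch leaves that out.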


CLEANING & FILTERING

STREAMING ANALYTICS

STREAMING AGGREGATION

CONTINUOUS INTEGRATION

QoE Rating Fraud

Monitoring Billing

Network Analysis

Streaming Analytics Platform

Log Sensors Mobile Networks M2M Radio towers
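Two of the stages named on this slide, cleaning and filtering followed by streaming aggregation, can be sketched as chained Python generators. This is a simplified illustration under stated assumptions: the `"epoch_seconds,source"` line format, the function names, and the 60-second tumbling window are all hypothetical, and records are assumed to arrive in timestamp order.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[int, str]  # (epoch seconds, source name); hypothetical shape


def clean(lines: Iterable[str]) -> Iterator[Record]:
    """Cleaning & filtering: drop lines that do not parse as 'seconds,source'."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # discard malformed records
        yield int(parts[0]), parts[1]


def aggregate(records: Iterable[Record], width: int = 60) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Streaming aggregation: per-source counts in tumbling windows of `width` seconds."""
    window_start: Optional[int] = None
    counts: Dict[str, int] = {}
    for ts, source in records:
        start = ts - ts % width
        if window_start is not None and start != window_start:
            yield window_start, counts  # the window closed, emit its result
            counts = {}
        window_start = start
        counts[source] = counts.get(source, 0) + 1
    if window_start is not None:
        yield window_start, counts  # flush the last open window


raw = ["0,log", "10,sensor", "bad line", "59,log", "60,log", "x,sensor", "125,sensor"]
windows = list(aggregate(clean(raw)))
```

Because both stages are generators, each record flows through the whole chain as it arrives and nothing is staged in an intermediate store.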

Real-time Architecture
Continuous Raw Data Ingestion, Integration, Analysis and Output of Derived Data in Real-time

Data Warehouse

Streaming Agent/Adapter Layer + JDBC API

Query Planner & Optimizer for MPP Execution SQL

Developer Tools

Platform Administration

Streaming SQL Real-time Applications

Real-time Dashboards & Visualization

Logs

Sensors

GPS

Networks Social Media Servers

M2M Telematics

Impala SQL

HBase

HDFS / MR

Hadoop for Stream Persistence, Enrichment & Replay (Optional)

External Data Warehouses & Systems

SQLstream s-Streaming Product Portfolio

s-Server: Data Management Platform for Streaming Big Data
s-Analyzer: Drag and Drop Application Builder for Streaming Analytics Applications
s-Transport: Geo-Analytics for Location-based Applications
s-Visualizer: Advanced Enterprise Visualization
s-Cloud: s-Server EC2 AMI Deployment
s-Studio: Developer & Admin Console
StreamApps: Fast Start Streaming Apps
Dashboards

Case studies

CLOUD INFRASTRUCTURE MONITORING
Cloud infrastructure monitoring with Bollinger bands

BUSINESS NEED: Detect run-away applications before resource consumption becomes an issue.

SELECT STREAM ROWTIME, url, numErrorsLastMinute
FROM (
  SELECT STREAM ROWTIME, url, numErrorsLastMinute,
    AVG(numErrorsLastMinute) OVER lastMinute AS avgErrorsPerMinute,
    STDDEV(numErrorsLastMinute) OVER lastMinute AS stdDevErrorsPerMinute
  FROM ServiceRequestsPerMinute
  WINDOW lastMinute AS (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
) AS S
WHERE S.numErrorsLastMinute > S.avgErrorsPerMinute + 2 * S.stdDevErrorsPerMinute;
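For readers less familiar with streaming SQL, the Bollinger-band test in the query on this slide (flag a URL when its error count exceeds the windowed average plus two standard deviations) can be approximated in plain Python. This is a sketch under stated assumptions, not a translation of SQLstream semantics: it uses a count-based window of the last `window` samples per URL instead of the query's time-based `RANGE` window, and it assumes a population standard deviation. The name `detect_spikes` and the sample data are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev
from typing import Deque, Dict, Iterable, Iterator, Tuple

Sample = Tuple[int, str, int]  # (minute, url, errors in that minute)


def detect_spikes(samples: Iterable[Sample], window: int = 10, k: float = 2.0) -> Iterator[Sample]:
    """Yield samples whose error count exceeds mean + k * stddev for that url."""
    history: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=window))
    for minute, url, errors in samples:
        recent = history[url]
        recent.append(errors)  # the window includes the current row, as in the SQL
        if len(recent) < 2:
            continue  # not enough data to form a band
        if errors > mean(recent) + k * pstdev(recent):
            yield minute, url, errors


# Hypothetical feed: "/api" hovers around 2-3 errors per minute, then jumps to 40.
api = [(m, "/api", e) for m, e in enumerate([2, 3, 2, 3, 2, 3, 2, 3, 2, 40])]
home = [(m, "/home", 1) for m in range(10)]
spikes = list(detect_spikes(sorted(api + home)))
```

Partitioning the history by URL mirrors the `PARTITION BY url` clause: each URL is judged against its own recent behavior, so a steady but busy endpoint does not trigger alerts.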

SQLstream

Customer Benchmarked Performance
Large network & telecom equipment manufacturer

Diagram: five Network Data sources, each with a Remote Agent; ENRICH, SHARE, ANALYZE; Data Warehouse, External Systems, External Data

PERFORMANCE STATISTICS
System Throughput: 1.35M events/sec
Server Configuration: 1 x 4-core CPU
Event Size: ~1KB
Data Sources: Many

SYSTEM CHARACTERISTICS
Collection: Intelligent Remote Agents (Distributed)
Enrichment: Streaming data augmentation
Analytics: Temporal & spatial pattern detection
Output: Data warehouse + applications (JDBC)
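As a rough sanity check on the benchmark figures above, the arithmetic below converts the stated throughput into a data rate. It assumes that "~1KB" means 1,000 bytes and that load is spread evenly across the four cores; neither assumption is stated on the slide, so the derived numbers are approximations.

```python
events_per_sec = 1_350_000   # "1.35M events/sec" from the slide
event_size_bytes = 1_000     # "~1KB", taken as 1,000 bytes (assumption)
cores = 4                    # "1 x 4-core CPU" from the slide

bytes_per_sec = events_per_sec * event_size_bytes
gigabytes_per_sec = bytes_per_sec / 1e9        # approximate ingest rate in GB/s
events_per_core = events_per_sec / cores       # assumes an even split across cores

print(f"{gigabytes_per_sec:.2f} GB/s, {events_per_core:,.0f} events/sec per core")
```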

Case study: Call Rating & Fraud

Veracity Networks

“SQLstream allows Veracity to provide vital real-time reports to our customers that previously took hours to create. SQLstream also provides real-time monitoring and insight into network concerns allowing Veracity to proactively address any such issues.”

Case study: fraud prevention (cont.)

Alerts

Triggers

Reports

STREAMING

ANALYTICS

•  Call suspension
•  Acct. suspension
•  Emails

Destination

Location

IP spoofing alerts

Customer call profile

duration

Mo Tue Wed Thu Fri Sat Sun

① LA ② Nairobi ③ NY ④ …..

① LA ② SF ③ NY ④ ….

① LA ② Detroit

① LA ② LA1

InfoArmor case study

Case study: Cybersecurity

InfoArmor

¤ Founded by Washington Mutual to protect 10M credit card holders
¤ Growing at triple digit rates
¤ Engaged, satisfied subscribers

NEEDS
¤ Decision engine
¤ Consume agnostic data sources
¤ Scalable
¤ Real-time

Case study: Cybersecurity, a growing market

¤  No longer an unorganized hacker world
¤  Innovation and technology
¤  Global economy
¤  Political support

$207 Billion

Entrepreneur.com

In 2012, U.S. Navy databases were hacked and 200,000 sailors’ information was put at risk.

Cyber Attacks | DAMAGES

•  12.6 million Americans were ID theft victims last year
•  608,271,950 records (and growing) have been compromised due to security breaches since 2005
•  94% of healthcare organizations surveyed had at least one data breach in the past 2 years
•  1 in 4 data breach notification recipients became a victim of identity fraud
•  You are 5 times more likely to be a fraud victim if your Social Security Number has been compromised in a data breach

INTERNET SURVEILLANCE

InfoArmor Internet Surveillance uses bots to continuously monitor the Underground Economy to uncover compromised, sensitive information. Whether it is personal identifying data or a medical insurance card, Internet Surveillance uncovers breached data and alerts in real time.

What We Monitor:
¤  Malicious Command & Control Networks
¤  Black Market Forums
¤  Phishing Networks
¤  Exploited Websites
¤  Known Compromised Machines & Servers

What is the Underground Economy?
An ever-evolving complex of compromised machines, networks and web services identified by InfoArmor and leading cyber security firms.


INTERNET SURVEILLANCE

How We Monitor:

¤  Proprietary hardware and software solution

¤  Unparalleled alert accuracy (minimized false positives)

¤  Secure: separate reconnaissance and analysis efforts, plus no refined search queries

What We Monitor:

¤  Credentials, SSNs, names, addresses, emails and DOBs

¤  Wallet items (e.g., credit cards, medical insurance cards)

INFOARMOR BOTS monitor UNDERGROUND ECONOMY
COMPROMISED DATA sent back to INFOARMOR
SENSOR compares compromised to subscriber data in secure environment, creating ALERTS with 100% accuracy
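The comparison step described above (compromised data matched against subscriber data to raise alerts) can be illustrated with a small Python sketch. It is purely hypothetical and does not describe InfoArmor's actual sensor: the function names, the normalization rule, and the choice to compare SHA-256 fingerprints instead of raw values are all assumptions made for this example.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple


def fingerprint(value: str) -> str:
    """Normalize a value and return its SHA-256 hex digest."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(subscribers: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map the fingerprint of each monitored value to its subscriber id."""
    index: Dict[str, str] = {}
    for subscriber_id, values in subscribers.items():
        for value in values:
            index[fingerprint(value)] = subscriber_id
    return index


def alerts(index: Dict[str, str], compromised: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (subscriber id, fingerprint) for each compromised item that matches."""
    for item in compromised:
        digest = fingerprint(item)
        if digest in index:
            yield index[digest], digest


# Hypothetical subscriber data and a hypothetical feed of harvested values.
index = build_index({"sub-1": ["Alice@Example.com"], "sub-2": ["bob@example.com"]})
hits = [sub for sub, _ in alerts(index, ["  alice@example.com ", "carol@example.com"])]
```

Exact matching on normalized values keeps false positives low, at the cost of missing near-matches. Note that an unsalted hash of a low-entropy value such as an SSN can be brute-forced, so a real deployment would need stronger protection than this sketch shows.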

Case study: Streaming analytics

SQLstream BENEFITS
¤ Ability to adapt to many data sources
¤ Real Time analysis and alerting
¤ Offset database load
¤ Data Hygiene prior to data warehousing

RESULTS
¤ Real-time actionable alerts
¤ Unity in Ingress Data points
¤ Dual Purpose solution
   •  Helps Compliance
¤ Plans to expand engagement

offline / online

Damian Black

Email | [email protected]

Website | www.sqlstream.com

DOWNLOADS | http://www.sqlstream.com/downloads/


Perceptions & Questions

Analyst: John Myers

John L Myers Enterprise Management Associates Research Director [email protected]

Importance of Speed of Response in Big Data

© 2012 Enterprise Management Associates, Inc.

Speaker

John Myers joined Enterprise Management Associates in 2011 as senior analyst of the business intelligence (BI) practice area. John has 10+ years of experience working in areas related to business analytics in professional services consulting and product development roles, as well as helping organizations solve their business analytics problems, whether they relate to operational platforms, such as customer care or billing, or applied analytical applications, such as revenue assurance or fraud management.

John L Myers Enterprise Management Associates Research Director

JohnLMyers44 © 2013 Enterprise Management Associates, Inc. Slide 40

Disruptive Forces in Data Management: Changing the Speed of Business


Use Cases met with Big Data Implementations

•  Speed of processing response
•  Combining data by structure
•  Pre-processing data
•  Utilization of streaming data
•  Staging structured data
•  Online archiving

Rogers, Myers and Devlin, "Big Data: Operationalizing the Buzz", Enterprise Management, http://research.enterprisemanagement.com/big-data-2013-webinar-nl.html


Big Data Platforms have Multiple Use Cases


Top 5 Business Challenges Met with Big Data Projects

•  Risk management
   •  Fraud Analysis, Liquidity Risk Assessment
•  Ad-hoc operational queries
   •  Customer Relations Management
•  Asset optimization
   •  Staff Scheduling, Logistical Asset Planning
•  Operational event and policy processing
   •  Billing, Rating
•  Campaign Optimization
   •  Market Basket Analysis, Cross-sell/Up-sell Recommendation
•  Clustering, social graph analysis
   •  Grouping and Relationship Analysis, Geographic Optimization

Rogers, Myers and Devlin, "Big Data: Operationalizing the Buzz", Enterprise Management, http://research.enterprisemanagement.com/big-data-2013-webinar-nl.html


Building the Bridge between Operational Processes and Analytical Results


Hybrid Data Ecosystem 2013: From Requirements to Consumers


Questions


•  This version of “streaming analytics” sounds a lot like “complex event processing.” How does SQLstream differentiate from those solutions?

•  The open source community, such as Apache Hadoop, has been coming up with solutions to problems like streaming. What advantages does a proprietary solution like SQLstream have over these solutions?

•  “Streaming analytics” appears to be well suited for the upcoming trends in the “location based services” in mobile telecom and “telematics” in automotive. Which use cases appear to have the best chances of success? Marketing activities such as “location coupons?” Operational optimization such as “managed highways?”

Questions


•  What are the best types of datasets to be used in the world of “streaming analytics?” Structured big data or large volumes of single row event data (i.e., log information)? Formatted multi-row event data (i.e., JSON)?

•  What types of datasets should be avoided?

•  What types of analytical techniques are best used with “streaming analytics?” Advanced analytical models associated with predictive or clustering algorithms? Rules-based, policy techniques (i.e., decision trees)? Simple descriptive analytics?

•  What types of analytics techniques should be avoided?


Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: ANALYTICS

February: BIG DATA

March: CLOUD


Thank You for Your Attention