@Kognitio #SparkEvent. Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform. Michael Hiskey, Futurist, Product Evangelist (and VP, Marketing and Business Development). www.kognitio.com

Kognitio Spark! Modern Data Platform

DESCRIPTION

Hadoop Meets Mature BI: Where the rubber meets the road for the Modern Data Platform. This presentation supports the Spark! Event in Atlanta, where Kognitio is a key sponsor. The event discusses the shift in how information is collected, stored and analyzed in a Big Data world. More on the Radiant Advisors' Spark! Events at http://radiantadvisors.com/spark/

Citation preview

Page 1: Kognitio Spark! Modern Data Platform

@Kognitio #SparkEvent

Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform

Michael Hiskey

Futurist, Product Evangelist (and VP, Marketing and Business Development)

www.kognitio.com

Page 2: Kognitio Spark! Modern Data Platform
Page 3: Kognitio Spark! Modern Data Platform

Today, and the Future

Big Data

Advanced Analytics

In-memory

Modern Data Platform

Hybrid Data Ecosystem ‘Logical Data Warehouse’

Predictive Analytics

Data Scientists

Data

Page 4: Kognitio Spark! Modern Data Platform

64%: Have invested/plan to invest in Big Data Tech

8%: Have started using it

Via TechCrunch, 23 Sept 2013

200: Average TBs of stored data

2x: Walmart DW in 1999

Insights & Publications, May 2011

Page 5: Kognitio Spark! Modern Data Platform

The Data Scientist

Sexiest job of the 21st Century?

Page 6: Kognitio Spark! Modern Data Platform

Data Scientist

The Analytical Enterprise

Business Analyst

Systems Admin

Page 7: Kognitio Spark! Modern Data Platform

Remember: Decision Support Systems?

…accessed with ease and simplicity

Historical information, latency

BI tools have plateaued

Advanced analytics & data science

More math…a lot more math

Page 8: Kognitio Spark! Modern Data Platform

create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr(
# Simple R script to run a linear fit on daily sales

prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))

Behind the numbers
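The script on this slide is cut off at the right-hand edge, so it cannot be run as shown. As a rough illustration of the idea it appears to express (scale each day's sales by that weekday's share of total sales, then fit a straight line against the row ID), here is a minimal Python sketch rather than the slide's R. It assumes the truncated aggregate() call sums sales per day of week; the function names and the sample rows are hypothetical and are not part of the Kognitio example.

```python
from collections import defaultdict


def dow_shares(rows):
    """Return each day-of-week's share of total sales.

    rows is an iterable of (dow, row_id, daily_sales) tuples.
    Assumption: the slide's truncated aggregate() sums sales per DOW.
    """
    totals = defaultdict(float)
    for dow, _row_id, sales in rows:
        totals[dow] += sales
    grand_total = sum(totals.values())
    return {dow: total / grand_total for dow, total in totals.items()}


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def base_sales_trend(rows):
    """Divide out the weekday pattern, then fit base sales against row ID."""
    rows = list(rows)
    shares = dow_shares(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    base = [sales / shares[dow] for dow, _row_id, sales in rows]
    return linear_fit(ids, base)


if __name__ == "__main__":
    # Hypothetical data for one product: two weekdays with a fixed pattern.
    sample = [(0, 0, 10.0), (1, 1, 30.0), (0, 2, 10.0), (1, 3, 30.0)]
    slope, intercept = base_sales_trend(sample)
    print(slope, intercept)
```

In the slide's version the same logic runs once per PRODNO partition inside the database; this sketch handles a single product's rows in one call.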

Page 9: Kognitio Spark! Modern Data Platform

What has changed?

More connected-users?

More-connected users?

Page 10: Kognitio Spark! Modern Data Platform

Don’t be a Railroad Stoker!

Highly skilled engineering required … but the world innovated around them.

Page 11: Kognitio Spark! Modern Data Platform

The drive for deeper understanding

Chart axes: Technology/Automation, Analytical Complexity

Machine learning algorithms

Dynamic Simulation

Statistical Analysis

Clustering

Behavior modelling

Reporting & BPM

Fraud detection

Dynamic Interaction

Campaign Management

Page 12: Kognitio Spark! Modern Data Platform

Key: “Graduation”

Projects will need to Graduate from the Data Science Lab and become part of Business as Usual

Page 13: Kognitio Spark! Modern Data Platform

Your goal: 

PRESS HERE…and really cool Big Data stuff happens!

Page 14: Kognitio Spark! Modern Data Platform

Data flow

Page 15: Kognitio Spark! Modern Data Platform

© 20th Century Fox

Page 16: Kognitio Spark! Modern Data Platform

No need to pre-process

No need to align to schema

No need to triage

Null storage concerns
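One way to read the points on this slide is as what is often called schema-on-read: raw records are stored untouched, and a schema is applied only when a question is asked. The slide itself does not use that term or show any code, so the Python sketch below is an illustration under that assumption. The sample records, field names, and the read_with_schema helper are all hypothetical.

```python
import json

# Raw lines kept exactly as they arrived: no pre-processing, no fixed schema.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.5", "channel": "web"}',
    '{"user": "b", "amount": "3"}',
    'not json at all',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    schema maps field name -> casting function. Lines that do not parse
    as a JSON object are skipped; fields missing from a record become None.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input stays in storage, just not in this view
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


if __name__ == "__main__":
    # Two different questions, two different schemas, same raw data.
    print(list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float})))
    print(list(read_with_schema(RAW_EVENTS, {"channel": str})))
```

The trade-off is that every read pays the parsing cost, which fits the deck's later argument that raw storage alone is slow for interactive queries.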

Page 17: Kognitio Spark! Modern Data Platform

Hadoop just too slow for interactive BI!

…loss of train-of-thought

“while Hadoop shines as a processing platform, it is painfully slow as a query tool”

Page 18: Kognitio Spark! Modern Data Platform

Hadoop is…

inherently disk oriented

typically low ratio of CPU to Disk

Lots of these

Not so many of these

Page 19: Kognitio Spark! Modern Data Platform

Analytics needs low latency, no I/O wait

High speed in-memory processing

Page 20: Kognitio Spark! Modern Data Platform

A* Modern Data Platform Reference Architecture

*(not THE)

Application & Client Layer: All BI Tools, All OLAP Clients, Excel, Reporting

Access

Analytical Platform

Near-line Storage (optional)

Persistence Layer: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage

Page 21: Kognitio Spark! Modern Data Platform

© Hortonworks Inc. 2013

(another) Next-Generation Data Architecture

APPLICATIONS

Microsoft Applications

BI Tools & OLAP Clients

In-memory MPP Accelerator

DATA SYSTEMS

TRADITIONAL REPOS: RDBMS, EDW, MPP

HORTONWORKS DATA PLATFORM

OPERATIONAL TOOLS: MANAGE & MONITOR

DEV & DATA TOOLS: BUILD & TEST

DATA SOURCES

Traditional Sources (RDBMS, OLTP, OLAP)

New Sources (web logs, email, sensors, social media)

Page 22: Kognitio Spark! Modern Data Platform

Analytical Platform