Big Data and the BI Wild West Don’t Bring an Elephant to a Gun Fight! Paul Groom

Page 1: Big Data and the BI Wild West

Big Data and the BI Wild West

Don’t Bring an Elephant to a Gun Fight!

Paul Groom

Page 2: Big Data and the BI Wild West
Page 3: Big Data and the BI Wild West

Tools

Processes

Objectives

Page 4: Big Data and the BI Wild West

Why Business Intelligence?

View

Learn

Action

Community

Acquire

Page 5: Big Data and the BI Wild West

What is Business Intelligence?

Numbers

Tables

Charts

Indicators

Time - History - Lag

Access - to view (portal) - to data - to depth - Control/Secure

Consumption - digestion

…with ease and simplicity

Page 6: Big Data and the BI Wild West

Business [Intelligence] Desires

More timely

Lower latency

More granularity

More user interactions

Richer data model

Self service

Page 7: Big Data and the BI Wild West

View and generate

Page 8: Big Data and the BI Wild West

Got mobile?

200 million employees bring their own device to work

Nearly half of the workforce will be made up of millennials by 2020

50% of BYOD companies have had a security breach

1/3 have broken or would break corporate policy on BYOD

Page 9: Big Data and the BI Wild West

Data flow

Page 10: Big Data and the BI Wild West
Page 11: Big Data and the BI Wild West

Dynamic access

Drill unlimited

Disruption: Data Discovery tools

Page 12: Big Data and the BI Wild West

BI tools have plateaued…again

Decision Support (Reporting) in late 90’s

Business Intelligence of 00’s

…led to data mining

…leading to analytics and data science

Page 13: Big Data and the BI Wild West

More math

…a lot more math

Page 14: Big Data and the BI Wild West

The drive for deeper understanding

[Figure: analytic techniques charted against two axes, Analytical Complexity and Technology/Automation]

Reporting & BPM

Fraud detection

Dynamic Interaction

Campaign Management

Behaviour modelling

Clustering

Statistical Analysis

Dynamic Simulation

Machine learning algorithms

Page 15: Big Data and the BI Wild West

create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES INTEGER )
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr(
# Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names=1)
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
forecast<-array(0,c(dim1[1]+28,4))
colnames(forecast)<-c("ID","ACTUAL","PREDICTED","RESIDUALS")

select Trans_Year, Num_Trans,
  count(distinct Account_ID) Num_Accts,
  sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
  cast(sum(total_spend)/1000 as int) Total_Spend,
  cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
  rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
  rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
  select Account_ID,
    Extract(Year from Effective_Date) Trans_Year,
    count(Transaction_ID) Num_Trans,
    sum(Transaction_Amount) Total_Spend,
    avg(Transaction_Amount) Avg_Spend
  from Transaction_fact
  where extract(year from Effective_Date)<2009
    and Trans_Type='D'
    and Account_ID<>9025011
    and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid =1)
  group by Account_ID, Extract(Year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;

select dept, sum(sales) from sales_fact where period between date '01-05-2006' and date '31-05-2006' group by dept having sum(sales) > 50000;

select sum(sales) from sales_history where year = 2006 and month = 5 and region=1;

select total_sales from summary where year = 2006 and month = 5 and region=1;

Behind the numbers

Page 16: Big Data and the BI Wild West

It’s all about getting work done

Bottlenecks

Tasks evolving:

Used to be simple fetch of value

Then was compute dynamic aggregate

Now complex algorithms!

Bottlenecks
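
The task evolution on this slide can be sketched in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions: the table names `summary` and `sales_history` echo the queries on the previous slide, while the `day` column, the sample rows, and the function names are hypothetical. It contrasts fetching a precomputed value, computing an aggregate on demand, and running a small algorithm (a least-squares slope) over the raw rows.

```python
import sqlite3

FILTER = "year = 2006 and month = 5 and region = 1"


def build_demo_db():
    """Create an in-memory database with invented rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_history "
        "(year int, month int, region int, day int, sales real)"
    )
    # Hypothetical data: five days of sales rising by 10 per day.
    rows = [(2006, 5, 1, d, 100.0 + 10.0 * d) for d in range(1, 6)]
    conn.executemany("insert into sales_history values (?, ?, ?, ?, ?)", rows)
    # The summary table holds the pre-aggregated totals.
    conn.execute(
        "create table summary (year int, month int, region int, total_sales real)"
    )
    conn.execute(
        "insert into summary "
        "select year, month, region, sum(sales) "
        "from sales_history group by year, month, region"
    )
    return conn


def fetch_value(conn):
    """Task 1: simple fetch of a precomputed value."""
    sql = "select total_sales from summary where " + FILTER
    return conn.execute(sql).fetchone()[0]


def dynamic_aggregate(conn):
    """Task 2: compute the aggregate at query time."""
    sql = "select sum(sales) from sales_history where " + FILTER
    return conn.execute(sql).fetchone()[0]


def linear_trend(conn):
    """Task 3: a small algorithm, the least-squares slope of sales against day."""
    sql = "select day, sales from sales_history where " + FILTER + " order by day"
    points = conn.execute(sql).fetchall()
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx
```

Each step touches more data and does more computation per answer than the one before it, which is where the bottlenecks named on the slide tend to appear.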

Page 17: Big Data and the BI Wild West

Time to influence

Reaction – what? – potential value

Action – opportunity - interaction

BI is becoming democratized

Page 18: Big Data and the BI Wild West

BI Wild West

Data

Page 19: Big Data and the BI Wild West

Business [Intelligence] Desires in relation to Big Data

More timely

Lower latency

More granularity

More user interactions

Richer data model

Self service

Page 20: Big Data and the BI Wild West

The Data Warehouse?

Page 21: Big Data and the BI Wild West
Page 22: Big Data and the BI Wild West

Realities

Page 23: Big Data and the BI Wild West
Page 24: Big Data and the BI Wild West

Reports against the DW are just plain dull, boring even!

Page 25: Big Data and the BI Wild West

And then came…

Page 26: Big Data and the BI Wild West

Hadoop ticks many but not all the boxes

Page 27: Big Data and the BI Wild West

Stomped on costs

Made economics of scale practical

Page 28: Big Data and the BI Wild West

Talk to BI team about plugging into Hadoop – should be simple?

No need to pre-process before storage i.e. no need to align to storage

No need to triage before storage

New economics = New attitude: just grab and retain all data; the data science team will dig into it later

Call IT: Why is SQL so limited?

Page 29: Big Data and the BI Wild West

Early bridge Building

Early Hadoop integration tools

Page 30: Big Data and the BI Wild West

The new bounty hunters:

Drill

Impala

Pivotal

Stinger

The No SQL Posse

Wanted

Dead or Alive

SQL

Page 31: Big Data and the BI Wild West

…but Hadoop still too slow for interactive BI

…loss of train-of-thought

Page 32: Big Data and the BI Wild West

For once technology is on our side

…oh and BTW RAM is cheap!

CPU

Network

Storage

Page 33: Big Data and the BI Wild West

Lots of these

Not so many of these

Hadoop is…

Hadoop inherently disk oriented

Typically low ratio of CPU to Disk

Page 34: Big Data and the BI Wild West

‘Flash’ washing is not the solution

Page 35: Big Data and the BI Wild West

Analytics needs

low latency, no I/O wait

Page 36: Big Data and the BI Wild West

Analytical Platform Reference Architecture

Analytical Platform Layer

Near-line Storage (optional)

Application & Client Layer

All BI Tools, All OLAP Clients, Excel

Persistence Layer

Hadoop Clusters

Enterprise Data Warehouses

Legacy Systems

Kognitio Storage

Reporting

Cloud Storage

Page 37: Big Data and the BI Wild West

SQL MDX

Cognos

Page 38: Big Data and the BI Wild West

Reach out, actively select and pull back to consume

Page 39: Big Data and the BI Wild West

MPP everything – get more work done

“No SQL” graduates to “not-only-SQL”

SQL remains preferred data access language … for business community

SQL can encapsulate other processing - in-line Python, R, Java etc.
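
One way to picture SQL encapsulating other processing is a user-defined function written in Python and called from a query. The sketch below uses `sqlite3.Connection.create_function` from the Python standard library as a stand-in. It is not the external script mechanism shown earlier in the deck, and `spend_band`, its thresholds, and the sample table are hypothetical.

```python
import sqlite3


def spend_band(amount):
    """Hypothetical scoring logic written in Python rather than SQL."""
    if amount >= 1000:
        return "high"
    if amount >= 100:
        return "medium"
    return "low"


def band_counts(amounts):
    """Load amounts into a table and count rows per band using SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it by name (one argument).
    conn.create_function("spend_band", 1, spend_band)
    conn.execute("create table transaction_fact (transaction_amount real)")
    conn.executemany(
        "insert into transaction_fact values (?)", [(a,) for a in amounts]
    )
    # The query stays plain SQL; the Python logic runs in-line per row.
    rows = conn.execute(
        "select spend_band(transaction_amount), count(*) "
        "from transaction_fact group by 1 order by 1"
    ).fetchall()
    conn.close()
    return dict(rows)
```

The caller still writes only SQL, so the business community keeps its preferred access language while the procedural logic lives inside the function.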

Page 40: Big Data and the BI Wild West

Discovery

Production

Page 41: Big Data and the BI Wild West

Big Data + Hadoop + in-memory for BI

Page 42: Big Data and the BI Wild West

Wild West 1865 to 1890

"The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner.

The West not as a particular geographic place, but a frontier process - as a series of Wests on a receding frontier line - the point where savagery meets civilization.

For Turner, American history was largely a tale of people leaving settled areas for the frontier, and their struggle to survive in new lands.

Page 43: Big Data and the BI Wild West

Driving the golden spike for Hadoop and BI

Page 44: Big Data and the BI Wild West

connect

kognitio.com

kognitio.tel

kognitio.com/blog

twitter.com/kognitio

linkedin.com/companies/kognitio

tinyurl.com/kognitio

youtube.com/kognitio

contact

Michael Hiskey
VP, Marketing & Business [email protected]

Paul Groom
Chief Innovation [email protected]

Steve Friedberg - press contact
MMI [email protected]

Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!