Big Data and the BI Wild West
Don’t Bring an Elephant to a Gun Fight!
Paul Groom
Tools
Processes
Objectives
Why Business Intelligence?
View
Learn
Action
Community
Acquire
What is Business Intelligence?
Numbers
Tables
Charts
Indicators
Time - History - Lag
Access - to view (portal) - to data - to depth - Control/Secure
Consumption - digestion
…with ease and simplicity
Business [Intelligence] Desires
More timely
Lower latency
More granularity
More user interactions
Richer data model
Self service
View and generate
Got mobile?
200 million: employees bring their own device to work
Nearly half: of the workforce will be made up of millennials by 2020
50%: of companies with BYOD have had a security breach
1/3: have broken or would break corporate policy on BYOD
Data flow
Dynamic access
Drill unlimited
Disruption: Data Discovery tools
BI tools have plateaued…again
Decision Support (Reporting) in the late ’90s
Business Intelligence of the ’00s
…led to data mining
…leading to analytics and data science
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behaviour modelling
The drive for deeper understanding
Reporting & BPM
Fraud detection
Dynamic Interaction
Technology/Automation
Analytical Complexity
Campaign Management
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES INTEGER )
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr(
# Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE, row.names=1)
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
forecast<-array(0,c(dim1[1]+28,4))
colnames(forecast)<-c("ID","ACTUAL","PREDICTED","RESIDUALS")
select Trans_Year, Num_Trans,
  count(distinct Account_ID) Num_Accts,
  sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
  cast(sum(total_spend)/1000 as int) Total_Spend,
  cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
  rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
  rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
  select Account_ID,
    extract(year from Effective_Date) Trans_Year,
    count(Transaction_ID) Num_Trans,
    sum(Transaction_Amount) Total_Spend,
    avg(Transaction_Amount) Avg_Spend
  from Transaction_fact
  where extract(year from Effective_Date) < 2009
    and Trans_Type = 'D'
    and Account_ID <> 9025011
    and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid = 1)
  group by Account_ID, extract(year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;
select dept, sum(sales)
from sales_fact
where period between date '01-05-2006' and date '31-05-2006'
group by dept
having sum(sales) > 50000;
select sum(sales) from sales_history where year = 2006 and month = 5 and region=1;
select total_sales from summary where year = 2006 and month = 5 and region=1;
Behind the numbers
It’s all about getting work done
Bottlenecks
Tasks evolving:
Used to be simple fetch of value
Then was compute dynamic aggregate
Now complex algorithms!
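The three stages of evolving tasks can be illustrated with a small sketch. It is not taken from the deck: it uses Python's built-in sqlite3 module, and the table layout, data, and function names (`build_demo_db`, `fetch_value`, `dynamic_aggregate`, `linear_fit`) are all hypothetical. It only shows how the work per question grows from a lookup, to a scan with aggregation, to an algorithm over the detail rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up sales data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (day INTEGER, dept TEXT, sales REAL)")
    # Hypothetical data: two departments, ten days, sales rising linearly.
    rows = [
        (day, dept, base + step * day)
        for dept, base, step in (("toys", 100.0, 5.0), ("books", 50.0, 2.0))
        for day in range(10)
    ]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    # Stage 1 depends on a summary table that was computed ahead of time.
    conn.execute(
        "CREATE TABLE summary AS "
        "SELECT dept, SUM(sales) AS total_sales FROM sales_fact GROUP BY dept"
    )
    return conn


def fetch_value(conn, dept):
    """Stage 1: simple fetch of a precomputed value."""
    row = conn.execute(
        "SELECT total_sales FROM summary WHERE dept = ?", (dept,)
    ).fetchone()
    return row[0]


def dynamic_aggregate(conn, dept, first_day, last_day):
    """Stage 2: aggregate computed at query time over an arbitrary range."""
    row = conn.execute(
        "SELECT SUM(sales) FROM sales_fact "
        "WHERE dept = ? AND day BETWEEN ? AND ?",
        (dept, first_day, last_day),
    ).fetchone()
    return row[0]


def linear_fit(conn, dept):
    """Stage 3: an algorithm over detail rows (ordinary least squares)."""
    points = conn.execute(
        "SELECT day, sales FROM sales_fact WHERE dept = ? ORDER BY day", (dept,)
    ).fetchall()
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    db = build_demo_db()
    print(fetch_value(db, "toys"))
    print(dynamic_aggregate(db, "toys", 0, 4))
    print(linear_fit(db, "toys"))
```

Only the first stage can be answered from a summary table; the other two have to touch the detail rows, which is where the bottlenecks appear.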
Bottlenecks
Time to influence
Reaction – what? – potential value
Action – opportunity - interaction
BI is becoming democratized
BI Wild West
Data
Business [Intelligence] Desires in relation to Big Data
More timely
Lower latency
More granularity
More user interactions
Richer data model
Self service
The Data Warehouse?
Realities
Reports against the DW are just plain dull, boring even!
And then came…
Hadoop ticks many but not all the boxes
Stomped on costs
Made economics of scale practical
Talk to BI team about plugging into Hadoop – should be simple?
No need to pre-process before storage i.e. no need to align to storage
No need to triage before storage
New economics = New attitude
just grab and retain all data
the data science team will dig into it later
Call IT: Why is SQL so limited?
Early bridge Building
Early Hadoop integration tools
The new bounty hunters:
Drill
Impala
Pivotal
Stinger
The No SQL Posse
Wanted
Dead or Alive
SQL
…but Hadoop still too slow for interactive BI
…loss of train-of-thought
For once technology is on our side
…oh and BTW RAM is cheap!
CPU
Network
Storage
Lots of these
Not so many of these
Hadoop is…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk
‘Flash’ washing is not the solution
Analytics needs
low latency, no I/O wait
Analytical Platform Reference Architecture
Analytical Platform Layer
Near-line Storage (optional)
Application & Client Layer
All BI Tools, All OLAP Clients, Excel
Persistence Layer
Hadoop Clusters
Enterprise Data Warehouses
Legacy Systems
Kognitio Storage
Reporting
Cloud Storage
SQL, MDX
Cognos
Reach out, actively select and pull back to consume
MPP everything – get more work done
“No SQL” graduates to “not-only-SQL”
SQL remains the preferred data access language … for the business community
SQL can encapsulate other processing - in-line Python, R, Java etc.
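As a rough illustration of SQL wrapping other processing, the sketch below registers a Python function so that it can be called in-line from a SQL statement. It uses the `create_function` call from Python's built-in sqlite3 module as a stand-in; it is not the external script mechanism shown elsewhere in this deck, and the `base_sales` helper, the `daily` table, and its data are hypothetical.

```python
import sqlite3


def base_sales(daily_sales, dow_weight):
    """Hypothetical helper: remove a day-of-week effect from a sales figure."""
    if dow_weight is None or dow_weight == 0:
        return None  # surfaces as SQL NULL, which SUM() ignores
    return daily_sales / dow_weight


def open_demo_db():
    """Create an in-memory database and expose base_sales() to SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function under a SQL-callable name taking 2 arguments.
    conn.create_function("base_sales", 2, base_sales)
    conn.execute(
        "CREATE TABLE daily (prodno INTEGER, dow INTEGER, sales REAL, dow_weight REAL)"
    )
    # Made-up rows: (product, day of week, sales, day-of-week weight).
    conn.executemany(
        "INSERT INTO daily VALUES (?, ?, ?, ?)",
        [(1, 0, 50.0, 0.5), (1, 1, 200.0, 2.0), (2, 0, 30.0, 0.5)],
    )
    return conn


def base_sales_by_product(conn):
    """Plain SQL on the outside, Python logic applied row by row on the inside."""
    return conn.execute(
        "SELECT prodno, SUM(base_sales(sales, dow_weight)) "
        "FROM daily GROUP BY prodno ORDER BY prodno"
    ).fetchall()


if __name__ == "__main__":
    print(base_sales_by_product(open_demo_db()))
```

The business user still writes ordinary SQL; the non-SQL logic stays hidden behind a function name.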
Discovery
Production
Big Data + Hadoop + in-memory for BI
Wild West 1865 to 1890
"The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner.
The West not as a particular geographic place, but a frontier process - as a series of Wests on a receding frontier line - the point where savagery meets civilization.
For Turner, American history was largely a tale of people leaving settled areas for the frontier, and their struggle to survive in new lands.
Driving the golden spike for Hadoop and BI
connect
kognitio.com
kognitio.tel
kognitio.com/blog
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
contact
Michael Hiskey, VP, Marketing & Business [email protected]
Paul Groom, Chief Innovation [email protected]
Steve Friedberg - press contactMMI [email protected]
Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!