IS HADOOP THE DEMISE OF DATA WAREHOUSING?
THOUGHTS ON THE IMPACT OF HADOOP ON BI SYSTEMS AND DATA WAREHOUSING
Part of our BI Demystified Series
John Peterson
CEO & Co-Founder
Senturus
Today’s Presenter
With thanks to:
Guy Wilnai, Sujee Maniyam and Knowledge @ Senturus
AGENDA
• INTRODUCTION
• THE DATA CHALLENGE
• WHAT IS HADOOP?
• ADVANTAGES & CHALLENGES
• IMPLICATIONS, PREDICTIONS & MISC. MUSINGS
• CONCLUSIONS
• Q&A
Copyright 2014 Senturus, Inc. All Rights Reserved
PRESENTATION SLIDE DECK ON WWW.SENTURUS.COM
WHO WE ARE
SENTURUS INTRODUCTION
Our Team:
Business depth combined with technical expertise. Former CFOs, CIOs, Controllers, Directors, BI Managers
SENTURUS: BUSINESS ANALYTICS CONSULTANTS
Business Intelligence Enterprise Planning Predictive Analytics
Creating Clarity from Chaos
A FEW OF OUR TEAM MEMBERS (FORMER ROLES)
Deep & Pragmatic Experience
• Former Head of BI / Lead Architect – VISA
• Former Chief BI Architect – Jamba Juice
• Former Head of BI – Dole
• Former Chief BI Architect – Cisco
• Former Chief BI Architect – Central Garden & Pet
• Former Head of BI – Experian
• Former Head of BI – Robert Half International
• Former Head of Training (IBM Cognos, Southern California)
• Former Controller – The GAP
• Two former CFOs
• Former Partner – PWC ($50 million+ projects)
• Several former Vice Presidents of Marketing, Sales & Manufacturing/Supply Chain
• Several former COOs
• Several former CIOs
• Average experience = over 20 years
750+ CLIENTS, 1600+ PROJECTS, 13+ YEARS
Outpacing our ability to harness it
THE DATA CHALLENGE
THE CHALLENGES (AND OPPORTUNITIES)
• Data volumes & velocity increasing exponentially
• Data types proliferating
• Rapid emergence of less structured (or unstructured) data sources
• Value of Data increasing
• Traditional ETL is time-consuming and costly
• Traditional storage costs skyrocketing (in total spend, not in $/TB)
• Business users increasingly frustrated at not being able to get access to information
THE NET RESULT
Something is bound to happen
A WARNING ABOUT TODAY’S FOCUS
IS ABOUT:
Hadoop as a potential platform or tool for Business Analytics & DW
IS NOT ABOUT:
Yet another “How Big Data will change the world” paradigm-shift prediction
ROLE OF HADOOP IN YOUR ENVIRONMENT
QUICK POLL
Under the Covers
WHAT IS HADOOP?
WHAT IS HADOOP?
Hadoop is a stuffed elephant
WHAT IS HADOOP REALLY?
• Hadoop is an open source distributed storage and processing framework
• Hadoop vs. RDBMS

Layer      Typical RDBMS       Hadoop Stack
Storage    Database Tables     HDFS Files*
Metadata   System Tables       HCatalog & YARN
Queries    SQL Query Engine    Multiple Engines

*Raw data to highly structured
Typical RDBMS: all layers combined in a proprietary bundle
Hadoop Stack: all layers separate and independent, allowing flexible access
REFERENCE ARCHITECTURE
Source: Hortonworks
REFERENCE ARCHITECTURE (DETAILED)
Source: Hortonworks
HADOOP STACK DISTRIBUTIONS
Distribution       Open Source   Premium
Apache             Y             N
Cloudera           Y             Y
Hortonworks        Y             N
MapR               Y (?)         Y
Intel              N             Y
EMC Greenplum HD   N             Y
ADVANTAGES OF HADOOP (FOR BI)
• Dramatically lower cost
– 50x to 100x (or more)
• Can store virtually any data type
• Can support multiple analytic engines
• Massively scalable
– Both Size and Performance
– Hundreds of nodes, terabytes of RAM, petabytes of storage
• Open-source leads to rapid innovation
HADOOP OFFERS COST EFFECTIVE STORAGE
“A recent survey of large financial services firms, telecommunications carriers and retailers indicated that storing data in an RDBMS typically runs between $30,000 and $100,000 (USD) per TB per year in total costs”
Source: Cloudera white paper
• Hadoop can bring down the cost to ~$1,000 / TB
BIG DATA COST COMPARISON
Source: Neustar
BIG DATA COST COMPARISON
Source: Hortonworks
COST CASE STUDY (TELECOM)
• The carrier’s previous data processing environment was costing $59 million (USD) each year to manage 1PB of data, broken down as follows:
– $2 million (USD) per year = storage for 1PB raw archive data on network-attached storage (NAS) at $2,000 per TB per year
– $55 million (USD) per year = management and backup of 1PB processed data on EDW at $55,000 per TB per year
– $2 million (USD) per year = administration costs calculated at $1,000 per TB per year
• Calculating costs for moving data processing onto Cloudera, the carrier reduced infrastructure costs to $5.1 million (USD) total:
– $5 million (USD) per year = hardware, software and infrastructure for 1PB at $5,000 per TB per year
– $100,000 (USD) per year = administration costs calculated at $100 per TB per year
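The arithmetic behind this case study can be checked with a short sketch. It uses only the figures quoted above; treating 1 PB as 1,000 TB is an assumption implied by the per-TB rates, and the variable and function names are illustrative.

```python
# Figures copied from the telecom case study; 1 PB is treated as 1,000 TB
# (an assumption, consistent with the per-TB rates on the slide).
TB_PER_PB = 1_000

# Previous environment: annual line-item totals as stated (USD).
legacy_costs = {
    "NAS raw archive storage": 2_000_000,
    "EDW management and backup": 55_000_000,
    "administration": 2_000_000,
}

# Hadoop environment: annual per-TB rates as stated (USD).
hadoop_rates_per_tb = {
    "hardware, software and infrastructure": 5_000,
    "administration": 100,
}


def annual_total_from_rates(rates_per_tb, terabytes):
    """Sum each per-TB rate multiplied by the number of terabytes managed."""
    return sum(rate * terabytes for rate in rates_per_tb.values())


legacy_total = sum(legacy_costs.values())
hadoop_total = annual_total_from_rates(hadoop_rates_per_tb, TB_PER_PB)
savings = legacy_total - hadoop_total
reduction_factor = legacy_total / hadoop_total

print(f"Previous environment: ${legacy_total:,} per year")
print(f"Hadoop environment:   ${hadoop_total:,} per year")
print(f"Annual savings:       ${savings:,}")
print(f"Reduction factor:     {reduction_factor:.1f}x")
```

On these numbers the previous environment totals $59 million and the Hadoop environment $5.1 million per year, a reduction of roughly 11.6x for this particular carrier.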
HADOOP CAN STORE ANY DATA TYPE
• Key-value pairs
• Text and binary data
• Structured
– Database records
• Semi-structured
– Sensor & Machine data
– Log files
• Un-structured
– Emails, tweets
“Set structure at query time”
Can retain atomic-level data
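“Set structure at query time” can be illustrated with a small sketch in plain Python. It is not Hadoop code: the sample records, field names, and the `read_with_schema` helper are all hypothetical. It only shows raw data being stored untouched while each query applies its own structure when reading.

```python
import json

# Raw events kept exactly as they arrived (atomic level, no upfront schema).
raw_lines = [
    '{"ts": "2014-05-01T10:00:00", "user": "a", "amount": "19.99", "channel": "web"}',
    '{"ts": "2014-05-01T10:05:00", "user": "b", "amount": "5.00"}',
    'corrupted line that is not JSON',
    '{"ts": "2014-05-02T09:30:00", "user": "a", "amount": "7.50", "channel": "store"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading, not while storing."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable rows are skipped at read time; stored data is untouched
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two different "queries" impose two different structures on the same raw data.
sales_schema = {"user": str, "amount": float}
channel_schema = {"channel": str}

total_by_user = {}
for row in read_with_schema(raw_lines, sales_schema):
    total_by_user[row["user"]] = total_by_user.get(row["user"], 0.0) + row["amount"]

channels = [row["channel"] for row in read_with_schema(raw_lines, channel_schema)]

print(total_by_user)
print(channels)
```

Because nothing is discarded at load time, a later question (for example, one that needs the `ts` field) can still be answered from the same stored lines.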
ANALYTICS IN HADOOP
• ‘Batch’ or ‘offline’ analytics
– MapReduce-based tools (Java MapReduce, Streaming, Pig, Hive)
– Available from the start and well understood
• Fast ad hoc querying
– New wave of processing; Hadoop's answer to MPP databases (Teradata, etc.)
– Impala (Cloudera), Stinger / Tez (Hortonworks), Shark on Spark (Apache)
• Streaming / near-real-time workloads
– Storm, Spark
– Propelled by the YARN processing framework in Hadoop version 2.x
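For readers new to the batch model behind the MapReduce-based tools, here is a minimal single-process sketch of the map, shuffle, and reduce steps. It is plain Python, not the Hadoop API; the function names and sample log lines are hypothetical, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record; here, one pair per word."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


log_lines = ["error disk full", "warning disk slow", "error network down"]

mapped = (pair for line in log_lines for pair in map_phase(line))
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Tools such as Pig and Hive generate jobs of this shape from higher-level scripts or SQL-like queries, which is why hand-written MapReduce is rarely required.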
ANALYTICS IN HADOOP (CONT.)
• BI Tools integration – Rich BI tool integration
– Various levels of integration (basic, native, high-speed)
– Lots of vendors : Datameer, Pentaho, Tableau, QlikView, IBM Cognos…
• NoSQL store – Find data very quickly (milliseconds, just like a traditional database)
– HBase
• Statistical Tools – R
• And, of course, the old favorite – SQL
– Example: InfiniDB (Calpont)
CHALLENGES OF HADOOP
• Everything is very NEW
• Playing field is changing DAILY
– The Wild West
• Tools still in v1.0 mode (at best)
• Does not eliminate the need for dimensional modeling
• Security TBD
• No “standard” (winners) declared yet
• Lots of rough edges still
• Simple things, like surrogate keys…
A DIZZYING FIELD OF PLAYERS
• Alpine Data Labs, San Mateo, CA.
• Cloudera, Palo Alto, CA.
• Concurrent, San Francisco, CA.
• Continuum Analytics, Austin, TX.
• Continuuity, Palo Alto, CA.
• Couchbase, Mountain View, CA.
• Datameer, San Mateo, CA.
• DataSift, San Francisco, CA.
• DataStax, San Francisco, CA.
• DataXu, Boston, MA.
• Enigma, New York, NY.
• Factual, Los Angeles, CA.
• GoodData, San Francisco, CA.
• Gravity, New York, NY.
• Guavus, San Mateo, CA.
• Hadapt, Cambridge, MA
• Hopper, Cambridge, MA.
• Hortonworks, Palo Alto, CA.
• KarmaSphere, Cupertino, CA
• Lattice Engines, San Mateo, CA.
• MapR Technologies, San Jose, CA.
• MemSQL, New York, NY.
• Mortar Data, New York, NY.
• Mu Sigma, Northbrook, IL + India.
• Neo Technology, San Mateo, CA
• Opera Solutions, San Diego, CA + India.
• ParAccel, Campbell, CA.
• Pivotal Software, Palo Alto, CA
• Platfora, San Mateo, CA.
• RainStor, San Francisco, CA.
• Rocket Fuel, Redwood City, CA.
• SiSense, Redwood Shores, CA and Israel.
• Skytree, Atlanta, GA.
• Splice Machine, San Francisco, CA.
• Splunk, San Francisco, CA
• Statwing, San Francisco, CA.
• SumAll, New York, NY.
• Talend, Los Altos, CA.
• WibiData, San Francisco, CA.
• Zettaset, Mountain View, CA
• Zoomdata, Reston, VA.
• 10gen, New York, NY
• 1010data, New York, NY.
Partial snapshot as of May 2014
IMPLICATIONS, PREDICTIONS & MISC. MUSINGS
TSUNAMI WARNING
IMPLICATIONS, PREDICTIONS & MUSINGS
• Hadoop as a Data Staging environment
• Hadoop as an Archive
• Hadoop as the Data Warehouse
– “Enterprise Data Hub”
• Future role of RDBMSs?
– For OLTP
– For Data Warehouse
• How much Transformation and where?
TYPICAL “BEST PRACTICES” BI ARCHITECTURE
INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)
[Architecture diagram; labels recovered from the slide:]
• Source Systems of Record: ERP Data, CRM Data, Planning Data, Other Sources
• Data Integration
• Data Warehouse: Conforming Business Process Dimensional Models
• Information Security
• Data Abstraction Model
• Web Portal
• Standard Reports, Ad hoc Querying, Slicing & Dicing, Dashboard Authoring, Report Authoring, Dashboards/Scorecards, Threshold Alerting
• Self-service Reporting & Analysis
• Single Version of the Truth
• Threshold-based Alerts
POTENTIAL BI ARCHITECTURE USING HADOOP
INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)
[Architecture diagram; labels recovered from the slide:]
• Source Systems of Record: ERP Data, CRM Data, Planning Data, Other Sources
• Hadoop Data Staging
• Data Integration
• Data Warehouse: Conforming Business Process Dimensional Models
• Information Security
• Data Abstraction Model
• Web Portal
• Standard Reports, Ad hoc Querying, Slicing & Dicing, Dashboard Authoring, Report Authoring, Dashboards/Scorecards, Threshold Alerting
• Self-service Reporting & Analysis
• Single Version of the Truth
• Threshold-based Alerts
IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)
• What have I got to learn?
– MapReduce = No
– Hand-coding = No
– Sqoop = Maybe
– SQL = YES
• Role of Existing Tools going forward
– ETL
– BI Front-ends
• Role of DW Appliances?
– HANA
– IBM PureData System (formerly Netezza), etc.
IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)
• What is the impact on end-users seeking information?
• We still need:
– Data delivered in business user-friendly state
– Rich, relevant and conforming dimensions
– Ability to account for dimension changes over time
– Good performance (transformation and aggregation)
– Ability to integrate with existing systems
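One common way to “account for dimension changes over time” is a Type 2 slowly changing dimension. That term and the sketch below are an illustration, not something the slides prescribe. The table layout, column names, and helper functions are hypothetical, and the surrogate key is a simple in-memory sequence, which is easy in one process and harder to coordinate across a cluster.

```python
from datetime import date

customer_dim = []  # each dict is one versioned row of a hypothetical customer dimension


def apply_change(dim, customer_id, region, effective):
    """Type 2 change: close the current row and append a new version."""
    current = next(
        (r for r in dim if r["customer_id"] == customer_id and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["region"] == region:
            return current["customer_key"]  # nothing changed; keep the current version
        current["valid_to"] = effective  # close out the old version
    new_key = len(dim) + 1  # surrogate key from a simple sequence (single process only)
    dim.append({
        "customer_key": new_key,
        "customer_id": customer_id,
        "region": region,
        "valid_from": effective,
        "valid_to": None,
    })
    return new_key


def key_as_of(dim, customer_id, on):
    """Return the surrogate key of the version that was valid on a given date."""
    for r in dim:
        if (r["customer_id"] == customer_id
                and r["valid_from"] <= on
                and (r["valid_to"] is None or on < r["valid_to"])):
            return r["customer_key"]
    return None


first_key = apply_change(customer_dim, "C1", "West", date(2013, 1, 1))
second_key = apply_change(customer_dim, "C1", "East", date(2014, 3, 1))

# Facts dated in 2013 join to the "West" version; facts after the move join to "East".
print(key_as_of(customer_dim, "C1", date(2013, 6, 1)))
print(key_as_of(customer_dim, "C1", date(2014, 5, 1)))
```

Whatever platform holds the data, history of this kind has to be modeled somewhere if reports are to show results as they stood at the time.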
JP’S CONCLUSION #1
Wow, this stuff is a BIG game changer
JP’S CONCLUSION #2
It’s too early to call on the specifics
JP’S CONCLUSION #3
DW Architectures & Technologies
are in a huge state of flux
But…
DW Principles still apply
Resources, Upcoming Events, Q&A
NEED MORE INFO?
• Cloudera & Ralph Kimball
– Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals – http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/best-practices-for-the-hadoop-data-warehouse-video.html
– Building a Hadoop Data Warehouse: Hadoop 101 for EDW Professionals – http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html
• MapR & Jack Norris – How (and Why) Hadoop is Changing the Data Warehousing Paradigm
– http://tdwi.org/articles/2013/08/13/hadoop-changing-dw-paradigm.aspx
• Hortonworks – http://hortonworks.com/hadoop/
• Senturus.com – http://senturus.com/resources/
– [email protected] or [email protected]
ADDITIONAL RESOURCES
Contact us for help on a POC
www.senturus.com
UPCOMING EVENTS
More info….
Q & A
Helping Companies Learn From the Past, Manage the Present and Shape the Future
www.senturus.com 888-601-6010 [email protected]
Thank You
Copyright 2014 by Senturus, Inc. This entire presentation is copyrighted and may not be reused or
distributed without the written consent of Senturus, Inc.