Building the Modern Data Hub:
Beyond the Traditional Enterprise Data Warehouse
www.datavail.com
The New World of Data
90% of the world’s information was created in the last two years, and 80% of all enterprise data is unstructured. That means it is not the neat and tidy data that for decades has been held in relational databases, which in turn plug nicely into “business intelligence” tools, enterprise data warehouses and other traditional data analytics systems.
Today’s data needs different tools. And it requires a different sort of data scientist.
The EDW Analytic Conundrum
Modern DataHub
● Flexible - add new data easily
● Fresh - up-to-date data, near real time
● Any query, no matter how complex
● Rapid deployment - days to weeks
Traditional EDW
● ETL based - brittle, hard to add new data sources
● Stale - data can be out of date
● Limited - queries are limited by what data is available
● Slow - months to deploy or update
The Traditional Data Warehouse
[Diagram: Extract, Load & Transform Processes -> Star Schema Data Warehouse (EDW) -> Data Visualization]
Significant investment in planning, development, monitoring & maintenance
What’s the ROI?
How long is this going to take?
Are we sure these are the right reports?
How quickly can we make changes?
Today’s Traditional EDW Problems: Extraction, Transformation & Data Loading
• Highly transformative, structured ETLs are a costly investment on many levels, from development, monitoring and tuning to operational maintenance & remediation
• Target schema structures require planning based on end goals, but often those goals are not well defined
• Often today the data we have is both structured and unstructured
• Traditional EDWs are a long-term investment and the ROI is often hard to measure
• Perishable insights that require fast turnaround (Super Bowl, Mother's Day, Thanksgiving, etc.) are difficult to capture in traditional EDWs
Today’s Traditional EDW Problems: Visualization & Reporting
• Traditional analytic reporting is predicated on structured schemas (star, snowflake, relational, etc.)
  - If these are not planned well, performance problems can result
  - Hard structures can lead to missing metrics and reporting opportunities
• Any reworking of the final analytics that requires new metrics or data elements often means going back to the ETL to add the missing elements
• Producing insights and reporting for new trends can be time consuming when predicated on pre-planned data structures
• Missed opportunities on perishable insights (Super Bowl, Mother's Day, Thanksgiving, etc.)
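The rework problem described above can be illustrated with a small sketch. It is not taken from any particular ETL product: the field names (`order_id`, `amount`, `promo_code`) and the functions `load_fixed_schema`, `load_raw` and `revenue_by_promo` are hypothetical. It contrasts a load step that keeps only pre-planned columns with one that stores the raw record, so that an unplanned metric can still be computed later.

```python
import json

# Hypothetical raw events from a source system; the second one carries
# a field that did not exist when the warehouse schema was planned.
RAW_EVENTS = [
    '{"order_id": 1, "amount": 20.0}',
    '{"order_id": 2, "amount": 35.5, "promo_code": "GAMEDAY"}',
]

# Columns fixed at planning time (an assumption made for this sketch).
PLANNED_COLUMNS = ("order_id", "amount")


def load_fixed_schema(raw_events):
    """ETL-style load: keep only the pre-planned columns, drop the rest."""
    rows = []
    for line in raw_events:
        event = json.loads(line)
        rows.append({col: event.get(col) for col in PLANNED_COLUMNS})
    return rows


def load_raw(raw_events):
    """Schema-less load: keep every field exactly as it arrived."""
    return [json.loads(line) for line in raw_events]


def revenue_by_promo(rows):
    """A metric nobody planned for: revenue grouped by promo code."""
    totals = {}
    for row in rows:
        key = row.get("promo_code", "none")
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


fixed_rows = load_fixed_schema(RAW_EVENTS)
raw_rows = load_raw(RAW_EVENTS)

print(revenue_by_promo(fixed_rows))  # {'none': 55.5}
print(revenue_by_promo(raw_rows))    # {'none': 20.0, 'GAMEDAY': 35.5}
```

With the fixed schema, the promo code was discarded at load time, so the new metric cannot be produced without changing the load step and reloading. With the raw records it is available immediately.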
A Proposed Modern Approach
[Diagram. Labels: OLTP; Unstructured Data; Other Data Sources; ETL / ELT; Staging; Star Schema EDW; Data Mart; Cubes; Reporting; MongoDB JSON Data Warehouse (No Predetermined Schema)]
Immediate access to data for analytic insights, fast ROI & planning
NoSQL as Source for Visualization
[Diagram. Labels: JSON; Structured Data (RDBMS; Cloud: AWS, Azure, etc.); NoSQL (MongoDB, Spark); Hadoop (Hadoop HDFS; JSON, CSV, XML Data Lake; No Predetermined Schema); BI Connector; BI Tools * (Tableau, PowerBI, Spotfire); Reporting]
Hadoop Data Lakes & Data Hubs
• Hadoop is NOT a database; it’s a filesystem
  - Impala, Cassandra or just JSON, XML, CSV files
• SlamData connects to Hadoop using Spark (both written in Scala)
• Much simpler to implement than 1st generation data hubs/lakes
[Diagram: three Historical Data sources feeding a Hadoop HDFS data lake of JSON, CSV and XML files with no predetermined schema]
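As a rough illustration of what "no predetermined schema" means for a lake of JSON, CSV and XML files, the sketch below reads all three formats into plain records and applies structure only at query time. It uses only the Python standard library and in-memory strings in place of real files. The sample contents, the field names (`host`, `status`) and the helper functions are assumptions made for the example, and it shows neither Hadoop nor SlamData themselves.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical file contents; in a real lake these would be files on disk.
JSON_TEXT = '[{"host": "web1", "status": 200}, {"host": "web2", "status": 500}]'
CSV_TEXT = "host,status\ndb1,200\n"
XML_TEXT = "<events><event><host>cache1</host><status>503</status></event></events>"


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return list(json.loads(text))


def records_from_csv(text):
    """Parse CSV with a header row into a list of dicts (values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_xml(text):
    """Turn each <event> element into a dict of child tag -> text."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in event}
            for event in root.iter("event")]


def load_all():
    """Combine all sources and normalize types at read time, not load time."""
    records = (records_from_json(JSON_TEXT)
               + records_from_csv(CSV_TEXT)
               + records_from_xml(XML_TEXT))
    for record in records:
        # CSV and XML values arrive as strings; convert when reading.
        record["status"] = int(record["status"])
    return records


records = load_all()
# Ad-hoc question asked after the fact: which hosts returned server errors?
errors = [r["host"] for r in records if r["status"] >= 500]
print(errors)  # ['web2', 'cache1']
```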
What is SlamData?
NOT
• SlamData is not a database
• SlamData is not a monitoring tool
• SlamData is not an ETL tool
• SlamData is not NoSQL
• SlamData is not a replacement for SQL Server, Oracle, DB2, MySQL, Informix, etc.
• SlamData is not expensive
IS
• SlamData is an analytics engine
• SlamData uses SQL² for queries
• SlamData will natively connect to MongoDB, Hadoop (eventually SQL, Oracle, MySQL, flat files, and more)
• SlamData solves the problem of directly querying JSON, CSV, etc.
• SlamData spans a huge gap in traditional data warehouse needs
Examples of SlamData in Action
Interactive reports
• Live interactive reports. Embed them as real-time visuals in your own Analytics Dashboard or share them as quick insights.
Complex queries over nested data
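To give a sense of what a query over nested data involves, here is a minimal Python sketch. It is not SlamData's query syntax or engine: the documents, field names and the functions `flatten_items` and `revenue_by_region` are hypothetical. It flattens an array nested inside each document and aggregates across documents, which is the kind of work a flat relational schema would otherwise require an ETL step to prepare.

```python
from collections import defaultdict

# Hypothetical nested documents, similar in shape to what a document
# store such as MongoDB might hold.
ORDERS = [
    {"customer": {"name": "Ann", "region": "East"},
     "items": [{"sku": "A1", "qty": 2, "price": 5.0},
               {"sku": "B2", "qty": 1, "price": 12.0}]},
    {"customer": {"name": "Bob", "region": "West"},
     "items": [{"sku": "A1", "qty": 4, "price": 5.0}]},
    {"customer": {"name": "Cy", "region": "East"},
     "items": []},
]


def flatten_items(orders):
    """Yield one flat row per nested line item, carrying parent fields along."""
    for order in orders:
        for item in order.get("items", []):
            yield {
                "region": order["customer"]["region"],
                "sku": item["sku"],
                "revenue": item["qty"] * item["price"],
            }


def revenue_by_region(orders):
    """Group the flattened rows by region and sum revenue."""
    totals = defaultdict(float)
    for row in flatten_items(orders):
        totals[row["region"]] += row["revenue"]
    return dict(totals)


result = revenue_by_region(ORDERS)
print(result)  # {'East': 22.0, 'West': 20.0}
```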
Chart out Machine Data
• Machine data visualizations are quick and easy. Embed them as real-time visuals in your own Analytics Dashboard or share them as quick insights.
The Value of SlamData
When Could This Solution Make Sense?
1. You are using MongoDB and getting reporting out is a struggle
2. You’re planning a traditional data warehouse project, the 6-12 month time frame is daunting, and you need better report planning to determine ROI
3. You are using a product like Splunk to capture machine data and it’s become too expensive
4. You have Hadoop or are planning to implement Hadoop as a DataLake or DataHub
Why this approach? Simple: save time and money
• Scoping the EDW is simpler
  - Imagine the ability to eliminate the overhead of planning the data structure before you know the end analytic needs
• ETL development is less complex
  - If the task is defined as just capturing and storing the data, it becomes much simpler
• Implement solutions in days to weeks, not weeks to months
• SAVE $$$$: less costly storage options, no ETL software, less maintenance, lower cost to implement
Case Studies
Global technology company
Needs:
• Consolidated security and log analytics
• Ability to run complex ad-hoc queries without limitations
• Share and publish results easily
Solution:
• Using MongoDB to capture logs live
• SlamData for ad-hoc queries and visualizations
Large government agency
Needs:
• Consolidate data from 5+ data sources in various formats
• Answer ad-hoc questions in minutes to hours, not days to weeks
• Data is perishable; slow, brittle ETL or data mapping was not a good option
Solution:
• Consolidate data into a MongoDB datahub
• Use SlamData for building rapid reports that can be shared and published
So What’s the Next Step?
• Let us show you - give us your toughest data analytics problem
• Deliver a POC in two weeks or less
• SlamData is the missing piece of the data lake/data hub
• Fast time to value, less cost
• Leverage current SQL skills, lower the learning curve
• Build powerful reports and dashboards in minutes, on live data