jaroslav-gergic
DESCRIPTION
The recent boom in big data processing and the democratization of the big data space have been enabled by the fact that most of the concepts that originated in the research labs of companies such as Google, Amazon, Yahoo and Facebook are now available as open source. Technologies such as Hadoop and Cassandra let businesses around the world become more data driven and tap into their massive data feeds to mine valuable insights. At the same time, these new big data technologies, and the big data technology stack as a whole, are still maturing. Many of the technologies originated from a particular use case, and attempts to apply them in a more generic fashion are hitting the limits of their technological foundations. In some areas, several competing technologies address the same set of use cases, which increases the risks and costs of big data implementations. We will show how GoodData solves the entire big data pipeline today, starting from raw data feeds all the way up to actionable business insights. All of this is provided as a hosted multi-tenant environment that lets customers solve their own analytical use case, or many analytical use cases for thousands of their own customers, all using the same platform and tools while being abstracted away from the technological details of the big data stack.
2014 GoodData Corporation. All Rights Reserved.
GoodData – the Case Study #2: Big Data Pipeline for Analytics at Scale
DB Technologies for Big Data @ FIT CVUT
November 19, 2014
GoodData Corporation
Traditional BI
• Data Visualization: Tableau, Qlikview, Spotfire, etc.
• Analytics Engine: Cognos, Oracle, Business Objects, etc.
• Data Marts: MySQL, PostgreSQL, etc.
• Data Warehouse: Oracle, Teradata, Netezza, Microsoft, etc.
• ETL: Informatica, DataStage, Boomi, Snaplogic, etc.
• Infrastructure: Servers, Storage, Networking, etc.
End to End, Analytics Platform as a Service
• Data Collaboration
• Data Visualization
• Analytics Engine
• Data Marts
• Data Warehouse
• ELT / ETL
• Infrastructure
One Platform. Two Markets.
• For Your Customers: Powered By GoodData Partner Program for disruptive ISVs including Zendesk, Switchfly, and Phizzle
• For Your Business: Drive your business with your data. Experience and accelerators for Social, Sales, Marketing, Yammer
Our Focus
Our Customers
What The End Users See...
What Is In The Box...
GoodData Platform Zoom-In
End to End, Analytics Platform as a Service
Data Collaboration
Data Visualization
Analytics Engine
Data Marts
Data Warehouse
ELT / ETL
Infrastructure
GoodData Analytics Platform - The Data Pipeline
Let’s Start With The Outcome - The Insights
• User Experience
  ○ Visual Appeal
  ○ Ease of Use
  ○ Performance
• Analytical Power
• Many Data Sources
  ○ Need to cross analyze all of them
  ○ Need to add/remove sources as needed
• Cost Efficiency
  ○ Computational density allowed by multi-tenancy
● Analytical Engine / MAQL
● Exploration, Visualization and Distribution Layer
● Pluggable Database Backends
● 10s of GB up to TBs
Behind The Scenes - The Big Data Pipeline
• Large Data Throughput
  ○ Close to Real-time Updates
• Many Data Sources
  ○ Need to cross analyze all of them
  ○ Need to add/remove sources as needed
• Agility
  ○ Capture all data without knowing the analytical use case in advance
• Cost Efficiency
  ○ Computational density allowed by multi-tenancy
• Big Data Store
  ○ 100s of TBs per customer
  ○ Persist All Incoming Data
  ○ CSV, XML, JSON, ...
• Immutable
  ○ Append Only
  ○ Keep Ingestion History
• Technologies
  ○ Amazon S3
  ○ Cloud Files
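The "Immutable / Append Only / Keep Ingestion History" idea above can be illustrated with a minimal sketch. It is not GoodData's implementation: the `AppendOnlyStore` class, the `source/timestamp/sha256` key layout, and the in-memory dictionary standing in for an object store such as Amazon S3 are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class AppendOnlyStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}   # key -> raw payload bytes
        self._history = []   # keys in ingestion order

    def append(self, source, payload, ingested_at):
        # Assumed key layout: source / ingestion timestamp / content hash.
        stamp = ingested_at.strftime("%Y%m%dT%H%M%S%f")
        digest = hashlib.sha256(payload).hexdigest()
        key = f"{source}/{stamp}/{digest}"
        if key in self._objects:
            # Immutability: an existing object is never overwritten.
            raise ValueError(f"refusing to overwrite {key}")
        self._objects[key] = payload
        self._history.append(key)
        return key

    def history(self, source):
        # Every version ever ingested for a source, oldest first.
        return [k for k in self._history if k.startswith(source + "/")]

    def get(self, key):
        return self._objects[key]


store = AppendOnlyStore()
t0 = datetime(2014, 11, 19, 9, 0, tzinfo=timezone.utc)
k1 = store.append("crm", b'{"id": 1, "status": "open"}', t0)
k2 = store.append("crm", b'{"id": 1, "status": "closed"}', t0 + timedelta(hours=1))
```

Because the second upload gets its own key instead of replacing the first, both states of the record remain available to later pipeline stages, which is what allows the analytical use case to be decided after the data has been captured.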
• Agile Data Warehouse
  ○ 10s of TBs per customer
  ○ Relational Model
  ○ Semi-Cleansed
  ○ Complete History Captured
• Technologies
  ○ HP Vertica
  ○ GoodData BI Integration Services
• Combine Input Stage Data Sets
  ○ Mapping, Cleansing
• Perform Data Transformations in Data Warehouse
  ○ Benchmarking, Snapshotting, Sampling
• Generate Data Mart Input Data
  ○ Data Warehouse : Data Mart relation is typically 1 : N
  ○ 10s of thousands of Data Marts in the Powered by GoodData (PbG, OEM) use case!
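The three steps above (map and cleanse, transform in the warehouse, fan out 1 : N into data marts) can be sketched in a few lines. This is a toy illustration in plain Python, not the CloudConnect or Vertica implementation; the column names, the `COLUMN_MAP` mapping, and the choice of one data mart per tenant are assumptions.

```python
from collections import defaultdict

# Hypothetical mapping from source column names to warehouse column names.
COLUMN_MAP = {"cust": "customer_id", "amt": "amount", "tenant": "tenant_id"}


def map_and_cleanse(raw_rows):
    """Rename columns and drop rows that lack a tenant or an amount."""
    cleaned = []
    for raw in raw_rows:
        row = {COLUMN_MAP[k]: v for k, v in raw.items() if k in COLUMN_MAP}
        if row.get("tenant_id") is None or row.get("amount") is None:
            continue
        row["amount"] = float(row["amount"])
        cleaned.append(row)
    return cleaned


def snapshot(rows, snapshot_date):
    """Snapshotting: stamp every row with the date it was captured."""
    return [dict(row, snapshot_date=snapshot_date) for row in rows]


def split_into_marts(rows):
    """1 : N fan-out: one data mart input per tenant (an assumed split key)."""
    marts = defaultdict(list)
    for row in rows:
        marts[row["tenant_id"]].append(row)
    return dict(marts)


raw = [
    {"cust": "c1", "amt": "10.5", "tenant": "acme"},
    {"cust": "c2", "amt": "3", "tenant": "globex"},
    {"cust": "c3", "amt": None, "tenant": "acme"},  # dropped by cleansing
    {"cust": "c4", "amt": "7", "tenant": "acme"},
]
warehouse = snapshot(map_and_cleanse(raw), "2014-11-19")
marts = split_into_marts(warehouse)
```

Here a single warehouse table yields two data mart inputs, `marts["acme"]` and `marts["globex"]`. In the OEM scenario the same fan-out would run across tens of thousands of tenants.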
• GoodData BI Integration Services
  ○ CloudConnect Runtime
  ○ Ruby Runtime
  ○ Data Integration Console
Over 2M ETL jobs per week!
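To put the weekly figure in perspective, the short calculation below converts it into an average rate. It assumes jobs are spread evenly over the week, which the slides do not claim, and since the stated figure is "over 2M" the results are lower bounds.

```python
JOBS_PER_WEEK = 2_000_000            # "over 2M", so treat as a lower bound
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

jobs_per_day = JOBS_PER_WEEK / 7
jobs_per_second = JOBS_PER_WEEK / SECONDS_PER_WEEK

print(f"at least {jobs_per_day:,.0f} jobs per day on average")
print(f"at least {jobs_per_second:.1f} jobs per second on average")
```

That works out to roughly 285,000 jobs per day, or a little over three jobs starting every second around the clock.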
The Wrap-Up - The Big Data Pipeline
Progression Through:
• Big Data Store
• Data Warehouse
• Data Marts
As a means to satisfy the end user:
• User Experience
• Analytical Power
• Many Data Sources
• Cost Efficiency
Questions?
Thank you!