

DATA LAKE - REBIRTH OF ENTERPRISE DATA THINKING

MAKING BIG DATA MEANINGFUL FOR ALL ENTERPRISES

WWW.AGILEISS.COM

Making Big Data Meaningful for All

By Raj Babu [email protected]

HADOOP IS NOT FOR A SELECT FEW, BUT FOR ALL ENTERPRISES


About Agile iSS

Agile iSS is a BI & Analytics services company serving clients on Big Data, Data Lake, BI, BI on Cloud, and BI/Analytics as a Service.

Our goal is to make Big Data meaningful for all enterprises.

We focus on helping our clients upgrade their current EXPENSIVE, ineffective BI solution built on old technology to a POWERFUL, EFFECTIVE BI & ANALYTICS solution with a lower TCO.


ENTERPRISE DATA LAKE (EDL)

I have just two goals for my 25-minute presentation today: to convince you all of the following.

• Big Data is not only a solution for the select few enterprises that have hundreds of TBs or ZBs of data. Big Data through an Enterprise Data Lake (EDL) is now mainstream and should be part of the standard IT stack for all mid-size and large enterprises.

• EDL makes enterprise BI systems more agile, nimble, economical and valuable.


Why is an Enterprise Data Lake solution (based on Big Data and NoSQL technology) combined with traditional BI a significantly more effective enterprise BI & Analytics solution than its predecessor, the EDW, which has tried and failed over the last two decades?

Why EDW Failed?

If you Google “Challenges with EDW”, you will get something like this:

• It takes too long to get anything done.

• BI is too expensive to build and manage, and it is never on the schedule that the business wants.

• Our BI team and system can’t implement changes fast.

• The architecture is overcomplicated.

• Our BI can’t do anything ad hoc. They need requirements, design, architecture and ETL for everything, and it never gets done after all.

• Our BI is always incomplete; it never has all the data we need.

• Our BI is not suitable for ad-hoc analytics.


Why EDW Failed and EDL Is Taking Over

It is extremely expensive and practically impossible to gather requirements, design, build ETL and store all the data needed in an EDW and data marts (DM). EDWs and data marts are optimized for data analysis by processing and storing only subsets of the datasets.

An EDL is designed to “RETAIN ALL DATASETS”. This is the single most powerful feature of EDL, because we can never know in advance the complete future scope of the datasets needed for analytics.
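To make the “retain all datasets” point concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the sample order records, the `EDW_COLUMNS` list, and the function names are hypothetical and not part of any product. It contrasts a load that keeps only the columns agreed in the original requirements with a lake-style load that keeps every record as received, and then answers a question nobody planned for.

```python
import json

# Hypothetical raw events as they arrive from a source system.
RAW_EVENTS = [
    '{"order_id": 1, "amount": 120.0, "region": "East", "coupon": "SPRING"}',
    '{"order_id": 2, "amount": 80.0, "region": "West", "coupon": null}',
    '{"order_id": 3, "amount": 45.5, "region": "East", "coupon": "SPRING"}',
]

# EDW-style load: only the columns named in the original requirements survive.
EDW_COLUMNS = ("order_id", "amount", "region")


def load_edw(raw_lines):
    """Parse each record and keep only the pre-agreed columns."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({col: record[col] for col in EDW_COLUMNS})
    return rows


def load_lake(raw_lines):
    """Retain every record exactly as received: no parsing, no filtering."""
    return list(raw_lines)


def revenue_by_coupon(lake_lines):
    """A question nobody asked at design time, answered from the raw data."""
    totals = {}
    for line in lake_lines:
        record = json.loads(line)
        key = record.get("coupon") or "none"
        totals[key] = totals.get(key, 0.0) + record["amount"]
    return totals


edw_rows = load_edw(RAW_EVENTS)
lake_lines = load_lake(RAW_EVENTS)

print("coupon" in edw_rows[0])        # False: the field was dropped at load time
print(revenue_by_coupon(lake_lines))  # {'SPRING': 165.5, 'none': 80.0}
```

In this toy setup, the coupon question can only be answered from the lake copy, because the subset load discarded that field before anyone knew it would matter.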

Why EDL clearly wins over EDW?

• Services ad-hoc requests with no latency and no development.

• Inexpensive and low-maintenance to manage, as there is no or very minimal build effort.

• Minimal development team involvement, unless data is needed in a data mart.

• All data is in the data lake. Users can work ad hoc, with no need for any SDLC to access new data.

• No more waiting. It is the perfect place to offload all new and ad-hoc requests.

• In EDL, neither ETL nor a database is needed for reporting or analytics.

• Offers a perfect solution: NO heavy-duty ETL.

What is a Data Lake?

If you Google “Data Lake”, you will get the following results:

From Wiktionary, “data lake”: A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing “Big Data”.

From TechTarget: A data lake is a large object-based storage repository that holds data in its native format until it is needed.

Etymology: Pentaho CTO James Dixon is credited with coining the term “data lake”, as he described it in his blog entry.
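As a rough illustration of “holds data in its native format until it is needed”, the Python sketch below keeps two datasets exactly as they were received (one CSV, one JSON Lines) and only interprets them at read time. The in-memory `LAKE` dictionary, the file names and the `read_records` helper are assumptions made up for this example; a real lake would sit on HDFS or object storage.

```python
import csv
import io
import json

# Hypothetical lake contents: file name -> raw text, stored exactly as received.
LAKE = {
    "crm/customers.csv": "id,name,country\n1,Asha,IN\n2,Bob,US\n",
    "web/clicks.jsonl": '{"user": 1, "page": "/home"}\n{"user": 2, "page": "/pricing"}\n',
}


def read_records(path):
    """Interpret a raw file only at the moment it is read."""
    text = LAKE[path]
    if path.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if path.endswith(".jsonl"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"no reader registered for {path}")


customers = {int(row["id"]): row["name"] for row in read_records("crm/customers.csv")}
clicks = read_records("web/clicks.jsonl")

# Ad-hoc join across two native-format datasets, with no load step in between.
for click in clicks:
    print(customers[click["user"]], click["page"])
```

The design choice shown here is that structure is applied by the reader, not by the store, so adding a new source means dropping a file in place rather than designing a table first.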

What is a Data Lake? (Cont.)

From Wiktionary:

Pentaho CTO James Dixon described it in his blog entry:

"If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."

What Data Lake has to Offer

[Figure: EDL diagram (image by PwC). Labels: ETL; EDL, ODS, Warm Archive; Data Marts. Annotation on the EDL: all kinds of analytics happen here, 85% analytics and 15% prototype reporting.]

Is EDL a Product or a Tool?

EDL is really a reference architecture for the enterprise BI solution, using Hadoop-based Big Data as the foundation. Many leading database vendors now see EDL as a clear winner and are incorporating it into their offerings, calling it a Data Hub.

Enterprise Data Lake (EDL) On-Premise Reference Architecture for BI & Analytics

[Figure: reference architecture diagram. Components: Enterprise Data; Traditional ETL; Big Data ETL; Data Lake on Hadoop (Hortonworks, Cloudera, MapR); Meta Data; Data Marts; Direct Analytics & Reporting; Analytics & Data Scientist.]

Enterprise Data Lake (EDL) On-Premise Reference Architecture for BI & Analytics (Stack View)

[Figure: stack view diagram. Components: Enterprise Data; Traditional ETL; Data Lake on Hadoop (Hortonworks, Cloudera, MapR); Meta Data; Data Marts; Analytics & Data Scientist.]


Reference Architecture for EDL on Cloud or Hybrid

Your EDL Can Be the Following

• A central enterprise data repository (ODS, Data Hub)

• Staging source for all systems

• A warm and active data archive/vault

• Hadoop Data Warehouse


Your EDL Can Service the Following

• Anyone and everyone who is impatient about getting their hands on data

• Those who can’t give requirements but wanted reports yesterday

• Those who have no patience for ETL or report development

• Analytics and data science teams

• The ETL team, for staging

• Teams that want to avoid buying DB capacity to store all data in the BI database

• Workloads where the volume of data is too high to process through a regular DB

Who Is Supporting Data Lake or Data Hub?

Explore EDL - There Is Nothing to Lose

With EDL there is no need for the expensive ETL, databases and long delays associated with your BI & Analytics platform.


Questions?

Email - [email protected]

Thanks

Raj Babu
