Analysing data analytics use cases to understand the purpose of big data ecosystem components

The purpose of any data platform (big or not) is to enable analytics on data.

dataeaze

Why?

Different analytics use cases expect different sets of features from a data platform.

The components of the big data ecosystem are built to serve the features that analytics use cases need.

So, to understand data platform components, it is necessary to know the needs of the analytics use cases that the data platform serves.

Here we take a look at all categories of analytics use cases on a data platform.

What?

Analytics data processing use case categories

We analyse each use case in terms of:

The nature of the data processing needed to serve the use case

The expectations from the data platform to enable that data processing

Static Reports

Static reports are summary reports prepared to give decision makers a status update.

Example: An end-of-day report for top management specifying daily sales, transactions, revenue, and total traffic.

Nature of data processing

Static reports:

Are scheduled to execute at a fixed time interval

Generate analysis reports for a given time period

Can execute on raw data directly or on an intermediate store
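
The report job described above could be sketched as follows. This is only an illustration under stated assumptions: the record fields (`timestamp`, `amount`), the sample data, and the function name `daily_report` are hypothetical, and in practice a workflow scheduler would invoke the job at the fixed interval.

```python
from datetime import date, datetime

# Hypothetical raw transaction records; field names are illustrative only.
RAW_TRANSACTIONS = [
    {"timestamp": datetime(2015, 6, 1, 9, 30), "amount": 120.0},
    {"timestamp": datetime(2015, 6, 1, 17, 5), "amount": 80.0},
    {"timestamp": datetime(2015, 6, 2, 11, 0), "amount": 50.0},
]


def daily_report(records, day):
    """Summarise the raw records that fall on the given day."""
    amounts = [r["amount"] for r in records if r["timestamp"].date() == day]
    return {
        "day": day.isoformat(),
        "transactions": len(amounts),
        "revenue": sum(amounts),
    }


# A scheduler would call this once per period, after the day's data has arrived.
report = daily_report(RAW_TRANSACTIONS, date(2015, 6, 1))
print(report)
```

The report is computed directly from raw records here; the same function could equally read from an intermediate store if one exists.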

Expectations from data platform

Scheduled data processing: Static reports are executed repeatedly on a predefined schedule.

Timely arrival of data: Generated reports should represent the complete picture of the given timeframe and should be generated before the deadline.

Process raw data to get result: The platform should be able to generate a report from raw data if it cannot be extracted from an intermediate data form.

Dashboard Reports

A dashboard is a reporting user interface where users can interactively choose their own view of the data with a limited set of filters.

Example: An e-commerce company provides a dashboard where sellers can see how much inventory was sold across demographics, product categories, and time ranges.

Nature of data processing

Periodically process raw data to bring it into the form required by dashboards

Populate the transformed data into the interactive store that backs the dashboards
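
As a rough sketch of these two steps, the example below pre-aggregates raw sales events into totals keyed by the dashboard's filters, then answers a filtered query from the pre-aggregated data alone. The field names, the sample events, and the functions `run_etl` and `units_sold` are hypothetical, and a plain dictionary stands in for the interactive store.

```python
from collections import defaultdict

# Hypothetical raw sales events; field names are illustrative only.
RAW_SALES = [
    {"seller": "s1", "category": "books", "region": "north", "units": 3},
    {"seller": "s1", "category": "books", "region": "south", "units": 2},
    {"seller": "s1", "category": "toys", "region": "north", "units": 5},
    {"seller": "s2", "category": "books", "region": "north", "units": 1},
]


def run_etl(raw_events):
    """Transform raw events into per-filter totals for the dashboard."""
    store = defaultdict(int)  # stand-in for an interactive data store
    for event in raw_events:
        key = (event["seller"], event["category"], event["region"])
        store[key] += event["units"]
    return dict(store)


def units_sold(store, seller, category=None, region=None):
    """Answer a dashboard query using only the pre-aggregated store."""
    return sum(
        units
        for (s, c, r), units in store.items()
        if s == seller
        and (category is None or c == category)
        and (region is None or r == region)
    )


# The ETL step would be re-run periodically to keep the store up to date.
dashboard_store = run_etl(RAW_SALES)
books_total = units_sold(dashboard_store, "s1", category="books")
print(books_total)
```

Because the dashboard only touches the small pre-aggregated store, filtered queries stay fast even when the raw data is large.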

Expectations from data platform

ETL: To convert raw data into the format required by the dashboard.

Scheduled data processing: Timely, repeated executions of ETL jobs to populate dashboards with the latest updates.

Interactive data store: Dashboard reports are interactive in nature, so the backend store is expected to return results in near real time.

Ad Hoc data analysis

This covers business queries that are raised as the need arises. The analysis is not scheduled and is executed once, whenever necessary.

Example: A product manager wants a detailed analysis of customer behavior on a navigation panel in order to define optimised ad placements.

Nature of data processing

Steps to serve an ad hoc report:

Identify the data sources that will satisfy the given request

Execute data processing (preferably an SQL-like query) on the identified sources

Load the results into a data representation tool
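
The three steps above might look like this in miniature. The sketch uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the platform's SQL engine; the table `nav_clicks`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Step 1: the identified data source, here a hypothetical table of click
# events held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nav_clicks (user_id TEXT, panel_item TEXT)")
conn.executemany(
    "INSERT INTO nav_clicks VALUES (?, ?)",
    [("u1", "home"), ("u2", "home"), ("u1", "deals"), ("u3", "home")],
)

# Step 2: express the one-off analysis as an SQL query.
query = """
    SELECT panel_item, COUNT(*) AS clicks
    FROM nav_clicks
    GROUP BY panel_item
    ORDER BY clicks DESC, panel_item
"""
rows = conn.execute(query).fetchall()

# Step 3: hand the results to a presentation layer; printing stands in for it.
for panel_item, clicks in rows:
    print(f"{panel_item}: {clicks}")
conn.close()
```

The query is written once and discarded after use, which is what distinguishes this from the scheduled cases.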

Expectations from data platform

Data processing SQL engine: An SQL query engine makes it easy to express the required analysis as an SQL query, which saves the analyst’s time.

Complex data processing: A platform that supports writing custom, complex data analysis that is not possible through SQL.

BI Reporting

Business Intelligence tools provide advanced general-purpose dashboards that host a wide array of dimensions in a backend data store. Users can define and save transformations and analysis queries through the BI tool and get back reports in tabular or graphical form.

Example: A BI report presenting weekly sales stats across multiple regions for the previous 6 months. The report is created and saved once; users execute the saved report whenever they want.

Nature of data processing

Scheduled ETL jobs convert raw data into the required intermediate data form

Data is loaded into interactive SQL data stores

BI tools connect to the SQL data store as their backend

Expectations from data platform

ETL: Raw data should be transformed into the required format and loaded into an SQL data warehouse.

Scheduling of ETL: Defined ETL jobs should be scheduled to execute at a fixed time interval.

Data processing SQL engine: An SQL query engine makes it easy to extract data and saves time. BI tools can connect to this SQL data store.

Data Processing for Applications

This is data processing done to provide feedback input to business applications. Business applications take better decisions based on the latest data feedback.

Example: Ad servers are periodically updated with the latest minimum eCPM to expect for an ad placement that is filled dynamically.

Nature of data processing

Complex data processing (machine learning) on raw data

Scheduled data processing

Update results into an interactive key-value store from which applications fetch them directly
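
A minimal sketch of this flow, following the ad server example: a periodic job recomputes a value per ad placement from raw bid data and publishes it under a key that the application reads. Everything named here is an assumption made for illustration: the bid log, the key format `min_ecpm:<placement>`, and the function `refresh_min_ecpm`. The median is only a placeholder for the real model, and a dictionary stands in for the key-value store.

```python
from statistics import median

# Hypothetical raw bid log: (placement id, observed eCPM).
RAW_BIDS = [
    ("top_banner", 2.0),
    ("top_banner", 3.0),
    ("top_banner", 4.0),
    ("sidebar", 0.5),
    ("sidebar", 1.5),
]

kv_store = {}  # stand-in for an interactive key-value store


def refresh_min_ecpm(bids, store):
    """Recompute the expected minimum eCPM per placement and publish it.

    The median is only a placeholder for whatever model the job would run.
    """
    by_placement = {}
    for placement, ecpm in bids:
        by_placement.setdefault(placement, []).append(ecpm)
    for placement, values in by_placement.items():
        store[f"min_ecpm:{placement}"] = median(values)


# A scheduler would call this periodically; the application only reads keys.
refresh_min_ecpm(RAW_BIDS, kv_store)
print(kv_store["min_ecpm:top_banner"])
```

The application never sees the raw data or the model, only the latest published value for each key.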

Expectations from data platform

Capability to implement custom complex data processing: Users should be able to easily define custom, complex data processing algorithms (such as machine learning).

Scheduled data processing: Required for periodic execution of data processing jobs.

Real time stream data processing

This means analysing an event as soon as it happens. The sooner the analysis, the greater the value obtained from it.

Example: A stock ticker displayed on Yahoo Finance.

Nature of data processing

As soon as an event happens, its log entry is collected

All log entries are buffered and made available to the processing layer

Records are pulled from the message buffer and processed
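
The collect, buffer, and pull pattern above can be sketched on a single machine with Python's standard `queue` and `threading` modules. This is a toy stand-in, not a distributed message buffer: the ticker symbols, prices, and function names are hypothetical, and the in-process queue only mimics the role a scalable buffer would play.

```python
import queue
import threading

buffer = queue.Queue()  # stand-in for a scalable message buffer
STOP = object()         # sentinel telling the consumer to finish
latest_price = {}


def collect(events):
    """Collector: push each event into the buffer as soon as it occurs."""
    for event in events:
        buffer.put(event)
    buffer.put(STOP)


def process():
    """Processing layer: pull records from the buffer and handle them."""
    while True:
        event = buffer.get()
        if event is STOP:
            break
        symbol, price = event
        latest_price[symbol] = price  # custom processing would go here


consumer = threading.Thread(target=process)
consumer.start()
collect([("ABC", 10.0), ("XYZ", 5.5), ("ABC", 10.2)])
consumer.join()
print(latest_price)
```

The buffer decouples the two sides: the collector never waits for processing, and the processing layer pulls records at its own pace.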

Expectations from data platform

Scalable message buffer: A message buffer that holds received messages until they are pulled for processing.

Real time stream processing engine: Pulls and processes records in real time, and gives users the ability to define custom data processing.

Let us take a look at the superset of expectations across all use cases

Expectations from data platform across all use cases

Superset of expectations

Expectation / Capability:
Complex data analysis using query language
Scheduled ETL data processing
Data store for interactive data analysis
Data ingestion with timely arrival of data
Scalable message buffer to be consumed by stream data processing
Streaming data processing platform

Needed by use case:
Static reports
Ad hoc data analysis
BI reporting
Dashboard reports
App-specific data processing
Real time stream data processing

Let’s conclude

We have identified a common set of features that most analytics use cases expect from a data platform.

Let us map these to data platform components.

Capabilities provided by data platform components

Expectation / Capability | Supported by data platform component | Data platform tools
Complex data analysis using query language | Batch data processing | Hive, MapReduce
Scheduled ETL data processing | Workflow scheduler | Oozie
Data store for interactive data analysis | Interactive data stores | HBase, Spark, ..
Data ingestion with timely arrival of data | Data ingestion | Flume, Kafka, Scribe
Scalable message buffer to be consumed by stream data processing | Message buffers | Kafka
Streaming data processing platform | Real time stream engine | Storm, Spark

Data platform components satisfying expectations

Going backwards

Now you know about data platform components, the capabilities they support, and how those capabilities satisfy the feature needs of analytics use cases.

Thank You
