Building Big Data - cooladata.com

Building Big Data

The True Cost of Building Analytics

A Big Data Platform – Basic Assumptions

• Extreme scalability: scale up to tracking billions of events

• Serving more than one analytical app

• Real time: streaming (not Hadoop) to provide real-time capabilities

• Permanent history: events are stored for several months

• Direct access to individual events: granular data is accessible

• Support for any analysis: both business and technical roles can answer any question

Load Once, Use Many

A single Big Data Platform serves many analytical applications:

• Campaign management

• A/B testing

• Recommendation engine

• Dashboards

• Push notifications

• Data mining modules

• Affiliate reporting
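To make "load once, use many" concrete, here is a minimal Python sketch, assuming events are plain dictionaries loaded a single time into a shared read-only store. The event fields (`user`, `type`, `variant`) and the two consumer functions are hypothetical stand-ins for a dashboard and an A/B testing module, not part of any specific product.

```python
from collections import Counter

# Hypothetical raw events; in a real platform these would arrive through the
# collection layer and be loaded into shared storage a single time.
RAW_EVENTS = [
    {"user": "u1", "type": "view", "variant": "A"},
    {"user": "u1", "type": "purchase", "variant": "A"},
    {"user": "u2", "type": "view", "variant": "B"},
    {"user": "u3", "type": "view", "variant": "A"},
    {"user": "u3", "type": "purchase", "variant": "A"},
    {"user": "u4", "type": "view", "variant": "B"},
    {"user": "u4", "type": "purchase", "variant": "B"},
]


def load_once(raw_events):
    """Load the events a single time into a shared, read-only store (a tuple)."""
    return tuple(dict(event) for event in raw_events)


def dashboard_counts(store):
    """Dashboard consumer: number of events per event type."""
    return Counter(event["type"] for event in store)


def ab_conversion(store):
    """A/B testing consumer: share of viewing users who also purchased, per variant."""
    viewers, buyers = {}, {}
    for event in store:
        if event["type"] == "view":
            viewers.setdefault(event["variant"], set()).add(event["user"])
        elif event["type"] == "purchase":
            buyers.setdefault(event["variant"], set()).add(event["user"])
    return {
        variant: len(users & buyers.get(variant, set())) / len(users)
        for variant, users in viewers.items()
    }


if __name__ == "__main__":
    store = load_once(RAW_EVENTS)   # loaded once...
    print(dashboard_counts(store))  # Counter({'view': 4, 'purchase': 3})
    print(ab_conversion(store))     # {'A': 1.0, 'B': 0.5}
```

Both consumers read the same store, so adding another application (for example, affiliate reporting) means writing one more reader rather than building another ingestion path.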

Typical Big Data Architecture

• Real-time streams

• Real-time processing (Kafka, Storm, Kinesis)

• ETL

• Structured and unstructured data (HDFS, S3)

• Batch processing (Hadoop, Hive)

• Interactive processing (Exasol, Vertica, Redshift)

• Real-time processing (HBase, Couchbase, Cassandra)

• Data visualization (Excel, Tableau, QlikView)

Big Data Platform Components

Pipeline stages: Track, Collect, Enrich, Store, Analyze, Visualize

Supporting functions: Admin; Audit and Control
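The component chain can be sketched in Python as one function per stage. This is a minimal illustration under stated assumptions: a hypothetical JSON event format (`user_id`, `event`, `props`), a hypothetical user-to-country lookup as the enrichment source, and in-memory structures standing in for real collectors, storage and analytical databases. The Admin and Audit and Control functions are not modeled.

```python
import json
from collections import defaultdict


def track(user_id, event_name, **props):
    """Track: the client emits one raw event as a JSON message."""
    return json.dumps({"user_id": user_id, "event": event_name, "props": props})


def collect(messages):
    """Collect: parse incoming messages; this sketch simply skips malformed ones."""
    events = []
    for message in messages:
        try:
            events.append(json.loads(message))
        except json.JSONDecodeError:
            continue
    return events


def enrich(events, user_country):
    """Enrich: join each event with reference data (here, a user -> country map)."""
    return [{**e, "country": user_country.get(e["user_id"], "unknown")} for e in events]


def store(events, table):
    """Store: append events to a table keyed by event name."""
    for e in events:
        table[e["event"]].append(e)
    return table


def analyze(table, event_name):
    """Analyze: count events of one type per country."""
    counts = defaultdict(int)
    for e in table.get(event_name, []):
        counts[e["country"]] += 1
    return dict(counts)


def visualize(counts):
    """Visualize: render counts as a plain-text bar chart, sorted by country."""
    return "\n".join(f"{country:<8}{'#' * n}" for country, n in sorted(counts.items()))


if __name__ == "__main__":
    raw = [
        track("u1", "purchase", amount=10),
        track("u2", "purchase", amount=5),
        "not json",  # malformed message, dropped by collect()
        track("u1", "view"),
    ]
    table = store(enrich(collect(raw), {"u1": "US", "u2": "DE"}), defaultdict(list))
    print(visualize(analyze(table, "purchase")))
```

Keeping each stage behind its own function boundary mirrors the best-of-breed idea: any single stage can be swapped for a different service without touching the others.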

Best of Breed Approach

Component                Cloud service/open source                        On premises/private cloud
Collectors               CloudFront, Amazon Kinesis                       Storm, Kafka
Process                  Amazon EMR, Google data pipeline                 Hadoop distributions, Talend, Informatica
Storage                  Amazon S3, Google Storage                        EMC, IBM, HP
Analytical DB            Google BigQuery, Amazon Redshift, Impala, Spark  Vertica, Exasol, Infobright
Real-time DB             MongoIO, Redis Labs                              MongoDB, Couchbase, Cassandra
Visualization/analytics  Chartio, D3.js, Google Spreadsheet               Looker, Tableau, QlikView, MicroStrategy

HR Costs

The most significant cost of building a Big Data analytics solution is human resources.

The solution is complex and requires real know-how across several areas of expertise.

We included:

• ETL

• Cloud infra experts

• Java/Python developers

• DBA

• Dashboard developers

• Analysts

We didn’t include:

• QA

• A 24/7 support team

HR Costs by Data Volume

Role                     1 TB per quarter   1 TB per month   3 TB per month   Monthly cost per FTE
Back-end developer       1                  1.5              2.5              $8,000
Infra/system management  0.2                0.5              0.5              $10,000
DBA                      0.3                0.5              1                $10,000
Analyst                  1                  2                2                $8,000
Total headcount          2.5                4.5              6
Total monthly HR cost    $21,000            $38,000          $51,000
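Reading the last column as the monthly cost of one full-time equivalent (FTE), each total is the sum over roles of headcount × monthly rate. The following Python sketch reproduces the table under that reading; the dictionary names are illustrative only.

```python
# Monthly cost per full-time equivalent (FTE), read from the HR cost table.
MONTHLY_RATE = {
    "Back-end developer": 8_000,
    "Infra/system management": 10_000,
    "DBA": 10_000,
    "Analyst": 8_000,
}

# Headcount in FTEs per role for each data-volume scenario, from the same table.
HEADCOUNT = {
    "1 TB per quarter": {"Back-end developer": 1, "Infra/system management": 0.2,
                         "DBA": 0.3, "Analyst": 1},
    "1 TB per month": {"Back-end developer": 1.5, "Infra/system management": 0.5,
                       "DBA": 0.5, "Analyst": 2},
    "3 TB per month": {"Back-end developer": 2.5, "Infra/system management": 0.5,
                       "DBA": 1, "Analyst": 2},
}


def total_headcount(fte_by_role):
    """Sum of FTEs across roles, rounded to hide floating-point noise."""
    return round(sum(fte_by_role.values()), 2)


def monthly_hr_cost(fte_by_role, rates):
    """Monthly HR cost = sum over roles of (FTEs x monthly cost per FTE), in whole dollars."""
    return round(sum(fte * rates[role] for role, fte in fte_by_role.items()))


if __name__ == "__main__":
    for scenario, fte_by_role in HEADCOUNT.items():
        print(f"{scenario}: {total_headcount(fte_by_role)} FTE, "
              f"${monthly_hr_cost(fte_by_role, MONTHLY_RATE):,} per month")
```

Fractional headcounts such as 0.2 FTE model a specialist shared with other projects; rounding the result keeps the totals in whole dollars.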

Cloud Infra Costs

The infrastructure of an analytics solution consists of data storage, servers, network and monitoring tools. All costs are proportional to the platform’s size.

We included a production environment covering:

• Servers

• Storage

• Network

We did not include:

• Dev and test environments

All costs are based on cloud commodity hardware.

Keep in mind: appliances, or special requirements such as large memory (RAM) or SSD storage, are much more expensive.

Monthly Cloud Infra Costs

Item                     1 TB per quarter   1 TB per month   3 TB per month
Servers                  $1,500             $3,000           $12,000
Storage                  $100               $800             $2,000
Network                  $300               $1,000           $3,000
Total infra cost         $1,900             $4,800           $17,000

Monthly Software Costs

Item                     1 TB per quarter   1 TB per month   3 TB per month
ETL/Hadoop               $100               $500             $1,000
Analytical DB            $500               $1,000           $5,000
Visualization tool       $1,000             $1,000           $2,000 (10-25 users)
Total software costs     $1,600             $2,500           $8,000

We converted perpetual license and maintenance costs into an ongoing monthly cost.

DB & ETL costs are based on cloud services.

The cost of visualization tools is based on the pricing of market leaders.
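The text does not say how perpetual license and maintenance fees were turned into a monthly figure. One common approach, assumed here, is straight-line amortization of the one-time license fee over a chosen number of months, plus the annual maintenance fee spread evenly across 12 months. The function name and the example inputs below are hypothetical placeholders, not figures from this analysis.

```python
def monthly_software_cost(license_fee, amortization_months, annual_maintenance):
    """Straight-line amortization of a one-time (perpetual) license fee,
    plus the annual maintenance fee spread evenly over 12 months."""
    if amortization_months <= 0:
        raise ValueError("amortization_months must be positive")
    return license_fee / amortization_months + annual_maintenance / 12


if __name__ == "__main__":
    # Hypothetical inputs for illustration only: a $12,000 perpetual license
    # amortized over 24 months, plus $2,400 per year of maintenance.
    print(monthly_software_cost(12_000, 24, 2_400))  # 700.0
```

A shorter amortization period raises the monthly figure, so the period chosen should match how long the organization expects to keep the tool.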

Overall Costs

Item                     1 TB per quarter   1 TB per month   3 TB per month
Infrastructure           $1,900             $4,800           $17,000
Software                 $1,600             $2,500           $8,000
Human resources          $21,000            $38,000          $51,000
Total monthly            $24,500            $45,300          $76,000
Total annual             $294,000           $543,600         $912,000
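The overall totals follow from two steps: sum the three monthly categories per scenario, then multiply by 12 for the annual figure. This small Python sketch checks that arithmetic and also computes the human-resources share of each monthly total; the variable and function names are illustrative.

```python
SCENARIOS = ("1 TB per quarter", "1 TB per month", "3 TB per month")

# Monthly cost per category and scenario, in dollars, from the Overall Costs table.
MONTHLY_COSTS = {
    "Infrastructure": (1_900, 4_800, 17_000),
    "Software": (1_600, 2_500, 8_000),
    "Human resources": (21_000, 38_000, 51_000),
}


def totals(costs):
    """Return (monthly totals, annual totals) per scenario; annual = monthly x 12."""
    monthly = [sum(column) for column in zip(*costs.values())]
    annual = [m * 12 for m in monthly]
    return monthly, annual


def hr_share(costs):
    """Fraction of each scenario's monthly total that is human resources."""
    monthly, _ = totals(costs)
    return [hr / total for hr, total in zip(costs["Human resources"], monthly)]


if __name__ == "__main__":
    monthly, annual = totals(MONTHLY_COSTS)
    for name, m, a, share in zip(SCENARIOS, monthly, annual, hr_share(MONTHLY_COSTS)):
        print(f"{name}: ${m:,}/month, ${a:,}/year, HR share {share:.0%}")
```

The HR share comes out at roughly 86%, 84% and 67% of the monthly total, which is consistent with the earlier statement that human resources are the most significant cost.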

When Should You Build?

• Analytics is your core business

• Your analytics is tightly integrated with your operational systems

• Your analytical requirements are special

• You have sufficient time and resources

Big Data is not a project – it’s an ongoing process!
