Capturing Big Value in Big Data – How Use Case Segmentation Drives Solution Design and Technology Selection at Deutsche Telekom
Jürgen Urbanski
Vice President Big Data & Cloud Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
[email protected]
Session focus
§ What is Hadoop?
§ What is the disruptive innovation in Hadoop?
§ What are target use cases, horizontally and telco-specific?
§ How do we start realizing value from Hadoop today?
§ How does Hadoop complement existing investments in business intelligence?
§ Audience participation:
  1. Who has heard of Hadoop?
  2. Who is using Hadoop somewhere in their organization? (can be proof of concept or production)
Hadoop! Coming soon to a data center near you...
Hadoop is like a data warehouse, but it can store more data, more kinds of data, and perform more flexible analyses.
Hadoop is open source and runs on industry-standard hardware, so it is 1-2 orders of magnitude more economical than conventional data warehouse solutions.
Hadoop provides more cost-effective storage, processing, and analysis: some existing workloads run faster, cheaper, and better.
Hadoop can deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.
Reference architecture view of Hadoop
§ Presentation: Data Visualization and Reporting, Clients
§ Application: Analytics Apps, Transactional Apps, Analytics Middleware
§ Data Processing: Batch Processing, Real-Time/Stream Processing, Search and Indexing
§ Data Integration: Real-Time Ingestion, Batch Ingestion, Data Connectors, Metadata Services
§ Data Management: Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory, Distributed Storage (HDFS)
§ Infrastructure: Virtualization, Compute / Storage / Network
§ Security: Data Isolation, Access Management, Data Encryption
§ Operations: Workflow and Scheduling, Management and Monitoring
Each component in the architecture falls into one of three groups: Hadoop Core, other Hadoop projects, or adjacent categories.
Disruptive innovation #1: Store first, ask questions later
Illustrative acquisition cost per storage tier:
§ SAN storage: 3-5 €/GB (based on HDS SAN storage)
§ NAS filers: 1-3 €/GB (based on NetApp FAS series)
§ White-box DAS 1): 0.50-1.00 €/GB (hardware can be self-assembled)
§ Data Cloud 1): 0.10-0.30 €/GB (based on large-scale object storage interfaces)
§ Enterprise-class Hadoop storage: ??? €/GB (based on NetApp E-Series (NOSH))
1) Hadoop offers storage + compute (incl. search); Data Cloud offers Amazon S3 and native storage functions.
Much cheaper storage – but not just storage…
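To make "store first, ask questions later" concrete, the sketch below lands raw files in HDFS without imposing any schema up front; structure is applied only when a job reads the data later. It is a minimal illustration using the hdfscli Python package against a WebHDFS endpoint – the hostname, user and paths are assumptions, not details from the slides.

# Land raw data in a Hadoop "landing zone" without declaring a schema.
# Assumes the hdfscli package (pip install hdfs) and WebHDFS enabled on the
# namenode; hostname, user and paths are illustrative.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hdfs")

# Create a dated landing-zone directory and upload the raw export as-is.
client.makedirs("/data/raw/cdr/2013-06-01")
client.upload("/data/raw/cdr/2013-06-01", "cdr_export.csv")

# Nothing about the file format is declared at write time; any later question
# (Hive external table, MapReduce job, ...) applies its own schema on read.
print(client.list("/data/raw/cdr/2013-06-01"))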
Disruptive innovation #2: Parallel processing
On a spectrum of scale (storage and processing) and structure, Hadoop sits alongside distributed NoSQL databases, MPP analytics appliances and the EDW. Compared with a traditional database:
§ Schema: required on write (traditional database) vs. required on read (Hadoop) – illustrated in the sketch below
§ Speed: reads are fast vs. writes are fast
§ Governance: standards and structured vs. loosely structured
§ Processing: limited, no data processing vs. processing coupled with data, parallel processing / scale-out
§ Data types: structured vs. multi-structured and unstructured
§ Best-fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage and processing
Big Data: it's about scale and structure.
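To make the schema-on-read contrast concrete, here is a minimal plain-Python sketch: the raw record is stored exactly as it arrives, and each consumer applies its own structure only when it reads the data. The record layout and field positions are illustrative assumptions, not taken from the slides.

# Schema on read: the same raw record, interpreted differently by two readers.
raw_record = "4917212345678;2013-06-01T10:32:11;173;voice"

def parse_for_billing(line):
    # Billing cares about the subscriber and the duration in seconds.
    msisdn, _, duration, _ = line.split(";")
    return {"msisdn": msisdn, "duration_sec": int(duration)}

def parse_for_network_planning(line):
    # Network planning cares about the timestamp and the service type.
    _, timestamp, _, service = line.split(";")
    return {"timestamp": timestamp, "service": service}

print(parse_for_billing(raw_record))
print(parse_for_network_planning(raw_record))

In a schema-on-write system, the table definition would have to anticipate both uses before the data could even be loaded; here the raw data is kept, and each question brings its own schema.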
Target use cases for Hadoop
Target use cases address four audiences – IT Infrastructure & Operations, Business Intelligence & Data Warehousing, Line of Business & Business Analysts, and the CXO – with time to value growing longer and potential value growing higher as you move toward the CXO. Roughly in that order:
§ Lower-cost storage
§ Enterprise data lake
§ Enterprise data warehouse offload
§ Enterprise data warehouse archive
§ ETL offload
§ Capacity planning & utilization
§ Customer profiling & revenue analytics
§ Targeted advertising analytics
§ Service renewal implementation
§ CDR-based data analytics
§ Fraud management
§ New business models?
The earlier use cases deliver cost-effective storage, processing, and analysis; the later ones build a foundation for profitable growth.
Enterprise data warehouse offload use case

The Challenge
§ EDW at capacity; cannot support growing data volumes and variety
§ Expensive to scale; can only keep one year of data
§ Performance issues in business-critical apps; little room for discovery and analytics
§ Older data is stored but "dark" – users cannot swim around in it and explore it

The Solution
§ Offload low value-per-byte data from the EDW, and rescue data from the storage grid or tape
§ Use Hadoop for data storage and processing (parse, cleanse, apply structure and transform – a sketch follows below)
§ Free existing capacity for query workloads
§ Retain all data for analysis!
Illustrative workload shift: before offload, the data warehouse splits its capacity across operational workloads (44%), ELT processing (42%) and analytics (11%). After moving processing and storage to Hadoop, the data warehouse is left with operational workloads (50%) and analytics (50%).
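The "parse, cleanse, apply structure and transform" step can be an ordinary MapReduce job. As a hedged sketch, the example below uses the mrjob library to aggregate usage per customer from raw semicolon-separated records; the record layout, field positions and paths are illustrative assumptions rather than details from the slides.

# offload_usage.py -- sketch of an ELT step offloaded from the EDW to Hadoop.
# Uses the mrjob library (pip install mrjob); the mapper parses and cleanses
# raw records, the reducer aggregates them per customer.
from mrjob.job import MRJob

class UsagePerCustomer(MRJob):

    def mapper(self, _, line):
        fields = line.strip().split(";")
        if len(fields) < 4:
            return                          # cleanse: drop malformed records
        customer_id, _timestamp, duration, _service = fields[:4]
        if duration.isdigit():              # apply structure to the raw record
            yield customer_id, int(duration)

    def reducer(self, customer_id, durations):
        yield customer_id, sum(durations)   # transform: aggregate per customer

if __name__ == "__main__":
    UsagePerCustomer.run()

# Local test:            python offload_usage.py raw_usage.txt
# On the cluster (e.g.): python offload_usage.py -r hadoop hdfs:///data/raw/usage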
Enterprise data warehouse offload – benefits
Illustrative economics from a large telco:
§ 100x overall project capex improvement: roughly $200,000 per TB for Teradata versus $2,000 per TB of capex for Hadoop. Of the $2,000, 25% is hardware, 25% software and 50% services.
§ Most of the services effort is migration: the telco's data warehouse has hundreds of thousands of lines of SQL code, and all of the ETL is written in SQL and needs to be converted into MapReduce jobs. The set of SQL queries had been developed over 6 years and was moved over to MapReduce-based queries in about 6 months.
§ As a result, spend on TDC has decreased from $65m to $35m over a 5-year horizon; most of the remaining $35m is maintenance for the legacy TDC deployment.
§ Moreover, TDC performance is up by 4x, because TDC now focuses on high-volume (many apps and users), low-latency (interactive response time) workloads while the rest of the work is offloaded to the Hadoop cluster.
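As a quick sanity check of those figures, the arithmetic below simply recomputes the ratios quoted above; no new numbers are introduced.

# Recompute the illustrative economics quoted above.
teradata_capex_per_tb = 200_000              # $ per TB
hadoop_capex_per_tb = 2_000                  # $ per TB
print(teradata_capex_per_tb / hadoop_capex_per_tb)    # 100.0 -> the "100x" claim

# Split of the $2,000/TB Hadoop capex: 25% HW, 25% SW, 50% services.
hw, sw, services = (hadoop_capex_per_tb * share for share in (0.25, 0.25, 0.50))
print(hw, sw, services)                      # 500.0 500.0 1000.0

# Five-year spend drops from $65m to $35m, i.e. a reduction of roughly 46%.
print(1 - 35 / 65)                           # ~0.46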
Common objections to Hadoop
§ We don't have big data problems
§ We don't have petabytes of data
§ We can't justify the budget for a new project
§ We don't have the skills
§ We're not sure Hadoop is mature / secure / enterprise-ready
§ We already have a scale-out strategy for our EDW/ETL
MYTH: Big Data means “Petabytes”
§ Not just Volume – remember Variety and Velocity
§ Plenty of issues at smaller scales:
  – Data processing
  – Unstructured data
§ Often warehouse volumes are small because the technology is expensive, not because there is no relevant data
§ Scalability is about growing with the business, affordably and predictably
Every organization has data problems! Hadoop can help…
MYTH: Big Data means Data Science
§ Hadoop solves existing problems faster, better and cheaper than conventional technology, e.g.:
  – Landing zone: capturing and refining multi-structured data types with unknown future value
  – Cost-effective platform for retaining lots of data for long periods of time
§ Walk before you run
§ Big Data is a state of mind
From data puddles and ponds to lakes and oceans
§ GOAL: a platform that natively supports mixed workloads as a shared service
§ AVOID: systems separated by workload type due to contention
Instead of separate Big Data silos per business unit (BU1, BU2, BU3), the target is a single shared platform that captures all transactions, interactions and observations and supports refine, explore and enrich workloads across batch, interactive and online access.
Waves of adoption – crossing the chasm
Adoption today* falls into three waves.

Wave 1 – Batch orientation
§ Adoption: mainstream, 70% of organizations
§ Example use case: refine – archival and transformation
§ Response time: hour(s)
§ Architectural characteristic: EDW / RDBMS talk to Hadoop
§ Example technologies: MapReduce, Pig, Hive

Wave 2 – Interactive orientation
§ Adoption: early adopters, 20% of organizations
§ Example use case: explore – query and visualization
§ Response time: minutes
§ Architectural characteristic: analytic apps talk directly to Hadoop
§ Example technologies: ODBC/JDBC, Hive – illustrated in the sketch below

Wave 3 – Real-time orientation
§ Adoption: bleeding edge, 10% of organizations
§ Example use case: enrich – real-time decisions
§ Response time: seconds
§ Architectural characteristic: derived data is also stored in Hadoop
§ Example technologies: HBase, NoSQL, SQL

The dominant data characteristic shifts from volume in the batch wave toward velocity in the real-time wave.

* Among organizations using Hadoop
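As a concrete example of the wave-2 pattern – an analytic application talking directly to Hadoop over a Hive connection – the sketch below uses the PyHive package; the host, port, table and column names are illustrative assumptions.

# Wave-2 interactive access: query Hive from an analytic app via HiveServer2.
# Uses the PyHive package (pip install pyhive); connection details and the
# table/column names are illustrative.
from pyhive import hive

conn = hive.connect(host="hadoop-edge-node", port=10000, username="analyst")
cursor = conn.cursor()

# The table is assumed to be an external Hive table over files already landed
# in HDFS, so the schema is applied on read rather than enforced at load time.
cursor.execute(
    "SELECT service, COUNT(*) AS events "
    "FROM usage_events WHERE event_date = '2013-06-01' "
    "GROUP BY service"
)
for service, events in cursor.fetchall():
    print(service, events)

cursor.close()
conn.close()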
Where to start inserting Hadoop in your company? A call to action…
The call to action spans IT Infrastructure, IT Applications, LOB and the CXO:
§ Understanding Big Data
  – Definition
  – Benefits over adjacent and legacy technologies
  – Current mode vs. future mode for analytics
§ Assessing the economic potential
  – Target use cases by function and industry
  – Best approach to adoption
§ Accelerating implementation
  – Solution design driven by target use cases
  – Reference architecture
  – Technology selection and POC
  – Implementation lessons learnt
Key takeaways
§ The Hadoop open source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price / performance compared to existing technologies.
§ Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly it is also used to transform data.
§ Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster.
§ Questions? [email protected]