Capturing Big Value in Big Data – How Use Case Segmentation Drives Solution Design and Technology Selection at Deutsche Telekom
Jürgen Urbanski
Vice President Big Data & Cloud Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
[email protected]
Session focus
§ What is Hadoop?
§ What is the disruptive innovation in Hadoop?
§ What are target use cases, horizontally and telco-specific?
§ How do we start realizing value from Hadoop today?
§ How does Hadoop complement existing investments in business intelligence?
§ Audience participation:
  1. Who has heard of Hadoop?
  2. Who is using Hadoop somewhere in their organization? (can be proof of concept or production)
Hadoop! Coming soon to a data center near you...
Hadoop is like a data warehouse, but it can store more data, more kinds of data, and perform more flexible analyses.
Hadoop is open source and runs on industry-standard hardware, so it is 1-2 orders of magnitude more economical than conventional data warehouse solutions.
Hadoop provides more cost-effective storage, processing, and analysis: some existing workloads run faster, cheaper, and better.
Hadoop can deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.
Reference architecture view of Hadoop
§ Presentation: Data Visualization and Reporting, Clients
§ Application: Analytics Apps, Transactional Apps, Analytics Middleware
§ Data Processing: Batch Processing, Real-Time/Stream Processing, Search and Indexing
§ Data Integration: Real-Time Ingestion, Batch Ingestion, Data Connectors, Metadata Services
§ Data Management: Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory, Distributed Storage (HDFS)
§ Infrastructure: Virtualization, Compute / Storage / Network
§ Security: Data Isolation, Access Management, Data Encryption
§ Operations: Workflow and Scheduling, Management and Monitoring
Each component in the architecture falls into one of three groups: Hadoop Core, other Hadoop projects, or adjacent categories.
Disruptive innovation #1: Store first, ask questions later
Illustrative acquisition cost per storage tier:
§ SAN storage: 3-5 €/GB (based on HDS SAN storage)
§ NAS filers: 1-3 €/GB (based on NetApp FAS series)
§ White-box DAS 1): 0.50-1.00 €/GB (hardware can be self-assembled)
§ Data Cloud 1): 0.10-0.30 €/GB (based on large-scale object storage interfaces)
§ Enterprise-class Hadoop storage: ??? €/GB (based on NetApp E-Series (NOSH))
1) Hadoop offers storage + compute (incl. search); Data Cloud offers Amazon S3 and native storage functions.
Much cheaper storage – but not just storage…
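To make "store first, ask questions later" concrete, the sketch below lands raw files in HDFS without imposing any schema up front; structure is applied only when a job reads the data later. It is a minimal illustration using the hdfscli Python package against a WebHDFS endpoint – the hostname, user and paths are assumptions, not details from the slides.

# Land raw data in a Hadoop "landing zone" without declaring a schema.
# Assumes the hdfscli package (pip install hdfs) and WebHDFS enabled on the
# namenode; hostname, user and paths are illustrative.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hdfs")

# Create a dated landing-zone directory and upload the raw export as-is.
client.makedirs("/data/raw/cdr/2013-06-01")
client.upload("/data/raw/cdr/2013-06-01", "cdr_export.csv")

# Nothing about the file format is declared at write time; any later question
# (Hive external table, MapReduce job, ...) applies its own schema on read.
print(client.list("/data/raw/cdr/2013-06-01"))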
Disruptive innovation #2: Parallel processing
On a spectrum of scale (storage and processing) and structure, Hadoop sits alongside distributed NoSQL databases, MPP analytics appliances and the EDW. Compared with a traditional database:
§ Schema: required on write (traditional database) vs. required on read (Hadoop) – illustrated in the sketch below
§ Speed: reads are fast vs. writes are fast
§ Governance: standards and structured vs. loosely structured
§ Processing: limited, no data processing vs. processing coupled with data, parallel processing / scale-out
§ Data types: structured vs. multi-structured and unstructured
§ Best-fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage and processing
Big Data: it's about scale and structure.
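To make the schema-on-read contrast concrete, here is a minimal plain-Python sketch: the raw record is stored exactly as it arrives, and each consumer applies its own structure only when it reads the data. The record layout and field positions are illustrative assumptions, not taken from the slides.

# Schema on read: the same raw record, interpreted differently by two readers.
raw_record = "4917212345678;2013-06-01T10:32:11;173;voice"

def parse_for_billing(line):
    # Billing cares about the subscriber and the duration in seconds.
    msisdn, _, duration, _ = line.split(";")
    return {"msisdn": msisdn, "duration_sec": int(duration)}

def parse_for_network_planning(line):
    # Network planning cares about the timestamp and the service type.
    _, timestamp, _, service = line.split(";")
    return {"timestamp": timestamp, "service": service}

print(parse_for_billing(raw_record))
print(parse_for_network_planning(raw_record))

In a schema-on-write system, the table definition would have to anticipate both uses before the data could even be loaded; here the raw data is kept, and each question brings its own schema.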
Target use cases for Hadoop
Target use cases address four audiences – IT Infrastructure & Operations, Business Intelligence & Data Warehousing, Line of Business & Business Analysts, and the CXO – with time to value growing longer and potential value growing higher as you move toward the CXO. Roughly in that order:
§ Lower-cost storage
§ Enterprise data lake
§ Enterprise data warehouse offload
§ Enterprise data warehouse archive
§ ETL offload
§ Capacity planning & utilization
§ Customer profiling & revenue analytics
§ Targeted advertising analytics
§ Service renewal implementation
§ CDR-based data analytics
§ Fraud management
§ New business models?
The earlier use cases deliver cost-effective storage, processing, and analysis; the later ones build a foundation for profitable growth.
Enterprise data warehouse offload use case

The Challenge
§ EDW at capacity; cannot support growing data volumes and variety
§ Expensive to scale; can only keep one year of data
§ Performance issues in business-critical apps; little room for discovery and analytics
§ Older data is stored but "dark" – users cannot swim around in it and explore it

The Solution
§ Offload low value-per-byte data from the EDW, and rescue data from the storage grid or tape
§ Use Hadoop for data storage and processing (parse, cleanse, apply structure and transform – a sketch follows below)
§ Free existing capacity for query workloads
§ Retain all data for analysis!
Illustrative workload shift: before offload, the data warehouse splits its capacity across operational workloads (44%), ELT processing (42%) and analytics (11%). After moving processing and storage to Hadoop, the data warehouse is left with operational workloads (50%) and analytics (50%).
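The "parse, cleanse, apply structure and transform" step can be an ordinary MapReduce job. As a hedged sketch, the example below uses the mrjob library to aggregate usage per customer from raw semicolon-separated records; the record layout, field positions and paths are illustrative assumptions rather than details from the slides.

# offload_usage.py -- sketch of an ELT step offloaded from the EDW to Hadoop.
# Uses the mrjob library (pip install mrjob); the mapper parses and cleanses
# raw records, the reducer aggregates them per customer.
from mrjob.job import MRJob

class UsagePerCustomer(MRJob):

    def mapper(self, _, line):
        fields = line.strip().split(";")
        if len(fields) < 4:
            return                          # cleanse: drop malformed records
        customer_id, _timestamp, duration, _service = fields[:4]
        if duration.isdigit():              # apply structure to the raw record
            yield customer_id, int(duration)

    def reducer(self, customer_id, durations):
        yield customer_id, sum(durations)   # transform: aggregate per customer

if __name__ == "__main__":
    UsagePerCustomer.run()

# Local test:            python offload_usage.py raw_usage.txt
# On the cluster (e.g.): python offload_usage.py -r hadoop hdfs:///data/raw/usage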
Enterprise data warehouse offload – benefits
Illustrative economics from a large telco:
§ 100x overall project capex improvement: roughly $200,000 per TB for Teradata versus $2,000 per TB of capex for Hadoop. Of the $2,000, 25% is hardware, 25% software and 50% services.
§ Most of the services effort is migration: the telco's data warehouse has hundreds of thousands of lines of SQL code, and all of the ETL is written in SQL and needs to be converted into MapReduce jobs. The set of SQL queries had been developed over 6 years and was moved over to MapReduce-based queries in about 6 months.
§ As a result, spend on TDC has decreased from $65m to $35m over a 5-year horizon; most of the remaining $35m is maintenance for the legacy TDC deployment.
§ Moreover, TDC performance is up by 4x, because TDC now focuses on high-volume (many apps and users), low-latency (interactive response time) workloads while the rest of the work is offloaded to the Hadoop cluster.
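As a quick sanity check of those figures, the arithmetic below simply recomputes the ratios quoted above; no new numbers are introduced.

# Recompute the illustrative economics quoted above.
teradata_capex_per_tb = 200_000              # $ per TB
hadoop_capex_per_tb = 2_000                  # $ per TB
print(teradata_capex_per_tb / hadoop_capex_per_tb)    # 100.0 -> the "100x" claim

# Split of the $2,000/TB Hadoop capex: 25% HW, 25% SW, 50% services.
hw, sw, services = (hadoop_capex_per_tb * share for share in (0.25, 0.25, 0.50))
print(hw, sw, services)                      # 500.0 500.0 1000.0

# Five-year spend drops from $65m to $35m, i.e. a reduction of roughly 46%.
print(1 - 35 / 65)                           # ~0.46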
Common objections to Hadoop
§ We don't have big data problems
§ We don't have petabytes of data
§ We can't justify the budget for a new project
§ We don't have the skills
§ We're not sure Hadoop is mature / secure / enterprise-ready
§ We already have a scale-out strategy for our EDW/ETL
MYTH: Big Data means “Petabytes”
§ Not just Volume – remember Variety and Velocity
§ Plenty of issues at smaller scales:
  – Data processing
  – Unstructured data
§ Often warehouse volumes are small because the technology is expensive, not because there is no relevant data
§ Scalability is about growing with the business, affordably and predictably
Every organization has data problems! Hadoop can help…
MYTH: Big Data means Data Science
§ Hadoop solves existing problems faster, better and cheaper than conventional technology, e.g.:
  – Landing zone: capturing and refining multi-structured data types with unknown future value
  – Cost-effective platform for retaining lots of data for long periods of time
§ Walk before you run
§ Big Data is a state of mind
From data puddles and ponds to lakes and oceans
§ GOAL: a platform that natively supports mixed workloads as a shared service
§ AVOID: systems separated by workload type due to contention
Instead of separate Big Data silos per business unit (BU1, BU2, BU3), the target is a single shared platform that captures all transactions, interactions and observations and supports refine, explore and enrich workloads across batch, interactive and online access.
Waves of adoption – crossing the chasm
Adoption today* falls into three waves.

Wave 1 – Batch orientation
§ Adoption: mainstream, 70% of organizations
§ Example use case: refine – archival and transformation
§ Response time: hour(s)
§ Architectural characteristic: EDW / RDBMS talk to Hadoop
§ Example technologies: MapReduce, Pig, Hive

Wave 2 – Interactive orientation
§ Adoption: early adopters, 20% of organizations
§ Example use case: explore – query and visualization
§ Response time: minutes
§ Architectural characteristic: analytic apps talk directly to Hadoop
§ Example technologies: ODBC/JDBC, Hive – illustrated in the sketch below

Wave 3 – Real-time orientation
§ Adoption: bleeding edge, 10% of organizations
§ Example use case: enrich – real-time decisions
§ Response time: seconds
§ Architectural characteristic: derived data is also stored in Hadoop
§ Example technologies: HBase, NoSQL, SQL

The dominant data characteristic shifts from volume in the batch wave toward velocity in the real-time wave.

* Among organizations using Hadoop
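As a concrete example of the wave-2 pattern – an analytic application talking directly to Hadoop over a Hive connection – the sketch below uses the PyHive package; the host, port, table and column names are illustrative assumptions.

# Wave-2 interactive access: query Hive from an analytic app via HiveServer2.
# Uses the PyHive package (pip install pyhive); connection details and the
# table/column names are illustrative.
from pyhive import hive

conn = hive.connect(host="hadoop-edge-node", port=10000, username="analyst")
cursor = conn.cursor()

# The table is assumed to be an external Hive table over files already landed
# in HDFS, so the schema is applied on read rather than enforced at load time.
cursor.execute(
    "SELECT service, COUNT(*) AS events "
    "FROM usage_events WHERE event_date = '2013-06-01' "
    "GROUP BY service"
)
for service, events in cursor.fetchall():
    print(service, events)

cursor.close()
conn.close()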
Where to start inserting Hadoop in your company? A call to action…
The call to action spans IT Infrastructure, IT Applications, LOB and the CXO:
§ Understanding Big Data
  – Definition
  – Benefits over adjacent and legacy technologies
  – Current mode vs. future mode for analytics
§ Assessing the economic potential
  – Target use cases by function and industry
  – Best approach to adoption
§ Accelerating implementation
  – Solution design driven by target use cases
  – Reference architecture
  – Technology selection and POC
  – Implementation lessons learnt
Key takeaways
§ The Hadoop open source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price / performance compared to existing technologies.
§ Hadoop is becoming an enterprise-wide landing zone for large amounts of data. Increasingly it is also used to transform data.
§ Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster.
§ Questions? [email protected]