2 © RedPoint Global Inc. 2015 Confidential
Today’s Speakers
Justin Sears, Senior Manager, Product Marketing, Hortonworks
Jamie Keeffe, Product Marketing Manager, RedPoint Global
Kris Tomes, Solutions Director, RedPoint Global
Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hortonworks: Hadoop for the Enterprise We Do Hadoop
Spring 2015
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Customer Momentum
• 330+ customers (as of year-end 2014)
Hortonworks Data Platform
• Completely open multi-tenant platform for any app & any data.
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
Partner for Customer Success
• Open source community leadership focus on enterprise needs
• Unrivaled world class support
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• 600+ Employees
• 1000+ Ecosystem Partners
Hortonworks. We do Hadoop.
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020. Curves contrast laggards with industry leaders.]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high velocity and variety of data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
✓ Manages new data paradigm
✓ Handles data at scale
✓ Cost effective
✓ Open source
Traditional Hadoop Had Limitations
• Batch-only architecture
• Single purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade
[Figure: traditional Hadoop stack with a single application on batch processing (MapReduce) over storage (HDFS).]
Modern Data Architecture emerges to unify data & processing
Modern Data Architecture
• Enable applications to have access to all your enterprise data through an efficient centralized platform
• Supported with a centralized approach to governance, security and operations
• Versatile to handle any applications and datasets no matter the size or type
[Figure: Modern Data Architecture. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). Platform: HDFS (Hadoop Distributed File System) and YARN: Data Operating System, running interactive, real-time, batch, and partner ISV workloads alongside MPP and EDW systems. Analytics: data marts, business analytics, visualization & dashboards, applications.]
RedPoint Global is a Hortonworks Partner, certified on HDP and YARN. With RedPoint, your existing data analysts and database administrators can easily work with data stored in Hadoop. No new skills are required.
Hadoop adoption follows a predictable journey
Cost Optimization, new analytic apps, and ultimately to a “data lake”
Hadoop Driver: Cost optimization
• Archive Data off EDW: Move rarely used data to Hadoop as active archive, store more data longer
• Offload costly ETL process: Free your EDW to perform high-value functions like analytics & operations, not ETL
• Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW
[Figure: HDP 2.2 next to the Enterprise Data Warehouse. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). HDP 2.2 (ELT) holds cold data, deeper archive & new sources; the Enterprise Data Warehouse (hot data, MPP, in-memory) feeds analytics data systems: data marts, business analytics, visualization & dashboards.]
Hadoop Driver: Advanced analytic applications
Use case categories
• Single View: Improve acquisition and retention
• Predictive Analytics: Identify your next best action
• Data Discovery: Uncover new findings
Financial Services
• New Account Risk Screens; Trading Risk; Insurance Underwriting
• Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service
• Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement
Telecom
• Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse
• Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis
• Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers
Retail
• 360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase
• Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs
• Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior
Manufacturing
• Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data
• Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance
• Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields
Healthcare
• Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials
• Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste
• Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service
Oil & Gas
• Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration
• DCA to Slow Well Declines Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells
Government
• Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness
• Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting
Hadoop Driver: Enabling the data lake
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data affording new insights and opportunities.
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
[Figure: Journey to the Data Lake with Hadoop. Axes: scale and scope. The path leads to the data lake and Systems of Insight.]
Case Study: 12 month Hadoop evolution at TrueCar
12 months execution plan (vertical axis: Data Platform Capabilities)
• June 2013: Begin Hadoop Execution
• July 2013: Hortonworks Partnership
• Aug 2013: Training & Dev Begins
• Nov 2013: Production Cluster, 60 Nodes, 2 PB
• Dec 2013: Three Production Apps (3 total)
• Jan 2014: 40% Dev Staff Perficient
• Feb 2014: Three More Production Apps (6 total)
• May ‘14: IPO
12 Month Results at TRUECar
• Six Production Hadoop Applications
• Sixty nodes/2PB data
• Storage Costs/Compute Costs from $19/GB to $0.23/GB
“We addressed our data platform capabilities strategically as a pre-cursor to IPO.”
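To put the TrueCar cost figures in perspective, the sketch below computes the size of the drop from $19/GB to $0.23/GB, which works out to roughly an 83-fold reduction, or about 98.8%. The class and method names are illustrative assumptions and do not come from the deck.

```java
public class StorageCostSketch {

    // How many times cheaper the new per-GB cost is (hypothetical helper).
    public static double foldReduction(double before, double after) {
        return before / after;
    }

    // Percentage saved relative to the original per-GB cost (hypothetical helper).
    public static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double before = 19.00; // $/GB before Hadoop, as quoted on the slide
        double after = 0.23;   // $/GB after 12 months, as quoted on the slide
        System.out.printf("%.1fx cheaper, %.1f%% reduction%n",
                foldReduction(before, after), percentReduction(before, after));
    }
}
```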
Hortonworks Data Platform Hadoop for the Enterprise
Only HDP delivers a Centralized Architecture
HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource Management, consistent Governance & Security and efficient Operations services across Hadoop applications.
Hortonworks Data Platform
YARN Data Operating System
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
• The versatility to support multiple applications and diverse workloads from batch to interactive to real-time, open source and commercial.
Key Benefits
• Multiple applications on a shared data set with consistent levels of service: a multitenant data platform.
• Provides a shared platform to enable new analytic applications.
• Delivers maximum cost efficiency for cluster resource management. Fewer servers, fewer nodes.
[Figure: HDP architecture. Storage and YARN: Data Operating System (resource management) at the core, surrounded by governance, security, and operations services. Data access (batch, interactive & real-time) serves existing applications, new analytics, and partner applications.]
HDP delivers a completely open data platform
Hortonworks Data Platform 2.2
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.
Completely Open
• HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations
• All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.
[Figure: Hortonworks Data Platform 2.2 components]
• Governance: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
• Batch, interactive & real-time data access: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
• YARN: Data Operating System (Cluster Resource Management)
• HDFS (Hadoop Distributed File System)
• Security: Apache Ranger, Apache Knox, Apache Falcon
• Operations: Apache Ambari, Apache Zookeeper, Apache Oozie
HDP: Any Data, Any Application, Anywhere
Any Application
• Deep integration with ecosystem partners to extend existing investments and skills
• Broadest set of applications through the stable of YARN-Ready applications
Any Data
Deploy applications fueled by clickstream, sensor, social, mobile, geo-location, server log, and other new paradigm datasets with existing legacy datasets.
Anywhere
Implement HDP naturally across the complete range of deployment options
[Figure: data sources (clickstream, web & social, geolocation, Internet of Things, server logs, files, emails, ERP, CRM, SCM) and deployment options (commodity, appliance, cloud, hybrid).]
Over 70 Hortonworks Certified YARN Apps
Hortonworks supports the full application lifecycle
Hadoop usage follows a consistent lifecycle
From architecture to expansion, all with a consistent support experience
[Figure: number of support tickets across project phases (Architecture & Development, Implementation, Production, Expansion), repeated for Project 2, Project 3, ... Project N.]
Most Common Support Issues by Project Phase
Issues addressed by Hortonworks Support by type for the past year
Issue Type
• Architecture: 7%
• Application Development: 10%
• Installation: 10%
• Performance: 5%
• Configuration: 25%
• Executing Jobs: 20%
• Cluster Administration: 18%
• HDP Upgrades: 3%
• Enhancement Requests: 3%
• TOTAL: 100%
Hortonworks Support
Full Lifecycle Subscription Support
Support through EVERY phase of adoption of your Hadoop project to ensure your success
A Leader in Hadoop
The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014
“Hortonworks loves and lives open source innovation”
World Class Support and Services. Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR.
Overview of RedPoint Global
• Launched 2006
• Founded and staffed by industry veterans
• Headquarters: Wellesley, Massachusetts
• Offices in US, UK, Australia, Philippines
• Global customer base
• Serves most major industries
[Figure: Magic Quadrant badges for Data Quality, Multichannel Campaign Management, and Integrated Marketing Management.]
New Data Straining Current Architectures
Data types
• Unstructured documents, emails
• Transactional data
• Server logs
• Sentiment, web data
• Geolocation
• Sensor, machine data
• Clickstream
• Hierarchical data
• OLTP, ERP, CRM
• Master data
Growth
• 2.8 ZB in 2013
• 85% from new data types
• 15x Machine Data by 2020
• 40 ZB by 2020
Source: IDC
Key Functions for Data Management
ETL & ELT
• Profiling, reads/writes, transformations
• Single project for all jobs
Data Quality
• Cleanse data
• Parsing, correction
• Geo-spatial analysis
Integration & Matching
• Grouping
• Fuzzy match
Master Key Management
• Create keys
• Track changes
• Maintain matches over time
Web Services Integration
• Consume and publish
• HTTP/HTTPS protocols
• XML/JSON/SOAP formats
Process Automation & Operations
• Job scheduling, monitoring, notifications
• Central point of control
• Meta Data Management
Overview - What is Hadoop?
Hadoop 1.0
• All operations based on Map Reduce
• Intrinsic inconsistency of code based solutions
• Highly skilled and expensive resources needed
• 3rd party applications constrained by the need to generate code
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
[Figure: HADOOP 2.0 stack in three layers. System: HDFS (Hadoop Distributed File System) and YARN: Data Operating System. Engine: batch (MapReduce), batch & interactive (Tez), real-time (Slider), direct (Java, .NET). API: scripting (Pig), SQL (Hive), Cascading (Scala, Java), NoSQL (HBase, Accumulo), stream (Storm), Spark, other ISV.]
RedPoint Data Management on Hadoop
[Figure: RedPoint Data Management on Hadoop. Labels: Partitioning AM / Tasks; Execution AM / Tasks; Data I/O; Key / Split; Analysis; Parallel Section; YARN; MapReduce.]
RedPoint DM for Hadoop: Processing Flow
[Figure: three-step processing flow (steps 1, 2, 3). Components: Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; DM App Master; three Node Managers, each running DM Tasks. Arrow labels: Launches DM App Master; Launches Tasks; Running DM Task.]
Benchmarks: Project Gutenberg
Measure | Map Reduce | Pig | RedPoint
Code | >150 Lines of MR Code | ~50 Lines of Script Code | 0 Lines of Code
Development time | 6 hours of development | 3 hours of development | 15 min. of development
Runtime | 6 minutes runtime | 15 minutes runtime | 3 minutes runtime
Tuning | Extensive optimization needed | User Defined Functions required prior to running script | No tuning or optimization required
Sample MapReduce (small subset of the entire code which totals nearly 150 lines):
    public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
        private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ \\|«»¡¢£¤¥¦©¬®¯±¶·¿";
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(WordOffset key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line, delimiters);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
Sample Pig script without the UDF:
    SET pig.maxCombinedSplitSize 67108864
    SET pig.splitCombination true
    A = LOAD '/testdata/pg/*/*/*';
    B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
    C = FOREACH B GENERATE UPPER(word) AS word;
    D = GROUP C BY word;
    E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
    F = ORDER E BY occurrences DESC;
    STORE F INTO '/user/cleonardi/pg/pig-count';
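For readers who want to see the logic of the benchmark without a cluster, here is a minimal single-JVM sketch of the same word count: tokenize, upper-case, group, count, and order by occurrences descending. It is not the benchmark code. The names WordCountSketch and countWords are hypothetical, and it splits on whitespace only, which is simpler than both the Pig TOKENIZE function and the delimiter list in the MapReduce sample.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {

    // Counts upper-cased, whitespace-separated tokens and returns them
    // ordered by count, highest first (ties stay in alphabetical order).
    public static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Stream.of(line.trim().split("\\s+")))
                .filter(token -> !token.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));

        // Copy into an insertion-ordered map so the descending order is preserved.
        Map<String, Long> ordered = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .forEachOrdered(e -> ordered.put(e.getKey(), e.getValue()));
        return ordered;
    }

    public static void main(String[] args) {
        // Illustrative input only; the benchmark itself read Project Gutenberg texts.
        List<String> sample = List.of("the quick brown fox", "The lazy dog", "the end");
        countWords(sample).forEach((word, n) -> System.out.println(n + "\t" + word));
    }
}
```

The sketch mirrors steps B through F of the Pig script in memory; it says nothing about how either tool distributes the work across nodes.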
Kris Tomes Solution Director at RedPoint
Demonstration Introduction
Key Factors to Consider
• Traditional data architectures are challenged
• Maximize the scale & cost optimization of the Hortonworks Modern Data Architecture
• Leverage your DBAs to control development / implementation / production costs and schedules
• Smooth out your journey to a data lake
• Expedite the speed for getting business applications into production
• Insist on Any Data, Any Application, Any Environment
• Do your data quality and data integration in the cluster
Thank You & Please Visit Us at www.RedPoint.net
Jamie Keeffe, Product Marketing Manager, RedPoint Global Inc. [email protected] +1 978-764-3839