hortonworks
DESCRIPTION
Learn how organizations that combine the HP Vertica Analytics Platform with Hortonworks can quickly explore and analyze a broad variety of data types and turn them into actionable information, so they can better understand how their customers and site visitors interact with their business, offline and online.
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Using HP Vertica and Apache Hadoop …for customer analytics
We do Hadoop.
Your speakers…
John Kreisa, VP Strategic Alliance Marketing Hortonworks
Chris Selland, VP Business Development HP Software, Big Data Group
Poll
Where are you in your Hadoop journey?
• Researching our options
• Currently evaluating some software
• Deep in a trial
• What’s Hadoop?
Big Data Market Trends & Projections
Big Data Explosion
% by which organizations leveraging modern information management systems outperform peers by 2015
↑ Hadoop-enabled DBMSs
85% from new data types
50x data growth, 2010 to 2020
1 Zettabyte (ZB) = 1 billion TB
15x growth rate of machine-generated data by 2020
The US has 1/3 of the world’s data
Big Data is 1 of 5 US GDP game changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020
Cameras and microphones widely deployed
New routes to market via intelligent objects
Content and services via connected products
Everything has a URL
Remote sensing of objects and environment
Augmented reality
Situational decision support
Building and infrastructure management
Over 50% of Internet connections are things:
• 2011: 15+ billion permanent, 50+ billion intermittent
• 2020: 30+ billion permanent, >200 billion intermittent
Source: Gartner Keynote at Hadoop Summit 2013
A Data Architecture Under Pressure From New Data
[Diagram: existing data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
Source: IDC
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
Data types:
• OLTP, ERP, CRM Systems
• Unstructured documents, emails
• Clickstream
• Server logs
• Sentiment, Web Data
• Sensor, Machine Data
• Geolocation
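As a rough check on the IDC figures quoted on this slide, the jump from 2.8 ZB in 2012 to 40 ZB by 2020 can be converted into an implied annual growth rate. The sketch below is illustrative arithmetic only: the function name is hypothetical, and smooth compound growth is an assumption that the slide does not state.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate implied by two data-volume points."""
    return (end_zb / start_zb) ** (1.0 / years) - 1.0


# The deck's own conversion: 1 ZB = 1 billion TB (decimal units).
TB_PER_ZB = 10**9

growth_multiple = 40 / 2.8  # overall growth between 2012 and 2020
cagr = implied_annual_growth(2.8, 40.0, 2020 - 2012)

print(f"{growth_multiple:.1f}x overall, about {cagr:.0%} per year")
print(f"40 ZB = {40 * TB_PER_ZB:,} TB")
```

Under that assumption, the two IDC data points correspond to data volume growing by roughly 39% each year.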
Hadoop Within An Emerging Modern Data Architecture
[Diagram: modern data architecture with Hadoop]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP) alongside Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Hadoop: Typically Used For New Analytic Applications
[Chart axes: SCALE vs. SCOPE]
New Analytic Apps: new types of data, LOB-driven
Hadoop Value: New types of data
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
New Analytic Applications For New Types Of Data
Financial Services
• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing
Retail
• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout
Telecom
• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development
Manufacturing
• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance
Healthcare
• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials
Utilities, Oil & Gas
• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing
Public Sector
• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests
360° Customer View for Home Supply Retailer
Retail: Major home improvement retailer (>$74B in revenue, >300K employees, >2,200 stores)
Creating Opportunity
Data: Clickstream, Unstructured, Structured
Problem: Lack of a unified customer record across all channels
• Global distribution online, in home and across 2000+ stores
• No “golden record” for analytics on customer buying behavior across all channels
• Data repositories on website traffic, POS transactions and in-home services existed in isolation of each other
• Limited ability for targeted marketing to specific segments
• Data storage costs increasing
Solution: HDP delivers targeted marketing & data storage savings
• Golden record enables targeted, customized marketing
• Data warehouse offload saved millions in recurring expense
• Customer team continues to find unexpected, unplanned uses for their 360 degree view of customer buying behavior
• New use case: price optimization versus competitors → several millions in top-line revenue growth
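The "golden record" idea on this slide, one unified customer record built from repositories that used to live in isolation, can be sketched in a few lines. This is a toy illustration, not the retailer's or HDP's actual pipeline: the field names, sample rows, and the assumption that every channel already shares a clean `customer_id` are all hypothetical (real deployments usually need identity resolution first).

```python
from collections import defaultdict

# Hypothetical per-channel extracts; each row carries a shared customer_id.
web_visits = [
    {"customer_id": "c1", "pages_viewed": 12},
    {"customer_id": "c2", "pages_viewed": 3},
]
pos_sales = [{"customer_id": "c1", "store_spend": 250.0}]
home_services = [{"customer_id": "c2", "service_visits": 1}]


def build_golden_records(*channels):
    """Merge rows from every channel into one record per customer_id."""
    merged = defaultdict(dict)
    for channel in channels:
        for row in channel:
            merged[row["customer_id"]].update(row)
    return dict(merged)


golden = build_golden_records(web_visits, pos_sales, home_services)
for customer_id, record in sorted(golden.items()):
    print(customer_id, record)
```

Once the channels are merged this way, segment-level questions (for example, heavy web browsers with no in-store spend) become a single lookup instead of a cross-system reconciliation.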
Hadoop Incrementally Delivers A ‘Data Lake’
[Chart axes: SCALE vs. SCOPE]
New Analytic Apps: new types of data, LOB-driven
A Modern Data Architecture/Data Lake: RDBMS, MPP and EDW alongside Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)
Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
Hadoop: An Integrated Part Of The Modern Data Architecture
DEPTH: Hortonworks engages in deep engineered relationships with the leaders in the data center, applications and operations
BREADTH: Hundreds of partners work with us to certify their applications to work with Hadoop so they can extend big data to their users
[Diagram: HDP 2.1 within the modern data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES alongside HDP 2.1 (Governance & Integration, Data Access, Data Management, Security, Operations)
INFRASTRUCTURE: On Premise or in the Cloud
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Customer Analytics with HP Vertica + Hortonworks
Chris Selland, VP Business Development HP Software, Big Data Group
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 15
Completing Analytical Vision
[Chart: Accuracy and Insight (vertical axis) vs. Data Types (horizontal axis)]
Data types, from Structured through Semi-Structured to Unstructured: CRM, ERP, Data Warehouse, Web, Social, Log Files, Machine Data, Images, Audio, Video
Regions: Traditional Enterprise Data, Big Data, Dark Data
Customer Analytics in the Big Data Era
Structured Data
• Select customers with < 2 months remaining on contract, 5+ dropped calls per week, and lifetime value > $500
• From a database, get all matches from the CRM and Call Detail Records that satisfy the query
Unstructured Data
• Customer expressed negative sentiment through social media, web log and/or support within the last 3 months
• From unstructured sources (weblogs, calls, chat, email), get all negative matches for the customers in the structured results
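The two-stage question on this slide, a structured filter followed by a sentiment check against unstructured sources, can be sketched as follows. This is a minimal in-memory illustration, not Vertica or Hadoop code: the `Customer` fields, the sample rows, the pre-scored `sentiment` labels, and the 90-day stand-in for "the last 3 months" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    months_remaining: int        # months left on contract (CRM)
    dropped_calls_per_week: int  # derived from Call Detail Records
    lifetime_value: float        # in dollars


def at_risk_structured(customers):
    """Structured step: < 2 months left, 5+ dropped calls/week, LTV > $500."""
    return [
        c for c in customers
        if c.months_remaining < 2
        and c.dropped_calls_per_week >= 5
        and c.lifetime_value > 500
    ]


def with_negative_sentiment(candidates, interactions, window_days=90):
    """Unstructured step: keep candidates with recent negative interactions."""
    negative_ids = {
        i["customer_id"] for i in interactions
        if i["sentiment"] == "negative" and i["days_ago"] <= window_days
    }
    return [c for c in candidates if c.customer_id in negative_ids]


customers = [
    Customer("a", 1, 6, 800.0),
    Customer("b", 1, 6, 300.0),   # lifetime value too low
    Customer("c", 5, 7, 900.0),   # contract not close to expiry
    Customer("d", 1, 5, 600.0),   # passes the structured filter
]
interactions = [
    {"customer_id": "a", "sentiment": "negative", "days_ago": 20},
    {"customer_id": "d", "sentiment": "positive", "days_ago": 10},
    {"customer_id": "d", "sentiment": "negative", "days_ago": 200},  # too old
]

targets = with_negative_sentiment(at_risk_structured(customers), interactions)
print([c.customer_id for c in targets])
```

Running the structured filter first keeps the more expensive sentiment lookup limited to the handful of customers who already match the contract and call-quality criteria.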
Introducing HP Vertica Dragline
Faster answers from Big Data at a fraction of the cost of traditional data warehouses
• Store: all your data in any format, cost-effectively, across Vertica + Hadoop
• Explore: all your data directly in Hadoop without moving or changing it
• Serve: all of your data consumers without compromise, from individualized queries to large complex reports
HP Vertica Dragline: The Richest, Most Open SQL on Hadoop
Challenge: Extracting data from Hadoop requires complex and brittle ETL processes
Solution: Hadoop Navigation and Analytics
Benefits:
• Navigate Hadoop data using its native catalog
• Quickly & easily load native data types from Hadoop to Vertica
• Avoid creating and maintaining time-consuming schemas
• Use the full power of HP Vertica SQL and Analytics
Flexible Vertica Hadoop Connectivity
Leverage existing tools in shared Vertica and Hadoop storage environment
Vertica-to-Hadoop connectivity:
• ANSI SQL
• webHDFS
• webHCAT
• Hadoop Connector
• HDFS Connector: External Tables and Copy
• HCatalog Connector
• Storage Tiering
[Diagram: HDP 2.1 Hortonworks Data Platform]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), Search (Solr), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), Batch (Map Reduce), Tez
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
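Several of the connection paths on this slide go through WebHDFS, the REST interface to HDFS, whose URLs follow the pattern `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and makes no network calls. The host name and file path are hypothetical, port 50070 is assumed as the usual NameNode HTTP port in Hadoop 2.x, and this illustrates the REST URL shape rather than how the Vertica connectors are implemented.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params) -> str:
    """Build a WebHDFS REST URL for an HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical cluster and file, for illustration only.
open_url = webhdfs_url(
    "namenode.example.com", 50070,
    "/data/clickstream/2014-06-01.json", "OPEN",
)
list_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/clickstream", "LISTSTATUS",
)

print(open_url)
print(list_url)
```

`OPEN` reads a file and `LISTSTATUS` lists a directory; any HTTP client can issue the resulting GET requests against a cluster that has WebHDFS enabled.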
Data Tiering and Cost Optimization
Tier-off older data
Value Discovery
[Diagram: data tiers by location and format]
• Hot: Interactive Data (frequently queried Vertica data cache). Serve: convert data to Vertica storage format
• Cool: Batch Data. Explore: any format
• Cold: Archive Data. Store: any format
• Dark Data
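A tiering policy like the one on this slide can be expressed as a small rule that assigns each dataset to a hot, cool, or cold tier. The sketch below is one possible reading of the slide, not HP's storage-tiering logic: the thresholds (90 days, 365 days, 10 queries per week), the function name, and the pairing of tiers with the Serve/Explore/Store labels are all assumptions.

```python
# Assumed pairing of tiers with the slide's Serve / Explore / Store labels.
TIER_HANDLING = {
    "hot": "Serve: convert data to Vertica storage format",
    "cool": "Explore: any format",
    "cold": "Store: any format",
}


def choose_tier(age_days: int, queries_per_week: float) -> str:
    """Pick a tier from data age and query frequency (hypothetical thresholds)."""
    if age_days <= 90 and queries_per_week >= 10:
        return "hot"    # recent and frequently queried
    if age_days <= 365:
        return "cool"   # still explored, but mostly in batch
    return "cold"       # archive: tier off older data


datasets = {
    "last_month_clicks": (30, 40.0),
    "last_quarter_logs": (120, 2.0),
    "clicks_2010": (1500, 0.1),
}
for name, (age_days, queries_per_week) in datasets.items():
    tier = choose_tier(age_days, queries_per_week)
    print(f"{name}: {tier} ({TIER_HANDLING[tier]})")
```

Keeping the thresholds in one function makes it easy to tune them as query patterns change, which is the cost-optimization lever the slide describes.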
JSON Record-Unstructured Data
{"filter_level":"medium","contributors":null,"text":"Listening to Meg Whitman talk about the New Style of IT at #HPDiscover","geo":null,"retweeted":false,"in_reply_to_screen_name":null,"truncated":false, "lang":"en","entities":{"symbols":[],"urls":[],"hashtags":[{"text":"nope","indices":[51,56]}], "user_mentions":[]},"in_reply_to_status_id_str":null,"id":346104750565097474,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>", "in_reply_to_user_id_str":null,"favorited":false,"in_reply_to_status_id":null,"retweet_count":0,"created_at":"Tue Jun 11 03:19:37 +0000 2013","in_reply_to_user_id":null, "favorite_count":0,"id_str": "346104750565097474","place":null,"user":{"location":"","default_profile":false,"profile_background_tile":true,"statuses_count":2354,"lang":"en","profile_link_color":"FF0000","profile_banner_url":"https://pbs.twimg.com/profile_banners/271588683/1370571522","id":271588683,"following":null,"protected":false,"favourites_count":121,"profile_text_color":"3D1957","description":"Dance It is a part of me A part of who I am It has entered my life Taken over my body It is in my walk In my movements In my thoughts I have become a DANCER","verified":false,"contributors_enabled": false,"profile_sidebar_border_color":"65B0DA","name":"ashley tousignant", "profile_background_color":"642D8B","created_at": "Thu Mar 24 20:25:59 +0000 2011","default_profile_image":false,"followers_count":434,"profile_image_url_https":"https://si0.twimg.com/profile_images/3765534455/eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://a0.twimg.com/images/themes/theme10/bg.gif","profile_background_image_url_https":"https://si0.twimg.com/images/themes/theme10/bg.gif","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_use_background_image":true,"friends_count":844,"profile_sidebar_fill_color":"7AC3EE","screen_name":"01ashleymt","id_str":"271588683","profile_image_url":"http://a0.twimg.com/profile_images/3765534455/eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","listed_count":0,"is_translator":false},"coordinates":null}
More than 140 Characters
HP Vertica Flex Zone
Avoid creating and maintaining time-consuming schemas
Load, manage, and explore semi-structured data
• Faster SQL querying on semi-structured data
• Auto-schematization for semi-structured data loading
• Flexible parsers for JSON and delimited data
• One-step schema for blazing-fast performance
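To make the idea of auto-schematization concrete, the sketch below flattens a nested JSON record into column-like key/value pairs without declaring a schema up front. It is a generic illustration in plain Python and does not reproduce how HP Vertica Flex Zone works internally: the `flatten` helper and the dotted column names are assumptions, and the sample record is a trimmed subset of the tweet shown earlier.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested JSON objects into flat, dotted column names."""
    columns = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            columns.update(flatten(value, prefix=f"{name}."))
        else:
            columns[name] = value
    return columns


raw = (
    '{"lang": "en", "retweet_count": 0, "place": null, '
    '"user": {"screen_name": "01ashleymt", "followers_count": 434}}'
)
columns = flatten(json.loads(raw))
for name, value in columns.items():
    print(f"{name} = {value!r}")
```

The discovered column names (for example `user.followers_count`) come from the data itself, which is why no schema has to be created or maintained before loading.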
Analyzing Billions of Clicks
Major Computer Products Manufacturer
Challenge (online)
• Millions of website visitors generate billions of clicks per month
• Must store 5 years’ worth of data to get full value of year-over-year clickstream analysis
• Legacy database had sluggish performance: queries took 48 hours after each day’s transactions
• Extremely complex website: many pages are generated dynamically, creating complex clickstream trails
HP Vertica Solution
• Queries run in hours or even minutes; 48x – 100x faster
• Industry-standard SQL accelerated acceptance and proficiency
• Speed of HP Vertica allows iterative and recursive analysis for deeper dives
• Functionality tailored to individual interactions based on nuanced understanding of user behavior at an individual level
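The speedup figures on this slide can be related to the 48-hour legacy query time with simple arithmetic. The sketch assumes the 48x to 100x range applies to the same 48-hour queries, which the slide implies but does not state outright; the function name is hypothetical.

```python
def runtime_after_speedup(baseline_hours: float, speedup: float) -> float:
    """Hours a query takes after being sped up by the given factor."""
    return baseline_hours / speedup


LEGACY_HOURS = 48  # legacy query time quoted on the slide

for factor in (48, 100):
    hours = runtime_after_speedup(LEGACY_HOURS, factor)
    print(f"{factor}x faster: {hours:.2f} hours ({hours * 60:.1f} minutes)")
```

Under that assumption, the low end of the range brings a 48-hour query down to about an hour and the high end to roughly half an hour.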
Next steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about HP Vertica & Hortonworks:
http://hortonworks.com/partner/HP/
Don’t miss our next webinar! HP Converged Systems and Hortonworks: Planning for the Impacts of Big Data in the Data Center
http://info.hortonworks.com/hpconvergedandhortonworks.html