© 2015 MapR Technologies 2
Topics
• The most common use cases for Hadoop
• The top considerations before "going live" with Hadoop
• Product Demo – multiple workloads in the Data Lake
State of Big Data Adoption
Source: Gartner. Sept. 2014. Survey Analysis: Big Data Investment Grows but Deployments Remain Scarce in 2014
Speeding The Journey To Value
Figure: chart running from "Big data novice" to "Mature" and from "Batch" to "Operational", labelled Get fast value, Empower BI users, Create Data Capital and Operational Applications. Example use cases shown: Mine Logs, Data Hub, Recommendation Engine, Ad Targeting, 360 View, Anomaly detection, Fraud prevention.
The As-it-happens Business
Common Use Cases: Taking Advantage of Hadoop
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavior analysis
OPERATIONAL INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis
Hadoop Use Cases by Industry
ADVERTISING, MEDIA & ENTERTAINMENT
• Improved ad targeting, analysis, forecasting and optimization
• Personalized recommendations
• Superior analytics capability
• Enhanced game player engagement
FINANCIAL SERVICES
• Fraud Detection
• Customer Segmentation Analysis
• Customer Sentiment Analysis
• Risk Aggregation
• Counterparty Risk Analytics
• New Products and Services for Consumer Card Holders
• Credit Risk Assessment
• 360-Degree Customer Service
GOVERNMENT
• Cybersecurity, Intelligence
• Crime Prediction and Prevention
• Defense, National Security
• Pharmaceutical Drug Evaluation
• Scientific Research
• Weather Forecasting
• Fraud Detection
• Emergency Communications/Response
• Traffic Optimization
HEALTHCARE & LIFE SCIENCES
• Personalized Treatment Planning
• Assisted Diagnosis
• Fraud Detection
• Monitor Patient Vital Signs
MANUFACTURING
• Assembly Line Quality Assurance
• Preventive Maintenance
• Supply Chain and Logistics
• Monitoring Product Quality through Telemetry Data
• Real-time Parts Flow Monitoring
• Product Configuration Planning
• Market Pricing and Planning
OIL & GAS
• Oil Exploration and Discovery
• New oil prospect identification
• Seismic trace identification
• Oil Production
• Equipment Maintenance
• Reservoir Engineering
• Safety and Environment
• Security
RETAIL
• Up-Sell/Cross-Sell Recommendations
• Social Media Analysis
• Dynamic Pricing Across Multiple Channels
• Fraud Detection
• Clickstream Analysis
• Loyalty Program Benefits
• 360° Customer View
• Operational Intelligence
TELECOM
• Customer Churn Analysis
• Fraud Detection
• Clickstream Analysis
• Recommendations
• Product Development
• Network Management/Optimization
FIN SERVICES
GOAL: Offer Serving, Credit Risk & Fraud
• Largest deployment in financial services: 1700+ MapR Hadoop nodes
• $900B worldwide bills
• 10 years of data stored
• 100M+ cards
• $100M saved for cardholders
• 45s TeraSort; 1.65TB MinuteSort
Operations + Analytics = Real-time, Personalized Services
Figure: on the MapR Distribution including Hadoop, real-time operational applications (online transactions, fraud detection, personalized offers) and analytics (fraud model, recommendations table, clickstream analysis, fraud investigation tool) run side by side, serving a fraud investigator, an interactive marketer and customer support.
FORTUNE 500 TELCO
Hadoop + Data Warehouse Architecture: Improve data services to customers without increasing enterprise architecture costs
OBJECTIVES
• Provide cloud, security, managed services, data center, & comms
• Report on customer usage, profiles, billing, and sales metrics
• Improve service: measure service quality and repair metrics
• Reduce customer churn: identify and address IP network hotspots
CHALLENGES
• Cost of ETL & DW storage for growing IP and clickstream data; >3 months
• Reliability & cost of Hadoop alternatives limited ETL & storage offload
SOLUTION
• MapR for data staging, ETL, and storage at 1/10th the cost
• MapR provided the smallest datacenter footprint with the best DR solution
• Enterprise-grade: NFS file management, consistent snapshots & mirroring
• Data warehouse for mission-critical reporting and analysis
Business Impact
Hadoop + Data Warehouse = New, Deeper Insights for the Business
• Increased scale to handle network IP and clickstream data
• Freed up processing on the DW to maintain reporting SLAs to the business
• Unlocked new insights into network usage and customer preferences
MapR Optimized Data Architecture
Figure: data sources feed the MapR Distribution for Hadoop, which serves data access, analytics and operational apps.
• Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
• Data movement; data transformation, enrichment and integration
• MapR Distribution for Hadoop:
– Batch/Search (MR, Spark, Hive, Pig)
– Interactive (Impala, Drill)
– Streaming (Spark Streaming, Storm)
– NoSQL ODBMS (HBase, Accumulo, …)
– Machine learning
– MapR Data Platform: MapR-FS, MapR-DB
• Data access and analytics: search, schema-less data exploration, BI and reporting, ad-hoc integrated analytics
• Operational apps: recommendations, fraud detection, logistics
• Data warehouse
LARGE FINANCIAL SERVICES INSTITUTION
Security Log Analysis & Enterprise Data Vault: F100 bank accelerates log analytics to meet investigation and compliance mandates
OBJECTIVES
• Meet compliance requirements to minimize lawsuits and fines
• Complete IT audits more quickly
CHALLENGES
• Prior system (flat files on Unix) was difficult for the operations team to maintain
• HA and data protection issues in HDFS put critical data at risk
• File volume (300K files/day) was straining the system
SOLUTION
• Seamless Hadoop file movement & management: MapR NFS
• MapReduce enables archival of data for historical search and analysis
• Data is indexed into Elasticsearch from MapR for real-time search
• Customizable user interface and dashboard: Kibana (ELK stack)
Business Impact
• Bullet-proof data vault that meets SEC and FINRA requirements
• 46x cost savings over legacy system
• Efficiency: the MapR cluster can also store the Elasticsearch index for real-time search
Key Questions for Big Data Planning
Source: Gartner. Jan 2015. Answering Big Data's 10 Biggest Planning and Implementation Questions
Big Data is Overwhelming Traditional Systems
TREND: Enterprise Data Architecture
Figure: enterprise users and outside sources around the operational and analytical systems.
OPERATIONAL SYSTEMS: PRODUCTION REQUIREMENTS
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
ANALYTICAL SYSTEMS: PRODUCTION REQUIREMENTS
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
REALITY: Hadoop Relieves the Pressure from Enterprise Systems
Figure labels: operational systems, analytical systems, enterprise users. Hadoop takes on:
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Business continuity
2 Interoperability
3 High performance
4 Multi-tenancy
Key Reasons for Selecting the MapR Distribution including Hadoop
Respondents who have had prior experience with another Hadoop distribution*
* Apache Hadoop, Cloudera or Hortonworks
Business Continuity
• High Availability
• Data Protection
• Disaster Recovery
What are your requirements?
What do you have for your enterprise storage, databases and data warehouses?
Seamless Integration with Direct Access NFS
• POSIX compliant – Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier to work in non-Java programming languages
– No need for most proprietary Hadoop connectors
• Compression/parallel access/security from edge nodes to MapR cluster
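A minimal sketch of what the NFS bullets mean in practice, written in Python as an example of a non-Java language: once the cluster is NFS-mounted, ordinary file calls can stream data in and perform a random in-place write, with no Hadoop connector involved. The mount path `/mapr/my.cluster.com` and the helper names `append_event` and `overwrite_at` are hypothetical. When nothing is mounted at that path the demo falls back to a local temporary directory, so it exercises standard POSIX file semantics rather than the cluster itself.

```python
import os
import tempfile

# Hypothetical NFS mount point for a MapR cluster; the real path depends on
# how the cluster is mounted in your environment.
CLUSTER_MOUNT = "/mapr/my.cluster.com"


def append_event(mount_root: str, rel_path: str, line: str) -> str:
    """Append one text line to a file under the mount using plain file I/O.

    No Hadoop client library is used: through the NFS mount the cluster
    appears as an ordinary directory tree.
    """
    path = os.path.join(mount_root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    return path


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes in place at the given offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


if __name__ == "__main__":
    # Use the cluster mount if present, otherwise a local temp directory.
    root = CLUSTER_MOUNT if os.path.isdir(CLUSTER_MOUNT) else tempfile.mkdtemp()
    log = append_event(root, "logs/app/2015-01-01.log", "user=42 action=login")
    append_event(root, "logs/app/2015-01-01.log", "user=42 action=logout")
    overwrite_at(log, 5, b"43")  # change the first line's user id in place
    with open(log, encoding="utf-8") as f:
        print(f.read(), end="")
```

Because the interface is plain POSIX, the same two functions work unchanged from shell tools, scripts or any language runtime that can open a file.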
Narrow Foundations – Big and Fast are Separate
Figure: MapReduce, HBase, Spark/Storm and Hive run on HDFS, while sequential file processing, OLAP, data mining and web services run on separate RDBMS and NAS foundations.
Big Data is heavy and expensive to move.
Unify Big & Fast on One Platform
Figure: MapReduce, HBase, Spark/Storm, Hive, sequential file processing, OLAP, data mining and web services on one next-generation distribution, accessed through the Hadoop APIs and NFS (diagram labels also include HDFS, RDBMS and NAS).
MapR: Best Solution for Customer Success
• Premier investors
• High growth
– 700+ customers
– 2X growth in direct customers
– 2X growth in annual subscriptions (ACV)
– 140% dollar-based net expansion
– 90%: subscription licenses, software margins
• Best product
• Apache open source
The Power of the Open Source Community
APACHE HADOOP AND OSS ECOSYSTEM
EXECUTION ENGINES
• Batch: MapReduce v1 & v2, Tez, Spark, Pig, Cascading
• SQL: Hive, Impala, Spark SQL, Drill
• Streaming: Spark Streaming, Storm
• NoSQL & Search: HBase, Solr
• ML, Graph: Mahout, MLLib, GraphX
DATA GOVERNANCE AND OPERATIONS
• Security: Sentry
• Workflow & Data Governance: Oozie
• Provisioning & Coordination: Juju, ZooKeeper, Sahara
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue
YARN
Management
Data Platform: MapR-FS, MapR-DB
The MapR Distribution including Apache Hadoop
Data Hub, Enterprise Grade, Operational
MapR Distribution including Hadoop
Theme: Enterprise Grade (product: MapR Enterprise Edition)
• Requirements: uptime service levels; site-to-site DR; backup/recovery; security; high-velocity data ingress
• Features: HW/SW HA; mirroring; snapshots; authorization, Kerberos; 2X-5X performance
Theme: Data Hub (product: MapR Enterprise Edition)
• Requirements: Hadoop; traditional applications; data of record; batch and interactive
• Features: HDFS; POSIX; strong consistency; MapReduce and SQL
Theme: Operational (product: MapR Enterprise Database Edition)
• Requirements: real time; NoSQL; operational analytics
• Features: HBase; update in place; concurrent read/write
MapR patent pending: "Table Format for Map Reduce"; "Map Reduce Ready Distributed File System"
Apache Hadoop NameNode High Availability
Figure: three HDFS-based layouts, each with namespace metadata (A-F) held by NameNodes above a pool of DataNodes.
HDFS-based distributions (one NameNode holding A-F)
• Single point of failure
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
HDFS HA (primary and standby NameNodes, each holding A-F)
• Only one active NameNode
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
HDFS Federation (namespace split across NameNodes holding A B, C D and E F)
• Multiple single points of failure w/o HA
• Needs 20 NameNodes for 1 billion files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
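The "20 NameNodes for 1 billion files" figure follows from the per-NameNode limit quoted on this slide: 1,000,000,000 / 50,000,000 = 20 at the low end of the 50-200 million range, or 5 at the high end. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the `ha` option assumes one standby per federated namespace (the classic active/standby HDFS HA pairing), which doubles the process count.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def namenodes_for(total_files: int, files_per_namenode: int, ha: bool = False) -> int:
    """NameNodes needed if each one can hold `files_per_namenode` files.

    Returns the number of active NameNodes, or active plus one standby per
    namespace when ha=True (an assumption about the HA layout).
    """
    active = ceil_div(total_files, files_per_namenode)
    return active * 2 if ha else active


ONE_BILLION = 1_000_000_000
for limit in (50_000_000, 200_000_000):
    print(f"limit {limit:>11,}: {namenodes_for(ONE_BILLION, limit)} active, "
          f"{namenodes_for(ONE_BILLION, limit, ha=True)} with HA standbys")
```

Ceiling division matters here: a namespace that is even one file over a multiple of the limit needs another NameNode.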
No-NameNode Architecture
Figure: the separate NameNode is removed; its metadata (A-F) is distributed and replicated across the cluster nodes.
• Up to 1T files (> 5000x advantage)
• Significantly less hardware & OpEx
• Higher performance
• No special config to enable HA
• Automatic failover & re-replication
• Metadata is persisted to disk
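The "> 5000x" figure is consistent with comparing 1 trillion files against the 200 million upper end of the HDFS per-NameNode limit from the previous slide (exactly 5,000x); against the 50 million lower end the ratio is 20,000x. A small sketch of that ratio, with a hypothetical helper name:

```python
def scale_advantage(max_files: int, hdfs_limit: int) -> int:
    """Whole-number ratio of file capacities (floor division)."""
    return max_files // hdfs_limit


ONE_TRILLION = 1_000_000_000_000
for limit in (200_000_000, 50_000_000):
    print(f"vs {limit:,}-file NameNode: {scale_advantage(ONE_TRILLION, limit):,}x")
```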
MapR: Fast and Dependable with Lowest TCO
Cost comparison for a 500 TB cluster vs HDFS-based distros
TCO: mapr.com/tco
Committed to our Customers’ Success
Educational Services
• Instructor-led courses & free on-demand training for Hadoop cluster administration, HBase & MapReduce programming and more
Professional Services
• Core Hadoop Services
• Data Engineering
• Advanced Analytics
• M7/HBase Practice
Customer Support
• Hadoop engineering experts provide 24x7x365 global coverage
Additional labels: Data Engineering, Data Science
Key MapR Advantage Partners
• Infrastructure & Cloud
• Analytics & Business Intelligence
• Applications & OS
• Consultants & Integrators
• Data Warehouse & Integration
• Business Services
Q & A
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies
GET STARTED NOW! mapr.com/sandbox