© 2015 MapR Technologies 1 © 2015 MapR Technologies
Ray M Sugiarto – MAPR Champion Indonesia 0815 167 2882
© 2015 MapR Technologies 2
Why Big Data?
University of Texas:
“The median Fortune 1000 company could
increase its revenue by more than $2 billion a year if it
increases data usability by 10 percent.”
Forbes, 12/4/14
© 2015 MapR Technologies 3
Market Outlook
“Industrial Internet Insights for 2015,” from GE (NYSE: GE) and Accenture (NYSE: ACN)
© 2015 MapR Technologies 4
Fastest Adoption of New Enterprise Technology
Hadoop trials,
science projects
in a corner
Large
mission-critical,
operational
deployments
© 2015 MapR Technologies 5
Leading Data Driven Customers
Empowering the
As-it-happens business
by speeding up the
data-to-action cycle
© 2015 MapR Technologies 6 © 2014 MapR Technologies
Customer Case Studies http://www.indonesiabigdata.com/gallery.html
© 2015 MapR Technologies 7
Largest Biometric Database in the World
1.2B PEOPLE
© 2015 MapR Technologies 8
2.5M PEAK IMPRESSIONS
per second
100B+ BID REQUESTS
per day
80ms per transaction
3M MEDIA FILES
SCANNED
Leading the
Automation of
Advertising
300 DECISIONS
per transaction
Millions of QUERIES
per second
<20K MBPS of data
330 MapR Hadoop nodes
© 2015 MapR Technologies 9
1.7T RECORDS PROCESSED
per month
20TB NEW DATA
INGESTED per day
720,000 DIGITAL INTERACTIONS
COLLECTED
per second
10X COMPUTATIONAL
SPEED
Analytics for a
Digital World
<400 MapR Hadoop nodes
© 2015 MapR Technologies 10
$900B WORLDWIDE BILLS
10 Years DATA STORED
100M+ CARDS
45s TERASORT
1.6TB MINUTESORT
Offer Serving,
Credit Risk & Fraud
<
Largest deployment
in financial services
2000+ MapR Hadoop nodes
FIN SERVICES
GOAL: $100M SAVED FOR CARDHOLDERS
© 2015 MapR Technologies 11
Targeted Marketing: In-store Geo-located Offers
Largest deployment in retail
TOP 5 RETAILER WORLDWIDE
2000+ MapR Hadoop nodes
7PB per CLUSTER
40TB per NODE
245M CUSTOMERS per week
+2% CONVERSION RATE IMPROVEMENT
+50 PRODUCTION APPLICATIONS
200 DATA SCIENTISTS
© 2015 MapR Technologies 12
100TB
<
DATA
10T DATA POINTS
2.5M SENSORS
10K OUTCOMES
per location
Corn, wheat
growing
Manage and Adapt
to Climate Change
60Yrs CROP-YIELD
statistics
2M LOCATIONS
Natl. Weather Service
Doppler Scans
from
85% OF FARMER
RISK IS
WEATHER
RELATED
© 2015 MapR Technologies 13
Entertaining Millions
© 2015 MapR Technologies 14
Machine Zone Topology
Before: Gamers, Gaming servers + MySQL, data copying, Analytics with Hadoop
After: Gamers, Gaming servers + MapR-DB, table mirroring, Analytics with Enterprise Database Edition
Both diagrams: Local data center, Remote data center
© 2015 MapR Technologies 15
Lessons to Apply
© 2015 MapR Technologies 16
Apps dictated
the data format
Data freely supports
varied compute engines
Data Silos Can’t have silos
ETL Ingest & go
© 2015 MapR Technologies 17
Lesson #1
Real-Time Requires
Big AND Fast
(in one cluster)
© 2015 MapR Technologies 18
comScore Big Data Expansion
[Chart: # of records per month, Beacon Records and Panel Records, Jul 2009 to Sep 2014; vertical axis up to 2,000 billion]
Total records collected in September 2014 = 1,758,229,317,769
Total records collected YTD 2014 = 15,423,470,100,013
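To put the comScore totals in perspective, the short sketch below converts them into average rates. It assumes a 30-day September and that "YTD 2014" covers the nine months from January through September; both are assumptions for illustration, not statements from the slide. Under those assumptions the September figure works out to roughly 680,000 records per second.

```python
# Totals quoted on the slide.
SEPT_2014_RECORDS = 1_758_229_317_769
YTD_2014_RECORDS = 15_423_470_100_013

# Assumption: September has 30 days of continuous collection.
SECONDS_IN_SEPTEMBER = 30 * 24 * 60 * 60

# Assumption: "YTD" means January through September, i.e. nine months.
MONTHS_YTD = 9

records_per_second = SEPT_2014_RECORDS / SECONDS_IN_SEPTEMBER
records_per_month_ytd = YTD_2014_RECORDS / MONTHS_YTD

print(f"Average in September 2014: {records_per_second:,.0f} records per second")
print(f"Average per month, YTD 2014: {records_per_month_ytd:,.0f} records")
```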
© 2015 MapR Technologies 19
Big Data Drivers: Cost Efficiencies
• Gartner, “Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update.”
• Wall Street Journal, “Financial Services Companies See Results from Big Data Push,” Jan. 27, 2014
[Charts: IT Budget vs Data Growth; Storage Cost per terabyte. Axis years: 2013 to 2017. Labels: ENTERPRISE STORAGE, DATABASE WAREHOUSE. Values shown: $9,000, $40,000, <$1,000]
© 2015 MapR Technologies 20
More DATA beats complex algorithms
The Unreasonable Effectiveness of Data, published by Google
© 2015 MapR Technologies 21
More Data Allows You to Spot Infrequent Behaviors
[Chart: f plotted over Time (t = years), at t, t+1, t+2, ..., t+n]
With recent data you have limited historical data accumulated
© 2015 MapR Technologies 22
More Data Allows You to Spot Infrequent Behaviors
[Chart: f plotted over Time (t = years), at t, t+1, t+2, ..., t+n]
With big data, you can trace infrequent patterns through time that call out anomalies.
© 2015 MapR Technologies 23
Fast Data: Performance with Low Latency
© 2015 MapR Technologies 24
Typical Hadoop/DBMS Integration
Operational
• Mobile application server
• Web application server
• Operational DBMS (NoSQL or RDBMS)
Batch import/export
Analytics
• Hadoop
• Data exploration (SQL)
• Customer 360 dashboard
• Churn analysis (predictive analytics)
© 2015 MapR Technologies 25
A Better Hadoop/DBMS Integration
Operational
• Mobile application server
• Web application server
MapR Distribution with MapR-DB
MapR-DB
• User profiles and state
• User interactions
• Real-time location data
• Web and mobile session state
• Comments/rankings
Real-Time and Actionable Analytics
• Data exploration (SQL)
• Customer 360 dashboard
• Churn analysis (predictive analytics)
• Product/service optimization and personalization
• Real-time ad targeting
© 2015 MapR Technologies 26
Alternatives Do Big OR Fast, But Not Both
Big
• Limited reliability
• Limited functionality
• Batch (mostly)
• Rewrite existing apps
OR
Fast
• Expensive
• Special purpose
• Coding required for speed
• Limited scale
• Limited functionality
© 2015 MapR Technologies 27
For Real-time It Has To Be Both Big And Fast
Unlimited Scale
Data diversity
Evolving schemas
Affordable
Secure
Reliable
Manageable
Fast ingestion
Batch & streaming
Polyglot
RDBMS & NoSQL
General purpose
Affordable
Big AND Fast
© 2015 MapR Technologies 28
Lesson #2
Data Agility is a Must
Managing it
Processing it
Exploring it
Using it
© 2015 MapR Technologies 29
Data Agility
Traditional Approaches: Business (analysts, developers), “Plumbing” development and data structuring, Infrastructure Issues, Data
Data Agility: Business (analysts, developers), Data
© 2015 MapR Technologies 30
Self-Service Data Exploration
Data Agility with Less IT Required
Single SQL Interface for Structured
and Semi-Structured Data
© 2015 MapR Technologies 31
Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No Schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB-PB of scale
• 1000’s of users
• 1000’s of nodes
Granular security
• Authentication
• Row/column level controls
• De-centralized
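As a rough illustration of "query data in-situ" with ANSI SQL, the sketch below prepares a query against a raw JSON file and wraps it in a request for Drill's REST endpoint (`POST /query.json` on the web port, 8047 by default). The file path `/data/users.json`, the field names `name` and `address.city`, and the host `localhost` are hypothetical; the `dfs` storage plugin is assumed to be enabled. Treat it as a sketch under those assumptions, not as a recommended client.

```python
import json
import urllib.request

# Hypothetical query: reads a JSON file in place, with no schema declared up front.
# The path and the field names are made up for illustration.
EXAMPLE_SQL = (
    "SELECT t.name, t.address.city AS city "
    "FROM dfs.`/data/users.json` t LIMIT 10"
)


def build_drill_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8047"):
    """Send the query to a running Drillbit and return the result rows."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as response:
        return json.load(response)["rows"]


request = build_drill_request(EXAMPLE_SQL)
print(request.get_method(), request.full_url)
```

Calling `run_query(EXAMPLE_SQL)` only works against a live Drill instance; building the request separately keeps the example inspectable without one.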
© 2015 MapR Technologies 32
Drill’s Role in the Enterprise Data Architecture
Raw data
• JSON, CSV, ...
“Optimized” data
• Parquet, …
Centrally-structured data
• Schemas in Hive Metastore
Relational data
• Highly-structured data
Hive, Impala, Spark SQL
Oracle, Teradata
Exploration
(known and unknown questions)
© 2015 MapR Technologies 33
Advice for your Journey
No matter what use case you start with…
you will require real-time & enterprise-grade
“As it happens” is just as much about business
process reinvention as technology
© 2015 MapR Technologies 34 © 2015 MapR Technologies
© 2015 MapR Technologies 35 © 2014 MapR Technologies
Enterprise Data Optimization
© 2015 MapR Technologies 36
Data Warehouse Optimization
Offloading “cold” data to Hadoop
• Restores storage capacity
• One-time offload capitalizes on historic underused data
• Minimal impact to existing data pipelines
• Present new data for exploration
Data Transformation/ETL on Hadoop
• ETL work includes incremental updates
• Restores CPU capacity and storage to DW
© 2015 MapR Technologies 37
Offload Cold Data to Hadoop
Structured
Data ETL Incoming
Data
Data Warehouse
MapR Data Platform
• Data Query-able
• Inexpensive Bulk
• Restores DW
Process:
– One-time Migration
– Standard Apache Tools
Data Access:
– ODBC
– Thrift, REST
– Standard Connectors
Cold Data
Offload
Log Archive
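One way to picture the one-time cold-data migration is as a split of warehouse rows by age. The sketch below is a minimal illustration: the `last_accessed` field and the 365-day cutoff are assumptions chosen for the example, and a real offload would use the warehouse's own export tooling rather than in-memory lists.

```python
from datetime import date, timedelta


def split_hot_cold(rows, today, max_age_days=365):
    """Partition rows into (hot, cold) by last access date.

    Rows not accessed within max_age_days are treated as cold, i.e. as
    candidates for offloading. The threshold is a hypothetical policy.
    """
    cutoff = today - timedelta(days=max_age_days)
    hot, cold = [], []
    for row in rows:
        if row["last_accessed"] < cutoff:
            cold.append(row)
        else:
            hot.append(row)
    return hot, cold


# Hypothetical warehouse rows.
rows = [
    {"id": 1, "last_accessed": date(2015, 5, 1)},
    {"id": 2, "last_accessed": date(2012, 1, 15)},
    {"id": 3, "last_accessed": date(2014, 12, 31)},
]

hot, cold = split_hot_cold(rows, today=date(2015, 6, 1))
print("stay in the warehouse:", [r["id"] for r in hot])
print("offload candidates:", [r["id"] for r in cold])
```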
© 2015 MapR Technologies 38
ETL in Hadoop
Low Latency Data
ETL Incoming
Data
Data Warehouse
MapR
Bulk Data
Restored
CPU and
Disk
“I want to get all this off so I can use it again”
• Restores even more
CPU and Disk
• Improves old DW Response and Speed
• Most Cost Efficient CPU and Storage
© 2015 MapR Technologies 39 © 2014 MapR Technologies
Recommendation Engine
© 2015 MapR Technologies 40
The Scenario
• Users constantly interact with items
– LOTS of users; LOTS of reusable items
– Relationships among items persist in the short
term, while user behavior can change instantly
– The more types of interactions available, the
better
– Interactions / preferences need not be explicit
ratings
• Recommendations are needed in real-time
your
products
here
© 2015 MapR Technologies 41
Building the Use Case
USERS        ITEMS               preferences                VALUE
customers    products            purchases, views, etc.     sell more products
visitors     articles / content  scrolls, clicks, etc.      increase subscriptions, ads, etc.
patients     diseases            family history, diagnosis  predict future illness
cardholders  anything            purchases                  detect anomalous purchases
© 2015 MapR Technologies 42
Recommendation Engine Workflow
User
Histories
Machine
Learning
Index Item
Meta-Data
Data Ingest
MapR Cluster
Ingest Post-
Process
Update
Documents Pre-
Process
Web
Tier Recommendations (based on the
indicators of affiliated items)
New User History (past affiliations)
O F F – L I N E
O N – L I N E
SQL
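The off-line/on-line split in this workflow can be sketched in a few lines. The example below uses plain co-occurrence counts to derive each item's indicators off-line and then matches a new user's history against them on-line. Co-occurrence counting is a simplification chosen for illustration; the slide does not say which scoring method the machine learning step uses, and the user histories here are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_indicators(histories, top_n=2):
    """Off-line step: for each item, keep the items it most often co-occurs with."""
    cooccurrence = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return {
        item: [other for other, _ in counts.most_common(top_n)]
        for item, counts in cooccurrence.items()
    }


def recommend(indicators, new_history, k=3):
    """On-line step: score unseen items by how many indicators the user has touched."""
    seen = set(new_history)
    scores = Counter()
    for item, item_indicators in indicators.items():
        if item not in seen:
            scores[item] = len(seen.intersection(item_indicators))
    return [item for item, score in scores.most_common(k) if score > 0]


# Hypothetical user histories (user -> items interacted with).
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
    "u4": ["a", "d"],
}

indicators = build_indicators(histories)
print(recommend(indicators, ["a"]))
```

In the workflow on the slide, the indicators would be stored alongside item meta-data in an index, so the on-line step becomes a search query rather than a loop.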
© 2015 MapR Technologies 43 © 2014 MapR Technologies
Fraud & Anomaly Detection
© 2015 MapR Technologies 44
Fraud & Anomaly Detection
Fraud detection Personalized
offers
Fraud
investigation
tool
Fraud investigator
Fraud model Recommendations
Clickstream
analysis
Online
transactions
MapR Hadoop Distribution
Analytics
Real-time Operational Applications
Interactive marketer
© 2015 MapR Technologies 45
Fraud & Security Analytics Architecture
MAPR DISTRIBUTION FOR
HADOOP
Sqoop, NFS Drill Hive, Pig
Ingest, ETL, Batch
Processing
Operational Interactive
SQL
MapR-FS
Realtime
Processing
MapR Data Platform
NFS MapR-FS MapR-DB HBase API
Spark Streaming,
Mahout, MLlib,
Spark SQL
Data Sources Analytics
Search
Schema-less
data exploration
Compliance reporting
Ad-hoc integrated
analytics
Operational Apps
Anomaly/Threat
Detection
Fraud Detection SIEM/Splunk
IDS/IPS Logs
Server Logs
Firewall/Proxy
Logs
Database logs
Application
Logs
Privileged user
activity logs
Identity access
Logs
Resource
Access Logs
Miscellaneous
Logs
Security Feeds
© 2015 MapR Technologies 46
Techniques
Anti-Money
Laundering
System
Consumer
Transactions
Data Lake
(Hadoop)
Suspicious
Events
• Latent Dirichlet Allocation
• Trained Classifier
• T-digest Percentiles
• Clustering / Peer Group Analysis
• Markov Models
Analyst
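A percentile cutoff is one of the simpler ways to turn consumer transactions into suspicious events. The sketch below computes exact percentiles with the standard library as a stand-in for t-digest, which approximates the same quantities on streaming data; the 99th-percentile cutoff and the sample amounts are assumptions for illustration only.

```python
import statistics


def percentile_threshold(values, pct=99):
    """Return the pct-th percentile of values (exact, not a t-digest estimate)."""
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[pct - 1]


def flag_suspicious(history, new_amounts, pct=99):
    """Flag new amounts above the chosen percentile of historical amounts."""
    limit = percentile_threshold(history, pct)
    return [amount for amount in new_amounts if amount > limit]


# Hypothetical transaction history: amounts 1 through 100.
history = list(range(1, 101))
flagged = flag_suspicious(history, [50, 99, 100, 500])
print(flagged)
```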
© 2015 MapR Technologies 47
Visualization of Techniques
Clustering
New point
Find closest cluster accumulate
By Host
threshold
Find closest sample
Reservoir
Disappearing periodic
event due potentially to malware
alert
alert
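The clustering path in this picture (new point, find closest cluster, compare against a threshold, alert) can be sketched as follows. The centroids, the Euclidean distance measure, and the threshold value are all assumptions for the example; the slide does not specify them.

```python
import math


def closest_cluster(point, centroids):
    """Return (distance, index) of the centroid nearest to point."""
    return min((math.dist(point, c), i) for i, c in enumerate(centroids))


def should_alert(point, centroids, threshold):
    """Alert when the point is farther than threshold from every known cluster."""
    distance, _ = closest_cluster(point, centroids)
    return distance > threshold


# Hypothetical cluster centers learned from past behavior.
centroids = [(0.0, 0.0), (10.0, 10.0)]

for point in [(1.0, 1.0), (5.0, 5.0)]:
    status = "alert" if should_alert(point, centroids, threshold=3.0) else "ok"
    print(point, status)
```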
© 2015 MapR Technologies 48
What Does Big Data Mean to You?
Customer stats on how MapR customers are using it
63% Cost Reduction
40% Revenue
21% Risk Reduction
© 2015 MapR Technologies 49
What Does Big Data Mean to You?
Customer stats on business impact (ROI) of MapR
65% More than 5x
40% More than 10x
10% More than 20x
5% More than 25x
© 2015 MapR Technologies 50
Top-Ranked NoSQL
Top-Ranked Hadoop Distribution
Top-Ranked SQL-on-Hadoop Solution
© 2015 MapR Technologies 51
Free Hadoop On-Demand Training
$50M In-Kind Contribution to the Hadoop Community
www.mapr.com/training
http://www.indonesiabigdata.com/gallery.html
© 2015 MapR Technologies 52
Q & A
@bensadeghi, @mapr maprtech
MapR
maprtech
mapr-technologies
Ray Sugiarto MAPR Champion Indonesia
0815167 2882