DESCRIPTION
Join us for a webinar on how MongoDB and Hadoop can work together to solve Big Data problems in today's enterprises. We will take an in-depth look at how the two technologies make real business intelligence accessible to end users. After a brief introduction to both technologies, this webinar will dive deep into the MongoDB+Hadoop Connector and how it is applied to enable new business insights. In this webinar you will learn:
• What information problems are a good fit for MongoDB and Hadoop
• How to integrate the two technologies using the MongoDB+Hadoop Connector
• Programming paradigms for tackling common problems
MongoDB & Hadoop: Providing Business Insights
Thomas Boyd
Senior Solutions Architect, MongoDB
2
What is MongoDB?
The leading NoSQL database
Document Database
Open-Source
General Purpose
3
MongoDB Document Model
RDBMS
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
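As a minimal sketch of what the document model means in practice, the employee record above can be held as one nested structure, so the benefits travel with the employee instead of living in a separate table. The Python dict below mirrors the slide's document; the `_id` is kept as a plain string, and the helper name `plans_by_type` is hypothetical, chosen only for illustration.

```python
# The slide's employee document as a plain Python dict.
# The _id is kept as a string here; in MongoDB it would be an ObjectId.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_by_type(doc):
    """Map each benefit type to its plan by reading the embedded array directly."""
    return {benefit["type"]: benefit["plan"] for benefit in doc.get("benefits", [])}


if __name__ == "__main__":
    print(plans_by_type(employee))
```

No join is needed to answer "which plans does this employee have": the embedded `benefits` array is read in place.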
4
What is Hadoop?
“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.”*
*source: hadoop.apache.org
• Large datasets
• Analytics
• Batch
• Map-Reduce
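The Map-Reduce model named on this slide can be sketched in a few lines of plain Python. This is only an in-process illustration of the three steps (map, group by key, reduce), not Hadoop code; the function names and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (department, 1) pair per employee record."""
    for record in records:
        yield record["department"], 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def headcount_by_department(records):
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [
        {"name": "A", "department": "Marketing"},
        {"name": "B", "department": "Sales"},
        {"name": "C", "department": "Marketing"},
    ]
    print(headcount_by_department(sample))
```

In a real cluster the map and reduce functions run on many machines and the grouping step moves data between them; the shape of the computation is the same.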
5
Enterprise IT Stack
[Diagram: layered enterprise IT stack]
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management: RDBMS, EDW, Hadoop, split into Online Data and Offline Data
Infrastructure: OS & Virtualization, Compute, Storage, Network
Management & Monitoring
Security & Auditing
6
Consideration: Online vs. Offline
Online
• Real-time
• Low-latency
• High availability
vs.
Offline
• Long-running
• High-latency
• Availability is lower priority
8
Hadoop is good for…
• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake
9
MongoDB is good for…
• 360 Degree View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub
10
MongoDB and Hadoop: Complementary
Hadoop
• “Data Lake”
• In-depth analytics
MongoDB
• Real-time systems
• Light-weight analytical workloads
11
Use MongoDB+Hadoop Together
E-Commerce
• Products & Inventory
• Real-time recommendations
• Customer profile
• Session management
• Customer clickstream
• Fraud detection
MongoDB Connector for Hadoop
Analysis
• Transaction history
• Clickstream history
• Recommendation model
• Fraud modeling
12
Example – Fraud Detection
Payments
• Online payments processing
MongoDB Connector for Hadoop
Nightly Analysis
• Fraud modeling
Results Cache
3rd Party Data Sources
Fraud Detection
query only
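The split on this slide can be sketched as two functions: a nightly batch step that builds a model from payment history and writes it to a results cache, and an online check that only reads. Everything below is hypothetical (the field names, the "three times the account's average" rule, and the in-memory dict standing in for the results cache); it illustrates the data flow, not an actual fraud model.

```python
from collections import defaultdict


def nightly_analysis(payment_history):
    """Offline step: compute each account's average payment amount."""
    totals = defaultdict(lambda: [0.0, 0])  # account -> [sum, count]
    for payment in payment_history:
        entry = totals[payment["account"]]
        entry[0] += payment["amount"]
        entry[1] += 1
    return {account: total / count for account, (total, count) in totals.items()}


def is_suspicious(payment, results_cache, factor=3.0):
    """Online step: read-only lookup against the cached nightly results."""
    average = results_cache.get(payment["account"])
    if average is None:
        return False  # no history yet; a real system would need a policy here
    return payment["amount"] > factor * average


if __name__ == "__main__":
    history = [
        {"account": "a", "amount": 10.0},
        {"account": "a", "amount": 30.0},
    ]
    cache = nightly_analysis(history)
    print(is_suspicious({"account": "a", "amount": 100.0}, cache))
```

The online path never writes to the cache, which matches the "query only" labels in the diagram: the expensive modeling runs as a batch job, and the payment flow performs only lookups.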
13
Customer example – Global Travel Firm
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
14
Customer example – MetLife
Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
MongoDB Connector for Hadoop
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
15
Customer example – Criteo
Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
16
What is MongoDB-Hadoop Connector?
• Java Map-Reduce, Stream Map-Reduce, Pig, & Hive access to MongoDB
– MongoDB as input
  • mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
  • mongo.input.uri=mongodb://my-db:27017/db1.collection1
– MongoDB as output
  • mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
  • mongo.output.uri=mongodb://my-db:27017/db1.collection2
– Using MongoDB backup files
  • mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
  • mapred.output.dir=file:///results.bson
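As a rough illustration of how these settings fit together, the sketch below collects the properties for a job that reads from one collection and writes to another, then renders them as `-D key=value` arguments, the generic-option form Hadoop accepts on the command line. The class names use the connector's `com.mongodb.hadoop` package and the output side uses `mongo.output.uri`; the two helper functions are hypothetical conveniences, not part of the connector.

```python
def connector_properties(input_uri, output_uri):
    """Return the connector properties for a MongoDB-in, MongoDB-out job."""
    return {
        "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
        "mongo.input.uri": input_uri,
        "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
        "mongo.output.uri": output_uri,
    }


def as_hadoop_args(properties):
    """Render properties as -D key=value arguments, in insertion order."""
    args = []
    for key, value in properties.items():
        args.extend(["-D", f"{key}={value}"])
    return args


if __name__ == "__main__":
    props = connector_properties(
        "mongodb://my-db:27017/db1.collection1",
        "mongodb://my-db:27017/db1.collection2",
    )
    print(" ".join(as_hadoop_args(props)))
```

Keeping the input and output URIs as separate parameters makes it harder to point both sides of a job at the same collection by accident.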
17
Enhancing MongoDB-Hadoop Connector
• Version 1.1.0, July 2013
– Pig support
– Hive support
– Streaming support
– Read/Write MongoDB backups
– Update writes
– Much more…
• Version 1.2.0, December 2013
– Apache Hadoop 2.2 support
– Multiple collections as M-R source
– Multiple mongos support
– Custom splitting support
– Performance improvements
18
MongoDB Native Analytics
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework
• JavaScript Map-Reduce
• Client-side analytics
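To give a feel for the aggregation framework item, here is a small sketch. `PIPELINE` is written in the form the aggregation framework accepts (a list of stage documents, as a driver such as pymongo would pass to `aggregate`); the function below it is a hypothetical in-memory equivalent, included only so the example runs without a database. The `benefits` field follows the employee document shown earlier in the deck.

```python
from collections import Counter

# Count how many employees hold each benefit type.
PIPELINE = [
    {"$unwind": "$benefits"},
    {"$group": {"_id": "$benefits.type", "count": {"$sum": 1}}},
]


def count_benefit_types(documents):
    """In-memory equivalent of PIPELINE, for illustration only."""
    counts = Counter()
    for doc in documents:
        for benefit in doc.get("benefits", []):  # corresponds to $unwind
            counts[benefit["type"]] += 1         # corresponds to $group with $sum: 1
    return [{"_id": key, "count": value} for key, value in counts.items()]


if __name__ == "__main__":
    docs = [
        {"benefits": [{"type": "Health"}, {"type": "Dental"}]},
        {"benefits": [{"type": "Health"}]},
    ]
    print(count_benefit_types(docs))
```

Workloads of this size are the light-weight analytics the deck assigns to MongoDB itself; the connector comes into play when the analysis outgrows this kind of query.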
19
Resources
White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
http://www.mongodb.com/lp/whitepaper/big-data-nosql
Recorded Webinar Series: Thrive with Big Data
http://www.mongodb.com/lp/big-data-series
Recorded Webinar: What’s New with MongoDB Hadoop Integration
http://www.mongodb.com/presentations/webinar-whats-new-mongodb-hadoop-integration
Documentation: MongoDB Connector for Hadoop
http://docs.mongodb.org/ecosystem/tools/hadoop/
Trouble Tickets
http://jira.mongodb.org (project = Hadoop Integration)
Subscriptions, support, consulting, training
https://www.mongodb.com/products/how-to-buy