© 2014 MapR Technologies
Real Time and Big Data – It’s About Time
Tomer Shiran, VP Product, MapR
Hadoop Summit 2015
Time to Insight
Event Occurs → Gain Insight
Time to Insight = Time to Ingest Data + Time to Iterate
• NFS + Drill
• Kafka + Camus + Drill
• HBase/MapR-DB + Drill
Real-time Data Exploration on newly ingested data via NFS
[Diagram: sources (relational databases, web servers, application servers) write over NFS into the MapR Distribution for Hadoop; a Drillbit runs on each node and serves real-time analytics over ODBC.]
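Because the cluster is mounted over NFS, an application can land data with ordinary file writes, and Drill can query the file as soon as it is written, with no load step. The sketch below shows both halves under stated assumptions. The mount point `/mapr/my.cluster.com` and the file paths are hypothetical: MapR clusters are conventionally mounted at `/mapr/<cluster-name>`. The query uses Drill's `dfs` storage plugin, which addresses the file by its path inside the cluster rather than by the NFS mount path. A temporary directory stands in for the mount so the code runs anywhere.

```python
import json
import os
import tempfile

# Hypothetical cluster name. MapR clusters are conventionally NFS-mounted at
# /mapr/<cluster-name>; check the mount point on a real deployment.
NFS_ROOT = "/mapr/my.cluster.com"


def append_events(mount_root: str, rel_path: str, events: list) -> str:
    """Append events as JSON lines using ordinary file I/O on the (NFS) mount."""
    full_path = os.path.join(mount_root, rel_path.lstrip("/"))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return full_path


def drill_query_for(rel_path: str) -> str:
    """Drill SQL over the same file through the dfs storage plugin.

    Drill addresses the file by its path inside the cluster file system,
    not by the NFS mount path, so the /mapr/<cluster-name> prefix is omitted.
    """
    return f"SELECT * FROM dfs.`/{rel_path.lstrip('/')}` LIMIT 10"


if __name__ == "__main__":
    # A temporary directory stands in for NFS_ROOT so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as fake_mount:
        path = append_events(fake_mount, "/logs/web/2015-06-09.json",
                             [{"user": "u1", "page": "/home"},
                              {"user": "u2", "page": "/cart"}])
        print("wrote", path)
        print(drill_query_for("/logs/web/2015-06-09.json"))
```

On a real cluster you would pass `NFS_ROOT` instead of the temporary directory. The writing application needs no Hadoop client libraries.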
Real-time Data Exploration on newly ingested streams via Kafka and Camus
[Diagram: sources (log files, clickstreams, sensors; blogs, tweets, link data) feed a Kafka cluster; Camus loads the streams into the MapR Distribution for Hadoop cluster, where a Drillbit on each node serves real-time analytics over ODBC.]
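Camus is a batch job. It periodically pulls from Kafka and writes each message into a time-partitioned directory. Drill can then query a whole topic directory and use the subdirectory names as columns. The sketch below rests on two assumptions:

* It assumes a Camus-style `<topic>/hourly/YYYY/MM/DD/HH` layout, in UTC.
* The base directory `/camus` and the topic name are hypothetical.

Drill exposes directory levels beneath the queried path as the implicit columns `dir0`, `dir1`, and so on. Filtering on them can let Drill skip the hours it does not need.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_partition(topic: str, ts_millis: int) -> str:
    """Directory for one message, assuming a Camus-style hourly layout in UTC."""
    t = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return f"{topic}/hourly/{t:%Y/%m/%d/%H}"


def batch_by_partition(topic: str, messages) -> dict:
    """Group (timestamp_ms, payload) pairs by target directory, as one batch run would."""
    batches = defaultdict(list)
    for ts_millis, payload in messages:
        batches[hourly_partition(topic, ts_millis)].append(payload)
    return dict(batches)


def drill_hour_query(base_dir: str, topic: str, year: int, month: int, day: int, hour: int) -> str:
    """Count one hour of messages; dir0..dir3 are the YYYY/MM/DD/HH subdirectory names."""
    return (f"SELECT COUNT(*) FROM dfs.`{base_dir}/{topic}/hourly` "
            f"WHERE dir0 = '{year:04d}' AND dir1 = '{month:02d}' "
            f"AND dir2 = '{day:02d}' AND dir3 = '{hour:02d}'")
```

The directory values are compared as strings because they come from directory names, not from the data.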
Real-time Data Exploration on Operational Data stored in HBase/MapR-DB
[Diagram: application servers write operational data to HBase/MapR-DB on the MapR Distribution for Hadoop; a Drillbit co-located with HBase on each node serves real-time analytics over ODBC.]
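When Drill reads an HBase table, it presents the row key as a `row_key` column and each column family as a map of qualifiers to raw bytes. Queries therefore usually decode values with `CONVERT_FROM(..., 'UTF8')`. The sketch below makes three assumptions:

* The storage plugin is registered as `hbase`.
* The table, family and qualifier names are hypothetical.
* The same query shape applies to MapR-DB tables exposed through Drill, though the plugin name and table path may differ there.

`decode_rows` is only a local stand-in for what such a query returns.

```python
def drill_hbase_query(table: str, columns, limit: int = 10) -> str:
    """Build a Drill query over an HBase table, decoding row key and cells as UTF-8.

    `columns` is a list of (family, qualifier) pairs.
    """
    select = ["CONVERT_FROM(t.row_key, 'UTF8') AS row_key"]
    for family, qualifier in columns:
        select.append(f"CONVERT_FROM(t.`{family}`.`{qualifier}`, 'UTF8') AS `{qualifier}`")
    return f"SELECT {', '.join(select)} FROM hbase.`{table}` t LIMIT {limit}"


def decode_rows(table_rows: dict, columns) -> list:
    """Local emulation: table_rows maps row key bytes -> {family: {qualifier: bytes}}."""
    out = []
    for row_key, families in table_rows.items():
        row = {"row_key": row_key.decode("utf-8")}
        for family, qualifier in columns:
            cell = families.get(family, {}).get(qualifier)
            # Missing cells come back as None, like a NULL in the query result.
            row[qualifier] = cell.decode("utf-8") if cell is not None else None
        out.append(row)
    return out
```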
Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB to PB scale
• 1000s of users
• 1000s of nodes
Granular Security
• Authentication
• Row/column level controls
• De-centralized
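BI tools typically reach Drill through ODBC or JDBC. Scripts can also submit ANSI SQL over HTTP. A Drillbit's web server (port 8047 by default) accepts a POST to `/query.json` whose JSON body carries `queryType` and `query`, and it answers with the result rows.

The sketch below builds that request with only the Python standard library. The host name `drill-node1` is hypothetical, and `run_drill_query` needs a reachable Drillbit, so only the request construction runs offline. ``cp.`employee.json` `` is the sample dataset that ships on Drill's classpath.

```python
import json
import urllib.request


def build_drill_request(host: str, sql: str, port: int = 8047) -> urllib.request.Request:
    """Prepare (but do not send) a POST to a Drillbit's REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(host: str, sql: str, timeout: float = 30.0) -> list:
    """Send the query and return the result rows; requires a running Drillbit."""
    with urllib.request.urlopen(build_drill_request(host, sql), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("rows", [])
```

A typical call would be `run_drill_query("drill-node1", "SELECT * FROM cp.`employee.json` LIMIT 3")`.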
Omni-SQL (“SQL-on-Everything”)
Drill: Omni-SQL
“Whereas the other engines we're discussing here create a relational database environment on top of Hadoop, Drill instead enables a SQL language interface to data in numerous formats, without requiring a formal schema to be declared. This enables plug-and-play discovery over a huge universe of data without prerequisites and preparation. So while Drill uses SQL, and can connect to Hadoop, calling it SQL-on-Hadoop kind of misses the point. A better name might be SQL-on-Everything, with very low setup requirements.”
Andrew Brust
JSON Model, Columnar Speed
[Figure: data formats placed along two axes, schema-less to fixed schema and flat to complex: JSON, BSON, Mongo; HBase, NoSQL; Parquet, Avro; CSV, TSV]
RDBMS/SQL-on-Hadoop table:
Name      Gender  Age
Michael   M       6
Jennifer  F       3
Apache Drill table:
{ "name": { "first": "Michael", "last": "Smith" }, "hobbies": ["ski", "soccer"], "district": "Los Altos" }
{ "name": { "first": "Jennifer", "last": "Gates" }, "hobbies": ["sing"], "preschool": "CCLC" }
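The Drill-side records hold an array (`hobbies`) that has no place in the flat table. Drill's `FLATTEN` function bridges the two. It emits one row per array element, as in `SELECT t.name.first, FLATTEN(t.hobbies) AS hobby FROM ... t`.

The Python below only emulates that behavior on the two sample records to show the resulting rows. It is not Drill's implementation. This sketch also drops records whose array is empty or missing.

```python
import copy

PEOPLE = [
    {"name": {"first": "Michael", "last": "Smith"}, "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"}, "hobbies": ["sing"], "preschool": "CCLC"},
]


def flatten(records, field):
    """Emulate FLATTEN: one output row per element of each record's `field` array."""
    rows = []
    for rec in records:
        for item in rec.get(field, []):
            row = copy.deepcopy(rec)  # leave the input records untouched
            row[field] = item
            rows.append(row)
    return rows


def first_name_and_hobby(records):
    """Rough equivalent of: SELECT t.name.first, FLATTEN(t.hobbies) AS hobby FROM ... t"""
    return [(r["name"]["first"], r["hobbies"]) for r in flatten(records, "hobbies")]
```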
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
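"Schema on the fly" means the schema is derived from self-describing records while they are read, rather than declared up front. As a result, a field can appear in only some records, or change type between records.

The sketch below is a simplified illustration of that idea, not Drill's algorithm. It merges the observed type of every field, with nested maps expanded into dotted paths, as records stream in. The type names are coarse labels loosely modeled on SQL types.

```python
def json_type(value) -> str:
    """Map a value decoded from JSON to a coarse SQL-style type label."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, list):
        return "LIST"
    if isinstance(value, dict):
        return "MAP"
    raise TypeError(f"unsupported value: {value!r}")


def discover_schema(records) -> dict:
    """Return {field path: set of observed types}, updated record by record."""
    schema = {}

    def visit(prefix, obj):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            schema.setdefault(path, set()).add(json_type(value))
            if isinstance(value, dict):
                visit(path, value)

    for rec in records:
        visit("", rec)
    return schema
```

A field with more than one observed type is an evolving schema. A field seen in only some records reflects the schema-less case.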
Drill’s Role in the Enterprise Data Architecture
Raw data
• JSON, CSV, ...
“Optimized” data
• Parquet, …
Centrally-structured data
• Schemas in Hive Metastore
Relational data
• Highly-structured data
Hive, Impala, Spark SQL
Oracle, Teradata
Exploration (known and unknown questions)
Retail Analytics
Data Warehouse Augmentation with Drill
Augment existing expensive SQL analytics platform with Hadoop and Drill
OBJECTIVES
• Mine purchase data and compare consumer shopping habits
• Internal SQL specialists require instant access to data at all times
• Currently process tens of TB on a traditional MPP DB
CHALLENGES
• Want to preserve instant access to data but at a lower price point
• Need a system that is reliable, does not lose data and is fast
• Must be able to leverage the SQL skill sets in the company
SOLUTION
• Apache Drill allows interactive analysis on large datasets, with MapR as the underlying platform that meets scale, reliability and data protection needs
• SQL users do not have to learn Pig, HiveQL or any other language and can continue to use Tableau on top of Drill
BUSINESS IMPACT (POTENTIAL)
• Hadoop and Drill dramatically reduce the price point to about $1,000 / TB
• MapR platform with Drill delivers reliability and performance for the end users
• Leverage existing BI and SQL skill sets on Hadoop without retraining
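The price figure can be turned into a rough budget for data volumes in the "tens of TB" range mentioned under the objectives. The sketch below multiplies volume by the slide's approximate $1,000 / TB. The volumes in the loop are hypothetical. No figure is given for the existing MPP platform, so no comparison is computed.

```python
HADOOP_DRILL_USD_PER_TB = 1_000  # approximate price point quoted on the slide


def estimated_cost_usd(terabytes: float, usd_per_tb: float = HADOOP_DRILL_USD_PER_TB) -> float:
    """Linear estimate: volume times the quoted per-TB price."""
    return terabytes * usd_per_tb


if __name__ == "__main__":
    # Hypothetical volumes within the "tens of TB" range.
    for tb in (10, 30, 90):
        print(f"{tb:>3} TB -> ${estimated_cost_usd(tb):,.0f}")
```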
Real-time processing leading to instant action
[Diagram: on the MapR Distribution for Hadoop, application servers publish events to Kafka for stream processing, with HBase and the file system on each node; batch processing uses Spark and Drill; both the streaming path and the batch path lead to ACTION.]
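The streaming half of this architecture can be reduced to one loop. Each event updates a running aggregate in a low-latency store, and an action fires the moment a condition is met, rather than after a batch job. In the sketch below, all of the following are hypothetical stand-ins:

* A Python iterable plays the role of the Kafka topic.
* A `Counter` plays the role of the HBase table.
* `threshold` and `act` illustrate the "instant action" step.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def process_stream(events: Iterable[Tuple[str, str]],
                   store: Counter,
                   threshold: int,
                   act: Callable[[str, int], None]) -> None:
    """Update per-key counts as each event arrives; act once when a key reaches the threshold."""
    for key, _payload in events:
        store[key] += 1
        if store[key] == threshold:  # fire exactly once, at the crossing
            act(key, store[key])
```

Example: with `threshold=3`, the third event for the same account triggers `act` immediately. Later events for that key only keep updating the count.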
Stream Processing – Global MSSP
Sources from globally dispersed datacenters: sensor data, firewall logs, intrusion protection, system logs, security appliance logs
MapR M7 Distribution for Hadoop: 1 million events/sec., over 100 channels
• Spark Streaming for known threats & aggregation
• Batch processing: Mahout, MLlib
• SQL queries and reporting: Drill, Impala
• Graph processing: GraphX & Titan
New threat footprint within 2-5 min
Closed-loop operations
Benefits: unified platform for analytics, low operational costs, faster response times, better algorithms
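"Known threats & aggregation" amounts to matching each event against known signatures and counting the matches per time window and channel. At the quoted 1 million events/sec., even a 60-second window covers up to 60 million events. That is why the counting is done incrementally over the stream rather than by re-scanning stored data.

The plain-Python sketch below shows tumbling-window counting. It does not use the Spark Streaming API. Three things in it are hypothetical: the record shape `(epoch_seconds, channel, signature)`, the signature names, and the 60-second window.

```python
from collections import Counter

EVENTS_PER_SEC = 1_000_000  # throughput quoted on the slide
WINDOW_SECONDS = 60         # hypothetical window length
EVENTS_PER_WINDOW = EVENTS_PER_SEC * WINDOW_SECONDS  # upper bound per window


def window_counts(events, known_signatures, window_seconds: int = WINDOW_SECONDS) -> Counter:
    """Count known-threat matches per (window_start, channel, signature) tumbling window."""
    counts = Counter()
    for ts, channel, signature in events:
        if signature in known_signatures:
            window_start = ts - ts % window_seconds  # align to the start of the window
            counts[(window_start, channel, signature)] += 1
    return counts
```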
Operations + Analytics = Real-time, Personalized Services
[Diagram: on the MapR Distribution for Hadoop, real-time operational applications (online transactions, fraud detection, personalized offers) and analytics (clickstream analysis, fraud investigation tool) share a fraud model and a recommendations table; users include a fraud investigator and an interactive marketer.]
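The point of this slide is that analytics outputs are consulted inline by the operational application. Examples are a fraud model and a recommendations table. The sketch below is hypothetical throughout. A hand-written rule stands in for a trained fraud model, and small dictionaries stand in for tables the analytics jobs would maintain on the cluster. It shows the online path: score the transaction, then either route it to investigation or return personalized offers.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    customer_id: str
    amount: float
    country: str


# Hypothetical outputs of offline analytics, kept for low-latency lookup.
RECOMMENDATIONS = {"c1": ["running shoes", "water bottle"]}
HOME_COUNTRY = {"c1": "US"}


def fraud_score(tx: Transaction) -> float:
    """Toy rule standing in for a trained fraud model."""
    score = 0.0
    if tx.amount > 1_000:
        score += 0.5
    if HOME_COUNTRY.get(tx.customer_id) != tx.country:
        score += 0.4
    return score


def handle_transaction(tx: Transaction, threshold: float = 0.8) -> dict:
    """Online path: flag for the fraud investigator, or approve and attach offers."""
    if fraud_score(tx) >= threshold:
        return {"status": "review", "offers": []}
    return {"status": "approved", "offers": RECOMMENDATIONS.get(tx.customer_id, [])}
```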
Q & A
Engage with us!
@mapr
maprtech
MapR
maprtech
mapr-technologies