Real Time and Big Data – It’s About Time

Tomer Shiran, VP Product, MapR
Hadoop Summit 2015
© 2014 MapR Technologies

What is Real Time

[Diagram: timeline from Event Occurs through Gain Insight to Take Action, with the span labeled Time Elapsed]

Time to Insight

[Diagram: timeline from Event Occurs to Gain Insight]

Time to Insight = Time to Ingest Data + Time to Iterate

Pipelines shown:

• NFS + Drill
• Kafka + Camus + Drill
• HBase/MapR-DB + Drill

Real-time Data Exploration on newly ingested data via NFS

[Diagram: sources (relational, web server, application server) write into the MapR Distribution for Hadoop over NFS; a drillbit runs on each node; real-time analytics tools connect via ODBC]

Real-time Data Exploration on newly ingested streams via Kafka and Camus

[Diagram: sources (log files, clickstreams, sensors, blogs, tweets, link data) feed a Kafka cluster; Camus loads the data into the MapR Distribution for Hadoop, where a drillbit runs on each node; real-time analytics tools connect via ODBC]

Real-time Data Exploration on Operational Data stored in HBase/MapR-DB

[Diagram: an application server works against HBase on the MapR Distribution for Hadoop; each node runs HBase alongside a drillbit; real-time analytics tools connect via ODBC]

Apache Drill Brings Flexibility & Performance

Access to any data type, any data source

• Relational
• Nested data
• Schema-less

Rapid time to insights

• Query data in-situ
• No schemas required
• Easy to get started

Integration with existing tools

• ANSI SQL
• BI tool integration

Scale in all dimensions

• TB-PB of scale
• 1000’s of users
• 1000’s of nodes

Granular Security

• Authentication
• Row/column level controls
• De-centralized

Omni-SQL (“SQL-on-Everything”)

Drill: Omni-SQL

“Whereas the other engines we're discussing here create a relational database environment on top of Hadoop, Drill instead enables a SQL language interface to data in numerous formats, without requiring a formal schema to be declared. This enables plug-and-play discovery over a huge universe of data without prerequisites and preparation. So while Drill uses SQL, and can connect to Hadoop, calling it SQL-on-Hadoop kind of misses the point. A better name might be SQL-on-Everything, with very low setup requirements.”

Andrew Brust,

JSON Model, Columnar Speed

[Diagram: data formats placed on two axes, flat vs. complex and fixed schema vs. schema-less. Formats shown: JSON, BSON, Mongo; HBase, NoSQL; Parquet, Avro; CSV, TSV]

RDBMS/SQL-on-Hadoop table

Name      Gender  Age
Michael   M       6
Jennifer  F       3

Apache Drill table

{ name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
{ name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
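
The difference between the flat table and the nested records can be sketched in plain Python. This only illustrates the data model and is not Drill itself: the `get_path` and `flatten` helpers are hypothetical names, and the two records are the ones from the slide with quoting added so they parse. The `flatten` helper is loosely analogous to unnesting an array into one row per element.

```python
# The two records from the slide, with quotes added so they are valid Python dicts.
records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def get_path(record, path):
    """Follow a dotted path such as 'name.first'; return None if a step is missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def flatten(rows, list_field):
    """Emit one output row per element of list_field (hypothetical helper)."""
    out = []
    for row in rows:
        for item in row.get(list_field, []):
            copy = dict(row)
            copy[list_field] = item
            out.append(copy)
    return out


# Nested field access works on every record; a missing field simply yields None.
first_names = [get_path(r, "name.first") for r in records]
districts = [get_path(r, "district") for r in records]

# One (first name, hobby) pair per hobby.
hobby_rows = [(get_path(r, "name.first"), r["hobbies"])
              for r in flatten(records, "hobbies")]
```

Note that the two records do not share the same fields (`district` vs. `preschool`), which a fixed three-column table could not represent without altering its schema.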

Drill Supports Schema Discovery On-The-Fly

Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)

• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)

Schema Discovered On-The-Fly (SCHEMA ON THE FLY)

• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
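
As a rough illustration of what discovering a schema from self-describing data means, the sketch below derives field names and types by reading the records themselves instead of consulting a declared schema. It is plain Python, not Drill's mechanism; `discover_schema` is a hypothetical helper and the sample records are illustrative.

```python
def discover_schema(records):
    """Map each field name to the set of type names observed for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Illustrative batches: the second one introduces a field the first never had.
batch_1 = [{"name": "Michael", "age": 6}]
batch_2 = [{"name": "Jennifer", "age": 3, "preschool": "CCLC"}]

schema_before = discover_schema(batch_1)
schema_after = discover_schema(batch_1 + batch_2)
```

The new `preschool` field shows up in `schema_after` without any schema change being declared in advance, which is the evolving-schema case in the list above.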

Drill’s Role in the Enterprise Data Architecture

Raw data

• JSON, CSV, ...

“Optimized” data

• Parquet, …

Centrally-structured data

• Schemas in Hive Metastore

Relational data

• Highly-structured data

Hive, Impala, Spark SQL

Oracle, Teradata

Exploration (known and unknown questions)

Data Warehouse Augmentation with Drill

Augment existing expensive SQL analytics platform with Hadoop and Drill

Retail Analytics

OBJECTIVES

• Mine purchase data and compare consumer shopping habits
• Require internal SQL specialists to gain instant access to data at all times
• Currently process tens of TB on traditional MPP DB

CHALLENGES

• Want to preserve instant access to data but at a lower price point
• Need a system that is reliable, does not lose data and is fast
• Must be able to leverage the SQL skill sets in the company

SOLUTION

• Apache Drill allows interactive analysis on large datasets, with MapR as the underlying platform that meets scale, reliability and data protection needs
• SQL users did not have to learn Pig, HiveQL or any other language and continued to use Tableau on top of Drill

Business Impact (Potential)

• Hadoop and Drill dramatically reduce the price point to about $1,000 / TB
• MapR platform with Drill delivers reliability and performance for the end users
• Leverage existing BI and SQL skill sets on Hadoop without retraining

Real-time Action

[Diagram: timeline from Event Occurs directly to Take Action]

Real-time processing leading to instant action

[Diagram: application servers and Kafka feed stream processing, which triggers ACTION; on the MapR Distribution for Hadoop, each node holds HBase and the file system, with batch engines (Spark, Drill) running over them]
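
The event-to-action path on this slide can be sketched in plain Python: each event is stored for later batch analysis, and an action fires immediately when a rule matches. Everything here is an assumption for illustration: `process_stream`, the `amount` field and the threshold of 1000 are hypothetical, a list stands in for the HBase/file system store, and no Kafka or stream-processing API is used.

```python
def process_stream(events, is_actionable, store):
    """Persist every event, and return the actions triggered as events arrive."""
    actions = []
    for event in events:
        store.append(event)  # kept for later batch analysis
        if is_actionable(event):
            actions.append(("ACTION", event["id"]))  # act without waiting for batch
    return actions


store = []  # stand-in for the persistent store
events = [
    {"id": 1, "amount": 20},
    {"id": 2, "amount": 5000},
    {"id": 3, "amount": 35},
]

# Hypothetical rule: act on any event whose amount exceeds 1000.
actions = process_stream(events, lambda e: e["amount"] > 1000, store)
```

The point of the design is that the same event feeds both paths: the immediate decision and the stored history that batch jobs read later.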

Stream Processing – Global MSSP

Sources (Globally Dispersed Datacenters)

• Sensor data
• Firewall logs
• Intrusion protection
• System logs
• Security appliance logs

MapR M7 Distribution for Hadoop: 1 million events/sec, over 100 channels

• Spark Streaming for known threats & aggregation
• SQL Queries and Reporting: Drill, Impala
• Batch Processing: Mahout, MLlib
• Graph Processing: GraphX & Titan

New Threat Footprint within 2-5 min

Closed-Loop Operations

Benefits:

• Unified platform for analytics
• Low operational costs
• Faster response times
• Better algorithms
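
The "known threats & aggregation" step can be illustrated with a tumbling-window count in plain Python. This is a sketch of the idea only and does not use the Spark Streaming API; the signature names, the `aggregate_windows` helper and the sample events are all hypothetical.

```python
from collections import Counter

# Hypothetical signature names standing in for a known-threat list.
KNOWN_THREATS = {"port-scan", "sql-injection"}


def aggregate_windows(events, window_seconds):
    """Count known-threat events per (window start, signature) in tumbling windows."""
    counts = Counter()
    for timestamp, signature in events:
        if signature in KNOWN_THREATS:
            window_start = timestamp - (timestamp % window_seconds)
            counts[(window_start, signature)] += 1
    return counts


# (timestamp in seconds, signature) pairs; "login" is not a known threat.
events = [
    (0, "port-scan"),
    (30, "login"),
    (59, "port-scan"),
    (61, "sql-injection"),
    (119, "port-scan"),
]

counts = aggregate_windows(events, window_seconds=60)
```

Events that match no known signature are skipped here; on the slide, the batch, SQL and graph paths are what cover questions beyond known threats.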

Operations + Analytics = Real-time, Personalized Services

[Diagram labels: MapR Distribution for Hadoop holding a fraud model and a recommendations table; Real-time Operational Applications (online transactions, fraud detection, personalized offers); Analytics (clickstream analysis, fraud investigation tool); users: fraud investigator, interactive marketer]

Q & A

@mapr maprtech

Engage with us!

MapR

maprtech

mapr-technologies