
Big Data = Big Decisions



Presented on April 17th for InnoTech Dallas.


Bob Zurek | SVP Products | Epsilon | www.epsilon.com

BIG DATA APPROACHING

Consider the following:
• A new model for data
• Accessible over TCP/IP and from a variety of languages
• Initially difficult to understand
• Capable of processing thousands of ops/sec
• Very different from the old model
• Threatening, as much was invested in the old model
• Changing course seems ridiculous

Source: Eben Hewitt

What are we talking about?

Source: IBM

IBM IMS

“IMS is IBM's premier transaction and hierarchical database management system, virtually unsurpassed in database and transaction processing availability and speed” – IBM 2013

“Mission-critical processing that requires unparalleled performance is best served by a hierarchical model. Analytics and business intelligence are best served by a relational model. Most Fortune 100 companies use both.”

A New Model Is Invented

A Disruptive Model

A Threatening Model

A Competitive Model

Data evolution

Source: Eben Hewitt

A HUGE industry success

The relational model & SQL

So now what?

We have a problem

confusion

innovation

Sound familiar?

complexity

disruption

a new model

fierce competition

Source: McKinsey

Big data – a growing torrent

$600 to buy a disk drive that can store all of the world’s music

5 billion mobile phones in use in 2010

30 billion pieces of content shared on Facebook every month

40% projected growth in global data generated per year vs. 5% growth in global IT spending

235 terabytes of data collected by the U.S. Library of Congress by April 2011

15 out of 17 sectors in the United States have more data stored per company than the U.S. Library of Congress
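The gap between those two growth rates compounds. Assuming, purely for illustration, that both rates held constant for five years:

$$
\frac{(1.40)^5}{(1.05)^5} \approx \frac{5.378}{1.276} \approx 4.2
$$

Data volume would grow about 5.4-fold while IT spending grew about 1.3-fold, so the amount of data each IT dollar must handle would rise roughly 4.2 times.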

What is big data, exactly?

Industry buzz

Big data confusion?

Source: IBM

What do business executives think “big data” is?

• A greater scope of information: 18%
• New kinds of data and analysis: 16%
• Real-time information: 15%
• Data influx from new technologies: 13%
• Non-traditional forms of media: 13%
• Large volumes of data: 10%
• The latest buzzword: 8%
• Social media data: 7%

Source: McKinsey

Big data is…

Large pools of data that can be captured, communicated, aggregated, stored, and analyzed

Source: TDWI

Another way of looking at it

Is it time to look for an alternative?

It’s not that simple, is it?

How have we been solving it (historically)?

• Vertical scaling = throw hardware at it
• Optimize the application = SQL, indexes, access
• Employ caching layers = memcached, Coherence
• Denormalization = reduce joins
• Sharding/shared nothing = split the data up
• Innovation = columnar storage
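To make "sharding = split the data up" concrete, here is a minimal sketch of hash-based sharding, assuming a fixed number of shards and string keys. The function names and sample customers are hypothetical; this shows the general technique, not any particular database's implementation.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    # SHA-256 is used only to spread keys evenly, not for security.
    # Python's built-in hash() is salted per process, so it is not stable.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def split_into_shards(records, num_shards):
    """Group (key, value) records into independent per-shard lists."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards

customers = [("cust-1", "Alice"), ("cust-2", "Bob"), ("cust-3", "Carol"), ("cust-4", "Dan")]
shards = split_into_shards(customers, num_shards=2)
for index, shard in enumerate(shards):
    print(index, shard)
```

Each shard holds a disjoint subset of the keys and can live on its own server ("shared nothing"). One known drawback of plain modulo hashing is that changing `num_shards` remaps most keys; consistent hashing is the usual remedy.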

What’s driving change and innovation?


Doug Cutting = Nutch

Google = GFS and MapReduce (GMR)

A search engine project at Yahoo

Big data innovation incubated
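Hadoop's processing model follows the map/shuffle/reduce pattern Google described for its MapReduce ("GMR"). The sketch below runs that pattern in a single process on a word count, the customary first example. It uses no Hadoop API; the function names are hypothetical stand-ins for what the framework distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(word_count(["big data big decisions", "Big data"]))
# {'big': 3, 'data': 2, 'decisions': 1}
```

Because each mapper sees only its own input split and each reducer sees only one key's values, both phases can run on many machines at once; that independence is what makes the model suited to batch processing at scale.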

“Hadoop is an amazing technology stack. We now depend on it to run eBay.”

Bob Page, Vice President of Analytics, eBay

Source: http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/

eBay erected a Hadoop cluster spanning 530 servers – now five times the size!

It can get complex and confusing

“It replaced our need for ETL”

“It is great for batch processing in parallel”

“A beautiful platform for all of [our] problems”

What it’s not good for

• High volume transactional data

• Structured data with low latency

“Note that Hadoop is not an Extract-Transform-Load (ETL) tool. It is a platform that supports running ETL processes in parallel. The data integration vendors do not compete with Hadoop; rather, Hadoop is another channel for use of their data transformation modules.”

Teradata/Cloudera Presentation
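The distinction in that quote is that Hadoop supplies the parallel execution while the ETL logic itself comes from elsewhere. A minimal sketch of that split, assuming comma-separated event rows: the extract step partitions the input, an ordinary transform function runs on each partition independently, and the results are loaded into a target. A thread pool stands in for cluster nodes here, and all names and rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

RAW_ROWS = [
    "2013-04-17,click,ad-42",
    "2013-04-17,view,ad-7",
    "bad row",
    "2013-04-18,click,ad-7",
]

def extract(rows, num_partitions):
    """Split raw input into partitions, like input splits handed to workers."""
    return [rows[i::num_partitions] for i in range(num_partitions)]

def transform(partition):
    """Parse and clean one partition; drop rows that do not have 3 fields."""
    cleaned = []
    for row in partition:
        fields = row.split(",")
        if len(fields) != 3:
            continue
        date, event, ad_id = fields
        cleaned.append({"date": date, "event": event, "ad_id": ad_id})
    return cleaned

def load(target, partitions):
    """Append each transformed partition to the target store."""
    for partition in partitions:
        target.extend(partition)

warehouse = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # Partitions share no state, so the same transform can run on each in parallel.
    transformed = list(pool.map(transform, extract(RAW_ROWS, num_partitions=2)))
load(warehouse, transformed)
print(len(warehouse))  # 3
```

The transform function knows nothing about parallelism; swapping the executor for a cluster scheduler would not change it, which is the sense in which ETL vendors can treat Hadoop as "another channel."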

What it’s really good for

• Index building

• Pattern recognition

• Sentiment analysis

• Machine generated data

• Log processing

• Web scale = Google, Twitter, YouTube
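Index building and log processing fit the batch model well because both reduce to "emit pairs, then group by key." A minimal sketch of building an inverted index over log lines, assuming whitespace tokenization and integer document IDs (both hypothetical simplifications):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    # "Map" step: emit a (term, doc_id) pair for every term occurrence.
    pairs = [(term.lower(), doc_id) for doc_id, text in docs.items() for term in text.split()]
    # "Reduce" step: collect the distinct doc IDs for each term.
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

logs = {
    1: "ERROR disk full on node7",
    2: "INFO job finished on node3",
    3: "ERROR timeout on node3",
}
index = build_inverted_index(logs)
print(index["error"])  # [1, 3]
print(index["node3"])  # [2, 3]
```

Once built, the index answers "which log lines mention node3?" with a single lookup instead of a scan.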

Use Cases

Online Travel Reservations

Mobile Data

E-Commerce

Energy Discovery

Energy Savings

Infrastructure Management

Image Processing

Fraud Detection

IT Security

HealthCare

Analyze machine generated data

Semantic analysis for relevance

Suggest ways customers save money

Spot fraud anomalies

Process mobile data

Large marketplaces

Sort and process seismic data

Detecting patterns in satellite imagery

Travel booking

Collecting device logs

Source: Teradata/Cloudera


Many shades of grey and lots of great innovations

Relational is still in play

Some innovations worth a look

Dynamically Scaling OLTP = “No Need To Shard”

The NoSQL generation

• Document storage model
• Allows MTV to store hierarchical data
• Flexible schema to model structure/data by brand
• Needed the ability to query nested content
• No need for shared disk storage
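The MTV requirements (a different shape per brand, plus queries into nested content) can be sketched with plain Python dictionaries. The helper below loosely mirrors the dotted-path matching that document stores offer, fanning out over arrays; it is an illustration under that assumption, not any product's query engine, and the brands and fields are hypothetical.

```python
# Hypothetical documents: each brand's record has its own shape.
documents = [
    {"brand": "BrandA", "show": {"title": "Show 1",
                                 "episodes": [{"number": 1, "tags": ["music"]}]}},
    {"brand": "BrandB", "show": {"title": "Show 2", "cast": ["Host"],
                                 "episodes": [{"number": 1, "tags": ["comedy"]},
                                              {"number": 2, "tags": ["music", "live"]}]}},
    {"brand": "BrandC", "article": {"headline": "News"}},  # no "show" field at all
]

def get_path(doc, path):
    """Return every value reachable at a dotted path, fanning out over lists."""
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        values = found
    return values

def find(docs, path, wanted):
    """Return docs where a value at `path` equals `wanted` or is a list containing it."""
    results = []
    for doc in docs:
        for value in get_path(doc, path):
            if value == wanted or (isinstance(value, list) and wanted in value):
                results.append(doc)
                break
    return results

print([d["brand"] for d in find(documents, "show.episodes.tags", "music")])
# ['BrandA', 'BrandB']
```

Documents that lack the path (BrandC) simply do not match; no schema change was needed to store them alongside the others.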

• Released by the NSA as open source
• Apache Accumulo
• Based on Google Bigtable
• Built on top of Hadoop
• Fine-grained access control
• Cell-level security
• Server-side programming
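Accumulo's cell-level security attaches a boolean visibility label such as `HR&(MANAGER|AUDIT)` to each cell and returns the cell only to users whose authorizations satisfy it. The sketch below evaluates expressions of that form against a set of authorizations. It is a simplified illustration, not Accumulo's API: as we understand it, Accumulo rejects expressions that mix `&` and `|` without parentheses, whereas this sketch just gives `&` higher precedence.

```python
import re

# Labels may use letters, digits and _ - : . / ; operators are & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")

def tokenize(expr):
    """Split a visibility expression into labels and operators."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def is_visible(expr, auths):
    """True if a user holding the labels in `auths` may read a cell labeled `expr`."""
    if not expr.strip():
        return True  # an empty label means the cell is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so every token is consumed
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in auths  # a bare label is satisfied if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result

print(is_visible("HR&(MANAGER|AUDIT)", {"HR", "AUDIT"}))  # True
print(is_visible("HR&(MANAGER|AUDIT)", {"HR"}))           # False
```

Because the check runs per cell, two users scanning the same row can see different subsets of its columns.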

• Schemaless model = easy to add fields
• Document oriented = JSON format (think objects)
• Built from the ground up to be distributed
• Auto-sharding
• Distributed querying capabilities
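"Distributed querying" over auto-sharded, schemaless documents is often a scatter-gather: send the filter to every shard, run it locally, then merge the partial results. A minimal sketch, assuming in-memory shards and a thread pool in place of network calls (the shard contents and function names are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards of JSON-like documents; fields vary from document to document.
doc_shards = [
    [{"_id": 1, "name": "Alice", "city": "Dallas"},
     {"_id": 2, "name": "Bob"}],  # no "city" field, and no schema change was needed
    [{"_id": 3, "name": "Carol", "city": "Dallas", "twitter": "@carol"},
     {"_id": 4, "name": "Dan", "city": "Austin"}],
]

def query_shard(shard, field, value):
    """Scatter step: filter one shard locally."""
    return [doc for doc in shard if doc.get(field) == value]

def distributed_query(shards, field, value):
    """Run the filter on every shard in parallel, then gather and merge by _id."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, field, value), shards)
        return sorted((doc for part in partials for doc in part), key=lambda d: d["_id"])

print([d["name"] for d in distributed_query(doc_shards, "city", "Dallas")])
# ['Alice', 'Carol']
```

This sketch always broadcasts to every shard; real query routers can instead target only the shards that own the relevant key range when the query includes the shard key.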

Why NoSQL?

NoSQL Use Case

1. Click/Event into Hadoop

2. Data analyzed via MapReduce jobs, generating 100M profiles based on the campaigns running

3. Selected profiles loaded into Couchbase

4. Ad-targeting logic queries Couchbase with sub-second latency to optimize decisions and real-time ad placement
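The four steps split work between a slow batch layer and a fast serving layer. A minimal sketch of that division, assuming simple per-user campaign counts as the "profile": the batch step aggregates events, a filter decides which profiles to load, and the real-time step is a single key lookup. A plain dict stands in for the key-value store (Couchbase, per the source); `min_events`, `house-ad`, and the event data are hypothetical.

```python
from collections import Counter, defaultdict

# Step 1: hypothetical click/event stream landed for batch processing.
events = [
    {"user": "u1", "campaign": "travel"},
    {"user": "u1", "campaign": "travel"},
    {"user": "u1", "campaign": "autos"},
    {"user": "u2", "campaign": "autos"},
]

def build_profiles(events):
    """Step 2 (batch): aggregate events into per-user campaign-interest counts."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["user"]][event["campaign"]] += 1
    return {user: dict(c) for user, c in counts.items()}

def load_selected(profiles, min_events):
    """Step 3: load only profiles with enough activity into the key-value store."""
    return {user: p for user, p in profiles.items() if sum(p.values()) >= min_events}

def choose_ad(store, user, default="house-ad"):
    """Step 4 (real time): one key lookup picks the user's strongest campaign."""
    profile = store.get(user)
    if not profile:
        return default
    return max(profile, key=profile.get)

store = load_selected(build_profiles(events), min_events=2)
print(choose_ad(store, "u1"))  # travel
print(choose_ad(store, "u2"))  # house-ad (profile was not loaded)
```

All the expensive aggregation happens ahead of time, so the request path does no scanning; that is what makes sub-second ad decisions feasible.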

Source: Couchbase

Hadoop Augmentation
• Side-by-side will be commonplace
• ETL solutions support Hadoop
• Relational databases
  • Provide ETL interfaces to Hadoop
  • Execute map/reduce jobs inside the DBMS
• NoSQL supports ETL

Example Hybrid DBMS Systems: Oracle Endeca Server
• Hybrid search/analytic database
• Supports structured, unstructured, and semi-structured data
• No schema required; records stacked
• Columnar
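"Columnar" means values are stored column by column rather than row by row, so an aggregate reads only the columns it needs. A minimal sketch of the layout change, assuming small in-memory records; it illustrates the general idea only, not how Endeca Server is implemented, and the sales data is hypothetical.

```python
# Hypothetical sales records in row-oriented form.
rows = [
    {"region": "South", "product": "A", "revenue": 120},
    {"region": "North", "product": "B", "revenue": 80},
    {"region": "South", "product": "B", "revenue": 50},
]

def to_columns(rows):
    """Pivot records into one list per column, using the first record's fields."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row.get(name))  # missing fields become None
    return columns

columns = to_columns(rows)

# Each aggregate touches only the columns it needs, not every field of every row.
total_revenue = sum(columns["revenue"])
south_revenue = sum(r for reg, r in zip(columns["region"], columns["revenue"]) if reg == "South")
print(total_revenue, south_revenue)  # 250 170
```

A side benefit is that a column of repeated values (such as `region`) compresses well, for example with run-length encoding.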

Trends
• SQL on Hadoop: Hadapt, Cloudera Impala, EMC
• Unified support of structured, unstructured, and semi-structured data
• Embedding search
• Expanded ETL/ELT support
• Big data in motion takes hold
• Added data mining and analytic functions in NoSQL
• Embedding the R language, which is gaining in popularity
• Data scientists instrumental in business success

Bob Zurek