Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Big Picture: Understanding the Many Roles of Hadoop



The Big Picture: Understanding the Many Roles of Hadoop Exploratory Webcast | January 28, 2015


Guests

Robin Bloor Chief Analyst, The Bloor Group @robinbloor [email protected]

Eric Kavanagh CEO, The Bloor Group @eric_kavanagh [email protected]

Making Sense of Hadoop – webcast series:

Exploratory Webcast – January 28, 2015

Roundtable Webcast – March 18, 2015

Findings Webcast – May 27, 2015

#Hadoop

Making Sense of Hadoop

Robin Bloor, PhD

In Three Segments

Part One – The Forces of Disruption

Part Two – Hadoop: Then, Now & Later

Part Three – Focus Areas

The Forces of Disruption

The Generic Dimensions of IT

• All IT involves 4 components (only): Users, Software, Data, Hardware
• Change any one of these and the other three components have to adjust
• Aggregate these and you get a process
• Time will impose change anyway
• We can also consider a larger field, since this applies to all systems, not just IT systems

Four Fundamental (IT) Factors

[Diagram: the four factors – Hardware, Software, Data, Users – set against Facility, Business Process, Business Information and Staff, and more broadly Civilization, Human Activity, All Information and People, all changing over TIME]

The Hexagon of Business Change

• Speed: speed of action; speed of business process
• Cost: cost of acquisition; cost of ownership
• Time: time to deploy; time to employ
• Business Value: by competitiveness; by cost reduction
• Effort: effort to develop; effort to deploy
• Fit: compatible; incompatible

Plus, capacity to change

[Diagram: Hexagon of Change Factors – Speed, Cost, Time Taken, Value, Effort, Fit – plus capacity to change]

The Technology Layers

• The buying impulse descends through the stack
• The impact of technology change rises up the stack
• This ensures the eventual “legacification” of all technology

[Diagram: The Technology Layers – the buying impulse goes down the stack; technology change rises up]

Technology Layer Perspectives

• This simple model has a number of uses
• For example, we can use it to depict the “aaS” options
• More importantly, we can use it to track disruption…
• More of which later…

The aaS Possibilities

Disruption in The Technology Layers

• Disruption (as innovation) can happen in any layer
• Where it occurs, it will impact all layers above it
• And it may also impact the layers below it (but less quickly)
• There is no such thing as future-proof, but some technologies definitely live longer


Tech Revolutions

• Mainframe computer (batch architecture)
• On-line interaction (centralized architecture)
• PC (client–server)
• Internet (multi-tier architecture)
• Mobile (service-oriented architecture)
• Internet of Things (event-driven architecture)

Note that all of these disruptive changes were driven by hardware innovation

Hardware Layer Disruption

• SSD is now on the Moore’s Law curve
• Spinning disk has almost popped its clogs
• Memory grows and can be networked
• CPUs are still evolving: CPU and GPU have merged

Hierarchical Memory

• On-chip cache speed vs RAM (rough latencies sketched below): L1 (32KB) = 100x; L2 (256KB) = 30x; L3 (8–20MB) = 8.6x
• RAM vs SSD: RAM = 300x
• SSD vs disk: SSD = 10x

Note: vector instructions and data compression
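
To make those ratios concrete, here is a minimal sketch that converts them into rough absolute latencies. The 100 ns DRAM baseline is an assumption for illustration, not a figure from the deck.

# Rough latency estimates derived from the ratios on the Hierarchical Memory slide.
# The 100 ns DRAM access-time baseline is assumed for illustration only.
DRAM_NS = 100.0

ratios_vs_ram = {"L1 (32KB)": 100, "L2 (256KB)": 30, "L3 (8-20MB)": 8.6}
for level, speedup in ratios_vs_ram.items():
    print(f"{level}: ~{DRAM_NS / speedup:.1f} ns")

print(f"RAM:  ~{DRAM_NS:.0f} ns")
print(f"SSD:  ~{DRAM_NS * 300 / 1000:.0f} us   (RAM is ~300x faster than SSD)")
print(f"Disk: ~{DRAM_NS * 300 * 10 / 1000:.0f} us  (SSD is ~10x faster than disk)")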

In-Memory Disruption

• Memory will become the primary store for data (this impacts data flows)
• Almost all applications are poorly built for this
• Memory is an accelerator, as is CPU cache – this is becoming a factor
• HP’s Memristor waits in the wings

Hadoop: Deceptive Impression

Because Hadoop was built to run on thousands of servers, there is an impression that Hadoop needs such huge clusters/grids.

In reality the opposite is now happening: because Moore’s Law still operates, individual servers keep getting more powerful and the number of servers needed is diminishing.

Putting a SoC in IT

• It’s possible that the CPU–memory split will vanish, possibly soon
• This requires the emergence of the commodity SoC (system on a chip)
• There are already SoCs that run Linux
• Grids of SoCs would replace grids of servers

Parallelism: The Imp Is Out of the Bottle

• Multicore chips enabled parallelism (see the sketch below)
• It has changed the whole performance equation
• It enabled Big Data
• Big Data is really Big Processing
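
As a minimal illustration of that multicore point (a sketch, not from the deck), standard-library Python can already fan simple work out across all available cores:

# parallel_count.py - spread a trivial word-count job across CPU cores.
import multiprocessing as mp

def count_words(chunk):
    # Trivial per-chunk work; real "Big Processing" workloads are far heavier.
    return len(chunk.split())

if __name__ == "__main__":
    chunks = ["the quick brown fox", "jumps over", "the lazy dog"] * 1000
    with mp.Pool() as pool:          # one worker process per core by default
        counts = pool.map(count_words, chunks)
    print(sum(counts))               # total words across all chunks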

Some Architectural Principles

• The new atom of data is the event
• SUSO: scale up before scale out
• Take the processing to the data, if you can (a minimal sketch follows below)
• Hadoop is a component, not a solution
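
To illustrate “take the processing to the data”, here is a classic word count in the Hadoop Streaming style: the small script is shipped to the nodes that already hold the input blocks, rather than the data being moved to the code. This is a minimal sketch; the input/output paths and the streaming-jar location are placeholders, not details from the deck.

# wordcount_streaming.py - minimal Hadoop Streaming sketch (illustrative only).
# Typical invocation (paths and jar location are placeholders):
#   hadoop jar <path-to-hadoop-streaming.jar> \
#     -input /data/raw/text -output /data/out/wordcount \
#     -mapper "python wordcount_streaming.py map" \
#     -reducer "python wordcount_streaming.py reduce" \
#     -file wordcount_streaming.py
import sys
from itertools import groupby

def mapper():
    # Runs on the nodes holding the data blocks; emits one (word, 1) pair per word.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key, so equal words are adjacent.
    rows = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, group in groupby(rows, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(n) for _, n in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()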

Hadoop: Then, Now & Later

The Hadoop Ecosystem

• Apache projects: HBase, HCatalog, Pig, Hive, Flume, Storm, Sqoop, Nutch, Avro, Oozie, ZooKeeper, etc.
• New commercial products: Actian, RedPoint, Attunity, Voltage Security, etc.
• Languages and development environments

Hadoop Usage

• Data archive
• Data staging & ETL (a small staging sketch follows this list)
• Data preparation
• Analytics sandbox
• Analytics platform
• Database environment
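
As a small illustration of the data-staging role, landing raw files in HDFS ahead of downstream ETL can be as simple as wrapping the standard hdfs CLI. A minimal sketch, assuming the hdfs command is on the PATH; the directory layout is hypothetical.

# stage_to_hdfs.py - copy local raw files into an HDFS staging area.
import subprocess
from pathlib import Path

def stage(local_dir: str, hdfs_dir: str = "/data/staging") -> None:
    # Create the staging directory if it does not exist yet.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    for f in sorted(Path(local_dir).glob("*.csv")):
        # -put will not overwrite an existing file, which suits a staging area.
        subprocess.run(["hdfs", "dfs", "-put", str(f), hdfs_dir], check=True)

if __name__ == "__main__":
    stage("./incoming")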

The State of Play

Graphic from Allied Market Research

There are some straws in the wind here: Hadoop is being used everywhere.

It’s a HUGE market, and VC investment is also massive.

Hadoop as a Clip-On

Data Lake, Refinery, Hub: In Overview

Think Logical, Implement Physical

Two Data Flows

Hadoop in the Technology Layers

• Hadoop starts as a scale-out file system with a one-dimensional development environment
• It evolves with the addition of YARN to begin to occupy the OS & Sys Mgt layer
• Analytics applications become synonymous with Hadoop
• Hadoop is migrating through the stack

Hadoop as an OS

• The trail of OSes: OS/360 -> OS/370 -> z/OS; VMS; Unix -> Solaris; MS-DOS -> Windows; Linux; OS X -> iOS
• OSes evolve in two ways: their own development, and third-party add-ons
• They create application ecosystems
• In time they make previous OSes obsolete
• This is what Hadoop is in the process of doing

Focus Areas

Hadoop in the Basic Map

• Hardware & Cloud
• Software
• Data realities
• Usage

Four Fundamental (IT) Factors

[Diagram repeated from Part One: the four factors – Hardware, Software, Data, Users – in their business context, changing over TIME]

Hadoop in the Hexagon

[Diagram repeated from Part One: Hexagon of Change Factors – Speed, Cost, Time Taken, Value, Effort, Fit – plus capacity to change]

• Speed: speed of action; speed of business process
• Cost: cost of acquisition; cost of ownership
• Time: time to deploy; time to employ
• Business Value: by competitiveness; by cost reduction
• Effort: effort to develop; effort to deploy
• Fit: compatible; incompatible

Hadoop in the Technology Layers


In Three Segments

Part 1 – The Forces of Disruption

Part 2 – Hadoop: Then, Now & Later

Part 3 – Focus Areas

Questions?

Tweet #Hadoop or USE THE Q&A

THANK YOU!

FIND OUT MORE at http://insideanalysis.com/research/making-sense-of-hadoop