Hadoop, Oracle and the Big Data Revolution (Collaborate 2013)

Presentation given at Collaborate 2013

Hadoop, Oracle and the Industrial Revolution of Data

Guy Harrison, Dell Software Group

Guy Harrison

Executive Director, R&D, Information Management Group

3 Software Group

Introductions

www.guyharrison.net

guy_harrison@dell.com

http://twitter.com/guyharrison

Dell, Quest and Toad

Chart: Star Trek shirt fatality analysis (Pct, by shirt colour: Blue, Yellow, Red)

Quest Software is now part of Dell

“Big” Data?

Three or Four “V”s

• Volume: Terabytes, Petabytes, Exabytes, Zettabytes
• Variety: Structured, Unstructured, Human Generated, Machine Generated
• Velocity: User populations x Transaction rates x Machine data
• Value: Competitive or Collective advantage

Data volumes have always been increasing….

2006 Perspective

Though the absolute volumes are boggling…

Chart (log scale, 1E+09 to 1E+23 bytes, gigabyte to zettabyte): Human Brain, Google, Living Human Genomes, Digital information 2008, Total Digital capacity, Digital information created 2011.

Data labels: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21.
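To put these magnitudes in context, here is a minimal sketch that converts a byte count into the unit names on the chart axis. The helper name `humanize_bytes` is hypothetical, and decimal (SI) units are assumed because the axis equates 1E+09 with a gigabyte.

```python
# Decimal (SI) units are assumed: 1 gigabyte = 1e9 bytes, matching the axis labels.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]

def humanize_bytes(n):
    """Return (value, unit), scaling n down by 1000 until it fits the unit."""
    value = float(n)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return round(value, 2), UNITS[index]

# Three of the data labels from the chart.
for label in (2.81e15, 5.48e18, 2.13e21):
    print(label, "->", humanize_bytes(label))
```

The smallest label on the chart is a few petabytes and the largest is a couple of zettabytes, roughly a million-fold spread.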

Velocity

Fail whales

Variety, or the Industrial Revolution of Data

Data: now and then

1993
• Generated internally
• Key to operational efficiency

2013
• Generated externally
• Key to competitiveness
• Source of product innovation
• Changing our world

“Big” data driven by the smallest devices

Smartphone hardware

• Quad-core 1.4 GHz CPU

• 1GB RAM

• 64GB Storage

• 1080p display

• GSM/Bluetooth/WiFi Network

• 8MP Camera

• GPS & Compass

Smartphone software

Name: Willy Bowman

Nationality: German

DON’T MENTION THE WAR

Data Input

Siri

“Siri call me an ambulance”
From now on, I’ll call you ‘An Ambulance’. OK?

“I want to jump off a bridge”
I found 14 bridges nearby:

Sixth-Sense

Brain Control

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display

All this requires and generates huge data sets

But what else are they good for?

The data “exhaust” itself generates new opportunities

Companies want to generate competitive advantage through “Big Data analytics”

Big Data Analytics

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
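As an illustration of collective intelligence, a program can look clever purely by counting what a crowd has already done. The sketch below is a minimal co-occurrence recommender. The customer names, the items and the function names are all hypothetical, and real recommenders are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; every name and item here is made up.
baskets = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lantern"},
    "dan": {"stove", "kettle"},
}

def co_occurrence(baskets):
    """Count how often each ordered pair of items shares a basket."""
    counts = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, baskets, top_n=2):
    """Items most often seen alongside `item`, most frequent first."""
    counts = co_occurrence(baskets)
    related = {b: n for (a, b), n in counts.items() if a == item}
    return sorted(related, key=lambda b: (-related[b], b))[:top_n]

print(recommend("tent", baskets))
```

No rule about camping gear is coded anywhere. The suggestion comes entirely from the crowd's behaviour, and it improves as more baskets arrive.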

Collective Intelligence

• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• Predictive Analytics: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring

Collective Intelligence beats Artificial Intelligence?

For the last 40 years AI has been consistently disappointing

In 2011 AI made a comeback

Google: Pioneers of Big Data

Google Software Architecture

• Google Applications
• Map Reduce, BigTable, Chubby
• Google File System (GFS)

Map Reduce

Diagram: START, many MAP tasks running in parallel, REDUCE.
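The map-then-reduce pattern can be sketched in a few lines of single-process Python. This is a toy word count, not Hadoop's or Google's API. The function names are hypothetical, and the `shuffle` step stands in for the grouping a real framework does between the two phases.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(lines):
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

lines = ["big data big clusters", "data beats algorithms"]
print(map_reduce(lines))
```

Each call to `map_phase` depends only on its own line, which is why the map work can be spread across many machines.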

Multi-stage Map-Reduce

Diagram labels: HDFS, MAPPER (many, in parallel), SCAN, SORT, AGGREGATE, REDUCE, CLIENT.
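Chaining stages means feeding one job's output to the next job as input. The sketch below is a single-process illustration under assumed data: the click records and the `run_job` helper are hypothetical, and the two stages (count, then pick a maximum) are only one example of a multi-stage pipeline.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """One map-reduce stage: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Hypothetical (user, site) click records.
clicks = [("dick", "ebay"), ("dick", "google"), ("jane", "google"),
          ("jane", "google"), ("dick", "ebay"), ("dick", "ebay")]

# Stage 1: count clicks per (user, site) pair.
stage1 = run_job(
    clicks,
    mapper=lambda rec: [(rec, 1)],
    reducer=lambda key, values: (key, sum(values)),
)

# Stage 2: read stage 1 output and keep each user's most-visited site.
stage2 = run_job(
    stage1,
    mapper=lambda rec: [(rec[0][0], (rec[1], rec[0][1]))],
    reducer=lambda user, values: (user, max(values)[1]),
)

print(stage2)
```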

Schema on Read vs Schema on Write

Schema on Write
• Data
• Analyse, Aggregate, Normalize, Cleanse, Code
• Extract, Transform, Load
• Data Warehouse
• Utilize

Schema on Read
• Data
• Load
• Hadoop
• Analyse, Cleanse, Code
• Utilize
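The difference between the two approaches can be shown with a small sketch. The raw event lines, field names and function names are hypothetical. JSON is used only because it is convenient, and nothing here is a Hadoop or warehouse API.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
raw_lines = [
    '{"user": "dick", "site": "ebay", "ms": 120}',
    '{"user": "jane", "site": "google"}',   # no "ms" field
    'not json at all',                      # malformed record
]

def load_schema_on_write(lines):
    """Validate and shape every record before storing it; reject the rest."""
    table = []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append((rec["user"], rec["site"], int(rec["ms"])))
        except (ValueError, KeyError):
            continue  # the record never reaches the warehouse
    return table

def query_schema_on_read(lines, field):
    """Keep everything untouched; apply structure only when a query runs."""
    values = []
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        if field in rec:
            values.append(rec[field])
    return values

print(load_schema_on_write(raw_lines))
print(query_schema_on_read(raw_lines, "site"))
```

With schema on write, Jane's record is lost because it fails the up-front schema. With schema on read it stays available to any query that does not need the missing field.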

Hadoop: Open Source Map-Reduce Stack

Hadoop at Yahoo

Yahoo! Hadoop cluster:
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
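Dividing the cluster totals by the node count shows how modest each machine is. This is a back-of-envelope sketch that assumes decimal units and an even spread of resources across nodes, which the slide does not state.

```python
# Totals from the slide; decimal units assumed (1 PB = 1000 TB, 1 TB = 1000 GB).
nodes = 4000
disk_pb = 16
ram_tb = 64
cores = 32_000

disk_tb_per_node = disk_pb * 1000 / nodes
ram_gb_per_node = ram_tb * 1000 / nodes
cores_per_node = cores / nodes

print(disk_tb_per_node, ram_gb_per_node, cores_per_node)
```

Under these assumptions each node averages about 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity-server territory.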

Hadoop 1.0 Architecture

• HADOOP CLIENT (JAVA, PIG, HIVE)
• MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
• HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
• Many nodes, each running a DATA NODE and a TASK TRACKER

• Hadoop File System (HDFS)
• Hadoop Map Reduce
• HBase (Database)
• ZooKeeper (Locking)
• Sqoop (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

HBase: a real-time database built on Hadoop

Oracle: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks

HBase: Table, MemStore, WA Log (write-ahead log), HFile, HDFS, Disks

HBase Data Model

Denormalized table:

Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

Normalized (relational) tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

HBase rows (one sparse, wide row per name):

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   ...              675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   ...              856,767
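The pivot from relational rows to wide, sparse rows can be sketched with plain dictionaries. This models the idea only and is not the HBase client API. The function name `to_wide_rows` is hypothetical, and the values are taken from the slide.

```python
# Rows as they would appear in a relational table (values from the slide).
relational_rows = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) tuples into one sparse row per name."""
    table = {}
    for name, site, counter in rows:
        table.setdefault(name, {})[site] = counter
    return table

wide = to_wide_rows(relational_rows)
print(sorted(wide["Jane"]))                # the columns that exist for Jane
print(wide["Dick"].get("ILoveLarry.com"))  # absent column: nothing is stored
```

Each row carries only the columns it actually has, so two rows in the same table can have entirely different column sets.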

Hive

Diagram labels: SQL, JAVA, RESULTS

Other SQL-like Hadoop Interfaces

• Cloudera Impala

• MapR Drill

• Aster

• Greenplum (Pivotal HD)

• Paraccel

• Hadapt

• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

Pig

Pig Latin

SQL or Hive QL

Meanwhile, back at the Death Star…

Oracle Exadata

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

Economies

Chart: Exadata vs Hadoop $$/TB (hardware only): Exadata $4,911, Hadoop $750
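The gap is easier to judge as a ratio. The sketch below uses the two hardware-only figures from the chart. The 500 TB dataset size is a hypothetical workload chosen only to show the arithmetic.

```python
# Hardware-only cost per terabyte, as read from the chart.
exadata_per_tb = 4911
hadoop_per_tb = 750

ratio = exadata_per_tb / hadoop_per_tb

# Hypothetical workload size, chosen only for illustration.
dataset_tb = 500
saving = (exadata_per_tb - hadoop_per_tb) * dataset_tb

print(round(ratio, 2), saving)
```

On these figures Exadata storage costs roughly six and a half times as much per terabyte. That covers hardware only and says nothing about software, staffing or query performance.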

Oracle Big Data Appliance

18 Sun X4270 M2 servers
− 48GB RAM per node (864GB total)
− 2x6 Core CPU per node (216 total)
− 12x2TB HDD per node (216 spindles, 864 TB)
− 40Gb/s Infiniband between nodes
− 10Gb/s Ethernet to datacentre

Competitive Pricing

www.oracle.com/us/bigdata/index.html
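The appliance totals can be re-derived from the per-node figures. In this sketch the RAM, core and spindle totals reproduce the slide, but 216 spindles at 2 TB each come to 432 TB rather than the 864 TB stated. Either the disk size or the total on the slide may therefore be a slip, and the source does not say which.

```python
# Per-node figures as given on the slide.
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6   # 2 CPUs x 6 cores each
disks_per_node = 12
tb_per_disk = 2

total_ram_gb = nodes * ram_gb_per_node
total_cores = nodes * cores_per_node
total_spindles = nodes * disks_per_node
total_raw_tb = total_spindles * tb_per_disk

print(total_ram_gb, total_cores, total_spindles, total_raw_tb)
```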

Big Data Appliance Software

• Cloudera Enterprise

• Oracle Enterprise R

• Oracle NoSQL

• Oracle Big Data Connectors

Diagram (axes: Latency, Storage Costs): ORACLE EXADATA, ORACLE EXALOGIC, ORACLE BIG DATA APPLIANCE, ORACLE NOSQL, ORACLE LOADER FOR HADOOP, APACHE HADOOP, ORACLE RDBMS, ORACLE WEBLOGIC, ORACLE EXALYTICS, ORACLE ESSBASE, ORACLE TIMES TEN

The following week at the Borg collective….

Integrating Hadoop and RDBMS

Scenario #1: Reference data in RDBMS

• RDBMS: CUSTOMERS, PRODUCTS
• HDFS: WEBLOGS

Scenario #2: Hadoop for off-line analytics

CUSTOMERS

PRODUCTS

RDBMS

SALES HISTORY

HDFS

Scenario #3: MapReduce output to RDBMS

• HDFS: WEBLOGS
• RDBMS: WEBLOGS SUMMARY
• DB QUERY TOOL
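A minimal sketch of this scenario: summarise raw web logs, load the much smaller summary into a relational table, then query it with ordinary SQL. Here `sqlite3` stands in for the RDBMS and a `Counter` stands in for the MapReduce job. The log format, table name and column names are all assumptions.

```python
import sqlite3
from collections import Counter

# Hypothetical web log lines in the form "<customer_id> <url>".
weblogs = ["1 /home", "2 /home", "1 /cart", "1 /home"]

# The "MapReduce" step, reduced to a per-key count for this sketch.
summary = Counter(tuple(line.split()) for line in weblogs)

# sqlite3 stands in for the RDBMS; the table name is an assumption.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE weblogs_summary (customer_id INTEGER, url TEXT, hits INTEGER)"
)
db.executemany(
    "INSERT INTO weblogs_summary VALUES (?, ?, ?)",
    [(int(cid), url, hits) for (cid, url), hits in summary.items()],
)

# Any SQL query tool can now work against the summary table.
rows = db.execute(
    "SELECT customer_id, SUM(hits) FROM weblogs_summary "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)
```

The bulky raw logs stay in HDFS, and only the aggregate crosses into the database.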

Scenario #4: Hadoop as RDBMS “active archive”

Diagram labels: QUERY TOOL, RDBMS, HDFS, SALES 2011, SALES 2010, SALES 2009, SALES 2008 (SALES 2009 and SALES 2008 appear twice)

The Big Data Stack

Stack diagram, top to bottom:
• DATA SCIENTIST
• R (ET AL), JAVA API, HIVE, PIG, CASCADING, MAHOUT
• MAP-REDUCE, HBASE
• HDFS

Stack diagram, top to bottom:
• DATA SCIENTIST, BIG DATA ANALYTICS SOFTWARE
• R (ET AL), JAVA API, HIVE, PIG, CASCADING, MAHOUT
• MAP-REDUCE, HBASE
• HDFS

BIG DATA ANALYTICS

• INDEXING AND SEARCH
• VISUALIZATION
• RECOMMENDERS
• CLUSTERING
• CLASSIFICATION
• EXPERT SYSTEMS (LIKE WATSON)
• OPTIMIZATION
• MACHINE LEARNING
• PREDICTIVE ANALYTICS
• COLLECTIVE INTELLIGENCE
• BASKET ANALYSIS
• SENTIMENT ANALYSIS

In Summary….

Hadoop is….

Economical

Chart: Exadata vs Hadoop $$/TB (hardware only): Exadata $4,911, Hadoop $750

Proven at Scale

A platform for Advanced analytics

ETL Free

Schema on Write
• Data
• Analyse, Aggregate, Normalize, Cleanse, Code
• Extract, Transform, Load
• Data Warehouse
• Utilize

Schema on Read
• Data
• Load
• Hadoop
• Analyse, Cleanse, Code
• Utilize

The most concrete technology enabling the Big Data revolution

Hadoop is not….

A replacement for RDBMS

But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS

Suitable for OLTP

Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra

A complete solution

Hadoop alone only solves the storage challenge of Big Data

Shameless plugs

Toad for Cloud Databases

Toad BI Suite

Business Intelligence solutions with first-class support for Hadoop, Oracle and many other platforms

Kitenga Analytics Suite

SharePlex® for Hadoop

• Redo-logs
• Change Data Capture
• JMS Queue
• Hadoop Poster
• Batched HDFS File Copy
• Audit / Change Data
• HBase Real-Time replication

Toad for Hadoop

Hive Query IDE

Oracle <-> Hadoop data management

Basic Hadoop administration

Beta June
