
Hadoop within the data center of the future



Big Data technologies are surfacing in data centers to solve problems that legacy systems were not built to handle. Hadoop is one of those technologies. Successful Hadoop implementations share common characteristics and also meet the requirements for functioning as a proper tenant within the data center. This session covers those common characteristics, as well as the integration requirements for Hadoop within the data center of the future.


Page 1: Hadoop within the data center of the future

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Hadoop in the datacenter

Donald Livengood / June 2013


My background

Title: Distinguished Technologist, TS Consulting

IT industry experience
• Big Data
• Client Infrastructure, Mobility, VDI
• Unified Communications & Collaboration
• Virtualization & Private Cloud
• Electronic Messaging & Directory Services

Professional information
• Certified Infrastructure Architect

Years at HP: 28

Current responsibilities: Creation of services and delivery readiness for Big Data Infrastructure worldwide

Name: Donald Livengood

E-mail: [email protected]


Agenda

Big Data

Why Hadoop exists

Designing Hadoop

Integrating Hadoop into the datacenter


Big Data


What is “Big Data”?

Gartner: "Big Data" is a popular term generally used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of the emerging information economy era.

HP definition: “Big Data is a class of data challenges, due to increasing volume, velocity, variety, and complexity, that are beyond the capabilities of the traditional software, architecture, and processes to effectively manage and utilize.”

McKinsey Report: “Big Data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

What does Big Data mean for Enterprise IT? A combination of IT capabilities to deal with the volume, velocity, and variety of data.

Note: Slide for internal use only


Big Data variables forcing Big Data technology adoption

Variety: Any type of data

• Data in many forms: structured, unstructured, text, multimedia. Much of the relevant information is in unstructured data.

Volume: Ability to handle very large amounts of data

• Data quantity: scale from terabytes to petabytes to zettabytes; volumes that traditional data management technologies cannot handle in time for consumption.

Velocity: Process all data quickly

• Data creation & transport: streaming data, with milliseconds to seconds to respond. Velocity related to ingestion, cleaning, and meaning of data.

Voracity: End-user appetite for Big Data consumption

• Data consumption: ingestion and processing of data, real-time processing. Velocity as it relates to consuming large amounts of data.


Traditional Information Technologies are not adequate with Big Data

Traditional: SQL, Consistency, Availability

Big Data: Finding useful information requires powerful analytics and massive processing

• Variety: Handle structured and unstructured data

• Volume: Scale-out, partitioned architecture

• Velocity: Real-time data processing (vertical DB, in-memory DB)

• Voracity: Allow for multiple ingress points (query, search)



Why Hadoop exists


Storage Platform

Today's architecture

Classic ETL Processing


Business transactions and interactions

CRM – ERM – SCM FMS – HRM

$ € ¥ Transaction Data

Analytical, Dashboards, Reports, Visualization

Enterprise Data Warehouse

Business intelligence & analytics


Storage Only platform (SAN/NAS)

Gap in today's architecture

Social media data: Forum, Blog, Feeds, Web, Clicks

Multi-media: Audio, Video, Images

Document management: Content Management, File Sharing, File Hosting, Collaboration, Search

Message data: IM and VOIP, Messaging System

Sensors data: GPS, Sensors devices, RFID, Other events

Classic ETL Processing

Business transactions and interactions

CRM – ERM – SCM FMS – HRM

$ € ¥ Transaction Data

Analytical, Dashboards, Reports, Visualization

Enterprise Data Warehouse

Business intelligence & analytics

Moving data to compute doesn’t scale

Can’t explore original data

Archiving = death: cheap storage, expensive restore

Data dropped due to ETL

Can’t handle data types

Schema change takes time


Hadoop

Gap in today's architecture


Hadoop

• Use MapReduce

• All data available

• Keep data in Hadoop: cheap storage & can tier, always available

• Move some data to the legacy system


What is Hadoop?

Hadoop consists of two core components

• The Hadoop Distributed File System (HDFS)

• MapReduce

- Computation framework (engine)

- Resource manager & scheduler

- Other engines have been or will be introduced (e.g., Impala)

A set of machines running HDFS and MapReduce is known as a Hadoop Cluster

• Individual machines are known as nodes

• A cluster can have as few as one node and as many as several thousand

• More nodes = better performance!

There are many other projects based around core Hadoop

The ‘Hadoop Ecosystem’ includes many projects

e.g., Pig, Hive, HBase, Flume, Oozie, and Sqoop

A flexible and scalable architecture for large scale processing and computation across a distributed network of computers


Nodes & roles

Master Nodes: NameNode

• Oversees data storage in HDFS

– Maps an HDFS file name to a set of blocks, and maps blocks to DataNodes

JobTracker

• Coordinates parallel processing using MapReduce

Slave Nodes: DataNode (slave to NameNode)

• Block server

– Stores blocks as separate files on the local filesystem

• Reports its existing blocks to the NameNode

TaskTracker (slave to JobTracker)

• Starts and monitors Map and Reduce tasks

• Sends heartbeats and status to the JobTracker

Edge Node

- Not part of Hadoop architecture

- Usually not part of cluster (but could be)

- One or more are used for ingress to and egress from the cluster

- Provides authenticated users with access to private subnet (cluster)

- Configured for transient storage & high bandwidth to core network
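The NameNode/DataNode division of labor above can be illustrated with a toy model of the NameNode's two mappings (file name → blocks, block → DataNodes). This is a hypothetical sketch, not Hadoop's API: `ToyNameNode`, `split_into_blocks`, the tiny block size, and the round-robin placement are all assumptions for illustration. Real HDFS uses much larger blocks and a rack-aware replica placement policy, and its default replication factor is 3 rather than the 2 used here.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


class ToyNameNode:
    """Hypothetical in-memory model of NameNode metadata (not Hadoop's real API)."""

    def __init__(self, datanodes: list[str], block_size: int, replication: int = 2):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds the number of DataNodes")
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.file_to_blocks: dict[str, list[str]] = {}  # file name -> block IDs
        self.block_to_nodes: dict[str, list[str]] = {}  # block ID -> DataNodes
        self._next_block = 0
        self._next_node = 0  # round-robin cursor (simplification, not rack-aware)

    def add_file(self, name: str, size: int) -> list[str]:
        """Split a file into blocks and assign each block to distinct DataNodes."""
        block_ids = []
        n = len(self.datanodes)
        for _ in split_into_blocks(size, self.block_size):
            block_id = f"blk_{self._next_block}"
            self._next_block += 1
            replicas = [self.datanodes[(self._next_node + i) % n]
                        for i in range(self.replication)]
            self._next_node = (self._next_node + 1) % n
            self.block_to_nodes[block_id] = replicas
            block_ids.append(block_id)
        self.file_to_blocks[name] = block_ids
        return block_ids

    def locate(self, name: str) -> dict[str, list[str]]:
        """Answer a client's question: which DataNodes hold each block of this file?"""
        return {b: self.block_to_nodes[b] for b in self.file_to_blocks[name]}


if __name__ == "__main__":
    nn = ToyNameNode([f"server{i}" for i in range(1, 7)], block_size=64)
    nn.add_file("/logs/clicks.log", 350)  # 5 full blocks + 1 partial block
    for block, nodes in nn.locate("/logs/clicks.log").items():
        print(block, nodes)
```

Because the NameNode keeps only this metadata, a client asks it where the blocks live and then reads the data directly from the DataNodes.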


Hadoop HDFS and MapReduce

[Figure: a file is split into Blocks 1 to 6, and HDFS stores each block on two of six servers]

Server 1: Block 1, Block 2

Server 2: Block 1, Block 3

Server 3: Block 5, Block 6

Server 4: Block 2, Block 3

Server 5: Block 4, Block 5

Server 6: Block 4, Block 6

HDFS / MapReduce: Mapping process, then Shuffle data, then Reduce process; outputs are stored locally to HDFS
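The map, shuffle, and reduce stages in the diagram can be simulated in a single process with the classic word-count example. This is a sketch under simplifying assumptions: each string in `blocks` stands in for one HDFS block, the function names are hypothetical, and there is no real parallelism, data locality, or HDFS output. On a cluster, each map task would run on a node holding a replica of its block.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(block: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input block."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all mapper output values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def run_job(blocks: list[str]) -> dict[str, int]:
    # Map every block, shuffle the intermediate pairs, then reduce each key.
    mapped = (pair for block in blocks for pair in map_phase(block))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["big data big", "data center", "hadoop data"]))
    # {'big': 2, 'center': 1, 'data': 3, 'hadoop': 1}
```

The shuffle is the step that moves data between nodes on a real cluster, which is why it drives much of the network traffic discussed in the network considerations.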



Designing Hadoop


HP Reference Architectures provide a firm baseline for a balanced cluster

Hadoop Sizing: Workload Matters

Example IO-bound workloads

• Indexing

• Searching

• Grouping

• Decoding/decompressing

• Data importing and exporting

[Chart: node configurations by CPU power (low to high) and disks per node (fewer to more): Low Power Consumption, Balanced, Balanced/More Power per Node, Storage Optimized, Computation Optimized]

Example CPU-bound workloads

• Machine learning

• Complex text mining

• Natural language processing

• Feature extraction


Caveat: You must know your workload

Workload-based Configuration Approach (1 of 2)

Base guidelines

NameNode: RAID1, 32GB per 1M files, 4+ disks (usually 64GB balanced)

DataNode: 1GB per core, 1 disk per core

- Four 1TB or 2TB hard disks in a JBOD (Just a Bunch Of Disks) configuration

- 2 quad core CPUs, running at least 2-2.5GHz

- 16-24GB of RAM (24-32GB if you’re considering HBase)

Network:

- 1Gb Ethernet for nodes, 10Gb for edge nodes and network switch uplinks

- Use 10Gb only if it is effectively “free”: adapters and switches can drive the cost very high
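The base guidelines translate into simple arithmetic. The sketch below takes the slide's rules of thumb at face value (32GB of NameNode memory per 1M files; 1GB of memory and 1 disk per DataNode core). The function names are hypothetical, and the results are starting points to validate against your own workload, not limits.

```python
import math

# Rules of thumb as stated in the base guidelines (starting points, not limits).
NAMENODE_GB_PER_MILLION_FILES = 32
DATANODE_GB_PER_CORE = 1
DATANODE_DISKS_PER_CORE = 1


def namenode_memory_gb(num_files: int) -> int:
    """NameNode memory for a given number of files, rounded up to a whole GB."""
    return math.ceil(NAMENODE_GB_PER_MILLION_FILES * num_files / 1_000_000)


def datanode_baseline(sockets: int, cores_per_socket: int) -> dict[str, int]:
    """Baseline memory and disk count for one DataNode from its core count."""
    cores = sockets * cores_per_socket
    return {
        "cores": cores,
        "memory_gb": cores * DATANODE_GB_PER_CORE,
        "disks": cores * DATANODE_DISKS_PER_CORE,
    }


if __name__ == "__main__":
    print(namenode_memory_gb(2_000_000))  # 64
    print(datanode_baseline(2, 4))        # {'cores': 8, 'memory_gb': 8, 'disks': 8}
```

For two quad-core CPUs the per-core rules give 8GB and 8 disks, whereas the example DataNode build lists 16-24GB and 4 disks, so treat the ratios and the example build as separate starting points and tune to the workload.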


Caveat: You must know your workload

Workload-based Configuration Approach (2 of 2)

Light Processing Configuration: 1GB per core

- (1U/machine): Two quad core CPUs, 8GB memory, and 4 disks

- CPU-intensive: Use 2GB per core versus 1GB

Balanced Compute Configuration: 2 to 3GB per core

- (1U/machine): Two quad core CPUs, 16 to 24GB memory, and 4 disks

Storage Heavy Configuration: 2 to 3GB per core, big storage & power

- (2U/machine): Two quad core CPUs, 16 to 24GB memory, and 12 disk drives

- Power consumption ~200W in idle state and can go as high as ~350W when active

Compute Intensive Configuration: large memory, moderate storage

- (2U/machine): Two quad core CPUs, 48-72GB memory, and 8 disks

- Used when a combination of large in-memory models and heavy reference data caching is required.
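The four configurations can be compared on the two ratios this deck keeps returning to: memory per core and cores per disk (the platform guidance aims for roughly 1:1 cores to disks). The sketch assumes every profile has 8 cores (two quad-core CPUs, as listed) and encodes each memory range as (low, high); `NodeProfile` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    name: str
    cores: int                   # two quad-core CPUs in every listed profile
    memory_gb: tuple[int, int]   # (low, high) memory range per node
    disks: int

    def gb_per_core(self) -> tuple[float, float]:
        low, high = self.memory_gb
        return low / self.cores, high / self.cores

    def cores_per_disk(self) -> float:
        return self.cores / self.disks


PROFILES = [
    NodeProfile("Light processing (1U)", 8, (8, 8), 4),
    NodeProfile("Balanced compute (1U)", 8, (16, 24), 4),
    NodeProfile("Storage heavy (2U)", 8, (16, 24), 12),
    NodeProfile("Compute intensive (2U)", 8, (48, 72), 8),
]

if __name__ == "__main__":
    for p in PROFILES:
        low, high = p.gb_per_core()
        print(f"{p.name}: {low:g}-{high:g} GB/core, {p.cores_per_disk():.2f} cores/disk")
```

Running it confirms the stated per-core memory figures (1GB, 2-3GB, 2-3GB) and shows that only the compute-intensive profile sits at a 1:1 core:disk ratio, while the storage-heavy profile has about 0.67 cores per disk.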


Which platform to choose?

DC rack capacity is limited

DC cooling & power are issues

High density commodity servers

Need to balance

• Core:disk ratio (1:1) – threads help

• CPU cost – power budget and price

• Disk capacity – more is better

Average size ~ 20 servers

Plan for change!

Optimizing Rack Capacity

[Chart: Hadoop Data Nodes Core/Disk Ratios per 42u Rack; axes: Cores/Rack and Hard Drives/Rack]

Ratio per server model:

• DL360p SFF (12 core): 1.20

• DL380p/e (16 core): 1.33

• SL4540 (16 core): 1.07

• DL380p/e (12 core): 1.00


Cost Distribution – SL/DL Server rack

Share of rack cost by item:

Memory: 40%

Disk: 36%

Chassis: 8%

Network: 7%

CPU: 6%

Software: 2%

Rack: 1%

Network

Load balanced, redundant, wire-speed

Separate management network

Chassis/CPU

DL/SL series of commodity servers

Single quad-core, mid-range Xeon

Disks

Full complement of LFF Terabyte disks

Memory

24 or 32 GB of ECC memory

Disk and Memory are the largest cost contributors
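To see what the percentages mean in money terms, the split can be applied to a rack budget. The 250,000 figure below is purely hypothetical (the slide implies no price or currency); only the percentages come from the cost table, and `cost_breakdown` is an illustrative helper.

```python
# Share of an SL/DL server rack's cost per item, from the cost-distribution table.
COST_SHARE_PERCENT = {
    "Memory": 40, "Disk": 36, "Chassis": 8, "Network": 7,
    "CPU": 6, "Software": 2, "Rack": 1,
}
assert sum(COST_SHARE_PERCENT.values()) == 100  # the table accounts for the whole rack


def cost_breakdown(rack_budget: float) -> dict[str, float]:
    """Split a total rack budget across items using the table's percentages."""
    return {item: rack_budget * pct / 100 for item, pct in COST_SHARE_PERCENT.items()}


if __name__ == "__main__":
    for item, amount in cost_breakdown(250_000).items():  # hypothetical budget
        print(f"{item:<9}{amount:>10,.0f}")
    share = COST_SHARE_PERCENT["Memory"] + COST_SHARE_PERCENT["Disk"]
    print(f"Memory + disk: {share}% of the rack")
```

With memory and disk together at 76% of the rack cost, the memory-per-core and disks-per-node decisions dominate the hardware budget.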



Hadoop Physical Architecture

Typically organized as racks of commodity servers with DAS storage

“Commodity” Server Hardware

1GbE Rack Switches

ECC Memory

Storage using SATA disk

Only master servers require RAID disk

Out-of-band management via iLO

[Diagram: three racks, each topped by HPN Top of Rack Switches, joined through a Cluster Switch]

Rack 1: HPN Top of Rack Switches, Management Node, Hadoop Master, two Hadoop Slaves

Rack 2: HPN Top of Rack Switches, four Hadoop Slaves

Rack 3: HPN Top of Rack Switches, four Hadoop Slaves

Links: iLO 1Gb management connection per rack; 10Gb uplinks to the Cluster Switch


Hadoop Solution Packaging

Consistent with Reference Architectures & Hadoop Appliance

Base: Head Rack Enclosure with HPN Top of Rack Switches, Management Node, Hadoop Master, and two Hadoop Slaves

Expansion: 42u Rack Expansion (two shown), each with HPN Top of Rack Switches and four Hadoop Slaves


Designing Hadoop: Network


Network considerations

Hadoop is a high-performance computing platform: it drives performance and availability through IP communications

Guidelines - Cluster must have dedicated switching – no shared switches or VLANs:

• Network traffic characteristics of Hadoop demand this

- All servers should use 1Gb/10Gb Ethernet to the Top of Rack (ToR) switches

- All ToR switches should have multiple 10 GbE connections to the core switches, for both bandwidth and redundancy

• Integrated Lights-Out (iLO) management may be supported from a separate 1GbE/100 Mbps network

Use bonded server NICs and redundant ToR switches - Cost is higher but worth it in multi-rack clusters

- Improved bandwidth

- Avoids re-replication costs when a ToR switch fails

- Connect ToR switches to aggregation switches to join racks

- More complex but significant benefits

• HP can provide assistance
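One way to reason about the ToR uplink guidance is the oversubscription ratio: total server-facing bandwidth on a ToR switch divided by its uplink bandwidth. The rack below (18 DataNodes with bonded 2 x 1GbE, two 10GbE uplinks) is a hypothetical example, and `oversubscription` is an illustrative helper, not an HP sizing tool.

```python
def oversubscription(servers: int, server_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth on one ToR switch.

    1.0 means every server could send off-rack at full rate at the same time;
    larger values mean cross-rack traffic (shuffle, re-replication) shares the uplinks.
    """
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("a ToR switch needs at least one uplink with positive bandwidth")
    return (servers * server_gbps) / (uplinks * uplink_gbps)


if __name__ == "__main__":
    # Hypothetical rack: 18 DataNodes, each with bonded 2 x 1GbE; two 10GbE uplinks.
    print(f"{oversubscription(18, 2.0, 2, 10.0):.1f}:1")  # 1.8:1
    # Adding uplinks (the "multiple 10 GbE connections" above) lowers the ratio.
    print(f"{oversubscription(18, 2.0, 4, 10.0):.1f}:1")  # 0.9:1
```

Doubling the uplinks halves the ratio and also removes the single-uplink failure point, which is the bandwidth-plus-redundancy argument made in the guidelines.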


Network considerations

Routing

• Cluster should not route any in-cluster traffic out of the cluster

− Misconfigured routers can allow this

Network & port stress

• Hadoop can stress all ports, across all servers and ToR switches for extended periods

− Use switches suited for Hadoop, not just “favored” switch types

− Network traffic characteristics of Hadoop demand this

DNS

• Hadoop makes many DNS & reverse DNS lookups

− Even for nodes within the cluster

• Use and maintain local /etc/hosts file for in-cluster lookups

• MapReduce jobs making excessive calls to remote servers can generate large amounts of external traffic

• Consider placing a caching DNS server on every worker node to mitigate the problem

Integration into the corporate network
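Maintaining a local /etc/hosts file is easier when it is generated from a single node list. The sketch below is a hypothetical helper: the node names, IPv4 addresses, and the `hadoop.example.internal` domain are placeholders. It puts the fully qualified name first on each line (the canonical name, so reverse lookups resolve to the FQDN) and rejects duplicate IPs, which would make reverse lookups ambiguous.

```python
import ipaddress


def render_hosts(nodes: dict[str, str], domain: str) -> str:
    """Render /etc/hosts lines ("IP  FQDN  short-name") for every cluster node.

    Lines are sorted by IP. Raises ValueError for a malformed IP or for two
    hosts sharing one IP, since reverse lookups would then be ambiguous.
    """
    owner_of_ip: dict[str, str] = {}
    lines = []
    # ipaddress.ip_address() both validates the address and gives a numeric sort key.
    for host, ip in sorted(nodes.items(), key=lambda kv: ipaddress.ip_address(kv[1])):
        if ip in owner_of_ip:
            raise ValueError(f"{host} and {owner_of_ip[ip]} both use {ip}")
        owner_of_ip[ip] = host
        lines.append(f"{ip}\t{host}.{domain}\t{host}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = {"master1": "10.0.0.10", "worker1": "10.0.0.11", "worker2": "10.0.0.12"}
    print(render_hosts(cluster, "hadoop.example.internal"), end="")
```

Generating the file from one source and distributing the same copy to every node keeps forward and reverse lookups consistent across the cluster.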


“We’ve profiled our Hadoop applications so we know what type of infrastructure we need”

Said no-one. Ever.*

*Credit: HP Hadoop engineer


Balanced design approach

Whole Cluster as an Appliance

Well defined Ingress/Egress Interfaces

Cluster Deployment & Management

Integration into DC Monitoring

Infrastructure-isolated cluster network

• Simplifies cluster network

• Separates cluster traffic load

• High speed connections for ingress/egress

DMZ Edge Nodes

Appliance Cluster

Access via controlled interfaces to minimize disruption, improve security, and reduce risk to DC and processes

Data import

Data export

Monitoring

Management


Enterprise-ready Big Data platform

Pre-integrated, pre-tested, pre-engineered: we’ve done all the hard work for you

Full-rack, half-rack, expansion rack options

Out of the box: not in months, but in hours or days

Super fast: loading, sorting, and analysis

Easy scaling: expansion racks available

Via CMU (HP Insight Cluster Management Utility): 800 nodes in 30 minutes

HP AppSystem for Apache Hadoop


Deploy

…it’s like using as-is open source technology, you have a lot of work to do!

Without AppSystem

Without the HP AppSystem: ~8+ weeks

Research components

Develop complex Design

Order collection of parts

Assemble parts

Install, upgrade firmware & software

Test & adjust design

Find your mistakes somewhere in here and start over

With HP AppSystem for Apache Hadoop: ~4 weeks

Choose your AppSystem

Order the AppSystem

Installation

Deploy

Success!



Integration into the datacenter


Big Data Transformation

Big Data information refinery

Insight Processing

Protection & Compliance Management

Infrastructure Integration

Enterprise Data Warehouse

Analytical, Dashboards Reports, Visualization

Business intelligence

Business transactions and interactions

Web, Mobile

CRM – ERM – SCM FMS – HRM

Value creation: share, refine & development

Message data

Document management

Social media data

Multi-media

Sensors data


Big Data Functional Architecture: a refinery approach

Technology Integration

Security

Operations

Big Data Converged, Automated, Energy efficient infrastructure

Activity logging Intrusion Prevention

Switch virtualization Virtual application network

SSL/VPN Networks Storage replication

Server scale-out management

Collection Computation Consumption

Protection

Big Data Management

Compliance

Big Data Storage

Big Data Processing On Line / Batch Analysis

Internal / External Data

Structured / Unstructured

Real Time

Backup and Recovery

Governance

Privacy and Security

Destruction Archival Retention


Functionalities

Destruction

Backup and Recovery

Governance

Privacy and Security

Protection Compliance

Retention

Archival

Protection qualities

Confidentiality, integrity, availability

De-duplication Replication

Data quality metrics

Compliance qualities

Rapid search (Legal)

Tiering


Big Data Classification

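For each classification: retention period, Recovery Time Objective (RTO), Recovery Point Objective (RPO), and forensic window

• Vital / Critical: retention 7 years; RTO 30 minutes; RPO < 10 minutes; forensic window 6 months

• Sensitive: retention 5 years; RTO 1 day; RPO < 1 hour; forensic window 3 months

• Non critical: retention 6 months; RTO 1 week; RPO < 48 hours; forensic window 1 month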
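The classification table can be turned into a lookup that checks a recovery test against its targets. This sketch assumes months and years can be approximated as 30 and 365 days, treats each RTO as an inclusive upper bound and each RPO as a strict upper bound (matching the "<" in the table); `Classification`, `CLASSES`, and `meets_objectives` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta

MONTH = timedelta(days=30)   # approximation
YEAR = timedelta(days=365)   # approximation


@dataclass(frozen=True)
class Classification:
    retention: timedelta
    rto: timedelta              # restore must complete within this time
    rpo: timedelta              # data loss must be strictly less than this
    forensic_window: timedelta


CLASSES = {
    "vital": Classification(7 * YEAR, timedelta(minutes=30), timedelta(minutes=10), 6 * MONTH),
    "sensitive": Classification(5 * YEAR, timedelta(days=1), timedelta(hours=1), 3 * MONTH),
    "non_critical": Classification(6 * MONTH, timedelta(weeks=1), timedelta(hours=48), 1 * MONTH),
}


def meets_objectives(cls: str, restore_time: timedelta, data_loss: timedelta) -> bool:
    """True if a measured restore satisfies the RTO and RPO of the given class."""
    target = CLASSES[cls]
    return restore_time <= target.rto and data_loss < target.rpo


if __name__ == "__main__":
    print(meets_objectives("sensitive", timedelta(hours=20), timedelta(minutes=45)))  # True
    print(meets_objectives("vital", timedelta(minutes=45), timedelta(minutes=5)))     # False
```

Keeping the targets in one table makes it straightforward to run the same check after every restore drill.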


Confidentiality, Integrity, and Security of Big Data

Big Data Security

Unique Big Data Threats

Data Privacy Preservation

CSIRT Program Changes

Security Controls

Security Technology

eDiscovery


Security integration

Refinery Inbound Pipeline: Perimeter Security, Identity Access, Confidentiality

Refinery Outbound Presentation: Perimeter Security, Identity Access, Confidentiality


Big Data Security examples

HP related technologies

HP ProtectTool

• Authentication Services

• Multi-Factor Authentication

• Role Based Access (RBAC)

HP TippingPoint IPS

• In-line protection

• Real-time threat protection

Qualities

Identity Access: role based, speed, reliable, flexible authentication

Perimeter Security: speed, reliable, managed, real-time


Backup and recovery

Backup and Recovery Policy

target time

Deduplication

eDiscovery

Vaulting

Replication

Media transfer performance

Storage reliability

Qualities

HP ESL Tape Backup

StoreOnce

• Back up up to 100TB/hr with Catalyst

• Restores up to 40TB/hr

• Couplet redundant

• Tape vaulting

HP Related Technologies
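The quoted StoreOnce rates make it easy to estimate backup and restore windows. This is back-of-the-envelope arithmetic under loud assumptions: the 200TB data set is hypothetical, and the 100TB/hr and 40TB/hr figures are the slide's "up to" peak rates, so real windows will likely be longer. `hours_to_move` is an illustrative helper.

```python
# "Up to" rates quoted for StoreOnce (backup with Catalyst, and restore).
BACKUP_TB_PER_HOUR = 100.0
RESTORE_TB_PER_HOUR = 40.0


def hours_to_move(data_tb: float, rate_tb_per_hour: float) -> float:
    """Best-case hours to move data_tb terabytes at a constant rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour


if __name__ == "__main__":
    data_tb = 200  # hypothetical amount of cluster data to protect
    print(f"backup:  {hours_to_move(data_tb, BACKUP_TB_PER_HOUR):.1f} h")   # 2.0 h
    print(f"restore: {hours_to_move(data_tb, RESTORE_TB_PER_HOUR):.1f} h")  # 5.0 h
```

Even at peak rates, a full restore of this much data takes hours, so a 30-minute RTO such as the one listed for vital/critical data cannot be met by a full restore from backup alone.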


Governance

Variety, Velocity, Voracity, Volume, Validity

Big Data Governance: accuracy assurance, consistency assurance, accessibility assurance


Privacy & Security

HP Related Technologies

ArcSight Security Intelligence

• Threat Detection

• Security Analysis

• Log data management across different data sources

• Legal and Compliance

Qualities

Confidentiality

Connectors

Compliance

Traceability

Speed

Autonomy Security Performance Suite

• Data Protector

• Live Vaulting

• eDiscovery

• Compliance Archiving

• Records Mgmt

Protection

Vaulting

Fast Restore

Encryption

Fast Discover

Security Controls

Data Privacy Preservation eDiscovery

Privacy & Security Policy


Protection

Purging Shredding Wiping Degaussing

Different types of media

Protection Policy


Archival

HP Related Technologies

3PAR StoreServ

StoreAll

• Console based tiering

• Express Query and Autonomy IDOL integration

• Mesh-Active Architecture

• Thin technologies

• Peer Motion

• Virtual Lock

• Adaptive Optimization

Qualities

Automated policy-based tiering

Rapid search

Extreme Data Reduction

Scalable Storage

Archival Policy


Retention

Sarbanes-Oxley

HIPAA

PCI DSS

Safe Harbor

Data Privacy Act

GLBA

HP Related Technologies

3PAR StoreServ

StoreAll

• Console based tiering

• Snapshots and data validation

• WORM features

• File and Object Storage

• 16PB namespace

Qualities

High scalability

Automated policy-based tiering

Data Protection

WORM capability

Open standards interface

Retention Policy


Enabling Big Data with Networking & Storage example

HP Related Technologies

HP FlexNetwork Architecture

• FlexCampus

• FlexFabric

• FlexManagement

HP Converged Storage

• HP 3PAR StoreServ

• HP StoreAll

• WAN Optimization

HP Infrastructure Tools

• Insight CMU

• IMC

• StoreVirtual software

Qualities

Simplicity

Speed

Scalability

Identity-based access

Storage

Geographic Snapshot and Cloning Capabilities

Thin Provisioning

Seamlessly handle fast moving data

Network IT Operations

Manageability of:

Connections

Storage Scale-out

Server Scale-out


End-to-End Service level

Big Data Refinery Service Level Target:

Current different Service Level:

Business transactions and interactions

Very High

High

Business intelligence & analytics

High

Message data

High

Document management

Medium

Multi-media

Very Low

Sensors data

Medium

Refinery's consolidated Service Level:

Low

Social media data

Big Data information refinery

Insight Processing

Infrastructure Integration

Management


Summary slide

Do you know the technology you will use and the workload?

• If not, check out HP AppSystems and Reference Architectures for Hadoop (and other Big Data technologies)

Skills

• Do you have experience with high-performance Linux clusters or Hadoop clusters?

Space & power

• Can your data center handle the space, power, and cooling now and in the future?

Network & storage capacity

• Can they handle data movement, staging, post-processing, and export/import?

• What load (export/import) can existing BI/Analytics systems handle?

Monitoring & Support framework

• How will the Hadoop ecosystem integrate with your monitoring and support framework?

IT architectural requirements & standards


For more information

Attend these sessions

• RT3462, Big Data Analytics 360

• RT3463, Big Data & the internet of things

• TB2590, What’s new in HP Vertica

• BB3378, Any data, any size

• TK2789, Keynote: Make information matter

Visit these demos

• HP AppSystem for Apache Hadoop

• IT Big Data Transformation Experience

After the event

• Contact your sales rep!

• Visit www.hp.com/go/bigdata

• Visit www.hp.com/go/hadoop

Your feedback is important to us. Please take a few minutes to complete the session survey.


Learn more about this topic

Use HP Autonomy’s Augmented Reality (AR) to access more content

1. Launch the HP Autonomy AR app*

2. View this slide through the app

3. Unlock additional information!

*Available on the App Store and Google Play


Thank you