Bridging the Big Data Gap in the Software-Driven World

Page 1: Bridging the Big Data Gap in the Software-Driven World

Bridging the Big Data Gap in the Software-Driven World

Michael Harer, CA Technologies, Product Management, Mainframe

Scott Andress, Hortonworks, Sr. Director, Business Development

MFT09S #CAWorld

Page 2: Bridging the Big Data Gap in the Software-Driven World

2 © 2014 CA. ALL RIGHTS RESERVED.

Abstract

Michael Harer, CA Technologies, Sr. Principal Product Mgr., Database and Analytics

Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.

Page 3: Bridging the Big Data Gap in the Software-Driven World

Agenda

1. QUICK REFRESHER ON BIG DATA
2. BIG DATA INFRASTRUCTURE MANAGEMENT CHALLENGES
3. 360 DEGREE BIG DATA INFRASTRUCTURE MANAGEMENT APPROACH
4. HORTONWORKS BIG DATA PLATFORM
5. SUMMARY
6. RECOMMENDED SESSIONS / RELATED ACTIVITIES

Page 4: Bridging the Big Data Gap in the Software-Driven World

Big Data Means Different Things To Different People

Customers define Big Data in a broad sense:

• Defined by the types and speed of data being analyzed
• Any analytical processing that is different from the traditional data warehouse applications in place today

Or explained via the 4 Vs…

• High-Velocity capture, discovery and/or analysis
• Large Volumes of a Variety of data from various sources across the enterprise
• Veracity – keeping the right, trusted data

Page 5: Bridging the Big Data Gap in the Software-Driven World

Big Data – Growing Fast

Capturing and Managing lots of information

• Data volumes are doubling every year
• Organizations are storing three or more years of data

Working with many new types of data

• 80 percent of data is unstructured (images, audio, tweets, etc.)
• New analytic applications based on a next generation big data platform are reaching the market

Commoditized Hardware and Software

• Low-cost hardware and software environments
– Less costly capture and exploitation of big data

New Personas

• Hadoop Administrator
• Hadoop Developer/Architect
• Data Scientist, etc.
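
The claim that data volumes are doubling every year compounds quickly once several years of data are retained. The following is a minimal sketch of that arithmetic, assuming a hypothetical 100 TB starting volume and strict yearly doubling; the function name and the starting figure are illustrative and do not come from the deck.

```python
def projected_volume_tb(start_tb: float, years: int, growth_factor: float = 2.0) -> float:
    """Return the data volume after `years` of compounding growth.

    growth_factor=2.0 models the "doubling every year" claim; the
    starting volume passed in below is a hypothetical figure.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return start_tb * growth_factor ** years


# Hypothetical starting point: 100 TB today.
for year in range(4):
    print(f"year {year}: {projected_volume_tb(100, year):,.0f} TB")
```

Under these assumptions, three years of doubling turns 100 TB into 800 TB, which is why capacity planning appears among the administrator responsibilities later in the deck.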

Page 6: Bridging the Big Data Gap in the Software-Driven World

Going From The Science Project To Production

The organization realizes that the analytics and insights coming out of a Big Data project are essential

To keep costs down, you start with the basic Hadoop distribution from Apache

Maybe a free tool or two and off you go

The project gains traction, and there is tremendous pressure to deliver or the business falls further behind

More tools, software and data sources are added

You now have a huge number of moving parts, tools from many vendors and a ton of complexity

Page 7: Bridging the Big Data Gap in the Software-Driven World

The “Big” Big Data Management Pains

The Need to Overcome Many Challenges

Managing complex multi-vendor big data environments

Finding Hadoop/Big Data experts

Understanding capacity requirements for rapidly changing business needs

As complexity increases, manual processes are often required

System problems are hard to isolate, downtime increases

Unique tools and shortcomings

Driving forces… acquisitions, department consolidations demand greater operational efficiency

(Diagram labels: AMZ EMR Console, Mainframe)

Page 8: Bridging the Big Data Gap in the Software-Driven World

Gaps/Complexities in Managing These Environments

1. How many people do you have to manage your Big Data infrastructure?
2. Do your Big Data administrators always know the health of the systems?
3. Can you detect most problems before significant system outages occur?
4. How many different monitoring tools do you have in place now?
5. How do you know if your capacity is optimized for cost and performance?
6. What was the financial impact of downtime over the past year?

Page 9: Bridging the Big Data Gap in the Software-Driven World

A New Role in the Organization is Born

Big Data / Hadoop Administrator

Role / Responsibilities:

• Perform day-to-day operations and support of Hadoop infrastructure
• Monitor/maintain existing clusters and provision new ones
• Integrate enterprise monitoring tools
• Analyze current workloads and perform capacity planning

Key Management Capabilities:

• Hadoop Multi-Vendor Management
• Hadoop Resource Management / Reporting
• Hadoop Process Management / Automation
• Hadoop Job Management & Monitoring
• Hadoop System Health Monitoring & Alerts

Page 10: Bridging the Big Data Gap in the Software-Driven World

360 Degree Big Data Infrastructure Management Approach

(Architecture diagram; labels recovered from the slide)

CA Big Data Infrastructure Management

• Big Data Infrastructure Management Use Cases: Job Mgmt / Monitoring, Alert Management, Multi-Vendor Management, System Management, Resource Mgmt / Reporting, Process Mgmt / Automation, System Health Monitoring
• Side labels: Configuration, Mobility, Security

Big Data Platform Vendors (Hadoop & Hybrid)

• Hadoop Big Data Platform Vendor A, B, C, D
• Hybrid Big Data Platform Vendor A, B, C

Storage

• Hadoop Distributed File System (Unstructured/Structured)
• NAS
• Other labels: Data Movement (ETL), Data Management

Page 11: Bridging the Big Data Gap in the Software-Driven World

360 Degree Big Data Infrastructure Management Approach

CA Big Data Infrastructure Management (In Development)

• SINGLE, CONSISTENT MANAGEMENT UI EXPERIENCE
• SINGLE ACCESS POINT INTO HETEROGENEOUS ENVIRONMENT
• OPERATIONALIZE, MANAGE MULTI-VENDOR HADOOP MANAGEMENT DOMAINS

(Diagram labels: Big Data Infrastructure Management Server; Linux / x86; Big Data (Hadoop) Infrastructure)

Page 12: Bridging the Big Data Gap in the Software-Driven World

CA Big Data Infrastructure Management

Demonstration (Under development)

Demo Scenario: A global financial institution has been using Big Data technologies to bring new investment products to the market. They are now expanding their Big Data environment to support 6 other business units and an ever-growing number of business initiatives. They also discovered that some of the business units had already started their own Big Data projects using different big data platforms.

Challenges: Revised budget remains flat and requires a 30% to 50% increase in Big Data environment utilization. Significant complexity is associated with hosting multiple Hadoop distributions and an increasing number of business-critical Hadoop clusters to support their business apps.
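
To make the utilization challenge concrete, the sketch below works out a target range under two assumptions the slide does not state: that the 30% to 50% increase is relative to the current level, and that current average cluster utilization is a hypothetical 40%. The function name is illustrative only.

```python
from typing import Tuple


def utilization_targets(current: float, low: float = 0.30, high: float = 0.50) -> Tuple[float, float]:
    """Return (low, high) target utilization for a relative increase.

    `current` is a fraction between 0 and 1. Results are capped at 1.0
    because a cluster cannot be more than 100% utilized.
    """
    if not 0.0 <= current <= 1.0:
        raise ValueError("current utilization must be between 0 and 1")
    return (min(1.0, current * (1 + low)), min(1.0, current * (1 + high)))


# Hypothetical baseline: clusters are 40% utilized on average today.
low_target, high_target = utilization_targets(0.40)
print(f"target utilization: {low_target:.0%} to {high_target:.0%}")
```

With a flat budget, the gap between the baseline and the target has to come from consolidation and better scheduling of existing clusters, not from new hardware.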

Page 21: Bridging the Big Data Gap in the Software-Driven World

Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Scott Andress, Senior Director, Business Development, Hortonworks

CA World 2014

We Do Hadoop

Page 22: Bridging the Big Data Gap in the Software-Driven World

Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform)

• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• We are leaders in Hadoop community
• 500+ employees

Customer Momentum

• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from F1000

Hortonworks and Hadoop at Scale

• HDP in production on largest clusters on planet
• Multiple +1000 node clusters, including 35,000 nodes at Yahoo!, 800 nodes at Spotify

Page 23: Bridging the Big Data Gap in the Software-Driven World

Key Drivers of Hadoop

A. Unlock New Approach to Analytics

• Agile analytics via “Schema on Read” with ability to store all data in native format
• Create new apps from new types of data

B. Optimize Investments, Cut Costs

• Focus EDW on high value workloads
• Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration

C. Enable a Modern Data Architecture

• Integrate new & existing data sets
• Make all data available for shared access and processing in multitenant infrastructure
• Batch, interactive & real-time use cases
• Integrated with existing tools & skills

(Architecture diagram; labels recovered from the slide)

• APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
• DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP); YARN: Data Operating System (Interactive, Real-Time, Batch); HDFS: Hadoop Distributed File System
• SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
• OPERATIONS TOOLS: Provision, Manage & Monitor
• DEV & DATA TOOLS: Build & Test
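
The “Schema on Read” idea under driver A can be illustrated with a small sketch: records are stored exactly as they arrive, and each consumer applies its own schema only when reading. This is a plain-Python illustration, not Hive or HDP code; the event fields, the `read_with_schema` helper, and the two schemas are all hypothetical.

```python
import json

# Raw events kept in their native format; nothing is validated on write.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart"}',
    '{"user": "a1", "page": "/cart", "ms": 340, "referrer": "ad"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps a field name to (cast, default). Fields missing from a
    record take the default; fields outside the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else default)
            for field, (cast, default) in schema.items()
        }


# Two consumers read the same stored data through different schemas.
latency_schema = {"page": (str, ""), "ms": (int, 0)}
referrer_schema = {"user": (str, ""), "referrer": (str, "direct")}

latency_view = list(read_with_schema(RAW_EVENTS, latency_schema))
referrer_view = list(read_with_schema(RAW_EVENTS, referrer_schema))
print(latency_view)
print(referrer_view)
```

Because the stored records are never rewritten, a new application can add its own schema later without reloading the data, which is the agility the slide refers to.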

Page 24: Bridging the Big Data Gap in the Software-Driven World

Hortonworks Approach

1. Innovate the Core

Architect and build innovation at the core of Hadoop

• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines

2. Extend Hadoop as an Enterprise Data Platform

Extend Hadoop with enterprise capabilities for governance, security & operations

Apply enterprise software rigor to the open source development process

3. Enable the Ecosystem

Enable the leaders in the data center to easily adopt & extend their platforms

• Establish Hadoop as standard component of a modern data architecture
• Joint engineering

(HDP 2.1 stack diagram; labels recovered from the slide)

• Data Access, on YARN: Data Operating System: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Batch (Map Reduce)
• Data Management: HDFS (Hadoop Distributed File System), YARN
• Side labels: Governance & Integration, Security, Operations

Page 25: Bridging the Big Data Gap in the Software-Driven World

…all done completely in Open Source

Contributes more to the Apache Hadoop ecosystem in the ASF than any other vendor

Apache Project   Committers   PMC Members
Hadoop           27           20
Tez              15           15
Hive             16           4
HBase            6            4
Pig              5            5
Accumulo         2            2
Flume            1            0
Storm            3            2
Sqoop            1            1
Ambari           32           27
Oozie            3            2
Zookeeper        2            1
Knox             11           5
Falcon           5            3
TOTAL            129          91

Hadoop is a platform decision

• Open Source: fastest path to innovation for a platform technology
• Eliminate vendor lock in, no proprietary software
• Data center leaders have committed to the open source approach

(HDP 2.1 stack diagram, as on the previous slide)

Page 26: Bridging the Big Data Gap in the Software-Driven World

Our Vision - Big Data Infrastructure Management

Extending our IT Management Leadership

“It’s extremely difficult for data scientists, Chief Marketing Officers (CMOs) and other stakeholders to get access to their raw System z data in tandem with machine logs and other types of transactional information,” said Mike Madden, general manager, Mainframe, CA Technologies. “Customers around the world are looking for greater insight to gain competitive advantage and much of the world’s most important transactional data resides on System z. Veristorm provides next-generation data movement technology that makes it easier to move System z data into Hadoop, lowering overall total cost of ownership.”

LAS VEGAS, November 10, 2014 — CA WORLD ’14 — CA Technologies (NASDAQ:CA) today announced a new global distribution agreement with Veristorm, a software company focused on Big Data management. The agreement strengthens CA’s ability to help customers leverage key business data on the mainframe for Big Data and analytics projects.

Page 27: Bridging the Big Data Gap in the Software-Driven World

Wrap Up

Key Thoughts…

• The Big Data market is forcing significant changes to IT.
• Most Big Data infrastructures will grow in complexity as business needs evolve.
• Think ahead - you will need to effectively manage mixed, heterogeneous Big Data environments.

Next Steps

• Understand the changes (e.g. Hadoop) and align a Big Data roadmap to meet your changing business needs.
• Retain flexibility & adopt the Big Data technologies that are right for your business needs.
• Choose a management solution that can support the range of Big Data technologies your business requires now and in the future. Consider CA Big Data Infrastructure Management.

Page 28: Bridging the Big Data Gap in the Software-Driven World

Polling Question

When it comes to a Big Data project, what best describes your organization:

1. HAVE PROJECT IN PRODUCTION
2. CONDUCTING A PILOT PROJECT
3. PROJECT BEING PLANNED
4. INVESTIGATING A PROJECT
5. NONE OF THE ABOVE

Page 29: Bridging the Big Data Gap in the Software-Driven World

For More Information

To learn more about Mainframe solutions from CA Technologies, please visit: http://bit.ly/1wbiPkl

Page 30: Bridging the Big Data Gap in the Software-Driven World

For Informational Purposes Only

© 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.

This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.

Terms of this Presentation