Bridging the Big Data Gap in the Software-Driven World
MFT09S #CAWorld
Michael Harer, CA Technologies Product Management
Scott Andress, Hortonworks, Sr. Director, Business Development
Mainframe
2 © 2014 CA. ALL RIGHTS RESERVED.
Abstract
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
Michael Harer, CA Technologies, Sr. Principal Product Mgr., Database and Analytics
Agenda
1. QUICK REFRESHER ON BIG DATA
2. BIG DATA INFRASTRUCTURE MANAGEMENT CHALLENGES
3. 360 DEGREE BIG DATA INFRASTRUCTURE MANAGEMENT APPROACH
4. HORTONWORKS BIG DATA PLATFORM
5. SUMMARY
6. RECOMMENDED SESSIONS / RELATED ACTIVITIES
Big Data Means Different Things To Different People
Customers define Big Data in a broad sense:
• Defined by the types and speed of data being analyzed
• Any analytical processing that is different from the traditional data warehouse applications in place today
Or explained via the 4 Vs…
• High-Velocity capture, discovery and/or analysis
• Large Volumes of a Variety of data from various sources across the enterprise
• Veracity – keeping the right, trusted data
Big Data – Growing Fast
Capturing and Managing lots of information
• Data volumes are doubling every year
• Organizations are storing three or more years of data
Working with many new types of data
• 80 percent of data is unstructured (images, audio, tweets, etc.)
• New analytic applications based on a next generation big data platform are reaching the market
Commoditized Hardware and Software
• Low-cost hardware and software environments
– Less costly capture and exploitation of big data
New Personas
• Hadoop Administrator
• Hadoop Developer/Architect
• Data Scientist, etc.
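The "doubling every year" figure on this slide compounds quickly once storage replication is counted. A minimal sketch of the arithmetic, assuming a hypothetical 100 TB starting volume, the slide's doubling-per-year growth, and a replication factor of 3 (the HDFS default); the function name and the figures are illustrative only:

```python
def projected_raw_storage_tb(current_tb, years, growth_factor=2.0, replication=3):
    """Estimate raw disk (in TB) needed after `years` of compound growth.

    current_tb    -- logical data volume today (hypothetical input)
    growth_factor -- yearly multiplier; 2.0 models "doubling every year"
    replication   -- copies kept of each block; 3 is the HDFS default
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    logical_tb = current_tb * growth_factor ** years
    return logical_tb * replication


if __name__ == "__main__":
    # Hypothetical starting point of 100 TB, projected over three years.
    for year in range(4):
        print(year, projected_raw_storage_tb(100, year))
```

Under these assumptions, 100 TB of data today implies roughly 2,400 TB of raw disk after three years, before compression or retention policies are considered.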
Going From The Science Project To Production
The organization realizes that the analytics and insights coming out of a Big Data project are essential
To keep costs down, you start with the basic Hadoop distribution from Apache
Maybe a free tool or two and off you go
As the project gains traction, there is tremendous pressure to deliver or the business falls further behind
More tools, software and data sources are added
You now have a huge number of moving parts, tools from many vendors and a ton of complexity
The “Big” Big Data Management Pains
The Need to Overcome Many Challenges
Managing complex multi-vendor big data environments
Finding Hadoop/Big Data experts
Understanding capacity requirements for rapidly changing business needs
As complexity increases, manual processes are often required
System problems are hard to isolate, downtime increases
Unique tools and shortcomings
Driving forces… acquisitions, department consolidations demand greater operational efficiency
[Figure labels: AMZ EMR Console, Mainframe]
Gaps/Complexities in Managing These Environments
1. How many people do you have to manage your Big Data infrastructure?
2. Do your Big Data administrators always know the health of the systems?
3. Can you detect most problems before significant system outages occur?
4. How many different monitoring tools do you have in place now?
5. How do you know if your capacity is optimized for cost and performance?
6. What was the financial impact of downtime over the past year?
A New Role in the Organization is Born
Big Data / Hadoop Administrator
Role / Responsibilities:
• Perform day-to-day operations and support of Hadoop infrastructure
• Monitor/maintain existing clusters and provision new ones
• Integrate enterprise monitoring tools
• Analyze current workloads and perform capacity planning
Key Management Capabilities:
• Hadoop Multi-Vendor Management
• Hadoop Resource Management / Reporting
• Hadoop Process Management / Automation
• Hadoop Job Management & Monitoring
• Hadoop System Health Monitoring & Alerts
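The health monitoring and alerting capability named on this slide can be pictured as a threshold check over per-node metrics. A minimal sketch, assuming hypothetical metric names (`disk_used_pct`, `heartbeat_age_s`) and hypothetical thresholds; it does not represent any CA or Hadoop API, and a real deployment would pull these values from the cluster's own monitoring interfaces:

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical snapshot of one cluster node's health metrics."""
    host: str
    disk_used_pct: float    # percentage of local disk in use
    heartbeat_age_s: float  # seconds since the node last reported in


def find_alerts(nodes, disk_limit_pct=85.0, heartbeat_limit_s=30.0):
    """Return (host, reason) pairs for every node that breaches a threshold."""
    alerts = []
    for node in nodes:
        if node.heartbeat_age_s > heartbeat_limit_s:
            alerts.append((node.host, "no recent heartbeat"))
        if node.disk_used_pct > disk_limit_pct:
            alerts.append((node.host, "disk nearly full"))
    return alerts


if __name__ == "__main__":
    # Illustrative data only; host names and values are made up.
    sample = [
        NodeMetrics("dn1", 50.0, 5.0),
        NodeMetrics("dn2", 91.0, 4.0),
        NodeMetrics("dn3", 40.0, 120.0),
    ]
    for host, reason in find_alerts(sample):
        print(host, reason)
```

Keeping the thresholds as parameters reflects the multi-vendor point of the slide: the same check can be applied to clusters from different distributions once their metrics are normalized into one shape.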
360 Degree Big Data Infrastructure Management Approach
[Diagram: CA Big Data Infrastructure Management]
Big Data Infrastructure Management Use Cases: Multi-Vendor Management, System Management, System Health Monitoring, Job Mgmt / Monitoring, Alert Management, Resource Mgmt / Reporting, Process Mgmt / Automation
Also shown: Configuration, Mobility, Security
Big Data Platform Vendors (Hadoop & Hybrid): Hadoop Big Data Platform Vendors A, B, C, D; Hybrid Big Data Platform Vendors A, B, C
Data Movement (ETL), Data Management
Storage: Hadoop Distributed File System (Unstructured/Structured), NAS
360 Degree Big Data Infrastructure Management Approach
• SINGLE, CONSISTENT MANAGEMENT UI EXPERIENCE
• SINGLE ACCESS POINT INTO HETEROGENEOUS ENVIRONMENT
• OPERATIONALIZE, MANAGE MULTI-VENDOR HADOOP MANAGEMENT DOMAINS
[Diagram labels: CA Big Data Infrastructure Management (In Development); Big Data Infrastructure Management Server; Linux / x86; Big Data (Hadoop) Infrastructure]
CA Big Data Infrastructure Management
Demo Scenario: A global financial institution has been using Big Data technologies to bring new investment products to the market. They are now expanding their Big Data environment to support 6 other business units and an ever-growing number of business initiatives. They also discovered that some of the business units had already started their own Big Data projects using different big data platforms.
Challenges: The revised budget remains flat and requires a 30% to 50% increase in Big Data environment utilization. There is significant complexity associated with hosting multiple Hadoop distributions and an increasing number of business-critical Hadoop clusters to support their business apps.
Demonstration (Under development)
Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hortonworks
We Do Hadoop
Scott Andress, Senior Director, Business Development
CA World 2014
Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform)
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• We are leaders in the Hadoop community
• 500+ employees
Customer Momentum
• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from F1000
Hortonworks and Hadoop at Scale
• HDP in production on the largest clusters on the planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo!, 800 nodes at Spotify
Key Drivers of Hadoop
[Diagram: modern data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP); YARN: Data Operating System (Interactive, Real-Time, Batch); HDFS: Hadoop Distributed File System
SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
A. Unlock New Approach to Analytics
• Agile analytics via “Schema on Read” with ability to store all data in native format
• Create new apps from new types of data
B. Optimize Investments, Cut Costs
• Focus EDW on high value workloads
• Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
C. Enable a Modern Data Architecture
• Integrate new & existing data sets
• Make all data available for shared access and processing in multitenant infrastructure
• Batch, interactive & real-time use cases
• Integrated with existing tools & skills
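“Schema on Read,” named under driver A on this slide, means raw data is stored in its native format and a schema is applied only when a consumer reads it. A minimal sketch in plain Python, assuming hypothetical JSON-lines events and a hypothetical purchase schema; production systems would typically do this through tools such as Hive rather than hand-written code:

```python
import json

# Raw events kept exactly as captured (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "a1", "ts": "2014-11-10T09:00:00", "amount": "19.99"}',
    '{"user": "b2", "ts": "2014-11-10T09:05:00", "page": "/home"}',
    'not valid json',
]

# One reader's view of the data: field name -> type to cast to at read time.
PURCHASE_SCHEMA = {"user": str, "amount": float}


def read_with_schema(raw_lines, schema):
    """Yield records that fit `schema`; the raw lines are never modified."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line stays in raw storage, skipped here
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue  # record does not match this reader's schema


if __name__ == "__main__":
    for purchase in read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA):
        print(purchase)
```

Because nothing is dropped or reshaped at write time, a second consumer could read the same raw lines with a different schema (for example, page views) without any reload.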
Hortonworks Approach
1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines
2. Extend Hadoop as an Enterprise Data Platform
Extend Hadoop with enterprise capabilities for governance, security & operations
Apply enterprise software rigor to the open source development process
3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as standard component of a modern data architecture
• Joint engineering
[Diagram: HDP 2.1]
Governance & Integration; Security; Operations
Data Access: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Batch (Map Reduce)
Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System)
…all done completely in Open Source
Hadoop is a platform decision
• Open Source: fastest path to innovation for a platform technology
• Eliminate vendor lock-in, no proprietary software
• Data center leaders have committed to the open source approach
Contributes more to the Apache Hadoop ecosystem in the ASF than any other vendor

Apache Project   Committers   PMC Members
Hadoop           27           20
Tez              15           15
Hive             16           4
HBase            6            4
Pig              5            5
Accumulo         2            2
Flume            1            0
Storm            3            2
Sqoop            1            1
Ambari           32           27
Oozie            3            2
Zookeeper        2            1
Knox             11           5
Falcon           5            3
TOTAL            129          91

[Diagram: HDP 2.1 stack (Governance & Integration, Security, Operations, Data Access, Data Management on YARN and HDFS)]
Our Vision - Big Data Infrastructure Management
Extending our IT Management Leadership
“It’s extremely difficult for data scientists, Chief Marketing Officers (CMOs) and other stakeholders to get access to their raw System z data in tandem with machine logs and other types of transactional information,” said Mike Madden, general manager, Mainframe, CA Technologies. “Customers around the world are looking for greater insight to gain competitive advantage and much of the world’s most important transactional data resides on System z. Veristorm provides next-generation data movement technology that makes it easier to move System z data into Hadoop, lowering overall total cost of ownership.”
LAS VEGAS, November 10, 2014 — CA WORLD ’14 — CA Technologies (NASDAQ:CA) today announced a new global distribution agreement with Veristorm, a software company focused on Big Data management. The agreement strengthens CA’s ability to help customers leverage key business data on the mainframe for Big Data and analytics projects.
Wrap Up
Key Thoughts…
• The Big Data market is forcing significant changes to IT.
• Most Big Data infrastructures will grow in complexity as business needs evolve.
• Think ahead - you will need to effectively manage mixed, heterogeneous Big Data environments.
Next Steps
• Understand the changes (e.g. Hadoop) and align a Big Data roadmap to meet your changing business needs.
• Retain flexibility & adopt the Big Data technologies that are right for your business needs.
• Choose a management solution that can support the range of Big Data technologies your business requires now and in the future. Consider CA Big Data Infrastructure Management.
Polling Question
When it comes to a Big Data project, what best describes your organization:
1. HAVE PROJECT IN PRODUCTION
2. CONDUCTING A PILOT PROJECT
3. PROJECT BEING PLANNED
4. INVESTIGATING A PROJECT
5. NONE OF THE ABOVE
For More Information
To learn more about Mainframe solutions from
CA Technologies, please visit:
http://bit.ly/1wbiPkl
For Informational Purposes Only
Terms of this Presentation
This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
© 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.