
Solution Guide

EMC HYBRID CLOUD SOLUTION WITH VMWARE Hadoop Applications Solution Guide 2.5

EMC Solutions

Abstract

This document serves as a reference for planning and designing a Pivotal Hadoop solution that enables IT organizations to quickly deploy Hadoop as a service (HaaS) on an existing cloud.

August 2014


Copyright © 2014 EMC Corporation. All rights reserved. Published in the USA.

Published August 2014

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

EMC Hybrid Cloud Solution with VMware Hadoop Applications Solution Guide 2.5

Part Number H13221

Contents

Chapter 1 Executive Summary 7

Document purpose ..................................................................................................... 8

Audience .................................................................................................................... 8

Solution purpose ........................................................................................................ 8

Business challenge .................................................................................................... 9

Technology solution ................................................................................................... 9

Chapter 2 EMC Hybrid Cloud Solution Overview 11

Introduction ............................................................................................................. 12

EMC Hybrid Cloud features and functionality ............................................................ 13

Automation and self-service provisioning ............................................................ 13

Multitenancy and secure separation .................................................................... 14

Workload-optimized storage ................................................................................ 14

Elasticity and service assurance .......................................................................... 14

Operational monitoring and management ............................................................ 15

Metering and chargeback .................................................................................... 15

Modular add-on components ............................................................................... 16

Chapter 3 EMC Hybrid Cloud Hadoop as a Service 19

Overview .................................................................................................................. 20

EMC Hybrid Cloud HaaS and IaaS ............................................................................. 20

Pivotal Hadoop ......................................................................................................... 21

Serengeti .................................................................................................................. 22

VMware vSphere Big Data Extensions ....................................................................... 22

Chapter 4 HaaS Component Integration 25

Overview .................................................................................................................. 26

Integrating Hadoop components with EMC Hybrid Cloud .......................................... 26

BDE Topology....................................................................................................... 26

Virtualized Hadoop .............................................................................................. 27

Configuring the platform ........................................................................................... 28

Installing and configuring BDE ............................................................................. 28

Installing and configuring PHD ............................................................................. 30

Installing and configuring EMC Hybrid Cloud IaaS ................................................ 33


Chapter 5 Creating vCO Workflows and vCAC Catalog Services for HaaS 35

Overview .................................................................................................................. 36

Importing and modifying custom vCO workflows ...................................................... 36

Modifying custom workflows ............................................................................... 36

Creating BDE Clusters ............................................................................................... 42

Creating new BDE clusters ................................................................................... 42

Configuring a Hadoop cluster ............................................................................... 42

Creating vCAC Catalog Services ................................................................................ 45

Accessing vCAC ................................................................................................... 45

Creating a new service blueprint .......................................................................... 45

Chapter 6 Use Cases: EMC Hybrid Cloud IaaS 49

Overview .................................................................................................................. 50

IaaS – storage services ............................................................................................. 50

Overview .............................................................................................................. 50

Use case 1: Storage provisioning ......................................................................... 50

Use case 2: Select virtual machine storage .......................................................... 54

Use case 3: Metering storage services ................................................................. 55

Summary ............................................................................................................. 56

Monitoring and capacity planning ............................................................................ 57

Monitoring ........................................................................................................... 57

Capacity planning ................................................................................................ 57

Capacity planning example .................................................................................. 60

Metering and chargeback ......................................................................................... 61

Chapter 7 Conclusion 65

Summary .................................................................................................................. 66

Appendix A References 67

VMware references ................................................................................................... 68


Figures

Figure 1. EMC Hybrid Cloud key components ..................................................... 12

Figure 2. EMC Hybrid Cloud self-service portal ................................................... 14

Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager ............ 15

Figure 4. IT Business Management Suite overview dashboard for hybrid cloud .. 16

Figure 5. EMC Hybrid Cloud HaaS component overview ...................................... 21

Figure 6. Pivotal Hadoop (PHD) components ...................................................... 22

Figure 7. BDE and Serengeti stack ...................................................................... 23

Figure 8. BDE and vSphere deployment topology ............................................... 26

Figure 9. The evolution of virtual Hadoop ........................................................... 27

Figure 10. Configuring the SSO lookup service and management server IP addresses ........................................................................................... 29

Figure 11. Importing Hadoop binaries into BDE management server .................... 31

Figure 12. Removing the default Apache template from BDE ................................ 32

Figure 13. Importing custom workflows into vCO .................................................. 36

Figure 14. Using the validate workflows action .................................................... 37

Figure 15. How to edit the attributes .................................................................... 37

Figure 16. Editing and creating custom parameter passing .................................. 38

Figure 17. Launching scripts from the VCO ........................................................... 39

Figure 18. Launching of Micro Hadoop Cluster workflow ...................................... 40

Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere web client) .................................................................................................. 41

Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client ....... 41

Figure 21. Create and name a new Big Data Cluster ............................................. 42

Figure 22. Advance Service Designer ................................................................... 46

Figure 23. Edit Entitlement window ...................................................................... 46

Figure 24. vCAC Service Catalog showing Hadoop as a Service ............................ 47

Figure 25. Storage Services - Provision cloud storage .......................................... 51

Figure 26. Provision Cloud Storage – select vCenter cluster ................................. 52

Figure 27. Storage Provisioning – Select datastore type ....................................... 52

Figure 28. Storage provisioning – Choose ViPR storage pool ................................ 53

Figure 29. Storage provisioning – Enter storage size ............................................ 53

Figure 30. Provision Storage – Storage Reservation for vCAC Business Group ...... 53

Figure 31. Set storage reservation policy for virtual machine disks ...................... 54

Figure 32. Create new virtual machine storage profile for Tier 2 storage ............... 55

Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider ............................................................................................... 55

Figure 34. VMware ITBM chargeback based on storage profile of datastore ......... 56

Figure 35. Choosing virtual machine consumption models and profiles ............... 58


Figure 36. Specifying configuration and projected capacity usage of new virtual machines ............................................................................................ 58

Figure 37. Capacity summary showing insufficient CPU and RAM resources ......... 59

Figure 38. Specifying number of hosts and amount of CPU and memory .............. 59

Figure 39. Specifying datastore size ..................................................................... 60

Figure 40. Compared scenarios ............................................................................ 60

Figure 41. Combined scenarios ............................................................................ 61

Figure 42. Categorized hybrid cloud environment cost overview .......................... 62

Figure 43. vSphere Cluster cost overview ............................................................. 63

Figure 44. Storage cost overview .......................................................................... 63


Chapter 1 Executive Summary

This chapter presents the following topics:

Document purpose ..................................................................................................... 8

Audience .................................................................................................................... 8

Solution purpose........................................................................................................ 8

Business challenge .................................................................................................... 9

Technology solution ................................................................................................... 9


Document purpose

This document serves as a reference for planning and designing a Pivotal Hadoop solution that enables IT organizations to quickly deploy Hadoop as a service (HaaS) on an existing cloud. The solution delivers infrastructure-as-a-service (IaaS) capabilities to support big data application development. This document introduces the main features and functionality of the solution, the solution architecture and key components, and the validated hardware and software environment. It demonstrates the integration of Pivotal Hadoop Enterprise in the EMC® Hybrid Cloud solution.

The Pivotal Hadoop solution is a modular add-on to the EMC Hybrid Cloud solution. EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Reference Architecture 2.5 and EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5 describe the reference architecture and the foundation solution upon which all the EMC Hybrid Cloud add-on solutions build.

The following documents provide further information about how to implement specific capabilities or enable specific use cases within the EMC Hybrid Cloud solution with VMware:

EMC Hybrid Cloud Solution with VMware: Data Protection Continuous Availability Solution Guide 2.5

EMC Hybrid Cloud Solution with VMware: Data Protection Disaster Recovery Solution Guide 2.5

EMC Hybrid Cloud Solution with VMware: Data Protection Backup Solution Guide 2.5

EMC Hybrid Cloud Solution with VMware: Security Solution Guide 2.5

EMC Hybrid Cloud Solution with VMware: Pivotal CF Platform as a Service Solution Guide 2.5

Audience

This document is intended for executives, managers, architects, cloud administrators, and technical administrators of IT environments who want to build a self-service Pivotal Hadoop-based enterprise big data platform. Readers should be familiar with VMware vCloud Suite, Pivotal Hadoop, VMware Big Data Extensions (BDE), EMC ViPR®, general IaaS and software-defined data center concepts, and how a hybrid cloud infrastructure accommodates these technologies and requirements.

Solution purpose

The EMC Hybrid Cloud solution enables EMC customers to build an enterprise-class, scalable, multitenant infrastructure that enables:

Complete management of the infrastructure and application service lifecycle

On-demand access to and control of network bandwidth, servers, storage, and security


Quick deployment of IaaS components to support HaaS-based services without IT administrator involvement

Scalable, elastic, flexible HaaS-based services for maximum asset utilization

Access to application services from a single platform for both business-critical and next-generation cloud applications

This solution provides the reference architecture and the best practice guidance necessary to integrate the key components and functionality of enterprise HaaS into an underlying EMC Hybrid Cloud infrastructure.

Business challenge

Today’s enterprise demands an agile development platform that can enable the continuous delivery, updating, and horizontal scalability of applications. The Pivotal Hadoop (PHD) platform enables developers to easily deploy, bind, and scale applications and data services. When integrated with VMware vCloud Automation Center, it delivers a self-service Pivotal Hadoop platform that facilitates rapid deployment and instant scaling or updating of Hadoop clusters.

HaaS interoperability with the underlying infrastructure needs to accommodate consumable, next-generation applications while maintaining existing end-to-end service delivery, in order to provide:

Efficiency and flexibility

Fast, proactive responses to service requests

Easy as-a-service model of deployment

Adequate visibility into the cost of the infrastructure

Technology solution

This EMC Hybrid Cloud solution integrates the best of EMC, VMware, and Pivotal products and services, and empowers IT organizations to adopt an as-a-service implementation model of compute and storage infrastructure within the data center. Agile, elastic, on-demand, end-to-end IaaS provisioning is crucial to support a comprehensive, dynamic, and fast-growing big data environment.

The key solution components include:

EMC ViPR software-defined storage platform

VMware vCloud Suite cloud management and infrastructure

EMC and VMware integrated workflows

VMware NSX virtual networking technologies

VMware vSphere virtualization platform

VMware Big Data Extensions (BDE) with Project Serengeti

Pivotal Hadoop (PHD)


Chapter 2 EMC Hybrid Cloud Solution Overview

This chapter presents the following topics:

Introduction ............................................................................................................. 12

EMC Hybrid Cloud features and functionality ........................................................... 13


Introduction

The EMC Hybrid Cloud solution enables a well-run hybrid cloud by bringing new functionality not only to IT organizations, but also to developers, end users, and line-of-business owners. Beyond delivering baseline infrastructure as a service (IaaS), built on a software-defined data center (SDDC) architecture, the solution delivers feature-rich capabilities to expand from IaaS to business-enabling IT as a service (ITaaS). Backup as a service (BaaS) and disaster recovery as a service (DRaaS) are now policies that users can enable with just a few mouse clicks. End users and developers can quickly access a marketplace of resources for Microsoft, Oracle, SAP, EMC Syncplicity®, and Pivotal applications, and can add third-party packages as required. All of these resources can be deployed on private cloud or public cloud services, including VMware vCloud Air, from EMC-powered cloud service providers.

The EMC Hybrid Cloud solution uses the best of EMC and VMware products and services, and takes advantage of the strong integration between EMC and VMware technologies to provide the foundation for enabling IaaS on new and existing infrastructure for the hybrid cloud.

Figure 1 shows the key components of the EMC Hybrid Cloud solution. For detailed information, refer to EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5. For information on EMC Hybrid Cloud modular add-on solutions, which provide functionality such as data protection, continuous availability, and application services, refer to Modular add-on components and to the individual Solution Guides for those add-ons.

Figure 1. EMC Hybrid Cloud key components


EMC Hybrid Cloud features and functionality

The EMC Hybrid Cloud solution incorporates the following features and functionality:

Automation and self-service provisioning

Multitenancy and secure separation

Workload-optimized storage

Elasticity and service assurance

Operational monitoring and management

Metering and chargeback

Modular add-on components

Automation and self-service provisioning

The solution provides self-service provisioning of automated cloud services to both users and infrastructure administrators. It uses VMware vCloud Automation Center (vCAC), integrated with EMC ViPR software-defined storage and VMware NSX, to provide the compute, storage, network, and security virtualization platforms for the SDDC.

Cloud users can request and manage their own applications and compute resources within established operational policies. This can reduce IT service delivery times from days or weeks to minutes. Automation and self-service provisioning features include:

Self-service portal—Provides a cross-cloud storefront that delivers a catalog of custom-defined services for provisioning workloads based on business and IT policies, as shown in Figure 2

Role-based entitlements—Ensure that the self-service portal presents only the virtual machine, application, or service blueprints appropriate to a user’s role within the business

Resource reservations—Allocate resources for use by a specific group and ensure that those resources are inaccessible to other groups

Service levels—Define the amount and types of resources that a particular service can receive during initial provisioning or as part of configuration changes

Blueprints—Contain the build specifications and automation policies that define the process for building or reconfiguring compute resources


Figure 2. EMC Hybrid Cloud self-service portal

Multitenancy and secure separation

The solution provides the ability to enforce physical and virtual separation for multitenancy, as strongly as the administrator requires. This separation can encompass network, compute, and storage resources to ensure appropriate security and performance for each tenant.

The solution supports secure multitenancy through vCAC role-based access control (RBAC), which enables vCAC roles to be mapped to Microsoft Active Directory groups. The self-service portal shows only the appropriate views, functions, and operations to cloud users, based on their role within the business.

Workload-optimized storage

The solution enables customers to take advantage of the proven benefits of EMC storage in a hybrid cloud environment. Using ViPR storage services, which leverage the capabilities of EMC VNX® and EMC VMAX® storage systems, the solution provides software-defined, policy-based management of block- and file-based virtual storage. ViPR abstracts the storage configuration and presents it as a single storage control point, enabling cloud administrators to access all heterogeneous storage resources within a data center as if the resources were a single large array.

Elasticity and service assurance

The solution uses the capabilities of vCAC and various EMC tools to provide the intelligence and visibility required to proactively ensure service levels in virtual and cloud environments. Infrastructure administrators can add storage, compute, and network resources to their resource pools as needed. Cloud users can select from a range of service levels for compute, storage, and data protection for their applications and can expand the resources of their virtual machines on demand to achieve the service levels they expect for their application workloads.


Operational monitoring and management

The solution features automated monitoring and management capabilities that provide IT administrators with a comprehensive view of the cloud environment to enable smart decision-making for resource provisioning and allocation. These automated capabilities are based on a combination of EMC ViPR Storage Resource Management (SRM), VMware vCenter Log Insight, and VMware vCenter Operations Manager (vC Ops), and use EMC plug-ins for ViPR, VNX, VMAX, and EMC Avamar® systems to provide extensive additional storage detail.

Cloud administrators can use ViPR SRM to understand and manage the impact that storage has on their applications and to view their storage topologies from application to disk, as shown in Figure 3.

Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager

Capacity analytics and what-if scenarios in vC Ops identify over-provisioned resources so they can be right-sized for the most efficient use of virtualized resources. In addition, for centralized logging, infrastructure components can be configured to forward their logs to vCenter Log Insight, which then aggregates the logs from all the disparate sources for analytics and reporting.

Metering and chargeback

The solution uses VMware IT Business Management Suite (ITBM) to provide cloud administrators with comprehensive metering and cost information across all business groups in the enterprise. ITBM is integrated into the cloud administrator’s self-service portal and presents a dashboard overview of the hybrid cloud infrastructure, as shown in Figure 4.


Figure 4. IT Business Management Suite overview dashboard for hybrid cloud

Modular add-on components

The EMC Hybrid Cloud solution provides modular add-on components for the following services:

Application services

This add-on solution leverages VMware vCloud Application Director to optimize application deployment and release management through logical application blueprints in vCAC. Users can quickly and easily deploy blueprints for applications and databases such as Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, Oracle, and SAP.

Data protection services

EMC Avamar and EMC Data Domain® systems provide a backup infrastructure that offers features such as deduplication, compression, and VMware integration. By using VMware vCenter Orchestrator (vCO) workflows customized by EMC, administrators can quickly and easily set up multitier data protection policies and enable users to select an appropriate policy when they provision their virtual machines.

Continuous availability

A combination of EMC VPLEX® virtual storage and VMware vSphere High Availability (HA) provides the ability to federate information across multiple data centers over synchronous distances. With virtual storage and virtual servers working together over distance, the infrastructure can transparently provide load balancing, real time remote data access, and improved application protection.


Disaster recovery

This add-on solution enables cloud administrators to select disaster recovery (DR) protection for their applications and virtual machines when they provision their hybrid cloud environment. ViPR automatically places these systems on storage that is protected remotely by EMC RecoverPoint® technology. VMware vCenter Site Recovery Manager automates the recovery of all virtual storage and virtual machines.

Platform as a service

The EMC Hybrid Cloud solution provides an elastic and scalable IaaS foundation for platform-as-a-service (PaaS) and software-as-a-service (SaaS) services. Pivotal CF provides a highly available platform that enables application owners to easily deliver and manage applications over the application lifecycle. The EMC Hybrid Cloud service offerings enable PaaS administrators to easily provision compute and storage resources on demand to support scalability and growth in their Pivotal CF enterprise PaaS environments.


Chapter 3 EMC Hybrid Cloud Hadoop as a Service

This chapter presents the following topics:

Overview .................................................................................................................. 20

EMC Hybrid Cloud HaaS and IaaS ............................................................................. 20

Pivotal Hadoop ......................................................................................................... 21

Serengeti ................................................................................................................. 22

VMware Big Data Extensions .................................................................................... 22


Overview

This chapter identifies and briefly describes the major features and functionality required to support Pivotal Hadoop as a service and promote scalability in the EMC Hybrid Cloud environment.

EMC Hybrid Cloud HaaS and IaaS

EMC Hybrid Cloud HaaS is a solution stack made up of EMC Hybrid Cloud (EHC) IaaS integrated with BDE and PHD. The self-service aspect of the portal is controlled by vCAC, as shown in Figure 5. The key components of the stack are:

Project Serengeti

VMware Big Data Extensions (BDE)

Pivotal Hadoop (PHD)

HaaS Self-Service Portal

Hadoop is an open-source software program that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. PHD is an Apache Hadoop distribution.

Deploying a Hadoop cluster using traditional methods is complex and time-consuming. It typically involves setting up the infrastructure, installing and configuring the operating system, acquiring the respective Hadoop media, installing Hadoop components, and finally creating the Hadoop cluster.

This process typically takes weeks and requires a significant skillset. The EMC HaaS offering simplifies the process by using extensive workflow automation in the EHC IaaS backend. Through self-service automation, it is now possible to deploy or expand a Hadoop cluster in minutes using the vCloud Automation Center self-service portal.


Figure 5. EMC Hybrid Cloud HaaS component overview

Pivotal Hadoop

Pivotal Hadoop (PHD) is a distribution of Apache Hadoop, the open-source framework that supports the processing of large data sets in a distributed computing environment as part of the Apache project sponsored by the Apache Software Foundation. The complete PHD platform contains a number of components, several of which are not specifically used within this solution:

YARN (Yet Another Resource Negotiator)—a distributed processing framework that can schedule and execute resource requests from multiple applications

HBASE—a column-oriented database that runs on top of the Hadoop Distributed File System (HDFS)

HAWQ—a parallel SQL query engine that combines the merits of the Greenplum Database Massively Parallel Processing (MPP) relational database engine and the Hadoop parallel processing framework

ZooKeeper—a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services

Hive—a data warehouse infrastructure built on top of Hadoop

Hadoop MapReduce—a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster


Figure 6 shows the PHD components.

Figure 6. Pivotal Hadoop (PHD) components

Note: YARN, HBASE, HAWQ and HIVE are not referenced in this solution. HAWQ is not installed by default and must be installed separately. This can be automated through the use of vCO workflows if required.

Serengeti

Serengeti is an open source project initiated by VMware to enable the deployment and management of Hadoop and big data clusters in a vCenter Server managed environment. The key components are the Serengeti Management Server, which provides a framework for running big data clusters on vSphere, and a command line interface that provides tools and utilities that form an administrative interface for managing and monitoring the cluster environments.

VMware vSphere Big Data Extensions

VMware vSphere Big Data Extensions, or BDE, is a feature within vSphere to support big data and open source Hadoop distribution workloads. BDE provides an integrated set of management tools to help enterprises deploy, run, and manage Hadoop on a common virtual infrastructure. Figure 7 shows how BDE is an installable virtual appliance plug-in that controls and monitors Hadoop Services. The BDE virtual appliance runs on top of vSphere and uses the Serengeti Management Server to control cluster creation by cloning templates through the template server.

BDE is a commercial version of Serengeti, which is an open source project from VMware. BDE provides the features of Serengeti in an enterprise format, including:

A supported version of the open source Apache Hadoop distribution


The Big Data Extensions GUI, integrated into the vSphere Web Client, to perform Hadoop infrastructure and cluster management tasks

Elastic-enabled clusters that optimize and provide scaling of physical compute resources in a vSphere environment

Figure 7. BDE and Serengeti stack


Chapter 4 HaaS Component Integration

This chapter presents the following topics:

Overview .................................................................................................................. 26

Integrating Hadoop components with EMC Hybrid Cloud ......................................... 26

Configuring the platform .......................................................................................... 28


Overview

This section provides guidance on configuring the services required for Hadoop as a Service, specifically BDE and PHD, and integrating them with EMC Hybrid Cloud IaaS services.

Integrating Hadoop components with EMC Hybrid Cloud

To install and configure Hadoop-as-a-Service components, refer to the appropriate vendor documentation referenced in the installing and configuring sections for each component in this chapter.

The steps discussed assume that the EMC Hybrid Cloud has been installed and configured as described in the EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5, and that the IaaS, portal, catalog services, and tenant structure are all in place.

BDE Topology

BDE runs on top of Serengeti. Figure 8 shows the virtual appliance that runs the Serengeti Management Server and Template Server. BDE provides the GUI for managing Hadoop clusters, communicating through the Serengeti Management Server.

Figure 8. BDE and vSphere deployment topology

With VMware’s vSphere Big Data Extensions, you can enable deployment of Hadoop inside your VMware vSphere environment. The Big Data Extensions are distributed as a downloadable OVA-based virtual appliance that is imported into an existing environment. The minimum requirements to support BDE are vSphere 5.0 or later and Enterprise or Enterprise Plus vSphere licenses. By default, the basic Apache Foundation distribution of Hadoop is also included, but it is very easy to add other commercial Hadoop distributions such as Pivotal Hadoop, Cloudera Hadoop, Hortonworks Hadoop, or MapR Hadoop. This solution uses the Pivotal Hadoop distribution integrated with the EMC Hybrid Cloud IaaS stack to create Hadoop as a Service.


Virtualized Hadoop

After BDE is installed, you can begin creating a virtual Hadoop cluster. You can specify a number of configuration options including distribution, topology (basic, compute/storage separation, HBase-only, or custom), and the number and size of the virtual machines for each of the Hadoop roles (for example, name node, client node, and data nodes). Note that the options presented in the web interface are only a fraction of what can be invoked through the advanced command-line tools and API.
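As a rough illustration of what such a specification looks like, the following Python sketch assembles a node-group layout of the kind the Serengeti command-line tools and API accept. The field names follow the Serengeti cluster specification format described in the BDE documentation, but the role names, node counts, and sizing values here are examples only, not a validated configuration.

```python
import json

# Illustrative sketch of a BDE/Serengeti cluster specification expressed as a Python
# dictionary. Field names follow the Serengeti cluster spec format; role names and
# sizing values are examples only, not a tested or recommended layout.
micro_cluster_spec = {
    "nodeGroups": [
        {
            "name": "master",
            "roles": ["hadoop_namenode", "hadoop_jobtracker"],   # name node and job tracking
            "instanceNum": 1,
            "cpuNum": 2,
            "memCapacityMB": 7500,
            "storage": {"type": "SHARED", "sizeGB": 50},          # shared storage for the master
        },
        {
            "name": "worker",
            "roles": ["hadoop_datanode", "hadoop_tasktracker"],   # combined data/compute workers
            "instanceNum": 3,
            "cpuNum": 2,
            "memCapacityMB": 3748,
            "storage": {"type": "LOCAL", "sizeGB": 100},          # local VMFS for disposable workers
        },
    ]
}

# Serialize to JSON, for example to save as a spec file for the Serengeti CLI.
print(json.dumps(micro_cluster_spec, indent=2))
```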

When you start to deploy a Hadoop cluster, BDE clones the appropriate virtual machines and automatically builds out the cluster. When you are satisfied with the cluster, you can scale up (increase the size of the virtual machine’s memory and CPU resources) or scale out (increase the number of virtual machines). You can configure the cluster to scale automatically as the load alters for additional flexibility and efficiency.

Some of the benefits of virtualizing Hadoop—for example, elasticity and multi-tenancy—arise from the increased number of deployment options that become available when Hadoop is virtualized. Figure 9 shows the evolution of virtual Hadoop, from self-contained to a tenant-based model.

Figure 9. The evolution of virtual Hadoop

The traditional Hadoop model combines compute and data. While this implementation is straightforward, representing how the physical Hadoop model can be directly translated into a virtual machine, the ability to scale up and down is limited because the lifecycle of this type of virtual machine is tightly coupled to the data it manages. Powering off a virtual machine with combined storage and computing means access to its data is lost. Scaling out by adding more nodes would necessitate rebalancing data across the expanded cluster, so this model is not particularly elastic.

Separating computing from storage in a virtual Hadoop cluster can achieve compute elasticity, enabling mixed workloads to run on the same virtualization platform and improving resource utilization. It is simple to configure using a HDFS data layer that is always available, along with a compute layer comprising a variable number of TaskTracker nodes, which can be expanded and contracted on demand.

Extending the concept of data-compute separation, multiple tenants can be accommodated on the virtualized Hadoop cluster by running multiple Hadoop compute clusters against the same data service. Using this model, each virtual compute cluster enjoys performance, security, and configuration isolation.

While Hadoop performance using the combined data-compute model on vSphere is similar to its performance on physical hardware, providing virtualized Hadoop with increased topology awareness can enable the data locality needed to improve performance when data and compute layers are separated. Topology awareness allows Hadoop operators to realize elasticity and multi-tenancy benefits when data storage and computing are separated. Furthermore, topology awareness can improve reliability when multiple nodes of the same Hadoop cluster are colocated on the same physical host.

To optimize the data locality and failure group characteristics of virtualized Hadoop:

Group virtual Hadoop nodes on the same physical host into the same failure domain, and avoid placing multiple replicas within it.

Maximize usage of the virtual network between virtual nodes on the same physical host. The virtual network has higher throughput and lower latency than the physical network and does not consume any physical switch bandwidth.

Configuring the platform

Installing and configuring BDE

Refer to the VMware vSphere Big Data Extensions Administrator's and User's Guide to install and configure the BDE components required for Hadoop as a Service.

Configuration task order

The following steps outline the high-level tasks you need to perform to install and configure BDE:

1. Ensure the environment meets the minimum vSphere requirements, correct licensing is in place, and compute, storage, and networking prerequisites are met.

2. Configure cluster settings, including vSphere HA, Distributed Resource Scheduling, host monitoring, and admission control.

3. Configure network settings using either vSwitch, vSphere Distributed Switch (vDS), or NSX. Ensure the required ports are configured as part of any firewall policy.

4. Deploy the BDE OVF file and assign the management network. When you deploy BDE, the setup asks for a destination port group; this is the network that the management server uses to communicate with vCenter Server, so the port group should map to the correct VLAN ID. If vCenter and BDE are unable to communicate with each other, the integration will fail.

Configuring SSO service

As part of the configuration process, an important step is to configure the SSO lookup service and management server IP addresses.

1. As shown in Figure 10, from the left pane in the Deploy OVF Template page select Customize template.


2. In the VC SSO Lookup Service URL box, type the vCenter Server fully qualified domain name (FQDN) in the same format as shown (if the default server name has not been changed). If you do not specify the FQDN here, the certificate will not be accepted and there will be a connection issue between BDE and the Serengeti server later.

3. Under Management Server Network Settings, enter the appropriate IP address settings.

Figure 10. Configuring the SSO lookup service and management server IP addresses

Starting BDE in vSphere

After successfully installing and configuring BDE within vSphere, power on the BDE management server and then register BDE within vSphere as the final part of configuration by performing the following steps:

1. Log in to the vSphere client with administrative privileges.

2. Within the vSphere client, locate the BDE management server. The management server is located under the datacenter resource pool in which it was deployed.

3. Select and record the management IP address.

4. Register the management server using the register plugin URL: https://management-server-ip-address:8443/register-plugin where management-server-ip-address is the IP address you recorded in step 3.

5. Complete the required registration information and then click Submit.

The BDE icon should now be available in the list of objects within the inventory.


Installing and configuring PHD

Before installing and configuring PHD, download the following required components and make them available for the installation:

CentOS 6.2 64-bit ISO

Pivotal Hadoop tar files

Oracle JDK 7 64-bit RPM for CentOS

Big Data Extensions OVF

VMware BDE comes supplied with a default Hadoop distribution from Apache. The HaaS integration requires that Pivotal Hadoop be installed. Get the Pivotal Hadoop media and documentation from http://www.gopivotal.com/big-data/pivotal-hd, and register and obtain the necessary licenses. The following high level tasks outline the process to load the media and create a PHD template within the BDE configuration.

Installing PHD

To create the required installation configuration for BDE, use Yum repositories (as opposed to a tarball). When you create a Hadoop cluster that is Yum-deployed, the Hadoop nodes within the cluster download the Red Hat Package Manager (RPM) packages for the Pivotal Hadoop distribution from the configured Yum repositories.

The Pivotal Hadoop distribution must be installed on a 64-bit version of the CentOS 6.x operating system. You must use either CentOS 6.2 or CentOS 6.4 to create the Hadoop template virtual machine. The template is used in the cloning process for creating a Hadoop cluster. After you have deployed the BDE OVF, you must follow the steps to integrate Yum into PHD by creating a Yum repository as outlined below, and then create the template.

Creating a Yum repository for PHD

The steps for configuring PHD with BDE are described in the VMware vSphere Big Data Extensions Administrator’s and User’s Guide.
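That guide is the authoritative procedure; purely as an illustration of the general shape of the task, the sketch below generates yum metadata for a directory of PHD RPMs and checks that the repository is reachable over HTTP. The directory path, repository id, and URL are placeholders (assumptions for this example), and createrepo is the standard CentOS tool for building yum repository metadata.

```python
# Illustrative sketch only: build yum metadata for a directory of PHD RPMs and verify
# that the repository is being served over HTTP. Paths, repo id, and URL are placeholders.
import subprocess
import urllib.request
from pathlib import Path

RPM_DIR = Path("/var/www/html/phd/2.0.1")              # assumed directory of extracted PHD RPMs
REPO_URL = "http://bde-mgmt.example.com/phd/2.0.1"     # assumed HTTP path serving that directory

# createrepo writes the repodata/ metadata that yum clients need to resolve packages.
subprocess.run(["createrepo", str(RPM_DIR)], check=True)

# A .repo definition of this shape is what a yum client would use to consume the repository.
repo_definition = f"""[pivotal-hd]
name=Pivotal HD
baseurl={REPO_URL}
enabled=1
gpgcheck=0
"""
Path("phd.repo").write_text(repo_definition)

# Quick reachability check, equivalent to testing the URL path from a browser.
with urllib.request.urlopen(f"{REPO_URL}/repodata/repomd.xml") as response:
    print("Repository metadata reachable, HTTP status:", response.status)
```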

Creating a Hadoop template virtual machine

You must use either CentOS 6.2 or CentOS 6.4 to create the Hadoop template virtual machine. To upgrade from a previous version, refer to the chapter titled “Create a Hadoop Template Virtual Machine using RHEL Server 6.x” in the VMware vSphere Big Data Extensions Administrator’s and User’s Guide.

The following steps outline the procedure for creating a Hadoop template virtual machine:

1. Import the PHD binaries and create PHD media by logging in to the BDE management server and importing the PHD tar files into an appropriate directory structure on the server. Figure 11 shows the binary import process.


Figure 11. Importing Hadoop binaries into BDE management server

2. Test that the import was successful by accessing the URL path from a browser and ensuring that the expected folders are present.

3. After installing the media into the BDE management server, create a new Pivotal Hadoop template.

4. Make the new Pivotal Hadoop template the default template by removing the default Hadoop Apache template from the BDE management server, as shown in Figure 12.


Figure 12. Removing the default Apache template from BDE

Configuring custom resources for BDE

VMware BDE requires two resource types when automating Hadoop clusters: networking resources and storage resources.

Networking resources

Networking is used to assign virtual machines IP addresses. BDE deploys all nodes of a Hadoop cluster from a single common CentOS template that comes preconfigured with the BDE vApp management server. As BDE deploys virtual machines into a cluster, it uses either an existing DHCP server or a statically created IP address pool. As part of the deployment process, hostnames are assigned by BDE. The hostnames are the same as the IP addresses. For example, if DHCP assigns 10.10.10.10, then the hostname of that virtual machine is 10.10.10.10. Hadoop then uses this hostname for the clusters.

Storage resources

BDE defines two types of storage resources—local and shared. Shared storage is useful for management or client servers deployed by BDE as shared storage can be protected with technologies such as VMware HA.

Within Hadoop there are two types of nodes: master and worker nodes. Master nodes provide tracking functions whereas worker nodes provide job processing capabilities. Because worker nodes are disposable, they do not require top-tier storage, since Hadoop is designed to deal with node failure. There is also no reason to deploy worker nodes on shared storage. The choice of storage, however, must be capable of delivering the required level of performance for the nodes. Allowing BDE to use local VMFS storage for worker nodes is analogous to deploying physical worker nodes on commodity storage using direct attached storage.

The final stage of configuration is to assign storage resources to BDE. This defines how the Hadoop clusters are deployed, using either local or shared datastores. By default, BDE defines datastores as local. If you need shared datastores, you must configure the datastores accordingly. Refer to Chapter 6 of the VMware vSphere Big Data Extensions Administrator’s and User’s Guide for details on how to add datastores and networks to a cluster from the vSphere client.

Installing and configuring EMC Hybrid Cloud IaaS

For details, refer to the EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Reference Architecture 2.5. Detailed installation and configuration information is available only to select EMC personnel and authorized partners.


Chapter 5 Creating vCO Workflows and vCAC Catalog Services for HaaS

This chapter presents the following topics:

Overview .................................................................................................................. 36

Importing and modifying custom vCO workflows ..................................................... 36

Creating vCAC Catalog Services ............................................................................... 45


Overview

The automation of Hadoop clusters is achieved by using custom workflows created with VMware vCenter Orchestrator (vCO). This chapter describes how these workflows are configured from within VMware vCloud Automation Center (vCAC) to present enterprise organizations with a self-service portal that includes a catalog of pre-configured Hadoop deployment scenarios.

Importing and modifying custom vCO workflows

To use HaaS within EMC Hybrid Cloud, the administrator must use custom vCO workflows for deploying HaaS. These workflows offer a choice of cluster sizes that can then be presented as catalog items from the vCloud Automation Center portal. The workflows are imported into VMware vCO using the vCO import function to be edited, tested, and packaged according to the needs of the organization.

Modifying custom workflows

This section describes the process for importing the custom workflows into vCO, so that the Hadoop Administrator can alter them and link them with the big data cluster configurations created in the earlier stages of the process.

Importing custom workflows

From within the vCO client, as shown in Figure 13, select Run, click Workflows, and select Import workflow. Browse to the location where you have placed the workflow package and click Open. The imported workflow appears in the folder selected.

Figure 13. Importing custom workflows into vCO

Validating workflows

After importing the workflows into vCO, validate them by clicking the name of the folder containing the workflows and then selecting the Validate option from the context menu, as shown in Figure 14. The validation process ensures there are no open ends, unreachable workflow elements, or unused attributes in the workflows, so that they will execute correctly.

Figure 14. Using the validate workflows action

Customizing HaaS workflows

The HaaS workflows provide a framework for deploying each Hadoop cluster configuration of a given size through an automated workflow. The Hadoop administrator should modify the attributes of these workflows to meet the specific needs of the organization. Figure 15 shows how to use the vCO client to edit the attributes within a workflow.

Figure 15. How to edit the attributes


Configuring custom parameters

To make the workflows dynamic, vCO uses a combination of attributes and parameters to transfer data while it is processing a workflow. Workflow parameters must receive an input to generate an output or action. For example, an input received from the user or the system can be passed to a command or script that creates a username or password. This, in turn, can be passed to the Hadoop cluster for authentication.

Figure 16 shows how to create a custom username and password for the Hadoop Client node.

Figure 16. Editing and creating custom parameter passing
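To make the parameter hand-off concrete, the following is a minimal sketch of the kind of vCO scriptable task that could sit behind such a mapping. It is illustrative only: the input parameters (clusterName, requestingUser) and the output attributes (clientUsername, clientPassword) are assumed names, not the names used by the sample workflows.

```javascript
// vCO scriptable task (JavaScript) -- illustrative sketch only.
// Assumed workflow inputs:  clusterName (string), requestingUser (string)
// Assumed workflow outputs: clientUsername (string), clientPassword (string)

// Derive a user name from the requester and the cluster, keeping only
// characters that are safe for a Linux account name.
var safeCluster = clusterName.toLowerCase().replace(/[^a-z0-9_]/g, "_");
clientUsername = requestingUser.toLowerCase() + "_" + safeCluster;

// Generate a simple random password for the Hadoop client node.
var chars = "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz23456789";
var password = "";
for (var i = 0; i < 12; i++) {
    password += chars.charAt(Math.floor(Math.random() * chars.length));
}
clientPassword = password;

// Later workflow elements can bind these attributes to the step that
// configures authentication on the Hadoop cluster.
System.log("Prepared client credentials for cluster " + clusterName);
```

In the actual workflows, the same pattern applies: the scriptable task consumes bound input parameters and writes its results to attributes that downstream elements can read.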

Launching a custom script

Scripts can be added as elements of the schema, which is the main component of a workflow. Launching individual scripts lets you test the components of the workflow one element at a time or, for example, execute a script at run time to prepare the data set.

Figure 17 shows how to launch scripting from within the workflow by using the Schema panel within the workflow itself.


Figure 17. Launching scripts from the vCO
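As a simple illustration of a script element that can be launched and tested on its own, the sketch below derives an HDFS staging path for a sample data set from the cluster name. The attribute names are hypothetical, and any real data-preparation logic would depend on the data set being used.

```javascript
// vCO script element (JavaScript) -- hypothetical sketch for testing one
// schema element in isolation.
// Assumed workflow attributes: clusterName (string), dataSetName (string)
// Assumed output attribute:    stagingPath (string)

if (!dataSetName || dataSetName.length === 0) {
    throw "dataSetName must be supplied before the data set can be prepared";
}

// Build the HDFS directory that a later element would populate with the
// sample data, for example /user/haas/<cluster>/staging/<dataset>.
stagingPath = "/user/haas/" + clusterName.toLowerCase() +
              "/staging/" + dataSetName.toLowerCase();

System.log("Data set '" + dataSetName + "' will be staged at " + stagingPath);
```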

Testing vCO HaaS custom workflows

The previous sections demonstrated how to import the HaaS sample workflows into EMC Hybrid Cloud, specifically into vCenter Orchestrator, which is the main orchestration and automation engine for the solution. Once imported, the default workflows can be altered to reflect any modifications made to the Hadoop clusters. The workflows can also be modified to pass any additional parameters that may be required, for example, passing a username and password or executing additional script components.

The final stage in importing and configuring the workflows is to test the workflows that have been imported and modified for each of the HaaS cluster sizes (micro cluster, small cluster, and large cluster). Figure 18 shows how to:

Select the specific workflow for a given cluster size

Execute the workflow from vCO

View the execution process

Verify the execution progress by checking the log files for any error messages


Figure 18. Launching of Micro Hadoop Cluster workflow
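If you prefer to drive a test run from a script rather than from the vCO client, the sketch below shows one way a small test-harness task might launch an imported cluster workflow and poll its state. It assumes a workflow attribute of type Workflow (here called haasWorkflow) bound to, for example, the Micro Hadoop Cluster workflow, and a hypothetical input name; it is not part of the sample workflow package.

```javascript
// vCO scriptable task (JavaScript) -- hypothetical test-harness sketch.
// Assumed attribute: haasWorkflow (type Workflow), bound to one of the
// imported HaaS cluster workflows (for example, the micro cluster).

var inputs = new Properties();
inputs.put("clusterName", "haas_micro_test");   // hypothetical input name

// Launch the workflow and wait for it to finish, checking the token state.
var token = haasWorkflow.execute(inputs);
while (token.state === "running" || token.state === "waiting") {
    System.sleep(10000);   // poll every 10 seconds
}

if (token.state !== "completed") {
    // The vCO client log view shows the detailed error messages.
    throw "HaaS test run ended in state '" + token.state + "'";
}
System.log("HaaS test run completed successfully");
```

Whichever way the test is launched, the log files remain the primary place to verify progress and investigate errors.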

Viewing cluster creation

After the vCO workflow is launched, the cluster creation process starts within vSphere and BDE. The management server uses the template server to clone the number and types of nodes required to form the cluster. To view and verify the cluster creation process, follow these steps:

1. Log in to the vSphere web client.

2. Go to BDE and view the cluster being created.

Figure 19 shows the status of the creation of a micro Hadoop cluster in the BDE panel of the vSphere web client.


Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere web client)

You can also log in to the vSphere Client Application and view the Hadoop cluster being created. Figure 20 shows the status of the creation of the Micro Hadoop cluster in the vSphere Client Application.

Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client


Creating BDE Clusters

After the vCO workflows are imported, they need to be customized for the different cluster sizes according to the requirements of the enterprise. The examples provided describe micro, small, and large Hadoop clusters.

The custom workflows define the type and configuration of each cluster, in terms of the number of master nodes, client nodes, and data nodes for each cluster size.
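As an illustration only, such a definition could be held in a workflow attribute along the following lines; the node counts shown are placeholders, not the sizings used in this solution.

```javascript
// Hypothetical cluster-size definitions held as a vCO workflow attribute.
// The node counts are placeholders -- substitute the sizings agreed for
// your organization.
var clusterSizes = {
    micro: { masterNodes: 1, clientNodes: 1, workerNodes: 3 },
    small: { masterNodes: 1, clientNodes: 1, workerNodes: 5 },
    large: { masterNodes: 2, clientNodes: 2, workerNodes: 10 }
};

// A deployment workflow would read the entry matching the catalog item
// that the user requested.
var requested = "micro";                    // would come from a workflow input
var sizing = clusterSizes[requested];
System.log(requested + " cluster: " + sizing.workerNodes + " worker nodes");
```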

Creating a Hadoop cluster

These steps document the procedure for creating a Hadoop cluster within BDE, which can then be translated when building a vCO workflow:

1. In vCenter, under Objects > Big Data Extensions, click New Big Data Cluster.

2. Follow the steps in the wizard, specifying the appropriate parameters as required. More detail can be found in the VMware vSphere Big Data Extensions Administrator’s and User’s Guide.

The following sections outline the options and details required during the cluster configuration process.

Naming a Hadoop cluster

When prompted by the wizard, type a name to identify the cluster. Valid characters for cluster names are alphanumeric characters and underscores. When choosing a cluster name, also consider the associated vApp name; together, the vApp name and cluster name must be less than 80 characters.
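These naming rules are easy to enforce in the workflow before the request ever reaches BDE. The sketch below is one possible pre-check: the allowed characters and the 80-character combined limit come from the guidance above, while the attribute names are assumptions.

```javascript
// vCO scriptable task (JavaScript) -- illustrative pre-check of the naming rules.
// Assumed inputs: clusterName (string), vAppName (string)

// Only alphanumeric characters and underscores are valid in a cluster name.
if (!/^[A-Za-z0-9_]+$/.test(clusterName)) {
    throw "Cluster name '" + clusterName + "' contains invalid characters";
}

// The vApp name and cluster name together must be less than 80 characters.
if (vAppName.length + clusterName.length >= 80) {
    throw "vApp name plus cluster name must be less than 80 characters";
}

System.log("Cluster name '" + clusterName + "' passed the naming checks");
```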

Configuring the Hadoop distribution

When configuring a Hadoop cluster, you must select the correct Hadoop distribution from the Hadoop distribution list box. Change the default from Apache to Pivotal HD, as shown in Figure 21. The distribution name matches the value of the name parameter that was passed to the config-distro.rb script when the Hadoop distribution was configured. For a Pivotal PHD 1.1 cluster, you must configure valid DNS and FQDN settings for the cluster's HDFS and MapReduce traffic. Without valid DNS and FQDN settings, the cluster creation process might fail, or the cluster might be created but not function.

Figure 21. Create and name a new Big Data Cluster

Specifying deployment type

When prompted by the wizard, select the deployment type for the cluster, either Basic Hadoop Cluster or Data/Compute Separation Cluster. The type of cluster you create determines the available node group selections.


Identifying the DataMaster node group

The DataMaster node is a virtual machine that runs the Hadoop NameNode service. This node manages the HDFS data stored on the DataNode services deployed in the worker node group. To identify the group:

1. Select a resource template from the list box or select Customize to create a custom resource template.

2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.

Identifying the ComputeMaster node group

The ComputeMaster node is a virtual machine that runs the Hadoop JobTracker service. This node assigns tasks to Hadoop TaskTracker services deployed in the worker node group. To identify the group:

1. Select a resource template from the list box or select Customize to create a custom resource template.

2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.

Identifying the HBaseMaster node group (HBase cluster only)

The HBaseMaster node is a virtual machine that runs the HBase master service. This node orchestrates a cluster of one or more RegionServer slave nodes. To identify the group:

1. Select a resource template from the list box or select Customize to create a custom resource template.

2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.

Identifying the Worker node group

Worker nodes are virtual machines that run the Hadoop DataNode, TaskTracker, and HBase HRegionServer services. These nodes store HDFS data and execute tasks. To identify the group:

1. Select a resource template from the list box or select Customize to create a custom resource template.

2. For the worker nodes, use local storage.

Note: You can add nodes to the worker node group by using Scale Out Cluster, but you cannot reduce the number of nodes.

Identifying the Client node group

A client node is a virtual machine that contains Hadoop client components. From this virtual machine you can access HDFS, submit MapReduce jobs, run Pig scripts, run Hive queries, and run HBase commands. When configuring the cluster for use with HaaS, you do not configure the Client node group unless any of these configuration items are required outside of the HaaS solution.


To identify the group:

1. Select a resource template from the list box or select Customize to create a custom resource template.

2. For the client nodes, use local storage.

Note: You can add nodes to the client node group by using Scale Out Cluster, but you cannot reduce the number of nodes.

Selecting the Hadoop topology configuration

When you create a cluster with BDE, BDE disables automatic migration for the cluster's virtual machines. This prevents vSphere from automatically migrating the virtual machines, but it does not prevent the administrator from unintentionally migrating nodes to other vCenter hosts. Do not migrate cluster nodes from within vCenter, because doing so could break the cluster placement policy.

As part of the final cluster configuration, select the topology configuration that you want the cluster to use: RACK_AS_RACK, HOST_AS_RACK, HVE, or NONE.

More information is available in “About Cluster Topology” in Chapter 7 of the VMware vSphere Big Data Extensions Administrator’s and User’s Guide.


Creating vCAC Catalog Services

The focus of customization for this EMC Hybrid Cloud solution is the VMware vCAC user self-service portal, where functionality is added to enable additional services for cloud users. The final stage of integrating Hadoop as a Service is to present to vCAC the HaaS workflows that have been imported and modified, so that they can be selected as catalog items.

VMware vCAC 6.0 provides the extensibility to enable IaaS functionality through Advanced Service blueprints. The IaaS functionality is achieved by exposing custom vCO workflows that the vCAC 6.0 portal can present as a catalog of services for cloud users.

You can create custom workflow definitions using vCAC Designer. The vCAC Designer console provides a visual workflow editor for customizing vCAC lifecycle workflows. The extensibility toolkits include a library of activities that serve as building blocks for custom workflows.

Using the Advanced Service Designer, you can define new service offerings and publish them to the common catalog as catalog items.

Accessing vCAC

To create the service blueprints, you must access the vCAC console from a browser and log in.

Each tenant has a unique URL to the vCAC console:

The default tenant URL is in the following format: https://hostname/shell-ui-app where hostname is the Fully Qualified Domain Name (FQDN) of a vCAC host.

The URL for additional tenants is in the following format: https://hostname/shell-ui-app/org/tenantURL where tenantURL is the URL name specified when the tenant is being created. This is the workspace in which the customer creates catalog services.

The following steps demonstrate, at a high level, how to integrate the HaaS workflows into the vCAC self-service catalog by showing the creation of:

Catalog services

Blueprints

Custom resources and resource actions

For more information, refer to the vCloud Automation Center Extensibility Guide.

Creating a new service blueprint

To integrate the HaaS workflows into the vCAC self-service catalog, follow these steps:

1. From the main vCAC portal page, click Advanced Services to list all of the current service blueprints defined.

2. Click the green “plus” symbol, shown in Figure 22, to create a new service blueprint.


Figure 22. Advanced Service Designer

Follow these steps to create a new service blueprint:

1. Select one of the imported Hadoop Cluster Creation workflows from the list.

2. Name the new service and create a form to support user input for the required parameters. If required, delete the default form and create a new form.

3. Drag and drop any appropriate input fields onto the form.

4. Publish the new service to create the appropriate service definition in the catalog management.

5. Assign a catalog management service to the new advanced service, and create the appropriate entitlement definition in the catalog management, as shown in Figure 23.

Figure 23. Edit Entitlement window


When these tasks are completed, the new service is available in the service catalog for the cloud administrator. You can replace the default VMware logo icons in the service catalog with more suitable HaaS icons. Replacing the icons is the final stage of customization and ensures that the service catalog items are tailored to a specific function or application. To replace an icon, open the Catalog Management menu, select the Catalog Items list box, select the Configure an icon option, and then browse to and select a new icon.

After the configuration stages have been performed within vCAC, the service catalog is available to provision HaaS items, as shown in Figure 24.

Figure 24. vCAC Service Catalog showing Hadoop as a Service


Chapter 6 Use Cases: EMC Hybrid Cloud IaaS

This chapter presents the following topics:

Overview

IaaS – storage services

Monitoring and capacity planning

Metering and chargeback


Overview

This chapter covers EMC Hybrid Cloud IaaS use cases that can be incorporated to extend the solution beyond virtual machine provisioning to the broader consumption of resources.

From time to time, additional physical resources will be required to support the extension of a Hadoop environment. The following sections show how the EMC Hybrid Cloud storage provisioning workflows can be used to create additional resources on demand by provisioning storage as required, and how the VMware vCenter Operations (vC Ops) tool set can be used to analyze consumed resources, support capacity planning, and model scenarios that add physical resources and increase virtual machine and node capacity.

IaaS – storage services

Storage is provisioned, allocated, and consumed by different cloud users in this solution.

For vCAC IaaS users, the storage services provided in the vCAC service catalog provision storage resources that will be allocated to and consumed by other cloud users.

Once the storage resources are available, fabric group administrators can assign the resources to business groups. Creators of virtual machine blueprints (business group managers) can then configure their blueprints to use those particular storage resources for the list of virtual machine disks.

When they provision virtual machines, cloud users consume the storage and, depending on their entitlements, may choose the storage service for their virtual machines.

Use case 1: Storage provisioning

This use case demonstrates how ViPR software-defined storage is provisioned for the hybrid cloud from the VMware vCAC self-service catalog.

1. To provision block or file storage from the vCAC self-service portal, select the Provision Cloud Storage item from the vCAC service catalog, as shown in Figure 25.


Figure 25. Storage Services - Provision cloud storage

The storage service blueprint can be created using vCAC anything-as-a-service (XaaS) functionality in the vCAC Advanced Service Designer. EMC ViPR provisioning workflows, which are presented by vCO to the vCAC service catalog, support storage services.

The storage provisioned by the IaaS user enables the fabric group administrator to make storage resources available to their business group. The storage provisioning request requires very little input from the vCAC IaaS user.

The main inputs required are:

Datastore Type: VMFS or NFS

Datastore Size

vCenter Cluster

Storage Tier

Most of these inputs, except the datastore size, are selected from pre-populated list boxes whose items are determined by the cluster resources available through vCenter and the virtual pools available in ViPR.
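Behind the form, these selections end up as a small set of inputs handed to the vCO provisioning workflow. The sketch below only illustrates the shape of such a hand-off; the input names and the workflow binding are hypothetical, not the names used by the EMC ViPR workflow package.

```javascript
// Hypothetical assembly of the storage-provisioning inputs in vCO (JavaScript).
// Assumed attribute: viprProvisionWorkflow (type Workflow), bound to the
// ViPR "provision cloud storage" workflow. All input names are illustrative.

var request = new Properties();
request.put("datastoreType", "NFS");        // VMFS or NFS
request.put("datastoreSizeGb", 500);        // datastore size, in GB
request.put("vcenterCluster", "Cluster01"); // target vCenter cluster
request.put("storageTier", "Tier 2");       // ViPR virtual pool / storage tier

var token = viprProvisionWorkflow.execute(request);
System.log("Submitted storage provisioning request, token state: " + token.state);
```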

After entering a description and reason for the storage-provisioning request, enter your password. The vCenter Server may manage multiple ESXi clusters; therefore, you must choose the relevant vCenter cluster to tell the provisioning operation where to assign the storage device. Select a vCenter cluster from the next screen, as shown in Figure 26.


Figure 26. Provision Cloud Storage – select vCenter cluster

2. Select the type of datastore you require from the list of available storage types, as shown in Figure 27. A datastore type of VMFS requires block storage, while NFS requires file storage. Other data services such as disaster recovery and continuous availability are displayed as appropriate only if detected in the underlying infrastructure.

Figure 27. Storage Provisioning – Select datastore type

3. Select from which storage offering the new storage device should be provisioned. The list of available storage offerings is based on the datastore type selected, such as VMFS or NFS, and what matching virtual pools are available from the ViPR virtual array.

In this example, a single NFS-based ViPR virtual pool is available to provision storage from, with the available capacity of the virtual pool also displayed to the user, as shown in Figure 28.

The storage pools listed have been configured in the EMC ViPR virtual array and their storage capabilities are associated with storage profiles created in vCenter.


Figure 28. Storage provisioning – Choose ViPR storage pool

4. Enter the size required for the new storage, in GB, as shown in Figure 29.

Figure 29. Storage provisioning – Enter storage size

5. The fabric group administrator must reserve the new Storage Pool for use by the business group, as shown in Figure 30.

Figure 30. Provision Storage – Storage Reservation for vCAC Business Group

When the automated process sends an email notification to the fabric group administrator that the storage is ready and available in vCAC, the fabric group administrator can then assign capacity reservations on the device for use by the business group.


In this example, a number of required input values, such as LUN or datastore name, have been masked from the user during the storage provisioning request process. Some of these values are locked-in and managed by the orchestration process and logic to ensure consistency.

In addition to the initial provisioning of storage to the ESXi cluster at the vSphere layer, this solution provides further automation and integration of the new storage up into the vCAC layer. The ViPR storage provider automatically tags the storage device with the appropriate storage profile based on its storage capabilities.

The remaining automated steps in this solution are:

vCAC rediscovery of resources under vCenter endpoint

vCAC storage reservation policy assigned to new datastore

vCAC fabric group administrator notification of availability of new datastore

Use case 2: Select virtual machine storage

This use case demonstrates how cloud users can consume the available storage service offerings. This use case is part of the broader virtual machine deployment use case, but here it relates directly to how the business group manager and users can manage the storage service offerings available to them.

VMware vCAC business group managers and users can select the appropriate storage for their virtual machine through the VMware vCAC user portal.

For business group managers, the storage type for the virtual machine disks can be set during the creation of a virtual machine blueprint. As shown in Figure 31, the relevant storage reservation policy can be applied to each of the virtual disks.

Figure 31. Set storage reservation policy for virtual machine disks

After the storage reservation policy is set, the blueprint will always deploy this virtual machine and its virtual disks to that storage type. If more user control is required at deployment time, the business group manager can elect to allow business group users to reconfigure the storage reservation policies at deployment time by selecting the checkbox Allow user to see and change storage reservation policies.


Use case 3: Metering storage services

This solution uses VMware IT Business Management Suite (ITBM) to provide chargeback information on the storage service offerings for the hybrid cloud. Through its integration with VMware vCenter and vCAC, ITBM enables the cloud administrator to automatically track utilization of storage resources provided by EMC ViPR.

The EMC ViPR VASA provider in vCenter automatically captures the underlying storage capabilities of LUNs provisioned from virtual pools on the EMC ViPR virtual array. Storage profiles are created based on these storage capabilities, which are aligned with the storage service offerings. This integration enables ITBM to automatically discover and group datastores based on predefined service levels of storage.

In this solution we created a separate virtual machine storage profile for each of the storage service offerings, as shown in Figure 32.

Figure 32. Create new virtual machine storage profile for Tier 2 storage

The storage capabilities are shown automatically in vSphere, as shown in Figure 33, where Tier 2 EMC ViPR storage is supporting a datastore.

Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider

Note: Storage capabilities are only visible in the traditional vSphere client and not in the web client. Also, the web client uses virtual machine storage policies in place of virtual machine storage profiles.

After the EMC ViPR Storage Provider has automatically configured the datastores with the appropriate storage profiles, the datastores can be grouped and managed in ITBM in line with their storage profile. Figure 34 shows that the cost profiles created in vCenter are discovered by ITBM. This allows the business management administrator to group tiered datastores provisioned with ViPR and set the monthly cost per GB as needed.

Figure 34. VMware ITBM chargeback based on storage profile of datastore

Summary

VMware vCAC can provide a storefront for storage services to be used by cloud users. These service catalog items deploy EMC ViPR software-defined storage services based on the usage of multiple service offerings of block and file storage across EMC VNX and VMAX storage arrays. Each service offers varying levels of availability, capacity, and performance to satisfy the operational requirements of different lines of business.

This solution combines EMC ViPR with EMC array-based FAST-enabled storage service offerings across the EMC storage arrays with VMware vSphere to simplify storage operations for hybrid cloud consumers.


Monitoring and capacity planning

Monitoring

The vCenter Operations Management Suite has functions that can help HaaS administrators to achieve the following goals:

Eliminate or significantly reduce the manual problem-solving effort in the environment.

Proactively manage core service and cloud infrastructure performance, and utilize infrastructure resources optimally.

Provide proactive warnings about performance issues before problems affect the end user. Real-time performance dashboards enable service providers to meet their SLAs by highlighting potential performance issues before end users notice them.

Infrastructure maintenance and operations teams need end-to-end visibility and intelligence to make fast, informed operational decisions and to proactively ensure service levels in cloud environments. They need to get to the root cause of performance problems promptly, optimize capacity in real time, and maintain compliance in a dynamic environment of constant change.

The vCenter Operations Management Suite offers many features and functions to deliver quality of service, operational efficiency, and continuous compliance for your dynamic cloud infrastructure and business critical applications.

Capacity planning

This section describes in detail the capacity planning functions that can help you predict the impact of new HaaS deployments, or of upgrades to current HaaS instances, on the underlying infrastructure.

Forecasting capacity risks in vCenter Operations Manager involves creating what-if scenarios to examine the demand and supply of resources in the cloud infrastructure.

A what-if scenario is a supposition about how capacity and load might change under certain conditions, such as an increased or decreased number of ESX hosts, storage resources, or virtual machines in the environment, without making actual changes to your virtual infrastructure. If you then implement the scenario, you know in advance what your capacity requirements are.

To create a what-if scenario, you can use models and profiles based on current resource consumption in the existing environment. Alternatively, you can manually define amounts of virtual machine RAM, storage, CPU, and utilization in a new consumption profile, as shown in Figure 35, to predict the potential impact of growth.
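The arithmetic behind a simple consumption profile is straightforward and can be reproduced as a quick back-of-envelope check. The sketch below is only an illustration of the kind of comparison vC Ops evaluates in far more detail, and every capacity figure in it is made up.

```javascript
// Back-of-envelope what-if check (JavaScript) -- illustrative only; vC Ops
// performs a far more detailed analysis. All figures below are made up.

var remaining = { cpuGhz: 120, ramGb: 512, storageGb: 8000 };   // spare capacity
var vmProfile = { cpuGhz: 2, ramGb: 8, storageGb: 100 };        // per new VM
var plannedVms = 50;

var demand = {
    cpuGhz: vmProfile.cpuGhz * plannedVms,
    ramGb: vmProfile.ramGb * plannedVms,
    storageGb: vmProfile.storageGb * plannedVms
};

var fits = demand.cpuGhz <= remaining.cpuGhz &&
           demand.ramGb <= remaining.ramGb &&
           demand.storageGb <= remaining.storageGb;

System.log(plannedVms + " new VMs " + (fits ? "fit" : "do not fit") +
           " in the remaining capacity");
```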


Figure 35. Choosing virtual machine consumption models and profiles

To define a new virtual machine profile, you can make detailed specifications that give you the option to include and predict specific resource utilizations, reservations, and limits in order to get as accurate a projection as possible, as shown in Figure 36.

Figure 36. Specifying configuration and projected capacity usage of new virtual machines

Figure 37 shows that there are insufficient resources for a planned deployment scenario consisting of either 50 or 85 new virtual machines. In this case, we can easily provision new vSphere hosts using vCAC services as described in previous sections.


Figure 37. Capacity summary showing insufficient CPU and RAM resources

Before you provision new hardware resources, you can create hardware change scenarios to determine the effect of adding, removing, or updating the hardware capacity in a vSphere cluster. You can create a scenario that models changes to hosts and datastores, as shown in Figure 38 and Figure 39.

Figure 38. Specifying number of hosts and amount of CPU and memory


Figure 39. Specifying datastore size

The what-if scenario capacity planning function allows you to compare how adding different numbers of virtual machines and different hardware will impact your actual environment, as shown in Figure 40.

Figure 40. Compared scenarios

Capacity planning example

In a planning exercise, assume that you:

Have a request to deploy an additional 45 Hadoop node instances in the existing HaaS.

Plan to purchase blade servers compliant with a certain specification.

Want to deploy an additional 25 Hadoop clusters.


In Figure 41, each column shows how an individual change affects resources in your environment. The Combined Scenarios column shows you the cumulative effect of hardware purchasing and an overall expansion of 70 virtual machines.

Figure 41. Combined scenarios

Metering and chargeback

VMware ITBM provides cloud administrators with comprehensive metering and cost information across physical and virtual resources in the EMC Hybrid Cloud environment. Besides working out the cost of physical components such as storage, compute, and networking resources, you can also include and configure other factors that affect the overall cost of your cloud environment, such as operating system licensing, maintenance, labor, and environmental facilities costs, as shown in Figure 42.


Figure 42. Categorized hybrid cloud environment cost overview

ITBM is integrated into the vCAC portal for the Hadoop administrator and presents a dashboard overview of the hybrid cloud infrastructure.

VMware ITBM Standard Edition uses its own reference database, which has been preloaded with industry-standard data and vendor-specific data to generate the base price for virtual CPU (vCPU), RAM, and storage values. These prices, which default to the cost of CPU, RAM, and storage, are automatically consumed by vCAC, where they can be changed as appropriate by the cloud administrator. This eliminates the need to manually configure cost profiles in vCAC and assign them to compute resources.

ITBM is also integrated with vCenter and can import existing resource hierarchies, folder structures, and vCenter tags to associate EMC Hybrid Cloud resource usage with business units, departments, and projects.

Infrastructure resources consumed by HaaS instances and hosted applications are provided by dedicated vSphere clusters with associated vSphere hosts and datastores. ITBM provides you with detailed information about:

Number of vSphere hosts in the vSphere cluster and the number of virtual machines on each host

CPU and RAM capacity and utilization of the vSphere cluster

Overall cost of the compute resources provided by the dedicated vSphere cluster

Cluster cost by virtual machine


The Clusters tab provides you with insight into the cost of the vSphere cluster resources consumed by Hadoop cluster instances. You can monitor costs while provisioning new hosts, as shown in Figure 43.

Figure 43. vSphere Cluster cost overview

The Datastores tab provides insight into the cost of the storage resources consumed by an HaaS instance. The name of a datastore provisioned by vCAC storage services inherits a cluster name prefix as part of its published name. Performing a sort by datastore name gives you a list of the names and costs of the datastores provisioned and assigned to hosts in the vSphere cluster, as shown in Figure 44.

Figure 44. Storage cost overview


Chapter 7 Conclusion

This chapter presents the following topics:

Summary


Summary

Pivotal Hadoop is designed to create an easy-to-scale big data framework. To achieve this kind of flexibility, HaaS is designed around the modular system components of Pivotal Hadoop. Using vCenter Orchestrator workflows, the administrator can provide fixed cluster configuration catalog items or create dynamic workflows that can be called from a catalog. The size of the nodes used is determined by the individual making the request.

Elastic provisioning refers to the ability to provision flexible computing resources when and where they are required and to easily scale resources up and down to match demand. Resource elasticity can relate to processing power, memory, storage, bandwidth, and so on. This document indicates the importance of having an elastic and scalable IaaS platform on which to support the hosting of dynamically changing and fast-growing big data platforms.

VMware vCenter Operations Manager enables you to deliver quality of service, attain operational efficiency, and gather current capacity capabilities while forecasting the effect of future HaaS deployments or upgrades in your cloud infrastructure.

HaaS clusters can grow to a large number of node instances; the limit can be changed through the BDE configuration parameters. It is therefore crucial to have proactive performance monitoring and capacity planning solutions in place.

To support comprehensive, dynamic, and fast-growing development environments such as Hadoop as a service, you must ensure the stability of the underlying cloud compute infrastructure, which must provide availability, scalability, flexibility, and performance to the big data platform and its services. As a solution to these challenges, this document has addressed simple provisioning from a self-service catalog and considerations for building scalable Hadoop-as-a-service environments, with an elastic and easy-to-deploy underlying IaaS infrastructure provided by the EMC Hybrid Cloud solution.


Appendix A References

This appendix presents the following topic:

References


VMware references

The following VMware documents provide additional and relevant information:

Advanced Service Design vCloud Automation Center 6.0

Installing and Configuring VMware vCenter Orchestrator

VMware Compatibility Guide

VMware vSphere Big Data Extensions Administrator’s and User’s Guide: vSphere Big Data Extensions 1.0

Installing and Configuring VMware vSphere Big Data Extensions (Video)