
CLOUD COMPUTING

NAME: P. UMA DEVI

CLASS: III CSE A

UNIT 1

COURSE MATERIAL


CLOUD COMPUTING

UNIT-I

Topics

1) Principles of Parallel and Distributed Computing

2) Introduction to Cloud Computing

3) Cloud Computing Architecture

4) Cloud Concepts and Technologies

5) Cloud Services and Platforms

6) Cloud Models

7) Cloud as a Service

8) Cloud Solutions

9) Cloud Offerings

10) Introduction to Hadoop and MapReduce


1. Principles of Parallel and Distributed Computing:

This unit presents the fundamental principles of parallel and distributed computing and discusses models and conceptual frameworks that serve as foundations for building cloud computing systems and applications.

1.1 Eras of computing

The two fundamental and dominant models of computing are sequential and parallel.

The sequential computing era began in the 1940s; the parallel (and distributed) computing era followed it within a decade (see Figure 2.1). The four key elements of

computing developed during these eras are architectures, compilers, applications, and

problem-solving environments.

The computing era started with a development in hardware architectures, which

actually enabled the creation of system software—particularly in the area of

compilers and operating systems—which support the management of such systems

and the development of applications. Every aspect of this era underwent a three-phase process: research and development (R&D), commercialization, and commoditization.

1.2 Principles of Parallel and Distributed Computing

The terms parallel computing and distributed computing are often used

interchangeably, even though they mean slightly different things. The term parallel implies a tightly coupled system, whereas distributed refers to a wider class of systems, including those that are tightly coupled.

The term parallel computing refers to a model in which the computation is divided

among several processors sharing the same memory. The architecture of a parallel

computing system is often characterized by the homogeneity of components: each

processor is of the same type and it has the same capability as the others. The shared

memory has a single address space, which is accessible to all the processors. Parallel

programs are then broken down into several units of execution that can be allocated

to different processors and can communicate with each other by means of the shared

memory. For example, a cluster of which the nodes are connected through an

InfiniBand network and configured with a distributed shared memory system can be

considered a parallel system.

The term distributed computing encompasses any architecture or system that allows

the computation to be broken down into units and executed concurrently on different

computing elements, whether these are processors on different nodes, processors on

the same computer, or cores within the same processor. Therefore, distributed

computing includes a wider range of systems and applications than parallel

computing and is often considered a more general term. Classic examples of

distributed computing systems are computing grids or Internet computing systems,

which combine together the biggest variety of architectures, systems, and

applications in the world.

1.3 Elements of parallel computing


1.3.1 Parallel Processing:

Processing of multiple tasks simultaneously on multiple processors is called

parallel processing. The parallel program consists of multiple active processes (tasks)

simultaneously solving a given problem. A given task is divided into multiple

subtasks using a divide-and-conquer technique, and each subtask is processed on a

different central processing unit (CPU). Programming on a multiprocessor system

using the divide-and-conquer technique is called parallel programming.

The development of parallel processing is being influenced by many factors.

The prominent among them include the following:

• Computational requirements are ever increasing in the areas of both scientific and

business computing. The technical computing problems, which require high-speed

computational power, are related to life sciences, aerospace, geographical information

systems, mechanical design and analysis, and the like.

• Sequential architectures are reaching physical limitations as they are constrained by

the speed of light and thermodynamics laws. The speed at which sequential CPUs can

operate is reaching saturation point (no more vertical growth), and hence an

alternative way to get high computational speed is to connect multiple CPUs

(opportunity for horizontal growth).

• Hardware improvements in pipelining, superscalar and the like are non-scalable and

require sophisticated compiler technology. Developing such compiler technology is a

difficult task.

• Vector processing works well for certain kinds of problems. It is suitable mostly for

scientific problems (involving lots of matrix operations) and graphical processing. It

is not useful for other areas, such as databases.

• The technology of parallel processing is mature and can be exploited commercially;

there is already significant R&D work on development tools and environments.


• Significant development in networking technology is paving the way for

heterogeneous computing.

1.3.2 Hardware Architecture for parallel processing:

The core elements of parallel processing are CPUs. Based on the number of

instruction and data streams that can be processed simultaneously, computing systems

are classified into the following four categories:

• Single-instruction, single-data (SISD) systems:

An SISD computing system is a uniprocessor machine capable of executing a single instruction,

which operates on a single data stream. In SISD, machine instructions are processed

sequentially; hence computers adopting this model are popularly called sequential

computers. All the instructions and data to be processed have to be stored in primary

memory. The speed of the processing element in the SISD model is limited by the rate at

which the computer can transfer information internally. Dominant representative SISD

systems are IBM PC, Macintosh, and workstations.


• Single-instruction, multiple-data (SIMD) systems:

An SIMD computing system is a multiprocessor machine capable of executing the same

instruction on all the CPUs but operating on different data streams. Machines based on an

SIMD model are well suited to scientific computing since they involve lots of vector and

matrix operations. For instance, statements such as Ci = Ai * Bi can be passed to all the processing elements (PEs); organized data elements of vectors A and B can be divided into multiple sets (N sets for N-PE systems); and each PE can process one data set. Dominant representative SIMD systems are Cray's vector processing machine and Thinking Machines' cm*.
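To make the Ci = Ai * Bi idea concrete, the sketch below expresses the same data-parallel multiplication as a single vectorized operation in Python using NumPy; NumPy is only an illustrative choice and is not mentioned in the original text.

import numpy as np

# Vectors A and B are the data streams; one instruction (multiply) is applied
# to every pair of elements at once, in the spirit of SIMD processing.
A = np.arange(1, 9, dtype=np.float64)     # [1, 2, ..., 8]
B = np.arange(11, 19, dtype=np.float64)   # [11, 12, ..., 18]

C = A * B   # Ci = Ai * Bi for all i, as a single vectorized operation

print(C)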

• Multiple-instruction, single-data (MISD) systems:

An MISD computing system is a multiprocessor machine capable of executing different

instructions on different PEs but all of them operating on the same data set (see Figure 2.4). For instance, statements such as y = sin(x) + cos(x) + tan(x) perform different operations on the same data set. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially. They became more of an

intellectual exercise than a practical configuration.


• Multiple-instruction, multiple-data (MIMD) systems:

An MIMD computing system is a multiprocessor machine capable of executing multiple

instructions on multiple data sets (see Figure 2.5). Each PE in the MIMD model has separate

instruction and data streams; MIMD machines are broadly categorized into shared-memory

MIMD and distributed-memory MIMD based on the way PEs are coupled to the main memory.

Shared memory MIMD machines:


In the shared memory MIMD model, all the PEs are connected to a single global memory and

they all have access to it (see Figure 2.6). Systems based on this model are also called tightly

coupled multiprocessor systems. The communication between PEs in this model takes place

through the shared memory. Dominant representative shared memory MIMD systems are Silicon

Graphics machines and Sun/IBM’s SMP (Symmetric Multi-Processing).

Distributed memory MIMD machines:

In the distributed memory MIMD model, all PEs have a local memory. Systems based on this

model are also called loosely coupled multiprocessor systems. The communication between PEs in this model takes place through the interconnection network (the interprocess communication channel, or IPC). The network connecting PEs can be configured to tree, mesh, cube, and so on.

Each PE operates asynchronously, and if communication/synchronization among tasks is

necessary, they can do so by exchanging messages between them.

1.4 Approaches to parallel Programming:

A sequential program is one that runs on a single processor and has a single line of

control. To make many processors collectively work on a single program, the

program must be divided into smaller independent chunks so that each processor can

work on separate chunks of the problem. The program decomposed in this way is a

parallel program. A wide variety of parallel programming approaches are available.

The most prominent among them are the following:


• Data parallelism: In the case of data parallelism, the divide-and-conquer technique

is used to split data into multiple sets, and each data set is processed on different PEs

using the same instruction. This approach is highly suitable to processing on

machines based on the SIMD model.

• Process parallelism: In the case of process parallelism, a given operation has multiple (but distinct) activities that can be processed on multiple processors.

• Farmer-and-worker model: In the case of the farmer-and-worker model, a job distribution approach is used: one processor is configured as master and all other remaining PEs are designated as slaves; the master assigns jobs to slave PEs and, on completion, they inform the master, which in turn collects results (a sketch of this pattern is given below).
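A minimal sketch of the farmer-and-worker (master/worker) pattern is shown below using Python's multiprocessing module; squaring a list of numbers is only a placeholder workload chosen for illustration.

from multiprocessing import Pool

def work(job):
    # Worker (slave) processes one job and returns its result to the master.
    return job * job

if __name__ == "__main__":
    jobs = list(range(10))              # jobs to be distributed by the master
    with Pool(processes=4) as pool:     # four worker processes
        results = pool.map(work, jobs)  # master assigns jobs and collects results
    print(results)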

1.4.1 Levels of parallelism:

Levels of parallelism are decided based on the lumps of code (grain size) that can be a

potential candidate for parallelism. The common goal is to boost processor efficiency

by hiding latency. The idea is to execute concurrently two or more single-threaded

applications, such as compiling, text formatting, database searching, and device

simulation.

Parallelism within an application can be detected at several levels:

• Large grain (or task level)

• Medium grain (or control level)

• Fine grain (data level)

• Very fine grain (multiple-instruction issue)


1.5 Elements of distributed computing:

1.5.1 General concepts and definitions

As a general definition of the term distributed system, we use the one proposed by

Tanenbaum:

A distributed system is a collection of independent computers that appears to its users

as a single coherent system.

Communication is another fundamental aspect of distributed computing. Since

distributed systems are composed of more than one computer that collaborate together,


it is necessary to provide some sort of data and information exchange between them, which generally occurs through the network. A second definition, proposed by Coulouris, captures this aspect:

A distributed system is one in which components located at networked computers

communicate and coordinate their actions only by passing messages.

1.5.2 Components of a distributed system

A distributed system is the result of the interaction of several components that traverse

the entire computing stack from hardware to software. It emerges from the

collaboration of several elements that—by working together—give users the illusion of

a single coherent system. Figure 2.10 provides an overview of the different layers that

are involved in providing the services of a distributed system.

At the very bottom layer, computer and network hardware constitute the physical

infrastructure; these components are directly managed by the operating system, which

provides the basic services for interprocess communication (IPC), process scheduling

and management, and resource management in terms of file system and local

devices.


The use of well-known standards at the operating system level and even more at

the hardware and network levels allows easy harnessing of heterogeneous

components and their organization into a coherent and uniform system.

The middleware layer leverages such services to build a uniform environment for the

development and deployment of distributed applications. This layer supports the

programming paradigms for distributed systems. By relying on the services offered

by the operating system, the middleware develops its own protocols, data formats,

and programming language or frameworks for the development of distributed

applications. All of them constitute a uniform interface to distributed application

developers that is completely independent from the underlying operating system and

hides all the heterogeneities of the bottom layers.

The top of the distributed system stack is represented by the applications and services

designed and developed to use the middleware. These can serve several purposes and

often expose their features in the form of graphical user interfaces (GUIs) accessible

locally or through the Internet via a Web browser.

Figure 2.11 shows an example of how the general reference architecture of a

distributed system is contextualized in the case of a cloud computing system.


Note that hardware and operating system layers make up the bare-bone

infrastructure of one or more datacenters, where racks of servers are deployed and

connected together through high-speed connectivity. This infrastructure is managed by

the operating system, which provides the basic capability of machine and network

management. The core logic is then implemented in the middleware that manages the

virtualization layer, which is deployed on the physical infrastructure in order to maximize

its utilization and provide a customizable runtime environment for applications. The

middleware provides different facilities to application developers according to the type of

services sold to customers. These facilities, offered through Web 2.0-compliant

interfaces, range from virtual infrastructure building and deployment to application

development and runtime environments.

1.5.3 Architectural styles for distributed computing

Although a distributed system comprises the interaction of several layers, the

middleware layer is the one that enables distributed computing, because it provides a

coherent and uniform runtime environment for applications. There are many different

ways to organize the components that, taken together, constitute such an environment.


The interactions among these components and their responsibilities give structure to the

middleware and characterize its type or, in other words, define its architecture.

Architectural styles are mainly used to determine the vocabulary of components and

connectors that are used as instances of the style together with a set of constraints on

how they can be combined.

We organize the architectural styles into two major classes:

• Software architectural styles

• System architectural styles

The first class relates to the logical organization of the software; the second class

includes all those styles that describe the physical organization of distributed software

systems in terms of their major components.

1.5.3.1 Components and connectors:

Before discussing the architectural styles, we first define components and connectors, since these are the basic building blocks with which architectural styles are defined. A component represents a unit

of software that encapsulates a function or a feature of the system. Examples of

components can be programs, objects, processes, pipes and filters. A connector is

a communication mechanism that allows cooperation and coordination among

components. Differently from components, connectors are not encapsulated in a

single entity, but they are implemented in a distributed manner over many system

components.

1.5.3.2 Software architectural styles:

Software architectural styles are based on the logical arrangement of software

components. They are helpful because they provide an intuitive view of the

whole system, despite its physical deployment. They also identify the main

abstractions that are used to shape the components of the system and the

expected interaction patterns between them.

These models constitute the foundations on top of which distributed systems are

designed from a logical point of view, and they are discussed in the following

sections.


Data centered architectures:

These architectures identify the data as the fundamental element of the software

system, and access to shared data is the core characteristic of the data-centered

architectures.

The repository architectural style is the most relevant reference model in this

category. It is characterized by two main components: the central data structure,

which represents the current state of the system, and a collection of independent

components, which operate on the central data. In particular, repository-based

architectures differentiate and specialize further into subcategories according to

the choice of control discipline to apply for the shared data structure. Of

particular interest are databases and blackboard systems.

The blackboard architectural style is characterized by three main components:

• Knowledge sources. These are the entities that update the knowledge base that

is maintained in the blackboard.

• Blackboard. This represents the data structure that is shared among the

knowledge sources and stores the knowledge base of the application.

• Control. The control is the collection of triggers and procedures that govern the

interaction with the blackboard and update the status of the knowledge base.


Data-flow architectures:

In the case of data-flow architectures, it is the availability of data that controls the

computation. With respect to the data-centered styles, in which the access to data is the

core feature, data-flow styles explicitly incorporate the pattern of data flow.

Batch Sequential Style.

The batch sequential style is characterized by an ordered sequence of separate programs

executing one after the other. These programs are chained together by providing the output generated by one program, most likely in the form of a file, as input to the next program after its completion.

Pipe-and-Filter Style:

The pipe-and-filter style is a variation of the previous style for expressing the activity of a software system as a sequence of data transformations. Each component of the processing chain is called a filter, and the connection between one filter and the next is represented by a data stream. Filters generally have no state, do not know the identity of either the previous or the next filter, and are connected by in-memory data structures such as first-in/first-out (FIFO) buffers or other structures. This particular sequencing is called pipelining and introduces concurrency in the execution of the filters. Data-flow architectures are optimal when the system to be designed embodies a multistage process that can be clearly decomposed into a collection of separate components that need to be orchestrated together.
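As a minimal illustration of the pipe-and-filter idea, the sketch below chains stateless filters with Python generators; the specific filters (uppercasing, dropping empty records) are invented here purely for illustration.

def read_lines(lines):
    # Source: emits raw records into the pipeline.
    for line in lines:
        yield line

def to_upper(stream):
    # Filter 1: transforms each record without keeping state.
    for line in stream:
        yield line.upper()

def drop_empty(stream):
    # Filter 2: removes empty records.
    for line in stream:
        if line.strip():
            yield line

raw = ["hello", "", "pipe and filter"]
pipeline = drop_empty(to_upper(read_lines(raw)))  # filters connected by data streams
print(list(pipeline))  # ['HELLO', 'PIPE AND FILTER']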


Virtual machine architectures:

The virtual machine class of architectural styles is characterized by the presence of an abstract

execution environment (generally referred to as a virtual machine) that simulates features that are

not available in the hardware or software. Applications and systems are implemented on top of

this layer and become portable over different hardware and software environments as long as

there is an implementation of the virtual machine they interface with.

Rule-Based Style:

This architecture is characterized by representing the abstract execution environment as an

inference engine. Programs are expressed in the form of rules or predicates that hold true. The

input data for applications is generally represented by a set of assertions or facts that the

inference engine uses to activate rules or to apply predicates, thus transforming data. The output

can either be the product of the rule activation or a set of assertions that holds true for the given

input data. The set of rules or predicates identifies the knowledge base that can be queried to

infer properties about the system. The use of rule-based systems can be found in the networking

domain: network intrusion detection systems (NIDS) often rely on a set of rules to identify

abnormal behaviors connected to possible intrusions in computing systems.

Interpreter Style.

The core feature of the interpreter style is the presence of an engine that is used to interpret a

pseudo-program expressed in a format acceptable for the interpreter. The interpretation of the

pseudo-program constitutes the execution of the program itself. Systems modeled according to

this style exhibit four main components: the interpretation engine that executes the core activity

of this style, an internal memory that contains the pseudo-code to be interpreted, a representation

of the current state of the engine, and a representation of the current state of the program being

executed.
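A very small sketch of the interpreter style is given below: a Python engine interprets a pseudo-program held in internal memory, while a stack and a program counter represent the state of the program being executed. The three-instruction language used here is invented for illustration only.

def interpret(program):
    # Internal memory: the pseudo-code to be interpreted.
    stack, pc = [], 0          # state of the program and of the engine
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "PRINT":
            print(stack[-1])
        pc += 1                # advance the engine to the next instruction

# Pseudo-program: computes 2 + 3 and prints the result.
interpret([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)])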

Virtual machine architectural styles are characterized by an indirection layer between

applications and the hosting environment. This design has the major advantage of decoupling

applications from the underlying hardware and software environment, but at the same time it

introduces some disadvantages, such as a slowdown in performance.


Call & return architectures

This category identifies all systems that are organised into components mostly connected

together by method calls. The activity of systems modeled in this way is characterized by a chain

of method calls whose overall execution and composition identify the execution of one or more

operations.

Top-Down Style.

This architectural style is quite representative of systems developed with imperative

programming, which leads to a divide-and-conquer approach to problem resolution. Systems

developed according to this style are composed of one large main program that accomplishes its

tasks by invoking subprograms or procedures. The components in this style are procedures and

subprograms, and connections are method calls or invocations. The calling program

passes information with parameters and receives data from return values or parameters. Method

calls can also extend beyond the boundary of a single process by leveraging techniques for

remote method invocation, such as remote procedure call (RPC) and all its descendants.

Object-Oriented Style.

This architectural style encompasses a wide range of systems that have been designed and

implemented by leveraging the abstractions of object-oriented programming (OOP). Systems are

specified in terms of classes and implemented in terms of objects. Classes define the type of

components by specifying the data that represent their state and the operations that can be done

over these data. One of the main advantages over the top-down style is that there is a coupling

between data and operations used to manipulate them.

Layered Style.

The layered system style allows the design and implementation of software systems in terms of

layers, which provide a different level of abstraction of the system. Each layer generally interacts with at most two other layers: the one that provides a lower abstraction level and the one that provides a higher abstraction level. Specific protocols and interfaces define how adjacent layers interact.

It is possible to model such systems as a stack of layers, one for each level of abstraction.


Architectural styles based on independent components

This class of architectural style models systems in terms of independent components that have

their own life cycles, which interact with each other to perform their activities. There are two

major categories within this class—communicating processes and event systems—which

differentiate in the way the interaction among components is managed.

Communicating Processes.

In this architectural style, components are represented by independent processes that leverage

IPC facilities for coordination management. This is an abstraction that is quite suitable to

modeling distributed systems that, being distributed over a network of computing nodes, are

necessarily composed of several concurrent processes. Each of the processes provides other

processes with services and can leverage the services exposed by the other processes.

Event Systems.

In this architectural style, the components of the system are loosely coupled and connected. In

addition to exposing operations for data and state manipulation, each component also publishes

(or announces) a collection of events with which other components can register. In general, other

components provide a callback that will be executed when the event is activated. During the

activity of a component, a specific runtime condition can activate one of the exposed events, thus

triggering the execution of the callbacks registered with it.

The advantage of such an architectural style is that it fosters the development of open

systems: new modules can be added and easily integrated into the system as long as they have compliant interfaces for registering to the events.
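The sketch below illustrates the event-system style with a tiny in-process event bus in Python; the event name and handlers are invented for illustration and are not part of the original text.

class EventBus:
    def __init__(self):
        self.handlers = {}                 # event name -> registered callbacks

    def register(self, event, callback):
        # Components register callbacks for the events they are interested in.
        self.handlers.setdefault(event, []).append(callback)

    def publish(self, event, payload):
        # When an event is activated, every registered callback is executed.
        for callback in self.handlers.get(event, []):
            callback(payload)

bus = EventBus()
bus.register("order_placed", lambda order: print("billing:", order))
bus.register("order_placed", lambda order: print("shipping:", order))
bus.publish("order_placed", {"id": 42})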

1.5.3.3 System architectural styles

System architectural styles cover the physical organization of components and processes over a

distributed infrastructure. They provide a set of reference models for the deployment of such

systems and help engineers not only have a common vocabulary in describing the physical layout

of systems but also quickly identify the major advantages and drawbacks of a given deployment

and whether it is applicable for a specific class of applications.


Client/server

This architecture is very popular in distributed computing and is suitable for a wide variety of

applications. As depicted in Figure 2.12, the client/server model features two major components: a server and a client. These two components interact with each other through a network connection using a given protocol. The communication is unidirectional: the client issues a request to the server, and after processing the request the server returns a response. There could be multiple client components issuing requests to a server that is passively waiting for them. Hence, the important operations in the client/server paradigm are request, accept (client side), and listen and response (server side).

• Thin-client model. In this model, the load of data processing and transformation is put on the

server side, and the client has a light implementation that is mostly concerned with retrieving and

returning the data it is being asked for, with no considerable further processing.

• Fat-client model. In this model, the client component is also responsible for processing and

transforming the data before returning it to the user, whereas the server features a relatively light

implementation that is mostly concerned with the management of access to the data.

Figure 2.12: Client/server model

Peer-to-peer

The peer-to-peer model, depicted in Figure 2.13, introduces a symmetric architecture in

which all the components, called peers, play the same role and incorporate both client

and server capabilities of the client/server model. More precisely, each peer acts as a


server when it processes requests from other peers and as a client when it issues

requests to other peers. With respect to the client/server model that partitions the responsibilities of the IPC between server and clients, the peer-to-peer model attributes the same responsibilities to each component.

An interesting example of a peer-to-peer architecture is the Skype network.

The system architectural styles presented in this section constitute a reference model

that is further enhanced or diversified according to the specific needs of the application

to be designed and implemented. The server and client abstraction can be used in some

cases to model the macro scale or the micro scale of the systems.


1.5.4 Models for inter process communication

Distributed systems are composed of a collection of concurrent processes interacting with each

other by means of a network connection. Therefore, IPC is a fundamental aspect of distributed

systems design and implementation. IPC is used to either exchange data and information or

coordinate the activity of processes. IPC is what ties together the different components of a

distributed system, thus making them act as a single system.
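As a small sketch of message-based IPC, the example below uses Python's multiprocessing Queue to pass a message between two concurrent processes; the message contents are illustrative only.

from multiprocessing import Process, Queue

def worker(queue):
    # A concurrent process coordinates with its peer only by passing messages.
    queue.put("result from worker")

if __name__ == "__main__":
    queue = Queue()                        # the IPC channel
    p = Process(target=worker, args=(queue,))
    p.start()
    print(queue.get())                     # receive the message from the other process
    p.join()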


2.Introduction to Cloud Computing:

Cloud Computing provides us with a means by which we can access applications as utilities over

the Internet. It allows us to create, configure, and customize applications online.

What is Cloud?

The term Cloud refers to a Network or Internet. In other words, we can say that Cloud is

something which is present at a remote location. Cloud can provide services over public or private networks, i.e., WAN, LAN or VPN. Applications such as e-mail, web conferencing, and customer relationship management (CRM) all run in the cloud.

What is Cloud Computing?

Cloud Computing refers to manipulating, configuring, and accessing the applications online. It

offers online data storage, infrastructure, and applications.

We need not install a piece of software on our local PC; this is how cloud computing overcomes platform dependency issues. Hence, Cloud Computing makes our business applications mobile and collaborative.


CLOUD COMPUTING IN A NUTSHELL:

Cloud computing has been coined as an umbrella term to describe a category of sophisticated on-

demand computing services initially offered by commercial providers, such as Amazon, Google,

and Microsoft. It denotes a model on which a computing infrastructure is viewed as a “cloud,”

from which businesses and individuals access applications from anywhere in the world on

demand. The main principle behind this model is offering computing, storage, and software “as a

service.”

ROOTS OF CLOUD COMPUTING:

We can track the roots of cloud computing by observing the advancement of several

technologies, especially in hardware (virtualization, multi-core chips), Internet technologies

(Web services, service-oriented architectures, Web 2.0), distributed computing (clusters, grids),

and systems management (autonomic computing, data center automation).

Computing delivered as a utility can be defined as “on demand delivery of infrastructure,

applications, and business processes in a security-rich, shared, scalable, and standards-based computer environment over the Internet for a fee”.

This model brings benefits to both consumers and providers of IT services. Consumers can attain a reduction in IT-related costs by choosing to obtain cheaper services from external providers as opposed to heavily investing in IT infrastructure and personnel hiring. The “on-demand”

component of this model allows consumers to adapt their IT usage to rapidly increasing or

unpredictable computing needs.


3.Cloud Computing Architecture:

Cloud Computing is trending in today’s technology driven world. With the advantages of

flexibility, storage, sharing and easy accessibility, Cloud is being used by major players in IT.

Apart from companies, individuals also use Cloud technologies for various daily activities. From Google Drive for storage, to Skype for chat, to Picasa web albums, we use Cloud Computing platforms extensively. Cloud Computing is a service provided via virtual networks, especially the World Wide Web.

Cloud Computing architecture refers to the various components and sub-components of cloud

that constitute the structure of the system. Broadly, this architecture can be classified into two

sections:

-Front-end

-Back-end

The two ends are connected through a network, usually the Internet. The following diagram

shows the graphical view of cloud computing architecture:


The front-end and back-end are connected to each other via a virtual network or the internet.

There are other components like Middleware, Cloud Resources, etc, that are part of the Cloud

Computing architecture.

Front-end is the side that is visible to the client, customer or the user. It includes the client’s

computer system or network that is used for accessing the cloud system. Different Cloud

Computing systems have different user interfaces. For email programs, support is driven from

web browsers like Firefox, Chrome, and Internet Explorer.

Back-end is the side used by the service provider. It includes various servers, computers, data

storage systems, and virtual machines that together constitute the cloud of computing services.

This system can include different types of computer programs. Each application in this system is

managed by its own dedicated server. The back-end side has some responsibilities to fulfill for

the client.

-To provide security mechanisms, traffic control and protocols

-To employ protocols that connect networked computers for communication

Protocols

A single central server is used to manage the entire Cloud Computing system. This server is

responsible for monitoring traffic and making each end run smoothly without any disruption.

This process is followed with a fixed set of rules called Protocols. Also, a special set of software

called Middleware is used to perform the processes. Middleware connects networked computers

to each other.

Depending on the client’s demand, adequate storage space is provided by the Cloud Computing

service provider. While some companies require a huge number of digital storage devices, others do not require as much space. The Cloud Computing service provider usually has capacity for twice the amount of storage space that is required by the client. This is to keep a copy of the client's data secure in the event of a system breakdown. Saving copies of data for backup is known as

Redundancy.


4.Cloud Concepts and Technologies:

This Chapter Covers

Concepts and enabling technologies of cloud computing including:

Virtualization

Load balancing

Scalability & Elasticity

Deployment

Replication

Monitoring

4.1 Virtualization

Virtualization refers to the partitioning of the resources of a physical system (such as computing,

storage, network and memory) into multiple virtual resources. Virtualization is the key

enabling technology of cloud computing and allows pooling of resources. In cloud computing,

resources are pooled to serve multiple users using multi-tenancy. Multi-tenant aspects of the

cloud allow multiple users to be served by the same physical hardware. Users are assigned

virtual resources that run on top of the physical resources. Figure 2.1 shows the architecture

of a virtualization technology in cloud computing. The physical resources such as computing,

storage memory and network resources are virtualized. The virtualization layer partitions the

physical resources into multiple virtual machines. The virtualization layer allows multiple

operating system instances to run concurrently as virtual machines on the same underlying

physical resources.

Hypervisor

The virtualization layer consists of a hypervisor or a virtual machine monitor (VMM). The

hypervisor presents a virtual operating platform to a guest operating system (OS). There are

two types of hypervisors, as shown in Figures 2.2 and 2.3. Type-1 hypervisors or native

hypervisors run directly on the host hardware and control the hardware and monitor the guest

operating systems. Type 2 hypervisors or hosted hypervisors run on top of a conventional

(main/host) operating system and monitor the guest operating systems.

Guest OS

A guest OS is an operating system that is installed in a virtual machine in addition to the host or

main OS. In virtualization, the guest OS can be different from the host OS.

Various forms of virtualization approaches exist:


Full Virtualization

In full virtualization, the virtualization layer completely decouples the guest OS from the

underlying hardware. The guest OS requires no modification and is not aware that it is being

virtualized. Full virtualization is enabled by direct execution of user requests and binary

translation of OS requests. Figure 2.4 shows the full virtualization approach.

Para-Virtualization

In para-virtualization, the guest OS is modified to enable communication with the hypervisor to

improve performance and efficiency. The guest OS kernel is modified to replace nonvirtualizable

instructions with hypercalls that communicate directly with the virtualization layer hypervisor.

Figure 2.5 shows the para-virtualization approach.

Hardware Virtualization

Hardware assisted virtualization is enabled by hardware features such as Intel's Virtualization

Technology (VT-x) and AMD's AMD-V. In hardware assisted virtualization, privileged and

sensitive calls are set to automatically trap to the hypervisor. Thus, there is no need for either binary

translation or para-virtualization.

4.2 Load Balancing

One of the important features of cloud computing is scalability. Cloud computing resources can

be scaled up on demand to meet the performance requirements of applications. Load balancing

distributes workloads across multiple servers to meet the application workloads. The goals of load balancing techniques are to maximize the utilization of resources, minimize response times, and maximize throughput. Load balancing distributes the incoming user requests

across multiple resources. With load balancing, cloud-based applications can achieve high

availability and reliability. Since multiple resources under a load balancer are used to serve the

user requests, in the event of failure of one or more of the resources, the load balancer can

automatically reroute the user traffic to the healthy resources. To the end user accessing a cloud-

based application, a load balancer makes the pool of servers under the load balancer appear as a

single server with high computing capacity.

Round Robin

In round robin load balancing, the servers are selected one by one to serve the incoming requests

in a non-hierarchical circular fashion with no priority assigned to a specific server.

Weighted Round Robin


In weighted round robin load balancing, servers are assigned weights. The incoming

requests are proportionally routed using a static or dynamic ratio of respective weights.

Low Latency

In low latency load balancing, the load balancer monitors the latency of each server. Each

incoming request is routed to the server which has the lowest latency.

Least Connections

In least connections load balancing, the incoming requests are routed to the server with the least

number of connections.

Priority

In priority load balancing, each server is assigned a priority. The incoming traffic is routed to the

highest priority server as long as the server is available. When the highest priority server fails,

the incoming traffic is routed to a server with a lower priority.

Overflow

Overflow load balancing is similar to priority load balancing. When the incoming requests to

highest priority server overflow, the requests are routed to a lower priority server.
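A minimal sketch of two of these selection policies (round robin and least connections) is shown below in Python; the server names and connection counts are invented for illustration.

import itertools

servers = ["server-a", "server-b", "server-c"]

# Round robin: servers are selected one by one in circular order.
round_robin = itertools.cycle(servers)
for _ in range(5):
    print("round robin ->", next(round_robin))

# Least connections: route to the server with the fewest active connections.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}
target = min(active_connections, key=active_connections.get)
print("least connections ->", target)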

Figure 2.6 depicts these various load balancing approaches. For session based applications, an

important issue to handle during load balancing is the persistence of multiple requests from a

particular user session. Since load balancing can route successive requests from a user session to

different servers, maintaining the state or the information of the session is important. Three

commonly used persistence approaches are described below:

Sticky sessions

In this approach all the requests belonging to a user session are routed to the same server. These

sessions are called sticky sessions. The benefit of this approach is that it makes session

management simple. However, a drawback of this approach is that if a server fails, all the sessions belonging to that server are lost, since no automatic failover is possible.

Session Database

In this approach, all the session information is stored externally in a separate session database,

which is often replicated to avoid a single point of failure. Though this approach involves the additional overhead of storing the session information, unlike the sticky session approach it allows automatic failover.


Browser cookies

In this approach, the session information is stored on the client side in the form of browser

cookies. The benefit of this approach is that it makes the session management easy and has the

least amount of overhead for the load balancer.

URL re-writing

In this approach, a URL re-write engine stores the session information by modifying the URLs

on the client side. Though this approach avoids overhead on the load balancer, a drawback is that the amount of session information that can be stored is limited.

4.3 Scalability & Elasticity

Multi-tier applications such as e-Commerce, social networking, business-to-business, etc. can

experience rapid changes in their traffic. Each website has a different traffic pattern which is

determined by a number of factors that are generally hard to predict beforehand. Modern web

applications have multiple tiers of deployment with varying number of servers in each tier.

Capacity planning is an important task for such applications. Capacity planning involves

determining the right sizing of each tier of the deployment of an application in terms of the

number of resources and the capacity of each resource. Capacity planning may be for computing,

storage, memory or network resources. Figure 2.7 shows the cost versus capacity curves for

traditional and cloud approaches.

Traditional approaches for capacity planning are based on predicted demands for applications

and account for worst case peak loads of applications. When the workloads of applications

increase, the traditional approaches have been either to scale up or scale out.

4.4 Deployment

Deployment prototyping can help in making deployment architecture design choices. By

comparing performance of alternative deployment architectures, deployment prototyping can

help in choosing the best and most cost effective deployment architecture that can meet the

application performance requirements. Deployment design is an iterative process that involves

the following steps:

Deployment Design

In this step the application deployment is created with various tiers as specified in the

deployment configuration. The variables in this step include the number of servers in each tier,

computing, memory and storage capacities of servers, server interconnection, load balancing and

replication strategies. Deployment is created by provisioning the cloud.

Performance Evaluation


Once the application is deployed in the cloud, the next step in the deployment lifecycle is to

verify whether the application meets the performance requirements with the deployment. This

step involves monitoring the workload on the application and measuring various workload

parameters such as response time and throughput. In addition to this, the utilization of servers

(CPU, memory, disk, I/O, etc.) in each tier is also monitored.

Deployment Refinement

After evaluating the performance of the application, deployments are refined so that the

application can meet the performance requirements. Various alternatives can exist in this step

such as vertical scaling (or scaling up), horizontal scaling (or scaling out), alternative server interconnections, and alternative load balancing and replication strategies, for instance.

4.5 Replication

Replication is used to create and maintain multiple copies of the data in the cloud. Replication of

data is important for practical reasons such as business continuity and disaster recovery.

In the event of data loss at the primary location, organizations can continue to operate their

applications from secondary data sources. With real-time replication of data, organizations can

achieve faster recovery from failures. Traditional business continuity and disaster recovery

approaches don't provide efficient, cost effective and automated recovery of data. Cloud based

data replication approaches provide replication of data in multiple locations, automated recovery,

low recovery point objective (RPO) and low recovery time objective (RTO). Cloud enables rapid

implementation of replication solutions for disaster recovery for small and medium enterprises

and large organizations. With cloud-based data replication organizations can plan for disaster

recovery without making any capital expenditures on purchasing, configuring and managing

secondary site locations. Cloud provides affordable replication solutions with pay-per-use/pay-

as-you-go pricing models. There are three types of replication approaches as shown in Figure 2.9

and described as follows:

• Array based replication

• Host based replication

• Network based replication

4.6 Monitoring

Cloud resources can be monitored by monitoring services provided by the cloud service

providers. Monitoring services allow cloud users to collect and analyze the data on various

monitoring metrics. Figure 2.10 shows a generic architecture for a cloud monitoring service. A

monitoring service collects data on various system and application metrics from the cloud

computing instances. Monitoring services provide various pre-defined metrics. Users can also


define their custom metrics for monitoring the cloud resources. Users can define various actions

based on the monitoring data, for example, auto-scaling a cloud deployment when the CPU

usage of monitored resources becomes high. Monitoring services also provide various statistics

based on the monitoring data collected. Table 2.4 lists the commonly used monitoring metrics.
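The following sketch illustrates the kind of action described above: a monitoring loop checks a CPU-usage metric and triggers a scale-out when a threshold is crossed. The get_cpu_usage and add_instance functions are hypothetical placeholders, not part of any real monitoring API.

import random
import time

def get_cpu_usage():
    # Hypothetical placeholder for a call to a cloud monitoring service.
    return random.uniform(0, 100)

def add_instance():
    # Hypothetical placeholder for a call that provisions one more server.
    print("scaling out: launching an additional instance")

CPU_THRESHOLD = 80.0   # auto-scaling trigger (percent)

for _ in range(5):                 # a few monitoring cycles for illustration
    cpu = get_cpu_usage()
    print(f"monitored CPU usage: {cpu:.1f}%")
    if cpu > CPU_THRESHOLD:
        add_instance()
    time.sleep(1)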

4.7 Software Defined Networking

Software-Defined Networking (SDN) is a networking architecture that separates the control

plane from the data plane and centralizes the network controller. Figure 2.11 shows the

conventional network architecture built with specialized hardware (switches, routers, etc.).

Network devices in conventional network architectures are getting exceedingly complex with the

increasing number of distributed protocols being implemented and the use of proprietary

hardware and interfaces. In the conventional network architecture the control plane and data

plane are coupled. Control plane is the part of the network that carries the signaling and routing

message traffic while the data plane is the part of the network that carries the payload data

traffic.

The limitations of the conventional network architectures are as follows:

• Complex Network Devices: Conventional networks are getting increasingly complex with

more and more protocols being implemented to improve link speeds and reliability.

Interoperability is limited due to the lack of standard and open interfaces. Network devices use

proprietary hardware and software and have slow product lifecycles limiting innovation. The

conventional networks were well suited for static traffic patterns and had a large number of

protocols designed for specific applications. With the emergence of cloud computing and

proliferation of internet access devices, the traffic patterns are becoming more and more

dynamic. Due to the complexity of conventional network

4.8 Network Function Virtualization

Network Function Virtualization (NFV) is a technology that leverages virtualization to

consolidate the heterogeneous network devices onto industry standard high volume servers,

switches and storage. NFV is complementary to SDN as NFV can provide the infrastructure on

which SDN can run. NFV and SDN are mutually beneficial to each other but not dependent.

Figure 2.16 shows the NFV architecture, as being standardized by the European Telecommunications Standards Institute (ETSI). Key elements of the NFV architecture are as

follows:

Virtualized Network Function (VNF): VNF is a software implementation of a

network function which is capable of running over the NFV Infrastructure (NFVI).


NFV Infrastructure (NFVI): NFVI includes compute, network and storage

resources that are virtualized.

NFV Management and Orchestration: NFV Management and Orchestration

focuses on all virtualization-specific management tasks and covers the orchestration and

lifecycle management of physical and/or software resources that support the

infrastructure virtualization, and the lifecycle management of VNFs.

NFV comprises network functions implemented in software that run on virtualized resources in the cloud. NFV enables a separation of the network functions from the hardware on which they are implemented.

5.Cloud Services and Platforms:

There are various types of cloud services, and for each category, example services are provided by various cloud service providers including Amazon, Google and Microsoft.

5.1 Compute Services

Compute services provide dynamically scalable compute capacity in the cloud. Compute

resources can be provisioned on-demand in the form of virtual machines. Virtual machines can

be created from standard images provided by the cloud service provider (e.g. Ubuntu image,

Windows server image, etc.) or custom images created by the users. A machine image is a

template that contains a software configuration (operating system, application server, and

applications). Compute services can be accessed from the web consoles of these services that

provide graphical user interfaces for provisioning, managing and monitoring these services.

Cloud service providers also provide APIs for various programming languages (such as Java,

Python, etc.) that allow developers to access and manage these services programmatically.

Features

• Scalable: Compute services allow rapidly provisioning as many virtual machine instances

as required. The provisioned capacity can be scaled-up or down based on the workload levels.

Auto-scaling policies can be defined for compute services that are triggered when the

monitored metrics (such as CPU usage, memory usage, etc.) go above pre-defined thresholds.


• Flexible: Compute services give a wide range of options for virtual machines with multiple instance types, operating systems, zones/regions, etc.

• Secure: Compute services provide various security features that control the access to the virtual machine instances, such as security groups, access control lists, and network firewalls. Users can securely connect to the instances with SSH using authentication mechanisms such as security certificates and keypairs.

• Cost effective: Cloud service providers offer various billing options such as on-demand instances which are billed per hour, reserved instances which are reserved after a one-time initial payment, spot instances for which users can place bids, etc.
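As an illustration of provisioning a virtual machine programmatically, the sketch below assumes Amazon EC2 accessed through the boto3 Python SDK; the AMI ID, instance type and region are placeholder values, and valid AWS credentials are assumed to be configured.

import boto3

# Placeholder region; a real AMI ID for that region must be supplied.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # machine image (template with OS and software)
    InstanceType="t2.micro",           # instance type determines CPU and memory capacity
    MinCount=1,
    MaxCount=1,
)

print(response["Instances"][0]["InstanceId"])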

5.2 Storage Services

Cloud storage services allow storage and retrieval of any amount of data, at any time, from anywhere on the web. Most cloud storage services organize data into buckets or containers. Buckets or containers store objects, which are individual pieces of data.

Features

Scalability: Cloud storage services provide high capacity and scalability. Objects up to several terabytes in size can be uploaded, and multiple buckets/containers can be created on cloud storages.

Replication: When an object is uploaded it is replicated at multiple facilities and/or on

multiple devices within each facility.

Access Policies: Cloud storage services provide several security features such as Access Control Lists (ACLs), bucket/container level policies, etc. ACLs can be used to selectively grant access permissions on individual objects. Bucket/container level policies can also be defined to allow or deny permissions across some or all of the objects within a single bucket/container.

Encryption: Cloud storage services provide Server Side Encryption (SSE) options to encrypt

all data stored in the cloud storage.

Consistency: Strong data consistency is provided for all upload and delete operations. Therefore,

any object that is uploaded can be immediately downloaded after the upload is complete.
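As a small illustration of working with object storage, the sketch below assumes Amazon S3 accessed through the boto3 SDK; the bucket name and object key are placeholders and the bucket is assumed to already exist.

import boto3

s3 = boto3.client("s3")

# Upload an object into a bucket (placeholder names), then read it back.
s3.put_object(Bucket="example-bucket", Key="notes/unit1.txt", Body=b"cloud storage demo")

obj = s3.get_object(Bucket="example-bucket", Key="notes/unit1.txt")
print(obj["Body"].read().decode())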


5.3 Database Services

Cloud database services allow you to set-up and operate relational or non-relational databases in

the cloud. The benefit of using cloud database services is that it relieves the application

developers from the time consuming database administration tasks. Popular relational

databases provided by various cloud service providers include MySQL, Oracle, SQL Server,

etc. The non-relational (No-SQL) databases provided by cloud service providers are mostly

proprietary solutions. No-SQL databases are usually fully-managed and deliver seamless

throughput and scalability. The characteristics of relational and non-relational databases are

described in Chapter 5.

Features

Scalability: Cloud database services allow provisioning as much compute and storage resources

as required to meet the application workload levels. Provisioned capacity can be scaled-up or

down. For read-heavy workloads, read-replicas can be created.

Reliability: Cloud database services are reliable and provide automated backup and snapshot

options.

Performance: Cloud database services provide guaranteed performance with options such as

guaranteed input/output operations per second (IOPS) which can be provisioned upfront.

Security: Cloud database services provide several security features to restrict the access to the

database instances and stored data, such as network firewalls and authentication mechanisms.

5.4 Application Services

In this section you will learn about various cloud application services such as application

runtimes and frameworks, queuing services, email services, notification services and media

services.

5.5 Content Delivery Services

Cloud-based content delivery service include Content Delivery Networks (CDNs). A CDN is a

distributed system of servers located across multiple geographic locations to serve content to


end-users with high availability and high performance. CDNs are useful for serving static content

such as text, images, scripts, etc., and streaming media. CDNs have a number of edge locations

deployed in multiple locations, often over multiple backbones. Requests for static or streaming

media content that is served by a CDN are directed to the nearest edge location. CDNs cache the

popular content on the edge servers which helps in reducing bandwidth costs and improving

response times.

5.6 Analytics Services

Cloud-based analytics services allow analyzing massive data sets stored in the cloud either in

cloud storages or in cloud databases using programming models such as MapReduce. Using

cloud analytics services, applications can perform data-intensive tasks such as data

mining, log file analysis, machine learning, web indexing, etc.

5.7 Deployment & Management Services

Cloud-based deployment & management services allow you to easily deploy and manage

applications in the cloud. These services automatically handle deployment tasks such as capacity provisioning, load balancing, auto-scaling, and application health monitoring.

5.8 Identity & Access Management Services

Identity & Access Management (IDAM) services allow managing the authentication and

authorization of users to provide secure access to cloud resources. IDAM services are useful for

organizations which have multiple users who access the cloud resources. Using IDAM services you

can manage user identifiers, user permissions, security credentials and access keys.

5.9 Open Source Private Cloud Software

In the previous sections you learned about popular public cloud platforms. This section covers

open source cloud software that can be used to build private clouds.

5.9.1 CloudStack

Apache CloudStack is open source cloud software that can be used for creating private cloud offerings. CloudStack manages the network, storage, and compute nodes that make up a cloud infrastructure. A CloudStack installation consists of a Management Server and the cloud infrastructure that it manages. The cloud infrastructure can be as simple as one host running the hypervisor or a large cluster of hundreds of hosts. The Management Server allows you to configure and manage the cloud resources. Figure 3.21 shows the architecture of CloudStack, which is centered on the Management Server. The Management Server manages one or more zones, where each zone is typically a single datacenter. Each zone has one or more pods. A pod is a rack of hardware comprising a switch and one or more clusters. A cluster consists of one or more hosts and a primary storage. A host is a compute node that runs guest virtual machines. The primary storage of a cluster stores the disk volumes for all the virtual machines running on the hosts in that cluster.
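The zone / pod / cluster / host hierarchy described above can be summarised with a small data-model sketch. This only mirrors the terminology in the text; it is not CloudStack's actual API, and the names and storage URL are made up.

# Data-model sketch of the CloudStack hierarchy described above:
# zone -> pods -> clusters -> hosts, with primary storage attached per cluster.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Host:                       # compute node that runs guest virtual machines
    name: str

@dataclass
class Cluster:
    hosts: List[Host]
    primary_storage: str          # stores disk volumes for VMs on these hosts

@dataclass
class Pod:                        # a rack of hardware: a switch plus clusters
    clusters: List[Cluster]

@dataclass
class Zone:                       # typically a single datacenter
    name: str
    pods: List[Pod] = field(default_factory=list)

zone = Zone("dc-1", [Pod([Cluster([Host("h1"), Host("h2")], "nfs://storage/vol1")])])
print(len(zone.pods[0].clusters[0].hosts))   # -> 2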

5.9.2 Eucalyptus

Eucalyptus is open source cloud software for building private and hybrid clouds that are compatible with Amazon Web Services (AWS) APIs. The architecture of Eucalyptus has components at the node, cluster and cloud levels. The Node Controller (NC) hosts the virtual machine instances and manages the virtual network endpoints. The cluster level (availability zone) consists of three components - Cluster Controller (CC), Storage Controller (SC) and VMware Broker. The CC manages the virtual machines and is the front-end for a cluster. The SC manages the Eucalyptus block volumes and snapshots for the instances within its specific cluster. The SC is equivalent to AWS Elastic Block Store (EBS). The VMware Broker is an optional component that provides an AWS-compatible interface for VMware environments. At the cloud level there are two components - Cloud Controller (CLC) and Walrus. The CLC provides an administrative interface for cloud management and performs high-level resource scheduling, system accounting, authentication and quota management.

5.9.3 OpenStack

OpenStack is a cloud operating system comprising a collection of interacting services that control computing, storage, and networking resources. The OpenStack compute service (called nova-compute) manages networks of virtual machines running on nodes, providing virtual servers on demand. The network service (called nova-network) provides connectivity between the interfaces of other OpenStack services. The volume service (Cinder) manages storage volumes for virtual machines. The object storage service (Swift) allows users to store and retrieve files. The identity service (Keystone) provides authentication and authorization for other services. The image registry (Glance) acts as a catalog and repository for virtual machine images. The OpenStack scheduler (nova-scheduler) maps the nova-API calls to the appropriate OpenStack components. The scheduler takes the virtual machine requests from the queue and determines where they should run. The messaging service (RabbitMQ) acts as a central node for message passing between daemons. Orchestration activities such as running an instance are performed by the nova-api service, which accepts and responds to end-user compute API calls. The OpenStack dashboard (called Horizon) provides a web-based interface for managing OpenStack services.
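A minimal sketch of driving these services programmatically, assuming the openstacksdk Python client and an existing cloud profile named "mycloud" in clouds.yaml; the image, flavor and network IDs are placeholders for whatever exists in a real deployment.

# Minimal openstacksdk sketch: authenticate (Keystone) and request a virtual
# server from the compute service (Nova). IDs and names are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")        # credentials from clouds.yaml

server = conn.compute.create_server(
    name="course-demo-vm",
    image_id="IMAGE-UUID",                       # image from the Glance registry
    flavor_id="FLAVOR-ID",                       # hardware profile
    networks=[{"uuid": "NETWORK-UUID"}],         # network to attach
)
server = conn.compute.wait_for_server(server)    # block until it becomes ACTIVE
print(server.status)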

6.Cloud Models:

Cloud Service Models

Cloud computing services are offered to users in different forms. NIST defines at least three service models as follows:

Infrastructure-as-a-Service (IaaS)

IaaS provides the users the capability to provision computing and storage resources. These resources are provided to the users as virtual machine instances and virtual storage. Users can start, stop, configure and manage the virtual machine instances and virtual storage. Users can deploy operating systems and applications of their choice on the virtual resources provisioned in the cloud. The cloud service provider manages the underlying infrastructure. Virtual resources provisioned by the users are billed based on a pay-per-use paradigm. Common metering metrics used are the number of virtual machine hours used and/or the amount of storage space provisioned.
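The pay-per-use metering described above reduces to simple arithmetic over the metered quantities. The hourly and per-GB rates in this sketch are invented for illustration and are not any provider's price list.

# Pay-per-use metering sketch: bill by virtual machine hours and provisioned
# storage. Rates are invented for illustration only.
def monthly_bill(vm_hours, storage_gb, rate_per_vm_hour=0.10, rate_per_gb_month=0.05):
    return vm_hours * rate_per_vm_hour + storage_gb * rate_per_gb_month

# Example: one VM running 24x7 for a 30-day month with 200 GB of storage.
print(monthly_bill(vm_hours=24 * 30, storage_gb=200))   # -> 82.0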

Platform-as-a-Service (PaaS)

PaaS provides the users the capability to develop and deploy applications in the cloud using the development tools, application programming interfaces (APIs), software libraries and services provided by the cloud service provider. The cloud service provider manages the underlying cloud infrastructure including servers, network, operating systems and storage. The users themselves are responsible for developing, deploying, configuring and managing applications on the cloud infrastructure.

Software-as-a-Service (SaaS)

SaaS provides the users a complete software application, or the user interface to the application itself. The cloud service provider manages the underlying cloud infrastructure including servers, network, operating systems, storage and application software, and the user is unaware of the underlying architecture of the cloud. Applications are provided to the user through a thin client interface (e.g., a browser). SaaS applications are platform independent and can be accessed from various client devices such as workstations, laptops, tablets and smartphones, running different operating systems. Since the cloud service provider manages both the application and data, the users are able to access the applications from anywhere.


Cloud Deployment Models:

NIST also defines four cloud deployment models as follows:

Public cloud

In the public cloud deployment model, cloud services are available to the general public or a

large group of companies. The cloud resources are shared among different users (individuals,

large organizations, small and medium enterprises and governments). The cloud services are

provided by a third-party cloud provider. Public clouds are best suited for users who want to use

cloud infrastructure for development and testing of applications and host applications in the

cloud to serve large workloads, without upfront investments in IT infrastructure.

Private cloud

In the private cloud deployment model, cloud infrastructure is operated for exclusive use of a

single organization. Private cloud services are dedicated for a single organization. Cloud

infrastructure can be set up on-premise or off-premise and may be managed internally or by a

third-party. Private clouds are best suited for applications where security is very important and

organizations that want to have very tight control over their data.

Hybrid cloud

The hybrid cloud deployment model combines the services of multiple clouds (private or public).

The individual clouds retain their unique identities but are bound by standardized or proprietary

technology that enables data and application portability. Hybrid clouds are best suited for

organizations that want to take advantage of secured application and data hosting on a private

cloud, and at the same time benefit from cost savings by hosting shared applications and data in

public clouds.

Community cloud

In the community cloud deployment model, the cloud services are shared by several orga-

nizations that have the same policy and compliance considerations. Community clouds are best

suited for organizations that want access to the same applications and data, and want the cloud costs to be shared with a larger group.


7.CLOUD AS A SERVICE:

In today's economy many businesses are faced with challenges like:

● taking cost out of their infrastructure
● delivering new, innovative business services
● doing more with less
● rapidly changing their IT infrastructure
● optimizing IT resources and lowering cost
● adding rental-style capability to IT resource use


7.1 Gamut: a complete range of cloud solutions

7.2 Principal technologies

7.3 Cloud strategy

7.4 Cloud design and implementation using SOA

7.5 Conceptual cloud model

7.6 Cloud service demand

8.Cloud Solutions:

8.1 Introduction

The cloud environment presents:

A new opportunity to enhance the user experience by providing a broader communication path for reaching out to the user.

A way of providing a series of business services to the user via the application features.

Deploying an application to the cloud is somewhat different, since the deployment process is not done locally within the enterprise; it relies on a provisioned image and a series of deployment steps to deploy the application and validate the deployment. Development and testing environments are readily available within the cloud environment. The advantages of these environments, especially from a costing perspective, are numerous, as there is no need to purchase any servers within the normal enterprise environment. If a proof of concept (POC) is being developed and the project is cancelled, no software, hardware or development tools have to be purchased only to be thrown away later, as the cloud supports development and testing of applications.

8.1.1 Cloud Application Planning

The design and development of a cloud application requires many unique considerations:

Business functions
Application architecture
Security for cloud computing
Cloud delivery model
User experience
Development, testing, and runtime environment

Application architecture is selected through a criteria-based evaluation. From a security perspective, the key consideration is the enhancement of the existing security model, with data protection and isolation of the data from the other areas of the cloud environment. Encryption is one possibility to further enhance the security model, though the enterprise would not necessarily need to invoke that option.

8.1.2 Cloud Business and Operational Support Services (BSS & OSS)

Business Support Services (BSS) are the components that cloud operators use to run their business operations. Such operations include taking customer orders, managing customer data, etc.

Operational Support Services (OSS) are the computer systems used by cloud service providers - for example, network services and provisioning services.

BSS and OSS need to be externalized so that they can be moved to cloud environments.

8.2 Cloud Ecosystem

Bringing any cloud service to market requires corresponding pre-investment along with

respective metering and charging models in support of the corresponding business.

8.3 Cloud business process management


Business process management (BPM) governs an organization's cross-functional, customer-focused, end-to-end core business processes. Its objective is to direct and deploy resources from across the organization into efficient processes that create customer value. It focuses on driving overall bottom-line success by integrating verticals and optimizing core work. Examples include:

order-to-cash
integrated product development
integrated supply chain

This is what differentiates BPM from traditional functional management disciplines. In addition, intrinsic to BPM is the principle of continuous improvement, perpetually increasing value generation and sustaining market competitiveness (or dominance) of the organization. BPM clearly defines and aligns operations, organizations and information technology.

The cloud environment can help BPM in the following ways:

Integration of core processes: holistic, spanning cross-organizational functions and boundaries (height and breadth), and including both business and technology.

Continuous: based on longer periods of intervals pertaining to the cloud business, with continual improvement.

Cultural: cultural considerations of the organization and the geographical area are kept in mind at the time of due diligence of the requirements.

8.3.1 Identifying BPM Opportunities

The following exploratory set of questions might uncover opportunities for using the cloud to identify BPM opportunities:

What are the strategic value proposition and capabilities defined by the enterprise?
How do you manage core business processes?
How do your customers measure and assess the performance?

Cloud application development offerings provide:

Cloud application reference architecture
Unmatched experience developing high-performing, secure applications across a wide range of technologies of the cloud vendor
Unmatched application security expertise
Leadership in cloud-related technologies - multi-tenancy, virtualization, pervasive computing
Significant expertise with cloud business models
Ability to integrate a portfolio of related cloud services, e.g. Gmail

8.3.2 Cloud Technical Strategy

Cloud services enable users to build middleware clouds in their data centers and to utilize public clouds. The following cloud-enabled services are provided:

Infrastructure services
Platform services
Application services

A cloud strategy enables organizations to do the following:

. Build a middleware cloud in their data center
. Utilize public clouds, where it makes sense

It does so by providing support for cloud-enabled middleware services - infrastructure services, platform services and application services - serving both on-premise and public clouds.

8.3.3 Cloud Service Management

A service management system provides the visibility, control and automation needed for efficient cloud delivery in both public and private implementations:

. Enable policies to lower cost with provisioning
. Automated provisioning and deprovisioning speeds service delivery
. Provisioning policies allow release and reuse of assets
. Increase system administrator productivity

Cloud services are managed either by in-house teams or by cloud brokers. Every service-oriented approach needs a mechanism to enable discovery and end-point resolution. Registry/repository technology provides this where service delivery is inside the firewall. Cloud services delivered across firewalls need something similar - a third party that serves as a "service broker".

8.4 Cloud Service Management- Cloud Brokers

These cloud intermediaries will help companies choose the right platform, deploy applications across multiple clouds, and perhaps even provide cloud arbitrage services that allow end users to shift between platforms to capture the best pricing.

8.5 Cloud Service Management- Cloud Brokers - categories of opportunities

. Cloud service intermediaries: building services atop an existing cloud platform, such as additional security or management capabilities

. Cloud aggregation: deploying customer services over multiple cloud platforms

. Cloud service arbitrage: supplying flexibility and "opportunities & choices" and fostering competition between clouds

8.6 Cloud Stack

CloudStack, in this context, is a bundled offering that includes the hardware, software, and services needed to get started with cloud computing.

It includes all the elements in a service ecosystem. It has a self-service portal, it includes automation, and it tracks and controls all the resources.

It is completely integrated and includes a service, and on top of that, users can add additional services to do integration or other types of cloud work.

CloudStack is a pre-packaged private cloud offering that brings together the hardware, software and services needed to establish a private cloud, accelerating your selling efforts and effectiveness.

. The CloudStack solution is designed from client cloud implementation experience and integrates the service management software system with servers, storage, and services to enable a private cloud in the IT environment.

. CloudStack is "built for performance" and is based on architectures and configurations required by specific workloads.

. It enables the data center to accelerate the creation of services for a variety of workloads with a high degree of flexibility, reliability and resource optimization.

8.7 On-premise cloud orchestration & provisioning engine

An on-premise cloud orchestration and provisioning engine can be a bundled offer that includes the hardware, software and services one needs to get started with cloud computing.

Orchestration describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services.

It should include all the elements in a services ecosystem.

It must have a service portal, include automation, and track and control all resources.

8.8 Computing on Demand ( CoD)

On-demand computing is a necessity in today's enterprises. Virtualization helps in implementing on-demand computing. The cloud helps enterprises use resources without buying them. This enables an enterprise to transfer workloads outside when its own resources cannot support them, and to let others use its resources when they are idle.

8.9 Cloud Sourcing

Cloud sourcing is the end-to-end solution built with cloud technology, using public cloud infrastructure and platforms.

Cloud sourcing is a planned approach that comprises the whole service cycle of outsourcing the business on cloud principles, with the help of a strategized, connected cloud platform that matches the overall enterprise requirements.

9.Cloud Offerings:

9.1Introduction:

Information is pouring in faster than we can make sense of it. It is being authored by billions of people and flowing from a trillion intelligent devices, sensors and instrumented objects. With 80% of new data growth existing as unstructured content - from music files, to 3D images, to email keystrokes and more - the challenge is trying to pull it all together and make it useful.

Until now, organizations could not fully or quickly synthesize and interpret all the information out there; they had to make decisions based largely on instinct. But now there is software that can capture, organize and process all the data scattered throughout an organization, and turn it into actual intelligence. This enables organizations to make better business decisions.

9.2 ILM (Information Lifecycle Management) Objectives

Cost reductions: controlling demand for storage
Better system performance and personal productivity: doing the storage activities "right"
Increased effectiveness: doing the "right" storage activities

Ways to generate, enhance and sustain higher savings:

. Activities for gaining initial savings: reduce the amount of used storage as a result of an initial clean-up
. Activities for maximizing savings: reconfigure the current storage environment, effectively improving the available-to-raw utilization
. Activities for sustaining savings: develop a storage architecture governance model

ILM cost components

Operating cost categories: personnel, facilities, storage hardware maintenance, storage software maintenance, outages.
Investment cost categories: new hardware required, new software required, hardware refresh, transition services.


9.3 Information Management Points

Data

Information

ILM

Information Taxonomy

Information Classes

Value Driven data Placement

Storage Process

Storage service

Enterprise Class of Service ( CoS)

9.4 Information Management points

Storage service

Enterprise Class of Service ( CoS)

Storage Tier

Tiered Storage Infrastructure

Utility-based Service Delivery

9.5 Cloud Analytics

Cloud analytics is a new offering in the new era of cloud computing. It will help the consulting domain and will ensure better results. It provides users with better forecasting techniques to analyze and optimize the service lines, and provides a higher level of accuracy.


9.6 Cloud Analytics

It also helps to apply analytics principles and best practices to analyze different business consequences and achieve new levels of optimization. It can combine complex analytics with newer software platforms and lead towards predictable business outcomes from every business insight.

9.6.1 Cloud Business Analytics Competencies

Cloud analytics is supported by different types of competencies:

1. Cloud business analytics strategy, analytics and optimization: provides different types of modeling, deep computing and simulation techniques to run different kinds of "what if" analyses to increase performance.

2. Business management and performance management: helps increase performance by providing accurate and on-time data reporting.

3. Enterprise information management: lets the user apply different architectures related to data extraction, archival, retrieval, movement and integration.

4. Content management: includes the service architecture, technology architecture and processes related to capturing, storing, preserving, delivering and managing the data. It also provides access in a global environment and makes it easy to share data with stakeholders across the globe.

9.6.2 How it Works: Analytics

Analytics works with a combination of hardware, services and middleware. This expertise makes it well suited to help clients extract new value from their business information. Delivering business analytics and information software requires a seamless flow of all forms of data regardless of format, platform and location. A focus on open industry standards is the key to this effort, and gives significant advantages.

9.6.3 How it Works: Analytics

The system features include a platform that provides data reporting, text-based analytics, mining activities, business intelligence, dashboards and predictive analytics techniques. It also takes care of storage optimization and different high-performance data-warehouse management techniques.

9.6.4 How it Works: Analytics

Analytics Business Outcomes

Analytics systems help to get the right information as and when required, identify where to get it, and point out the right sources to get it.

Therefore, analytics also helps in designing policies faster, based on the information available in the organization, as decision-makers work with the exploration services available within the organization.

It also helps in gauging business results by measuring the different metrics generated with the help of analytics.

This gives the organization the means to increase profitability, reduce cycle times and reduce defects.

9.7 Testing under Cloud

Testing under cloud provides a good return on investment by moving the typical testing environment to the cloud.

It allows the flexibility to experiment with a surrogate of the real system without the actual risk.

9.7.1 Benefits

Cut capital and operational costs without affecting mission-critical applications.

Offer new and innovative services to clients, and present an opportunity to speed the cycle of innovation and improve solution quality.

Facilitate a test environment based on request and provide request-based services for storage, network and OS.


9.7.2 Value proposition

Business test cloud delivers an integrated, flexible and extensible approach to test

resource services and management with rapid time to value.

This is an end-to-end set of services to strategize, design and build request-driven

delivery of test resources in a cost-effective, efficient manner.

9.7.3 Biggest Beneficiaries

With the ability to deploy virtual environments quickly and automatically and to redirect capacity as needed, cloud computing offers an ideal solution for testing and development.

10. Introduction to Hadoop and MapReduce:

Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.


Hadoop Architecture

The Hadoop framework includes the following four modules:

Hadoop Common: These are Java libraries and utilities required by other Hadoop modules. These libraries provide filesystem and OS-level abstractions and contain the necessary Java files and scripts required to start Hadoop.

Hadoop YARN: This is a framework for job scheduling and cluster resource management.

Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.

Hadoop MapReduce: This is a YARN-based system for parallel processing of large data sets.

The following diagram depicts these four components of the Hadoop framework.

Since 2012, the term "Hadoop" often refers not just to the base modules mentioned above but

also to the collection of additional software packages that can be installed on top of or alongside

Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark etc.

MapReduce


Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

The term MapReduce actually refers to the following two different tasks that Hadoop programs

perform:

The Map Task: This is the first task, which takes input data and converts it into a set of

data, where individual elements are broken down into tuples (key/value pairs).

The Reduce Task: This task takes the output from a map task as input and combines

those data tuples into a smaller set of tuples. The reduce task is always performed after

the map task.

Typically both the input and the output are stored in a file-system. The framework takes care of

scheduling tasks, monitoring them and re-executes the failed tasks.
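A word-count example is the standard way to show the two tasks. The sketch below uses Hadoop Streaming conventions (the mapper and reducer read from standard input and write tab-separated key/value pairs); it illustrates the map and reduce tasks in Python rather than the Java MapReduce API, and the file names mapper.py and reducer.py are just conventional choices.

# mapper.py -- Map task: read input lines from stdin, emit (word, 1) pairs.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py -- Reduce task: input arrives sorted by key, so counts for the
# same word are adjacent and can be summed in a single pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")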

The MapReduce framework consists of a single master JobTracker and one

slave TaskTracker per cluster-node. The master is responsible for resource management,

tracking resource consumption/availability and scheduling the jobs component tasks on the

slaves, monitoring them and re-executing the failed tasks. The slave TaskTrackers execute the tasks as directed by the master and provide task-status information to the master periodically.

The JobTracker is a single point of failure for the Hadoop MapReduce service which means if

JobTracker goes down, all running jobs are halted.

Hadoop Distributed File System

Hadoop can work directly with any mountable distributed file system such as Local FS, HFTP

FS, S3 FS, and others, but the most common file system used by Hadoop is the Hadoop

Distributed File System (HDFS).

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and

provides a distributed file system that is designed to run on large clusters (thousands of

computers) of small computer machines in a reliable, fault-tolerant manner.

HDFS uses a master/slave architecture where master consists of a single NameNode that

manages the file system metadata and one or more slave DataNodes that store the actual data.


A file in an HDFS namespace is split into several blocks and those blocks are stored in a set of DataNodes. The NameNode determines the mapping of blocks to the DataNodes. The DataNodes take care of read and write operations with the file system. They also take care of block creation, deletion and replication based on instructions given by the NameNode.
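The block-splitting and placement idea can be illustrated with a toy model. Real HDFS uses a configurable block size (128 MB by default in recent versions) and rack-aware replica placement; the sketch below simply chops a byte count into blocks and assigns replicas round-robin, imitating the kind of metadata the NameNode keeps.

# Toy model of HDFS block placement: split a file into fixed-size blocks and
# assign each block to `replication` DataNodes (round-robin, not rack-aware).
def place_blocks(file_size_bytes, datanodes, block_size=128 * 1024 * 1024, replication=3):
    num_blocks = -(-file_size_bytes // block_size)        # ceiling division
    mapping = {}
    for b in range(num_blocks):
        mapping[b] = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
    return mapping                                        # NameNode-style metadata

nodes = ["dn1", "dn2", "dn3", "dn4"]
for block, replicas in place_blocks(400 * 1024 * 1024, nodes).items():
    print(block, replicas)
# block 0 -> ['dn1', 'dn2', 'dn3'], block 1 -> ['dn2', 'dn3', 'dn4'], ...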

HDFS provides a shell like any other file system and a list of commands are available to interact

with the file system. These shell commands will be covered in a separate chapter along with

appropriate examples.

How Does Hadoop Work?

Stage 1

A user or application can submit a job to Hadoop (via a Hadoop job client) for the required processing by specifying the following items (a submission sketch follows this list):

1. The location of the input and output files in the distributed file system.

2. The Java classes in the form of a jar file containing the implementation of the map and reduce functions.

3. The job configuration, set by different parameters specific to the job.
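As a hedged example of those three items, the sketch below submits the earlier streaming mapper and reducer with the hadoop jar command from Python. The path to the streaming jar and the HDFS input/output paths are placeholders that depend on the installation; a real cluster's paths will differ.

# Sketch of Stage 1: submitting a streaming job from Python. The jar location
# and HDFS paths are placeholders for an actual installation.
import subprocess

subprocess.run([
    "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "-D", "mapreduce.job.reduces=2",         # 3. a job configuration parameter
    "-files", "mapper.py,reducer.py",        # ship the scripts to the cluster
    "-input",  "/user/demo/input",           # 1. input location in HDFS
    "-output", "/user/demo/output",          # 1. output location in HDFS
    "-mapper", "mapper.py",                  # 2. map implementation
    "-reducer", "reducer.py",                # 2. reduce implementation
], check=True)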

Stage 2

The Hadoop job client then submits the job (jar/executable etc) and configuration to the

JobTracker which then assumes the responsibility of distributing the software/configuration to

the slaves, scheduling tasks and monitoring them, providing status and diagnostic information

to the job-client.

Stage 3

The TaskTrackers on different nodes execute the task as per MapReduce implementation and

output of the reduce function is stored into the output files on the file system.

Advantages of Hadoop

The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores.


Hadoop does not rely on hardware to provide fault-tolerance and high availability

(FTHA), rather Hadoop library itself has been designed to detect and handle failures at

the application layer.

Servers can be added or removed from the cluster dynamically and Hadoop continues to

operate without interruption.

Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java-based.

Cloud Computing UNIT I

Short Answer Questions

1. Define parallel computing.

2. Types of distributed computing on the basis of architectural style.

3. Define cloud computing.

4. Define the MapReduce technique.

Descriptive Questions

1. Write short notes on compute services, storage services and database services.

2. What is Hadoop? How does it work? Explain the architecture of Hadoop

with neat diagram.

3. Write down the broad approaches of migrating into cloud?

4. What is virtualization? Explain the taxonomy of virtualization techniques.

Assignment Questions

1. Give the differences between parallel computing and distributed computing.

2. Give the deployment models of cloud.

3. Give the architecture of cloud.

4. Write about Amazon cloud.

Objective questions

1. _________ model consists of the particular types of services that you can

access on a cloud computing platform. a) Service b) Deployment c) Application

d) None of the mentioned


2. Point out the correct statement : a) The use of the word “cloud” makes

reference to the two essential concepts b) Cloud computing abstracts systems

by pooling and sharing resources c) cloud computing is nothing more than the

Internet d) All of the mentioned

3. ________ refers to the location and management of the cloud’s

infrastructure. a) Service b) Deployment c) Application d) None of the

mentioned

4. Which of the following is deployment model ? a) public b) private c) hybrid

d) all of the mentioned

5. ________ as a utility is a dream that dates from the beginning of the

computing industry itself. a) Model b) Computing c) Software d) All of the

mentioned

6. Point out the wrong statement : a) All applications benefit from deployment

in the cloud b) With cloud computing, you can start very small and become big

very fast c) Cloud computing is revolutionary, even if the technology it is built

on is evolutionary d) None of the mentioned

7. ________ has many of the characteristics of what is now being called cloud

computing. a) Internet b) Software’s c) Web Service d) All of the mentioned

8. Which of the following is related to service provided by Cloud ? a) Sourcing

b) Ownership c) Reliability d) AaaS

9. The ________ cloud infrastructure is operated for the exclusive use of an

organization. a) Public b) Private c) Community d) All of the mentioned

10. A ____________ cloud combines multiple clouds where those clouds retain

their unique identities, but are bound together as a unit. a) Public b) Private c)

Community d) Hybrid

Answer the following questions either True or False:

11. Scalability in the cloud allows users to expand or contract when they need

to.

12. Cloud load balancers typically have built-in redundancy.

13. Cloud computing refers to applications and services that run on a

distributed network using virtualized resources.

14. Productivity is essential concept related to Cloud?

15. Virtualization cloud concept is related to pooling and sharing of resources.

16. Intranet can be identified as cloud.

17. Cloud computing is an abstraction based on the notion of pooling physical

resources and presenting them as a Virtual resource.

18. AWS is Cloud Platform by Amazon.

19. Deployment refers to the location and management of the cloud’s

infrastructure.

20. Amazon has built a worldwide network of data centers to service its search

engine.


UNIT TEST QUESTIONS

1.Give the different types of parallel computing.

2.Explain IAAS feature of cloud.

3.Describe about hadoop architecture.

Web Link:

https://www.tutorialspoint.com/cloud_computing/cloud_computing_tutorial.pdf

https://www.guru99.com/cloud-computing-for-beginners.html

PPTS

NA

Videos

NA
