8/13/2019 Bringing FM and IT together – Volume II
1/18
Copyright Quocirca © 2014
Clive Longbottom
Quocirca Ltd
Tel : +44 118 9483360
Email: [email protected]
Bringing FM and IT together – Volume II
A series of articles brought together from Quocirca's writings for SDC during 2013
January 2014
Quocirca continued to write articles for SearchDataCenter throughout 2013,
looking at how facilities management (FM) and information technology (IT)
professionals needed to work together more closely than ever. This
report pulls those articles together as a two-volume set for ease of access.
Designing the data centre for tomorrow
Starting a new data centre design now is likely to be different to any data centre that you may
have been involved with in the past. A new way of thinking is required; one where the facility
can be just as flexible as the IT equipment held within it.
DCIM – a story of growing, overlapping functionality
Data centre infrastructure management (DCIM) software has come a long way. It is now
beginning to threaten existing IT systems management software in many ways – and this could
be where it runs into trouble. Just how far can, and should, DCIM go?
The data centre for intellectual property management
It may have cost millions to build your latest data centre, and it may house some expensive
equipment. However, none of this really matters to the organisation. It is the intellectual
property that matters to it – and the IT platform and the facility it is in must be architected to
reflect this.
The data centre and the remote office
Organisations are increasingly diffuse – decentralised through remote offices with mobile and
home workers. Yet all of these need to be well-served by the organisation’s technology. How
can this be best provided without breaking the bank – or compromising security?
Cascading data centre design
Are you a "server hugger"? If so, it is probably time to review your position and your worth to the
organisation. The future will be around hybrid IT with different data centre facilities playing their
part. Some of these may be under your control; others will be under various levels of control
from others.
Disaster recovery and business continuity – chalk and cheese?
Disaster recovery and business continuity should not be treated as a single entity. These two
distinct capabilities need their own teams working on them to ensure that the organisation gets
an overall approach that meets its own risk profile and is managed within the cost parameters
that the business can operate in.
Managing IT – converged infrastructure, private cloud or public cloud?
IT can be architected in many different ways. In some cases, a physical, one-application-per-(clustered)-server approach may still be the way forwards, whereas for others, it may be virtualisation or
cloud. The hardware that underpins these different approaches may also be changing, from rack-
mount, self-built systems to modular converged systems. Cloud computing throws more
variables into the mix – just what is an organisation to do?
The Tracks of my Tiers
There is a concept of the “tiering” of data centres which can be used by an organisation to see
whether an external facility will offer the levels of availability that it requires for certain IT
workloads. What are these tiers, and what do they mean to an organisation?
What to look for from a Tier III data centre provider
If you decide to go for an Uptime Institute Tier III data centre provider, what should you look out
for? As accredited Tier III facilities are few and far between, are there other things that will
enable a non-Uptime Institute facility to be chosen instead?
Designing the data centre for tomorrow
Since the dawning of the computer era in the 1960s, data centre design has essentially been evolutionary. Sure, there have been moves from mainframes to distributed computers; from water cooling to air cooling; from monolithic UPS
and power systems to more modular approaches. Yet the main evolution has been from small data centres to large
data centres.
Even where an organisation comes to the conclusion that the cost of the next data centre is too large for itself, the
move has been to a co-location facility where future growth can be allowed for.
Now, the world is changing. Application rationalisation, hardware virtualisation and consolidation have led to
organisations finding themselves with a large facility and a need to only house 50% or less of what they were running
previously. New, high-density server, storage and network equipment, along with highly engineered systems, such as
VCE VBlocks, Cisco UCSs, IBM Pure Systems and Dell Active Systems mean that less space is required for more effective
direct business compute power.
And then, cloud computing comes in. Suddenly, data centre managers and systems architects no longer just
have to decide how best to support a workload, but also through what means. A workload that would normally sit
on a stack totally owned by the organisation may now be put into co-location, or be outsourced through infrastructure,
platform or software as a service (I, P or SaaS).
Even where decisions are made to keep specific workloads in-house, it makes no sense to design a data centre to
house that workload in the long term. Cloud is still an immature platform, but within the next few years, it is likely to
become the platform of choice for the mainstream, and those organisations that have built a data centre for hosting
a specific application over a long period of time could see themselves at a disadvantage.
To design a data centre for the future, there are two parts to consider – the facility itself and the IT hardware that
it houses. From a hardware point of view, a full IT lifecycle management (ITLM) approach can ensure that a dynamic
infrastructure is maintained (reference previous article on ITLM). Use of the hardware assets can grow and shrink as
the needs change, with excess equipment being sold to recoup some cost. Through the use of subscription pricing,
software licenses can also be controlled, through signing up or shutting down subscriptions as required.
The main issues revolve around the facility. A data centre is a pretty fixed chunk of asset – if it is built to house 10,000
square feet of space and the business finds that it only needs 5,000 square feet, the walls cannot be that easily moved
to serve only this area. Even where new walls can be implemented, for example to create new office space, this is
only a small part of the problem solved.
A data centre facility is often built with a designed and relatively fixed layout for the services offered. Power
distribution units will be hard-fitted to the walls and other areas of the data centre; CRAC-based cooling systems will
be fixed to the roof in specific places and UPSs and auxiliary generators will be sized and sited to suit the original data
centre design.
So, a new approach to the facility is the key to designing and building tomorrow’s data centre.
The first place to start is with the physical design. If sloping sub-floors and raised data centre floors are preferred to
deal with any flooding issues (either natural or through the use of liquid-based fire suppressant), then make this
multiple gullies, rather than a single "V"-shaped system. This way, if downsizing is required, there will be raised
walls marking off each gully that can be used to build new walls from, without impacting the capability of the sub-floor
to allow drainage for the data centre itself.
Next is the cabling within the data centre. This will need to be fully structured, with data and power being carried
through separate paths and with an easy means of re-laying any cables should the layout of the data centre change.
Then, there is power distribution itself. Rather than build these against walls or pillars, it may be better to make them
free standing with power feeds coming from structured cabling from the roof. This way, should a redesign be
required, the power distribution is as flexible as the rest of the IT equipment and can be easily relocated.
With cooling systems, a move to free-air cooling or other low-need systems will mean that less impact will be felt in
redesigning the cooling when the data centre changes size. If combined with effective hot and cold aisle approaches
with ducted cooling, the cooling system can be sized appropriately and placement is less of an issue.
Even where a CRAC-based system is perceived to be needed, a move to a more modular system with multiple,
balanced, variable speed CRAC units will make life easier if the data centre needs to be resized.
The same goes for UPSs and auxiliary generators – a monolithic system could leave an organisation looking at a need
to buy a completely new unit if the needs of the data centre change, or having a massively over-engineered system
in place if it carries on using the same old UPS or generator when the data centre shrinks. As most UPS systems used
these days will be in-line, every single percentage loss of efficiency could be against the rating of the UPS – not against the actual power used by the equipment in the data centre. With a generator, its fuel usage will be pretty much in
line with its rating, so even when running below its rated power, it will use a lot more fuel than one which is correctly
engineered for the task.
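To put rough numbers on this, consider a sketch of what an oversized in-line UPS costs in wasted energy. The fixed-loss figure of 5% of rating is an assumption chosen purely for illustration, not a vendor figure:

```python
# Illustrative only: assumes an in-line UPS has fixed losses of 5% of its
# rated capacity, so losses track the rating rather than the actual IT load.
def ups_loss_kw(rating_kw, loss_fraction=0.05):
    """Fixed losses of an in-line UPS, proportional to its rating."""
    return rating_kw * loss_fraction

oversized = ups_loss_kw(500.0)    # monolithic unit sized for the original facility
right_sized = ups_loss_kw(125.0)  # modular unit sized for the shrunken load
print(f"Oversized UPS loss:   {oversized:.2f} kW")
print(f"Right-sized UPS loss: {right_sized:.2f} kW")
print(f"Excess energy per year: {(oversized - right_sized) * 8760:.0f} kWh")
```

Even with these made-up figures, the oversized unit wastes over 160 MWh a year against the rating rather than the load – which is why modular, resizeable UPS capacity matters.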
If your organisation is reaching a point where a new data centre is seen to be a necessity, bear in mind that the IT
world is going through a bigger change than it has ever done before. Planning to embrace this change will save money
in the mid- to long-term, and it will provide a far more flexible platform for the business.
DCIM – a story of growing, overlapping functionality
An organisation's technical environment can be seen as having two distinct parts – the main IT components of servers,
storage and networking along with the facility or facilities that are required to house the IT. Historically, these two
areas have fallen under the ownership and management of two different groups: IT has fallen under the IT
department, while the facility has fallen under the facilities management (FM) group.
This leads to problems, as FM tend to see the data centre as just another building to be managed alongside all the
other office and warehouse buildings, whereas IT tend to see the data centre as the be-all and end-all of their purpose
in life. One group's priorities may not match the other's – and the language that each group speaks can be
subtly (or not so subtly) different.
Another problem is emerging due to cloud. In the past, the general direction for a data centre has been for it to grow
as the business grows; cloud can now mean that the IT equipment required within the data centre shrinks
rapidly as workloads are pushed out to public cloud – yet managing this is difficult where the facilities equipment (such as
UPSs, CRAC units and power distribution systems) consists of monolithic items.
In order to ensure that everything runs optimally and supports the business in the way that is required, a single form
of design, maintenance and management is required that pulls FM and IT together and also enables "what-if"
scenarios to be run so that future planning can be carried out effectively. This has been emerging over the past few
years as data centre infrastructure management (DCIM).
DCIM systems started off more as a tool for the FM team, as part of a building information modelling
(BIM) tool. BIM software enables a building to be mapped out and the major equipment to be placed within a physical
representation, or schematic, of the facility. DCIM made this specific to the needs of a data centre, holding information
about power distribution, UPS and cooling systems, along with power cabling, environmental monitoring sensors
and so on. The diagrams could be printed out for when maintenance was required, or given to the IT team so that they could then draw in the IT equipment knowing where the facilities bits were.
It soon became apparent that allowing the IT equipment to be placed directly in the schematic was useful for both IT
and FM. This led to a need for DCIM systems to bring in asset discovery systems alongside databases of the physical
appearance and the technical description of the IT equipment so that existing data centre layouts could be more easily
created.
This brought DCIM systems into competition with the asset discovery and management systems that were part of the
IT systems management software. Interoperability between the two systems is not always available, yet a common
database, along the lines of a configuration management database (CMDB), makes sense to provide a single true view of
what is in a data centre.
A differentiation between DCIM systems is often how good their databases of equipment are – some will not be
updated with new equipment details on a regular basis; others will use “plate values” for areas such as power usage.
The difference between using a plate value (just taking the rated power usage) and the actual energy usage measured
in real time can be almost an order of magnitude, which can lead to over-engineering of power, backup and cooling
systems.
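As a quick illustration of that gap, compare a server's plate value with measured telemetry. The figures here are hypothetical, chosen only to show the shape of the calculation:

```python
def oversizing_factor(plate_w, measured_w):
    """How far the nameplate rating overstates the real measured peak draw."""
    return plate_w / max(measured_w)

# Hypothetical 1U server: 750 W on the plate; real-time power samples in watts
samples = [180, 210, 195, 240, 205]
print(f"Plate value overstates peak draw by {oversizing_factor(750, samples):.2f}x")
```

A DCIM system that plans power, backup and cooling from the 750 W figure would provision roughly three times the capacity this server ever actually draws.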
2D schematics have given way to 3D in many DCIM systems, so that rack-based systems can be engineered in situ
and viewed from multiple directions to make sure that pathways for humans remain traversable. 3D schematics
also allow for checking to see if new equipment can be brought directly into a spot in the data centre, or if there are
too many existing objects in the way.
From this came the capability to deal with “what if?” scenarios. For example, would placing this server in this rack
here cause an overload on this power distribution block? Would placing these power transformers here cause a hot
spot that could not be cooled through existing systems? Again, such capabilities help both FM and IT work together
to ensure that the data centre is optimally designed and gives the best support to the business.
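A minimal sketch of such a what-if check, assuming a simple power budget per power distribution unit (PDU) with an 80% headroom rule – both the figures and the rule are assumptions for illustration:

```python
def can_place(new_server_w, rack_draw_w, pdu_limit_w, headroom=0.8):
    """What-if check: would adding this server push the PDU past its safe load?
    The 80% headroom figure is an assumption for the example."""
    return sum(rack_draw_w) + new_server_w <= pdu_limit_w * headroom

rack = [450, 450, 380, 520]        # measured draw of servers already in the rack, watts
print(can_place(400, rack, 3000))  # within 80% of a 3 kW PDU
print(can_place(900, rack, 3000))  # would exceed the safe load
```

A real DCIM system runs the same style of check across power, cooling and weight constraints at once, using the measured values discussed above rather than plate values.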
With 3D visual representations and granular details of the systems involved along with real time data from
environmental sensors, the use of computational fluid dynamics then comes into play. Using empirical data from the
DCIM system to see what happens to cooling flows as systems are changed and new equipment added ensures that
hot spots are avoided right from the start.
The problems for DCIM lie mainly in trying to give a single tool that covers two different groups. The FM team will
often have their own BIM systems in place, and see the data centre as “just another building” with a few additional
needs. To the IT team, the data centre is the centre of their universe, but they tend to see it as a load of interesting
bits surrounded by a building. The need for the two teams to not only talk, but work from common data sources to
create an optimal solution is not always seen as a priority. Even where DCIM is seen as being a suitable way forward,
there will be a need to integrate it into existing systems so as not to replicate too much and create a whole new set
of data silos.
Vendors have also been part of the problem – the main IT vendors have been poor on covering the facility, preferring
to stick with archetypal systems management tools that just look at the IT equipment. It has been down to the
vendors of the UPSs and other “facilities” equipment alongside smaller new-to-market vendors to come up with full-
service DCIM tools and try and create a market.
Public free cloud – SaaS or function as a service (FaaS), such as Google or Bing Maps where a function is
taken on a best efforts support basis.
This mix of data centres also leads to a mix in areas where data and information will lie. No longer can an organisation
simply centralise all its data into a single storage area network (SAN).
On top of this is the lack of capability for the organisation to draw a line around a specific group of people and say
“this is the organisation”. The need for organisations to work across an extended value chain of contractors,
consultants, suppliers (and their suppliers), logistics companies, customers (and often their customers) means that
data and information flows are often moving into areas where the organisation has less control.
This is all made more complex through the impact of bring your own device (BYOD). The unstoppable tide of end
users expensing their own devices and expecting them to work with the enterprise’s own systems, and then
downloading consumer apps from appstores and so creating data and information in extra places unknown to the IT
department means that the value of data and information is being increasingly diluted.
IT now has to accept that the data centre itself is just part of the equation, and start to move to a model that pays far
more attention to the data and information the organisation is dependent upon.
To manage this, it is a waste of time looking at how firewalls should be deployed – after all, just where should this
wall be positioned along the extended value chain? It is equally wrong to look at applying security just at the
application or hardware levels, as, as soon as someone manages to breach that security, they will have free rein to
roam around the rest of the information held in that information store.
No, data and information now have to be secured and managed at a far more granular level, with users being identified
at different levels – from them as an individual, through their role within a team, to their level of corporate security
clearance. On top of this needs to be contextual knowledge, such as where the person is accessing the data from and
from what sort of device. Then the data itself needs to be classified against an agreed security taxonomy – which
could be as simple as tagging data and information as being "Public", "Commercial in confidence" or "For your eyes
only".
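A sketch of how such a decision might combine clearance, classification and context. The labels follow the taxonomy above; the rules and names are illustrative only – a real deployment would map them to the organisation's own identity systems:

```python
# Illustrative taxonomy and rules only; not a production access-control model.
CLEARANCE = {"Public": 0, "Commercial in confidence": 1, "For your eyes only": 2}

def may_access(user_clearance, data_label, managed_device, on_vpn):
    """Combine user clearance, data classification and access context."""
    level = CLEARANCE[data_label]
    if level == 0:
        return True                      # public data: no restriction
    if not (managed_device and on_vpn):
        return False                     # sensitive data needs a trusted context
    return user_clearance >= level

print(may_access(1, "Commercial in confidence", True, True))  # granted
print(may_access(2, "For your eyes only", True, False))       # refused: off-VPN
```

The key point is that the decision is made per asset and per request, not once at a perimeter.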
Touchpoints need to be implemented such that the organisation can see who is attempting to access information
assets – this is best done through virtual private networks (VPNs) and hybrid virtual desktops, which can enforce the
means by which corporate assets are accessed. Through these touchpoints, information security such as encryption
of data at rest and on the move, along with data leak prevention (DLP) and digital rights management (DRM), can be
applied alongside information rules based on access rights for the person and their context.
Mobile device management (MDM) can help to keep an eye on what devices are attaching to the network, and can
help to air lock them from full access to systems until appropriate identification of the individual using the device has
been made. This may require multi-level identification going well beyond the normal challenge and response
username/password pair, maybe to include single use access codes or biometrics.
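Single-use access codes of this kind are typically time-based one-time passwords (TOTP, as in RFC 6238). A minimal sketch using only the Python standard library:

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, at: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 time-based one-time password (HMAC-SHA-1, 6 digits)."""
    counter = struct.pack(">Q", at // step)          # time window index
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test secret and timestamp
print(totp(b"12345678901234567890", at=59))
```

Because the code changes every 30 seconds, a captured username/password pair alone is not enough to get past the air lock described above.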
All of this then means that information assets are only accessible by the right people in the right place. Even if someone else can get hold of the digital representation of the asset, it will still be useless to them, as it will be
encrypted and controlled by a DRM certificate where necessary.
All of this needs changes in how the data centre operates – each aspect of the above will require new systems, new
applications and agreements with the business of what information security means to them. Much of this can now
be done outside of the corporately owned data centre – managed security providers are appearing which can provide
the functions required on a subscription basis without the need for massive capital investment by your organisation.
The heading to this article was “The data centre for intellectual property management”. As such, the title is
completely wrong. What needs to be put in place is an architectural platform for intellectual property management
– and this will transcend the single facility and move far over into a hybrid mix of needs across a range of different
facilities.
The data centre and the remote office
The remote office has always been a bit of a problem when it comes to technology. The employees in these offices
are still just as dependent on technology as their counterparts in the main offices – but they have little to no
qualifications to look after any technology that is co-located with them. Therefore, the aim has tended to be to
centralise the technology and provide access to the remote employees as required.
This has not tended to work well. Slow response and poor connectivity availability have pushed users away from
sticking with the preferred centralised solution, instead working around the systems with processes and solutions
they have chosen themselves.
As bring your own device (BYOD) has become more widespread, each individual has become their own IT expert –
unfortunately, with a little knowledge being a dangerous thing. The choice of consumer apps to carry out enterprise
tasks is leading to a new set of information silos – ones that central IT has no capability of managing; ones where
pulling together the disparate data for corporate analysis and reporting is impossible.
Architecting a new platform that meets everyone’s needs should now be possible – it just requires a little bit of give
and take.
Each individual has to accept that what pays their salary is a much greater entity – the organisation. If they do not
work in a manner that helps the organisation, the capability for the salary to be paid could be impacted. Therefore,
working in a manner that is organisation-centric is a basic requirement of having a job – and I don’t care that the
millennials scream that they won’t work for any organisation that doesn’t allow them 7 hours a day time for posting
on Facebook.
What IT has to look at is how best to put in place the right platform to meet the organisation’s and the individual’s
needs. This should start with a need for centralisation of the data – as long as the organisation can access all data and
information assets, it can analyse these and provide the findings through to those in the organisation who can then
make better informed decisions against all the available information.
Therefore, data and files should be stored within a single place – eventually. This does not stop enterprise file sharing
systems, such as Huddle, Box or Citrix ShareFile from being used; it just means that the information held within these
repositories needs to be integrated into the rest of the platform. Capturing the individual’s application usage is
important – being able to steer them in the direction of corporate equivalents of consumer applications can help
minimise problems at a later date when security is found to be below the organisation’s needs, or the lack of the
capability to centralise data leads to a poor decision being made.
It may well be that remote users would be best served through server-based computing approaches such as virtual
desktop infrastructure (VDI). Using modern acceleration technologies such as application streaming or Numecent's
Cloudpaging will provide very fast response for the remote user, while allowing them to travel between remote offices
and larger offices and still have full access to their specific desktop. Citrix, Centrix Software and RES Software also
provide the capabilities for these desktops to be accessible from the user’s BYOD devices – and apply excellent levels
of enterprise security to the system as well. What an organisation should be looking for is the capability to “sandbox”
the device – creating an area within any device which is completely separate to the rest of it. Through this means,
any security issues with the device can be kept at bay; enterprise information can be maintained within the sandbox
with no capability for the user to cut and paste from the corporate part to the consumer part of the device. Should
the user leave the organisation, the sandbox can be remotely wiped without impacting the user’s device itself.
For remote offices of a certain size, or which are in a geographic location where connectivity may be too slow for a
good end-user experience, a "server room" may be warranted to hold specific applications that the
office needs, and maybe to run desktop images for them locally. Data and information created can be replicated in
the background, using WAN acceleration from the likes of Veeam, Symantec, Silver Peak or Riverbed, ensuring that it
is still all available centrally.
Where such a server room is put in place, it is important to ensure that it can be run “lights out” from a more central
place. Depending on the person at the remote office who may have the biggest PC at home is no way to support a
mission critical or even business important environment. Dedicated staff with the right qualifications must be able to
log in remotely and carry out actions on the systems as required. Wherever possible, patching and updating should
be automated with full capability to identify which systems may not be able to take an upgrade (for example due to
a lack of disk space or an old device driver) and either remediate the issue or roll back any updates as required. Here,
the likes of CA and BMC offer good software management systems built around solid configuration management
databases (CMDBs).
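Such a pre-upgrade check can be sketched as follows. The thresholds and system fields are assumptions for illustration, not any particular vendor's schema:

```python
def preflight(system, min_free_gb=5, min_driver=(2, 0)):
    """Pre-upgrade checks: enough disk space and a recent enough device driver.
    Thresholds here are illustrative assumptions."""
    issues = []
    if system["free_gb"] < min_free_gb:
        issues.append("insufficient disk space")
    if system["driver_version"] < min_driver:
        issues.append("device driver too old")
    return issues  # empty list: safe to patch; otherwise remediate first

print(preflight({"free_gb": 12, "driver_version": (3, 1)}))  # []
print(preflight({"free_gb": 2,  "driver_version": (1, 4)}))
```

Systems that fail the check are flagged for remediation before the automated update runs, rather than being left half-patched.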
The increasing answer for many organisations, however, is to outsource the issue. As systems such as Microsoft’s
Office 365 become more capable, many service providers are offering fully managed desktops that provide a full office suite, along with Lync telephony, alongside other software. Some offer the capability for organisations to install their
own software on these desktops, so enabling any highly specific applications, such as old accountancy or engineering
software packages to be maintained for any individual’s usage. Cloud-based service providers should be able to
provide greater levels of systems availability and better response times and SLAs through their scalability – and should
be better positioned to maintain their platforms to a more up-to-date level.
With connectivity speed and availability improvements continuing, a centralised approach to remote offices should
be back on the data centre manager’s agenda. However, the choice has to be as to how that centralisation takes
place. For the majority, the use of cloud-based service provision of a suitable platform will probably be better than the
use of a server room or centralisation directly to an existing owned data centre. Quocirca’s recommendation is to
look to outsourcing wherever possible: use the existing data centre for differentiated core, mission critical activities
only.
Cascading data centre design
Back in the early days of computing, a concept of “time sharing” was common. Few organisations could afford the
cost or had the skills to build their own data centre facility, and so they shared someone else’s computer in that
organisation’s facility through the use of terminals and modems.
As computing became more widespread, the use of self-owned, central data centre facilities became more the norm.
The emergence of small distributed servers led to server rooms in branch offices – and even to servers under an
individual's desk. Control of systems suffered; departments started to buy their own computer equipment and applications. The move to a fully distributed platform was soon being pulled back together to a more centralised
approach – but often with a belt and braces, sticking plaster result. The end result for many was a combination of
multiple different facilities, each running to different service levels with poor overall systems utilisation and a lack of
overall systems availability.
Virtualisation – touted as the ultimate answer – may just have made things worse, as the number of virtual machines
(VMs) and live applications not being used have spiralled out of control. Cloud computing – again, another “silver
bullet” – means that the organisation is now having to deal not only with its own issues around multiple facilities, but
also other organisations’.
Increased mobility of the workforce, both through home working and the needs of the “road warrior” has led to a
need for “always on” access to enterprise applications – and also to a bring your own device (BYOD) appetite for using
apps from other environments.
It’s all a bit of a mess. Just what can be done to ensure that things get better, not worse?
The first thing that has to be done is a full audit of your own environment. Identify every single connected server
within your network, and every single application running on them – there are plenty of tools out there to do this.
Once you have this audit, you will need to identify the physical location of each server. This may be slightly more
difficult, but there is one way that is pretty effective where you cannot identify exactly where a server is: deny it access
to the main network – within a few minutes, there will be a call to the help desk from a user complaining, and they
will know where it is.
Now you have a physical asset map of where the servers are, and you know what applications are running on them.
First, identify all the different applications that are doing the same job. You may find that you have three or four
completely different customer relationship management (CRM) systems. Make sure that you identify your strategic
choice, and arrange with those using the non-strategic systems to migrate over as soon as possible. Now, identify all
the different instances of the same application that are running. Consolidate these down as far as possible – there
may be 5 different instances of the same enterprise resource planning (ERP) application in place.
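The audit data lends itself to a simple consolidation check. A sketch with a wholly hypothetical inventory, grouping applications by the business function they serve:

```python
from collections import defaultdict

# Hypothetical audit output: server -> (application product, business function)
inventory = {
    "srv-01": ("Salesforce", "CRM"), "srv-02": ("SugarCRM", "CRM"),
    "srv-03": ("SAP ERP", "ERP"),    "srv-04": ("SAP ERP", "ERP"),
    "srv-05": ("SAP ERP", "ERP"),
}

by_function = defaultdict(list)
for server, (product, function) in inventory.items():
    by_function[function].append(product)

# Flag any function served by more than one instance or product
for function, products in sorted(by_function.items()):
    if len(products) > 1:
        print(f"{function}: {len(products)} instances, {len(set(products))} product(s)")
```

Here CRM shows two competing products (pick a strategic one and migrate), while ERP shows three instances of the same product (consolidate them down).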
Such functional redundancy is not just bad for IT in the cost of servers, operating system licences, maintenance and
power that are required to keep them running, but also for the business. These systems will generally be running
completely separate to each other, and this means that the business does not have a single view of the truth.
Consolidation has to be carried out – for everyone’s sake.
At this point, you have a more consolidated environment, but there will still be lots of applications being run
by the organisation that could be better sourced through a SaaS model. Software that provides functionality
highly dependent on domain expertise – for example, payroll and expense management – is much better outsourced
to a third party, as they can ensure that all the legal aspects of the process are covered.
This then leads to dealing with your organisation’s overall IT platform in a more controlled yet flexible manner.
The overall internal IT platform for the organisation should be smaller than it was previously. Consolidation,
particularly when carried out with a fully planned virtualisation strategy, should reduce the amount of IT equipment
required by up to 80%. All the equipment can now be placed where you want it. But – should this be all in an owned
facility?
Probably not. There are problems in building and managing a highly flexible data centre. Power distribution and cooling tend to be designed and implemented to meet specific needs. Further shrinkage of the internal platform can lead to the facility’s power usage effectiveness (PUE) score growing, rather than shrinking. The always-on requirement means that multiple different connections from the facility to the outside world will be required.
No – a cascade design of data centres is what is required. There may be applications that for any reason (long-term
investment in the application and/or IT equipment, fears over data security) will be required to remain in an owned
facility. There will be many more applications that can be placed into a co-location facility. Here, someone else is
providing and managing the facility – they have the responsibility for connectivity, cooling, power distribution and so
on. You just have to manage the hardware and software in your part of the facility. Should your needs grow, the
facility owner can give you more space, power and cooling. Should your needs shrink, then you can negotiate a smaller
part of the facility.
SaaS based solutions take this even further – you have no responsibility whatsoever for the facility, hardware or
software. This is all someone else’s problem: you can concentrate on the business’ needs.
Ensuring that a cascaded data centre design works, consisting of an owned facility in conjunction with a co-location
facility and public cloud functionality, means having in place the right tools to manage the movement of workloads
from one environment to another. It also requires effective monitoring of application performance with the capability
to switch workloads around to maintain service levels. The more that is kept within an owned facility, the more
availability becomes an issue, and multiple connections to the outside world will be required.
However, getting it right will provide far greater flexibility at both a technical and business level. Quocirca strongly recommends that IT departments start on this process: carry out a complete and effective audit now – and plan how your IT platform will be housed and managed in the years to come.
Disaster recovery and business continuity –
chalk and cheese?
Most organisations will have an IT disaster recovery (DR) plan in place. However, it was probably created some time back and will, in many cases, be unfit for purpose.
The problem is that DR plans have to deal with the capabilities and constraints of a given IT environment at any one
time, so a DR plan created in 2005 would hardly be designed for virtualisation, cloud computing and fabric networks.
The good thing is that the relentless improvements in IT have created a much better environment – one where the
focus should now really be away from DR to business continuity (BC).
At this stage, it is probably best to come up with a basic definition of both terms so as to show how they differ.
Business Continuity – a plan that attempts to deal with the failure of any aspect of an IT platform in a manner
that still retains some capability for the organisation to carry on working.
Disaster recovery – a plan that attempts to get an organisation back up and working again after the failure
of any aspect of an IT platform.
Hopefully, you see the major difference here – BC is all about an IT platform coping with a problem: DR is all about
bringing things back when the IT platform hasn’t coped.
Historically, the costs and complexities of putting in place technical capabilities for BC meant that only the richest
organisations with the strongest needs for continuous operation could afford BC: now, it should be within the reach
of most organisations; at least to a reasonable extent.
Business continuity is based around the need for a high availability platform – something that was covered in an earlier article in this series, “Uptime – the heart of the matter”. By the correct use of “N+M” equipment alongside well-architected and implemented virtualisation, cloud and mirroring, an organisation should be able to ensure that a reasonable level of BC is in place for the majority of cases.
Note the use of the word “majority” here. Creating a full BC-capable IT platform is not a low-cost project. The
organisation must be fully involved in how far the BC approach goes – by balancing its own risk profile against the
costs involved, it can make the decision as to at what point a BC strategy becomes too expensive for the business to
fund.
This is where DR still comes in. Let’s assume that the business has agreed that the IT platform must be able to survive
the failure of any single item of equipment in the data centre itself. It has authorised the investment of funds for an
N+1 architecture at the IT equipment level, and as such, the IT team has now got one more server, storage system
and network path per system than is needed. However, as the data centre is based on monolithic technologies, the
costs of implementing an N+1 architecture around the UPS, the cooling system and the auxiliary generation systems
were deemed too high.
Therefore, the DR team has to look at what will be needed should there be a failure of any of these items, as well as
what happens if N+1 is not good enough.
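The state the DR team has to reason about can be sketched as a simple check over a component inventory. The categories and counts below are hypothetical, illustrating an N+1 design at the IT level with no redundancy for the monolithic facility components:

```python
def redundancy_state(installed, required):
    """Return how many component failures a category can absorb
    before capacity drops below what the workloads require."""
    return installed - required

# Hypothetical capacity-component inventory: one spare per category
# at the IT equipment level, none for the UPS or cooling.
inventory = {
    "servers": {"installed": 5, "required": 4},
    "storage": {"installed": 3, "required": 2},
    "ups":     {"installed": 1, "required": 1},
    "cooling": {"installed": 2, "required": 2},
}

for category, counts in inventory.items():
    spare = redundancy_state(counts["installed"], counts["required"])
    status = "N+%d" % spare if spare > 0 else "NO REDUNDANCY - DR plan required"
    print(category, status)
```

Any category reporting zero spares is exactly where the DR plan, rather than the BC architecture, has to carry the risk.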
The first areas that have to be agreed with the business are around how long it will take to get to a specified level of
recovery of function, and what that level of function is. These two points are known as the recovery time objective
(RTO) and the recovery point objective (RPO). This is not something that an IT team should be defining – the business
has to be involved and must fully understand what the RTO and RPO mean. In particular, the RPO defines how much
data has to be accepted as being lost – and this could have a knock-on impact on how the business views its BC
investment.
For example, in an N+1 architecture, the failure of a single item will have no direct impact on the business, as there is
still enough capacity for everything to keep running. Should a second item fail, then the best that will happen is that
the speed of response to the business for the workload or workloads on that set of equipment will be slower. The
worst that can happen is that the workload or workloads will fail to work. In the former case, the RPO will be to regain
the full speed of response within a stated RTO – which would generally be defined as the time taken for replacement
equipment to be obtained, installed and fully implemented. Therefore, the DR plan may state that a certain amount
of spares inventory have to be held, or that agreements with suppliers have to be in place for same-day delivery of
replacements – particularly for the large monolithic items such as UPSs. The plan must also then include all the steps
that will be required to install and implement the new equipment – and the timescales that are acceptable to ensure
that the RTO is met.
In the latter case where the workload has been stopped, then the RPO has to include a definition of the amount of
data that could be lost over specified periods. In most cases this will be per hour or per quarter hour; in high-
transaction systems, it could be per minute or per second. The impact on the RTO is therefore dependent on the
business’ view of how many “chunks” of data loss it believes it can afford. The DR team has to be able to provide a
fully quantified plan as to how to meet the RPO within the constraints of the business-defined RTO – and if it is a physical impossibility to balance these two, then it has to go back to the business, which will have to decide whether
to invest in a BC strategy for this area, or to lower its expectations on the RPO so that a reasonable RTO can be agreed.
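The arithmetic the DR team has to present can be sketched very simply. The transaction rate, RPO and restore times below are hypothetical figures, not from the source:

```python
def data_at_risk(transactions_per_hour, rpo_hours):
    """Worst-case number of transactions lost if recovery rolls
    back to the last recovery point within the RPO window."""
    return transactions_per_hour * rpo_hours

def rto_feasible(restore_hours, rebuild_hours, rto_hours):
    """Check whether data restore plus equipment rebuild time
    fits within the business-agreed RTO."""
    return restore_hours + rebuild_hours <= rto_hours

# Hypothetical figures: 2,000 transactions/hour, a 15-minute RPO,
# a 4-hour RTO, 1 hour to restore data, 2 hours to rebuild.
print(data_at_risk(2000, 0.25))   # 500 transactions at risk per incident
print(rto_feasible(1, 2, 4))      # the plan fits within the agreed RTO
```

If `rto_feasible` comes back false for realistic restore and rebuild times, that is precisely the point at which the plan goes back to the business to either fund BC or relax the RPO.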
In essence, BC has to be the main focus for a business: it is far more important to create and manage an IT platform in a manner that allows the organisation to maintain a business capability. The DR plan is essentially a safety net: it is there for when the BC plan fails. BC ensures that business continues, even if process flows (and therefore cash flows) are impacted to an extent. DR is there to try and stop a business from failing: once one or more workloads have been stopped, the process flows are no longer there.

Both BC and DR are critical for an organisation to have in place – the key is to make sure that each complements and feeds back into the other, ensuring that there are no holes in the overall strategy.
Managing IT – converged infrastructure,
private cloud or public cloud?
The days of taking servers, storage and network components, putting them together and running applications on them on an essentially one-to-one physical basis are rapidly passing. The uptake of virtualisation means that workloads share many resources, and the emergence of “as a service” means that the underlying resources have to be more flexible, and easier to implement and use, than ever before.
However, this still leaves a lot of choice to an end-user organisation. Should they go for a converged or engineered
system, such as a Cisco UCS, an IBM PureSystem or a Dell VRTX, or should they go for a more cloud-like scale out
model based on “commodity” servers? Should they go the whole way and forget about the physical platform itself
and just go for a public cloud based service?
Each has its own strengths – and its own weaknesses.
Converged systems take away a lot of the technical issues that organisations run up against when attempting to use
a pure scale out approach. By engineering from the outset how the servers, storage and networking equipment within
a system will work together, the requirements for management are simplified. However, expansion is not always
easy, and in many cases may well require over-engineering through implementing another converged system
alongside the existing one just to gain the desired headroom. There is also the issue of managing across multiple
systems – this may not be much of a problem if a homogeneous approach is taken, but if the fabric network consists
of multiple different vendors, or if there are converged systems from more than one vendor in place, it may be difficult
to ensure that everything is managed as expected.
A private cloud environment may then be seen as a better option. Although private cloud can (and generally should)
be implemented on converged systems, the majority of implementations Quocirca sees are based around the use of
standard high volume (SHV) servers built into racks and rows with separate storage and network systems. Adding
incremental resources is far simpler in this approach – new servers, storage and network resources can be plugged in
and embraced by the rest of the platform in a reasonably simple manner, provided that the management software
has the capabilities contained within it.
Provided that the right management software is in place, this can work. However, skills will be required covering not only the technical aspects of how such a platform works, but also areas such as how to populate a rack or a row in a manner that does not cause issues through hot spots, or by drawing too much power through any one spot.
Those choosing either of these paths must also make sure that any management software chosen does not just focus
on one aspect of the platform: the virtual environment has dependencies on the physical, and these must be
understood by the software. For example, the failure of a physical disk system will impact any virtual data stores that
sit on that system: the management software must be able to understand this – and ensure that backups or mirrors
are stored on a different physical system. It must also ensure that on any failure, the aim is for business continuity,
minimising any downtime and automating recovery as much as possible through the use of hot images, data mirroring
and network path virtualisation across multiple physical network interface cards (NICs) and connections.
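The physical-dependency check described above can be sketched as a mapping exercise. The store and array names are hypothetical, representing what a management tool might record:

```python
# Hypothetical mapping of virtual data stores to the physical
# arrays they sit on, plus each store's designated mirror.
placement = {
    "vstore-1": "array-A",
    "vstore-2": "array-B",
}
mirrors = {
    "vstore-1": "vstore-2",
    "vstore-2": "vstore-1",
}

def mirror_is_safe(vstore, mirrors, placement):
    """A mirror only protects against a physical disk-system failure
    if it sits on a different physical array from its primary."""
    return placement[vstore] != placement[mirrors[vstore]]

# Flag any mirror that shares a physical array with its primary
print(all(mirror_is_safe(v, mirrors, placement) for v in mirrors))
```

Management software that only sees the virtual layer would happily pass a configuration where both stores sit on the same array – which is exactly the failure mode the text warns against.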
Public cloud, whether infrastructure, platform or software as a service (I/P/SaaS), would seem to offer a means of removing all the issues around needing to manage the platform. However, ensuring that you have visibility at the technical level can help in spotting trends that your provider has missed (for example, are storage resources running low, or is end-user response suffering?) and is needed to help run the “what if?” scenarios that organisations need to be able to model these days.
In reality, the majority of organisations will end up with a hybrid mix of the above options. This brings in further issues
– whereas a single converged system may be pretty much capable of looking after itself, once it needs to interact with
a public cloud system, extra management services will be required.
Whatever platform an organisation goes for, the software really should be capable of looking at the system from end-
to-end. To the end user, one major issue will always be the response of a system. A converged system will report
that everything is running at hyper-speed, as it tends to look inwardly and will be monitoring performance at internal
connection speeds. The end user may be coming in from a hand-held device over a public network and using a mix
of functions from the converged system and a public cloud: the management software must be able to monitor all of
this and be able to understand what is causing any problems. It must then be able to try and remediate the problem
– for example, by using a less congested network, by offloading workload to a different virtual machine or by applying more storage. It must understand that providing more network bandwidth could mean that the resulting higher IOPS require a different tier of storage to be used, or more virtual cores to be thrown at the server. All of this needs to be carried out in essentially real time – or as near as makes no difference to the end user.
The existing systems management vendors, such as CA, IBM and BMC, are getting there with their propositions, with HP lagging behind. The data centre infrastructure management (DCIM) vendors, including nlyte and Emerson Network Power, are making great strides in adding to existing systems management tools through including monitoring and
management of the data centre facility and its equipment into the mix. EMC is making a play for the market through
its software defined data centre (SDDC) strategy, but may need to be bolstered by a better understanding of the
physical side as well as the virtual.
One thing is for sure – the continued move to a mix of platforms for supporting an organisation’s needs will continue to drive innovation in the systems management space. For an IT or data centre manager, now is the time to ensure that what is put in place is fit for purpose and will support the organisation going forward, no matter how the mix of platforms evolves.
The Tracks of my Tiers
It’s time for change. Your old data centre has reached the end of the road, and you need to decide whether to build
a new one or to move to a co-location partner. What should you be looking for in how the data centre is put together?
Luckily, a lot of the work has already been done for you. The Uptime Institute (uptimeinstitute.com) has created a simple set of tiers for data centres that describes what should be provided, in terms of overall availability, through a particular technical design of a facility.
There are four tiers, with Tier I being the simplest and least available, and Tier IV the most complex and most available. The Institute uses Roman numerals to discourage facility owners from claiming that they exceed one tier but aren’t quite the next, using nomenclature such as “Tier 3.5”. However, Quocirca has seen instances of facility owners saying that they are “Tier III+”, so the plan hasn’t quite worked.
It would be fair to say that in most cases, costs also reflect the tiering – Tier I should be the cheapest, with Tier IV
being the most expensive. However, this is not always the case, and a well implemented, well run Tier III or IV facility
could have costs that are comparable to a badly run lower Tier facility.
A quick look at the tiers gives the following as basic descriptors, with each tier having to meet or exceed the
capabilities of the previous tier:
Tier I: Single non-redundant power distribution paths serving IT equipment with non-redundant capacity components, leading to an availability target of 99.671%. Capacity components are items such as UPSs, cooling systems, auxiliary generators and so on. Any failure of a capacity component will result in downtime, and scheduled maintenance will also require downtime.
Tier II: A redundant site infrastructure with redundant capacity components, leading to an availability target
of 99.741%. The failure of any capacity component can be manually managed by switching over to a
redundant item with a short period of downtime, and scheduled maintenance will still require downtime.
Tier III: Multiple independent distribution paths serving IT equipment; at least dual power supplies for all IT
equipment; leading to an availability target of 99.982%. Planned maintenance can be carried out without
downtime. However, a capacity component failure still requires manual switching to a redundant
component and will result in downtime.
Tier IV: All cooling equipment to be dual powered; a complete fault tolerant architecture leading to an
availability target of 99.995%. Planned maintenance and the failure of a capacity component are dealt with
through automated switching to redundant components. Downtime should not occur.
Bear in mind that these availability targets are for the facility – not necessarily for the IT equipment within it. Organisations must ensure that the architecture of the servers, storage and networking equipment, along with external network connectivity, provides similar or greater levels of redundancy to ensure that the whole platform meets the business’ needs.
The percentage facility availabilities may seem very close and very precise – however, a Tier I facility will allow for the
best part of 30 hours of downtime per annum, whereas a Tier IV facility will only allow for under half an hour.
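The conversion from an availability percentage to a downtime budget is simple arithmetic, sketched below using the Uptime Institute targets quoted above:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 - ignoring leap years

def annual_downtime_hours(availability_pct):
    """Convert a facility availability target into the annual
    downtime budget it implies."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for tier, pct in [("I", 99.671), ("II", 99.741), ("III", 99.982), ("IV", 99.995)]:
    print("Tier %s: %.1f hours/year" % (tier, annual_downtime_hours(pct)))
# Tier I allows close to 29 hours a year; Tier IV under half an hour
```

Running the loop makes the gulf between the tiers obvious: the seemingly small percentage differences translate into downtime budgets an order of magnitude apart.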
The majority of Tier III and IV facilities will have their own internal targets of zero unplanned downtime, however –
and this should be an area of discussion when talking with possible providers or when designing your own facility.
It is tempting to look at the Tiers as a range of “worst-to-best” facilities. However, it really comes down more to the
business requirements that drive the need. For example, for a sub-office using a central data centre for the majority
of its critical needs, but having an on-site small server room for non-critical workloads, a Tier III data centre could be
overly expensive for its needs, and a Tier I or Tier II facility could be highly cost-effective. Although Tier I and Tier II
facilities are not generally suitable for mission critical workloads, if there are over-riding business reasons and the
risks are fully understood and plans are in place to manage how the business continues during downtime, then Tier I
could still be a solution.
It is Tiers III and IV where organisations should be looking for placing their more critical workloads. Tier III facilities
will still require a solid set of procedures in how to deal effectively with capacity component failures, and these plans
will need to be tested on a regular basis. Even with Tier IV, there is no case for assuming that everything will always
go according to plan. A simple single redundancy architecture (each capacity component being backed up by one
more) can still lead to non-availability. If a single capacity component fails, the facility is now back down to a non-
redundant configuration. If the failed component cannot be replaced rapidly, then a failure of the active component
will result in downtime.
Therefore, plans have to be in place as to whether replacement components are held in inventory, or whether there
is an agreement in place with a supplier to get a replacement on site – and probably installed by them – within a
reasonable amount of time. For a Tier IV facility, this should be measured in hours, not days.
If designing your own facility, the Uptime Institute’s facility Tiers give a good basis for what is required to create a
suitable data centre facility with requisite levels of availability around the capacity components. It will not provide
you with any reference designs – areas such as raised v. solid floors, in-row v. hot/cold aisle cooling and so on are not
part of the Institute’s remit.
If you are looking for a co-location partner, then the Institute runs a facility validation and certification process. Watch out for co-location vendors who say that their facility is Tier III or Tier IV “compliant” – this is meaningless. If they want to use the Tier nomenclature, then they should have gone through the Institute and become certified. A full list
of facilities that have been certified can be seen on the Institute’s site here:
http://uptimeinstitute.com/TierCertification/certMaps.php
What to look for from a Tier III data centre
provider
The Uptime Institute provides a set of criteria for the tiering of data centre facilities that can help when looking to use
either a co-location facility or an infrastructure, platform or software as a service (I/P/SaaS) service.
The idea of the tiers is to provide an indication of the overall availability of the facility – a Tier I facility is engineered to have no more than 28.8 hours of unplanned downtime per annum, a Tier II 22 hours, a Tier III 1.6 hours and a Tier IV 0.4 hours. As can be seen, there is a big jump from Tier II to Tier III – and this is why organisations should look for at least a Tier III facility when looking for a new facility to house their IT within.
A Tier III facility offers equipment redundancy in core areas, such that planned maintenance can be made while
workloads are still on-line, and where the failure of a single item will not cause the failure of a complete area. Tier IV takes this further to provide multi-redundancy, but will only be required by those who have a need for maximum availability of the facility and the IT platform within it. For most, Tier III will be sufficient.
However, there are lots of co-location, hosting and cloud vendors out there who indicate that they are “Tier III” (or, more often, “Tier 3” – nomenclature the Uptime Institute does not like), many of which are not fully compliant with the guidelines. It is a case of caveat emptor – buyer beware – but there are certain steps that can be taken to ensure that what you are getting is fit for purpose.
If you really and truly require an Uptime Institute Tier III facility, then it is really quite simple. A facility can only call
itself Uptime Institute Tier III if it is certificated accordingly.
The Uptime Institute provides three different types of certification – and these involve expense on the part of the facility owner. The only way to become certified is to go to the Uptime Institute’s professional services company and have it audit your plans, your operational approach or your physical data centre. Having just the plans audited is the quickest route, and results in a Tier Certification of Design Documents. This gives the facility owner a certificate, and the facility can be listed on the Uptime Institute’s site as a certificated member.
The certification of the physical data centre can only be obtained after the data centre plans have been certificated.
The Uptime Institute Professional Services company will then carry out a site visit and a full audit of the physical facility
to ensure that the build is in line with the plans. If this is the case, then the facility owner will get a Tier Certificate of
Constructed Facility – with a plaque to go on the vendor’s offices or wherever, as well as listing on the Institute’s site.
With the Operational Sustainability Certification, an on-site visit is made to evaluate the effectiveness of components
of the management and operations and building characteristics. These are compared to the specific requirements
outlined in the Institute’s document, Tier Standard: Operational Sustainability. Once validated, the facility owner gets
a certificate, plaque and listing on the Institute’s site.
Therefore, the first place to start when looking for a Tier III facility is the Uptime Institute’s site, as all certificate
owners will be listed there.
Does this mean that all of those who are not on the Institute’s site should be avoided? By no means. There are those
who believe that the Uptime Institute is too self-centred and that its certification process is not open enough. There
are those who object to having to pay for the certification process, and others who just do not see the point of having
an Uptime Institute Tiering at all.
The Telecommunications Industry Association (TIA) came up with a similar four-level facility tiering (Tiers 1-4) in 2005, under its tiering requirements in document ANSI/TIA-942. These requirements were modified in 2008, 2010 and 2013 to
reflect changes and advancements in data centre design. The tiers roughly equate with the Uptime Institute’s tiers,
and as such, anyone using the TIA’s system should also be looking for a Tier 3 facility.
For those facilities that have neither an Uptime Institute nor a TIA tiering, it is down to the buyer to carry out due diligence. Quocirca recommends that the buyer uses either the Uptime Institute’s or the TIA’s documents to
pull out the areas that they believe to be of the largest concern to them and insist that the facility owner shows how
they meet the needs of these.
Don’t let them fob you off with responses like “Of course – but we do it differently” – challenge them; get them to
quantify risks and show how they will ensure defined availability targets; get them to put financial or other penalty clauses into a service level agreement (SLA) so that they become more bought in to the need to manage availability
successfully. When you carry out your own site visit, ask questions – where’s the second generator; what happens if
that item fails; how do multiple power distribution systems come in and distribute around the facility?
Only through satisfying yourself will you be able to rest easy. Taking responses at face value could work out very
expensive – and it is in the nature of many facility owners to promise almost anything to get higher levels of occupancy
in their facility. They know that once you are in the facility, it is difficult to move out again.
Certainly, the Uptime Institute’s certification is the “Gold Standard”, as it is based on a rigorous evaluation of plans, facility and operational processes against a set of solid requirements. The TIA’s is a more open approach, which
does put more of the weight of due diligence on the buyer to ensure that the requirements have been fully followed.
A facility stating that it is “built to Tier III standards” requires yet more diligence – and an understanding of the
requirements.
Lastly – remember that these tiers only apply to the facility itself – they do not define how the IT equipment itself
needs to be put together to give the same or higher levels of availability. Ensuring that overall availability is high
requires yet more work to cover how the IT equipment is configured…
About Quocirca
Quocirca is a primary research and analysis company specialising in the business impact of information technology and communications (ITC). With world-wide, native language reach, Quocirca provides in-depth
insights into the views of buyers and influencers in large, mid-sized and
small organisations. Its analyst team is made up of real-world practitioners
with first-hand experience of ITC delivery who continuously research and
track the industry and its real usage in the markets.
Through researching perceptions, Quocirca uncovers the real hurdles to
technology adoption – the personal and political aspects of an
organisation’s environment and the pressures of the need for
demonstrable business value in any implementation. This capability to
uncover and report back on the end-user perceptions in the market
enables Quocirca to provide advice on the realities of technology adoption, not the promises.
Quocirca research is always pragmatic, business orientated and conducted
in the context of the bigger picture. ITC has the ability to transform businesses and the processes that drive them, but
often fails to do so. Quocirca’s mission is to help organisations improve their success rate in process enablement
through better levels of understanding and the adoption of the correct technologies at the correct time.
Quocirca has a pro-active primary research programme, regularly surveying users, purchasers and resellers of ITC
products and services on emerging, evolving and maturing technologies. Over time, Quocirca has built a picture of
long term investment trends, providing invaluable information for the whole of the ITC community.
Quocirca works with global and local providers of ITC products and services to help them deliver on the promise that ITC holds for business. Quocirca’s clients include Oracle, IBM, CA, O2, T-Mobile, HP, Xerox, Ricoh and Symantec, along
with other large and medium sized vendors, service providers and more specialist firms.
Details of Quocirca’s work and the services it offers can be found at http://www.quocirca.com
Disclaimer:
This report has been written independently by Quocirca Ltd. During the preparation of this report, Quocirca may have
used a number of sources for the information and views provided. Although Quocirca has attempted wherever
possible to validate the information received from each vendor, Quocirca cannot be held responsible for any errors
in information received in this manner.
Although Quocirca has taken what steps it can to ensure that the information provided in this report is true and
reflects real market conditions, Quocirca cannot take any responsibility for the ultimate reliability of the details
presented. Therefore, Quocirca expressly disclaims all warranties and claims as to the validity of the data presented
here, including any and all consequential losses incurred by any organisation or individual taking any action based on
such data and advice.
All brand and product names are recognised and acknowledged as trademarks or service marks of their respective
holders.
REPORT NOTE:
This report has been written independently by Quocirca Ltd to provide an overview of the issues facing organisations seeking to maximise the effectiveness of today’s dynamic workforce.

The report draws on Quocirca’s extensive knowledge of the technology and business arenas, and provides advice on the approach that organisations should take to create a more effective and efficient environment for future growth.