Proposed Data Centre and base Infrastructure access strategy
John Young & Jon Blake
May 2007
Questions for IMC members
• Does the approach to DR outlined here cover your needs?
• Can you prioritise your systems into Gold, Silver and Bronze categories and specify critical periods?
• Can you make the case for investment to cover the selected Gold, Silver or Bronze rating? If not, what would be the impact of a loss of service, e.g. at Bronze level?
• Can you endorse this approach, noting the base resilience / Disaster Recovery actions?
Contents
Background and issues
Principles
Two-phase approach
Current and planned network and datacentre layout
Data centre sourcing options
Benefits
Three-tier DR approach
Base infrastructure actions and impacts
Summary
Required actions
Background
• ONS has 5 Data Centres (inc. Siemens) plus 2 server rooms, with different states of
– security
– capacity
– appropriateness
• The March 2007 Capability review identified significant risks in
– DR
– Data Centre infrastructure
• DR capability – limited to external site DR for the PPI and RSI systems on NUMA, and Model 204
– Can only be accessed from service-supplier premises
– Other systems protection limited to tape backup
– Odyssey requirement unspecified
• Odyssey capacity requirements need to be determined
• We need to find a solution to these complex issues and propose a two-phase approach
– Phase 1: Base Infrastructure
– Phase 2: Modernised System Requirements
Proposed Principles
• Resilience – to provide appropriate availability across ONS
• Disaster Recovery Capability – appropriate to ONS business needs
• Good VFM for ONS, balancing costs versus risks
• Simplify the deployment of equipment, taking a shared holistic approach
• Resolve areas of weakness, i.e.
– Data Centre environments and protection
– Remote Access Support access and resilience
– Firewall security
– GSI network access (currently 1 very busy link) for mail and internet
Phase 1: Base Infrastructure
Consists of: data centres, networks, external connectivity
Objectives
– provide a logical design which can support Disaster Recovery requirements as they are determined for each business application
– identify gaps and weaknesses in the capability of the base infrastructure to support ONS systems and the means to communicate with them
– identify options to implement the logical solution
Phase 2: Modernised System requirements
A joint Odyssey / IM activity
Objectives
– Identify the capacity requirements of the Odyssey systems (including Atlas) for the next 3 years
– Obtain the performance, service level and DR requirements of each of these systems from the business
– Propose technical designs (including costs) which satisfy the above
– Subsequently establish a migration plan in conjunction with Phase 1 to realise the approach
Today - physical
[Diagram: offices (Titchfield, Newport, Drummond Gate, Myddleton St) and data centres (Cardiff Bay, Newport, London, Titchfield, Siemens) connected over the WAN. Link types shown: backed-up WAN link; WAN link with no back-up; direct link; back-up WAN link (non active). Site annotations shown: lacks expansion; equipment proximity; excellent power and environment; limited space; no fire protection; adequate power protection; poor environment; no power backup; closing.]
Newport Data Centre 2007
Main Data Centre, Room C002 (no fire suppression)
• 72 Wintel servers: Odyssey Dev., Notes, File and Print, etc.
• HP EVA SAN, 7 TB: local mixed
• IBM NUMA C8 (DYNIX, Ingres): IDBR, Common Software, PPI etc.
• IBM NUMA C7 (DYNIX, Ingres): Legacy Maintenance
• IBM P55Q (AIX, Ingres): RPI Replatform
• Grid Link 1; Grid Link 2
• Firewall; Remote Access; Internet; WWW; GSi; Service Monitoring & Reporting
Titchfield Data Centre 2007
Main Data Centre, Room nnn (no fire suppression)
• IBM P690 (AIX, Oracle): NeSS and Odyssey Development & Test
• IBM Z Series (OS 390, Model 204): Social and Vital Statistics
• Hitachi Lightning SAN, 16.6 TB: NeSS, Odyssey Dev. & Test, Other Mixed
• Hitachi AMS1000 SAN, 7.2 TB: Census 2001, less demanding apps.
• 195 Wintel servers: NeSS, Odyssey Dev. & Test, Census 2001, Notes, File and Print, etc.
• Back-up Generator; Grid Link 1
• Firewall; Remote Access; Internet
Cardiff Bay Data Centre 2007
BT Cardiff Bay (highly secure, fire suppression)
• IBM P690 (AIX, Oracle): Odyssey
• IBM P590 (AIX, Oracle): ATLAS, Odyssey, GRIS, Gender Recognition
• Hitachi Lightning SAN, 9.4 TB: ATLAS, Odyssey, GRIS, Gender Recognition
• 47 Wintel servers: Odyssey, GRIS, Gender Recognition, Metis, E-learning
• Back-up Generators; Grid Link 1
March 2008 – physical with network enhancement
• WAN resilience for Titchfield and Cardiff Bay
• Major computing equipment from DG moved to Titchfield and South Wales
• Subject to funding approval
[Diagram: offices (Titchfield, Newport, Myddleton St) and data centres (Cardiff Bay, Titchfield, Newport, Siemens) connected over the WAN. Link types shown: main WAN link; backup WAN link; direct link; Ethernet extension.]
What we want to achieve - logical
[Diagram: offices (Newport, Titchfield, Myddleton St) connected over the WAN to Data Centre South and Data Centre West, with a web-hosting site and a back-up. Remote Access and GSi connections are each shown twice. Link types shown: main WAN link; second (active) WAN link.]
Data Centre Sourcing Options
• Seek more capacity at Cardiff Bay and move all Newport equipment there
– Set-up cost: ?
– Running cost p.a.: £640k
– Timescale: 6-9 months
• Refurbish Titchfield old Data Centre and move all on-site equipment to it
– Set-up cost: £1m?
– Running cost p.a.: ?
– Timescale: 18 months
• Refurbish Newport Data Centre and bring back equipment from Cardiff Bay
– Set-up cost: £1m?
– Running cost p.a.: ?
– Timescale: 18 months
• Move all Data Centre requirements to Isaac, only if ‘Core+’ adopted
– Set-up cost: ?
– Running cost p.a.: ?
– Timescale: 6-9 months
• Permutations of the above
Benefits
• Logical Data Centre and DR service approach provides
– Predetermined DR service models for developers to use to propose and develop solutions tailored to business needs
– Framework against which physical Data Centre solutions can be established and reviewed
– Standardised framework with potential economies of scale
– Economies of scale for equipment and operation
– Clarity of management
Illustrative 3 Tier DR approach
Tiers: Gold, Silver, Bronze
• Example
– Gold: National Accounts, Atlas
– Silver: RPI, PPI
• Data secured
– Gold: last hour or better
– Silver: overnight
– Bronze: overnight
• Application secured: weekly and on change; weekly/monthly
• Restoration facilities (hardware)
– Gold: dedicated
– Silver: transferred from other users
– Bronze: none
• Restoration time, limited users
– Gold: hours
– Silver: days
– Bronze: following hardware procurement (weeks)
• Restoration time, further users: 1 week; indeterminate
• Restoration time, full service: 4-8 weeks
• Other features
– Seek to establish priority replacement facilities with suppliers
– Support services split between Data Centres to minimise DR impact
– Systems only installed on shared corporate equipment
– Aggressive server virtualisation to minimise / insulate from hardware risks
• Requirements
– Predetermined schedule to prioritise recovery of systems with varying criticality through the year
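The requirement for a predetermined schedule that prioritises recovery as criticality varies through the year could be sketched roughly as below. This is only an illustration: the tiers for National Accounts, Atlas, RPI and PPI follow the slide's example, but the `System` class, the `critical_months` values and the ordering rule (systems in a critical period first, then by tier) are assumptions, not part of the proposal.

```python
from dataclasses import dataclass

# Lower rank = recovered earlier. The tier names come from the slide.
TIER_RANK = {"Gold": 0, "Silver": 1, "Bronze": 2}


@dataclass(frozen=True)
class System:
    name: str
    tier: str
    # Months (1-12) in which the system is business critical.
    # Hypothetical values for illustration only.
    critical_months: frozenset = frozenset()


def recovery_order(systems, month):
    """Order systems for recovery in the given month.

    Assumed rule: systems inside a critical period come first,
    then Gold before Silver before Bronze, then alphabetical.
    """
    return sorted(
        systems,
        key=lambda s: (month not in s.critical_months, TIER_RANK[s.tier], s.name),
    )


# Example systems; the critical months are invented for the sketch.
systems = [
    System("National Accounts", "Gold", frozenset({3, 6, 9, 12})),
    System("Atlas", "Gold"),
    System("RPI", "Silver", frozenset({1})),
    System("PPI", "Silver"),
]

for s in recovery_order(systems, month=3):
    print(s.tier, s.name)
```

Whether a critical period should outrank the tier, or only break ties within a tier, is exactly the kind of decision a single prioritising body would need to make.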
Business evaluation of DR requirements
• ONS needs:
– a common process across businesses;
– a single body to prioritise DR requirements.
Recovery period
Business Process  F / R          4 hours  24 hours  48 hours  72 hours  5 days  7 days  14 days  28 days
Process 1         Financial      1        1         2         3         3       3       3        3
Process 1         Reputational   1        2         3         4         5       5       5        5
Process 2         Financial
Process 2         Reputational
Example Method
• Financial and Reputational impact assessed on a 1-5 scale for each period
– e.g. recovering Process 1 in less than 24 hours could have a low Financial and Reputational impact, but recovery within 48 hours would begin to have adverse effects, at 72 hours it would be more damaging, and 5 or 7 days is unacceptable.
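One way to turn the example scores into a recovery target is sketched below. The periods and the Process 1 scores are taken from the example table; the function name, the `max_impact` threshold and the choice to take the worse of the Financial and Reputational scores are assumptions made for illustration.

```python
# Recovery periods from the example table, shortest first.
PERIODS = ["4 hours", "24 hours", "48 hours", "72 hours",
           "5 days", "7 days", "14 days", "28 days"]

# Process 1 scores from the example table (1 = low impact, 5 = highest).
PROCESS_1 = {
    "Financial":    [1, 1, 2, 3, 3, 3, 3, 3],
    "Reputational": [1, 2, 3, 4, 5, 5, 5, 5],
}


def latest_tolerable_period(scores, max_impact):
    """Return the longest recovery period whose worst score across all
    impact types is <= max_impact, or None if even the shortest period
    exceeds it. max_impact is a hypothetical business threshold."""
    latest = None
    for i, period in enumerate(PERIODS):
        worst = max(row[i] for row in scores.values())
        if worst > max_impact:
            break
        latest = period
    return latest


print(latest_tolerable_period(PROCESS_1, max_impact=2))  # 24 hours
print(latest_tolerable_period(PROCESS_1, max_impact=3))  # 48 hours
```

With a threshold of 2, Process 1 would need recovery within 24 hours, which matches the narrative that adverse effects begin at 48 hours.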
Illustrative Gold and Silver arrangement
[Diagram: Data Centre West and Data Centre South. Labels shown: Production 1 NA; Production 2 Atlas; Standby 1 NA; Standby 2 Atlas; Production RPI and Production PPI (each shown twice); Development environments / Shared critical period backup; Quiesced.]
Base infrastructure actions and impacts
• Upgrade links between offices and Data Centres
– Provides high resilience
– Provides DR but DR events limited to 1-2 days
– £100k setup and similar p.a. running costs
– 3 months to implement
– Do as part of London move
• Extend GSI, RAS, Firewall mechanisms
– Provides needed extra capacity and resilience
– Provides DR if more than 1 Data Centre
– Costs tba
• Establish twin datacentre strategy
– Can be used as a base for resilience
– Provides effective base for DR strategy
– Costs tba but offset by existing costs
– 18 months if ONS housed
– 6-9 months if via Isaac
Summary
• ONS DR facilities are extremely limited and requirements are not established
• ONS Data Centre capability is fragmented and ranges from very poor to excellent
• ONS's other base infrastructure is reasonably sound; however, it requires additional resilience and capacity, e.g. RAS, GSI, WAN links
• A logical model to rationalise this is proposed
• A 2nd piece of work is in hand to determine the Odyssey requirements (inc. Atlas)
• Further work is urgently required for NUMA based services to consider such systems as RPI and CSDB
• Several Data Centre sourcing options are proposed for investigation
Required Actions
• IMC
– Confirm the strategy provides a basis to support their business objectives and risk management needs
– Members arrange the provision of the necessary input information about the business needs in their areas
– Support is given for a project to establish the respective Data Centre cost evaluations and selection
– Note that projects will be brought forward to NWEB to realise ….
• EMG
– Note and endorse this proposal if supported by IMC