1
V1.0: October 3, 2011
Unum Wintel Server Virtualization Strategy
Virtualization Summit 2011 - 4Q11
Curtis Gunderson
Director – Virtualization Architecture
2
Topics
Business Driver Review

Wintel Server Virtualization Strategy
● Traditional review of Virtualization Strategy
● Where we are with the Cloud
● Service Classes

Technology Update for vSphere/ESX
● vSphere 5 & core offering updates
● Compute, Security, Storage and Networking specific changes

Virtualization Candidacy – VM entrance criteria
● Virtualizing first
● Critical Recovery with vSphere VM only
3
Wintel Server Virtualization Strategy
What are we talking about in the Virtualization space?
[Diagram: Wintel Server Virtualization, wrapped by Monitoring]
4
Business Driver Review
Enable the business
● Agility
● Innovation
● Performance

Expand stability & availability
● Increasing availability and reliability
● Reducing Risk
Cost optimization
5
Business Driver Review
Virtualization as a Business Enabler

Agility
● Continuous provisioning improvements, reducing customer wait time
● Dynamic resource flexibility: Hot-add CPU/Memory, Grow Disk, etc.
● Identify the cost components; let the customer make informed 'build' decisions on the cost for the SLA
● "Be Flexible" – Give us what we want, how we want it, when we need it

Innovation
● Support new and visionary infrastructure technologies
● Build for EDC & UK Co-Lo multi-tenancy
● Build Private Cloud model (and Hybrid, and Public, and…)
● Alternative users, alternative connectivity
● "Enable the Future" – Build solutions with vision, flexibility and possibility
6
Business Driver Review
Virtualization as a Business Enabler

Performance
● Perform the same as (and better than) physical hardware
● Track performance issues & optimize configuration
● Remove bottlenecks and technical barriers to performance
● Manage performance guarantees & prioritization of SLA
● "Make it faster" – Processing time is money…save money
7
Business Driver Review
Virtualization to Expand Stability and Availability

Increase Availability & Reliability
● Up-time & online status driven beyond monitoring class
● Reduce patch downtime, reduce change impact, reduce outages
● Increase resiliency – same site & alternate site
● Backup / recovery – make it faster, less impactful, manageable
● "Keep it Up" – Meet the business service levels

Reduce Risk
● HA & DRS are a given
● Storage migration
● Antivirus scanning
● "Make it safe" – Maintain the integrity of the business
8
Business Driver Review
Virtualization to Control Costs

Consolidation & Cost Control
● Continued consolidation of physical assets to virtual assets (~180 in 2012)
● Removing maintenance & extended warranty costs ($150-$300/month per server)
● Increasing utilization rates of physical assets: CPU from 5-14% to 50%+, targeting 80%
● Maximizing FTE ratios: 25:1 to 65:1…and beyond
● "Find more savings" – Continue to find cost savings

Lifecycle Management
● Remove the tie to a physical asset to remove extended maintenance costs
● Support the extended ability to continue the life of an asset
● Automate the build and other lifecycle processes for increased efficiency
● "Be current" – Stay current with technology
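As a rough back-of-envelope sketch of the maintenance savings above (the per-server cost range and ~180-server count come from this slide; everything else is illustrative arithmetic):

```python
# Hypothetical sketch: annual extended-maintenance cost avoided by
# virtualizing ~180 physical servers at $150-$300/month each.
servers_virtualized = 180
monthly_maint_range = (150, 300)          # $ per server per month

low = servers_virtualized * monthly_maint_range[0] * 12
high = servers_virtualized * monthly_maint_range[1] * 12
print(f"Annual maintenance avoided: ${low:,} - ${high:,}")
```

At these figures, consolidation alone avoids roughly $324K-$648K per year in extended-warranty spend, before counting power, space, or licensing effects.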
9
Wintel Server Virtualization Strategy
Mature Infrastructure Hosting Services: Global Hosting as a Service
 Automate provisioning with Self-Service portal
 Extend Virtualization strengths: Availability, Scalability, Reliability
 Remove barriers to entry: Performance, Cost
 Provide underlying infrastructure to meet business SLAs

[Diagram labels: GHaaS, SLA, On-Demand, Extend Strengths, Remove Barriers]
10
Wintel Server Virtualization Strategy
Before the details…
The Vision
11
Wintel Server Virtualization Strategy
Pools of Resources
12
Wintel Server Virtualization Strategy
SAMPLE SLA

EHS/Gold/Tier1 SLA: Availability = X, RTO = Y, Performance = Z
PHS/Silver/Tier2 SLA: Availability = A, RTO = B, Performance = C
SHS/Bronze/Tier3 SLA: Availability = Q, RTO = R, Performance = S
13
Wintel Server Virtualization Strategy
Data Replication
Data Migration
Application Migration
DR
14
Wintel Server Virtualization Strategy
How do we get there?
● GHaaS – evolution of the Hosting Services virtualization offering
● On-Demand Self-Service

[Diagram labels: GHaaS, On-Demand]
15
Wintel Server Virtualization Strategy
Remember the IHS Roadshow from 2008?
This edition of the strategy represents the 3rd generation of the Unum Hosting Service Model
16
Wintel Server Virtualization Strategy
Global Hosting as a Service – full presentation in a later session

Providing Hosting Services via a Private Cloud Model for the virtual environment
● Provide the plumbing – the infrastructure is there to support a dynamic load
● Enable the application teams and partners to mix/match the infrastructure to their requirements – including amount of resources & service types
● Provisioning services through Self-Service portal: pick type, size, SLA, quantity – Go!
● Capacity and Performance Management – providing and guaranteeing resources

Transferring the accountability to responsible owners
● We build the pools; Application/Governance teams pick the options for an app
● Pick size, configuration, service offering – mix and match for the app, stack or env
● Dynamic and flexible infrastructure, with VM sizing to application needs
● Usage, Configuration and Tracking occur at the VM level
● Accountability and Cost Translucency to Requestor: Usage and Showback Costs
17
Wintel Server Virtualization Strategy – The “Cloud”
How does the Virtualization strategy compare to the Cloud?
● Compute + Networking + Storage = Virtual Infrastructure
● Virtual Infrastructure + Enablers = Private Cloud
  Enablers: resource flexibility, governance, self-service, capacity management, site independence

The Virtual Infrastructure evolves into the Unum Private Cloud
● VMware vSphere/ESX 5 and VMware vCloud Director are the transformation tools to the Unum Private Cloud
● Host management/resource assignment moves from Clusters to Pools
● Site-based resource pools, governed to SLA
● New services through the provisioning portal

How is the infrastructure different than 'today'?
● Generally the same! We have been moving towards 'private cloud' since 2008
● Adds: SLA, governance, self-service, accountability and control
Welcome to the Unum Private Cloud!
18
Wintel Server Virtualization Strategy
Removing Barriers to Entry
Remove Barriers
19
Wintel Server Virtualization Strategy
Removing Barriers to Entry - Platform Optimization
Add VM Scale-up and Scale-out workload balancing / performance design
• Scale-up: per-VM max supported configuration increases as application scale-up grows
• Scale-out: continued support for application scale-out for redundancy and load balancing
• Configuration Review: new practice to right-size VM configuration to the workload

Support larger VMs in Scale-up configuration

          Today            Strategy
Small:    1 vCPU / 1 GB    1-2 vCPU / 1-2 GB
Medium:   2 vCPU / 2 GB    4 vCPU / 4-8 GB
Large:    4 vCPU / 4 GB    8 vCPU / 8-16 GB *
Jumbo:    8 vCPU / 8 GB    16 vCPU / 16-64 GB *

* When approved in Application Service Framework
20
Wintel Server Virtualization Strategy
Removing Barriers to Entry - Platform Optimization
Standardize underlying infrastructure capabilities
• HA / DRS become standard offering for all platforms, all offerings (reduced in labs)
• All infrastructure tools for all platforms
• All hosts move to Enterprise Plus licensing model
  – Centralized network control
  – Centralized priority management across any VM type
  – Consistent build process & automated updates/deployments

Standardize service offerings
• Managed via SLA by application/stack
• Backup / Recovery
• Fault Tolerance
• DR / DRE
• Alt-Site / Multi-site awareness
21
Wintel Server Virtualization Strategy - Compute
Walking back down the silo – Host specific updates
22
Wintel Server Virtualization Strategy
Platform Optimization – compute (vCPU)
Removing the CPU bottleneck
• Historically, we managed to VMs-per-Host or VM:Host ratios
• This forced us to 'cram' VMs onto hosts, with only CPU% or MEM% as guidelines for how full a host was
• Of course, that was a bad way to manage, especially as each VM got bigger!

● The new model is based upon vCPUs and the number of Host Cores
• Targeting near 1:1 vCPU to CPU Core ratios, based upon workload size
• Guarantees closer to real-time performance optimization / scheduling
• Continues to allow for oversubscription in pools when usage is low

● Collectively, the vCPUs for all VMs on a host, cluster and pool will be used to drive utilization, capacity and performance plans
● We continue to monitor individual VMs for performance, but track capacity and availability at the higher pool/cluster level
● Of course, finance will still use VM:Host as a metric!
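A minimal sketch of the vCPU-based capacity model described above (the 1:1 target ratio and the 40-core host are illustrative assumptions, not a real tool):

```python
def vcpu_headroom(host_cores: int, vm_vcpus: list[int], target_ratio: float = 1.0) -> int:
    """How many more vCPUs fit on this host at the target vCPU:core ratio."""
    budget = int(host_cores * target_ratio)
    return budget - sum(vm_vcpus)

# A 40-core host running a mix of VM sizes, managed to a near-1:1 ratio:
print(vcpu_headroom(40, [1, 1, 2, 4, 8, 16]))       # 8 vCPUs of headroom
# Pools may run oversubscribed when usage is low, e.g. a 1.5:1 ratio:
print(vcpu_headroom(40, [1, 1, 2, 4, 8, 16], 1.5))  # 28
```

The point of the model: the committed-vCPU sum, not a flat VM count, drives utilization and capacity plans.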
23
Wintel Server Virtualization Strategy
Platform Optimization – compute (vCPU)
Continue Scale-up host design*
• Compute: 4 socket, 8-10 cores: 64-80 vCPUs; 256-384 GB memory
• Network: 2 x 1 Gb connectivity moves to 2 x 10 Gb connectivity
• Storage: 8 Gb fabric, SVC redundancy and replication (may lead to 3rd HBA)
• Cost per VM: scale-up continues to drive down the cost of an individual VM
• Rack or blade? From the virtual silo it does not matter; it becomes a DIS economic consideration

* This remains true in rack or blade solution, IBM – UCS – VCE, etc.

Change new hardware introduction approach
• Today, because we want 'the best & fastest in Prod', we introduce new hardware directly into CAE – directly for PROD VMs – then shuffle down hardware to other envs. Any risk there?
• With the new pool model, we'll migrate workload into the new equipment based upon SLA, migrating lower-risk workload to newer equipment
• Hardware refresh will change towards a forecasted, planned strategy by %, resulting in fewer, larger hardware purchases with the capacity planning model
24
Wintel Server Virtualization Strategy - Compute
Platform Optimization – compute (vCPU)
 Upgrade of ESX/vCenter versions to v5

 Converged compute/storage/network capabilities into Blades, UCS or VCE environments, expected for at least some workloads: View & LabMgr/vCD

 Segment Cluster Design via DRS Groups
● One drawback of incredibly dense hosts is that various-sized workloads can frequently conflict
  – Smaller 1 vCPU VMs can get in the way of larger VMs while waiting for CPUs to be available
  – Wait times can occur in VMs, affecting application response time and overall performance
● DRS Groups allow for aligning similarly sized workloads to run together, to better align resource guarantees to the SLA
● Similarly sized workloads perform better: enhanced memory sharing & fewer context switches when workload sizes compete for resources

[Diagram: cluster layout before and after DRS Group segmentation]
25
Wintel Server Virtualization Strategy - Storage
Walking back down the silo – Storage specific updates
26
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Introduce new VM backup tools / processes
• Moving from TSM client within the VM to TSM VE – SAN-based backup
  – Removes CPU utilization & guest operations
  – Removes network dependencies on backup
  – Greatly improves backup & restore capabilities
• Backups are based upon a 'Snapshot' process
  – Snapshot produces a very short 'performance' impact on the VM
  – Incremental changes between snapshots are small – very small
  – Continues to support application-aware dependencies: SQL, Exchange
• Moving backups from direct-to-tape to VTL – ProtecTIER
  – Integrated de-duplication
  – Integrated replication to alternate sites (CAE>EDC)
  – Integrated DR / DRE with off-site repository
27
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
LUN sizing changes – support TSM VE and UK
• Remain all Tier2 storage classification – Server storage behind SVC
• Remain at 1 TB LUN sizes – all VM workload managed by SAN performance tools
• Continue with Thin Provisioning on the ESX host side as default behavior
• Change free space per LUN from 30 GB to 150 GB
• Change Thin Overcommit from ~180% to 125%
• Same performance, lower risk, increased availability
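The free-space and overcommit thresholds above can be expressed as a simple policy check (a hedged sketch; the function is illustrative and not part of any real SAN tooling):

```python
LUN_SIZE_GB = 1024      # 1 TB LUNs
MIN_FREE_GB = 150       # was 30 GB
MAX_OVERCOMMIT = 1.25   # was ~1.80

def lun_within_policy(used_gb: float, thin_provisioned_gb: float) -> bool:
    """True if the LUN meets both the free-space and thin-overcommit rules."""
    enough_free = (LUN_SIZE_GB - used_gb) >= MIN_FREE_GB
    not_overcommitted = thin_provisioned_gb <= LUN_SIZE_GB * MAX_OVERCOMMIT
    return enough_free and not_overcommitted

print(lun_within_policy(800, 1200))   # True: 224 GB free, ~117% committed
print(lun_within_policy(900, 1400))   # False: only 124 GB free, ~137% committed
```

Tightening both knobs trades a little raw capacity for a much lower chance of a thin-provisioned LUN filling unexpectedly.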
SVC Upgrade to full ESX-compliant functionality
• Includes industry standard API support: VAAI, VASA (soon!) *
• Offloads migrations, clones, and data copy operations directly to the SAN
• Faster provisioning, less impact to the VMs

* VAAI – vStorage APIs for Array Integration – moves host-based activities back to the array
  VASA – vStorage APIs for Storage Awareness – array features/status/performance exposed within vCenter
28
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Introduce Storage DRS – dynamic storage provisioning and management
• Similar to the traditional DRS at the ESX host level, but applies to disk
• Automatically monitors and reacts to changes in performance and capacity, keeping VM datastores performing at optimum levels: will move VMs when required
• SLA managed to both capacity & performance of the vDisk requirements
• Beyond the SAN array/LUN, includes host-side observations to govern against HBA overload, host overload, datastore overload, IO overload, etc.
• Storage Maintenance Mode – mass storage migration or VM reshuffling

Introduce Storage IO Control – priority & sharing SLA
• Similar to CPU/Memory priority, disk usage and performance is managed to the SLA
• VMs are guaranteed access to their committed SLA
• For example, if we have a Gold+ Prod VM and something horrible happens on a shared LUN/HBA/controller caused by Silver VM workload, Storage IO Control guarantees, protects and prioritizes the Gold+ VM from the performance and availability impacts of the Silver VM
• A lot more on this topic in the SLA section!
29
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Data Replication – extending the AltSite concept with new technologies
• Maintain SAN-based replication* processes
  – IBM SVC
  – EMC RecoverPoint – possibly limited to campus migrations
• Introduce VM-based replication for lower SLAs (Site Recovery Manager)
• Site Recovery Manager policy-based recovery plans
  – Plane.biz / Harmony manual process converted to SRM RunBook automation
  – Extend 'fenced' test arena for isolated test & validation
• Extend replication & application migration to Gold+ SLA candidates^
• SRM integration into DR and DRE activities

* Replication will require network bandwidth – quite likely, a lot of network bandwidth
^ Growing capabilities to additional applications may occur with storage or hardware refresh cycles

EDC design may add additional HBA ports / SAN ports
• Bigger hosts with bigger VMs are driving up I/O; degraded state with HBA failure
• May move from 2 HBA ports to 4 HBA ports per server
• Protect the environment and guarantee performance, even in a degraded state
30
Wintel Server Virtualization Strategy - Network
Walking back down the silo – Network specific updates
31
Wintel Server Virtualization Strategy
Platform Optimization – network (vNet)
Introducing 10GbE to hosts (VMs already have 10 Gb to the host)
• Today, 7 x 1 Gb NICs: 4 NICs teamed to provide 2 x 1 Gb to all VMs
• Host density is starting to put pressure on network throughput
• Will be moving VM networks to 2 x 10 Gb
• Moves vMotion to the 10 Gb NICs, dedicating 2 NICs to MGMT
• Result: 10x improvement to VM network availability, 2 fewer ports overall

Staying with vDS – VMware Virtual Distributed Switch
• DIS-managed distributed soft switch residing at the vCenter level
• Not implementing Cisco Nexus 1000v at this time
• ACL / QoS remain at the physical switch – managed by NS

Introduce Network IO Control – priority & sharing SLA
• Similar to CPU/Memory & Storage IOC, network usage is managed to the SLA
• This isn't pure ACL or QoS, but ESX-based sharing to match the SLA
• VMs are guaranteed access to their committed SLA
32
Wintel Server Virtualization Strategy
Platform Optimization – network (vNet)
Consolidating VLANs
• Consolidate VLANs towards service offering: Primary, Backup, LB, DMZ-FE, etc.
• Aids in up-front provisioning, SLA, IO Control
• Move away from many small VLANs to fewer, larger VLANs
  – 50 VLANs of 200 IPs each becomes 3 VLANs of 1000 IPs each
  – Continue to provide static IPs, but within the same VLAN as the service

Support physical switch environment separation, as required
• DRS Group management of a pool of resources segments traffic by environment
• DRS Group combines networking traffic for an 'environment', forwards to switch
• Physical network switch executes ACL rules

Site-to-Site IP translation and/or IP update automation
• Today, moving workload between data centers requires Network/IP changes
• Implement a translation mechanism with SRM & scripts to update DNS/hosts
• Move towards IP / site independence: long term – no server / app change required
33
Wintel Server Virtualization Strategy
Platform Optimization – security
DMZ Services / Servers – remain physically separated; no operational change this version

Antivirus/Scanning
• Historically, every VM has antivirus software installed within the VM
• With ~60 VMs on a host – concurrent scans, updates, and access are an issue
• Evaluating an approach to a virtualization-friendly solution
• However, we will proceed cautiously with this review for two reasons:
  – Previous negative server-side experiences with new, early versions of Symantec tools
  – Review of the server/security support model on tool ownership & operations

AV Part 1: Thinner in-guest VM AV client which caches scans across all VMs
• Symantec SEP 12.1 optimized for virtual environments, where VM files are scanned and content added to a trusted store
• The next VM scans only the differences not already in the store
• Real-time scanning still occurs, and only content unique to the VM is rescanned, based upon content in the trusted store
• Symantec estimates a 90% reduction in overall IO in heavy/dense environments
34
Wintel Server Virtualization Strategy

[Diagram: Today vs SEP 12.1 – Today: AV has to scan every file in the VM, for each VM, 20-60 times a host. SEP 12.1: only scans new & untrusted files – faster scans. In the virtual environment, SEP 12.1 eliminates 90%+ of I/O scan activity!]
35
Wintel Server Virtualization Strategy
Platform Optimization – security
AV Part 2: Offload VM scanning entirely to the host
• Extension of the VMware vShield API
• Scan/disk-write operations intercepted via policy to an 'appliance'
• Keeps the cache concept of AV Part 1, but offloads the remaining unique data to an appliance – so no direct in-guest scanning operation occurs
• Appliance intercepts both real-time & scheduled scan events
• Support upgrade to Symantec Summer 2012 EndPoint Security release

[Diagram: Guest VMs on each ESX host, with a Security VM appliance per host handling scan operations]
Again – we will remain cautious with evaluating these products
36
Wintel Server Virtualization Strategy
Build upon existing strengths and new capabilities of vSphere 5
Extend Strengths
37
Wintel Server Virtualization Strategy
Extend Availability – Same site recovery
Continue Scale-out cluster design
● Beyond 8 nodes in a cluster…to 20-24+ hosts
  – Collapse environment-specific clusters into site-specific clusters
  – Further distribution of priority workloads to more hosts
  – Priority workload has greater load balancing and access to more resources
  – Reduces capacity required by cluster for HA – greater utilization of all assets
● Retain N+2 recovery model in clusters
  • Compute perspective –
    – Always be able to have 2 simultaneous nodes out and meet 100% of the SLA
    – Even during planned maintenance/upgrade, be able to absorb 1 failure
  • Storage & Network perspective –
    – Redundant and load-balanced connectivity with no single points of failure
    – 2nd-level resiliency allows for full SLA even with component failure on the node
● Even faster HA recovery in a hardware failure event
  • Scaled clusters provide capacity for more simultaneous VM restarts
  • All VMs on a fully-loaded host would be restarted in less than 60 seconds
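The claim that scaling out reduces the HA capacity reserve follows directly from the N+2 arithmetic (a sketch; the 8- and 24-node counts are the slide's examples):

```python
def ha_reserve_fraction(n_hosts: int, spares: int = 2) -> float:
    """Fraction of cluster capacity held back so N+2 still meets 100% of SLA."""
    return spares / n_hosts

# Moving from an 8-node cluster to a 24-node cluster:
print(f"{ha_reserve_fraction(8):.1%}")   # 25.0%
print(f"{ha_reserve_fraction(24):.1%}")  # 8.3%
```

Same failure tolerance, but the reserve shrinks from a quarter of the cluster to under a tenth, which is the "greater utilization of all assets" point above.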
38
Wintel Server Virtualization Strategy
Extend Availability – Beyond same site
TSM VE – TSM backup software for Virtual Environments
• Move backup from agents installed in VMs to the SAN / hypervisor
• Offloads disk activity to disk arrays, reducing utilization while increasing recoverability
• Snapshot & VM state can be replicated to other regions as driven by SLA

True DR / DRE to required SLA
• Through TSM VE, SRM or pure recovery, capability of full recovery of servers to the DR site
• Pure DR capability, or DRE capability on a quarterly basis

Site Recovery Manager introduced for automating DR & recoverability
• Improve and extend the VM recoverability & testing scenarios for AltSite VMs used today
• Automation of recoverability at the VM level
• SAN replication or vStorage-based replication to different arrays
• Expected to be offered to Gold+ VMs
* A great deal of this requires a replication network and additional network bandwidth
39
Wintel Server Virtualization Strategy
Extend Reliability – Keeping Applications online & Performing
Enhanced Workload Management
● Implement workload isolation and IO Controls to guarantee performance and uptime
● Extend Anti-affinity rules to keep VMs separated where required: CLB VMs, etc.
● Extend Affinity rules to keep VMs that work together on the same host / same NIC: increase network throughput and decrease network hops

Extend Fault Tolerant VMs – Mirrored VMs at the same site
● Critical / small VM workload mirrored in real time to another host in the same site
● In the event of a host failure, the VM on the other node stays online & takes over
● 1 vCPU limit still applies in 2012

Minimizing invasive impact of required operations
● Anti-virus impact reduced
● Backup impact reduced
40
Wintel Server Virtualization Strategy
Provide underlying infrastructure to meet business SLAs
SLA
41
Wintel Server Virtualization Strategy
Infrastructure managed to support all requirements, all services
• 24x7x365 at the infrastructure level: all silos within virtualization: compute, network, storage
• All service categories managed here: DR, AltSite, Replication, Backup, etc.

Infrastructure pools of SLA are created & guaranteed
• Pools of resources sized to guarantee resources by pool SLA
• CPU & Memory reservations and priority, network and storage I/O control and prioritization – set at the pool level, applied to VMs
  Gold+, Gold, Silver, Bronze, etc.
• Service functionality created at the pool, applied to VMs
  Replication, Backup/Recovery, FT, etc.

Applications / VMs select SLA requirements, added to pool
• Applications map to Application Service Levels, select appropriate SLA
• VM added to the appropriate SLA Pool
• SLA and service class applied to the VM / Application
• SLA reporting/enforcement to the VM level
42
Wintel Server Virtualization Strategy
SLA & Service offerings
• Support AMS definitions: August 11, 2010
• Infrastructure capability governed to the SLA
• Infrastructure costs governed to the capability
• Additional GHaaS mapping to the OS level will apply (application monitoring, up/down)
• Existing VMs/Applications have not been mapped
AMS Application Classifications

Class      US App%   Availability   RTO / RPO
Platinum   0%        99.99%         20m / 0m
Gold+      8%        99.6%          120m / 12hr
Gold       29%       99.3%          120m / 12hr
Silver     55%       99.0%          200m / 24hr
Bronze     8%        98.5%          300m / <7d
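For reference, the availability percentages in this table translate into allowed downtime per year as follows (straight arithmetic on a 24x7x365 basis, not figures from the deck):

```python
def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime implied by an availability percentage, per year."""
    return (1 - availability_pct / 100) * 365 * 24 * 60

for cls, pct in [("Platinum", 99.99), ("Gold+", 99.6), ("Silver", 99.0)]:
    print(f"{cls}: ~{downtime_minutes_per_year(pct):.0f} min/yr")
```

So the jump from Silver (99.0%) to Platinum (99.99%) is roughly two orders of magnitude less permitted downtime, which is what justifies the cost steps in the service table.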
43
Wintel Server Virtualization Strategy
Define VM Infrastructure capability
• Live data replication
• Backup data w/ replication
• Backup data
• HA / DRS
• FT
• Guarantee CPU/Memory/Storage/Network
• Guarantee/Limit CPU/Memory/Storage/Network limits
• DR Exercise Validation
• Restart / Recovery Priority

Class      Infrastructure Service                                          Cost
Platinum   Live data replication, backup data replication, HA/DRS,         $$$$$$
           Guaranteed resources at two sites, DRE
Gold+      Backup data w/ replication, HA/DRS, Guarantee, DRE              $$$$
Gold       Backup data w/ replication, HA/DRS, Guarantee, DRE              $$$$
Silver     Backup data, HA/DRS, Guarantee/Limits                           $$$
Bronze     Backup data, HA/DRS, Guarantee/Limits                           $$
Labs*      DRS-maintenance                                                 $

* AMS does not have a Labs service class; this is included as a comparison

Draft

Infrastructure services may be the same at different classifications – differentiators may be at the OS level, Monitoring Level, Response Time
44
Wintel Server Virtualization Strategy
SLA Priority – applying this to environments

[Diagram: Recovery Priority across environments, 1st through 4th – top priority gets a Guarantee; lower priorities are Meet / Limit]
45
Wintel Server Virtualization Strategy
 How can we guarantee SLA?
 How can we guarantee SLA in a shared environment?
 Questions you'll want me to prove:
  – If we are combining Gold+ and Silver workload in the same cluster, how can we guarantee that Silver will not affect Gold+?
  – If there is a Gold+ VM for Prod and a Gold+ VM for Dev, how can we guarantee Dev will not affect Prod?
  – If we combine Stress with the other environments, and Stress goes CRAZY, how can we guarantee it is isolated and not affecting anything else?
  – If there is a Silver Prod VM and a Gold+ Dev VM, who wins, and is that okay?

 Great questions! Great answer:
• Resource shares & priority
• IO Controls
• DRS performance management & migration
• Reserved capacity at the cluster level
• Resource Pools with inheritance from parent pool to meet SLA
• Capacity & Performance Planning
Prove it!
46
Wintel Server Virtualization Strategy
 Let's look at a great example in place today where this works: CHA
● What is shared: Hosts, SAN, Network
● 9 Hosts, 368 VMs, 26 TB SAN

         VMs    VM %   CPU %   Mem %
DEV:     301    82%    77%     81%
ACPT:    8      2%     1%      2%
PROD:    59     16%    21%     17%

 So, how do we guarantee 'prod' service today to this small % of VMs, when the bulk of the work is Dev?
• Resources provided for the peak workload
• Shares prioritize workload for equal prioritization
• Priority of Prod raised

 We do not segment Prod/Acpt/Dev in any silo
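A hedged sketch of how share-based prioritization answers this: under contention, each workload receives resources in proportion to its shares. The share values below are invented for illustration; real values come from the resource-pool configuration:

```python
def allocate_under_contention(capacity: float, shares: dict[str, int]) -> dict[str, float]:
    """Split contended capacity proportionally to each workload's shares."""
    total = sum(shares.values())
    return {name: capacity * s / total for name, s in shares.items()}

# Prod's share is raised, so its 59 VMs win under contention even though
# Dev holds 82% of the VM count:
split = allocate_under_contention(100.0, {"PROD": 3000, "ACPT": 500, "DEV": 1500})
print({k: round(v) for k, v in split.items()})   # {'PROD': 60, 'ACPT': 10, 'DEV': 30}
```

When there is no contention, shares are irrelevant and every workload simply gets what it asks for; shares only decide who queues when the pool is full.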
47
Wintel Server Virtualization Strategy
 That's all good, but in Prod – CAE Prod – how do we really guarantee?

 Natively: ESX
• Resource Pool: guarantee both CPU & Memory share/priority
• IO Control: Network and Disk fair-share guarantee & IO priority
• Pool Resource Level management: high-water mark & available
• DRS / sDRS: dynamic shuffling of resources to meet guarantees

 Effective Monitoring: vCenter, VKernel
• Cluster & Pool resource monitoring, trending
• Bottleneck identification: current & future
• Performance management to the component and VM level

 vCloud Director
• Reservation Pool allocation model to meet performance expectations
• Share priority isolation provides fairness to the VM workload
• Resource pool governance to the SLA requirements
• SLA measurement
48
Wintel Server Virtualization Strategy
How we guarantee performance & isolation from bad actors
[Diagram captions: "Gets what it needs" / "If cannot move, gets remaining share" / "Follows SLA, guarantees above, queues below if contention"]
 Q1 – How will Silver not affect Gold? Answered?
 Q2 – How will Dev not affect Prod? Answered?
 Q3 – How will Stress not affect anything else? Answered?
 Q4 – Is Silver Prod impacted by Gold+ Dev? Answered?
[Diagram: N+2 pools – 25% of pool at 100% SLA, 40% of pool at 99.9% SLA, 35% of pool at 99.0% SLA]
49
Wintel Server Virtualization Strategy
The Monitoring Approach
[Diagram: Wintel Server Virtualization, wrapped by Monitoring]
50
Wintel Server Virtualization Strategy - Monitoring
At the raw infrastructure component level, from the virtual perspective
• Effective Performance Management
• Effective Capacity Management
51
Wintel Server Virtualization Strategy - Monitoring
Effective SLA Management
• Governance to the SLA service definitions defined in the offering
• Effectively validating SLA via testing, DRE execution
• Effectively validating SLA via vCenter/vCloud Director compliance reporting

Integration / review points with other teams' infrastructure tools
• IBM Director hardware integration tools
• TPC / native storage tools
• WAN / Riverbed / other network performance tools

Guest
• Integration points between VMTools and in-guest PerfMon counters
• Integration points with application performance tools: Bluestripe/SCCM/NetIQ
52
Wintel Server Virtualization Strategy – Getting there!
 We like the strategy…when and how are we getting there?

 Cluster Collapse
• CHA site already underway – we only had one cluster there!
• Implement SLA, performance control, monitoring, enforcement in CHA
• CAE site next, UUK as CoLo data migration occurs, EDC from Day One
• Implement initial Resource Pool allocations

 ESX 5 / vCloud Director implementation
• Migrate technologies to the latest offering, featureset
• Implement reporting tools for SLA governance
• Implement GHaaS with pool-based resource governance, self-service

 Extend for DRE 2012
• SRM for Datacenter Migration Tools
• SRM for DR / DRE in 2012 execution

 Monitor / Plan / Analyze
• Revamp Capacity Planning processes, procurement and refresh
• Validate approach, measure results, evolve
53
Wintel Server Virtualization Strategy
Session-Based Virtualization
• Transition to Lee: RDS in Unum presentation to IDAT
• Transition to James:
54
Wintel Server Virtualization Strategy
Virtualization Candidacy
• Transition to Curtis
55
Virtualization Candidacy – To Be or To Be a VM
Standard VM Offerings
* Unum internal limit driven mainly by cost, risk tolerance and recoverability at current config.
  Will be reviewed with EDC, where SQL is targeted for 32 vCPU & 128+ GB capability in a VM
Even bigger workloads now fit:
● High CPU: Enhanced CPU SMT architecture improves scheduling (10%+)
● High Memory: NUMA & Memory Compression extensions allow giving more memory to VMs, for more guest accessibility
● High Storage IO: Storage DRS, reduced overhead, reduced latency, IO priority
● High Network IO: 10 Gb NICs, high packet rates, IO priority by tag/VMs
● Critical Apps: faster HA recoverability, less context swapping, priority fairness
          Unum Limit*               ESX 5.0 Limit
Small:    1-2 vCPU / 1-2 GB
Medium:   4 vCPU / 4-8 GB
Large:    8 vCPU / 8-16 GB
Jumbo:    16 vCPU / 16-64 GB       32 vCPUs / 1 TB
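The limits above lend themselves to a simple candidacy check (an illustrative sketch; `classify` and its return strings are made up, and only the numeric limits come from the slide):

```python
OFFERINGS = [            # (name, max vCPU, max memory GB) per Unum standard
    ("Small", 2, 2),
    ("Medium", 4, 8),
    ("Large", 8, 16),
    ("Jumbo", 16, 64),
]
ESX5_MAX_VCPU, ESX5_MAX_GB = 32, 1024   # ESX 5.0 per-VM hard limit

def classify(vcpu: int, mem_gb: int) -> str:
    """Smallest standard offering that fits the request, or an exception path."""
    for name, max_cpu, max_mem in OFFERINGS:
        if vcpu <= max_cpu and mem_gb <= max_mem:
            return name
    if vcpu <= ESX5_MAX_VCPU and mem_gb <= ESX5_MAX_GB:
        return "Beyond Unum standard - needs review"
    return "Exceeds ESX 5.0 limits"

print(classify(4, 8))      # Medium
print(classify(24, 128))   # Beyond Unum standard - needs review
```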
56
Virtualization Candidacy
 All Tiers / Stacks: Tier S, Tier A, Tier B
 Dedicated and Shared instances
 All Exchange functions:
  – Public Folders / CAS / Gateways – since 2009
  – Mailbox servers in 4Q11 [US] (UK doing so already)
  – 4-8 vCPU, 16-32 GB memory, 100-200 GB storage

 Department-level and exception SQL Servers
  – Tier1 SQL: Integrated VM / SQL DB; security policy exception
  – Geographical isolation from standard offering
  – 8 vCPU, 64 GB memory, 256 GB storage (native per mount point)
  – May introduce additional storage tiers for SQL VMs

 Internet / eCommerce facing through DMZ
 DR/DRE instances & Alt-Site workloads
57
Virtualization Candidacy
 Unum continues with 'Virtualize First': the 90+% target remains
 All environments, all technologies, all workloads, all SLAs
 Gold+ and higher service will ONLY be provided via a VM
● Jumbo VMs that lead to a 1:1 dedicated server – we will do that in the Jumbo cluster
● Large / Jumbo VMs that lead to a 4:1 ratio – we will do that in the Jumbo cluster
58
Virtualization Candidacy (a little Not To Be a VM)
 Items intentionally not virtualized in the 2011 plan:
  – Dedicated hardware resources: USB key, CPU affinity
  – Jumbo SQL Server workload: >16 CPUs, 128-256 GB
  – Active Directory root controllers: under review
  – HPC workloads: the tool supports it, but we just built it physical in 2010
  – High Compute, High IO and High Business Risk (SOA)
  – Phone systems, security systems – hardwired/connectivity

 The following conditions must be met for consideration to be outside of a VM:
  – Application does not have vendor support within a VM, and the VMware Partner/Alliance program does not have a support policy for the tool (~1%)
  – Physical hardware capabilities must be accessed or connected to the server running an application device: dongle, key, license lock (~1%)
  – Demonstrated performance requirements that exceed the defined limits of a VM within the strategy: i.e., >16 vCPU and/or >64 GB memory (~3-5%)
  – Strategic determination by an architecture governance team to keep an application stack outside of a virtualized environment: AD, DNS, hardware/application monitoring tools that monitor the VM environment, etc. (~2%)
  – Servers requiring physical clustering via tools like Microsoft Clustering Services (~1%)
  – VP escalation noting the SLA will not be met, the server will incur higher cost, and capabilities are reduced