1
V1.0: October 3, 2011
Unum Wintel Server Virtualization Strategy
Virtualization Summit 2011 - 4Q11
Curtis Gunderson
Director – Virtualization Architecture
2
Topics
Business Driver Review

Wintel Server Virtualization Strategy
● Traditional review of Virtualization Strategy
● Where we are with the Cloud
● Service Classes

Technology Update for vSphere/ESX
● vSphere 5 & core offering updates
● Compute, Security, Storage and Networking specific changes

Virtualization Candidacy – VM entrance criteria
● Virtualizing first
● Critical Recovery with vSphere VM only
3
Wintel Server Virtualization Strategy
What are we talking about in the Virtualization space?
[Diagram: Wintel Server Virtualization, wrapped by Monitoring]
4
Business Driver Review
Enable the business
● Agility
● Innovation
● Performance

Expand stability & availability
● Increasing availability and reliability
● Reducing Risk
Cost optimization
5
Business Driver Review
Virtualization as a Business Enabler

Agility
● Continuous provisioning improvements, reducing customer wait time
● Dynamic resource flexibility: Hot-add CPU/Memory, Grow Disk, etc.
● Identify the cost components; let the customer make informed 'build' decisions on the cost for the SLA
● "Be Flexible" – Give us what we want, how we want it, when we need it

Innovation
● Support new and visionary infrastructure technologies
● Build for EDC & UK Co-Lo multi-tenancy
● Build Private Cloud model (and Hybrid, and Public, and…)
● Alternative users, alternative connectivity
● "Enable the Future" – Build solutions with vision, flexibility and possibility
6
Business Driver Review
Virtualization as a Business Enabler

Performance
● Perform the same as (and better than) physical hardware
● Track performance issues & optimize configuration
● Remove bottlenecks and technical barriers to performance
● Manage performance guarantees & prioritization of SLA
● "Make it faster" – Processing time is money…save money
7
Business Driver Review
Virtualization to Expand Stability and Availability

Increase Availability & Reliability
● Up-time & online status driven beyond monitoring class
● Reduce patch downtime, reduce change impact, reduce outages
● Increase resiliency – same site & alternate site
● Backup / recovery – make it faster, less impactful, manageable
● "Keep it Up" – Meet the business service levels

Reduce Risk
● HA & DRS are a given
● Storage migration
● Antivirus scanning
● "Make it safe" – Maintain the integrity of the business
8
Business Driver Review
Virtualization to Control Costs

Consolidation & Cost Control
● Continued consolidation of physical assets to virtual assets (~180 in 2012)
● Removing maintenance & extended warranty costs ($150-$300/month per server)
● Increasing utilization rates of physical assets: CPU from 5-14% to 50%+, targeting 80%
● Maximizing FTE ratios: 25:1 to 65:1…and beyond
● "Find more savings" – Continue to find cost savings

Lifecycle Management
● Remove the tie to a physical asset to remove extended maintenance costs
● Support the extended ability to continue the life of an asset
● Automate the build and other lifecycle processes for increased efficiency
● "Be current" – Stay current with technology
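As a rough back-of-envelope sketch of the maintenance savings above (the per-server cost range and ~180-server count come from this slide; everything else is illustrative arithmetic):

```python
# Hypothetical sketch: annual extended-maintenance cost avoided by
# virtualizing ~180 physical servers at $150-$300/month each.
servers_virtualized = 180
monthly_maint_range = (150, 300)          # $ per server per month

low = servers_virtualized * monthly_maint_range[0] * 12
high = servers_virtualized * monthly_maint_range[1] * 12
print(f"Annual maintenance avoided: ${low:,} - ${high:,}")
```

At these figures, consolidation alone avoids roughly $324K-$648K per year in extended-warranty spend, before counting power, space, or licensing effects.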
9
Wintel Server Virtualization Strategy
Mature Infrastructure Hosting Services: Global Hosting as a Service
 Automate provisioning with Self-Service portal
 Extend Virtualization strengths: Availability, Scalability, Reliability
 Remove barriers to entry: Performance, Cost
 Provide underlying infrastructure to meet business SLAs

[Diagram labels: GHaaS, SLA, On-Demand, Extend Strengths, Remove Barriers]
10
Wintel Server Virtualization Strategy
Before the details…
The Vision
11
Wintel Server Virtualization Strategy
Pools of Resources
12
Wintel Server Virtualization Strategy
SAMPLE SLA

EHS/Gold/Tier1 SLA: Availability = X, RTO = Y, Performance = Z
PHS/Silver/Tier2 SLA: Availability = A, RTO = B, Performance = C
SHS/Bronze/Tier3 SLA: Availability = Q, RTO = R, Performance = S
13
Wintel Server Virtualization Strategy
Data Replication
Data Migration
Application Migration
DR
14
Wintel Server Virtualization Strategy
How do we get there?
● GHaaS – evolution of the Hosting Services virtualization offering
● On-Demand Self-Service

[Diagram labels: GHaaS, On-Demand]
15
Wintel Server Virtualization Strategy
Remember the IHS Roadshow from 2008?
This edition of the strategy represents the 3rd generation of the Unum Hosting Service Model
16
Wintel Server Virtualization Strategy
Global Hosting as a Service – full presentation in a later session

Providing Hosting Services via a Private Cloud Model for the virtual environment
● Provide the plumbing – the infrastructure is there to support a dynamic load
● Enable the application teams and partners to mix/match the infrastructure to their requirements – including amount of resources & service types
● Provisioning services through Self-Service portal: pick type, size, SLA, quantity – Go!
● Capacity and Performance Management – providing and guaranteeing resources

Transferring the accountability to responsible owners
● We build the pools; Application/Governance teams pick the options for an app
● Pick size, configuration, service offering – mix and match for the app, stack or env
● Dynamic and flexible infrastructure, with VM sizing to application needs
● Usage, Configuration and Tracking occur at the VM level
● Accountability and Cost Translucency to Requestor: Usage and Showback Costs
17
Wintel Server Virtualization Strategy – The “Cloud”
How does the Virtualization strategy compare to the Cloud?
● Compute + Networking + Storage = Virtual Infrastructure
● Virtual Infrastructure + Enablers = Private Cloud
  Enablers: resource flexibility, governance, self-service, capacity management, site independence

The Virtual Infrastructure evolves into the Unum Private Cloud
● VMware vSphere/ESX 5 and VMware vCloud Director are the transformation tools to the Unum Private Cloud
● Host management/resource assignment moves from Clusters to Pools
● Site-based resource pools, governed to SLA
● New services through the provisioning portal

How is the infrastructure different than 'today'?
● Generally the same! We have been moving towards 'private cloud' since 2008
● Adds: SLA, governance, self-service, accountability and control
Welcome to the Unum Private Cloud!
18
Wintel Server Virtualization Strategy
Removing Barriers to Entry
Remove Barriers
19
Wintel Server Virtualization Strategy
Removing Barriers to Entry - Platform Optimization
Add VM Scale-up and Scale-out workload balancing / performance design
• Scale-up: per-VM max supported configuration increases as application scale-up grows
• Scale-out: continued support for application scale-out for redundancy and load balancing
• Configuration Review: new practice to right-size VM configuration to the workload

Support larger VMs in Scale-up configuration

          Today            Strategy
Small:    1 vCPU / 1 GB    1-2 vCPU / 1-2 GB
Medium:   2 vCPU / 2 GB    4 vCPU / 4-8 GB
Large:    4 vCPU / 4 GB    8 vCPU / 8-16 GB *
Jumbo:    8 vCPU / 8 GB    16 vCPU / 16-64 GB *

* When approved in Application Service Framework
20
Wintel Server Virtualization Strategy
Removing Barriers to Entry - Platform Optimization
Standardize underlying infrastructure capabilities
• HA / DRS become standard offering for all platforms, all offerings (reduced in labs)
• All infrastructure tools for all platforms
• All hosts move to Enterprise Plus licensing model
  – Centralized network control
  – Centralized priority management across any VM type
  – Consistent build process & automated updates/deployments

Standardize service offerings
• Managed via SLA by application/stack
• Backup / Recovery
• Fault Tolerance
• DR / DRE
• Alt-Site / Multi-site awareness
21
Wintel Server Virtualization Strategy - Compute
Walking back down the silo – Host specific updates
22
Wintel Server Virtualization Strategy
Platform Optimization – compute (vCPU)
Removing the CPU bottleneck
• Historically, we managed to VMs-per-Host or VM:Host ratios
• This forced us to 'cram' VMs onto hosts, with only CPU% or MEM% as guidelines for how full a host was
• Of course, that was a bad way to manage, especially as each VM got bigger!

● The new model is based upon vCPUs and the number of Host Cores
• Targeting near 1:1 vCPU to CPU Core ratios, based upon workload size
• Guarantees closer to real-time performance optimization / scheduling
• Continues to allow for oversubscription in pools when usage is low

● Collectively, the vCPUs for all VMs on a host, cluster and pool will be used to drive utilization, capacity and performance plans
● We continue to monitor individual VMs for performance, but track capacity and availability at the higher pool/cluster level
● Of course, finance will still use VM:Host as a metric!
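A minimal sketch of the vCPU-based capacity model described above (the 1:1 target ratio and the 40-core host are illustrative assumptions, not a real tool):

```python
def vcpu_headroom(host_cores: int, vm_vcpus: list[int], target_ratio: float = 1.0) -> int:
    """How many more vCPUs fit on this host at the target vCPU:core ratio."""
    budget = int(host_cores * target_ratio)
    return budget - sum(vm_vcpus)

# A 40-core host running a mix of VM sizes, managed to a near-1:1 ratio:
print(vcpu_headroom(40, [1, 1, 2, 4, 8, 16]))       # 8 vCPUs of headroom
# Pools may run oversubscribed when usage is low, e.g. a 1.5:1 ratio:
print(vcpu_headroom(40, [1, 1, 2, 4, 8, 16], 1.5))  # 28
```

The point of the model: the committed-vCPU sum, not a flat VM count, drives utilization and capacity plans.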
23
Wintel Server Virtualization Strategy
Platform Optimization – compute (vCPU)
Continue Scale-up host design*
• Compute: 4 socket, 8-10 cores: 64-80 vCPUs; 256-384 GB memory
• Network: 2 x 1 Gb connectivity moves to 2 x 10 Gb connectivity
• Storage: 8 Gb fabric, SVC redundancy and replication (may lead to 3rd HBA)
• Cost per VM: scale-up continues to drive down the cost of an individual VM
• Rack or blade? From the virtual silo it does not matter; it becomes a DIS economic consideration

* This remains true in rack or blade solution, IBM – UCS – VCE, etc.

Change new hardware introduction approach
• Today, because we want 'the best & fastest in Prod', we introduce new hardware directly into CAE – directly for PROD VMs – then shuffle down hardware to other envs. Any risk there?
• With the new pool model, we'll migrate workload into the new equipment based upon SLA, migrating lower-risk workload to newer equipment
• Hardware refresh will change towards a forecasted, planned strategy by %, resulting in fewer, larger hardware purchases with the capacity planning model
24
Wintel Server Virtualization Strategy - Compute
Platform Optimization – compute (vCPU)
 Upgrade of ESX/vCenter versions to v5

 Converged compute/storage/network capabilities into Blades, UCS or VCE environments, expected for at least some workloads: View & LabMgr/vCD

 Segment Cluster Design via DRS Groups
● One drawback of incredibly dense hosts is that various-sized workloads can frequently conflict
  – Smaller 1 vCPU VMs can get in the way of larger VMs while waiting for CPUs to be available
  – Wait times can occur in VMs, affecting application response time and overall performance
● DRS Groups allow for aligning similarly sized workloads to run together, to better align resource guarantees to the SLA
● Similarly sized workloads perform better: enhanced memory sharing & fewer context switches when workload sizes compete for resources

[Diagram: cluster layout before and after DRS Group segmentation]
25
Wintel Server Virtualization Strategy - Storage
Walking back down the silo – Storage specific updates
26
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Introduce new VM backup tools / processes
• Moving from TSM client within the VM to TSM VE – SAN-based backup
  – Removes CPU utilization & guest operations
  – Removes network dependencies on backup
  – Greatly improves backup & restore capabilities
• Backups are based upon a 'Snapshot' process
  – Snapshot produces a very short 'performance' impact on the VM
  – Incremental changes between snapshots are small – very small
  – Continues to support application-aware dependencies: SQL, Exchange
• Moving backups from direct-to-tape to VTL – ProtecTIER
  – Integrated de-duplication
  – Integrated replication to alternate sites (CAE>EDC)
  – Integrated DR / DRE with off-site repository
27
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
LUN sizing changes – support TSM VE and UK
• Remain all Tier2 storage classification – Server storage behind SVC
• Remain at 1 TB LUN sizes – all VM workload managed by SAN performance tools
• Continue with Thin Provisioning on the ESX host side as default behavior
• Change free space per LUN from 30 GB to 150 GB
• Change Thin Overcommit from ~180% to 125%
• Same performance, lower risk, increased availability
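The free-space and overcommit thresholds above can be expressed as a simple policy check (a hedged sketch; the function is illustrative and not part of any real SAN tooling):

```python
LUN_SIZE_GB = 1024      # 1 TB LUNs
MIN_FREE_GB = 150       # was 30 GB
MAX_OVERCOMMIT = 1.25   # was ~1.80

def lun_within_policy(used_gb: float, thin_provisioned_gb: float) -> bool:
    """True if the LUN meets both the free-space and thin-overcommit rules."""
    enough_free = (LUN_SIZE_GB - used_gb) >= MIN_FREE_GB
    not_overcommitted = thin_provisioned_gb <= LUN_SIZE_GB * MAX_OVERCOMMIT
    return enough_free and not_overcommitted

print(lun_within_policy(800, 1200))   # True: 224 GB free, ~117% committed
print(lun_within_policy(900, 1400))   # False: only 124 GB free, ~137% committed
```

Tightening both knobs trades a little raw capacity for a much lower chance of a thin-provisioned LUN filling unexpectedly.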
SVC Upgrade to full ESX-compliant functionality
• Includes industry standard API support: VAAI, VASA (soon!) *
• Offloads migrations, clones, and data copy operations directly to the SAN
• Faster provisioning, less impact to the VMs

* VAAI – vStorage APIs for Array Integration – moves host-based activities back to the array
  VASA – vStorage APIs for Storage Awareness – array features/status/performance exposed within vCenter
28
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Introduce Storage DRS – dynamic storage provisioning and management
• Similar to the traditional DRS at the ESX host level, but applies to disk
• Automatically monitors and reacts to changes in performance and capacity, keeping VM datastores performing at optimum levels: will move VMs when required
• SLA managed to both capacity & performance of the vDisk requirements
• Beyond the SAN array/LUN, includes host-side observations to govern against HBA overload, host overload, datastore overload, IO overload, etc.
• Storage Maintenance Mode – mass storage migration or VM reshuffling

Introduce Storage IO Control – priority & sharing SLA
• Similar to CPU/Memory priority, disk usage and performance is managed to the SLA
• VMs are guaranteed access to their committed SLA
• For example, if we have a Gold+ Prod VM and something horrible happens on a shared LUN/HBA/controller caused by Silver VM workload, Storage IO Control guarantees, protects and prioritizes the Gold+ VM from the performance and availability impacts of the Silver VM
• A lot more on this topic in the SLA section!
29
Wintel Server Virtualization Strategy
Platform Optimization – storage (vDisk)
Data Replication – extending the AltSite concept with new technologies
• Maintain SAN-based replication* processes
  – IBM SVC
  – EMC RecoverPoint – possibly limited to campus migrations
• Introduce VM-based replication for lower SLAs (Site Recovery Manager)
• Site Recovery Manager policy-based recovery plans
  – Plane.biz / Harmony manual process converted to SRM RunBook automation
  – Extend 'fenced' test arena for isolated test & validation
• Extend replication & application migration to Gold+ SLA candidates^
• SRM integration into DR and DRE activities

* Replication will require network bandwidth – quite likely, a lot of network bandwidth
^ Growing capabilities to additional applications may occur with storage or hardware refresh cycles

EDC design may add additional HBA ports / SAN ports
• Bigger hosts with bigger VMs are driving up I/O; degraded state with HBA failure
• May move from 2 HBA ports to 4 HBA ports per server
• Protect the environment and guarantee performance, even in a degraded state
30
Wintel Server Virtualization Strategy - Network
Walking back down the silo – Network specific updates
31
Wintel Server Virtualization Strategy
Platform Optimization – network (vNet)
Introducing 10GbE to hosts (VMs already have 10 Gb to the host)
• Today, 7 x 1 Gb NICs: 4 NICs teamed to provide 2 x 1 Gb to all VMs
• Host density is starting to put pressure on network throughput
• Will be moving VM networks to 2 x 10 Gb
• Moves vMotion to the 10 Gb NICs, dedicating 2 NICs to MGMT
• Result: 10x improvement to VM network availability, 2 fewer ports overall

Staying with vDS – VMware Virtual Distributed Switch
• DIS-managed distributed soft switch residing at the vCenter level
• Not implementing Cisco Nexus 1000v at this time
• ACL / QoS remain at the physical switch – managed by NS

Introduce Network IO Control – priority & sharing SLA
• Similar to CPU/Memory & Storage IOC, network usage is managed to the SLA
• This isn't pure ACL or QoS, but ESX-based sharing to match the SLA
• VMs are guaranteed access to their committed SLA
32
Wintel Server Virtualization Strategy
Platform Optimization – network (vNet)
Consolidating VLANs
• Consolidate VLANs towards service offering: Primary, Backup, LB, DMZ-FE, etc.
• Aids in up-front provisioning, SLA, IO Control
• Move away from many small VLANs to fewer, larger VLANs
  – 50 VLANs of 200 IPs each becomes 3 VLANs of 1000 IPs each
  – Continue to provide static IPs, but within the same VLAN as the service

Support physical switch environment separation, as required
• DRS Group management of a pool of resources segments traffic by environment
• DRS Group combines networking traffic for an 'environment', forwards to switch
• Physical network switch executes ACL rules

Site-to-Site IP translation and/or IP update automation
• Today, moving workload between data centers requires Network/IP changes
• Implement a translation mechanism with SRM & scripts to update DNS/hosts
• Move towards IP / site independence: long term – no server / app change required
33
Wintel Server Virtualization Strategy
Platform Optimization – security
DMZ Services / Servers – remain physically separated; no operational change this version

Antivirus/Scanning
• Historically, every VM has antivirus software installed within the VM
• With ~60 VMs on a host – concurrent scans, updates, and access are an issue
• Evaluating an approach to a virtualization-friendly solution
• However, we will proceed cautiously with this review for two reasons:
  – Previous negative server-side experiences with new, early versions of Symantec tools
  – Review of the server/security support model on tool ownership & operations

AV Part 1: Thinner in-guest VM AV client which caches scans across all VMs
• Symantec SEP 12.1 optimized for virtual environments, where VM files are scanned and content added to a trusted store
• The next VM scans only the differences not already in the store
• Real-time scanning still occurs, and only content unique to the VM is rescanned, based upon content in the trusted store
• Symantec estimates a 90% reduction in overall IO in heavy/dense environments
34
Wintel Server Virtualization Strategy

[Diagram: Today vs SEP 12.1 – Today: AV has to scan every file in the VM, for each VM, 20-60 times a host. SEP 12.1: only scans new & untrusted files – faster scans. In the virtual environment, SEP 12.1 eliminates 90%+ of I/O scan activity!]
35
Wintel Server Virtualization Strategy
Platform Optimization – security
AV Part 2: Offload VM scanning entirely to the host
• Extension of the VMware vShield API
• Scan/disk-write operations intercepted via policy to an 'appliance'
• Keeps the cache concept of AV Part 1, but offloads the remaining unique data to an appliance – so no direct in-guest scanning operation occurs
• Appliance intercepts both real-time & scheduled scan events
• Support upgrade to Symantec Summer 2012 EndPoint Security release

[Diagram: Guest VMs on each ESX host, with a Security VM appliance per host handling scan operations]
Again – we will remain cautious with evaluating these products
36
Wintel Server Virtualization Strategy
Build upon existing strengths and new capabilities of vSphere 5
Extend Strengths
37
Wintel Server Virtualization Strategy
Extend Availability – Same site recovery
Continue Scale-out cluster design
● Beyond 8 nodes in a cluster…to 20-24+ hosts
  – Collapse environment-specific clusters into site-specific clusters
  – Further distribution of priority workloads to more hosts
  – Priority workload has greater load balancing and access to more resources
  – Reduces capacity required by cluster for HA – greater utilization of all assets
● Retain N+2 recovery model in clusters
  • Compute perspective –
    – Always be able to have 2 simultaneous nodes out and meet 100% of the SLA
    – Even during planned maintenance/upgrade, be able to absorb 1 failure
  • Storage & Network perspective –
    – Redundant and load-balanced connectivity with no single points of failure
    – 2nd-level resiliency allows for full SLA even with component failure on the node
● Even faster HA recovery in a hardware failure event
  • Scaled clusters provide capacity for more simultaneous VM restarts
  • All VMs on a fully-loaded host would be restarted in less than 60 seconds
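The claim that scaling out reduces the HA capacity reserve follows directly from the N+2 arithmetic (a sketch; the 8- and 24-node counts are the slide's examples):

```python
def ha_reserve_fraction(n_hosts: int, spares: int = 2) -> float:
    """Fraction of cluster capacity held back so N+2 still meets 100% of SLA."""
    return spares / n_hosts

# Moving from an 8-node cluster to a 24-node cluster:
print(f"{ha_reserve_fraction(8):.1%}")   # 25.0%
print(f"{ha_reserve_fraction(24):.1%}")  # 8.3%
```

Same failure tolerance, but the reserve shrinks from a quarter of the cluster to under a tenth, which is the "greater utilization of all assets" point above.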
38
Wintel Server Virtualization Strategy
Extend Availability – Beyond same site
TSM VE – TSM backup software for Virtual Environments
• Move backup from agents installed in VMs to the SAN / hypervisor
• Offloads disk activity to disk arrays, reducing utilization while increasing recoverability
• Snapshot & VM state can be replicated to other regions as driven by SLA

True DR / DRE to required SLA
• Through TSM VE, SRM or pure recovery, capability of full recovery of servers to the DR site
• Pure DR capability, or DRE capability on a quarterly basis

Site Recovery Manager introduced for automating DR & recoverability
• Improve and extend the VM recoverability & testing scenarios for AltSite VMs used today
• Automation of recoverability at the VM level
• SAN replication or vStorage-based replication to different arrays
• Expected to be offered to Gold+ VMs
* A great deal of this requires a replication network and additional network bandwidth
39
Wintel Server Virtualization Strategy
Extend Reliability – Keeping Applications online & Performing
Enhanced Workload Management
● Implement workload isolation and IO Controls to guarantee performance and uptime
● Extend Anti-affinity rules to keep VMs separated where required: CLB VMs, etc.
● Extend Affinity rules to keep VMs that work together on the same host / same NIC: increase network throughput and decrease network hops

Extend Fault Tolerant VMs – Mirrored VMs at the same site
● Critical / small VM workload mirrored in real time to another host in the same site
● In the event of a host failure, the VM on the other node stays online & takes over
● 1 vCPU limit still applies in 2012

Minimizing invasive impact of required operations
● Anti-virus impact reduced
● Backup impact reduced
40
Wintel Server Virtualization Strategy
Provide underlying infrastructure to meet business SLAs
SLA
41
Wintel Server Virtualization Strategy
Infrastructure managed to support all requirements, all services
• 24x7x365 at the infrastructure level: all silos within virtualization: compute, network, storage
• All service categories managed here: DR, AltSite, Replication, Backup, etc.

Infrastructure pools of SLA are created & guaranteed
• Pools of resources sized to guarantee resources by pool SLA
• CPU & Memory reservations and priority, network and storage I/O control and prioritization – set at the pool level, applied to VMs
  Gold+, Gold, Silver, Bronze, etc.
• Service functionality created at the pool, applied to VMs
  Replication, Backup/Recovery, FT, etc.

Applications / VMs select SLA requirements, added to pool
• Applications map to Application Service Levels, select appropriate SLA
• VM added to the appropriate SLA Pool
• SLA and service class applied to the VM / Application
• SLA reporting/enforcement to the VM level
42
Wintel Server Virtualization Strategy
SLA & Service offerings
• Support AMS definitions: August 11, 2010
• Infrastructure capability governed to the SLA
• Infrastructure costs governed to the capability
• Additional GHaaS mapping to the OS level will apply (application monitoring, up/down)
• Existing VMs/Applications have not been mapped
AMS Application Classifications

Class      US App%   Availability   RTO / RPO
Platinum   0%        99.99%         20m / 0m
Gold+      8%        99.6%          120m / 12hr
Gold       29%       99.3%          120m / 12hr
Silver     55%       99.0%          200m / 24hr
Bronze     8%        98.5%          300m / <7d
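For reference, the availability percentages in this table translate into allowed downtime per year as follows (straight arithmetic on a 24x7x365 basis, not figures from the deck):

```python
def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime implied by an availability percentage, per year."""
    return (1 - availability_pct / 100) * 365 * 24 * 60

for cls, pct in [("Platinum", 99.99), ("Gold+", 99.6), ("Silver", 99.0)]:
    print(f"{cls}: ~{downtime_minutes_per_year(pct):.0f} min/yr")
```

So the jump from Silver (99.0%) to Platinum (99.99%) is roughly two orders of magnitude less permitted downtime, which is what justifies the cost steps in the service table.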
43
Wintel Server Virtualization Strategy
Define VM Infrastructure capability
• Live data replication
• Backup data w/ replication
• Backup data
• HA / DRS
• FT
• Guarantee CPU/Memory/Storage/Network
• Guarantee/Limit CPU/Memory/Storage/Network limits
• DR Exercise Validation
• Restart / Recovery Priority

Class      Infrastructure Service                                          Cost
Platinum   Live data replication, backup data replication, HA/DRS,         $$$$$$
           Guaranteed resources at two sites, DRE
Gold+      Backup data w/ replication, HA/DRS, Guarantee, DRE              $$$$
Gold       Backup data w/ replication, HA/DRS, Guarantee, DRE              $$$$
Silver     Backup data, HA/DRS, Guarantee/Limits                           $$$
Bronze     Backup data, HA/DRS, Guarantee/Limits                           $$
Labs*      DRS-maintenance                                                 $

* AMS does not have a Labs service class; this is included as a comparison

Draft

Infrastructure services may be the same at different classifications – differentiators may be at the OS level, Monitoring Level, Response Time
44
Wintel Server Virtualization Strategy
SLA Priority – applying this to environments

[Diagram: Recovery Priority across environments, 1st through 4th – top priority gets a Guarantee; lower priorities are Meet / Limit]
45
Wintel Server Virtualization Strategy
 How can we guarantee SLA?
 How can we guarantee SLA in a shared environment?
 Questions you'll want me to prove:
  – If we are combining Gold+ and Silver workload in the same cluster, how can we guarantee that Silver will not affect Gold+?
  – If there is a Gold+ VM for Prod and a Gold+ VM for Dev, how can we guarantee Dev will not affect Prod?
  – If we combine Stress with the other environments, and Stress goes CRAZY, how can we guarantee it is isolated and not affecting anything else?
  – If there is a Silver Prod VM and a Gold+ Dev VM, who wins, and is that okay?

 Great questions! Great answer:
• Resource shares & priority
• IO Controls
• DRS performance management & migration
• Reserved capacity at the cluster level
• Resource Pools with inheritance from parent pool to meet SLA
• Capacity & Performance Planning
Prove it!
46
Wintel Server Virtualization Strategy
 Let's look at a great example in place today where this works: CHA
● What is shared: Hosts, SAN, Network
● 9 Hosts, 368 VMs, 26 TB SAN

         VMs    VM %   CPU %   Mem %
DEV:     301    82%    77%     81%
ACPT:    8      2%     1%      2%
PROD:    59     16%    21%     17%

 So, how do we guarantee 'prod' service today to this small % of VMs, when the bulk of the work is Dev?
• Resources provided for the peak workload
• Shares prioritize workload for equal prioritization
• Priority of Prod raised

 We do not segment Prod/Acpt/Dev in any silo
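A hedged sketch of how share-based prioritization answers this: under contention, each workload receives resources in proportion to its shares. The share values below are invented for illustration; real values come from the resource-pool configuration:

```python
def allocate_under_contention(capacity: float, shares: dict[str, int]) -> dict[str, float]:
    """Split contended capacity proportionally to each workload's shares."""
    total = sum(shares.values())
    return {name: capacity * s / total for name, s in shares.items()}

# Prod's share is raised, so its 59 VMs win under contention even though
# Dev holds 82% of the VM count:
split = allocate_under_contention(100.0, {"PROD": 3000, "ACPT": 500, "DEV": 1500})
print({k: round(v) for k, v in split.items()})   # {'PROD': 60, 'ACPT': 10, 'DEV': 30}
```

When there is no contention, shares are irrelevant and every workload simply gets what it asks for; shares only decide who queues when the pool is full.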
47
Wintel Server Virtualization Strategy
 That's all good, but in Prod – CAE Prod – how do we really guarantee?

 Natively: ESX
• Resource Pool: guarantee both CPU & Memory share/priority
• IO Control: Network and Disk fair-share guarantee & IO priority
• Pool Resource Level management: high-water mark & available
• DRS / sDRS: dynamic shuffling of resources to meet guarantees

 Effective Monitoring: vCenter, VKernel
• Cluster & Pool resource monitoring, trending
• Bottleneck identification: current & future
• Performance management to the component and VM level

 vCloud Director
• Reservation Pool allocation model to meet performance expectations
• Share priority isolation provides fairness to the VM workload
• Resource pool governance to the SLA requirements
• SLA measurement
48
Wintel Server Virtualization Strategy
How we guarantee performance & isolation from bad actors
[Diagram captions: "Gets what it needs" / "If cannot move, gets remaining share" / "Follows SLA, guarantees above, queues below if contention"]
 Q1 – How will Silver not affect Gold? Answered?
 Q2 – How will Dev not affect Prod? Answered?
 Q3 – How will Stress not affect anything else? Answered?
 Q4 – Is Silver Prod impacted by Gold+ Dev? Answered?
[Diagram: N+2 pools – 25% of pool at 100% SLA, 40% of pool at 99.9% SLA, 35% of pool at 99.0% SLA]
49
Wintel Server Virtualization Strategy
The Monitoring Approach
[Diagram: Wintel Server Virtualization, wrapped by Monitoring]
50
Wintel Server Virtualization Strategy - Monitoring
At the raw infrastructure component level, from the virtual perspective
• Effective Performance Management
• Effective Capacity Management
51
Wintel Server Virtualization Strategy - Monitoring
Effective SLA Management
• Governance to the SLA service definitions defined in the offering
• Effectively validating SLA via testing, DRE execution
• Effectively validating SLA via vCenter/vCloud Director compliance reporting

Integration / review points with other teams' infrastructure tools
• IBM Director hardware integration tools
• TPC / native storage tools
• WAN / Riverbed / other network performance tools

Guest
• Integration points between VMTools and in-guest PerfMon counters
• Integration points with application performance tools: Bluestripe/SCCM/NetIQ
52
Wintel Server Virtualization Strategy – Getting there!
 We like the strategy…when and how are we getting there?

 Cluster Collapse
• CHA site already underway – we only had one cluster there!
• Implement SLA, performance control, monitoring, enforcement in CHA
• CAE site next, UUK as CoLo data migration occurs, EDC from Day One
• Implement initial Resource Pool allocations

 ESX 5 / vCloud Director implementation
• Migrate technologies to the latest offering, featureset
• Implement reporting tools for SLA governance
• Implement GHaaS with pool-based resource governance, self-service

 Extend for DRE 2012
• SRM for Datacenter Migration Tools
• SRM for DR / DRE in 2012 execution

 Monitor / Plan / Analyze
• Revamp Capacity Planning processes, procurement and refresh
• Validate approach, measure results, evolve
53
Wintel Server Virtualization Strategy
Session-Based Virtualization
• Transition to Lee: RDS in Unum presentation to IDAT
• Transition to James:
54
Wintel Server Virtualization Strategy
Virtualization Candidacy
• Transition to Curtis
55
Virtualization Candidacy – To Be or To Be a VM
Standard VM Offerings
* Unum internal limit driven mainly by cost, risk tolerance and recoverability at current config.
  Will be reviewed with EDC, where SQL is targeted for 32 vCPU & 128+ GB capability in a VM
Even bigger workloads now fit:
● High CPU: Enhanced CPU SMT architecture improves scheduling (10%+)
● High Memory: NUMA & Memory Compression extensions allow giving more memory to VMs, for more guest accessibility
● High Storage IO: Storage DRS, reduced overhead, reduced latency, IO priority
● High Network IO: 10 Gb NICs, high packet rates, IO priority by tag/VMs
● Critical Apps: faster HA recoverability, less context swapping, priority fairness
          Unum Limit*               ESX 5.0 Limit
Small:    1-2 vCPU / 1-2 GB
Medium:   4 vCPU / 4-8 GB
Large:    8 vCPU / 8-16 GB
Jumbo:    16 vCPU / 16-64 GB       32 vCPUs / 1 TB
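The limits above lend themselves to a simple candidacy check (an illustrative sketch; `classify` and its return strings are made up, and only the numeric limits come from the slide):

```python
OFFERINGS = [            # (name, max vCPU, max memory GB) per Unum standard
    ("Small", 2, 2),
    ("Medium", 4, 8),
    ("Large", 8, 16),
    ("Jumbo", 16, 64),
]
ESX5_MAX_VCPU, ESX5_MAX_GB = 32, 1024   # ESX 5.0 per-VM hard limit

def classify(vcpu: int, mem_gb: int) -> str:
    """Smallest standard offering that fits the request, or an exception path."""
    for name, max_cpu, max_mem in OFFERINGS:
        if vcpu <= max_cpu and mem_gb <= max_mem:
            return name
    if vcpu <= ESX5_MAX_VCPU and mem_gb <= ESX5_MAX_GB:
        return "Beyond Unum standard - needs review"
    return "Exceeds ESX 5.0 limits"

print(classify(4, 8))      # Medium
print(classify(24, 128))   # Beyond Unum standard - needs review
```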
56
Virtualization Candidacy
 All Tiers / Stacks: Tier S, Tier A, Tier B
 Dedicated and Shared instances
 All Exchange functions:
  – Public Folders / CAS / Gateways – since 2009
  – Mailbox servers in 4Q11 [US] (UK doing so already)
  – 4-8 vCPU, 16-32 GB memory, 100-200 GB storage

 Department-level and exception SQL Servers
  – Tier1 SQL: Integrated VM / SQL DB; security policy exception
  – Geographical isolation from standard offering
  – 8 vCPU, 64 GB memory, 256 GB storage (native per mount point)
  – May introduce additional storage tiers for SQL VMs

 Internet / eCommerce facing through DMZ
 DR/DRE instances & Alt-Site workloads
57
Virtualization Candidacy
 Unum continues with 'Virtualize First': the 90+% target remains
 All environments, all technologies, all workloads, all SLAs
 Gold+ and higher service will ONLY be provided via a VM
● Jumbo VMs that lead to a 1:1 dedicated server – we will do that in the Jumbo cluster
● Large / Jumbo VMs that lead to a 4:1 ratio – we will do that in the Jumbo cluster
58
Virtualization Candidacy (a little Not To Be a VM)
 Items intentionally not virtualized in the 2011 plan:
  – Dedicated hardware resources: USB key, CPU affinity
  – Jumbo SQL Server workload: >16 CPUs, 128-256 GB
  – Active Directory root controllers: under review
  – HPC workloads: the tool supports it, but we just built it physical in 2010
  – High Compute, High IO and High Business Risk (SOA)
  – Phone systems, security systems – hardwired/connectivity

 The following conditions must be met for consideration to be outside of a VM:
  – Application does not have vendor support within a VM, and the VMware Partner/Alliance program does not have a support policy for the tool (~1%)
  – Physical hardware capabilities must be accessed or connected to the server running an application device: dongle, key, license lock (~1%)
  – Demonstrated performance requirements that exceed the defined limits of a VM within the strategy: i.e., >16 vCPU and/or >64 GB memory (~3-5%)
  – Strategic determination by an architecture governance team to keep an application stack outside of a virtualized environment: AD, DNS, hardware/application monitoring tools that monitor the VM environment, etc. (~2%)
  – Servers requiring physical clustering via tools like Microsoft Clustering Services (~1%)
  – VP escalation noting the SLA will not be met, the server will incur higher cost, and capabilities are reduced