Designing a Virtualization Architecture: A Best Practices Approach
Greg Shields, MVP – Terminal Services
Author / Speaker / Instructor / Consultant / All Around Good Guy
Join Us @ TechMentor Events
• TechMentor Las Vegas – Three weeks!
  – Early bird registration still available…
• TechMentor 2009
  – Las Vegas in the Spring
  – Orlando in the Fall
• Virtualization / Automation & PowerShell / Proactive Windows Management / Becoming an IT Architect / Windows Security, Auditing, and Compliance / Exchange Server Administration / Windows Fundamentals / Windows Technologies
• http://www.techmentorevents.com
Fear the Worst
• The National Academy of Archives and Records states that 96% of companies that lose access to their data centers for 10 days or longer are out of business within a year.
• A study by McGladrey and Pullen shows that 43% of companies experiencing disasters will never recover.
• Tape restorations can take days, and tape failures exacerbate an already critical problem.
  – 72+ hours to restore 1.5 TB of office files
44% of Virtualization Deployments Fail
• According to a CA announcement from 2007.
• Inability to quantify ROI
• Insufficient administrator training
• Success =
  – Measure performance
  – Diligent inventory and load distribution
  – Thorough investigation of technology
The Lifecycle of a Virtualization Architecture
• Step -1: Hype Recognition & Education
• Step 0: Assessment
• Step 1: Purchase & Implementation
• Step 2: P2V
• Step 3: Backups Expansion
• Step 4: DR Implementation
Step 0: Assessment
The Virtualization Assessment
• Successful virtualization rollouts need a virtualization assessment.
  – You need to analyze your environment before you act.
• A virtualization assessment should include:
  – Inventory of servers
  – Inventory of attached peripherals
  – Performance characteristics of servers
  – Analysis of performance characteristics
  – Analysis of hardware needs to support virtualized servers
  – Backups analysis
  – Disaster recovery analysis (hot vs. warm vs. cold)
  – Initial virtual resource assignment
(Obvious) Candidates for Virtualization
• Systems with minimal processor utilization
• Systems with minimal RAM requirements
  – We too often add too much RAM in a server.
• Systems that do not require large quantities of drive storage*
• Redundant or warm-spare servers
• Occasional- or limited-use servers
• Systems where many partially-trusted people need console access
Not Candidates for Virtualization
• Systems with constant and high processor utilization or RAM usage
• Systems with peripherals
  – Serial / parallel / USB / External SCSI / License Keyfobs / Scanners / Bar Code Readers
• Systems with exceptionally high network use
  – Gigabit networking requirements
• Systems with specialized hardware requirements
  – Hardware appliances / OEM / Unique configs
Assessing Performance
• In the early days of virtualization, we used to say…
  – “Exchange Servers can’t be virtualized”
  – “Terminal Servers can’t be virtualized”
  – “You’ll never virtualize a SQL box”
• Today’s common knowledge is that the decision relates entirely to performance.
  – Thus, before you can determine which servers to virtualize, you need to understand their performance.
  – Measure that performance over time.
  – Compile results into reports and look for deviations from nominal activity.
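
Looking for deviations from nominal activity can be sketched roughly as follows. This is only an illustration: the function name `find_deviations`, the sample values, and the two-standard-deviation cutoff are assumptions made for the example, not part of the assessment method described in these slides.

```python
from statistics import mean, stdev

def find_deviations(samples, k=2.0):
    """Return (index, value) pairs more than k standard deviations from the mean."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # perfectly flat series: nothing deviates
    return [(i, v) for i, v in enumerate(samples) if abs(v - mu) > k * sigma]

# Hypothetical hourly "% Processor Time" averages for one server.
cpu = [12, 14, 11, 13, 15, 12, 78, 13, 14, 12]
print(find_deviations(cpu))  # flags the single spike at index 6
```

A longer measurement window makes the baseline more trustworthy; a short window can mistake a routine nightly job for an anomaly.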
Useful Performance Counters
Category       Performance Metric          Example Threshold
Disk           % Disk Time                 > 50%
Memory         Available MBytes            Below baseline
Memory         Pages / Sec                 > 20
Page File      % Usage                     > 70%
Physical Disk  Current Disk Queue Length   > 18
Processor      % Processor Time            > 40%
System         Processor Queue Length      > 5.4
System         Context Switches / Sec      > 5000
System         Threads                     > 2000
These are examples (starting points). Your actual thresholds may be different.
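
As a minimal sketch, the example thresholds above could be applied to a server's averaged counters like this. The dictionary keys and the function name `exceeded` are hypothetical; the sample values are taken from the IMAGE-WIN row of the assessment data in this deck, and the thresholds remain starting points only.

```python
# Example thresholds from the counters listed above. "Available MBytes" is
# omitted because its threshold is relative to a per-server baseline.
THRESHOLDS = {
    "Disk % Disk Time": 50,
    "Memory Pages/sec": 20,
    "Page File % Usage": 70,
    "Physical Disk Current Disk Queue Length": 18,
    "Processor % Processor Time": 40,
    "System Processor Queue Length": 5.4,
    "System Context Switches/sec": 5000,
    "System Threads": 2000,
}

def exceeded(averages):
    """Return the sorted counter names whose average exceeds its threshold.

    Counters without a numeric value (for example "N/D") are skipped.
    """
    hits = []
    for name, limit in THRESHOLDS.items():
        value = averages.get(name)
        if isinstance(value, (int, float)) and value > limit:
            hits.append(name)
    return sorted(hits)

image_win = {
    "Disk % Disk Time": 67,
    "Memory Pages/sec": 7,
    "Page File % Usage": 1,
    "Physical Disk Current Disk Queue Length": 1,
    "Processor % Processor Time": 18,
    "System Processor Queue Length": 0,
    "System Context Switches/sec": 5553,
    "System Threads": 2540,
}
print(exceeded(image_win))
```

A server that trips several thresholds is not automatically ruled out; it simply deserves a closer look before resources are assigned.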
The Virtualization Assessment
Columns: Disk% = Disk / % Disk Time; AvailMB = Memory / Available MBytes; Pg/s = Memory / Pages/sec; PF% = Page File / % Usage; DQL = Physical Disk / Current Disk Queue Length; CPU% = Processor / % Processor Time; PQL = System / Processor Queue Length; CtxSw/s = System / Context Switches/sec; Threads = System / Threads; Sess = Active Sessions (where applicable, "-" otherwise); Candidacy = Virtualization Candidacy Index; vProcs = Initial Assigned VProcs; vRAM(G) = Initial Assigned VRAM (in G).
Server       Disk%  AvailMB  Pg/s  PF%  DQL  CPU%  PQL  CtxSw/s  Threads  Sess  Candidacy  vProcs  vRAM(G)
ABCS         0      598      19    2    N/D  7     0    2435     712      -     Likely     1       1.5
ABCSDC0      1      553      2     2    0    1     0    372      520      -     Likely     1       0.5
ABCSTM       2      1525     0     0    0    1     0    302      465      -     Likely     1       0.5
ADS          N/D    236      0     0    0    0     0    85       259      -     Likely     1       0.5
BDC          4      108      3     11   0    2     0    440      577      -     Likely     1       0.5
C3APPSVR     N/A    N/A      N/A   N/A  N/A  N/A   N/A  N/A      N/A      -     N/A        1       N/A
CTX-Surf     2      1319     1     0    0    0     0    557      528      -     Likely     1       1
DC1          N/D    544      20    1    0    13    5    1027     394      -     Likely     1       0.5
DIRECTOR     N/D    84       37    14   0    7     0    2003     587      -     Probable   1       2
EX1          N/D    350      1     4    0    1     0    858      404      -     Likely     1       1
EX2K3        149    359      11    3    1    2     0    2296     927      -     Probable   1       2
EZTELLER     7      143      3     3    0    1     0    458      509      -     Likely     1       2
IFS          N/D    348      0     1    0    0     0    99       311      -     Likely     1       0.5
IMAGE-WIN    67     469      7     1    1    18    0    5553     2540     -     Probable   1       2
ITIAPP02     N/D    1292     1     2    0    2     0    2300     823      -     Likely     1       2
ITIPrime     N/D    34       4     12   0    2     0    830      468      -     Likely     1       1.5
License      1      140      1     39   0    2     1    417      490      -     Likely     1       0.5
PFS1         N/D    330      1     1    0    1     0    231      338      -     Likely     1       0.5
SC-MGR1R     0      255      1     N/D  N/D  N/D   0    1251     490      -     Likely     1       1
SURFCONTROL  N/D    32       1     37   0    73    8    338      403      -     Probable   1       1
TESTLPW      N/D    129      0     4    0    1     10   896      489      -     Probable   1       0.5
TSBANK0      0      2521     7     5    0    2     0    5050     1342     7     Probable   1       2
TSBANK1      3      1216     12    10   0    9     0    3381     1237     7     Probable   1       2
TSBANK15     0      2631     7     4    0    8     0    4386     1183     7     Probable   1       2
TSBANK17     0      2652     7     4    0    14    0    4329     1240     7     Probable   1       2
TSBANK2      1      1272     11    12   0    22    0    3314     1168     7     Probable   1       2
TSBANK3      4      1310     5     3    0    38    2    2589     887      4     Probable   1       2
TSBANK4      4      1297     4     3    0    16    2    2702     883      4     Probable   1       2
TSBANK6      7      1191     9     9    0    7     2    3271     1216     7     Probable   1       2
TSWIN1       3      1292     5     2    0    5     1    1689     884      4     Likely     1       2
TSWIN2       4      1272     4     2    N/D  7     1    1677     848      4     Likely     1       2
VIEMCAPP     5      2111     0     1    N/D  0     0    456      541      -     Likely     1       2
Total RAM Count: 40
Gathering Performance
• PerfMon is really the only mechanism to gather these statistics from servers.
  – But PerfMon can be challenging to use.
• Other products are available to assist...
  – Microsoft Assessment & Planning Solution Accelerator
  – VMware Consolidation & Capacity Planner
  – Platespin PowerRecon
  – CiRBA
  – PerfMan
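
Once counters have been logged, averaging them is straightforward. The sketch below assumes a CSV export whose first column is a timestamp and whose remaining columns are counter paths, which is the layout PerfMon's CSV logs are assumed to follow here; the server name `SRV1`, the sample rows, and the function name `average_counters` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-sample log; the blank cell stands for a missed sample.
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\SRV1\Processor(_Total)\% Processor Time","\\SRV1\Memory\Available MBytes"
"10/01/2008 09:00:00.000","10.0","600"
"10/01/2008 09:00:15.000","30.0","580"
"10/01/2008 09:00:30.000"," ","590"
'''

def average_counters(csv_text):
    """Average each counter column of a PerfMon-style CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        # Column 0 is the timestamp; the rest are counter samples.
        for name, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric sample
            totals[name] += value
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in counts}

for counter, avg in average_counters(SAMPLE).items():
    print(f"{counter}: {avg:.1f}")
```

Skipping unparseable samples rather than treating them as zero keeps a missed collection interval from dragging the average down.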
Step 1: Purchase & Implementation
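Consolidation = Cost Savings
Configuration                  Purchase cost               Consolidation ratio  Cost per server
Small server                   $6,000                      1:1                  $6,000
Large server + virtualization  $15,000 + $5,000 = $20,000  8:1                  $2,500
Large server + virtualization  $20,000                     15:1                 $1,333
Large server + virtualization  $20,000                     20:1                 $1,000
• Small servers: large marginal cost increases per additional server
• Virtualization: smaller marginal cost increases
• + Power, + Cooling, + Provisioning Labor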
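
The per-server figures on this slide follow from dividing the $20,000 host cost (a $15,000 large server plus $5,000 of virtualization software) by the consolidation ratio. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def cost_per_server(hardware, virtualization, ratio):
    """Hardware plus virtualization cost spread across `ratio` workloads."""
    return (hardware + virtualization) / ratio

small = cost_per_server(6_000, 0, 1)  # one workload per small server
print(f"1:1 -> ${small:,.0f} per server")
for ratio in (8, 15, 20):
    per_server = cost_per_server(15_000, 5_000, ratio)
    print(f"{ratio}:1 -> ${per_server:,.0f} per server")
```

Power, cooling, and provisioning labor are left out of this calculation, as they are in the slide's dollar figures.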
Virtualization Options
• Three types of virtualization
  – Entire System Virtualization: the virtual O/S is an entire system that has no awareness of the underlying host system.
    • VMware
    • Microsoft Virtual Server
  – OS Virtualization: software runs on the system as a single file. Requires client.
    • Parallels Virtuozzo
  – Paravirtualization: similar to hardware virtualization, but the virtual O/S is “aware” it is virtualized.
    • Microsoft Hyper-V
    • Xen / Citrix XenSource
Hardware Virtualization (Type-1)
• ESX
  – Hybrid hypervisor and host OS
  – Device drivers in the hypervisor
  – Emulation (translation from emulated driver to real driver)
  – High cost, high availability, high performance
Paravirtualization
• Hyper-V, Citrix XenSource
  – Host OS becomes primary partition above hypervisor.
  – Device drivers in the primary partition
  – Paravirtualization (no emulation for “enlightened” VMs)
  – Low cost, moderate-to-high availability, high performance
Hardware Virtualization (Type-2)
• Microsoft Virtual Server
  – Hypervisor above host OS.
  – Device drivers in hypervisor
  – Emulation (translation from emulated driver to real driver)
  – Low cost, low availability, low performance
OS Virtualization
• Parallels Virtuozzo
  – Delta-based.
  – No hypervisor. V-layer processes requests.
  – All real device drivers hosted on host OS
  – Moderate cost, moderate availability, very high performance
Step 2: P2V
P2V Isn’t Exciting Any More
• After environment stand-up, the P2V process converts physical machines to virtual ones.
  – A “ghost” + a “driver injection”
• Numerous applications can do this in one step.
  – These days, the P2V process is a commodity.
  – Everyone has their own version.
  – Some are faster. Some much slower. Paid options == faster.
P2V, P2V-DR
• P2V
  – SCVMM, VMware VI/Converter, Acronis, Leostream, others.
• P2V-DR
  – Similar to P2V, but with interim step of image creation/storage.
  – “Poor-man’s DR”
P2V-DR Uses
• P2V-DR can be leveraged for medium-term storage of server images
  – Useful when DR site does not have hot backup capability or requirements
  – Regularly create images of physical servers, but only store those images rather than load to virtual environment
  – Cheaper-to-maintain DR environment
• Not fast.
• Not easy.
• Not completely reliable.
• …but essentially cost-free.
Step 3: Backups Expansion
Backup Terminology
• File-Level Backup
  – Backup agent in the virtual machine
• Image-Level Backup
  – Backup agent on the virtual host
• Quiescing
  – Quieting the file system to prep for a backup
• O/S Crash Consistency
  – Capability for post-restore O/S functionality
• Application Crash Consistency
  – Capability for post-restore application functionality
Types of Backups
• Three types of backups
  – Backing up the host system
    • May be necessary to maintain host configuration
    • But often, not completely necessary
    • The fastest fix for a broken host is often a complete rebuild
  – Backing up virtual disk files
    • Fast and can be done from a single host-based backup client
    • Challenging to do file-level restore
  – Backing up VMs from inside the VM
    • Slower and requires backup clients in every VM.
    • Resource intensive on host
    • Capable of doing file-level restores
The Problem with Transactional Databases
• O/S crash consistency is easy to obtain. Just quiesce the file system before beginning the backup.
• Application crash consistency is much harder.
  – Transactional databases like AD, Exchange, SQL don’t quiesce when the file system does.
  – Need to stop these databases before quiescing.
  – Need an agent in the VM that handles DB quiescing.
  – Leverage VSS.
• Restoration without crash consistency will lose data. The DB restores into an “inconsistent” state.
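
The ordering matters: the database has to be at rest before the file system is quiesced, and everything resumes in reverse. The sketch below only simulates that sequence with a log of events. `quiesced` and `backup_vm` are hypothetical stand-ins, and no real VSS or hypervisor calls are made; a real agent would do this work through VSS.

```python
from contextlib import contextmanager

@contextmanager
def quiesced(layer, log):
    """Hypothetical stand-in: pause writes for a layer, then resume them."""
    log.append(f"quiesce {layer}")
    try:
        yield
    finally:
        log.append(f"resume {layer}")

def backup_vm(name):
    """Simulate an application-consistent image backup of one VM."""
    log = []
    # Database first, then the file system, so no transaction is in flight
    # when the disk image is captured. Exit order is the reverse.
    with quiesced("database", log), quiesced("file system", log):
        log.append(f"snapshot {name}")
    return log

for step in backup_vm("EX1"):
    print(step)
```

Because the resume steps sit in `finally` blocks, both layers are released even if the snapshot step fails.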
The Problem with Transactional Databases
• When considering backups of virtual machines, you need to consider both file-level backups and image-level backups.
  – File-level backups provide individual file restorability and transactional database crash consistency.
  – Image-level backups provide whole-server restorability.
  – Not all image-level backups provide app crash consistency.
• Solutions exist that call Windows VSS to quiesce apps and the file system prior to snapping a backup.
  – Compelling argument: VSS = Microsoft, Hyper-V = Microsoft.
Step 4: DR Implementation
DR, meet Virtualization…
• Early all-physical attempts at DR were cost-prohibitive and operationally complex.
  – Identical server inventory at primary and backup site.
  – Management cost of identical server configuration. Change management costs prohibitive.
• Virtualization eliminates many previous barriers.
  – Virtual servers are chassis independent.
  – Image-level backup == image-level restore.
  – Hot sites are one of many options, alongside cold & warm sites.
• Numerous cost-effective solutions available.
  – Don’t believe the hype.
  – Make decisions based on need.
Disaster Recovery Terminology
• What is Disaster Recovery?
  – Disaster Recovery intends to provide continuity of business services after a critical event.
  – Disaster Recovery is invoked after the large-scale loss of primary business services.
  – DR is not the restoration of a critical server.
  – DR is not the restoration of a critical business service.
• Why the distinction?
  – DR solutions do not resolve daily operational issues.
  – Often, failback is challenging.
Disaster Recovery Terminology
• RTO – Recovery Time Objective
  – Time period between a failure and when a failed system is restored to full operational capability.
• RPO – Recovery Point Objective
  – Quantity of data that can acceptably be lost as part of a failure.
• MTTR – Mean-Time To Restore
  – The average amount of time expected to bring a system back to full operational capability.
• SLA – Service Level Agreement
  – Agreement between IT and business on restoration metrics, what to restore, priorities, and ownership.
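
These terms can be checked against each other with simple arithmetic. The sketch below assumes periodic backups, so the worst-case data loss is one full backup interval; all durations and function names are hypothetical.

```python
from datetime import timedelta

def meets_rpo(backup_interval, rpo):
    """With periodic backups, up to one full interval of data can be lost."""
    return backup_interval <= rpo

def mean_time_to_restore(restore_durations):
    """MTTR: the average of observed (or rehearsed) restore durations."""
    return sum(restore_durations, timedelta()) / len(restore_durations)

# Hypothetical figures for one service.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)
restores = [timedelta(hours=4), timedelta(hours=6), timedelta(hours=8)]

mttr = mean_time_to_restore(restores)
print("Nightly backups meet RPO:", meets_rpo(timedelta(hours=24), rpo))
print("Hourly backups meet RPO:", meets_rpo(timedelta(hours=1), rpo))
print("MTTR:", mttr, "within RTO:", mttr <= rto)
```

An average inside the RTO still leaves individual restores that can exceed it, which is one reason the SLA should state which figure the business is agreeing to.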
Disaster Recovery Terminology
• Hot site
  – Servers up and operational at remote site at all times.
• Warm site
  – Servers pre-provisioned at remote site. Tasks to complete for failover to occur.
• Cold site
  – Empty site and servers on retainer awaiting DR event.
Four DR Tiers
Tier                     RTO                     RPO                     Examples
Continuous Availability  Immediate               Immediate               Business-critical DBs, transaction processing appliances
Immediate Availability   Minutes to hours        Minutes to hours        Infrastructure services, support services, messaging services
Fast Recovery            Hours to days           Hours to days           Internal applications, analytic applications
Eventual Recovery        Days to a week or more  Days to a week or more  Development & test environments, stateless applications
Four DR Tiers
• $ - Snap & Pray
  – Leverage no-cost or low-cost tools to snapshot image-level backups of VMs.
  – Cold site and replacement equipment on retainer.
  – Store images to tape. Rotate tapes off-site.
  – Restoration:
    • Activate cold site
    • Procure reserved replacement equipment
    • Procure tapes and tape device
    • Restore images to replacement equipment
    • Resolve database (and some O/S) inconsistencies
Four DR Tiers
• $$ - Warm Snap
  – Leverage no-cost or low-cost tools to create image-level backups of VMs.
  – Connected warm site with data storage location.
  – Transfer images to off-site data storage location
  – Restoration:
    • Procure or spin up reserved replacement equipment
    • Restore images from data storage to replacement equipment
    • Resolve database (and some O/S) inconsistencies
Disk-to-disk backups over the WAN increase backup time, but significantly reduce restore time.
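
The trade-off comes down to where the slow link sits: the WAN transfer is paid at backup time, and the restore reads from storage already local to the recovery site. A rough sketch of the arithmetic follows; the image size, link speeds, and the 70% efficiency factor are all assumptions chosen for illustration.

```python
def transfer_hours(size_gb, throughput_mbps, efficiency=0.7):
    """Hours to move size_gb gigabytes over a link of throughput_mbps megabits/s.

    efficiency is an assumed fraction of the nominal link rate actually achieved.
    """
    bits = size_gb * 8 * 1000**3
    seconds = bits / (throughput_mbps * 1_000_000 * efficiency)
    return seconds / 3600

image_gb = 40  # hypothetical VM image size
print(f"Backup over 10 Mbps WAN: {transfer_hours(image_gb, 10):.1f} h")
print(f"Restore over 1 Gbps LAN: {transfer_hours(image_gb, 1000):.2f} h")
```

Incremental transfers shrink the WAN cost further, since only changed data has to cross the link after the first full copy.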
Four DR Tiers
• $$$ - Inconsistent Storage-to-Storage
  – Warm site. Storage-to-storage replication instantiated between sites.
  – Storage data automatically replicated to remote site.
  – Greater support for incrementals. Less WAN usage.
  – Restoration:
    • Procure or spin up reserved replacement equipment
    • Attach virtual machines to replacement equipment and hit the “green VCR button”.
    • Resolve database (and some O/S) inconsistencies
SAN replication is often not aware of quiescing, so this solution can be problematic.
Four DR Tiers
• $$$$ - Real-time Replication
  – Warm or hot site. Storage-to-storage replication instantiated between sites.
  – 3rd-party tools used for image-to-image transfer.
    • In-VM for transactional database quiescing.
    • On-host for all other machines.
  – Roll-back and roll-forward capabilities
  – Restoration:
    • Hit the “green VCR button”
    • (or, auto-failover…)
Tools like DoubleTake, DoubleTake for Virtual Systems, esxReplicator, DataCore SANMelody enable real-time and consistent DR between sites.
• Questions?
• Comments?
• Sarcastic Remarks?