FOSSETCON 2015
Success with XenServer by Design
XenServer Design Workshop
#whoami
Name: Tim Mackey
Current roles: XenServer Community Manager and Evangelist; occasional coder
Cool things I’ve done• Designed laser communication systems• Early designer of retail self-checkout machines• Embedded special relativity algorithms into industrial control system
Find me• Twitter: @XenServerArmy• SlideShare: slideshare.net/TimMackey• LinkedIn: www.linkedin.com/in/mackeytim• Github: github.com/xenserverarmy
We’re following “MasterClass Format”
Admins matter• No sales pitch• No cost• Just the facts man
Interactive• Ask questions; the harder the better• Get what you need to be successful
What is XenServer?
What is a “XenServer”?
Packaged Linux distribution for virtualization• All software required in a single ISO
Designed to behave as an appliance• Managed via SDK, CLI, UI
Not intended to be a toolkit• Customization requires special attention
Open Source• Open source roots• Acquired by Citrix in 2007• Made open source in 2013 (xenserver.org)
XenServer market dynamic
Millions of Downloads
Over 1 million servers deployed
Optimized for XenDesktop
Powering NetScaler SDX
Supporting Hyper-Dense Clouds
Why XenServer?
Broad provisioning support• Apache CloudStack• Citrix CloudPlatform and XenDesktop• OpenStack• Microsoft System Center• VMware vCloud
Full type-1 hypervisor• Strong VM isolation• Supporting Intel TXT for secure boot
Designed for scale• 1000 VMs per host• Over 140 Gbps throughput in NetScaler SDX• Up to 96 shared hardware GPU instances per host
Understanding the architecture
Strong technical foundation with Xen Project
Advisory Board Members
Simplified XenServer Architecture Diagram
[Diagram: compute, storage and networking; the Xen Project Hypervisor hosts a standard Linux distribution (dom0) running xapi, qemu and drivers, and guests whose driver frontends connect to driver backends in dom0]
dom0 in detail (XenServer 6.5)
3.10+ kernel.org kernel with CentOS 5.10 distribution
kernel-space: netback, blkback, blktap3, hardware drivers
user-space: XenAPI (xapi), SM, xha, xenopsd, squeezed, alertd, multipathd, perfmon, interface, stunnel, metadata, xenstored, ovs-vswitchd, qemu-dm, Likewise, networkd
Features impacting functional design
Resource pools
Advantages• Reduce points of failure• Simplify management at scale• Reduce downtime during maintenance
Requirements• Shared storage• Network redundancy• Provisioning management
Core concepts• Pool master vs. member server roles
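A minimal CLI sketch of forming a pool (hypothetical address and credentials; the same can be done from XenCenter with "Add to Pool"):
• On the joining host: xe pool-join master-address=10.0.0.10 master-username=root master-password=<password>
• On the master, to name the pool: xe pool-param-set uuid=<pool uuid> name-label="Production Pool"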
XenMotion Live VM Migration
[Diagram: a XenServer pool of three hosts on shared storage, with a VM migrating live between hosts]
Live Storage XenMotion
Migrates VM disks from any storage type to any other storage type• Local, DAS, iSCSI, FC
Supports cross pool migration• Requires compatible CPUs
Encrypted Migration model
Specify management interface for optimal performance
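A hedged CLI sketch of Storage XenMotion (placeholder UUIDs and addresses, following the xe vm-migrate syntax documented for XenServer 6.x):
• Within a pool, to a different SR: xe vm-migrate uuid=<vm uuid> host=<destination host> vdi:<vdi uuid>=<destination sr uuid> live=true
• Cross-pool: xe vm-migrate uuid=<vm uuid> remote-master=<destination master IP> remote-username=root remote-password=<password> host=<destination host> vdi:<vdi uuid>=<destination sr uuid> live=true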
[Diagram: a live virtual machine and its VDI(s) being moved between XenServer hypervisors]
More about Storage XenMotion
Migration vs. Storage Migration
XenMotion:
1. Start VM migration
2. Copy the VM's RAM
3. Copy the VM's RAM delta; repeat until no delta is left
4. Use the VM's hard disk from the destination host
5. End VM migration

Storage XenMotion:
1. Start storage VM migration
2. Snapshot the VM's first/next disk
3. Transfer the snapshot disks; once the transfer is finished, repeat until no disk is left to copy
4. Mirror all write activity after the snapshot to the destination host
5. With all disks mirroring to the destination host, perform a "normal" XenMotion
6. End storage VM migration
Heterogeneous Resource Pools
Safe Live Migrations
[Diagram: XenServer 1 has an older CPU exposing Features 1-4; XenServer 2 has a newer CPU exposing Features 1-5; the virtual machine is constrained to the common feature set so it can move safely between both hosts]
Mixed Processor Pools
High Availability in XenServer
Automatically monitors hosts and VMs
Easily configured within XenCenter
Relies on Shared Storage• iSCSI, NFS, HBA
Reports failure capacity for DR planning purposes
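A minimal sketch of enabling HA from the CLI (assumes a shared SR is already attached; XenCenter's HA wizard is the usual route):
• xe pool-ha-enable heartbeat-sr-uuids=<shared sr uuid>
• xe pool-param-set uuid=<pool uuid> ha-host-failures-to-tolerate=1
• Check computed capacity: xe pool-ha-compute-max-host-failures-to-tolerate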
More about HA
Taking advantage of GPUs
NVIDIA• vGPU with NVIDIA GRID providing 96 GPU instances• GPU pass-through• CUDA support on Linux• Uses NVIDIA drivers for capability
Intel• GVT-d support with Haswell and newer
• No extra hardware!!• Uses standard Intel drivers
AMD• GPU pass-through
More about GPU
Distributed Virtual Network Switching
Virtual Switch• Open source: www.openvswitch.org• Provides a rich layer 2 feature set• Cross host private networks• Rich traffic monitoring options• ovs 1.4 compliant
DVS Controller• Virtual appliance• Web-based GUI• Can manage multiple pools• Can exist within pool it manages• Note: Controller is deprecated, but supported
Deployment Design
Host requirements
Validate Hardware Compatibility List (HCL)• http://hcl.xenserver.org• Component’s firmware version could be important
BIOS configuration• VT extensions enabled• EFI profiles disabled
Limits• Up to 1TB RAM• Up to 160 pCPUs• Up to 16 physical NICs• Up to 16 hosts per cluster
Network topologies
Management networks• Handle pool configuration and storage traffic• Require default VLAN configuration• IPv4 only
VM networks• Handle guest traffic• IPv4 and IPv6• Can assign VLAN and QoS• Can define ACL and mirroring policy• Should be separated from mgmt networks
All networks in pool must match
More about network design
Storage topologies
Local storage• Yes: SAS, SATA, RAID, DAS• No: USB, Flash, SW RAID• Supports live migration
Shared Storage• iSCSI Unipath/Multipath, NFSv3• HBA – Check HCL• Supports live migration
Cloud Storage• Only if presented as iSCSI/NFS
ISO storage• CIFS/NFSv3
More about storage design
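As an illustration, attaching an NFS SR from the CLI (hypothetical server and export path; equivalent to the New SR wizard in XenCenter):
• xe sr-create type=nfs content-type=user shared=true name-label="NFS-SR1" device-config:server=192.168.10.20 device-config:serverpath=/exports/xenserver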
Installation
Installation options
Boot from DVD/USB• Intended for low volume• ISO media on device• Install from local/NFS/HTTP/FTP
Boot from PXE• For scale deployments• Install from NFS/HTTP/FTP• Post installation script capabilities
Boot from SAN/iSCSI• Diskless option
Driver disks
Shipped as supplemental packs• Often updated when kernel is patched• Option to specify during manual install
Network drivers• Slipstream into XenServer installer• Modify XS-REPOSITORY-LIST
Storage drivers• Add to unattend.xml• <driver-source type="url">ftp://192.168.1.1/ftp/xs62/driver.qlcnic</driver-source>
Types of updates
New version• Delivered as ISO installer• Requires host reboot
Feature Pack• Typically delivered as ISO installer• Typically requires host reboot
Hotfix• Delivered as .xsupdate file• Applied via CLI/XenCenter• May require host reboot• Subscribe to KB updates
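A hedged example of applying a hotfix from the CLI (hypothetical file name; XenCenter can apply the same .xsupdate):
• xe patch-upload file-name=/root/XS65E003.xsupdate (returns the patch UUID)
• xe patch-pool-apply uuid=<patch uuid>
• Verify with xe patch-list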
Backup more than just your VMs
Local storage• Always use RAID controller with battery backup to reduce risk of corruption
dom0 (post install or reconfiguration)• xe host-backup file-name=<filename> -h <hostname> -u root -pw <password>
Pool metadata (weekly – or when pool structure changes)• xe pool-dump-database file-name=<NFS backup>
VM to infrastructure relationships (daily or as VMs created/destroyed)• xe-backup-metadata -c -i -u <SR UUID for backup>
LVM metadata (weekly)• /etc/lvm/backup
XenServer host upgrade
Disk Partitions
1. Initial installation: a 4GB first partition holds the installed XenServer 6.2, a 4GB second partition is reserved for the XenServer backup (empty), and the remaining space is local storage.
2. Backup existing installation, then upgrade: booting the XenServer 6.5 install media backs up version 6.2 into the second partition and installs version 6.5 into the first partition; the remaining space stays local storage.
3. XenServer upgraded: the first partition holds the installed XenServer 6.5, the second partition keeps XenServer 6.2 as a backup, and the remaining space is local storage.
Pool upgrade process
1. Evacuate virtual machines from the host
2. Place the host into maintenance mode
3. Upgrade the host to the new version
4. Place the host back into normal operation
5. Proceed with the next host
3rd party components
dom0 is tuned for XenServer usage• yum is intentionally disabled• Avoid installing new packages into dom0• Performance/scalability/stability uncertain
Updates preserve XenServer config only!• Unknown drivers will not be preserved• Unknown packages will be removed• Manual configuration changes may be lost
Citrix Ready Marketplace has validated components
Exchange SSL certificate on XenServer
• By default, XenServer uses a self-signed certificate, created during installation, to encrypt communication via SSL (XAPI/HTTPS).
• To trust this certificate, verify its fingerprint against the one shown on the host's physical console (xsconsole / status display).
• The certificate can also be exchanged for a certificate issued from a trusted corporate certificate authority.
Process:
1. Request a certificate & key from the company certificate authority
2. The CA issues the certificate & key
3. Convert them to PEM format
4. Upload to /etc/xensource on the XenServer host
5. Replace xapi-ssl.pem
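A cautious sketch of the replacement from dom0 (hypothetical file names; back up the original first and verify the exact PEM layout against the Citrix documentation):
• cp /etc/xensource/xapi-ssl.pem /etc/xensource/xapi-ssl.pem.bak
• cat myhost.key myhost.crt > /etc/xensource/xapi-ssl.pem (private key and certificate combined into one PEM file)
• service xapi restart (so XAPI/XenCenter pick up the new certificate)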
Performance planning
Configuration Maximums: XenServer 6.2 vs. 6.5 (per-VM and per-host scalability limits; more is better)

Limit | XS 6.2 | XS 6.5
vCPUs per VM | 16 (Windows) | 16 (Windows) / 32 (Linux)
RAM per VM | 128 GB | 192 GB
pCPUs per host | 160 | 160
RAM per host | 1 TB | 1 TB
Running VMs per host | 500 | 1,000
VBDs per host | 512 | 2,048
Multipathed LUNs per host | 150 | 256
Highlights of XenServer 6.5 Performance Improvements

Bootstorm data transferred (lower is better): XS 6.2: 18.0 GB → XS 6.5: 0.7 GB (-96%)
Bootstorm duration (lower is better): XS 6.2: 470 s → XS 6.5: 140 s (-70%)
Aggregate network throughput (higher is better): XS 6.2: 3 Gb/s → XS 6.5: 25 Gb/s (+700%)
Aggregate storage read throughput (higher is better): XS 6.2: 2.2 GB/s → XS 6.5: 9.9 GB/s (+350%)
Aggregate storage write throughput (higher is better): XS 6.2: 2.8 GB/s → XS 6.5: 7.8 GB/s (+175%)

Measurements were taken on various hardware in representative configurations. Measurements made on other hardware or in other configurations may differ.

Booting a large number of VMs is significantly quicker in XS 6.5 due to the read-caching feature. The read-caching feature significantly reduces the IOPS hitting the storage array when VMs share a common base image.

XS 6.5 brings many improvements relating to network throughput. For example, the capacity for a large number of VMs to send or receive data at a high throughput has been significantly improved.

The new, optimized storage datapath in XS 6.5 enables aggregate throughput to scale much better with a large number of VMs. This allows a large number of VMs to sustain I/O at a significantly higher rate, for both reads and writes.
vGPU scalability (higher is better): XS 6.2: 64 → XS 6.5: 96 (+50%)

The number of VMs that can share a GPU has increased in XS 6.5. This will reduce TCO for deployments using vGPU-enabled VMs.
64-bit control domain improves overall scalability

In XenServer 6.2:
• dom0 was 32-bit, so it had 1GB of 'low memory'
• Each running VM ate about 1 MB of dom0's low memory
• Depending on what devices you had in the host, you would exhaust dom0's low memory with a few hundred VMs

In Creedence:
• dom0 is 64-bit, so it has a practically unlimited supply of low memory
• There is no longer any chance of running out of low memory
• Performance will not degrade with larger dom0 memory allocations
Limits on number of VMs per hostScenario 1: HVM guests, each having 1 vCPU, 1 VBD, 1 VIF, and having PV drivers
Limitation XenServer 6.1 XenServer 6.2 Creedencedom0 event channels 225 800 no limit
tapdisk minor numbers 1024 2048 2048aio requests 1105 2608 2608
dom0 grant references 372 no limit no limitxenstored connections 333 500 1000consoled connections no limit no limit no limit
dom0 low memory 650 650 no limitOverall limit 225 500 1000
Limits on number of VMs per host
Scenario 2: HVM guests, each having 1 vCPU, 3 VBDs, 1 VIF, and having PV drivers

Limitation | XenServer 6.1 | XenServer 6.2 | Creedence
dom0 event channels | 150 | 570 | no limit
tapdisk minor numbers | 341 | 682 | 682
aio requests | 368 | 869 | 869
dom0 grant references | 372 | no limit | no limit
xenstored connections | 333 | 500 | 1000
consoled connections | no limit | no limit | no limit
dom0 low memory | 650 | 650 | no limit
Overall limit | 150 | 500 | 682
Limits on number of VMs per host
Scenario 3: PV guests, each having 1 vCPU, 1 VBD, 1 VIF

Limitation | XenServer 6.1 | XenServer 6.2 | Creedence
dom0 event channels | 225 | 1000 | no limit
tapdisk minor numbers | 1024 | 2048 | 2048
aio requests | 1105 | 2608 | 2608
dom0 grant references | no limit | no limit | no limit
xenstored connections | no limit | no limit | no limit
consoled connections | 341 | no limit | no limit
dom0 low memory | 650 | 650 | no limit
Overall limit | 225 | 650 | 2048
Netback thread-per-VIF model improves fairness
Improves fairness and reduces interference from other VMs
[Diagram: in XenServer 6.2, the VIFs of all VMs on a host share a fixed set of netback threads; in Creedence, each VIF gets its own netback thread]
OVS 2.1 support for ‘megaflows’ helps when you have many flows
The OVS kernel module can only cache a certain number of flow rules.
If a flow isn't found in the kernel cache, the ovs-vswitchd userspace process is consulted. This adds latency and can lead to a severe CPU contention bottleneck when there are many flows on a host.
OVS 2.1 has support for 'megaflows', which allows the kernel to cache substantially more flow rules.
More than just virtualization
Visibility into Docker Containers
Containers• Great for application packaging• Extensive tools for deployment
Virtualization• Total process isolation• Complete control
Docker and XenServer• View container details• Manage container life span• Integrated in XenCenter
GPU enablement
Deployment scenarios
• 1:1 GPU pass-through – each XenDesktop 3D VM on XenServer is given a dedicated GPU (NVIDIA, AMD & Intel)
• 1:n RDS sessions – a XenApp server VM (hypervisor optional) shares its GPU across multiple sessions (AMD & NVIDIA)
• 1:n hardware virtualization – multiple XenDesktop Windows 3D VMs on XenServer share a physical GPU via vGPU (NVIDIA only)
Remote Workstations: 1:1 GPU pass-through
• Dedicated GPU per user/VM; direct GPU access from the guest VM (NVIDIA/AMD/Intel GPU, guest OS with native GPU driver, virtual desktop apps over a remote protocol)
• Responsiveness: the VM has direct access to the GPU and includes NVIDIA fast remoting technology
• App Performance: full API support including the latest OpenGL, DirectX and CUDA; includes application certifications
• Density: limited by the number of GPUs in the server
• VM Portability: cannot migrate the VM to any node
pgpu, vgpus and gpu-group objects
XenServer automatically creates gpu-group, pgpu, vgpu-type objects for the physical GPUs it discovers on startup
[Diagram: one gpu-group per GPU model – a GRID K1 group and a GRID K2 group, each with allocation mode depth-first – containing a pgpu object per physical GPU (PCI addresses 5:0.0-8:0.0 for the K1s and 11:0.0, 12:0.0, 85:0.0, 86:0.0 for the K2s) and the vgpu-types they support (GRID K100, K120Q, K140Q, K160Q, K180Q, K200, K220Q, K240Q, K260Q, K280Q)]
User creates vgpu objects: - owned by a specific VM - associated with a gpu-group - with a specific vgpu-type
At VM boot, XenServer picks an available pgpu in the group to host the vgpu
[Diagram: example vgpu objects – a GRID K100 vgpu hosted on pgpu 8:0.0 in the K1 group and a GRID K260Q vgpu hosted on pgpu 86:0.0 in the K2 group, each owned by a VM]
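For illustration, creating a vgpu object from the CLI (placeholder UUIDs):
• Find the group and type: xe gpu-group-list, then xe vgpu-type-list
• Create the vgpu for a VM: xe vgpu-create vm-uuid=<vm uuid> gpu-group-uuid=<gpu-group uuid> vgpu-type-uuid=<vgpu-type uuid>
XenServer then picks a hosting pgpu from the group at the next VM boot, per the slide above.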
How GPU Pass-through Works
Identical GPUs in a host auto-create a GPU group
The GPU group can be assigned to a set of VMs – each VM will attach to a GPU at VM boot time
When all GPUs in a group are in use, additional VMs requiring GPUs will not start
GPU and non-GPU VMs can (and should) be mixed on a host
GPU groups are recognized within a pool• If Servers 1, 2 and 3 each have GPU type 1, then VMs requiring GPU type 1 can be started on any of those servers
Limitations of GPU Pass-through
GPU pass-through binds the VM to the host for the duration of the session• Restricts XenMotion
Multiple GPU types can exist in a single server• E.g. high performance and mid performance GPUs
VNC will be disabled, so RDP is required
Fully supported for XenDesktop, best effort for other Windows workloads
HCL is very important
NVIDIA GRID Architecture: Hardware-virtualized GPU
[Diagram: a GRID K1/K2 GPU is divided into multiple virtual GPUs; each virtual machine (guest OS, NVIDIA driver, virtual desktop, apps, remote protocol) issues graphics commands directly to its virtual GPU, while the NVIDIA GRID Virtual GPU Manager on XenServer handles management state]
• Responsiveness: the VM has direct access to the GPU and includes NVIDIA fast remoting technology
• App Performance: full API support including the latest OpenGL & DirectX; includes application certifications
• Density: limited by the number of virtual GPUs in the system
• VM Portability: cannot migrate the VM to any node
Overview of vGPU on XenServer
GRID vGPU enables multiple VMs to share a single physical GPU
VMs run an NVIDIA driver stack and get direct access to the GPU• Supports the same graphics APIs as physical GPUs (DX9/10/11, OGL 4.x)
NVIDIA GRID Virtual GPU Manager for XenServer runs in dom0
[Diagram: a GRID K1 or K2 GPU in a GRID-enabled server; the GRID Virtual GPU Manager and NVIDIA kernel driver run in XenServer dom0 on the Xen hypervisor and use a hypervisor control interface (host channel registers, framebuffer regions, display, etc.); each virtual machine (guest OS, NVIDIA driver, apps, Citrix XenDesktop) has a graphics fast path with direct GPU access via per-VM dedicated channels and framebuffer, shared access to the GPU engines, and an NVIDIA paravirtualized management interface]
Nvidia vGPU Resource Sharing
[Diagram: the GRID Virtual GPU Manager in Citrix XenServer dom0 applies timeshared scheduling of the GPU engines (3D, CE, NVENC, NVDEC) between virtual machines; each VM's BAR maps into the GPU BAR, and each VM gets its own region of the framebuffer and its own channels]
Framebuffer• Allocated at VM startup
Channels• Used to post work to the GPU• The VM accesses its channels via the GPU Base Address Register; isolated by the CPU's Memory Management Unit (MMU)
GPU Engines• Timeshared among VMs, like contexts on a single OS
GPUs in XenCenter
vGPU Settings in XenCenter
GPU profile
High Availability Details
Protecting Workloads
Not just for mission critical applications anymore
Helps manage VM density issues
"Virtual" definition of HA is a little different from physical
Low cost / complexity option to restart machines in case of failure
High Availability Operation
Pool-wide settings
Failure capacity – the number of host failures the pool can tolerate while still carrying out the HA plan
Uses network and storage heartbeat to verify servers
VM Protection Options
Restart Priority• Do not restart• Restart if possible• Restart
Start Order• Defines a sequence and delay to ensure applications run correctly
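These options map onto VM parameters that can also be set from the CLI; a rough sketch with placeholder values (in XenServer 6.x, ha-restart-priority takes restart, best-effort, or an empty string for do-not-restart):
• xe vm-param-set uuid=<vm uuid> ha-restart-priority=restart order=1 start-delay=60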
HA Design – Hot Spares
Simple Design• Similar to a hot spare in a disk array• Guaranteed available• Inefficient: idle resources
Failure Planning• If surviving hosts are fully loaded – VMs will be forced to start on spare• Could lead to restart delays due to resource plugs• Could lead to performance issues if spare is pool master
HA Design – Distributed Capacity
Efficient Design• All hosts utilized
Failure Planning• Impacted VMs automatically placed for best fit• Running VMs undisturbed• Provides efficient guaranteed availability
HA Design – Impact of Dynamic Memory
Enhances Failure Planning• Define reduced memory which meets SLA• On restart, some VMs may “squeeze” their memory• Increases host efficiency
High Availability – No Excuses
Shared storage is the hardest part of setup• Simple wizard can have HA defined in minutes• Minimally invasive technology
Protects your important workloads• Reduce on-call support incidents• Addresses VM density risks• No performance, workload, configuration penalties
Compatible with resilient application designs
Fault tolerant options exist through ecosystem
Storage XenMotion
VHD Benefits
Many SRs implement VDIs as VHD trees
VHDs are a copy-on-write format for storing virtual disks
VDIs are the leaves of VHD trees
Interesting VDI operation: snapshot (implemented as VHD “cloning”)
[Diagram: snapshotting the original VDI A makes its contents a read-only parent with two leaves – A, the read-write active VDI, and B, the read-only snapshot VDI]
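For example, a VDI-level snapshot is a single CLI call (placeholder UUID):
• xe vdi-snapshot uuid=<vdi uuid> – returns the UUID of the new snapshot VDI, leaving the original contents as a read-only parent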
Storage XenMotion
“A” represents the VHD of a VM
The VHD structure (not contents) of “A” is duplicated on the Destination
Storage XenMotion
A snapshot is taken on the Source
The new child object is duplicated on the Destination
Storage XenMotion
VM writes are now synchronous to both Source & Destination Active child VHDs
Parent VHD (now Read-Only) is now background copied to the Destination
Storage XenMotion
Once the Parent VHD is copied, the VM is moved using XenMotion
The synchronous writes continue until the XenMotion is complete
Storage XenMotion
The VHDs not required are removed
The VM and VDI move is complete
Benefits of VDI Mirroring
Optimization: start with most similar VDI• Another VDI with the least number of different blocks• Only transfer blocks that are different
New VDI field: Content ID for each VDI• Easy way to confirm that different VDIs have identical content• Preserved across VDI copy, refreshed after VDI attached RW
Worst case is a full copy (common in server virtualization)
Best case occurs when you use VM “gold images” (i.e. CloudStack)
Network topologies
XenServer Network Terminology
[Diagram: a VM's VIFs connect to internal switches – Network 0 (xenbr0), backed by PIF eth0 on a physical network card, and a private network (xapi1) with no PIF]
XenServer Network Terminology
[Diagram: two external networks – Network 0 (xenbr0) backed by PIF eth0 and Network 1 (xenbr1) backed by PIF eth1 – each on its own physical network card, with VM VIFs attached to either network]
XenServer Network Terminology
[Diagram: Bond 0+1 (xapi2) – PIFs eth0 and eth1 on two physical network cards are combined into PIF bond0, and VM VIFs attach to the bonded network]
XenServer Networking Configurations
[Diagram: XAPI stores the vSwitch configuration in the XenServer pool DB and drives the Linux NIC drivers for the physical network card; configuration is performed via the command line, XenCenter or xsconsole]
Bonding Type (Balance SLB)
[Diagram: with balance-SLB bonding to stacked switches, each VM's MAC address is assigned to one NIC of the bond and traffic is periodically rebalanced (roughly every 10 seconds) across the NICs based on observed throughput]
Bonding Type (LACP)
[Diagram: with LACP bonding, two network cards form a single bond connected to stacked switches]
Distributed Virtual Switch – Flow View
[Diagram: the DVS Controller programs each host's OVS via OpenFlow (vswitchd) and JSON-RPC (ovsdb-server); each vSwitch maintains a flow table and a flow table cache for its networks (Network A, Network B), with VIFs and PIFs attached]
Storage Networks
Independent management network• Supports iSCSI multipath• Bonded for redundancy; multipath as best practice• Best practice to enable Jumbo frames• Must be consistent across pool members• 802.3ad LACP provides limited benefit (hashing)
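A rough CLI sketch of a dedicated, bonded storage network with jumbo frames (hypothetical UUIDs and addresses):
• xe network-create name-label="Storage" MTU=9000
• xe bond-create network-uuid=<network uuid> pif-uuids=<pif1 uuid>,<pif2 uuid>
• xe pif-reconfigure-ip uuid=<bond pif uuid> mode=static IP=192.168.50.11 netmask=255.255.255.0
• xe pif-param-set uuid=<bond pif uuid> disallow-unplug=true other-config:management_purpose="Storage"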
Guest VM Networks
Single server private network• No off host access• Can be used by multiple VMs
External network• Off host network with 802.1q tagged traffic• Multiple VLANs can share physical NIC• Physical switch port must be trunked
Cross host private network• Off host network with GRE tunnel• Requires DVSC or Apache CloudStack controller
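A small sketch of creating an external VLAN network for guests (hypothetical VLAN ID):
• xe network-create name-label="VM-VLAN20"
• xe vlan-create network-uuid=<network uuid> pif-uuid=<physical pif uuid> vlan=20
• Guests then attach VIFs to that network (via XenCenter or xe vif-create)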
Storage Topologies
XenServer storage concepts
[Diagram: a Storage Repository is attached to each XenServer host through a PBD; the VDIs it contains are attached to virtual machines through VBDs]
Thick provisioning
With thick provisioning, disk space is allocated statically.
As virtual machines are created, their virtual disks utilize the entire available disk size on the physical storage.
This can result in a large amount of unused allocated disk space.
A virtual machine created using a 75 GB virtual disk would consume the entire 75 GB of physical storage disk space, even if it only requires a quarter of that.
[Diagram: thick provisioning – a 75 GB virtual disk requires 75 GB of physical space even though only 25 GB is actually used and 50 GB is allocated but unused]
Thin Provisioning
With thin provisioning, disk space is allocated on an “as-needed” basis.
As virtual machines are created, their virtual disks will be created using only the specific amount of storage required at that time.
Additional disk space is automatically allocated for a virtual machine once it requires it. The unused storage space remains available for use by other virtual machines.
A virtual machine created using a 75 GB virtual disk, but that only uses 25 GB, would consume only 25 GB of space on the physical storage.
[Diagram: thin provisioning – a 75 GB virtual disk with 25 GB actually used consumes only 25 GB of physical space, leaving 50 GB free for allocation]
Sparse Allocation
Sparse allocation is used with thin provisioning.
As virtual machines are created, their virtual disks will be created using only the specific amount of storage required at that time.
Additional disk space is automatically allocated for a virtual machine once it requires it. If the OS allocates the blocks at the end of the disk, intermediate blocks will become allocated
A virtual machine created using a 75 GB virtual disk, but that uses 35 GB in two blocks, could consume between 35 GB and 75GB of space on the physical storage.
[Diagram: sparse allocation – a 75 GB virtual disk with 25 GB used at the start of the disk and 10 GB used at the end can require up to 75 GB of physical space, with 40 GB allocated but unused in between]
XenServer Disk Layouts (Local)
[Diagram: local disk layouts. Default layout: a dom0 partition (4GB), a backup partition, and an LVM volume group holding thick-provisioned LVHD logical volumes as the Storage Repository. EXT-based layout: a dom0 partition (4GB), a backup partition, and an EXT file system holding thin-provisioned VHD files (xxx.vhd, yyy.vhd, zzz.vhd) as the Storage Repository; each VHD holds a VHD header plus the guest's OS partition and file system]
XenServer Disk Layouts (Shared)
[Diagram: shared disk layouts. Native iSCSI & Fibre Channel: a raw SAN LUN holds an LVM volume group with thick-provisioned LVHD logical volumes as the Storage Repository. NFS-based storage: an NFS share on a NAS volume holds thin-provisioned VHD files (xxx.vhd, yyy.vhd, zzz.vhd) as the Storage Repository]
Management and Monitoring
Fibre Channel LUN Zoning
Since Enterprise SANs consolidate data from multiple servers and operating systems, many types of traffic and data are sent through the interface, whether it is fabric or the network.
With Fibre Channel, to ensure security and dedicated resources, an administrator creates zones and zone sets to restrict access to specified areas. A zone divides the fabric into groups of devices.
Zone sets are groups of zones. Each zone set represents different configurations that optimize the fabric for certain functions.
WWN - Each HBA has a unique World Wide Name (similar to an Ethernet MAC)
node WWN (WWNN) – can be shared by some or all ports of a device
port WWN (WWPN) – necessarily unique to each port
Fibre Channel LUN Zoning – FC Switch example
[Diagram: on the FC switch, Zone1 contains the WWNs of Xen1, Xen2 and the storage, and Zone2 contains the WWNs of Xen3 and the storage; on the storage array, an initiator group for Xen1 and Xen2 (Pool1) is presented LUN0 and LUN1, and an initiator group for Xen3 (Pool2) is presented LUN2]
Management and Monitoring
iSCSI Isolation
With iSCSI type storage a similar concept of isolation as fibre-channel zoning can be achieved by using IP subnets and, if required, VLANs.
IQN – each storage interface (NIC or iSCSI HBA) has a unique iSCSI Qualified Name configured
Target IQN – typically associated with the storage provider interface
Initiator IQN – configured on the client side
IQN format is standardized: iqn.yyyy-mm.{reversed domain name} (e.g. iqn.2001-04.com.acme:storage.tape.sys1.xyz)
iSCSI Example
[Diagram: VLAN1/Subnet1 carries the initiator IQNs of Xen1 and Xen2 (Pool1) to the target IQN on storage controller interface 1, which presents LUN0 and LUN1; VLAN2/Subnet2 carries the initiator IQN of Xen3 (Pool2) to the target IQN on controller interface 2, which presents LUN2]
Storage multipathing
• Routes storage traffic over multiple physical paths
• Used for redundancy and increased throughput
• Unique logical networks are required
• Available for Fibre Channel and iSCSI
• Uses Round-Robin Load Balancing (Active- Active)
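Enabling multipathing from the CLI is a per-host setting (a sketch, not the full procedure; the host should be in maintenance mode first):
• xe host-param-set uuid=<host uuid> other-config:multipathing=true
• xe host-param-set uuid=<host uuid> other-config:multipathhandle=dmp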
[Diagram: a multipathed XenServer host reaches both storage controllers of the array over two separate networks/subnets (192.168.1.x and 192.168.2.x)]
Understanding dom0 storage
dom0 isn’t general purpose Linux• Don’t manage storage locally• Don’t use software RAID• Don’t mount extra volumes• Don’t use dom0 storage as “scratch”
Local storage is automatically an SR
Adding additional local storage• xe sr-create host-uuid=<host uuid> content-type=user name-label="Disk2" device-config:device=/dev/sdb type=ext
Spanning multiple local storage drives• xe sr-create host-uuid=<host uuid> content-type=user name-label="Group1" device-config:device=/dev/sdb,/dev/sdc type=ext
Snapshots
Snapshot Behavior Varies By
The type of SR in use• LVM-based SRs use “volume-based” VHD• NFS and ext SRs use “file-based” VHDs• Native SRs use capabilities of array
Provisioning type• Volume-based VHDs are always thick provisioned• File-based VHDs are always thin provisioned
For LVM-based SR types• If SR/VM/VDI created in previous XS version, VDIs (volumes) will be RAW
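For reference, taking and reverting a snapshot from the CLI (placeholder names and UUIDs):
• xe vm-snapshot uuid=<vm uuid> new-name-label=before-patching (returns the snapshot UUID)
• xe snapshot-revert snapshot-uuid=<snapshot uuid>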
Snapshot (NFS and EXT Local Storage)
Resulting VDI tree / disk utilization• VHD files are thin provisioned• VDI A contains writes up to the point of the snapshot• VDI B and C are empty*
• Total: VDI A: 20, VDI B: 0*, VDI C: 0*• The snapshot requires no space*
[Diagram: parent VDI A (size 40, 20 written) with leaves B and C (snapshot and active clone), each size 40 with 0 written. Key: (1) size of VDI, (2) data written in VDI. * Plus VHD headers]
Snapshot (Local LVHD, iSCSI or FC SR)
Resulting VDI tree / disk utilization• Volumes are thick provisioned• Deflated where possible• Total: VDI A: 20, VDI B: 40*, VDI C: 0*
• The snapshot requires 40 + 20GB
[Diagram: parent VDI A (size 40, 20 written) with leaves B and C, each size 40. Key: (1) size of VDI, (2) data written in VDI, (3) inflated/deflated state. * Plus VHD headers]
Automated Coalescing Example
1) VM with two snapshots, C and E
2) When snapshot C is deleted…
3) …parent B is no longer required and will be coalesced into A (A + B)
[Diagram: VHD tree before and after coalescing, with leaves D and E remaining. Key: snapshot, clone, parent, active]
http://support.citrix.com/article/CTX122978
Suspend VM / Checkpoints
Suspend and snapshot checkpoints store VM memory content on storage
The storage selection process• The SR specified in the pool parameter suspend-image-sr is used• suspend-image-sr defaults to the pool's default storage repository• If no suspend-image-sr is set at the pool level (e.g. no default SR), XenServer falls back to the local SR of the host running the VM
Size of suspend image is ~ 1.5 * memory size
Best practice: configure an SR as the suspend images store • xe pool-param-set uuid=<pool uuid> suspend-image-SR=<shared sr uuid>
Snapshot storage utilization
LVM-based VHD• For a 60GB VDI, the read-only parent image and each read-write child image are allocated at the full 60GB (thick)
File-based VHD• For a 60GB VDI that is 50% allocated (30GB of data used), the read-only parent image holds the 30GB of data and each read-write child image's size equals only the data written to disk since cloning (thin)
Integrated Site Recovery Details
Integrated Site Recovery
Supports LVM SRs only
Replication/mirroring setup outside scope of solution• Follow vendor instructions• Breaking of replication/mirror also manual
Works with all iSCSI and FC arrays on the HCL
Supports active-active DR
Feature Set
Integrated in XenServer and XenCenter
Support failover and failback
Supports grouping and startup order through vApp functionality
Failover pre-checks• Powerstate of source VM• Duplicate VMs on target pool• SR connectivity
Ability to start VMs paused (e.g. for dry-run)
How it Works
Depends on “Portable SR” technology• Different from Metadata backup/restore functionality
Creates a logical volume on SR during setup
Logical Volume contains• SR metadata information• VDI metadata information for all VDIs stored on SR
Metadata information is read via sr-probe during failover
Integrated Site Recovery - Screenshots