



Bringing your datacenter to the cloud, a different way of thinking

IR. KOENRAAD MEULEMAN
TECHNOLOGY EXPERT, TEAM LEAD VDI & CITRIX TECHNOLOGIES

Intro

• Although 'the cloud' is slowly maturing, there is still a lot of 'fog' around the concept, not least because the term covers many different things. Just think of the different possibilities of SaaS, PaaS or IaaS solutions.

• Today we focus on Infrastructure as a Service (IaaS). In a highly simplified model, we could say that this is 'just another external datacenter'.

• We look deeper into whether this is a correct approach and how Microsoft Windows Server 2016 can be used to address a number of 'inconveniences'.

• Along the way, we cover some real-life lessons learned from the deployment of a large Citrix XenApp 'server-based shared desktop' project in Azure.

Content

• Part 1: Citrix XenApp 'server-based shared desktop' POC in Azure

• Part 2: Citrix XenApp 'server-based shared desktop' project in Azure – IaaS: Architecting a datacenter in Azure, a different way of thinking

Part 1:

Citrix XenApp ‘server-based shared desktop’ POC in Azure

Citrix XenApp ‘server-based shared desktop’ POC in Azure

• Current situation:
▶ +/- 3000 end users with thin clients (peak)
- Day use (8 am – 6 pm): +/- 2000 internal users
- Day use (7 am – 8 pm): +/- 800 external users
- Night use: +/- 100 external users
▶ Highly available Citrix XenApp v6.5 with +/- 150 XenApp Workers
- 2 datacenters, fully redundant
▶ XenApp Workers provisioned with Citrix Provisioning Services (PVS)
- Nightly XenApp server reboot -> reprovisioning (2 am)
▶ Windows Server 2008 R2

Citrix XenApp ‘server-based shared desktop’ POC in Azure

• Desired new situation (February 2016):
▶ Windows Server 2012 R2
▶ XenApp v7.6 or higher
▶ Rapid server reprovisioning
▶ Cost reduction by using modern techniques
▶ Easy management
▶ High availability
- Using cloud (Azure) 'datacenters' (hybrid: as back-end is out of scope)
- Using fail-over to the cloud
- Using 2 on-prem datacenters

Citrix XenApp ‘server-based shared desktop’ architecture

© 2016 Citrix | Confidential

[Diagram] Management responsibilities per layer:
Users: manage users, manage user groups, manage entitlements
Resources: install & configure apps, set policies, patch & update
Access & Control: configure broker, configure user StoreFront, monitor & troubleshoot
Hardware: purchase hardware, rack & stack, network & orchestration

Deployment options: On Premises, Cloud hosted, Citrix Workspace Cloud, Citrix Service Provider (Citrix Partner); IT managed vs partner managed

Citrix XenDesktop Hybrid Flavours


The four flavours – XenDesktop On-Premises, Cloud Hosted XenDesktop, Citrix Workspace Cloud and Citrix Service Provider – all share the same layered architecture:

User Layer
Access Layer: StoreFront, NetScaler Gateway (SSL)
Control Layer: Delivery Controller, Studio, Director, SQL Database, Active Directory, License Server
Resource Layer: Delivery Group (Windows apps, Windows desktops, Linux desktops, Remote PC Access)
Hardware Layer: Access & Control Hosts, Resource Hosts

Underlying hypervisors: Resource Hosts vs Access & Control Hosts

Citrix XenApp ‘server-based shared desktop’ POC in Azure

• POC goal: provide additional information to make well-founded decisions.

▶ Desired new situation (February 2016):
- Windows Server 2012 R2
- XenApp v7.6 or higher
- Rapid server reprovisioning
- Cost reduction by using modern techniques
- Easy management
- High availability
▹ Using cloud (Azure) 'datacenters' (hybrid: back-end out of scope)
▹ Using fail-over to the cloud
▹ Using 2 on-prem datacenters

Rapid server reprovisioning for POC in Azure

• Citrix Provisioning Services (PVS)
▶ Not an option, as it needs PXE boot and Azure does not support that
▶ Not supported on Azure by Citrix, even if we could somehow trick it into working

• Citrix Machine Creation Services (MCS)
▶ No support for Azure initially (February 2016)
▶ Only support for 'Classic' Azure (Summer 2016)
▶ Now (February 2017) support for Azure Resource Manager (ARM)
- Missing user-defined naming convention support
- Missing user-defined (security) role support
- Realdolmen, as a member of the Citrix Partner Technical Expert Council, works with the Citrix product team to include these features

• PowerShell
▶ In only a few weeks, Realdolmen's consultants wrote the scripts needed to successfully conclude this part of the POC:
- Automatically create virtual machines (VMs) from a 'golden master'
- 'Domain join' these virtual machines
- Enable extensive logging and monitoring of the VMs from within Azure
- Start VMs at a predefined time
- Stop VMs from a predefined time onwards, but only when not in use
- Other tasks
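The deck notes these scripts were written in PowerShell; as a language-neutral illustration, the 'stop only when not in use' rule can be sketched as a small decision function. The 19:00 cut-off and the session counter are assumptions for illustration only:

```python
from datetime import time

def may_deallocate(now: time, active_sessions: int,
                   stop_after: time = time(19, 0)) -> bool:
    """Decide whether a XenApp worker VM may be deallocated.

    Mirrors the POC rule: stop VMs from a predefined time onwards,
    but only when no user sessions remain on the machine.
    """
    return now >= stop_after and active_sessions == 0
```

In Azure, deallocating the VM (rather than merely shutting down the guest OS) is what actually stops the compute billing.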

High Availability: on-prem, cloud or hybrid ?

• Most important decision factor: cost calculation
• Known on-prem cost
▶ After a fast on-prem POC (March 2016):
- WS 2012R2
- XenApp 7.8

• How to calculate on-Azure cost?
▶ Data transfer cost (in and) out of Azure
- Tests showed an average of 100 kbps per XenApp user (-> fixed)

▶ Storage cost
- Depending on the number of virtual hard disks needed
- Depending on the number of VMs needed
- Relatively low influence on the total cost
▶ Compute cost (VMs)
- Depending on the number of VMs needed
- Depending on the time the VMs are allocated (active)

How to calculate the minimal compute cost ?

• Created different Azure VMs with different sizes

• Used Loadrunner (and a representative in-house created test scenario) to put load on the VMs

• Checked the VMs’ resource usage (memory, cpu, disk I/O)

• Determined two values per machine size:
▶ U75: # of users when one of the resources is 75% depleted
▶ Umax: maximum number of users possible

How to calculate the minimal compute cost ?

• Calculated the 'cost per user per hour' per machine size
• We did the same tests for WS2016 and got almost the same numbers for U75 and Umax!

Size             Cores  Memory  Cost/h (€)  U75   Umax  Cost/U75/h  Cost/Umax/h
Standard D1 v2     1     3.5     0.1088       5    14    0.0218      0.0078
Standard D2 v2     2     7       0.2176      22    38    0.0099      0.0057
Standard D11 v2    2    14       0.253       29    39    0.0087      0.0065
Standard D12 v2    4    28       0.506       49    50    0.0103      0.0101
Standard D13 v2    8    56       0.9108      75    80    0.0121      0.0114
Standard D14 v2   16   112       1.6394     125   140    0.0131      0.0117
Standard D15 v2   20   140       2.0492     150   170    0.0137      0.0121

Size             Cores  Memory  Cost/h (€)  U75   Umax  Cost/U75/h  Cost/Umax/h
Standard D1 v2     1     3.5     0.1088       5    15    0.0218      0.0073
Standard D11 v2    2    14       0.253       30    42    0.0084      0.0060
Standard D12 v2    4    28       0.506       38    50    0.0133      0.0101
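The per-user columns are simply the hourly instance price divided by the measured user count; a minimal sketch to reproduce them, with prices and user counts taken from the WS2012R2 table above:

```python
def cost_per_user_hour(cost_per_hour: float, users: int) -> float:
    """Euro per user per hour, rounded to 4 decimals as in the tables."""
    return round(cost_per_hour / users, 4)

# Measured values per size: (cost/h in EUR, U75, Umax)
measured = {
    "Standard D2 v2":  (0.2176, 22, 38),
    "Standard D11 v2": (0.253, 29, 39),
    "Standard D15 v2": (2.0492, 150, 170),
}

# The cheapest size per U75-user-hour among those measured
best = min(measured, key=lambda s: cost_per_user_hour(*measured[s][:2]))
```

This reproduces why the D11 v2 size came out as the sweet spot in the tests.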

How to calculate the minimal transfer cost ?

• Data transfer cost
▶ # users is fixed -> total cost is fixed
▶ ~1750 €/month

Est. outbound traffic /session (kbps)    96
Est. outbound traffic (TB)               21.93
Bandwidth req. (Mbps)                    237.09
Outbound traffic price per GB            € 0.0700
Outbound traffic price                   € 1 606
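The transfer figures follow from two simple formulas. In the sketch below, the peak concurrency of ~2470 users is an assumption back-calculated from the 237.09 Mbps figure, and the volume-to-cost step uses the € 0.07/GB list price from the table; small rounding differences against the slide's € 1 606 total are expected:

```python
def peak_bandwidth_mbps(concurrent_users: int, kbps_per_user: float) -> float:
    """Peak outbound bandwidth requirement at the busiest moment."""
    return concurrent_users * kbps_per_user / 1000

def monthly_egress_cost_eur(volume_gb: float, price_per_gb: float = 0.07) -> float:
    """Azure charges only for data leaving the datacenter (egress)."""
    return volume_gb * price_per_gb
```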

How to calculate the minimal storage cost ?

• Storage cost
▶ Fewer VMs -> fewer disks -> lower cost (€/U/h)
▶ Largest VMs are expected to run about 150 users
▶ We need 2 disks per server (OS disk & Programs disk), 3000 users
- D11_V2: 30 users/VM -> 100 VMs -> 200 disks
- D15_V2: 150 users/VM -> 20 VMs -> 40 disks
▶ -> Larger VMs are cheaper when it comes to storage
▶ But…
- 'Blue Screen Of Death' impact?
- The Shutdown Dilemma
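The disk counts above follow directly from the 2-disks-per-server rule, and the number of storage accounts follows from the 40-disks-per-standard-Storage-Account limit mentioned later in the deck; a small sketch:

```python
import math

def vms_and_disks(total_users: int, users_per_vm: int, disks_per_vm: int = 2):
    """VM and VHD count for a given size (OS disk + Programs disk per server)."""
    vms = math.ceil(total_users / users_per_vm)
    return vms, vms * disks_per_vm

def storage_accounts_needed(disks: int, max_disks_per_sa: int = 40) -> int:
    """Standard VM disks do 500 IOps; 20,000 IOps per SA -> 40 disks per SA."""
    return math.ceil(disks / max_disks_per_sa)
```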

How to calculate the minimal compute cost ?

• The Shutdown Dilemma
▶ In the morning, we can turn on servers as we need them
▶ In the evening, we can only turn off servers once they are 'empty'
▶ The Citrix XenApp Delivery Controller distributes the load at logon
- The server with the lowest load is chosen
▶ -> All servers remain on as long as there are more users than servers
▶ -> Big servers (D15_V2, 150 users)
- will always remain on
- are very expensive per hour (> 2 €/h)
▶ -> Very small servers would become cost effective
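The 'all servers stay on' effect can be demonstrated with a tiny simulation of least-loaded assignment (a deliberate simplification of the Delivery Controller's behaviour, ignoring logoffs and session limits):

```python
def assign_sessions(n_users: int, n_servers: int) -> list[int]:
    """Give each logon to the currently least-loaded server."""
    loads = [0] * n_servers
    for _ in range(n_users):
        target = loads.index(min(loads))  # broker picks the lowest load
        loads[target] += 1
    return loads
```

With 300 users spread over 100 servers, every server carries at least one session, so none can be shut down; capacity only frees up once the user count drops below the server count.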

How to calculate the minimal compute cost ?

• The VDI Dilemma
▶ How about 1 user per VM?
▶ Start the VM at logon, shut down the VM after logoff
▶ Very cost effective in terms of 'pay for what you use'
▶ VDI cost per user per hour?
▶ Compared to Server Based Shared Desktop (SBSD):
▶ A VDI solution is about 10x more expensive than an SBSD solution
▶ VDI is only interesting when the average user works less than 24/10 = 2.4 h/day
▶ This might (will) change in the future (2018?)
- Citrix & Microsoft joint effort: nested virtualization

VDI candidates (1 user per VM):
Size            Cores  Memory  Cost/h (€)  U75  Cost/U75/h
A0 Basic          1     0.75    0.0152      1    0.0152
A1 Basic          1     1.75    0.0632      1    0.0632
A2 Basic          2     3.5     0.1265      1    0.1265
A0 Standard       1     0.75    0.0169      1    0.0169
A1 Standard       1     1.75    0.0759      1    0.0759
A2 Standard       2     3.5     0.1518      1    0.1518
Standard D1 v2    1     3.5     0.1088      1    0.1088

SBSD reference:
Size             Cores  Memory  Cost/h (€)  U75  Umax  Cost/U75/h  Cost/Umax/h
Standard D11 v2    2    14      0.253       29    39    0.0087      0.0065
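The 2.4 h/day threshold follows from a simple break-even: an always-on SBSD seat costs 24 h times its per-user rate, while a VDI VM is billed only while its single user works. A sketch using the table values; note the actual ratio from the tables is ~12.5x, which the slide rounds down to 10x to get the quoted 24/10 = 2.4 h:

```python
def breakeven_hours_per_day(vdi_cost_per_hour: float,
                            sbsd_cost_per_user_hour: float,
                            sbsd_hours_on: float = 24) -> float:
    """Daily usage below which per-user VDI beats an always-on SBSD seat."""
    return sbsd_hours_on * sbsd_cost_per_user_hour / vdi_cost_per_hour
```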

How to calculate the minimal compute cost ?

• The Startup/Shutdown Dilemma (Part 2)
▶ If relatively small machines (30 users per machine) are the optimal choice in terms of cost, how do we minimize runtime?
▶ Based on the current Citrix license usage, we have historic data on the number of users per 15 minutes for the past year.

[Chart: License usage for 1 week (0–3000 concurrent licenses)]

How to calculate the minimal compute cost ?

• The Startup/Shutdown Dilemma (Part 2)
• Working day
• Sundays & holidays: < 300 users

[Chart: Licenses used for 1 day (0–3000 users, 00:00–23:00)]

How to calculate the minimal compute cost ?

▶ Excel emulations:
- Dynamic Holiday Function: construction of a usage prediction function (2nd degree, 3rd degree, 4th degree) based on the usage evolution over the previous period of 15 min, 30 min, 45 min, 1 h
- -> Failed due to overshoot in the morning

[Charts: Servers needed (0–140) and Users forecasted (-500–3500)]

How to calculate the minimal compute cost ?

▶ Excel emulations:
- Static trap function based on historic data -> Failed due to seasonal changes:
▹ Year-end higher
▹ Summertime lower
▹ Others (changing Easter vacations, …)

How to calculate the minimal compute cost ?

▶ Excel emulations:
- One-trap approach:
▹ 7:00 – 19:00: 100 servers (30 users each) running
▹ 19:00 – 7:00: 10 servers running
▹ The one-trap approach is cheapest! (~3750 €/month for XenApp Workers)
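The one-trap schedule reduces to a two-level step function; a sketch with the server counts and switch times from the bullets above (any per-hour price would simply multiply the resulting server-hours):

```python
def servers_on(hour: int, day_servers: int = 100, night_servers: int = 10,
               day_start: int = 7, day_end: int = 19) -> int:
    """One-trap schedule: full capacity 7:00-19:00, skeleton capacity otherwise."""
    return day_servers if day_start <= hour < day_end else night_servers

def server_hours_per_day(**kwargs) -> int:
    """Total billable server-hours per day under the schedule."""
    return sum(servers_on(h, **kwargs) for h in range(24))
```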

[Chart: VMs startup schedule over two days (Group 0, Group 1, Group 2)]

How to calculate the minimal compute cost ?

▶ New evolution: Citrix Smart Scale
- Supported for on-prem XenApp/XenDesktop virtual power management
- For Azure: support to be announced?
▹ Realdolmen PTEC provides input to Citrix

High Availability: on-prem, cloud or hybrid ? (recap)

• Most important decision factor: cost calculation
• Known on-prem cost
▶ Fast on-prem POC tests:
- WS 2012R2
- XenApp 7.8
• How to calculate on-Azure cost?
▶ Data transfer cost in and out of Azure
- Tests showed an average of 100 kbps per XenApp user (-> fixed)
- ~1750 €/month
▶ Storage cost
- Depending on the number of virtual hard disks needed
- Depending on the number of VMs needed
- Relatively low influence on the total cost
- < 500 €/month
▶ Compute cost (VMs)
- Depending on the number of VMs needed
- Depending on the time the VMs are allocated (active)
- ~3750 €/month

High Availability: on-prem, cloud or hybrid ? (cont)

• Most important decision factor: cost calculation
• Known on-prem cost
• Known (estimated) Azure cost
• Estimated Azure cost is cheaper than on-prem cost!
• Decision was taken: Citrix XenApp 'server-based shared desktop' project in Azure!

Part 2:

Citrix XenApp ‘server-based shared desktop’ project in Azure – IaaS:

Architecting a datacenter in Azure, a different way of thinking

Azure Availability: a deciding factor

• SLA for Virtual Machines:
▶ "For any Single Instance Virtual Machine using premium storage for all disks, we guarantee you will have Virtual Machine Connectivity of at least 99.9%."
▶ 99.9% availability = up to ~45 min/month of unplanned downtime without penalty
▶ Single Instance Virtual Machines NOT using (more expensive) premium storage are not in the SLA (have no guarantee?)

Azure Availability: a deciding factor

• SLA for Virtual Machines:
▶ "For all Virtual Machines that have two or more instances deployed in the same Availability Set, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time."
▶ 'Availability Set': a container for similar virtual machines
- Web servers
- Citrix XenApp/XenDesktop Workers
- Cluster nodes (e.g. file server clusters)
- Active Directory Domain Controllers
- HA node pairs (firewalls, load balancers, …)

How to increase service availability ?

• If we want to improve service availability, we have to understand the influencing factors:
▶ Hardware failures
▶ Hypervisor patching
▶ Hardware or software upgrades
▶ Storage (service) failures

Hardware failures


• Hardware:
▶ Heavy competition between
- cloud providers (Azure, AWS, Google, …)
- on-premises datacenters
▶ Datacenter (containers) constructed at the lowest possible price
▶ No server redundancy (power, fan, NIC, memory, CPU)
▶ -> If you need redundancy, deploy over multiple servers
▶ -> Servers grouped in (up to) 3 'Fault Domains'
• By design, servers in different Fault Domains should not stop functioning at the same time.
▶ Power distribution EU: 3 phases + N -> 3 fault domains
▶ Power distribution US: 2 phases + N -> 2 fault domains

Hypervisor Patching

• Software:
▶ Open-source hypervisors: Xen
▶ In-house developed hypervisors: Microsoft Hyper-V
▶ Open-source or in-house management and billing tools
▶ -> This software needs to be patched and upgraded regularly
▶ -> Servers grouped in 20 'Patching Domains'
• By design, servers in different Patching Domains should not be patched at the same time.
▶ Original patching implementation:
- VM pause, server restart, VM resume -> up to 15 min downtime
▶ Current patching implementation:
- VM pause, VM resume on a different hypervisor -> seconds of downtime

Hardware or software upgrades

• Hardware in active hardware containers is not repaired or replaced
• Hardware containers are repaired when efficiency becomes too low
• -> Active VMs are stopped and restarted elsewhere
• Similar for software upgrades
▶ Hyper-V 2012R2 -> Hyper-V 2016
• Tip: deallocate (stop) and reallocate (start) your VMs regularly
• Restart without warning:
▶ Single Instance VM on Standard storage
• Planned downtime (announced):
▶ Single Instance VM on Premium storage
▶ VMs deployed in an Availability Set
• VMs can only be added to an AS at creation time
• -> Always create VMs in an AS, even when initially single instance

Storage Service Failure

• Standard Storage:
▶ Standard Storage Accounts
▶ Always a minimum of 3 copies of data (6 for geo-redundant)
▶ Software-defined storage on standard disks
▶ Max 500 TB, 20,000 IOps per Storage Account (SA)
▶ Standard VMs have 500 IOps disks -> max 40 disks per SA
• Premium Storage:
▶ Premium Storage Accounts
▶ Software-defined storage on SSD disks
▶ Up to 5000 IOps per disk
▶ VMs with up to 64 TB, 80,000 IOps, 2000 MBps throughput possible
• Security on Storage Account blobs and files:
▶ Access keys
▶ No (Azure) Active Directory integration possible

• https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits

Windows Server 2016 Datacenter Edition: Storage Spaces Direct on Scale-Out File Server clusters
• Shared-nothing cluster
• Minimum 4 nodes needed for Storage Spaces Direct
• Additional Cloud Witness on an Azure Storage Account blob
• Highly available cluster
▶ 5 votes (4 nodes + 1 cloud witness)
▶ 3 is enough for a majority (half + 1)
▶ 2 storage nodes can be down without losing the cluster
• Storage tiering possible (disks + SSD)
• Limited in performance by the 200 Mbps network
• -> Possible need for Accelerated Networking
▶ Only available on Azure D15_V2 (1500 €/month -> 6000 €/cluster)
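The quorum arithmetic behind '2 storage nodes can be down' is simple majority voting over the 4 cluster nodes plus the cloud witness; a sketch:

```python
def max_node_failures(storage_nodes: int, cloud_witness: bool = True) -> int:
    """Storage-node failures tolerated while the cluster keeps quorum."""
    votes = storage_nodes + (1 if cloud_witness else 0)
    majority = votes // 2 + 1   # half + 1
    return votes - majority
```

With 4 nodes plus the witness there are 5 votes and 3 suffice for a majority, so 2 nodes may fail; without the witness, only 1 may.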

Architecture: Original Thought

[Diagram] Azure (connected to WAN and Internet): XenApp Workers, XenApp Delivery Controller, AD DC, File Server.
On-prem A and On-prem B (each): AD DC, File Server, Back-end Processing, Mail Server, Proxy Server, Database Server, XenApp StoreFront, Load Balancer.

To Accommodate for hardware failures

• We need to implement every component highly available
• Hence we need every component at least twice
• Availability Sets (and possibly load balancers) are needed

On Azure, you need Azure access

• In order to create licensed Windows VMs
• In order to domain join VMs
• In order to provide Azure management tools with VM health info
• In order to access Azure blob storage
• Optionally:
▶ Fast access to Exchange Online and Office 365
• 'Azure access' means 'Internet access' unless you implement routing tables and/or firewalls

Architecture: What Is Needed

[Diagram] As in 'Original Thought', but the Azure side now holds redundant components: two AD DCs, a file server cluster, two XenApp Delivery Controllers and two Proxy Servers, with both Internet and WAN connectivity; On-prem A and On-prem B unchanged.

Microsoft Hypervisor Patch Week

• During an MS hypervisor patch week, we will in turn lose about 5% of our XenApp Workers while Microsoft is patching that patching domain
• MS reduced downtime from 15 minutes to a '10 seconds freeze'
• We do not know the order of patching nor the exact time
• This is still unacceptable in a 24x7 customer-facing environment
• Solution:
▶ When patching Europe-West, we move to Europe-North
▶ When patching Europe-North, we move to Europe-West

Architecture: Highly Available

[Diagram] Two Azure regions, Europe West and Europe North, each with XenApp Workers, XenApp Delivery Controllers, AD DCs, file server clusters and Proxy Servers; both regions connect over Internet and WAN to On-prem A and On-prem B (each: AD DC, File Server, Back-end Processing, Mail Server, Proxy Server, Database Server, XenApp StoreFront, Load Balancer).

Dual Region Location Implementation

• Cost reduction:
▶ We do not need to run machines that we don't actively need
▶ Exception: Active Directory Domain Controllers (and firewalls)
• Negative impact:
▶ We will need data replication between file server clusters in different locations
▶ Microsoft Windows Server 2016 provides techniques for this

Naming Convention

• You have to create lots and lots and lots of 'objects'
• Formally define a naming convention
▶ Extend your existing naming convention to include new object types
▶ Avoid 'long' names
▶ Use names that are 'meaningful' for humans
▶ Some object names (e.g. storage accounts)
- can only have lowercase letters and digits
- no uppercase letters, no dashes
▶ Do not put object properties in the object name (e.g. -VM, -NIC)
- If you do put object properties in the object name, make sure they cannot change
▹ e.g. object location: NE, WE
▹ storage type: s(tandard), p(remium)
▹ storage redundancy: lrs, grs
• Stick to your naming convention
▶ Or change it and start all over again (most objects cannot be renamed)
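The storage-account constraint (lowercase letters and digits only, plus Azure's documented 3–24 character length limit) is easy to enforce up front; a sketch in which the prefix and the property codes are the illustrative examples from the bullets above:

```python
import re

# Azure storage account name rules: 3-24 chars, lowercase letters and digits
_SA_NAME = re.compile(r"^[a-z0-9]{3,24}$")

def valid_storage_account_name(name: str) -> bool:
    return bool(_SA_NAME.match(name))

def storage_account_name(prefix: str, location: str,
                         tier: str, redundancy: str) -> str:
    """Build a name like <prefix><we|ne><s|p><lrs|grs>.

    Only encode properties that can never change (location, tier, redundancy).
    """
    name = f"{prefix}{location}{tier}{redundancy}".lower()
    if not valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name}")
    return name
```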

Azure ARM Resource Groups

• Resource Groups
▶ Bundle resources together
▶ Security boundary
▶ RG Networking -> Networking team is contributor, others have read access
▶ RG Common -> supporting services (AD, DNS, AV, SCOM, …)
▶ RG per project

Connectivity

• On-prem to cloud connectivity
▶ VPNs
▶ ExpressRoute
▶ Direct Internet access (& DirectAccess)
• Bandwidth and latency
▶ Best effort (VPN)
▶ Guaranteed bandwidth and latency (ExpressRoute)
• Availability
▶ How about single points of failure?
- Virtual firewalls
▹ Check Point Firewall-1 cluster
- Load balancers
▹ Citrix NetScaler cluster
- Others
• Service guarantee (CoS, QoS)
▶ Firewall to virtual firewall
▶ Network bandwidth optimizer to virtual network bandwidth optimizer

Azure in short

• Azure is a worthy datacenter extension / replacement
• A lot of things are still happening in the cloud (new / preview)
• The feature set is huge; start with what you need and know
▶ Then expand and explore – and optimize
• Cloud architecture requires a different (cheaper) approach
• If you don't go to the cloud (yet), use cloud principles on-prem
• Microsoft Windows Server 2016: built for the cloud
▶ private & public!
• Citrix NetScaler can help you control your cloud traffic
• Citrix XenApp/XenDesktop brings your users to your cloud workloads for better performance
• The cloud is now!

Questions ?