
Building a Stretched Cluster using Virtual SAN 6.1


Page 1: Building a Stretched Cluster using Virtual SAN 6.1

Building a Stretched Cluster with Virtual SAN

Rawlinson Rivera, VMware, Inc

Duncan Epping, VMware, Inc

STO5333

#STO5333

Page 2: Building a Stretched Cluster using Virtual SAN 6.1

Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Page 3: Building a Stretched Cluster using Virtual SAN 6.1


Agenda

1 Introduction

2 Requirements and Architectural Details

3 Configuring and operating a VSAN Stretched Cluster

4 Failure Scenarios

5 Interoperability

Page 4: Building a Stretched Cluster using Virtual SAN 6.1

VMware Virtual SAN 6.1
Introduction to Stretched Clustering

Page 5: Building a Stretched Cluster using Virtual SAN 6.1

Typical Use Cases For Virtual SAN Stretched Clusters

Planned Maintenance

• Planned maintenance of one site without any service downtime

• Transparent to app owners and end users

• Avoid lengthy approval processes

• Ability to migrate applications back after maintenance is complete

Automated Recovery

• Automated initiation of VM restart or recovery

• Very low RTO for the majority of unplanned failures

• Allows users to focus on app health after recovery, not how to recover VMs

Disaster Avoidance

• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)

• Avoid downtime, not recover from it

• Zero data loss possible if you have the time

Page 6: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster

• Increases enterprise availability and data protection
• Based on an active – active architecture
• Supported on both Hybrid and All-Flash architectures
• Enables synchronous replication of data between sites

[Diagram: Site A and Site B hosts (SSD caching + HDD capacity) forming a single vSphere + Virtual SAN Stretched Cluster]

Page 7: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster

• Site-level protection with zero data loss and near-instantaneous recovery
• A Virtual SAN Stretched Cluster can be scaled up to 15 nodes per site
• Beneficial solution for disaster avoidance and planned maintenance


Page 8: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster

Fault Domains

Virtual SAN clusters:
• Require a minimum of 3 fault domains to tolerate a single failure
• Virtual machine objects remain accessible after a single fault domain failure

[Diagram: three fault domains (A, B, C) on one Virtual SAN datastore; data components in Fault Domains A and B, the witness in Fault Domain C]

Page 9: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster

Fault Domains

Stretched clusters:
• Provide similar availability with 2 active – active fault domains plus a witness-only fault domain
 – A lightweight witness host is needed only for quorum
 – Virtual SAN 6.1 allows a single witness host in a third fault domain
• Virtual machine disk objects (VMDKs) remain accessible after one data fault domain fails


• Virtual SAN extends its availability capabilities to protect against:

– Rack failures

– Network failures

– Hardware failures

– Site failures

Page 10: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster

• The Virtual SAN cluster is formed across the 3 fault domains
• The witness fault domain is utilized for witness purposes ONLY, not for running VMs!
• Availability policy supported (FTT=1)
• Automated failover in the event of site failure

[Diagram: two active data fault domains and a witness fault domain spread across three sites (Site A, Site B, Site C) forming the vSphere + Virtual SAN Stretched Cluster]

Page 11: Building a Stretched Cluster using Virtual SAN 6.1

Virtual SAN Stretched Cluster
Requirements and Architectural Details

Page 12: Building a Stretched Cluster using Virtual SAN 6.1


Requirements

• Network
 – Virtual SAN storage networking
 – Virtual SAN witness networking
 – vSphere and virtual machine networking

• Storage
 – Virtual machine storage
 – Witness appliance storage

• Compute
 – Virtual SAN witness
 – vSphere HA
 – vSphere DRS

Page 13: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster Networking Requirements

[Diagram: data fault domains FD1 and FD2 connected over L2 with multicast at < 5 ms latency over 10/20/40 Gbps; the witness fault domain reachable over L3 without multicast at 200 ms latency over 100 Mbps]

• Network requirements between data fault domains/sites
 – 10 Gbps connectivity or greater
 – < 5 millisecond latency RTT
 – Layer 2 or Layer 3 network connectivity with multicast

• Network requirements to the witness fault domain
 – 100 Mbps connectivity
 – 200 milliseconds latency RTT
 – Layer 3 network connectivity without multicast

• Network bandwidth requirements are calculated based on write operations between fault domains
 – Kbps = (Nodes * Writes * 125)
 – A deployment of 5+5+1 with ~300 VMs would need ~4 Gbps
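The sizing rule above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming Writes means average write IOPS per data node (the slide leaves the units implicit); the 3200 writes/node figure is invented purely to reproduce the ~4 Gbps example.

```python
def intersite_bandwidth_gbps(nodes: int, writes_per_node: int) -> float:
    """Rule of thumb from this slide: Kbps = Nodes * Writes * 125.
    'writes_per_node' is assumed to be average write IOPS per data node;
    the slide does not spell out the units."""
    kbps = nodes * writes_per_node * 125
    return kbps / 1_000_000  # Kbps -> Gbps

# Illustrative: a 5+5+1 deployment has 10 data nodes; 3200 writes/node
# is a made-up figure that lands on the slide's ~4 Gbps example.
print(f"{intersite_bandwidth_gbps(10, 3200):.1f} Gbps")  # 4.0 Gbps
```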


Page 14: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Witness Appliance Overview and Storage Requirements

Witness overview and requirements

• Witness appliance:
 – ONLY supported with Stretched Cluster *
 – ONLY stores metadata, NOT customer data
 – is not able to host any virtual machines
 – can be re-created in the event of failure

• Appliance requirements:
 – at least three VMDKs
 – the boot disk for ESXi requires 20 GB
 – the capacity tier requires 16 MB per witness component
 – the caching tier is 10% of the capacity tier
 – both tiers on the witness can be on magnetic disks (MDs)

• The amount of storage on the witness is proportional to the number of components on the witness

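As a back-of-the-envelope check, a minimal sketch applying the rules above (16 MB of capacity per component, cache at 10% of capacity, 20 GB boot disk). Note these raw figures come out slightly under the official tier recommendations on the next slide, which appear to include headroom.

```python
def witness_storage_gb(components: int) -> dict:
    """Applies this slide's rules: 16 MB of capacity tier per witness
    component, caching tier at 10% of capacity, plus a 20 GB boot disk."""
    capacity_gb = components * 16 / 1024  # 16 MB per component
    return {
        "boot_gb": 20,
        "capacity_gb": round(capacity_gb, 1),
        "cache_gb": round(capacity_gb * 0.10, 1),
    }

# Example: 1200 components (the small 1+1+1 maximum on the next slide)
# lands near that tier's 20 GB capacity / 3 GB cache recommendation.
print(witness_storage_gb(1200))  # {'boot_gb': 20, 'capacity_gb': 18.8, 'cache_gb': 1.9}
```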

Page 15: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Witness Appliance Sizing Requirements

Resource Requirements

• Large scale (15+15+1) – max 3000 VMs and 18000 components on the witness
 – Memory: 32 GB
 – CPU: 2 vCPU
 – Storage: 350 GB for the capacity tier and 10 GB for the caching tier

• Medium (4+4+1) – max 800 VMs and ~5000 components on the witness
 – Memory: 16 GB
 – CPU: 2 vCPU
 – Storage: 50 GB for the capacity tier and 5 GB for the caching tier

• Small (1+1+1) – max 200 VMs and 1200 components on the witness
 – Memory: 16 GB
 – CPU: 2 vCPU
 – Storage: 20 GB for the capacity tier and 3 GB for the caching tier

[Diagram: the witness appliance (vESXi) can run either in a data center or in vCloud Air]
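A small sketch that picks the witness tier from the table above based on expected VM count; the tier boundaries and resource figures are taken directly from the slide, while the helper itself is just illustrative.

```python
# Tier boundaries and resources from the table above:
# (max VMs, label, memory GB, vCPUs, capacity GB, cache GB).
WITNESS_TIERS = [
    (200,  "Small (1+1+1)",   16, 2,  20,  3),
    (800,  "Medium (4+4+1)",  16, 2,  50,  5),
    (3000, "Large (15+15+1)", 32, 2, 350, 10),
]

def witness_tier(vm_count: int) -> str:
    """Returns the smallest witness tier that covers the given VM count."""
    for max_vms, label, mem_gb, vcpus, cap_gb, cache_gb in WITNESS_TIERS:
        if vm_count <= max_vms:
            return (f"{label}: {mem_gb} GB RAM, {vcpus} vCPU, "
                    f"{cap_gb} GB capacity, {cache_gb} GB cache")
    raise ValueError("vm_count exceeds the 3000-VM stretched cluster maximum")

print(witness_tier(300))  # Medium (4+4+1): 16 GB RAM, 2 vCPU, 50 GB capacity, 5 GB cache
```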

Page 16: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Witness Appliance Network Requirements

Witness network requirements

• Network communication between the witness and the main sites is L3 (IP-based), with no multicast requirement!
• The witness node was optimized to receive minimal metadata traffic
• Read and write operations do not require any communication to the witness
• Traffic is mostly limited to metadata updates
• Communication must not be routed through the witness site
• The heartbeat between the witness and the other fault domains happens once a second
• After 5 consecutive failures the communication is declared failed
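A minimal sketch of the failure detection described above: heartbeats once per second, with the peer declared failed after 5 consecutive misses. The class and loop are illustrative, not Virtual SAN's actual implementation.

```python
MAX_MISSED = 5  # communication declared failed after 5 consecutive misses

class HeartbeatMonitor:
    """Illustrative failure detector: heartbeats arrive once per second;
    five consecutive misses declare the peer (e.g. the witness) failed."""
    def __init__(self) -> None:
        self.missed = 0

    def tick(self, heartbeat_received: bool) -> bool:
        """Call once per one-second interval; True means 'declared failed'."""
        self.missed = 0 if heartbeat_received else self.missed + 1
        return self.missed >= MAX_MISSED

monitor = HeartbeatMonitor()
# Four misses in a row keep the peer alive; the fifth declares failure.
for second, received in enumerate([True, False, False, False, False, False]):
    if monitor.tick(received):
        print(f"peer declared failed at t={second}s")  # fires at t=5s
```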

[Diagram: FD1 and FD2 connected over L2 with multicast at < 5 ms latency over 10/20/40 Gbps; the witness appliance (vESXi, FD3) reachable from both sites over L3 without multicast at 200 ms latency over 100 Mbps]

Page 17: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Clusters – Supported Deployment Scenarios


[Diagrams of the two supported deployment scenarios:

• Traditional Layer 3 and Layer 2 deployment – the data fault domains FD1 and FD2 share a stretched L2 network with multicast (< 5 ms latency over 10/20/40 Gbps), while the witness appliance (vESXi) sits behind a Layer 3 network without multicast (200 ms latency over 100 Mbps).

• Complete Layer 3 deployment – FD1, FD2, and the witness are all connected through routed Layer 3 networks; a 3rd party solution to manage VM networks is required.]

Page 18: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster – Supported Storage Policies

• Maximum supported “FailuresToTolerate” is 1 due to the support of only 3 fault domains

– “FailuresToTolerate=1” object will be implicitly “forceProvisioned” when only two of the three sites are available

– Compliance will be fixed for such objects once the third site becomes available

[Diagram: Fault Domain A (active) and Fault Domain C (active) data sites plus Fault Domain B (witness) forming the vSphere + Virtual SAN Stretched Cluster]

Page 19: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Clusters – Preferred, Non-preferred Sites

[Diagram: FD1 (preferred) and FD2 (non-preferred) data sites over L2 with multicast at < 5 ms latency over 10/20/40 Gbps; the witness appliance (FD3) reachable over L3 without multicast at <= 200 ms latency over 100 Mbps; in a partition, the witness joins the preferred site to form the majority partition]

Preferred

• The preferred fault domain or site is one of the two active – active fault domains
• The preferred fault domain or site can be changed dynamically
• One of the active data sites is designated as the "preferred" fault domain
 – Required to handle the "split-brain" scenario: a link failure between the active sites
 – Determines which active site the witness joins


Page 20: Building a Stretched Cluster using Virtual SAN 6.1


Stretched Clusters and Read Locality

Read Locality

• A VM will be running in (at most) one site

• FTT=1 implies that there are two copies of the data, one in each site

• Reads will be served from the copy of the data that resides on the same site as where the VM runs

• If the VM moves to the other site, then reads will be served from the (consistent) copy of the data in the new site
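A toy illustration of the read-locality rule above: reads go to whichever replica lives in the same site as the VM. The names are invented for the example.

```python
# FTT=1 in a stretched cluster: one (consistent) copy of the data per site.
REPLICAS = {"site-a": "replica-in-site-a", "site-b": "replica-in-site-b"}

def read_replica(vm_site: str) -> str:
    """Reads are served from the copy in the VM's own site; after the VM
    moves to the other site, the same lookup selects the other copy."""
    return REPLICAS[vm_site]

print(read_replica("site-a"))  # replica-in-site-a
print(read_replica("site-b"))  # replica-in-site-b, once the VM has moved
```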

[Diagram: a read operation served from the data copy in the VM's local fault domain; the witness (FD3) is not involved]


Page 21: Building a Stretched Cluster using Virtual SAN 6.1


Stretched Clusters and Writes

Writes

• There is no locality for writes: availability over performance!
• Writes must be acknowledged from both sites before we ACK to the application
• A typical write operation does not include any communication to the witness
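A minimal sketch of the synchronous write path described above: the write goes to both sites, and the application is only acknowledged once both replicas have acknowledged. Purely illustrative; the site names and stand-in commit function are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def commit_to_site(site: str, data: bytes) -> bool:
    """Stand-in for committing the write to one site's replica."""
    return True  # a real implementation would return that site's ACK

def stretched_write(data: bytes) -> bool:
    """The application is ACKed only after BOTH data sites acknowledge;
    the witness takes no part in the write path."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        acks = pool.map(commit_to_site, ["site-a", "site-b"], [data, data])
        return all(acks)

assert stretched_write(b"block 42")  # True only when both replicas ACK
```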

[Diagram: a write operation committed to both data fault domains before acknowledgement; the witness (FD3) is not involved]


Page 22: Building a Stretched Cluster using Virtual SAN 6.1

VMware Virtual SAN 6.1
Configuring and Operating a Stretched Cluster

Page 23: Building a Stretched Cluster using Virtual SAN 6.1


Configuring VMware Virtual SAN Stretched Cluster

• Simple configuration procedure
• The necessary L3 and L2-with-multicast network connectivity and configuration should be completed before setting up the stretched cluster

Configure Fault Domains → Select Witness Host → Create Disk Groups on Witness → Validate health of stretched cluster configuration

Page 24: Building a Stretched Cluster using Virtual SAN 6.1


DEMO

[Demo environment: an active – active stretched cluster between Austin, TX and Dallas, TX, built on Dell PowerEdge FX2 chassis with FN410S IO modules and Dell S6000-ON ToR switches, connected over L2 with multicast at 5 ms latency over 10 Gbps; the witness (vESXi) runs in Plano, TX, reachable at < 200 ms latency over 100 Mbps]


Page 25: Building a Stretched Cluster using Virtual SAN 6.1


Configuring VMware Virtual SAN Stretched Cluster

• Health Check includes additional checks for stretched clusters:
 – Witness host configuration
 – Network configuration
 – Host compatibility
 – Fault domain configuration

To configure a stretched cluster, go to Cluster > Manage tab > Fault Domains and click the icon to start the wizard

Page 26: Building a Stretched Cluster using Virtual SAN 6.1


What about vSphere HA & DRS?


HA and DRS Behavior

• HA/DRS will not use the witness host as a target, since the witness is a standalone host in vCenter and will appear to be an "incompatible" target
• HA failover
 – If one site partitions away or fails, all Virtual SAN objects will become inaccessible in that partition
 – HA will fail over the VMs running in that site to the other active data site
• DRS will treat it as a normal cluster; migrations can happen across sites


Page 27: Building a Stretched Cluster using Virtual SAN 6.1


vSphere HA Recommendations

• Make sure to set aside 50% of resources using Admission Control!
 – Admission control is not resource management
 – It only guarantees power-on

• Enable "Isolation Response"
 – "Power Off" is the recommended response

• Manually specify multiple isolation addresses
 – One for each site, using "das.isolationaddressX"
 – Disable the default gateway using "das.useDefaultIsolationAddress=false"

• Make sure vSphere HA respects VM/Host affinity rules during failover!
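For reference, a hedged pyVmomi sketch of pushing the two advanced options above onto a cluster; the cluster object, option indices, and IP addresses are placeholders, and this is one possible way to apply the settings rather than the deck's own procedure.

```python
# Sketch only: assumes pyVmomi and an already-retrieved cluster object;
# the isolation-address IPs below are placeholders for per-site gateways.
from pyVmomi import vim

def set_ha_isolation_options(cluster: vim.ClusterComputeResource):
    options = [
        vim.option.OptionValue(key="das.isolationaddress0", value="10.0.1.1"),  # site A
        vim.option.OptionValue(key="das.isolationaddress1", value="10.0.2.1"),  # site B
        vim.option.OptionValue(key="das.useDefaultIsolationAddress", value="false"),
    ]
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(option=options)
    )
    # Reconfigure the cluster in place (modify=True merges with existing config).
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```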


Page 28: Building a Stretched Cluster using Virtual SAN 6.1


vSphere DRS Recommendations

• Enable DRS; you want your VMs happy
• Remember read locality? Set up VM/Host affinity rules
 – DRS will only migrate VMs to hosts which belong to the VM/Host group
 – Avoid "must" rules, as they can bite you
 – Use "should" rules; HA can respect these as of vSphere 6.0!
 – HA is smart and will go for "availability" over "rule compliance"
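A pyVmomi sketch of creating such a "should" VM/Host rule; the group names are placeholders assumed to exist already on the cluster, and mandatory=False is what makes it a "should" rather than a "must" rule.

```python
# Sketch only: assumes pyVmomi, a cluster object, and VM/host groups that
# already exist with the placeholder names used below.
from pyVmomi import vim

def add_should_rule(cluster: vim.ClusterComputeResource):
    rule = vim.cluster.VmHostRuleInfo(
        name="site-a-vms-should-stay-in-site-a",
        enabled=True,
        userCreated=True,
        mandatory=False,                     # False = "should" rule, not "must"
        vmGroupName="site-a-vms",            # placeholder VM group
        affineHostGroupName="site-a-hosts",  # placeholder host group
    )
    spec = vim.cluster.ConfigSpecEx(
        rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)]
    )
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```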


Page 29: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Clusters – Maintenance


Stretched Cluster maintenance behavior

• On the witness node, host/disk/disk group maintenance mode only allows the "NoAction" mode
• In the UI the witness host is a standalone host, and by default only "NoAction" is supported for standalone hosts, so there is no change in behavior
• The default mode for the API has been modified to "NoAction"
• If disks on the witness node are decommissioned, objects will lose compliance; the CLOM crawler will fix the compliance by rebuilding the witnesses
• For all other hosts in the cluster, "Enter maintenance mode" is supported in all 3 modes

Page 30: Building a Stretched Cluster using Virtual SAN 6.1

Virtual SAN 6.1
Stretched Cluster Failure Scenarios

Page 31: Building a Stretched Cluster using Virtual SAN 6.1


Face Your Fears, Test Failover Scenarios!

• Data site partition
• Data site failure
• Witness site failure
• Witness network failure (1 site)
• Failure of the site that hosts vCenter Server
• Host isolation or failure


Page 32: Building a Stretched Cluster using Virtual SAN 6.1


Failure Scenarios – Network Partition Between Data Sites


Failure Scenario A

• What if there is a network partition between the two active data sites, aka a "split-brain" scenario?
• The witness always forms a cluster with the "preferred" site in such a case, and that is the partition that will make progress
• This means that VMs in the "non-preferred" site will lose access to storage
• If the HA network is (most likely) also isolated, then VMs in the "non-preferred" site will be restarted in the preferred site
• HA does not know what happened to the hosts in the non-preferred site!

Page 33: Building a Stretched Cluster using Virtual SAN 6.1


Failure Scenarios – Full Site Failure


What if one active site fails?

• Since both active sites have a copy of the data, the second site can transparently take over
• Preferred or non-preferred makes no difference here
• VMs impacted by the full site failure will be restarted by vSphere HA
• Site failure is detected after the heartbeat is missed 5 consecutive times; the heartbeat is sent every second
• Customers can continue creating VMs, etc., but they will be out of compliance with FTT=1 (force provisioning is added by default)
• What happens once the site comes back? The return is detected automatically, which starts the re-sync of changed data; once the re-sync is done, the customer should use DRS to redistribute the VMs
• Ideally you want all the nodes to come back online at the same time

Page 34: Building a Stretched Cluster using Virtual SAN 6.1


Failure Scenarios – Witness Site Failure


Details

• Witness failure is detected after the heartbeat is missed 5 consecutive times
• The heartbeat is sent every second by both the master and the backup
• If the witness fails, there is no disruption to IO traffic for VMs
• VMs will continue running with no interruption, since the two main sites can form a quorum
• One can create a completely new witness and connect it to the cluster
• What happens once the witness comes back? All the metadata (for all the objects in the cluster) is communicated to the witness, and the cluster becomes healthy

Page 35: Building a Stretched Cluster using Virtual SAN 6.1

VMware Virtual SAN 6.1
Interoperability

Page 36: Building a Stretched Cluster using Virtual SAN 6.1


Virtual SAN Stretched Cluster with vSphere Replication and SRM


• Live migrations and automated HA restarts between stretched cluster sites
• Replication between Virtual SAN datastores enables RPOs as low as 5 minutes
• The 5-minute RPO is exclusively available to Virtual SAN 6.x
• Lower RPOs are achievable due to Virtual SAN's efficient vsanSparse snapshot mechanism
• SRM does not support standalone Virtual SAN with one vCenter Server

[Diagram: the stretched cluster (site a + site b, active – active over L2 with multicast at < 5 ms latency over 10/20/40 Gbps, plus the witness appliance) replicates over any distance at > 5 min RPO via vSphere Replication (VR) to a DR site x running vSphere + Virtual SAN; SRM and vCenter at both ends orchestrate recovery]

Page 37: Building a Stretched Cluster using Virtual SAN 6.1


DR Orchestration for vCloud Air DR
Single-click recovery of on-premises applications in the cloud

Roadmap

Overview

• Multi-VM recovery plans to define application/site recovery procedures

• Easy to use workflows for DR testing, DR failover and failback

• Graceful migration workflows to ensure no data loss before planned downtime

• Drastically reduce RTOs when recovering multiple applications or entire site workloads

[Diagram: an on-premises vSphere + Virtual SAN stretched cluster recovering into VCD – vCloud Air]

Page 38: Building a Stretched Cluster using Virtual SAN 6.1


2-Node Remote Office Branch Office Solution

[Diagram: three remote sites (ROBO1, ROBO2, ROBO3), each a 2-node vSphere + Virtual SAN cluster with its own witness appliance (vESXi), all centrally managed by one vCenter Server in a centralized data center]

Overview

• Extension of Virtual SAN Stretched Cluster solution

• Each of the nodes will be in its own fault domain (FD)

• One witness per Virtual SAN cluster

• 500 ms latency tolerated!

• Witness node is an ESXi appliance (VM)

• All sites managed centrally by one vCenter

• Patching and software upgrades performed centrally through vCenter

• If there are N ROBOs then there will be N witness VMs

Page 39: Building a Stretched Cluster using Virtual SAN 6.1


THANK YOU
