
StorMagic. Copyright © 2020. All rights reserved.

WHITE PAPER

EXPLAINING THE STORMAGIC SvSAN WITNESS

OVERVIEW

Businesses that require high service availability in their applications can design their storage infrastructure in such a way as to create redundancy and eliminate single points of failure. StorMagic SvSAN achieves this through clustering - each instance of SvSAN is deployed as a two node cluster, with storage shared across the two nodes.

However, in order to properly mitigate against failures and the threat of downtime, an additional element is required. This is the "witness". This white paper explores the SvSAN witness, its requirements and tolerances and the typical failure scenarios that it helps to eliminate.

AVOIDING A "SPLIT-BRAIN"

Without a witness, even-numbered clustered environments are at risk of a scenario known as "split-brain", which can affect both the availability and integrity of the data. Split-brain occurs when the clustered, synchronously mirrored nodes lose contact with one another, becoming isolated. The nodes then operate independently from one another, and the data on each node diverges, becoming inconsistent, ultimately leading to data corruption and potentially data loss.

To prevent split-brain scenarios from occurring, a witness is used. The witness acts as an arbitrator or tiebreaker, providing a majority vote in the event of a cluster leader election process, ensuring there is only one cluster leader. If a leader cannot be determined, the storage under the cluster's control is taken offline, preventing data corruption. A minimal sketch of this majority-vote arbitration appears after the list below.

The SvSAN witness:

- Provides the arbitration service in a cluster leader election process
- Is a passive element of an SvSAN configuration and does not service any I/O requests for data
- Maintains the cluster and mirror state
- Has the ability to provide arbitration for thousands of SvSAN mirrors
- Can be local to the storage or at a remote location:
  - Used over a wide area network (WAN) link
  - Can tolerate high latencies and low bandwidth network links
- Is an optional SvSAN component
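To illustrate the arbitration idea in general terms, the following is a minimal sketch of a majority-vote quorum rule for two nodes plus a witness. It is purely illustrative; the function names and structure are assumptions and do not reflect SvSAN's internal implementation.

```python
# Illustrative sketch only: a generic majority-vote quorum rule for a two-node
# cluster plus a witness. Names and structure are hypothetical, not SvSAN internals.

def decide_action(peers_reachable: dict, total_members: int = 3) -> str:
    """Decide what a node should do based on which cluster members it can still reach."""
    votes = 1 + sum(peers_reachable.values())      # its own vote plus reachable members
    majority = total_members // 2 + 1              # 2 out of 3 for two nodes + witness
    if votes >= majority:
        return "quorum held: eligible for leadership, keep storage online"
    return "quorum lost: take storage offline to avoid split-brain"

# Example: the mirror link is down, but only node A can still reach the witness.
print("A:", decide_action({"B": False, "witness": True}))    # keeps storage online
print("B:", decide_action({"A": False, "witness": False}))   # isolated, goes offline
```

Because the witness holds the third vote, at most one side of a broken mirror link can ever see a majority, which is what prevents two independent leaders.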

NOTE: It is possible to have SvSAN configurations that do not use an SvSAN witness, however implementation best practices must be followed and this is outside the scope of this white paper. If you would like to explore this further, please reach out to the StorMagic team by emailing [email protected]

Fig. 1: The witness is located separately from the SvSAN nodes.


SvSAN WITNESS SYSTEM REQUIREMENTS

As the witness sits separately from the SvSAN nodes and does not service I/O requests, it has noticeably lower minimum requirements:

CPU: 1 x virtual CPU core (1 GHz)
Memory: 512MB (reserved)
Disk: 512MB
Network: 1 x 1Gb Ethernet NIC

When using the witness over a WAN link, use the following recommendations for optimal operation:

- Latency of less than 3,000ms, which would allow the witness to be located anywhere in the world
- 9Kb/s of available network bandwidth between the VSA and witness (less than 100 bytes of data is transmitted per second)

Operating System: The SvSAN witness can be deployed onto a physical server or virtual machine with the following:

- StorMagic SvSAN Witness Appliance
- Windows Server 2016 (64-bit)
- Hyper-V Server 2016 (64-bit)
- Raspbian Buster (32-bit)
- CentOS 7.6 & 8.0
- RHEL 7.6 & 8.0
- vCenter Server Appliance (vCSA) [1]

[1] VMware vSphere 5.5 and higher

NOTE: The witness should be installed onto a server separate from the SvSAN VSA.

USING THE SvSAN WITNESS REMOTELY - BANDWIDTH AND LATENCY LIMITATIONS

The SvSAN witness can be deployed both locally and remotely, and this is of particular use in a multi-site deployment where a single witness handles every site from a central location. When deploying remotely however, restrictions on the bandwidth and latency of the connection should be taken into consideration.

When network performance is poor, a typical response is to increase network bandwidth, which is relatively simple to achieve. However, this only improves performance when there is network congestion. Bandwidth addresses the amount of data that can be transmitted, but does not govern the speed of the link. In general, having more bandwidth reduces the likelihood of congestion. Bandwidth is similar to lanes on a highway – more lanes enable more vehicles (data packets) to use the highway at the same time. When all lanes are full the bandwidth limit is reached, which leads to congestion (traffic jams).

Latency or round trip time (RTT) is another important factor and determines the speed of the link – lower latency equates to better network speeds. Referring back to the highway analogy, latency is the highway speed limit and impacts the time it takes to complete a round trip. Unfortunately, getting better network latency is not as simple as increasing bandwidth as it is affected by a number of factors, most of which are out of the user’s control. These include:

- Propagation delay - the time that data takes to travel through a medium, such as fibre optic cable or copper wire, relative to the speed of light (a worked example follows this list).

- Routing and switching delays - the number of routers and switches data has to pass through.

- Data protocol conversions - decoding and re-encoding slows down data.

- Network congestion and queuing - bottlenecks caused by routers and switches.

- Application latency - some applications can introduce or only tolerate a certain amount of latency.
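As a rough illustration of propagation delay, the back-of-the-envelope calculation below estimates fibre round-trip time from distance, assuming light travels at roughly two thirds of its vacuum speed in fibre. The figures are indicative only and are not StorMagic measurements.

```python
# Rough round-trip-time estimate from fibre distance (illustrative figures only).
SPEED_IN_FIBRE_KM_PER_MS = 200.0   # ~2/3 of the speed of light, a common rule of thumb

def fibre_rtt_ms(one_way_km: float) -> float:
    """Propagation-only RTT; real links add routing, queuing and protocol delays."""
    return 2 * one_way_km / SPEED_IN_FIBRE_KM_PER_MS

# Even a 20,000 km path (roughly half way around the world) gives about 200 ms of
# propagation delay, far below the 3,000 ms the witness was shown to tolerate.
print(f"{fibre_rtt_ms(20000):.0f} ms")   # -> 200 ms
```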

PUTTING THE SvSAN WITNESS' LATENCY TOLERANCE TO THE TEST

StorMagic has conducted tests with the SvSAN witness to determine the bandwidth and latency (and distance) that it can tolerate before service is adversely affected.

LATENCY EMULATOR – WANEM

To simulate different network latencies, the WANem (http://wanem.sourceforge.net) Wide Area Network Emulator tool was used. WANem can be used to simulate wide area network conditions, allowing different network characteristics, such as network delay (latency), bandwidth, packet loss, packet corruption, disconnections, packet re-ordering, jitter, etc. to be configured.
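WANem is one way to impair the link; as a simpler alternative on a Linux host placed in the test path, the Linux traffic-control netem qdisc can impose similar delay, loss and rate limits. The sketch below only prints the commands it would run; the interface name and values are assumptions for illustration, not StorMagic's actual test harness.

```python
# Sketch of impairing a link with Linux tc/netem instead of WANem (assumed setup,
# not StorMagic's test configuration). Commands are printed rather than executed.
import shlex

def netem_command(interface: str, delay_ms: int, loss_pct: float, rate_kbit: int) -> str:
    """Build a tc/netem command that adds delay, packet loss and a rate limit."""
    return (f"tc qdisc replace dev {shlex.quote(interface)} root netem "
            f"delay {delay_ms}ms loss {loss_pct}% rate {rate_kbit}kbit")

# Reproduce conditions similar to the extremes described in this paper.
print(netem_command("eth0", delay_ms=3000, loss_pct=20.0, rate_kbit=9))
# Remove the impairment afterwards:
print("tc qdisc del dev eth0 root")
```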

The configuration used for the tests is illustrated in fig. 2.

Fig. 2: SvSAN witness testing configuration.

The network characteristics of latency, bandwidth and packet loss were increased until communication was lost between the VSA and the witness. Once a failure had been observed, the characteristic being tested was reset, allowing the connection to be re-established. The following results are the extreme limits and show the SvSAN witness' tolerances:

The network latency between the VSA and the witness reached 3,000ms before the connection became unreliable and the VSA and witness disconnected from one another. Reducing the latency ensured that the VSA and witness reconnected.

The witness has minimal network bandwidth requirements. During the tests the bandwidth was reduced to 9 kilobits per second (Kb/s) before connectivity was lost.

Ideally there would be zero packet loss across the WAN, however factors such as excessive electromagnetic noise, signal degradation, faulty hardware and packet corruption all contribute to packet loss. The tests showed that the witness connectivity could withstand a packet loss of 20%.

RECOMMENDATIONS

In general, the witness can function on very high latencies and has very low network bandwidth requirements while tolerating high packet loss. Although these are extreme conditions and networks with these characteristics are rarely used in practice, they show how efficient the witness is. The following are recommendations to ensure optimal operation:

- Latency should ideally be less than 3,000ms, which allows the witness to be located almost anywhere in the world.

- The amount of data transmitted from the VSA to the witness is small (under 100 bytes per second). It is recommended that there is at least 9Kb/s of available network bandwidth between the VSA and witness (see the quick arithmetic check after this list).
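The bandwidth recommendation leaves a comfortable margin, as the quick check below shows; the 100 bytes per second figure is taken from the text above and the rest is simple unit conversion.

```python
# Quick check of the headroom between observed witness traffic and the recommendation.
observed_bytes_per_s = 100                               # "less than 100 bytes ... per second"
observed_kbit_per_s = observed_bytes_per_s * 8 / 1000    # = 0.8 Kb/s
recommended_kbit_per_s = 9

print(f"traffic ~{observed_kbit_per_s} Kb/s vs {recommended_kbit_per_s} Kb/s recommended "
      f"({recommended_kbit_per_s / observed_kbit_per_s:.1f}x headroom)")   # ~11x headroom
```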

FAILURE SCENARIOS

This section discusses the common failure scenarios relating to two node SvSAN configurations with a witness. SvSAN in this configuration is designed to withstand the failure of a single infrastructure component. However, in some scenarios it is possible to tolerate multiple failures.

Each scenario describes what happens during the failure and subsequently what happens when the infrastructure is returned to the optimal state.

The scenarios are as follows:

1. Network link failure between SvSAN VSA and witness
2. Mirror network link interruption
3. Server failure
4. Witness failure
5. Network isolation
6. Mirror network link and witness failure
7. Server failure followed by witness failure
8. Witness failure followed by a server failure

The SvSAN witness can withstand latencies of up to 3,000ms, which allows the witness to be located almost anywhere in the world, and it requires network bandwidth of just 9Kb/s.

For all the failure scenarios the following assumptions are made:

- The cluster/mirror leader is VSA1
- SvSAN is in the optimum state before the failure occurs
- There are multiple, resilient mirror network links between servers/VSAs

For a full list of best practices when deploying SvSAN, please refer to StorMagic's separate best practices white paper.

OPTIMUM STATE

Fig. 3 shows the optimum SvSAN cluster state.

Fig. 3: The optimum SvSAN cluster state.

In the optimal cluster state:

- All servers, VSAs (SvSAN), witness and network links are fully operational
- Quorum is determined and one of the VSAs is elected the cluster leader
- I/O can be performed by any of the VSAs
- Mirror state is synchronized

SCENARIO #1
NETWORK LINK FAILURE BETWEEN VSA AND WITNESS

This scenario occurs when the network link between a single server/VSA (VSA1) and the witness is interrupted, as shown in fig. 4:

Fig. 4: After a network link failure.

During the failure period:

- The VSAs and witness remain fully operational, with the VSAs continuing to serve I/O requests without degradation to performance
- All mirror targets remain synchronized, ensuring that the data is fully protected and that the required service availability is maintained
- During the network interruption, VSA1 continues to remain as the cluster leader and the quorum is maintained
- VSA1 makes periodic attempts to connect to the witness and recover the network connection (a generic reconnect sketch appears at the end of this scenario)

After recovery:

When the network connectivity is restored, communication between VSA1 and the witness is re-established. As the network interruption did not affect the environment, operation continues as normal.
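The periodic reconnection attempts described above can be pictured as a simple retry loop. The sketch below is a generic illustration, not SvSAN's actual reconnection logic; the probe function and interval are placeholders.

```python
# Generic periodic-reconnect loop, illustrating the behaviour described above.
# `probe_witness` and the retry interval are placeholders, not SvSAN internals.
import time

def probe_witness() -> bool:
    """Stand-in for a lightweight reachability check against the witness."""
    return False   # pretend the link is still down

def reconnect_loop(interval_s: float = 10.0, max_attempts: int = 3) -> bool:
    for attempt in range(1, max_attempts + 1):
        if probe_witness():
            print(f"attempt {attempt}: witness reachable, state reported and resynchronised")
            return True
        print(f"attempt {attempt}: witness unreachable, retrying in {interval_s}s")
        time.sleep(interval_s)
    return False

reconnect_loop(interval_s=0.1)   # short interval so the example finishes quickly
```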


SCENARIO #2
MIRROR NETWORK LINK INTERRUPTION

The mirror traffic network link between the servers is interrupted, as shown in fig. 5:

Fig. 5: After a mirror network link interruption.

During the failure period, where there are multiple redundant network connections between VSA1 and VSA2:

- Both VSAs and the witness remain fully operational and the VSAs continue to serve I/O requests
- Mirror traffic is automatically redirected over the alternate network links if permitted
- During the network interruption, VSA1 continues to remain as the cluster leader and the quorum is maintained
- The mirror state remains synchronized

NOTE: Potential performance issues could arise if the alternative network links do not have the same characteristics (speed and bandwidth) as the primary mirror network. Furthermore, using alternative network links for SvSAN mirror traffic could potentially affect other applications or users on the same network. Both of these are especially important when there is a high rate of change of data such as a full mirror re-synchronization.

After recovery:

When the network links are recovered, mirror traffic will automatically fail back and use the primary mirror traffic network.

In the event that all network communication between VSA1 and VSA2 is lost (multiple failures), but they are able to communicate with the witness, one of the mirror plexes will be taken offline to prevent split-brain from occurring and avoid data corruption or loss. When recovering from this scenario, the node with offline plexes brings them online and its mirror state will be unsynchronized. The VSAs will then perform a fast re-synchronization of the mirrored targets and issue initiator rescans to all hosts, mounting those targets automatically.
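The "fast re-synchronization" mentioned here and in the following scenarios is commonly implemented in mirroring systems by tracking which regions changed while a plex was offline, so only those regions need copying. The sketch below shows that general dirty-region technique; it is a generic illustration and not a description of SvSAN's actual resynchronization mechanism.

```python
# Generic dirty-region tracking, as used by many mirroring systems for fast resync.
# This illustrates the idea only; it is not SvSAN's implementation.

class DirtyRegionLog:
    """Track which fixed-size regions were written while one mirror plex was offline."""
    def __init__(self, region_size: int):
        self.region_size = region_size
        self.dirty = set()                       # indices of regions written while degraded

    def record_write(self, offset: int, length: int) -> None:
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        self.dirty.update(range(first, last + 1))

    def regions_to_copy(self):
        """Only these regions need copying to the returning plex."""
        return sorted(self.dirty)

log = DirtyRegionLog(region_size=1 << 20)                 # 1 MiB regions
log.record_write(offset=5 * (1 << 20), length=4096)       # small write in region 5
log.record_write(offset=700 * (1 << 20), length=1 << 20)  # full write of region 700
print(log.regions_to_copy())                               # -> [5, 700]
```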

SCENARIO #3
SERVER FAILURE

This occurs when a single server (Server A) fails, as shown in fig. 6:

Fig. 6: After a server failure.

During the failure period:

- The surviving VSA (VSA2) and the witness remain fully operational
- VSA2 is promoted to cluster leader and the witness is updated to reflect the state change
- Only the surviving VSA (VSA2) can perform I/O
- Virtual machines that were running on the failed server (Server A) will be restarted on the surviving server (Server B)
- Virtual machines running on Server B continue to run uninterrupted
- The mirror state becomes unsynchronized

After the recovery of Server A:

- VSA1 re-joins the cluster. Its mirror state is marked as unsynchronized
- VSA2 remains as cluster leader
- The VSAs will perform a fast re-synchronization of the mirrored targets and issue initiator rescans to all hosts, mounting those targets automatically
- On completion, the mirror state is marked as synchronized
- The virtual machines remain running on Server B
- Virtual machines can be moved to Server A manually (vMotion/Live Migration) or automatically (VMware Distributed Resource Scheduler or Microsoft Hyper-V Dynamic Optimization)

NOTE: If the failure was caused by the total loss of storage, a full re-synchronization of the data will be required.

SCENARIO #4
WITNESS FAILURE

This occurs when the witness fails, as shown in fig. 7:

Fig. 7: After a witness failure.

During the failure period:

- Both servers (Server A & Server B) remain fully operational
- I/O requests can be serviced by both VSAs without disruption to service
- Quorum is maintained, with VSA1 remaining as cluster leader
- The VSAs periodically retry to connect to the witness
- Mirror state remains synchronized

After recovery:

- The witness is recovered
- VSA1 and VSA2 reconnect to the witness
- Current cluster state is propagated to the witness

SCENARIO #5
NETWORK ISOLATION

This scenario leads to server isolation when multiple network links between the servers and the witness fail, while the server (Server B) remains operational. This is shown in fig. 8:

Fig. 8: During network isolation.

During the failure period:

- VSA1 continues as normal, accepting and servicing I/O requests
- VSA1 remains as cluster leader
- As VSA2 cannot contact either VSA1 or the witness, it identifies itself as being isolated (see the decision sketch after this list)
- VSA2 takes its mirror plexes offline to stop updates to the storage and to prevent a split-brain condition occurring
- VSA1 marks VSA2's mirror plexes as unsynchronized
- VSA2 experiences loss of quorum and has its volumes taken offline until quorum is restored
- The virtual machines that were running on Server B experience an HA event and are restarted on Server A
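The isolation behaviour in this scenario follows the same quorum rule sketched earlier. The short decision-table sketch below summarises the action a VSA takes for each combination of partner and witness reachability; it is a simplified illustration for a two-node cluster with a witness, not SvSAN's exact decision code.

```python
# Simplified per-node decision table for a two-node-plus-witness cluster
# (illustrative only; not SvSAN's exact logic).

def action(partner_reachable: bool, witness_reachable: bool) -> str:
    if partner_reachable:
        return "in contact with partner: continue serving I/O, mirror stays synchronized"
    if witness_reachable:
        return "partner lost but witness reachable: witness arbitrates, winner keeps storage online"
    return "isolated (no partner, no witness): take mirror plexes offline to prevent split-brain"

# Scenario #5 as seen from each side:
print("VSA1:", action(partner_reachable=False, witness_reachable=True))
print("VSA2:", action(partner_reachable=False, witness_reachable=False))
```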

After recovery, when the network connectivity to Server B is restored:

- VSA2 re-joins the cluster. Its mirror state is marked as unsynchronized
- The VSAs will perform a fast re-synchronization of the mirrored targets and issue initiator rescans to all hosts, mounting those targets automatically
- On completion, the mirror state is marked as synchronized
- The virtual machines remain running on Server A
- Virtual machines can be moved to Server B manually (vMotion/Live Migration) or automatically (VMware Distributed Resource Scheduler or Microsoft Hyper-V Dynamic Optimization)


SCENARIO #6
MIRROR NETWORK LINK AND WITNESS FAILURE

This multiple failure scenario explains what happens when the mirror network link and the witness fail, as shown in fig. 9:

Fig. 9: After a dual mirror network link and witness failure.

During the failure period:

- Both servers remain online, with VSA1 remaining as the cluster leader
- If there are other network links between VSA1 and VSA2:
  - The mirror traffic will be redirected to utilize those links and the mirror state will remain synchronized
  - Either server can perform I/O requests
  - Both VSAs periodically poll for the presence of the witness
- If all the links between the servers (Server A and Server B) are severed:
  - All storage will be immediately taken offline to prevent data corruption and split-brain scenarios occurring

After recovery of the network links:

- The VSAs negotiate leadership
- Storage is brought back online
- The VSAs will perform a fast re-synchronization of the mirrored targets and issue initiator rescans to all hosts, mounting those targets automatically
- On completion, the mirror state is marked as synchronized
- Guest virtual machines will be restarted on the servers

SCENARIO #7
SERVER FAILURE FOLLOWED BY WITNESS FAILURE

This scenario occurs when multiple infrastructure components fail. Here a server (Server A) fails, followed by a subsequent failure of the witness or failure of communication to the witness. This is shown in fig. 10:

Fig. 10: Here, the server then the witness fails.

After the failure of Server A:

If VSA2 was able to update the cluster state on the witness before the witness failed:

- VSA2 remains online and is promoted to leader
- The mirror state becomes unsynchronized
- I/O requests are serviced by VSA2 without service interruption

If VSA2 was NOT able to update the cluster state on the witness before the witness failed:

- VSA2 takes its mirror plexes offline, experiencing loss of quorum

Recovery Scenario 1 - Server A is recovered first, followed by the witness:

- VSA1 re-joins the cluster; its storage will be in an unsynchronized state
- Mirrors are automatically re-synchronized
- When the witness returns to service, it is updated with the cluster and mirror state

Recovery Scenario 2 - Witness is recovered first, followed by Server A:

- The witness is updated with the cluster and mirror status
- VSA1 re-joins the cluster
- Mirrors are re-synchronized


SCENARIO #8
WITNESS FAILURE FOLLOWED BY A SERVER FAILURE

This scenario occurs when multiple infrastructure components fail. Here the witness or the link to the witness fails first followed by a server (Server A) failure. This is shown in fig. 11:

Fig. 11: Here, the witness then the server fails.

During the failure period:

- The remaining VSA (VSA2) is unable to contact either its partner server or the witness
- VSA2 assumes it has become isolated and takes its mirror plexes offline to prevent data corruption
- Service disruption occurs - no I/O requests are serviced by the VSAs

Recovery Scenario 1 - Server A is recovered first, followed by the witness:

- Servers renegotiate cluster leadership
- The storage is brought back online and the mirrors are re-synchronized
- When the witness is returned to service, it is updated with the cluster and mirror state

Recovery Scenario 2 - Witness is recovered first, followed by Server A:

- VSA2 elects itself as leader and brings its mirror plexes online
- Witness is updated with the cluster and mirror status
- VSA1 re-joins the cluster and mirrors are re-synchronized

CONCLUSIONS

SvSAN, deployed with a witness, has been developed to withstand single infrastructure component failures. However, for some scenarios it is possible to tolerate multiple component failures, ensuring that service availability is maintained wherever possible.

For single component failure scenarios, SvSAN preserves cluster stability and avoids split-brain conditions. When the failure is rectified, SvSAN automatically recovers and returns the infrastructure to the optimal state, performing a fast re-synchronization of the mirrors where possible. This reduces the time frame and exposure to subsequent failures and avoids potential service disruptions.

As shown in the failure scenarios in this white paper, SvSAN protects the integrity of the data at all costs during an infrastructure failure, while keeping the storage available.

The SvSAN witness is a key element in delivering this protection. It provides a significant competitive advantage with its ability to be located remotely and to provide quorum to hundreds or even thousands of clusters. The witness communication protocol is lightweight and efficient: it requires only a small amount of bandwidth and can tolerate very high latencies and packet losses. This enables it to be used over high latency, low bandwidth WAN links, allowing the witness to be located nearly anywhere in the world.

Storage infrastructure for edge environments should be simple, cost-effective and flexible, and SvSAN's witness is an integral part of ensuring that SvSAN is perfectly designed for typical edge deployments such as remote sites, retail stores and branch offices.


FURTHER READING

There are many features that make up SvSAN, of which the witness is just one. Why not explore some of the others, such as Predictive Storage Caching, or Data Encryption? These features and more can be accessed through the extensive collection of white papers on the StorMagic website.

Additional details on SvSAN are available in the Technical Overview which details SvSAN's capabilities and deployment options.

If you're ready to test SvSAN in your environment, you can do so totally free of charge, with no obligations. Simply download our fully-functioning free trial of SvSAN from the website.

If you still have questions, or you'd like a demo of SvSAN you can contact the StorMagic team directly by sending an email to [email protected]

StorMagic
Unit 4, Eastgate Office Centre
Eastgate Road
Bristol BS5 6XX
United Kingdom

+44 (0) 117 952
[email protected]

www.stormagic.com