
Page 1: Split Svc Nodes

© Copyright IBM Corporation, 2011

IBM Storwize V7000 Clustering and SVC Split I/O Group Deeper Dive

Bill Wiegand - ATS Senior I/T Specialist, Storage Virtualization

Page 2: Split Svc Nodes

2 © Copyright IBM Corporation, 2011

Agenda

Quick Basics of Virtualization

Scaling Storwize V7000 via Clustering

Scaling Storwize V7000 Unified

Q&A 1

SVC Split I/O Group V6.3

Q&A 2

Page 3: Split Svc Nodes

3 © Copyright IBM Corporation, 2011

Virtualization – The Big Picture

[Diagram: volumes presented by pairs of nodes, which virtualize managed disks accessed over the storage network]

Designed to be a redundant, modular, and scalable solution

Cluster consists of one to four I/O Groups managed as a single system

Two nodes make up an I/O Group and own their assigned volumes

Page 4: Split Svc Nodes

4 © Copyright IBM Corporation, 2011

Virtualization – The Big Picture

SVC Cluster or Storwize V7000 Clustered System:
• Max 4 I/O Groups, built from 4 Storwize V7000 control enclosures or 8 SVC nodes

Managed Disks (MDisks):
• Internally or externally provided
• Max 4096 MDisks per system

Volumes:
• Max 8192 volumes, 2048 per I/O Group, with each up to 256TB in size
• Each assigned to a specific I/O Group and built from a specific storage pool

Storage Pools:
• Max 128 storage pools
• Max 128 MDisks per pool

[Diagram: I/O Group A and I/O Group B, each a control enclosure with two nodes; storage pools 1-3 built from MDisks across the system]
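As a quick aid for applying the limits above, here is a minimal sketch (the LIMITS table and check_config helper are hypothetical, not a product tool) that checks a planned configuration against the maxima listed on this chart:

# Sanity-check a planned configuration against the limits on this chart.
# The limit values come from the chart; the checker itself is illustrative only.
LIMITS = {
    "io_groups": 4,                 # max 4 I/O Groups per clustered system
    "mdisks": 4096,                 # max 4096 MDisks per system
    "volumes": 8192,                # max 8192 volumes per system
    "volumes_per_io_group": 2048,
    "storage_pools": 128,
    "mdisks_per_pool": 128,
}

def check_config(io_groups, mdisks, volumes_per_io_group, pools, mdisks_per_pool):
    """Return a list of limit violations for a planned configuration."""
    problems = []
    if io_groups > LIMITS["io_groups"]:
        problems.append("too many I/O Groups")
    if mdisks > LIMITS["mdisks"]:
        problems.append("too many MDisks")
    if sum(volumes_per_io_group) > LIMITS["volumes"]:
        problems.append("too many volumes in the system")
    if any(v > LIMITS["volumes_per_io_group"] for v in volumes_per_io_group):
        problems.append("too many volumes in one I/O Group")
    if pools > LIMITS["storage_pools"]:
        problems.append("too many storage pools")
    if any(m > LIMITS["mdisks_per_pool"] for m in mdisks_per_pool):
        problems.append("too many MDisks in one pool")
    return problems

# Example: 2 I/O Groups, 40 MDisks, 1500 volumes per I/O Group, 3 pools
print(check_config(2, 40, [1500, 1500], 3, [16, 16, 8]) or "configuration within limits")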

Page 5: Split Svc Nodes

5 © Copyright IBM Corporation, 2011

Scale the Storwize V7000 Multiple Ways

An I/O Group is a control enclosure and its associated SAS attached expansion enclosures

Clustered system can consist of 2-4 I/O Groups

– SCORE approval for > 2

Scale storage
– Add up to 4x the capacity
– Add up to 4x the throughput

Non-disruptive upgrades
– From smallest to largest configurations
– Purchase hardware only when you need it

Virtualize storage arrays behind Storwize V7000 for even greater capacity and throughput

[Diagram: a single-I/O-Group Storwize V7000 system (one control enclosure plus expansion enclosures) expands by adding expansion enclosures, or clusters into a 2-4 I/O Group clustered system of control enclosures, each with its own expansion enclosures]

NOTE: A Storwize V7000 clustered system with greater than two I/O Groups/frames requires SCORE/RPQ approval

No interconnection of SAS chains between control enclosures; control enclosures communicate via FC and must use all 8 FC ports on the enclosures

Page 6: Split Svc Nodes

6 © Copyright IBM Corporation, 2011

Storwize V7000 Unified Scaling

Storwize V7000 Unified can scale disk capacity by adding up to nine expansion enclosures to the standard control enclosure

Virtualize external storage arrays behind Storwize V7000 Unified for even greater capacity

– CIFS not supported currently with externally virtualized storage

Cannot horizontally scale out by adding additional Unified systems, or even by adding just another Storwize V7000 control enclosure and associated expansion enclosures, at this time

– If a customer has a clustered Storwize V7000 system today, they will not be able to upgrade it to a Unified system in 2012 when the MES is available

[Diagram: a Storwize V7000 Unified one-I/O-Group system (control enclosure plus expansion enclosures); a 2-4 I/O Group Unified clustered system is NOT SUPPORTED]

Page 7: Split Svc Nodes

7 © Copyright IBM Corporation, 2011

Clustered System Facts

Clustered system provides ability to independently grow capacity and performance

– Add expansion enclosures for more capacity

– Add control enclosure for more performance

– No extra feature to order and no extra charge for a clustered system
  • Configure one system using the USB stick and then add the second using the GUI

Clustered system GA support is for up to 480 SFF disk drives, 240 LFF disk drives, or a mix thereof

– Up to 480TB raw capacity in one 42U rack

– Enables Storwize V7000 to compete effectively against larger EMC, NetApp, HP systems

Support for a larger system can be requested by submitting a SCORE/RPQ

– E.g. “Eight Storwize V7000 node canisters in four control enclosures”

– Up to 960TB raw capacity in two 42U racks

Page 8: Split Svc Nodes

8 © Copyright IBM Corporation, 2011

Clustered System Facts

Adding additional control enclosures to existing V6.2+ system is non-disruptive

– Requires new control enclosures be loaded with V6.2.x minimum

Control enclosures can be any combination of models
– 2076-112, 124, 312, 324

Clustered system operates as a single storage system
– Managed via one IP address

Both node canisters in a given control enclosure are part of the same I/O Group

– Cannot create an I/O Group with one node from each of 2 different control enclosures

– Adding one node in control enclosure to an I/O Group will automatically add the other

– Storwize V7000 clustered system does not support “split I/O group” configurations - (also known as “stretch cluster”)

Page 9: Split Svc Nodes

9 © Copyright IBM Corporation, 2011

Clustered System Facts

Inter control enclosure communication provided by a Fibre Channel (FC) SAN

– Must use all 4 FC ports on each node canister and zone all together

– All FC ports on a node canister must have at least one path to every node canister in the clustered system that is not in the same control enclosure

– Node canisters in the same control enclosure have connectivity via the PCIe link of the midplane and don’t require FC ports be zoned together

• However, recommended guideline is to zone them together as it provides a secondary path should the PCIe link have issues

Only one control enclosure can appear on a given SAS chain; only one node canister can appear on a single strand of a SAS chain

– Key to realize is that there is no access by one control enclosure (I/O Group) to the SAS-attached expansion enclosures of another control enclosure (I/O Group) other than via the SAN

Page 10: Split Svc Nodes

10 © Copyright IBM Corporation, 2011

Clustered System Facts

Currently volumes built on internal MDisks in a storage pool will be owned by the same I/O group (IOG) that owns the majority of the MDisks in that storage pool

– E.g., if Pool-1 has 3 MDisks from IOG-0 and 4 from IOG-1, then by default IOG-1 will own all volumes created

• Default GUI behavior can be overridden using the “Advanced” option in GUI

– If the pool owns the exact same number of MDisks from each I/O group, then volumes will be owned by IOG-0 (see the sketch at the end of this page)

Expansion enclosures only communicate with their owning control enclosure, meaning that if a host I/O comes into IOG-0 but the data is on MDisks owned by IOG-1, the I/O is forwarded to IOG-1 over FC

– Similar process to SVC accessing external storage systems

– Does not go through the cache on the owning I/O group but directly to the MDisk

• Uses very lowest layer of I/O stack to minimize any additional latency
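A minimal sketch of the default ownership rule described above (illustrative only, not the actual product algorithm): the I/O Group contributing the most MDisks to the pool owns new volumes, and an exact tie goes to IOG-0.

from collections import Counter

def default_owning_io_group(mdisk_owners):
    """Pick the default owning I/O Group for new volumes in a pool.

    mdisk_owners: list of I/O Group ids, one entry per MDisk in the pool.
    The I/O Group owning the most MDisks wins; an exact tie goes to IOG-0,
    matching the behavior described on this page.
    """
    counts = Counter(mdisk_owners)
    best = max(counts.values())
    candidates = [iog for iog, n in counts.items() if n == best]
    return 0 if len(candidates) > 1 else candidates[0]

# Pool-1 example from this page: 3 MDisks from IOG-0 and 4 from IOG-1 -> IOG-1
print(default_owning_io_group([0, 0, 0, 1, 1, 1, 1]))   # prints 1
# Exact tie -> IOG-0
print(default_owning_io_group([0, 0, 1, 1]))            # prints 0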

Page 11: Split Svc Nodes

11 © Copyright IBM Corporation, 2011

Clustered System Example

• Expansion enclosures are connected through one control enclosure and can be part of only one I/O group

• All MDisks are part of only one I/O group

• Storage pools can contain MDisks from more than one I/O group

• Inter-control enclosure communication happens over the SAN

• A volume is serviced by only one I/O group

All cabling shown is logical

[Diagram: Control Enclosure #1 (I/O Group #1) and Control Enclosure #2 (I/O Group #2), each with two node canisters and three expansion enclosures, connected over the SAN; Storage Pools A, B, and C built from MDisks]

Page 12: Split Svc Nodes

12 © Copyright IBM Corporation, 2011

Storwize V7000 Clustered System – DR

An I/O Group is a control enclosure and its associated SAS attached expansion enclosures

A Clustered System can consist of 2-4 I/O Groups

– SCORE approval for > 2

Replication between clustered systems is via fibre channel ports only

– Replication between up to four clustered systems is allowed

– Requires 5639-RM1 license(s) at each site

NOTE: A Storwize V7000 clustered system with greater than two I/O Groups/frames requires SCORE/RPQ approval

[Diagram: a Storwize V7000 system of one to four I/O Groups (control enclosures with their expansion enclosures) at the Production Site replicating via Global Mirror or Metro Mirror to a one-to-four I/O Group clustered system at the Disaster Recovery Site]

Page 13: Split Svc Nodes

13 © Copyright IBM Corporation, 2011

Storwize V7000 Clustered System – HA

[Diagram: a single clustered system separated by distance, with I/O Group 1 (control enclosure plus expansion enclosures) at Production Site A and I/O Group 2 (control enclosure plus expansion enclosures) at Production Site B; a host accesses a mirrored volume]

A High Availability clustered system similar to an SVC Split I/O Group configuration is not possible since we cannot split a control enclosure in half and install it at two different sites

– One I/O Group will be at each site unlike SVC where each node in an I/O Group can be installed in a different site

• So if you lose a site you lose access to all volumes owned by that I/O Group

• There is no automatic failover of a volume from one I/O Group to another

– Volume mirroring does allow for a single host volume to have pointers to two sets of data which can be on different I/O Groups in a clustered system, but again if you lose a site you lose the entire I/O Group so any volumes owned by that I/O Group will be offline

• You can migrate volume ownership from the failed IOG to the other IOG, but data may be lost: unwritten data still in cache on the offline IOG is discarded in the process of migration, or could already have been lost if the IOG failed hard without saving cached data

Page 14: Split Svc Nodes

14 © Copyright IBM Corporation, 2011

So That Begs the Question: Why Cluster?

One reason it is offered is because we can
– Runs the same software as SVC, which supports 1-4 I/O Groups

Can start very small and grow very large storage system with single management interface

– Helps to compete with larger midrange systems from other vendors

Can virtualize external storage too, providing the same virtualization features across the entire clustered system

– Just like an SVC cluster, so it is desirable for the same reasons large SVC clusters are

However, there is nothing wrong with going with 1-4 separate systems versus a clustered system if the customer prefers
– System management isn't that hard anyway
– If the customer will lose sleep over the possible complete failure of a control enclosure, no matter how unlikely that is, then go with separate systems

Page 15: Split Svc Nodes

15 © Copyright IBM Corporation, 2011

Q&A

Page 16: Split Svc Nodes

16 © Copyright IBM Corporation, 2011

Q&A

Page 17: Split Svc Nodes

© 2010 IBM Corporation

SVC Split I/O Group Update
Bill Wiegand / Thomas Vogel
ATS System Storage

IBM System Storage

Page 18: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

18

Agenda

Terminology

SVC Split I/O Group Review

Long distance refresh
– WDM devices
– Buffer-to-Buffer credits

SVC Quorum disk

Split I/O Group without ISLs between SVC nodes
– Supported configurations
– SAN configuration for long distance

Split I/O Group with ISLs between SVC nodes
– Supported configurations
– SAN configuration for long distance

Page 19: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

19

Terminology

SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster

– Two independent SVC nodes in two independent sites, plus one independent site for quorum
– Acts just like a single I/O Group with distributed high availability

Distributed I/O Groups – NOT an HA configuration and not recommended; if one site fails:
– Manual volume move required
– Some data still in cache of the offline I/O Group

Storwize V7000 Split I/O Group not an option:
– Single enclosure includes both nodes
– Physical distribution across two sites not possible

[Diagrams: I/O Group 1 stretched across Site 1 and Site 2 (Split I/O Group); I/O Group 1 at Site 1 and I/O Group 2 at Site 2 (distributed I/O Groups); a single Storwize V7000 enclosure that cannot be split across Site 1 and Site 2]

Page 20: Split Svc Nodes

IBM Systems and Technology Group

© 2008 IBM Corporation20

SVC – What is a Failure Domain

Generally a failure domain will represent a physical location, but it depends on what type of failure you are trying to protect against
– Could all be in one building on different floors/rooms, or just different power domains in the same data center
– Could be multiple buildings on the same campus
– Could be multiple buildings up to 300 km apart

Key is the quorum disk
– If you only have two physical sites and the quorum disk has to be in one of them, then some failure scenarios won't allow the cluster to survive
– Minimum is to have the active quorum disk system on a separate power grid in one of the two failure domains

Page 21: Split Svc Nodes

IBM Systems and Technology Group

© 2008 IBM Corporation21

SVC – How Quorum Disks Affect Availability (1)

1) Loss of active quorum:
– SVC selects another quorum
– Continuation of operations

2) Loss of storage system:
– Loss of active quorum
– SVC selects another quorum
– Continuation of operations
– Mirrored volumes continue operation, but this may take 60 sec or more since the active quorum disk failed

Note: The loss of all quorum disks will not cause the cluster to stop as long as there are a majority of the nodes operational in the cluster. However, mirrored volumes will likely go offline. This is why you would manually configure the cluster so the quorum disk candidates are located on disk systems in both failure domains.

[Diagram: Node 1 in Failure Domain 1 and Node 2 in Failure Domain 2, connected by ISL 1 and ISL 2, with volume mirroring between a storage system in each domain; quorum disk candidates 1 (active), 2, and 3 reside on the storage systems]

Page 22: Split Svc Nodes

IBM Systems and Technology Group

© 2008 IBM Corporation22

SVC – How Quorum Disks Affect Availability (2)

Loss of Failure Domain 1:
– Active quorum not affected
– Continuation of operations

Loss of Failure Domain 2:
– Active quorum lost
– Half of the nodes lost; loss of cluster majority
– Node 1 cannot utilize a quorum candidate to recover and survive
– Node 1 shuts down and the cluster is stopped
– May not be recoverable; may require a cluster rebuild and data restore from backups

[Diagram: Node 1 in Failure Domain 1 and Node 2 in Failure Domain 2, connected by ISL 1 and ISL 2, with volume mirroring between a storage system in each domain; with Failure Domain 2 down there is no active quorum and no access to data on disk]

Page 23: Split Svc Nodes

IBM Systems and Technology Group

© 2008 IBM Corporation23

Current Supported Configuration for Split I/O Group

Automated failover with SVC handling the loss of:
– SVC node
– Quorum disk
– Storage subsystem

Can incorporate MM/GM to provide disaster recovery
– 3-site-like capability

[Diagram: Node 1 in Failure Domain 1 and Node 2 in Failure Domain 2, connected by ISL 1 and ISL 2, with volume mirroring between a storage system in each domain and a quorum candidate at each; the active quorum (SVC Quorum 1) resides in Failure Domain 3 on a disk system that supports "Extended Quorum"]

Page 24: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

24

SVC Split I/O Group

Site 1 (SVC Node 1)                              | Site 2 (SVC Node 2)                              | Site 3 (Quorum disk)          | Cluster status
Operational                                      | Operational                                      | Operational                   | Operational, optimal
Failed                                           | Operational                                      | Operational                   | Operational, write cache disabled
Operational                                      | Failed                                           | Operational                   | Operational, write cache disabled
Operational                                      | Operational                                      | Failed                        | Operational, write cache enabled, but different active quorum disk
Operational, link to Site 2 failed (split brain) | Operational, link to Site 1 failed (split brain) | Operational                   | Whichever node accesses the active quorum disk first survives and the partner node goes offline
Operational                                      | Failed at same time as Site 3                    | Failed at same time as Site 2 | Stopped
Failed at same time as Site 3                    | Operational                                      | Failed at same time as Site 1 | Stopped
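For illustration only, a toy restatement of the table above as code (hypothetical function, not the SVC quorum implementation):

def cluster_status(node1_up, node2_up, quorum_site_up, inter_site_link_up=True):
    """Rough restatement of the failure scenarios in the table above."""
    if node1_up and node2_up and quorum_site_up and inter_site_link_up:
        return "Operational, optimal"
    if node1_up and node2_up and quorum_site_up and not inter_site_link_up:
        # Split brain: both nodes race for the active quorum disk
        return "Whichever node reaches the active quorum first survives; the partner goes offline"
    if node1_up != node2_up and quorum_site_up:
        return "Operational, write cache disabled"
    if node1_up and node2_up and not quorum_site_up:
        return "Operational, write cache enabled, different active quorum disk"
    # A node and the quorum site failing at the same time stops the cluster
    return "Stopped"

print(cluster_status(True, True, True))                             # optimal
print(cluster_status(True, False, False))                           # Stopped
print(cluster_status(True, True, True, inter_site_link_up=False))   # split brain race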

Page 25: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

25

Advantages / Disadvantages of SVC Split I/O Group

Advantages
– No manual intervention required
– Automatic and fast handling of storage failures
– Volumes mirrored in both locations
– Transparent for servers and host-based clusters
– Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)

Disadvantages
– Mix between an HA and a DR solution, but not a true DR solution
– Non-trivial implementation

Page 26: Split Svc Nodes

© Copyright IBM Corporation, 2011

SVC Split I/O Group V6.3 Enhancements

Page 27: Split Svc Nodes

27 © Copyright IBM Corporation, 2011

Split I/O Group – Physical Configurations

The following charts show supported physical configurations for the new Split I/O Group support in V6.3

– VSANs (CISCO) and Virtual Fabrics (Brocade) are not supported by all switch models from the respective vendors

• Consult the vendor for further information

Enhancements designed to help us compete more effectively with EMC VPLEX at longer distances

Note that this information is all very new, even to ATS, and some requirements could change prior to GA

Highly recommend engaging ATS for a solution design review
– w3.ibm.com/support/techxpress

Storwize V7000 does not provide any sort of split I/O group, split cluster, stretch cluster HA configurations

– A clustered Storwize V7000 provides the ability to grow system capacity and scale performance within a localized single system image

Page 28: Split Svc Nodes

28 © Copyright IBM Corporation, 2011

Extension of Currently Supported Configuration

Server Cluster 1

SVC + UPS

Server Cluster 2

SVC + UPS

SAN

SAN SAN

SAN

User chooses number of ISLs on SAN

User chooses number of ISLs on SAN

Two ports per SVC node attached to local SANs
Two ports per SVC node attached to remote SANs via DWDM
Hosts and storage attached to SANs via ISLs sufficient for the workload
3rd site quorum (not shown) attached to SANs

Active DWDM over shared single-mode fibre(s)

0-10 KM Fibre Channel distance supported up to 8Gbps

11-20KM Fibre Channel distance supported up to 4Gbps

21-40KM Fibre Channel distance supported up to 2Gbps
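A minimal sketch (hypothetical helper) encoding the distance-to-link-speed support statement listed above:

def max_supported_speed_gbps(distance_km):
    """Maximum supported FC link speed for a given inter-site distance,
    per the support statement on this chart (0-10 km: 8 Gbps,
    11-20 km: 4 Gbps, 21-40 km: 2 Gbps; beyond 40 km is outside this configuration)."""
    if distance_km <= 10:
        return 8
    if distance_km <= 20:
        return 4
    if distance_km <= 40:
        return 2
    return None  # beyond the distances covered by this configuration

for km in (5, 15, 35, 50):
    print(km, "km ->", max_supported_speed_gbps(km), "Gbps")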

Page 29: Split Svc Nodes

29 © Copyright IBM Corporation, 2011

Configuration With 4 Switches at Each Site

Server Cluster 1

SVC + UPS

Server Cluster 2

SVC + UPS

PrivateSAN

PrivateSAN

PrivateSAN

PrivateSAN

PublicSAN

PublicSAN

PublicSAN

PublicSAN

1 ISL per I/O group, configured as a trunk

User chooses number of ISLs on public SAN

User chooses number of ISLs on public SAN

Two ports per SVC node attached to public SANs
Two ports per SVC node attached to private SANs
Hosts and storage attached to public SANs
3rd site quorum (not shown) attached to public SANs

Page 30: Split Svc Nodes

30 © Copyright IBM Corporation, 2011

Configuration Using CISCO VSANs

Server Cluster 1

SVC + UPS

Server Cluster 2

SVC + UPS

[Diagram: at each site the switches are partitioned into public and private VSANs, with servers and SVC nodes attached at both sites]

Note: ISLs/trunks for the private VSANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic

Switches are partitioned using VSANs

Page 31: Split Svc Nodes

31 © Copyright IBM Corporation, 2011

Configuration Using Brocade Virtual Fabrics

Server Cluster 1

SVC + UPS

Server Cluster 2

SVC + UPS

[Diagram: at each site the physical switches are partitioned into public and private logical switches, with servers and SVC nodes attached at both sites]

Physical switches are partitioned into two logical switches

Note: ISLs/trunks for the private SANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic

Page 32: Split Svc Nodes

© 2009 IBM Corporation32

Split I/O Group – Distance

■ The new Split I/O Group configurations will support distances of up to 300km (same recommendation as for Metro Mirror)

■ However, for the typical deployment of a split I/O group only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used

■ The following charts explain why

Page 33: Split Svc Nodes

© 2009 IBM Corporation33

Metro Mirror

Technically SVC supports distances up to 8000km

SVC will tolerate a round trip delay of up to 80ms between nodes

The same code is used for all inter-node communication: Global Mirror, Metro Mirror, Cache Mirroring, Clustering

SVC's proprietary SCSI protocol has only 1 round trip

In practice, applications are not designed to support a write I/O latency of 80 ms

Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances

Page 34: Split Svc Nodes

© 2009 IBM Corporation34

Metro Mirror: Application Latency = 1 long distance round trip

[Diagram: host write flow with SVC Metro Mirror between SVC Cluster 1 in Data center 1 and SVC Cluster 2 in Data center 2]

• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application

Server Cluster 1 to SVC Cluster 1:
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
6) Write completed to host

SVC Cluster 1 to SVC Cluster 2 (1 long-distance round trip):
4) Metro Mirror data transfer to remote site
5) Acknowledgment

SVC Cluster 1 to its back-end storage:
7a) Write request from SVC
8a) Xfer ready to SVC
9a) Data transfer from SVC
10a) Write completed to SVC

SVC Cluster 2 to its back-end storage:
7b) Write request from SVC
8b) Xfer ready to SVC
9b) Data transfer from SVC
10b) Write completed to SVC

Page 35: Split Svc Nodes

© 2009 IBM Corporation35

Split I/O Group for Business Continuity

Split I/O Group splits the nodes in an I/O group across two sites

SVC will tolerate a round-trip delay of up to 80 ms

Cache mirroring traffic, rather than Metro Mirror traffic, is sent across the inter-site link

Data is mirrored to back-end storage using Volume Mirroring

Data is written by the 'preferred' node to both the local and remote storage

The SCSI write protocol results in 2 round trips; this latency is generally hidden from the application by the write cache

Page 36: Split Svc Nodes

© 2009 IBM Corporation36

Split I/O Group – Local I/O: Application Latency = 1 round trip

[Diagram: host write flow with an SVC Split I/O Group, Node 1 in Data center 1 and Node 2 in Data center 2]

• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application

Host (Server Cluster 1) to its local SVC node:
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
6) Write completed to host

Node-to-node cache mirroring (1 round trip):
4) Cache mirror data transfer to remote site
5) Acknowledgment

Preferred node to the remote back-end storage (2 round trips, but the SVC write cache hides this latency from the host):
7b) Write request from SVC
8b) Xfer ready to SVC
9b) Data transfer from SVC
10b) Write completed to SVC

Page 37: Split Svc Nodes

© 2009 IBM Corporation37

Split I/O Group for Mobility

• Split I/O Group is also often used to move workload between servers at different sites

• VMotion or equivalent can be used to move Applications between servers

• Applications no longer necessarily issue I/O requests to the local SVC nodes

• SCSI write commands from hosts to remote SVC nodes result in an additional 2 round trips' worth of latency that is visible to the application

Page 38: Split Svc Nodes

© 2009 IBM Corporation38

Split I/O Group – Remote I/O: Application Latency = 3 round trips

[Diagram: remote host write flow with an SVC Split I/O Group, Node 1 in Data center 1 and Node 2 in Data center 2]

• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application

Host to the SVC node at the remote site (2 round trips):
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
6) Write completed to host

Node-to-node cache mirroring (1 round trip):
4) Cache mirror data transfer to remote site
5) Acknowledgment

SVC to back-end storage (2 round trips, but the SVC write cache hides this latency from the host):
7b) Write request from SVC
8b) Xfer ready to SVC
9b) Data transfer from SVC
10b) Write completed to SVC

Page 39: Split Svc Nodes

© 2009 IBM Corporation39

Split I/O Group for Mobility

Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands

These devices are already supported for use with SVC
No benefit or impact to inter-node communication
Does benefit host to remote SVC I/Os
Does benefit SVC to remote storage controller I/Os

Page 40: Split Svc Nodes

© 2009 IBM Corporation40

Split I/O Group – Remote I/O: Application Latency = 2 round trips

[Diagram: remote host write flow through distance extenders with an SVC Split I/O Group, Node 1 in Data center 1 and Node 2 in Data center 2]

• Steps 1 to 12 affect application latency
• Steps 13 to 22 should not affect the application

Host to its local distance extender:
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
12) Write completed to host

Across the inter-site link via the extenders (1 round trip):
4) Write + data transfer to remote site
11) Write completion to remote site

Remote distance extender to SVC:
5) Write request to SVC
6) Xfer ready from SVC
7) Data transfer to SVC
10) Write completed from SVC

Node-to-node cache mirroring (1 round trip):
8) Cache mirror data transfer to remote site
9) Acknowledgment

SVC destage to its local distance extender:
13) Write request from SVC
14) Xfer ready to SVC
15) Data transfer from SVC
22) Write completed to SVC

Across the inter-site link via the extenders (1 round trip, hidden from the host):
16) Write + data transfer to remote site
21) Write completion to remote site

Remote distance extender to storage:
17) Write request to storage
18) Xfer ready from storage
19) Data transfer to storage
20) Write completed from storage
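Pulling the three write flows above together, a minimal sketch (hypothetical helper names, assuming roughly 0.01 ms of added latency per km per round trip, as on the next chart) of the application-visible write latency for each access pattern:

# Application-visible round trips per write, per the preceding flow charts:
#   Metro Mirror write:                                  1 long-distance round trip
#   Split I/O Group, local host access:                  1 round trip (cache mirroring)
#   Split I/O Group, remote host access:                 3 round trips
#   Remote access through write-accelerating extenders:  2 round trips
VISIBLE_ROUND_TRIPS = {
    "metro mirror": 1,
    "split i/o group, local access": 1,
    "split i/o group, remote access": 3,
    "split i/o group, remote access + extenders": 2,
}

MS_PER_KM_ROUND_TRIP = 0.01  # ~200,000 km/s light speed in glass

def visible_write_latency_ms(scenario, distance_km):
    """Added application-visible write latency from inter-site distance alone."""
    return VISIBLE_ROUND_TRIPS[scenario] * MS_PER_KM_ROUND_TRIP * distance_km

for name in VISIBLE_ROUND_TRIPS:
    print(f"{name}: {visible_write_latency_ms(name, 100):.1f} ms added at 100 km")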

Page 41: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

41

Long Distance Impact

Additional latency because of long distance

Light speed in glass: ~200,000 km/sec
– 1 km distance = 2 km round trip

Additional round-trip time because of distance:
– 1 km = 0.01 ms
– 10 km = 0.10 ms
– 25 km = 0.25 ms
– 100 km = 1.00 ms
– 300 km = 3.00 ms

SCSI protocol:
– Read: 1 I/O operation = 0.01 ms/km
  • Initiator requests data and target provides data
– Write: 2 I/O operations = 0.02 ms/km
  • Initiator announces the amount of data, target acknowledges
  • Initiator sends data, target acknowledges
– SVC's proprietary SCSI protocol for node-to-node traffic has only 1 round trip

Fibre Channel frame:
– User data per FC frame (Fibre Channel payload): up to 2048 bytes = 2 KB
  • Even for very small user data (< 2 KB) a complete frame is required
  • Large user data is split across multiple frames
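As a quick illustration of the arithmetic above, a minimal sketch (hypothetical helpers) that reproduces the per-round-trip figures on this chart, assuming ~200,000 km/sec light speed in glass:

# Light travels ~200 km per millisecond in glass, so one round trip adds
# ~0.01 ms per km of one-way distance; SCSI read = 1 round trip,
# SCSI write = 2 round trips, SVC node-to-node protocol = 1 round trip.
LIGHT_SPEED_GLASS_KM_PER_MS = 200.0

def round_trip_ms(distance_km):
    """Added latency of one protocol round trip over the given one-way distance."""
    return 2 * distance_km / LIGHT_SPEED_GLASS_KM_PER_MS

def added_latency_ms(distance_km, round_trips):
    return round_trips * round_trip_ms(distance_km)

for km in (1, 10, 25, 100, 300):
    print(f"{km:>4} km: read {added_latency_ms(km, 1):.2f} ms,"
          f" write {added_latency_ms(km, 2):.2f} ms,"
          f" node-to-node {added_latency_ms(km, 1):.2f} ms")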

Page 42: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

42

Passive/Active WDM devices

Passive WDM

No power required

Can use CWDM or DWDM technology

Colored SFPs required
– They create the different wavelengths

Customer must own the physical cable end to end
– Renting individual wavelengths from a service provider is not possible

Limited equipment cost

Max distance 70km depending on SFP

Active WDM

Power required

Can use CWDM or DWDM technology

Change incoming/outgoing wavelengths

Adds negligible latency because of signal change

Consolidate multiple wavelengths in one cable

No dedicated link required
– Customers can rent some frequencies

High equipment cost

Longer distances supported

Page 43: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

43

CWDM / DWDM Devices

CWDM (Coarse Wavelength Division Multiplex)

16 or 32 wavelengths into a fibre

Uses wide-range frequencies

Wider channel spacing - 20nm (2.5THz grid)

CWDM Spectrum

WDM means Wavelength Division Multiplexing: parallel transmission of a number of wavelengths over a fiber

DWDM (Dense Wavelength Division Multiplex )

32, 64 or 128 wavelengths into a fibre

Narrow frequencies

Narrow channel spacing - e.g. 0.8nm (100GHz grid)

DWDM Spectrum

Page 44: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

44

FSP 3000

Advanced features through usage of active xWDM technology

[Diagram: ADVA FSP 3000 in passive and active configurations, with transponders (TXP) and TDM modules aggregating 2G / N x 4G / 8G / 10G client links onto wavelengths of up to 100G]

Higher capacity (more channels per fiber)

Higher aggregate bandwidth (up to 100G per wavelength)

Higher distance (up to 200 km without mid-span amplifier)

More secure (automated fail over, NMS, optical monitoring tools, embedded encryption)

Source: ADVA

WDM Optical Networking: Passive vs. Active Solutions

Page 45: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

45

SAN and Buffer-to-Buffer Credits

Buffer-to-Buffer (B2B) credits
– Are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store
  • Sufficient credits provide the best performance

Light must cover the distance 2 times
– Submit data from Node 1 to Node 2
– Submit acknowledge from Node 2 back to Node 1

B2B calculation depends on link speed and distance
– The number of frames in flight increases in proportion to the link speed

Page 46: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

46

SVC Split I/O Group – Quorum Disk

SVC creates three Quorum disk candidates on the first three managed MDisks

One Quorum disk is active

SVC 5.1 and later:
– SVC is able to handle quorum disk management in a very flexible way, but in a Split I/O Group configuration a well-defined setup is required
– Disable the dynamic quorum feature using the "override" flag for V6.2 and later
  • svctask chquorum -MDisk <mdisk_id or name> -override yes
  • This flag is currently not configurable in the GUI

"Split brain" situation:
– SVC uses the quorum disk to decide which SVC node(s) should survive

No access to the active quorum disk:
– In a standard situation (no split brain): SVC will select one of the other quorum candidates as the active quorum
– In a split brain situation: SVC may take mirrored volumes offline

Page 47: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

47

SVC Split I/O Group – Quorum Disk

Quorum disk requirements:
– Must be placed in a third, independent site
– Must be Fibre Channel connected
– ISLs with one hop to the quorum storage system are supported

Supported infrastructure:
– WDM equipment similar to Metro Mirror
– Link requirements similar to Metro Mirror
  • Max round-trip delay time is 80 ms, 40 ms in each direction
– FCIP to the quorum disk can be used with the following requirements:
  • Max round-trip delay time is 80 ms, 40 ms in each direction
  • The fabrics are not merged, so routers are required

Independent long-distance equipment from each site to Site 3 is required

iSCSI storage not supported

Requirement for active/passive storage devices (like DS3/4/5K):
– Each quorum disk storage controller must be connected to both sites

Page 48: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

48

Split I/O Group without ISLs between SVC nodes

Also known as the classic Split I/O Group

SVC 6.2 and earlier:
– Two ports on each SVC node needed to be connected to the "remote" switch
– No ISLs between SVC nodes
– Third site required for the quorum disk
– ISLs with max. 1 hop can be used for server traffic and quorum disk attachment

SVC 6.2 (late) update:
– Distance extension to max. 40 km with passive WDM devices
  • Up to 20 km at 4 Gb/s or up to 40 km at 2 Gb/s
  • LongWave SFPs required for long distances
  • LongWave SFPs must be supported by the switch vendor

SVC 6.3:
– Similar to the support statement in SVC 6.2
– Additional: support for active WDM devices
– Quorum disk requirements similar to Remote Copy (MM/GM) requirements:
  • Max. 80 ms round-trip delay time, 40 ms each direction
  • FCIP connectivity supported
  • No support for iSCSI storage systems

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps

Page 49: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

49

Split I/O Group without ISLs between SVC nodes

Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for quorum disk placement
– Quorum disk must be listed as "Extended Quorum" in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the "remote" switches
– SVC volume mirroring between Site 1 and Site 2

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps

[Diagram: Site 1 (SVC node 1, Server 1, storage) and Site 2 (SVC node 2, Server 2, storage) cross-connected through Switches 1-4, with the active quorum at Site 3]

Page 50: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

50

Split I/O Group without ISLs between SVC nodes

Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for quorum disk placement
– Quorum disk must be listed as "Extended Quorum" in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the "remote" switch
– SVC volume mirroring between Site 1 and Site 2

Active/passive WDM devices can be used to reduce the number of required FC links between the two sites

Distance extension to max. 40 km with WDM devices

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps

[Diagram: Site 1 (SVC node 1, Server 1, Storage 3) and Site 2 (SVC node 2, Server 2, Storage 2) cross-connected through Switches 1-6 with ISLs carrying server traffic; the active quorum at Site 3]

Page 51: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

51

Split I/O Group without ISLs between SVC nodes

Quorum devices with an active/passive controller without I/O re-routing (for example DS3/4/5K) must have both controllers connected to each site

[Diagram: as on the previous chart, but the Site 3 quorum is a DS4700 with controllers Ctl. A and Ctl. B, each connected to both Site 1 and Site 2]

Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for quorum disk placement
– Quorum disk must be listed as "Extended Quorum" in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the "remote" switch
– SVC volume mirroring between Site 1 and Site 2

Active/passive WDM devices can be used to reduce the number of required FC links between the two sites

Distance extension to max. 40 km with WDM devices

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps

Page 52: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

52

Split I/O Group without ISLs: Long distance configuration

SVC Buffer-to-Buffer credits
– 2145-CF8 / CG8 nodes have 41 B2B credits
  • Enough for 10 km at 8 Gb/sec with a 2 KB payload
– All earlier models:
  • Use 1/2/4 Gb/sec Fibre Channel adapters
  • Have 8 B2B credits, which is enough for 4 km at 4 Gb/sec

Recommendation 1:
– Use CF8 / CG8 nodes for more than 4 km distance for best performance

Recommendation 2:
– SAN switches do not auto-negotiate B2B credits, and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well

Link speed | FC frame length | Required B2B credits for 10 km distance | Max distance with 8 B2B credits
1 Gb/sec   | 1 km            | 5                                       | 16 km
2 Gb/sec   | 0.5 km          | 10                                      | 8 km
4 Gb/sec   | 0.25 km         | 20                                      | 4 km
8 Gb/sec   | 0.125 km        | 40                                      | 2 km
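A minimal sketch reproducing the rule of thumb captured in the table above (frame-length and credit figures as given on this chart; real switch tuning should follow the switch vendor's guidance):

import math

# Frame length on the wire per the table above (km), indexed by link speed (Gb/sec)
FRAME_LENGTH_KM = {1: 1.0, 2: 0.5, 4: 0.25, 8: 0.125}

def required_b2b_credits(distance_km, speed_gbps):
    """B2B credits needed for a given one-way distance, per the chart's rule of thumb."""
    return math.ceil(distance_km / (2 * FRAME_LENGTH_KM[speed_gbps]))

def max_distance_km(credits, speed_gbps):
    """Maximum distance reachable with a given number of B2B credits."""
    return credits * 2 * FRAME_LENGTH_KM[speed_gbps]

print(required_b2b_credits(10, 8))   # 40 credits for 10 km at 8 Gb/sec
print(max_distance_km(8, 4))         # 4 km with the default 8 credits at 4 Gb/sec
print(max_distance_km(41, 8))        # ~10 km with the CF8/CG8 nodes' 41 credits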

Page 53: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

53

Split I/O Group with ISLs between SVC nodes

Support with SVC 6.3:
– Supports Metro Mirror distances between nodes
– Third site required for the quorum disk
– ISLs with max. 1 hop can be used for:
  • Quorum traffic
  • SVC node-to-node communication
– Requires a dedicated private SAN used only for inter-node traffic (which can be a Brocade virtual fabric or a Cisco VSAN)
– Requires one ISL for each I/O Group between the private SANs at each site

Maximum distances:
– 100 km for live data mobility (150 km with distance extenders)
– 300 km for fail-over / recovery scenarios
  • SVC supports up to 80 ms latency, far greater than most application workloads would tolerate
– The two sites can be connected using active or passive technologies such as CWDM / DWDM if desired

Supported infrastructure:
– WDM equipment similar to Metro Mirror
– Link requirements similar to Metro Mirror

[Diagram: Site 1 (SVC-01, Servers 1-2, storage with a quorum candidate) and Site 2 (SVC-02, Servers 3-4, storage with a quorum candidate); each site's switches are split into public and private SANs linked by ISLs over WDM; the active quorum storage (controllers Ctl. A and Ctl. B) at Site 3 is attached to the public SANs]

Page 54: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

54

Split I/O Group with ISLs between SVC nodes

[Diagram: the same two-site public/private SAN layout as on the previous chart, with SVC-01 and Servers 1-2 at Site 1, SVC-02 and Servers 3-4 at Site 2, quorum candidates at both sites, and the active quorum at Site 3]

WDM devices:
– Same link and device requirements as for Metro Mirror

Distances:
– Support of up to 300 km (same recommendation as for Metro Mirror)
– For a typical deployment of Split I/O Group, only 1/2 or 1/3 of this distance is recommended because there will be 2 or 3 times as much latency, depending on what distance extension technology is used

Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for quorum disk placement
– Quorum disk must be listed as "Extended Quorum" in the SVC Supported Hardware List
– Two ports per SVC node attached to the public SANs
– Two ports per SVC node attached to the private SANs
– SVC volume mirroring between Site 1 and Site 2
– Hosts and storage attached to the public SANs
– 3rd site quorum attached to the public SANs

Note 1: ISLs / trunks are dedicated to a Cisco VSAN to guarantee bandwidth rather than being shared

Note 2: ISLs / trunks are dedicated to a Brocade logical switch to guarantee bandwidth rather than being shared

– (i.e. ISLs are supported, LISLs and XISLs are not)

Page 55: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

55

Long distance with ISLs between SVC nodes

Technically SVC supports distances up to 8000km

SVC will tolerate a round trip delay of up to 80ms between nodes

In practice Applications are not designed to support a Write I/O latency of 80ms

Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands

– These devices are already supported for use with SVC
– No benefit or impact to inter-node communication
– Does benefit host to remote SVC I/Os
– Does benefit SVC to remote storage controller I/Os

Consequences:
– Metro Mirror is deployed for shorter distances (up to 300 km)
– Global Mirror is used for longer distances
– Split I/O Group supported distance will depend on application latency restrictions
  • 100 km for live data mobility (150 km with distance extenders)
  • 300 km for fail-over / recovery scenarios
  • SVC supports up to 80 ms latency, far greater than most application workloads would tolerate

Page 56: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

56

Split I/O Group Configuration: Examples

[Diagram: the same two-site public/private SAN layout as on the previous charts, with SVC-01 and Servers 1-2 at Site 1, SVC-02 and Servers 3-4 at Site 2, quorum candidates at both sites, and the active quorum at Site 3]

Example 1)
Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 12 km
-> SVC 6.3: Configurations with or without ISLs are supported
-> SVC 6.2: Only the configuration without ISLs is supported

Example 2)
Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 70 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported

Example 3)
Configuration without live data mobility:
VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> Metro Mirror configuration
Because of the long distance: only in an active/passive configuration

Page 57: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

57

Split I/O Group - Disaster Recovery

Split I/O groups provide distributed HA functionality

Usage of Metro Mirror / Global Mirror is recommended for disaster protection

Both major Split I/O Group sites must be connected to the MM / GM infrastructure

Without ISLs between SVC nodes: – All SVC ports can be used for MM / GM connectivity

With ISLs between SVC nodes:
– Only MM / GM connectivity to the public SAN network is supported
– Only 2 FC ports per SVC node will be available for MM or GM, and those ports will also be used for host-to-SVC and SVC-to-disk-system I/O
  • Going to limit the capabilities of the overall system, in my opinion

Page 58: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

58

Summary

SVC Split I/O Group:
– Is a very powerful solution for automatic and fast handling of storage failures
– Transparent for servers
– Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
– Transparent for all OS-based clusters
– Distances up to 300 km (SVC 6.3) are supported

Two possible scenarios:
– Without ISLs between SVC nodes (classic SVC Split I/O Group)
  • Up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
– With ISLs between SVC nodes:
  • Up to 100 km distance for live data mobility (150 km with distance extenders)
  • Up to 300 km for fail-over / recovery scenarios

Long-distance performance impact can be optimized by:
– Load distribution across both sites
– Appropriate SAN Buffer-to-Buffer credits

Page 59: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

Q&A

Page 60: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

Q&A

Page 61: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

Q&A

Page 62: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

Q&A

Page 63: Split Svc Nodes

© 2011 IBM Corporation

IBM System Storage

Q&A