© Copyright IBM Corporation, 2011
IBM Storwize V7000 Clustering and SVC Split I/O Group Deeper Dive
Bill Wiegand – ATS, Senior I/T Specialist, Storage Virtualization
2 © Copyright IBM Corporation, 2011
Agenda
Quick Basics of Virtualization
Scaling Storwize V7000 via Clustering
Scaling Storwize V7000 Unified
Q&A 1
SVC Split I/O Group V6.3
Q&A 2
3 © Copyright IBM Corporation, 2011
Virtualization – The Big Picture
[Diagram: hosts see Volumes presented by Nodes over a Storage Network; the Nodes virtualize Managed Disks]
Designed to be a redundant, modular and scalable solution
Cluster consisting of one to four I/O Groups managed as a single system
Two nodes make up an I/O Group and own given volumes
4 © Copyright IBM Corporation, 2011
Virtualization – The Big Picture
[Diagram: two I/O Groups (A and B), each a control enclosure with two nodes, presenting volumes built from Storage Pools 1-3 of MDisks]
Cluster (SVC Cluster or Storwize V7000 Clustered System): max 4 I/O Groups, built from 4 Storwize V7000 control enclosures or 8 SVC nodes
Managed Disks (MDisks): internally or externally provided; max 4096 MDisks per system
Volumes: max 8192 volumes (2048 per I/O Group), each up to 256TB in size, each assigned to a specific I/O Group and built from a specific Storage Pool
Storage Pools: max 128 Storage Pools; max 128 MDisks per pool
5 © Copyright IBM Corporation, 2011
Scale the Storwize V7000 Multiple Ways
An I/O Group is a control enclosure and its associated SAS attached expansion enclosures
Clustered system can consist of 2-4 I/O Groups
– SCORE approval for > 2
Scale Storage
– Add up to 4x the capacity
– Add up to 4x the throughput
Non-disruptive upgrades
– From smallest to largest configurations
– Purchase hardware only when you need it
Virtualize storage arrays behind Storwize V7000 for even greater capacity and throughput
[Diagram: expand a one I/O Group Storwize V7000 system by adding expansion enclosures, or cluster two to four control enclosures (each with its own expansion enclosures) into a single clustered system]
NOTE: Storwize V7000 Clustered System with greater than two I/O Groups/Frames requires SCORE/RPQ approval
No interconnection of SAS chains between control enclosures; control enclosures communicate via FC and must use all 8 FC ports on each enclosure
6 © Copyright IBM Corporation, 2011
Storwize V7000 Unified Scaling
Storwize V7000 Unified can scale disk capacity by adding up to nine expansion enclosures to the standard control enclosure
Virtualize external storage arrays behind Storwize V7000 Unified for even greater capacity
– CIFS not supported currently with externally virtualized storage
Cannot horizontally scale out by adding additional Unified systems, or even by adding just another Storwize V7000 control enclosure and associated expansion enclosures, at this time
– If a customer has a clustered Storwize V7000 system today, they will not be able to upgrade it to a Unified system in 2012 when the MES becomes available
[Diagram: a Storwize V7000 Unified one I/O Group system (control enclosure plus expansion enclosures) can only be expanded with more expansion enclosures; a 2-4 I/O Group clustered Unified system is NOT SUPPORTED]
7 © Copyright IBM Corporation, 2011
Clustered System Facts
Clustered system provides ability to independently grow capacity and performance
– Add expansion enclosures for more capacity
– Add control enclosure for more performance
– No extra feature to order and no extra charge for a clustered system
• Configure one system using the USB stick and then add the second using the GUI
Clustered systems GA support is for up to 480 SFF disk drives or 240 LFF disk drives or a mix thereof
– Up to 480TB raw capacity in one 42U rack
– Enables Storwize V7000 to compete effectively against larger EMC, NetApp, HP systems
Support for a larger system can be requested by submitting a SCORE/RPQ
– E.g. “Eight Storwize V7000 node canisters in four control enclosures”
– Up to 960TB raw capacity in two 42U racks
8 © Copyright IBM Corporation, 2011
Clustered System Facts
Adding additional control enclosures to existing V6.2+ system is non-disruptive
– Requires new control enclosures be loaded with V6.2.x minimum
Control enclosures can be any combination of models
– 2076-112, 124, 312, 324
Clustered system operates as a single storage system
– Managed via one IP address
Both node canisters in a given control enclosure are part of the same I/O Group
– Cannot create an I/O Group with one node from each of 2 different control enclosures
– Adding one node in control enclosure to an I/O Group will automatically add the other
– Storwize V7000 clustered system does not support “split I/O group” configurations - (also known as “stretch cluster”)
9 © Copyright IBM Corporation, 2011
Clustered System Facts
Inter control enclosure communication provided by a Fibre Channel (FC) SAN
– Must use all 4 FC ports on each node canister and zone all together
– All FC ports on a node canister must have at least one path to every node canister in the clustered system that is not in the same control enclosure
– Node canisters in the same control enclosure have connectivity via the PCIe link of the midplane and don’t require FC ports be zoned together
• However, recommended guideline is to zone them together as it provides a secondary path should the PCIe link have issues
Only one control enclosure can appear on a given SAS chain, and only one node canister can appear on a single strand of a SAS chain
– Key point: one control enclosure (I/O Group) has no access to the SAS attached expansion enclosures of another control enclosure (I/O Group) other than via the SAN
10 © Copyright IBM Corporation, 2011
Clustered System Facts
Currently volumes built on internal MDisks in a storage pool will be owned by the same I/O group (IOG) that owns the majority of the MDisks in that storage pool
– E.g. if Pool-1 has 3 MDisks from IOG-0 and 4 from IOG-1, then by default IOG-1 will own all volumes created in that pool
• Default GUI behavior can be overridden using the “Advanced” option in GUI
– If the pool has the exact same number of MDisks from each I/O group, then volumes will be owned by IOG-0
Expansion enclosures only communicate with their owning control enclosure, so a host I/O arriving at IOG-0 whose data resides on IOG-1 is forwarded to IOG-1 over FC
– Similar process to SVC accessing external storage systems
– Does not go through the cache on the owning I/O group but directly to the MDisk
• Uses very lowest layer of I/O stack to minimize any additional latency
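To make the default ownership rule concrete, here is a small illustrative sketch; the function name and pool representation are invented for this example and are not product code:

from collections import Counter

# Illustrative only: the pool is a list naming the I/O group that provides
# each of its MDisks, e.g. ["IOG-0", "IOG-0", "IOG-0", "IOG-1", ...].
def default_owning_io_group(mdisk_io_groups):
    counts = Counter(mdisk_io_groups)
    best = max(counts.values())
    leaders = [iog for iog, n in counts.items() if n == best]
    # A tie (same number of MDisks from each I/O group) defaults to IOG-0,
    # matching the behavior stated on this slide.
    return leaders[0] if len(leaders) == 1 else "IOG-0"

# Pool-1 from the example: 3 MDisks from IOG-0 and 4 from IOG-1 -> IOG-1
print(default_owning_io_group(["IOG-0"] * 3 + ["IOG-1"] * 4))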
11 © Copyright IBM Corporation, 2011
Clustered System Example
• Expansion enclosures are connected through one control enclosure and can be part of only one I/O group
• All MDisks are part of only one I/O group
• Storage pools can contain MDisks from more than one I/O group
• Inter-control enclosure communications happens over the SAN
• A volume is serviced by only one I/O group
All cabling shown is logical
[Diagram: I/O Group #1 (Control Enclosure #1 with two node canisters and its expansion enclosures) and I/O Group #2 (Control Enclosure #2 with two node canisters and its expansion enclosures) connected over the SAN; Storage Pools A, B and C are built from their MDisks]
12 © Copyright IBM Corporation, 2011
Storwize V7000 Clustered System – DR
An I/O Group is a control enclosure and its associated SAS attached expansion enclosures
A Clustered System can consist of 2-4 I/O Groups
– SCORE approval for > 2
Replication between clustered systems is via fibre channel ports only
– Replication between up to four clustered systems is allowed
– Requires 5639-RM1 license(s) at each site
NOTE: Storwize V7000 Clustered System with greater than 2 I/O Groups/Frames requires SCORE/RPQ approval
[Diagram: a Storwize V7000 one to four I/O Group system (control enclosures plus expansion enclosures) at the Production Site replicating to a similar one to four I/O Group system at the Disaster Recovery Site via Global Mirror or Metro Mirror]
13 © Copyright IBM Corporation, 2011
Storwize V7000 Clustered System – HA
[Diagram: a clustered system with I/O Group 1 at Production Site A and I/O Group 2 at Production Site B, separated by distance, with a host accessing a mirrored volume]
A High Availability clustered system similar to an SVC Split I/O Group configuration is not possible, since we cannot split a control enclosure in half and install it at two different sites
– One I/O Group will be at each site unlike SVC where each node in an I/O Group can be installed in a different site
• So if you lose a site you lose access to all volumes owned by that I/O Group
• There is no automatic failover of a volume from one I/O Group to another
– Volume mirroring does allow a single host volume to have pointers to two sets of data, which can be on different I/O Groups in a clustered system; but again, if you lose a site you lose the entire I/O Group, so any volumes owned by that I/O Group will be offline
• You can migrate volume ownership from the failed IOG to the other IOG, but data may be lost: unwritten data still in cache on the offline IOG is discarded during the migration, or could already have been lost if the IOG failed hard without saving cached data
14 © Copyright IBM Corporation, 2011
So That Begs the Question: Why Cluster?
One reason it is offered is because we can
– Runs the same software as SVC, which supports 1-4 I/O Groups
Can start very small and grow very large storage system with single management interface
– Helps to compete with larger midrange systems from other vendors
Can virtualize external storage too, providing the same virtualization features across the entire Clustered System
– Just like an SVC cluster, so desirable for the same reasons large SVC clusters are
However, nothing wrong with going with 1-4 separate systems versus a Clustered System if the customer prefers
– System management isn’t that hard anyway
– If customer will lose sleep over possible complete failure of a control enclosure, no matter how unlikely that is, then go with separate systems
15 © Copyright IBM Corporation, 2011
Q&A
16 © Copyright IBM Corporation, 2011
Q&A
© 2010 IBM Corporation
SVC Split I/O Group Update
Bill Wiegand / Thomas Vogel
ATS System Storage
IBM System Storage
© 2011 IBM Corporation
IBM System Storage
18
Terminology
SVC Split I/O Group Review
Long distance refresh
– WDM devices
– Buffer-to-Buffer credits
SVC Quorum disk
Split I/O Group without ISLs between SVC nodes
– Supported configurations
– SAN configuration for long distance
Split I/O Group with ISLs between SVC nodes
– Supported configurations
– SAN configuration for long distance
Agenda
© 2011 IBM Corporation
IBM System Storage
19
Terminology
SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster
– Two independent SVC nodes in two independent sites + one independent site for Quorum
– Acts just like a single I/O Group with distributed high availability
Distributed I/O Groups – NOT an HA configuration and not recommended; if one site fails:
– Manual volume move required
– Some data still in cache of the offline I/O Group
Storwize V7000 Split I/O Group not an option:
– Single enclosure includes both nodes
– Physical distribution across two sites not possible
IBM Systems and Technology Group
© 2008 IBM Corporation20
SVC – What is a Failure Domain?
Generally a failure domain will represent a physical location, but it depends on what type of failure you are trying to protect against
– Could all be in one building, on different floors/rooms, or just different power domains in the same data center
– Could be multiple buildings on the same campus
– Could be multiple buildings up to 300KM apart
Key is the quorum disk
– If you only have two physical sites and the quorum disk has to be in one of them, then some failure scenarios won’t allow the cluster to survive
– Minimum is to have the active quorum disk system on a separate power grid in one of the two failure domains
IBM Systems and Technology Group
© 2008 IBM Corporation21
SVC – How Quorum Disks Affect Availability (1)
[Diagram: Node 1 in Failure Domain 1 and Node 2 in Failure Domain 2 connected by ISL 1 and ISL 2, with Volume Mirroring across the two storage systems; quorum candidates SVC Quorum 1 (active), 2 and 3]
1) Loss of active quorum:
– SVC selects another quorum
– Continuation of operations
2) Loss of storage system:
– Loss of active quorum
– SVC selects another quorum
– Continuation of operations
– Mirrored Volumes continue operation but may take 60sec or more since the active quorum disk failed
Note: The loss of all quorum disks will not cause the cluster to stop as long as a majority of the nodes in the cluster remain operational. However, mirrored Volumes will likely go offline. This is why you would manually configure the cluster so the quorum disk candidates are located on disk systems in both failure domains.
IBM Systems and Technology Group
© 2008 IBM Corporation22
SVC – How Quorum Disks Affect Availability (2)
[Diagram: same two failure domains, with the active quorum (SVC Quorum 1) located in Failure Domain 2]
Loss of Failure Domain 1:
– Active quorum not affected
– Continuation of operations
Loss of Failure Domain 2:
– Active quorum lost
– Half of the nodes lost
– Loss of cluster majority
– Node 1 cannot utilize a quorum candidate to recover and survive (no access to data on disk, no active quorum)
– Node 1 shuts down and the cluster is stopped
– May not be recoverable and may require a cluster rebuild and data restore from backups
IBM Systems and Technology Group
© 2008 IBM Corporation23
Current Supported Configuration for Split I/O Group
[Diagram: Node 1 in Failure Domain 1 and Node 2 in Failure Domain 2 with Volume Mirroring over ISL 1 and ISL 2; the active quorum (SVC Quorum 1) sits in Failure Domain 3 on a disk system that supports “Extended Quorum”, with quorum candidates 2 and 3 at the main sites]
Automated failover with SVC handling the loss of:
– SVC node
– Quorum disk
– Storage subsystem
Can incorporate MM/GM to provide disaster recovery
– 3-site-like capability
© 2011 IBM Corporation
IBM System Storage
24
SVC Split I/O Group
Site 1 SVC Node 1 | Site 2 SVC Node 2 | Site 3 Quorum disk | Cluster Status
Operational | Operational | Operational | Operational, optimal
Failed | Operational | Operational | Operational, write cache disabled
Operational | Failed | Operational | Operational, write cache disabled
Operational | Operational | Failed | Operational, write cache enabled, but different active quorum disk
Operational, link to Site 2 failed: split brain | Operational, link to Site 1 failed: split brain | Operational | Whichever node accesses the active quorum disk first survives; the partner node goes offline
Operational | Failed at same time as Site 3 | Failed at same time as Site 2 | Stopped
Failed at same time as Site 3 | Operational | Failed at same time as Site 1 | Stopped
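The table can also be read as a simple decision rule. The sketch below is illustrative only (function and parameter names are invented); the split-brain row, which depends on the inter-site link rather than component failures, is not modeled:

# Illustrative only: the failure-scenario table above expressed as a rule.
def cluster_status(site1_node_ok, site2_node_ok, site3_quorum_ok):
    if site1_node_ok and site2_node_ok and site3_quorum_ok:
        return "Operational, optimal"
    if site1_node_ok and site2_node_ok:
        return "Operational, write cache enabled, different active quorum disk"
    if site3_quorum_ok and (site1_node_ok or site2_node_ok):
        return "Operational, write cache disabled"
    # A node failing at the same time as the Site 3 quorum stops the cluster
    return "Stopped"

print(cluster_status(True, False, True))   # -> Operational, write cache disabled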
© 2011 IBM Corporation
IBM System Storage
25
Advantages / Disadvantages of SVC Split I/O Group
Advantages
– No manual intervention required
– Automatic and fast handling of storage failures
– Volumes mirrored in both locations
– Transparent for servers and host based clusters
– Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
Disadvantages
– Mix between an HA and a DR solution, but not a true DR solution
– Non-trivial implementation
© Copyright IBM Corporation, 2011
SVC Split I/O Group V6.3 Enhancements
27 © Copyright IBM Corporation, 2011
Split I/O Group – Physical Configurations
The following charts show supported physical configurations for the new Split I/O Group support in V6.3
– VSANs (CISCO) and Virtual Fabrics (Brocade) are not supported by all switch models from the respective vendors
• Consult vendor for further information
Enhancements designed to help us compete more effectively with EMC VPLEX at longer distances
Note that this information is all very new even to ATS, and some requirements could change prior to GA
Highly recommend engaging ATS for solution design review
– w3.ibm.com/support/techxpress
Storwize V7000 does not provide any sort of split I/O group, split cluster, stretch cluster HA configurations
– A clustered Storwize V7000 provides the ability to grow system capacity and scale performance within a localized single system image
28 © Copyright IBM Corporation, 2011
Extension of Currently Supported Configuration
[Diagram: SVC nodes with UPS at each site; the local SANs at the two sites are cross-connected via active DWDM over shared single mode fibre(s); user chooses the number of ISLs on each SAN]
Two ports per SVC node attached to local SANs
Two ports per SVC node attached to remote SANs via DWDM
Hosts and storage attached to SANs via ISLs sufficient for workload
3rd site quorum (not shown) attached to SANs
0-10 KM Fibre Channel distance supported up to 8Gbps
11-20KM Fibre Channel distance supported up to 4Gbps
21-40KM Fibre Channel distance supported up to 2Gbps
29 © Copyright IBM Corporation, 2011
Configuration With 4 Switches at Each Site
[Diagram: at each site, SVC nodes with UPS attach to two private SANs and two public SANs (4 switches per site); the private SANs are linked site-to-site by 1 ISL per I/O group configured as a trunk, and the user chooses the number of ISLs on the public SANs]
Two ports per SVC node attached to public SANs
Two ports per SVC node attached to private SANs
Hosts and storage attached to public SANs
3rd site quorum (not shown) attached to public SANs
30 © Copyright IBM Corporation, 2011
Configuration Using CISCO VSANs
[Diagram: the same two-site layout, with each physical switch partitioned into a public VSAN and a private VSAN]
Switches are partitioned using VSANs
Note: ISLs/Trunks for private VSANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic
31 © Copyright IBM Corporation, 2011
Configuration Using Brocade Virtual Fabrics
[Diagram: the same two-site layout, with each physical switch partitioned into a public and a private logical switch]
Physical switches are partitioned into two logical switches
Note: ISLs/Trunks for private SANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic
© 2009 IBM Corporation32
Split I/O Group – Distance
■ The new Split I/O Group configurations will support distances of up to 300km (same recommendation as for Metro Mirror)
■ However for the typical deployment of a split I/O group only 1/2 or 1/3rd of this distance is recommended because there will be 2 or 3 times as much latency depending on what distance extension technology is used
■ The following charts explain why
© 2009 IBM Corporation33
Metro Mirror
Technically SVC supports distances up to 8000km
SVC will tolerate a round trip delay of up to 80ms between nodes
The same code is used for all inter-node communication: Global Mirror, Metro Mirror, Cache Mirroring, Clustering
SVC’s proprietary SCSI protocol only has 1 round trip
In practice, applications are not designed to support a Write I/O latency of 80ms
Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances
© 2009 IBM Corporation34
Metro Mirror: Application Latency = 1 long distance round trip
[Diagram: Server Cluster 1 and SVC Cluster 1 in Data center 1; Server Cluster 2 and SVC Cluster 2 in Data center 2]
• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Metro Mirror data transfer to remote site (1 long distance round trip)
5) Acknowledgment
6) Write completed to host
7a) Write request from SVC  8a) Xfer ready to SVC  9a) Data transfer from SVC  10a) Write completed to SVC (local storage)
7b) Write request from SVC  8b) Xfer ready to SVC  9b) Data transfer from SVC  10b) Write completed to SVC (remote storage)
© 2009 IBM Corporation35
Split I/O Group for Business Continuity
Split I/O Group splits the nodes in an I/O group across two sites
SVC will tolerate a round trip delay of up to 80ms
Cache Mirroring traffic rather than Metro Mirror traffic is sent across the inter-site link
Data is mirrored to back-end storage using Volume Mirroring
Data is written by the 'preferred' node to both the local and remote storage
The SCSI Write protocol results in 2 round trips
This latency is generally hidden from the Application by the write cache
© 2009 IBM Corporation36
Split I/O Group – Local I/O: Application Latency = 1 round trip
[Diagram: Server Cluster 1 and SVC Node 1 in Data center 1; Server Cluster 2 and SVC Node 2 in Data center 2, both nodes forming one SVC Split I/O Group]
• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Cache Mirror data transfer to remote site (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC  8b) Xfer ready to SVC  9b) Data transfer from SVC  10b) Write completed to SVC (2 round trips, but the SVC write cache hides this latency from the host)
© 2009 IBM Corporation37
Split I/O Group for Mobility
• Split I/O Group is also often used to move workload between servers at different sites
• VMotion or equivalent can be used to move Applications between servers
• Applications no longer necessarily issue I/O requests to the local SVC nodes
• SCSI Write commands from hosts to remote SVC nodes result in an additional 2 round trips' worth of latency that is visible to the Application
© 2009 IBM Corporation38
Split I/O Group – Remote I/O: Application Latency = 3 round trips
[Diagram: the host in Data center 1 issues its writes to the remote SVC node (Node 2) in Data center 2 of the SVC Split I/O Group]
• Steps 1 to 6 affect application latency
• Steps 7 to 10 should not affect the application
1) Write request from host
2) Xfer ready to host
3) Data transfer from host (host to remote SVC node: 2 round trips)
4) Cache Mirror data transfer to remote site (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC  8b) Xfer ready to SVC  9b) Data transfer from SVC  10b) Write completed to SVC (2 round trips, but the SVC write cache hides this latency from the host)
© 2009 IBM Corporation39
Split I/O Group for Mobility
Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands
These devices are already supported for use with SVC
No benefit or impact for inter-node communication
Does benefit Host to remote SVC I/Os
Does benefit SVC to remote Storage Controller I/Os
© 2009 IBM Corporation40
Split I/O Group – Remote I/O: Application Latency = 2 round trips
[Diagram: distance extenders sit between Data center 1 and Data center 2; the host in Data center 1 writes to the remote SVC node of the Split I/O Group]
• Steps 1 to 12 affect application latency
• Steps 13 to 22 should not affect the application
1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Write + data transfer to remote site (1 round trip)
5) Write request to SVC
6) Xfer ready from SVC
7) Data transfer to SVC
8) Cache Mirror data transfer to remote site (1 round trip)
9) Acknowledgment
10) Write completed from SVC
11) Write completion to remote site
12) Write completed to host
13) Write request from SVC
14) Xfer ready to SVC
15) Data transfer from SVC
16) Write + data transfer to remote site (1 round trip, hidden from the host)
17) Write request to storage
18) Xfer ready from storage
19) Data transfer to storage
20) Write completed from storage
21) Write completion to remote site
22) Write completed to SVC
© 2011 IBM Corporation
IBM System Storage
41
Long Distance Impact
Additional latency because of long distance
Light speed in glass: ~200,000 km/sec
– 1 km distance = 2 km round trip
Additional round trip time because of distance:
– 1 km = 0.01 ms
– 10 km = 0.10 ms
– 25 km = 0.25 ms
– 100 km = 1.00 ms
– 300 km = 3.00 ms
SCSI protocol:
– Read: 1 I/O operation = 0.01 ms / km
  • Initiator requests data and target provides data
– Write: 2 I/O operations = 0.02 ms / km
  • Initiator announces amount of data, target acknowledges
  • Initiator sends data, target acknowledges
– SVC’s proprietary SCSI protocol for node-to-node traffic has only 1 round trip
Fibre Channel frame:
– User data per FC frame (Fibre Channel payload): up to 2048 bytes = 2KB
  • Even for very small user data (< 2KB) a complete frame is required
  • Large user data is split across multiple frames
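The arithmetic above can be sketched in a few lines. This is illustrative only: it simply applies the ~200,000 km/s figure and the round-trip counts listed on this slide:

# A minimal sketch of the distance math on this slide (illustrative only).
def added_latency_ms(distance_km, round_trips):
    ms_per_km_round_trip = 2 / 200_000 * 1000   # 2 km of fibre per km of distance = 0.01 ms
    return distance_km * ms_per_km_round_trip * round_trips

for distance in (1, 10, 25, 100, 300):
    print(f"{distance:>4} km: "
          f"read {added_latency_ms(distance, 1):.2f} ms, "
          f"SCSI write {added_latency_ms(distance, 2):.2f} ms, "
          f"SVC node-to-node {added_latency_ms(distance, 1):.2f} ms")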
© 2011 IBM Corporation
IBM System Storage
42
Passive/Active WDM devices
Passive WDM
No power required
Can use CWDM or DWDM technology
Colored SFPs required – they create the different wavelengths
Customer must own the physical cable end to end
– Renting individual wavelengths from a service provider is not possible
Limited equipment cost
Max distance 70km depending on SFP
Active WDM
Power required
Can use CWDM or DWDM technology
Change incoming/outgoing wavelengths
Adds negligible latency because of signal change
Consolidate multiple wavelengths in one cable
No dedicated link required
– Customers can rent individual frequencies
High equipment cost
Longer distances supported
© 2011 IBM Corporation
IBM System Storage
43
CWDM / DWDM Devices
CWDM (Coarse Wavelength Division Multiplex)
16 or 32 wavelengths into a fibre
Uses wide-range frequencies
Wider channel spacing - 20nm (2.5THz grid)
CWDM Spectrum
WDM means Wavelength Division Multiplexing
Parallel transmission of a number of wavelengths over a fiber
DWDM (Dense Wavelength Division Multiplex )
32, 64 or 128 wavelengths into a fibre
Narrow frequencies
Narrow channel spacing - e.g. 0.8nm (100GHz grid)
DWDM Spectrum
© 2011 IBM Corporation
IBM System Storage
44
FSP 3000
Advanced features through usage of active xWDM technology
[Diagram: passive and active FSP 3000 configurations, with transponders (TXP) and TDM muxes aggregating 2G, 4G, 8G and 10G client links onto wavelengths of up to 100G]
Higher capacity (more channels per fiber)
Higher aggregate bandwidth (up to 100G per wavelength)
Higher distance (up to 200 km without mid-span amplifier)
More secure (automated fail over, NMS, optical monitoring tools, embedded encryption)
Source: ADVA
WDM Optical Networking: Passive vs. Active Solutions
© 2011 IBM Corporation
IBM System Storage
45
SAN and Buffer-to-Buffer Credits
Buffer-to-Buffer (B2B) credits
– Are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store
  • Enough credits to keep the link full provides best performance
Light must cover the distance 2 times
– Submit data from Node 1 to Node 2
– Submit acknowledge from Node 2 back to Node 1
B2B calculation depends on link speed and distance
– The number of frames in flight increases in proportion to the link speed
© 2011 IBM Corporation
IBM System Storage
46
SVC Split I/O Group – Quorum Disk
SVC creates three Quorum disk candidates on the first three managed MDisks
One Quorum disk is active
SVC 5.1 and later:
– SVC is able to handle the Quorum disk management in a very flexible way, but in a Split I/O Group configuration a well defined setup is required
– -> Disable the dynamic quorum feature using the “override” flag for V6.2 and later
  • svctask chquorum -MDisk <mdisk_id or name> -override yes
  • This flag is currently not configurable in the GUI
“Split Brain” situation:
– SVC uses the quorum disk to decide which SVC node(s) should survive
No access to the active Quorum disk:
– In a standard situation (no split brain): SVC will select one of the other Quorum candidates as active Quorum
– In a split brain situation: SVC may take mirrored Volumes offline
© 2011 IBM Corporation
IBM System Storage
47
SVC Split I/O Group – Quorum Disk
Quorum disk requirements:
– Must be placed in a third, independent site
– Must be Fibre Channel connected
– ISLs with one hop to the Quorum storage system are supported
Supported infrastructure:
– WDM equipment similar to Metro Mirror
– Link requirement similar to Metro Mirror
  • Max round trip delay time is 80 ms, 40 ms each direction
– FCIP to the Quorum disk can be used with the following requirements:
  • Max round trip delay time is 80 ms, 40 ms each direction
  • The fabrics are not merged, so routers are required
Independent long distance equipment from each site to Site 3 is required
iSCSI storage not supported
Requirement for active / passive storage devices (like DS3/4/5K):
– Each quorum disk storage controller must be connected to both sites
© 2011 IBM Corporation
IBM System Storage
48
Split I/O Group without ISLs between SVC nodes (Classic Split I/O Group)
Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps
SVC 6.3:
– Similar to the support statement in SVC 6.2
– Additional: support for active WDM devices
– Quorum disk requirements similar to Remote Copy (MM/GM) requirements:
  • Max. 80 ms round trip delay time, 40 ms each direction
  • FCIP connectivity supported
  • No support for iSCSI storage systems
SVC 6.2 and earlier:
– Two ports on each SVC node needed to be connected to the “remote” switch
– No ISLs between SVC nodes
– Third site required for Quorum disk
– ISLs with max. 1 hop can be used for Server traffic and Quorum disk attachment
SVC 6.2 (late) update:
– Distance extension to max. 40 km with passive WDM devices
  • Up to 20 km at 4Gb/s or up to 40 km at 2Gb/s
  • LongWave SFPs required for long distances
  • LongWave SFPs must be supported by the switch vendor
© 2011 IBM Corporation
IBM System Storage
49
Split I/O Group without ISLs between SVC nodes
Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for Quorum disk placement
– Quorum disk must be listed as “Extended Quorum” in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the “remote” switches
– SVC Volume mirroring between Site 1 and Site 2
Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps
[Diagram: SVC node1 at Site 1 and SVC node2 at Site 2, each connected to the switches, servers and storage at both sites; the active quorum sits at Site 3]
© 2011 IBM Corporation
IBM System Storage
50
Split I/O Group without ISLs between SVC nodes
Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for Quorum disk placement
– Quorum disk must be listed as “Extended Quorum” in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the “remote” switch
– SVC Volume mirroring between Site 1 and Site 2
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites
Distance extension to max. 40 km with WDM devices
Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps
[Diagram: SVC node1 and node2 at Sites 1 and 2 with server ISLs between the site switches; the active quorum sits at Site 3]
© 2011 IBM Corporation
IBM System Storage
51
Split I/O Group without ISLs between SVC nodes
Quorum devices with an active / passive controller without I/O re-routing (for example DS3/4/5K) must have both controllers connected from each site
[Diagram: the same two-site layout, with a DS4700 at Site 3 holding the active quorum; controllers Ctl. A and Ctl. B are each reachable from both sites]
Supported configuration
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for Quorum disk placement
– Quorum disk must be listed as “Extended Quorum” in the SVC Supported Hardware List
– Two ports on each SVC node need to be connected to the “remote” switch
– SVC Volume mirroring between Site 1 and Site 2
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites
Distance extension to max. 40 km with WDM devices
Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps
© 2011 IBM Corporation
IBM System Storage
52
Split I/O Group without ISLs: Long distance configuration
SVC Buffer-to-Buffer credits
– 2145-CF8 / CG8 nodes have 41 B2B credits
  • Enough for 10 km at 8Gb/sec with a 2 KB payload
– All earlier models:
  • Use 1/2/4Gb/sec Fibre Channel adapters
  • Have 8 B2B credits, which is enough for 4 km at 4Gb/sec
Recommendation 1:
– Use CF8 / CG8 nodes for more than 4 km distance for best performance
Recommendation 2:
– SAN switches do not auto-negotiate B2B credits, and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well
Link speed | FC frame length | Required B2B credits for 10 km distance | Max distance with 8 B2B credits
1 Gb/sec   | 1 km            | 5                                       | 16 km
2 Gb/sec   | 0.5 km          | 10                                      | 8 km
4 Gb/sec   | 0.25 km         | 20                                      | 4 km
8 Gb/sec   | 0.125 km        | 40                                      | 2 km
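As an illustration only, the rule-of-thumb figures in the table can be reproduced with a small sketch (names invented; assumes full 2 KB frames as in the table):

import math

def frame_length_km(link_speed_gbps):
    # 1 km at 1 Gb/s, 0.125 km at 8 Gb/s, per the table above
    return 1.0 / link_speed_gbps

def required_b2b_credits(distance_km, link_speed_gbps):
    # Credits scale with link speed for a given distance
    return math.ceil(distance_km * link_speed_gbps / 2)

def max_distance_km(credits, link_speed_gbps):
    return credits * 2 / link_speed_gbps

for speed in (1, 2, 4, 8):
    print(f"{speed} Gb/s: frame {frame_length_km(speed):.3f} km, "
          f"{required_b2b_credits(10, speed)} credits for 10 km, "
          f"{max_distance_km(8, speed):.0f} km max with 8 credits")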
© 2011 IBM Corporation
IBM System Storage
53
Split I/O Group with ISLs between SVC nodes
Support with SVC 6.3:
– Supports Metro Mirror distances between nodes
– Third site required for Quorum disk
– ISLs with max. 1 hop can be used for:
  • Quorum traffic
  • SVC node to node communication
– Requires a dedicated private SAN used only for inter-node traffic (which can be a Brocade virtual fabric or a Cisco VSAN)
– Requires one ISL for each I/O Group between the private SANs at each site
Maximum distances:
– 100 km for live data mobility (150 km with distance extenders)
– 300 km for fail-over / recovery scenarios
  • SVC supports up to 80ms latency, far greater than most application workloads would tolerate
– The two sites can be connected using active or passive technologies such as CWDM / DWDM if desired
Supported infrastructure:
– WDM equipment similar to Metro Mirror
– Link requirement similar to Metro Mirror
[Diagram: at each site, SVC nodes attach to a private SAN and a public SAN; the private and public SANs are linked site-to-site via ISLs over WDM; servers attach to the public SANs; the active quorum and quorum candidates sit at Site 3 on a controller with Ctl. A and Ctl. B reachable from both sites]
© 2011 IBM Corporation
IBM System Storage
54
Split I/O Group with ISLs between SVC nodes
[Diagram: the same two-site private/public SAN Split I/O Group layout as the previous slide, with WDM links between sites and the quorum at Site 3]
WDM devices:
– Same link and device requirements as for Metro Mirror
Distances:
– Support of up to 300 km (same recommendation as for Metro Mirror)
– For a typical deployment of Split I/O Group only 1/2 or 1/3rd of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used
Supported configuration:
– Site 1 and Site 2 are connected via Fibre Channel connections
– A third site is required for Quorum disk placement
– Quorum disk must be listed as “Extended Quorum” in the SVC Supported Hardware List
– Two ports per SVC node attached to private SANs
– Two ports per SVC node attached to public SANs
– SVC Volume mirroring between Site 1 and Site 2
– Hosts and storage attached to public SANs
– 3rd site quorum attached to public SANs
Note 1: ISLs / Trunks are dedicated to a Cisco VSAN to guarantee bandwidth rather than being shared
Note 2: ISLs / Trunks are dedicated to a Brocade logical switch to guarantee bandwidth rather than being shared
– (i.e. ISLs are supported, LISLs and XISLs are not)
© 2011 IBM Corporation
IBM System Storage
55
Long distance with ISLs between SVC nodes
Technically SVC supports distances up to 8000km
SVC will tolerate a round trip delay of up to 80ms between nodes
In practice Applications are not designed to support a Write I/O latency of 80ms
Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands
– These devices are already supported for use with SVC
– No benefit or impact for inter-node communication
– Does benefit Host to remote SVC I/Os
– Does benefit SVC to remote Storage Controller I/Os
Consequences:
– Metro Mirror is deployed for shorter distances (up to 300 km)
– Global Mirror is used for longer distances
– Split I/O Group supported distance will depend on application latency restrictions
  • 100 km for live data mobility (150 km with distance extenders)
  • 300 km for fail-over / recovery scenarios
  • SVC supports up to 80ms latency, far greater than most application workloads would tolerate
© 2011 IBM Corporation
IBM System Storage
56
Split I/O Group Configuration: Examples
[Diagram: the two-site private/public SAN Split I/O Group layout with WDM links between sites and the quorum at Site 3, as shown on the previous slides]
Example 1)
Configuration with live data mobility: VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 12 km
-> SVC 6.3: configurations with or without ISLs are supported
-> SVC 6.2: only the configuration without ISLs is supported
Example 2)
Configuration with live data mobility: VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 70 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported
Example 3)
Configuration without live data mobility: VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> Metro Mirror configuration
Because of the long distance: only in an active / passive configuration
© 2011 IBM Corporation
IBM System Storage
57
Split I/O Group - Disaster Recovery
Split I/O groups provide distributed HA functionality
Usage of Metro Mirror / Global Mirror is recommended for disaster protection
Both major Split I/O Group sites must be connected to the MM / GM infrastructure
Without ISLs between SVC nodes:
– All SVC ports can be used for MM / GM connectivity
With ISLs between SVC nodes:
– Only MM / GM connectivity to the public SAN network is supported
– Only 2 FC ports per SVC node will be available for MM or GM, and these will also be used for host to SVC and SVC to disk system I/O
  • Going to limit the capabilities of the overall system in my opinion
© 2011 IBM Corporation
IBM System Storage
58
Summary
SVC Split I/O Group:
– Is a very powerful solution for automatic and fast handling of storage failures
– Transparent for servers
– Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
– Transparent for all OS based clusters
– Distances up to 300 km (SVC 6.3) are supported
Two possible scenarios:
– Without ISLs between SVC nodes (classic SVC Split I/O Group)
  • Up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
– With ISLs between SVC nodes:
  • Up to 100 km distance for live data mobility (150 km with distance extenders)
  • Up to 300 km for fail-over / recovery scenarios
Long distance performance impact can be optimized by:
– Load distribution across both sites
– Appropriate SAN Buffer-to-Buffer credits
© 2011 IBM Corporation
IBM System Storage
Q&A