Virtual Geek 

an insider's perspective, technical tips n' tricks in the era of the VMware Revolution

September 21, 2009

 A “Multivendor Post” on using iSCSI with VMware vSphere

One of the most popular posts we’ve ever done was the original “A ‘Multivendor Post’ to help our mutual iSCSI

customers using VMware” that focused on the operation of the software iSCSI initiator in ESX 3.5 with several iSCSI

targets from multiple vendors. There’s been a lot of demand for a follow-up, so without further ado, here’s a

multivendor collaborative effort on an update, which draws extensively on content from VMworld 2009 sessions

TA2467 and TA3264. The post was authored by the following vendors and people: VMware (Andy Banta), EMC (Chad

Sakac), NetApp (Vaughn Stewart), Dell/EqualLogic (Eric Schott), HP/LeftHand Networks (Adam Carter)

One important note – in this post (and going forward we’ll be trying to do this consistently) all commands,

configurations and features noted apply to vSphere ESX and ESXi equally. Command line formats are those used

when using the VMware vMA, which can be used with both ESX and ESXi. Alternate command line variants are

possible when using the remote CLI or service console, but we’re standardizing on the vMA.

This post covers a broad spectrum of topics surrounding the main point (changes in, and configuration of the iSCSI

software initiator in vSphere ESX/ESXi 4) including:

Multipathing using NMP and the Pluggable Storage Architecture

 vmknic/vSwitch setup

Subnet configuration

Jumbo Frames

Delayed Ack 

Other configuration recommendations

If this sounds interesting – or you’re a customer using (or considering using!) iSCSI and vSphere 4 – read on!

First topic: core changes in the iSCSI software initiator from ESX 3.x.

The ESX software iSCSI initiator was completely rewritten for vSphere 4. This was done primarily for performance

reasons, but also because the vSphere 4 compatibility base for Linux drivers transitioned from the 2.4 kernel to the 2.6

kernel.  Remember that while the vmkernel doesn’t “run” Linux, or “run on Linux” – the core driver 

stack has common elements with Linux.  Along with the service console running a Linux variant, these are the

two common sources of the “VMware runs on Linux” theory – which is decidedly incorrect.

As an aside, there is also an interest in publishing an iSCSI HBA DDK, allowing HBA vendors to write and supply their

own drivers, decoupled from ESX releases. The changes could also allow storage vendors to write and supply 

components to manage sessions to make better use of the pluggable multipathing capability delivered in ESX4.

(Neither the HBA DDK nor the session capability has been released yet. Development, documentation and

certification suites are still underway.)

Some of the goodness that was in ESXi 3.5 has also made it into all ESX 4 versions:

The requirement for a Console OS port on your iSCSI network has been removed. All iSCSI control path

operations are done through the same vmkernel port used for the data path. This compares with ESX 3.x where

iSCSI control operations required a console port. This is a very good thing: no console port needed for 

 ESX4 – all versions.

Enabling the iSCSI service also automatically configures all the firewall properties needed.

Performance is improved several ways:

Storage paths are more efficient and keep copying and potentially blocking operations to a minimum.

Systems using Intel Nehalem processors can offload digest calculation to the processors' built-in CRC calculation engine.

However, the biggest performance gain is allowing the storage system to scale to the number of NICs available

on the system. The idea is that the storage multipath system can make better use of the multiple paths it has

available to it than NIC teaming at the network layer.

If each physical NIC on the system looks like a port to a path to storage, the storage path selection policies can

make better use of them.

Second topic: iSCSI Multipathing

This is perhaps the most important change in the vSphere iSCSI stack.

iSCSI Multipathing is sometimes also referred to as "port binding." However, this term is ambiguous enough (often

it incorrectly makes people think of "link aggregation") that we should come up with a better term…

By default, iSCSI multipathing is not enabled in vSphere4. The ESX iSCSI initiator uses vmkernel networking similarly 

to ESX 3.5, out of the box. The initiator presents a single endpoint and NIC teaming through the ESX vSwitch takes

care of choosing the NIC. This allows easy upgrades from 3.5 and simple configuration of basic iSCSI setups.

Setting up iSCSI multipathing requires some extra effort because of the additional layer of virtualization provided by 

the vSwitch. The ESX vmkernel networking stack, used by the iSCSI initiator, communicates with virtual vmkernel

NICs, or vmkNICs. The vmkNICs are attached to a virtual switch, or vswitch, that is then attached to physical NICs.

Once iSCSI multipathing is set up, each port on the ESX system has its own IP address, but they all share the same

iSCSI initiator iqn. name.

 So – setup in 4 easy steps:

 Step 1 – configure multiple vmkNICs

Ok, the first obvious step (but we're not making any assumptions) is that you will need to configure multiple physical

Ethernet interfaces, and multiple vmkernel NIC (vmkNIC) ports, as shown in the screenshot below.

You do this by navigating to the Properties dialog for a vSwitch and selecting "Add", or by simply clicking on "Add

Networking" and adding additional vmkNICs.

This can also be done via the command line:

esxcfg-vmknic --server <servername> -a -i 10.11.246.51 -n 255.255.255.0
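For example – a minimal sketch, assuming a vSwitch named vSwitch1 and two port groups named iSCSI1 and iSCSI2 (the names are illustrative; the IP addresses reuse the example addressing shown later in this post):

esxcfg-vswitch --server <servername> -A iSCSI1 vSwitch1

esxcfg-vswitch --server <servername> -A iSCSI2 vSwitch1

esxcfg-vmknic --server <servername> -a -i 10.11.246.50 -n 255.255.255.0 iSCSI1

esxcfg-vmknic --server <servername> -a -i 10.11.246.51 -n 255.255.255.0 iSCSI2

esxcfg-vmknic --server <servername> -l

The final listing lets you confirm each vmkNIC landed on its intended port group before moving on to the next step.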

 Note: certain vmkNIC parameters (such as jumbo frame configuration) can only be done as the vmkNIC is being

initially configured. Changing them subsequently requires removing and re-adding the vmkNIC. For the jumbo frame

example, see that section later in this post.

 Step 2 – configure explicit vmkNIC-to-vmNIC binding.

To make sure the vmkNICs used by the iSCSI initiator are actual paths to storage, ESX configuration requires that the

vmkNIC be connected to a portgroup that only has one active uplink and no standby uplinks. This way, if the uplink is

unavailable, the storage path is down and the storage multipathing code can choose a different path.  Let’s be

 REALLY clear about this – you shouldn’t use link aggregation techniques with iSCSI – you

should/will use MPIO (which defines end-to-end paths from initiator to target). This isn't saying that

these techniques are bad (they are often needed in the NFS datastore use case) – but remember that block storage models use

MPIO in the storage stack, not the networking stack, for multipathing behavior.

Setting up the vmkNICs to use only a single uplink can be done through the UI, as shown below – just select the

adapter in the "active" list and move it down to "unused adapters", such that each vmkNIC used for iSCSI has only 

one active physical adapter.

Instructions for doing this are found in Chapter 3 of the iSCSI SAN Configuration Guide, currently page 32.
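As a quick command-line sanity check, you can also list the vSwitches, their port groups and their uplinks (this only shows the vSwitch-level view – the per-port-group active/unused ordering itself is set in the UI as described above):

esxcfg-vswitch --server <servername> -l

Each iSCSI port group should show up on a vSwitch that has the expected physical uplinks attached.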

 Step 3 – configuring the iSCSI initiator to use the multiple vmkNICs

Then the final step requires command line configuration. This step is where you assign, or bind, the vmkNICs to the

ESX iSCSI initiator. Once the vmkNICs are assigned, the iSCSI initiator uses these specific vmkNICs as outbound

ports, rather than the vmkernel routing table. Get the list of the vmkNICs used for iSCSI (in the screenshot below, this

was done using the vicfg-vmknic --server <servername> -l command).

Then, explicitly tell the iSCSI software initiator to use all the appropriate iSCSI vmkNICs using the following

command:

esxcli --server <servername> swiscsi nic add -n <VMkernelportname> -d <vmhbaname>

To identify the vmhba name, navigate to the “Configuration” tab in the vSphere client, and select “Storage Adapters”.

 You’ll see a screen like the one below. In the screenshot below, the vmhba_name is “vmhba38”. Note also in the

screenshot below, the 2 devices have four paths.
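As a concrete (purely illustrative) example using the names from this post – vmk0 and vmk1 bound to vmhba38 – followed by a listing to confirm the binding:

esxcli --server <servername> swiscsi nic add -n vmk0 -d vmhba38

esxcli --server <servername> swiscsi nic add -n vmk1 -d vmhba38

esxcli --server <servername> swiscsi nic list -d vmhba38

After binding, rescan the adapter (or use the "Rescan" link in the vSphere client) so the additional sessions and paths show up.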

The end result of this configuration is that you end up with multiple paths to your storage. How many depends on your

particular iSCSI target (storage vendor type). The iSCSI initiator will log in to each iSCSI target reported by the “Send

targets” command issued to the iSCSI target listed in the “Dynamic Discovery” dialog box from each iSCSI vmkNIC.

Before we proceed we need to introduce the storage concept of a storage portal to those who may not be familiar

 with iSCSI. At a high level an iSCSI portal is the IP address(es) and port number of a SCSI storage target. Each storage vendor may implement storage portals in slightly different manners.

In storage nomenclature you will see a device's “runtime name” represented in the following format: vmhba#:C#:T#:L#.

The C represents a controller, the T is the SCSI target, and the L represents the LUN.

 With single-portal storage, such as EqualLogic or LeftHand systems, you'll get as many paths to the storage as you

have vmkNICs (Up to the ESX maximum of 8 per LUN/Volume) for iSCSI use. These storage systems only advertise a

single storage port, even though connections are redirected to other ports, so ESX establishes one path from each

server connection point (the vmkNICs) to the single storage port.

A single-portal variation is an EMC Celerra iSCSI target. In the EMC Celerra case, a large number of iSCSI targets can

 be configured, but a LUN exists behind a single target – and the Celerra doesn’t redirect in the way EqualLogic or

Lefthand do. In the EMC Celerra case, configure an iSCSI target network portal with multiple IP addresses. This is

done by simply assigning multiple logical (or physical) interfaces to a single iSCSI target. ESX will establish one path

from each server connection (the vmkNICs) to all the IP addresses of the network portal.

 Yet other storage advertises multiple ports for the storage, either with a separate target iqn. name or with different

target portal group tags (pieces of information returned to the server from the storage during initial discovery). These

multi-portal storage systems, such as EMC CLARiiON, NetApp FAS, and IBM N-Series, allow paths to be established

between each server NIC and each storage portal. So, if your server has three vmkNICs assigned for iSCSI and your

storage has two portals, you'll end up with six paths.

These variations shouldn’t be viewed as intrinsically better/worse (at least for the purposes of this multivendor post –

let’s leave positioning to the respective sales teams). Each array has a different model for how iSCSI works.

There are some limitations for multiple-portal storage that require particular consideration. For example, EMC

CLARiiON currently only allows a single login to each portal from each initiator iqn. Since all of the initiator ports

have the same iqn., this type of storage rejects the second login. (You can find log messages about this with logins

failing reason 0x03 0x02, "Out of Resources."). You can work around this problem by using the subnet configuration

described here. Details on the CLARiiON iSCSI target configuration and multipathing state can be seen in the EMC

Storage Viewer vCenter plugin.

By default storage arrays from NetApp, including the IBM N-Series, provide an iSCSI portal for every IP address on the

controller. This setting can be modified by implementing access lists and / or disabling iSCSI access on physical

Ethernet ports. The NetApp Rapid Cloning Utility provides an automated means to configure these settings from

 within vCenter.

 Note that iSCSI Multipathing is not currently supported with Distributed Virtual Switches, either the VMware

offering or the Cisco Nexus 1000V. Changes are underway to fix this and allow any virtual switch to be supported.

 Step 4 – Enabling Multipathing via the Pluggable Storage Architecture

Block storage multipathing is handled by the MPIO part of the storage stack, and selects paths (for both performance

and availability purposes) based on an end-to-end path.

This is ABOVE the SCSI portion of the storage stack (which is above iSCSI which in turn is above the networking stack). Visualize the “on-ramp” to a path as the SCSI initiator port. More specifically in the iSCSI case, this is based

on the iSCSI session – and after step 3, you will have multiple iSCSI sessions. So, if you have multiple iSCSI sessions

to a single target (and by implication all the LUNs behind that target), you have multiple ports, and MPIO can do its

magic across those ports.

This next step is common across iSCSI, FC, & FCoE.

 When it comes to path selection, bandwidth aggregation and link resiliency in vSphere, customers have the option to

use one of VMware's Native Multipathing (NMP) Path Selection Policies (PSP), 3rd party PSPs, or 3rd party 

Multipathing Plug-ins (MPP) such as PowerPath/VE from EMC.

 All vendors on this post support all of the NMP PSPs that ship with vSphere, so we’ll put aside the relative pros/cons of 

3rd party PSPs and MPPs in this post, and assume use of NMP.

NMP is included in all vSphere releases at no additional cost. NMP is supported in turn by two “pluggable modules”.

The Storage Array Type Plugin (SATP) identifies the storage array and assigns the appropriate Path Selection Plugin

(PSP) based on the recommendations of the storage partner.

 VMware ships with a set of native SATPs, and 3 PSPs: Fixed, Most Recently Used (MRU), & Round Robin (RR). Fixed

and MRU options were available in VI3.x and should be familiar to readers. Round Robin was experimental in VI3.5,

and is supported for production use in vSphere (all versions).

Configuring NMP to use a specific PSP (such as Round Robin) is simple and easy. You can do it in the vSphere Client

under configuration, storage adapter, select the devices, and right click for properties. That shows this dialog box (note

that Fixed or MRU is always the default, and with those policies, depending on your array type – you may have many 

paths shown as active or standby, but only one of them will be shown as “Active (I/O)”):

 You can change the Path Selection Plugin with the pull down in the dialog box. Note that this needs to be done

manually for every device, on every vSphere server when using the GUI. It’s important to do this consistently across all

the hosts in the cluster. Also notice that when you switch the setting in the pull-down, it takes effect immediately –

and doesn’t wait for you to hit the “close” button.

 You can also configure the PSP for any device using this command:

esxcli --server <servername> nmp device setpolicy --device <device UID> --psp <PSP type>
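For example – the device UID below is made up for illustration; use esxcli --server <servername> nmp device list to find the actual naa.* identifiers on your host:

esxcli --server <servername> nmp device setpolicy --device naa.60060160a1b2c3d4 --psp VMW_PSP_RR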

 Alternatively, vSphere ESX/ESXi 4 can be configured to automatically choose round robin for any device claimed by a

given SATP. To make all new devices that use a given SATP automatically use round robin, configure ESX/ESXi to

use it as the default path selection policy from the command line.

esxcli --server <servername> corestorage claiming unclaim --type location

esxcli --server <servername> nmp satp setdefaultpsp --satp <SATP type> --psp VMW_PSP_RR

esxcli --server <servername> corestorage claimrule load 

esxcli --server <servername> corestorage claimrule run
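To see which SATPs exist on the system, and which SATP and PSP each device is currently claimed by, the following discovery commands are useful (shown as an aside – the <SATP type> in the example above is whatever these report for your array, e.g. the generic VMW_SATP_DEFAULT_AA for a simple active/active target):

esxcli --server <servername> nmp satp list

esxcli --server <servername> nmp device list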

Three Additional Questions and Answers on the Round Robin PSP…

Question 1: “When shouldn’t I configure Round Robin?”

Answer: While in the configuration interface, you may note that Fixed and MRU are always the default PSP associated with

the native SATP options – across all arrays. This is a protective measure in case you have VMs running Microsoft

Cluster Services (MSCS). Round Robin can interfere with applications that use SCSI reservations for sharing LUNs

among VMs and thus is not supported with the use of LUNs with MSCS. Otherwise, there’s no particular reason not to

use NMP Round Robin, with the additional exception of the note below (your iSCSI array requires the use of ALUA,

and for one reason or another you cannot change that).

Question 2: “If I’m using an Active/Passive array – do I need to use ALUA”?

 Answer: There is another important consideration if you are using an array that has an “Active/Passive” LUN

ownership model when using iSCSI. With these arrays, Round-Robin can result in path thrashing (where a storage

target bounces between storage processors in a race condition with vSphere) if the storage target is not properly 

configured.

Of the vendors on this list, EMC CLARiiON and NetApp traditionally are associated with an “Active/Passive” LUN

ownership model – but it’s important to note that the NetApp iSCSI target operates in this regard more like the EMC

Celerra iSCSI target and is “Active/Active” (fails over with the whole “brain” when the cluster itself fails over rather

than the LUN transiting from one “brain” to another – a “brain” in Celerra-land is called a Datamover, and in NetApp land is

called a cluster controller).

Conversely the CLARiiON iSCSI target LUN operates the same as a CLARiiON FC target LUN – and trespasses from

one storage processor to another. So – ALUA configuration is important for CLARiiON for iSCSI, Fibre-Channel/FCoE

connected hosts, and NetApp when using Fibre Channel/FCoE connectivity (beyond the scope of this post). So – if 

you’re not a CLARiiON iSCSI customers, or using CLARiiON or NetApp with Fibre Channel/FCoE)

customer (since this multipathing section applies to FC/FCoE), you can skip to the third Round Robin

note.

An Active/Passive LUN ownership model in VMware lingo doesn’t mean that one storage processor (or “brain” of the

array) is idle – rather, that a LUN is “owned” by (basically “behind”) one of the two storage processors at any given

moment. If using an EMC CLARiiON CX4 (or a NetApp array with Fibre Channel) and vSphere, the LUNs should be

configured for Asymmetric Logical Unit Access (ALUA). When ALUA is configured, as opposed to the ports on the

“non-owning storage processor” showing up in the vSphere client as “standby”, they show up as “active”. Now – they 

 will not be used for I/O in a normal state – as the ports on the “non-owning storage processor” are “non-optimized”

paths (there is a slower, more convoluted path for I/O via those ports). This is shown in the diagram below.

On each platform – configuring ALUA entails something specific.

On an EMC CLARiiON array when coupled with vSphere (which implements ALUA support specifically with SCSI-3

commands, not SCSI-2), you need to be running the latest FLARE 28 version (specifically 04.28.000.5.704 or later).

This in turn currently implies CX4 only, not CX3, and not AX. You then need to run the Failover Wizard and

configure the hosts in the vSphere cluster to use failover mode 4 (ALUA mode). This is covered in the

CLARiiON/VMware Applied Tech guide (the CLARiiON/vSphere bible) here, and is also discussed on this post here.

Question 3: “I’ve configured Round Robin – but the paths aren’t evenly used”

 Answer: The Round Robin policy doesn’t issue I/Os in a simple “round robin” between paths in the way many expect.

By default the Round Robin PSP sends 1000 commands down each path before moving to the next path; this is called

the IO Operation Limit. In some configurations, this default doesn't demonstrate much path aggregation

because quite often some of the thousand commands will have completed before the last command is sent. That

means the paths aren't full (even though the queue at the storage array might be). When using 1Gbit iSCSI, quite often the

physical path is the limiting factor on throughput, and making use of multiple paths at the same time shows

 better throughput.

 You can reduce the number of commands issued down a particular path before moving on to the next path all the way 

to 1, thus ensuring that each subsequent command is sent down a different path. In a Dell/EqualLogic configuration,

Eric has recommended a value of 3.

 You can make this change by using this command:

esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --iops <IOOperationLimit_value>

--type iops
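For instance, to apply the EqualLogic-oriented value of 3 mentioned above to a single device and then confirm the setting took effect (the <lun ID> placeholder is the same naa.* device UID used elsewhere in this post):

esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --iops 3 --type iops

esxcli --server <servername> nmp roundrobin getconfig --device <lun ID>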

Note that cutting down the number of iops does present some potential problems. With some storage arrays caching is done per path. By spreading the requests across multiple paths, you are defeating any caching optimization at the

storage end and could end up hurting your performance. Luckily, most modern storage systems don't cache per port.

There's still a minor path-switch penalty in ESX, so switching this often probably represents a little more CPU

overhead on the host.

That’s it!

If you go through these steps, you will see a screen that looks like this one. Notice that Round Robin is the Path

Selection configuration, and the multiple paths to the LUN are both noted as “Active (I/O)”. With an ALUA-configured

CLARiiON, the paths to the “non-owning” storage processor ports will show as “Active” – meaning they are active, but

not being used for I/O.

This means you’re driving traffic down multiple vmknics (and under the vSphere client performance tab, you will see

multiple vmknics chugging away, and if you look at your array performance metrics, you will be driving traffic down

multiple target ports).

Now, there are couple other important notes – so let’s keep reading :-)

Third topic: Routing Setup

 With iSCSI Multipathing via MPIO, the vmkernel routing table is bypassed in determining which outbound port to use

from ESX. As a result of this VMware officially says that routing is not possible in iSCSI SANs using iSCSI

Multipathing.  Further – routing iSCSI traffic via a gateway is generally a bad idea. This will introduce unnecessary

latency – so this is being noted only academically. We all agree on this point – DO NOT ROUTE iSCSI TRAFFIC.

But, for academic thoroughness, you can provide minimal routing support in vSphere because a route look-up is done

when selecting the vmknic for sending traffic. If your iSCSI storage network is on a different subnet AND your iSCSI

Multipathing vmkNICs are on the same subnet as the gateway to that network, routing to the storage works. For

example look at this configuration:

on the vSphere ESX/ESXi server:

 vmk0 10.0.0.3/24 General purpose vmkNIC

 vmk1 10.1.0.14/24 iSCSI vmkNIC

 vmk2 10.1.0.15/24 iSCSI vmkNIC

Default route: 10.0.0.1

on the iSCSI array:

iSCSI Storage port 1: 10.2.0.8/24

iSCSI Storage port 2: 10.2.0.9/24

In this situation, vmk1 and vmk2 are unable to communicate with the two storage ports because the only route to the

storage is accessible through vmk0, which is not set up for iSCSI use. If you add the route:

Destination: 10.2.0.0/24 Gateway: 10.1.0.1 (and have a router at the gateway address)

then vmk1 and vmk2 are able to communicate with the storage without interfering with other vmkernel routing setup.
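Purely as a sketch of that academic case – the esxcfg-route syntax below is the service console form, and we haven't verified whether the vMA/remote CLI route commands in this release support adding static routes, so treat it as illustrative only:

esxcfg-route -a 10.2.0.0/24 10.1.0.1

esxcfg-route -l

The second command simply lists the VMkernel routing table so you can confirm the new entry.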

Fourth topic: vSwitch setup

There are no best practices for whether vmkNICs should be on the same or different vswitches for iSCSI Multipathing.

Provided the vmkNIC only has a single active uplink, it doesn't matter if there are other iSCSI vmkNICs on the same

switch or not.

Configuration of the rest of your system should help you decide the best vswitch configuration. For example, if the

system is a blade with only two NICs that share all iSCSI and general-purpose traffic, it makes best sense for both

uplinks to be on the same vswitch (to handle teaming policy for the general, non-iSCSI traffic). Other configurations

might be best configured with separate vswitches.

Either configuration works.

Fifth topic: Jumbo frames

Jumbo frames are supported for iSCSI in vSphere 4. There was confusion about whether or not they were supported

 with ESX 3.5 – the answer is no, they are not supported for vmkernel traffic (but are supported for virtual machine

traffic).

Jumbo frames simply means that the size of the largest Ethernet frame passed between one host and another on the

Ethernet network is larger than the default. By default, the "Maximum Transmission Unit" (MTU) for Ethernet is

1500 bytes. Jumbo frames are often set to 9000 bytes, the maximum available for a variety of Ethernet equipment.

The idea is that larger frames represent less overhead on the wire and less processing on each end to segment and then

reconstruct Ethernet frames into the TCP/IP packets used by iSCSI. Note that recent Ethernet enhancements TSO

(TCP Segmentation Offload) and LRO (Large Receive Offload) lessen the need to save host CPU cycles, but jumbo frames

are still often configured to extract any last benefit possible.

Note that jumbo frames must be configured end-to-end to be useful. This means the storage, Ethernet switches, routers

and host NIC all must be capable of supporting jumbo frames – and Jumbo frames must be correctly configured

end-to-end on the network. If you miss a single Ethernet device, you will get a significant number of Ethernet layer

errors (which are essentially fragmented Ethernet frames that aren’t correctly reassembled).

Inside ESX, jumbo frames must be configured on the physical NICs, on the vswitch and on the vmkNICs used by 

iSCSI. The physical uplinks and vSwitch are set by configuring the MTU of the vSwitch. Once this is set, any physical

NICs that are capable of passing jumbo frames are also configured. For iSCSI, the vmkNICs must also be configured to

pass jumbo frames.
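For example (vSwitch1 is just a placeholder name – use the vSwitch that carries your iSCSI port groups), the vSwitch MTU is set and then verified with:

esxcfg-vswitch --server <servername> -m 9000 vSwitch1

esxcfg-vswitch --server <servername> -l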

Unfortunately, the vSwitch and the vmkNICs must be added (or, if already existing, removed and

re-created) from the command line to provide jumbo frame support: Note this will disconnect any

active iSCSI connections so this should be done as a maintenance operation while VMs residing on the

 Datastores/RDMs are running on other ESX hosts. (I know this sounds like an “of course” but just a

good warning). Below is an example:

# esxcfg-vmknic --server <servername> -l|cut -c 1-161

Interface Port Group/DVPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled

8/7/2019 A Multivendor Post on using..

http://slidepdf.com/reader/full/a-multivendor-post-on-using 12/21

Type

 vmk1 iSCSI2 IPv4 10.11.246.51 255.255.255.0 10.11.246.255 00:50:56:7b:00:08 1500 65535 true STAT

 vmk0 iSCSI1 IPv4 10.11.246.50 255.255.255.0 10.11.246.255 00:50:56:7c:11:fd 9000 65535 true STAT

# esxcfg-vmknic --server <servername> -d iSCSI2

# esxcfg-vmknic --server <servername> -a -i 10.11.246.51 -n 255.255.255.0 -m 9000 iSCSI2

# esxcfg-vmknic --server <servername> -l|cut -c 1-161

Interface Port Group/DVPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled

Type

 vmk0 iSCSI1 IPv4 10.11.246.50 255.255.255.0 10.11.246.255 00:50:56:7c:11:fd 9000 65535 true STAT

 vmk1 iSCSI2 IPv4 10.11.246.51 255.255.255.0 10.11.246.255 00:50:56:7b:00:08 9000 65535 true STAT

If the vmkNICs are already set up as iSCSI Multipath vmkNICs, you must remove them from the iSCSI 

configuration before deleting them and re-adding them with the changed MTU.

Sixth topic: Delayed ACK 

Delayed ACK is a TCP/IP method of allowing segment acknowledgements to piggyback on each other or other data

passed over a connection with the goal of reducing IO overhead.

If your storage system is capable of supporting delayed ACK, verify with your vendor if delayed ACK should be enabled.

Seventh topic: other configuration recommendations:

Most of the original multivendor iSCSI post “general recommendations” are as true as ever. When setting up the

Ethernet network for iSCSI (or NFS datastores) use – don’t think of it as “it’s just on my LAN”, but rather “this is the

storage infrastructure that is supporting my entire critical VMware infrastructure”. IP-based storage needs the same

sort of design thinking traditionally applied to FC infrastructure – and when you do, it can have the same availability 

envelope as traditional FC SANs. Here are some things to think about:

Are you separating your storage and network traffic on different ports? Could you use VLANs for this? Sure. But is

that “bet the business” thinking? It’s defensible if you have a blade, and a limited number of high bandwidth

interfaces, but think it through… do you want a temporarily busy LAN to swamp your storage (and vice-versa) for

the sake of a few NICs and switch ports? So if you do use VLANs, make sure you are thorough and implement

QoS mechanisms. If you're using 10GbE, using VLANs can make a lot of sense and cut down on your network 

interfaces, cables, and ports, sure – but GbE – not so much.

Think about Flow-Control (should be set to receive on switches and transmit on iSCSI targets)

Either disable spanning tree protocol (only on the most basic iSCSI networks) – or enable it only with either RSTP

or portfast enabled. Another way to accomplish this, if you share the network switches with the LAN, is to

filter / restrict bridge protocol data units on storage network ports.

If at all possible, use Cat6a cables rather than Cat5e (and don’t use Cat5). Yes, Cat5e can work – but remember –

this is “bet the business”, right? Are you sure you don’t want to buy that $10 cable?

Things like cross-stack Etherchannel trunking can be handy in some configurations where iSCSI is used in

conjunction with NFS (see the “Multivendor NFS post” here)

Each Ethernet switch also varies in its internal architecture – for mission-critical, network intensive Ethernet

purposes (like VMware datastores on iSCSI or NFS), amount of port buffers, and other internals matter – it's a good idea to know what you are using.

In closing.....

 We would suggest that anyone considering iSCSI with vSphere should feel very confident that their deployments can

provide high performance and high availability. You would be joining many, many customers enjoying the benefits of 

 VMware and advanced storage that leverages Ethernet.

 With the new iSCSI initiator, the enablement of multiple TCP sessions per target, and the multipathing enhancements

in vSphere ESX 4, it is possible to have highly available and high-performing storage using your existing Ethernet

infrastructure. The need for some of the workarounds discussed here for ESX 3.5 can now be parked in the past.

To make your deployment a success, understand the topics discussed in this post, but most of all ensure that you

follow the best practices of your storage vendor and VMware.

VMware: iSCSI SAN Configuration Guide

EMC CLARiiON: VMware Applied Technology Guide

http://www.emc.com/collateral/hardware/white-papers/h1416-emc-clariion-intgtn-vmware-wp.pdf 

Symmetrix: VMware Applied Technology Guide

http://www.emc.com/collateral/hardware/white-papers/h6531-using-vmware-vsphere-with-emc-

symmetrix-wp.pdf 

Celerra: VMware Applied Technology Guide

http://www.emc.com/collateral/hardware/white-papers/h6337-introduction-using-celerra-vmware-

 vsphere-wp.pdf 

 vSphere on NetApp – Storage Best Practices TR-3749 http://www.netapp.com/us/library/technical-reports

/tr-3749.html

Dell/EqualLogic TR1049:

http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

Posted at 10:46 AM | Permalink 

Comments


Chad – thanks for the effort in making this happen. I trust the VMware customers, regardless if they are Dell, EMC, HP, or NetApp

customers, really appreciate the content.

 What’s next – a thorough discussion of Multipathing with vSphere?

Cheers!

Posted by: Vaughn Stewart | September 21, 2009 at 09:45 PM

Tip for the readers:

If you like to see the iSCSI session, connection information, try the following command on ESX.

cat /proc/scsi/iscsi_vmk/5

Posted by: twitter.com/sthoppay | September 22, 2009 at 03:40 PM

 Vaughn my pleasure. I'm always up for a collaborative effort. Much more fulfilling than the silly negative selling I see out there so

often.

Posted by: Chad Sakac | September 24, 2009 at 06:36 PM

I'm having trouble with MPIO on LeftHand ...

My LH VIP: .100

My LH storage node IPs: .110, .111, .112, .113, etc

My VM iscsi IPs : .210, .211, .212, .213, etc

 When I rescan my iscsi adapter I get 4 paths with a default setting of "Fixed". This host has 4 NICs configured per the article with

jumbo frames enabled. Each of the paths seems to be attaching directly to one of the nodes ".112".

 With a "Fixed" setting I get roughly 90MB/sec writes.

 With a "Round Robin" setting I get roughly 50MB/sec writes.

I have 3 LeftHand nodes with 15x 750GB SATA.

1) Any thoughts on worse performance with RR?

2) Should I expect better than 90MB/sec writes with RR working correctly?

3) I expected to see 4 different target IPs - is that the case, or is this what you were referring to above when you say that LeftHand systems use a single target? I assumed you were talking about the VIP with that statement and not the storage node addresses.

BTW: thanks for the article - excellent information to have all in the same place!!

Thanks

Posted by: Cameron Smith | October 03, 2009 at 04:40 PM

Chad,

Thanks for updating this topic to vSphere. There are so many changes it really did warrant a new article. I initially built our vSphere

hosts with two vmkernel iSCSI pNICs and two dedicated pNICs for guest iSCSI connections. I find I'm not really using the two guest iSCSI NICs. At this point I think I'll add those NICs to the vmkernel connections, as it looks like ESX can take advantage of them now.

 Any additional thoughts on that subject in vSphere?

Posted by: Rob D. | October 06, 2009 at 12:17 PM

 Awesome article, thanks for posting!

 Any word on when iSCSI multipath will be supported on a DVS?

Posted by: Dominic | October 06, 2009 at 08:14 PM

@Cameron - are you testing with one LUN, or many (across many ESX hosts)? This behavior (RR being worse than fixed/MRU) is

often triggered by the IOOperationLimit (default is 1000) when the LUN count is small (1000 IO operations before changing paths).

@Rob - glad you liked the post - it was a labor of love for my fellow partners and I. You got it on the configuration! some of the details

vary depending on your iSCSI target. What array are you using - and I'll be happy to help further...

@Dominic - this is a case of qual more than anything (VERY important, but a different class of issue than a known technical issue) -

 working on it hard. Will get the target date and get back to you.

Posted by: Chad Sakac | October 06, 2009 at 08:40 PM

Chad,

I'm testing this with the VSA and realized that the VSA doesn't support jumbo frames (although the CMC thinks it does ...)

 After turning off jumbos, RR is performing evenly with Fixed ~80MB and increases to ~90MB/sec when I fiddle with the iops

threshold with values between 1-100.

My expectation was that bandwidth for any single LUN would scale linearly with the number of NICs (4 adapters == 4 x the amount

of bandwidth), and that we would be limited by the bandwidth consumed by all VMs or by the bandwidth available at the SAN. If I'm

 way off on that assumption can you point me in the right direction?

Thanks so much!

Posted by: Cameron Smith | October 07, 2009 at 11:42 AM

Hi,

Thanks for the great post, this helped me a lot to go after performance on our MSA2012. I set up the NMP and Jumbo Frames. Now I

got a weird problem. Before I changed the iops parameter I got following readings in my performance test (just used dd on a VM):

BS=8192k, Count= 1000 Write 109MB/s Read 102MB/s

BS=512k, Count=10000 Write 121 MB/s Read 104MB/s

BS=64k, Count=100000 Write 111MB/s Read 105MB/s

Then I started to play with the following:

esxcli nmp roundrobin setconfig --device naa.something --iops 3 --type iops

BS=8192k, Count= 1000 Write 86,7 MB/s Read 90,9MB/s

BS=512k, Count=10000 Write 92,4 MB/s Read 86,1MB/s

BS=64k, Count=100000 Write 87,9 MB/s Read 96,2MB/s

So I decided to switch back to the original 1000 setting. But now I cannot get the same write performance anymore! Is there

something that I have missed here? Should I somehow change something else as well in addition to the iops parameter?

Best regards,

 Andy 

Posted by: www.facebook.com/profile.php?id=505438434 | October 12, 2009 at 10:13 AM

 Andy,

I experienced the same and the only thing that seemed to work for me was to set to "Fixed" in the GUI and then back to round robin ...

that seemed to get me back to my baseline.

Best,

Cameron

Posted by: Cameron Smith | October 12, 2009 at 03:58 PM

Hi,

Thanks for the info. I tried that already but still no effect :) I have to check out my configuration once more. Does changing to Fixed

and back reset the NMP settings somehow?

 What kind of performance are you getting btw?

 Andy 

Posted by: www.facebook.com/profile.php?id=505438434 | October 13, 2009 at 06:12 AM

 Andy,

That's strange, it seemed to work for me.

I'm getting roughly 120-130MB/sec writes with the LeftHand nodes - when I enable the network RAID level 2 I get right around

75-80MB/sec.

Reads are currently peaking around 105MB/sec.

I think I'm using similar methods as you: dd if=/dev/zero and hdparm -t for reads.

I've also run bonnie++ with a similar result.

One thing I'm starting to realize is that (at least with my SAN), this post leads to better SAN performance and redundancy and not

necessarily better individual VM performance. Anyone, please feel free to correct me if I'm wrong here.

Posted by: Cameron Smith | October 13, 2009 at 10:50 AM

Hi Chad!

Thanks for all your hard work on this blog. Your recent iSCSI posts have definitely stirred MUCH discussion among my peers. :)

One thing we can't really find a definitive/good answer to is this...

Is there really any benefit to having more than a 1:1 ratio of vmkernel ports to each physical NICs? I've seen a few documents (mostly 

from EQL) that actually suggest binding 2 or even 3 vmkernel ports to a single NIC, but I just don't understand what is to be gained by 

that. Sure, it allows multiple paths to the same volume, but it's still all going through the same physical NIC.

If you could explain that, or cite some references elsewhere, I (we) would greatly appreciate it.

Thanks!

Dave

Posted by: DaveM | October 15, 2009 at 04:34 PM

Chad,

Thanks a lot for this post. Very useful, and a great resource for many diverse bits of information in one spot.

Posted by: farewelldave | October 30, 2009 at 02:41 PM

Hello,

Great job on this post about vSphere and iSCSI multi-path!

It all works much as you show, except that I cannot get the

NMP 'policy=' settings to work properly after an ESX host

reboot.

 When I set the policy 'type' to 'iops', with an '--iops'

 value of '10', it works, but after a reboot, the '--iops'

 value gets reset to a crazy large value of '1449662136',

 which is much worse than the default of '10000'.

OK, so I decided to try the policy 'type' set to 'bytes',

 with a '--bytes' value of '11'.

Unfortunately, although this policy value does stick after a

reboot, when the policy is set to 'type' 'bytes', no round-

robin multipath seems to occur at all, even before the

reboot.

 What am I doing wrong?

I remember reading somewhere that there was a similar

problem with the '--iops' value not sticking in ESX 3.5, but

it now seems to be worse.

Thanks.

~ Jeff Byers ~

# esxcli nmp device setpolicy -d naa.600174d001000000010f003048318438 --psp VMW_PSP_RR 

# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --iops 10 --type iops

# esxcli nmp device list -d naa.600174d001000000010f003048318438

naa.600174d001000000010f003048318438

Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)

Storage Array Type: VMW_SATP_DEFAULT_AA 

Storage Array Type Device Config:

Path Selection Policy: VMW_PSP_RR 

Path Selection Policy Device Config: {policy=iops,iops=10,bytes=10485760,useANO=0;lastPathIndex=1:

NumIOsPending=0,numBytesPending=0}

 Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

# sync;sync;reboot

# esxcli nmp device list -d naa.600174d001000000010f003048318438

naa.600174d001000000010f003048318438

Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)

Storage Array Type: VMW_SATP_DEFAULT_AA 

Storage Array Type Device Config:

Path Selection Policy: VMW_PSP_RR 

Path Selection Policy Device Config: {policy=iops,iops=1449662136,bytes=10485760,useANO=0;lastPathIndex=1:

NumIOsPending=0,numBytesPending=0} Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

=====

# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --iops 10 --type iops

# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --bytes 11 --type bytes

# esxcli nmp device list -d naa.600174d001000000010f003048318438

naa.600174d001000000010f003048318438

Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)

Storage Array Type: VMW_SATP_DEFAULT_AA 

Storage Array Type Device Config:

Path Selection Policy: VMW_PSP_RR 

Path Selection Policy Device Config: {policy=bytes,iops=10,bytes=11,useANO=0;lastPathIndex=1:

NumIOsPending=0,numBytesPending=0}

 Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

[root@esx-223 ~]# sync;sync;reboot

[root@esx-223 ~]# esxcli nmp device list -d naa.600174d001000000010f003048318438

naa.600174d001000000010f003048318438

Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)

Storage Array Type: VMW_SATP_DEFAULT_AA 

Storage Array Type Device Config:

Path Selection Policy: VMW_PSP_RR 

Path Selection Policy Device Config: {policy=bytes,iops=1000,bytes=11,useANO=0;lastPathIndex=0:

NumIOsPending=0,numBytesPending=0}

 Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

Posted by: Jeff Byers | November 09, 2009 at 09:13 PM

Chad,

Thanks for both this and the previous multivendor ISCSI post. I found both very informative and focused on solutions for those

deploying ISCSI in their VMWare environments regardless of the storage vendor.

 We are currently running ESX 3.5 in our data center with redundant 10gig nic connections (Neterion) with jumbo frames (I know it's

not officially supported but we're not seeing any issues with it) and a LeftHand SAN. With the LeftHand SAN providing a different

target for each LUN and our using link aggregation, I'm seeing very consistent load balancing across both NICs in our current 3.5

deployment. Given that link aggregation will provide faster convergence in the event of a NIC/link failure are there other compelling

reasons for me to use ISCSI MPIO instead of my current setup as we migrate to vSphere.

Tim

Posted by: Tim Tyndall | November 12, 2009 at 10:08 PM

Thanks for a great post.

However, for the jumbo frame section, I would add a line about configuring vSwitch MTU. It seems that every article I could find

about multipathing, iSCSI and jumbo frames includes configuration for the nic and hba but not the vSwitch.

esxcfg-vswitch -m 9000

Posted by: John | November 19, 2009 at 11:53 PM

I'm struggling with setting the routing correct.

 We use different VLAN for ESX console, iSCSI network and for storage network.

iSCSI connections with NetApp are successful and I see all my LUNs until I add the iSCSI kernels to the software HBA adapter.

(esxcli --server swiscsi nic add -n -d )

After using this command my LUNs disappear after a rediscover?

I still can ping my iSCSI kernels but the NetApp does not see any successful connections. I think it has to do with setting up the routing

correctly. (third topic in this post) Where/how can I add the route?

Posted by: Remy Heuvink | December 03, 2009 at 01:13 AM

Note that iSCSI Multipathing is not currently supported with Distributed Virtual Switches, either the VMware offering or the Cisco

Nexus 1000V. Changes are underway to fix this and allow any virtual switch to be supported.

 Any indication when this will be supported?

Posted by: Ian | January 05, 2010 at 10:23 AM

Superb post!

Can anyone comment on the guest side of the iSCSI vSphere story?

Should the guest continue to use their own iSCSI initiator (like Windows 2008 Server R2) for performance reasons?

Posted by: Arian van der Pijl | January 15, 2010 at 07:43 AM

We are set up with CX4 / ALUA / RR. Whenever we reboot a host all our LUNs trespass from SPA to SPB. Anyone else experiencing this?

Posted by: gallopsu | January 21, 2010 at 11:55 AM

The solution is to edit the rc.local file (which basically is a script that runs at the end of boot up) to set the IOPS limit on all LUNs

Enter this command, the only variable you will need to change is the "naa.600" which pertains to the identifier on the array

for i in `ls /vmfs/devices/disks/ | grep naa.600` ; do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i; done

http://virtualy-anything.blogspot.com/2009/12/how-to-configure-vsphere-mpio-for-iscsi.html

Posted by: stuart | January 27, 2010 at 11:26 AM

Hello

please can i have instructions on how to add / edit the script you have mentioned. also with the (naa.600`) do i need the full identifier

for all LUNS or is that a catch all. thanks

Posted by: Robin | February 12, 2010 at 11:27 AM

Regarding using Round Robin MPIO in conjunction with Microsoft Cluster Services...

Can the boot volume of the clustered server reside on an MPIO RR-enabled LUN? Then, have the shared volume(s) reside on either

RDM without RR MPIO or use the Microsoft iSCSI initiator inside the guest?

Posted by: Ed Swindelles | March 09, 2010 at 10:08 AM

some major bug fixes to iscsi released in the past week, recommend people upgrade to esx4.0 build 244038 - specifically if you have

multiple vmknic's to your iscsi storage. go read vmware knowledgebase about it...

Posted by: Chimera | April 07, 2010 at 04:32 AM

Love this article. Refer to it frequently. :)

Just as an aside, Cisco introduced iSCSI multipath support for the Nexus 1000v DVS in their most recent release, with some

additional improvements (one of them addressing an issue with jumbo frames) coming hopefully later this month or early next.

Cheers, and thanks for the great blogs.

Posted by: Ryan | April 07, 2010 at 11:27 AM

The prescription posted by 'stuart' does no checking and attempts to apply RR to any and all devices even when it doesn't apply, or to

devices that are pseudo-bogus. This is what I use instead.

# 6090a058 is my Equallogic's prefix. You can use naa.60* and the 'getconfig' test will skip an entry if not suitable. I also end up with

'*:1' paths which can't be set.

# fix broken RR path selection

for i in `ls /vmfs/devices/disks/naa.6090a058* | grep -v ':[0-9]$'`; do

esxcli nmp roundrobin getconfig --device ${i##*/} 2>/dev/null && \

esxcli nmp roundrobin setconfig --type "iops" --iops=64 --device ${i##*/}

done

Posted by: matt | April 13, 2010 at 07:22 PM

I am very grateful for this post. If I could hug you I would.

Let me just say that I have ESXi 4 running with a 4 Port ISCSI SAN and was not seeing very good performance. My maximum speed

 was about 120 MB/s no matter if I was using 2 or 4 GB ports. So I spoke to my SAN vendor and they sent me some information which

 was kinda correct, but didn't increase performance.

Then I found this post. With some time, testing and patience I have gotten my speeds up to 250-300 MB/s on the 4 port GB cards

(with other hosts connected to the SAN in the background). And that is without jumbo frames enabled.

 With JUMBO enabled I would expect to see a 10-15% increase in speeds.

This is entirely due to this post and your information. I thank you for your time and patience in posting this as it helped me

tremendously.

Steve

Posted by: Steve M | May 24, 2010 at 09:54 AM

@Steve M - you are giving me a virtual hug :-) I'm really glad that the post helped you. In the end, your comment was the best thing I

heard all day.

Happy to help!

Posted by: Chad Sakac | May 24, 2010 at 10:26 AM

We are facing a number of problems with configuring iSCSI on our EqualLogic and have a posting at:

http://communities.vmware.com/message/1529506#1529506. The conclusion is that there are some serious bugs with vmware that

have to be patched with: http://kb.vmware.com/kb/1019492A 

Hopefully these problems will be resolved after we schedule a downtime for this patching.

Posted by: Bhwong | June 04, 2010 at 05:51 AM

Thank you very much for that outstanding post and the effort on investigation and documentation that will help many desperate

people like me understand the complexity of iSCSI connections between ESX(i) hosts and storage devices and to improve their

performance significantly.

I still haven't tried all these recommendations yet, but I will do that within the next days being very confident that they will solve the

majority of my performance problems.

While reading that post I was wondering why the harmonization between the iSCSI settings in /etc/vmware/vmkiscsid/iscsid.conf 

and the corresponding settings on the storage array side was not mentioned at all in that post. I always believed that appropriate and well adjusted settings on both sides are the basis of good performance. Was I completely wrong with my assumption?

Thanks in advance for your reply 

Hans

Posted by: me.yahoo.com/a/3_LXlMpiluTuRZz4krjXe_r22w-- | June 08, 2010 at 09:28 PM

Thanks for these posts.

I am still confused about the port aggregation (Etherchannel) advice. I get that load balancing can only be done by the storage and I

understand why iSCSI boxes don't support aggregation, but what about connection redundancy for the esx host? Particularly with a

single portal storage solution - where the storage doesn't create a full mesh of paths, switch failures could make the storage

unreachable. As I understand it, I would need 6 pnics and 6 vmknics per esx host to get fully redundant connections. I can get that

using 2 pnics bonded together into one Etherchannel group and 3 vmknics. So if bandwidth isn't an issue, why shouldn't I? There's

less overhead on the esx and fewer iSCSI connections - so performance could be better. What am I missing?

Posted by: Azriphale | November 15, 2010 at 08:06 PM

Fantastic post; thorough and informative. Well done.

Posted by: Paul | December 12, 2010 at 03:37 PM
