Server I/O Networks: Past, Present, and Future
Renato Recio, Distinguished Engineer, Chief Architect, IBM eServer I/O
Copyright International Business Machines Corporation, 2003
Legal Notices
All statements regarding future direction and intent for IBM, InfiniBandTM Trade Association, RDMA Consortium, or any other standard organization mentioned are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your IBM local Branch Office or IBM Authorized Reseller for the full text of a specific Statement of General Direction.
IBM may have patents or pending patent applications covering subject matter in this presentation. The furnishing of this presentation does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.
The information contained in this presentation has not been submitted to any formal IBM test and is distributed as is. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. The use of this information or the implementation of any techniques described herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. Customers attempting to adapt these techniques to their own environments do so at their own risk.
The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: AIX, PowerPC, RS/6000, SP, S/390, AS/400, zSeries, iSeries, pSeries, xSeries, and Remote I/O.
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Limited. Ethernet is a registered trademark of Xerox Corporation. TPC-C, TPC-D, and TPC-H are trademarks of the Transaction Processing Performance Council. InfiniBand is a trademark of the InfiniBand Trade Association. Other product or company names mentioned herein may be trademarks or registered trademarks of their respective companies or organizations.
In other words… regarding industry trends and directions
- IBM respects the copyrights and trademarks of other companies.
- These slides represent my views only: they do not imply IBM views or directions, nor the views or directions of the InfiniBand Trade Association, the RDMA Consortium, the PCI-SIG, or any other standards group.
Agenda
- Server I/O network types: requirements, contenders
- Server I/O: I/O attachment and I/O expansion networks (PCI family and InfiniBand)
- Network stack offload: hardware, OS, and application considerations
- Local Area Networks
- Cluster Area Networks: InfiniBand and Ethernet
- Storage Area Networks: FC and Ethernet
- Summary
Purpose of Server I/O Networks
[Figure: two servers, each with microprocessors and caches, memory and a memory controller, and an I/O expansion network with virtual adapters; bridges and switches connect them to a Storage Area Network, a Cluster Network, a Local Area Network, and I/O attachments.]
Server I/O networks are used to connect devices and other servers.
Server I/O Network Requirements
In the past, servers have placed the following requirements on I/O networks:
- Standardization, so many different vendors' products can be connected;
- Performance (scalable throughput and bandwidth, and low latency/overhead);
- High availability, so connectivity is maintained despite failures;
- Continuous operations, so changes can occur without disrupting availability;
- Connectivity, so many units can be connected;
- Distance, both to support scaling and to enable disaster recovery; and
- Low total cost, which interacts strongly with standardization through volumes and also depends on the amount of infrastructure build-up required.
More recently, servers have added the following requirements:
- Virtualization of host, fabric, and devices;
- Service differentiation (including QoS), to manage fabric utilization peaks; and
- Adequate security, particularly in multi-host (farm or cluster) situations.
Server I/O Network History
Historically, no single technology satisfied all of the above requirements, so many types of fabrics proliferated:
- Local Area Networks
- Cluster Networks (a.k.a. HPCN, CAN)
- Storage Area Networks
- I/O Expansion Networks, etc.
…and many link solutions proliferated:
- Standard: FC for SAN, Ethernet for LAN
- Proprietary: a handful of IOENs (IBM's RIO, HP's remote I/O, SGI's XIO, etc.) and a handful of CANs (IBM's Colony, Myricom's Myrinet, Quadrics, etc.)
Consolidation solutions are now emerging, but the winner is uncertain:
- PCI family: IOA and IOEN
- Ethernet: LAN, SAN, CAN
- InfiniBand: CAN, IOEN, and possibly higher-end IOA/SAN
Recent Server I/O Network Evolution Timeline
- Proprietary fabrics (e.g. IBM channels, IBM RIO, IBM STI, IBM Colony, SGI XIO, Tandem/Compaq/HP ServerNet)
- InfiniBand: Rattner pitch (2/98); NGIO goes public (11/98); FIO goes public (2/99); NGIO spec available (7/99); FIO spec available (9/99); FIO and NGIO merge into IB (9/99); IB spec releases: Verbs 1.0 (10/00), 1.0.a (6/01), 1.1 (11/02), extensions (12/03)
- PCI: PCI-X 1.0 spec available (9/99); 3GIO described at IDF (11/01); PCI-X 2.0 announced (2/02); PCI-Express 1.0 spec (7/02); PCI-X 2.0 DDR/QDR spec (7/02); AS 1.0 spec (2003)
- RDMA over IP: work begins (6/00); 53rd IETF ROI BOF calls for an IETF ROI WG (12/01); RDDP WG chartered at the 54th IETF (3/02); RDMAC announced (5/02); RDMA, DDP, MPA 1.0 specs (10/02); Verbs, SDP, iSER, … 1.0 specs (4/03)
PCI
The PCI standard's strategy is to add evolutionary technology enhancements to the standard that maintain the existing PCI eco-system.
Within the standard, two contenders are vying for IOA market share:
- PCI-X: 1.0 is shipping now; 2.0 is next and targets the 10 Gb networking generation.
- PCI-Express: maintains the existing PCI software/firmware programming model and adds new protocol layers, a new physical layer, and associated connectors.
  - Can also be used as an IOEN, but does not satisfy all enterprise class requirements: enterprise class RAS is optional (e.g. multipathing), fabric virtualization is missing, and a more efficient I/O communication model is missing.
  - Will likely be extended to support faster link speeds and mandatory enterprise class RAS.
I/O Attachment Comparison: PCI-X (1.0, 2.0) vs. PCI-Express (1.0, 2.0)

Performance
- Effective link widths: PCI-X: parallel 32 bit, 64 bit | PCI-Express: serial 1x, 4x, 8x, 16x
- Effective link frequency: PCI-X: 33, 66, 100, 133, 266, 533 MHz | PCI-Express: 2.5 GHz -> 5 or 6.25 GHz
- Bandwidth range: PCI-X: 132 MB/s to 4.17 GB/s | PCI-Express: 250 MB/s to 4 GB/s

Connectivity
- Connectivity: PCI-X: multi-drop bus or point-point | PCI-Express: memory mapped switched fabric
- Distance: PCI-X: chip-chip, card-card connector | PCI-Express: chip-chip, card-card connector, cable

Self-management
- Unscheduled outage protection: PCI-X: interface checks, parity, ECC; no redundant paths | PCI-Express: interface checks, CRC; no redundant paths
- Scheduled outage protection: PCI-X: hot-plug and dynamic discovery | PCI-Express: hot-plug and dynamic discovery
- Service level agreement: PCI-X: N/A | PCI-Express: traffic classes, virtual channels

Virtualization
- Host virtualization: PCI-X: performed by host | PCI-Express: performed by host
- Network virtualization: PCI-X: none | PCI-Express: none
- I/O virtualization: PCI-X: no standard mechanism | PCI-Express: no standard mechanism

Cost
- Infrastructure build-up: PCI-X: delta to existing PCI chips | PCI-Express: new chip core (macro)
- Fabric consolidation potential: PCI-X: none | PCI-Express: IOEN and I/O attachment
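For reference, and as my own arithmetic rather than anything stated on the slide, the endpoints of these bandwidth ranges follow from width times transfer rate, with each PCI-Express lane carrying 2.5 Gb/s of 8b/10b-encoded traffic per direction:

$$
\begin{aligned}
\text{PCI, 32-bit at 33 MHz:}\quad & 4\,\mathrm{B} \times 33\,\mathrm{MHz} \approx 132\ \mathrm{MB/s} \\
\text{PCI-X 2.0 QDR, 64-bit:}\quad & 8\,\mathrm{B} \times 533\,\mathrm{MT/s} \approx 4.2\ \mathrm{GB/s} \\
\text{PCI-Express:}\quad & 2.5\,\mathrm{Gb/s} \times \tfrac{8}{10} \div 8 = 250\ \mathrm{MB/s\ per\ lane};\ \ 16\ \text{lanes} \Rightarrow 4\ \mathrm{GB/s\ per\ direction}
\end{aligned}
$$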
InfiniBand
- IB's strategy: provide a new, very efficient I/O communication model that satisfies enterprise server requirements and can be used for I/O, cluster, and storage.
- IB's model enables middleware to communicate across a low latency, high bandwidth fabric, through message queues that can be accessed directly from user space.
- But… it required a completely new infrastructure (management, software, endpoint hardware, fabric switches, and links), and the I/O adapter industry viewed IB's model as too complex.
- So… I/O adapter vendors are staying on PCI; IB may be used to attach high-end I/O to enterprise class servers.
- Given the current I/O attachment reality, enterprise class vendors will likely: continue extending their proprietary fabric(s), or tunnel PCI traffic through IB and provide IB-PCI bridges.
I/O Expansion Network Comparison: PCI-Express vs. IB

Performance
- Link widths: PCI-Express: serial 1x, 4x, 8x, 16x | IB: serial 1x, 4x, 12x
- Link frequency: PCI-Express: 2.5 GHz | IB: 2.5 GHz
- Bandwidth range: PCI-Express: 250 MB/s to 4 GB/s | IB: 250 MB/s to 3 GB/s
- Latency: PCI-Express: PIO based synchronous operations (network traversal for PIO reads) | IB: native: message based asynchronous operations (Send and RDMA); tunneled: PIO based synchronous operations

Connectivity
- Connectivity: PCI-Express: memory mapped switched fabric | IB: identifier based switched fabric
- Distance: PCI-Express: chip-chip, card-card connector, cable | IB: chip-chip, card-card connector, cable
- Topology: PCI-Express: single host, rooted tree | IB: multi-host, general

Self-management
- Unscheduled outage protection: PCI-Express: interface checks, CRC; no native memory access controls; no redundant paths | IB: interface checks, CRC; memory access controls; redundant paths
- Scheduled outage protection: PCI-Express: hot-plug and dynamic discovery | IB: hot-plug and dynamic discovery
- Service level agreement: PCI-Express: traffic classes, virtual channels | IB: service levels, virtual lanes
I/O Expansion Network Comparison… Continued: PCI-Express vs. IB

Cost
- Infrastructure build-up: PCI-Express: new chip core (macro) | IB: new infrastructure
- Fabric consolidation potential: PCI-Express: IOEN and I/O attachment | IB: IOEN, CAN, high-end I/O attachment

Virtualization
- Host virtualization: PCI-Express: performed by host | IB: standard mechanisms available
- Network virtualization: PCI-Express: none | IB: end-point partitioning
- I/O virtualization: PCI-Express: no standard mechanism | IB: standard mechanisms available

Next steps
- Higher frequency links: PCI-Express: 5 or 6.25 GHz (work in process) | IB: 5 or 6.25 GHz (work in process)
- Advanced functions: PCI-Express: mandatory interface checks, CRC | IB: verb enhancements
Server Scale-up Topology Options
[Figure: two scale-up topology options. Left: a PCI-Express IOEN, with the memory controller feeding a PCI-Express bridge and switch that fan out through PCI-X bridges to adapters (PCI-Express: SMP only). Right: an IB or proprietary IOEN connecting multiple SMP sub-nodes, with PCI traffic tunneled out to the adapters.]

Key PCI-Express IOEN value proposition:
- Bandwidth scaling
- Short-distance remote I/O
- Proprietary based virtualization
- QoS (8 traffic classes, virtual channels)
- Low infrastructure build-up
- Evolutionary compatibility with PCI

Key IB IOEN value proposition:
- Bandwidth scaling
- Long distance remote I/O
- Native, standards based virtualization
- Multipathing for performance and HA
- QoS (16 service levels, virtual lanes)
- CAN and IOEN convergence

For large SMPs, a memory fabric must be used to access I/O that is not local to an SMP sub-node.
Server IOA Outlook

Server I/O Attachment
- Next steps in the PCI family roadmap: 2003-05: PCI-X 2.0 DDR and QDR; 2005: PCI-Express.
- Key drivers for PCI-Express are: AGP replacement on clients (16x); CPU chipset on clients and servers (8x or 16x).
- IB as an IOA: complexity and eco-system issues will limit IB to a small portion of high-end IOA.

Server I/O Expansion
- Options for scale-up servers: migrate to IB and tunnel PCI I/O through it; continue upgrading proprietary IOENs; or migrate to PCI-Express.
- SHV servers will likely pursue PCI-Express: it satisfies low-end requirements, but not all enterprise class requirements.
[Charts: I/O attachment bandwidth and I/O expansion network bandwidth, in GB/s on a log scale, 1994-2009. I/O attachment: MCA, PCI/PCI-X, PCI-Express. I/O expansion networks: PCI-E (8/16x), IB (12x), ServerNet, SGI XIO, IBM RIO/STI, HP.]
Problems with Sockets over TCP/IP
- Network intensive applications consume a large percentage of the CPU cycles: small 1 KB transfers spend 40% of the time in TCP/IP and 18% in copy/buffer management; large 64 KB transfers spend 25% of the time in TCP/IP and 49% in copy/buffer management.
- Network stack processing consumes a significant amount of the available server memory bandwidth (3x the link rate on receives: roughly, the payload crosses memory once when the adapter DMAs it into kernel buffers, again when the kernel buffers are read, and again when the data is written into the user buffer).
[Chart: CPU utilization breakdown for a standard NIC with no offload, for 1 KB and 64 KB transfers: copy/data management, TCP, IP, NIC interrupt processing, socket library, and the share left available for the application; plus the receive-side server memory to link bandwidth ratio. Note: the 1 KB and 64 KB figures are based on Erich Nahum's Tuxedo on Linux runs (1 KB and 64 KB files, 512 clients), adding 0.5 instructions per byte for copy.]
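As a present-day illustration (mine, not from the slides), the conventional sockets receive path below is where those cycles go: each recv() returns only after the host CPU has handled NIC interrupts, run TCP/IP processing, and copied the payload from kernel socket buffers into the user buffer, which is the copy/buffer management share charted above.

```c
/* Minimal sketch of a conventional sockets receive loop (illustrative only).
 * Every recv() implies kernel TCP/IP processing plus a kernel-to-user copy of
 * the payload -- the TCP/IP and copy/buffer management costs in the chart. */
#include <sys/types.h>
#include <sys/socket.h>

ssize_t drain_socket(int sock, char *buf, size_t buflen)
{
    ssize_t total = 0;
    for (;;) {
        /* Interrupt handling, protocol processing, and the payload copy all
         * run on the host CPU before this call returns. */
        ssize_t n = recv(sock, buf, buflen, 0);
        if (n == 0)
            return total;   /* peer closed the connection */
        if (n < 0)
            return -1;      /* error */
        total += n;         /* the application would consume buf here */
    }
}
```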
Network Offload – Basic Mechanisms
Successful network stack offload requires five basic mechanisms:
1. Direct user space access to a send/receive Queue Pair (QP) on the offload adapter. Allows middleware to send and receive data directly through the adapter.
2. Registration of virtual-to-physical address translations with the offload adapter. Allows the hardware adapter to directly access user space memory.
3. Access controls between registered memory resources and work queues. Allows privileged code to associate adapter resources (memory registrations, QPs, and Shared Receive Queues) with a combination of OS image, process, and, if desired, thread.
4. Remote direct data placement (a.k.a. Remote Direct Memory Access, RDMA). Allows the adapter to place incoming data directly into a user space buffer.
5. Efficient implementation of the offloaded network stack. Otherwise offload may not yield the desired performance benefits.
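To make these mechanisms concrete, here is a minimal, hypothetical sketch using the present-day OpenFabrics libibverbs API (not an API defined in these slides): a protection domain scopes adapter resources to one consumer (mechanism 3), and a memory registration hands the adapter the virtual-to-physical translations and access rights for an ordinary user buffer (mechanism 2).

```c
/* Hedged sketch, not from the slides: memory registration and access control
 * with the OpenFabrics libibverbs API. Error handling is abbreviated. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA-capable devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "device open failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);   /* access-control scope for QPs and MRs */
    if (!pd) { fprintf(stderr, "PD allocation failed\n"); return 1; }

    size_t len = 1 << 20;
    void *buf = malloc(len);                 /* ordinary user-space memory */

    /* Register the buffer: the adapter records the address translations and
     * the allowed operations (local write and remote RDMA write here). */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { fprintf(stderr, "memory registration failed\n"); return 1; }

    /* lkey/rkey are quoted in later work requests so the adapter can check
     * every access against this registration. */
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```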
Network Stack Offload – InfiniBand Host Channel Adapter Overview
- Verb consumer: software that uses the HCA to communicate with other nodes.
- Communication is through verbs, which: manage connection state; manage memory and queue access; submit work to the HCA; and retrieve work and events from the HCA.
- The Channel Interface (CI) performs work on behalf of the consumer. The CI consists of: the driver (privileged functions), the library (user space functions), and the HCA (the hardware adapter).
- Abbreviations: SQ = Send Queue; RQ = Receive Queue; SRQ = Shared RQ; QP = Queue Pair (QP = SQ + RQ); CQ = Completion Queue.
[Figure: HCA block diagram. The verb consumer sits above the Channel Interface (CI), which comprises the HCA driver/library and the HCA itself. The HCA contains a data engine layer with SQ, RQ, SRQ, CQ, and asynchronous event handling, QP context (QPC), a memory translation and protection table (TPT), and the IB transport and network layers.]
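Continuing the hedged libibverbs sketch above (the helper below and its connection setup are my own assumptions, not shown in the slides): work is submitted to the HCA by posting a work request to the send queue, and completions are retrieved by polling the completion queue, matching the "submit work" and "retrieve work and events" verbs described here.

```c
/* Hedged sketch, not from the slides: posting one RDMA Write and reaping its
 * completion. Assumes a connected QP, its CQ, and a registered buffer (mr/buf)
 * created as in the previous sketch. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write_once(struct ibv_qp *qp, struct ibv_cq *cq,
                    struct ibv_mr *mr, void *buf, uint32_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,               /* proves the local buffer is registered */
    };
    struct ibv_send_wr wr, *bad = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* place data directly into the  */
    wr.wr.rdma.remote_addr = remote_addr;         /* peer's registered user buffer */
    wr.wr.rdma.rkey        = rkey;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* ask for a completion entry    */

    if (ibv_post_send(qp, &wr, &bad))             /* submit work to the HCA        */
        return -1;

    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc);              /* retrieve work from the HCA    */
    } while (n == 0);                             /* busy-poll for brevity         */

    return (n == 1 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```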
Network Stack Offload – iONICs
- iONIC: an internet Offload Network Interface Controller. Supports one or more internet protocol suite offload services.
- RDMA enabled NIC (RNIC): an iONIC that supports the RDMA Service.
- IP suite offload services include, but are not limited to: TCP/IP Offload Engine (TOE) Service; Remote Direct Memory Access (RDMA) Service; iSCSI Service; iSCSI Extensions for RDMA (iSER) Service; IPSec Service.
[Figure: host and iONIC protocol stacks for three services. Sockets over the Ethernet Link Service: the sockets application and host TCP/IP run above a NIC driver and a conventional NIC. Sockets over the TOE Service: a TOE service library and TOE driver hand work to TCP/IP offloaded in the iONIC. Sockets over the RDMA Service: an RDMA service library and RNIC driver hand work to RDMA/DDP/MPA over TCP/IP in the iONIC. Only the Ethernet Link, TOE, and RDMA Services are shown; the iONIC also has a management path.]
Network Stack Offload – iONIC RDMA Service Overview
- Verb consumer: software that uses the RDMA Service to communicate with other nodes.
- Communication is through verbs, which: manage connection state; manage memory and queue access; submit work to the iONIC; and retrieve work and events from the iONIC.
- The RDMA Service Interface (RI) performs work on behalf of the consumer. The RI consists of: the driver (privileged functions), the library (user space functions), and the RNIC (the hardware adapter).
- Abbreviations: SQ = Send Queue; RQ = Receive Queue; SRQ = Shared RQ; QP = Queue Pair (QP = SQ + RQ); CQ = Completion Queue.

[Figure: RNIC block diagram, mirroring the HCA diagram. The verb consumer sits above the RI, which comprises the RNIC driver/library and the iONIC RDMA Service. The RDMA Service contains a data engine layer with SQ, RQ, SRQ, CQ, and asynchronous event handling, QP context (QPC), a memory translation and protection table (TPT), and the RDMA/DDP/MPA/TCP/IP layers.]
Network I/O Transaction Efficiency
- The graph shows a complete transaction: a Send and Receive pair for TOE, and a combined Send+RDMA Write and Receive for RDMA.

[Chart: CPU instructions per byte versus transfer size (1 byte to 100 KB, log-log scale) for ELS, TOE 1, TOE 2, SDP, and RDMA.]
Network Offload Benefits – Middleware View
- The benefit of network stack offload (IB or iONIC) depends on the ratio of application/middleware (App) instructions to network stack instructions.

[Figure: multi-tier server environment. A client tier (browser user) talks to a web/presentation server (presentation data), which talks through DB client and replication paths to a web application server (application data), which talks to a business function server running OLTP and BI databases and HPC (business data). Note: all tiers are logical; they can potentially run on the same server OS instance(s). The tier-to-tier links traditionally use a cluster network.]

Legend of offload benefits:
- NC is not useful at present due to XML and Java overheads.
- Sockets-level NC support is beneficial: 5 to 6% performance gain for communication between the application tier and the business function tier; 0 to 90% performance gain for communication between the browser and the web server.
- Low-level (uDAPL, ICSC) support is most beneficial: 4 to 50% performance gain for the business function tier.
- iSCSI and DAFS support are beneficial: 5 to 50% gain for NFS/RDMA compared to NFS performance.
TCP/IP/Ethernet Are King of LANs
Ethernet is standard and widely deployed as a LAN:
- Long distance links (from card-card to 40 km).
- High availability through session, adapter, or port level switchover.
- Dynamic congestion management when combined with IP transports.
- Scalable security levels.
- Sufficient performance for the LAN; good enough performance for many clusters, and high performance when combined with TCP offload.
The strategy is to extend Ethernet's role in wide area, cluster, and storage networks through a combination of:
- Faster link speed (10 Gb/s) at competitive costs: no additional cost for copper (XAUI); $150 to $2000 transceiver cost for fiber.
- internet Offload Network Interface Controllers (iONICs) with multiple services.
- Lower latency switches: sub-microsecond latencies for data-center (cluster and storage) networks.
Market Value of Ethernet
- 250 million Ethernet ports installed to date.

[Charts: cumulative Ethernet switch port shipments (thousands of ports, 1993-2004, for 10, 100, 1000, and 10000 Mb/s ports) and server Ethernet NIC prices (US$, roughly Jan-96 to Jan-05) for 10 Mb/s, 100 Mb/s, 1 Gb/s, IB 4x, 10 Gb/s copper, 10 Gb/s fiber, 10 Gb/s fiber iONIC, 10 Gb/s copper iONIC, and 1 Gb/s iONIC adapters.]
LAN Switch Trends
- The traditional LAN switch IHV business model has been to pursue higher-level protocols and functions, with less attention to latency.
- iONICs and 10 GigE are expected to increase the role Ethernet will play in cluster and storage networks, and they provide an additional business model for switch vendors, one focused on satisfying the performance needs of cluster and storage networks.
- Some Ethernet switch vendors (e.g. Nishan) are pursuing this new model.

Switch latencies:
- 1997: general purpose switch: 20-100 us range
- 2002: general purpose switch: 10-20 us range; data center (e.g. iSCSI) focused switch: <2 us range
- 2006: general purpose switch: 3-5 us range; data center focused switch: <1 us range

Nishan switch: IBM 0.18 um ASIC; 25 million transistors; 15 mm x 15 mm die size; 928 signal pins; less than 2 us latency.
LAN Outlook
- Ethernet vendors are gearing up for storage and higher performance cluster networks: 10 GigE to provide higher bandwidths; iONICs to solve the CPU and memory overhead problems; and lower latency switches to satisfy end-to-end process latency requirements.
- If the above comes to pass, how well will Ethernet play in the cluster market? In the storage market?
[Chart: Local Area Network bandwidth (GB/s, log scale), 1974-2004, for Ethernet, Token Ring, ATM, and FDDI: roughly 2x every 16 months over the past 10 years.]
Cluster Network Contenders
- Proprietary networks
  - Strategy: provide an advantage over standard networks (lower latency/overhead, higher link performance, and advanced functions), with the eco-system completely supplied by one vendor.
  - Two approaches: multi-server (can be used on servers from more than one vendor) and single-server (only available on servers from one vendor).
- Standard networks
  - IB: the strategy is to provide almost all the advantages available on proprietary networks, thereby eventually displacing proprietary fabrics.
  - 10 Gb/s Ethernet with internet Offload NICs (iONICs): the strategy is to increase Ethernet's share of the cluster network pie by providing lower CPU/memory overhead and advanced functions, though at a lower bandwidth than IB and proprietary networks.
- Note: PCI-Express doesn't play, because it is missing many functions.
Cluster Network Usage in HPC

[Chart: cluster interconnect technology for the Top 500 supercomputers in June 2000, June 2002, and November 2002, broken out by Top 100, Next 100, and Last 100, showing the shares of standards-based, multi-platform, and single-platform interconnects. Source: Top 500 study by Tom Heller.]

- Use of proprietary cluster networks for high-end clusters will continue to decline.
- Multi-platform cluster networks have already begun to gain significant share.
- Standards-based cluster networks will become the dominant form.
Reduction in Process-Process Latencies: 256 B and 8 KB Blocks

[Charts: process-to-process latency for GbEnet, 10GbE, optimized 10GbE, and IB 12x, broken down into link time, five switch hops, and network stack plus adapter driver/library, for 256 B and 8 KB blocks, in absolute and normalized form. Callouts note reductions relative to GbEnet of 1.2x, 4.6x, and 8.4x, and of 2.5x, 3.0x, and 3.9x. Annotations: 1 GigE 100 MFLOP = 19 us; 10 GigE 100 MFLOP = 6 us; IB 100 MFLOP = 6 us.]
HPC Cluster Network Outlook
- Proprietary fabrics will be displaced by Ethernet and IB.
- Servers with the most stringent performance requirements will use IB.
- Cluster networks will continue to be predominantly Ethernet; iONICs and low-latency switches will increase Ethernet's participation in the cluster network market.
[Chart: high performance standard link bandwidth (GB/s, log scale), 1985-2009, for Ethernet, Token Ring, ATM, FDDI, FCS, IB, HiPPI, ServerNet, Memory Channel, SGI GIGAchannel, IBM SP/RIO/STI, and Synfinity.]
Current SAN and NAS Positioning Overview
- Current SAN differentiators: block level I/O access (LUN/LBA); high performance I/O (low latency, high bandwidth); vendor unique fabric management protocols (a learning curve for IT folks); homogeneous access to I/O.
- Current NAS differentiators: file level I/O access (NFS, HTTP, etc. over the LAN); LAN level performance (high latency, lower bandwidth); standard fabric management protocols (low/zero learning curve for IT folks); heterogeneous platform access to files.
- Commonalities: robust remote recovery and storage management requires special tools for both; each can optimize disk access, though SAN requires virtualization to do it.
- Contenders: SAN: FC and Ethernet; NAS: Ethernet.
Storage Models for IP
- Parallel SCSI and FC have a very efficient path through the OS: the existing driver-to-hardware interface has been tuned for many years, and an efficient driver-hardware interface model has been a key iSCSI adoption issue.
- Next steps in iSCSI development: offload TCP/IP processing to the host bus adapter; provide switches that satisfy SAN latency requirements; and improve read and write processing overhead at the initiator and target.
[Figure: three storage stacks. Parallel SCSI or FC: application and FS API, FS/LVM, storage driver, and storage adapter on the host. iSCSI Service in the host: FS/LVM and storage driver over iSCSI and TCP/IP in host software, a NIC driver, and a NIC with partial offload. iSCSI Service in an iONIC: an adapter driver and an iSCSI HBA running iSCSI/TCP/IP/Ethernet. Chart: CPU instructions per byte versus transfer size for parallel SCSI, the iSCSI Service in the host, and the iSCSI Service in an iONIC.]
Storage Models for IP
- RDMA will significantly improve NAS server performance: host network stack processing will be offloaded to an iONIC, which removes TCP/IP processing from the host path and allows zero copy.
- NAS protocols (NFS with RDMA extensions) will exploit RDMA: RDMA allows a file level access device to approach block level access device performance, creating a performance discontinuity for storage.
[Figure: NFS over an Ethernet Link Service NIC (host NFS API, NFS, and TCP/IP over a NIC driver and a NIC with partial offload) versus NFS extensions for RDMA over the RDMA Service in an iONIC (host NFS API and NFS over a NIC driver and an RNIC running RDMA/DDP/MPA/TCP/IP/Ethernet). Chart: CPU instructions per byte versus transfer size for NFS over an ELS NIC, NFS over an RNIC, and parallel SCSI.]
Storage I/O Network Outlook
- Link bandwidth trends will continue, paced by optic technology enhancements.
- Adapter throughput trends will continue, paced by higher frequency circuits, higher performance microprocessors, and larger fast-write and read cache memory.
- SANs will gradually transition from FC to IP/Ethernet networks, motivated by TCO/complexity reduction and paced by the availability of iSCSI with efficient TOE (possibly RNIC) and of lower latency switches.
- NAS will become more competitive against SAN, paced by RNIC availability.
[Charts: SAN link bandwidth (GB/s, log scale, 1990-2005) for SCSI, FC, disk head, and iSCSI/Ethernet; single adapter/controller throughput (K IOPS, 1994-2008).]
* Sources: product literature from 14 companies. These typically use a workload that is 100% reads of 512-byte data; not a good measure of overall sustained performance, but a good measure of adapter/controller front-end throughput capability.
Summary
- Server I/O adapters will likely attach through the PCI family, because of PCI's low cost and simplicity of implementation.
- I/O expansion networks will likely use proprietary or IB (with PCI tunneling) links that satisfy enterprise requirements, and PCI-Express on standard high-volume, low-end servers.
- Cluster networks will likely use Ethernet for the high-volume portion of the market, and InfiniBand when performance (latency, bandwidth, throughput) is required.
- Storage area networks will likely continue using Fibre Channel, but gradually migrate to iSCSI over Ethernet.
- LANs: Ethernet is king.
- Several network stack offload design approaches will be attempted, from all firmware on slow microprocessors, to heavy state machine usage, to all points in between. After the weed design approaches are rooted out of the market, iONICs will eventually become a prevalent feature on low-end to high-end servers.