5/27/2002 Dragon Slayer Consulting 1
Dragon Slayer Consulting
Introduction to the Value Proposition of InfiniBand
Marc Staimer, [email protected], (503) 579-3763
Introduction to InfiniBand (IB) Agenda
IB defined
IB vs. FC & GbE
IB architecture
Real market problems IB solves
Market projections
Conclusions
Definition of Input/Output
“The transfer of data into and out of a computer”
Maintain data integrity
Protect all other data in the computer from corruption
Usually through the use of Operating System defined mechanisms
Three (3) Distinct Classes of I/O
Block protocol: typically disk oriented
Network protocol: typically IP oriented
Inter-Process Communication (IPC)
Characteristics of the I/O Classes
Characteristic | Block protocol | Network protocol | IPC
Latency tolerance | Dozens of milliseconds | 100s of milliseconds | Dozens of microseconds
Avg. message size | Very large | Small to large | Small to large
Context | Data center/campus (FC) | Global | Server cluster/data center
Predominant protocol | Fibre Channel Protocol (FCP) | Ethernet / TCP/IP | Emerging: VI
IB Defined
The 1st unified, simplified, & consolidated I/O fabric
Designed from the ground up for all aspects of I/O
Shared memory vs. shared bus
Leverages virtual lanes or pipes (multiple fabrics in one)
Spec’d for today & tomorrow
1x = 2.5 Gbps
4x = 10 Gbps
12x = 30 Gbps
Native VI protocol
OS bypass
Credit-based flow control
Key: extends server I/O outside the box
[Figure: InfiniBand (VI protocol) virtual lanes carrying FC, GbE, SAS, SATA, and UltraSCSI traffic over one fabric]
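Credit-based flow control keeps a link lossless: the receiver advertises how many buffers it has free, and the sender transmits only while it holds credits. The Python sketch below is a simplified, hypothetical model (the `Sender` and `Receiver` classes are illustrative, not part of any IB software stack) that uses one credit per packet on a single link; the InfiniBand specification tracks credits per virtual lane.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a fixed pool of input buffers."""

    def __init__(self, buffers: int) -> None:
        self.capacity = buffers
        self.free_buffers = buffers
        self.received = []

    def initial_credits(self) -> int:
        # One credit is advertised for each free buffer.
        return self.free_buffers

    def accept(self, packet: str) -> None:
        # A sender that respects its credits can never trigger this error.
        if self.free_buffers == 0:
            raise RuntimeError("buffer overrun")
        self.free_buffers -= 1
        self.received.append(packet)

    def drain_one(self) -> int:
        # The upper layer consumes one buffered packet; one credit goes back.
        if self.free_buffers == self.capacity:
            raise RuntimeError("nothing to drain")
        self.free_buffers += 1
        return 1


class Sender:
    """Hypothetical sender that transmits only while it holds credits."""

    def __init__(self) -> None:
        self.credits = 0
        self.pending = deque()

    def add_credits(self, n: int) -> None:
        self.credits += n

    def transmit(self, rx: Receiver) -> int:
        """Send queued packets while credits remain; return how many were sent."""
        sent = 0
        while self.pending and self.credits > 0:
            rx.accept(self.pending.popleft())
            self.credits -= 1
            sent += 1
        return sent


rx = Receiver(buffers=2)
tx = Sender()
tx.add_credits(rx.initial_credits())
tx.pending.extend(["p0", "p1", "p2"])
print(tx.transmit(rx))  # two packets fit; the third waits for a credit
tx.add_credits(rx.drain_one())
print(tx.transmit(rx))
```

Because the sender stalls instead of overrunning the receiver, no packet is dropped for lack of buffer space; the third packet simply waits until a credit comes back.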
Why Do We Need Yet Another Fabric?
The issue is not the fabric; the issue is server I/O
Current GbE & FC fabrics do not solve server I/O bottlenecks
Bus contention
GbE & FC fabrics weren’t specifically designed for clustering
They can do it, but
Message queue depths and performance are not optimal
Performance is often inadequate
IB vs. FC vs. GbE Conclusion
Initially complementary: IB will not replace FC or GbE
Investment protection
Eventually competitive and complementary
They will compete for some of the same budget dollars
IB Architecture
[Figure: IB architecture. CPUs and system memory share the system bus; the memory controller connects them to a Host Channel Adapter (HCA). Links run from the HCA to a multi-stage switch, to Target Channel Adapters (TCAs) fronting I/O controllers, and to controllers/routers; management services sit alongside.]
Host Channel Adapter: protocol engine; moves data via messages queued in memory
Target Channel Adapter: interface to I/O controller (FC, GbE, SCSI, etc.)
Switch: internal or external; simple, low cost, multi-stage
Routers: connect subnets together
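The HCA’s job of moving data via messages queued in memory can be pictured as a queue pair: the host posts send and receive work requests, the adapter matches them, and results are reported on a completion queue. The Python toy model below uses hypothetical names (`QueuePair`, `hca_process`, `Completion`) and is not an actual HCA programming interface; in this sketch an unmatched send simply stays queued until the peer posts a receive buffer.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Completion:
    wr_id: int    # identifies the work request that finished
    opcode: str   # "SEND" or "RECV"
    length: int   # bytes transferred


@dataclass
class QueuePair:
    """Toy queue pair: work requests queued in host memory, results on a CQ."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)
    completion_queue: deque = field(default_factory=deque)
    received: list = field(default_factory=list)

    def post_send(self, wr_id: int, payload: bytes) -> None:
        self.send_queue.append((wr_id, payload))

    def post_recv(self, wr_id: int) -> None:
        # A receive buffer must be posted before a message can land.
        self.recv_queue.append(wr_id)


def hca_process(src: QueuePair, dst: QueuePair) -> int:
    """Stand-in for the channel adapter engine: move each queued send into a
    posted receive on the peer and post completions on both sides.
    Returns the number of messages delivered."""
    delivered = 0
    while src.send_queue and dst.recv_queue:
        send_id, payload = src.send_queue.popleft()
        recv_id = dst.recv_queue.popleft()
        dst.received.append(payload)
        src.completion_queue.append(Completion(send_id, "SEND", len(payload)))
        dst.completion_queue.append(Completion(recv_id, "RECV", len(payload)))
        delivered += 1
    return delivered


a, b = QueuePair(), QueuePair()
a.post_send(1, b"hello")
b.post_recv(100)
print(hca_process(a, b), b.received)
```

The host only posts requests and reads completions; the data movement itself is left to the adapter.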
IB Fabric BW Increases as Switches are Added
[Figure: end nodes in Subnet A and Subnet B attached through switches, with routers linking the two subnets]
I/O Architecture Today
Traditional Server & Infrastructure w/dedicated I/O
[Figure: Traditional server architecture. CPUs and memory sit on the system bus behind a memory controller; an I/O bridge feeds the local (PCI) bus, where a separate adapter serves each fabric: an HBA to the FC SAN and storage, a NIC to the Ethernet LAN and router, and a NIC or HBA to the IPC network. Storage is annotated “iSCSI? DAFS?”]
Complex and expensive
InfiniBand Based I/O
InfiniBand Server Hardware Architecture
Multiple IBA links: 2.5 Gbps, 10 Gbps
Solve redundancy problem once
[Figure: CPUs and memory on the system bus, connected through the memory controller to a Host Channel Adapter]
InfiniBand Based I/O
InfiniBand Server Hardware Architecture
[Figure: CPUs and memory connect through the memory controller to a Host Channel Adapter (HCA), which reaches an IB switch using RDMA-based protocols. On the InfiniBand I/O unit hardware architecture side, a Target Channel Adapter (TCA) fronts iSCSI, Fibre Channel, Ethernet, and UltraSCSI I/O controllers.]
Market Problems IB Solves
Higher performance, lower cost I/O (shared I/O)
Converges clustering, networking, & storage into one fabric
The IAN (I/O Area Network)
Reduces:
IT management tasks
Server workloads
TCO
PCI bus I/O constraints
Low cost HP/HA server clustering
Lowers the cost of server blade systems
Enables higher density server blade clusters
Higher Performance, Lower Cost I/O (Shared I/O)
[Figure: New IB server clusters connect over the IB fabric to I/O units whose TCAs bridge to IB storage, the FC SAN, an Ethernet SAN, the Ethernet LAN/WAN, and a management LAN with modem-based remote monitoring. Bridging paths shown: IB ⇒ iSCSI, IB ⇒ IB, IB ⇒ E-net, IB ⇒ FC.]
Current High Availability I/O Configuration
16 rack mount servers with dedicated I/O per server = 210 connections
(2) HBA FC paths/server to the FC fabric
(4) FC paths from storage to the FC fabric
(2) Ethernet paths/server to the network
(1) Ethernet maintenance path/server to the network
[Figure: 16 PowerEdge 2450 servers, each cabled to the FC SAN, the Ethernet LAN/WAN, and the maintenance LAN]
Non-Productive Costly Connectivity
[Figure: the same 16 PowerEdge 2450 servers and their FC SAN, Ethernet LAN/WAN, and maintenance LAN connections, with non-productive connectivity distinguished from productive connectivity]
InfiniBand Shared I/O Chassis Example
19” rack mount environment
3U high
IBA single-high, single-wide standard
Integrated IBA fabric
Up to 45 watts/line card slot
Hot swappable components
Chassis Management Entity (CME)
[Figure: 3U chassis with two fabric cards (InfiniBand ports) and line cards of up to 8 GE/FC/IB ports each; front-to-back cooling]
IB Enabled High Availability Shared I/O
Add dual redundant IB I/O chassis
10 slots each
IB form factor I/O cards
Multi-protocol: FC, GigE, FastE, iSCSI, etc.
Eliminate FC edge switches
[Figure: the 16 PowerEdge 2450 servers connected to dual redundant IB I/O chassis, which connect to the FC SAN, Ethernet LAN/WAN, and maintenance LAN; FC edge switches eliminated]
IB Enabled High Availability Shared I/O
Reduces LAN switch requirements
Total connections = 116, a ~45% reduction
(2) IB paths/server to the IB fabric
(6) FC paths to storage on the FC fabric
(2) Ethernet paths/I/O subsystem to the network
(2) Ethernet maintenance paths/I/O subsystem to the network
[Figure: the 16 PowerEdge 2450 servers connected through the IB shared I/O subsystems to the FC SAN, Ethernet LAN/WAN, and maintenance LAN]
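The ~45% figure follows directly from the two connection totals quoted on the slides. A minimal Python check, taking the 210 and 116 totals as given (the helper name `percent_reduction` is hypothetical):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Connection totals as quoted on the slides (16 servers, HA configuration).
TRADITIONAL_CONNECTIONS = 210
IB_SHARED_IO_CONNECTIONS = 116

reduction = percent_reduction(TRADITIONAL_CONNECTIONS, IB_SHARED_IO_CONNECTIONS)
print(f"Connection reduction: {reduction:.1f}%")
```

The result, about 44.8%, matches the slide’s rounded “~45%”.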
Potential Savings
Current dedicated I/O subsystem/server
Costs = ~$225,000
IB shared I/O system with improved BW, connectivity, manageability, & availability
Costs = ~$112,500
Savings = ~50%
Additional non-hardware TCO gains
Operational expense savings estimated at 3x to 8x the capital expense reduction
Simpler system design to manage
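A quick Python check of the savings arithmetic. It assumes the “3x to 8x” figure means operational-expense savings equal to 3 to 8 times the capital-expense reduction; that interpretation and the function name `savings_estimate` are assumptions, not taken from the source.

```python
def savings_estimate(current_cost, ib_cost, opex_multiple_low=3.0, opex_multiple_high=8.0):
    """Return (capex saving, capex saving in percent, (low, high) opex saving).

    Assumption: the opex saving is a fixed multiple of the capex saving.
    """
    capex_saving = current_cost - ib_cost
    capex_pct = capex_saving / current_cost * 100.0
    opex_range = (capex_saving * opex_multiple_low, capex_saving * opex_multiple_high)
    return capex_saving, capex_pct, opex_range


saving, pct, (opex_low, opex_high) = savings_estimate(225_000, 112_500)
print(f"Capex saving ${saving:,.0f} ({pct:.0f}%)")
print(f"Estimated opex saving ${opex_low:,.0f} to ${opex_high:,.0f}")
```

Under that reading, the slide’s figures imply roughly $337,500 to $900,000 in operational savings on top of the $112,500 hardware saving.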
System Benefits
Increased BW & connectivity per server
Reduced infrastructure complexity
Reduced power & space
BW migration to bursting servers
Natural low latency IPC network
Managing Scalability w/Traditional I/O
What happens when just 2 more servers are added?
In the FC SAN
(1) new switch has to be added
Fabric will need to be reconfigured
Maintenance LAN will also need to change
From a 16-port switch/router to a 24-port one
[Figure: 18 PowerEdge 2450 servers with dedicated connections to the FC SAN, Ethernet LAN/WAN, and maintenance LAN]
Managing Scalability w/Traditional I/O
Adding servers takes a lot of hard work & time
20 net new connections
Disruptive FC fabric reconfigurations
[Figure: 18 PowerEdge 2450 servers with dedicated connections to the FC SAN, Ethernet LAN/WAN, and maintenance LAN]
Managing Scalability w/IB based I/O
Adding servers is significantly simpler & easier
8 net new connections = a 60% reduction
(2) IB paths/new server to the IB fabric
No new switches or reconfigurations
Faster & non-disruptive implementation
[Figure: 18 PowerEdge 2450 servers connected through the IB shared I/O subsystems to the FC SAN, Ethernet LAN/WAN, and maintenance LAN]
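The 60% figure compares net new connections when two servers are added. This Python sketch uses only the slide numbers (20 and 8); the per-server breakdown is an assumption that the new connections divide evenly across the two added servers.

```python
SERVERS_ADDED = 2
# Net new connections when two servers are added, as quoted on the slides.
NET_NEW_CONNECTIONS = {"traditional I/O": 20, "IB shared I/O": 8}

for design, count in NET_NEW_CONNECTIONS.items():
    # Assumes the new connections divide evenly across the added servers.
    print(f"{design}: {count / SERVERS_ADDED:.0f} new connections per added server")

old = NET_NEW_CONNECTIONS["traditional I/O"]
new = NET_NEW_CONNECTIONS["IB shared I/O"]
reduction = (old - new) / old
print(f"Reduction: {reduction:.0%}")
```

On those assumptions each added server costs 10 new connections with dedicated I/O versus 4 with IB shared I/O.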
Scalability Net Results w/IB shared I/O
Making adds, moves, or changes means
Less time
Less cost
Less effort
Less complexity
Fewer personnel
Fewer disruptions
More control
More simplicity
More stability
Better RAS
PCI Bus Constraints
PCI bus limitations have been strangling CPU I/O
Like trying to drink from a fire hose
PCI Bus Constraints
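Bus | PCI (66 MHz) | PCI-X (133 MHz) | DDR | QDR | 3GIO
Max BW | 4 Gbps | 8 Gbps | 16 Gbps | 32 Gbps | 64 Gbps
I/O constraint | (4) GbE w/TOE or (2) 2-gig FC | (4) SCSI 320 | (1) 4x IB | (1) 10gigE, FC, or 4x IB | (2) 10gigE, FC, or 4x IB
Architecture | Shared parallel bus | Shared parallel bus | Shared parallel bus | Shared parallel bus | Switched serial
Issues | Bus contention | Bus contention | Bus contention | Bus contention | Not until ’04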
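The bus bandwidths in the table follow from clock rate × bus width × transfers per clock, and because the bus is shared, every adapter on it draws from the same budget. A Python sketch, assuming 64-bit buses, 66 and 133 MHz nominal clocks, and DDR/QDR as 2 and 4 transfers per 133 MHz clock (the helper names are hypothetical):

```python
def parallel_bus_gbps(clock_mhz: float, width_bits: int, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth of a parallel bus, in Gbit/s."""
    return clock_mhz * width_bits * transfers_per_clock / 1000.0


def fits_on_bus(bus_gbps: float, adapter_gbps: list) -> bool:
    """A shared bus can only sustain adapters whose combined demand fits its peak."""
    return sum(adapter_gbps) <= bus_gbps


# Assumed bus parameters: 64-bit width; DDR and QDR as 2 and 4 transfers per 133 MHz clock.
BUSES = [
    ("PCI (66 MHz)", 66, 64, 1),
    ("PCI-X (133 MHz)", 133, 64, 1),
    ("DDR", 133, 64, 2),
    ("QDR", 133, 64, 4),
]

for name, mhz, bits, xfers in BUSES:
    print(f"{name:16s} {parallel_bus_gbps(mhz, bits, xfers):5.1f} Gbps")

pci = parallel_bus_gbps(66, 64)
print(fits_on_bus(pci, [1.0] * 4))  # four GbE ports at 1 Gbps each
print(fits_on_bus(pci, [1.0] * 5))  # a fifth port exceeds the shared bus
```

The computed peaks (about 4.2, 8.5, 17, and 34 Gbps) correspond roughly to the table’s 4, 8, 16, and 32 Gbps; real throughput is lower once bus arbitration and protocol overhead are counted.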
PCI Bus vs. IB
[Table: PCI, PCI-X, DDR, QDR, and 3GIO vs. InfiniBand, marking advantages and disadvantages on: scalability (ports & BW), QoS, security, protects software base, out-of-box connectivity, fabric convergence, simpler for chip-to-chip, PCB/copper/fiber media, lower cost, fault tolerance, clustering, and multicast. Disadvantages noted: bus contention until there is 3GIO (PCI family); software (InfiniBand).]
Solution: PCI Bus AND IB
It’s not “either/or”
They are complementary, not mutually exclusive
The best solutions take advantage of both
This is why you rarely hear anymore that IB is the PCI replacement
There are new HCAs WITH PCI-X interfaces
Expect DDR, QDR, & 3GIO as well
The IB benefits are almost as great
Eliminates bus contention
Preserves PCI software base
Provides IB benefits NOW
Don’t have to wait for native server IB
[Figure: PCI-X HCA example]
Low Cost HP/HA Server Clustering
IB clustering costs less for scaling out than SMP or NUMA scaling up
IB eliminates fabric messaging performance issues with clustering
Long queues
PCI bus contention
IB enables low cost servers (the shared I/O arguments are even stronger here)
Diskless blades
Personality on the storage
Higher fault tolerance and availability
One connection for clustering and shared I/O
Fewer I/O interfaces than any other interconnect
Higher performance
Lower TCO
Industry Analysts’ IB Enabled Server Forecast
[Chart: IB enabled server forecast, 2002 to 2005, Gartner vs. IDC; vertical axis 0 to 6,000,000 servers]
Analysts are split in their forecasts of IB’s TAM, but not on its potential
IB Enabled Servers as a % of Total
[Chart: IB enabled servers as a percentage of total servers, 2002 to 2005, Gartner vs. IDC; vertical axis 0% to 90%]
Conclusions
Even if the analysts’ views are optimistic
A huge % of servers will be IB enabled
The value proposition is far too strong to ignore
Initial deployments will use PCI-X HCAs
Native deployments will enable lower cost server blade clusters
As more and more servers become IB enabled
Clever IT people will realize that they can run IB natively for:
Clustering, networking, and storage
When IB becomes native on the server motherboard
The perception becomes that it’s free
There is always high market demand for… free.
Questions?
Why Not Just Use GbE or FC?
GbE and FC are the current fabric infrastructures
IT personnel already know & understand the technologies
FC & GbE are already battling it out for SAN infrastructures
FCP vs. iSCSI
IB vs. FC vs. GbE
Technology | Standards body | Signaling speed | First standard | Maximum frame size | Primary application
Gigabit Ethernet | IEEE & IETF | 1.25 Gbps | 1999 | 1.5K | LAN: Local Area Network
Fibre Channel | ANSI | 2.125 Gbps | 1988 | 2K | SAN: Storage Area Network
InfiniBand Architecture | InfiniBand Trade Association | 2.5 Gbps (1x), 10 Gbps (4x), 30 Gbps (12x) | 2001 | 4K | IAN: I/O Area Network
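All three links in the table use 8b/10b line coding at these signaling rates, so only 8 of every 10 bits on the wire carry data. A short Python sketch converting the table’s signaling speeds to usable data rates per direction (the names are illustrative):

```python
ENCODING_EFFICIENCY_8B10B = 8 / 10  # 8 data bits per 10 bits on the wire


def data_rate_gbps(signaling_gbaud_per_lane: float, lanes: int = 1) -> float:
    """Usable data rate per direction after 8b/10b line coding."""
    return signaling_gbaud_per_lane * lanes * ENCODING_EFFICIENCY_8B10B


# Signaling rates from the comparison table (the IB rate is per lane).
LINKS = [
    ("Gigabit Ethernet", 1.25, 1),
    ("Fibre Channel (2 Gb)", 2.125, 1),
    ("InfiniBand 1x", 2.5, 1),
    ("InfiniBand 4x", 2.5, 4),
    ("InfiniBand 12x", 2.5, 12),
]

for name, gbaud, lanes in LINKS:
    print(f"{name:22s} signaling {gbaud * lanes:6.3f} Gbaud, data {data_rate_gbps(gbaud, lanes):5.2f} Gbps")
```

This gives 1 Gbps for GbE, 1.7 Gbps for 2 Gb FC, and 2, 8, and 24 Gbps for IB 1x, 4x, and 12x.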
How IB Compares w/GbE & FC in OSI
[Figure: layer-by-layer comparison of the three stacks]
Ethernet/IP: Application; Transport layer (TCP, UDP); Network layer (IP); Logical Link Control; Media Access Control (Ethernet 802.3); Physical layer
Fibre Channel: Application/upper level protocols; FC-4: Protocol Mappings; FC-3; FC-2: Framing, Service Class; FC-1: Encoding; FC-0: Physical
IB Architecture: Upper level protocols; Transport layer (IBA operations); Network layer; Link layer (line encoding, media access control); Physical layer (media, physical layer entities)
Highlighted layers in the original figure are not included in the protocol standards
Data Center Fabric & I/O Consolidation
IB enables convergence through shared server I/O
One I/O interface for
Clustering
Network
Storage
Eliminates the need for multiple server I/O blades/ports
IB virtual lanes provide
Multiple independent logical fabrics multiplexed on one physical fabric
QoS to prioritize traffic
The benefits of independent fabrics with
The management and maintenance of one fabric
Switches, directors, and routers provide
Scalability, redundancy, availability, and flexibility
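Virtual lanes share one physical link by arbitrating between per-lane queues. The Python sketch below uses a simple weighted round robin to show how weights give QoS priority while idle lanes cede their share of the link; it is a simplification, not the arbitration-table scheme the InfiniBand specification defines, and the lane names and weights are hypothetical.

```python
from collections import deque


def weighted_round_robin(lanes, weights, budget):
    """Pick up to `budget` packets from per-lane queues for one physical link.

    Each round, a lane may send up to weights[lane] packets. Empty lanes are
    skipped, so their share of the link goes to lanes that still have traffic.
    """
    sent = []
    while len(sent) < budget:
        progressed = False
        for lane, weight in weights.items():
            queue = lanes.get(lane, deque())
            taken = 0
            while queue and taken < weight and len(sent) < budget:
                sent.append(queue.popleft())
                taken += 1
            progressed = progressed or taken > 0
        if not progressed:
            break  # every weighted lane is empty (or has zero weight)
    return sent


# Hypothetical lanes: IPC traffic weighted twice as heavily as storage or network.
lanes = {
    "ipc": deque(["i1", "i2", "i3"]),
    "storage": deque(["s1", "s2", "s3"]),
    "net": deque(["n1"]),
}
print(weighted_round_robin(lanes, {"ipc": 2, "storage": 1, "net": 1}, budget=10))
```

Each lane keeps its own queue, so a burst on one lane delays but never blocks another lane outright, which is the practical meaning of “multiple independent logical fabrics” on one wire.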
Requirements of a Shared I/O System
Cooperative software architecture
Ability to productively distribute work between the host & the external shared I/O system
Virtualization of I/O
Host manipulates logical resources
Host has no awareness of the underlying physical resources
All I/O managed external to the host
Host originates requests and receives results
Heterogeneous operating systems
3 classes of I/O
Efficiently handle small to very large messages
Microsecond-sensitive latency without sacrificing bandwidth
Channel architecture
Highly differentiated priority and service levels
Connection-oriented guaranteed delivery mechanism
Inherent memory semantics and protection
High speed / low latency
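The virtualization requirement (the host manipulates logical resources while all I/O is managed outside it) can be illustrated with a mapping kept by the shared I/O system rather than the host. A hypothetical Python sketch: `SharedIOManager`, `PhysicalPath`, and the chassis and controller names are invented for illustration, and path selection is simply “first surviving path”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPath:
    io_unit: str     # hypothetical shared I/O chassis
    controller: str  # I/O controller in that chassis (FC, Ethernet, ...)
    target: str      # physical device behind the controller


class SharedIOManager:
    """Hypothetical I/O manager running outside the host.

    Hosts name only logical resources; the manager alone knows the physical
    paths, picks one per request, and returns just the result.
    """

    def __init__(self) -> None:
        self._paths = {}  # (host, logical_id) -> list of PhysicalPath
        self.log = []     # physical paths used, visible to the manager only

    def assign(self, host: str, logical_id: str, paths: list) -> None:
        self._paths[(host, logical_id)] = list(paths)

    def fail_path(self, failed: PhysicalPath) -> None:
        # Drop a failed path from every mapping; hosts keep their logical IDs.
        for paths in self._paths.values():
            if failed in paths:
                paths.remove(failed)

    def submit(self, host: str, logical_id: str, request: str) -> str:
        paths = self._paths.get((host, logical_id))
        if not paths:
            raise LookupError(f"{host} has no usable path to {logical_id}")
        self.log.append(paths[0])  # simple policy: first surviving path
        return f"{request}: ok"


mgr = SharedIOManager()
path_a = PhysicalPath("chassis-A", "fc0", "array1")
path_b = PhysicalPath("chassis-B", "fc0", "array1")
mgr.assign("server1", "disk0", [path_a, path_b])
print(mgr.submit("server1", "disk0", "read"))
mgr.fail_path(path_a)
print(mgr.submit("server1", "disk0", "read"))  # same logical request, new physical path
```

The host’s view never changes when a chassis fails; only the manager’s mapping does, which is how dual redundant I/O chassis can stay transparent to the servers.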
Market Projections
IDC & Gartner-Dataquest