Implementing Convergent Networking: Partner Concepts
Uri Elzur, Director, Advanced Technology, Broadcom Corporation
Brian Hausauer, Chief Architect, NetEffect Inc.
Convergence In The Data Center: Convergence Over IP
Uri Elzur, Director, Advanced Technology, Broadcom Corporation
Agenda
Application requirements in Data Center
Data flows and Server architecture
Convergence
Demo
Hardware and software challenges and advantages
Summary
Enterprise Network Today
IT: Get ready for tomorrow's Data Center, today
Multiple networks drive Total Cost of Ownership (TCO) up
Consolidation, convergence, and virtualization require Flexible I/O
Higher speeds (2.5G, 10G) require more Efficient I/O
Issue: Best use of Memory and CPU resources
Additional constraints: Limited power, cooling and smaller form factor
Convergence Over Ethernet
Multiple networks and multiple stacks in the OS are used to provide these services
Wire protocols, e.g., Internet Small Computer System Interface (iSCSI) and iWARP (Remote Direct Memory Access, RDMA), enable the use of Ethernet as the converged network
Direct Attach storage migrates to Networked Storage
Proprietary clustering can now use RDMA over Ethernet
The OS supports one device servicing multiple stacks with the Virtual Bus Driver
To accommodate these new traffic types, Ethernet's efficiency must be optimal:
CPU utilization
Memory BW utilization
Latency
[Diagram: Windows networking and storage stacks sharing one converged device – sockets applications run through Windows Sockets and the WinSock switch to an RDMA provider and RNIC (user and kernel mode); storage applications run through the file system, partition and class drivers, the iSCSI port driver (iscsiprt.sys), and the iSCSI miniport to an HBA; networking traffic runs through TCP/IP, the NDIS IM driver, and the NDIS miniport to a NIC.]
Data Center Application Characteristics
Tier 1 – Web Servers:
With load balancer: I/O size medium–large; few connections; long duration; main challenge: CPU utilization; technologies: TOE, iSCSI, iSCSI Boot
Without load balancer: I/O size ~15 KB; many connections; short duration; main challenges: connection set up and tear down, load sharing on multiple CPUs; technologies: RSS and TOE, iSCSI, iSCSI Boot
Communication to Tier 2: I/O size medium–large; < 1000 connections; long duration; main challenge: CPU utilization; technologies: TOE, iSCSI, iSCSI Boot
Tier 2 – Application Servers (email, print, storage, clustering, etc.): I/O size medium–large; < 1000 connections; long duration; main challenges: CPU utilization, latency; technologies: TOE, iSCSI, RDMA, iSCSI Boot
Tier 3 – Database (database, storage, clustering): I/O size medium–large; 100's–1000's of connections; long duration; main challenges: CPU utilization, latency; technologies: TOE, iSCSI, RDMA, iSCSI Boot
The Server In The Data Center
[Diagram: three-tier data center – web servers behind load balancers, application servers, and database servers, with IP/Ethernet carrying data, storage, cluster, and management traffic; DAS and clustered storage; short-lived connections at the front end and long-lived connections between tiers.]
Server network requirements – Data, Storage, Clustering, and Management
Acceleration required for Data = TCP, Storage = iSCSI, Clustering = RDMA
Application requirements: more transactions per server, higher rates, larger messages (e.g., e-mail)
Traditional L2 NIC Rx Flow And Buffer Management
1. Application pre-posts buffer
2. Data arrives at Network Interface Adapter (NIC)
3. NIC DMAs (Direct Memory Access) data to driver buffers (kernel)
4. NIC notifies driver after a frame is DMA'd (interrupt moderation per frame)
5. Driver notifies stack
6. Stack fetches headers, processes TCP/IP, strips headers
7. Stack copies data from driver to application buffers
8. Stack notifies application
[Diagram: receive data path from the L2 NIC through the driver and TCP stack to the application buffer – a minimum of one copy is required.]
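To make the copy in step 7 concrete, here is a minimal C sketch (assumed buffer sizes and helper names, not any particular driver) of the host-side work in this traditional receive path: the frame lands in a driver-owned kernel buffer, and the stack then strips the headers and copies only the payload into the application's pre-posted buffer.

```c
/* Minimal sketch of the extra copy in a traditional L2 NIC receive path. */
#include <stdio.h>
#include <string.h>

#define ETH_IP_TCP_HDRS 54          /* 14 + 20 + 20 bytes, no options */
#define FRAME_MAX       1514

/* Buffer the application pre-posted (step 1). */
static char app_buffer[2048];

/* Kernel-side work done by the stack (steps 6-7): headers are stripped and
 * the payload is copied from the driver buffer to the application buffer. */
static size_t deliver_to_app(const char *driver_buf, size_t frame_len)
{
    size_t payload_len = frame_len - ETH_IP_TCP_HDRS;
    memcpy(app_buffer, driver_buf + ETH_IP_TCP_HDRS, payload_len);
    return payload_len;              /* application is then notified (step 8) */
}

int main(void)
{
    /* Pretend the NIC DMA'd a full-size frame into this driver buffer (steps 2-4). */
    static char driver_buffer[FRAME_MAX];
    memset(driver_buffer, 'x', sizeof driver_buffer);

    size_t n = deliver_to_app(driver_buffer, sizeof driver_buffer);
    printf("copied %zu payload bytes into the application buffer\n", n);
    return 0;
}
```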
iSCSI
iSCSI provides a reliable high performance block storage service
Microsoft Operating System support for iSCSI accelerates iSCSI’s deployment
Microsoft iSCSI Software Initiator
iSCSI HBA
iSCSI HBA provides for:
Better performance
iSCSI Boot
iSER enablement
[Diagram: Windows iSCSI storage stack – storage applications, file system, partition manager, class driver, iSCSI port driver (iscsiprt.sys), iSCSI miniport, HBA.]
The Value Of iSCSI Boot
Storage consolidation – lower TCO
Easier maintenance, replacement
No need to replace server blade for a HD failure
No disk on blade/motherboard – space, power savings
Smaller blades, higher density
Simpler board design, no need for HD specific mechanical restrainer
Higher reliability
Hot replacement of disks if a disk fails
RAID protection over boot disk
Re-assign disk to another server in case of server failure
WSD And RDMA
Kernel bypass is attractive for High Performance Computing (HPC), databases, and any sockets application
The WSD model supports RNICs with RDMA over Ethernet (a.k.a. iWARP)
As latency improvements are mainly due to kernel bypass, WSD is competitive with other RDMA-based technologies, e.g., InfiniBand
[Diagram: Traditional model vs. WSD model – traditionally a socket application goes through WinSock, the TCP/IP WinSock provider, the TCP/IP transport driver, and the NDIS miniport to the NIC; in the WSD model the Microsoft WinSock switch routes sockets traffic either down the same TCP/IP path or, via the WinSock SPI, to an OEM WSD software provider and RDMA provider driver that talk to the RNIC over a private interface, bypassing the kernel on the data path.]
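For illustration, the sketch below is an ordinary Winsock TCP client in C (the address and port are placeholders); the point of the WSD model is that the WinSock switch can route exactly this kind of unmodified sockets code over the RNIC's RDMA path when a suitable provider is installed.

```c
/* Minimal Winsock TCP client; no RDMA-specific code is needed under WSD. */
#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET) { WSACleanup(); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family      = AF_INET;
    peer.sin_port        = htons(5001);               /* placeholder port   */
    peer.sin_addr.s_addr = inet_addr("192.168.0.10"); /* placeholder server */

    if (connect(s, (struct sockaddr *)&peer, sizeof peer) == 0) {
        const char msg[] = "hello over sockets";
        char reply[256];
        send(s, msg, (int)sizeof msg, 0);
        int n = recv(s, reply, (int)sizeof reply, 0);
        printf("received %d bytes\n", n);
    }

    closesocket(s);
    WSACleanup();
    return 0;
}
```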
L2 Technology Can't Efficiently Handle iSCSI And RDMA
iSCSI HBA implementation concerns:
iSCSI Boot
Digest overhead – CRC-32C
Copy overhead – zero copy requires iSCSI protocol processing
RDMA RNIC implementation concerns:
Throughput – high software overhead for RDMA processing
MPA – CRC-32C, markers every 512B
DDP/RDMA – protocol processing, zero copy, user-mode interaction, special queues
Minimal latency – software processing doesn't allow for kernel bypass
Thus, for optimal performance, specific offload is required
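As a rough idea of the digest work mentioned above, here is a bit-at-a-time reference implementation of CRC-32C (the Castagnoli polynomial used by iSCSI header/data digests and MPA); it is a sketch for clarity only, since adapters or optimized software use table-driven or hardware CRC engines.

```c
/* Bit-at-a-time CRC-32C (Castagnoli) reference implementation. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reflected form of the CRC-32C polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & -(crc & 1u));
    }
    return crc ^ 0xFFFFFFFFu;
}

int main(void)
{
    /* Known check value: CRC-32C("123456789") == 0xE3069283. */
    const char vec[] = "123456789";
    printf("crc32c = 0x%08X\n", crc32c(vec, strlen(vec)));
    return 0;
}
```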
Convergence Over Ethernet: TOE, iSCSI, RDMA, Management
Leverage existing standard Ethernet equipment
Lower TCO – one technology for multiple purposes
Converges functions
Multiple functions (SAN, LAN, IPC, Mgmt.) can be consolidated to a single fabric type
Blade server storage connectivity (low cost)
Consolidates ports
Leverage Ethernet pervasiveness, knowledge, cost leadership and volume
Consolidate KVM over IP
[Diagram: legacy model – separate adapters and fabrics for networking, block storage, file storage, cluster IPC (HPC), and remote management – versus the new model, in which a single C-NIC with TOE and RSS carries LAN, storage, cluster IPC, and management traffic over standard Ethernet, spanning the stack from the physical layer up to the application layer.]
C-NIC Demo
C-NIC Hardware Design – Advantages/Challenges
Performance – wire speed
Find the right split between hardware and firmware
Hardware for speed – e.g., connection look-up, frame validity, buffer selection, and offset computation
Hardware connection look-up is significantly more efficient than software; IPv6 address length (128 bits) exacerbates this (a simplified software sketch follows at the end of this list)
Flexibility
Firmware provides flexibility, but may be slower than hardware…
Specially optimized RISC CPU – it's not about MHz…
Accommodate future protocol changes: e.g., TCP ECN
Minimal latency
From wire to application buffer (or from application to wire for Tx)
Not involving the CPU
Flat ASIC architecture for minimal latency
Scalability – 1G, 2.5G, 10G
Zero Copy architecture – a match for server memory BW and latency; any L2 solution adds one copy or more
Power goals – under 5W per 1G/2.5G, under 10W per 10G (for comparison, a CPU consumes 90W)
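The sketch below is a simplified, software-only illustration of the connection look-up discussed above (the structure names and hash function are assumptions, not the C-NIC design): every received segment must be matched against the table of offloaded connections by its 4-tuple, and moving to IPv6 grows each key by two 128-bit addresses.

```c
/* Illustrative 4-tuple connection look-up into a table of offloaded connections. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct conn_key {                /* 4-tuple identifying one offloaded connection */
    uint8_t  local_ip[16];       /* 16 bytes so both IPv4-mapped and IPv6 fit    */
    uint8_t  remote_ip[16];
    uint16_t local_port;
    uint16_t remote_port;
};

struct conn_entry {
    struct conn_key key;
    int             in_use;
    void           *tcb;         /* offloaded TCP control block (placeholder)    */
};

#define TABLE_SIZE 1024
static struct conn_entry table[TABLE_SIZE];

/* FNV-1a over the whole key; real NICs use CRC- or Toeplitz-style hardware hashes. */
static uint32_t hash_key(const struct conn_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof *k; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear-probing look-up; a miss means the frame stays on the non-offloaded path. */
static struct conn_entry *lookup(const struct conn_key *k)
{
    uint32_t idx = hash_key(k) % TABLE_SIZE;
    for (uint32_t probe = 0; probe < TABLE_SIZE; probe++) {
        struct conn_entry *e = &table[(idx + probe) % TABLE_SIZE];
        if (!e->in_use)
            return NULL;                          /* not offloaded               */
        if (memcmp(&e->key, k, sizeof *k) == 0)
            return e;                             /* hit: hand frame to this TCB */
    }
    return NULL;
}

int main(void)
{
    struct conn_key k = { .local_port = 80, .remote_port = 51515 };
    printf("lookup %s\n", lookup(&k) ? "hit" : "miss");
    return 0;
}
```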
C-NIC Software Design – Advantages/Challenges
Virtual Bus Driver – reconcile requests from all stacks
Plug and Play
Reset
Network control and speed
Power
Support of multiple stacks – resource allocation and management
Resource isolation
Run time – priorities
Interfaces separation
Interrupt moderation per stack (a simplified sketch follows after this list)
Statistics
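As a simplified illustration of per-stack interrupt moderation (the stack names and thresholds here are assumptions, not the actual driver policy), each stack sharing the device can keep its own coalescing parameters so that latency-sensitive traffic is interrupted immediately while bulk traffic is batched.

```c
/* Per-stack interrupt coalescing sketch: fire after N completions or T microseconds. */
#include <stdint.h>
#include <stdio.h>

enum stack_id { STACK_L2, STACK_TOE, STACK_ISCSI, STACK_RDMA, STACK_COUNT };

struct moderation {
    uint32_t max_events;     /* fire after this many completions...              */
    uint32_t max_usecs;      /* ...or after this much time, whichever comes first */
    uint32_t pending;        /* completions accumulated since the last interrupt  */
    uint64_t last_fire_us;
};

static struct moderation mod[STACK_COUNT] = {
    [STACK_L2]    = { .max_events = 64, .max_usecs = 100 },
    [STACK_TOE]   = { .max_events = 32, .max_usecs =  50 },
    [STACK_ISCSI] = { .max_events = 16, .max_usecs =  50 },
    [STACK_RDMA]  = { .max_events =  1, .max_usecs =   0 },  /* latency-sensitive */
};

/* Called once per completion; returns 1 when an interrupt should be raised. */
static int completion_event(enum stack_id id, uint64_t now_us)
{
    struct moderation *m = &mod[id];
    m->pending++;
    if (m->pending >= m->max_events || now_us - m->last_fire_us >= m->max_usecs) {
        m->pending = 0;
        m->last_fire_us = now_us;
        return 1;
    }
    return 0;
}

int main(void)
{
    uint64_t t = 0;
    int irqs = 0;
    for (int i = 0; i < 1000; i++, t += 5)       /* one completion every 5 us */
        irqs += completion_event(STACK_TOE, t);
    printf("1000 TOE completions -> %d interrupts\n", irqs);
    return 0;
}
```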
Summary
C-NIC advantages:
TCP Offload Engine in hardware – for better application performance, lower CPU utilization, and improved latency
RDMA – for Memory BW and ultimate Latency
iSCSI – for networked storage and iSCSI Boot
Flexible and Efficient I/O for the data center of today and tomorrow
WinHEC 2005
Brian Hausauer, Chief Architect, NetEffect, [email protected]
Today's Data Center
[Diagram: three separate fabrics – a Fibre Channel SAN for block storage (block storage adapter and switch), a clustering fabric (Myrinet, Quadrics, InfiniBand, etc., with a clustering adapter and switch), and an Ethernet LAN (networking adapter and switch) connecting applications, NAS, and users.]
Storage: fast access, low latency
Clustering: concurrent access, high throughput, low overhead
Networking: pervasive standard, plug-n-play interop
Datacenter Trends: Traffic Increasing 3x Annually
Scaling a 3-fabric infrastructure is expensive and cumbersome
Server density complicates connections to three fabrics
A successful solution must meet different application requirements
2006 I/O traffic requirements (typical for each server):
Front-end web servers – Network: Heavy (5–10 Gb/s); Storage: Intermediate (1.5–4 Gb/s); Clustering IPC: None; Total: 6.5–14.0 Gb/s
Mid-tier application servers – Network: Intermediate (200–500 Mb/s); Storage: Low (<100 Mb/s); Clustering IPC: Heavy (2–4 Gb/s); Total: 2.3–4.6 Gb/s
Back-end database servers – Network: Low (<200 Mb/s); Storage: Heavy (3–6 Gb/s); Clustering IPC: Heavy (2–4 Gb/s); Total: 5.2–10.2 Gb/s
Sources: 2006 IA Server I/O Analysis, Intel Corporation; Oracle
High Performance Computing: Clusters Dominate
[Chart: Clusters in Top 500 Systems, 1997–2004 – Ethernet-based clusters vs. all other clusters.]
Clusters continue to grow in popularity and now dominate the Top 500 fastest computers
294 clusters in the top 500 computers
Ethernet continues to increase share as the cluster interconnect of choice for the top clusters in the world
Ethernet is the interconnect for over 50% of the top clusters
Source: www.top500.org
Next-generation Ethernet Can Be The Solution
Why Ethernet?
Pervasive standard
Multi-vendor interoperability
Potential to reach high volumes and low cost
Powerful management tools/infrastructure
Why not?
Ethernet does not meet the requirements for all fabrics
Ethernet overhead is the major obstacle
The solution: iWARP extensions to Ethernet
Industry-driven standards to address Ethernet deficiencies
Renders Ethernet suitable for all fabrics at multi-Gb speeds and beyond
Reduces cost, complexity and TCO
Overhead & Latency in Networking
[Diagram: standard Ethernet TCP/IP path – the I/O command flows from the application through the I/O library (user space), the OS and device driver (kernel), and out the I/O adapter; data is staged through app, OS, driver, and adapter buffers, with context switches between user and kernel and TCP/IP processing in server software.]
Sources of CPU overhead (100% total):
Application-to-OS context switches: 40%
Transport (TCP/IP) processing: 40%
Intermediate buffer copies: 20%
Solutions:
Transport (TCP) offload – removes packet processing (leaving ~60% of the overhead)
RDMA / DDP – removes intermediate buffer copies (leaving ~40%)
User-Level Direct Access / OS bypass – removes command context switches
Introducing NetEffect's NE01 Ethernet Channel Adapter (ECA)
A single chip supports:
Transport (TCP) offload
RDMA/DDP
OS bypass / ULDA
Meets requirements for:
Clustering (HPC, DBC, …)
Storage (file and block)
Networking
Reduces overhead up to 100%
Strategic advantages:
Patent-pending virtual pipeline and RDMA architecture
One die for all chips enables unique products for dual 10Gb / dual 1Gb
[Diagram: NetEffect ECA block diagram – dual MACs feeding a transaction switch / protocol engine, an on-chip DDR2 SDRAM controller with external ECC DRAM, and a host interface (PCI Express or PCI-X) to the server chipset, host CPUs, and host memory; Ethernet ports are 10 Gb or 1 Gb, copper or fibre.]
Future Server: Ethernet Channel Adapter (ECA) for a Converged Fabric
NetEffect ECA delivers optimized file and block storage, networking, and clustering from a single adapter
[Diagram: server with a NetEffect ECA attached to the Ethernet fabric(s) – the OS/driver software for networking, block storage (iSCSI/iSER), and clustering uses existing interfaces (WSD, DAPL, VI, MPI) and O/S acceleration interfaces; inside the ECA, a transaction switch feeds the NIC, TCP accelerator (TOE), iSCSI, and RDMA accelerator (iWARP) engines.]
NetEffect ECA Architecture
[Diagram: MACs connected through a crossbar to the host interface, with engines for basic networking, accelerated networking, block storage, and clustering.]
NetEffect ECA Architecture: Networking
Related software standards:
Sockets
Microsoft WinSock Direct (WSD)
Sockets Direct Protocol (SDP)
TCP Accelerator Interfaces
Basic & Accelerated Networking
[Diagram: sockets traffic (WSD, SDP, or the software sockets stack) runs over the TCP accelerator (TOE, iWARP) or the basic networking path.]
NetEffect ECA Architecture: Storage
Related software standards:
File system: NFS, DAFS, R-NFS
Block mode: iSCSI, iSER
[Diagram: iSCSI/NFS and iSER/R-NFS traffic runs over the block storage engine (TOE, iWARP); basic networking and connection management handle setup/teardown and exceptions only.]
NetEffect ECA Architecture: Clustering
Related software standards:
MPI
DAPL API
IT API
RDMA Accelerator Interfaces
[Diagram: MPI and DAPL traffic runs over the RDMA accelerator (iWARP over TOE); basic networking and connection management handle setup/teardown and exceptions only.]
Tomorrow's Data Center: Separate Fabrics for Networking, Storage, and Clustering
[Diagram: each server still carries three adapters (networking, storage, clustering) and three switches, but all three fabrics – the LAN, the clustering fabric, and the storage SAN – now run iWARP Ethernet in place of plain Ethernet, Myrinet/Quadrics/InfiniBand, and Fibre Channel respectively; applications, NAS, and users connect as before.]
Fat Pipe for Blades & Stacks: Converged Fabric for Networking, Storage & Clustering
[Diagram: a single adapter per server carries networking, storage, and clustering over one converged iWARP Ethernet fabric – one fat pipe connecting applications to the LAN, NAS, SAN, and users.]
Take-Aways
Multi-gigabit networking is required for each tier of the data center
Supporting multiple incompatible network infrastructures is becoming increasingly difficult as budget, power, cooling, and space constraints tighten
With the adoption of iWARP, Ethernet for the first time meets the requirements for all connectivity within the data center
NetEffect is developing a high performance iWARP Ethernet Channel Adapter that enables the convergence of clustering, storage and networking
Call to Action
Deploy iWARP products for convergence of networking, storage and clustering
Deploy 10 Gb Ethernet for fabric convergence
Develop applications to RDMA-based APIs for maximum server performance
Resources
NetEffect: www.NetEffect.com
iWARP Consortium: www.iol.unh.edu/consortiums/iwarp/
Open Group (authors of ITAPI, RNIC PI & Sockets API Extensions): www.opengroup.org/icsc/
DAT Collaborative: www.datcollaborative.org
RDMA Consortium: www.rdmaconsortium.org
IETF RDDP WG: www.ietf.org/html.charters/rddp-charter.html
Community Resources
Windows Hardware and Driver Central (WHDC): www.microsoft.com/whdc/default.mspx
Technical Communities: www.microsoft.com/communities/products/default.mspx
Non-Microsoft Community Sites: www.microsoft.com/communities/related/default.mspx
Microsoft Public Newsgroups: www.microsoft.com/communities/newsgroups
Technical Chats and Webcasts: www.microsoft.com/communities/chats/default.mspx, www.microsoft.com/webcasts
Microsoft Blogs: www.microsoft.com/communities/blogs