www.openfabrics.org
uDAPL development
Arkady Kanevsky – Network Appliance
James Lentini – Network Appliance
Arlin Davis – Intel

uDAPL - agenda
• overview
• current release
• v2 features
• interop event experiences
• future plans

uDAPL - overview
1. What
   a. DAT (Direct Access Transport), DAPL (Direct Access Programming Library)
   b. DAT Collaborative - industry group formed in 2001, 50+ members (http://www.datcollaborative.org)
   c. Kernel/User APIs defined. uDAPL 1.0, 1.1, 1.2, 2.0 (2002-2007)
   d. Set of APIs that exploit RDMA-capable hardware
   e. Transport (VI, IB, iWARP) and platform (O/S) independence
   f. Consider extensions beyond current functionality
   g. Open source reference (http://sourceforge.net/projects/dapl)
   h. Structure lightweight, low overhead, members have all rights
   i. Specifications/actions ratified by vote, one vote per member
2. Why
   a. Portability - library support for Linux, Windows, Solaris
   b. Expandability - IHV support goes beyond IB and iWARP
   c. Extensibility - platform/transport extension support built in
   d. Performance - comparable to verbs

uDAPL - release update
1. Multiple release packages built to coexist
   a. 1.2.3 package – base
   b. 2.0.2 package – base, devel, utils, debuginfo
2. dat.conf is backward compatible
   a. original v1.2 entries unchanged
   b. new entries for v2.0 using ofa-v2 prefix
3. All utils and development files are v2
   a. dtestx added for IB extensions
   b. /usr/include/dat2 for development

uDAPL - v2 features
1. Major release
   a. not backward compatible (see transition guide)
   b. many improvements
      a. 1.2 spec errata
      b. high availability
      c. IB and iWARP extension support
   c. Specification and latest errata on downloads: http://www.openfabrics.org/downloads/dapl/documentation/

uDAPL – Interop event - Fall 2007
1. dapltest
   a. default send/write inline data = 128 bytes, 64 is max for some adapters
   b. rdma_in/out default setting is 8 in provider, 4 is max for most adapters (dapl bug)
   c. test sends data from server side first – iWARP protocol issue
   d. reported max adapter attributes incorrect (qp depth 16 million reported, 1024 supported)
2. Intel MPI over uDAPL
   a. default request queue is 1100, <1024 for some adapters (Intel MPI didn't adjust queue size down when queried max values were less than 1100)
   b. IMB runs on IB and iWARP, test suite issues on iWARP (err=6, status=EPROTO)
   c. Default overrides required to run IMB on iWARP:
      -env RDMA_DEFAULT_MAX_WQE 400 -env RDMA_READ_RESERVE 100 -env I_MPI_RDMA_RECV_QUEUE_SIZE 10
Take-Aways
• iWARP adapters have fewer resources than IB HCAs.
• Apps and dapl providers should use query attributes; adapters must report correctly.
• Vendors need pre-event validation (rping, dapltest) to ensure basic operation.
• Need a verbs-based limit test suite for max resource validation.
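
The "use query attributes" take-away can be sketched in C. This is a minimal illustration and not uDAPL code: the adapter_limits struct and both functions are hypothetical stand-ins for values an application would obtain from the provider's attribute query, and the numbers mirror the 1100 vs. 1024 and 8 vs. 4 cases above.

```c
#include <assert.h>

/* Hypothetical container for limits an adapter reports when queried. */
struct adapter_limits {
    int max_queue_depth;  /* largest request queue the adapter supports */
    int max_rdma_read_in; /* largest inbound RDMA read depth */
};

/* Return the wanted value, lowered to the adapter limit if it is too large. */
int clamp_to_limit(int wanted, int limit)
{
    assert(wanted > 0 && limit > 0);
    return wanted > limit ? limit : wanted;
}

/* Adjust application defaults to the queried limits instead of assuming
 * that the defaults fit every adapter. */
void apply_limits(const struct adapter_limits *lim,
                  int *queue_depth, int *rdma_read_in)
{
    assert(lim != 0 && queue_depth != 0 && rdma_read_in != 0);
    *queue_depth = clamp_to_limit(*queue_depth, lim->max_queue_depth);
    *rdma_read_in = clamp_to_limit(*rdma_read_in, lim->max_rdma_read_in);
}
```

This only helps if the adapter reports its limits correctly, which is the other half of the same take-away.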

uDAPL – Future
1. add conformance test suites to release
2. IB extensions – XRC, multicast, etc.
3. validate SRQ and add new test utility
4. interop between OFED and WinOF
   a. rdma_cm support in WinOF, or
   b. bring back openib_scm support
uDAPL – Backup Slides

MPI performance study
"Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface"
Book series: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN: 0302-9743
Subject: Computer Science
Volume: 3666/2005
Book: Recent Advances in Parallel Virtual Machine and Message Passing Interface (DOI 10.1007/11557265)
Copyright: 2005
ISBN: 978-3-540-29009-4
DOI: 10.1007/11557265_28
Pages: 200-208
Online date: Sunday, October 02, 2005
http://www.springerlink.com/content/dx8uw0gdn09j/
“Evaluation with micro-benchmarks and applications on InfiniBand shows that the implementation with uDAPL performs comparably with that of MVAPICH2.”
Dr. Panda - OSU
uDAPL API Overview

DAT Objects - Acronyms
• CNO – consumer notification object
• CR – connection request
• EP – endpoint
• EVD – event dispatcher
• IA – interface adapter
• LMR – local memory region
• PSP – public service point
• PZ – protection zone
• RMR – remote memory region
• RSP – reserved service point

Local Resource Mgmt - Interface Adapter
• Maps to a hardware port
• Referred to as the root object
  • Other objects are descendants of the interface adapter
• Opened using names stored in a DAT registry provider list

Local Resource Mgmt - Interface Adapter
dat_ia_open
  • Opens an interface adapter of a given name
  • Specifies the size of the associated event dispatcher queue
dat_ia_close
  • Closes an interface adapter
  • Can automatically clean up all associated resources

Local Resource Mgmt - Interface Adapter
dat_ia_query
  • Returns attributes for the interface adapter and the provider of the adapter
  • Callers can specify which attributes to query

Local Resource Mgmt - Consumer Context
• User-defined context may be associated with most DAT objects
• Association of context is done separately from object allocation

Local Resource Mgmt - Consumer Context
dat_set_consumer_context
  • Associates a user-defined context with a DAT object
dat_get_consumer_context
  • Returns the user’s context for a DAT object
dat_get_handle_type
  • Indicates the type of DAT object associated with a given handle

Event Management - Event Model
• An event dispatcher delivers events
• Event streams feed into the event dispatcher
• Order is maintained separately for each stream
• Ordering between streams is defined by the spec

Event Management - Event Model
Event streams exist for:
• Completions
• Connection requests
• Connection events
  • Establishment, disconnect, time-outs
• Asynchronous errors
• User notifications

Event Management - Consumer Notification Objects
• Similar to an operating system event
• Limited to a single interface adapter
• Multiple event dispatchers may be mapped to a single notification object
• CNOs may be associated with wait proxy agents
  • Wait proxy agents are signaled when the CNO is signaled
  • Multiple CNOs per wait agent are allowed
  • Wait proxy agent implementation is OS-dependent

Event Management - Consumer Notification Objects
dat_cno_create
  • Creates a new notification object
  • Associates an optional wait proxy agent with the CNO
dat_cno_free
  • Destroys a CNO
dat_cno_wait
  • Waits a specified time for an event to occur on an event dispatcher associated with the CNO

Event Management - Consumer Notification Objects
dat_cno_modify_agent
  • Changes the wait proxy agent associated with a CNO
dat_cno_query
  • Returns user-specified information about the CNO

Event Management - Event Dispatcher
dat_evd_create
  • Creates a new event dispatcher
  • User specifies the size and type of events reported by the dispatcher
  • Dispatcher may be associated with a CNO
dat_evd_free
  • Destroys an event dispatcher

Event Management - Event Dispatcher
dat_evd_query
  • Returns user-specified information on the event dispatcher
dat_evd_modify_cno
  • Changes which CNO is associated with the event dispatcher

Event Management - Event Dispatcher
dat_evd_enable
  • Indicates that the event dispatcher should trigger the associated CNO when an event occurs
dat_evd_disable
  • Prevents the event dispatcher from signaling its associated CNO when an event occurs

Event Management - Event Dispatcher
dat_evd_set_unwaitable
  • Transitions the event dispatcher into an unwaitable state
  • Results in calls to dat_evd_wait failing
  • Does not affect event generation or signaling of the CNO
dat_evd_clear_unwaitable
  • Allows waiting on the event dispatcher

Event Management - Event Dispatcher
dat_evd_resize
  • Changes the size of the EVD
dat_evd_wait
  • Allows the user to wait on the EVD for a specific number of events to occur or until the wait times out
  • Returns event information, if available
  • Returns the number of events still on the EVD
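
The wait behavior described for dat_evd_wait can be pictured with a toy model in C. This is only a sketch and not the uDAPL API: toy_evd, toy_evd_post, and toy_evd_poll are hypothetical names, events are plain integers, and the poll never blocks (it stands in for a wait whose timeout has already expired). It shows the two outputs the slide mentions: one event, plus the count of events still queued.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EVD_CAPACITY 8

/* Hypothetical fixed-size FIFO standing in for an event dispatcher queue. */
struct toy_evd {
    int events[TOY_EVD_CAPACITY];
    size_t head;  /* index of the oldest queued event */
    size_t count; /* number of queued events */
};

/* Queue one event; returns 0 on success, -1 if the queue is full. */
int toy_evd_post(struct toy_evd *evd, int event)
{
    assert(evd != NULL);
    if (evd->count == TOY_EVD_CAPACITY)
        return -1;
    evd->events[(evd->head + evd->count) % TOY_EVD_CAPACITY] = event;
    evd->count++;
    return 0;
}

/* If at least `threshold` events are queued, remove the oldest one, store it
 * in *event, store the number still queued in *remaining, and return 0.
 * Otherwise return -1, as a timed-out wait would. */
int toy_evd_poll(struct toy_evd *evd, size_t threshold,
                 int *event, size_t *remaining)
{
    assert(evd != NULL && event != NULL && remaining != NULL);
    if (threshold == 0 || evd->count < threshold)
        return -1;
    *event = evd->events[evd->head];
    evd->head = (evd->head + 1) % TOY_EVD_CAPACITY;
    evd->count--;
    *remaining = evd->count;
    return 0;
}
```

A caller can use the remaining count to decide whether to keep draining the queue before waiting again.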

Event Management - Event Dispatcher
dat_evd_dequeue
  • Retrieves information about an event, if available, from the event dispatcher
dat_evd_post_se
  • Posts a user-defined software event to the event dispatcher

Connection Management - Overview
• Supports client/server and peer-to-peer connection models
• Addresses to interface adapters are standard socket addresses
  • struct sockaddr *
  • struct sockaddr_in6 *
  • Cast to DAT_IA_ADDRESS_PTR when used with DAPL API
• Connection qualifiers are used to associate connection requests with the providing service
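
Because interface adapter addresses are ordinary socket addresses, they can be prepared with the standard sockets headers. The sketch below uses only POSIX calls; make_ipv6_addr and as_generic are hypothetical helper names, and the code is not compiled against the DAT headers. With those headers present, the slide indicates the final cast would target DAT_IA_ADDRESS_PTR rather than struct sockaddr *.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill an IPv6 socket address from its textual form.
 * Returns 0 on success, -1 if the text is not a valid IPv6 address. */
int make_ipv6_addr(const char *text, struct sockaddr_in6 *out)
{
    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin6_family = AF_INET6;
    return inet_pton(AF_INET6, text, &out->sin6_addr) == 1 ? 0 : -1;
}

/* View the address through the generic socket address type.
 * Assumption: a DAPL consumer would cast to DAT_IA_ADDRESS_PTR here. */
struct sockaddr *as_generic(struct sockaddr_in6 *addr)
{
    assert(addr != NULL);
    return (struct sockaddr *)addr;
}
```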

Connection Management - Public Service Points
• Allows consumers to listen for connection requests on a specified connection qualifier
  • Connection qualifier is advertised by a name service
• Listen request is persistent
  • Allows multiple connections
  • Number of outstanding connections is limited by associated EVD

Connection Management - Public Service Points
dat_psp_create
  • Creates a public service point to listen for incoming connection requests
  • PSP is limited to a single interface adapter
  • User specifies if endpoints should be automatically created
    • User or DAPL provider can allocate endpoints with each connection request
    • DAPL-allocated endpoints are not associated with protection zones or event dispatchers

Connection Management - Public Service Points
dat_psp_create_any
  • Similar to dat_psp_create, except that the DAPL provider selects the connection qualifier from the list of available qualifiers
dat_psp_free
  • Cancels listening for connection requests on a public service point
dat_psp_query
  • Returns user-specified information about the PSP

Connection Management - Reserved Service Points
• Supports peer-to-peer connection requests
• Connection qualifiers are not advertised by a name service
  • Applications must determine connection qualifiers beforehand
• Only a single connection is established per reserved service point
  • Can be used to create auxiliary connections

Connection Management - Reserved Service Points
dat_rsp_create
  • Similar to dat_psp_create, except user specifies local endpoint to use when establishing the connection
dat_rsp_free
  • Cancels listening for connection requests on a reserved service point
dat_rsp_query
  • Returns user-specified information about the RSP

Connection Management - Connection Requests
• Connection requests are given to the consumer through a connection request event
dat_cr_query
  • Returns user-specified information about a connection request
dat_cr_handoff
  • Modifies a connection request to a specified connection qualifier

Connection Management - Connection Requests
dat_cr_accept
  • Establishes a connection
  • Destroys connection request object
  • User specifies the endpoint, unless it is specified by the connection request
  • User may pass private data to the remote user
dat_cr_reject
  • Rejects a connection request

Endpoints - Allocation
dat_ep_create
  • Creates an endpoint with specified attributes
  • Endpoints belong within a protection zone
  • User specifies event dispatchers for receive completions, send completions, and connection events
dat_ep_free
  • Releases an endpoint

Endpoints - Allocation
dat_ep_get_status
  • Returns an endpoint’s state and whether data transfer operations are outstanding
dat_ep_query
  • Returns user-specified information about an endpoint
dat_ep_modify
  • Changes endpoint properties
    • Protection zone, event dispatchers, attributes

Endpoints - Connections
dat_ep_connect
  • Initiates a connection request to a specified remote service
    • Remote interface adapter and connection qualifier
dat_ep_dup_connect
  • Establishes a new connection to the same remote service specified by a prior connection
    • Similar to calling dat_ep_connect using the same remote interface adapter and connection qualifier

Endpoints - Connections
dat_ep_disconnect
  • Disconnects an endpoint
  • Allows graceful and abrupt disconnections
dat_ep_reset
  • Resets an endpoint to the unconnected state

Endpoints - Data Transfer
• dat_ep_post_send
• dat_ep_post_recv
• dat_ep_post_rdma_read
• dat_ep_post_rdma_write

Memory Management - Protection Zone
dat_pz_create
  • Allocates a protection zone on an interface adapter
  • Associates endpoints with local and remote memory regions
dat_pz_free
dat_pz_query

Memory Management - Local/Remote Memory Region
dat_lmr_create
  • Registers memory with a protection zone for use with an endpoint
dat_rmr_create
  • Allocates a remote memory region within a protection zone
  • No memory is associated with the RMR until bound
dat_lmr_free / dat_rmr_free
dat_lmr_query / dat_rmr_query
dat_rmr_bind
  • Enables a section of an LMR for remote access