
www.openfabrics.org

uDAPL development

Arkady Kanevsky – Network Appliance

James Lentini – Network Appliance

Arlin Davis – Intel

uDAPL - agenda

• overview
• current release
• v2 features
• interop event experiences
• future plans

uDAPL - overview

1. What
   a. DAT (Direct Access Transport) and DAPL (Direct Access Programming Library)
   b. DAT Collaborative - industry group formed in 2001, 50+ members (http://www.datcollaborative.org)
   c. Kernel and user APIs defined: uDAPL 1.0, 1.1, 1.2, 2.0 (2002-2007)
   d. Set of APIs that exploit RDMA-capable hardware
   e. Transport (VI, IB, iWARP) and platform (OS) independence
   f. Considers extensions beyond current functionality
   g. Open source reference implementation (http://sourceforge.net/projects/dapl)
   h. Lightweight, low-overhead structure; members have all rights
   i. Specifications and actions ratified by vote, one vote per member

2. Why
   a. Portability - library support for Linux, Windows, Solaris
   b. Expandability - IHV support goes beyond IB and iWARP
   c. Extensibility - platform/transport extension support built in
   d. Performance - comparable to verbs

uDAPL - release update

1. Multiple release packages built to coexist
   a. 1.2.3 package – base
   b. 2.0.2 package – base, devel, utils, debuginfo

2. dat.conf is backward compatible
   a. original v1.2 entries unchanged
   b. new entries for v2.0 use the ofa-v2 prefix

3. All utils and development files are v2
   a. dtestx added for IB extensions
   b. /usr/include/dat2 for development headers
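Because both library versions read the same dat.conf, a v2 consumer or tool only cares about lines whose IA name carries the ofa-v2 prefix. The sketch below shows that filter under two assumptions: the IA name is the first whitespace-separated field of an entry, and '#' starts a comment line. is_v2_entry is a hypothetical helper for illustration; the DAT registry does its own parsing.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of the DAT library): decide whether a
 * dat.conf line is a uDAPL v2 entry. Assumes the IA name is the first
 * field and that v2 names start with "ofa-v2"; v1.2 entries keep their
 * original names and are left to the 1.2.x library. */
bool is_v2_entry(const char *line)
{
    static const char prefix[] = "ofa-v2";

    /* Skip leading blanks before the IA name. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* Blank lines and '#' comments are not entries. */
    if (*line == '\0' || *line == '\n' || *line == '#')
        return false;

    return strncmp(line, prefix, sizeof prefix - 1) == 0;
}
```

For example, a line beginning "ofa-v2-ib0 u2.0 ..." is classified as a v2 entry, while a v1.2 line beginning "OpenIB-cma u1.2 ..." is not; both entry names are illustrative.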

uDAPL - v2 features

1. Major release
   a. not backward compatible (see transition guide)
   b. many improvements
      a. 1.2 spec errata
      b. high availability
      c. IB and iWARP extension support
   c. Specification and latest errata on downloads: http://www.openfabrics.org/downloads/dapl/documentation/

uDAPL – Interop event - Fall 2007

1. dapltest
   a. default send/write inline data is 128 bytes; 64 is the max for some adapters
   b. rdma_in/out default setting is 8 in the provider; 4 is the max for most adapters (dapl bug)
   c. test sends data from the server side first (an iWARP protocol issue)
   d. reported max adapter attributes were incorrect (QP depth of 16 million reported, 1024 supported)

2. Intel MPI over uDAPL
   a. default request queue is 1100, but some adapters support fewer than 1024 entries
      (Intel MPI did not adjust the queue size down when the queried max values were less than 1100)
   b. IMB runs on IB and iWARP; test suite issues on iWARP (err=6, status=EPROTO)
   c. default overrides required to run IMB on iWARP:
      -env RDMA_DEFAULT_MAX_WQE 400 -env RDMA_READ_RESERVE 100 -env I_MPI_RDMA_RECV_QUEUE_SIZE 10

Take-Aways
• iWARP adapter resources are smaller than those of IB HCAs.
• Apps and dapl providers should use query attributes, and adapters must report them correctly.
• Vendors need pre-event validation (rping, dapltest) to ensure basic operation.
• A verbs-based limit test suite is needed for max resource validation.
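The take-aways amount to one rule for applications and providers: start from a default, then lower it to what the adapter reports. A minimal sketch of that clamping step follows. The structure and field names are hypothetical stand-ins for the attributes a query such as dat_ia_query returns, and the numbers in the usage note are the defaults and limits quoted above, used only as illustrations.

```c
#include <stdint.h>

/* Hypothetical structures; field names are illustrative, not DAT
 * attribute names. adapter_limits stands for what the adapter reports
 * through a query; qp_request holds the defaults an app or provider
 * would like to use. */
struct adapter_limits {
    uint32_t max_wqe;       /* work queue depth */
    uint32_t max_inline;    /* inline send/write data, in bytes */
    uint32_t max_rdma_in;   /* outstanding inbound RDMA reads */
    uint32_t max_rdma_out;  /* outstanding outbound RDMA reads */
};

struct qp_request {
    uint32_t wqe;
    uint32_t inline_bytes;
    uint32_t rdma_in;
    uint32_t rdma_out;
};

static uint32_t clamp_u32(uint32_t want, uint32_t max)
{
    return want > max ? max : want;
}

/* Lower each default to the queried maximum instead of assuming every
 * adapter can honor it. Only as good as the limits the adapter reports. */
struct qp_request fit_to_adapter(struct qp_request want,
                                 const struct adapter_limits *lim)
{
    struct qp_request got;

    got.wqe          = clamp_u32(want.wqe, lim->max_wqe);
    got.inline_bytes = clamp_u32(want.inline_bytes, lim->max_inline);
    got.rdma_in      = clamp_u32(want.rdma_in, lim->max_rdma_in);
    got.rdma_out     = clamp_u32(want.rdma_out, lim->max_rdma_out);
    return got;
}
```

With defaults of 1100 work requests, 128 bytes of inline data and 8 outstanding RDMA reads in each direction against reported limits of 1024, 64 and 4, fit_to_adapter yields 1024, 64 and 4; values already under the limits pass through unchanged. As the 16-million QP depth report shows, the clamp cannot help if the adapter reports wrong limits.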

uDAPL – Future

1. add conformance test suites to release

2. IB extensions – XRC, multicast, etc.

3. validate SRQ and add a new test utility

4. interop between OFED and WinOF
   a. rdma_cm support in WinOF, or
   b. bring back openib_scm support

uDAPL – Backup Slides

MPI performance study

Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface

Book: Recent Advances in Parallel Virtual Machine and Message Passing Interface (DOI 10.1007/11557265, ISBN 978-3-540-29009-4)
Book series: Lecture Notes in Computer Science, Volume 3666/2005 (ISSN 0302-9743)
Publisher: Springer Berlin / Heidelberg, copyright 2005
Subject: Computer Science
Chapter DOI: 10.1007/11557265_28, pages 200-208
Online date: Sunday, October 02, 2005

http://www.springerlink.com/content/dx8uw0gdn09j/

“Evaluation with micro-benchmarks and applications on InfiniBand shows that the implementation with uDAPL performs comparably with that of MVAPICH2.”

Dr. Panda - OSU

uDAPL API Overview

DAT Objects: Acronyms

CNO – consumer notification object
CR – connection request
EP – endpoint
EVD – event dispatcher
IA – interface adapter
LMR – local memory region
PSP – public service point
PZ – protection zone
RMR – remote memory region
RSP – reserved service point

Local Resource Mgmt: Interface Adapter

• Maps to a hardware port
• Referred to as the root object
  • Other objects are descendants of the interface adapter
• Opened using names stored in the DAT registry provider list

Local Resource Mgmt: Interface Adapter

dat_ia_open
  • Opens an interface adapter of a given name
  • Specifies the size of the associated event dispatcher queue

dat_ia_close
  • Closes an interface adapter
  • Can automatically clean up all associated resources
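Automatic cleanup on dat_ia_close follows from the IA being the root object: everything else hangs below it. The toy model below is not the DAT API. It illustrates the idea with a plain tree, and its two close modes are loosely modeled on the spec's abrupt and graceful close flags (an abrupt close tears down all descendants, a graceful close refuses while any remain); check the specification for the exact semantics and return codes.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy model, not the DAT API: every object hangs off a parent, and the
 * interface adapter (IA) is the root of the tree. */
struct obj {
    struct obj *first_child;
    struct obj *next_sibling;
};

int live_objects = 0;   /* number of objects currently allocated */

/* Allocate an object under `parent` (NULL for the IA itself). */
struct obj *obj_new(struct obj *parent)
{
    struct obj *o = calloc(1, sizeof *o);

    if (o == NULL)
        return NULL;
    if (parent != NULL) {
        o->next_sibling = parent->first_child;
        parent->first_child = o;
    }
    live_objects++;
    return o;
}

/* Free an object and, recursively, everything below it. */
static void free_subtree(struct obj *o)
{
    struct obj *child = o->first_child;

    while (child != NULL) {
        struct obj *next = child->next_sibling;
        free_subtree(child);
        child = next;
    }
    free(o);
    live_objects--;
}

/* abrupt == true: tear down the IA and all descendants.
 * abrupt == false: refuse while any descendant still exists. */
bool ia_close(struct obj *ia, bool abrupt)
{
    if (!abrupt && ia->first_child != NULL)
        return false;
    free_subtree(ia);
    return true;
}
```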

Local Resource Mgmt: Interface Adapter

dat_ia_query
  • Returns attributes for the interface adapter and the provider of the adapter
  • Callers can specify which attributes to query

Local Resource Mgmt: Consumer Context

• User-defined context may be associated with most DAT objects
• Association of context is done separately from object allocation

Local Resource Mgmt: Consumer Context

dat_set_consumer_context
  • Associates a user-defined context with a DAT object

dat_get_consumer_context
  • Returns the user's context for a DAT object

dat_get_handle_type
  • Indicates the type of DAT object associated with a given handle
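A typical use of consumer context is mapping a provider handle back to application state, for example finding the per-connection structure when an event names an endpoint. The sketch uses local stand-in types (ctx_t, struct dat_obj) and hypothetical set_context, get_context and get_type helpers rather than the real DAT declarations; ctx_t only mirrors the idea of a context that can carry either a pointer or an integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, not the DAT headers. ctx_t mirrors the idea of a
 * consumer context: a union holding either a pointer or an integer. */
typedef union {
    void     *as_ptr;
    uint64_t  as_64;
} ctx_t;

enum obj_type { OBJ_EP, OBJ_EVD, OBJ_LMR };

struct dat_obj {
    enum obj_type type;   /* what a handle-type query would report */
    ctx_t         ctx;    /* set after allocation, never by it */
};

/* Application-side per-connection state (hypothetical). */
struct conn_state {
    int           peer_rank;
    unsigned long bytes_rx;
};

void set_context(struct dat_obj *o, ctx_t c) { o->ctx = c; }
ctx_t get_context(const struct dat_obj *o) { return o->ctx; }
enum obj_type get_type(const struct dat_obj *o) { return o->type; }

/* Map an endpoint back to the application's connection state, e.g. when
 * an event names the endpoint. Non-endpoint handles yield NULL. */
struct conn_state *conn_from_ep(const struct dat_obj *h)
{
    if (get_type(h) != OBJ_EP)
        return NULL;
    return get_context(h).as_ptr;
}
```

Checking the handle type before trusting the context guards against a context that was set on a different kind of object.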

Event Management: Event Model

• An event dispatcher delivers events
• Event streams feed into the event dispatcher
• Order is maintained separately for each stream
• Ordering between streams is defined by the spec

Event Management: Event Model

• Event streams exist for:
  • Completions
  • Connection requests
  • Connection events
    • Establishment, disconnect, time-outs
  • Asynchronous errors
  • User notifications
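The per-stream ordering guarantee can be stated as a check a consumer might run while debugging. The sketch below is hypothetical: the event record and its per-stream seq counter are invented for illustration, and it checks only per-stream order, not any cross-stream rules the spec defines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event record: `stream` names the producing stream and
 * `seq` is that stream's own sequence number. */
enum stream { S_COMPLETION, S_CONN_REQUEST, S_CONN_EVENT, S_ASYNC_ERROR,
              S_SOFTWARE, S_COUNT };

struct event {
    enum stream   stream;
    unsigned long seq;
};

/* Events may interleave across streams, but within one stream they must
 * come out in the order they were generated (strictly increasing seq). */
bool per_stream_order_ok(const struct event *ev, size_t n)
{
    bool          seen[S_COUNT] = { false };
    unsigned long last[S_COUNT] = { 0 };

    for (size_t i = 0; i < n; i++) {
        enum stream s = ev[i].stream;

        if (seen[s] && ev[i].seq <= last[s])
            return false;
        seen[s] = true;
        last[s] = ev[i].seq;
    }
    return true;
}
```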

Event Management: Consumer Notification Objects

• Similar to an operating system event
• Limited to a single interface adapter
• Multiple event dispatchers may be mapped to a single notification object
• CNOs may be associated with wait proxy agents
  • Wait proxy agents are signaled when the CNO is signaled
  • Multiple CNOs per wait agent are allowed
  • Wait proxy agent implementation is OS-dependent

Event Management: Consumer Notification Objects

dat_cno_create
  • Creates a new notification object
  • Associates an optional wait proxy agent with the CNO

dat_cno_free
  • Destroys a CNO

dat_cno_wait
  • Waits a specified time for an event to occur on an event dispatcher associated with the CNO

Event Management: Consumer Notification Objects

dat_cno_modify_agent
  • Changes the wait proxy agent associated with a CNO

dat_cno_query
  • Returns user-specified information about the CNO

Event Management: Event Dispatcher

dat_evd_create
  • Creates a new event dispatcher
  • User specifies the size and type of events reported by the dispatcher
  • Dispatcher may be associated with a CNO

dat_evd_free
  • Destroys an event dispatcher

Event Management: Event Dispatcher

dat_evd_query
  • Returns user-specified information on the event dispatcher

dat_evd_modify_cno
  • Changes which CNO is associated with the event dispatcher

Event Management: Event Dispatcher

dat_evd_enable
  • Indicates that the event dispatcher should trigger the associated CNO when an event occurs

dat_evd_disable
  • Prevents the event dispatcher from signaling its associated CNO when an event occurs

Event Management: Event Dispatcher

dat_evd_set_unwaitable
  • Transitions the event dispatcher into an unwaitable state
  • Results in calls to dat_evd_wait failing
  • Does not affect event generation or signaling of the CNO

dat_evd_clear_unwaitable
  • Allows waiting on the event dispatcher again

Event Management: Event Dispatcher

dat_evd_resize
  • Changes the size of the EVD

dat_evd_wait
  • Allows the user to wait on the EVD until a specified number of events have occurred or the wait times out
  • Returns event information, if available
  • Returns the number of events still on the EVD

Event Management: Event Dispatcher

dat_evd_dequeue
  • Retrieves information about an event, if available, from the event dispatcher

dat_evd_post_se
  • Posts a user-defined software event to the event dispatcher
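dat_evd_dequeue, dat_evd_post_se and dat_evd_wait all operate on the same event queue. The toy ring buffer below (hypothetical types and status codes, not DAT_RETURN values) shows how they relate: posting a software event enqueues it, dequeue never blocks, and a wait succeeds once the requested number of events is queued, returning the oldest event plus a count of those still queued. To stay self-contained the wait has a zero timeout, and the unwaitable flag makes waits fail without blocking posts or dequeues, following the dat_evd_set_unwaitable description.

```c
#include <stdbool.h>
#include <stddef.h>

#define EVD_QLEN 8   /* hypothetical fixed queue length */

/* Hypothetical status codes, not DAT_RETURN values. */
enum evd_status { EVD_OK, EVD_EMPTY, EVD_FULL, EVD_UNWAITABLE, EVD_TIMEOUT };

struct sw_event { int kind; void *data; };

/* Toy event dispatcher: a fixed-size ring buffer of events. */
struct evd_q {
    struct sw_event ring[EVD_QLEN];
    size_t          head, count;
    bool            unwaitable;
};

/* Like posting a software event: queue a user-defined event. */
enum evd_status evd_post(struct evd_q *q, struct sw_event ev)
{
    if (q->count == EVD_QLEN)
        return EVD_FULL;
    q->ring[(q->head + q->count) % EVD_QLEN] = ev;
    q->count++;
    return EVD_OK;
}

/* Like a dequeue: take the oldest event if there is one, never block. */
enum evd_status evd_take(struct evd_q *q, struct sw_event *out)
{
    if (q->count == 0)
        return EVD_EMPTY;
    *out = q->ring[q->head];
    q->head = (q->head + 1) % EVD_QLEN;
    q->count--;
    return EVD_OK;
}

/* Zero-timeout wait: succeed only if at least `threshold` events are
 * queued, return the oldest and report how many remain in *nmore. */
enum evd_status evd_poll_wait(struct evd_q *q, size_t threshold,
                              struct sw_event *out, size_t *nmore)
{
    if (q->unwaitable)
        return EVD_UNWAITABLE;
    if (q->count == 0 || q->count < threshold)
        return EVD_TIMEOUT;
    (void)evd_take(q, out);   /* cannot fail: count > 0 */
    *nmore = q->count;
    return EVD_OK;
}
```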

Connection Management: Overview

• Supports client/server and peer-to-peer connection models
• Addresses of interface adapters are standard socket addresses
  • struct sockaddr *
  • struct sockaddr_in6 *
  • Cast to DAT_IA_ADDRESS_PTR when used with the DAPL API
• Connection qualifiers are used to associate connection requests with the providing service
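Because IA addresses are ordinary socket addresses, an application can build them with the standard sockets API. The sketch fills a sockaddr_storage from a numeric IPv4 or IPv6 string. ia_address_ptr is a local stand-in that assumes DAT_IA_ADDRESS_PTR is a struct sockaddr pointer; with the real headers (installed under /usr/include/dat2), cast to DAT_IA_ADDRESS_PTR instead. The port is left at zero on the assumption that the connection qualifier is passed to calls such as dat_ep_connect as a separate argument.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Local stand-in for DAT_IA_ADDRESS_PTR (assumed to be a pointer to
 * struct sockaddr, so a cast is all that is needed). */
typedef struct sockaddr *ia_address_ptr;

/* Fill `ss` from a numeric IPv4 or IPv6 string and return a generic
 * address pointer for use where an IA address is expected.
 * Returns 0 on success, -1 if the string is not a numeric address. */
int make_ia_address(const char *ip, struct sockaddr_storage *ss,
                    ia_address_ptr *out)
{
    memset(ss, 0, sizeof *ss);
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;
    if (inet_pton(AF_INET, ip, &sin->sin_addr) == 1) {
        sin->sin_family = AF_INET;
        *out = (ia_address_ptr)ss;
        return 0;
    }

    memset(ss, 0, sizeof *ss);
    struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;
    if (inet_pton(AF_INET6, ip, &sin6->sin6_addr) == 1) {
        sin6->sin6_family = AF_INET6;
        *out = (ia_address_ptr)ss;
        return 0;
    }
    return -1;
}
```

Using sockaddr_storage keeps one buffer large enough for either family, so the caller does not need to know in advance which kind of address it was given.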

Connection Management: Public Service Points

• Allows consumers to listen for connection requests on a specified connection qualifier
• Connection qualifier is advertised by a name service
• Listen request is persistent
  • Allows multiple connections
  • Number of outstanding connections is limited by the associated EVD

Connection Management: Public Service Points

dat_psp_create
  • Creates a public service point to listen for incoming connection requests
  • PSP is limited to a single interface adapter
  • User specifies if endpoints should be automatically created
    • User or DAPL provider can allocate endpoints with each connection request
    • DAPL-allocated endpoints are not associated with protection zones or event dispatchers

Connection Management: Public Service Points

dat_psp_create_any
  • Similar to dat_psp_create, except that the DAPL provider selects the connection qualifier from the list of available qualifiers

dat_psp_free
  • Cancels listening for connection requests on a public service point

dat_psp_query
  • Returns user-specified information about the PSP

Connection Management: Reserved Service Points

• Supports peer-to-peer connection requests
• Connection qualifiers are not advertised by a name service
  • Applications must determine connection qualifiers beforehand
• Only a single connection is established per reserved service point
• Can be used to create auxiliary connections

Connection Management: Reserved Service Points

dat_rsp_create
  • Similar to dat_psp_create, except the user specifies the local endpoint to use when establishing the connection

dat_rsp_free
  • Cancels listening for connection requests on a reserved service point

dat_rsp_query
  • Returns user-specified information about the RSP
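The practical difference between the two service points is lifetime: a PSP keeps listening, while an RSP is good for exactly one connection. The toy model below (not the DAT API) captures that, plus the PSP slide's point that pending requests are bounded by the connection-request EVD. What a provider does with a request that finds the EVD full is not covered here; the model simply refuses it.

```c
#include <stdbool.h>

/* Toy model of the two listening styles, not the DAT API. */
struct service_point {
    bool     persistent;     /* true: PSP, false: RSP */
    bool     used;           /* RSP only: already produced its connection */
    unsigned evd_capacity;   /* room on the connection-request EVD */
    unsigned pending;        /* requests delivered but not yet handled */
};

/* A connection request arrives for this service point's qualifier.
 * Returns true if a connection request event is delivered. */
bool sp_incoming(struct service_point *sp)
{
    if (!sp->persistent && sp->used)
        return false;                    /* RSP: one connection only */
    if (sp->pending == sp->evd_capacity)
        return false;                    /* EVD full: no room for the event */
    sp->pending++;
    if (!sp->persistent)
        sp->used = true;
    return true;
}

/* The consumer dequeued and accepted or rejected one request. */
void sp_handled(struct service_point *sp)
{
    if (sp->pending > 0)
        sp->pending--;
}
```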

Connection Management: Connection Requests

• Connection requests are given to the consumer through a connection request event

dat_cr_query
  • Returns user-specified information about a connection request

dat_cr_handoff
  • Hands off a connection request to a specified connection qualifier

Connection Management: Connection Requests

dat_cr_accept
  • Establishes a connection
  • Destroys the connection request object
  • User specifies the endpoint, unless it is specified by the connection request
  • User may pass private data to the remote user

dat_cr_reject
  • Rejects a connection request
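Accept and reject both consume the connection request object. The toy model below (not the DAT API) shows the endpoint choice made by accept: the caller's endpoint is used unless the request already carries one, for instance one the provider created for a PSP. Preferring the request's endpoint when both are supplied is a simplification of this sketch, and in reality the connection completes asynchronously with a connection event rather than inside the accept call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not the DAT API. */
struct endpoint { bool connected; };

struct conn_request {
    bool             alive;        /* exists until accepted or rejected */
    struct endpoint *provider_ep;  /* set when the PSP auto-created an EP */
};

/* Accept: use the request's endpoint if it has one, else the caller's.
 * On success the request object is consumed. */
int cr_accept(struct conn_request *cr, struct endpoint *ep)
{
    struct endpoint *use = cr->provider_ep != NULL ? cr->provider_ep : ep;

    if (!cr->alive || use == NULL)
        return -1;
    use->connected = true;   /* really signaled later by a connection event */
    cr->alive = false;
    return 0;
}

/* Reject: consume the request without connecting anything. */
int cr_reject(struct conn_request *cr)
{
    if (!cr->alive)
        return -1;
    cr->alive = false;
    return 0;
}
```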

Endpoints: Allocation

dat_ep_create
  • Creates an endpoint with specified attributes
  • Endpoints belong to a protection zone
  • User specifies event dispatchers for receive completions, send completions, and connection events

dat_ep_free
  • Releases an endpoint

Endpoints: Allocation

dat_ep_get_status
  • Returns an endpoint's state and whether data transfer operations are outstanding

dat_ep_query
  • Returns user-specified information about an endpoint

dat_ep_modify
  • Changes endpoint properties: protection zone, event dispatchers, attributes

Endpoints: Connections

dat_ep_connect
  • Initiates a connection request to a specified remote service (remote interface adapter and connection qualifier)

dat_ep_dup_connect
  • Establishes a new connection to the same remote service specified by a prior connection
  • Similar to calling dat_ep_connect using the same remote interface adapter and connection qualifier

Endpoints: Connections

dat_ep_disconnect
  • Disconnects an endpoint
  • Allows graceful and abrupt disconnections

dat_ep_reset
  • Resets an endpoint to the unconnected state
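The connection calls drive a small endpoint state machine. The sketch below is a simplified, hypothetical model: the spec defines more states than these and the names are illustrative. It assumes that a graceful disconnect lets outstanding operations drain while an abrupt one flushes them, and that reset is only legal once the endpoint is disconnected. The outstanding counter stands in for what dat_ep_get_status reports about in-flight data transfers.

```c
#include <stdbool.h>

/* Simplified endpoint states; the spec defines more. Names are
 * illustrative only. */
enum ep_state { EP_UNCONNECTED, EP_CONNECT_PENDING, EP_CONNECTED,
                EP_DISCONNECT_PENDING, EP_DISCONNECTED };

struct ep {
    enum ep_state state;
    unsigned      outstanding;   /* posted but not yet completed transfers */
};

/* Connect or dup_connect: only from the unconnected state. */
bool ep_connect(struct ep *e)
{
    if (e->state != EP_UNCONNECTED)
        return false;
    e->state = EP_CONNECT_PENDING;
    return true;
}

/* The connection-established event arrives. */
void ep_established(struct ep *e)
{
    if (e->state == EP_CONNECT_PENDING)
        e->state = EP_CONNECTED;
}

/* Abrupt: flush outstanding transfers now. Graceful: let them drain. */
bool ep_disconnect(struct ep *e, bool abrupt)
{
    if (e->state != EP_CONNECTED && e->state != EP_CONNECT_PENDING)
        return false;
    if (abrupt || e->outstanding == 0) {
        e->outstanding = 0;      /* flushed transfers complete with an error */
        e->state = EP_DISCONNECTED;
    } else {
        e->state = EP_DISCONNECT_PENDING;
    }
    return true;
}

/* One outstanding transfer completes. */
void ep_completion(struct ep *e)
{
    if (e->outstanding > 0)
        e->outstanding--;
    if (e->state == EP_DISCONNECT_PENDING && e->outstanding == 0)
        e->state = EP_DISCONNECTED;
}

/* Reset back to unconnected so the endpoint can be reused. */
bool ep_reset(struct ep *e)
{
    if (e->state != EP_DISCONNECTED)
        return false;
    e->state = EP_UNCONNECTED;
    return true;
}
```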

Endpoints: Data Transfer

• dat_ep_post_send
• dat_ep_post_recv
• dat_ep_post_rdma_read
• dat_ep_post_rdma_write
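Send and receive are two-sided: every incoming send consumes a receive the peer posted beforehand, whereas RDMA read and write are one-sided and do not consume receives at the target. The toy model below (hypothetical names, not the DAT API) covers only the two-sided case. A send that finds no receive posted, or a receive that is too small, simply fails here; real transports report such cases as errors or retry, depending on transport and settings.

```c
#include <stdbool.h>
#include <stddef.h>

#define RQ_DEPTH 4   /* hypothetical receive queue depth */

/* Toy model of a peer's receive queue, not the DAT API. */
struct peer {
    size_t   recv_len[RQ_DEPTH];  /* sizes of posted receive buffers */
    size_t   head, count;
    unsigned recv_completions;
};

/* Like posting a receive: make one buffer available for one send. */
bool post_recv(struct peer *p, size_t len)
{
    if (p->count == RQ_DEPTH)
        return false;
    p->recv_len[(p->head + p->count) % RQ_DEPTH] = len;
    p->count++;
    return true;
}

/* An arriving send consumes the oldest posted receive, in order. It fails
 * if nothing is posted or the consumed buffer is too small. */
bool deliver_send(struct peer *p, size_t len)
{
    if (p->count == 0)
        return false;
    size_t avail = p->recv_len[p->head];
    p->head = (p->head + 1) % RQ_DEPTH;
    p->count--;
    if (len > avail)
        return false;
    p->recv_completions++;
    return true;
}
```

Because receives are consumed strictly in order, a small buffer at the head of the queue causes a failure even when a larger one is posted behind it.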

Memory Management: Protection Zone

dat_pz_create
  • Allocates a protection zone on an interface adapter
  • Associates endpoints with local and remote memory regions

dat_pz_free

dat_pz_query

Memory Management: Local/Remote Memory Region

dat_lmr_create
  • Registers memory with a protection zone for use with an endpoint

dat_rmr_create
  • Allocates a remote memory region within a protection zone
  • No memory is associated with the RMR until it is bound

dat_lmr_free / dat_rmr_free

dat_lmr_query / dat_rmr_query

dat_rmr_bind
  • Enables a section of an LMR for remote access
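The protection zone is what ties endpoints to memory: an access is allowed only when the endpoint and the memory region share a PZ, and an RMR covers no memory until it is bound to part of an LMR. The toy model below (not the DAT API; access rights are omitted, names are hypothetical) encodes those two checks plus a bounds check on the bound window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of protection checks, not the DAT API. */
struct pz       { int id; };
struct lmr      { const struct pz *pz; uintptr_t base; size_t len; };
struct rmr      { const struct pz *pz; bool bound; uintptr_t base; size_t len; };
struct endpoint { const struct pz *pz; };

/* True if [addr, addr + len) lies inside [base, base + size), written so
 * that no addition can overflow. */
static bool in_range(uintptr_t addr, size_t len, uintptr_t base, size_t size)
{
    return addr >= base && len <= size && addr - base <= size - len;
}

/* Bind part of an LMR to an RMR. Both must share a protection zone and
 * the window must lie inside the LMR. */
bool rmr_bind(struct rmr *r, const struct lmr *l, uintptr_t base, size_t len)
{
    if (r->pz != l->pz || !in_range(base, len, l->base, l->len))
        return false;
    r->bound = true;
    r->base = base;
    r->len = len;
    return true;
}

/* May a remote RDMA access of [addr, addr + len) arriving on `ep` use `r`? */
bool remote_access_ok(const struct endpoint *ep, const struct rmr *r,
                      uintptr_t addr, size_t len)
{
    return r->bound && ep->pz == r->pz && in_range(addr, len, r->base, r->len);
}
```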