
  • Analysing Real-Time Behaviour of Collective Communication Patterns in MPI

    Alexander Stegmeier, Martin Frieb, Jörg Mische, Theo Ungerer

    University of Augsburg, Germany

    26th International Conference on Real-Time Networks and Systems

    11 October 2018

    2018-10-11 Alexander Stegmeier et al. / Real-Time Analysis of Collective Operations 1

  • Motivation

    - Increasing performance demands of real-time applications
    - Timing analysis of multicores with shared memory is difficult

    - Apply manycores with
      - a Network-on-Chip (NoC)
      - local memory per node
      - explicit message passing

    - Message Passing Interface (MPI)
      - standard programming model

    - Special focus on collective communication
      - programming style similar to Bulk Synchronous Parallel (BSP)

  • Outline

    Motivation

    Basic Knowledge

    Analysis

    Evaluation

    Conclusion

  • MPI Collectives

    Communication Structure

    - Based on a central node (MPI_Bcast, MPI_Gather, ...)
      - communication along tree structures
      - investigated structures: pipeline, chains, binary tree, binomial tree

    - Uniform data exchange (MPI_Allgather, MPI_Barrier, ...)
      - based on point-to-point communication
      - investigated structures: ring, recursive doubling, neighbour exchange, Bruck
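
    Here is a minimal sketch of the binomial tree listed above. It assumes the textbook construction rooted at node 0: in the round with distance 2^k, every node r < 2^k that already holds the data sends it to node r + 2^k. The helper name `binomial_tree` is hypothetical and does not come from the slides or from any MPI library.

```python
from typing import Dict, List, Tuple


def binomial_tree(p: int) -> Tuple[Dict[int, int], List[List[Tuple[int, int]]]]:
    """Hypothetical helper: binomial-tree broadcast schedule rooted at node 0.

    Returns the parent of every non-root node and, per round, the
    (sender, receiver) pairs that are active in that round.
    """
    parent: Dict[int, int] = {}
    rounds: List[List[Tuple[int, int]]] = []
    step = 1
    while step < p:
        # Nodes 0 .. step-1 already hold the data; each forwards it `step` ranks ahead.
        pairs = [(src, src + step) for src in range(step) if src + step < p]
        for src, dst in pairs:
            parent[dst] = src
        rounds.append(pairs)
        step *= 2
    return parent, rounds


parent, rounds = binomial_tree(8)
print(len(rounds))  # 3 rounds for 8 nodes
print(rounds[2])    # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

    The number of rounds grows with log2(p), and a node's position in the tree fixes how many children it has. This is why the structure matters when per-node send times are accumulated.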

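    [Figure: uniform data exchange among 8 nodes; data held by nodes 0 to 7 at each stage]
      initial:       0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
      after step 1:  01 | 01 | 23 | 23 | 45 | 45 | 67 | 67
      after step 2:  0123 | 0123 | 0123 | 0123 | 4567 | 4567 | 4567 | 4567
      after step 3:  01234567 on every node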
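
    The figure appears to show recursive doubling. In step k, each node exchanges everything it holds with the partner whose rank differs in bit k. The sketch below reproduces the rows of the figure for 8 nodes. The helper names are hypothetical, and the sketch assumes a power-of-two node count.

```python
from typing import List, Set


def _snapshot(data: List[Set[int]]) -> List[str]:
    # Render each node's data as in the figure, e.g. {0, 1} -> "01".
    return ["".join(str(x) for x in sorted(s)) for s in data]


def recursive_doubling_allgather(p: int) -> List[List[str]]:
    """Data held by every node initially and after each recursive-doubling step."""
    if p <= 0 or p & (p - 1):
        raise ValueError("this sketch needs a power-of-two number of nodes")
    data: List[Set[int]] = [{rank} for rank in range(p)]
    history = [_snapshot(data)]
    distance = 1
    while distance < p:
        # Each node exchanges everything it holds with rank XOR distance.
        data = [data[rank] | data[rank ^ distance] for rank in range(p)]
        history.append(_snapshot(data))
        distance *= 2
    return history


for step, row in enumerate(recursive_doubling_allgather(8)):
    print(step, " ".join(row))
```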

  • Time-Division Multiplexing

    - Time-division multiplexing (TDM) for message scheduling
      - fixed time slots for sending
      - prevents conflicts between delivered flits
      - enables upper bounds for releasing and transporting flits

    - WCTT under TDM:

      $\mathrm{WCTT} = t_a + t_t$

      where $t_a$ is the admission time and $t_t$ the transportation time.
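
    Below is a sketch of how $t_a$ and $t_t$ could be bounded. It rests on assumptions the slides do not state: node i owns slot i of every TDM period, a flit is injected only at the start of its owner's slot, and each hop adds a fixed `hop_latency`. Time is counted in integer cycles, and all function names are hypothetical.

```python
def next_slot_start(ready: int, node: int, num_slots: int, slot_len: int) -> int:
    """Earliest start of `node`'s slot at or after time `ready`.

    Hypothetical schedule: node i owns slot i of every TDM period.
    """
    period = num_slots * slot_len
    offset = node * slot_len
    k = max(0, -(-(ready - offset) // period))  # ceil((ready - offset) / period)
    return k * period + offset


def worst_admission(node: int, num_slots: int, slot_len: int) -> int:
    """Longest wait t_a over all release times.

    Checking one period is enough because the schedule repeats.
    """
    period = num_slots * slot_len
    return max(next_slot_start(r, node, num_slots, slot_len) - r for r in range(period))


def wctt(node: int, num_slots: int, slot_len: int, hops: int, hop_latency: int) -> int:
    """WCTT = t_a + t_t, with t_t assumed to be hops * hop_latency."""
    t_a = worst_admission(node, num_slots, slot_len)
    t_t = hops * hop_latency
    return t_a + t_t


print(wctt(node=0, num_slots=4, slot_len=1, hops=3, hop_latency=1))  # 6
```

    Because slots are exclusive, no contention term appears. Under this model the bound depends only on the TDM period and the path length.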

  • Timing Analysis

    Analysis flow

    1. Investigation of the internal structure
       - separation of code execution and data transfer
       - send/receive operations as boundaries

    2. Analysis of the components (WCET, WCTT)

    3. Combination according to the communication pattern

  • Analysis issues

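    Boundary between WCET and WCTT

    [Figure: send timelines at node n0 with the phases WCETs (code execution) and ta (admission); (a) send driven by ta, (b) send driven by WCETs]

    - Sending f flits takes at most

      $(f - 1)\cdot\max(\mathrm{WCET}_s, t_a) + \mathrm{WCET}_s + t_a$

      The first flit needs both phases, and the slower phase sets the spacing of every further flit.
    - Similar for receive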
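
    The bound can be checked against a small simulation. This sketch assumes, as our reading of panels (a) and (b), that the core prepares flits back to back at WCET_s each. Meanwhile, already prepared flits pass admission one at a time at t_a each. The function names are hypothetical.

```python
def simulate_send(f: int, wcet_s: int, t_a: int) -> int:
    """Time at which the last of f flits has passed admission."""
    prepared = 0  # when the core finishes preparing the current flit
    admitted = 0  # when the previous flit finished admission
    for _ in range(f):
        prepared += wcet_s                        # core prepares flits back to back
        admitted = max(prepared, admitted) + t_a  # wait for the flit and the previous admission
    return admitted


def send_bound(f: int, wcet_s: int, t_a: int) -> int:
    """Closed form: (f - 1) * max(WCET_s, t_a) + WCET_s + t_a."""
    return (f - 1) * max(wcet_s, t_a) + wcet_s + t_a


# Both cases of the figure: send driven by t_a, and send driven by WCET_s.
for f in range(1, 6):
    for wcet_s, t_a in [(2, 5), (5, 2), (3, 3)]:
        assert simulate_send(f, wcet_s, t_a) == send_bound(f, wcet_s, t_a)
```

    The receive side can be modelled the same way, with the receive WCET in place of WCET_s.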

  • Analysis issues

    Dispatch along multiple nodes

    - Multiple options to accumulate times
      - identify the longest path in terms of time
    - Three candidates for the longest path:

    [Figure: three timelines over time t]
      (a) send operation takes longest time
      (b) receive operation takes longest time
      (c) receive and forward takes longest time

  • Analysis issues

    Consideration of the communication pattern

    - Treatment of tree structures
      - leaves occur at different tree levels

    - Sending procedure for nodes with multiple children
      - deepest subtree first

    - Options for the longest path in terms of time
      - early forwarding + delivery along a long subtree
      - late forwarding + delivery along a short subtree
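
    The effect of the deepest-subtree-first rule can be illustrated with a deliberately simple cost model. This model is our assumption and not the slides' WCET/WCTT model. A node sends to its children one after another, each send costs `send_cost`, and a child forwards as soon as it has received the message. The completion time takes the maximum over all children, so both candidates (early forwarding along a long subtree and late forwarding along a short subtree) are covered. The tree and all names are hypothetical.

```python
from typing import Callable, Dict, List

Tree = Dict[int, List[int]]  # node -> children (hypothetical representation)
Order = Callable[[Tree, int], List[int]]


def depth(tree: Tree, node: int) -> int:
    """Height of the subtree rooted at `node` (a leaf has depth 0)."""
    return 1 + max((depth(tree, c) for c in tree.get(node, [])), default=-1)


def deepest_first(tree: Tree, node: int) -> List[int]:
    return sorted(tree.get(node, []), key=lambda c: depth(tree, c), reverse=True)


def shallowest_first(tree: Tree, node: int) -> List[int]:
    return sorted(tree.get(node, []), key=lambda c: depth(tree, c))


def completion_time(tree: Tree, node: int, start: int, send_cost: int, order: Order) -> int:
    """Time at which the last node below `node` has received the message."""
    finish = start
    for k, child in enumerate(order(tree, node)):
        arrival = start + (k + 1) * send_cost  # k-th child is served after k earlier sends
        finish = max(finish, completion_time(tree, child, arrival, send_cost, order))
    return finish


tree: Tree = {0: [2, 1], 1: [3], 3: [4]}  # node 1 heads the deeper subtree
print(completion_time(tree, 0, 0, 1, deepest_first))     # 3
print(completion_time(tree, 0, 0, 1, shallowest_first))  # 4
```

    Serving the deep subtree first lets its long delivery overlap with the later sends to short subtrees.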

  • Applying the procedure

    Illustration with example

    - Broadcast to 5 nodes
    - Message contains f flits
    - Chain pattern with 2 chains

    [Figure: chain pattern rooted at node 0 over nodes 0 to 5; panel "Communication details" shows the activity of nodes 0 to 5 over time t]

  • Applying the procedure

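    Boundaries between WCET and WCTT

    Issue: [Figure: send timelines at node n0 with the phases WCETs and ta, as in "Boundary between WCET and WCTT"]

    Resulting timing:

    $W_s = (ch_i - 1)\cdot\max(\mathrm{WCET}_s, t_a) + \mathrm{WCET}_s + t_a$   (1)

    $W_{sr} = (ch_i - 1)\cdot\max(\mathrm{WCET}_{sr}, t_a) + \mathrm{WCET}_{sr} + t_a$   (2)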

  • Applying the procedure

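    Delivery along multiple nodes

    - Consideration of 1 flit:

      $W_f = W_s + l\cdot W_{sr} + (l + 1)\cdot t_t + \mathrm{WCET}_r$   (3)

    - Consideration of f flits, one equation per candidate for the longest path:

      (a) send takes longest: $W_a = f\cdot W_s + l\cdot W_{sr} + (l + 1)\cdot t_t + \mathrm{WCET}_r$   (4)

      (b) receive takes longest: $W_b = W_s + l\cdot W_{sr} + (l + 1)\cdot t_t + f\cdot\mathrm{WCET}_r$   (5)

      (c) receive and forward takes longest: $W_c = W_s + f\cdot W_{sr} + (l - 1)\cdot W_{sr} + (l + 1)\cdot t_t + \mathrm{WCET}_r$   (6)

      $W_{chain} = \max(W_a, W_b, W_c)$   (7)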
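
    Equations (1) to (7) can be evaluated directly. This sketch reads $ch_i$ as the number of children of node i, so a forwarding node in a chain has $ch_i = 1$. It reads $l$ as the number of forwarding nodes between the root and the last receiver. Both readings, the parameter names and the example values are assumptions, and all times are integer cycles.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChainParams:
    f: int        # flits per message
    l: int        # forwarding nodes along the chain (our reading of l)
    ch_root: int  # children of the root, ch_i in eq. (1)
    ch_fwd: int   # children of a forwarding node, ch_i in eq. (2); 1 in a chain
    wcet_s: int   # WCET of the send code per flit
    wcet_sr: int  # WCET of the receive-and-forward code per flit
    wcet_r: int   # WCET of the final receive code per flit
    t_a: int      # TDM admission time
    t_t: int      # TDM transportation time per hop


def pipelined(children: int, wcet: int, t_a: int) -> int:
    """Common shape of eqs. (1) and (2)."""
    return (children - 1) * max(wcet, t_a) + wcet + t_a


def chain_bounds(p: ChainParams) -> Dict[str, int]:
    if p.l < 1:
        raise ValueError("eq. (6) as written assumes at least one forwarding node")
    w_s = pipelined(p.ch_root, p.wcet_s, p.t_a)    # eq. (1)
    w_sr = pipelined(p.ch_fwd, p.wcet_sr, p.t_a)   # eq. (2)
    transport = (p.l + 1) * p.t_t                  # l + 1 hops
    w_f = w_s + p.l * w_sr + transport + p.wcet_r                     # eq. (3)
    w_a = p.f * w_s + p.l * w_sr + transport + p.wcet_r               # eq. (4)
    w_b = w_s + p.l * w_sr + transport + p.f * p.wcet_r               # eq. (5)
    w_c = w_s + p.f * w_sr + (p.l - 1) * w_sr + transport + p.wcet_r  # eq. (6)
    return {"W_s": w_s, "W_sr": w_sr, "W_f": w_f, "W_a": w_a,
            "W_b": w_b, "W_c": w_c, "W_chain": max(w_a, w_b, w_c)}  # eq. (7)


example = ChainParams(f=3, l=2, ch_root=2, ch_fwd=1, wcet_s=3, wcet_sr=2,
                      wcet_r=2, t_a=4, t_t=3)
print(chain_bounds(example)["W_chain"])  # 56
```

    With f = 1, all three candidates collapse to $W_f$ from equation (3). This gives a quick sanity check on the formulas.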
