17
Quantized Congestion Control (QCN) Experiences with Ethernet and RoCE Liran Liss Mellanox Technologies

Quantized Congestion Control (QCN) Experiences with Ethernet … · 2016. 2. 21. · – Provides congestion control for long-lived flows in limited BW-delay product networks –

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • Quantized Congestion Control (QCN) Experiences with Ethernet and RoCE Liran Liss Mellanox Technologies

  • Agenda

    • Introduction • Why QCN? • Data plane • Management • QCN in ConnectX-3 and SwitchX • Initial results • Conclusions

    © 2013 Mellanox Technologies 2

  • QCN Introduction

    • IEEE standard (802.1Qau) – Provides congestion control for long-lived flows in limited

    BW-delay product networks – Part of the Data Center Bridging (DCB) protocol suite

    • ETS, PFC, QCN, DCBX

    • Conducted at L2 – Similar to IB FECN/BECN congestion management – Independent from L3 congestion control

    • E.g., ECN in TCP/DCTCP

    • Suitable for HW implementations – Simple – Minimal state

    © 2013 Mellanox Technologies 3

  • Why QCN?

    • Prevent congestion spreading – Mitigate victim flows – Important when PFC is used

    • Control switch buffer utilization – Less impact on other flows

    • Handle bursts better • Mitigate incast

    – Improve latency

    © 2013 Mellanox Technologies 4

    F A B C D E

    Y

    X

    Zzz

  • Tailored for DCB Environments

    • Diverse traffic patterns – Partition-aggregate incast – Long running BW-hungry flows – IPC

    • Diverse traffic classes – Best effort lossy traffic – Critical lossless traffic

    © 2013 Mellanox Technologies 5

  • Data Plane

    © 2013 Mellanox Technologies 6

    NIC NIC Switch

    Reaction Point (RP)

    Congestion Point (CP)

    Congestion Controlled Flow (CCF)

    Congestion Notification Messages

    (CNMs)

    Ethernet subnet

  • Congestion Point Behavior

    • Congestion feedback computed from Switch buffer state – Takes into account both the

    degree and trajectory of congestion

    • Feedback sent directly to RP

    • Statistical sampling – Sampling probability

    increases with congestion

    © 2013 Mellanox Technologies 7

    10%

    1% |Fb|

    Reflection probability

    Qeq Q

    Qoff

    Fb = -(Qoff + wQdelta)

    Qdelta = Q - Qold

  • Reaction Point Behavior

    © 2013 Mellanox Technologies 8

    Rate

    Time

    Target Rate (TR)

    Current Rate (CR)

    RAI

    RHAI

    Fast Recovery

    Active Increase

    Hyper Active Increase

    CR1/2(CR+TR) TRTR+RAI CR1/2(CR+TR)

    TRTR+i*RAI CR1/2(CR+TR)

    CNM received

    Rate increases due to timer or byte-count thresholds

    Move to AI after 5 rate increase cycles

    Move to HAI after both timer and byte-counter completed 5 cycles

    TRCR CRCR(1-Gd|Fb|)

  • Flow Segregation

    • Up to 7 Congestion Notification Priority Values (CNPVs) – At least one User Priority dedicated for best effort traffic

    • An endpoint may support multiple RPs per CNPV – Grouped into Reaction Point Groups (RPGs) for configuration

    • Congestion Controlled Flows (CCFs) determined solely by the RP – By Application – By DMAC – By 5-tuple hash – By QP

    • Optional Flow ID may be injected by RP using Congestion Notification tags (CNtag) – Reflected back in CNMs

    © 2013 Mellanox Technologies 9

  • CNM Frame Format

    © 2013 Mellanox Technologies 10

    Size 6B 6B 2B 2B 2B 2B 2B 2B 8B 2B 2B 2B 6B 2B 0-64B

    4b 6b 6b

    Field

    DA SA

    ET = 0x8100

    UP + CFI + VID

    ET = 0x22E9

    Flow ID

    ET = 0x22E7

    Ver

    Rsvd

    Fb CPID

    QO

    ffset

    QDelta

    Encap UP

    Encap DA

    MSDU

    len

    MSDU

    VLAN tag (optional)

    CN tag (optional)

    Congestion Notification Message (CNM)

  • QCN Management

    • Congestion Notification Domains (CNDs) – A connected subset of switches and endpoints

    configured to support a CNPV • CNDs must be defended

    – Accomplished by remapping CNPVs to non-CNPV priorities

    • End to end configuration required – May be configured manually or by DCBX

    © 2013 Mellanox Technologies 11

  • QCN in ConnectX-3 and SwitchX

    • ConnectX-3 – HW RP implementation – Reaction to CNMs within usecs – Operation transparent to SW

    • A must for offloaded protocols such as RoCE – Thousands of RPs

    • SwitchX – CP Active queue management

    • Ethernet driver – Multiple flow queues per priority – Selected based on L2/3/4 hash

    • RoCE driver – Each QP constitutes its own flow queue

    © 2013 Mellanox Technologies 12

  • Initial Testing

    • Experimental setup – 2 ConnectX-3 HCAs – SX1036 switch (SwitchX)

    • Manual CND configuration • Traffic pattern

    – Both ports of requestor HCA target the same responder port

    – Tested the dynamic response to congestion

    © 2013 Mellanox Technologies 13

    ConnectX-3 initiator

    SwitchX (SX1036)

    ConnectX-3 responder

  • QCN Convergence

    © 2013 Mellanox Technologies 14

    0

    1000

    2000

    3000

    4000

    5000

    6000

    7000

    8000

    9000

    10000

    0 5 10 15 20 25 30 35 40 45

    Flow 1

    Flow 2

    Aggregate BW

    Time [ms]

    BW

    [Gbp

    s]

    Flow 2 added

    Flow 2 removed

    < 2ms

  • Observations

    • Rapid convergence – Under 2ms

    • Stable – Small oscillations – Negligible underflow

    • >99% utilization of congested link

    – No overflows in steady state • No pause frames while enabling PFC/global-pause • No packet loss while disabling pause

    © 2013 Mellanox Technologies 15

  • Conclusions

    • QCN offers important benefits to DCB environments – Lossless and lossy flows can happily live together

    • Efficient HW implementation in ConnectX-3 and SwitchX – Promising initial results

    • Next steps – Multi-flow behavior – Complex setups – Measure real application benefits

    © 2013 Mellanox Technologies 16

  • Thank You

    © 2013 Mellanox Technologies

    Quantized Congestion Control (QCN) Experiences with Ethernet and RoCEAgendaQCN IntroductionWhy QCN?Tailored for DCB EnvironmentsData PlaneCongestion Point BehaviorReaction Point BehaviorFlow SegregationCNM Frame FormatQCN ManagementQCN in ConnectX-3 and SwitchXInitial TestingQCN ConvergenceObservationsConclusionsThank You