Not All Microseconds are Equal: Fine-Grained Per-Flow Measurements with Reference Latency Interpolation

Myungjin Lee†, Nick Duffield‡, Ramana Rao Kompella†

†Purdue University, ‡AT&T Labs–Research

Low-latency applications

- Several new types of applications require extremely low end-to-end latency
  - Algorithmic trading applications in financial data center networks
  - High-performance computing applications in data center networks
  - Storage applications
- Low-latency cut-through switches
  - Arista 7100 series
  - Woven EFX 1000 series

Need for high-fidelity measurements

[Figure: end-to-end path through a ToR switch, an edge router, and a core router]

- At every router, high-fidelity measurements are critical to localize root causes
- Once the root cause is localized, operators can fix it by rerouting traffic, upgrading links, or performing detailed diagnosis
- An end-to-end delay (e.g., 1 ms) does not tell us which router causes the problem; measurement within a router is necessary

Measurement solutions today

- SNMP and NetFlow: no latency measurements
- Active probes: typically end-to-end, so they do not localize the root cause
- Expensive high-fidelity measurement boxes
  - Corvil boxes (£90,000), used by the London Stock Exchange
  - Cannot be placed ubiquitously
- Lossy Difference Aggregator (LDA) [Kompella, SIGCOMM'09]
  - Provides average latency and variance at high fidelity within a switch
  - A good start, but may not be sufficient to diagnose flow-specific problems

Motivation for per-flow measurements

- Key observation: average latencies differ significantly across flows at a router

[Figure: delay vs. time at a switch queue over a measurement period; some packets see delays well above the average latency, others well below]

Outline of the rest of the talk

- Measurement model
- Alternative approaches
- Intuition behind our approach: delay locality
- Our architecture: Reference Latency Interpolation (RLI)
- Evaluation

Measurement model

- Assumption: time synchronization between router interfaces
- Constraint: cannot modify regular packets to carry timestamps
  - Requires intrusive changes to the router forwarding path
  - Consumes extra bandwidth, up to 10% of capacity

[Figure: a router with packets entering at ingress interface I and leaving at egress interface E]

Naïve approach

- For each flow key:
  - Store timestamps for each packet at I and E
  - After a flow stops sending, I sends its packet timestamps to E
  - E computes individual packet delays
  - E aggregates average latency, variance, etc. for each flow
- Problem: high communication costs
  - At 10 Gbps, a few million packets per second
  - Sampling reduces communication, but also reduces accuracy

[Figure: worked example in which E subtracts ingress from egress timestamps packet by packet and averages per flow, e.g., average delays of 22/2 = 11 and 32/2 = 16 for two flows]
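The per-flow averaging the naïve scheme performs at E can be sketched as follows; a minimal sketch with a hypothetical data layout (the `naive_per_flow_latency` name and the dict-of-timestamps representation are ours, not the paper's):

```python
from collections import defaultdict

def naive_per_flow_latency(ingress_ts, egress_ts):
    """Naive per-flow average latency.

    ingress_ts / egress_ts: dicts mapping (flow_key, packet_id) -> timestamp.
    Ingress I must ship every timestamp to egress E, which is exactly the
    communication cost the slide objects to.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (flow, pkt), t_in in ingress_ts.items():
        delay = egress_ts[(flow, pkt)] - t_in  # per-packet delay at E
        sums[flow] += delay
        counts[flow] += 1
    return {flow: sums[flow] / counts[flow] for flow in sums}

# Two flows, two packets each: delays (10, 12) and (14, 18)
ingress = {("A", 1): 0, ("A", 2): 5, ("B", 1): 2, ("B", 2): 6}
egress = {("A", 1): 10, ("A", 2): 17, ("B", 1): 16, ("B", 2): 24}
print(naive_per_flow_latency(ingress, egress))  # {'A': 11.0, 'B': 16.0}
```

Note that the state and traffic grow with the number of packets, not flows, which is what makes this approach untenable at 10 Gbps.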

A (naïve) extension of LDA

- Maintain one LDA per flow of interest, each with many counters
- Problem: (potentially) high communication costs, proportional to the number of flows

[Figure: per-flow LDAs at ingress I and egress E, each holding packet counts and sums of timestamps; coordinating the two sides yields per-flow latency]

Key observation: Delay locality

- True mean delay = (D1 + D2 + D3) / 3
- Localized mean delay = (WD1 + WD2 + WD3) / 3, where WDi is the average delay over a window around packet i's arrival
- How close is the localized mean delay to the true mean delay as the window size varies?

[Figure: delay vs. time; sampled delays D1, D2, D3 and their surrounding windows WD1, WD2, WD3]
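The contrast between the true and localized mean delays can be sketched as follows; a minimal sketch with hypothetical sample data (the function name and the (time, delay) representation are ours):

```python
def localized_mean_delay(samples, targets, window):
    """samples: list of (arrival_time, delay) for all packets at the link.
    targets: arrival times of the packets of interest (delays D1, D2, D3).
    Each true delay Di is replaced by WDi, the mean delay of all packets
    arriving within +/- window/2 of target i (the 'localized' delay)."""
    def wd(t0):
        near = [d for t, d in samples if abs(t - t0) <= window / 2]
        return sum(near) / len(near)
    return sum(wd(t) for t in targets) / len(targets)

samples = [(0, 4.0), (1, 6.0), (2, 5.0), (10, 1.0), (11, 3.0), (20, 8.0)]
true_mean = (5.0 + 3.0 + 8.0) / 3  # packets arriving at t = 2, 11, 20
local_mean = localized_mean_delay(samples, [2, 11, 20], window=4)
print(true_mean, local_mean)  # 5.333... vs. 5.0: close when windows are small
```

Delay locality says the two agree well for small windows, which is what the RMSRE numbers on the next slide quantify.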

Key observation: Delay locality

[Figure: local mean delay per key vs. true mean delay per key (ms), with the global mean shown for comparison; window 0.1 ms: RMSRE = 0.054; 10 ms: RMSRE = 0.16; 1 s: RMSRE = 1.72]

Data sets from real router and synthetic queueing model
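The RMSRE figures above can be computed as follows; a minimal sketch assuming the usual definition of root mean squared relative error (the function name is ours):

```python
import math

def rmsre(true_vals, est_vals):
    """Root mean squared relative error:
    sqrt(mean(((est - true) / true)^2)) over all keys."""
    rel = [(e - t) / t for t, e in zip(true_vals, est_vals)]
    return math.sqrt(sum(r * r for r in rel) / len(rel))

# Estimates off by +10% and -10% give an RMSRE of 0.1
print(rmsre([10.0, 20.0], [11.0, 18.0]))  # 0.1
```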

Exploiting delay locality

- Reference packets are injected regularly at the ingress I
  - Special packets carrying an ingress timestamp
  - Provide reference delay samples
  - Used to approximate the latencies of regular packets

[Figure: delay vs. time; reference packets carrying ingress timestamps interleaved with regular traffic]

RLI architecture

- Component 1: Reference packet generator
  - Injects reference packets regularly
- Component 2: Latency estimator
  - Estimates packet latencies and updates per-flow statistics
  - Estimates directly at the egress, with no extra state maintained at the ingress side (reduces storage and communication overheads)

[Figure: the reference packet generator at ingress I injects timestamped reference packets among regular packets; the latency estimator at egress E consumes them]

Component 1: Reference packet generator

- Question: when to inject a reference packet?
- Idea 1 (1-in-n): inject one reference packet every n packets
  - Problem: low accuracy under low utilization
- Idea 2 (1-in-τ): inject one reference packet every τ seconds
  - Problem: performs poorly when short-term delay variance is high
- Our approach: dynamic injection based on utilization
  - High utilization → low injection rate; low utilization → high injection rate
  - The adaptive scheme works better than fixed-rate schemes
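The adaptive idea can be sketched as follows. This is only an illustration of "high utilization → low injection rate": the linear mapping, the bounds, and the function name are our assumptions, not the paper's actual control law.

```python
def injection_gap(utilization, min_gap=1, max_gap=100):
    """Pick the gap (in packets) between reference packet injections from
    the current link utilization in [0, 1]: inject often when the link is
    idle (samples are scarce and probes are cheap), rarely when it is busy
    (avoid adding load while queues are long)."""
    u = min(max(utilization, 0.0), 1.0)  # clamp to [0, 1]
    return round(min_gap + u * (max_gap - min_gap))

print(injection_gap(0.05), injection_gap(0.95))  # small gap when idle, large when busy
```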

Component 2: Latency estimator

- Question 1: how to estimate latencies using reference packets?
- Solution: several estimators are possible
  - Use only the delay of the left (preceding) reference packet (RLI-L)
  - Use linear interpolation between the left and right reference packets (RLI)
  - Other non-linear estimators are possible (e.g., shrinkage)

[Figure: delay vs. time; a regular packet's arrival time is known, and its delay is estimated from the linear interpolation line between the left (L) and right (R) reference packets, whose arrival times and delays are both known; the gap between the packet's true delay and the interpolation line is the error in the delay estimate]
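The linear interpolation estimator can be sketched as follows; a minimal sketch (the function name and tuple layout are ours):

```python
def interpolate_delay(t, left, right):
    """Estimate the delay of a regular packet arriving at time t by linear
    interpolation between the surrounding reference packets.

    left, right: (arrival_time, delay) of the left and right reference
    packets. The RLI-L variant would simply return left[1]."""
    (tl, dl), (tr, dr) = left, right
    if tr == tl:
        return dl  # degenerate case: both references coincide
    frac = (t - tl) / (tr - tl)  # position of t between the references
    return dl + frac * (dr - dl)

# References saw delays 2.0 and 6.0 at times 0 and 10; a regular packet
# arriving at t = 5 is assigned the midpoint delay.
print(interpolate_delay(5, (0, 2.0), (10, 6.0)))  # 4.0
```

Note that the estimate can only be produced once the right reference packet arrives, which is why regular packets are buffered briefly at the egress.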

Component 2: Latency estimator (continued)

- Question 2: how to compute per-flow latency statistics?
- Solution: maintain 3 counters per flow at the egress side
  - C1: number of packets
  - C2: sum of packet delays
  - C3: sum of squares of packet delays (for estimating variance)
- Average latency = C2 / C1, computed when a flow is exported
- To minimize state, any flow selection strategy can be used to maintain counters for only a subset of flows

[Figure: when the right reference packet arrives, delays of regular packets held in the interpolation buffer are estimated; for each selected flow key, C1, C2, and C3 are updated with the packet count, the estimated delay, and its square]
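The three counters above support mean and variance at export time; a minimal sketch (the class and method names are ours, and variance here uses the standard E[X²] − E[X]² identity):

```python
class FlowStats:
    """Per-flow counters at the egress, as on the slide:
    C1 = packet count, C2 = sum of delays, C3 = sum of squared delays."""

    def __init__(self):
        self.c1 = 0
        self.c2 = 0.0
        self.c3 = 0.0

    def update(self, delay):
        """Fold one estimated packet delay into the counters."""
        self.c1 += 1
        self.c2 += delay
        self.c3 += delay * delay

    def export(self):
        """Return (average latency, variance) when the flow is exported:
        mean = C2/C1, variance = C3/C1 - mean^2."""
        mean = self.c2 / self.c1
        return mean, self.c3 / self.c1 - mean * mean

s = FlowStats()
for d in (4.0, 5.0):
    s.update(d)
print(s.export())  # (4.5, 0.25)
```

Because the counters are pure running sums, a flow selection strategy can drop or admit flows at update time without any change to this logic.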

Experimental environment

- Data sets
  - No public data center traces with timestamps
  - Real router traces with synthetic workloads: WISC
  - Real backbone traces with synthetic queueing: CHIC and SANJ
- Simulation tool: open-source NetFlow software (YAF)
  - Supports the reference packet injection mechanism
  - Simulates a queueing model with a RED active queue management policy
- Experiments with different link utilizations

Accuracy of RLI under high link utilization

[Figure: CDF of relative error; the median relative error is 10–12%]

Comparison with other solutions

[Figure: average relative error vs. utilization at a packet sampling rate of 0.1%; RLI's error is 1–2 orders of magnitude lower than the alternatives']

Overhead of RLI

- Bandwidth overhead is low: less than 0.2% of link capacity
- Impact on packet loss is small: the loss difference with and without RLI is at most 0.001% at around 80% utilization

Summary

- A scalable architecture for high-fidelity per-flow latency measurements between router interfaces
- Achieves a median relative error of 10–12%
- Shows 1–2 orders of magnitude lower relative error than existing solutions
- Measurements are obtained directly at the egress side
- Future work: per-packet diagnosis


Thank you! Questions?


Backup

Comparison with other solutions

[Figure: CDF of relative error for RLI vs. other solutions]

Bandwidth overhead

[Figure: bandwidth consumption vs. utilization]

Interference with regular traffic

[Figure: cumulative fraction vs. per-flow delay interference (seconds)]

27

Impact to packet lossesLo

ss r

ate

diff

ere

nce

Utilization

Recommended