73
1 Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015

Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

1

Tales of the TailHardware, OS, and Application-level

Sources of Tail Latency

Jialin Li, Naveen Kr. Sharma,Dan R. K. Ports and Steven D. Gribble

February 2, 2015

Page 2: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

2

Introduction What is Tail Latency?

What is Tail Latency?

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 3: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 4: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 5: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 6: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

2

Introduction What is Tail Latency?

What is Tail Latency?

Request Processing Time

Fraction of Requests

In Facebook’s Memcached deployment,

Median latency is 100µs, but 95th percentile latency ≥ 1ms.

In this talk, we will explore

Why some requests take longer than expected?

What causes them to get delayed?

Page 7: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

3

Introduction What is Tail Latency?

Why is the Tail important?

Low latency is crucial for interactive services.

500ms delay can cause 20% drop in user traffic. [Google Study]

Latency is directly tied to traffic, hence revenue.

What makes it challenging is today’s datacenter workloads.

Interactive services are highly parallel.

Single client request spawns thousands of sub-tasks.

Overall latency depends on slowest sub-task latency.

Bad Tail ⇒ Probability of any one sub-task getting delayed is high.

Page 8: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

3

Introduction What is Tail Latency?

Why is the Tail important?

Low latency is crucial for interactive services.

500ms delay can cause 20% drop in user traffic. [Google Study]

Latency is directly tied to traffic, hence revenue.

What makes it challenging is today’s datacenter workloads.

Interactive services are highly parallel.

Single client request spawns thousands of sub-tasks.

Overall latency depends on slowest sub-task latency.

Bad Tail ⇒ Probability of any one sub-task getting delayed is high.

Page 9: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

4

Introduction What is Tail Latency?

A real-life example

Nishtala et. al. Scaling memcache at Facebook, NSDI 2013.

All requests have to finish within the SLA latency.

Page 10: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

4

Introduction What is Tail Latency?

A real-life example

Nishtala et. al. Scaling memcache at Facebook, NSDI 2013.

All requests have to finish within the SLA latency.

Page 11: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

5

Introduction What is Tail Latency?

What can we do?

People in industry have worked hard on solutions.

Hedged Requests [Jeff Dean et. al.]

Effective sometimes, but adds application specific complexity.

Intelligently avoid slow machines

Keep track of server status; route requests around slow nodes.

Attempts to build predictable response out of less predictable parts.

We still don’t know what is causing requests to get delayed.

Page 12: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

5

Introduction What is Tail Latency?

What can we do?

People in industry have worked hard on solutions.

Hedged Requests [Jeff Dean et. al.]

Effective sometimes, but adds application specific complexity.

Intelligently avoid slow machines

Keep track of server status; route requests around slow nodes.

Attempts to build predictable response out of less predictable parts.

We still don’t know what is causing requests to get delayed.

Page 13: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

6

Introduction What is Tail Latency?

Our Approach

1 Pick some real life applications: RPC Server, Memcached, Nginx.

2 Generate the ideal latency distribution.

3 Measure the actual distribution on a standard Linux server.

4 Identify a factor causing deviation from ideal distribution.

5 Explain and mitigate it.

6 Iterate over this till we reach the ideal distribution.

Page 14: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

7

Introduction What is Tail Latency?

Rest of the Talk

1 Introduction

2 Predicted Latency from Queuing Models

3 Measurements: Sources of Tail Latencies

4 Summary

Page 15: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 16: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 17: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 18: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

Server

Clients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 19: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 20: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 21: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 22: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

8

Predicted Latency from Queuing Models Ideal latency distribution

What is the ideal latency for a network server?

Ideal baseline for comparing measured performance.

Assume a simple model, and apply queuing theory.

ServerClients

Given the arrival distribution and request processing time,

We can predict the time spent by a request in the server.

Page 23: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 24: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 25: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

99th percentile ⇒ 60 µs

Page 26: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

99.9th percentile ⇒ 200 µs

Page 27: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 28: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 29: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Dummy

Page 30: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Inherent tail latency due to request burstiness.

Page 31: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Tail latency depends on the average server utilization.

Page 32: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

9

Predicted Latency from Queuing Models Tail latency characteristics

What is the ideal latency distribution?

Assume a server with single worker with 50 µs fixed processing time.

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Distribution 1

Distribution 2

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Uniform Request Arrival

Poisson at 70% Utilization

Poisson at 90% Utilization

Poisson at 70% - 4 workers

Additional workers can reduce tail latency, even at constant utilization.

Page 33: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

10

Measurements: Sources of Tail Latencies

1 Introduction

2 Predicted Latency from Queuing Models

3 Measurements: Sources of Tail Latencies

4 Summary

Page 34: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

11

Measurements: Sources of Tail Latencies

Testbed

Cluster of standard datacenter machines.

2 x Intel L5640 6 core CPU

24 GB of DRAM

Mellanox 10Gbps NIC

Ubuntu 12.04, Linux Kernel 3.2.0

All servers connected to a single 10 Gbps ToR switch.

One server runs Memcached, others run workload generating clients.

Other application results are in the paper.

Page 35: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

12

Measurements: Sources of Tail Latencies

Timestamping Methodology

Append a blank buffer ≈ 32 bytes to each request.

Overwrite buffer with timestamps as it goes through the server.

Incoming

Server NIC

Memcached

read() return

Outgoing

Server NIC

Memcached

write()

After TCP/UDP

processing

Memcached thread

scheduled on CPU

Very low overhead and no server side logging.

Page 36: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

13

Measurements: Sources of Tail Latencies

How far are we from the ideal?

Page 37: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 38: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 39: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

14

Measurements: Sources of Tail Latencies

How far are we from the ideal?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

Single CPU, single core, Memcached running at 80% utilization.

Page 40: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

15

Measurements: Sources of Tail Latencies

Rest of the talk

Source of Tail Latency Potential way to fix

Background Processes

Multicore Concurrency

Interrupt Processing

Page 41: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

15

Measurements: Sources of Tail Latencies

Rest of the talk

Source of Tail Latency Potential way to fix

Background Processes

Multicore Concurrency

Interrupt Processing

Page 42: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

16

Measurements: Sources of Tail Latencies

How can background processes affect tail latency?

Memcached threads time-share a CPU core with other processes.

We need to wait for other processes to relinquish CPU.

Scheduling time-slices are usually couple of milliseconds.

How can we mitigate it?

Raise priority (decrease niceness) ⇒ More CPU time.

Upgrade scheduling class to real-time ⇒ Pre-emptive power.

Run on a dedicated core ⇒ No interference what-so-ever.

Page 43: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

16

Measurements: Sources of Tail Latencies

How can background processes affect tail latency?

Memcached threads time-share a CPU core with other processes.

We need to wait for other processes to relinquish CPU.

Scheduling time-slices are usually couple of milliseconds.

How can we mitigate it?

Raise priority (decrease niceness) ⇒ More CPU time.

Upgrade scheduling class to real-time ⇒ Pre-emptive power.

Run on a dedicated core ⇒ No interference what-so-ever.

Page 44: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 45: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 46: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 47: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated Core

Interference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 48: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

17

Measurements: Sources of Tail Latencies

Impact of Background Processes

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

30x

Ideal Model

Standard Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

10x

Ideal Model

Standard Linux

Maximum Priority

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

Ideal Model

Standard Linux

Maximum Priority

Realtime Scheduling

Dedicated CoreInterference from background processes hasa large effect on the tail.

Single CPU, single core, Memcached running at 80% utilization.

Page 49: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

18

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Multicore Concurrency

Interrupt Processing

Page 50: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

18

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Multicore Concurrency

Interrupt Processing

Page 51: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 52: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 53: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 54: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 55: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

19

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

3x

1 core Ideal Model

4 core Ideal Model

1 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

1 core Ideal Model

4 core Ideal Model

1 core Linux

4 core Linux

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 56: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 57: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 58: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 59: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

20

Measurements: Sources of Tail Latencies

Does adding more CPU cores improve tail latency?

Yes it does! Provided we maintain a single queue abstraction.

Memcached partitions requests statically among threads.

ServerClients

Ideal Model

ServerClients

Memcached Architecture

How can we mitigate it?

Modify Memcached concurrency model to use a single queue.

Page 60: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 61: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 62: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 63: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

21

Measurements: Sources of Tail Latencies

Impact of Multicore Concurrency Model

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

15x

4 core Ideal Model

1 core Linux

4 core Linux

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

1 core Linux

4 core Linux

4 core Linux - Single Queue

For multi-threaded applications, a singlequeue abstraction can reduce tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 64: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

22

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Concurrency Model Ensure a single queue abstraction.

Interrupt Processing

Page 65: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

22

Measurements: Sources of Tail Latencies

Source of Tail Latency Potential way to fix

Background Processes Isolate by running on a dedicated core.

Concurrency Model Ensure a single queue abstraction.

Interrupt Processing

Page 66: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

23

Measurements: Sources of Tail Latencies

How can interrupts affect tail latency?

By default, Linux irqbalance spreads interrupts across all cores.

OS pre-empts Memcached threads frequently.

Introduces extra context switching overheads and cache pollution.

How can we mitigate it?

Separate cores for interrupt processing and application threads.

3 cores run Memcached threads, and 1 core processes interrupts.

Page 67: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

23

Measurements: Sources of Tail Latencies

How can interrupts affect tail latency?

By default, Linux irqbalance spreads interrupts across all cores.

OS pre-empts Memcached threads frequently.

Introduces extra context switching overheads and cache pollution.

How can we mitigate it?

Separate cores for interrupt processing and application threads.

3 cores run Memcached threads, and 1 core processes interrupts.

Page 68: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 69: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 70: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

24

Measurements: Sources of Tail Latencies

Impact of Interrupt Processing

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4x

4 core Ideal Model

4 core Linux - Interrupt spread

10-4

10-3

10-2

10-1

100

101

102

103

104

CC

DF

P

[X >

= x

]

Latency in micro-seconds

4 core Ideal Model

4 core Linux - Interrupt spread

4 core Linux - Separate Interrupt core

Separate cores for interrupt and applicationprocessing improves tail latency.

Single CPU, 4 cores, Memcached running at 80% utilization.

Page 71: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

25

Measurements: Sources of Tail Latencies

Other sources of tail latency

Source of Tail Latency Underlying Cause

Thread Scheduling Policy Non-FIFO ordering of requests.

NUMA Effects Increased latency across NUMA nodes.

Hyper-threadingContending hyper-threads can increaselatency.

Power Saving FeaturesExtra time required to wake CPU fromidle state.

Page 72: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

26

Summary

Summary and Future Works

We explored hardware, OS and application-level sources of tail latency.

Pin-point sources using finegrained timestaming, and an ideal model.

We obtain substantial improvements, close to ideal distributions.

99.9th percentile latency of Memcached from 5 ms to 32 µs.

Sources of tail latency in multi-process environment.

How does virtualization effect tail latency?

Overhead of virtualization, interference from other VMs.

New effects when moving to a distributed setting, network effects.

Page 73: Tales of the Tail Hardware, OS, and Application-level ...Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports

26

Summary

Summary and Future Works

We explored hardware, OS and application-level sources of tail latency.

Pin-point sources using finegrained timestaming, and an ideal model.

We obtain substantial improvements, close to ideal distributions.

99.9th percentile latency of Memcached from 5 ms to 32 µs.

Sources of tail latency in multi-process environment.

How does virtualization effect tail latency?

Overhead of virtualization, interference from other VMs.

New effects when moving to a distributed setting, network effects.