Transcript
Page 1: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Matias Bjørling1,2, Jens Axboe2, David Nellans2, Philippe Bonnet1

2013/03/05

1 IT University of Copenhagen, 2 Fusion-io


Page 2: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Non-Volatile Memory Devices

• Higher IOPS and lower latency than magnetic disk
• Hardware continues to improve
  – Parallel architecture
  – Future NVM technologies
• Software design decisions assume
  – Fast sequential access
  – Slow random access


[Figure: NVM device generations, 2010-2013 and beyond.]

Page 3: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

The Block Layer in the Linux Kernel

• Provides
  – Single point of access for block devices
  – IO scheduling
  – Merging & reordering
  – Accounting

• Optimized for traditional hard drives
  – Trades CPU cycles for sequential disk accesses
  – Synchronization is a performance bottleneck


Page 4: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

In-Kernel IO Submission / Completion

[Figure: a process (application/FS) submits a bio to the block IO layer; the request queue performs scheduling, accounting, reordering, and merging before the low-level device driver issues the IO to the NVM device.]
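The path in the figure begins with a bio handed to the block layer. As a hedged illustration only, written against a recent kernel API whose signatures differ from the 2013 code paths (bdev and page are assumed to be an already opened block device and an allocated page), in-kernel bio submission looks roughly like this:

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

/* Sketch: build one 512B read bio and push it through the block layer. */
static int read_first_sector(struct block_device *bdev, struct page *page)
{
	struct bio *bio;
	int ret;

	/* Allocate a bio with room for one segment, marked as a read. */
	bio = bio_alloc(bdev, 1, REQ_OP_READ, GFP_KERNEL);
	bio->bi_iter.bi_sector = 0;        /* start of the device */
	bio_add_page(bio, page, 512, 0);   /* 512B payload */

	/* Hand the bio to the block layer and wait for completion. */
	ret = submit_bio_wait(bio);
	bio_put(bio);
	return ret;
}
```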


Page 5: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Block Layer IO for Disks

• Single request queue
  – Single lock data structure
  – Cache coherence is expensive on multi-CPU systems
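For context, a hedged sketch of the legacy single-queue driver interface (since removed from the kernel): the block layer calls the driver's request function with the per-queue spinlock held, so every core submitting or completing IO serializes on that one lock and bounces its cache line between sockets.

```c
#include <linux/blkdev.h>

/* Sketch of a legacy, single-queue request function (old API, now removed). */
static void legacy_request_fn(struct request_queue *q)
{
	struct request *rq;

	/* q->queue_lock is held by the caller for the whole loop. */
	while ((rq = blk_fetch_request(q)) != NULL) {
		/* ... issue rq to the device ... */
		__blk_end_request_all(rq, 0);	/* complete with success */
	}
}
```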


Page 6: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Methodology

• Null block device driver
  – In-kernel dummy device
  – Acts as a proxy for a zero-latency device
  – Avoids DMA overhead
  – Removes driver effects from the block-layer measurements
  – "Device" completions can be in-line or via soft IRQ


[Figure: the null driver sits directly beneath the block IO layer in place of a real device driver and NVM device; the process (application/FS) path above is unchanged.]
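A minimal sketch of what the "zero-latency device" boils down to, written against the modern blk-mq API rather than the prototype used in the talk: the handler completes each request immediately, so no device or DMA latency is added and only block-layer overhead remains visible.

```c
#include <linux/blk-mq.h>

/* Sketch: a null completion handler that finishes requests instantly. */
static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
				  const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;

	blk_mq_start_request(rq);
	blk_mq_end_request(rq, BLK_STS_OK);	/* the "device" completes at once */
	return BLK_STS_OK;
}
```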

Page 7: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Current Block Layer (IOPS)

Null block device, noop IO scheduler, 512B random reads

[Figure: IOPS of the current block layer on the Sandy Bridge-E, Westmere-EP, Nehalem-EX, and Westmere-EX systems.]

Page 8: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Multi-Queue Block Layer

• Balance IO workload across cores
• Reduce cache-line sharing
• Provide functionality similar to the single-queue (SQ) layer
• Allow multiple hardware queues

[Figure: per-socket IO path (process, libaio, OS block layer, device driver) replicated across processes and sockets.]

Page 9: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Multi-Queue Block Layer

• Two levels of queues
  – Software queues
  – Hardware dispatch queues
• Per-CPU accounting
• Tag-based completions


[Figure: processes in userspace issue async/sync IO into the block layer; per-CPU software queues perform staging (insertion/merge), fairness/IO scheduling, and accounting; hardware dispatch queues handle tagging and submission/completion; the block IO device driver submits IO to single- or multi-queue capable hardware and receives status/completion interrupts.]
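As a hedged sketch of the driver side, written against the modern blk-mq API (struct nullb and its fields are hypothetical names for illustration): the driver only declares how many hardware dispatch queues and tags the device supports, while the per-CPU software queues are created by the block layer itself.

```c
#include <linux/blk-mq.h>

static const struct blk_mq_ops nullb_mq_ops = {
	.queue_rq = null_queue_rq,	/* e.g. the null handler sketched earlier */
};

/* Sketch: advertise hardware dispatch queues and tag depth to blk-mq. */
static int nullb_register_queues(struct nullb *dev)
{
	struct blk_mq_tag_set *set = &dev->tag_set;
	int ret;

	set->ops          = &nullb_mq_ops;
	set->nr_hw_queues = 4;		/* hardware dispatch queues (example value) */
	set->queue_depth  = 64;		/* tags (outstanding commands) per queue */
	set->numa_node    = NUMA_NO_NODE;
	set->cmd_size     = 0;		/* no per-request driver payload */

	ret = blk_mq_alloc_tag_set(set);
	if (ret)
		return ret;

	dev->q = blk_mq_init_queue(set);	/* request queue backed by the tag set */
	if (IS_ERR(dev->q)) {
		blk_mq_free_tag_set(set);
		return PTR_ERR(dev->q);
	}
	return 0;
}
```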

Page 10: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Mapping Strategies

• Per-core software queues, varying number of hardware dispatch queues
• Single software queue, varying number of hardware dispatch queues

Scalability = multiple hardware dispatch queues

[Figure: MQ scalability under both mapping strategies.]
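For illustration only (this is not the block layer's actual mapping code), the simplest per-core strategy spreads software queues over the available hardware dispatch queues round-robin:

```c
/* Sketch: assign the software queue of a given CPU to a hardware queue. */
static unsigned int swq_to_hwq(unsigned int cpu, unsigned int nr_hw_queues)
{
	return cpu % nr_hw_queues;
}
```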

Page 11: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Multi-Queue IOPS


[Figure: IOPS comparison. SQ – current block layer, Raw – block layer bypass, MQ – proposed design.]

Page 12: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Multi-socket Scaling


Page 13: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

AIO interface

• Alternative to the synchronous IO interface
• Prevents blocking of user threads
• Moves the bio list to a lockless list
• Removes legacy code and locking
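A minimal userspace libaio example of the asynchronous interface (the device path, sizes, and queue depth are arbitrary choices for illustration): the read is submitted without blocking the thread, and the completion is reaped separately.

```c
#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd;

	fd = open("/dev/nullb0", O_RDONLY | O_DIRECT);	/* example device */
	posix_memalign(&buf, 512, 512);			/* O_DIRECT needs alignment */

	io_queue_init(128, &ctx);			/* up to 128 in-flight IOs */
	io_prep_pread(&cb, fd, buf, 512, 0);		/* 512B read at offset 0 */
	io_submit(ctx, 1, cbs);				/* returns without blocking */

	io_getevents(ctx, 1, 1, &ev, NULL);		/* reap the completion */
	io_queue_release(ctx);
	free(buf);
	return 0;
}
```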



Page 14: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Multi-Queue, 8 socket


Page 15: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Latency Comparison

• Sync IO, 512B
• Queue depth grows with CPU count

[Figure: per-IO latency (µs) for SQ, Raw, and MQ. Left panel: 1 socket, 1 and 6 cores. Right panel: 2 sockets, 1, 7, and 12 cores. At full core count, MQ lowers latency roughly 2.3x (1 socket) and 5.4x (2 sockets) relative to SQ.]

Page 16: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Latency Comparison

[Figure: per-IO latency (µs) for SQ, Raw, and MQ. Left panel: 4 sockets, 1, 9, and 32 cores. Right panel: 8 sockets, 1, 11, and 80 cores. At full core count, MQ lowers latency roughly 2.9x (4 sockets) and 7.8x (8 sockets) relative to SQ.]

Page 17: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Conclusions

• Current block layer is inadequate for NVM
• Proposed multi-queue block layer
  – 3.5x-10x increase in IOPS
  – 10x-38x reduction in latency
  – Supports multiple hardware queues
  – Simplifies driver development
  – Maintains industry-standard interfaces


Page 18: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Thank You & Questions

Implementation available at git://git.kernel.dk/?p=linux-block.git


Page 19: Linux Block IO: Introducing Multiqueue SSD Access on Multicore Systems

Systems

• 1 socket – Sandy Bridge-E, i7-3930K, 3.2 GHz, 12 MB L3
• 2 sockets – Westmere-EP, X5690, 3.46 GHz, 12 MB L3
• 4 sockets – Nehalem-EX, X7560, 2.66 GHz, 24 MB L3
• 8 sockets – Westmere-EX, E7-2870, 2.4 GHz, 30 MB L3
