

Fusion-io Confidential—Copyright © 2013 Fusion-io, Inc. All rights reserved.

Improve Linux SWAP For High Speed Flash Storage

Shaohua Li <shli@fusionio.com>

Background

▸ Partially replace DRAM with Flash storage (SSD)
▸ Compared to DRAM, Flash has:

•  Low price
•  High density
•  Low power

▸ Flash's higher latency is tolerable for this use

How does swap work?

SWAPOUT: Page in active LRU list → Page in inactive LRU list → Add page to swap cache → Unmap page and set PTE → Pageout page → Remove page from swap cache → Free page

SWAPIN: Page fault → Allocate new page → Add page to swap cache → Pagein page → Set PTE
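
The SWAPOUT half of the flow, condensed into kernel-style C. The helper names (add_to_swap, try_to_unmap, pageout) mirror real functions on the kernel's shrink_page_list() path, but the signatures and control flow here are a simplified sketch, not the actual implementation:

    /* Sketch of the SWAPOUT steps above (simplified; not the real kernel code). */
    static int swap_out_one_page(struct page *page)
    {
        /* Add page to swap cache: allocate a swap slot and insert the
         * page into the swap address space. */
        if (!add_to_swap(page))
            return -ENOMEM;

        /* Unmap page and set PTE: replace every PTE mapping this page
         * with a swap entry; the TLB flushes discussed below happen here. */
        try_to_unmap(page, TTU_UNMAP);

        /* Pageout page: submit the write to the swap device; on IO
         * completion the page leaves the swap cache and is freed. */
        pageout(page, page_mapping(page));
        return 0;
    }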

SSD specific characteristics

▸ Fast
▸ No seek penalty
▸ Large requests achieve better throughput
▸ High iodepth achieves better throughput
▸ Discard support

SWAPOUT – TLB flush

▸ At least 2 TLB flushes per page:
•  Clear the PTE 'A' (Accessed) bit
•  Clear the PTE 'P' (Present) bit

▸ The overhead is quite high:
•  TLB flush is per-page
•  A TLB flush involves several tasks (IPI shootdown across CPUs)

▸ Solutions:
•  Improve smp_call_function_many()
•  Can the first TLB flush be removed on x86?
•  Batch TLB flushes? (see the sketch below)
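
A minimal sketch of the batching idea: clear the PTEs for a whole batch of pages first, then issue one flush, instead of one IPI-driven flush per page. clear_pte_no_flush() is a hypothetical helper; flush_tlb_mm() is the real kernel interface for flushing one address space:

    /* Batched TLB flush (illustrative sketch, not the kernel implementation). */
    static void unmap_batch(struct mm_struct *mm, struct page **pages, int n)
    {
        int i;

        /* Clear PTEs for the whole batch without flushing each time. */
        for (i = 0; i < n; i++)
            clear_pte_no_flush(mm, pages[i]);   /* hypothetical helper */

        /* One shootdown for the batch instead of n per-page flushes. */
        flush_tlb_mm(mm);
    }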

SWAPOUT – swap_map scan

▸ swap_map entry - in-memory data structure to track swap disk usage

▸ Slow linear scan of this in-memory map to find a free cluster (128 adjacent pages), needed to produce large IO requests

▸ Solution: cluster list (see the sketch below)
•  Pros - O(1) algorithm
•  Cons - restrictive cluster alignment
•  Only enabled for SSD
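
A sketch of the cluster-list idea, assuming the 128-page cluster from the slide. The structure and field names are illustrative (the real kernel type is swap_info_struct); only the list operations are the stock <linux/list.h> API:

    #define CLUSTER_SIZE 128    /* pages per cluster, per the slide */

    struct swap_cluster {
        struct list_head list;          /* linked on the free-cluster list */
        unsigned int first_slot;        /* first swap slot of the cluster */
    };

    /* O(1): pop a free cluster instead of linearly scanning swap_map. */
    static struct swap_cluster *alloc_cluster(struct list_head *free_clusters)
    {
        struct swap_cluster *c;

        if (list_empty(free_clusters))
            return NULL;                /* fall back to the legacy scan */
        c = list_first_entry(free_clusters, struct swap_cluster, list);
        list_del(&c->list);
        return c;
    }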

SWAPOUT – IO pattern

▸ Interleaved IO pattern
•  Multiple reclaimers run concurrently
•  A newly found cluster is shared by all reclaimers

▸ Block layer can’t merge the interleaved IO completely

▸ Solution: per-cpu clusters (see the sketch below)
•  Each reclaimer does sequential IO
•  IO merging in the block layer becomes easy
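
A sketch of the per-cpu cluster idea, building on the cluster structure above. Each CPU allocates slots only from its own cluster, so a single reclaimer's slot numbers stay sequential; cluster_is_full() and cluster_next_slot() are hypothetical helpers, while DEFINE_PER_CPU and this_cpu_read/write are the real per-CPU API:

    static DEFINE_PER_CPU(struct swap_cluster *, percpu_cluster);

    static unsigned int alloc_swap_slot(struct list_head *free_clusters)
    {
        struct swap_cluster *c = this_cpu_read(percpu_cluster);

        /* Take a fresh cluster when ours is used up; no other CPU
         * allocates from it, so our IO stays sequential and mergeable. */
        if (!c || cluster_is_full(c)) {         /* hypothetical helper */
            c = alloc_cluster(free_clusters);
            this_cpu_write(percpu_cluster, c);
        }
        return cluster_next_slot(c);            /* hypothetical helper */
    }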

SWAPIN

▸ A page fault does synchronous IO - iodepth 1, page-sized requests

▸ Swap readahead is needed to produce:
•  Large IO requests
•  High-iodepth IO
•  IO running in parallel with CPU work

▸ Userspace readahead API
•  madvise(MADV_WILLNEED) is extended to do swap prefetch (see the example below)
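
A userspace usage example: hint that a swapped-out region will be touched soon, so the kernel can bring it back with large, deep-queued reads before the page faults arrive. madvise(2) with MADV_WILLNEED is the standard call; the extension described on the slide makes it also prefetch from swap:

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* Prefetch a (page-aligned) region that may have been swapped out. */
    static void prefetch_region(void *addr, size_t len)
    {
        if (madvise(addr, len, MADV_WILLNEED) != 0)
            perror("madvise(MADV_WILLNEED)");
        /* Returns quickly; the swapin IO proceeds asynchronously,
         * overlapping with the CPU work that follows. */
    }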

SWAPIN - cont

▸ In-kernel readahead
•  Arbitrary window (always 8 pages)
•  Random access workloads
▸ Readahead is unnecessary here today
▸ Wastes IO and increases memory pressure
▸ Fix: make readahead aware that the workload is random (sketched below)
•  Sequential access workloads
▸ Current readahead is not enough
▸ Hard to do - can't guarantee sequentially accessed pages are swapped out sequentially:
•  Sequentially accessed pages might not be adjacent in the LRU list
•  Adjacent pages in the LRU list might be swapped out by different reclaimers
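
One way to make readahead aware that a workload is random, sketched as a simple feedback loop (illustrative; not the kernel's actual heuristic): shrink the window when readahead pages go unused, grow it when they are all hit:

    static unsigned int ra_window = 8;      /* current window, in pages */

    /* Called after a readahead batch retires: hits = readahead pages
     * actually faulted in before being reclaimed again. */
    static void ra_feedback(unsigned int hits, unsigned int total)
    {
        if (2 * hits < total && ra_window > 1)
            ra_window /= 2;                 /* mostly wasted: back off */
        else if (hits == total && ra_window < 64)
            ra_window *= 2;                 /* fully used: read further ahead */
    }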

Lock contention

▸ Lock contention is high
•  Concurrent swapout - kswapd and direct page reclaim
•  Concurrent swapin - page faults from every task

▸ Solutions:
•  anon_vma mutex - now a rw_semaphore
•  swap_lock and the swap address space lock
▸ Per-swap-device locks now (a workaround; see the sketch below)
▸ Contention remains with very high speed SSDs
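
The per-swap lock workaround, sketched: the single global swap_lock is split so each swap device protects its own swap_map, and swapout to different devices no longer serializes. Field and helper names here are illustrative:

    struct swap_info {
        spinlock_t lock;                /* per-device, replaces global swap_lock */
        unsigned char *swap_map;        /* slot usage counts for this device */
        /* ... */
    };

    static unsigned int alloc_slot_locked(struct swap_info *si)
    {
        unsigned int slot;

        spin_lock(&si->lock);           /* contends only within one device */
        slot = scan_swap_map(si);       /* hypothetical per-device allocator */
        spin_unlock(&si->lock);
        return slot;
    }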

SWAP discard

▸ Discard is important to optimize SSD write throughput

▸ The current swap discard implementation is synchronous
•  The block layer discard API is synchronous (introduces delay)
•  A discard issued just before the write is useless

▸ Solution: async swap discard (see the sketch below)
•  No delay
•  Discard and write can run at the same time
•  Discard is cluster based
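
A sketch of async, cluster-based discard: freed clusters are queued and discarded from a workqueue, so the allocation and write paths never wait on a discard. blkdev_issue_discard() is the real (blocking) block layer API; the queueing helpers are hypothetical, and the sketch assumes the swap_info structure above also carries a work_struct and the device's bdev:

    static void swap_discard_work_fn(struct work_struct *work)
    {
        struct swap_info *si = container_of(work, struct swap_info,
                                            discard_work);
        struct swap_cluster *c;

        /* Drain the queue of freed clusters; writes to other clusters
         * proceed in parallel with these discards. */
        while ((c = pop_discard_cluster(si)) != NULL)   /* hypothetical */
            blkdev_issue_discard(si->bdev,
                                 cluster_start_sector(c),   /* hypothetical */
                                 CLUSTER_SIZE * (PAGE_SIZE >> 9),
                                 GFP_NOIO, 0);
    }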

Other issues

▸ Page reclaim policy - bias toward swap?
▸ Huge page swap

Benchmark – sequential workload

Benchmark – random workload

