…Or
How to have your Poison and (not) consume it too
NVDIMM software stack
[Diagram: NVDIMM software stack. In user space, applications use the standard file API, standard raw device access, or load/store. In kernel space, regular block I/O goes through a file system to the libnvdimm drivers, while DAX goes through a persistent memory aware file system and MMU mappings as cache line I/O. Both paths end at the NVDIMM.]
What is poison
Persistent Memory == Persistent Poison
What we (storage people) would like
What needs to be done in Linux
How we did it
What is Poison
• Bad cell in memory
– Transient or hard/uncorrectable error
• How platforms deal with it
– Machine Check Exception
● Recoverable on high-RAS platforms (page is sequestered, app gets SIGBUS)
● OS crash on other platforms
– If transient, rebooting typically makes the poison page go away
– If permanently degraded cell, replace DIMM
What is Persistent Poison
• Bad cell in persistent memory
– Will not go away on reboot
– Without any changes in Linux:
● Trying to write to it will trigger a machine check
● Deleting the file won't help either
– A bad cache line is now a bad filesystem block
– * Implies data has been lost *
● Drivers/firmware can easily 'fix' the bad location, but it is imperative to let userspace know of the data loss.
– The 'fix' has to be triggered by the user/app
[Diagram: DAX path. An application's load/store reaches the NVDIMM through MMU mappings as cache line I/O. On hitting poison, the machine check handler can unmap the page, notify the application, or crash.]
• Behavior we want to prevent
– Application calls read()
– Hits poison
– Crashes/gets SIGBUS
– (reboots)
– App starts up
– Tries to access its data (read())
– Crash
– …
– “Reebootus-infinitus”
What we would like
• Instead of a SIGBUS/crash, return -EIO
• A way to expose known poison to software
• The ability for software to clear poison
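From the application's side, the first wish above means a failed read() surfaces as an ordinary I/O error instead of a signal. A minimal sketch of what handling that could look like, assuming the kernel returns -EIO for a read that touches known poison; the helper name read_block is hypothetical and not part of any library:

```python
import errno
import os


def read_block(fd, length, offset):
    """Read `length` bytes at `offset`.

    Returns (data, None) on success, or (None, "EIO") when the kernel
    reports a media error, so the caller can decide how to recover
    instead of dying on a signal. Any other error is re-raised.
    """
    try:
        return os.pread(fd, length, offset), None
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None, "EIO"
        raise
```

An application that gets "EIO" back knows the data at that offset is lost and can restore it from a replica or rewrite it, which breaks the crash-and-reboot loop described earlier.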
What needs to be done: Exposing poison
[Diagram: firmware runs ARS on the NVDIMM. Libnvdimm collects the results and exposes them to the file system and to the application.]
• Start an Address Range Scrub (ARS)
– Or harvest results from a previous, automatically started scrub
• Libnvdimm gets a list of poison
• Make it available for libnvdimm drivers and file systems to check on I/Os
• Expose it to userspace
What needs to be done: Clearing poison
[Diagram: the application provides new data, which passes through the persistent memory aware file system to libnvdimm. Libnvdimm asks the firmware (which handles the _DSM) to clear the poison on the NVDIMM.]
• App provides new data
• Filesystem detects a write to a poisoned range and sends it through the driver
• Driver calls the clear_poison DSM, and then writes the data
How we did it: Exposing poison
[Diagram: firmware runs ARS on the NVDIMM. Libnvdimm exposes the results to the file system and to the application.]
• Harvest results from an Address Range Scrub
– Get SPA relative list of poison
• Convert SPA poison ranges to bad disk sectors
• Make md-raid's badblocks code generic
• Add bad blocks to gendisk
• Also expose them in sysfs
gendisk->badblocks for /dev/pmem0 and /dev/pmem1, for example:
$ cat /sys/block/pmem1/badblocks
1024 1
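Each line of the sysfs file is a starting sector followed by a count of bad sectors, in 512-byte units. A small sketch of how a tool might parse that output and turn it into byte ranges; the function names parse_badblocks and to_byte_ranges are made up for illustration:

```python
SECTOR_SIZE = 512  # badblocks entries are expressed in 512-byte sectors


def parse_badblocks(text):
    """Parse sysfs badblocks text into a list of (start_sector, count)."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def to_byte_ranges(ranges, sector_size=SECTOR_SIZE):
    """Convert (start_sector, count) pairs to (byte_offset, byte_length)."""
    return [(start * sector_size, count * sector_size)
            for start, count in ranges]
```

For the output shown above, "1024 1" means one bad sector starting at sector 1024, that is 512 bytes at byte offset 524288 of the device.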
How we did it: Handling Driver I/O
• In the pmem driver, check if a BIO is for a bad sector
• If reading:
– Fail with -EIO
• If writing:
– Send clear_poison DSM
– Clear the sector from gendisk->badblocks
– Write the new data
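The read and write rules above can be modelled in a few lines. This is a user-space toy, not the pmem driver: the class ToyPmemDisk and its attributes are invented for illustration, and the clear_poison DSM is represented only by an entry appended to a log.

```python
import errno

SECTOR_SIZE = 512


class ToyPmemDisk:
    """Toy model of a block device that tracks known-bad sectors."""

    def __init__(self, num_sectors, bad_sectors=()):
        self.data = bytearray(num_sectors * SECTOR_SIZE)
        self.badblocks = set(bad_sectors)
        self.cleared = []  # sectors for which a clear was issued

    def read(self, sector, count=1):
        # Reads that touch a known bad sector fail with EIO.
        if any(s in self.badblocks for s in range(sector, sector + count)):
            raise OSError(errno.EIO, "read touches a known bad sector")
        start = sector * SECTOR_SIZE
        return bytes(self.data[start:start + count * SECTOR_SIZE])

    def write(self, sector, payload):
        if len(payload) % SECTOR_SIZE:
            raise ValueError("payload must be a whole number of sectors")
        count = len(payload) // SECTOR_SIZE
        for s in range(sector, sector + count):
            if s in self.badblocks:
                self.cleared.append(s)      # stand-in for the clear_poison DSM
                self.badblocks.discard(s)   # drop it from the bad block list
        start = sector * SECTOR_SIZE
        self.data[start:start + len(payload)] = payload  # then write new data
```

Reading sector 1024 on ToyPmemDisk(2048, bad_sectors=[1024]) raises OSError with EIO; after a full-sector write to 1024, the same read succeeds and returns the new data.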
[Diagram: regular block I/O path from applications (standard file API or standard raw device access) through the file system to the libnvdimm drivers and the NVDIMM. READ: check disk->badblocks, return -EIO. WRITE: check disk->badblocks, send the clear_poison DSM, clear the disk badblocks entry, then write the data.]
How we did it: Handling DAX I/O
• ->direct_access() checks for badblocks at fault time
• If found, DAX mapping fails
• If writing:
– Fall back to blockdev_do_direct_IO()
• All zeroing goes through the driver
• If we hit a latent error, the result is still a SIGBUS/crash, but it will be a 'known' error the next time
[Diagram: DAX path. dax_map_atomic fails with -EIO, so the load/store path through the MMU mappings to the poisoned range is blocked.]
How we did it: 'Blast Radius'
• The poison is on a cache line granularity
• Block layer rounds up to a sector (512B)
• Fs/DAX will round up to a page (4K)
• If an app hits a bad page:
– Look up bad sector from sysfs
– Do a write() to clear it
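The rounding at each layer is plain alignment arithmetic. A short sketch, using the 64-byte, 512-byte and 4K sizes from this slide; the function names containing_unit and blast_radius are hypothetical:

```python
CACHE_LINE = 64   # granularity of the poison itself
SECTOR = 512      # block layer granularity
PAGE = 4096       # file system / DAX granularity


def containing_unit(offset, unit):
    """Return (start, end) of the aligned unit that contains `offset`."""
    start = (offset // unit) * unit
    return start, start + unit


def blast_radius(poison_offset):
    """Show how far one poisoned byte offset spreads at each layer."""
    return {
        "cache_line": containing_unit(poison_offset, CACHE_LINE),
        "sector": containing_unit(poison_offset, SECTOR),
        "page": containing_unit(poison_offset, PAGE),
    }
```

For a poisoned byte at device offset 524360, the affected cache line is 524352 to 524416, the bad sector is 524288 to 524800 (sector 1024), and the page the application loses access to is 524288 to 528384.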
[Diagram: blast radius. A 64-byte poisoned cache line on the NVDIMM becomes a 512-byte bad block in libnvdimm and a 4k page for the pmem-aware file system and the MMU mappings. Latent errors arrive through the machine check handler.]
Q & A