PM in the Wild: VMware Experience and Future Expectations
Richard A. Brunner, Principal Engineer & CTO, Server Platform Technologies, VMware
Agenda and Introduction
• Experiences Thus Far
  • Introduction
  • Are We Ready? With What?
  • vSphere Support
  • Customer Reaction
• Future Expectations
  • Data Protection
  • Server FW & HW Support
  • Persistence Domain & Power Fail
  • Future Topologies
  • Future Technologies
© 2020 SNIA Persistent Memory Summit. All Rights Reserved.
“It's been a long road, getting from there to here. It's been a long time, but my time is finally near.”
What is PM (Persistent Memory)?
PM Inside a VM: Different Access Models
NVMe SSD (vSCSI to NVMe SSD; Legacy & New OS):
• fsync flushes the page cache to a virtual SCSI disk (vSCSI)
• The vSCSI write executes the guest OS (GOS) SCSI stack
• The hypervisor intercepts and writes through the physical SCSI stack

vPMemDisk (vSCSI to PMEM; Legacy & New OS):
• fsync flushes the page cache to a virtual SCSI disk (vSCSI)
• The vSCSI write executes the GOS SCSI stack
• The hypervisor intercepts and writes to physical PMEM

vNVDIMM (block access; PMEM mapped into a New OS):
• fsync flushes the page cache to PMEM

vNVDIMM-DAX (Direct Access):
• Requires a new GOS and new guest apps
• File reads/writes go directly to PMEM pages in the GOS
• PMEM pages are directly mapped into the guest app
• No need for fsync
It Finally Launched in April 2019!!!
Intel “Cascade Lake”
Are We Ready? - Basics
Are We Ready? - Advanced
But we need more Applications!
[Chart: certification timeline starting in 2017]
VMware vSphere Certified Servers with PM
VMware vSphere 6.7 Working with PM
Legacy OS & Application Usage
• Native: PMEM as a block storage device with a special driver.
• Virtualized: PMEM as a block device
  • Use a special guest driver; or
  • Use vPMemDisk with no new driver
  • Guest storage is mapped to PMEM outside the VM by the admin.
VMware vSphere 6.7 Working with PM
New OS & Application Usage
• Native & Virtualized
  • Can use a direct load/store model with little OS overhead
• All the benefits of vSphere can be made available now or in the future:
  • Multiple workloads using PMEM
  • Live VM migration across servers
  • Check-pointing
  • Boost for legacy VMs/workloads
  • And more …
https://kb.vmware.com/s/article/54444
https://kb.vmware.com/s/article/54445
https://kb.vmware.com/s/article/67645
vSphere Support For Persistent Memory
[Diagram: vCenter & DRS managing multiple hosts, each with NVDIMMs backing a local PMem datastore (PMem DS)]
vSphere Support For Persistent Memory (2)
[Diagram: vCenter & DRS; a host entering maintenance mode has its VMs migrated off its NVDIMM-backed PMem datastore]
Enter maintenance mode (vacate powered-off VMs as well)
Yes, We Are There!
Customer Reaction?
Future Expectations …
Data Protection – Near Term
[Flow: PMEM health events on a host running VMs and their near-term handling]

Health events (reported via ACPI 6.3 Get Current Health (NCH); WPL = Write Persistency Loss; ADL = All Data Loss; NIH can inject errors for testing!):
• VM crashes / memory poison
• Maintenance needed / performance degradation
• ADL @ power loss; ADL @ off-lining; WPL @ off-lining
• WPL @ power loss; WPL @ starting now

Responses:
• DRS/HA restart of the VM
• Schedule maintenance
• Auto shutdown of affected VMs
• Admin operations: maintenance mode, start migration, reboot, shutdown all VMs
Data Protection – Longer Term
[Flow: PMEM health events on a host running VMs, with longer-term automated handling; checkpointing & replication of PM run in the background]

Health events:
• VM crashes / memory poison
• Maintenance needed / performance degradation
• ADL @ power loss; ADL @ off-lining; WPL @ off-lining
• WPL @ power loss; WPL @ starting now
• ADL @ starting now (pray)

Responses:
• DRS/HA restart of the VM
• Auto migration now / auto migration soon
• VM auto suspend or shutdown
• Schedule maintenance
• Admin operations: maintenance mode, start migration, reboot, shutdown all VMs
Server HW & FW Support for PM
• 2017-2019 was a firestorm of different, non-conforming firmware interfaces!
  • UEFI/ACPI/FW specs struggled against HW schedules,
  • with last-minute changes to error detection & remediation.
  • Too many iterations with too many OEMs.
  • Non-standard provisioning and encryption.
• This shouldn't happen in 2020:
  • UEFI/ACPI/FW interfaces are more mature and stable.
  • Intel “Barlow Pass” does not break them ☺
  • OEMs have now broadly deployed solutions.
  • VMware has a better defense: a full “PMEM Certification” kit for OEMs.
• Correct and tested UEFI/ACPI interfaces are critical to OSes & hypervisors.
• Well-tested error and power-fail handling is critical.
• VMware will only certify a PM solution in combination with the platform that supports it.
[Diagram, left: NVDIMM-N = DRAM controller + DRAM + NAND]
x86 Persistence Domain & Power Fail
• RTFM: SW needs to flush to the persistence domain, accept no substitutes!
  • CLFLUSH* & CLWB + SFENCE.
• Key: trigger the memory controller to flush the Write Pending Queue (WPQ) to PM internal buffers or DRAM.
  • Auto-triggered on power fail by Asynchronous DRAM Refresh (ADR).
  • All servers with PM support ADR.
  • Can also be triggered by SW via the WPQ Flush command.
• Extended ADR: ADR + CPU cache flush.
  • BIOS flushes the cache on each CPU.
  • Requires large backup energy or a UPS.
  • Exotic for a traditional server.
• NVDIMM-N needs backup energy to flush DRAM contents to NAND.
[Diagram, right: Core → Cache (CLFLUSH*/CLWB) → IMC with Write Pending Queue (WPQ) inside the processor package; ADR or a WPQ Flush drains the WPQ to the SuperCap/Battery-backed NVDIMM-N or to the Intel Optane DCPMM (media controller + media)]
Hierarchical PM Topology & Pooling
• Current SW supports “local” PM memory affinity.
  • But more levels are coming.
  • Requires coherent access to be effective.
• A future coherent I/O interface allows expanding the PM store.
  • Could be CXL or Gen-Z.
  • Latency vs. remote capacity & remote failover.
  • Replication can be off-lined.
• Many challenges:
  • Switch HW is non-trivial.
  • Error propagation & recovery.
  • Sophisticated SW memory tiering & hot-page migration.
  • Fine-grain encryption.
[Diagram: Server 1 and Server 2, each with local PM, connected by RDMA; additional pooled PM sits behind a media controller reached over a coherent fabric (CXL or Gen-Z)]
Future Capacity & Latency
• PM is not on the same density curve as DRAM,
  • but PM device capacity could be 2x to 4x of current (2 TiB?) in the next 5 years.
• Could we see a 2-socket server with 32 TiB of PM in the next 5 years?
  • (2 TiB/channel) x (8 channels/socket) x (2 sockets) = 32 TiB of PM
• Future technologies:
  • Carbon Nanotube (CNT), MRAM, or NAND + DRAM (NVDIMM-P)
  • Next-generation Intel Optane, next-gen ReRAM
  • Some could have near-DRAM latencies
• Regardless, *1* standard OS/SW interface provided by HW, FW, SNIA, etc. …
  • But sadly, advanced error recovery & provisioning will likely remain device-specific for OS/SW.
Summary
Experiences Thus Far:
• We are ready enough, but we will learn much in 2020.
• We need more applications!
• Customer reaction is slow so far.

Future Expectations:
• Much collaborative work remains for:
  • Data Protection
  • Server FW & HW Support
  • Persistence Domain & Power Fail
• HW innovation (with help from SW) continues:
  • Future Topologies
  • Future Technologies