Ward: Efficiently Mitigating Transient Execution Attacks using the Unmapped Speculation Contract
Jonathan Behrens, Anton Cao, Cel Skeggs, Adam Belay, M. Frans Kaashoek, Nickolai Zeldovich

Page 1: Efficiently Mitigating Transient Execution Attacks using

Ward: Efficiently Mitigating Transient Execution Attacks using the Unmapped Speculation Contract

1

Jonathan Behrens, Anton Cao, Cel Skeggs, Adam Belay, M. Frans Kaashoek, Nickolai Zeldovich

Page 2: Efficiently Mitigating Transient Execution Attacks using

Transient execution attacks risk leaking information

2

Linux maintains security using software mitigations

Page 3: Efficiently Mitigating Transient Execution Attacks using

Software mitigations are expensive

3

LEBench [SOSP ‘19] with/without mitigations on Linux

Page 4: Efficiently Mitigating Transient Execution Attacks using

Goal: faster mitigations

Threat model

● Similar security to Linux

Main ideas

● Unmapped Speculation Contract

● Ward kernel design

4

Page 5: Efficiently Mitigating Transient Execution Attacks using

Transient execution attack example

5

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

[Diagram: Memory and Cache, with labels array, secret, shared, y, z]

Page 6: Efficiently Mitigating Transient Execution Attacks using

Typical mitigation approach

6

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    lfence(); // prevents speculation
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

[Diagram: Memory and Cache, with labels array, secret, shared]

Page 7: Efficiently Mitigating Transient Execution Attacks using

Ward has a different approach

7

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

Secret not mapped...

Page Fault

[Diagram: Memory and Cache, with labels array, secret, shared]

Page 8: Efficiently Mitigating Transient Execution Attacks using

Our observation: Unmapped Speculation Contract (USC)

If some memory has never been mapped in the current address space...

CPU state should be unaffected by values stored there
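
A toy model of what the contract promises, as a rough sketch rather than a description of any real processor. Everything here (`toy_machine`, `toy_speculative_syscall`, `toy_probe`) is hypothetical: the cache is modeled as one flag per line of `shared`, and the secret sits right after `array` as in the earlier example. If the secret is mapped, the transient load marks a cache line that the attacker can observe. If it is unmapped, the modeled CPU state is left untouched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIZE 16          /* length of the in-bounds array */
#define NUM_LINES 256    /* one modeled cache line per possible byte value */

/* Toy machine state; all names are hypothetical. */
struct toy_machine {
    unsigned char memory[SIZE + 1]; /* array[0..SIZE-1] followed by the secret byte */
    bool secret_mapped;             /* is the secret in the current address space? */
    bool cached[NUM_LINES];         /* which lines of `shared` are in the cache */
};

void toy_init(struct toy_machine *m, unsigned char secret, bool secret_mapped) {
    memset(m, 0, sizeof(*m));
    m->memory[SIZE] = secret;
    m->secret_mapped = secret_mapped;
}

/* Models the mispredicted branch: the bounds check is skipped transiently. */
void toy_speculative_syscall(struct toy_machine *m, size_t sysarg) {
    if (sysarg > SIZE)
        return;                              /* outside the toy memory */
    if (sysarg == SIZE && !m->secret_mapped)
        return;                              /* USC: unmapped data never affects CPU state */
    unsigned char y = m->memory[sysarg];     /* y = array[sysarg] */
    m->cached[y] = true;                     /* z = shared[y * CACHE_LINE] */
}

/* Attacker probe: index of the first cached line, or -1 if none is cached. */
int toy_probe(const struct toy_machine *m) {
    for (int i = 0; i < NUM_LINES; i++)
        if (m->cached[i])
            return i;
    return -1;
}
```

Calling `toy_speculative_syscall(&m, SIZE)` after `toy_init(&m, 42, true)` makes `toy_probe` return the secret, while the same call after `toy_init(&m, 42, false)` leaves the modeled cache empty.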

8

Page 9: Efficiently Mitigating Transient Execution Attacks using

USC is a good hardware-software contract

● Allows most speculation

● Processors seem to be able to provide it:

“AMD processors are designed to not speculate into memory that is not valid in the current virtual address memory range defined by the software defined page tables.”

— “Speculation behavior in AMD micro-architectures” white paper

9

Page 10: Efficiently Mitigating Transient Execution Attacks using

Design

10

Page 11: Efficiently Mitigating Transient Execution Attacks using

Split kernel to leverage USC

Ward extends Linux’s PTI:

● K-domain (“kernel domain”) has a page table with all physical memory

11

[Diagram: K-domain address space, with boundaries labeled 0x000000000000, 0x800000000000, and 0xfffffffffffff; regions: User, Kernel Text, Direct Map]

Page 12: Efficiently Mitigating Transient Execution Attacks using

The Ward kernel is split in half

Ward extends Linux’s PTI:

● K-domain (“kernel domain”) has a page table with all physical memory

● Q-domain (“quasi-visible domain”) has a page table with user mappings, and safe kernel mappings.

12

[Diagram: Q-domain address space, with boundaries labeled 0x000000000000, 0x800000000000, and 0xfffffffffffff; regions: User, Kernel Text, Direct Map]

Page 13: Efficiently Mitigating Transient Execution Attacks using

Syscalls start executing in the Q-domain

● Any syscall or trap handler that doesn’t access any secret data will run entirely in the Q-domain.

● When this happens, we are able to avoid many mitigations:

○ No need for page table swap

○ Don’t have to flush microarchitectural buffers

○ Retpolines are not required

13

[Diagram: Q-domain containing User and Kernel Text]

Page 14: Efficiently Mitigating Transient Execution Attacks using

...but sometimes we must enter the K-domain

14

[Diagram: Q-domain containing User and Kernel Text]

Page 15: Efficiently Mitigating Transient Execution Attacks using

...but sometimes we must enter the K-domain

15

[Diagram: a world switch from the Q-domain to the K-domain; each contains User and Kernel Text]

Page 16: Efficiently Mitigating Transient Execution Attacks using

World switches use two stacks

16

Steps in a world switch:

1. Switch to K-domain page table

2. Copy Q-stack contents to K-stack

3. Resume executing

[Diagram: Q-domain with Q-text and K-domain with K-text, showing the Q-stack and K-stack; labels: 1: switch page table, 2: memcpy, 3: resume executing]
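
The three steps can be sketched with a toy CPU model. This is only an illustration: `toy_cpu`, `toy_push`, and `toy_world_switch` are hypothetical names, the page-table switch is reduced to setting an enum, and the sketch assumes execution resumes at the same stack offset on the K-stack.

```c
#include <stddef.h>
#include <string.h>

#define STACK_BYTES 256

enum domain { Q_DOMAIN, K_DOMAIN };

/* Toy CPU state; all names are hypothetical. */
struct toy_cpu {
    enum domain page_table;              /* stands in for the page-table root */
    unsigned char q_stack[STACK_BYTES];
    unsigned char k_stack[STACK_BYTES];
    size_t sp;                           /* offset of the stack top; grows downward */
    unsigned char *stack;                /* stack currently in use */
};

void toy_cpu_init(struct toy_cpu *cpu) {
    memset(cpu, 0, sizeof(*cpu));
    cpu->page_table = Q_DOMAIN;
    cpu->sp = STACK_BYTES;
    cpu->stack = cpu->q_stack;
}

void toy_push(struct toy_cpu *cpu, unsigned char byte) {
    if (cpu->sp > 0)
        cpu->stack[--cpu->sp] = byte;
}

void toy_world_switch(struct toy_cpu *cpu) {
    if (cpu->page_table == K_DOMAIN)
        return;                                   /* already in the K-domain */
    cpu->page_table = K_DOMAIN;                   /* 1: switch page table */
    memcpy(cpu->k_stack + cpu->sp,                /* 2: copy live Q-stack contents */
           cpu->q_stack + cpu->sp,
           STACK_BYTES - cpu->sp);
    cpu->stack = cpu->k_stack;                    /* 3: resume on the K-stack, same offset */
}
```

Only the live portion of the Q-stack (from `sp` to the top) is copied, so the cost of the copy in this sketch grows with the depth of the call stack at the moment of the switch.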

Page 17: Efficiently Mitigating Transient Execution Attacks using

Q and K Kernel

17

● Both code segments are compiled the same

○ Matching instruction addresses and stack layouts

● At runtime, Q-text has mitigations patched out

○ lfence

○ verw

○ retpoline

[Diagram: Q-domain with Q-text and K-domain with K-text, showing the Q-stack and K-stack]
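
One way to patch a mitigation out without moving any instruction is to overwrite it with a no-op of the same length. The sketch below is an assumption about how this could look, not Ward's actual patching code: `toy_patch_out_lfence` and the table of patch sites are hypothetical, and only `lfence` is handled. The byte sequences are the standard x86 encodings of `lfence` (0F AE E8) and a 3-byte `nop` (0F 1F 00).

```c
#include <stddef.h>
#include <string.h>

static const unsigned char LFENCE[3] = {0x0F, 0xAE, 0xE8}; /* x86 lfence */
static const unsigned char NOP3[3]   = {0x0F, 0x1F, 0x00}; /* x86 3-byte nop */

/* Replace lfence with an equal-length nop at each recorded site.
 * Returns the number of sites patched. Sites that fall outside the
 * buffer or do not hold an lfence are left untouched. */
size_t toy_patch_out_lfence(unsigned char *text, size_t text_len,
                            const size_t *sites, size_t num_sites) {
    size_t patched = 0;
    for (size_t i = 0; i < num_sites; i++) {
        size_t off = sites[i];
        if (off > text_len || text_len - off < sizeof(LFENCE))
            continue;                              /* site does not fit in the buffer */
        if (memcmp(text + off, LFENCE, sizeof(LFENCE)) != 0)
            continue;                              /* not an lfence, leave as is */
        memcpy(text + off, NOP3, sizeof(NOP3));
        patched++;
    }
    return patched;
}
```

Because the replacement has the same length as the original instruction, every later instruction keeps its address, which is consistent with the requirement that Q-text and K-text have matching instruction addresses.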

Page 18: Efficiently Mitigating Transient Execution Attacks using

Redesigning the kernel to avoid switches

● Kernel data structures may mix secret and non-secret data

18

// before: secret and non-secret fields mixed
struct proc {
    proc* next;
    int pid;
    uint64_t saved_regs[16];
    ...
};

// after: split into non-secret and secret parts
struct proc_public {
    proc_public* next;
    int pid;
    ...
};
struct proc_private {
    proc_public* pproc;
    uint64_t saved_regs[16];
    ...
};
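
A small sketch of why the split helps, written in C with the elided fields left out. The lookup function `toy_find_pid` is hypothetical. It walks only `proc_public` nodes, so a lookup by pid never dereferences memory that holds saved registers, and such code would not need the secret half to be mapped.

```c
#include <stddef.h>
#include <stdint.h>

struct proc_public {             /* non-secret part */
    struct proc_public *next;
    int pid;
};

struct proc_private {            /* secret part, reachable only from trusted code */
    struct proc_public *pproc;
    uint64_t saved_regs[16];
};

/* Walks only public nodes; no secret field is read. Returns NULL if not found. */
struct proc_public *toy_find_pid(struct proc_public *head, int pid) {
    for (struct proc_public *p = head; p != NULL; p = p->next)
        if (p->pid == pid)
            return p;
    return NULL;
}
```

Note that the pointer runs from the private struct to the public one, as on the slide, so holding a `proc_public` does not reveal where the secret half lives.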

Page 19: Efficiently Mitigating Transient Execution Attacks using

Manipulating page tables while in the Q-domain

● The physical memory pages backing the page tables are themselves in the Q-domain

● Powerful capability which enables Q-domain to...

○ Allocate anonymous memory

○ Create temporary mappings

○ Move kernel pages into/out of the Q-domain

19

Page 20: Efficiently Mitigating Transient Execution Attacks using

Allocating memory without world switches

● Have a per-core list of zeroed memory pages mapped in the Q-domain

○ Refreshed in batches

● Used for a variety of purposes:

○ Page tables

○ Q-domain kernel data structures

○ Lazy allocation of user memory

20

[Diagram: allocator layers: Buddy Allocator (4KB - 32MB); Per-core free list (4KB); zeroed alloc (4KB); public alloc (4KB); Q-domain alloc (4KB)]
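
A minimal sketch of a per-core list that is refilled in batches, under simplifying assumptions: pages are plain integers, the global pool stands in for the buddy allocator, and `BATCH`, `toy_refill`, and `toy_alloc_zeroed` are hypothetical names. The point it illustrates is that the slow path runs once per batch, while every other allocation is a simple pop.

```c
#include <stddef.h>

#define BATCH 4        /* pages moved per refill (hypothetical value) */
#define POOL_PAGES 8   /* size of the toy global pool */

struct toy_global_pool { int next_page; };   /* stands in for the buddy allocator */

struct toy_core_list {
    int pages[BATCH];  /* page numbers ready for use on this core */
    size_t count;
    size_t refills;    /* how many times the slow path ran */
};

/* Slow path: move up to BATCH pages from the global pool to the core list.
 * Returns the number of pages added. */
static size_t toy_refill(struct toy_core_list *core, struct toy_global_pool *pool) {
    size_t added = 0;
    while (core->count < BATCH && pool->next_page < POOL_PAGES) {
        core->pages[core->count++] = pool->next_page++;
        added++;
    }
    if (added > 0)
        core->refills++;
    return added;
}

/* Fast path: pop a page from the per-core list; returns -1 when out of memory. */
int toy_alloc_zeroed(struct toy_core_list *core, struct toy_global_pool *pool) {
    if (core->count == 0 && toy_refill(core, pool) == 0)
        return -1;
    return core->pages[--core->count];
}
```

With these values, eight allocations trigger only two refills. In a real kernel the refill is where the expensive work would be amortized.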

Page 21: Efficiently Mitigating Transient Execution Attacks using

Implementation

● Based on sv6 research kernel

○ 34K lines of C++ code, plus libraries

● Supports all relevant mitigations from Linux

○ Focus on Skylake (2015-19) microarchitecture

● Binary compatible with a subset of Linux’s syscall API

○ Can run unmodified binaries!

21

Page 22: Efficiently Mitigating Transient Execution Attacks using

Results

22

Page 23: Efficiently Mitigating Transient Execution Attacks using

Does Ward reduce overhead?

Ward configurations:

● Linux-style: Standard mitigations like the ones in Linux

● USC-based: Fast mitigations

● Baseline: All mitigations disabled

Workloads:

● LEBench

● git

23

Page 24: Efficiently Mitigating Transient Execution Attacks using

Ward does better on LEBench

24

Page 25: Efficiently Mitigating Transient Execution Attacks using

Ward does better on LEBench

25

[Chart: LEBench results; lower is better]

Page 29: Efficiently Mitigating Transient Execution Attacks using

Git benchmark

● Ward also demonstrates application-level performance improvements

● Runtime for git status on a 100 MB repository:

Configuration    Overhead
Linux-style      24.6%
USC-based        11.2%
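
To put the two numbers side by side, here is a short worked calculation. It assumes both overheads are measured against the same baseline runtime (all mitigations disabled), which the slides suggest but do not state for this benchmark.

```latex
\text{reduction in overhead} = \frac{24.6 - 11.2}{24.6} \approx 0.545
\qquad
\frac{T_{\text{USC}}}{T_{\text{Linux-style}}} = \frac{1 + 0.112}{1 + 0.246} \approx 0.892
```

Under that assumption, the USC-based configuration removes a little over half of the mitigation overhead, and git status runs in roughly 89% of the Linux-style time.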

29

Page 30: Efficiently Mitigating Transient Execution Attacks using

Related Work: Spectrum of defenses

30

● Pure software defenses like Linux’s PTI, retpoline, etc.

● Hardware-software co-designs like ConTExT [CoRR] and SpecCFI [SP ‘20]

● Hardware defenses: Intel/AMD designs, SpecShield [PACT ‘19], NDA [MICRO ‘19], and Speculative Taint Tracking [MICRO ‘19]

Page 31: Efficiently Mitigating Transient Execution Attacks using

Open question: what is the best way to mitigate attacks?

● Intel Cascade Lake (2019) has hardware mitigations for many attacks

○ Eliminates need for software mitigations

○ Toggling mitigations is almost free, but...

● New processor up to 33% slower executing LEBench syscalls

○ Compared to 2016 CPU model with same clock speed and core count

○ When mitigations disabled for both

Can hardware mitigations leverage the USC to get better performance?

31

Page 32: Efficiently Mitigating Transient Execution Attacks using

Conclusion

● The Unmapped Speculation Contract defines a division of responsibility between hardware and software

● Using USC, Ward reduces the performance cost of mitigations in software

32

github.com/mit-pdos/ward

Contact: [email protected]