https://doi.org/10.1007/s41635-019-00075-9
Analysis of Secure Caches Using a Three-Step Model for Timing-Based Attacks
Shuwen Deng¹ · Wenjie Xiong¹ · Jakub Szefer¹
Received: 19 February 2019 / Accepted: 23 July 2019
© Springer Nature Switzerland AG 2019
Abstract
Many secure cache designs have been proposed in literature with the aim of mitigating different types of cache timing-based attacks. However, there has so far been no systematic analysis of how these secure cache designs can, or cannot, protect against different types of the timing-based attacks. To provide a means of analyzing the caches, this paper presents a novel three-step modeling approach that is used to exhaustively enumerate all the possible cache timing-based vulnerabilities. The model covers not only attacks that leverage cache accesses or flushes from the local processor core, but also attacks that leverage changes in the cache state due to the cache coherence protocol actions from remote cores. Moreover, both conventional attacks and speculative execution attacks are considered. With the list of all possible cache timing vulnerabilities derived from the three-step model, this work further manually analyzes each of the existing secure cache designs to show which types of timing-based side-channel vulnerabilities each secure cache can mitigate. Based on the security analysis of the existing secure cache designs using the new three-step model, this paper further summarizes different techniques gleaned from the secure cache designs and their ability to help mitigate different types of cache timing-based vulnerabilities.
Keywords Secure caches · Timing-based attacks · Security analysis · Side channels · Covert channels
1 Introduction
Research on timing-based attacks in computer processor caches has a long history, e.g., [1, 3, 5, 19, 36], predating their recent use in Spectre [26] attacks. These past attacks have shown the possibility to extract sensitive information via the timing-based channels, and often the focus is on extracting cryptographic keys. In addition, due to the recent Spectre [26] attacks, there is now renewed interest in timing channels. Especially, the Spectre attacks consist of two parts: first, speculative execution is used to access some sensitive information; second, a timing-based channel is used to actually transfer the information to the attacker.
Shuwen Deng
shuwen.deng@yale.edu

Wenjie Xiong
wenjie.xiong@yale.edu

Jakub Szefer
jakub.szefer@yale.edu

¹ Yale University, New Haven, CT 06520, USA
Whether by itself, or combined with speculative execution, the timing-based channels in processors pose a threat to a system's security and should be mitigated.
We have recently proposed a three-step model [11] in order to analyze cache timing-based side-channel attacks. The previous model considers cache timing-based side-channel vulnerabilities as a set of three "steps" or actions performed by either the attacker or the victim, which can affect the states of the cache. In this work, our methodology from [11] is improved to better represent actions of the attacker and the victim. For each step, all possible states for a cache block are enumerated in terms of whether the operation is driven by the attacker or the victim, what memory range the data being operated on belongs to, and whether the state is changed because of a memory access or data invalidation operation (due to a cache coherence operation or a flush instruction, for example). To understand which possible three-step actions can lead to an attack, we further propose and develop a cache three-step simulator and apply a set of reduction rules to derive a complete list of vulnerabilities by eliminating three-step combinations that do not map to an attack. Furthermore, we consider both normal and speculative execution for the memory
Journal of Hardware and Systems Security (2019) 3:397–425
Published online: 20 November 2019
operations and modeling of the cache attacks. Speculative execution has gotten increased attention due to the recent Spectre [26] attacks, many of which depend on timing channels to actually extract information; speculation alone is not enough for most of these attacks. Our model considers timing channels in general, independent of whether it is a side or a covert channel.
In the process of development of the improved three-step model, we have uncovered 43 types of timing-based vulnerabilities which have not been previously exploited (in addition, there are 29 types that map to attacks already known in literature). We cannot directly compare the types of vulnerabilities found in this work and in our prior work [11] due to the improved and different categorizations of the states of the cache block.
To address the threat of the prior cache timing-based attacks, to date 18 different secure cache designs have been presented in academic literature [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. The secure processor caches are designed with different assumptions and often address only specific types of timing-based side-channel or covert-channel attacks. To help analyze the security of these designs, this work uses our three-step modeling approach to reason about all the possible timing-based vulnerabilities. Especially, since our work demonstrates a number of new timing-based attacks, the existing secure caches have never been analyzed with respect to these new attacks before. For this work, we manually reviewed and analyzed the 18 existing secure cache designs [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57] in terms of the security features and implementations. Most of these designs do not have publicly available hardware implementation source code, so automatic analysis of the caches is not possible.
Based on the analysis, we summarize cache features that help improve security. Especially, we propose that "ideal" secure caches and processor architectures should provide new features to let software explicitly label memory loads or stores of sensitive data and differentiate them from normal loads and stores, so sensitive data can be efficiently identified and protected by the hardware. The caches can use partitioning to isolate the attacker and the victim, and prevent the attacker from being able to set the victim's cache blocks into a known state, which is needed by many attacks. To mitigate attacks based on internal interference, the caches can use randomization to de-correlate the data that is accessed and the data that is placed in the cache. More details of the possible defenses are discussed in Section 5 and Section 6.
1.1 Contributions
The new contributions of this work over [11] are as follows:
• A new formulation of the three-step model, with new cache states and derivation of a new set of types covering all the cache timing-based vulnerabilities (Section 3):
– Inclusion of cache coherence issues into the three-step model
– Expansion of the three-step model to consider both cases of normal and speculative execution attacks
– Design of reduction rules and a cache three-step simulator to automatically derive the exhaustive list of all the three steps which map to effective vulnerabilities, and elimination of three-step patterns which do not map to a potential attack
• Overview of the 18 secure cache designs that have been presented in academic literature (Section 4)
• Manual evaluation of the 18 secure processor cache designs to determine how they can help prevent timing-based attacks, and analysis of the security features the secure caches use (Sections 5 and 6)
• Discussion of "ideal" secure caches and the features they would need (Section 6)
• Description of attack strategies and comparison among the different attack strategies (Appendix A)
• Analysis of the soundness of the three-step model and why three steps are able to describe all timing-based vulnerabilities (Appendix B)
2 Cache Timing-Based Attacks and the Threat Model
Modern processor caches are known to be vulnerable to timing-based attacks. The timing of the memory accesses varies due to the caches' operation; for example, a cache hit is fast while a cache miss is slow. The cache coherence protocol can also change the cache states and affect the timing of the memory operations; the cache coherence may invalidate a cache block from a remote core, resulting in a cache miss in the local core, for example. Also, the timing of cache flush operations varies depending on whether the data to be flushed is in the cache or not: flushing an address using clflush with valid data in the cache is slow, while flushing an address not in the cache is fast, for example. From these timing differences of memory-related operations, the attacker can infer a data's specific memory address or corresponding cache index value, and thus learn some information about the victim's secrets.
2.1 Threat Model
This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore the main cache attacks only, but a similar approach can be applied to the other buffers or cache-like structures, which may be targets of attack once the main processor caches are secured.
In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x, which are known to the attacker. The goal of the attacker is to obtain the address u, or at least partial bits of it which relate to the cache index of the address.
We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe its own timing or the timing of the victim process. And the attacker knows how the timing of the memory-related operations depends on the cache states.
The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of a secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.
The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly
¹For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for the miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware, e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in the scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.
2.2 Side and Covert Channels
This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).
2.3 Hyperthreading Versus Time-Slice Sharing
When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of running once every time slice (when no hyperthreading is used). Our model can be applied to both of the scenarios, since our model abstracts away how the sharing happens.
3 Modeling of the Cache Timing-Based Side-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker, on the same core or different cores. When the victim and the attacker are on different cores,
cache coherence will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1, a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accesses the cache block, in Step 3 there will be a cache hit and fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
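The Flush + Reload sequence just described can be sketched as a toy model of a single cache block. The class, names, and fast/slow labels below are our illustrative abstraction, not the paper's code; a real attack would time the reload in cycles rather than observe labels.

```python
# Toy model of one cache block, illustrating the three Flush + Reload steps.
class CacheBlock:
    def __init__(self):
        self.content = None  # address currently cached, or None if invalid

    def flush(self):
        """Invalidate the block (models a clflush of the attacker's address)."""
        self.content = None

    def access(self, addr):
        """Access addr and return the timing the accessor would observe."""
        timing = "fast" if self.content == addr else "slow"
        self.content = addr  # the accessed address is now cached
        return timing

def flush_reload(victim_addr):
    block = CacheBlock()
    block.flush()               # Step 1: attacker flushes the block
    block.access(victim_addr)   # Step 2: victim's secret-dependent access to u
    return block.access("a")    # Step 3: attacker reloads a and times it
```

If the victim's access in Step 2 touched the attacker's address "a", the reload hits and `flush_reload("a")` returns "fast"; any other victim address leaves the reload a miss, returning "slow", which is exactly the timing difference the attacker exploits.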
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model

V_u: A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a: The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x and is known to the attacker.

A_aalias or V_aalias: The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d: The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv: The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., by a cache flush of the whole cache.

A^inv_a or V^inv_a: The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias: The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d: The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u: The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

⋆: Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
J Hardw Syst Secur (2019) 3397ndash425400
Fig. 1 The 17 possible states for a single cache block in our three-step model: a V_u; b A_a, V_a, A_aalias, V_aalias, A_d, V_d; c A^inv, V^inv; d A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; e V^inv_u; f ⋆
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall A represents the operations performed by the attacker and V represents the victim's operations.

Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and specific address is unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state ⋆, which represents null knowledge of the address in this corresponding cache block for the attacker. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.

Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x, and a_alias, as its name indicates, is a different address than a, but still within set x, and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the description of the possible states A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig. 1c. These states indicate no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As is shown in Fig. 2, the exhaustive list of the 4913 combinations will first be input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities will then be sent as the input to the reduction rules to remove the redundant three steps and obtain the final list of vulnerabilities.
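The enumeration itself is mechanical. As a sketch of the counting (the state names are our plain-text transcription of Table 1, with "⋆" written as "*"):

```python
# Enumerate every three-step pattern: 17 candidate states per step
# gives 17**3 = 4913 combinations to feed into the simulator.
from itertools import product

STATES = [
    "V_u", "A_a", "V_a", "A_aalias", "V_aalias", "A_d", "V_d",
    "A_inv", "V_inv", "A_inv_a", "V_inv_a", "A_inv_aalias",
    "V_inv_aalias", "A_inv_d", "V_inv_d", "V_inv_u", "*",
]

patterns = list(product(STATES, repeat=3))  # all (Step 1, Step 2, Step 3) tuples
print(len(patterns))
```

Running this prints 4913, matching the count of combinations the simulator classifies.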
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias, and thus does not map to this cache block.
The cache three-step simulator is implemented in a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step patterns. For each step
Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category: the exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator (classification step), which outputs 132 preliminary Strong Vulnerability, 572 preliminary Weak Vulnerability, and 4209 Ineffective Three-Step patterns; the reduction rules (reduction step) then produce the final 72 Strong Vulnerability and 64 Weak Vulnerability types
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to be one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this case, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities to be strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for type ⋆ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can also give fast timing because, due to the unknown ⋆ in Step 1, a may not be in the cache, so the A^inv_a invalidation completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated to be ineffective. For example, for type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type;
        a list containing all the vulnerabilities that belong to the Weak type;
        a list containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        timings ← array to store all possible timing observations regarding different candidate combinations for this three-step pattern
7:        if V_u or V^inv_u occurs in step1, step2, or step3 then
8:          for each candidate combination of steps do      ▷ V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
9:            candidates.append(candidate combination)
10:         end for
11:         for each candidate in candidates do
12:           timings.append(output_timing(candidate))
13:         end for
14:         if judge_type(timings) is Strong then
15:           strong.append(steps)
16:         else
17:           if judge_type(timings) is Weak then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)      ▷ no u-related step, so no secret can leak
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
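A minimal sketch of what the classification in judge_type might look like is shown below. This is our simplification, not the paper's implementation: each u-candidate is mapped to the set of timings it can produce in Step 3 (a two-element set arises when the observation also depends on the unknown ⋆ state), and the function checks how unambiguously an observation identifies u.

```python
# Simplified judge_type: classify a three-step pattern from the timing
# observations of its u-candidates (a, aalias, NIB).
def judge_type(timing_sets):
    """timing_sets maps each u-candidate to the set of possible timings."""
    observed = set().union(*timing_sets.values())
    if len(observed) < 2:
        return "ineffective"   # every value of u yields the same observation
    if all(len(t) == 1 for t in timing_sets.values()):
        return "strong"        # each u deterministically fixes the timing
    return "weak"              # some u is consistent with both timings

# V_d -> V_u -> A_a (Fig. 3a): fast iff u maps to a.
strong_example = {"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}
# * -> V_u -> A_inv_a (Fig. 3c): NIB can give either timing.
weak_example = {"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}
# A_a -> V_u -> A_d (Fig. 3f): always slow.
ineffective_example = {"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}
```

Under this sketch, the three Fig. 3 examples are classified as strong, weak, and ineffective respectively, matching the text above.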
After computing the type of all the three-step patterns, the cache three-step simulator will output the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We also have developed rules that can further reduce the output list of all the effective three steps from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type. a, b Strong Vulnerability (e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u); c, d Weak Vulnerability (e.g., ⋆ ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a); e, f Ineffective Three-Step (e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d). Each example plots, for the victim's behavior u ∈ {a, a_alias, NIB}, whether the attacker's observation is fast or slow timing
to A_a ⇝ V_u, which as a three-step pattern is equivalent to ⋆ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ⋆ ⇝ A_a ⇝ V_u and can be eliminated.

2. Three-step patterns with a step involving a known address a, and with an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we will eliminate the first pattern.

3. Three-step patterns with the steps V_u and V^inv_u adjacent to each other in consecutive steps will only keep the latter step and eliminate the first step. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which as a three-step pattern is equivalent to ⋆ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
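The first rule, for instance, is simple to mechanize. Below is a sketch of its "both known to the attacker" case (state names are our plain-text transcription; this is an illustration of the rule, not the paper's reduction script):

```python
# Sketch of reduction rule 1 (the both-attacker-known case): if two
# adjacent steps both avoid the unknown address u, the pattern repeats a
# shorter pattern and can be eliminated from the list.
U_STEPS = {"V_u", "V_inv_u"}  # the only steps involving the unknown u

def redundant_by_rule1(pattern):
    """Return True if some pair of adjacent steps is fully attacker-known."""
    return any(s1 not in U_STEPS and s2 not in U_STEPS
               for s1, s2 in zip(pattern, pattern[1:]))
```

Here A_d ⇝ A_a ⇝ V_u is flagged as redundant (its first two steps are both known to the attacker), while a pattern such as A_a ⇝ V_u ⇝ V_a, which keeps the unknown u between known steps, survives this rule.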
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description for each attack strategy to show the main idea behind them. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address of the victim's accesses maps to the set the attacker is attacking by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
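The internal-versus-external half of this labeling depends only on who performs Steps 2 and 3. As a sketch (using our plain-text state names, where attacker steps start with "A" and victim steps with "V"):

```python
# Sketch of the I-vs-E half of the macro-type label: a vulnerability is
# internal interference (I) when both Step 2 and Step 3 are victim (V)
# operations, and external interference (E) otherwise.
def interference_type(step2, step3):
    both_victim = step2.startswith("V") and step3.startswith("V")
    return "I" if both_victim else "E"
```

For example, the Cache Internal Collision type A^inv ⇝ V_u ⇝ V_a from Table 2 has two victim steps after Step 1 and is labeled "I", while the Flush + Reload pattern from Section 3.1 ends with an attacker access and is labeled "E". The miss-based versus hit-based half additionally needs the timing observation, so it is not captured by this sketch.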
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
^2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability; for Step 3, "fast" indicates a cache hit must be observed to derive sensitive address information, while "slow" indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy          | Step 1 -> Step 2 -> Step 3          | Macro Type | Attack
Cache Internal Collision | A^inv -> V_u -> V_a (fast)          | IH         | (2)
                         | V^inv -> V_u -> V_a (fast)          | IH         | (2)
                         | A_d -> V_u -> V_a (fast)            | IH         | (2)
                         | V_d -> V_u -> V_a (fast)            | IH         | (2)
                         | A^alias_a -> V_u -> V_a (fast)      | IH         | (2)
                         | V^alias_a -> V_u -> V_a (fast)      | IH         | (2)
                         | A^inv_a -> V_u -> V_a (fast)        | IH         | (2)
                         | V^inv_a -> V_u -> V_a (fast)        | IH         | (2)
Flush + Reload           | A^inv_a -> V_u -> A_a (fast)        | EH         | (5)
                         | V^inv_a -> V_u -> A_a (fast)        | EH         | (5)
                         | A^inv -> V_u -> A_a (fast)          | EH         | (5)
                         | V^inv -> V_u -> A_a (fast)          | EH         | (5)
                         | A_d -> V_u -> A_a (fast)            | EH         | (5)
                         | V_d -> V_u -> A_a (fast)            | EH         | (5)
                         | A^alias_a -> V_u -> A_a (fast)      | EH         | (5)
                         | V^alias_a -> V_u -> A_a (fast)      | EH         | (5)
Reload + Time            | V^inv_u -> A_a -> V_u (fast)        | EH         | new
                         | V^inv_u -> V_a -> V_u (fast)        | IH         | new
Flush + Probe            | A_a -> V^inv_u -> A_a (slow)        | EM         | (6)
                         | A_a -> V^inv_u -> V_a (slow)        | IM         | new
                         | V_a -> V^inv_u -> A_a (slow)        | EM         | new
                         | V_a -> V^inv_u -> V_a (slow)        | IM         | new
Evict + Time             | V_u -> A_d -> V_u (slow)            | EM         | (1)
                         | V_u -> A_a -> V_u (slow)            | EM         | (1)
Prime + Probe            | A_d -> V_u -> A_d (slow)            | EM         | (4)
                         | A_a -> V_u -> A_a (slow)            | EM         | (4)
Bernstein's Attack       | V_u -> V_a -> V_u (slow)            | IM         | (3)
                         | V_u -> V_d -> V_u (slow)            | IM         | (3)
                         | V_d -> V_u -> V_d (slow)            | IM         | (3)
                         | V_a -> V_u -> V_a (slow)            | IM         | (3)
Evict + Probe            | V_d -> V_u -> A_d (slow)            | EM         | new
                         | V_a -> V_u -> A_a (slow)            | EM         | new
Prime + Time             | A_d -> V_u -> V_d (slow)            | IM         | new
                         | A_a -> V_u -> V_a (slow)            | IM         | new
Flush + Time             | V_u -> A^inv_a -> V_u (slow)        | EM         | new
                         | V_u -> V^inv_a -> V_u (slow)        | IM         | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, "fast" indicates no corresponding address of the data is invalidated, while "slow" indicates the invalidation operation makes some data invalid, causing longer processing time.

Attack Strategy                       | Step 1 -> Step 2 -> Step 3          | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv -> V_u -> V^inv_a (slow)      | IH         | new
                                      | V^inv -> V_u -> V^inv_a (slow)      | IH         | new
                                      | A_d -> V_u -> V^inv_a (slow)        | IH         | new
                                      | V_d -> V_u -> V^inv_a (slow)        | IH         | new
                                      | A^alias_a -> V_u -> V^inv_a (slow)  | IH         | new
                                      | V^alias_a -> V_u -> V^inv_a (slow)  | IH         | new
Flush + Flush                         | A^inv_a -> V_u -> V^inv_a (slow)    | IH         | (1)
                                      | V^inv_a -> V_u -> V^inv_a (slow)    | IH         | (1)
                                      | A^inv_a -> V_u -> A^inv_a (slow)    | EH         | (1)
                                      | V^inv_a -> V_u -> A^inv_a (slow)    | EH         | (1)
Flush + Reload Invalidation           | A^inv -> V_u -> A^inv_a (slow)      | EH         | new
                                      | V^inv -> V_u -> A^inv_a (slow)      | EH         | new
                                      | A_d -> V_u -> A^inv_a (slow)        | EH         | new
                                      | V_d -> V_u -> A^inv_a (slow)        | EH         | new
                                      | A^alias_a -> V_u -> A^inv_a (slow)  | EH         | new
                                      | V^alias_a -> V_u -> A^inv_a (slow)  | EH         | new
Reload + Time Invalidation            | V^inv_u -> A_a -> V^inv_u (slow)    | EH         | new
                                      | V^inv_u -> V_a -> V^inv_u (slow)    | IH         | new
Flush + Probe Invalidation            | A_a -> V^inv_u -> A^inv_a (fast)    | EM         | new
                                      | A_a -> V^inv_u -> V^inv_a (fast)    | IM         | new
                                      | V_a -> V^inv_u -> A^inv_a (fast)    | EM         | new
                                      | V_a -> V^inv_u -> V^inv_a (fast)    | IM         | new
Evict + Time Invalidation             | V_u -> A_d -> V^inv_u (fast)        | EM         | new
                                      | V_u -> A_a -> V^inv_u (fast)        | EM         | new
Prime + Probe Invalidation            | A_d -> V_u -> A^inv_d (fast)        | EM         | new
                                      | A_a -> V_u -> A^inv_a (fast)        | EM         | new
Bernstein's Invalidation Attack       | V_u -> V_a -> V^inv_u (fast)        | IM         | new
                                      | V_u -> V_d -> V^inv_u (fast)        | IM         | new
                                      | V_d -> V_u -> V^inv_d (fast)        | IM         | new
                                      | V_a -> V_u -> V^inv_a (fast)        | IM         | new
Evict + Probe Invalidation            | V_d -> V_u -> A^inv_d (fast)        | EM         | new
                                      | V_a -> V_u -> A^inv_a (fast)        | EM         | new
Prime + Time Invalidation             | A_d -> V_u -> V^inv_d (fast)        | IM         | new
                                      | A_a -> V_u -> V^inv_a (fast)        | IM         | new
Flush + Time Invalidation             | V_u -> A^inv_a -> V^inv_u (fast)    | EM         | new
                                      | V_u -> V^inv_a -> V^inv_u (fast)    | IM         | new
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]^3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.

^3 Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
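This lookup policy can be sketched as follows (a minimal simplification with our own function and variable names; `partition_of` maps addresses to the partition currently holding them):

```python
def secverilog_access(partition_of: dict, addr: str, label: str) -> str:
    """Timing-label-aware lookup sketch: Low must not observe High state.

    label is the instruction's timing label ('L' or 'H'). A Low access
    whose data sits in the High partition behaves as a miss and migrates
    the line to Low; High accesses may hit in either partition.
    """
    part = partition_of.get(addr)        # 'L', 'H', or None (not cached)
    if label == 'H':
        return 'hit' if part in ('L', 'H') else 'miss'
    if part == 'L':
        return 'hit'
    if part == 'H':
        partition_of[addr] = 'L'         # move High -> Low for consistency
    return 'miss'                        # Low timing independent of High state
```

Note how the result of a Low access depends only on Low-partition state; that is the property that keeps High operations invisible to Low timing.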
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. To adjust the number of cache ways assigned to the Low partition, it tracks the percentage of cache misses for L instructions that would be reduced (increased) if L's partition size were increased (reduced) by one cache way. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads and leaves no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted; this random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
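The tiered replacement decision can be sketched as below (our own encoding; the paper describes this policy in hardware, not as an API):

```python
import random

def sharp_select_victim(cache_set, requester_pid, active_pids):
    """Pick an eviction candidate in one set, following SHARP's tiers.

    cache_set: list of owner process IDs, one per way. Returns
    (way_index, os_interrupt): tier 1 prefers lines owned by no current
    process, tier 2 a line of the requester itself, and tier 3 falls
    back to a random way and raises a suspicious-activity interrupt.
    """
    for way, pid in enumerate(cache_set):
        if pid not in active_pids:                    # tier 1
            return way, False
    for way, pid in enumerate(cache_set):
        if pid == requester_pid:                      # tier 2
            return way, False
    return random.randrange(len(cache_set)), True     # tier 3
```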
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) will be put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, especially Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at a granularity of a page and are not shared by more than one VM; here the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one is called policy_fillmap, for masking fills and selecting the victim way to replace; another is called policy_hitmap, for masking hit ways. Only if both the tag and the domain id match will a cache hit happen; therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
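The hit condition can be sketched as below (our own encoding of the described policy; policy_hitmap is a per-domain bitmask over ways, as in the paper):

```python
def dawg_hit(cache_set, req_tag, req_domain, policy_hitmap):
    """Return the hitting way index, or None on a miss.

    cache_set: list of (tag, domain_id) per way. A way may hit only if
    it is enabled in the requesting domain's policy_hitmap AND both the
    tag and the domain id match, so identical read-only data replicated
    in another domain's way never produces a cross-domain hit.
    """
    for way, (tag, domain_id) in enumerate(cache_set):
        if (policy_hitmap >> way) & 1 and tag == req_tag \
                and domain_id == req_domain:
            return way
    return None
```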
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level caches, and eviction of R from the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have their relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC will not protect writable, non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and for thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
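The miss-handling rule can be sketched as follows (a simplification with hypothetical field names; the real design integrates this into the hardware replacement logic):

```python
def pl_miss_fill(cache_set, new_pid, new_locked):
    """Choose a victim way on a miss, or None to serve the access uncached.

    cache_set: list of {'pid': int, 'locked': bool} per way. Locked
    lines are never evicted by unlocked fills, and locked lines of
    different processes never evict each other; when no legal victim
    exists, the new data is loaded/stored without caching.
    """
    for way, line in enumerate(cache_set):
        if not line['locked']:               # normal victim among unlocked
            return way
    if new_locked:                           # all ways locked:
        for way, line in enumerate(cache_set):
            if line['pid'] == new_pid:       # may replace own locked line
                return way
    return None                              # access proceeds uncached
```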
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. Each block of RP cache has a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
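The three miss cases can be sketched as below (our own encoding; in the actual design the random set S' comes from the permutation table):

```python
import random

def rp_miss_action(victim, incoming, n_sets):
    """Classify a miss per the three RP cache cases.

    victim / incoming: {'pid': int, 'P': int} for the to-be-evicted line
    and the to-be-brought-in data. Returns (action, s_prime):
      'evict_random_uncached' - same process, protection bits differ;
      'swap_and_fill'         - different processes (S and S' swapped);
      'normal'                - otherwise (ordinary replacement).
    """
    s_prime = random.randrange(n_sets)
    if victim['pid'] == incoming['pid'] and victim['P'] != incoming['P']:
        return 'evict_random_uncached', s_prime
    if victim['pid'] != incoming['pid']:
        return 'swap_and_fill', s_prime
    return 'normal', None
```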
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D: if the evicted data and D belong to the same process but either one of their protection bits is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on a Random Fill cache can control whether a requested data access is a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache; in the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of this).
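The three request types can be sketched as follows (our own function and parameter names; the neighborhood window is a configurable address range in the actual design):

```python
import random

def random_fill_handle(request_type, addr, window):
    """Return the address actually cached on a miss, or None.

    'nofill' : victim's security critical access; nothing is cached.
    'random' : an arbitrary neighbor in [addr - window, addr + window]
               is brought in instead of addr, de-correlating the fill
               from the demand access.
    'normal' : ordinary demand fill of addr itself.
    """
    if request_type == 'nofill':
        return None
    if request_type == 'random':
        return addr + random.randint(-window, window)
    return addr
```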
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to, nor whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; the periodic re-keying causes the address remapping to change dynamically.
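The idea can be sketched with a toy keyed permutation standing in for LLBC (this Feistel construction is our own stand-in, not the actual cipher; only the structure, encrypt-then-index and remap on re-key, matches the description):

```python
def ceaser_set_index(line_addr, key, n_sets=1024, bits=20, rounds=4):
    """Toy stand-in for CEASER's address encryption.

    A small keyed Feistel permutation over `bits`-bit line addresses;
    the *encrypted* address (mod n_sets) selects the cache set, and
    changing `key` (periodic re-keying) remaps lines to new sets.
    """
    half = bits // 2
    mask = (1 << half) - 1
    left, right = (line_addr >> half) & mask, line_addr & mask
    for rnd in range(rounds):
        # Invertible Feistel round with a simple keyed round function.
        left, right = right, left ^ (((right * 0x9E3779B1) ^ key ^ rnd) & mask)
    return ((left << half) | right) % n_sets
```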
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As a hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
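The per-way, per-domain index derivation can be sketched as below (SHA-256 here is our stand-in for the unspecified keyed hash; the names are our own):

```python
import hashlib

def scatter_index(line_addr, way, domain_key, n_sets=1024):
    """Set index of `way` for this address under one security domain's key.

    Each way gets its own index (skewed-associative style), and a
    different domain_key yields an unrelated address-to-set mapping.
    """
    msg = f"{domain_key}:{way}:{line_addr}".encode()
    digest = hashlib.sha256(msg).digest()
    return int.from_bytes(digest[:4], 'big') % n_sets
```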
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, nor based on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
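The decay mechanism can be sketched as below (our own class and constant names; LIMIT plays the role of the predefined global-counter maximum, and resetting the counter on an access is our assumption):

```python
import random

class NDLine:
    """One cache line with Non Deterministic Cache's decay counter."""
    LIMIT = 16   # predefined invalidation threshold (assumed value)

    def __init__(self):
        self.valid = True
        # Initial value is random and below the maximum, so the line's
        # lifetime is unpredictable even when freshly installed.
        self.counter = random.randrange(self.LIMIT)

    def touch(self):
        """An access restarts the line at a fresh random counter (assumed)."""
        self.counter = random.randrange(self.LIMIT)

    def tick(self):
        """One global counter tick with the line untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.LIMIT:
                self.valid = False   # decayed: the line is invalidated
```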
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the result of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the V_u -> A_d -> V_u (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, as indicated by a x and red color. For example, SecDCP cache cannot defend against the V_u -> V_a -> V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d -> V_u -> A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to produce cache hits on each other's data.

2. A cache can prevent a timing attack if the timing of the last step is randomized, breaking the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d -> V_u -> A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. When executing this attack on the Random Fill cache [30], for example, a slow timing does not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as A_d -> V_u -> V^inv_d (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.

The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.

For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being a memory access related operation. A single check mark in a green cell means this cache can prevent the corresponding vulnerability; a check mark in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a x in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

[The body of Table 4, the per-cache cell matrix, did not survive text extraction; only the caption and footnotes are recoverable.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets the shared cache.
h: The technique only targets the inclusive last-level cache.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
Table 5: Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a. Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b. Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c. Flush is disabled, but cache coherence might be used to do the data removal.
d. For L1 cache and TLB, flushing is done during context switch.
e. The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and protection of the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f. The technique is now only implemented in the last-level cache.
g. The technique now only targets the shared cache.
h. The technique only targets the inclusive last-level cache.
i. The technique targets the data cache hierarchy.
j. For the last-level cache, the cache is partitioned between the victim and the attacker.
k. The technique can control the probability of the vulnerability being successful to be extremely small.
l. The technique works in shared read-only memory, while not working in shared writable memory.
m. Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to only being able to access a limited set of cache blocks (columns for SP* cache to column for PL cache in Tables 4 and 5). For example, either static or dynamic partitioning of the cache allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard for the partitioning-based caches to prevent. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size for each part is adjusted. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
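The blocking effect can be sketched with a toy statically partitioned set (an illustration, not any specific design): attacker fills land only in the attacker's own ways, so the attacker cannot set a known initial state in the victim's partition by eviction.

```python
class PartitionedSet:
    """One cache set statically split into per-domain ways (toy model)."""
    def __init__(self):
        self.ways = {"victim": None, "attacker": None}  # one way per domain

    def access(self, domain, tag):
        if self.ways[domain] == tag:
            return "hit"
        self.ways[domain] = tag   # fills and evicts only within own partition
        return "miss"

p = PartitionedSet()
p.access("victim", "secret")         # victim caches a secret-dependent line
p.access("attacker", "probe")        # attacker cannot touch the victim way
print(p.access("victim", "secret"))  # "hit": no external eviction occurred
```

Note that this sketch only models miss-based external interference; designs like SP* cache still allow cross-domain cache hits on shared data, which is why hit-based internal-collision vulnerabilities remain, as discussed in the text.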
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which makes hit-based vulnerabilities possible, e.g., V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and can be evicted by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system, it so far supports preventing all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation that involves the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) for the victim partition, and therefore bring uncertainty to the final observation by the attacker, e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information deducible from the observed timing, due to having or not having data in the cache, could be reduced to zero if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
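The keyed remapping idea can be sketched as follows; HMAC-SHA256 stands in here for a design's low-latency block cipher, and all parameters are illustrative. Re-keying moves addresses to new sets, so an eviction set built under the old key mostly stops colliding with the target.

```python
import hmac, hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed address-to-set mapping (HMAC stands in for a real cipher)."""
    mac = hmac.new(key, addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return mac[0] % NUM_SETS

target = 0x1234
key0, key1 = b"epoch-0", b"epoch-1"   # remap keys for two epochs
# Eviction set learned under key0: addresses that collide with the target.
eviction_set = [a for a in range(4096)
                if set_index(a, key0) == set_index(target, key0)]
# After re-keying, most of the learned set no longer collides with the target.
survivors = [a for a in eviction_set
             if set_index(a, key1) == set_index(target, key1)]
print(len(eviction_set), len(survivors))
```

In a real design the remapping must be gradual and the cipher latency-constrained; the sketch only shows why an eviction set learned in one epoch loses its value in the next.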
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block for Step 1, Random Fill cache will not randomly load the security-critical data, and vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6: Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

  – Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

  – Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.

  – To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail at this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision: In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
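The three steps can be illustrated on a toy cache model (the hit/miss "latencies" are made-up constants and the table names are hypothetical, not a real measurement): the attacker recovers which shared line the victim's secret-dependent access touched.

```python
HIT, MISS = 10, 100          # made-up latencies for a toy cache model
cache = set()

def access(addr):
    """Return the observed latency and cache the line (toy model)."""
    t = HIT if addr in cache else MISS
    cache.add(addr)
    return t

def flush(addr):
    cache.discard(addr)

shared = ["tbl0", "tbl1", "tbl2", "tbl3"]   # shared read-only lines
secret = 2
for a in shared:                 # Step 1: attacker flushes the shared lines
    flush(a)
access(shared[secret])           # Step 2: victim's secret-dependent access
times = {a: access(a) for a in shared}      # Step 3: attacker reloads, times
leaked = min(times, key=times.get)
print(leaked)                    # tbl2: the fast reload reveals the secret
```

The single fast reload is exactly the cache hit in Step 3 of the strategy description; every other line still misses because only the victim's access refilled its line.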
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe: In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
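A toy illustration of these three steps on a simulated direct-mapped cache (the addresses and parameters are made up; a real attack must also build eviction sets and time each probe with a cycle counter):

```python
NUM_SETS = 8
cache = {}                       # set index -> tag of the line it holds

def access(addr):
    """Direct-mapped lookup: report hit/miss, then fill the set (toy model)."""
    s, tag = addr % NUM_SETS, addr // NUM_SETS
    hit = cache.get(s) == tag
    cache[s] = tag
    return hit

attacker_base = 0x1000           # attacker array covering all 8 sets
for i in range(NUM_SETS):        # Step 1: prime every set
    access(attacker_base + i)
secret_addr = 0x2345
access(secret_addr)              # Step 2: victim's secret-dependent access
misses = [i for i in range(NUM_SETS)          # Step 3: probe each set;
          if not access(attacker_base + i)]   # a miss marks the touched set
print(misses)                    # the one miss is at secret_addr % NUM_SETS
```

The attacker learns only the set index of the secret access, which is why Prime + Probe reveals coarser information than Flush + Reload.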
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper): Vulnerabilities that have names ending with "Invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "Invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "Invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
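The exhaustive check behind these counts can be sketched as follows. The operation alphabet below is a simplification chosen for illustration (the actual simulator distinguishes more cache states, such as aliased addresses and the ⋆ step); the point is that every ordered triple mentioning the secret u is a candidate that the simulator then tests for an observable timing difference.

```python
from itertools import product

# Victim steps may touch a known address (a), a known-distinct address (d),
# or the secret (u); the attacker never touches u. Each step is either a
# memory access or an invalidation (^inv).
victim_ops   = [f"V{x}{s}" for x in "adu" for s in ("", "^inv")]
attacker_ops = [f"A{x}{s}" for x in "ad" for s in ("", "^inv")]
ops = victim_ops + attacker_ops            # 10 operations in this sketch

# A candidate vulnerability must involve the secret u in at least one step;
# a checker would then keep only the effective (Strong/Weak) patterns.
candidates = [p for p in product(ops, repeat=3)
              if any("u" in step for step in p)]
print(len(candidates))                     # 10**3 - 8**3 = 488
```

Even this reduced alphabet yields hundreds of candidate triples, which is why an automated simulator, rather than manual inspection, is needed to classify them.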
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., A^inv/V^inv, ...}, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
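Subdivision Rule 1 can be sketched as a small function over patterns (the encoding of steps as strings is illustrative): a ⋆ step closes the current pattern and becomes Step 1 of the next one, and a lone trailing ⋆ is deleted.

```python
def subdivide(pattern):
    """Split a pattern at each '⋆' (Subdivision Rule 1 sketch): '⋆' starts
    the next pattern, and a pattern consisting only of '⋆' is dropped."""
    out, cur = [], []
    for step in pattern:
        if step == "⋆":
            if cur and cur != ["⋆"]:
                out.append(cur)
            cur = ["⋆"]              # '⋆' becomes Step 1 of the next pattern
        else:
            cur.append(step)
    if cur and cur != ["⋆"]:         # a lone trailing '⋆' carries no timing
        out.append(cur)
    return out

print(subdivide(["Aa", "Vu", "⋆", "Vu", "Aa", "⋆"]))
# [['Aa', 'Vu'], ['⋆', 'Vu', 'Aa']]
```

Rule 2 would be applied in the same recursive style, splitting at A^inv/V^inv steps instead of ⋆.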
B.4.2 Simplification Rules

For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules

Suppose there are two adjacent steps M and N in a memory sequence ⇝ M ⇝ N ⇝. If commuting M and N leads to the same observation result, i.e., ⇝ M ⇝ N ⇝ and ⇝ N ⇝ M ⇝ give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter in which position within the whole memory sequence the two steps are. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.

– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules

If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or
Table 7  Rules for combining two adjacent steps

First | Second | Commute Rule 1 | Commute Rule 2 | Union Rule or Reduce Rule | Combined Step
a | a | yes | yes | yes | a
a^inv | a | no | no | yes | a
a | a_alias | no | no | yes | a_alias
a^inv | a_alias | yes | yes | yes | a_alias
a | d | no | no | yes | d
a^inv | d | yes | yes | yes | d
a | u | no | no | no | -
a^inv | u | no | no | no | -
a | a^inv | no | no | yes | a^inv
a^inv | a^inv | yes | yes | yes | a^inv
a | a_alias^inv | yes | yes | yes | a
a^inv | a_alias^inv | yes | yes | yes | Union(a^inv, a_alias^inv)
a | d^inv | yes | yes | yes | a
a^inv | d^inv | yes | yes | yes | Union(a^inv, d^inv)
a | u^inv | no | no | no | -
a^inv | u^inv | no | yes | no | -
a_alias | a | no | no | yes | a
a_alias^inv | a | yes | yes | yes | a
a_alias | a_alias | yes | yes | yes | a_alias
a_alias^inv | a_alias | no | no | yes | a_alias
a_alias | d | no | no | yes | d
a_alias^inv | d | yes | yes | yes | d
a_alias | u | no | no | no | -
a_alias^inv | u | no | no | no | -
a_alias | a^inv | yes | yes | yes | a_alias
a_alias^inv | a^inv | yes | yes | yes | Union(a^inv, a_alias^inv)
a_alias | a_alias^inv | no | no | yes | a_alias^inv
a_alias^inv | a_alias^inv | yes | yes | yes | a_alias^inv
a_alias | d^inv | yes | yes | yes | a_alias
a_alias^inv | d^inv | yes | yes | yes | Union(d^inv, a_alias^inv)
a_alias | u^inv | no | no | no | -
a_alias^inv | u^inv | no | yes | no | -
d | a | no | no | yes | a
d^inv | a | yes | yes | yes | a
d | a_alias | no | no | yes | a_alias
d^inv | a_alias | yes | yes | yes | a_alias
d | d | yes | yes | yes | d
d^inv | d | no | no | yes | d
d | u | no | no | no | -
d^inv | u | no | yes | no | -
d | a^inv | yes | yes | yes | d
d^inv | a^inv | yes | yes | yes | Union(a^inv, d^inv)
d | a_alias^inv | yes | yes | yes | d
d^inv | a_alias^inv | yes | yes | yes | Union(d^inv, a_alias^inv)
d | d^inv | no | no | yes | d^inv
d^inv | d^inv | yes | yes | yes | d^inv
d | u^inv | no | yes | no | -
d^inv | u^inv | no | yes | no | -
u | a | no | no | no | -
u^inv | a | no | no | no | -
u | a_alias | no | no | no | -
u^inv | a_alias | no | no | no | -
u | d | no | no | no | -
u^inv | d | no | yes | no | -
u | u | yes | yes | yes | u
u^inv | u | no | no | yes | u
u | a^inv | no | no | no | -
u^inv | a^inv | no | yes | no | -
u | a_alias^inv | no | no | no | -
u^inv | a_alias^inv | no | yes | no | -
u | d^inv | no | yes | no | -
u^inv | d^inv | no | yes | no | -
u | u^inv | no | no | yes | u^inv
u^inv | u^inv | yes | yes | yes | u^inv
Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
B.4.2.3 Union Rules

Suppose there are two adjacent steps M and N in a memory sequence ⇝ M ⇝ N ⇝. If combining M and N leads to the same timing observation result, i.e., ⇝ M ⇝ N ⇝ and ⇝ Union(M, N) ⇝ give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, denoted Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
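As a toy illustration (our own encoding, not the paper's code), Table 7 can be thought of as a lookup from a pair of adjacent steps to their combined step; only a few illustrative rows are encoded below, and the function name is an assumption for the sketch.

```python
# Toy sketch of Table-7-driven rewriting: map a pair of adjacent steps to
# their combined step (absent pairs mean no Union/Reduce rule applies).

RULES = {
    ("a", "a"): "a",                            # Reduction Rule 2
    ("a_inv", "a"): "a",                        # Reduction Rule 3 (same address)
    ("u", "u"): "u",                            # Reduction Rule 1
    ("a_inv", "d_inv"): "Union(a_inv d_inv)",   # Union Rule 1 (two known invs)
}

def rewrite_once(seq):
    """Apply the first applicable two-step rule, scanning left to right."""
    for i in range(len(seq) - 1):
        combined = RULES.get((seq[i], seq[i + 1]))
        if combined is not None:
            return seq[:i] + [combined] + seq[i + 2:]
    return seq

print(rewrite_once(["a", "a", "u", "u"]))      # ['a', 'u', 'u']
print(rewrite_once(["a_inv", "d_inv", "u"]))   # ['Union(a_inv d_inv)', 'u']
```

Repeated application of such a rewrite, interleaved with commuting, is what shortens a β-step pattern toward three steps.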
B.4.2.4 Final Check Rules

Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ non-u operation ⇝ u operation
– non-u operation ⇝ u operation ⇝ non-u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:
– If followed by a known access operation, A^inv/V^inv can be removed, because the access puts an actual known state into the cache block.

– If followed by a known inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.
In this case, the steps are finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the above rules can be applied; these require the rest checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– {A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias} ⇝ V_u
– {A_a/V_a, A_aalias/V_aalias} ⇝ V^inv_u
– V_u ⇝ {A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias}
– V^inv_u ⇝ {A_a/V_a, A_aalias/V_aalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially.

Output: vul_list: reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  vul_list = Ø
2:  while contain(pattern, ⋆) and its index is not 0 do
3:      pattern = subdivision_rule_1(pattern)
4:  end while
5:  while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:      pattern = subdivision_rule_2(pattern)
7:  end while
8:  while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:      pattern = commute_rules(pattern)
10:     pattern = reduction_rules(pattern)
11:     pattern = union_rules(pattern)
12:     if is_ineffective(pattern) or has_interval_effective_three_steps(pattern) then
13:         vul_list += extract(pattern)
14:     end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences

Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.

Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
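A minimal runnable sketch of the Algorithm 2 skeleton is given below; the helper bodies are stand-ins of our own (the paper's real checks consult Table 2 and the reduction rules), so only the control flow mirrors the algorithm.

```python
# Sketch of the Algorithm 2 skeleton with stand-in helpers (our own
# simplifications, not the paper's actual checks).

def has_interval_effective_three_steps(pattern):
    # Stand-in predicate: some window of three steps mixes u-related and
    # non-u-related operations (a rough proxy for "maps to an effective
    # three-step vulnerability").
    return any(len({"u" in s for s in pattern[i:i + 3]}) == 2
               for i in range(len(pattern) - 2))

def is_ineffective(pattern):
    return not has_interval_effective_three_steps(pattern)

def rewrite_once(seq):
    # Stand-in for one Commute/Reduction/Union application: merge equal
    # adjacent steps (as in Reduction Rules 1 and 2).
    for i in range(len(seq) - 1):
        if seq[i] == seq[i + 1]:
            return seq[:i] + seq[i + 1:]
    return seq

def reduce_pattern(pattern):
    """Reduce a beta-step (beta > 3) pattern; return found vulnerabilities."""
    vul_list = []
    while len(pattern) > 3:
        shorter = rewrite_once(pattern)
        if shorter == pattern:      # no rule applies any more
            break
        pattern = shorter
    if not is_ineffective(pattern):
        vul_list.append(pattern)
    return vul_list

print(reduce_pattern(["A_a", "A_a", "V_u", "A_a"]))  # [['A_a', 'V_u', 'A_a']]
```

The real implementation would replace `rewrite_once` with the full Table 7 lookup and the effectiveness checks with the Table 2 vulnerability list.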
B.4.4 Summary

In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa, 2005. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
operations and modeling of the cache attacks. Speculative execution has gotten increased attention due to the recent Spectre [26] attacks, many of which depend on timing channels to actually extract information; speculation alone is not enough for most of these attacks. Our model considers timing channels in general, independent of whether it is a side or a covert channel.

In the process of development of the improved three-step model, we have uncovered 43 types of timing-based vulnerabilities which have not been previously exploited (in addition, there are 29 types that map to attacks already known in the literature). We cannot directly compare the types of vulnerabilities found in this work and in our prior work [11] due to the improved and different categorizations of the states of the cache block.

To address the threat of the prior cache timing–based attacks, to date 18 different secure cache designs have been presented in the academic literature [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. The secure processor caches are designed with different assumptions and often address only specific types of timing-based side-channel or covert-channel attacks. To help analyze the security of these designs, this work uses our three-step modeling approach to reason about all the possible timing-based vulnerabilities. Especially, since our work demonstrates a number of new timing-based attacks, the existing secure caches have never been analyzed with respect to these new attacks before. For this work, we manually reviewed and analyzed the 18 existing secure cache designs [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57] in terms of their security features and implementations. Most of these designs do not have publicly available hardware implementation source code, so automatic analysis of the caches is not possible.

Based on the analysis, we summarize cache features that help improve security. Especially, we propose that "ideal" secure caches and processor architectures should provide new features to let software explicitly label memory loads or stores of sensitive data and differentiate them from normal loads and stores, so sensitive data can be efficiently identified and protected by the hardware. The caches can use partitioning to isolate the attacker and the victim and prevent the attacker from being able to set the victim's cache blocks into a known state, which is needed by many attacks. To mitigate attacks based on internal interference, the caches can use randomization to de-correlate the data that is accessed and the data that is placed in the cache. More details of the possible defenses are discussed in Sections 5 and 6.
1.1 Contributions

The new contributions of this work over [11] are as follows:

• A new formulation of the three-step model, with new cache states and derivation of a new set of types covering all the cache timing–based vulnerabilities (Section 3):

– Inclusion of cache coherence issues into the three-step model.

– Expansion of the three-step model to consider both normal and speculative execution attacks.

– Design of reduction rules and a cache three-step simulator to automatically derive the exhaustive list of all the three steps which map to effective vulnerabilities, and elimination of three-step patterns which do not map to a potential attack.

• Overview of the 18 secure cache designs that have been presented in the academic literature (Section 4).

• Manual evaluation of the 18 secure processor cache designs to determine how they can help prevent timing-based attacks, and analysis of the security features the secure caches use (Sections 5 and 6).

• Discussion of "ideal" secure caches and the features they would need (Section 6).

• Description of attack strategies and comparison among different attack strategies (Appendix A).

• Analysis of the soundness of the three-step model and why three steps are able to describe all timing-based vulnerabilities (Appendix B).
2 Cache Timing–Based Attacks and the Threat Model

Modern processor caches are known to be vulnerable to timing-based attacks. The timing of memory accesses varies due to the caches' operation. For example, a cache hit is fast, while a cache miss is slow. The cache coherence protocol can also change the cache states and affect the timing of the memory operations; the cache coherence may invalidate a cache block from a remote core, resulting in a cache miss in the local core, for example. Also, the timing of cache flush operations varies depending on whether the data to be flushed is in the cache or not: flushing an address using clflush with valid data in the cache is slow, while flushing an address not in the cache is fast, for example. From these timing differences of memory-related operations, the attacker can infer a data's specific memory address or corresponding cache index value, and thus learn some information about the victim's secrets.
2.1 Threat Model

This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore the main cache attacks only, but a similar approach can be applied to other buffers or cache-like structures which may become the target of attack once the main processor caches are secured.

In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x which are known to the attacker. The goal of the attacker is to obtain the address u, or at least the partial bits of it which relate to the cache index of the address.

We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe his or her own timing or the timing of the victim process. And the attacker knows how the timing of the memory-related operations depends on the cache states.

The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of the secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.

The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly
¹ For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for the miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware; e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in the scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.
2.2 Side and Covert Channels

This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).

2.3 Hyperthreading Versus Timing-Slice Sharing

When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of running once every time slice (when no hyperthreading is used). Our model can be applied to both scenarios, since it abstracts away how the sharing happens.
3Modeling of the Cache TimingndashBasedSide-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
J Hardw Syst Secur (2019) 3:397–425

We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker, on the same core or on different cores. When the victim and the attacker are on different cores, cache coherence operations will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1 a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model, along with their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model
State Description
V_u — A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a — The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or by the attacker themselves. The address a is within the range of sensitive locations x. The address a is known to the attacker.

A_aalias or V_aalias — The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d — The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv — The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., by a flush of the whole cache.

A^inv_a or V^inv_a — The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias — The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d — The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u — The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

* — Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a, V_a, A_aalias, V_aalias, A_d, V_d; (c) A^inv, V^inv; (d) A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; (e) V^inv_u; (f) *
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state *, which represents the attacker's null knowledge of the address in this cache block. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.
Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, and V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x; a_alias, as its name indicates, is a different address than a, but still within set x, and it maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the description of the possible states A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, and V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig. 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 * 17 * 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as the input to the reduction rules, to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
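The exhaustive enumeration itself is straightforward; as a quick illustration (the state-name strings below are our own plain-text encoding of the 17 states of Table 1, not the authors' code):

```python
from itertools import product

# Our own plain-text encoding of the 17 states of Table 1:
# '*', six known-address accesses, Vu, six targeted invalidations,
# V_inv_u, and the two whole-block invalidations.
STATES = (["*"] +
          [p + s for p in ("A", "V") for s in ("a", "aalias", "d")] +           # Aa, Va, ...
          ["Vu"] +
          [p + "_inv_" + s for p in ("A", "V") for s in ("a", "aalias", "d")] + # A_inv_a, ...
          ["V_inv_u", "A_inv", "V_inv"])

assert len(STATES) == 17
three_step_patterns = list(product(STATES, repeat=3))
assert len(three_step_patterns) == 4913   # 17 * 17 * 17
```

Each element of `three_step_patterns` is one candidate ⟨Step 1, Step 2, Step 3⟩ combination fed to the simulator.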
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias, and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudocode is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step patterns. For each step
Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. The exhaustive list of all possible three-step combinations (4913) is input to the cache three-step simulator (the classification step), which outputs 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 ineffective three-step patterns. The reduction rules (the reduction step) then reduce the preliminary lists to 72 Strong and 64 Weak vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern: if it is V_u, this step will be extended to be one of three candidates, V_a, V_aalias, and V_NIB; if it is V^inv_u, this step will be extended to be one of three candidates, V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to the victim's different behavior; however, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type * ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can result in fast timing because, due to the * in Step 1, the cache block may not contain a, in which case invalidating a is fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1: Pseudocode for the cache three-step simulator

Input: a list containing the 17 possible states for each of the three steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

for step1 in states do
    for step2 in states do
        for step3 in states do
            steps <- (step1, step2, step3)
            candidates <- empty array to store all possible candidate combinations of this three-step pattern
            timings <- empty array to store all possible timing observations, regarding the different candidate combinations, for this three-step pattern
            if V_u or V^inv_u occurs in steps then
                for each candidate combination of steps do
                    // V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are
                    // V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
                    candidates.append(candidate combination)
                end for
                for each candidate in candidates do
                    timings.append(output_timing(candidate))
                end for
                if judge_type(candidates, timings) is Strong then
                    strong.append(steps)
                else if judge_type(candidates, timings) is Weak then
                    weak.append(steps)
                else
                    ineffective.append(steps)
                end if
            else
                ineffective.append(steps)
                continue
            end if
        end for
    end for
end for
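To make the classification logic concrete, below is a minimal, runnable Python sketch in the spirit of Algorithm 1. It models only a single monitored cache block plus one stand-in block for Not-In-Block (NIB) addresses, and a reduced operation set (access, targeted flush, and a * no-op); the function names step, timings_for, and judge_type and the state encoding are our own simplifications, not the authors' implementation. Unknown prior cache contents (the * state) are handled by enumerating all possible initial block contents.

```python
SAME_BLOCK = {"a", "aalias", "d"}   # addresses that map to the monitored cache block

def step(state, op):
    """Apply one operation (kind, addr) to state = (block, other).
    Returns (new_state, timing); timing is 'fast'/'slow' for the operation,
    or None when the operation reveals nothing (the '*' no-op)."""
    block, other = state
    kind, addr = op
    if kind == "star":                        # unknown step: no effect modeled
        return state, None
    if addr in SAME_BLOCK:
        if kind == "access":                  # hit is fast, miss is slow (and fills)
            return ((addr, other), "fast" if block == addr else "slow")
        if kind == "flush":                   # invalidating present data is slow
            hit = (block == addr)
            return ((None if hit else block, other), "slow" if hit else "fast")
    else:                                     # NIB address: lives in a different set
        if kind == "access":
            return ((block, addr), "fast" if other == addr else "slow")
        if kind == "flush":
            hit = (other == addr)
            return ((block, None if hit else other), "slow" if hit else "fast")
    raise ValueError(op)

def timings_for(pattern, u):
    """All final-step timings of `pattern` with the secret address u substituted,
    over every possible unknown initial cache content."""
    concrete = [(kind, u if addr == "u" else addr) for kind, addr in pattern]
    observed = set()
    for init_block in (None, "a", "aalias", "d"):
        for init_other in (None, "NIB"):
            state, timing = (init_block, init_other), None
            for op in concrete:
                state, timing = step(state, op)
            observed.add(timing)              # keep only the last step's timing
    return observed

def judge_type(pattern):
    obs = {u: timings_for(pattern, u) for u in ("a", "aalias", "NIB")}
    if any(None in t for t in obs.values()):
        return "ineffective"                  # final step reveals no timing
    if all(len(t) == 1 for t in obs.values()) and len(set(map(frozenset, obs.values()))) > 1:
        return "strong"                       # deterministic per u, and distinguishing
    if all(all(t in ts for ts in obs.values()) for t in set().union(*obs.values())):
        return "ineffective"                  # every timing fits every u
    return "weak"
```

Under these assumptions, the Flush + Reload pattern A^inv_a ⇝ V_u ⇝ A_a is classified as strong, while * ⇝ V_u ⇝ A^inv_a (Fig. 3c) comes out weak; the full simulator additionally covers whole-cache invalidations and attacker/victim attribution, which this sketch omits.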
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We have also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability and Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:
1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated; e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability (e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u); c, d Weak Vulnerability (e.g., * ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a); e, f Ineffective Three-Step (e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d). Each sub-figure maps the victim's behavior (u being a, a_alias, or NIB) to the attacker's observation (fast or slow timing)
to A_a ⇝ V_u, which is equivalent to * ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat of the type * ⇝ A_a ⇝ V_u and can be eliminated.
2. Three-step patterns with a step involving a known address a, and the same pattern with an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a versus a_alias cannot represent different attacks, and only one of the combinations needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat of the type V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.
3. For three-step patterns with the steps V_u and V^inv_u in adjacent, consecutive positions, only the latter step is kept and the former step is eliminated. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is equivalent to * ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
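A sketch of how rules 1 and 3 can be applied mechanically until a fixed point is shown below (rule 2, the a/a_alias symmetry, amounts to a renaming and is omitted; the state-name strings and helper names are our own encoding, not the authors' script):

```python
def attacker_known(state):
    """States whose full address the attacker knows: anything not involving u or '*'."""
    return "u" not in state and state != "*"

def reduce_pattern(steps):
    """Apply reduction rules 1 and 3 repeatedly, then pad with '*' on the left
    so the result is again a three-step pattern."""
    out = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(out) - 1):
            # Rule 1: adjacent repeating steps, or two adjacent attacker-known
            # steps, carry no extra information -> drop the earlier one.
            if out[i] == out[i + 1] or (attacker_known(out[i]) and attacker_known(out[i + 1])):
                del out[i]
                changed = True
                break
            # Rule 3: Vu immediately followed by Vu_inv -> keep only the invalidation.
            if out[i] == "Vu" and out[i + 1] == "Vu_inv":
                del out[i]
                changed = True
                break
    while len(out) < 3:
        out.insert(0, "*")
    return tuple(out)
```

On the two examples from the rules above, reduce_pattern(("Ad", "Aa", "Vu")) yields ("*", "Aa", "Vu"), and reduce_pattern(("Aa", "Vu", "Vu_inv")) yields ("*", "Aa", "Vu_inv").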
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks, if such exist; otherwise, we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address of the victim's access maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
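This macro-type classification can be summarized in a few lines; the following sketch (our own encoding: state strings beginning with 'A' or 'V', plus an observation tag naming why the final step's timing is distinguishing) is illustrative, not the authors' code:

```python
def macro_type(step2, step3, observation):
    """Classify a vulnerability into IM / IH / EM / EH.

    step2, step3: state strings, e.g. "Vu", "Aa", "Va_inv"; a leading 'V' marks
    a victim operation. observation: why the final timing is distinguishing:
    'miss' or 'inv_absent' (miss-based), 'hit' or 'inv_present' (hit-based).
    """
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    based = "M" if observation in ("miss", "inv_absent") else "H"
    return interference + based
```

For example, the Prime + Probe pattern A_d ⇝ V_u ⇝ A_d with a distinguishing miss classifies as EM, and the Cache Internal Collision pattern with a distinguishing hit in victim step 3 classifies as IH, matching the Macro Type columns of Tables 2 and 3.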
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the internal collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type columns give the three steps that define each vulnerability. For Step 3, fast indicates that a cache hit must be observed to derive sensitive address information, while slow indicates that a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities; "I" is for internal interference vulnerabilities; "M" is for miss-based vulnerabilities; "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in the literature
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A^inv_a | V_u | V_a (fast) | IH | (2)
 | V^inv_a | V_u | V_a (fast) | IH | (2)
Flush + Reload | A^inv_a | V_u | A_a (fast) | EH | (5)
 | V^inv_a | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V^inv_u | A_a | V_u (fast) | EH | new
 | V^inv_u | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V^inv_u | A_a (slow) | EM | (6)
 | A_a | V^inv_u | V_a (slow) | IM | new
 | V_a | V^inv_u | A_a (slow) | EM | new
 | V_a | V^inv_u | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A^inv_a | V_u (slow) | EM | new
 | V_u | V^inv_a | V_u (slow) | IM | new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates that no data at the corresponding address is invalidated, while slow indicates that the invalidation operation makes some data invalid, causing a longer processing time
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V^inv_a (slow) | IH | new
 | V^inv | V_u | V^inv_a (slow) | IH | new
 | A_d | V_u | V^inv_a (slow) | IH | new
 | V_d | V_u | V^inv_a (slow) | IH | new
 | A_aalias | V_u | V^inv_a (slow) | IH | new
 | V_aalias | V_u | V^inv_a (slow) | IH | new
Flush + Flush | A^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | V^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | A^inv_a | V_u | A^inv_a (slow) | EH | (1)
 | V^inv_a | V_u | A^inv_a (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A^inv_a (slow) | EH | new
 | V^inv | V_u | A^inv_a (slow) | EH | new
 | A_d | V_u | A^inv_a (slow) | EH | new
 | V_d | V_u | A^inv_a (slow) | EH | new
 | A_aalias | V_u | A^inv_a (slow) | EH | new
 | V_aalias | V_u | A^inv_a (slow) | EH | new
Reload + Time Invalidation | V^inv_u | A_a | V^inv_u (slow) | EH | new
 | V^inv_u | V_a | V^inv_u (slow) | IH | new
Flush + Probe Invalidation | A_a | V^inv_u | A^inv_a (fast) | EM | new
 | A_a | V^inv_u | V^inv_a (fast) | IM | new
 | V_a | V^inv_u | A^inv_a (fast) | EM | new
 | V_a | V^inv_u | V^inv_a (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V^inv_u (fast) | EM | new
 | V_u | A_a | V^inv_u (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A^inv_d (fast) | EM | new
 | A_a | V_u | A^inv_a (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V^inv_u (fast) | IM | new
 | V_u | V_d | V^inv_u (fast) | IM | new
 | V_d | V_u | V^inv_d (fast) | IM | new
 | V_a | V_u | V^inv_a (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A^inv_d (fast) | EM | new
 | V_a | V_u | A^inv_a (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V^inv_d (fast) | IM | new
 | A_a | V_u | V^inv_a (fast) | IM | new
Flush + Time Invalidation | V_u | A^inv_a | V^inv_u (fast) | EM | new
 | V_u | V^inv_a | V^inv_u (fast) | IM | new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush, or use cache coherence to invalidate, their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to the High security level and the attacker belongs to the Low security level. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
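This access/eviction policy can be illustrated with a toy model (a minimal sketch under our own simplifying assumptions: one cache set and FIFO replacement within a partition; the class and method names are hypothetical):

```python
class SPStarCache:
    """Toy single-set model of the SP* cache: a lookup may hit in either
    partition, but a miss only fills (and evicts from) the requester's own
    statically assigned partition."""

    def __init__(self, low_ways=2, high_ways=2):
        self.part = {"Low": [None] * low_ways, "High": [None] * high_ways}

    def access(self, level, addr):
        if any(addr in ways for ways in self.part.values()):
            return "hit"                      # hits are allowed across partitions
        ways = self.part[level]               # miss: fill only the own partition
        ways.pop(0)                           # FIFO replacement within the partition
        ways.append(addr)
        return "miss"
```

In this model, Low-security accesses can never evict the victim's High-partition lines: after the victim caches an address, any amount of attacker activity still leaves it a hit, which is exactly the property the static partitioning targets.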
SecVerilog Cache [56, 57] statically partitions the cache blocks between the security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3 Two existing papers give slightly different definitions for an "SP" cache; thus, we have chosen to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
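The Low-miss behavior can be sketched as follows (a toy single-set model under our own assumptions, not the SecVerilog hardware itself; the class name is hypothetical):

```python
class SecVerilogCacheModel:
    """Toy model: statically partitioned L/H blocks. A High access may hit in
    either partition. A Low access whose data sits in the High partition still
    reports a miss, and the line migrates H -> L to preserve consistency."""

    def __init__(self, ways_per_partition=2):
        self.part = {"L": [None] * ways_per_partition,
                     "H": [None] * ways_per_partition}

    def _fill(self, level, addr):
        self.part[level].pop(0)               # FIFO replacement inside the partition
        self.part[level].append(addr)

    def access(self, level, addr):
        if addr in self.part[level]:
            return "hit"
        if level == "H" and addr in self.part["L"]:
            return "hit"                      # High may hit in the Low partition
        if level == "L" and addr in self.part["H"]:
            self.part["H"][self.part["H"].index(addr)] = None
            self._fill("L", addr)
            return "miss"                     # behaves as a miss; data moves H -> L
        self._fill(level, addr)
        return "miss"
```

The key point the sketch captures is that a Low observer always sees miss timing on High-resident data, so High activity cannot speed up Low operations.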
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. They use the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in the cache sets to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible for the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with the core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID); the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that belongs to the same process, a random block in the cache set is evicted. This random eviction generates an interrupt to the OS, to notify it of a suspicious activity. For pages that are read-only or executable, the SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
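The three-level eviction priority can be sketched as follows (our own encoding: each line carries an owners set of process IDs; the function name is hypothetical):

```python
import random

def sharp_choose_victim(cache_set, requester, active_processes):
    """SHARP-style eviction priority for a cache set (list of lines, each a dict
    with an 'owners' set of process IDs). Returns (way_index, alarm):
      1. prefer a line owned by no currently active process;
      2. else a line owned by the requesting process itself;
      3. else evict a random line and raise an alarm (interrupt to the OS)."""
    for i, line in enumerate(cache_set):
        if not (line["owners"] & active_processes):
            return i, False                   # dead or orphaned line: safe to evict
    for i, line in enumerate(cache_set):
        if requester in line["owners"]:
            return i, False                   # self-eviction leaks nothing new
    return random.randrange(len(cache_set)), True   # cross-process eviction: alarm
```

Only the third case, which a Prime + Probe style attacker would have to trigger repeatedly, produces the OS notification that SHARP uses to flag suspicious activity.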
Sanctum Cache [10] focuses on the isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and the enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions, as follows:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution, but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding state is flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache lists extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control-flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model is checked, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different classes of service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
J Hardw Syst Secur (2019) 3:397–425
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is handled by assigning secure pages to only one processor and not sharing pages among VMs. This cache lists extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
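The hit and fill masking described above can be sketched as follows. The field names policy_hitmap and policy_fillmap follow the paper's terminology, but the data layout (per-way `(tag, domain)` tuples, bitmask dictionaries) is an assumption of this sketch.

```python
def dawg_lookup(ways, req_tag, domain_id, policy_hitmap):
    """ways: list of (tag, domain) per cache way. A hit requires the way to
    be enabled in the requesting domain's hitmap AND both tag and domain
    ID to match. Returns the hit way index or None."""
    for w, (tag, dom) in enumerate(ways):
        if policy_hitmap[domain_id] & (1 << w) and tag == req_tag and dom == domain_id:
            return w
    return None

def dawg_victim_ways(domain_id, policy_fillmap, num_ways):
    """On a miss, the replacement victim may only come from the ways in the
    requesting domain's own fill partition."""
    return [w for w in range(num_ways) if policy_fillmap[domain_id] & (1 << w)]
```

Because the domain ID participates in the hit check, the same read-only line can safely exist in two domains' ways without creating a cross-domain hit channel.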
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC does not protect critical data that is writable and not private, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
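The relaxed-inclusion decision above amounts to a single bit checked on LLC eviction. A minimal sketch, with the dict-based line encoding being an assumption of this illustration:

```python
def ric_fill(addr, read_only, thread_private):
    """On an LLC fill, set the relaxed-inclusion bit for the two data
    classes the paper identifies (read-only and thread-private data)."""
    return {'addr': addr, 'relaxed': bool(read_only or thread_private)}

def ric_needs_back_invalidation(line):
    """On an LLC eviction, only non-relaxed lines trigger invalidation of
    the copy in the higher-level (e.g., L1) cache."""
    return not line['relaxed']
```

Skipping the back-invalidation for relaxed lines is exactly what removes the attacker's ability to evict the victim's L1 line through LLC conflicts.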
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache lists an extra software assumption as follows:
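The miss-handling rules above can be sketched as a victim-selection function. This is an illustrative encoding (dicts with 'pid' and 'locked' fields), not the paper's hardware description:

```python
def pl_handle_miss(cache_set, req_pid, req_locked):
    """Return the index of the line to evict, or None if the new data must
    be served without caching (PL cache's fallback for locked conflicts)."""
    # Unlocked lines are normal eviction candidates for any request.
    for i, line in enumerate(cache_set):
        if not line['locked']:
            return i
    # All lines locked: unlocked data may never evict locked data, and
    # locked data of different processes may not evict each other. Only a
    # locking request may replace its own process's locked line.
    if req_locked:
        for i, line in enumerate(cache_set):
            if line['pid'] == req_pid:
                return i
    return None  # serve the access without caching
```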
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of the RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
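The three miss-handling cases above can be sketched as follows; the dict encoding of lines and the string labels for the outcomes are assumptions of this illustration, not the paper's notation:

```python
import random

def rp_handle_miss(evict_line, new_line, num_sets):
    """Decide RP cache behavior on a miss. Lines are dicts with a process
    ID 'pid' and a protection bit 'P'. Returns (outcome, random_set)."""
    if evict_line['pid'] == new_line['pid']:
        if evict_line['P'] != new_line['P']:
            # Same process, different protection: evict arbitrary data from
            # a random set S' and serve the access without caching.
            return ('random_evict_no_cache', random.randrange(num_sets))
        # Same process, same protection: normal replacement policy.
        return ('normal', None)
    # Different processes: store D in an evicted block of a random set S'
    # and swap the set mappings of S and S'.
    return ('swap_sets', random.randrange(num_sets))
```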
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of the RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D: if the evicted data and the brought-in data belong to the same process but either one's protection bit is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
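The two-level lookup described above (direct-mapped into the RMT, fully associative into the actual cache, with the process ID checked on a hit) can be sketched as follows; the dictionary-based structures are assumptions of this illustration:

```python
def newcache_lookup(addr_index, pid, rmt, cache_lines):
    """rmt: maps (pid, address index bits) -> real cache index, as in a
    direct-mapped lookup per process. cache_lines: maps real index ->
    (pid, payload). Returns the payload on a hit, None on a miss."""
    real = rmt.get((pid, addr_index))
    if real is None:
        return None  # no RMT entry: miss
    entry = cache_lines.get(real)
    # A hit additionally requires the stored process ID to match.
    if entry is not None and entry[0] == pid:
        return entry[1]
    return None
```

Because the RMT entry, not the address itself, selects the physical line, remapping an RMT entry changes where an address lives without the attacker being able to predict it.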
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
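The three request types above can be sketched as a single dispatch function; the string labels for the request types and the inclusive address window are assumptions of this illustration:

```python
import random

def random_fill_access(req_type, addr, window):
    """Return the address actually brought into the cache, or None.

    'nofill': serve security-critical data without caching.
    'random': cache an arbitrary neighbor from the configured window
              instead of the requested address.
    'normal': cache the requested address (normal replacement policy)."""
    if req_type == 'nofill':
        return None
    if req_type == 'random':
        lo, hi = window
        return random.randint(lo, hi)  # inclusive range of nearby addresses
    return addr
```

The key property is that what ends up cached on a 'random' request is statistically independent of the requested address, so observing the fill tells the attacker nothing about the secret-dependent access.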
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. The CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
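A minimal stand-in for the keyed set-index computation is sketched below. The toy multiply-and-xor mixer is an assumption substituting for the actual LLBC cipher; it only illustrates that the set mapping is a deterministic function of (address, key), so changing the key remaps every address:

```python
def ceaser_set_index(addr, key, num_sets):
    """Toy keyed mixer standing in for LLBC encryption: deterministic for
    a fixed key, and the resulting set changes when the key changes."""
    mixed = (addr * 2654435761 ^ key) & 0xFFFFFFFF  # Knuth-style mixing, not LLBC
    return mixed % num_sets
```

Periodic re-keying in CEASER then amounts to picking a fresh key and gradually relocating lines, which invalidates any eviction set the attacker has built under the old mapping.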
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As a hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security domain identification.
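The per-way keyed index derivation can be sketched as follows. Using BLAKE2b as the keyed hash is an assumption of this sketch; the real design uses a hardware-friendly primitive, but the structure (domain key plus way number feeding the index for each way) is the same idea:

```python
import hashlib

def scatter_way_indices(addr, domain_key, num_ways, set_bits):
    """Derive one set index per cache way from (address, way, domain key),
    as in a skewed-associative cache with a keyed mapping per domain."""
    indices = []
    for way in range(num_ways):
        h = hashlib.blake2b(addr.to_bytes(8, 'little') + way.to_bytes(1, 'little'),
                            key=domain_key, digest_size=4)
        indices.append(int.from_bytes(h.digest(), 'little') % (1 << set_bits))
    return indices
```

Because each security domain holds a different key, the same physical address occupies a different, attacker-unpredictable combination of sets in each domain, which is what breaks eviction-set construction across domains.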
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache access time from the address being accessed. However, the performance degradation is tremendous.
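A toy model of the counter-driven invalidation is sketched below. The class layout, and the choice to reset the counter on a touch, are assumptions of this illustration; the randomized initial value and threshold-triggered invalidation follow the description above:

```python
import random

class NDLine:
    """One cache line of a non-deterministic cache: invalidated after a
    randomized idle interval measured in global counter ticks."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.counter = random.randrange(threshold)  # randomized starting point
        self.valid = True

    def tick(self, touched):
        if touched:
            self.counter = 0  # assumed: activity restarts the interval
            return
        self.counter += 1
        if self.counter >= self.threshold:
            self.valid = False  # line lifetime ends at a random point
```

Since each line's lifetime starts from a random counter value, whether a later access hits or misses no longer depends deterministically on the address history, which is what destroys the timing channel (at a large performance cost).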
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, the SP* cache can defend against the Vu → Ad → Vu (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, the SecDCP cache cannot defend against the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd → Vu → Aa (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because the RP cache does not allow data of different processes to produce cache hits on each other's data.

2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer carries the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad → Vu → Ad^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not establish that u and d share the same index, since in the Random Fill cache u would be accessed without caching and another, random piece of data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when the PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad → Vu → Vd^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in the PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as the InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body not shown.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets the shared cache
h The technique only targets the inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body not shown.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets the shared cache
h The technique only targets the inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label the range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and thereby prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing the PL cache with the SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in the PL cache, so that Vu → Va^inv → Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, the Random Fill cache cannot prevent Vu → Ad → Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only be able to access a limited set of cache blocks. For example, there may be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or the out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and to inherently degrade system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part, and it does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition by means of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
The SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that the SP* cache cannot prevent. The SecVerilog cache is similar to the SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv → Vu → Aa (fast) (one type of Flush + Reload [55]). The SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]). The DAWG cache only allows data to produce a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as the SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → Ad^inv (fast) (one type of Prime + Flush).

The SecDCP and NoMo caches both leverage dynamic partitioning to improve performance. Compared with the SecVerilog cache, the SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu → Aa^inv → Vu (slow) vulnerability (one type of Flush + Time). The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be placed in the unreserved ways and eviction by the attacker becomes possible. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and the CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system, it so far supports preventing all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and with some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) vulnerability (one type of Evict + Probe) are prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculation or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative execution or out-of-order load or store and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information in the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data and allows vulnerabilities such as V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches as listed by the designers in their respective papers. At the extreme end there is
J Hardw Syst Secur (2019) 3:397–425
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While its paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, the identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
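The random-eviction suggestion above can be illustrated with a toy Monte Carlo sketch. This is hypothetical and not Newcache's actual replacement policy: with deterministic replacement the attacker's probe observation is certain, while random eviction makes it noisy.

```python
import random

def prime_victim_probe(replacement, rng, n_lines=4, trials=200):
    """Toy fully-associative cache. The attacker primes all lines, the
    victim inserts one secret line, and the attacker probes a single
    primed line; returns the observed probe miss rate."""
    misses = 0
    for _ in range(trials):
        cache = [('A', i) for i in range(n_lines)]      # Step 1: prime
        slot = 0 if replacement == 'deterministic' else rng.randrange(n_lines)
        cache[slot] = ('V', 'secret')                   # Step 2: victim access
        if ('A', 0) not in cache:                       # Step 3: probe one line
            misses += 1
    return misses / trials

rng = random.Random(0)
# Deterministic eviction: the probed line is always the one evicted.
assert prime_victim_probe('deterministic', rng) == 1.0
# Random eviction: the probe result is no longer a reliable signal.
p = prime_victim_probe('random', rng)
assert 0.0 < p < 1.0
```

The deterministic case gives the attacker one bit of certain information per trial; under random eviction the same observation only reveals a probability, which is the de-correlation effect described above.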
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
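The three steps above can be sketched on a single toy cache line. This is an illustration of the strategy only; a real attack infers hit or miss from measured access latency rather than querying the state directly.

```python
class ToyCacheLine:
    """One cache line whose state is observed as 'hit' (fast) or 'miss' (slow)."""
    def __init__(self):
        self.tag = None
    def flush(self):
        self.tag = None
    def access(self, addr):
        hit = (self.tag == addr)
        self.tag = addr
        return 'hit' if hit else 'miss'

def flush_reload(victim_addr, probe_addr):
    line = ToyCacheLine()
    line.flush()                    # Step 1: invalidate the line
    line.access(victim_addr)        # Step 2: victim touches its (secret) address
    return line.access(probe_addr)  # Step 3: attacker reloads a known address

# A hit in Step 3 tells the attacker the victim used probe_addr.
assert flush_reload(victim_addr=0x1000, probe_addr=0x1000) == 'hit'
assert flush_reload(victim_addr=0x2000, probe_addr=0x1000) == 'miss'
```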
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2: in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
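As a sketch, the three steps can be played out on a toy direct-mapped cache. The encoding is illustrative only; a real attacker measures probe latency instead of querying hit or miss directly.

```python
class ToySetCache:
    """Direct-mapped toy cache with n_sets sets."""
    def __init__(self, n_sets=4):
        self.n_sets = n_sets
        self.sets = [None] * n_sets
    def access(self, owner, addr):
        idx = addr % self.n_sets
        hit = (self.sets[idx] == (owner, addr))
        self.sets[idx] = (owner, addr)
        return hit

def prime_probe(secret_addr, n_sets=4):
    cache = ToySetCache(n_sets)
    for a in range(n_sets):
        cache.access('attacker', a)                 # Step 1: prime every set
    cache.access('victim', secret_addr)             # Step 2: victim's access
    return [a for a in range(n_sets)
            if not cache.access('attacker', a)]     # Step 3: probe all sets

# The probe miss reveals the cache set the secret address maps to.
assert prime_probe(secret_addr=6, n_sets=4) == [2]  # 6 mod 4 == 2
```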
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, it tells the attacker that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access related operation.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three
steps or a three-step sub-pattern can be found in the longerrepresentation
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., A^inv/V^inv, ...}, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the A^inv ⇝ V_u ⇝ A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (an A^inv/V^inv in the last step will be deleted).
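Subdivision Rule 1 can be sketched as a simple splitting pass. The list-of-strings step encoding below is invented for illustration; '*' stands for the ⋆ step.

```python
def subdivide(pattern, star='*'):
    """Split a long step pattern at every star step: the star becomes
    Step 1 of the next sub-pattern, and a trailing star is dropped."""
    parts, current = [], []
    for step in pattern:
        if step == star:
            if current:
                parts.append(current)
            current = [star]             # star starts the next sub-pattern
        else:
            current.append(step)
    if current and current != [star]:    # a star in the last step is deleted
        parts.append(current)
    return parts

assert subdivide(['A_a', 'V_u', '*', 'V_u', 'A_a']) == \
       [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
assert subdivide(['A_a', 'V_u', '*']) == [['A_a', 'V_u']]
```

Each resulting sub-pattern is then short enough for the simplification rules of the next subsection to be applied independently.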
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some pairs of adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column of Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps that are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps (columns: First, Second, Commute Rule 1, Commute Rule 2, Union Rule or Reduce Rule, Combined Step)
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two known adjacent memory access-related operations (known_access operations) always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
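A loose sketch of the three Reduction Rules is given below. The (kind, address) step encoding is invented and deliberately simplified; the complete per-pair conditions are those of Table 7.

```python
def reduce_adjacent(pattern):
    """Collapse two adjacent steps into the second one when:
    Rule 1: both are operations on the unknown address u;
    Rule 2: both are accesses to known addresses;
    Rule 3: one is a known access and the other a known invalidation
            of the same address."""
    out = []
    for step in pattern:
        if out:
            pkind, paddr = out[-1]
            kind, addr = step
            rule1 = (paddr == 'u' and addr == 'u')
            rule2 = (pkind == kind == 'acc' and 'u' not in (paddr, addr))
            rule3 = (paddr == addr != 'u' and pkind != kind)
            if rule1 or rule2 or rule3:
                out[-1] = step       # keep only the second step
                continue
        out.append(step)
    return out

seq = [('acc', 'a'), ('acc', 'd'), ('inv', 'd'), ('acc', 'u'), ('acc', 'u')]
assert reduce_adjacent(seq) == [('inv', 'd'), ('acc', 'u')]
```

Repeatedly interleaving such a pass with the Commute and Union Rules shortens a long sequence by at least one step per round, which is the mechanism the final-check rules below rely on.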
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion through these three categories of rules should always be applied, reducing at least one step each time, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps in one of the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra A^inv/V^inv in the first step:

– If it is followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If it is followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where V_u ⇝ A_d^inv/V_d^inv can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, which requires the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv} ⇝ V_u
– {A_a/V_a/A_aalias/V_aalias} ⇝ V_u^inv
– V_u ⇝ {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv}
– V_u^inv ⇝ {A_a/V_a/A_aalias/V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern;
pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: vulnerability: reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerability = Ø
2: while contain(pattern, ⋆) and the ⋆ index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if is_ineffective(pattern) or has_interval_effective_three_steps(pattern) then
13:    vulnerability += effective_patterns(pattern)
14:  end if
15: end while
16: return vulnerability
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH computer architecture news 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
2.1 Threat Model
This work focuses only on timing-based attacks in processor caches. Numerous other types of side and covert channels that do not use timing or caches exist, e.g., power-based [9], EM-based [2] (including RF), thermal-based [33], and in-processor channels based on features such as the power state of the AVX unit [40], for example. This work aims to explore main cache attacks only, but a similar approach could be applied to other buffers or cache-like structures, which may become targets of attack once the main processor caches are secured.
In our threat model, an attacker's objective is to retrieve the victim's secret information using timing-based channels in the processor cache. Specifically, we consider the situation where the victim accesses an address u, and the address depends on some secret information. The address u is within some set of physical memory locations x, which are known to the attacker. The goal of the attacker is to obtain the address u, or at least partial bits of it, which relate to the cache index of the address.
We assume the attacker knows some of the source code of the victim. Especially, the attacker can only learn some information¹ about the address u from the timing channels, but with knowledge of the source code, he or she can further infer the likely specific value of u and thus infer the secret he or she is trying to learn.
The attacker cannot directly access any data in the state machine of the cache logic, nor directly read the data of the victim if the two are not sharing the same address space. The attacker can, however, observe its own timing or the timing of the victim process, and the attacker knows how the timing of memory-related operations depends on the cache states.
The attacker further is able to force the victim to execute a specific function. For example, the attacker can request the victim to decrypt a specific piece of data, thus triggering the victim to execute a function that makes use of a secret key he or she wants to learn. The victim in the cache attacks can be user software code in an enclave, an operating system, or another virtual machine.
The processor microarchitecture and the operating system are assumed to be able to differentiate between the victim and the attacker in different processes by assigning different process IDs. If the victim and the attacker are in the same process, e.g., the attacker is a malicious library, they will have the same process ID. The system software (e.g., operating system or hypervisor) is responsible for properly setting up virtual memory (page tables) and assigning IDs, which may be used by the hardware to identify different threads, processes, or virtual machines. When analyzing secure cache designs, the system software is considered trusted and bug-free. The attacker is also assumed not to be able to undermine the physical implementation or change the hardware, e.g., he or she cannot influence randomness generated by any random number generators in hardware. Physical or invasive attacks are not in scope of this work. For secure cache designs which add new instructions for security-related operations, the victim process or management software is assumed to correctly use these instructions. During speculative execution, the cache state can be modified by the instructions executed speculatively, unless a processor cache architecture explicitly prevents or forbids certain speculative accesses.

¹For hit-based vulnerabilities, the attacker is able to learn the full address of the victim's sensitive data, while for miss-based vulnerabilities, the attacker usually can learn the cache index of the victim's sensitive data. For more details of these vulnerabilities' categorizations, please refer to Section 3.3.3.
2.2 Side and Covert Channels
This work focuses on both side and covert channels. Covert channels use the same methods as side channels, but the attacker controls both the sender and the receiver side of the channel. All types of side-channel attacks are equally applicable to covert channels. For brevity, we just use the term "victim" in the text to represent both the victim (for side channels) and the sender (for covert channels).
2.3 Hyperthreading Versus Timing-Slice Sharing
When hyperthreading is supported in a system, the attacker and the victim are able to run on different threads in parallel, instead of each running once every time slice (when no hyperthreading is used). Our model can be applied to both of the scenarios, since our model abstracts away how the sharing happens.
3 Modeling of the Cache Timing-Based Side-Channel Vulnerabilities
This section explains how we developed the three-step modeling approach and used it to model the behavior of the cache logic and to enumerate all the possible cache timing-based vulnerabilities.
3.1 Introduction of the Three-Step Model
We have observed that all of the existing cache timing-based attacks can be modeled with three steps of memory-related operations. Here, "memory-related operation" refers to loads, stores, or different flushes that can be done by the victim or the attacker on the same core or different cores. When the victim and the attacker are on different cores,
cache coherence will also be triggered when one of the memory-related operations is performed.
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush+Reload [55] attack, in Step 1, a cache block is flushed by the attacker. In Step 2, security-critical data is accessed by, for example, the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 will be accessed, and the time of the access will be measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
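The Flush+Reload steps above can be sketched in a few lines of Python (a minimal illustration of ours, not the paper's simulator): the cache block holds at most one address, and the unknown victim address u is concretized to each of its possible candidates.

```python
# Flush+Reload in the three-step notation: attacker flushes a, victim
# accesses u, attacker reloads a and times the access.

def flush_reload_observation(u):
    """Timing the attacker sees in Step 3, given what u maps to:
    'a', 'a_alias' (same cache index as a), or 'NIB' (not in this block)."""
    block = None                  # Step 1: attacker flushes a from the block
    if u in ("a", "a_alias"):     # Step 2: victim accesses u; it lands in
        block = u                 # this block only if it maps here
    # Step 3: attacker reloads a; a hit is fast, a miss is slow
    return "fast" if block == "a" else "slow"

# Only u == a yields a cache hit, so the observation uniquely identifies
# the victim's access.
observations = {u: flush_reload_observation(u) for u in ("a", "a_alias", "NIB")}
print(observations)  # {'a': 'fast', 'a_alias': 'slow', 'NIB': 'slow'}
```

In the terminology introduced below, this one-to-one mapping from timing to u is what makes Flush+Reload a Strong vulnerability.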
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all the 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model (State: Description)

V_u: A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or a_alias, and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no A_u in the model.

A_a or V_a: The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker (A_a) or the victim (V_a). The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x. The address a is known to the attacker.

A_aalias or V_aalias: The cache block contains a memory address a_alias. The memory location is placed in the cache block due to a memory access by the attacker (A_aalias) or the victim (V_aalias). The address a_alias is within the range x and is not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address a_alias is known to the attacker.

A_d or V_d: The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker (A_d) or the victim (V_d). The address d is not within the range x. The address d is known to the attacker.

A^inv or V^inv: The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker (A^inv) or the victim (V^inv), as a result of the cache block being invalidated, e.g., this is a cache flush of the whole cache.

A^inv_a or V^inv_a: The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_a) or the victim (V^inv_a), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.

A^inv_aalias or V^inv_aalias: The cache block state can be anything except a_alias in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_aalias) or the victim (V^inv_aalias), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a_alias to be removed from the cache block. The address a_alias is known to the attacker.

A^inv_d or V^inv_d: The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker (A^inv_d) or the victim (V^inv_d), e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.

V^inv_u: The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim (V^inv_u), as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u. Therefore, the attacker is not able to trigger this invalidation, and A^inv_u does not exist in the model.

∗: Any data, or no data, can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a, V_a, A_aalias, V_aalias, A_d, V_d; (c) A^inv, V^inv; (d) A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d; (e) V^inv_u; (f) ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each operation. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the description of the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig. 1e shows the description of the possible state V^inv_u, which is the result of the victim invalidating data at the sensitive address u, possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the description of the possible state ∗, which represents null knowledge of the address in this corresponding cache block for the attacker. Therefore, it can possibly refer to any address in the memory, or to no valid address at all.
Figure 1b shows the description of the possible states A_a, V_a, A_aalias, V_aalias, A_d, V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and a_alias are within the sensitive set of addresses x, and a_alias, as its name indicates, is a different address than a, but still within set x, and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Figure 1d shows the description of the possible states
A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Figure 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block, e.g., a, a_alias, d, and u (if it mapped to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones can indicate a real attack. As is shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as the input to the reduction rules, to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case that u does not have the same index as a or a_alias and thus does not map to this cache block.

The cache three-step simulator is implemented in a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. Outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all possible combinations (4913) of the three-step pattern. For each step
[Figure 2: the exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator, which classifies them into 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 Ineffective three-step patterns; the reduction rules then reduce these to the final 72 Strong and 64 Weak vulnerability types.]

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to be one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this case, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ∗ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (the reason u mapping to NIB can also yield fast timing is that, due to the ∗ in Step 1, the cache block may not contain a, so the invalidation A^inv_a completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the Ineffective types

 1: for step1 in states do
 2:   for step2 in states do
 3:     for step3 in states do
 4:       steps ← (step1, step2, step3)
 5:       candidates ← array to store all possible candidate combinations of this three-step pattern
 6:       timings ← array to store all possible timing observations regarding different candidate combinations for this three-step pattern
 7:       if V_u or V^inv_u is in steps then
 8:         for each combination of candidates (V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3) do
 9:           candidates.append(concretized steps)
10:         end for
11:         for c in candidates do
12:           timings.append(output_timing(c))
13:         end for
14:         if judge_type(candidates, timings) indicates unambiguous observations then
15:           strong.append(steps)
16:         else
17:           if timing variation is observable then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
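The simulator's core idea can be sketched in a simplified, hypothetical re-implementation (ours, not the authors' script): concretize the unknown u into its three candidates, derive the timing of the last step, and label the pattern. The ambiguity introduced by the ∗ state (which produces Weak types) is deliberately omitted here, so this sketch only separates Strong from Ineffective patterns.

```python
# States are encoded as strings: 'A_a'/'V_d'/... are accesses, 'A_inv'/'V_inv'
# flush the whole block, 'A_inv_a'/... invalidate one specific address.

CANDIDATES = ("a", "a_alias", "NIB")

def output_timing(steps, u):
    """Timing ('fast'/'slow') of the final step for one candidate of u."""
    block, timing = None, None
    for s in steps:
        s = s.replace("u", u)                  # concretize V_u / V_inv_u
        if s.endswith("_inv"):                 # whole-block flush
            timing, block = ("fast" if block is None else "slow"), None
        elif "_inv_" in s:                     # targeted invalidation
            addr = s.split("_inv_")[1]
            timing = "slow" if block == addr else "fast"
            block = None if block == addr else block
        else:                                  # memory access
            addr = s.split("_", 1)[1]
            timing = "fast" if block == addr else "slow"
            if addr != "NIB":                  # NIB maps to a different block
                block = addr
    return timing

def judge_type(steps):
    obs = {u: output_timing(steps, u) for u in CANDIDATES}
    if len(set(obs.values())) == 1:
        return "Ineffective"                   # same timing for every u
    return "Strong"                            # a timing pins down u

print(judge_type(("V_d", "V_u", "A_a")))   # Strong      (cf. Fig. 3a)
print(judge_type(("A_a", "V_u", "A_d")))   # Ineffective (cf. Fig. 3f)
```

The full simulator additionally tracks the uncertainty of the ∗ state, which is what distinguishes Weak patterns from Strong ones.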
3.3.2 Reduction Rules
We have also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
[Figure 3 shows, for six example three-step patterns, the mapping from the victim's behavior (u ∈ {a, a_alias, NIB}) to the attacker's observation (fast or slow): (a) V_d ⇝ V_u ⇝ A_a, (b) V_u ⇝ A_d ⇝ V^inv_u, (c) ∗ ⇝ V_u ⇝ A^inv_a, (d) A^inv_aalias ⇝ V^inv_u ⇝ V_a, (e) V_d ⇝ V^inv_u ⇝ V_d, (f) A_a ⇝ V_u ⇝ A_d.]

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to A_a ⇝ V_u, which is equivalent to ∗ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ∗ ⇝ A_a ⇝ V_u and can be eliminated.
2. Three-step patterns with a step involving a known address a and a step involving an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V^inv_u in adjacent, consecutive positions will keep only the latter step and eliminate the former. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is equivalent to ∗ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
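Rule 2 above, for instance, amounts to canonicalizing patterns before deduplicating them. The following sketch (our own encoding, not the authors' script) illustrates the idea; a step is represented as a tuple (actor, op, addr), e.g., ('A', 'acc', 'a') for A_a.

```python
def canonical(pattern):
    """Rule 2: if a pattern mentions a_alias but never a itself, the alias
    carries the same information as a, so rename a_alias to a."""
    addrs = {addr for (_actor, _op, addr) in pattern}
    if "a_alias" in addrs and "a" not in addrs:
        pattern = tuple((actor, op, "a" if addr == "a_alias" else addr)
                        for (actor, op, addr) in pattern)
    return pattern

def reduce_patterns(patterns):
    """Deduplicate a list of three-step patterns under the canonical form."""
    seen, kept = set(), []
    for p in patterns:
        c = canonical(p)
        if c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

p1 = (("V", "acc", "u"), ("A", "acc", "a"), ("V", "acc", "u"))
p2 = (("V", "acc", "u"), ("A", "acc", "a_alias"), ("V", "acc", "u"))
print(len(reduce_patterns([p1, p2])))  # 1: p2 is a repeat type of p1
```

Rules 1 and 3 can be handled analogously, by rewriting a pattern to a canonical representative before the set-membership test.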
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
²Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Ainv Vu Va (fast) IH (2)
V inv Vu Va (fast) IH (2)
Ad Vu Va (fast) IH (2)
Vd Vu Va (fast) IH (2)
Aaalias Vu Va (fast) IH (2)
Vaalias Vu Va (fast) IH (2)
Ainva Vu Va (fast) IH (2)
V inva Vu Va (fast) IH (2)
Flush + Reload Ainva Vu Aa (fast) EH (5)
V inva Vu Aa (fast) EH (5)
Ainv Vu Aa (fast) EH (5)
V inv Vu Aa (fast) EH (5)
Ad Vu Aa (fast) EH (5)
Vd Vu Aa (fast) EH (5)
Aaalias Vu Aa (fast) EH (5)
Vaalias Vu Aa (fast) EH (5)
Reload + Time V invu Aa Vu (fast) EH new
V invu Va Vu (fast) IH new
Flush + Probe Aa V invu Aa (slow) EM (6)
Aa V invu Va (slow) IM new
Va V invu Aa (slow) EM new
Va V invu Va (slow) IM new
Evict + Time Vu Ad Vu (slow) EM (1)
Vu Aa Vu (slow) EM (1)
Prime + Probe Ad Vu Ad (slow) EM (4)
Aa Vu Aa (slow) EM (4)
Bernstein's Attack Vu Va Vu (slow) IM (3)
Vu Vd Vu (slow) IM (3)
Vd Vu Vd (slow) IM (3)
Va Vu Va (slow) IM (3)
Evict + Probe Vd Vu Ad (slow) EM new
Va Vu Aa (slow) EM new
Prime + Time Ad Vu Vd (slow) IM new
Aa Vu Va (slow) IM new
Flush + Time Vu Ainva Vu (slow) EM new
Vu V inva Vu (slow) IM new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
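The hit-based rows of Table 2 can be made concrete with a toy model. The sketch below (a single-set cache modeled as a Python set; all names are ours) walks through the Flush + Reload pattern Ainv, Vu, Aa (fast): the attacker flushes a shared address, lets the victim run, and reloads, learning from a fast reload that the victim touched that address.

```python
# Toy single-set cache model (for illustration only, not any real design)
# showing the Flush + Reload pattern A^inv -> V_u -> A_a (fast) of Table 2.
cache = set()

def access(addr):
    """Access addr; return the timing the accessor would observe."""
    hit = addr in cache
    cache.add(addr)                  # a miss brings the line into the cache
    return "fast" if hit else "slow"

def flush(addr):
    cache.discard(addr)              # invalidate, e.g., via clflush

def flush_reload(secret_u, probe_a):
    flush(probe_a)                   # Step 1: A^inv, attacker flushes a
    access(secret_u)                 # Step 2: V_u, victim touches secret u
    return access(probe_a)           # Step 3: A_a, fast iff u == a

print(flush_reload(0x40, 0x40))  # fast -> attacker infers u == a
print(flush_reload(0x80, 0x40))  # slow -> attacker infers u != a
```

The secure caches analyzed later break this pattern at one of the three steps, e.g., by denying cross-process hits in Step 3 or by forbidding the flush in Step 1.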
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Invalidation Ainv Vu V inva (slow) IH new
V inv Vu V inva (slow) IH new
Ad Vu V inva (slow) IH new
Vd Vu V inva (slow) IH new
Aaalias Vu V inva (slow) IH new
Vaalias Vu V inva (slow) IH new
Flush + Flush Ainva Vu V inva (slow) IH (1)
V inva Vu V inva (slow) IH (1)
Ainva Vu Ainva (slow) EH (1)
V inva Vu Ainva (slow) EH (1)
Flush + Reload Invalidation Ainv Vu Ainva (slow) EH new
V inv Vu Ainva (slow) EH new
Ad Vu Ainva (slow) EH new
Vd Vu Ainva (slow) EH new
Aaalias Vu Ainva (slow) EH new
Vaalias Vu Ainva (slow) EH new
Reload + Time Invalidation V invu Aa V invu (slow) EH new
V invu Va V invu (slow) IH new
Flush + Probe Invalidation Aa V invu Ainva (fast) EM new
Aa V invu V inva (fast) IM new
Va V invu Ainva (fast) EM new
Va V invu V inva (fast) IM new
Evict + Time Invalidation Vu Ad V invu (fast) EM new
Vu Aa V invu (fast) EM new
Prime + Probe Invalidation Ad Vu Ainvd (fast) EM new
Aa Vu Ainva (fast) EM new
Bernstein's Invalidation Attack Vu Va V invu (fast) IM new
Vu Vd V invu (fast) IM new
Vd Vu V invd (fast) IM new
Va Vu V inva (fast) IM new
Evict + Probe Invalidation Vd Vu Ainvd (fast) EM new
Va Vu Ainva (fast) EM new
Prime + Time Invalidation Ad Vu V invd (fast) IM new
Aa Vu V inva (fast) IM new
Flush + Time Invalidation Vu Ainva V invu (fast) EM new
Vu V inva V invu (fast) IM new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that was reduced (increased) when L's partition size was increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The value of Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated per cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
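SHARP's three-tier replacement priority can be sketched as follows (function and field names are ours; the real design works on hardware CVB bitmaps): prefer lines present in no core's private cache, then lines owned by the requester, and only as a last resort evict a random line and alert the OS.

```python
# Sketch of SHARP-style victim selection (names are our assumptions).
import random

def sharp_victim(lines, requester):
    """lines: list of (cvb_bitmap, tag) per way; requester: core id.
    Returns (victim_way, alarm) where alarm=True triggers an OS interrupt."""
    # Tier 1: a line present in no core's private cache.
    unused = [i for i, (cvb, _) in enumerate(lines) if cvb == 0]
    if unused:
        return unused[0], False
    # Tier 2: a line the requesting core itself owns.
    own = [i for i, (cvb, _) in enumerate(lines) if cvb & (1 << requester)]
    if own:
        return own[0], False
    # Tier 3: random eviction, flagged as suspicious to the OS.
    return random.randrange(len(lines)), True

victim, alarm = sharp_victim([(0b10, 1), (0b01, 2)], requester=0)
print(victim, alarm)  # 1 False: core 0 owns line 1, no alarm raised
```

Tier 3 is what makes conflict-based Prime + Probe unreliable: the attacker cannot deterministically evict the victim's lines, and trying repeatedly raises alarms.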
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to a Trusted Software Module in other designs) from each other and the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During
normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world; since such interactions are rare, the performance impact is small. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page, and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after the flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, masks fills and selects the victim way to replace; another, called policy_hitmap, masks hit ways. Only when both the tag and the domain id match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen from the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
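The core of DAWG's isolation can be sketched in a toy model of one cache set (class and field names are ours, loosely following the paper's policy_fillmap terminology): a hit requires both tag and domain id to match, and a fill may only evict within the requester's own ways.

```python
# Hedged sketch of DAWG-style way masking for a single cache set.
import random

class DawgSet:
    def __init__(self, n_ways, fillmap):
        self.ways = [None] * n_ways   # each way holds (domain_id, tag)
        self.fillmap = fillmap        # domain_id -> list of allowed ways

    def access(self, domain_id, tag):
        for line in self.ways:
            if line == (domain_id, tag):   # hit needs tag AND domain match
                return "hit"
        # Miss: the victim way is drawn only from this domain's ways.
        victim = random.choice(self.fillmap[domain_id])
        self.ways[victim] = (domain_id, tag)
        return "miss"

s = DawgSet(4, {0: [0, 1], 1: [2, 3]})
s.access(0, 0xAB)            # domain 0 fills the line into its own ways
print(s.access(1, 0xAB))     # miss: same tag, different domain -> no hit
```

Note how the same tag is duplicated per domain rather than shared: this is why Flush + Reload style hits across domains, and Prime + Probe style evictions across domains, are both closed off.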
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an
inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R in the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data will have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:
Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
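PL cache's fill rule can be sketched as follows (function and field names are ours): a line is evictable only if it is unlocked or belongs to the requesting process; if no evictable line exists, the new data is served to the core without being cached.

```python
# Toy PL-cache fill sketch (structure is our assumption, not the RTL).
def pl_fill(set_lines, pid, addr, lock):
    """set_lines: list of dicts with 'pid', 'addr', 'locked' per way.
    Returns 'cached' if a way was (re)filled, 'uncached' otherwise."""
    for line in set_lines:
        # Locked lines of other processes may never be evicted.
        if not line["locked"] or line["pid"] == pid:
            line.update(pid=pid, addr=addr, locked=lock)
            return "cached"
    return "uncached"   # data is used by the core but never placed in cache

pl_set = [{"pid": 1, "addr": 0xA, "locked": True}]
print(pl_fill(pl_set, pid=2, addr=0xB, lock=False))  # uncached
```

Combined with Assumption 1 (preloading), this is what removes Step 1 of vulnerabilities such as Prime + Time Invalidation: the attacker's Ad simply cannot displace the locked victim line.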
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. Each block of RP cache has a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request is used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush
instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of this).
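The essential de-correlation step can be sketched in a few lines (the window size and exclusion of the demanded line itself are our assumptions for illustration): the victim's critical access is served to the core without caching its own line, while a random nearby line is filled instead.

```python
# Toy sketch of Random Fill's de-correlated fill (parameters are ours).
import random

cache = set()
WINDOW = 8   # assumed neighborhood from which the random fill is drawn

def random_fill_access(addr):
    # The demanded line is returned to the core but NOT cached (Nofill),
    # and a random neighbor within the window is filled in its place.
    offset = random.choice([o for o in range(-WINDOW, WINDOW + 1) if o])
    cache.add(addr + offset)
    return addr

random_fill_access(100)
print(100 in cache)   # False: the demanded line itself is never cached
```

An attacker probing afterwards thus learns nothing about which line the victim demanded; a cached neighbor carries no usable address correlation.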
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address
encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive such as hashing, and an index decoder for each scattered cache way, are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter. In this case, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
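The counter mechanism can be sketched as follows (tick granularity, maximum value, and names are our assumptions): each line starts from a random idle count and is invalidated once it stays untouched long enough, so repeated identical accesses yield differing hit/miss outcomes.

```python
# Sketch of Non Deterministic Cache's counter-driven invalidation.
import random

MAX_TICKS = 16
lines = {0xA: random.randrange(MAX_TICKS)}   # addr -> idle-tick counter

def tick():
    """One global counter tick: age every untouched line, invalidating
    any line whose idle counter reaches the predefined maximum."""
    for addr in list(lines):
        lines[addr] += 1
        if lines[addr] >= MAX_TICKS:
            del lines[addr]

for _ in range(MAX_TICKS):
    tick()
print(0xA in lines)   # False: the line was invalidated at a random tick
```

Because the tick at which the line disappears depends on the random initial counter, the attacker's timing observations no longer correlate with the victim's addresses, at the performance cost noted above.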
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For some other caches and vulnerabilities, the cache is not able to prevent the vulnerabilities, and this is indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits between each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and cannot retain the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ainvd (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu V invd (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to columns for PL cache in Tables 4 and 5) usually limit the victim and the attacker to only be able to access a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order loads or stores, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
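As an illustration, a way-partitioned cache set can be sketched as follows (a Python toy model with assumed names and a round-robin replacement policy, not any specific design's implementation):

```python
# Toy model of static way-partitioning in one cache set: the attacker's
# accesses can only replace lines in the attacker's own ways, so the
# victim's lines cannot be evicted externally. Names are illustrative.

class PartitionedSet:
    def __init__(self, n_ways=4, victim_ways=(0, 1)):
        self.lines = [None] * n_ways
        self.victim_ways = set(victim_ways)
        self.next = {"victim": 0, "attacker": 0}

    def _ways_of(self, domain):
        return [w for w in range(len(self.lines))
                if (w in self.victim_ways) == (domain == "victim")]

    def access(self, domain, tag):
        ways = self._ways_of(domain)
        if any(self.lines[w] == tag for w in ways):
            return "hit"
        # round-robin replacement restricted to the domain's own ways
        way = ways[self.next[domain] % len(ways)]
        self.next[domain] += 1
        self.lines[way] = tag
        return "miss"

s = PartitionedSet()
s.access("victim", "secret")
for i in range(8):                            # attacker primes aggressively...
    s.access("attacker", f"evict{i}")
assert s.access("victim", "secret") == "hit"  # ...but cannot evict the victim
```

Note that this sketch also shows the internal-interference weakness: two of the victim's own lines still contend for the victim's ways.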
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible, e.g., the V_d ⇝ V_u ⇝ V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same; therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be able to evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) vulnerability (one type of Evict + Probe) will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker, e.g., the A_d ⇝ V_u ⇝ V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
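The randomized address-to-set mapping can be sketched as a keyed hash (a Python illustration in the spirit of such designs; the hash choice, key handling, and epoch names are our assumptions, not the actual low-latency ciphers these caches use):

```python
# Sketch of randomized address-to-set mapping: the set index is derived
# from a keyed hash, so the attacker cannot tell from addresses alone
# which of them contend for a cache set. Periodically changing the key
# ("remapping") invalidates any eviction sets the attacker has built.
# SHA-256 here is a stand-in for the hardware's low-latency cipher.
import hashlib

N_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    digest = hashlib.sha256(key + addr.to_bytes(8, "little")).digest()
    return digest[0] % N_SETS

key_epoch1 = b"epoch-1-key"
key_epoch2 = b"epoch-2-key"   # after remapping, a new key is in effect

# Two addresses that collide under one key usually stop colliding
# under the next, so an attacker's learned eviction set goes stale.
a, b = 0x1000, 0x2040
same_before = set_index(a, key_epoch1) == set_index(b, key_epoch1)
same_after  = set_index(a, key_epoch2) == set_index(b, key_epoch2)
print(same_before, same_after)
```

The mapping stays deterministic within an epoch (so the cache still functions), while looking unpredictable to anyone without the key.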
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) vulnerability (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
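The random fill idea mentioned above can be sketched as follows (a Python toy model; the window size and names are illustrative assumptions, not the actual design parameters):

```python
# Sketch of the random fill idea: a demand miss on protected data does
# not fill the missed line itself; instead, a line from a random window
# around it is filled, de-correlating the victim's access pattern from
# the resulting cache state. Window size and names are illustrative.
import random

class RandomFillCache:
    def __init__(self, window=8):
        self.cached = set()
        self.window = window

    def access_protected(self, addr):
        if addr in self.cached:
            return "hit"
        # serve the data to the CPU WITHOUT caching addr itself;
        # fill a random neighbor from [addr-window, addr+window] instead
        neighbor = addr + random.randint(-self.window, self.window)
        self.cached.add(neighbor)
        return "miss"

c = RandomFillCache()
c.access_protected(100)   # miss: some line near address 100 gets cached
# repeated misses on the same secret address leave a diffuse footprint,
# so an observer cannot map cache-state changes back to the exact address
```

This also makes visible the weakness noted above: if the secret line is loaded directly (not via a random fill) in Step 1, the de-correlation is lost for that step.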
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic Cache: with random delays, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive, e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache, e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully, e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] has summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
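The three steps of this strategy can be acted out on a trivially simulated cache (a Python sketch of the logic only; the address label is a made-up example, and no real timing is measured):

```python
# The three steps of Flush + Reload on a trivially simulated cache
# (a set of cached addresses). A pedagogical sketch of the strategy's
# logic, not a real timing attack; the address label is made up.

cache = set()

def flush(addr):
    cache.discard(addr)                   # invalidation operation

def access(addr):
    hit = addr in cache
    cache.add(addr)
    return "fast" if hit else "slow"      # the timing the observer sees

SECRET = "shared_lib:0x40"                # victim's secret-dependent target

flush("shared_lib:0x40")                  # Step 1: flush the known address
access(SECRET)                            # Step 2: V_u touches its secret addr
timing = access("shared_lib:0x40")        # Step 3: A_a reloads the known addr
assert timing == "fast"                   # hit => the victim used this address
```

If the victim's secret access had touched a different address, Step 3 would observe "slow" instead, which is exactly the hit/miss distinction the attacker exploits.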
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
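The strategy's logic can be sketched on a toy direct-mapped cache (a Python illustration with assumed geometry and addresses; note the attacker learns only the secret's cache set, not its full address):

```python
# The three steps of Prime + Probe on a toy 4-set direct-mapped cache.
# Geometry and addresses are illustrative assumptions.

N_SETS = 4
lines = [None] * N_SETS                    # one line per set

def access(addr):
    s = addr % N_SETS                      # set index = low address bits
    hit = lines[s] == addr
    lines[s] = addr
    return "fast" if hit else "slow"

for a in range(N_SETS):                    # Step 1: prime every set with
    access(1000 + a * N_SETS + a)          # attacker-known addresses

secret = 7                                 # Step 2: victim's secret access
access(secret)                             # evicts the attacker's line in set 3

for a in range(N_SETS):                    # Step 3: probe; the slow set
    if access(1000 + a * N_SETS + a) == "slow":
        print("secret maps to set", a)     # prints: secret maps to set 3
```

Each priming address 1000 + 5a maps to set a, so exactly the set touched by the victim (7 mod 4 = 3) misses during the probe.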
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an A^inv or V^inv, the longer pattern can be divided into two separate patterns, where the A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
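The two subdivision rules can be sketched as a single pass over a step sequence (a Python illustration; the token names stand in for the paper's symbols, with "*" for ⋆ and "Ainv"/"Vinv" for the full-cache invalidation steps):

```python
# Sketch of Subdivision Rules 1 and 2: split a long step sequence at
# every '*' (star) and at every full-cache invalidation ('Ainv'/'Vinv'),
# with the splitter becoming Step 1 of the following sub-pattern.
# A splitter left alone as a trailing step carries no timing information
# and is deleted. Token names are illustrative stand-ins.

SPLITTERS = {"*", "Ainv", "Vinv"}

def subdivide(steps):
    patterns, current = [], []
    for s in steps:
        if s in SPLITTERS:
            if current:
                patterns.append(current)
            current = [s]          # splitter heads the next sub-pattern
        else:
            current.append(s)
    if current:
        patterns.append(current)
    # drop sub-patterns that are a lone splitter (no timing information)
    return [p for p in patterns if not (len(p) == 1 and p[0] in SPLITTERS)]

print(subdivide(["Ad", "Vu", "*", "Ainv", "Vu", "Aa"]))
# -> [['Ad', 'Vu'], ['Ainv', 'Vu', 'Aa']]
```

After subdivision, each resulting sub-pattern is short enough for the simplification rules below to be applied pairwise.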
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... yield the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in this pattern. This gives more chances to Reduce and Union steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted no matter where the two steps are located within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
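A hedged sketch of the two Commute Rules, with steps modeled as (kind, address) tuples such as ('access', 'a') or ('inv', 'd'); the specific pairs below paraphrase individual entries of Table 7 and are assumptions, not a reproduction of the full table:

```python
def can_commute(m, n, n_is_last):
    """Decide whether adjacent steps m and n may be exchanged."""
    kind_m, addr_m = m
    kind_n, addr_n = n
    # Commute Rule 1: a known access and a known invalidation that
    # refer to *different* known addresses can be exchanged anywhere.
    if ({kind_m, kind_n} == {'access', 'inv'}
            and 'u' not in (addr_m, addr_n) and addr_m != addr_n):
        return True
    # Commute Rule 2 covers further pairs, but only when the second
    # step is not the final (timing-observation) step; one example
    # (paraphrasing a Table 7 entry) is a pair of invalidations where
    # one of the addresses is the unknown u.
    if (kind_m == kind_n == 'inv' and addr_m != addr_n
            and 'u' in (addr_m, addr_n) and not n_is_last):
        return True
    return False

print(can_commute(('access', 'a'), ('inv', 'd'), True))    # Rule 1
print(can_commute(('inv', 'a'), ('inv', 'u'), False))      # Rule 2
```

The real table enumerates every ordered pair; this function only encodes two representative rows.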
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps. For each ordered pair of steps (First, Second), regardless of the attacker's access (A) or the victim's access (V), the table indicates with "yes"/"no" whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule can be applied, and gives the combined step where a rule applies (table body omitted)
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the cache block, determined by the second operation, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, namely the second step.
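The three Reduction Rules can be sketched as a single pairwise function (an illustrative sketch, not the authors' code), again with steps as (kind, address) tuples and 'u' standing for the unknown address:

```python
def reduce_pair(m, n):
    """Reduce two adjacent steps to one, or return None if no rule applies."""
    kind_m, addr_m = m
    kind_n, addr_n = n
    # Rule 1: two accesses to the unknown u target the same u, so only
    # the second access needs to be kept.
    if m == ('access', 'u') and n == ('access', 'u'):
        return n
    # Rule 2: two known access operations leave the block in a state
    # determined by the second one alone.
    if kind_m == kind_n == 'access' and 'u' not in (addr_m, addr_n):
        return n
    # Rule 3: a known access and a known invalidation of the *same*
    # address reduce to the second of the two steps.
    if ({kind_m, kind_n} == {'access', 'inv'}
            and 'u' not in (addr_m, addr_n) and addr_m == addr_n):
        return n
    return None  # no reduction rule applies

print(reduce_pair(('access', 'a'), ('access', 'aalias')))
# -> ('access', 'aalias')
```

A pair like an access to a followed by an invalidation of d returns None: those steps can only be commuted or unioned, not reduced.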
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... yield the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined under Union Rule 1: both known inv operations invalidate some known address, so they can be combined into one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β &gt; 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ non-u operation ⇝ u operation

– non-u operation ⇝ u operation ⇝ non-u operation
There might be a possible extra ∗ or Ainv/V inv before these three-step patterns, where:

• An extra ∗ in the first step will not influence the result and can be directly removed.

• If an extra Ainv/V inv is in the first step:

– If followed by a known access operation, the Ainv/V inv can be removed, because the known access puts an actual state into the cache block.

– If followed by a known inv operation or V invu, the Ainv/V inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by Vu, the worst case will be Ainv/V inv ⇝ Vu ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/V inv ⇝ Vu ⇝ Ainvd/V invd ⇝ u operation, where Vu ⇝ Ainvd/V invd can further be commuted by Commute Rule 2 so that the sequence reduces to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.

The only two adjacent steps to which none of the three categories of Rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainva/V inva/Ainvaalias/V invaalias ⇝ Vu

– Aa/Va/Aaalias/Vaalias ⇝ V invu

– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainva/V inva/Ainvaalias/V invaalias

– V invu ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β &gt; 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: vulnerabilities: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerabilities = Ø
2: while pattern contains a ∗ whose index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (pattern contains an Ainv whose index is not 0) or (pattern contains a V inv whose index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is ineffective or has interval effective three steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if pattern is ineffective or has interval effective three steps then
13:    vulnerabilities += effective_patterns(pattern)
14:  end if
15: end while
16: return vulnerabilities
two steps can either generate two-adjacent-step patterns that can be processed by the three Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β &gt; 3) pattern to a three-step pattern, thus demonstrating that the corresponding β &gt; 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks whether a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() represents a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
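The driver structure of Algorithm 2 can be sketched in Python (a hedged rendition under assumed names; the rule callbacks are placeholders, not the authors' script): pairwise rules are applied repeatedly until no rule fires or at most three steps remain.

```python
def reduce_pattern(pattern, pair_rules):
    """Apply pairwise merge rules to a fixpoint (illustrative sketch).

    pattern: list of step names; pair_rules: functions mapping two
    adjacent steps to one merged step, or None when no rule applies.
    """
    changed = True
    while changed and len(pattern) > 3:
        changed = False
        for i in range(len(pattern) - 1):
            for rule in pair_rules:
                merged = rule(pattern[i], pattern[i + 1])
                if merged is not None:
                    pattern = pattern[:i] + [merged] + pattern[i + 2:]
                    changed = True
                    break
            if changed:
                break
    return pattern

# Toy rule: merge equal adjacent steps (Reduction Rule 1 for two u
# operations is one instance of this shape).
same = lambda m, n: n if m == n else None
print(reduce_pattern(['Aa', 'Aa', 'Vu', 'Ad', 'Ad'], [same]))
# -> ['Aa', 'Vu', 'Ad']
```

The real algorithm additionally interleaves commuting passes so that reducible pairs become adjacent before the merge rules run.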
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
cache coherence will also be triggered when one of thememory-related operations is performed
The three-step model has three steps, as the name implies. In Step 1, a memory operation is performed, placing the cache in an initial state known to the attacker (e.g., a new piece of data at some address is put into the cache, or the cache block is invalidated). Then, in Step 2, a second memory operation alters the state of the cache from the initial state. Finally, in Step 3, a final memory operation is performed, and the timing of the final operation reveals some information about the relationship among the addresses from Step 1, Step 2, and Step 3.
For example, in the Flush + Reload [55] attack, in Step 1 a cache block is flushed by the attacker. In Step 2, security-critical data is accessed, for example by the victim's AES encryption operation. In Step 3, the same cache block as the one flushed in Step 1 is accessed, and the time of the access is measured by the attacker. If the victim's secret-dependent operation in Step 2 accessed the cache block, then in Step 3 there will be a cache hit, fast timing of the memory operation will be observed, and the attacker learns the victim's secret address.
To model all the timing-based attacks, we write the three steps as Step 1 ⇝ Step 2 ⇝ Step 3, which represents a sequence of steps taken by the attacker or the victim. To simplify the model, we focus on memory-related operations affecting one single cache block (also called a cache slot, cache entry, or cache line). A cache block is the smallest unit of the cache. Since all the cache blocks are updated following the same cache state machine logic, it is sufficient to consider only one cache block.
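The single-cache-block view above can be made concrete with a toy model (an illustrative sketch, not the paper's simulator) that replays the three Flush + Reload steps for two cases of the victim's secret address u:

```python
class CacheBlock:
    """One cache block: holds at most one address and reports hit/miss timing."""

    def __init__(self):
        self.tag = None                      # address currently cached, or None

    def access(self, addr):
        """Return 'fast' on a hit, 'slow' on a miss (then fill the block)."""
        timing = 'fast' if self.tag == addr else 'slow'
        self.tag = addr
        return timing

    def flush(self):
        self.tag = None                      # invalidate the block

for u in ('a', 'other'):
    blk = CacheBlock()
    blk.flush()                              # Step 1: attacker flushes the block
    blk.access(u)                            # Step 2: victim accesses secret u
    print(u, '->', blk.access('a'))          # Step 3: attacker reloads a and times it
# a -> fast
# other -> slow
```

Observing fast timing in Step 3 tells the attacker that u equals a, which is exactly the leakage the three-step notation captures.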
3.2 States of the Three-Step Model
When modeling the attacks, we propose that there are 17 possible states for a cache block. Table 1 lists all 17 possible states of the cache block for each step in our three-step model and their formal definitions. Figure 1 graphically shows, for each possible state, how the memory location maps to the cache block.
Table 1 The 17 possible states for a single cache block in our three-step model
State Description
Vu A memory location u belonging to the victim is accessed and is placed in the cache block by the victim (V). The attacker does not know u, but u is from a set x of memory locations, a set which is known to the attacker. It may have the same index as a or aalias and thus conflict with them in the cache block. The goal of the attacker is to learn the index of the address u. The attacker does not know the address u, hence there is no Au in the model.
Aa or Va The cache block contains a specific memory location a. The memory location is placed in the cache block due to a memory access by the attacker, Aa, or the victim, Va. The attacker knows the address a, independent of whether the access was by the victim or the attacker themselves. The address a is within the range of sensitive locations x and is known to the attacker.
Aaalias or Vaalias The cache block contains a memory address aalias. The memory location is placed in the cache block due to a memory access by the attacker, Aaalias, or the victim, Vaalias. The address aalias is within the range x and not the same as a, but it has the same address index and maps to the same cache block, i.e., it "aliases" to the same block. The address aalias is known to the attacker.
Ad or Vd The cache block contains a memory address d. The memory address is placed in the cache block due to a memory access by the attacker, Ad, or the victim, Vd. The address d is not within the range x. The address d is known to the attacker.
Ainv or V inv The cache block is now invalid. The data and its address are "removed" from the cache block by the attacker, Ainv, or the victim, V inv, as a result of the cache block being invalidated, e.g., by a cache flush of the whole cache.
Ainva or V inva The cache block state can be anything except a in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainva, or the victim, V inva, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force a to be removed from the cache block. The address a is known to the attacker.
Ainvaalias or V invaalias The cache block state can be anything except aalias in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainvaalias, or the victim, V invaalias, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force aalias to be removed from the cache block. The address aalias is known to the attacker.
Ainvd or V invd The cache block state can be anything except d in this cache block now. The data and its address are "removed" from the cache block by the attacker, Ainvd, or the victim, V invd, e.g., by using a flush instruction such as clflush that can flush a specific address, or by causing certain cache coherence protocol events that force d to be removed from the cache block. The address d is known to the attacker.
V invu The cache block state can be anything except u in the cache block. The data and its address are "removed" from the cache block by the victim, V invu, as a result of the cache block being invalidated, e.g., by using a flush instruction such as clflush, or by certain cache coherence protocol events that force u to be removed from the cache block. The attacker does not know u; therefore, the attacker is not able to trigger this invalidation, and Ainvu does not exist in the model.
∗ Any data or no data can be in the cache block. The attacker has no knowledge of the memory address in this cache block.
Fig. 1 The 17 possible states for a single cache block in our three-step model: a Vu; b Aa, Va, Aaalias, Vaalias, Ad, Vd; c Ainv, V inv; d Ainva, V inva, Ainvaalias, V invaalias, Ainvd, V invd; e V invu; f ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and aalias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the possible state Vu, where address u is within the sensitive set and unknown to the attacker. Therefore, it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show Vu in dashed lines. Meanwhile, Fig. 1e shows the possible state V invu, which is the result of the victim invalidating data at the sensitive address u, possibly invalidating some address within the sensitive region. Further, Fig. 1f shows the possible state ∗, which represents null knowledge by the attacker of the address in this cache block. Therefore, it can possibly refer to any address in memory, or to no valid address at all.
Figure 1b shows the possible states Aa, Va, Aaalias, Vaalias, Ad, Vd. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and aalias are within the sensitive set of addresses x, and aalias, as its name indicates, is a different address than a but is still within set x and maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig. 1d shows the possible states
Ainva, V inva, Ainvaalias, V invaalias, Ainvd, V invd, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, Ainv and V inv, are shown in Fig. 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses that mapped to this cache block before the invalidation step Ainv/V inv, e.g., a, aalias, d, and u (if it mapped to this block), will be flushed back to memory.
3.3 Derivation of All Cache Timing–Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 × 17 × 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules to remove the redundant three-step patterns and obtain the final list of vulnerabilities.
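The exhaustive enumeration can be sketched directly (state names follow Table 1, with subscripts flattened using underscores, which is our own naming convention):

```python
# 17 candidate states per step give 17^3 = 4913 raw three-step patterns.
from itertools import product

states = ['Vu', 'Aa', 'Va', 'Aaalias', 'Vaalias', 'Ad', 'Vd',
          'Ainv', 'Vinv', 'Ainv_a', 'Vinv_a', 'Ainv_aalias',
          'Vinv_aalias', 'Ainv_d', 'Vinv_d', 'Vinv_u', '*']
assert len(states) == 17

combinations = list(product(states, repeat=3))
print(len(combinations))   # 4913
```

Each of these 4913 tuples is then fed to the simulator for classification.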
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for different possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, aalias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or aalias and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. The outputs are all the vulnerabilities, classified as the Strong, Weak, or Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step patterns. For each step
[Fig. 2 flowchart: the exhaustive list of all possible three-step combinations (4913) is fed to the Cache Three-Step Simulator, whose classification step yields 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 ineffective three-step patterns; the reduction step (Reduction Rules) then produces the final 72 Strong and 64 Weak vulnerability types.]

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to one of three candidates: V_a, V_aalias, and V_NIB. If it is V^inv_u, this step will be extended to one of three candidates: V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each candidate of a u-related step, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into the three categories listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability gives strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d → V_u → A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always observe fast timing. If u is aalias or NIB, slow timing will be observed. This means the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or aalias). For these vulnerabilities, timing variation can still be observed due to different victim behavior; however, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⋆ → V_u → A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to aalias or NIB (u mapping to NIB can produce either timing because, due to the ⋆ in Step 1, the cache block may or may not already hold a). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a → V_u → A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
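The recovered text does not spell out judge_type itself, so the following is a minimal Python sketch of one plausible formalization consistent with the three categories above (the function name judge_type comes from the paper; the input representation is ours): each candidate value of u maps to the set of timings the attacker could observe, with more than one timing when the unknown initial state (⋆) matters.

```python
def judge_type(timings_per_u):
    """timings_per_u: dict mapping u in {'a', 'aalias', 'NIB'} to the set
    of possible observations, e.g. {'fast'} or {'fast', 'slow'}."""
    all_obs = set().union(*timings_per_u.values())
    if len(all_obs) == 1:
        return "ineffective"   # the observation never varies with u
    if all(len(t) == 1 for t in timings_per_u.values()):
        return "strong"        # every u yields one determinate timing
    return "weak"              # some u's timing is ambiguous

# V_d -> V_u -> A_a (Fig. 3a): fast uniquely means u = a.
print(judge_type({"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}))
# -> strong

# * -> V_u -> A_a^inv (Fig. 3c): NIB can appear as fast or slow.
print(judge_type({"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}))
# -> weak

# A_a -> V_u -> A_d (Fig. 3f): always slow, regardless of u.
print(judge_type({"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}))
# -> ineffective
```

The three calls reproduce the classifications of the Fig. 3a, 3c, and 3f examples, respectively.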
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        timings ← array to store all possible timing observations regarding the different candidate combinations of this three-step pattern
7:        if steps[0] or steps[1] or steps[2] is u-related then
8:          for each candidate combination of steps (V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3) do
9:            candidates.append(candidate combination of steps[0], steps[1], steps[2])
10:         end for
11:         for each candidate in candidates do
12:           timings.append(output_timing(candidate))
13:         end for
14:         if judge_type(candidates, timings) = Strong then
15:           strong.append(steps)
16:         else
17:           if judge_type(candidates, timings) = Weak then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, for when channels with smaller channel capacities need to be analyzed.
3.3.2 Reduction Rules
We also developed rules that further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove repeating or redundant types of vulnerabilities from the lists, to form the effective Strong Vulnerability and Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:
1. Three-step patterns with two adjacent steps that are repeating, or that are both known to the attacker, can be eliminated, e.g., A_d → A_a → V_u can be reduced
[Fig. 3 panels: each panel maps the victim's behavior (u ∈ {a, aalias, NIB}) to the attacker's observation (fast or slow) for one example pattern: (a) V_d → V_u → A_a, (b) V_u → A_d → V^inv_u, (c) ⋆ → V_u → A^inv_a, (d) A^inv_aalias → V^inv_u → V_a, (e) V_d → V^inv_u → V_d, (f) A_a → V_u → A_d.]

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to A_a → V_u, which is equivalent to ⋆ → A_a → V_u. Therefore, A_d → A_a → V_u is a repeat type of ⋆ → A_a → V_u and can be eliminated.
2. A step involving a known address a and a step involving its alias aalias give the same information. Thus, three-step combinations that only differ in the use of a versus aalias cannot represent different attacks, and only one combination needs to be considered. For example, V_u → A_aalias → V_u is a repeat type of V_u → A_a → V_u, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V^inv_u adjacent to each other keep only the latter step and eliminate the former. For example, A_a → V_u → V^inv_u can be reduced to A_a → ⋆ → V^inv_u, which is further equivalent to ⋆ → A_a → V^inv_u. So A_a → V_u → V^inv_u can be eliminated.
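The reduction rules lend themselves to small canonicalization functions over three-step tuples, after which plain deduplication removes the repeats. The sketch below is ours (the step labels are illustrative strings, and only rules 2 and 3 are shown):

```python
def rule2_canonicalize(steps):
    # Rule 2: a and aalias carry the same information, so rewrite
    # aalias-steps to their a-counterparts before deduplication.
    return tuple(s.replace("aalias", "a") for s in steps)

def rule3_drop_shadowed(steps):
    # Rule 3: V_u immediately followed by V_u^inv keeps only the latter.
    out = []
    for s in steps:
        if out and out[-1] == "V_u" and s == "V_u^inv":
            out.pop()
        out.append(s)
    return tuple(out)

print(rule2_canonicalize(("V_u", "A_aalias", "V_u")))  # ('V_u', 'A_a', 'V_u')
print(rule3_drop_shadowed(("A_a", "V_u", "V_u^inv")))  # ('A_a', 'V_u^inv')
```

Applying such canonicalizations and collecting the results in a set yields the deduplicated final list.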
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it covers all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the
vulnerability types whose last step is a memory access, and Table 3 shows all the vulnerability types whose last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such names exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where they existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The vulnerability types can be further collected into four macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states of Step 2 and Step 3 are called internal interference vulnerabilities (I); the remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address of the victim's access maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
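The macro-type rules in this paragraph are mechanical and can be encoded directly. The following sketch is our own encoding (the step labels and the function name are illustrative); it reproduces the Macro Type column entries of Tables 2 and 3 for a few sample rows.

```python
def macro_type(step2, step3, timing):
    """Classify a vulnerability per the rules above.
    Steps are labels such as 'V_u' or 'A_a^inv'; timing is the
    attacker's observation in Step 3 ('fast' or 'slow')."""
    # Internal interference: Steps 2 and 3 involve only the victim (V).
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    if "^inv" in step3:
        # Fast invalidation = data absent (miss-based);
        # slow invalidation = data valid in the cache (hit-based).
        basis = "M" if timing == "fast" else "H"
    else:
        # Slow access = cache miss (miss-based); fast = hit (hit-based).
        basis = "M" if timing == "slow" else "H"
    return interference + basis

print(macro_type("V_u", "A_a", "fast"))      # EH (a Flush + Reload row)
print(macro_type("V_a", "V_u", "slow"))      # IM (a Bernstein's Attack row)
print(macro_type("V_u", "A_d^inv", "fast"))  # EM (Prime + Probe Invalidation)
```

This makes it easy to sanity-check the IM/IH/EM/EH labels in the tables programmatically.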
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column gives the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy | Vulnerability Type (Step 1 → Step 2 → Step 3) | Macro Type | Attack

Cache Internal Collision:
  A^inv → V_u → V_a (fast) | IH | (2)
  V^inv → V_u → V_a (fast) | IH | (2)
  A_d → V_u → V_a (fast) | IH | (2)
  V_d → V_u → V_a (fast) | IH | (2)
  A_aalias → V_u → V_a (fast) | IH | (2)
  V_aalias → V_u → V_a (fast) | IH | (2)
  A^inv_a → V_u → V_a (fast) | IH | (2)
  V^inv_a → V_u → V_a (fast) | IH | (2)

Flush + Reload:
  A^inv_a → V_u → A_a (fast) | EH | (5)
  V^inv_a → V_u → A_a (fast) | EH | (5)
  A^inv → V_u → A_a (fast) | EH | (5)
  V^inv → V_u → A_a (fast) | EH | (5)
  A_d → V_u → A_a (fast) | EH | (5)
  V_d → V_u → A_a (fast) | EH | (5)
  A_aalias → V_u → A_a (fast) | EH | (5)
  V_aalias → V_u → A_a (fast) | EH | (5)

Reload + Time:
  V^inv_u → A_a → V_u (fast) | EH | new
  V^inv_u → V_a → V_u (fast) | IH | new

Flush + Probe:
  A_a → V^inv_u → A_a (slow) | EM | (6)
  A_a → V^inv_u → V_a (slow) | IM | new
  V_a → V^inv_u → A_a (slow) | EM | new
  V_a → V^inv_u → V_a (slow) | IM | new

Evict + Time:
  V_u → A_d → V_u (slow) | EM | (1)
  V_u → A_a → V_u (slow) | EM | (1)

Prime + Probe:
  A_d → V_u → A_d (slow) | EM | (4)
  A_a → V_u → A_a (slow) | EM | (4)

Bernstein's Attack:
  V_u → V_a → V_u (slow) | IM | (3)
  V_u → V_d → V_u (slow) | IM | (3)
  V_d → V_u → V_d (slow) | IM | (3)
  V_a → V_u → V_a (slow) | IM | (3)

Evict + Probe:
  V_d → V_u → A_d (slow) | EM | new
  V_a → V_u → A_a (slow) | EM | new

Prime + Time:
  A_d → V_u → V_d (slow) | IM | new
  A_a → V_u → V_a (slow) | IM | new

Flush + Time:
  V_u → A^inv_a → V_u (slow) | EM | new
  V_u → V^inv_a → V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates that no data at the corresponding address is invalidated, while slow indicates that the invalidation operation makes some data invalid, causing a longer processing time
Attack Strategy | Vulnerability Type (Step 1 → Step 2 → Step 3) | Macro Type | Attack

Cache Internal Collision Invalidation:
  A^inv → V_u → V^inv_a (slow) | IH | new
  V^inv → V_u → V^inv_a (slow) | IH | new
  A_d → V_u → V^inv_a (slow) | IH | new
  V_d → V_u → V^inv_a (slow) | IH | new
  A_aalias → V_u → V^inv_a (slow) | IH | new
  V_aalias → V_u → V^inv_a (slow) | IH | new

Flush + Flush:
  A^inv_a → V_u → V^inv_a (slow) | IH | (1)
  V^inv_a → V_u → V^inv_a (slow) | IH | (1)
  A^inv_a → V_u → A^inv_a (slow) | EH | (1)
  V^inv_a → V_u → A^inv_a (slow) | EH | (1)

Flush + Reload Invalidation:
  A^inv → V_u → A^inv_a (slow) | EH | new
  V^inv → V_u → A^inv_a (slow) | EH | new
  A_d → V_u → A^inv_a (slow) | EH | new
  V_d → V_u → A^inv_a (slow) | EH | new
  A_aalias → V_u → A^inv_a (slow) | EH | new
  V_aalias → V_u → A^inv_a (slow) | EH | new

Reload + Time Invalidation:
  V^inv_u → A_a → V^inv_u (slow) | EH | new
  V^inv_u → V_a → V^inv_u (slow) | IH | new

Flush + Probe Invalidation:
  A_a → V^inv_u → A^inv_a (fast) | EM | new
  A_a → V^inv_u → V^inv_a (fast) | IM | new
  V_a → V^inv_u → A^inv_a (fast) | EM | new
  V_a → V^inv_u → V^inv_a (fast) | IM | new

Evict + Time Invalidation:
  V_u → A_d → V^inv_u (fast) | EM | new
  V_u → A_a → V^inv_u (fast) | EM | new

Prime + Probe Invalidation:
  A_d → V_u → A^inv_d (fast) | EM | new
  A_a → V_u → A^inv_a (fast) | EM | new

Bernstein's Invalidation Attack:
  V_u → V_a → V^inv_u (fast) | IM | new
  V_u → V_d → V^inv_u (fast) | IM | new
  V_d → V_u → V^inv_d (fast) | IM | new
  V_a → V_u → V^inv_a (fast) | IM | new

Evict + Probe Invalidation:
  V_d → V_u → A^inv_d (fast) | EM | new
  V_a → V_u → A^inv_a (fast) | EM | new

Prime + Time Invalidation:
  A_d → V_u → V^inv_d (fast) | IM | new
  A_a → V_u → V^inv_a (fast) | IM | new

Flush + Time Invalidation:
  V_u → A^inv_a → V^inv_u (fast) | EM | new
  V_u → V^inv_a → V^inv_u (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years; to the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
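The behavior described for the SP* cache, hits allowed anywhere but evictions confined to the requester's own ways, can be sketched as a toy model of a single cache set. This is an illustrative model under our own naming, not the authors' implementation, and it uses a naive first-way victim choice where a real design would use, e.g., LRU within the partition.

```python
class SPStarSet:
    """Toy model of one set in an SP*-style statically way-partitioned cache."""
    def __init__(self, ways, high_ways):
        self.lines = [None] * ways  # each entry: (owner, tag) or None
        self.part = {"High": range(0, high_ways),
                     "Low": range(high_ways, ways)}

    def access(self, owner, tag):
        for line in self.lines:
            if line is not None and line[1] == tag:
                return "hit"  # hits are allowed in either partition
        # Miss: replacement is restricted to the requester's own partition
        # (naive choice: first way of the partition).
        victim = next(iter(self.part[owner]))
        self.lines[victim] = (owner, tag)
        return "miss"

s = SPStarSet(ways=4, high_ways=2)
print(s.access("High", 0x10))  # miss: fills a High way
print(s.access("Low", 0x10))   # hit: cross-partition hits are permitted
```

The second access illustrates exactly the property noted above: a Low access hits on data resident in the High partition without modifying it.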
SecVerilog Cache [56, 57] statically partitions cache blocks between the security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³ Two existing papers give slightly different definitions of an "SP" cache; thus, we define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, the Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds, to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID); the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache that is in the same process, random data in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
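SHARP's three-tier victim selection can be sketched as a small function. This is our own illustrative encoding of the priority order described above (the core valid bits are simplified to sets of process ids, and the OS interrupt is modeled as a boolean alarm flag):

```python
import random

def sharp_pick_victim(cvb_per_way, requester, live_processes):
    """Pick an eviction victim per the SHARP priority order.
    cvb_per_way: one set of process ids per way, holding the ids whose
    private caches contain that way's line (simplified core valid bits).
    Returns (way_index, alarm) where alarm models the OS interrupt."""
    # 1) Prefer lines not present in any live process's private cache.
    for way, cvb in enumerate(cvb_per_way):
        if not (cvb & live_processes):
            return way, False
    # 2) Then lines belonging only to the requester itself.
    for way, cvb in enumerate(cvb_per_way):
        if cvb == {requester}:
            return way, False
    # 3) Otherwise evict at random and raise the suspicious-activity alarm.
    return random.randrange(len(cvb_per_way)), True

way, alarm = sharp_pick_victim([{1}, {2}, set()], requester=1,
                               live_processes={1, 2})
print(way, alarm)  # 2 False: way 2 holds no live process's line
```

Only the third tier, random eviction across processes, triggers the notification to the OS, matching the description above.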
Sanctum Cache [10] focuses on the isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions, as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks of the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache listed extra software assumptions, as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks of the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] makes speculation invisible in the data cache hierarchy. Before the visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks; however, it remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions, as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages remain pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace, and another, called policy_hitmap, for masking hit ways. A cache hit happens only if both the tag and the domain id are the same. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
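The hitmap/fillmap masking can be sketched as follows. This is an illustrative simplification of our own: the per-domain way masks subsume the domain-id match on hits, and the fill-way choice is naive (first permitted way) where DAWG applies a real replacement policy within the mask.

```python
def dawg_lookup(ways, tags, domain, tag, policy_hitmap, policy_fillmap):
    """Look up `tag` for `domain` in a single set with `ways` ways.
    policy_hitmap/policy_fillmap: per-domain bitmaps over the ways."""
    # Hit requires a tag match in a way enabled by this domain's hitmap.
    for w in range(ways):
        if (policy_hitmap[domain] >> w) & 1 and tags[w] == tag:
            return ("hit", w)
    # Miss: the fill victim is confined to this domain's fillmap ways.
    for w in range(ways):
        if (policy_fillmap[domain] >> w) & 1:
            tags[w] = tag
            return ("miss", w)

tags = [None, None, None, None]
maps = {0: 0b0011, 1: 0b1100}  # domain 0 -> ways 0-1, domain 1 -> ways 2-3
print(dawg_lookup(4, tags, 0, 0xAB, maps, maps))  # ('miss', 0)
print(dawg_lookup(4, tags, 1, 0xAB, maps, maps))  # ('miss', 2): no cross-domain hit
```

The second lookup misses even though the tag is resident in way 0, illustrating the isolation of hits across protection domains.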
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, the eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bit are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and the compiler to set or reset the lock bit through the use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability update the process ID and the lock status bit while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption, as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of the RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ is evicted, and D is accessed without caching. If they belong to different processes, D is stored in an evicted cache block of S′, and the mappings of S and S′ are swapped as well. Otherwise, the normal replacement policy is executed.
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security-critical. The cache replacement policy is very similar to that of the RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data is evicted and D is accessed without caching if the two belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D randomly replaces a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
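Newcache's two-level lookup can be sketched as a toy model. This is our own simplification: it models only the RMT indirection and the (process ID, index, tag) match, omitting the protection-bit cases and the no-caching paths described above.

```python
import random

class Newcache:
    """Toy model of Newcache's RMT indirection over a fully
    associative physical line array."""
    def __init__(self, n_lines):
        self.rmt = {}                   # (pid, index) -> physical line number
        self.lines = [None] * n_lines   # each entry: (pid, index, tag)

    def access(self, pid, index, tag):
        line = self.rmt.get((pid, index))
        # A hit requires the RMT entry to point at a line whose
        # (pid, index, tag) all match.
        if line is not None and self.lines[line] == (pid, index, tag):
            return "hit"
        # Miss: place into a random physical line (fully associative array)
        # and update the RMT to point at it.
        victim = random.randrange(len(self.lines))
        self.lines[victim] = (pid, index, tag)
        self.rmt[(pid, index)] = victim
        return "miss"

c = Newcache(8)
print(c.access(pid=1, index=3, tag=0x7))  # miss
print(c.access(pid=1, index=3, tag=0x7))  # hit
print(c.access(pid=2, index=3, tag=0x7))  # miss: pid is part of the match
```

The third access shows that even an identical index and tag miss under a different process ID, which is the per-process de-correlation the design relies on.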
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security critical data, a no-fill request is executed and the requested data access is performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses is brought into the cache; in the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not influence the prevention of timing-based side-channel attacks (the random filling technique is used for this).
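The three request types can be sketched as a small dispatch function; the `window` parameter is our stand-in for the configurable neighborhood from which the random fill is drawn.

```python
import random

def random_fill_access(cache, addr, request_kind, window=(4, 4)):
    """Toy dispatch for Random Fill cache's request types (a sketch).

    `cache` is a set of cached line addresses; `window` = (a, b) models the
    random-fill neighborhood [addr - a, addr + b]."""
    if addr in cache:
        return 'hit'                 # hits are processed as in a normal cache
    if request_kind == 'normal':
        cache.add(addr)              # normal replacement
    elif request_kind == 'nofill':
        pass                         # data returned to the core, nothing cached
    elif request_kind == 'randomfill':
        a, b = window                # a random neighbor line is cached instead
        cache.add(addr + random.randint(-a, b))
    return 'miss'
```

A no-fill access leaves the cache state untouched, so the attacker's later probe carries no information about which secret line the victim read.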
CEASER Cache [38] mitigates conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; this periodic re-keying causes the address remapping to change dynamically.
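The keyed set-index computation and re-keying can be sketched as below. A keyed hash stands in for the real LLBC (which, unlike a hash, is invertible), and the epoch/key names are ours; real CEASER also gradually relocates resident lines during a re-key rather than flushing.

```python
import hashlib

class CeaserIndexing:
    """Sketch of CEASER-style encrypted set indexing (hash in place of LLBC)."""

    def __init__(self, key, num_sets):
        self.key = key
        self.num_sets = num_sets

    def set_index(self, line_addr):
        # "encrypt" the line address under the current key, take low bits as index
        digest = hashlib.sha256(f'{self.key}:{line_addr}'.encode()).digest()
        return int.from_bytes(digest[:4], 'big') % self.num_sets

    def rekey(self, new_key):
        # periodic re-keying; changes the address-to-set mapping dynamically
        self.key = new_key
```

After a re-key, an eviction set the attacker constructed under the old mapping no longer conflicts with the victim's lines.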
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As hardware extensions, a cryptographic primitive such as hashing and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. One bit per page-table entry is also added to allow the kernel to communicate security domain identification with user space.
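The per-way, per-domain index derivation can be sketched with a keyed hash standing in for the paper's keyed hash or permutation derivation function; the function name and key format are our own.

```python
import hashlib

def scatter_way_indices(domain_key, line_addr, num_ways, num_sets):
    """Sketch of SCATTER-style index derivation: a keyed function yields a
    (possibly different) set index per way, keyed per security domain."""
    indices = []
    for way in range(num_ways):
        digest = hashlib.sha256(f'{domain_key}:{line_addr}:{way}'.encode()).digest()
        indices.append(int.from_bytes(digest[:4], 'big') % num_sets)
    return indices
```

Because every way gets its own index and every domain its own key, two domains touching the same physical line generally contend in different sets, which is what breaks eviction-set construction.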
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. It does not differentiate data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter, so the cache delay becomes randomized. The cache delay interval controlled by this non-deterministic execution leads to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
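The randomized per-line decay counter can be sketched as follows; class and parameter names are our own, with `max_interval` playing the role of the predefined global-counter bound.

```python
import random

class DecayingCacheLine:
    """Sketch of Non Deterministic Cache's per-line invalidation counter."""

    def __init__(self, max_interval):
        self.max_interval = max_interval
        self.valid = True
        self.counter = random.randrange(max_interval)  # randomized initial value

    def global_tick(self):
        """Advance on each global counter tick while the data is untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False   # line invalidated after a random-length interval

    def touch(self):
        """An access restarts the activeness interval at a fresh random value."""
        self.counter = random.randrange(self.max_interval)
```

Since each line dies at a time the attacker cannot predict, whether a later access hits or misses no longer depends deterministically on the address history.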
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed against each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing no longer determines that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks security critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security critical data does not allow A_d in Step 1 to replace it. In this case, A_d cannot be placed in the cache, so this vulnerability cannot be triggered in PL cache.
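Case 1 above can be illustrated with a toy model of the last step of V_d ⇝ V_u ⇝ A_a: when hits additionally require a matching process ID (as in RP cache), the attacker's reload timing becomes constant. Function and field names here are hypothetical.

```python
def reload_timing(cached_lines, requester_pid, addr, pid_checked):
    """Toy timing of the last step (attacker's reload): 'fast' on a hit,
    'slow' on a miss. `pid_checked` models a cache whose hits also require
    a matching process ID, as RP cache does."""
    for line in cached_lines:
        if line['addr'] == addr and (not pid_checked or line['pid'] == requester_pid):
            return 'fast'
    return 'slow'
```

With the victim's secret-dependent line cached under the victim's process ID, the attacker's reload is fast on a conventional cache (leaking u) but always slow once process IDs are checked, so the timing difference that defines the vulnerability disappears.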
From the security perspective, a secure cache's entries in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous cost to performance when the application is complex.
Each entry showing the effectiveness of a secure cache against a vulnerability lists two results: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4  Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
Table 5  Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or the management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to accessing only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there may be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether a memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of speculative or out-of-order loads or stores, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it can come at the cost of revealing some information when adjusting the partition size for each part; it also does not help with internal interference prevention.
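A minimal sketch of static way-partitioning, one of the granularities mentioned above: High and Low security domains are assigned disjoint way ranges, so neither side can evict the other's lines. The function and parameter names are ours.

```python
def way_partition(pid, high_pids, num_ways, num_high_ways):
    """Sketch of static way-partitioning: the High domain gets the first
    `num_high_ways` ways, the Low domain gets the rest."""
    if pid in high_pids:
        return range(0, num_high_ways)
    return range(num_high_ways, num_ways)
```

Because the ranges are disjoint, an attacker's fills can never replace a victim line (no external miss-based interference), but a victim's own accesses still contend within the High ways, which is exactly the internal interference the text describes.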
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative or out-of-order loads or stores and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.

RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into a cache block in Step 1, Random Fill cache will not randomly load the security critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared, read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6  Existing secure caches' implementation method, performance, power, and area summary. [Table body not reproduced in this extraction.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; that is, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
- Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
- Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
- To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision: In Step 1, a cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there was an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates a cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
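The three steps can be sketched over a toy cache modeled as a set of cached line addresses; the function and argument names are ours, and real attacks distinguish the two outcomes by measuring access latency.

```python
def flush_reload(shared_cache, victim_addr, probe_addr):
    """Toy run of the three Flush + Reload steps on a set of cached addresses."""
    shared_cache.discard(probe_addr)   # Step 1: invalidate the shared line
    shared_cache.add(victim_addr)      # Step 2: victim's secret-dependent access
    # Step 3: attacker reloads probe_addr and times it
    return 'fast' if probe_addr in shared_cache else 'slow'
```

A fast reload tells the attacker the victim's Step 2 access touched the probed address.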
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
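A minimal sketch of Prime + Probe on a toy set-associative LRU cache; the class, parameters, and addresses are illustrative assumptions, not the paper's simulator:

```python
from collections import deque

class SetAssocCache:
    """Toy cache: num_sets sets of 'ways' blocks each, LRU replacement."""
    def __init__(self, num_sets=4, ways=2):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [deque() for _ in range(num_sets)]

    def access(self, addr):
        s = self.sets[addr % self.num_sets]  # set index = addr mod num_sets
        hit = addr in s
        if hit:
            s.remove(addr)
        elif len(s) == self.ways:
            s.popleft()                      # evict the LRU block
        s.append(addr)                       # addr becomes most recently used
        return 'hit' if hit else 'miss'

cache = SetAssocCache()
secret = 9                                   # maps to set 9 % 4 == 1 (unknown to attacker)
prime = {s: [s, s + 4] for s in range(4)}    # attacker data filling each set

for s in range(4):                           # Step 1: prime every set
    for a in prime[s]:
        cache.access(a)
cache.access(secret)                         # Step 2: victim's secret access evicts attacker data
evicted = [s for s in range(4)               # Step 3: probe; a miss reveals the secret's set
           if any(cache.access(a) == 'miss' for a in prime[s])]
print(evicted)                               # [1] -> the secret maps to set 1
```

Note that Prime + Probe only recovers the cache set index of the secret address, not the full address, consistent with the strategy description above.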
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.

J Hardw Syst Secur (2019) 3:397–425
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access operations and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ∗, since the ∗ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ∗ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ∗ → A_a → V_u. Therefore, three-step cases where Step 1 is ∗ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ∗). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B4 Patterns with β gt 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B41 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ∗ in the middle, the longer pattern can be divided into two separate patterns, where ∗ is assigned as Step 1 of the second pattern. This is because ∗ gives no timing information, and the attacker loses track of the cache state after ∗. This rule should be recursively applied until there are no sub-patterns left with a ∗ in the middle or as the last step (∗ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with A^inv/V^inv in the middle, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (A^inv/V^inv in the last step will be deleted).
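A minimal sketch of how Subdivision Rule 1 might be mechanized, assuming steps are encoded as plain strings with '*' standing for the ∗ state; the encoding and function name are ours, not the paper's:

```python
# Sketch of Subdivision Rule 1: split a long pattern at interior '*' steps,
# since '*' yields no timing information; '*' becomes Step 1 of the next
# sub-pattern. The step encoding here is illustrative only.

def subdivide_at_star(pattern):
    """Split at every interior '*'; a sub-pattern that is only ['*'] (i.e.,
    a trailing '*') is dropped, matching the rule's deletion of a last-step '*'."""
    pieces, current = [], []
    for i, step in enumerate(pattern):
        if step == '*' and i > 0:
            pieces.append(current)
            current = ['*']          # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ['*']:
        pieces.append(current)
    return pieces

pattern = ['A_a', 'V_u', '*', 'V_u', 'A_a']
print(subdivide_at_star(pattern))    # [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
```

Subdivision Rule 2 would follow the same shape, splitting at interior A^inv/V^inv steps instead of '*'.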
B42 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B421 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... → M → N → ... If commuting M and N leads to the same observation result, i.e., ... → M → N → ... and ... → N → M → ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if these two can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B422 Reduction Rules
If the memory sequence after applying Commute Ruleshave a sub-pattern that has two adjacent steps both relatedto known addresses or both related to unknown address(including repeating states) the two adjacent steps can bereduced to only one following the reduction rules (if thetwo-step pattern has ldquoyesrdquo for the Column ldquoUnion Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), regardless of whether the access is the attacker's (A) or the victim's (V), the table indicates whether Commute Rule 1 applies, whether Commute Rule 2 applies, and whether a Union Rule or Reduce Rule applies ("yes"/"no"), together with the resulting Combined Step where one exists (e.g., Union(a^inv, a_alias^inv), Union(a^inv, d^inv), Union(d^inv, a_alias^inv)).
Reduce Rule" and has no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1 For two u operations: although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 For two known adjacent memory access-related operations (known_access operations): they always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in what order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
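The three Reduction Rules on a pair of adjacent steps could be sketched as follows, assuming each step is encoded as a (kind, address) tuple with address 'u' standing for the unknown address; the encoding and function name are ours, not the paper's:

```python
# Sketch of the Reduction Rules on two adjacent steps.
# Each step is (kind, addr): kind in {'access', 'inv'}, addr 'u' if unknown.

def reduce_adjacent(s1, s2):
    """Return [s2] if the pair reduces to its second step, else [s1, s2]."""
    k1, a1 = s1
    k2, a2 = s2
    if a1 == 'u' and a2 == 'u':                       # Rule 1: two u operations
        return [s2]
    if k1 == k2 == 'access' and a1 != 'u' and a2 != 'u':
        return [s2]                                   # Rule 2: two known accesses
    if a1 == a2 != 'u' and {k1, k2} == {'access', 'inv'}:
        return [s2]                                   # Rule 3: access+inv, same known addr
    return [s1, s2]                                   # no reduction applies

print(reduce_adjacent(('access', 'a'), ('access', 'd')))  # [('access', 'd')]
print(reduce_adjacent(('access', 'a'), ('inv', 'a')))     # [('inv', 'a')]
print(reduce_adjacent(('access', 'a'), ('inv', 'd')))     # both steps kept
```

Note that the last pair (a known access and a known invalidation of different addresses) is exactly a case handled by the Commute Rules rather than the Reduction Rules.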
B423 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... → M → N → ... If combining M and N leads to the same timing observation result, i.e., ... → M → N → ... and ... → Union(M, N) → ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be continuously applied to union all adjacent invalidation steps that invalidate known, different memory addresses.
B424 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always be applied and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

  – u operation → not_u operation → u operation
  – not_u operation → u operation → not_u operation

  There might be a possible extra ∗ or A^inv/V^inv before these three-step patterns, where:

  • An extra ∗ in the first step will not influence the result and can be directly removed.
  • For an extra A^inv/V^inv in the first step:

    – If followed by a known_access operation, A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
    – If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
    – If followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

  In this case, the steps are finally within three steps, and the checking is done.

• There exist two adjacent steps to which none of the Rules above can be applied, which requires the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of Rules can be applied are the following:

– A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias followed by V_u
– A_a/V_a, A_aalias/V_aalias followed by V^inv_u
– V_u followed by A_a/V_a, A_aalias/V_aalias, A_d/V_d, A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias
– V^inv_u followed by A_a/V_a, A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β — the number of steps of the pattern; P — a two-dimensional dynamic-size array; P[0] contains the states of each step of the original pattern in order; P[1], P[2], ... are empty initially.
Output: R — reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: R = Ø
2: while P contains ∗ at an index other than 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (P contains A^inv at an index other than 0) or (P contains V^inv at an index other than 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (P is ineffective or P has interval effective three steps) do
9:   P = Commute_Rules(P)
10:   P = Reduction_Rules(P)
11:   P = Union_Rules(P)
12:   if P is ineffective or P has interval effective three steps then
13:     R += the effective three-step pattern(s) found in P
14:   end if
15: end while
16: return R
two steps can either generate two adjacent step patterns that can be processed by the three categories of Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B43 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B44 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 - 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia Dept of Computer Science Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fig. 1 The 17 possible states for a single cache block in our three-step model: (a) V_u; (b) A_a/V_a, A_aalias/V_aalias, A_d/V_d; (c) A^inv/V^inv; (d) A^inv_a/V^inv_a, A^inv_aalias/V^inv_aalias, A^inv_d/V^inv_d; (e) V^inv_u; (f) ∗
In each sub-figure of Fig. 1, the left-most part shows the possible state being described in the sub-figure. The middle part shows the possible situation of the cache state affected by each state. For all sub-figures, the middle cache block (shown in bold) is the targeted cache block. The right-most part shows the memory region in relation to the cache block. Recall that the addresses a and a_alias are within the sensitive set of addresses x, while d is outside the set of sensitive addresses (for simplicity, the set is shown as a contiguous region, but it can be any set). Also recall that A represents the operations performed by the attacker and V represents the victim's operations.
Figure 1a shows the possible state V_u, where address u is within the sensitive set and unknown to the attacker. Therefore it can possibly map to any cache block, including the target cache block shown in the middle. Since its position in the cache and its specific address are unknown, we show V_u in dashed lines. Meanwhile, Fig 1e shows the possible state V_u^inv, which is the result of the victim invalidating data at the sensitive address u, and thus possibly invalidating some address within the sensitive region. Further, Fig 1f shows the possible state *, which represents null knowledge of the address for the attacker for this cache block. Therefore it can possibly refer to any address in memory, or to no valid address at all.
Figure 1b shows the possible states A_a, V_a, A_aalias, V_aalias, A_d, and V_d. Their addresses are all known to the attacker and map to the same targeted cache block. Both a and aalias are within the sensitive set of addresses x; aalias, as its name indicates, is a different address than a, but still within set x, and it maps to the same cache block as a. Address d is outside of the set x. Meanwhile, Fig 1d shows the possible states A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv, A_d^inv, and V_d^inv, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Fig 1c. These states indicate that no valid address is in the cache block. Therefore all the possible addresses that mapped to this cache block, e.g., a, aalias, d, and u (if it mapped to this block) before the invalidation step A^inv or V^inv, will be flushed back to memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 * 17 * 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules, which remove the redundant three-step patterns and produce the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for the different possible values of u. Since u is in the secure range x, the possible candidates of u for a cache block are a, aalias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or aalias, and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the steps. Its outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step pattern. For each step
[Fig 2 flow: Exhaustive list of all possible three-step combinations (4913) → Cache Three-Step Simulator (Classification Step) → Preliminary Strong Vulnerability (132), Preliminary Weak Vulnerability (572), Ineffective Three-Step (4209) → Reduction Rules (Reduction Step) → Strong Vulnerability (72), Weak Vulnerability (64)]

Fig 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. Ovals refer to the number of vulnerabilities in each category
of each pattern, if it is V_u, this step will be extended to be one of three candidates: V_a, V_aalias, and V_NIB. If it is V_u^inv, this step will be extended to be one of three candidates: V_a^inv, V_aalias^inv, and V_NIB^inv. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability, by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into three categories, as listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories, respectively.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address, or it has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the ⟨V_d, V_u, A_a⟩ vulnerability shown in Fig 3a, if u maps to a, the attacker will always derive fast timing. If u is aalias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or aalias). For these vulnerabilities, timing variation can still be observed due to the victim's different behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⟨*, V_u, A_a^inv⟩ shown in Fig 3c, when fast timing is observed, u possibly maps to aalias or NIB (u mapping to NIB can also yield fast timing because, due to the * in Step 1, a may not be present in the cache, in which case A_a^inv completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type ⟨A_a, V_u, A_d⟩ shown in Fig 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type; a list containing all the vulnerabilities that belong to the Weak type; a list containing all the ineffective types

1: for step1 in the 17 states do
2:   for step2 in the 17 states do
3:     for step3 in the 17 states do
4:       steps ← (step1, step2, step3)
5:       candidates ← array to store all possible candidate combinations of this three-step pattern
6:       timings ← array to store the timing observation for each candidate combination of this three-step pattern
7:       if step1, step2, or step3 involves u then
8:         for each candidate combination of the u-related steps (V_u's candidates are V_a, V_aalias, and V_NIB; V_u^inv's candidates are V_a^inv, V_aalias^inv, and V_NIB^inv; each candidate number is 3) do
9:           candidates.append(combination)
10:        end for
11:        for each combination in candidates do
12:          timings.append(output_timing(combination))
13:        end for
14:        if judge_type(candidates, timings) = Strong then
15:          strong.append(steps)
16:        else
17:          if judge_type(candidates, timings) = Weak then
18:            weak.append(steps)
19:          else
20:            ineffective.append(steps)
21:          end if
22:        end if
23:      else
24:        ineffective.append(steps)
25:        continue
26:      end if
27:    end for
28:  end for
29: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We have also developed rules that further reduce the list of all the effective three-step patterns output by the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability and Weak Vulnerability outputs. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the following rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated; e.g., ⟨A_d, A_a, V_u⟩ can be reduced
[Fig 3 panels: a ⟨V_d, V_u, A_a⟩; b ⟨V_u, A_d, V_u^inv⟩; c ⟨*, V_u, A_a^inv⟩; d ⟨A_aalias^inv, V_u^inv, V_a⟩; e ⟨V_d, V_u^inv, V_d⟩; f ⟨A_a, V_u, A_d⟩. Each panel relates the victim's behavior (u ∈ {a, aalias, NIB}) to the attacker's observation (fast or slow)]

Fig 3 Examples of relations between the victim's behavior (u) and the attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
to ⟨*, A_a, V_u⟩, which is equivalent to ⟨A_a, V_u, *⟩. Therefore, ⟨A_d, A_a, V_u⟩ is a repeat type of ⟨A_a, V_u, *⟩ and can be eliminated.
2. Three-step patterns with a step involving a known address a, and ones with a step involving an alias to that address aalias, give the same information. Thus three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, ⟨V_u, A_aalias, V_u⟩ is a repeat type of ⟨V_u, A_a, V_u⟩, and we eliminate the first pattern.
3. Three-step patterns with the steps V_u and V_u^inv in adjacent, consecutive positions keep only the latter step and eliminate the former. For example, ⟨A_a, V_u, V_u^inv⟩ can be reduced to ⟨A_a, *, V_u^inv⟩, which is further equivalent to ⟨A_a, V_u^inv, *⟩. So ⟨A_a, V_u, V_u^inv⟩ can be eliminated.
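The three rules above amount to a canonicalization pass: patterns sharing a canonical form are repeats, and only one is kept. A simplified sketch (with hypothetical string encodings for steps, not the authors' actual script):

```python
# Steps are encoded as strings such as "A_a", "A_aalias", "V_u", "V_u^inv".

def canonicalize(pattern):
    # Rule 2: a and aalias carry the same information.
    steps = [s.replace("aalias", "a") for s in pattern]
    out = []
    for s in steps:
        if out and (out[-1] == s or (out[-1].startswith("A") and s.startswith("A"))):
            out[-1] = s      # Rule 1: repeating / adjacent attacker-known steps
        elif out and out[-1] == "V_u" and s == "V_u^inv":
            out[-1] = s      # Rule 3: V_u followed by V_u^inv keeps the latter
        else:
            out.append(s)
    return tuple(out)

def reduce_patterns(patterns):
    """Keep one representative per canonical form."""
    seen, kept = set(), []
    for p in patterns:
        c = canonicalize(p)
        if c not in seen:
            seen.add(c)
            kept.append(p)
    return kept
```

Running `reduce_patterns` over the simulator's effective patterns would collapse, e.g., ⟨V_u, A_aalias, V_u⟩ and ⟨V_u, A_a, V_u⟩ into a single entry.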
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the addresses the victim accesses map to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
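The macro-type labeling follows mechanically from the two criteria above; a sketch (step encodings and the observation argument are hypothetical illustrations, not part of the paper's tooling):

```python
# Macro type = (internal vs. external) x (miss-based vs. hit-based).
# Internal (I): Steps 2 and 3 involve only victim (V) behavior.
# Miss-based (M): slow due to a miss, or fast because the invalidated
#                 address was absent. Hit-based (H): the opposite cases.

def macro_type(step2, step3, observation, last_is_invalidation):
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    if last_is_invalidation:
        basis = "M" if observation == "fast" else "H"
    else:
        basis = "M" if observation == "slow" else "H"
    return interference + basis
```

For instance, ⟨V_d, V_u, V_a (fast)⟩ classifies as IH and ⟨A_a, V_u, A_a (slow)⟩ as EM, matching the labels in Table 2.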
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
2 Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type columns give the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A_a^inv | V_u | V_a (fast) | IH | (2)
 | V_a^inv | V_u | V_a (fast) | IH | (2)
Flush + Reload | A_a^inv | V_u | A_a (fast) | EH | (5)
 | V_a^inv | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V_u^inv | A_a | V_u (fast) | EH | new
 | V_u^inv | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V_u^inv | A_a (slow) | EM | (6)
 | A_a | V_u^inv | V_a (slow) | IM | new
 | V_a | V_u^inv | A_a (slow) | EM | new
 | V_a | V_u^inv | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A_a^inv | V_u (slow) | EM | new
 | V_u | V_a^inv | V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V_a^inv (slow) | IH | new
 | V^inv | V_u | V_a^inv (slow) | IH | new
 | A_d | V_u | V_a^inv (slow) | IH | new
 | V_d | V_u | V_a^inv (slow) | IH | new
 | A_aalias | V_u | V_a^inv (slow) | IH | new
 | V_aalias | V_u | V_a^inv (slow) | IH | new
Flush + Flush | A_a^inv | V_u | V_a^inv (slow) | IH | (1)
 | V_a^inv | V_u | V_a^inv (slow) | IH | (1)
 | A_a^inv | V_u | A_a^inv (slow) | EH | (1)
 | V_a^inv | V_u | A_a^inv (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A_a^inv (slow) | EH | new
 | V^inv | V_u | A_a^inv (slow) | EH | new
 | A_d | V_u | A_a^inv (slow) | EH | new
 | V_d | V_u | A_a^inv (slow) | EH | new
 | A_aalias | V_u | A_a^inv (slow) | EH | new
 | V_aalias | V_u | A_a^inv (slow) | EH | new
Reload + Time Invalidation | V_u^inv | A_a | V_u^inv (slow) | EH | new
 | V_u^inv | V_a | V_u^inv (slow) | IH | new
Flush + Probe Invalidation | A_a | V_u^inv | A_a^inv (fast) | EM | new
 | A_a | V_u^inv | V_a^inv (fast) | IM | new
 | V_a | V_u^inv | A_a^inv (fast) | EM | new
 | V_a | V_u^inv | V_a^inv (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V_u^inv (fast) | EM | new
 | V_u | A_a | V_u^inv (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A_d^inv (fast) | EM | new
 | A_a | V_u | A_a^inv (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V_u^inv (fast) | IM | new
 | V_u | V_d | V_u^inv (fast) | IM | new
 | V_d | V_u | V_d^inv (fast) | IM | new
 | V_a | V_u | V_a^inv (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A_d^inv (fast) | EM | new
 | V_a | V_u | A_a^inv (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V_d^inv (fast) | IM | new
 | A_a | V_u | V_a^inv (fast) | IM | new
Flush + Time Invalidation | V_u | A_a^inv | V_u^inv (fast) | EM | new
 | V_u | V_a^inv | V_u^inv (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
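A minimal behavioral model of this policy might look as follows (a sketch with invented class and method names; SP* is a hardware design, and this only mimics its hit/fill rules for one cache set):

```python
# Behavioral sketch of SP*-style static way partitioning: hits may come from
# either partition, but a miss may only fill/evict ways of one's own level.

class SPStarSet:
    def __init__(self, low_ways, high_ways):
        # One cache set: each way holds an address or None (empty).
        self.ways = {"Low": [None] * low_ways, "High": [None] * high_ways}

    def access(self, level, addr):
        """Return 'hit' or 'miss'; on a miss, fill only the own partition."""
        for part in ("Low", "High"):          # hits allowed in either partition
            if addr in self.ways[part]:
                return "hit"
        ways = self.ways[level]               # miss: modify own partition only
        victim = ways.index(None) if None in ways else 0
        ways[victim] = addr
        return "miss"
```

Under this model, an attacker in the Low partition cannot evict the victim's High-partition lines no matter how many addresses it touches, which is exactly what blocks Prime + Probe style Step 1/Step 3 interference.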
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3 Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to the process ID), where CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that belongs to the same process, random data in the cache set is evicted. This random eviction also generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because the performance impact is small given how rarely such speculation occurs. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before its visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec targets Spectre-like attacks and futuristic attacks. However, InvisiSpec remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page, and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism implemented in software makes sure that the victim and attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways belonging to different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
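The domain-masked hit and fill checks can be sketched as bitmask tests (an illustrative sketch only; DAWG's real metadata layout and replacement policy are more involved):

```python
# Sketch of DAWG-style way masking: each domain carries a policy_hitmap
# (ways it may hit in) and a policy_fillmap (ways it may fill/evict),
# both represented here as bitmasks over the ways of one set.

def lookup(ways, tag, policy_hitmap):
    """A hit requires a tag match in a way enabled by the domain's hitmap."""
    for i, line in enumerate(ways):
        if (policy_hitmap >> i) & 1 and line == tag:
            return i
    return None

def fill(ways, tag, policy_fillmap):
    """On a miss, the victim way is chosen only among fillmap-enabled ways."""
    candidates = [i for i in range(len(ways)) if (policy_fillmap >> i) & 1]
    victim = candidates[0]          # replacement policy elided in this sketch
    ways[victim] = tag
    return victim
```

With disjoint masks per domain (e.g., domain A owning ways 0-1 and domain B ways 2-3), neither domain can observe hits in, or cause evictions from, the other's ways.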
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R from the LLC causes the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cachebased on cache blocks It extends each cache block with aprocess ID and a lock status bit The process ID and thelock status bits are controlled by the extended load and storeinstructions (ld lockld unlock and st lockst unlock)which allow the programmer and compiler to set or reset thelock bit through use of the right load or store instructionIn terms of cache replacement policy for a cache hit PLcache will perform the normal cache hit handling procedureand the instructions with locking or unlocking capability canupdate the process ID and the lock status bits while the hit isprocessed When there is a cache miss locked data cannotbe evicted by data that is not locked and locked data amongdifferent processes cannot be evicted by each other In thiscase the new data will be either loaded or stored withoutcaching In other cases data eviction is possible This cachelisted extra software assumption as follows
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
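The miss-handling rule described for PL cache can be sketched as a small decision function (a simplified model; the encoding and names are ours, not the authors' RTL):

```python
from dataclasses import dataclass

@dataclass
class Line:
    pid: int       # owning process ID
    locked: bool   # lock status bit

def pl_miss_action(req_pid: int, req_locked: bool, victim: Line) -> str:
    """On a miss, decide whether the chosen eviction candidate may be
    replaced. Locked lines cannot be evicted by unlocked data, and
    locked lines of different processes cannot evict each other; in
    those cases the access is served without caching."""
    if victim.locked and (not req_locked or req_pid != victim.pid):
        return "serve_uncached"
    return "replace"

assert pl_miss_action(1, False, Line(pid=2, locked=True)) == "serve_uncached"
assert pl_miss_action(1, True,  Line(pid=1, locked=True)) == "replace"
assert pl_miss_action(1, False, Line(pid=2, locked=False)) == "replace"
```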
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit requires both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
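The RP cache miss-handling cases can be written as a small dispatcher (a simplified sketch of ours; the action names are hypothetical):

```python
import secrets

def rp_miss_action(d_pid: int, d_prot: bool, r_pid: int, r_prot: bool,
                   num_sets: int):
    """Simplified RP-cache miss handling for incoming data D when the
    replacement candidate R was chosen from set S. Returns the action
    and, where relevant, the randomly chosen set S'."""
    if d_pid == r_pid and d_prot != r_prot:
        # Same process, different protection bits: evict a line of a
        # random set S' and serve D without caching it.
        return ("evict_random_set_no_cache", secrets.randbelow(num_sets))
    if d_pid != r_pid:
        # Different processes: store D in an evicted block of a random
        # set S' and swap the mappings of S and S'.
        return ("store_in_random_set_and_swap", secrets.randbelow(num_sets))
    return ("normal_replacement", None)

act, s_prime = rp_miss_action(1, True, 1, False, 64)
assert act == "evict_random_set_no_cache" and 0 <= s_prime < 64
```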
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to RP cache's: a cache hit requires both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a no-fill request is executed, and the requested data access will be performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests will be used, following the normal replacement policy. The victim and attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique takes care of that).
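The three request types can be sketched as follows (our illustration; the window bounds are illustrative parameters, not values from the paper):

```python
import secrets

def handle_request(kind: str, addr: int, window=(4, 4)):
    """Sketch of Random Fill cache's request types. Returns the address
    (if any) that actually gets cached for a request to `addr`."""
    if kind == "nofill":
        return None                      # data served, nothing cached
    if kind == "random_fill":
        lo, hi = window                  # fill a random spatial neighbor
        return addr - lo + secrets.randbelow(lo + hi + 1)
    return addr                          # normal request: cache as usual

cached = handle_request("random_fill", 100)
assert 96 <= cached <= 104               # some neighbor, not necessarily 100
assert handle_request("nofill", 100) is None
assert handle_request("normal", 100) == 100
```

Because the line actually filled is drawn from a neighborhood rather than being the requested address, an observed hit or miss no longer pins down which security-critical address the victim touched.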
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
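The keyed address-to-set mapping can be sketched as below. Note the real design uses the two-cycle LLBC block cipher in hardware; a keyed hash here is only a software stand-in for illustration:

```python
import hashlib

def ceaser_set_index(addr: int, epoch_key: bytes, num_sets: int) -> int:
    """Keyed address-to-set mapping in the spirit of CEASER (toy model:
    a keyed hash stands in for the LLBC encryption)."""
    digest = hashlib.blake2b(addr.to_bytes(8, "little"),
                             key=epoch_key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_sets

# Re-keying (a new epoch_key) remaps every address to a fresh set, which
# is what periodically destroys any eviction sets the attacker learned.
s_old = ceaser_set_index(0x1234, b"epoch-0", 1024)
s_new = ceaser_set_index(0x1234, b"epoch-1", 1024)
assert 0 <= s_old < 1024 and 0 <= s_new < 1024
```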
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As the hardware extension, a cryptographic primitive such as hashing, and an index decoder for each scattered cache way, are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
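The per-way skewed index derivation can be sketched as follows (our simplification; a keyed hash stands in for the paper's keyed hash or permutation derivation function):

```python
import hashlib

def scatter_indices(addr: int, domain_key: bytes,
                    num_ways: int, num_sets: int):
    """Derive a different set index for every cache way, keyed per
    security domain, as in SCATTER cache's skewed mapping."""
    out = []
    for way in range(num_ways):
        msg = addr.to_bytes(8, "little") + way.to_bytes(1, "little")
        h = hashlib.blake2b(msg, key=domain_key, digest_size=8).digest()
        out.append(int.from_bytes(h, "little") % num_sets)
    return out

ix_a = scatter_indices(0xdeadbeef, b"domain-A", 8, 2048)
assert len(ix_a) == 8 and all(0 <= i < 2048 for i in ix_a)
# A different security domain sees a different mapping of the same
# address, so cross-domain eviction sets no longer transfer.
ix_b = scatter_indices(0xdeadbeef, b"domain-B", 8, 2048)
```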
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or based on whether the data is secure or not. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
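The per-line decay counter can be sketched as a tiny class (our model of the mechanism just described; the names are ours):

```python
import secrets

class DecayLine:
    """Per-line counter of Non Deterministic Cache (sketch): the line is
    invalidated once its counter reaches `limit`, and the counter starts
    at a random value so the invalidation time is unpredictable."""
    def __init__(self, limit: int):
        self.limit = limit
        self.valid = True
        self.count = secrets.randbelow(limit)  # randomized initial value

    def access(self) -> None:
        self.count = 0          # touching the line restarts the interval

    def tick(self) -> None:     # called on each global counter clock tick
        if self.valid:
            self.count += 1
            if self.count >= self.limit:
                self.valid = False

line = DecayLine(limit=10)
for _ in range(10):             # after `limit` untouched ticks the line
    line.tick()                 # has certainly been invalidated
assert not line.valid
```

Because the initial counter value is random, two runs of the same access sequence invalidate lines at different times, which is exactly what decouples hit/miss statistics from the addresses accessed.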
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the Vu → Ad → Vu (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd → Vu → Aa (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and cannot retain the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad → Vu → A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad → Vu → V^inv_d (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
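These cases can be checked mechanically. A toy replay of the three steps on a direct-mapped cache (our illustration, far simpler than the paper's simulator) shows how the last-step timing encodes the secret on an unprotected cache:

```python
NUM_SETS = 8

def replay(steps):
    """Replay (op, addr) events on a toy direct-mapped cache and return
    the timing of the final access: 'fast' on a hit, 'slow' on a miss."""
    lines = {}                         # set index -> cached address
    timing = None
    for op, addr in steps:
        s = addr % NUM_SETS
        if op == "flush":
            lines.pop(s, None)
        elif op == "access":
            timing = "fast" if lines.get(s) == addr else "slow"
            lines[s] = addr
    return timing

# A flushes a; V accesses secret u; A reloads a: fast iff u == a,
# i.e., the Flush + Reload observation on an unprotected cache.
secret_u = 5
assert replay([("flush", 5), ("access", secret_u), ("access", 5)]) == "fast"
assert replay([("flush", 5), ("access", 6), ("access", 5)]) == "slow"
```

A secure cache prevents the vulnerability exactly when it breaks this mapping from the secret to the final timing, by one of the three mechanisms above.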
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, has a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again will have a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where Vu → V^inv_a → Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu → Ad → Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to only access a limited set of cache blocks. For example, there is either static or dynamic partitioning of the cache, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A^inv_a → Vu → Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the Vu → A^inv_a → Vu (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) vulnerability (one type of Evict + Probe) will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since during speculation the cache state is not actually updated, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad → Vu → Va (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between data of different sensitivity, which leaves vulnerabilities such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a → Vu → Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as the Vu → V^inv_a → Vu (slow) vulnerability (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va → V^inv_u → V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performanceresults of the secure caches as listed by the designersin the different papers At the extreme end there is
J Hardw Syst Secur (2019) 3397ndash425 415
Table6
Exi
stin
gse
cure
cach
esrsquo
impl
emen
tatio
nm
etho
dpe
rfor
man
cep
ower
and
area
sum
mar
y
J Hardw Syst Secur (2019) 3397ndash425416
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree; while their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards an Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as via Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
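As an illustration, the hit/miss logic behind these three steps can be sketched against a toy one-block cache model (all class and function names here are ours, and timing is abstracted to a hit/miss flag rather than measured cycles):

```python
# Toy model of a single cache block; addresses are plain strings.
class CacheBlock:
    def __init__(self):
        self.tag = None                 # address currently cached, or None

    def flush(self):                    # invalidate the block
        self.tag = None

    def access(self, addr):             # returns True on a cache hit
        hit = (self.tag == addr)
        self.tag = addr                 # the access installs the line
        return hit

def flush_reload(victim_secret_addr, attacker_known_addr):
    block = CacheBlock()
    block.flush()                             # Step 1: flush/evict (attacker or victim)
    block.access(victim_secret_addr)          # Step 2: victim's secret access
    return block.access(attacker_known_addr)  # Step 3: attacker reloads a known address

# A Step 3 hit reveals that the known address equals the secret address.
assert flush_reload("addr_A", "addr_A") is True
assert flush_reload("addr_A", "addr_B") is False
```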
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
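The set-granularity reasoning of Prime + Probe can be sketched the same way, on a toy 2-way LRU cache set (the names and the LRU policy are illustrative assumptions, not the paper's simulator):

```python
# Toy 2-way cache set with LRU replacement; most-recent line at the end.
class CacheSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.lines = []

    def access(self, addr):
        hit = addr in self.lines
        if hit:
            self.lines.remove(addr)
        elif len(self.lines) == self.ways:   # set full: evict the LRU line
            self.lines.pop(0)
        self.lines.append(addr)
        return hit

def prime_probe(victim_maps_to_this_set):
    s = CacheSet(ways=2)
    for a in ("atk_0", "atk_1"):             # Step 1: attacker primes the set
        s.access(a)
    if victim_maps_to_this_set:              # Step 2: victim's secret access
        s.access("victim_secret")            #   evicts one attacker line
    # Step 3: attacker probes; any miss => the victim touched this set
    return not all(s.access(a) for a in ("atk_0", "atk_1"))

assert prime_probe(True) is True     # miss observed: secret maps to this set
assert prime_probe(False) is False   # all hits: set untouched by the victim
```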
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. An unknown memory-related operation (one containing u) is denoted as a u operation.
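A minimal encoding of this step vocabulary, used only for illustration (the tuple layout and helper names are our assumptions, not the paper's notation):

```python
# A step is (actor, address, kind): actor in {"A", "V"},
# address in {"a", "aalias", "d", "u"}, kind in {"access", "inv"}.
KNOWN_ADDRS = {"a", "aalias", "d"}   # addresses the attacker can reason about

def is_known_access(step):
    _, addr, kind = step
    return addr in KNOWN_ADDRS and kind == "access"

def is_known_inv(step):
    _, addr, kind = step
    return addr in KNOWN_ADDRS and kind == "inv"

def is_not_u(step):                  # known access + known inv operations
    return is_known_access(step) or is_known_inv(step)

def is_u_op(step):                   # any operation on the unknown address u
    return step[1] == "u"

assert is_not_u(("A", "a", "access"))
assert is_known_inv(("V", "d", "inv"))
assert is_u_op(("V", "u", "access")) and not is_not_u(("V", "u", "access"))
```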
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as Ainv/Vinv, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (Ainv/Vinv in the last step will be deleted).
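A minimal sketch of the two subdivision rules, representing a pattern as a list of step labels with "*" standing for the star state and "AinvVinv" for the invalidate-everything step (the function name and encoding are ours):

```python
def subdivide(pattern, splitter):
    """Split so that `splitter` can only appear as Step 1 of a
    sub-pattern; a trailing `splitter` is deleted, since it yields
    no timing information at the end of a sequence."""
    parts, current = [], []
    for step in pattern:
        if step == splitter and current:     # interior occurrence: cut here
            parts.append(current)
            current = [step]                 # splitter starts the next pattern
        else:
            current.append(step)
    if current and not (len(current) == 1 and current[0] == splitter):
        if current[-1] == splitter:          # delete a trailing splitter
            current = current[:-1]
        parts.append(current)
    return parts

# Subdivision Rule 1 (split at the star state), then Rule 2
# (split at Ainv/Vinv) applied to each resulting part:
pattern = ["Aa", "*", "Vu", "AinvVinv", "Aa"]
after_rule1 = subdivide(pattern, "*")
assert after_rule1 == [["Aa"], ["*", "Vu", "AinvVinv", "Aa"]]
after_rule2 = [p2 for p1 in after_rule1 for p2 in subdivide(p1, "AinvVinv")]
assert after_rule2 == [["Aa"], ["*", "Vu"], ["AinvVinv", "Aa"]]
```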
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each adjacent two steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if the two steps can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence. Such a two-step pattern has a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
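Commute Rule 1 can be sketched as a simple predicate over two steps encoded as (address, kind) pairs (an illustrative encoding; Commute Rule 2 would additionally need the steps' position in the sequence):

```python
# Commute Rule 1: a known-address access and a known-address
# invalidation commute when they target different addresses.
KNOWN = {"a", "aalias", "d"}

def commute_rule_1(m, n):
    kinds = {m[1], n[1]}
    return (m[0] in KNOWN and n[0] in KNOWN
            and kinds == {"access", "inv"}
            and m[0] != n[0])

assert commute_rule_1(("a", "access"), ("d", "inv"))      # commutes
assert not commute_rule_1(("a", "access"), ("a", "inv"))  # same address
assert not commute_rule_1(("u", "access"), ("d", "inv"))  # unknown address
```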
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column in Table 7):

Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps, regardless of the attacker's access (A) or the victim's access (V), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule can be applied ("yes"/"no"), and the resulting combined step when a Reduce or Union is possible.
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two known adjacent memory access–related operations (known access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the address they refer to is the same, these two steps can be reduced to one step, namely the second step, no matter what order they are in.
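The three reduction rules can be sketched as a single pair-wise function over steps encoded as (address, kind) pairs (illustrative encoding and names; it returns the surviving step, or None when no Reduction Rule applies):

```python
KNOWN = {"a", "aalias", "d"}

def reduce_pair(m, n):
    # Rule 1: two operations on the unknown address u keep the second.
    if m[0] == "u" and n[0] == "u":
        return n
    # Rule 2: two known accesses leave a deterministic final state,
    # so only the second access matters.
    if m[0] in KNOWN and n[0] in KNOWN and m[1] == n[1] == "access":
        return n
    # Rule 3: a known access and a known invalidation of the SAME
    # address reduce to the second step, in either order.
    if m[0] in KNOWN and m[0] == n[0] and {m[1], n[1]} == {"access", "inv"}:
        return n
    return None

assert reduce_pair(("u", "access"), ("u", "inv")) == ("u", "inv")
assert reduce_pair(("a", "access"), ("d", "access")) == ("d", "access")
assert reduce_pair(("a", "inv"), ("a", "access")) == ("a", "access")
assert reduce_pair(("a", "access"), ("d", "inv")) is None  # commute, not reduce
```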
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Union Rule 1 can be applied to two invalidations of two different known memory addresses: two known inv operations both invalidate some known address, so they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
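A sketch of Union Rule 1 over the same (address, kind) encoding, where a joint invalidation step is represented as a set of addresses (the encoding and name are ours):

```python
KNOWN = {"a", "aalias", "d"}

def union_rule_1(steps):
    """Merge runs of adjacent invalidations of distinct known
    addresses into a single joint invalidation step."""
    out = []
    for addr, kind in steps:
        if (out and kind == "inv" and out[-1][1] == "inv"
                and addr in KNOWN and addr not in out[-1][0]
                and out[-1][0] <= KNOWN):
            out[-1] = (out[-1][0] | {addr}, "inv")   # grow the joint step
        else:
            out.append(({addr}, kind))
    return out

merged = union_rule_1([("a", "inv"), ("d", "inv"), ("u", "access")])
assert merged == [({"a", "d"}, "inv"), ({"u"}, "access")]
```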
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categorizations of the rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations together and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categorizations of rules should reduce at least one step, until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation
– not u operation ⇝ u operation ⇝ not u operation
There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• An extra Ainv/Vinv in the first step:
– If followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or by Vu ⇝ Ainv/Vinv, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted using Commute Rule 2 and reduced to be within three steps.
In this case, the sequence is finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categorizations of the Rules can be applied are the following:

– Aa/Va, Aaalias/Vaalias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_aalias/Vinv_aalias ⇝ Vu
– Aa/Va or Aaalias/Vaalias ⇝ Vinv_u
– Vu ⇝ Aa/Va, Aaalias/Vaalias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_aalias/Vinv_aalias
– Vinv_u ⇝ Aa/Va or Aaalias/Vaalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
two steps can either generate two adjacent step patterns that can be processed by the three categorizations of Rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; patterns[1], patterns[2], ... are empty initially.
Output: reduced effective vulnerability pattern(s) array vulnerabilities. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vulnerabilities = Ø
2: while patterns contain(⋆) and its index is not 0 do
3:   patterns = subdivision_rule_1(patterns)
4: end while
5: while (patterns contain(Ainv) and its index is not 0) or (patterns contain(Vinv) and its index is not 0) do
6:   patterns = subdivision_rule_2(patterns)
7: end while
8: while not (patterns is_ineffective or has_interval_effective_three_steps) do
9:   patterns = commute_rules(patterns)
10:  patterns = reduction_rules(patterns)
11:  patterns = union_rules(patterns)
12:  if patterns is_ineffective or has_interval_effective_three_steps then
13:    vulnerabilities += effective_patterns(patterns)
14:  end if
15: end while
16: return vulnerabilities
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE—a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games—bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice—a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397-425
A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias, A^inv_d, and V^inv_d, which correspond to invalidation of the address shown in the subscript of the state. Some additional possible invalidation states, A^inv and V^inv, are shown in Figure 1c. These states indicate that no valid address is in the cache block. Therefore, all the possible addresses mapped to this cache block, e.g., a, a_alias, d, and u (if it maps to this block), before the invalidation step A^inv or V^inv, will be flushed back to the memory.
3.3 Derivation of All Cache Timing-Based Vulnerabilities
With the 17 candidate states shown in Table 1 for each step, there are in total 17 x 17 x 17 = 4913 combinations of three steps. We developed a cache three-step simulator and a set of reduction rules to process all the three-step combinations and decide which ones indicate a real attack. As shown in Fig. 2, the exhaustive list of the 4913 combinations is first input to the cache three-step simulator, where the preliminary classification of vulnerabilities is derived. The effective vulnerabilities are then sent as input to the reduction rules, which remove the redundant three-step patterns and produce the final list of vulnerabilities.
3.3.1 Cache Three-Step Simulator
We developed a cache three-step simulator that simulates the state of one cache block and derives the attacker's observations in the last step of the three-step patterns that it analyzes, for each possible u. Since u is in the secure range x, the possible candidates of u for a cache block are a, a_alias, and NIB (Not-In-Block). Here, NIB indicates the case where u does not have the same index as a or a_alias and thus does not map to this cache block.
The cache three-step simulator is implemented as a Python script, and its pseudo-code implementation is shown in Algorithm 1. The simulator's inputs are the 17 possible states for each of the three steps. The outputs are all the vulnerabilities that belong to the Strong type, the Weak type, or the Ineffective type. The simulator uses a nested for loop to check all 4913 possible combinations of the three-step pattern.

Fig. 2 Procedure to derive the effective types of three-step timing-based vulnerabilities. The exhaustive list of all 4913 possible three-step combinations is input to the cache three-step simulator (classification step), which outputs 132 preliminary Strong vulnerabilities, 572 preliminary Weak vulnerabilities, and 4209 Ineffective three-step patterns; the reduction rules (reduction step) then yield the final 72 Strong and 64 Weak vulnerability types

For each step
of each pattern: if it is V_u, this step will be extended to one of three candidates, V_a, V_aalias, and V_NIB; if it is V^inv_u, this step will be extended to one of three candidates, V^inv_a, V^inv_aalias, and V^inv_NIB. We wrote a function output_timing that takes three known memory access steps as input and outputs whether fast or slow timing will be observed for the last step. In this way, for each of the u-related step's candidates, we can derive a timing observation. Using these timing observations, the function judge_type decides whether a three-step pattern is a potential vulnerability by analyzing whether the attacker is able to observe different and unambiguous timing for different values of u.
The simulator categorizes all the three-step patterns into three categories, as listed below. Figure 3 shows two examples each for the Strong Vulnerability (a, b), Weak Vulnerability (c, d), and Ineffective Three-Step (e, f) categories.
1. Strong Vulnerability: When a fast or slow timing is observed by the attacker, he or she is able to uniquely distinguish the value of u (either it maps to some known address or has the same index as some known address). In this case, the vulnerability has strong information leakage (i.e., the attacker can directly obtain the value of u based on the observed timing). We categorize these vulnerabilities as strong. E.g., for the V_d ⇝ V_u ⇝ A_a vulnerability shown in Fig. 3a, if u maps to a, the attacker will always derive fast timing. If u is a_alias or NIB, slow timing will be observed. This indicates that the attacker is able to unambiguously infer the victim's behavior (u) from the timing observation.
2. Weak Vulnerability: When fast or slow timing is observed by the attacker, he or she knows it corresponds to more than one possible value of u (e.g., a or a_alias). For these vulnerabilities, timing variation can still be observed due to different victim behavior. However, the attacker cannot learn the value of the index of the address u unambiguously. For example, for the type ⋆ ⇝ V_u ⇝ A^inv_a shown in Fig. 3c, when fast timing is observed, u possibly maps to a_alias or NIB (u mapping to NIB can also derive fast timing because, depending on the ⋆ in Step 1, a may not be in the cache, so A^inv_a completes fast). On the other hand, when slow timing is observed, u possibly maps to a or NIB. This pattern leads to an ambiguous guess about the value of u based on the timing observation.
3. Ineffective Three-Step: The remaining types are treated as ineffective. For example, for the type A_a ⇝ V_u ⇝ A_d shown in Fig. 3f, no matter what the value of u is, the attacker's observation is always slow timing.
Algorithm 1 Pseudo-code for the cache three-step simulator

Input: a list states containing the 17 possible states for each of the three steps
Output: a list strong containing all the vulnerabilities that belong to the Strong type, a list weak containing all the vulnerabilities that belong to the Weak type, and a list ineffective containing all the ineffective types

1:  for step1 in states do
2:    for step2 in states do
3:      for step3 in states do
4:        steps <- (step1, step2, step3)
5:        candidates <- empty array to store all possible candidate combinations of this three-step pattern
6:        timings <- empty array to store the timing observation for each candidate combination of this three-step pattern
7:        if step1, step2, or step3 involves u then
8:          for each candidate combination c of steps do  // V_u's candidates are V_a, V_aalias, and V_NIB; V^inv_u's candidates are V^inv_a, V^inv_aalias, and V^inv_NIB; each candidate number is 3
9:            candidates.append(c)
10:         end for
11:         for each c in candidates do
12:           timings.append(output_timing(c))
13:         end for
14:         if judge_type(candidates, timings) = Strong then
15:           strong.append(steps)
16:         else if judge_type(candidates, timings) = Weak then
17:           weak.append(steps)
18:         else
19:           ineffective.append(steps)
20:         end if
21:       else
22:         continue  // pattern does not involve u and cannot leak the victim's secret
23:       end if
24:     end for
25:   end for
26: end for
After computing the type of all the three-step patterns, the cache three-step simulator outputs the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to space limits, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
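The classification performed by judge_type can be made concrete with a small sketch. This is a hypothetical formalization consistent with the category descriptions above, not the authors' script: each candidate of u is mapped to the set of timings the attacker may observe, and a pattern is Strong if some observation uniquely identifies u, Weak if observations vary with u but none is unambiguous, and Ineffective otherwise.

```python
# Sketch of the judge_type classification described above (hypothetical
# implementation, not the authors' code). `obs` maps each candidate of u
# ("a", "aalias", "NIB") to the set of timings the attacker may observe.
def judge_type(obs):
    timings = set().union(*obs.values())
    first = next(iter(obs.values()))
    # Ineffective: the observation carries no information about u.
    if all(obs[u] == first for u in obs):
        return "Ineffective"
    # Strong: some timing observation is consistent with exactly one u.
    for t in timings:
        if sum(1 for u in obs if t in obs[u]) == 1:
            return "Strong"
    # Weak: timing varies with u, but every observation is ambiguous.
    return "Weak"

# Fig. 3a-style pattern: fast uniquely identifies u = a.
print(judge_type({"a": {"fast"}, "aalias": {"slow"}, "NIB": {"slow"}}))          # Strong
# Fig. 3c-style pattern: both fast and slow are ambiguous.
print(judge_type({"a": {"slow"}, "aalias": {"fast"}, "NIB": {"fast", "slow"}}))  # Weak
# Fig. 3f-style pattern: always slow.
print(judge_type({"a": {"slow"}, "aalias": {"slow"}, "NIB": {"slow"}}))          # Ineffective
```

The full simulator additionally has to model the cache-block state transitions to compute each observation set; the sketch only captures the final classification step.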
3.3.2 Reduction Rules
We also developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the below reduction rules to the output of the simulator to get the final list of vulnerabilities. A three-step combination is eliminated if it satisfies one of the below rules:

1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated, e.g., A_d ⇝ A_a ⇝ V_u can be reduced
Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation (fast/slow, for u in {a, a_alias, NIB}) for each vulnerability type: (a, b) Strong Vulnerability, e.g., V_d ⇝ V_u ⇝ A_a and V_u ⇝ A_d ⇝ V^inv_u; (c, d) Weak Vulnerability, e.g., ⋆ ⇝ V_u ⇝ A^inv_a and A^inv_aalias ⇝ V^inv_u ⇝ V_a; (e, f) Ineffective Three-Step, e.g., V_d ⇝ V^inv_u ⇝ V_d and A_a ⇝ V_u ⇝ A_d
to A_a ⇝ V_u, which is equivalent to ⋆ ⇝ A_a ⇝ V_u. Therefore, A_d ⇝ A_a ⇝ V_u is a repeat type of ⋆ ⇝ A_a ⇝ V_u and can be eliminated.

2. Three-step patterns with a step involving a known address a, and ones with a step involving an alias to that address, a_alias, give the same information. Thus, three-step combinations which only differ in the use of a or a_alias cannot represent different attacks, and only one combination needs to be considered. For example, V_u ⇝ A_aalias ⇝ V_u is a repeat type of V_u ⇝ A_a ⇝ V_u, and we eliminate the first pattern.

3. Three-step patterns with steps V_u and V^inv_u in adjacent consecutive steps will only keep the latter step and eliminate the first step. For example, A_a ⇝ V_u ⇝ V^inv_u can be reduced to A_a ⇝ V^inv_u, which is further equivalent to ⋆ ⇝ A_a ⇝ V^inv_u. So A_a ⇝ V_u ⇝ V^inv_u can be eliminated.
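The three rules above can be sketched as simple predicates and a canonicalization over three-step patterns. This is a hypothetical Python sketch with our own step encoding (actor, address, invalidation flag), not the authors' reduction script:

```python
# Hedged sketch of the reduction rules (our own encoding, not the authors'
# script). A step is (actor, address, invalidate): actor is "A" (attacker)
# or "V" (victim); address in {"a", "aalias", "d", "u"}.
A = lambda addr, inv=False: ("A", addr, inv)
V = lambda addr, inv=False: ("V", addr, inv)

def rule1_redundant(p):
    # Rule 1: two adjacent steps that repeat, or that are both
    # attacker-known steps, make the pattern reducible to a shorter one.
    return any(s1 == s2 or (s1[0] == "A" and s2[0] == "A")
               for s1, s2 in zip(p, p[1:]))

def rule2_canonical(p):
    # Rule 2: a and aalias carry the same information; when only aalias
    # appears, rename it to a so equivalent patterns collide and dedupe.
    addrs = {s[1] for s in p}
    if "aalias" in addrs and "a" not in addrs:
        return tuple((act, "a" if ad == "aalias" else ad, inv)
                     for act, ad, inv in p)
    return p

def rule3_redundant(p):
    # Rule 3: V_u immediately followed by V_u^inv keeps only the latter.
    return any(s1 == V("u") and s2 == V("u", True)
               for s1, s2 in zip(p, p[1:]))

print(rule1_redundant((A("d"), A("a"), V("u"))))                    # True
print(rule2_canonical((V("u"), A("aalias"), V("u"))))
print(rule3_redundant((A("a"), V("u"), V("u", True))))              # True
```

A deduplication pass would drop every pattern for which rule1_redundant or rule3_redundant holds and keep one representative per rule2_canonical equivalence class.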
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that it can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as clarification.
The list of vulnerability types can be further collected into four simple macro types, each of which covers one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior (V) in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
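The macro-type assignment just described can be written directly as a small function. This is an illustrative sketch under the stated definitions; the step-name encoding is our own, not from the paper:

```python
# Sketch of the macro-type classification defined above (our own encoding).
# step2/step3 are step names starting with "V" (victim) or "A" (attacker);
# timing is the Step 3 observation; step3_inv marks an invalidation Step 3.
def macro_type(step2, step3, timing, step3_inv=False):
    # Internal (I) if Steps 2 and 3 involve only the victim's behavior.
    interference = "I" if step2[0] == "V" and step3[0] == "V" else "E"
    if step3_inv:
        # Fast invalidation means the data was not cached: miss-based.
        basis = "M" if timing == "fast" else "H"
    else:
        # A slow access means a cache miss: miss-based.
        basis = "M" if timing == "slow" else "H"
    return interference + basis

print(macro_type("Vu", "Ad", "slow"))             # Prime+Probe row: EM
print(macro_type("Vu", "Va", "fast"))             # Internal Collision row: IH
print(macro_type("Vu", "Ainv_a", "slow", True))   # Flush+Flush row: EH
```

The three printed cases match the Macro Type column of Tables 2 and 3 for the corresponding rows.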
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy           Step 1     Step 2     Step 3           Macro Type  Attack
Cache Internal Collision  A^inv      V_u        V_a (fast)       IH          (2)
                          V^inv      V_u        V_a (fast)       IH          (2)
                          A_d        V_u        V_a (fast)       IH          (2)
                          V_d        V_u        V_a (fast)       IH          (2)
                          A_aalias   V_u        V_a (fast)       IH          (2)
                          V_aalias   V_u        V_a (fast)       IH          (2)
                          A^inv_a    V_u        V_a (fast)       IH          (2)
                          V^inv_a    V_u        V_a (fast)       IH          (2)
Flush + Reload            A^inv_a    V_u        A_a (fast)       EH          (5)
                          V^inv_a    V_u        A_a (fast)       EH          (5)
                          A^inv      V_u        A_a (fast)       EH          (5)
                          V^inv      V_u        A_a (fast)       EH          (5)
                          A_d        V_u        A_a (fast)       EH          (5)
                          V_d        V_u        A_a (fast)       EH          (5)
                          A_aalias   V_u        A_a (fast)       EH          (5)
                          V_aalias   V_u        A_a (fast)       EH          (5)
Reload + Time             V^inv_u    A_a        V_u (fast)       EH          new
                          V^inv_u    V_a        V_u (fast)       IH          new
Flush + Probe             A_a        V^inv_u    A_a (slow)       EM          (6)
                          A_a        V^inv_u    V_a (slow)       IM          new
                          V_a        V^inv_u    A_a (slow)       EM          new
                          V_a        V^inv_u    V_a (slow)       IM          new
Evict + Time              V_u        A_d        V_u (slow)       EM          (1)
                          V_u        A_a        V_u (slow)       EM          (1)
Prime + Probe             A_d        V_u        A_d (slow)       EM          (4)
                          A_a        V_u        A_a (slow)       EM          (4)
Bernstein's Attack        V_u        V_a        V_u (slow)       IM          (3)
                          V_u        V_d        V_u (slow)       IM          (3)
                          V_d        V_u        V_d (slow)       IM          (3)
                          V_a        V_u        V_a (slow)       IM          (3)
Evict + Probe             V_d        V_u        A_d (slow)       EM          new
                          V_a        V_u        A_a (slow)       EM          new
Prime + Time              A_d        V_u        V_d (slow)       IM          new
                          A_a        V_u        V_a (slow)       IM          new
Flush + Time              V_u        A^inv_a    V_u (slow)       EM          new
                          V_u        V^inv_a    V_u (slow)       IM          new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing a longer processing time.

Attack Strategy                        Step 1     Step 2     Step 3           Macro Type  Attack
Cache Internal Collision Invalidation  A^inv      V_u        V^inv_a (slow)   IH          new
                                       V^inv      V_u        V^inv_a (slow)   IH          new
                                       A_d        V_u        V^inv_a (slow)   IH          new
                                       V_d        V_u        V^inv_a (slow)   IH          new
                                       A_aalias   V_u        V^inv_a (slow)   IH          new
                                       V_aalias   V_u        V^inv_a (slow)   IH          new
Flush + Flush                          A^inv_a    V_u        V^inv_a (slow)   IH          (1)
                                       V^inv_a    V_u        V^inv_a (slow)   IH          (1)
                                       A^inv_a    V_u        A^inv_a (slow)   EH          (1)
                                       V^inv_a    V_u        A^inv_a (slow)   EH          (1)
Flush + Reload Invalidation            A^inv      V_u        A^inv_a (slow)   EH          new
                                       V^inv      V_u        A^inv_a (slow)   EH          new
                                       A_d        V_u        A^inv_a (slow)   EH          new
                                       V_d        V_u        A^inv_a (slow)   EH          new
                                       A_aalias   V_u        A^inv_a (slow)   EH          new
                                       V_aalias   V_u        A^inv_a (slow)   EH          new
Reload + Time Invalidation             V^inv_u    A_a        V^inv_u (slow)   EH          new
                                       V^inv_u    V_a        V^inv_u (slow)   IH          new
Flush + Probe Invalidation             A_a        V^inv_u    A^inv_a (fast)   EM          new
                                       A_a        V^inv_u    V^inv_a (fast)   IM          new
                                       V_a        V^inv_u    A^inv_a (fast)   EM          new
                                       V_a        V^inv_u    V^inv_a (fast)   IM          new
Evict + Time Invalidation              V_u        A_d        V^inv_u (fast)   EM          new
                                       V_u        A_a        V^inv_u (fast)   EM          new
Prime + Probe Invalidation             A_d        V_u        A^inv_d (fast)   EM          new
                                       A_a        V_u        A^inv_a (fast)   EM          new
Bernstein's Invalidation Attack        V_u        V_a        V^inv_u (fast)   IM          new
                                       V_u        V_d        V^inv_u (fast)   IM          new
                                       V_d        V_u        V^inv_d (fast)   IM          new
                                       V_a        V_u        V^inv_a (fast)   IM          new
Evict + Probe Invalidation             V_d        V_u        A^inv_d (fast)   EM          new
                                       V_a        V_u        A^inv_a (fast)   EM          new
Prime + Time Invalidation              A_d        V_u        V^inv_d (fast)   IM          new
                                       A_a        V_u        V^inv_a (fast)   IM          new
Flush + Time Invalidation              V_u        A^inv_a    V^inv_u (fast)   EM          new
                                       V_u        V^inv_a    V^inv_u (fast)   IM          new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
³ Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High, based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if they access the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache set that is in the same process, a random block in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
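The three-tier victim selection just described can be sketched as follows. This is a hypothetical, simplified model (our own encoding; the OS interrupt is reduced to a boolean flag):

```python
import random

# Sketch of SHARP's victim-selection order (our own simplified model).
# cache_set lists the owner PID of each block; live_pids are the
# currently running processes.
def pick_victim(cache_set, requester_pid, live_pids):
    # 1) Prefer a block not belonging to any current process.
    dead = [i for i, pid in enumerate(cache_set) if pid not in live_pids]
    if dead:
        return dead[0], False
    # 2) Else a block belonging to the requesting process itself.
    own = [i for i, pid in enumerate(cache_set) if pid == requester_pid]
    if own:
        return own[0], False
    # 3) Else evict a random block and raise an alarm (OS interrupt).
    return random.randrange(len(cache_set)), True

idx, alarm = pick_victim([2, 3, 3], requester_pid=1, live_pids={1, 2, 3})
print(alarm)   # True: only other live processes' data is present
```

The alarm in case 3 is what lets the OS detect a process that repeatedly tries to evict other processes' lines, as in a Prime + Probe attack.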
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to a Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because of the small performance impact given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In another case, the core receives possible invalidations from the OS before the check of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (which is similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only if both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen from the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
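The way masking described above can be sketched with bitmaps. This is a hypothetical model: the policy_hitmap/policy_fillmap names follow the text, but the encoding and helper functions are our own, not DAWG's RTL:

```python
# Sketch of DAWG's per-domain way masking (simplified model of the text).
# Each domain ID has a hitmap (ways it may hit in) and a fillmap (ways it
# may fill/evict in), both as bitmasks over the cache ways.
class Domain:
    def __init__(self, policy_hitmap, policy_fillmap):
        self.policy_hitmap = policy_hitmap
        self.policy_fillmap = policy_fillmap

def is_hit(way, tag, line_tag, line_domain, domain_id, dom):
    # A hit requires a tag match, a domain-ID match, and the way to be
    # visible in the domain's hitmap.
    return (tag == line_tag and line_domain == domain_id
            and (dom.policy_hitmap >> way) & 1 == 1)

def fill_candidates(dom, num_ways):
    # A miss may only evict ways inside the domain's own fillmap.
    return [w for w in range(num_ways) if (dom.policy_fillmap >> w) & 1]

victim_dom = Domain(policy_hitmap=0b0011, policy_fillmap=0b0011)
print(fill_candidates(victim_dom, 4))   # [0, 1]: ways 2 and 3 are off-limits
```

Because both the hit path and the fill path are masked, an attacker domain whose bitmaps are disjoint from the victim's can neither hit on nor evict the victim's lines.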
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cachebased on cache blocks It extends each cache block with aprocess ID and a lock status bit The process ID and thelock status bits are controlled by the extended load and storeinstructions (ld lockld unlock and st lockst unlock)which allow the programmer and compiler to set or reset thelock bit through use of the right load or store instructionIn terms of cache replacement policy for a cache hit PLcache will perform the normal cache hit handling procedureand the instructions with locking or unlocking capability canupdate the process ID and the lock status bits while the hit isprocessed When there is a cache miss locked data cannotbe evicted by data that is not locked and locked data amongdifferent processes cannot be evicted by each other In thiscase the new data will be either loaded or stored withoutcaching In other cases data eviction is possible This cachelisted extra software assumption as follows
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
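The miss-handling rules above can be sketched as follows (illustrative class and field names, not the authors' hardware):

```python
# Sketch of the PL cache miss-handling policy: locked data cannot be
# evicted by unlocked data, and locked data of different processes
# cannot evict each other; otherwise the access is served uncached.

class PLBlock:
    def __init__(self, pid=None, locked=False, addr=None):
        self.pid, self.locked, self.addr = pid, locked, addr

def pl_miss(cache_set, pid, addr, lock_request):
    """Return ('replace', victim_block) or ('uncached', None)."""
    for victim in cache_set:
        # A locked victim may only be replaced by a locking access
        # from the same process.
        if not victim.locked or (lock_request and victim.pid == pid):
            victim.pid, victim.locked, victim.addr = pid, lock_request, addr
            return ('replace', victim)
    # Every candidate is locked against this request: the new data is
    # loaded or stored without caching.
    return ('uncached', None)
```

Under this policy, an attacker's unlocked line can never displace the victim's preloaded, locked data, which is why Step 1 of eviction-based three-step patterns fails against PL cache.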
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit needs both the process ID and the address to match. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425 409
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
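The three miss-handling branches above can be sketched as follows (a minimal sketch with an illustrative signature; `perm` stands in for the per-process permutation table):

```python
import random

# Sketch of the RP cache miss decision: random-set eviction without
# caching for same-process/different-protection conflicts, and a
# permutation-table swap for cross-process conflicts.

def rp_miss(victim_pid, victim_prot, req_pid, req_prot, num_sets, set_s, perm):
    s_prime = random.randrange(num_sets)
    if req_pid == victim_pid and req_prot != victim_prot:
        # Same process, different protection bits: evict a line of a
        # random set S' and serve the request without caching it.
        return ('evict_random_set_no_cache', s_prime)
    if req_pid != victim_pid:
        # Different processes: store D in an evicted block of S' and
        # swap the permutation entries of S and S'.
        perm[set_s], perm[s_prime] = perm[s_prime], perm[set_s]
        return ('place_in_random_set_and_swap', s_prime)
    # Same process, same protection bit: normal replacement policy.
    return ('normal_replacement', set_s)
```

The swap is what de-correlates a conflict miss from the set index the attacker primed, since the victim's set mapping changes on every cross-process eviction.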
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to match. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if the two belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on a Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a No-fill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the timing-based side-channel protection (the random filling technique takes care of that).
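The random fill handling above can be sketched as follows (the window bounds `ra`/`rb` are assumed parameters, not the paper's exact encoding):

```python
import random

# Sketch of a Random Fill request: the security-critical access is served
# without caching, and an arbitrary neighboring line from a window around
# it is brought into the cache instead.

def random_fill_request(addr, ra, rb):
    # The line actually cached is drawn uniformly from [addr - ra, addr + rb].
    filled = random.randint(addr - ra, addr + rb)
    # The requested line itself is NOT cached, so a later probe of `addr`
    # reveals nothing about whether the victim touched it.
    return {'served_uncached': addr, 'cached_line': filled}
```

Because the cached line is chosen independently of which line the victim actually requested, observing which line landed in the cache carries no information about the secret-dependent address.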
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
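A toy stand-in for the encrypted indexing can illustrate the effect of re-keying (the real design uses LLBC; a keyed hash is used here purely for illustration):

```python
import hashlib

# Toy stand-in for CEASER's address encryption: the set index is a keyed
# function of the line address, so swapping the epoch key remaps every
# address to a fresh set (dynamic remapping).

def ceaser_set_index(addr: int, epoch_key: bytes, num_sets: int = 1024) -> int:
    digest = hashlib.sha256(epoch_key + addr.to_bytes(8, 'little')).digest()
    return int.from_bytes(digest[:4], 'little') % num_sets
```

After a periodic re-key, the same address almost certainly maps to a different set, so any eviction sets the attacker has constructed go stale before the key can be reconstructed.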
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each of them. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and to assign different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate security domain identification with user space.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs or based on whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore it de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
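The per-block decay counter above can be sketched as follows (illustrative structure, not the authors' implementation):

```python
import random

# Sketch of the Non Deterministic Cache decay counter: each block starts
# its counter at a random value below the global maximum; every global
# tick while the block is untouched increases it, and reaching the
# threshold invalidates the line.

class DecayingBlock:
    def __init__(self, max_interval):
        self.max_interval = max_interval
        self.touch()

    def touch(self):
        # An access restarts the countdown from a fresh random value,
        # de-correlating invalidation time from access time.
        self.counter = random.randrange(self.max_interval)
        self.valid = True

    def global_tick(self):
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False  # invalidated after a randomized interval
```

Because each line dies after a randomized idle interval, the hit/miss pattern an attacker observes no longer reflects only which addresses were touched.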
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP∗ cache can defend against the V_u → A_d → V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d → V_u → A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. In the RP cache [49], however, the timing of the last step is always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized, so that the original correspondence between the victim's behavior and the attacker's observation is lost. For instance, the A_d → V_u → A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, the observed timing no longer determines whether u and d share the same index, since in Random Fill cache u would be accessed without caching and another, random piece of data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d → V_u → V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
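The three judging criteria can be expressed as a toy predicate (illustrative only; this is not the paper's simulator):

```python
# Toy formalization of the three judging criteria. A cache model maps a
# three-step pattern to the attacker-visible timing of the last step:
# 'fast', 'slow', 'random' (criterion 2), or None when the cache
# disallows a step (criterion 3).

def prevents(cache_model, patterns_over_secret):
    timings = set()
    for pattern in patterns_over_secret:
        t = cache_model(pattern)
        if t is None or t == 'random':
            continue  # step disallowed, or timing de-correlated
        timings.add(t)
    # Criterion 1: whatever remains must be constant; the attacker must
    # never see both 'fast' and 'slow' across secret values.
    return len(timings) <= 1

# A normal set-associative cache leaks under Flush + Reload: the reload
# is fast only when the victim's secret access hit the known address.
leaky = lambda p: 'fast' if p == ('flush', 'V_u', 'A_a==u') else 'slow'
assert not prevents(leaky, [('flush', 'V_u', 'A_a==u'),
                            ('flush', 'V_u', 'A_a!=u')])
```

A model that always answers 'slow' (case 1) or 'random' (case 2) for these patterns would make `prevents` return `True`.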
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, there are two results listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities whose last step is a memory access related operation. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).
a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c Flush is disabled, but cache coherence might be used to do the data removal.
d For L1 cache and TLB, flushing is done during context switch.
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which together make up the whole cache hierarchy; the L1 cache and TLB require software flush protection, while the last-level cache can be protected by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f The technique is currently only implemented in the last-level cache.
g The technique currently only targets shared caches.
h The technique only targets the inclusive last-level cache.
i The technique targets the data cache hierarchy.
j For the last-level cache, the cache is partitioned between the victim and the attacker.
k The technique can make the probability of the vulnerability succeeding extremely small.
l The technique works for shared read-only memory but not for shared writable memory.
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities whose last step is an invalidation related operation. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability to some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).
a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c Flush is disabled, but cache coherence might be used to do the data removal.
d For L1 cache and TLB, flushing is done during context switch.
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which together make up the whole cache hierarchy; the L1 cache and TLB require software flush protection, while the last-level cache can be protected by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f The technique is currently only implemented in the last-level cache.
g The technique currently only targets shared caches.
h The technique only targets the inclusive last-level cache.
i The technique targets the data cache hierarchy.
j For the last-level cache, the cache is partitioned between the victim and the attacker.
k The technique can make the probability of the vulnerability succeeding extremely small.
l The technique works for shared read-only memory but not for shared writable memory.
m Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and when implementing the corresponding security features on the cache.
Comparing PL cache with SP∗ cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP∗ cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition through flushing or eviction, and they therefore bring uncertainty to the final timing observation made by the attacker.
SP∗ cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is one vulnerability that SP∗ cache cannot prevent. SecVerilog cache is similar to SP∗ cache, but, for confidentiality, it prevents the attacker from directly getting a cache hit due to the victim's data, and it therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]). DAWG cache only allows a cache hit if both the address and the process ID match. Therefore, compared with a normal partitioning cache such as SP∗ cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data is placed into the unreserved ways, where the attacker can evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system it so far supports preventing all of the vulnerabilities, but it only works for the LLC, at the cost of high software implementation complexity and with some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker, or non-secure victim data, from setting the initial state (Step 0) of the victim partition, and therefore brings uncertainty to the final observation made by the attacker; e.g., A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (the columns for SHARP cache, and from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into a cache block in Step 1, Random Fill cache will not randomly load the security-critical data, allowing vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary.
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to V_u → A_d → V_u (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and it therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as via Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
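The three steps of Flush + Reload can be made concrete with a minimal simulation. The toy direct-mapped cache below is purely illustrative; ToyCache, HIT, MISS, and flush_reload are our own names, not part of the paper's simulator:

```python
# Toy direct-mapped cache illustrating the three steps of Flush + Reload.
# ToyCache, HIT, MISS, and flush_reload are illustrative names only.

HIT, MISS = "hit", "miss"

class ToyCache:
    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.lines = [None] * num_sets      # one block per set

    def _set(self, addr):
        return addr % self.num_sets

    def access(self, addr):
        """Access addr; return the timing observation (HIT or MISS)."""
        s = self._set(addr)
        if self.lines[s] == addr:
            return HIT
        self.lines[s] = addr                # fill the line on a miss
        return MISS

    def flush(self, addr):
        """clflush-style invalidation of addr, if present."""
        s = self._set(addr)
        if self.lines[s] == addr:
            self.lines[s] = None

def flush_reload(secret_addr, probe_addr):
    cache = ToyCache()
    cache.flush(probe_addr)                 # Step 1: invalidate known data
    cache.access(secret_addr)               # Step 2: victim accesses secret
    return cache.access(probe_addr)         # Step 3: attacker reloads

# Step 3 hits only when the secret address equals the probed address.
assert flush_reload(secret_addr=13, probe_addr=13) == HIT
assert flush_reload(secret_addr=13, probe_addr=5) == MISS
```

Note that even a probe address mapping to the same set as the secret still misses in Step 3: Flush + Reload distinguishes full addresses, not just set indices.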
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set to only learn the secret data's cache index, instead of invalidating some known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
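As an illustration of the set-index granularity of Prime + Probe (in contrast to the address granularity of Flush + Reload), consider a toy one-way cache; the 4-set configuration and all names below are assumptions of the sketch, not the paper's simulator:

```python
# Prime + Probe on a toy 4-set, 1-way cache: the attacker learns the
# *set index* of the victim's secret address, not its full address.
# NUM_SETS and prime_probe are illustrative names only.

NUM_SETS = 4

def prime_probe(secret_addr):
    cache = [None] * NUM_SETS
    # Step 1: the attacker primes every set with his or her own lines.
    for a in range(NUM_SETS):
        cache[a % NUM_SETS] = ("attacker", a)
    # Step 2: the victim's access evicts the attacker's line in one set.
    cache[secret_addr % NUM_SETS] = ("victim", secret_addr)
    # Step 3: the attacker probes; a miss reveals the set the secret maps to.
    return [s for s in range(NUM_SETS) if cache[s][0] != "attacker"]

assert prime_probe(secret_addr=6) == [2]    # 6 mod 4 = 2
```

Any secret address congruent to 2 modulo 4 produces the same observation, which is exactly why Prime + Probe leaks only the cache index.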
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the probed cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1, it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3, it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and for β > 3, the representation can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
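The operation vocabulary can be sketched as simple predicates. The Op tuple and its field names below are our own illustrative encoding, not notation mandated by the model:

```python
# Illustrative predicates for the appendix's operation vocabulary.
# The Op tuple and its fields are our own encoding, not the paper's.

from collections import namedtuple

Op = namedtuple("Op", ["who", "addr", "inv"])   # who in {"A", "V"}

KNOWN_ADDRS = {"a", "a_alias", "d"}             # addresses known to the attacker

def is_known_access(op):
    """Access to a known memory address (known_access operation)."""
    return op.addr in KNOWN_ADDRS and not op.inv

def is_known_inv(op):
    """Invalidation of a known memory address (known_inv operation)."""
    return op.addr in KNOWN_ADDRS and op.inv

def is_not_u(op):
    """known_access and known_inv operations together form not_u."""
    return op.addr in KNOWN_ADDRS

def is_u_op(op):
    """Any memory-related operation involving the unknown address u."""
    return op.addr == "u"

assert is_known_access(Op("A", "a", False))
assert is_known_inv(Op("V", "d", True))
assert is_u_op(Op("V", "u", True)) and not is_not_u(Op("V", "u", True))
```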
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⤳ A_a ⤳ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆ in it, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an A_inv/V_inv in it, the longer pattern can be divided into two separate patterns, where the A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv_a ⤳ V_u ⤳ A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing together with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps, M and N, in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the different pairs of two steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of the two is not the last step in the whole memory sequence. Such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
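The commuting condition is, in essence, an observational-equivalence check: run the sequence in both orders and compare the attacker's final timing observation. The brute-force sketch below illustrates this on a toy single-line cache; run, commutes, and the step encoding are our own illustrative names, not the paper's rules themselves:

```python
# Observational-equivalence check behind the Commute Rules, on a toy
# single-line cache. All names and the encoding are illustrative only.

def run(seq):
    """seq: list of ('access'|'inv', addr). Return the final timing."""
    line = None
    timing = None
    for kind, addr in seq:
        if kind == "access":
            timing = "hit" if line == addr else "miss"
            line = addr                      # fill on access
        else:                                # invalidation
            timing = "inv"
            if line == addr:
                line = None

    return timing

def commutes(m, n, suffix):
    """True if swapping m and n leaves the final observation unchanged."""
    return run([m, n] + suffix) == run([n, m] + suffix)

# Commute Rule 1 flavor: a known access and a known invalidation to
# *different* addresses can be swapped.
assert commutes(("access", "a"), ("inv", "d"), [("access", "a")])
# Same address: swapping changes the final observation, so no commuting.
assert not commutes(("access", "a"), ("inv", "a"), [("access", "a")])
```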
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules below (if the two-step pattern has a "yes" in the "Reduce Rule" column and no Union result in the "Union Rule or Combined Step" column in Table 7):

[Table 7, "Rules for combining two adjacent steps," is not reproduced here; its layout was lost in extraction. For each possible pair of adjacent steps, regardless of whether they are the attacker's (A) or the victim's (V) operations, the table lists whether Commute Rule 1, Commute Rule 2, and the Reduce Rule apply ("yes"/"no"), and the reduced or unioned step that results, if any.]
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to only keep the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter their order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
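The three reduction rules can be sketched as a single pair-reduction function. The (kind, addr) step encoding, and the choice to keep the second step of each reducible pair, are illustrative assumptions of the sketch:

```python
# Sketch of the three Reduction Rules on adjacent step pairs.
# A step is (kind, addr), kind in {"access", "inv"}; addr "u" is the
# unknown address, anything else is a known address. Illustrative only.

def reduce_adjacent(s1, s2):
    """Return the single step the pair reduces to, or None if no rule applies."""
    k1, a1 = s1
    k2, a2 = s2
    # Rule 1: two u accesses target the same u -> keep the second.
    if k1 == k2 == "access" and a1 == a2 == "u":
        return s2
    # Rule 2: two known accesses leave a deterministic state -> one step.
    if k1 == k2 == "access" and "u" not in (a1, a2):
        return s2
    # Rule 3: access + invalidation of the same known address -> second step.
    if {k1, k2} == {"access", "inv"} and a1 == a2 and a1 != "u":
        return s2
    return None

assert reduce_adjacent(("access", "u"), ("access", "u")) == ("access", "u")
assert reduce_adjacent(("access", "a"), ("access", "d")) == ("access", "d")
assert reduce_adjacent(("inv", "a"), ("access", "a")) == ("access", "a")
assert reduce_adjacent(("access", "a"), ("inv", "d")) is None   # commute case
```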
B.4.2.3 Union Rules
Suppose there are two adjacent steps, M and N, in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
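Union Rule 1 can be sketched by representing a step as a set of invalidated addresses, so that a unioned step remains a first-class step; this representation is our own choice for illustration:

```python
# Sketch of Union Rule 1: adjacent invalidations of different known
# addresses collapse into one joint invalidation step. A step here is
# (kind, frozenset_of_addresses); the representation is illustrative.

def union_invalidations(steps):
    """Merge runs of adjacent known-address invalidation steps."""
    out = []
    for kind, addrs in steps:
        if (out and kind == "inv" and out[-1][0] == "inv"
                and "u" not in addrs | out[-1][1]):      # known addresses only
            out[-1] = ("inv", out[-1][1] | addrs)        # union with previous
        else:
            out.append((kind, addrs))
    return out

seq = [("inv", frozenset({"a"})), ("inv", frozenset({"d"})),
       ("access", frozenset({"a"}))]
assert union_invalidations(seq) == [("inv", frozenset({"a", "d"})),
                                    ("access", frozenset({"a"}))]
```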
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always be applied and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation ⤳ not_u operation ⤳ u operation
– not_u operation ⤳ u operation ⤳ not_u operation
There might be a possible extra ⋆ or A_inv/V_inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A_inv/V_inv in the first step:

– If followed by a known_access operation, the A_inv/V_inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V^inv_u, the A_inv/V_inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A_inv/V_inv ⤳ V_u ⤳ not_u operation ⤳ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A_inv/V_inv ⤳ V_u ⤳ A^inv_d/V^inv_d ⤳ u operation, where V_u ⤳ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias} ⤳ V_u
– {A_a, V_a, A_aalias, V_aalias} ⤳ V^inv_u
– V_u ⤳ {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}
– V^inv_u ⤳ {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vul_list = ∅
2: while pattern contains ⋆ and the ⋆'s index is not 0 do
3:     pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (pattern contains A_inv and its index is not 0) or (pattern contains V_inv and its index is not 0) do
6:     pattern = Subdivision_Rule_2(pattern)
7: end while
8: while (pattern is ineffective or has interval effective three steps) do
9:     pattern = Commute_Rules(pattern)
10:    pattern = Reduction_Rules(pattern)
11:    pattern = Union_Rules(pattern)
12:    if (pattern is ineffective or has interval effective three steps) then
13:        vul_list += effective_patterns(pattern)
14:    end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
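The overall shape of the reduction loop can be sketched as a fixed-point driver. Here, toy_pass is a stand-in for the commute/reduction/union passes (it implements only one toy rule, collapsing adjacent duplicate steps), so the sketch shows the driver structure, not the paper's actual rules:

```python
# Fixed-point driver in the spirit of Algorithm 2: apply the rule passes
# until the pattern is within three steps or stops shrinking.
# toy_pass and reduce_pattern are illustrative stand-ins only.

def toy_pass(pattern):
    """Toy rule: adjacent duplicate steps reduce to a single step."""
    out = []
    for step in pattern:
        if not out or out[-1] != step:
            out.append(step)        # keep the step; drop exact repeats
    return out

def reduce_pattern(pattern, passes=(toy_pass,)):
    while len(pattern) > 3:
        before = len(pattern)
        for rule_pass in passes:
            pattern = rule_pass(pattern)
        if len(pattern) == before:  # no rule fired: stop to avoid looping
            break
    return pattern

assert reduce_pattern(["Vu", "Vu", "Aa", "Aa", "Ainv"]) == ["Vu", "Aa", "Ainv"]
```

In the real algorithm, a pattern that stops shrinking above three steps falls into the rest-checking cases enumerated in Section B.4.2.4.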
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia Conference on Computer and Communications Security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX Workshop on Health Information Technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX Security Symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX Security Symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX Security Symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE Symposium on Security and Privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE Symposium on Security and Privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European Solid State Circuits Conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX Security Symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' Track at the RSA Conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European Symposium on Research in Computer Security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th Annual International Symposium on Computer Architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX Security Symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX Security Symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX Security Symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th Annual Computer Security Applications Conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397–425
Pseudo-code for the cache three-step simulator algorithm

Input: a list containing the 17 possible states for each of the steps
Output: a list containing all the vulnerabilities that belong to the Strong type;
        a list containing all the vulnerabilities that belong to the Weak type;
        a list containing all the ineffective types

1:  for each step1 in the list of states do
2:    for each step2 in the list of states do
3:      for each step3 in the list of states do
4:        steps ← (step1, step2, step3)
5:        candidates ← array to store all possible candidate combinations of this three-step pattern
6:        observations ← array to store all possible timing observations regarding the different candidate combinations of this three-step pattern
7:        if step1, step2, or step3 involves the unknown address u then
8:          for each candidate combination of steps (u's candidates are a, aalias, and NIB; each u has 3 candidates) do
9:            append the combination to candidates
10:         end for
11:         for each combination in candidates do
12:           append the derived timing observation to observations
13:         end for
14:         if the observations uniquely distinguish the victim's access to a then
15:           strong.append(steps)
16:         else
17:           if the observations partially distinguish the candidates then
18:             weak.append(steps)
19:           else
20:             ineffective.append(steps)
21:           end if
22:         end if
23:       else
24:         ineffective.append(steps)
25:         continue
26:       end if
27:     end for
28:   end for
29: end for

Algorithm 1
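The enumeration loop of Algorithm 1 can be sketched in Python as follows. The `observe()` oracle stands in for the simulator's cache-behavior model, and the strong/weak test is a simplified version of its distinguishability check, so this is an illustrative sketch rather than the actual simulator:

```python
from itertools import product

# Candidate resolutions of the unknown address u, as in the paper:
# the secret address a, an alias of it, or a "not-in-block" address (NIB).
CANDIDATES = ("a", "a_alias", "NIB")

def classify(pattern, observe):
    """Classify a three-step pattern as 'strong', 'weak' or 'ineffective'."""
    if not any("u" in step for step in pattern):
        return "ineffective"            # no unknown address: nothing to infer
    # timing observation for each candidate substitution of u
    obs = {c: observe(tuple(s.replace("u", c) for s in pattern))
           for c in CANDIDATES}
    others = [obs[c] for c in CANDIDATES if c != "a"]
    if all(obs["a"] != o for o in others):
        return "strong"                 # timing uniquely reveals access to a
    if any(obs["a"] != o for o in others):
        return "weak"                   # timing only partially distinguishes
    return "ineffective"

def simulate(states, observe):
    """Enumerate all three-step patterns and bucket them by type."""
    buckets = {"strong": [], "weak": [], "ineffective": []}
    for pattern in product(states, repeat=3):
        buckets[classify(pattern, observe)].append(pattern)
    return buckets
```

A caller supplies `observe()`, which derives the fast/slow timing of the final step for one concrete candidate combination; the triple loop then corresponds to lines 1–3 of Algorithm 1.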
After computing the type of all the three-step patterns, the cache three-step simulator will output the effective (Strong Vulnerability or Weak Vulnerability) three-step patterns. Due to the space limit, we only list and analyze the Strong vulnerabilities in this paper. Weak vulnerabilities are left for future work, when channels with smaller channel capacities are desired to be analyzed.
3.3.2 Reduction Rules
We also have developed rules that can further reduce the output list of all the effective three-step patterns from the cache three-step simulator. Figure 2 shows how the output of the simulator is filtered through the reduction rules to get the final list of vulnerabilities. The reduction's goal is to remove vulnerabilities of repeating or redundant types from the lists, to form the effective Strong Vulnerability or Weak Vulnerability output. A script was developed that automatically applies the reduction rules below to the output of the simulator to get the final list of vulnerabilities. A three-step combination will be eliminated if it satisfies one of the following rules:
1. Three-step patterns with two adjacent steps which are repeating, or which are both known to the attacker, can be eliminated. E.g., Ad ⇝ Aa ⇝ Vu can be reduced to ⋆ ⇝ Aa ⇝ Vu, which is equivalent to Aa ⇝ Vu. Therefore, Ad ⇝ Aa ⇝ Vu is a repeat type of Aa ⇝ Vu and can be eliminated.

Fig. 3 Examples of relations between the victim's behavior (u) and the attacker's observation (fast or slow, for u ∈ {a, aalias, NIB}) for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step
2. Three-step patterns with a step involving a known address a, and with a step involving an alias to that address, aalias, give the same information. Thus, three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, Vu ⇝ Aaalias ⇝ Vu is a repeat type of Vu ⇝ Aa ⇝ Vu, and we eliminate the first pattern.
3. Three-step patterns with steps Vu and Vu^inv in adjacent consecutive steps will only keep the latter step and eliminate the first step. For example, Aa ⇝ Vu ⇝ Vu^inv can be reduced to Aa ⇝ ⋆ ⇝ Vu^inv, which is further equivalent to Aa ⇝ Vu^inv. So Aa ⇝ Vu ⇝ Vu^inv can be eliminated.
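The reduction pass can be sketched as a small rewriting function over step strings; the set of attacker-known steps and the string encoding are simplifying assumptions, not the paper's actual script:

```python
# Steps issued by (and therefore known to) the attacker, in a simplified
# string encoding; "^inv" marks invalidation steps.
KNOWN = {"Aa", "Ad", "A^inv", "Aa^inv", "Ad^inv"}

def reduce_pattern(steps):
    """Apply reduction rules 1-3 and return the canonical (reduced) pattern."""
    steps = [s.replace("aalias", "a") for s in steps]        # rule 2: a ~ aalias
    out = []
    for s in steps:
        if out and (s == out[-1] or (s in KNOWN and out[-1] in KNOWN)):
            out[-1] = s                                      # rule 1: merge adjacent known/repeating steps
        elif out and out[-1] == "Vu" and s == "Vu^inv":
            out[-1] = s                                      # rule 3: keep only the latter of Vu, Vu^inv
        else:
            out.append(s)
    return tuple(out)
```

Two three-step combinations whose reduced forms coincide are repeats of each other, and only one representative is kept in the final list.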
3.3.3 Categorization of Strong Vulnerabilities
As is shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing-based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 lists all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which each cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that only involve the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference vulnerabilities (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.² We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
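The macro-type labels in Tables 2 and 3 follow mechanically from these two binary distinctions; a small classifier sketch (the string encoding of steps and timings is an assumed simplification):

```python
def macro_type(step2, step3, timing):
    """Derive the macro type (IM/IH/EM/EH) of a vulnerability.

    step2, step3: strings such as "Vu", "Aa", "Va^inv"; timing: "fast"/"slow".
    """
    # internal (I) iff Steps 2 and 3 involve only the victim's behavior V
    interference = "I" if step2.startswith("V") and step3.startswith("V") else "E"
    # hit-based (H): a fast access (hit), or a slow invalidation of valid data;
    # miss-based (M): a slow access (miss), or a fast invalidation of absent data
    is_inv = "inv" in step3
    hit_based = (timing == "fast") != is_inv
    return interference + ("H" if hit_based else "M")
```

Applying this to any row of Tables 2 and 3 reproduces the Macro Type column entry for that row.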
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in the literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in the literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. Later, in Section 5, we will apply the three-step model to check if the secure caches can defend against some or all of the vulnerabilities in our model.
² Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column gives the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in the literature.
Attack Strategy          | Vulnerability Type (Step 1 ⇝ Step 2 ⇝ Step 3) | Macro Type | Attack
-------------------------+-----------------------------------------------+------------+-------
Cache Internal Collision | A^inv ⇝ Vu ⇝ Va (fast)                        | IH         | (2)
                         | V^inv ⇝ Vu ⇝ Va (fast)                        | IH         | (2)
                         | Ad ⇝ Vu ⇝ Va (fast)                          | IH         | (2)
                         | Vd ⇝ Vu ⇝ Va (fast)                          | IH         | (2)
                         | Aaalias ⇝ Vu ⇝ Va (fast)                     | IH         | (2)
                         | Vaalias ⇝ Vu ⇝ Va (fast)                     | IH         | (2)
                         | Aa^inv ⇝ Vu ⇝ Va (fast)                      | IH         | (2)
                         | Va^inv ⇝ Vu ⇝ Va (fast)                      | IH         | (2)
Flush + Reload           | Aa^inv ⇝ Vu ⇝ Aa (fast)                      | EH         | (5)
                         | Va^inv ⇝ Vu ⇝ Aa (fast)                      | EH         | (5)
                         | A^inv ⇝ Vu ⇝ Aa (fast)                       | EH         | (5)
                         | V^inv ⇝ Vu ⇝ Aa (fast)                       | EH         | (5)
                         | Ad ⇝ Vu ⇝ Aa (fast)                          | EH         | (5)
                         | Vd ⇝ Vu ⇝ Aa (fast)                          | EH         | (5)
                         | Aaalias ⇝ Vu ⇝ Aa (fast)                     | EH         | (5)
                         | Vaalias ⇝ Vu ⇝ Aa (fast)                     | EH         | (5)
Reload + Time            | Vu^inv ⇝ Aa ⇝ Vu (fast)                      | EH         | new
                         | Vu^inv ⇝ Va ⇝ Vu (fast)                      | IH         | new
Flush + Probe            | Aa ⇝ Vu^inv ⇝ Aa (slow)                      | EM         | (6)
                         | Aa ⇝ Vu^inv ⇝ Va (slow)                      | IM         | new
                         | Va ⇝ Vu^inv ⇝ Aa (slow)                      | EM         | new
                         | Va ⇝ Vu^inv ⇝ Va (slow)                      | IM         | new
Evict + Time             | Vu ⇝ Ad ⇝ Vu (slow)                          | EM         | (1)
                         | Vu ⇝ Aa ⇝ Vu (slow)                          | EM         | (1)
Prime + Probe            | Ad ⇝ Vu ⇝ Ad (slow)                          | EM         | (4)
                         | Aa ⇝ Vu ⇝ Aa (slow)                          | EM         | (4)
Bernstein's Attack       | Vu ⇝ Va ⇝ Vu (slow)                          | IM         | (3)
                         | Vu ⇝ Vd ⇝ Vu (slow)                          | IM         | (3)
                         | Vd ⇝ Vu ⇝ Vd (slow)                          | IM         | (3)
                         | Va ⇝ Vu ⇝ Va (slow)                          | IM         | (3)
Evict + Probe            | Vd ⇝ Vu ⇝ Ad (slow)                          | EM         | new
                         | Va ⇝ Vu ⇝ Aa (slow)                          | EM         | new
Prime + Time             | Ad ⇝ Vu ⇝ Vd (slow)                          | IM         | new
                         | Aa ⇝ Vu ⇝ Va (slow)                          | IM         | new
Flush + Time             | Vu ⇝ Aa^inv ⇝ Vu (slow)                      | EM         | new
                         | Vu ⇝ Va^inv ⇝ Vu (slow)                      | IM         | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time.
Attack Strategy                        | Vulnerability Type (Step 1 ⇝ Step 2 ⇝ Step 3) | Macro Type | Attack
---------------------------------------+-----------------------------------------------+------------+-------
Cache Internal Collision Invalidation  | A^inv ⇝ Vu ⇝ Va^inv (slow)                    | IH         | new
                                       | V^inv ⇝ Vu ⇝ Va^inv (slow)                    | IH         | new
                                       | Ad ⇝ Vu ⇝ Va^inv (slow)                      | IH         | new
                                       | Vd ⇝ Vu ⇝ Va^inv (slow)                      | IH         | new
                                       | Aaalias ⇝ Vu ⇝ Va^inv (slow)                 | IH         | new
                                       | Vaalias ⇝ Vu ⇝ Va^inv (slow)                 | IH         | new
Flush + Flush                          | Aa^inv ⇝ Vu ⇝ Va^inv (slow)                  | IH         | (1)
                                       | Va^inv ⇝ Vu ⇝ Va^inv (slow)                  | IH         | (1)
                                       | Aa^inv ⇝ Vu ⇝ Aa^inv (slow)                  | EH         | (1)
                                       | Va^inv ⇝ Vu ⇝ Aa^inv (slow)                  | EH         | (1)
Flush + Reload Invalidation            | A^inv ⇝ Vu ⇝ Aa^inv (slow)                   | EH         | new
                                       | V^inv ⇝ Vu ⇝ Aa^inv (slow)                   | EH         | new
                                       | Ad ⇝ Vu ⇝ Aa^inv (slow)                      | EH         | new
                                       | Vd ⇝ Vu ⇝ Aa^inv (slow)                      | EH         | new
                                       | Aaalias ⇝ Vu ⇝ Aa^inv (slow)                 | EH         | new
                                       | Vaalias ⇝ Vu ⇝ Aa^inv (slow)                 | EH         | new
Reload + Time Invalidation             | Vu^inv ⇝ Aa ⇝ Vu^inv (slow)                  | EH         | new
                                       | Vu^inv ⇝ Va ⇝ Vu^inv (slow)                  | IH         | new
Flush + Probe Invalidation             | Aa ⇝ Vu^inv ⇝ Aa^inv (fast)                  | EM         | new
                                       | Aa ⇝ Vu^inv ⇝ Va^inv (fast)                  | IM         | new
                                       | Va ⇝ Vu^inv ⇝ Aa^inv (fast)                  | EM         | new
                                       | Va ⇝ Vu^inv ⇝ Va^inv (fast)                  | IM         | new
Evict + Time Invalidation              | Vu ⇝ Ad ⇝ Vu^inv (fast)                      | EM         | new
                                       | Vu ⇝ Aa ⇝ Vu^inv (fast)                      | EM         | new
Prime + Probe Invalidation             | Ad ⇝ Vu ⇝ Ad^inv (fast)                      | EM         | new
                                       | Aa ⇝ Vu ⇝ Aa^inv (fast)                      | EM         | new
Bernstein's Invalidation Attack        | Vu ⇝ Va ⇝ Vu^inv (fast)                      | IM         | new
                                       | Vu ⇝ Vd ⇝ Vu^inv (fast)                      | IM         | new
                                       | Vd ⇝ Vu ⇝ Vd^inv (fast)                      | IM         | new
                                       | Va ⇝ Vu ⇝ Va^inv (fast)                      | IM         | new
Evict + Probe Invalidation             | Vd ⇝ Vu ⇝ Ad^inv (fast)                      | EM         | new
                                       | Va ⇝ Vu ⇝ Aa^inv (fast)                      | EM         | new
Prime + Time Invalidation              | Ad ⇝ Vu ⇝ Vd^inv (fast)                      | IM         | new
                                       | Aa ⇝ Vu ⇝ Va^inv (fast)                      | IM         | new
Flush + Time Invalidation              | Vu ⇝ Aa^inv ⇝ Vu^inv (fast)                  | EM         | new
                                       | Vu ⇝ Va^inv ⇝ Vu^inv (fast)                  | IM         | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into High and Low partitions, for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security, and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition, if the data is in the cache.
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High, based on the code; this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions, if the data is already in the cache.

³ Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. They use the percentage of cache misses for L instructions that was reduced (increased) when L's partition size was increased (reduced) by one cache way to adjust the number of ways of the cache assigned to the Low partition. When adjusting the number of ways in the cache dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals the traditional set associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated per cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit if accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to the process ID), where CVB stores a bitmap, and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, they implement security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and the TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, because of the small performance influence given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the checking of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. And cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one is called policy_fillmap, for masking fills and selecting the victim to replace; the other is called policy_hitmap, for masking hit ways. Only if both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim can only be chosen within the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
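The hit and fill masking described above can be sketched as follows; the Line structure and the function signatures are illustrative assumptions, not DAWG's actual RTL:

```python
from dataclasses import dataclass

# Sketch of DAWG-style way masking. The hitmap/fillmap bitmasks mirror the
# policy_hitmap and policy_fillmap registers described in the text.
@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    dom: int = 0  # domain id of the line's owner

def lookup(ways, tag, dom_id, hitmap):
    """A hit requires the tag AND the domain id to match, within hitmap ways."""
    for i, w in enumerate(ways):
        if (hitmap >> i) & 1 and w.valid and w.tag == tag and w.dom == dom_id:
            return i
    return None  # miss: the line may exist, but outside this domain's ways

def victim_way(fillmap, lru_order):
    """Fills may only evict within the domain's own fillmap ways."""
    return next(i for i in lru_order if (fillmap >> i) & 1)
```

Because hits are also masked by domain, the same read-only line can legitimately exist in two different domains' ways without creating a cross-domain channel.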
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R in the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data will have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases that the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In this case, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security critical data is always preloaded into the cache at the beginning of the whole program execution.
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S: if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
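The miss-handling decision just described can be sketched as pseudologic; the Blk metadata layout (pid, prot) and the returned action encoding are illustrative assumptions, not the actual hardware interface:

```python
import random
from collections import namedtuple

# Per-block metadata of RP cache: process ID and protection bit P.
Blk = namedtuple("Blk", "pid prot")

def on_miss(evicted, incoming, num_sets):
    """Decide how RP cache handles a miss that would evict `evicted`."""
    if evicted.pid == incoming.pid and evicted.prot != incoming.prot:
        # same process, different protection bits: evict arbitrary data of a
        # random set S' and access the new data D without caching it
        return ("evict_random_line", random.randrange(num_sets), "no_caching")
    if evicted.pid != incoming.pid:
        # different processes: store D in an evicted block of a random set S'
        # and swap the permuted mappings of S and S'
        return ("store_in_set", random.randrange(num_sets), "swap_mappings")
    return ("normal_replacement", None, None)
```

The randomized choice of S′ is what breaks the attacker's ability to infer which set the victim's access mapped to.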
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate if it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching, if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access will be performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request will be used to achieve the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique is used for this).
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to, or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
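A toy sketch of this keyed index remapping: CEASER uses the two-cycle LLBC keyed permutation, while `blake2b` here is only a stand-in to illustrate that changing the epoch key changes the address-to-set mapping:

```python
import hashlib

NUM_SETS = 1024  # assumed LLC set count for illustration

def cache_set(line_addr, epoch_key):
    """Map a physical line address to a set under the current epoch key."""
    h = hashlib.blake2b(line_addr.to_bytes(8, "little"),
                        key=epoch_key, digest_size=8).digest()
    return int.from_bytes(h, "little") % NUM_SETS
```

After a re-key, an eviction set the attacker carefully constructed under the old key no longer maps to a single set, which is what bounds the useful lifetime of a conflict-based attack.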
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a similar way to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with the user space for security domain identification.
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or between secure and non-secure data. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line and therefore de-correlates any cache
J Hardw Syst Secur (2019) 3:397–425
access time from the address being accessed. However, the performance degradation is tremendous.
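The per-block decay counter described above might be sketched as below. The class name, method names, and the THRESHOLD value are illustrative choices, not parameters from the design.

```python
import random

class NonDeterministicLine:
    """One cache line with a randomized decay counter.

    The line is invalidated once its counter, randomly initialized below
    THRESHOLD, reaches THRESHOLD ticks without the line being touched,
    de-correlating hit/miss timing from the access pattern.
    """

    THRESHOLD = 16

    def __init__(self, rng: random.Random):
        self.valid = True
        self.counter = rng.randrange(self.THRESHOLD)  # random initial value

    def tick(self):
        """One global-counter clock tick with the line untouched."""
        if self.valid:
            self.counter += 1
            if self.counter >= self.THRESHOLD:
                self.valid = False  # randomized invalidation point

    def touch(self, rng: random.Random):
        """An access restarts the decay interval at a new random point."""
        self.valid = True
        self.counter = rng.randrange(self.THRESHOLD)
```

Because the initial counter value is random, two identical access sequences can produce different hit/miss outcomes, which is the source of both the security benefit and the performance cost.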
5 Analysis of the Secure Caches
In this section we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the result of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] will make the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and loses the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ad^inv (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to know that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu Vd^inv (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded, locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
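The first case above (a constant-timing last step) can be illustrated with a toy one-line cache shared by a victim (PID 1) and an attacker (PID 2). With cross-process hits allowed, the attacker's Step-3 reload is fast and leaks the victim's access; with an RP-like policy that refuses cross-process hits, the last step is always slow and carries no information. The single-line geometry and all names are simplifications for illustration only.

```python
class ToyCache:
    """A single-line cache shared by two processes; tracks the owning PID."""

    def __init__(self, cross_process_hits: bool):
        self.line = None  # (pid, addr) currently cached, or None
        self.cross_process_hits = cross_process_hits

    def access(self, pid: int, addr: int) -> str:
        """Return 'fast' on an observable hit, 'slow' otherwise (then fill)."""
        if self.line is not None and self.line[1] == addr:
            owner = self.line[0]
            if owner == pid or self.cross_process_hits:
                return "fast"
        self.line = (pid, addr)
        return "slow"


def flush_reload(cache: ToyCache, shared_addr: int = 0xABC) -> str:
    """Three-step Flush + Reload: flush, victim access, attacker reload."""
    cache.line = None                              # Step 1: flush
    cache.access(pid=1, addr=shared_addr)          # Step 2: victim's secret-dependent access
    return cache.access(pid=2, addr=shared_addr)   # Step 3: attacker reloads and times it
```

On the baseline the attacker observes "fast" and learns the victim touched the shared line; under the RP-like policy the observation is constant, which is exactly what case 1 requires.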
From the security perspective, the entries of the secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As is expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, has a tremendous performance penalty. Similarly, the second-to-last column shows the Nondeterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again will have a tremendous cost to performance when the application is complex.
For each of the entries that shows the effectiveness of a secure cache against a vulnerability, two results are listed. The left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or the management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data and whether this identification process is reliable are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so Vu Va^inv Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu Va Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu Ad Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to be able to only access a limited set of cache blocks (columns for SP* cache to column for PL cache in Tables 4 and 5). For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation or an out-of-order load or store, which is able to be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set initial states (Step 0) of the victim partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
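A minimal sketch of static way-partitioning follows, assuming an 8-way set split evenly between a "victim" and an "attacker" domain (an illustrative policy, not any specific design). Insertions evict only within the owned ways, so the attacker can never set the initial state of the victim's partition.

```python
class WayPartitionedSet:
    """One cache set whose ways are statically split between two domains.

    The 50/50 split and domain names are illustrative assumptions.
    """

    PARTITION = {"victim": range(0, 4), "attacker": range(4, 8)}

    def __init__(self):
        self.ways = [None] * 8  # each entry: (domain, addr) or None

    def insert(self, domain: str, addr: int):
        """Fill an empty owned way, else evict only within the owned ways."""
        owned = list(self.PARTITION[domain])
        for w in owned:
            if self.ways[w] is None:
                self.ways[w] = (domain, addr)
                return
        self.ways[owned[0]] = (domain, addr)  # naive eviction inside the partition

    def present(self, addr: int) -> bool:
        return any(e is not None and e[1] == addr for e in self.ways)
```

However many lines the attacker inserts, the victim's line survives, which is why external miss-based interference fails against such partitioning while the victim's own (internal) interference remains possible.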
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv Vu Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd Vu Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu Aa^inv Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad Vu Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC and comes with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd Vu Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security critical data and the timing observed from cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address to cache set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities, such as Aa^inv Vu Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security critical data to prevent most of the internal and external interference. However, when security critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security critical data, and vulnerabilities such as the Vu Va^inv Vu (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va Vu^inv Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
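The random-fill idea discussed above can be sketched as follows: on a miss, the demanded line is forwarded to the processor but not cached, and a line chosen at random from a window around it is cached instead. The capacity, window size, and FIFO replacement here are illustrative choices, not parameters of the actual design.

```python
import random

class RandomFillCache:
    """Toy fully associative cache with Random Fill-style miss handling.

    Caching a random neighbor instead of the demanded line de-correlates
    the cached address from the accessed address.
    """

    def __init__(self, capacity: int = 8, window: int = 4, seed: int = 0):
        self.lines = []  # FIFO of cached line addresses
        self.capacity = capacity
        self.window = window
        self.rng = random.Random(seed)

    def access(self, addr: int) -> str:
        if addr in self.lines:
            return "hit"
        # Random fill: cache a neighbor of addr, not addr itself.
        fill = addr + self.rng.randint(-self.window, self.window)
        if fill not in self.lines:
            self.lines.append(fill)
            if len(self.lines) > self.capacity:
                self.lines.pop(0)
        return "miss"
```

An attacker observing later hits and misses learns only that some line within the window was touched, not which one, which is the de-correlation the paragraph above describes.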
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in effect, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad Vu Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu Ad Vu (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set, learning only the secret data's cache index, instead of invalidating data at a known address to find the full address of the secret data as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
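The three Prime + Probe steps can be walked through on a toy set-associative cache. The geometry, the LRU policy, and the address arithmetic below are illustrative simplifications.

```python
class SetAssocCache:
    """Toy set-associative cache: num_sets sets, assoc ways, LRU per set."""

    def __init__(self, num_sets: int = 8, assoc: int = 2):
        self.num_sets, self.assoc = num_sets, assoc
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; update LRU order, evict the LRU line on a miss."""
        s = self.sets[addr % self.num_sets]
        if addr in s:
            s.remove(addr)
            s.append(addr)
            return True
        s.append(addr)
        if len(s) > self.assoc:
            s.pop(0)
        return False


def prime_probe(cache: SetAssocCache, victim_addr: int) -> int:
    """Return the set index the attacker infers for the victim's access."""
    # Step 1: prime -- fill every set with attacker-known lines.
    attacker_lines = [s + w * cache.num_sets + 1000 * cache.num_sets
                      for s in range(cache.num_sets)
                      for w in range(cache.assoc)]
    for a in attacker_lines:
        cache.access(a)
    # Step 2: the victim's secret-dependent access evicts one attacker line.
    cache.access(victim_addr)
    # Step 3: probe -- the set where the attacker misses is the victim's set.
    for a in attacker_lines:
        if not cache.access(a):
            return a % cache.num_sets
    return -1  # no eviction observed
```

The slow (miss) observation in Step 3 pinpoints the victim's cache set, which is exactly the index leakage the strategy description above captures.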
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper). In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper). The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper). Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access operations and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
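The operation classes just defined can be sketched with a small classifier; the tuple encoding (actor, address, invalidation flag) and the function names below are illustrative assumptions, not the paper's notation:

```python
# Illustrative encoding of a step as (actor, address, is_invalidation),
# e.g. ("A", "a", False) = attacker accesses known address a,
#      ("V", "u", True)  = victim invalidates the unknown address u.

def op_class(step):
    actor, addr, is_inv = step
    if addr == "u":
        return "u_operation"                 # involves the unknown address u
    return "known_inv_operation" if is_inv else "known_access_operation"

# known_access and known_inv operations together form the not_u operations.
def is_not_u(step):
    return op_class(step) != "u_operation"

print(op_class(("A", "a", False)))   # known_access_operation
print(is_not_u(("V", "u", False)))   # False
```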
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern with a ⋆ in the middle, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
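A sketch of this splitting step, using "*" for ⋆ (the encoding and function name are illustrative assumptions):

```python
# Subdivision Rule 1 sketch: split a long pattern at every interior star,
# making the star Step 1 of the next sub-pattern; a trailing star is dropped.

def subdivide_at_star(pattern):
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]           # star becomes Step 1 of the next pattern
        else:
            current.append(step)
    if current and current != ["*"]:  # a lone trailing star is deleted
        parts.append(current)
    return parts

print(subdivide_at_star(["A_a", "V_u", "*", "V_u", "A_a", "*"]))
# → [['A_a', 'V_u'], ['*', 'V_u', 'A_a']]
```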
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with an Ainv/Vinv in the middle, the longer pattern can be divided into two separate patterns, where the Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., M before N and N before M give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some adjacent step pairs can only be commuted if the second of the two steps is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting Combined Step.

If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access–related operations (known_access operations), they always result in a deterministic state of the cache block for the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
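The three Reduction Rules can be sketched on a pair of adjacent steps; the tuple encoding (address, invalidation flag) is an illustrative assumption:

```python
# Reduction Rules sketch: a step is (address, is_invalidation), with "u" as
# the unknown address. Returns the single replacement step, or None.

def reduce_adjacent(s1, s2):
    addr1, inv1 = s1
    addr2, inv2 = s2
    # Rule 1: two u operations target the same unknown address u,
    # so only the second access needs to be kept.
    if addr1 == "u" and addr2 == "u" and not inv1 and not inv2:
        return s2
    # Rule 2: two known accesses leave a deterministic state for the
    # second access's cache block, so the pair collapses to one step.
    if addr1 != "u" and addr2 != "u" and not inv1 and not inv2:
        return s2
    # Rule 3: a known access and a known invalidation of the same
    # address, in either order, reduce to the second step.
    if addr1 != "u" and addr1 == addr2 and inv1 != inv2:
        return s2
    return None

print(reduce_adjacent(("a", False), ("d", False)))  # ('d', False)
print(reduce_adjacent(("a", False), ("u", False)))  # None
```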
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., M followed by N, and Union(M, N), give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can apply Union Rule 1: the two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously, to union all adjacent invalidation steps that invalidate known, different memory addresses.
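A sketch of Union Rule 1, reusing the illustrative (address, invalidation flag) encoding (the joint-step representation is our own assumption):

```python
# Union Rule 1 sketch: two invalidations of different known addresses are
# combined into one joint step Union(M, N).

def union_invalidations(s1, s2):
    (addr1, inv1), (addr2, inv2) = s1, s2
    if inv1 and inv2 and "u" not in (addr1, addr2) and addr1 != addr2:
        return ("union_inv", tuple(sorted((addr1, addr2))))
    return None

print(union_invalidations(("a", True), ("d", True)))  # ('union_inv', ('a', 'd'))
print(union_invalidations(("a", True), ("a", True)))  # None (same address)
```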
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules are applied first, to put known_access and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

The recursion at each application of these three categories of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation, not_u operation, u operation
– not_u operation, u operation, not_u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or Vu^inv, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ad^inv/Vd^inv ⇝ u operation, where Vu ⇝ Ad^inv/Vd^inv can further apply Commute Rule 2, to reduce and be within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, which requires the Rest Checking.

The only two adjacent steps to which none of the three categories of rules can be applied are the following:

– {Aa, Va, Aaalias, Vaalias, Ad, Vd, Aa^inv, Va^inv, Aaalias^inv, Vaalias^inv} ⇝ Vu
– {Aa, Va, Aaalias, Vaalias} ⇝ Vu^inv
– Vu ⇝ {Aa, Va, Aaalias, Vaalias, Ad, Vd, Aa^inv, Va^inv, Aaalias^inv, Vaalias^inv}
– Vu^inv ⇝ {Aa, Va, Aaalias, Vaalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern;
pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern, in order; pattern[1], pattern[2], ... are empty initially.
Output: reduced effective vulnerability pattern(s) array vul_set. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_set = Ø
2: while contain(pattern, ⋆) and the ⋆ index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, Ainv) and the Ainv index is not 0) or (contain(pattern, Vinv) and the Vinv index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or pattern has_interval_effective_three_steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if (pattern is_ineffective or pattern has_interval_effective_three_steps) then
13:    vul_set += effective_three_steps(pattern)
14:  end if
15: end while
16: return vul_set
two steps can either generate two adjacent step patterns that can be processed by the three rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. A key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
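The shape of the reduction loop can be sketched as below; the helper names and the pairwise-scan strategy are our own simplifying assumptions (the full algorithm also interleaves the Subdivision, Commute, and Union Rules):

```python
# Minimal driver sketch: repeatedly scan adjacent pairs and shrink the
# pattern until no rule applies or at most three steps remain.

def reduce_pattern(pattern, reduce_adjacent):
    changed = True
    while changed and len(pattern) > 3:
        changed = False
        for i in range(len(pattern) - 1):
            merged = reduce_adjacent(pattern[i], pattern[i + 1])
            if merged is not None:           # a rule fired: replace the pair
                pattern = pattern[:i] + [merged] + pattern[i + 2:]
                changed = True
                break
    return pattern

# e.g. with Reduction Rule 2 only (two known accesses keep the second):
rule2 = lambda s, t: t if s != "u" and t != "u" else None
print(reduce_pattern(["a", "d", "a", "u", "a"], rule2))  # → ['a', 'u', 'a']
```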
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fig. 3 Examples of relations between victim's behavior (u) and attacker's observation for each vulnerability type: a, b Strong Vulnerability; c, d Weak Vulnerability; e, f Ineffective Three-Step. (Each panel plots the victim's behavior, u = a, aalias, or NIB, against the attacker's observation, fast or slow, for one example pattern: Vd ⇝ Vu ⇝ Aa; Vu ⇝ Ad ⇝ Vu^inv; Vu ⇝ Aa^inv; Aaalias^inv ⇝ Vu^inv ⇝ Va; Vd ⇝ Vu^inv ⇝ Vd; Aa ⇝ Vu ⇝ Ad.)
to ⋆ ⇝ Aa ⇝ Vu, which is equivalent to Aa ⇝ Vu. Therefore, Ad ⇝ Aa ⇝ Vu is a repeat type of Aa ⇝ Vu, and can be eliminated.

2. Three-step patterns with a step involving a known address a and an alias to that address, aalias, give the same information. Thus, three-step combinations which only differ in the use of a or aalias cannot represent different attacks, and only one combination needs to be considered. For example, Vu ⇝ Aaalias ⇝ Vu is a repeat type of Vu ⇝ Aa ⇝ Vu, and we eliminate the first pattern.
3. Three-step patterns with steps Vu and Vu^inv in adjacent consecutive steps will only keep the latter step and eliminate the former step. For example, Aa ⇝ Vu ⇝ Vu^inv can be reduced to Aa ⇝ Vu^inv, which is further equivalent to ⋆ ⇝ Aa ⇝ Vu^inv. So Aa ⇝ Vu ⇝ Vu^inv can be eliminated.
3.3.3 Categorization of Strong Vulnerabilities
As shown in Fig. 2, after applying the reduction rules, there are 72 types of Strong Vulnerabilities remaining. In Appendix B, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible cache timing–based side-channel vulnerabilities, and that if there is a vulnerability, it can always be reduced to a model that requires only three steps. Table 2 lists all the vulnerability types of which the last step is a memory access, and Table 3 shows all the vulnerability types of which the last step is an invalidation-related operation. To ease the understanding of all the vulnerability types, we group the vulnerabilities based on attack strategies (left-most column in Tables 2 and 3); these strategies correspond to well-known names for the attacks if such exist, otherwise we provide a new name. In Appendix A, we provide a description of each attack strategy to show the main idea behind it. We use existing names for attack strategies where such existed before, even if similar attacks, e.g., attacks differing in only one step, have been given different names before. We use these established names to avoid confusion, but detail some of the similarities in Appendix A as a clarification.
The list of vulnerability types can be further collected into four simple macro types, which cover one or more vulnerability types: internal interference miss-based (IM), internal interference hit-based (IH), external interference miss-based (EM), and external interference hit-based (EH), as labeled in the Macro Type column of Tables 2 and 3. All the types of vulnerabilities that involve only the victim's behavior V in the states in Step 2 and Step 3 are called internal interference vulnerabilities (I). The remaining ones are called external interference (E). Some vulnerabilities allow the attacker to learn that the address the victim accesses maps to the set the attacker is attacking, by observing slow timing due to a cache miss, or fast timing due to invalidation of data not in the cache.2 We call these miss-based vulnerabilities (M). The remaining ones leverage the observation of fast timing due to a cache hit, or slow timing due to an invalidation of an address that is currently valid in the cache, and are called hit-based vulnerabilities (H).
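This categorization is mechanical, and can be sketched as a small classifier; the actor letters and observation labels below are an illustrative encoding, not the paper's:

```python
# Macro-type sketch: derive I/E from the actors of Steps 2-3 and M/H from
# the final timing observation.

def macro_type(step2_actor, step3_actor, observation):
    # internal (I) iff only the victim V acts in Steps 2 and 3
    interference = "I" if (step2_actor, step3_actor) == ("V", "V") else "E"
    # miss-based (M): slow due to a miss, or fast invalidation of absent data
    timing = "M" if observation in ("miss_slow", "inv_fast") else "H"
    return interference + timing

print(macro_type("V", "V", "hit_fast"))   # IH (e.g. Cache Internal Collision)
print(macro_type("V", "A", "miss_slow"))  # EM (e.g. Prime + Probe)
```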
Many vulnerability types have been explored before. For example, the Cache Collision attack [5] is effectively based on the Internal Collision strategy, and it maps to the types labeled (2) in the Attack column in Tables 2 and 3. The types labeled new correspond to new attacks not previously discussed in literature. We believe these 43 are new attacks, not previously analyzed nor known.
4 Secure Caches
Having explained the three-step model, we now explore the various secure caches which have been presented in literature to date [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. Later, in Section 5, we will apply the three-step model to check whether the secure caches can defend against some or all of the vulnerabilities in our model.
2Invalidation is fast when the corresponding address which is to be invalidated does not exist in the cache, since no operation is needed for the invalidation.
J Hardw Syst Secur (2019) 3:397-425
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability; for Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows whether a type of vulnerability has been previously presented in literature
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Ainv Vu Va (fast) IH (2)
V inv Vu Va (fast) IH (2)
Ad Vu Va (fast) IH (2)
Vd Vu Va (fast) IH (2)
Aaalias Vu Va (fast) IH (2)
Vaalias Vu Va (fast) IH (2)
Ainva Vu Va (fast) IH (2)
V inva Vu Va (fast) IH (2)
Flush + Reload Ainva Vu Aa (fast) EH (5)
V inva Vu Aa (fast) EH (5)
Ainv Vu Aa (fast) EH (5)
V inv Vu Aa (fast) EH (5)
Ad Vu Aa (fast) EH (5)
Vd Vu Aa (fast) EH (5)
Aaalias Vu Aa (fast) EH (5)
Vaalias Vu Aa (fast) EH (5)
Reload + Time V invu Aa Vu (fast) EH new
V invu Va Vu (fast) IH new
Flush + Probe Aa V invu Aa (slow) EM (6)
Aa V invu Va (slow) IM new
Va V invu Aa (slow) EM new
Va V invu Va (slow) IM new
Evict + Time Vu Ad Vu (slow) EM (1)
Vu Aa Vu (slow) EM (1)
Prime + Probe Ad Vu Ad (slow) EM (4)
Aa Vu Aa (slow) EM (4)
Bernsteinrsquos Attack Vu Va Vu (slow) IM (3)
Vu Vd Vu (slow) IM (3)
Vd Vu Vd (slow) IM (3)
Va Vu Va (slow) IM (3)
Evict + Probe Vd Vu Ad (slow) EM new
Va Vu Aa (slow) EM new
Prime + Time Ad Vu Vd (slow) IM new
Aa Vu Va (slow) IM new
Flush + Time Vu Ainva Vu (slow) EM new
Vu V inva Vu (slow) IM new
(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attack [46]
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time
Attack Strategy Vulnerability Type Macro Type Attack
Step 1 Step 2 Step 3
Cache Internal Collision Invalidation Ainv Vu V inva (slow) IH new
V inv Vu V inva (slow) IH new
Ad Vu V inva (slow) IH new
Vd Vu V inva (slow) IH new
Aaalias Vu V inva (slow) IH new
Vaalias Vu V inva (slow) IH new
Flush + Flush Ainva Vu V inva (slow) IH (1)
V inva Vu V inva (slow) IH (1)
Ainva Vu Ainva (slow) EH (1)
V inva Vu Ainva (slow) EH (1)
Flush + Reload Invalidation Ainv Vu Ainva (slow) EH new
V inv Vu Ainva (slow) EH new
Ad Vu Ainva (slow) EH new
Vd Vu Ainva (slow) EH new
Aaalias Vu Ainva (slow) EH new
Vaalias Vu Ainva (slow) EH new
Reload + Time Invalidation V invu Aa V invu (slow) EH new
V invu Va V invu (slow) IH new
Flush + Probe Invalidation Aa V invu Ainva (fast) EM new
Aa V invu V inva (fast) IM new
Va V invu Ainva (fast) EM new
Va V invu V inva (fast) IM new
Evict + Time Invalidation Vu Ad V invu (fast) EM new
Vu Aa V invu (fast) EM new
Prime + Probe Invalidation Ad Vu Ainvd (fast) EM new
Aa Vu Ainva (fast) EM new
Bernsteinrsquos Invalidation Attack Vu Va V invu (fast) IM new
Vu Vd V invu (fast) IM new
Vd Vu V invd (fast) IM new
Va Vu V inva (fast) IM new
Evict + Probe Invalidation Vd Vu Ainvd (fast) EM new
Va Vu Ainva (fast) EM new
Prime + Time Invalidation Ad Vu V invd (fast) IM new
Aa Vu V inva (fast) IM new
Flush + Time Invalidation Vu Ainva V invu (fast) EM new
Vu V inva V invu (fast) IM new
(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]; some have been realized in FPGA, e.g., [7]; and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, the L2 cache, and the Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the descriptions of each secure cache design below, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
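To make the partitioning rule concrete, the following minimal sketch models one 4-way SP*-style cache set; the class name, the fixed way split, and the simplified replacement choice are our assumptions for illustration, not details from the cited designs.

```python
# A minimal sketch of SP*-style static way partitioning: ways 0-1 form the
# High (victim) partition, ways 2-3 the Low (attacker) partition. Hits may
# occur anywhere; fills (and thus evictions) stay inside the owner's partition.

HIGH_WAYS, LOW_WAYS = (0, 1), (2, 3)

class SPStarSet:
    def __init__(self):
        self.ways = [None] * 4          # each entry: (owner, tag) or None

    def access(self, owner, tag):
        if (owner, tag) in self.ways:   # hits are allowed in either partition
            return 'hit'
        # Misses may only fill, and therefore evict from, the owner's partition.
        allowed = HIGH_WAYS if owner == 'victim' else LOW_WAYS
        empty = [w for w in allowed if self.ways[w] is None]
        way = empty[0] if empty else allowed[0]
        self.ways[way] = (owner, tag)
        return 'miss'
```

The attacker can thus never evict a victim line, closing eviction-based Step 1 interference, while cross-partition hits remain possible, which is the source of the hit-based vulnerabilities that the analysis in Section 5 flags for this design.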
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code; this timing label can act similarly to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partition if the data is already in the cache.

3Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20]
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
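The shrink/grow asymmetry described above can be sketched as follows; the list-based modeling and function name are ours, not from [48].

```python
# A hedged sketch of SecDCP's way-adjustment policy: shrinking L flushes L's
# blocks in the reallocated way, while growing L leaves stale H blocks in
# place (the noted source of potential timing leakage). Each entry of
# way_lines is the security label of one cache line: 'L', 'H', or None.

def secdcp_adjust_way(way_lines, grow_low):
    if grow_low:
        # Way moves from H to L: H blocks stay and are lazily evicted later.
        return list(way_lines)
    # Way moves from L to H: all L blocks are flushed before reallocation.
    return [None if label == 'L' else label for label in way_lines]
```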
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first; if there is no such data, data belonging to the same process will be evicted; and if there is no existing data in the cache that is in the same process, a random line in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using cache coherence protocol operations is still possible.
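The three-level eviction priority can be sketched as below; the data-structure modeling (CVB as a set of core IDs, dict-based lines) is our assumption for illustration.

```python
import random

# Hedged sketch of SHARP-style eviction candidate selection. Each line's
# 'cvb' is the set of core IDs whose private caches hold the line;
# live_cores is the set of cores running current processes.

def sharp_pick_victim(lines, requesting_core, live_cores):
    """Return (way_index, alarm) for the line to evict."""
    # 1. Prefer a line not present in any current core's private cache.
    for i, line in enumerate(lines):
        if not (line['cvb'] & live_cores):
            return i, False
    # 2. Otherwise a line belonging only to the requesting core itself.
    for i, line in enumerate(lines):
        if line['cvb'] == {requesting_core}:
            return i, False
    # 3. Else evict at random and raise an alarm (interrupt to the OS).
    return random.randrange(len(lines)), True
```

Only the third case, which would let one process evict another's line, is randomized and reported, which is what prevents an attacker from reliably evicting the victim's data.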
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, the TLB, and the LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and the TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1 The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2 The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and the TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has a small performance impact given how rarely such speculation occurs. This cache listed extra software assumptions as follows:

Assumption 1 The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory are never mapped to the same cache blocks for the LLC.
Assumption 2 The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3 When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of its prior control flow instructions resolve, an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache states. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received possible invalidations from the OS before the check of the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB in order to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically Cache Allocation Technology (CAT) [21], available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition the cache into secure and non-secure parts (the non-secure parts may map to 3 CoS, while the secure part maps to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1 Security critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2 Security critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3 The victim and the attacker process cannot share the same memory space.
Assumption 4 A software pseudo-locking mechanism is used to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5 Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy fillmap, for masking fills and selecting the victim way to replace; another, called policy hitmap, for masking hit ways. Only when both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy fillmap. Consistently, the replacement policy is updated with the victim way selection, and the metadata derived from the policy fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher level cache, and eviction of R from the LLC will cause the same data in the higher level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher level cache possible (e.g., the attacker is able to evict the victim's security critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; therefore, RIC does not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
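The relaxed-inclusion check on an LLC eviction can be sketched as follows; the dict/set modeling is our own, purely for illustration.

```python
# Illustrative sketch of RIC's relaxed-inclusion check: on an LLC eviction,
# the usual inclusive back-invalidation of the L1 copy is skipped when the
# line's RIC bit is set.

def llc_evict(line, l1_tags):
    """line: dict with 'tag' and 'ric'; l1_tags: set of tags cached in L1."""
    if not line['ric']:
        l1_tags.discard(line['tag'])    # normal inclusive back-invalidation
    # With the RIC bit set, the L1 copy survives, so an attacker cannot
    # evict the victim's L1 line by creating LLC conflicts.
```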
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1 Security critical data is always preloaded into the cache at the beginning of the whole program execution.
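The miss-handling rule above can be sketched as below; the dict-based modeling and function name are hypothetical, not from [49].

```python
# Hypothetical sketch of PL-cache victim selection on a miss: locked lines
# can only be evicted by locked data of the same process; if no line may be
# evicted, the access is served without caching.

def pl_pick_victim(cache_set, new_pid, new_locked):
    """cache_set: list of dicts {'pid', 'locked'}. Returns a way index,
    or None to signal an uncached load/store."""
    for i, line in enumerate(cache_set):
        if not line['locked']:
            return i                 # unlocked lines follow normal policy
        if new_locked and line['pid'] == new_pid:
            return i                 # a process may evict its own locked line
    return None
```

Combined with the preloading assumption, this is what blocks Step 1 of Prime-style vulnerabilities: the attacker's data can never displace the victim's locked lines.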
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
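The cross-process miss case can be sketched as follows; the list-based permutation table is our own simplification of the PT described above.

```python
import random

# A loose sketch of RP-cache behavior on a cross-process miss: the incoming
# data D is placed in a randomly chosen set S', and the permutation-table
# entries for S and S' are swapped, so the attacker's learned set mapping
# is disturbed by the very access that would have leaked information.

def rp_cross_process_miss(perm_table, s):
    s_prime = random.randrange(len(perm_table))
    perm_table[s], perm_table[s_prime] = perm_table[s_prime], perm_table[s]
    return s_prime      # the set whose block is evicted to receive D
```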
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the security critical data accesses of the victim, a Nofill request is executed, and the requested data access will be performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request will be used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence the prevention of timing-based side-channel attacks (the random filling technique ensures this).
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address will first be encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
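The effect of encrypted indexing and re-keying can be illustrated with the sketch below; note the real design uses a Low-Latency BlockCipher, while the keyed hash here is purely a stand-in chosen by us for illustration (it would be far too slow in hardware).

```python
import hashlib

# Sketch of CEASER-style encrypted set indexing: the set index is a keyed
# function of the address, and changing the key (a new epoch) remaps every
# address, so eviction sets the attacker has learned become stale.

def ceaser_set_index(addr, key, epoch, num_sets):
    digest = hashlib.sha256(f'{key}:{epoch}:{addr}'.encode()).digest()
    return int.from_bytes(digest[:4], 'big') % num_sets
```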
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
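The per-way keyed index derivation can be sketched as follows; the hash choice and parameter names are our assumptions, not the paper's construction.

```python
import hashlib

# Illustrative sketch of SCATTER-style indexing: the set index depends on the
# security domain's key, the address, and the way number, so the same address
# occupies a different row in each way and maps differently in each domain.

def scatter_index(addr, way, domain_key, num_sets):
    h = hashlib.sha256(f'{domain_key}:{addr}:{way}'.encode()).digest()
    return int.from_bytes(h[:4], 'big') % num_sets
```

Because eviction candidates for an address are spread over many rows, building a minimal eviction set becomes far harder than in a set-associative cache with a fixed index function.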
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching between different process IDs, or based on whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness, and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets the local counters' initial values to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay intervals controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore any cache access time is de-correlated from the address being accessed. However, the performance degradation is tremendous.
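The decay-counter mechanism can be sketched as below; the constants and structure are illustrative only, not taken from [23].

```python
import random

# Rough sketch of the per-line decay counters described above: each line is
# invalidated after a randomized number of untouched global clock ticks.

MAX_TICKS = 16

class DecayLine:
    def __init__(self):
        # A randomized initial value de-correlates when invalidation fires.
        self.counter = random.randrange(MAX_TICKS)
        self.valid = True

    def tick(self):                     # one global counter clock tick
        if self.valid:
            self.counter += 1
            if self.counter >= MAX_TICKS:
                self.valid = False      # line is invalidated

    def touch(self):                    # an access marks the data as active
        self.counter = 0
```

Since whether a later access hits or misses now depends on a random counter rather than only on the access pattern, the attacker's timing observations lose their correlation with the victim's addresses, at the cost of many spurious misses.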
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table; for example, SP* cache can defend against the Vu Ad Vu (slow) (one type of Evict + Time [34]) vulnerability. For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color; for example, SecDCP cache cannot defend against the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) (one type of Flush + Reload [55]) vulnerability will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits between each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer preserve the original corresponding relation between the victim's behavior and the attacker's observation. For instance, the Ad Vu Ainvd (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, will allow the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security critical data in the cache, vulnerabilities such as Ad Vu V invd (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded and locked security critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be placed in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities where the last step is a memory access related operation. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (The table body with the per-cache results is not recoverable from the extracted text.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and the TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, the TLB, and the last-level cache, which make up the whole cache hierarchy, where the L1 cache and the TLB require software flush protection and the last-level cache can be protected by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared caches
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can make the probability of the vulnerability succeeding extremely small
l The technique works in shared read-only memory but not in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success to a limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not preserved in this extraction; the table footnotes follow.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of the attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to be able to access only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, either static or dynamic partitioning of caches allocates some blocks to the High victim and some to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of speculation or an out-of-order load or store, which is able to be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with the partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
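As an illustration of how partitioning denies the attacker the initial-state step, the sketch below models one set of a statically way-partitioned cache (the domain names, sizes, and FIFO policy are hypothetical, not any specific design): fills from the attacker's domain can only evict attacker lines, so no amount of attacker activity displaces the victim's line.

```python
class PartitionedSet:
    """One cache set whose ways are statically split between domains."""
    def __init__(self, ways_per_domain=2):
        self.ways = {'victim': [None] * ways_per_domain,
                     'attacker': [None] * ways_per_domain}

    def access(self, domain, tag):
        part = self.ways[domain]     # a domain can only touch its own ways
        if tag in part:
            return 'hit'
        part.pop(0)                  # FIFO eviction inside the partition only
        part.append(tag)
        return 'miss'

s = PartitionedSet()
s.access('victim', 'secret')
for t in range(8):                   # attacker floods the set (attempted Step 1)
    s.access('attacker', t)
```

Because evictions never cross the partition boundary, the victim's line is still cached after the flood, and the attacker has learned nothing about the victim's use of the set.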
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be allowed to evict it. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities, but it only works for the LLC and comes with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d → V_u → A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
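The address-to-set randomization idea can be sketched as a keyed index function (the hash function and epoch keys below are placeholders, not the actual cipher of any of the designs): whether two addresses are congruent depends on a secret key, and re-keying changes the whole mapping, invalidating any eviction sets the attacker has learned.

```python
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed pseudorandom mapping from a line address to a cache set."""
    digest = hashlib.blake2b(addr.to_bytes(8, 'little'), key=key).digest()
    return int.from_bytes(digest[:2], 'little') % NUM_SETS

# Re-keying (dynamic remapping, in CEASER's terms) moves lines to new
# sets, so congruence relations learned under the old key become useless.
old = [set_index(a, b'epoch-0') for a in range(16)]
new = [set_index(a, b'epoch-1') for a in range(16)]
```

The mapping is deterministic within an epoch (so the cache still functions), but without the key the attacker cannot compute which addresses collide.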
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the V_u → V_a^inv → V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
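A quick numeric sketch of the Non Deterministic Cache idea (all constants below are illustrative, not the design's actual parameters): once a random delay wider than the hit/miss gap is added to every access, a single timing sample no longer separates hits from misses, which is why the defense is so broad and so expensive.

```python
import random

HIT, MISS, NOISE = 1, 100, 500      # cycles; NOISE wider than MISS - HIT

def observed_latency(is_hit, rng):
    base = HIT if is_hit else MISS
    return base + rng.randrange(NOISE)   # uniformly random added delay

rng = random.Random(1234)                # fixed seed for reproducibility
hits = [observed_latency(True, rng) for _ in range(1000)]
misses = [observed_latency(False, rng) for _ in range(1000)]
# The two latency distributions now overlap heavily, so no single
# measurement tells the attacker whether the access hit or missed.
```

Averaging many samples can still recover a statistical difference, which is why the delay distribution (and the number of observations the attacker can gather) matters in practice.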
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. [Table body not preserved in this extraction.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be carefully used; e.g., Random Fill cache is vulnerable to V_u → A_d → V_u (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state for the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.

It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there was an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
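The three steps of Flush + Reload can be re-enacted with a toy cache model (a Python set stands in for the cache, and the latency constants are illustrative, not measured values):

```python
HIT, MISS = 1, 100                     # illustrative cycle counts

def timed_access(cache, addr):
    """Access addr, returning a latency; the access also caches the line."""
    latency = HIT if addr in cache else MISS
    cache.add(addr)
    return latency

def flush(cache, addr):
    cache.discard(addr)

def attacker_guess(secret_bit):
    cache, shared = set(), 'shared_line'
    flush(cache, shared)                        # Step 1: invalidate the line
    if secret_bit:                              # Step 2: victim touches the
        timed_access(cache, shared)             #   line only if secret says so
    return timed_access(cache, shared) == HIT   # Step 3: time the reload
```

A fast reload in Step 3 tells the attacker the victim touched the shared line, recovering the secret bit; this is exactly the hit-based external interference that page-sharing restrictions (e.g., in Sanctum and CATalyst) are designed to break.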
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, Flush + Time has the attacker invalidate some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in this Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
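A toy version of Prime + Probe over a single 2-way set (the associativity, LRU policy, and tag names are illustrative):

```python
WAYS = 2

def lru_access(set_lines, tag):
    """Access one set with LRU replacement; return True on a hit."""
    hit = tag in set_lines
    if hit:
        set_lines.remove(tag)          # move to most-recently-used position
    elif len(set_lines) >= WAYS:
        set_lines.pop(0)               # evict the least recently used line
    set_lines.append(tag)
    return hit

def prime_probe(victim_touches_set):
    lines = []
    for t in ('A0', 'A1'):             # Step 1: attacker primes the set
        lru_access(lines, t)
    if victim_touches_set:             # Step 2: secret-dependent access
        lru_access(lines, 'V')         #   evicts one attacker line
    # Step 3: probe; any miss reveals the victim used this set
    return not all(lru_access(lines, t) for t in ('A0', 'A1'))
```

Unlike Flush + Reload, no shared memory is needed; the attacker only needs addresses congruent to the victim's set, which is precisely what randomized set-index mappings take away.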
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory-access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three
steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access and known inv operations
together make up the not-u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
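The exhaustive-enumeration idea behind the simulator can be sketched as follows (the operation alphabet below is a simplified stand-in for the paper's full candidate list, so the resulting count is illustrative and does not reproduce the 72/64 classification, which requires the cache-state simulation itself):

```python
from itertools import product

OPS = ['*',                                          # unknown/any cache state
       'V_u', 'V_a', 'V_d', 'A_a', 'A_d',            # memory accesses
       'V_u_inv', 'V_a_inv', 'A_a_inv', 'A_d_inv']   # invalidations

# Form every ordered triple of candidate step operations; a '*' in
# Step 3 gives the attacker nothing to time, so such triples can be
# skipped before simulation.
candidates = [p for p in product(OPS, repeat=3) if p[2] != '*']
num_candidates = len(candidates)                     # 10 * 10 * 9 = 900
```

Each surviving triple would then be fed to the cache simulator, which checks whether the attacker's final timing observation unambiguously reveals information about u.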
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A^inv/V^inv, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with A^inv/V^inv in the middle or the last step (A^inv/V^inv in the last step will be deleted).
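Subdivision Rule 1 can be sketched as a small function over patterns represented as lists of step labels (the label strings are illustrative; Rule 2 would treat whole-cache A^inv/V^inv labels the same way):

```python
def subdivide(pattern):
    """Split a long pattern at every interior '*', making each '*'
    Step 1 of the next sub-pattern; a lone trailing '*' is deleted.
    pattern: list of step labels, e.g. ['A_a', '*', 'V_u', 'A_a']."""
    out, current = [], []
    for step in pattern:
        if step == '*' and current:      # '*' in the middle: cut here and
            out.append(current)          # start the next sub-pattern at '*'
            current = ['*']
        else:
            current.append(step)
    if current and current != ['*']:     # drop a sub-pattern that is only '*'
        out.append(current)
    return out
```

Applying the rule recursively (which the loop above effects in one pass) leaves only sub-patterns whose interior steps are real operations, ready for the simplification rules of the next subsection.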
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. Table 7 shows all the possible conditions under which the rules apply for each pair of adjacent steps, regardless of whether each step is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., M → N and N → M yield the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in the pattern. This gives more chances to reduce and union steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, the two steps can be commuted regardless of their position within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted when the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step of the sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
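As a sketch, deciding whether a pair commutes reduces to a Table 7 lookup plus the last-step check of Commute Rule 2. The entries below are a few illustrative examples in our own encoding, not the full table:

```python
# (first, second) -> (Commute Rule 1 applies, Commute Rule 2 applies).
# Illustrative entries only; the authoritative mapping is Table 7.
COMMUTE_TABLE = {
    ('Aa', 'Ad_inv'): (True, True),    # access and inv of different addresses
    ('Aa', 'Aa_inv'): (False, False),  # same address: cannot commute
    ('Vu', 'Ad_inv'): (False, True),   # commutable only if not the last step
}

def can_commute(first, second, second_is_last_step):
    rule1, rule2 = COMMUTE_TABLE.get((first, second), (False, False))
    if rule1:
        return True                               # Rule 1: commute anywhere
    return rule2 and not second_is_last_step      # Rule 2: not as final step
```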
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps that are both related to known addresses or both related to an unknown address (including repeating states), the two adjacent steps can be reduced to a single step following the reduction rules below.
Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second), regardless of whether each step is the attacker's access (A) or the victim's access (V), the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply ("yes"/"no"), and gives the reduced or unioned result in the Combined Step column where a rule applies.
Table 7 marks these reducible pairs with "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column.
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block touched by the second operation, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps where one step is a known access operation and the other is a known inv operation, if the addresses they refer to are the same, the two steps can be reduced to one step, which is the second step, regardless of their order.
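The three reduction rules can be sketched over a pair of adjacent steps; the (party, address, is_invalidation) tuple encoding is our own assumption, not the paper's implementation:

```python
def try_reduce(first, second):
    """Return the single step that replaces the pair, or None if no
    reduction rule applies. Steps are (party, addr, inv) tuples, with
    addr == 'u' for the unknown (secret-dependent) address."""
    _, a1, inv1 = first
    _, a2, inv2 = second
    # Rule 1: two accesses to the same unknown u -> keep only the second
    if a1 == 'u' and a2 == 'u' and not inv1 and not inv2:
        return second
    # Rule 2: two known access operations -> the second access leaves the
    # touched cache block in a deterministic state, so one step suffices
    if a1 != 'u' and a2 != 'u' and not inv1 and not inv2:
        return second
    # Rule 3: known access and known inv of the same address, either
    # order -> reduce to the second step
    if a1 == a2 != 'u' and inv1 != inv2:
        return second
    return None
```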
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., M → N and Union(M, N) yield the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined: both known inv operations invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
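Union Rule 1 can be sketched in the same tuple encoding; representing the joint step as a tagged tuple is our own choice:

```python
def try_union(first, second):
    """Combine two invalidations of known, different addresses into one
    joint step Union(first, second); return None otherwise. Steps are
    (party, addr, inv) tuples, with addr == 'u' for the unknown address."""
    _, a1, inv1 = first
    _, a2, inv2 = second
    if inv1 and inv2 and 'u' not in (a1, a2) and a1 != a2:
        return ('Union', first, second)   # joint step invalidating both
    return None
```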
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns:

– u operation → non-u operation → u operation
– non-u operation → u operation → non-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, because the access puts an actual state into the cache block.
– If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If it is followed by V_u, the worst case will be A^inv/V^inv → V_u → non-u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further be commuted using Commute Rule 2 and reduced to be within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, requiring the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias → V_u
– A_a/V_a/A_aalias/V_aalias → V^inv_u
– V_u → A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u → A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: results: an array of reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while contain(pattern, ⋆) and its index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if has_interval_effective_three_steps(pattern) then
13:    results += effective_three_steps(pattern)
14:  end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
In Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
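The overall flow can be sketched end-to-end. This simplified, self-contained version only splits at ⋆ and whole-cache flushes and collapses adjacent identical steps (a special case of the Reduction Rules); it leaves out the Commute and Union Rules and the effectiveness check against Tables 2 and 3:

```python
def reduce_long_pattern(pattern):
    """Simplified Algorithm 2 sketch: subdivide, then shrink each
    sub-pattern by dropping adjacent repeated steps. The full algorithm
    also commutes, unions, and checks effectiveness of the result."""
    # Subdivision: '*' and whole-cache flushes start a new sub-pattern
    subs, current = [], []
    for step in pattern:
        if step in ('*', 'Ainv', 'Vinv') and current:
            subs.append(current)
            current = [step]
        else:
            current.append(step)
    if current:
        subs.append(current)
    # Reduction (special case): adjacent identical steps collapse to one
    reduced = []
    for sub in subs:
        out = []
        for step in sub:
            if not out or out[-1] != step:
                out.append(step)
        reduced.append(out)
    return reduced
```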
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 2 The table shows all the cache timing-based vulnerabilities where the last step is a memory access–related operation. The Attack Strategy column gives a common name for each set of one or more specific vulnerabilities that would be exploited in an attack in a similar manner. The Vulnerability Type column gives the three steps that define each vulnerability. For Step 3, fast indicates a cache hit must be observed to derive sensitive address information, while slow indicates a cache miss must be observed. The Macro Type column proposes the categorization the vulnerability belongs to: "E" is for external interference vulnerabilities, "I" is for internal interference vulnerabilities, "M" is for miss-based vulnerabilities, "H" is for hit-based vulnerabilities. The Attack column shows if a type of vulnerability has been previously presented in literature.

Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision | A^inv | V_u | V_a (fast) | IH | (2)
 | V^inv | V_u | V_a (fast) | IH | (2)
 | A_d | V_u | V_a (fast) | IH | (2)
 | V_d | V_u | V_a (fast) | IH | (2)
 | A_aalias | V_u | V_a (fast) | IH | (2)
 | V_aalias | V_u | V_a (fast) | IH | (2)
 | A^inv_a | V_u | V_a (fast) | IH | (2)
 | V^inv_a | V_u | V_a (fast) | IH | (2)
Flush + Reload | A^inv_a | V_u | A_a (fast) | EH | (5)
 | V^inv_a | V_u | A_a (fast) | EH | (5)
 | A^inv | V_u | A_a (fast) | EH | (5)
 | V^inv | V_u | A_a (fast) | EH | (5)
 | A_d | V_u | A_a (fast) | EH | (5)
 | V_d | V_u | A_a (fast) | EH | (5)
 | A_aalias | V_u | A_a (fast) | EH | (5)
 | V_aalias | V_u | A_a (fast) | EH | (5)
Reload + Time | V^inv_u | A_a | V_u (fast) | EH | new
 | V^inv_u | V_a | V_u (fast) | IH | new
Flush + Probe | A_a | V^inv_u | A_a (slow) | EM | (6)
 | A_a | V^inv_u | V_a (slow) | IM | new
 | V_a | V^inv_u | A_a (slow) | EM | new
 | V_a | V^inv_u | V_a (slow) | IM | new
Evict + Time | V_u | A_d | V_u (slow) | EM | (1)
 | V_u | A_a | V_u (slow) | EM | (1)
Prime + Probe | A_d | V_u | A_d (slow) | EM | (4)
 | A_a | V_u | A_a (slow) | EM | (4)
Bernstein's Attack | V_u | V_a | V_u (slow) | IM | (3)
 | V_u | V_d | V_u (slow) | IM | (3)
 | V_d | V_u | V_d (slow) | IM | (3)
 | V_a | V_u | V_a (slow) | IM | (3)
Evict + Probe | V_d | V_u | A_d (slow) | EM | new
 | V_a | V_u | A_a (slow) | EM | new
Prime + Time | A_d | V_u | V_d (slow) | IM | new
 | A_a | V_u | V_a (slow) | IM | new
Flush + Time | V_u | A^inv_a | V_u (slow) | EM | new
 | V_u | V^inv_a | V_u (slow) | IM | new

(1) Evict + Time attack [34]; (2) Cache Internal Collision attack [5]; (3) Bernstein's attack [3]; (4) Prime + Probe attack [34, 36], Alias-driven attack [18]; (5) Flush + Reload attack [54, 55], Evict + Reload attack [17]; (6) SpectrePrime, MeltdownPrime attacks [46]
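Programmatically, Table 2 is a lookup from a three-step pattern to its classification; the dictionary below encodes a small subset of the table's rows (the string encoding of the steps is our own):

```python
# Subset of Table 2: (Step 1, Step 2, Step 3) -> (macro type, attack strategy)
TABLE_2 = {
    ('Ainv', 'Vu', 'Aa-fast'): ('EH', 'Flush + Reload'),
    ('Ad',   'Vu', 'Ad-slow'): ('EM', 'Prime + Probe'),
    ('Vu',   'Ad', 'Vu-slow'): ('EM', 'Evict + Time'),
    ('Vu',   'Va', 'Vu-slow'): ('IM', "Bernstein's Attack"),
}

def classify(steps):
    """Return (macro type, strategy) for an effective pattern, else None."""
    return TABLE_2.get(tuple(steps))
```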
Table 3 The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, fast indicates no corresponding address of the data is invalidated, while slow indicates the invalidation operation makes some data invalid, causing longer processing time.

Attack Strategy | Step 1 | Step 2 | Step 3 | Macro Type | Attack
Cache Internal Collision Invalidation | A^inv | V_u | V^inv_a (slow) | IH | new
 | V^inv | V_u | V^inv_a (slow) | IH | new
 | A_d | V_u | V^inv_a (slow) | IH | new
 | V_d | V_u | V^inv_a (slow) | IH | new
 | A_aalias | V_u | V^inv_a (slow) | IH | new
 | V_aalias | V_u | V^inv_a (slow) | IH | new
Flush + Flush | A^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | V^inv_a | V_u | V^inv_a (slow) | IH | (1)
 | A^inv_a | V_u | A^inv_a (slow) | EH | (1)
 | V^inv_a | V_u | A^inv_a (slow) | EH | (1)
Flush + Reload Invalidation | A^inv | V_u | A^inv_a (slow) | EH | new
 | V^inv | V_u | A^inv_a (slow) | EH | new
 | A_d | V_u | A^inv_a (slow) | EH | new
 | V_d | V_u | A^inv_a (slow) | EH | new
 | A_aalias | V_u | A^inv_a (slow) | EH | new
 | V_aalias | V_u | A^inv_a (slow) | EH | new
Reload + Time Invalidation | V^inv_u | A_a | V^inv_u (slow) | EH | new
 | V^inv_u | V_a | V^inv_u (slow) | IH | new
Flush + Probe Invalidation | A_a | V^inv_u | A^inv_a (fast) | EM | new
 | A_a | V^inv_u | V^inv_a (fast) | IM | new
 | V_a | V^inv_u | A^inv_a (fast) | EM | new
 | V_a | V^inv_u | V^inv_a (fast) | IM | new
Evict + Time Invalidation | V_u | A_d | V^inv_u (fast) | EM | new
 | V_u | A_a | V^inv_u (fast) | EM | new
Prime + Probe Invalidation | A_d | V_u | A^inv_d (fast) | EM | new
 | A_a | V_u | A^inv_a (fast) | EM | new
Bernstein's Invalidation Attack | V_u | V_a | V^inv_u (fast) | IM | new
 | V_u | V_d | V^inv_u (fast) | IM | new
 | V_d | V_u | V^inv_d (fast) | IM | new
 | V_a | V_u | V^inv_a (fast) | IM | new
Evict + Probe Invalidation | V_d | V_u | A^inv_d (fast) | EM | new
 | V_a | V_u | A^inv_a (fast) | EM | new
Prime + Time Invalidation | A_d | V_u | V^inv_d (fast) | IM | new
 | A_a | V_u | V^inv_a (fast) | IM | new
Flush + Time Invalidation | V_u | A^inv_a | V^inv_u (fast) | EM | new
 | V_u | V^inv_a | V^inv_u (fast) | IM | new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in the academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flush or cache coherence, we assume the victim or the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What is more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the descriptions of each secure cache design below, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]3 uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
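A minimal sketch of the SP* fill/hit policy, assuming an 8-way cache split evenly into two 4-way partitions (the concrete way assignment is our assumption):

```python
HIGH_WAYS = {0, 1, 2, 3}  # victim (High) partition (assumed split)
LOW_WAYS = {4, 5, 6, 7}   # attacker (Low) partition

def sp_star_may_fill(way, security):
    """On a miss, a process may only fill (modify) ways of its own
    partition; cache hits, in contrast, are allowed in either partition."""
    return way in (HIGH_WAYS if security == 'high' else LOW_WAYS)
```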
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by
3Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines secure cache features of the Secret-Protecting cache from [27] with secure cache features of the Static-Partitioned cache from [20].
that instruction is Low or High based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
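The SecVerilog hit/miss behavior can be sketched with the cache modeled as a map from address to the partition currently holding it (our simplification of the design):

```python
def secverilog_access(cache, addr, label):
    """Access with timing label 'L' or 'H'. A Low access to data held in
    the High partition is timed as a miss and moves the block H -> L, so
    High activity never changes Low-visible timing; High accesses may hit
    in either partition."""
    part = cache.get(addr)
    if part is None:
        cache[addr] = label
        return 'miss'
    if label == 'L' and part == 'H':
        cache[addr] = 'L'   # move data to the Low partition for consistency
        return 'miss'       # observed as a miss by the Low instruction
    return 'hit'
```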
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
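The asymmetric flush-on-reallocation behavior can be sketched as follows, modeling each cache set as a dict from way index to an owner-tagged block (our encoding):

```python
def secdcp_reallocate_way(cache_sets, way, new_owner):
    """Reallocate one way: L -> H requires flushing L's blocks first;
    H -> L leaves H blocks in place to be lazily evicted by L (the
    behavior the paper notes may leak timing information)."""
    for cset in cache_sets:
        block = cset.get(way)
        if block is not None and new_owner == 'H' and block['owner'] == 'L':
            del cset[way]   # flush Low data before High takes the way
    return cache_sets
```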
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The value of Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid eviction caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache-hit policy holds to guarantee cache coherence.
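NoMo's rule that a thread may evict only outside other threads' reservations can be sketched as follows (the data layout is our own illustration):

```python
def nomo_eviction_candidates(n_ways, reservations, tid):
    """Sketch of NoMo way selection: thread tid may evict only from its
    own reserved ways plus any unreserved ways; ways reserved by other
    threads are off limits. 'reservations' maps tid -> set of way
    indices reserved for that thread in this cache set."""
    reserved_by_others = set()
    for t, ways in reservations.items():
        if t != tid:
            reserved_by_others |= ways
    return [w for w in range(n_ways) if w not in reserved_by_others]
```

With a full NoMo-N/M reservation the candidate list degenerates to the thread's own ways, which is the fully partitioned configuration assumed in the text.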
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted
J Hardw Syst Secur (2019) 3:397–425
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap, and the i-th bit in the bitmap is set if the line is present in the i-th core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache that belongs to the same process, a random line in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
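SHARP's staged victim selection can be sketched as follows; the flat dict layout, function name, and the core-centric view of "current processes" are our simplifications, not the paper's RTL:

```python
import random

def sharp_select_victim(cache_set, req_core, active_cores):
    """Sketch of SHARP's three-stage eviction for an inclusive LLC.
    Each line is a dict with 'cvb': the set of core IDs whose private
    caches currently hold the line (the core valid bits).
    Returns (way, alarm): alarm=True models the OS interrupt."""
    # Stage 1: prefer a line not cached in any active core's private cache.
    for way, line in enumerate(cache_set):
        if not (line['cvb'] & active_cores):
            return way, False
    # Stage 2: prefer a line held only by the requesting core itself.
    for way, line in enumerate(cache_set):
        if line['cvb'] == {req_core}:
            return way, False
    # Stage 3: evict at random and signal the OS (suspicious activity).
    return random.randrange(len(cache_set)), True
```

The alarm in stage 3 is what lets the OS notice an attacker who keeps forcing cross-process evictions.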
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be performed within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, the behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has only a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before the visibility point, which shows up when all of a load's prior control-flow instructions resolve, unsafe speculative loads (USLs) are placed into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before checking the memory consistency model, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB in order to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one called policy_fillmap, for masking fills and selecting the victim to replace, and another called policy_hitmap, for masking hit ways. Only when both the tag and the domain ID match will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim can only be chosen within the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
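The hitmap/fillmap masking just described can be sketched as follows; the tuple line layout and the trivial replacement choice are our simplifications of DAWG's per-domain policies:

```python
def dawg_hit_way(cache_set, domain, tag, policy_hitmap):
    """A hit requires BOTH tag and domain ID to match, and the way must
    be visible in the domain's policy_hitmap. Lines are (tag, domain)
    tuples or None."""
    for way in policy_hitmap[domain]:
        line = cache_set[way]
        if line is not None and line == (tag, domain):
            return way
    return None

def dawg_fill_way(cache_set, domain, policy_fillmap):
    """On a miss, the replacement victim may only be chosen among the
    ways in the domain's policy_fillmap (real DAWG runs its replacement
    policy within this mask; here: first empty way, else first masked)."""
    for way in policy_fillmap[domain]:
        if cache_set[way] is None:
            return way
    return policy_fillmap[domain][0]
```

Because a hit also checks the domain ID, the same read-only tag can legitimately live in two ways, one replica per domain, as the text notes.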
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache design to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
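The relaxed-inclusion rule can be sketched in a few lines; the dict/set data layout is our own illustration of the single added bit:

```python
def ric_llc_evict(llc, l1, addr):
    """Sketch of RIC eviction: evicting an LLC line back-invalidates the
    L1 copy only when the line's relaxed-inclusion bit is clear.
    'llc' maps addr -> {'relaxed': bool}; 'l1' is a set of addresses."""
    line = llc.pop(addr)
    if not line['relaxed']:
        l1.discard(addr)   # strict-inclusion behavior
    # Relaxed lines (read-only or thread-private) keep their L1 copy,
    # so LLC conflicts can no longer evict the victim's L1 working set.
```

This is why an attacker priming the LLC cannot reach the victim's L1-resident critical lines once their relaxed bits are set.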
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache performs the normal cache-hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data will be loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
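The miss-handling rule above (locked lines are immune to unlocked data and to other processes' locked data) can be sketched as follows, with an illustrative flat data layout:

```python
def pl_select_victim(cache_set, new_pid, new_locked):
    """Sketch of PL cache miss handling: a line may be victimized only
    if it is unlocked, or if it is locked and the incoming access is a
    locked access from the same process. Returning None means the
    access is served without caching."""
    for way, line in enumerate(cache_set):
        if not line['locked']:
            return way                      # unlocked lines are fair game
        if new_locked and line['pid'] == new_pid:
            return way                      # same-process locked data
    return None                             # serve uncached
```

A set full of another process's locked lines therefore always returns None, which is exactly why the attacker's primes in Step 0 cannot displace preloaded, locked victim data.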
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens for data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
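The three miss cases can be sketched as follows; the function signature, the action labels, and the way S′ is chosen are our simplifications of RP cache's logic:

```python
import random

def rp_on_miss(pt, s, victim_pid, victim_prot, new_pid, new_prot):
    """Sketch of RP cache miss handling for data D mapping to logical
    set s. 'pt' is the process's permutation table (logical set ->
    physical set); victim_* describe the line the baseline replacement
    policy would evict. Returns (action, set_index, note)."""
    if victim_pid == new_pid and victim_prot != new_prot:
        # Internal interference: evict a random line of a random set S'
        # and serve D without caching it.
        s_prime = random.randrange(len(pt))
        return ('evict_random_line_of', s_prime, 'D_not_cached')
    if victim_pid != new_pid:
        # External interference: place D into an evicted block of a
        # random S' and swap the S <-> S' mappings to re-randomize.
        s_prime = random.randrange(len(pt))
        pt[s], pt[s_prime] = pt[s_prime], pt[s]
        return ('store_in', s_prime, 'mappings_swapped')
    return ('normal_replacement', pt[s], None)
```

The swap in the second case is what breaks the attacker's learned set-to-address mapping after every cross-process conflict.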
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
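The two-level lookup through the RMT can be sketched as follows; the dict-based layout and field names are illustrative, not the hardware structures:

```python
def newcache_lookup(rmt, lines, pid, addr, index_bits=4):
    """Sketch of Newcache lookup: the address index selects an RMT entry
    per (pid, index), as in a direct-mapped cache; the entry points into
    the fully associative line array, where the stored address and pid
    must ALSO match for a hit. Returns the line number on a hit."""
    index = addr & ((1 << index_bits) - 1)
    line_no = rmt.get((pid, index))
    if line_no is not None:
        line = lines[line_no]
        if line is not None and line['addr'] == addr and line['pid'] == pid:
            return line_no      # hit
    return None                 # miss
```

Because the RMT entry, not the address, names the physical line, re-pointing an RMT entry is all it takes to remap a line anywhere in the array.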
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, following the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such a flush will not influence timing-based side-channel attack prevention (the random filling technique ensures this).
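The three request types can be sketched as follows; the set-of-addresses cache model and the exact neighborhood window are our simplifications of the paper's design:

```python
import random

def random_fill_access(cache, addr, request, window=8):
    """Sketch of Random Fill cache request handling. 'cache' is a set of
    cached line addresses; 'window' bounds the neighborhood for random
    fills. Returns True on a hit."""
    if addr in cache:
        return True                     # hits behave as in a normal cache
    if request == 'normal':
        cache.add(addr)                 # ordinary demand fill
    elif request == 'random_fill':
        # The demand line itself is NOT cached; an arbitrary nearby line
        # is brought in instead, de-correlating fills from accesses.
        cache.add(addr + random.randint(-window, window))
    # 'nofill': serve the data with no cache-state change at all.
    return False
```

An attacker observing which line got filled thus learns only a random neighbor, not the secret-dependent address itself.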
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
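The keyed set-index computation can be illustrated with a toy stand-in for the LLBC; this Feistel construction and its constants are our own, not the actual cipher, and only show why re-keying remaps every line:

```python
def ceaser_set_index(line_addr, key, n_sets, rounds=4):
    """Toy keyed 32-bit Feistel network over the line address, standing
    in for CEASER's LLBC. The mapping is deterministic per key, so
    drawing a new key (periodic re-keying) remaps all lines at once."""
    left, right = (line_addr >> 16) & 0xFFFF, line_addr & 0xFFFF
    for r in range(rounds):
        f = ((right * 0x9E37) ^ key ^ r) & 0xFFFF   # toy round function
        left, right = right, left ^ f
    return ((left << 16) | right) % n_sets
```

An attacker's carefully built eviction set is only valid until the next re-key, which bounds the useful lifetime of any learned conflicts.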
SCATTER Cache [51] uses cache-set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security-domain identification.
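The per-way, per-domain index derivation can be sketched as follows; SHA-256 stands in for whatever keyed hash or permutation the hardware would use:

```python
import hashlib

def scatter_way_indices(line_addr, domain_key, n_sets, n_ways):
    """Sketch of SCATTER cache index derivation: a keyed hash yields a
    DIFFERENT set index for every way (skewed-associative style), keyed
    per security domain so mappings differ across domains."""
    indices = []
    for way in range(n_ways):
        digest = hashlib.sha256(
            f"{domain_key}|{way}|{line_addr}".encode()).digest()
        indices.append(int.from_bytes(digest[:8], 'big') % n_sets)
    return indices
```

Since an address occupies one set per way and the combination is key-dependent, two domains almost never share a full conflict set, which is what frustrates eviction-set construction.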
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure or not. A per-cache-block counter records the interval of the data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
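The per-line decay counter can be sketched as follows; the class layout and method names are illustrative:

```python
import random

class DecayLine:
    """Sketch of Non Deterministic Cache's per-line counter: initialized
    to a random value below the global maximum, ticked while the line is
    untouched, and the line self-invalidates at the threshold."""
    def __init__(self, max_ticks):
        self.max_ticks = max_ticks
        self.valid = True
        self.counter = random.randrange(max_ticks)  # random head start
    def touch(self):                 # an access restarts a random decay
        self.counter = random.randrange(self.max_ticks)
    def tick(self):                  # one global counter clock tick
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_ticks:
                self.valid = False   # invalidation at a randomized time
```

Because each line dies after a different, randomly offset idle interval, the hit/miss pattern no longer reliably encodes which addresses were touched.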
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48-53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu ⇝ Ad ⇝ Vu (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the Vu ⇝ Va ⇝ Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd ⇝ Vu ⇝ Aa (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer carry the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad ⇝ Vu ⇝ Ad^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to know that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad ⇝ Vu ⇝ Vd^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution, and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets shared caches.
h: The technique only targets inclusive last-level caches.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the probability of attack in some limited degree.
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced in this extraction.]

a: Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b: Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c: Flush is disabled, but cache coherence might be used to do the data removal.
d: For the L1 cache and TLB, flushing is done during context switch.
e: The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f: The technique is now only implemented in the last-level cache.
g: The technique now only targets shared caches.
h: The technique only targets inclusive last-level caches.
i: The technique targets the data cache hierarchy.
j: For the last-level cache, the cache is partitioned between the victim and the attacker.
k: The technique can control the probability of the vulnerability being successful to be extremely small.
l: The technique can work in shared read-only memory while not working in shared writable memory.
m: Random delay, but not random mapping, can only decrease the probability of attack in some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label the range of the victim's data that they consider sensitive. The victim process or the management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and thereby prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so the Vu ⇝ Va^inv ⇝ Vu (slow) vulnerability (one type of Flush + Time) is prevented. Newcache is able to prevent Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to only being able to access a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size of each part is adjusted. It also does not help with internal interference prevention.

In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition through flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd ⇝ Vu ⇝ Va (fast) vulnerability (one type of Cache Internal Collision [5]) is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the Vu ⇝ Va ⇝ Vu (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd ⇝ Vu ⇝ Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu ⇝ Aa^inv ⇝ Vu (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities. However, it only works for the LLC, comes with high software implementation complexity, and makes some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling of speculation when interacting with the outside world. Therefore, in normal execution, it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (the columns for SHARP cache and the columns from RP cache to Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from a cache hit or miss, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data accessed during speculative execution or an out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is also recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
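As a sketch of the address-to-set randomization idea described above, the following hypothetical toy model passes the conventional set index through a random permutation; the constants and names are ours for illustration, not any specific design's mechanism:

```python
import random

SETS = 64  # hypothetical number of cache sets in this toy model

def make_mapping(seed):
    # Build a random permutation of set indices, standing in for the keyed
    # index remapping used by randomization-based caches (e.g., remapped
    # periodically or per process).
    rng = random.Random(seed)
    perm = list(range(SETS))
    rng.shuffle(perm)
    return perm

def set_index(addr, perm):
    # Conventional cache: index = addr mod SETS.
    # Randomized cache: pass the conventional index through the permutation,
    # so the attacker cannot tell which set a given address maps to.
    return perm[addr % SETS]

# After a remap (a new permutation), an eviction set the attacker built
# under the old mapping no longer targets the victim's set.
epoch1 = make_mapping(seed=1)
epoch2 = make_mapping(seed=2)
```

In this toy model, mutual information between address and observed set is removed only to the extent that the permutation is unknown and changed often enough, which mirrors the trade-off discussed above.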
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are not able to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, which allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with its random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access from timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks.
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the real sensitive data from the other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail at this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and it leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
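The three steps of Flush + Reload can be sketched with a toy cache model; this is a hypothetical illustration (the "cache" is a set of cached addresses, and membership stands in for fast/slow timing):

```python
# Toy model of Flush + Reload: hit = fast, miss = slow.
def flush_reload(victim_secret_addr, attacker_addr):
    cache = set()
    cache.discard(attacker_addr)     # Step 1: flush/invalidate a known address
    cache.add(victim_secret_addr)    # Step 2: victim accesses secret data u
    hit = attacker_addr in cache     # Step 3: attacker reloads and times it
    return 'fast' if hit else 'slow'

# A fast reload reveals that the secret address equals the probed address.
```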
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates the secret data. In Step 3, reloading Step 1's data and observing a cache miss helps the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, in Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to learn only the secret data's cache index as in Evict + Time.
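A toy sketch of the Evict + Time steps, tracked at cache-set granularity; this is a hypothetical model (the set count and addresses are illustrative, not from the paper):

```python
SETS = 8  # hypothetical tiny cache, tracked at set granularity

def evict_time(secret_addr, evicted_set):
    cached_sets = {secret_addr % SETS}         # Step 1: victim's secret is cached
    cached_sets.discard(evicted_set)           # Step 2: attacker evicts one set
    hit = (secret_addr % SETS) in cached_sets  # Step 3: victim reloads the secret
    return 'fast' if hit else 'slow'           # slow => secret maps to evicted_set
```

By timing Step 3 for each candidate set, the attacker recovers the secret data's cache index.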
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
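The Prime + Probe steps can likewise be sketched with a toy set-granularity model; one line per set is assumed for simplicity, and the encoding is ours for illustration:

```python
SETS = 8  # hypothetical number of cache sets, one line per set

def prime_probe(secret_addr):
    owner = {s: 'attacker' for s in range(SETS)}  # Step 1: prime every set
    owner[secret_addr % SETS] = 'victim'          # Step 2: secret access evicts
    # Step 3: probe each set; a slow (missing) probe marks the victim's set.
    return [s for s in range(SETS) if owner[s] != 'attacker']
```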
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data via known data accesses. If a cache miss is observed in Step 3, that will tell the attacker that the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and patterns with β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ⋯ → ⋆ → ⋯, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ⋯ → A^inv/V^inv → ⋯, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → V_u → A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is deleted).
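Subdivision Rule 1 can be sketched as a small splitting routine; this is a minimal illustration in Python, where steps are plain strings and '*' stands for the ⋆ step (the encoding is ours, for illustration only):

```python
STAR = '*'  # stands for the ⋆ (no-information) step

def subdivision_rule_1(pattern):
    # Split a long pattern at every '*', making each '*' the Step 1 of a
    # new sub-pattern; a trailing '*' is deleted, a leading '*' is kept.
    parts, cur = [], []
    for step in pattern:
        if step == STAR:
            if cur:
                parts.append(cur)
            cur = [STAR]
        else:
            cur.append(step)
    if cur and cur != [STAR]:
        parts.append(cur)
    return parts
```

For example, the four-step pattern A_d → ⋆ → V_u → A_a splits into A_d and ⋆ → V_u → A_a, matching the rule's recursive behavior.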
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ⋯ → M → N → ⋯. If commuting M and N leads to the same observation result, i.e., ⋯ → M → N → ⋯ and ⋯ → N → M → ⋯ give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:

– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.

– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column for these patterns.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step.
Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: For two adjacent known memory access-related operations (known_access operations), they always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps where one step is a known_access operation and the other is a known_inv operation, no matter their order, if the address they refer to is the same, the two can be reduced to one step, namely the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ⋯ → M → N → ⋯. If combining M and N leads to the same timing observation result, i.e., ⋯ → M → N → ⋯ and ⋯ → Union(M, N) → ⋯ give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following case:

– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied; two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: first the Commute Rules, to put known_access operations and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

At each application of these three categories of rules, the recursion should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation → not_u operation → u operation

– not_u operation → u operation → not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.

– If followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A_d^inv/V_d^inv → u operation, where V_u → A_d^inv/V_d^inv can further have Commute Rule 2 applied, to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv} → V_u

– {A_a, V_a, A_aalias, V_aalias} → V_u^inv

– V_u → {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A_a^inv, V_a^inv, A_aalias^inv, V_aalias^inv}

– V_u^inv → {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; P: a two-dimensional dynamic-size array; P[0] contains the states of each step of the original pattern in order; P[1], P[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array V; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: V = Ø
2: while contain(P, ⋆) and its index is not 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (contain(P, A^inv) and its index is not 0) or (contain(P, V^inv) and its index is not 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (is_ineffective(P) or has_interval_effective_three_steps(P)) do
9:   P = Commute_Rules(P)
10:  P = Reduction_Rules(P)
11:  P = Union_Rules(P)
12:  if is_ineffective(P) or has_interval_effective_three_steps(P) then
13:    V += effective_patterns(P)
14:  end if
15: end while
16: return V
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β gt 3)pattern to a three-step pattern thus demonstrating that thecorresponding β gt 3 step pattern actually is equivalent tothe output three-step pattern and represents a vulnerabilitythat is captured by an existing three-step pattern or (ii)demonstrate that the β-step pattern can be mapped to one ormore three-step vulnerabilities It is not possible for a β-stepvulnerability pattern to not be either (i) or (ii) after doing theRule applications Key outcome of our analysis is that anyβ-step pattern is not a vulnerability or if it is a vulnerabilityit maps to either outputs (i) or (ii) of the algorithm
Inside Algorithm 2, contain() is a function that checks if a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; has_interval_effective_three_steps() is a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
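The overall reduce-then-check flow can be illustrated with a small Python sketch. This is a toy model, not the paper's actual Rules: the step names, the duplicate-collapsing reduction, and the two-entry vulnerability table are all illustrative assumptions.

```python
# Toy model of the beta-step reduction idea (illustrative only):
# collapse steps that cannot change the observed cache state, then
# check every remaining consecutive triple against a (toy) table of
# effective three-step vulnerabilities.

EFFECTIVE_THREE_STEPS = {
    ("A_d", "V_u", "A_a_inv"),    # e.g., a Flush + Reload Invalidation triple
    ("A_inv", "V_u", "V_a_inv"),  # e.g., a Cache Internal Collision triple
}

def reduce_pattern(pattern):
    """Collapse immediately repeated steps (a stand-in for the paper's
    reduction Rules) until the pattern stops shrinking."""
    reduced = list(pattern)
    changed = True
    while changed:
        changed = False
        for i in range(len(reduced) - 1):
            if reduced[i] == reduced[i + 1]:  # adjacent duplicate step
                del reduced[i]
                changed = True
                break
    return reduced

def find_vulnerabilities(pattern):
    """Return every effective three-step vulnerability embedded in the
    reduced pattern (sliding window over consecutive steps)."""
    reduced = reduce_pattern(pattern)
    hits = []
    for i in range(len(reduced) - 2):
        triple = tuple(reduced[i:i + 3])
        if triple in EFFECTIVE_THREE_STEPS:
            hits.append(triple)
    return hits
```

A β-step pattern that reduces to a known triple is reported; one that reduces to no known triple is ineffective, mirroring the algorithm's empty-list output.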
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security. Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems. Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security. Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture. ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security. ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy. ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP). IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP). IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture. ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP). IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33. IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th. IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40. ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference. Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41. ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security. Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990. Proceedings. IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium. USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35. ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41). IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture. ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43. ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference. ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 3  The table shows the second part of the timing-based cache side-channel vulnerabilities, where the last step is an invalidation-related operation. For Step 3, "fast" indicates no corresponding address of the data is invalidated, while "slow" indicates the invalidation operation makes some data invalid, causing a longer processing time

Attack Strategy                          Step 1      Step 2     Step 3            Macro Type   Attack

Cache Internal Collision Invalidation    A^inv       V_u        V_a^inv (slow)    IH           new
                                         V^inv       V_u        V_a^inv (slow)    IH           new
                                         A_d         V_u        V_a^inv (slow)    IH           new
                                         V_d         V_u        V_a^inv (slow)    IH           new
                                         A_a^alias   V_u        V_a^inv (slow)    IH           new
                                         V_a^alias   V_u        V_a^inv (slow)    IH           new
Flush + Flush                            A_a^inv     V_u        V_a^inv (slow)    IH           (1)
                                         V_a^inv     V_u        V_a^inv (slow)    IH           (1)
                                         A_a^inv     V_u        A_a^inv (slow)    EH           (1)
                                         V_a^inv     V_u        A_a^inv (slow)    EH           (1)
Flush + Reload Invalidation              A^inv       V_u        A_a^inv (slow)    EH           new
                                         V^inv       V_u        A_a^inv (slow)    EH           new
                                         A_d         V_u        A_a^inv (slow)    EH           new
                                         V_d         V_u        A_a^inv (slow)    EH           new
                                         A_a^alias   V_u        A_a^inv (slow)    EH           new
                                         V_a^alias   V_u        A_a^inv (slow)    EH           new
Reload + Time Invalidation               V_u^inv     A_a        V_u^inv (slow)    EH           new
                                         V_u^inv     V_a        V_u^inv (slow)    IH           new
Flush + Probe Invalidation               A_a         V_u^inv    A_a^inv (fast)    EM           new
                                         A_a         V_u^inv    V_a^inv (fast)    IM           new
                                         V_a         V_u^inv    A_a^inv (fast)    EM           new
                                         V_a         V_u^inv    V_a^inv (fast)    IM           new
Evict + Time Invalidation                V_u         A_d        V_u^inv (fast)    EM           new
                                         V_u         A_a        V_u^inv (fast)    EM           new
Prime + Probe Invalidation               A_d         V_u        A_d^inv (fast)    EM           new
                                         A_a         V_u        A_a^inv (fast)    EM           new
Bernstein's Invalidation Attack          V_u         V_a        V_u^inv (fast)    IM           new
                                         V_u         V_d        V_u^inv (fast)    IM           new
                                         V_d         V_u        V_d^inv (fast)    IM           new
                                         V_a         V_u        V_a^inv (fast)    IM           new
Evict + Probe Invalidation               V_d         V_u        A_d^inv (fast)    EM           new
                                         V_a         V_u        A_a^inv (fast)    EM           new
Prime + Time Invalidation                A_d         V_u        V_d^inv (fast)    IM           new
                                         A_a         V_u        V_a^inv (fast)    IM           new
Flush + Time Invalidation                V_u         A_a^inv    V_u^inv (fast)    EM           new
                                         V_u         V_a^inv    V_u^inv (fast)    IM           new

(1) Flush + Flush attack [16]
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature over the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors to the best of our knowledge; however, CATalyst [29] leverages Intel's Cache Allocation Technology (CAT), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. Moreover, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
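The SP* policy of cross-partition hits but partition-restricted fills can be sketched as follows (an illustrative Python model of a single cache set; the class name and the simple fill policy are our assumptions, not part of the SP* design):

```python
# Sketch of SP*-style static way partitioning (illustrative):
# hits may occur in any way, but a miss may only fill/evict a way
# inside the requesting process's own partition (High = victim,
# Low = attacker).
class SPStarSet:
    def __init__(self, n_ways, high_ways):
        # ways [0, high_ways) belong to High; the rest belong to Low
        self.tags = [None] * n_ways
        self.high_ways = high_ways
        self.n_ways = n_ways

    def access(self, tag, is_high):
        if tag in self.tags:        # a hit is allowed in either partition
            return "hit"
        lo, hi = (0, self.high_ways) if is_high else (self.high_ways, self.n_ways)
        victim = lo                 # simple fill policy: first way of own partition
        for w in range(lo, hi):
            if self.tags[w] is None:
                victim = w
                break
        self.tags[victim] = tag     # fill only within the own partition
        return "miss"
```

In this model, attacker (Low) fills can never displace a victim (High) line, matching the partitioning rule described above.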
SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code of a program using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code; this timing label can play a role similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to a Low instruction when the data is in the High partition, the access will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and Low partitions if the data is already in the cache.

³ Two existing papers give slightly different definitions for an "SP" cache; thus we define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].
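The SecVerilog partition rule can be captured in a minimal sketch that models the two partitions as sets of tags (illustrative; the real design is a hardware cache, and the function name is ours):

```python
# Sketch of the SecVerilog partition rule (illustrative): a Low miss
# that finds the line in the High partition still behaves (and is timed)
# as a miss, and the line moves High -> Low to preserve consistency;
# High accesses may hit in either partition.
def secverilog_access(low_part, high_part, tag, label):
    if label == 'H':
        if tag in low_part or tag in high_part:
            return "hit"            # High may hit in either partition
        high_part.add(tag)
        return "miss"
    # label == 'L'
    if tag in low_part:
        return "hit"
    if tag in high_part:
        high_part.discard(tag)      # move the line High -> Low
    low_part.add(tag)
    return "miss"                   # timed as a miss either way
```

The key point the sketch illustrates is that Low timing never depends on High state: from Low's perspective, a line cached only in High is indistinguishable from an uncached line.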
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. The percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way is used to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified, so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High-partition data during way adjustment may leak timing information to the attacker.
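The asymmetric way-reallocation rule above can be sketched as follows (an illustrative Python model; the data encoding and function name are our assumptions):

```python
# Sketch of SecDCP-style dynamic way adjustment (illustrative): when a
# way is taken away from L (given to H), remaining L blocks are flushed
# first; when a way is given to L, H blocks are left in place (for
# performance) and are evicted over time by L's accesses.
def reallocate_way(way_blocks, new_owner):
    """way_blocks: list of (security_class, tag) entries or None for one
    way across all sets; new_owner: 'H' or 'L'."""
    if new_owner == 'H':
        # L -> H: flush the remaining L blocks to avoid leaking L state
        return [None if (b is not None and b[0] == 'L') else b
                for b in way_blocks]
    # H -> L: keep H blocks; L will eventually evict them
    return list(way_blocks)
```

The sketch also makes the weakness visible: after an H-to-L reallocation, the surviving H blocks are observable to L until they age out, which is exactly the timing leak noted above.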
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread exclusively reserves Y blocks in each cache set, where Y is within the range [0, N/M], N is the number of ways, and M is the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in the cache sets to protect against timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap whose i-th bit is set if the line is present in the i-th core's private cache. Cache hits are allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache set that is in the same process, a random block in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
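SHARP's three-tier victim selection can be sketched as follows (an illustrative Python model; the CVB encoding as an integer bitmap and the function name are our assumptions):

```python
import random

# Sketch of SHARP's three-tier victim selection (illustrative): prefer a
# block cached in no core's private cache, then one belonging to the
# requesting core itself; otherwise pick a random block and raise an
# alarm (modeling the interrupt to the OS).
def sharp_select_victim(cvbs, requester, rng=random.Random(0)):
    """cvbs: one core-valid bitmap per block in the set; requester: core id.
    Returns (way index, alarm_raised)."""
    for i, cvb in enumerate(cvbs):
        if cvb == 0:                       # tier 1: in no private cache
            return i, False
    for i, cvb in enumerate(cvbs):
        if cvb & (1 << requester):         # tier 2: requester's own block
            return i, False
    # tier 3: random eviction, flagged as suspicious to the OS
    return rng.randrange(len(cvbs)), True
```

The alarm flag in tier 3 mirrors the interrupt that lets the OS detect repeated cross-process eviction attempts.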
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave or non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states are flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world; this has a small performance impact because such speculation is rare. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before the visibility point shows up, i.e., before all of its prior control-flow instructions resolve, an unsafe speculative load (USL) is put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received a possible invalidation from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache remains vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after the flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain ID (similar to the process ID used in general caches). Each domain ID has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only when both the tag and the domain ID are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen among the ways belonging to the same domain ID, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
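The hitmap/fillmap masking can be sketched as follows (an illustrative Python model of one cache set; the class, bitmap encoding, and fill policy are our assumptions, not the DAWG hardware):

```python
# Sketch of DAWG-style way masking (illustrative): a domain's
# policy_hitmap masks which ways may report a hit; its policy_fillmap
# masks which ways may be filled/evicted. A tag match outside the
# domain's hitmap is treated as a miss, isolating the domains.
class DawgSet:
    def __init__(self, n_ways):
        self.tags = [None] * n_ways

    def lookup(self, tag, hitmap):
        return any((hitmap >> w) & 1 and self.tags[w] == tag
                   for w in range(len(self.tags)))

    def fill(self, tag, fillmap):
        allowed = [w for w in range(len(self.tags)) if (fillmap >> w) & 1]
        victim = allowed[0]          # simple fill policy within the domain
        for w in allowed:
            if self.tags[w] is None:
                victim = w
                break
        self.tags[victim] = tag
```

Because a hit requires the way to be inside the domain's hitmap, the same read-only line can safely be replicated in different ways for different domains, as described above.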
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing-based side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed-inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable, non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, or on thread migration events, to avoid timing leakage during the transition time.
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other. In these cases, the new data will be loaded or stored without caching. In all other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
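The PL-cache miss-handling rule can be sketched as follows (an illustrative Python helper; the block encoding and function name are our assumptions):

```python
# Sketch of PL-cache victim selection on a miss (illustrative): locked
# blocks are never evicted by unlocked data or by another process's
# locked data; if no legal victim exists, the access is handled without
# caching (returned as None).
def pl_select_victim(blocks, pid, new_locked):
    """blocks: list of dicts with 'pid' and 'locked'; returns a way index,
    or None meaning the new data is used without being cached."""
    for w, b in enumerate(blocks):
        if not b['locked']:
            return w                     # unlocked data may always be evicted
        if new_locked and b['pid'] == pid:
            return w                     # locked data may replace own locked data
    return None                          # no legal victim: bypass the cache
```

Returning None models the "loaded or stored without caching" case, which is what prevents an attacker from evicting the victim's locked lines.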
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S' will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S', and the mappings of S and S' will be swapped as well. Otherwise, the normal replacement policy is executed.
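The three-way miss-handling decision can be sketched as follows (an illustrative Python helper; the action names and encoding are our assumptions, not the RP cache hardware interface):

```python
import random

# Sketch of RP-cache miss handling (illustrative): on a miss, decide
# whether to evict a random set and bypass the cache, store into a
# random set and swap the set mapping, or fall back to the normal
# replacement policy.
def rp_miss_action(victim_pid, victim_prot, new_pid, new_prot,
                   n_sets, rng=random.Random(0)):
    if victim_pid == new_pid and victim_prot != new_prot:
        s_prime = rng.randrange(n_sets)      # random set S' to disturb
        return ("evict_random_set_and_bypass", s_prime)
    if victim_pid != new_pid:
        s_prime = rng.randrange(n_sets)      # store in S' and swap S <-> S'
        return ("store_in_random_set_and_swap", s_prime)
    return ("normal_replacement", None)      # same process, same protection
```

Both randomized actions break the attacker's assumed mapping between addresses and cache sets, which is the core of RP cache's defense.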
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, if the evicted data and D belong to the same process but either one of their protection bits is set, arbitrary data will be evicted and D will be accessed without caching. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions in Random Fill cache let applications control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a Nofill request is executed, and the requested data access is performed without caching. Meanwhile, on a RandomFill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are still able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
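The three request types can be sketched as follows (a toy model: the cache is just a set of cached line addresses, the random-fill neighborhood window is passed explicitly, and the function name is ours):

```python
import random

def random_fill_access(cache, addr, kind, window, rng=random):
    """Toy Random Fill cache request. kind is one of:
    'normal'     - ordinary access, fills the cache on a miss
    'nofill'     - victim's security-critical access, never cached
    'randomfill' - brings in an arbitrary nearby line instead of addr
    """
    if addr in cache:
        return "hit"                      # hits behave as in a normal cache
    if kind == "normal":
        cache.add(addr)                   # normal replacement
    elif kind == "randomfill":
        lo, hi = window
        cache.add(rng.randrange(lo, hi))  # fill de-correlated from the demand
    # 'nofill': data is returned to the core but never cached
    return "miss"
```

Because the demanded security-critical line itself is never cached, a later probe of that line tells the attacker nothing about whether the victim accessed it.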
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the original, possibly ordered and location-intensive, addresses across different cache sets, decreasing the probability of conflict misses. Encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
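The set-index computation with periodic re-keying can be sketched as below (a keyed hash stands in for the paper's Low-Latency BlockCipher; the class and method names are illustrative):

```python
import hashlib

class CeaserIndex:
    """Toy CEASER-style encrypted set index with periodic re-keying."""

    def __init__(self, num_sets, key=0):
        self.num_sets = num_sets
        self.key = key
        self.accesses = 0

    def set_index(self, addr):
        # the real design encrypts addr with an LLBC; a keyed hash stands in
        h = hashlib.sha256(f"{self.key}:{addr}".encode()).digest()
        return int.from_bytes(h[:4], "big") % self.num_sets

    def maybe_rekey(self, period):
        # periodic re-keying dynamically remaps every address to a new set,
        # limiting how long an eviction set stays valid
        self.accesses += 1
        if self.accesses % period == 0:
            self.key += 1
```

An attacker's carefully built eviction set is keyed to one mapping; after a re-key, the same addresses scatter to different sets.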
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a similar way to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function: a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and to assign different keys for the mapping. As hardware extensions, a cryptographic primitive such as hashing and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate security domain identification with user space.
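The per-way index derivation can be sketched as follows (a keyed hash models the keyed hash/permutation derivation function; the function name and key format are ours):

```python
import hashlib

def scatter_indices(addr, domain_key, num_ways, num_sets):
    """Toy SCATTER-cache-style index derivation: every way gets its own
    keyed set index, so a given line occupies a per-domain, per-way
    pseudo-random set, as in skewed-associative caches."""
    indices = []
    for way in range(num_ways):
        h = hashlib.sha256(f"{domain_key}:{way}:{addr}".encode()).digest()
        indices.append(int.from_bytes(h[:4], "big") % num_sets)
    return indices
```

Two security domains with different keys see unrelated index vectors for the same physical address, so an eviction set built in one domain is not meaningful in another.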
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because invalidation is determined by the randomized counter of each cache line, and therefore it de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
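The counter-driven random invalidation can be sketched as a toy model (our names; lifetimes count down from a randomized initial value rather than up to a threshold, which is equivalent):

```python
import random

class NonDetCacheModel:
    """Toy non-deterministic cache: each filled block gets a randomized
    lifetime and is invalidated when it expires."""

    def __init__(self, max_interval, seed=0):
        self.rng = random.Random(seed)
        self.max_interval = max_interval
        self.blocks = {}                     # addr -> remaining lifetime ticks

    def tick(self):
        # one global-counter tick: age every block, invalidate expired ones
        for addr in list(self.blocks):
            self.blocks[addr] -= 1
            if self.blocks[addr] <= 0:
                del self.blocks[addr]

    def access(self, addr):
        if addr in self.blocks:
            return "hit"
        # on fill, the block's lifetime starts at a random value, so the
        # window in which it can produce hits is non-deterministic
        self.blocks[addr] = self.rng.randrange(1, self.max_interval + 1)
        return "miss"
```

Whether a re-access hits depends on a random lifetime rather than only on the access pattern, which is exactly what destroys the attacker's timing correlation.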
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For other combinations of cache and vulnerability, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]) vulnerability allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized, removing the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Probe Invalidation) vulnerability, when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
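The first judgment (constant last-step timing regardless of the secret) can be phrased as a tiny check. All names below are illustrative; the timing functions are toy stand-ins for a real cache simulator, and they depend only on the secret for clarity:

```python
def leaks(cache_timing, vuln, secrets):
    """A vulnerability leaks if the attacker's Step-3 timing can
    distinguish different secrets (judgment case 1 above)."""
    timings = {cache_timing(vuln, s) for s in secrets}
    return len(timings) > 1

def set_assoc_timing(vuln, secret):
    # Flush + Reload on a set-associative cache: the attacker's reload of
    # address 'a' is fast exactly when the victim's secret access touched 'a'
    return "fast" if secret == "a" else "slow"

def rp_like_timing(vuln, secret):
    # RP-cache-like behavior: no cross-process hits, so the reload is
    # always slow, independent of the secret
    return "slow"

flush_reload = ("A_a^inv", "V_u", "A_a")   # a three-step vulnerability
print(leaks(set_assoc_timing, flush_reload, ["a", "b"]))  # True: attack works
print(leaks(rp_like_timing, flush_reload, ["a", "b"]))    # False: prevented
```

The second judgment replaces the fixed timing function with a randomized one, and the third corresponds to a step being rejected before any timing is produced.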
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution, and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can use special instructions to give more privileges to sensitive data. However, care is required when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so that V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches usually limit the victim and the attacker to accessing only a limited set of cache blocks (columns for SP* cache to the column for PL cache in Tables 4 and 5). For example, there is static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether a memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of a speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) away from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space, and it inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition by flushing or eviction, and therefore they bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which enables hit-based vulnerabilities; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits on the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and its process ID match. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data is placed in the unreserved ways and can be evicted by the attacker. SecDCP has no unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads and stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but it is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information between the observed timing and whether the data is in the cache could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache does not randomly load the security-critical data, and vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability still exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% performance degradation, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be used carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and it inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and those added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and the value of u is leaked.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then, the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates data at a known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
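The set-granularity nature of Evict + Time can be sketched with a toy model in which the attacker evicts one whole set per trial; all names and parameters here are illustrative, not from the paper:

```python
WAYS = 2        # assumed associativity of the toy cache set

def evict_time(secret_set, target_set):
    # Step 1: the victim loads its secret data into its cache set.
    cache = {secret_set: ['secret']}
    # Step 2: the attacker fills target_set, evicting whatever was there.
    cache[target_set] = ['attacker'] * WAYS
    # Step 3: the victim reloads the secret; slow (miss) iff its set was evicted.
    return 'secret' not in cache.get(secret_set, [])

# Trying every set, only the secret's set shows a slow reload.
slow_sets = [s for s in range(8) if evict_time(secret_set=3, target_set=s)]
print(slow_sets)  # [3]
```

Only the set index leaks, which is why Evict + Time recovers less than a full address.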
Prime + Probe: In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
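A minimal sketch of the Prime + Probe steps on a toy direct-mapped cache (the set count and names are our own assumptions):

```python
NUM_SETS = 8                       # toy direct-mapped cache size (illustrative)

def set_index(addr):
    return addr % NUM_SETS

def prime_probe(secret_addr):
    # Step 1: the attacker primes every set with its own data (Aa).
    cache = {s: 'attacker' for s in range(NUM_SETS)}
    # Step 2: the victim's secret access evicts the attacker's line in one set (Vu).
    cache[set_index(secret_addr)] = 'victim'
    # Step 3: the attacker probes each set; a miss marks the secret's set.
    return [s for s in range(NUM_SETS) if cache[s] != 'attacker']

print(prime_probe(21))  # [5], since 21 % 8 == 5
```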
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
J Hardw Syst Secur (2019) 3:397–425
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper): Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as {..., ⋆, ...}, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as {..., Ainv/Vinv, ...}, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or the last step (an Ainv/Vinv in the last step will be deleted).
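A minimal sketch of how Subdivision Rule 1 could be mechanized, assuming a pattern is a list of step labels and the ASCII '*' stands in for the ⋆ step (this encoding is ours, not the paper's):

```python
def subdivide_rule_1(pattern):
    """Split a long pattern at '*' steps: each '*' becomes Step 1 of the
    next sub-pattern, and a lone trailing '*' is deleted, per the rule."""
    parts, current = [], []
    for step in pattern:
        if step == '*':
            if current:
                parts.append(current)
            current = ['*']        # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ['*']:
        parts.append(current)      # drop a pattern that is only a trailing '*'
    return parts

print(subdivide_rule_1(['Aa', '*', 'Vu', 'Aa']))  # [['Aa'], ['*', 'Vu', 'Aa']]
```

Subdivision Rule 2 would follow the same shape, splitting at Ainv/Vinv steps instead of '*'.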
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. These show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern with two adjacent steps both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps: for each possible pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step (e.g., Union(Ainv, Aaliasinv))
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the second memory access-related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other one is a known_inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
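The three Reduction Rules can be sketched for one adjacent pair as follows; steps are encoded here as (kind, address) tuples with address 'u' for the unknown address, a shorthand of our own rather than the paper's notation:

```python
def reduce_pair(first, second):
    """Return the single step that replaces the pair,
    or None if the pair cannot be reduced by Rules 1-3."""
    k1, a1 = first
    k2, a2 = second
    # Rule 1: two u operations target the same u; keep the second one.
    if a1 == 'u' and a2 == 'u':
        return second
    # Rule 2: two known accesses leave a deterministic state; keep the second.
    if k1 == 'access' and k2 == 'access' and 'u' not in (a1, a2):
        return second
    # Rule 3: a known access and a known invalidation of the same address.
    if {k1, k2} == {'access', 'inv'} and a1 == a2 and a1 != 'u':
        return second
    return None

print(reduce_pair(('access', 'a'), ('inv', 'a')))  # ('inv', 'a')
print(reduce_pair(('access', 'a'), ('inv', 'd')))  # None: different addresses
```

The last pair is not reducible; per the Commute Rules it would instead be a candidate for commuting.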
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
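Union Rule 1, applied repeatedly along a pattern, can be sketched like this; the (kind, address) tuple encoding and the '+' joint-step notation are our own, not the paper's:

```python
def union_invalidations(pattern):
    """Merge runs of adjacent invalidations of distinct known
    addresses into one joint invalidation step."""
    out = []
    for kind, addr in pattern:
        prev = out[-1] if out else None
        if (prev is not None and kind == 'inv' and prev[0] == 'inv'
                and addr != 'u' and prev[1] != 'u' and addr not in prev[1]):
            out[-1] = ('inv', prev[1] + '+' + addr)   # joint Union(...) step
        else:
            out.append((kind, addr))
    return out

print(union_invalidations([('inv', 'a'), ('inv', 'd'), ('access', 'u')]))
# [('inv', 'a+d'), ('access', 'u')]
```

Invalidations of the same address are deliberately left alone here; those fall under the Reduction Rules instead.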
B.4.2.4 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in the following order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near as possible, and to put u operations and not_u operations together as much as possible; the Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps; then, the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps, in one of the following patterns or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.
The only remaining two adjacent step patterns to which none of the three categories of rules can be applied are the following:

– {Aa, Va, Aaalias, Vaalias, Ad, Vd, Ainv_a, Vinv_a, Ainv_aalias, Vinv_aalias} ⇝ Vu
– {Aa, Va, Aaalias, Vaalias} ⇝ Vinv_u
– Vu ⇝ {Aa, Va, Aaalias, Vaalias, Ad, Vd, Ainv_a, Vinv_a, Ainv_aalias, Vinv_aalias}
– Vinv_u ⇝ {Aa, Va, Aaalias, Vaalias}

We manually checked all of the two adjacent step patterns
above, and found that adding an extra step before or after these
Algorithm 2: β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; the remaining entries are empty initially
Output: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = ∅
2: while patterns contain(⋆) and its index is not 0 do
3:   patterns = Subdivision_Rule_1(patterns)
4: end while
5: while (patterns contain(Ainv) and its index is not 0) or (patterns contain(Vinv) and its index is not 0) do
6:   patterns = Subdivision_Rule_2(patterns)
7: end while
8: while (patterns is_ineffective or has_interval_effective_three_steps) do
9:   patterns = Commute_Rules(patterns)
10:  patterns = Reduction_Rules(patterns)
11:  patterns = Union_Rules(patterns)
12:  if (patterns is_ineffective or has_interval_effective_three_steps) then
13:    results += effective_patterns(patterns)
14:  end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This section gives a brief overview of the 18 secure cache designs that have been presented in academic literature in the last 15 years. To the best of our knowledge, these cover all the secure cache designs proposed to date. Most of the designs have been realized in functional simulation, e.g., [23, 48]. Some have been realized in FPGA, e.g., [7], and a few have been realized in real ASIC hardware, e.g., [31]. No specific secure caches have been implemented in commercial processors, to the best of our knowledge; however, CATalyst [29] leverages Intel's CAT (Cache Allocation Technology), available today in Intel Xeon E5 2618L v3 processors, and could be deployed today.
When the secure cache description in the cited papers did not mention the issue of using flushes or cache coherence, we assume the victim and the attacker cannot invalidate each other's cache blocks by using clflush instructions or through cache coherence protocol operations, but they can flush or use cache coherence to invalidate their own cache lines. The victim and the attacker also cannot invalidate protected or locked data. Further, if the authors specified any specific assumptions (mainly about the software), we list the assumptions as part of the description of the cache. What's more, when the level of the cache hierarchy was unspecified, we assume the secure caches' features can be applied to all levels of caches, including the L1 cache, L2 cache, and Last Level Cache (LLC). If the inclusivity of the caches was not specified, we assume they target inclusive caches. Following the below descriptions of each secure cache design, the analysis of the secure caches is given in Section 5.
SP* Cache [20, 27]³ uses partitioning techniques to statically partition the cache ways into a High and a Low partition for the victim and the attacker, according to their different process IDs. The victim typically belongs to High security and the attacker belongs to Low security. The victim's memory accesses cannot modify the Low partition (assigned to processes such as the attacker), while the attacker's memory accesses cannot modify the High partition (assigned to the victim). However, the memory accesses of both the victim and the attacker can result in a hit in either the Low or the High partition if the data is in the cache.
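As a rough illustration of this policy, the following Python sketch shows hits being allowed in any way while replacement is confined to the requester's partition. The way assignment, data layout, and function names here are our own assumptions for illustration, not details from [20, 27].

```python
# Hypothetical sketch of SP*-style static way partitioning.
HIGH_WAYS = (0, 1)  # ways reserved for the High (victim) partition
LOW_WAYS = (2, 3)   # ways reserved for the Low (attacker) partition

def lookup(cache_set, tag):
    """A hit may occur in any way, regardless of partition."""
    for way, line_tag in enumerate(cache_set):
        if line_tag == tag:
            return way
    return None

def victim_way(is_high_access):
    """On a miss, the replaced way is confined to the requester's own
    partition, so one domain's misses never evict the other's lines."""
    allowed = HIGH_WAYS if is_high_access else LOW_WAYS
    return allowed[0]  # placeholder for LRU/random within the partition
```

The key point the sketch captures is the asymmetry the text describes: the hit path ignores partitions, while the replacement path never crosses them.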
³Two existing papers give slightly different definitions for an "SP" cache; thus we selected to define a new cache, the SP* cache, that combines the secure cache features of the Secret-Protecting cache from [27] with the secure cache features of the Static-Partitioned cache from [20].

SecVerilog Cache [56, 57] statically partitions cache blocks between security levels L (Low) and H (High). Each instruction in the source code for programs using the SecVerilog cache needs to include a timing label, which effectively represents whether the data being accessed by that instruction is Low or High based on the code, and this timing label can be similar to a process ID that differentiates the attacker's (Low) instructions from the victim's (High) instructions. The cache is designed such that operations in the High partition cannot affect the timing of operations in the Low partition. For a cache miss due to Low instructions, when the data is in the High partition, it will behave as a cache miss, and the data will be moved from the High to the Low partition to preserve consistency. However, High instructions are able to result in a cache hit in both the High and the Low partitions if the data is already in the cache.
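The access rule above can be sketched as follows; the two partitions are modeled as plain sets of addresses, which is a simplification we assume for illustration (the real design partitions physical cache blocks):

```python
# Minimal sketch of the SecVerilog cache rule described above.
def secverilog_access(label, addr, low_part, high_part):
    if label == "L":
        if addr in low_part:
            return "hit"
        if addr in high_part:
            # Data present only in High: behaves as a miss for Low timing,
            # and the line is moved High -> Low to preserve consistency.
            high_part.remove(addr)
            low_part.add(addr)
            return "miss"
        low_part.add(addr)
        return "miss"
    # High instructions may hit in either partition.
    if addr in low_part or addr in high_part:
        return "hit"
    high_part.add(addr)
    return "miss"
```

Note how Low timing never depends on High contents: a Low access to High-resident data still takes the miss path, so the High partition stays invisible to Low observers.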
SecDCP Cache [48] builds on the SecVerilog cache and uses the partitioning idea from the original SecVerilog cache, but the partitioning is dynamic. It can support at least two security classes, H (High) and L (Low), and configurations with more security classes are possible. It uses the percentage of cache misses for L instructions that is reduced (increased) when L's partition size is increased (reduced) by one cache way to adjust the number of cache ways assigned to the Low partition. When adjusting the number of ways dedicated to each partition, if L's partition size decreases, the process ID is checked and L blocks are flushed before the way is reallocated to H. On the other hand, if L's partition size increases, H blocks in the adjusted cache way remain unmodified so as to not add more performance overhead, and they will eventually be evicted by L's memory accesses. However, the feature of not flushing High partition data during way adjustment may leak timing information to the attacker.
NoMo Cache [12] dynamically partitions the cache ways among the currently "active" simultaneous multithreading (SMT) threads. Each thread is exclusively reserved Y blocks in each cache set, where Y is within the range [0, N/M], with N the number of ways and M the number of SMT threads. NoMo-0 equals a traditional set-associative cache, while NoMo-N/M partitions the cache evenly among the different threads, leaving no non-reserved ways. The number Y assigned to each thread is adjusted based on its activeness. When adjusting the number of blocks assigned to a thread, Y blocks are invalidated in each cache set to prevent timing leakage. Eviction is not allowed within each thread's own reserved ways, while it is possible in the shared ways. Therefore, to avoid evictions caused by the unreserved ways, we assume NoMo-N/M is used to fully partition the cache. When the attacker and the victim share the same library, there will be a cache hit when accessing the shared data, and the normal cache hit policy holds to guarantee cache coherence.
SHARP Cache [53] uses both partitioning and randomization techniques to prevent the victim's data from being evicted or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID), where the CVB stores a bitmap and the ith bit in the bitmap is set if the line is present in the ith core's private cache. A cache hit is allowed among different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process will be evicted first. If there is no such data, data belonging to the same process will be evicted. If there is no existing data in the cache that is in the same process, a random data in the cache set will be evicted. This random eviction will generate an interrupt to the OS to notify it of a suspicious activity. For pages that are read-only or executable, SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks by using the cache coherence protocol is still possible.
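The three-level eviction priority can be sketched as below. The CVB bitmap encoding follows the description above; the rest of the structure is assumed for illustration, with notify_os standing in for the interrupt raised on a random eviction.

```python
import random

# Sketch of SHARP's eviction priority, simplified from the description above.
def sharp_choose_victim(cache_set, req_core, notify_os):
    for way, line in enumerate(cache_set):
        if line["cvb"] == 0:                 # present in no core's private cache
            return way
    for way, line in enumerate(cache_set):
        if (line["cvb"] >> req_core) & 1:    # belongs to the requesting core
            return way
    notify_os("suspicious random eviction")  # alert the OS of possible attack
    return random.randrange(len(cache_set))
```

The ordering is what defeats external eviction attacks: an attacker's miss preferentially displaces its own or unowned lines, and only resorts to a (reported) random eviction when no such line exists.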
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor will flush the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus, behavior on cache coherence operations is not defined. This cache listed extra software assumptions as follows:
Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling speculation during the speculative execution of memory-related operations. During normal processor execution, for the L1 caches and TLB, the corresponding states will be flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, with small performance impact given how rare such speculation is. This cache listed extra software assumptions as follows:

Assumption 1: The software security monitor guarantees that the victim and the attacker process cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and the attacker's memory is never mapped to the same cache blocks for the LLC.
Assumption 2: The software runs on a system with a single processor core, where the victim and the attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed by the security monitor when program execution switches away from the victim program, for the L1 cache and the TLB.
Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up (i.e., before all of its prior control flow instructions resolve), an unsafe speculative load (USL) will be put into a speculative buffer (SB) without modifying any cache state. When reaching the visibility point, there are two cases. In one case, the USL and successive instructions will possibly be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core receives possible invalidations from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for dealing with the final state transition of the speculative execution. InvisiSpec cache targets Spectre-like attacks and futuristic attacks. However, InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) for separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM. Here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed extra software assumptions as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2: Security-critical data is always able to fit within the secure partition of the cache. That is, all data in the range x can fit in the secure partition.
Assumption 3: The victim and the attacker process cannot share the same memory space.
Assumption 4: A pseudo-locking mechanism is used by software to make sure that the victim and the attacker process cannot share the same cache blocks.
Assumption 5: Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, for masking fills and selecting the victim way to replace; another, called policy_hitmap, for masking hit ways. Only when both the tag and the domain id are the same will a cache hit happen. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. For a cache miss, the victim way can only be chosen within the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea to dynamically partition the cache ways following the system's workload changes, but does not actually implement it.
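The masked hit and fill paths can be sketched as below. The field names policy_hitmap and policy_fillmap follow the paper's terminology, but the data layout and bitmask encoding here are our own assumptions.

```python
# Illustrative DAWG-style lookup and fill masking.
def dawg_hit(cache_set, tag, domain_id, policy_hitmap):
    for way, line in enumerate(cache_set):
        if line is None:
            continue
        line_tag, line_domain = line
        # A hit requires the tag AND the domain id to match, and the way
        # to be visible to this domain through its hitmap.
        if line_tag == tag and line_domain == domain_id and (policy_hitmap >> way) & 1:
            return way
    return None

def dawg_fill_way(policy_fillmap, num_ways):
    # Victim selection on a miss is masked to the domain's fillmap ways,
    # so fills never displace another domain's lines.
    candidates = [w for w in range(num_ways) if (policy_fillmap >> w) & 1]
    return candidates[0]  # placeholder for the real replacement policy
```

Because the domain id participates in the hit check, the same tag can legitimately live in two ways under two domains, which is how DAWG supports replicated read-only lines.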
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if the data R is in the LLC, it is also in the higher-level cache, and eviction of R in the LLC will cause the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). For RIC, each cache line is extended with a single bit to set the relaxed inclusion. Once the relaxed inclusion is set for a cache line, the corresponding LLC line's eviction will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC will not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines in the cases where the RIC bits are modified, or for thread migration events, to avoid timing leakage during the transition time.
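The relaxed-inclusion rule reduces to a single guard on the back-invalidation path, sketched below; the line structure is an assumption for illustration.

```python
# Sketch of RIC's relaxed-inclusion check: on an LLC eviction, the
# back-invalidation to the higher-level cache is suppressed when the
# evicted line's relaxed-inclusion bit is set.
def on_llc_eviction(line, l1_contents):
    if line["relaxed"]:  # read-only or thread-private data
        # The L1 copy survives, so LLC-eviction attacks on this line fail.
        return
    l1_contents.discard(line["addr"])  # normal inclusive back-invalidation
```

The low complexity claimed by the paper is visible here: the defense is one extra bit per line plus one condition on an existing path.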
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by the extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, for a cache hit, PL cache will perform the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. When there is a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot be evicted by each other. In this case, the new data will be either loaded or stored without caching. In other cases, data eviction is possible. This cache listed an extra software assumption as follows:

Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
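The miss-handling rule can be sketched as follows; this reflects our reading of the description above, and the data structures are illustrative.

```python
# Sketch of the PL cache miss-handling rule.
def pl_can_evict(line, incoming_pid, incoming_locked):
    if not line["locked"]:
        return True                          # unlocked lines are replaceable
    # A locked line may only be replaced by a locked line of the same process.
    return incoming_locked and line["pid"] == incoming_pid

def pl_handle_miss(cache_set, incoming):
    for way, line in enumerate(cache_set):
        if line is None or pl_can_evict(line, incoming["pid"], incoming["locked"]):
            cache_set[way] = incoming
            return way
    return None  # no evictable way: the access is serviced without caching
```

Returning None models the "loaded or stored without caching" case: the locked working set stays pinned, which is why preloading it (Assumption 1) makes later evictions by the attacker impossible.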
RP Cache [49] uses randomization to de-correlate the memory address being accessed and the timing of the cache. For each block of RP cache, there is a process ID and one protection bit P, set to indicate whether this cache block needs to be protected or not. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits need both the process ID and the address to be the same. When a cache miss happens to data D of a cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have different protection bits, arbitrary data of a random cache set S′ will be evicted, and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
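The three miss cases can be sketched as below. The data layout (lists of tags per set, a flat permutation table) and the interference checks passed in as flags are simplified assumptions, not the exact hardware from [49].

```python
import random

# Toy sketch of the RP cache miss cases described above.
def rp_miss(new_tag, S, perm, sets, same_process, same_protection):
    S_prime = random.randrange(len(perm))    # random set chosen for eviction
    if same_process and not same_protection:
        sets[perm[S_prime]].pop()            # evict arbitrary data of set S'
        return "uncached"                    # new data accessed without caching
    if not same_process:
        sets[perm[S_prime]].pop()            # evict a block of S'
        sets[perm[S_prime]].append(new_tag)  # store the new data there
        perm[S], perm[S_prime] = perm[S_prime], perm[S]  # swap S and S' mappings
        return "swapped"
    return "normal"                          # fall back to normal replacement
```

In both non-normal cases the evicted set is chosen at random, so an attacker observing an eviction learns nothing about which set the victim's address maps to.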
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and this RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. It stores the most useful cache lines, rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security-critical. The cache replacement policy is very similar to that of RP cache. A cache hit needs both the process ID and the address to be the same. When a cache miss happens to data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a no-fill request is executed, and the requested data access will be performed without caching. Meanwhile, on a random fill request, arbitrary data from a range of addresses will be brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, normal requests are used with the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since the flush will not influence timing-based side-channel attack prevention (the random filling technique is used for this).
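The core of the random fill idea can be sketched as below; the window parameter and callback interface are our own assumptions for illustration.

```python
import random

# Sketch of a random fill request: the demand line is returned to the core
# without caching, and instead a random neighbor within an assumed window
# [addr - W, addr + W] is brought into the cache.
WINDOW = 8

def random_fill(addr, fetch_into_cache, return_to_core):
    return_to_core(addr)                       # no-fill for the demand access
    neighbor = addr + random.randint(-WINDOW, WINDOW)
    fetch_into_cache(neighbor)                 # de-correlated, random fill
    return neighbor
```

Since the cached line is only statistically related to the accessed address, observing which line got filled no longer pins down the victim's access, which is the de-correlation the design relies on.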
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security-critical. When a memory access tries to modify the cache state, the address will first be encrypted using the Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally possibly ordered and location-intensive addresses to different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to dynamically change.
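The effect of encrypted set indexing can be illustrated with a toy keyed cipher. CEASER uses the LLBC [6]; the small Feistel network below is an assumption for illustration only, chosen because its XOR-based rounds keep the address mapping an invertible permutation, and it shows how re-keying remaps addresses to new sets.

```python
# Toy stand-in for CEASER-style encrypted set indexing.
def feistel_encrypt(addr, key, rounds=4, half_bits=8):
    mask = (1 << half_bits) - 1
    left, right = (addr >> half_bits) & mask, addr & mask
    for r in range(rounds):
        # XOR-based rounds keep the mapping a permutation (invertible).
        left, right = right, left ^ ((right * key + r) & mask)
    return (left << half_bits) | right

def cache_set(addr, key, num_sets=64):
    # Changing the key changes which set an address maps to, so eviction
    # sets learned under the old key become useless after re-keying.
    return feistel_encrypt(addr, key) % num_sets
```

An attacker who has profiled a conflicting eviction set under one key must restart profiling after every re-key, which is what bounds the attack time.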
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function also calculates a different index for each cache way, in a similar way to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign different keys for the mapping. For the hardware extension, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit per page-table entry added to allow the kernel to communicate with user space for security domain identification.
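The keyed per-way index derivation can be sketched as below. The choice of BLAKE2 and the parameter values are our assumptions; the design only requires a keyed hash or keyed permutation derivation function.

```python
import hashlib

# Sketch of keyed, per-way set index derivation as described above.
def scatter_index(addr, way, domain_key, num_sets=1024):
    # Different security domains get different keys, so the same address
    # maps to different sets per domain; including the way number skews
    # the mapping across ways, as in a skewed associative cache.
    h = hashlib.blake2b(f"{addr}:{way}".encode(), key=domain_key, digest_size=4)
    return int.from_bytes(h.digest(), "big") % num_sets
```

Because every way of every domain sees its own mapping, building a conventional eviction set requires the attacker to find addresses colliding in all ways under the victim's key, which is far harder than in a set-indexed cache.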
Non-Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line will be invalidated. Non-Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates cache access time from the address being accessed. However, the performance degradation is tremendous.
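The decay-counter mechanism can be sketched as follows; this reflects our reading of [23], with the threshold and tick granularity as assumed values.

```python
import random

# Sketch of per-line decay counters with randomized starting values.
THRESHOLD = 16

def new_counter():
    # Randomized starting value de-correlates invalidation time from access.
    return random.randrange(THRESHOLD)

def tick(lines):
    for line in lines:
        if line["valid"] and not line["touched"]:
            line["counter"] += 1
            if line["counter"] >= THRESHOLD:
                line["valid"] = False  # invalidated at a random-looking time
        line["touched"] = False        # reset the per-tick touch flag
```

Because each line dies at a randomized, per-line time, hit/miss statistics stop being a reliable function of the addresses accessed, at the heavy performance cost the text notes.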
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, the SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerabilities, and this is indicated by a × and red color. For example, the SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to know that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49], for example, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive a cache hit between each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer carries the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to know that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in the Random Fill cache u would be accessed without caching and another random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when the PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As is expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non-Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed. The left one is for normal execution and the right one is for speculative execution. Some secure caches, such as the InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution and normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of attacker success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For the L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in the L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in the last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of attacker success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, this requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only a limited set of cache blocks. For example, either static or dynamic partitioning of caches allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, is not possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
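The partitioning idea above can be sketched with a toy model (the class name, way assignment, and LRU policy here are our own illustrative choices, not any specific published design):

```python
# Minimal sketch of static way-partitioning in one cache set (illustrative only).
# Ways 0..1 are reserved for the victim (High), ways 2..3 for the attacker (Low),
# so an access from one domain can never evict the other domain's lines.

class PartitionedSet:
    def __init__(self, ways_per_domain):
        # ways_per_domain: {"High": [0, 1], "Low": [2, 3]}
        self.ways_per_domain = ways_per_domain
        self.lines = {}          # way index -> cached address
        self.lru = {d: list(w) for d, w in ways_per_domain.items()}

    def access(self, domain, addr):
        """Return True on hit; on miss, fill a way from the caller's own partition."""
        for way in self.ways_per_domain[domain]:
            if self.lines.get(way) == addr:
                self.lru[domain].remove(way)
                self.lru[domain].append(way)   # mark most recently used
                return True
        victim_way = self.lru[domain].pop(0)   # evict only within own partition
        self.lines[victim_way] = addr
        self.lru[domain].append(victim_way)
        return False

s = PartitionedSet({"High": [0, 1], "Low": [2, 3]})
s.access("High", "secret")                 # victim brings in its secret line
for a in ("x", "y", "z", "w"):
    s.access("Low", a)                     # attacker thrashes its own ways
assert s.access("High", "secret")          # still a hit: no external eviction
```

Because replacement decisions never cross the partition boundary, the attacker's accesses cannot evict the victim's lines, which is exactly what removes the attacker's ability to set a known initial state in the victim's partition.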
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., Vd → Vu → Va (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv → Vu → Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd → Vu → Ad^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the Vu → Aa^inv → Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as Ad → Vu → Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities: with the implemented software system it so far supports preventing all of the vulnerabilities, but it only works for the LLC, comes with high software implementation complexity, and relies on assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd → Vu → Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the Ad → Vu → Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution, or an out-of-order load or store, and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
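As a rough sketch of the randomized address-to-set mapping idea (the hash, key, and epoch names here are illustrative assumptions, not the actual CEASER or SCATTER cache schemes, which use low-latency hardware ciphers):

```python
# Sketch of randomized address-to-set mapping in the spirit of keyed-index
# designs: the set index is derived from the address under a secret key,
# so an eviction set built under one mapping stops working after remapping.
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Keyed mapping from a memory address to a cache set index."""
    digest = hashlib.sha256(key + addr.to_bytes(8, "little")).digest()
    return digest[0] % NUM_SETS

# With different keys (e.g., after a periodic remapping epoch), the same
# address lands in unrelated sets.
idx_before = set_index(0x1234, b"epoch-0")
idx_after = set_index(0x1234, b"epoch-1")
print(idx_before, idx_after)   # typically different set indices
```

The mapping is deterministic for a fixed key (so the cache still functions), but from the attacker's view the address-to-set relation is unpredictable, which is the de-correlation described above.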
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal interference vulnerabilities such as Aa^inv → Vu → Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu → Va^inv → Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va → Vu^inv → Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at a tremendous performance cost).
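The random-delay idea of Non Deterministic Cache can be illustrated as follows (the latency numbers and delay range are made-up values for illustration, not the published design's parameters):

```python
# Sketch of non-deterministic access timing: a random delay on top of the
# true hit/miss latency hides the timing difference from the attacker,
# at a steep performance cost.  All numbers are illustrative.
import random

HIT_NS, MISS_NS = 4, 40        # assumed base latencies

def observed_latency(is_hit: bool) -> int:
    base = HIT_NS if is_hit else MISS_NS
    return base + random.randint(0, 100)   # random delay swamps the signal

samples_hit = [observed_latency(True) for _ in range(5)]
samples_miss = [observed_latency(False) for _ in range(5)]
print(samples_hit, samples_miss)   # the two distributions overlap heavily
```

A single timing observation no longer distinguishes a hit from a miss, which is why this approach defeats all the timing-based patterns, though averaging many samples still degrades more gracefully than the ideal, hence the performance caveat.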
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performanceresults of the secure caches as listed by the designersin the different papers At the extreme end there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision, and it leaks the value of u.
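This three-step pattern can be illustrated with a toy direct-mapped cache model (the addresses and sizes here are arbitrary illustrative values):

```python
# Toy demonstration of the Cache Internal Collision pattern on a
# direct-mapped cache.

NUM_SETS = 8
cache = {}                     # set index -> cached address

def access(addr):
    """Return 'hit' or 'miss' and update the cache."""
    idx = addr % NUM_SETS
    if cache.get(idx) == addr:
        return "hit"
    cache[idx] = addr
    return "miss"

def flush_all():
    cache.clear()

secret = 13                    # victim's secret address u (unknown to attacker)
known = 13                     # victim's known address a being tested

flush_all()                    # Step 1: invalidate by flush/eviction
access(secret)                 # Step 2: victim touches u
observation = access(known)    # Step 3: victim touches a; timing is observed
print(observation)             # prints "hit" here, since known == secret
```

A "hit" (fast) observation in Step 3 reveals that u and a collide in the cache, leaking information about the secret address; choosing `known` unequal to `secret` yields "miss" instead.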
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss helps the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Evict + Time the attacker evicts a cache set, only learning the secret data's cache set index, instead of invalidating data at some known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
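A minimal sketch of these three steps on a single 2-way set (illustrative only; a real attack infers hits and misses from measured access latencies):

```python
# Toy Prime + Probe on one 2-way cache set with LRU replacement.
from collections import OrderedDict

WAYS = 2
cache_set = OrderedDict()      # address -> None, kept in LRU order

def access(addr):
    if addr in cache_set:
        cache_set.move_to_end(addr)         # refresh LRU position
        return "hit"
    if len(cache_set) == WAYS:
        cache_set.popitem(last=False)       # evict least recently used line
    cache_set[addr] = None
    return "miss"

# Step 1: attacker primes the set with its own lines.
access("attacker_0"); access("attacker_1")
# Step 2: victim's secret-dependent access evicts an attacker line.
access("victim_secret")
# Step 3: attacker probes; a miss reveals the victim touched this set.
probe = [access("attacker_0"), access("attacker_1")]
print(probe)
```

After the victim's access, at least one probe misses, telling the attacker that the victim's secret maps to this set; sets the victim did not touch would return all hits.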
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that tells the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.

In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
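One possible encoding of these operations (our own, for illustration) is:

```python
# Encoding the memory-related operations of the three-step model: each step
# records the actor (attacker or victim), the address class, and whether it
# is an invalidation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    actor: str      # "A" (attacker) or "V" (victim)
    addr: str       # "a", "a_alias", "d" (known) or "u" (unknown)
    inv: bool = False

    def is_u(self) -> bool:
        """True for a u operation, False for a not_u operation."""
        return self.addr == "u"

# Vu -> Aa^inv -> Vu, i.e., one type of Flush + Time:
flush_time = (Step("V", "u"), Step("A", "a", inv=True), Step("V", "u"))
assert flush_time[0].is_u() and not flush_time[1].is_u()
```

This kind of explicit step encoding is what lets a simulator enumerate every three-step combination and test which ones yield distinguishable timing, as done in Section 3.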
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → Aa → Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A^inv/V^inv, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv → Vu → Aa (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (an A^inv/V^inv in the last step will be deleted).
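Subdivision Rule 1 above can be sketched as follows (steps are encoded as plain strings, with "*" denoting the ⋆ step; the encoding is ours, for illustration):

```python
# Sketch of Subdivision Rule 1: a "*" step carries no timing information,
# so a longer pattern is split there; a "*" in the middle starts a new
# pattern, and a trailing lone "*" is deleted.

def subdivide(pattern):
    patterns, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                patterns.append(current)
            current = ["*"]        # "*" becomes Step 1 of the next pattern
        else:
            current.append(step)
    if current and current != ["*"]:
        patterns.append(current)   # a trailing lone "*" is dropped
    return patterns

print(subdivide(["Aa", "Vu", "*", "Vu", "Aa"]))
# -> [['Aa', 'Vu'], ['*', 'Vu', 'Aa']]
```

Applied recursively (here, in one pass over the sequence), this leaves only patterns in which ⋆ can appear solely as Step 1, matching the rule's stated post-condition.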
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M, N and N, M will have the same timing observation in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern in which two adjacent steps are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps

(For each possible pair of adjacent steps, over the operations a, a_alias, d, u and their invalidation variants a^inv, a_alias^inv, d^inv, u^inv, the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step where a reduction or union is possible; e.g., a followed by a reduces to a, and a^inv followed by d^inv unions to Union(a^inv, d^inv).)
Reduce Rule" and has no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both of the accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, the two can be reduced to one step, which is the second step.
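Under our reading of these rules, the pairwise reduction can be sketched as follows (the step encoding is ours, e.g., "Aa" for an attacker access to a and "Va_inv" for a victim invalidation of a; edge cases such as mixed access/invalidation pairs on u follow Table 7):

```python
# Sketch of the Reduction Rules on two adjacent steps, encoded as strings.

def base_addr(step):
    return step[1:].removesuffix("_inv")   # strip actor letter and inv marker

def is_inv(step):
    return step.endswith("_inv")

def reduce_adjacent(x, y):
    """Return the single step the pair reduces to, or None if irreducible."""
    if base_addr(x) == "u" and base_addr(y) == "u" and is_inv(x) == is_inv(y):
        return y                            # Rule 1: both target the same u
    if base_addr(x) != "u" and base_addr(y) != "u":
        if not is_inv(x) and not is_inv(y):
            return y                        # Rule 2: two known accesses
        if is_inv(x) != is_inv(y) and base_addr(x) == base_addr(y):
            return y                        # Rule 3: access + inv, same address
    return None

assert reduce_adjacent("Vu", "Au") == "Au"           # Rule 1
assert reduce_adjacent("Aa", "Vd") == "Vd"           # Rule 2
assert reduce_adjacent("Aa", "Aa_inv") == "Aa_inv"   # Rule 3
assert reduce_adjacent("Aa", "Vu") is None           # not reducible
```

Repeatedly applying such a pairwise reduction (after commuting) is what shrinks a β > 3 sequence toward the three-step forms discussed below; pairs that union rather than reduce (e.g., two invalidations of different known addresses) are handled by the Union Rules instead.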
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequence containing M, N and the sequence containing Union(M, N) will have the same timing observation in the final step for the attacker, we can combine steps M and N into one joint step in this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined. Both known inv operations invalidate some known address, so they can be merged into a single step. This Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
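A sketch of Union Rule 1 under an assumed step encoding: each step is `(kind, addrs)` where `addrs` is a list of address labels and `"u"` marks an unknown address. Adjacent known invalidations of disjoint addresses are merged into one joint step.

```python
# Step = (kind, addrs): kind is "access" or "inv"; addrs is a list of
# address labels, with "u" marking an unknown address.

def apply_union_rule(seq):
    """Merge adjacent invalidations of known, different addresses into a
    single joint step Union(M, N); repeated merging collapses whole runs."""
    out = []
    for kind, addrs in seq:
        prev = out[-1] if out else None
        if (prev is not None and kind == "inv" and prev[0] == "inv"
                and "u" not in addrs and "u" not in prev[1]
                and not set(addrs) & set(prev[1])):
            out[-1] = ("inv", prev[1] + addrs)   # Union of the two steps
        else:
            out.append((kind, list(addrs)))
    return out
```

Invalidations of the same address, or of an unknown address, are deliberately left unmerged, since the rule only covers known, different addresses.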
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce steps. Then the Union Rules are applied to the processed memory sequence.

Each application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence of at most three steps in one of the following patterns:
– u operation, non-u operation, u operation

– non-u operation, u operation, non-u operation
There might be a possible extra known access operation, or an extra A^inv/V^inv, before this three-step pattern, where:

• An extra known access operation as the first step will not influence the result and can be directly removed.

• For an extra A^inv/V^inv as the first step:

  – If it is followed by a known access operation, the A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.

  – If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

  – If it is followed by V_u, the worst case is A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further be reduced by Commute Rule 2 to come within three steps.
In this case, the sequence is finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.

The only two adjacent step patterns to which none of the three categories of Rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u

– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u

– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias

– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these two steps will either generate two adjacent step patterns that can be processed by the three categories of Rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2: β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1] and pattern[2] are empty initially
Output: result: reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  result ← Ø
2:  while pattern contains an adjacent (u operation, non-u operation) pair whose index is not 0 do
3:      pattern ← Commute_Rule_1(pattern)
4:  end while
5:  while (pattern contains an adjacent (known access, same-address known inv) pair whose index is not 0) or (pattern contains an adjacent (same-address known inv, known access) pair whose index is not 0) do
6:      pattern ← Commute_Rule_2(pattern)
7:  end while
8:  while not (pattern is ineffective or pattern has interval effective three steps) do
9:      pattern ← Commute_Rules(pattern)
10:     pattern ← Reduction_Rules(pattern)
11:     pattern ← Union_Rules(pattern)
12:     if pattern is ineffective or pattern has interval effective three steps then
13:         result += Extract_Effective_Three_Steps(pattern)
14:     end if
15: end while
16: return result
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
In Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
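A compact sketch of the reduce-and-check loop that Algorithm 2 describes. The rule-application function and the two checking predicates are assumed to be supplied by the caller (hypothetical signatures); `apply_rules` performs one round of the Commute, Reduction, and Union rules.

```python
def reduce_and_check(pattern, apply_rules, is_ineffective,
                     has_interval_effective_three_steps):
    """Loop until the pattern is shown ineffective or maps to one or
    more three-step vulnerabilities (returned as a list)."""
    while True:
        if is_ineffective(pattern):
            return []                      # no effective three steps exist
        found = has_interval_effective_three_steps(pattern)
        if found:
            return found                   # maps to known three-step patterns
        reduced = apply_rules(pattern)     # one round of Commute/Reduce/Union
        if reduced == pattern:             # no rule applies any more:
            return None                    # left for the Rest Checking
        pattern = reduced
```

Each iteration either terminates with a verdict or strictly shrinks the pattern, mirroring the requirement that every rule application removes at least one step.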
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysü T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept. of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
or flushed by other malicious processes, and it targets inclusive caches. Each cache block is augmented with core valid bits (CVB) to indicate which private cache (process) it belongs to (similar to a process ID): CVB stores a bitmap whose ith bit is set if the line is present in the ith core's private cache. Cache hits are allowed on different processes' data. When there is a cache miss and data needs to be evicted, data not belonging to any current process is evicted first. If there is no such data, data belonging to the same process is evicted. If there is no existing data in the cache from the same process, random data in the cache set is evicted. This random eviction generates an interrupt to the OS to notify it of suspicious activity. For pages that are read-only or executable, the SHARP cache disallows flushing using clflush in user mode. However, invalidating the victim's blocks via the cache coherence protocol is still possible.
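The victim-selection priority described above can be sketched as follows; the CVB-bitmap representation and function name are illustrative, not taken from the SHARP implementation.

```python
import random

def sharp_pick_victim(cvb_list, requester_core):
    """cvb_list[i] is the CVB bitmap of way i; return (way, notify_os)."""
    # 1. Prefer lines present in no core's private cache.
    for way, cvb in enumerate(cvb_list):
        if cvb == 0:
            return way, False
    # 2. Otherwise prefer lines private to the requesting core itself.
    for way, cvb in enumerate(cvb_list):
        if cvb == (1 << requester_core):
            return way, False
    # 3. Else evict at random and raise an OS interrupt (suspicious activity).
    return random.randrange(len(cvb_list)), True
```

Only the third case, which could displace another process's line, triggers the OS notification.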
Sanctum Cache [10] focuses on isolation of enclaves (equivalent to the Trusted Software Module in other designs) from each other and from the operating system (OS). In terms of caches, it implements security features for the L1 cache, TLB, and LLC. Cache isolation of the LLC is achieved by assigning each enclave or the OS to different DRAM address regions. It uses a page-coloring-based cache partitioning scheme [24, 44] and a software security monitor that ensures per-core isolation between the OS and enclaves. For the L1 cache and TLB, when there is a transition between enclave and non-enclave mode, the security monitor flushes the core-private caches to achieve isolation. Normal flushes triggered by the enclave or the OS can only be done within enclave code or within non-enclave code, respectively. Also, timing-based side-channel attacks exploiting cache coherence are explicitly not prevented; thus behavior on cache coherence operations is not defined. This cache design lists the following extra software assumptions:
Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks in the LLC.

Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed from the L1 cache and the TLB by the security monitor when program execution switches away from the victim program.
MI6 Cache [7] is part of the memory hierarchy of the MI6 processor, which combines the Sanctum [10] cache's security features with disabling of speculation during the speculative execution of memory-related operations. During normal processor execution, the state of the L1 caches and TLB is flushed across context switches between software threads. For the LLC, set partitioning is used to divide DRAM into contiguous regions, and cache sets are guaranteed to be strictly partitioned (two DRAM regions cannot map to the same cache set). Each enclave is only able to access its own partition. Speculation is simply disabled when an enclave interacts with the outside world, which has small performance impact because such cases of speculation are rare. This cache design lists the following extra software assumptions:

Assumption 1: The software security monitor guarantees that the victim and attacker processes cannot share the same cache blocks. It uses page coloring [24, 44] to ensure that the victim's and attacker's memory is never mapped to the same cache blocks in the LLC.

Assumption 2: The software runs on a system with a single processor core, where the victim and attacker alternate execution but can never run truly in parallel. Moreover, security-critical data is always flushed from the L1 cache and the TLB by the security monitor when program execution switches away from the victim program.

Assumption 3: When an enclave is interacting with the outside environment, the corresponding speculation is disabled by the software.
InvisiSpec Cache [52] is able to make speculation invisible in the data cache hierarchy. Before a visibility point shows up, i.e., before all of a load's prior control-flow instructions resolve, unsafe speculative loads (USLs) are put into a speculative buffer (SB) without modifying any cache state. When the visibility point is reached, there are two cases. In one case, the USL and successive instructions may be squashed because of a mismatch between the data in the SB and the up-to-date values in the cache. In the other case, the core received a possible invalidation from the OS before the memory consistency model check, and no comparison is needed. When speculative execution happens, the hardware puts the data into the SB so as to identify the visibility point for handling the final state transition of the speculative execution. InvisiSpec targets Spectre-like attacks and futuristic attacks. However, the InvisiSpec cache is vulnerable to all non-speculative side channels.
CATalyst Cache [29] uses partitioning, specifically the Cache Allocation Technology (CAT) [21] available in the LLC of some Intel processors. CAT allocates up to 4 different Classes of Service (CoS) to separate cache ways, so that replacement of cache blocks is only allowed within a certain CoS. CATalyst first uses the CAT mechanism to partition caches into secure and non-secure parts (non-secure parts may map to 3 CoS, while secure parts map
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM. Cache coherence is handled by assigning secure pages to only one processor and not sharing pages among VMs. This cache design lists the following extra software assumptions:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.

Assumption 2: Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.

Assumption 3: The victim and attacker processes cannot share the same memory space.

Assumption 4: A software pseudo-locking mechanism is used to make sure that the victim and attacker processes cannot share the same cache blocks.

Assumption 5: Secure pages are reloaded immediately after any flush performed by the virtual machine monitor (VMM), to make sure all the secure pages remain pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (similar to the process ID used in general caches). Each domain id has its own bit fields: one, called policy_fillmap, masks fills and selects the victim way to replace; the other, called policy_hitmap, masks hit ways. A cache hit happens only when both the tag and the domain id match. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. On a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for the different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways following the system's workload changes, but does not actually implement it.
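A toy sketch of the way-masking behavior: a hit must match both tag and domain id within the ways enabled by policy_hitmap, and a fill may only pick a victim among the ways enabled by policy_fillmap. Class structure and names are illustrative, not DAWG's actual RTL.

```python
import random

class DawgSet:
    """One cache set; each line stores (domain_id, tag) or None if empty."""

    def __init__(self, num_ways):
        self.lines = [None] * num_ways

    def lookup(self, domain_id, tag, policy_hitmap):
        for way, line in enumerate(self.lines):
            if (policy_hitmap >> way) & 1 and line == (domain_id, tag):
                return way                 # hit only within masked ways
        return None

    def fill(self, domain_id, tag, policy_fillmap):
        allowed = [w for w in range(len(self.lines))
                   if (policy_fillmap >> w) & 1]
        victim = random.choice(allowed)    # victim chosen inside own domain
        self.lines[victim] = (domain_id, tag)
        return victim
```

A lookup from another domain misses even on an identical tag, which is the isolation property the design relies on.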
RIC Cache [22] (Relaxed Inclusion Cache) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level caches, and eviction of R from the LLC causes the same data in a higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line will not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers; RIC therefore does not protect writable shared critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
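The back-invalidation rule can be sketched as follows, with the LLC modeled as a dict from address to the line's relaxed-inclusion bit and the L1 as a set of addresses (both hypothetical simplifications):

```python
def evict_llc_line(llc, l1, addr):
    """Evict addr from the LLC; back-invalidate the L1 copy only when the
    relaxed-inclusion (RIC) bit is not set. Returns True if the L1 copy
    survives the LLC eviction."""
    relaxed = llc.pop(addr)      # llc: dict addr -> relaxed-inclusion bit
    if not relaxed:
        l1.discard(addr)         # classic inclusive back-invalidation
    return addr in l1
```

With the RIC bit set, an attacker evicting the LLC line no longer evicts the victim's L1 copy, which is what defeats the LLC-driven eviction attack.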
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of cache replacement policy, on a cache hit the PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and lock status bits while the hit is processed. On a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache design lists the following extra software assumption:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
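The miss-handling rules above can be sketched as a victim-selection function; the set layout and priority order among evictable ways are illustrative assumptions.

```python
def pl_miss_fill(cache_set, pid, lock_bit):
    """cache_set: list of (line_pid, line_locked) tuples or None for empty
    ways. Return the index of the victim way, or None if the access must
    bypass the cache (performed without caching)."""
    for way, line in enumerate(cache_set):
        if line is None:
            return way                      # free way available
    for way, (line_pid, line_locked) in enumerate(cache_set):
        if not line_locked:
            return way                      # unlocked data is evictable
        if line_pid == pid and lock_bit:
            return way                      # own locked line, locked fill
    return None                             # all lines locked by others: bypass
```

When every line in the set is locked by other processes, the incoming data is serviced without caching, so the attacker cannot displace the victim's locked blocks.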
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache. Each block of the RP cache has a process ID and one protection bit P, set to indicate whether the cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, cache hits require both the process ID and the address to be the same. When a cache miss happens on data D of cache set S, if the to-be-evicted data and the to-be-brought-in data belong to the same process but have
different protection bits, arbitrary data of a random cache set S′ is evicted and D is accessed without caching. If they belong to different processes, D is stored in an evicted cache block of S′, and the mappings of S and S′ are swapped as well. Otherwise, the normal replacement policy is executed.
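The three-way miss decision can be sketched as below; the permutation-table swap is modeled as exchanging two list entries, and the action labels are hypothetical names for the behaviors described.

```python
import random

def rp_on_miss(same_pid, same_protection_bit, pt, s, num_sets):
    """Decide the RP-cache miss action for data D mapping to set s.
    pt is the permutation table (list of permuted set numbers)."""
    if same_pid and not same_protection_bit:
        s_prime = random.randrange(num_sets)      # evict data of a random set S'
        return ("access_without_caching", s_prime)
    if not same_pid:
        s_prime = random.randrange(num_sets)
        pt[s], pt[s_prime] = pt[s_prime], pt[s]   # swap mappings of S and S'
        return ("store_in_evicted_block", s_prime)
    return ("normal_replacement", s)              # same pid, same protection
```

The randomized eviction and set swapping are what break the attacker's ability to infer which set the victim's access mapped to.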
Newcache Cache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up entries in the RMT to find the cache block that should be accessed. The cache thereby stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of the RP cache. A cache hit requires both the process ID and the address to be the same. When a cache miss happens on data D, arbitrary data is evicted and D is accessed without caching if the evicted and brought-in data belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D randomly replaces a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
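The two-level lookup can be sketched with a toy model in which the fully associative real cache is represented as a dict keyed by (RMT index, process ID); class layout and field names are illustrative assumptions.

```python
class Newcache:
    """Toy model: the address index bits select an RMT entry (as in a
    direct-mapped cache); the (RMT index, process ID) pair then selects a
    line in the fully associative real cache (modeled as a dict)."""

    def __init__(self, rmt_bits=4):
        self.rmt_bits = rmt_bits
        self.lines = {}          # (rmt_index, pid) -> (tag, protection_bit)

    def lookup(self, pid, addr):
        index = addr & ((1 << self.rmt_bits) - 1)  # index bits pick RMT entry
        tag = addr >> self.rmt_bits
        line = self.lines.get((index, pid))        # hit needs pid AND address
        return line is not None and line[0] == tag
```

Because the process ID is part of the match, an attacker replaying the victim's address from another process does not hit.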
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on the Random Fill cache can control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's accesses to security-critical data, a No-fill request is executed and the requested data access is performed without caching. Meanwhile, on a Random-fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush
instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
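The three request types can be sketched as follows; the cache is modeled as a set of cached line addresses, and the request-type labels are illustrative stand-ins for the new instructions.

```python
import random

def handle_request(cache, req_type, addr, window=None):
    """cache: set of cached line addresses; req_type selects the fill policy."""
    if addr in cache:
        return "hit"                             # hits handled as in a normal cache
    if req_type == "nofill":
        return "no_fill"                         # secure access: nothing cached
    if req_type == "random_fill":
        lo, hi = window
        cache.add(random.randrange(lo, hi))      # cache an arbitrary nearby line
        return "random_fill"
    cache.add(addr)                              # normal demand fill
    return "demand_fill"
```

The key property is visible in the sketch: after a `nofill` or `random_fill` miss, the cache state carries no information about the address that was actually accessed.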
CEASER Cache [38] is able to mitigate conflict-based LLC timing–based side-channel attacks using address encryption and dynamic remapping. The CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. This periodic re-keying causes the address remapping to change dynamically.
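A sketch of encrypted indexing with re-keying. A toy keyed permutation stands in for the real LLBC (which this is emphatically not); only the structure, i.e., encrypt the line address under the current key and then derive the set index, follows the design.

```python
def toy_encrypt(addr, key, width=16):
    """Toy keyed mixing over width-bit values; NOT the real LLBC cipher."""
    mask = (1 << width) - 1
    x = addr & mask
    for _ in range(4):                       # a few mixing rounds
        x = ((x * 0x9E37) ^ key) & mask      # multiply-xor diffusion
        x = ((x << 5) | (x >> (width - 5))) & mask  # rotate left by 5
    return x

def cache_set(addr, key, num_sets=64):
    """Set index of addr under the current epoch key; re-keying changes
    the whole mapping at once."""
    return toy_encrypt(addr, key) % num_sets
```

Changing `key` at the re-keying interval remaps essentially every address to a new set, which is what defeats slowly constructed eviction sets.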
ScatterCache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software entities and assign them different keys for the mapping. As hardware extensions, a cryptographic primitive such as a hash, and an index decoder for each scattered cache way, are added. ScatterCache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
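A sketch of the per-way keyed index derivation; keyed BLAKE2 stands in for the keyed derivation function, and all parameters (digest size, set count, way count) are illustrative rather than those of the actual design.

```python
import hashlib

def scatter_index(key, domain_id, addr, way, num_sets=1024):
    """Keyed hash over (security domain, address, way) -> set index."""
    h = hashlib.blake2b(
        f"{domain_id}:{addr}:{way}".encode(), key=key, digest_size=4)
    return int.from_bytes(h.digest(), "little") % num_sets

def lookup_sets(key, domain_id, addr, num_ways=8):
    """One candidate set per way, as in skewed-associative caches."""
    return [scatter_index(key, domain_id, addr, w) for w in range(num_ways)]
```

Because each security domain holds a different key, the same address occupies unrelated sets for attacker and victim, so eviction sets built in one domain do not transfer to another.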
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID, or by whether the data is secure or not. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter; in this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and it therefore de-correlates any cache access time from the address being accessed. However, the performance degradation is tremendous.

J Hardw Syst Secur (2019) 3:397–425
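The randomized-lifetime mechanism can be sketched as follows. This is an illustrative model with parameters of our choosing (not the design's): each line's counter starts at a random value below a maximum, ticks up while the line is untouched, and invalidates the line on reaching the maximum, so identical data survives in the cache for an unpredictable time.

```python
# Toy model of Non Deterministic Cache's counter-based invalidation.
# MAX_COUNT and the RNG seed are illustrative assumptions.
import random

MAX_COUNT = 16

class NDLine:
    def __init__(self, rng):
        self.valid = True
        self.count = rng.randrange(MAX_COUNT)   # random initial counter

    def tick(self):
        """One global counter tick with the line left untouched."""
        if self.valid:
            self.count += 1
            if self.count >= MAX_COUNT:
                self.valid = False              # randomized invalidation

rng = random.Random(7)
lifetimes = []
for _ in range(1000):
    line, ticks = NDLine(rng), 0
    while line.valid:
        line.tick()
        ticks += 1
    lifetimes.append(ticks)
# The same cached data survives a different number of ticks each time,
# so hit/miss timing no longer correlates with the accessed address.
print(min(lifetimes), max(lifetimes))
```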
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u → A_d → V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d → V_u → A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and can no longer preserve the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d → V_u → A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing does not imply that u and d have the same index, since in Random Fill cache u would be accessed without caching, and other, random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d → V_u → V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded, locked security-critical data does not allow A_d in Step 1 to replace it. In this case A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
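Case 1 above can be made concrete with a toy cache model of our own (not from the paper): in a conventional cache, the attacker's reload in Step 3 of a Flush + Reload pattern is fast only when the victim touched the same line in Step 2, while an RP-like policy that forbids cross-process hits makes the Step 3 timing constant.

```python
# Toy direct-mapped cache illustrating case 1: with cross-process hits
# disabled (RP-like), the attacker's last-step timing is always 'slow'.
# All names and parameters here are illustrative.

class ToyCache:
    def __init__(self, num_sets, cross_process_hits=True):
        self.lines = [None] * num_sets       # each entry: (owner, addr)
        self.cross = cross_process_hits

    def access(self, owner, addr):
        """Return 'fast' on a hit, 'slow' on a miss; fill on miss."""
        s = addr % len(self.lines)
        line = self.lines[s]
        if (line is not None and line[1] == addr
                and (self.cross or line[0] == owner)):
            return 'fast'
        self.lines[s] = (owner, addr)
        return 'slow'

    def flush(self, addr):
        s = addr % len(self.lines)
        if self.lines[s] is not None and self.lines[s][1] == addr:
            self.lines[s] = None

def flush_reload_step3(cache, secret_addr):
    cache.flush(0x40)                        # Step 1: invalidate known a
    cache.access('victim', secret_addr)      # Step 2: victim touches u
    return cache.access('attacker', 0x40)    # Step 3: attacker reloads a

# Conventional cache: Step 3 timing depends on the secret address.
normal = [flush_reload_step3(ToyCache(8), s) for s in (0x40, 0x41)]
# RP-like cache: no cross-process hits, so Step 3 timing is constant.
rp_like = [flush_reload_step3(ToyCache(8, cross_process_hits=False), s)
           for s in (0x40, 0x41)]
print(normal, rp_like)
```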
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
[Table 4 body is not recoverable from this extraction; its caption and footnotes follow.]

Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory-access-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning; to protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay but not random mapping can only decrease the probabilities of attacker success to some limited degree
[Table 5 body is not recoverable from this extraction; its caption follows.]

Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). Footnotes a–m are identical to those of Table 4.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim, the attacker software, or management software to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data, and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so the V_u → V_a^inv → V_u (slow) vulnerability (one type of Flush + Time) is prevented. Newcache is able to prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to only access a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be the part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other, normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, the internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part; it also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition through flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
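A minimal sketch of static way-partitioning, under our own illustrative parameters (not any specific design), shows the external-interference property: each security domain may only fill its own subset of ways, so the attacker can never evict the victim's lines, though the victim's own data can still conflict internally.

```python
# Toy static way-partitioning of one cache set. The domain-to-ways split
# and addresses are illustrative assumptions.

WAYS = 4
PARTITION = {'victim': range(0, 2), 'attacker': range(2, 4)}  # static split

class PartitionedSet:
    def __init__(self):
        self.ways = [None] * WAYS     # each entry: (owner, addr) or None

    def fill(self, owner, addr):
        """Fill only within the owner's own ways."""
        slots = list(PARTITION[owner])
        slot = slots[hash(addr) % len(slots)]
        self.ways[slot] = (owner, addr)

s = PartitionedSet()
s.fill('victim', 0x100)
before = [w for w in s.ways if w and w[0] == 'victim']
for a in range(0x200, 0x240):         # attacker tries hard to evict
    s.fill('attacker', a)
after = [w for w in s.ways if w and w[0] == 'victim']
print(before == after)                # victim lines untouched by attacker
```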
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]). DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic
partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u → A_a^inv → V_u (slow) vulnerability (one type of Flush + Time). NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d → V_u → A_d (slow) vulnerability (one type of Evict + Probe) are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore brings uncertainty to the final observation by the attacker; e.g., the A_d → V_u → V_a (fast) vulnerability (one type of Cache Internal Collision [5]) is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information in the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal-interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
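The random-fill flavor of this de-correlation can be sketched as follows. This is an illustrative model of ours (window size and set count assumed, not taken from any design): on a miss to sensitive data, the requested line is forwarded to the CPU without being cached, and a line chosen at random from a window around it is cached instead, so the filled set index carries little information about the accessed address.

```python
# Illustrative random-fill sketch: the set that actually gets filled is
# chosen from a random neighborhood of the accessed address.
import random

NUM_SETS = 8
WINDOW = 4    # random-fill neighborhood half-width (assumed parameter)

def random_fill_miss(addr, rng):
    """Return the set index actually filled after a miss on `addr`."""
    filled_addr = addr + rng.randint(-WINDOW, WINDOW)  # random neighbor
    return filled_addr % NUM_SETS                      # set that is filled

rng = random.Random(1)
filled = [random_fill_miss(0x40, rng) for _ in range(1000)]
# The same accessed address fills many different sets over time, so an
# attacker probing sets learns little about which address was touched.
print(sorted(set(filled)))
```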
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks, and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
[Table 6 body is not recoverable from this extraction; its caption follows.]

Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive effective vulnerability types listed, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and those added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision, and the value of u is leaked.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3, using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has a similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache-set information. This attack has a similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
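The three Prime + Probe steps can be sketched on a toy set-associative cache of our own (illustrative parameters, not from the paper): after the attacker primes every set and the victim makes one secret-dependent access, exactly the set holding the secret shows a miss during the probe.

```python
# Toy 2-way set-associative cache with LRU replacement, used to walk
# through Prime + Probe. Addresses and sizes are illustrative.

class SetAssocCache:
    def __init__(self, num_sets, ways):
        self.sets = [[] for _ in range(num_sets)]   # LRU order, newest last
        self.num_sets, self.ways = num_sets, ways

    def access(self, addr):
        """Return True on a hit; fill with LRU replacement on a miss."""
        s = self.sets[addr % self.num_sets]
        if addr in s:
            s.remove(addr)
            s.append(addr)
            return True
        if len(s) == self.ways:
            s.pop(0)                                # evict LRU line
        s.append(addr)
        return False

NUM_SETS, WAYS = 4, 2
cache = SetAssocCache(NUM_SETS, WAYS)
prime_addrs = [s + w * NUM_SETS for s in range(NUM_SETS) for w in range(WAYS)]
for a in prime_addrs:                   # Step 1: attacker primes every set
    cache.access(a)
secret = 0x36                           # Step 2: victim's secret-dependent access
cache.access(secret)
missed = [s for s in range(NUM_SETS)    # Step 3: attacker probes each set
          if not all(cache.access(a) for a in prime_addrs
                     if a % NUM_SETS == s)]
print(missed, secret % NUM_SETS)        # the missed set reveals the index
```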
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, the attacker learns that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that this cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has a similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can either be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses that interfere with the targeted data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆, the longer pattern can be divided into two separate patterns, where the ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with A^inv/V^inv, the longer pattern can be divided into two separate patterns, where the A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be applied recursively until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
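The two subdivision rules can be sketched under a toy encoding of our own ("*" for ⋆, "A_inv"/"V_inv" for the full-cache flushes): the splitting step becomes Step 1 of the next sub-pattern, and a bare trailing splitter is deleted.

```python
SPLIT_STEPS = {"*", "A_inv", "V_inv"}

def subdivide(pattern):
    """Split a long step pattern at every '*' or full-cache flush step."""
    subs, current = [], []
    for step in pattern:
        if step in SPLIT_STEPS:
            if current:
                subs.append(current)
            current = [step]     # the splitting step starts the next sub-pattern
        else:
            current.append(step)
    if current:
        subs.append(current)
    # a sub-pattern that is only the bare splitter (e.g. a trailing '*') is dropped
    return [s for s in subs if not (len(s) == 1 and s[0] in SPLIT_STEPS)]

print(subdivide(["A_a", "V_u", "*", "A_inv", "V_u", "A_a"]))
```

Each resulting sub-pattern is then simplified independently by the rules of the next section.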
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in the remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If commuting M and N leads to the same observation result, i.e., {..., M, N, ...} and {..., N, M, ...} will give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted in this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. For these patterns, Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second) over the operations on a, a_alias, d, and u and their invalidations (regardless of attacker A or victim V), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduction Rule applies, and the resulting combined step where one exists (e.g., Union(a^inv, a_alias^inv), Union(a^inv, d^inv), Union(d^inv, a_alias^inv)); entries marked "–" indicate that no rule applies.
Reduce Rule" and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1 For two u operations, although u is unknown, both accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access-related operations (known access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the address they refer to is the same, then no matter their order, these two can be reduced to one step, namely the second step.
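A simplified sketch of a reduction pass, under a step encoding of our own (actor, operation, address): it collapses only adjacent steps that target the same cache line and keeps the second of the two, while the paper's Table 7 covers the full set of reducible pairs.

```python
def reduce_adjacent(pattern):
    """Collapse adjacent steps on the same target, keeping the second step."""
    out = []
    for step in pattern:
        if out and out[-1][2] == step[2]:   # same addressed line as previous step
            out[-1] = step                  # keep only the second of the two
        else:
            out.append(step)
    return out

seq = [("A", "access", "a"), ("V", "access", "a"),   # Rule 2: two known accesses
       ("V", "access", "u"), ("A", "access", "u"),   # Rule 1: same unknown u
       ("A", "inv", "d"), ("V", "access", "d")]      # Rule 3: inv/access pair
print(reduce_adjacent(seq))
```

Each pass removes at least one step, which is what the recursion in the Final Check Rules relies on.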
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} will give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can have Union Rule 1 applied: the two known inv operations both invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
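Union Rule 1 can be sketched under a hypothetical step encoding: adjacent invalidations of different known addresses merge into one combined invalidation step covering all of their addresses.

```python
def union_invalidations(pattern):
    """Steps are ('inv', set_of_known_addrs) or ('access', addr)."""
    out = []
    for kind, target in pattern:
        if kind == "inv" and out and out[-1][0] == "inv":
            out[-1] = ("inv", out[-1][1] | target)   # Union(M, N)
        else:
            out.append((kind, target))
    return out

merged = union_invalidations([("inv", {"a"}), ("inv", {"d"}), ("access", "u")])
print(merged == [("inv", {"a", "d"}), ("access", "u")])   # True
```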
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence of u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or shorter):

– u operation, not u operation, u operation

– not u operation, u operation, not u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, because the following access puts an actual state into the cache block.

– If it is followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where Commute Rule 2 can be applied to V_u ⇝ A^inv_d/V^inv_d so that the pattern can be reduced to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the Rest Checking.
The only pairs of adjacent steps to which none of the three categories of rules can be applied are the following:

– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias} ⇝ V_u

– {A_a, V_a, A_aalias, V_aalias} ⇝ V^inv_u

– V_u ⇝ {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}

– V^inv_u ⇝ {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially.

Output: reduced effective vulnerability pattern(s) array vul_list. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_list = Ø
2: while contain(pattern, ⋆) and its index is not 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or pattern has_interval_effective_three_steps) do
9:   pattern = Commute_Rules(pattern)
10:  pattern = Reduction_Rules(pattern)
11:  pattern = Union_Rules(pattern)
12:  if pattern is_ineffective or pattern has_interval_effective_three_steps then
13:    vul_list += effective_patterns(pattern)
14:  end if
15: end while
16: return vul_list
two steps either generates a pair of adjacent steps that can be processed by the three categories of rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks if a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45 nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA 1990), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
to 1 CoS). Secure pages are assigned to virtual machines (VMs) at the granularity of a page and are not shared by more than one VM; here, the attacker and the victim reside in different VMs. Combined with the CAT technology and a pseudo-locking mechanism, which pins certain page frames managed by software, CATalyst guarantees that malicious code cannot evict secure pages. CATalyst implicitly performs preloading by remapping security-critical code or data to secure pages. Flushes can only be done within each VM, and cache coherence is achieved by assigning secure pages to only one processor and not sharing pages among VMs. This cache listed the following extra software assumptions:
Assumption 1 Security-critical data is always preloaded into the cache at the beginning of the whole program execution.
Assumption 2 Security-critical data is always able to fit within the secure partition of the cache; that is, all data in the range x can fit in the secure partition.
Assumption 3 The victim and the attacker process cannot share the same memory space.
Assumption 4 A pseudo-locking mechanism in software is used to make sure that the victim and attacker processes cannot share the same cache blocks.
Assumption 5 Secure pages are reloaded immediately after a flush, which is done by the virtual machine monitor (VMM), to make sure all the secure pages are still pinned in the secure partition.
DAWG Cache [25] (Dynamically Allocated Way Guard) partitions the cache by cache ways and provides full isolation for hits, misses, and metadata updates across different protection domains (between the attacker and the victim). The DAWG cache is partitioned between the attacker and the victim, and each of them keeps its own domain id (which is similar to the process ID used in general caches). Each domain id has its own bit fields: one called policy_fillmap, for masking fills and selecting the victim way to replace, and another called policy_hitmap, for masking hit ways. A cache hit happens only if both the tag and the domain id match. Therefore, DAWG allows read-only cache lines to be replicated across ways for different protection domains. On a cache miss, the victim way can only be chosen among the ways belonging to the same domain id, as recorded by the policy_fillmap. Consistently, the replacement policy is updated with the victim selection, and the metadata derived from the policy_fillmap for different domains is updated as well. The paper also proposes the idea of dynamically partitioning the cache ways as the system's workload changes, but does not actually implement it.
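The way masking described above can be sketched as follows (the names policy_fillmap and policy_hitmap follow the text; the class and the toy replacement policy are our own simplification):

```python
NUM_WAYS = 8

class DawgSet:
    """One set of a way-partitioned cache with per-domain way masks."""
    def __init__(self):
        self.tags = [None] * NUM_WAYS
        self.doms = [None] * NUM_WAYS

    def lookup(self, domain_id, hitmap, tag):
        """Hit only if tag AND domain id match in a way allowed by policy_hitmap."""
        return any(hitmap >> w & 1 and self.tags[w] == tag
                   and self.doms[w] == domain_id for w in range(NUM_WAYS))

    def fill(self, domain_id, fillmap, tag):
        """On a miss, the victim way is chosen only among policy_fillmap ways."""
        victim = next(w for w in range(NUM_WAYS) if fillmap >> w & 1)  # toy policy
        self.tags[victim], self.doms[victim] = tag, domain_id

s = DawgSet()
s.fill(domain_id=1, fillmap=0b00001111, tag="x")          # victim domain's fill
print(s.lookup(domain_id=2, hitmap=0b11110000, tag="x"))  # attacker domain: False
```

Because the two domains' masks are disjoint, the attacker's probes can neither hit on nor evict the victim's lines.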
RIC Cache [22] (Relaxed Inclusion Caches) proposes a low-complexity cache to defend against eviction-based timing side-channel attacks on the LLC. Normally, for an inclusive cache, if data R is in the LLC, it is also in the higher-level cache, and eviction of R from the LLC causes the same data in the higher-level cache, e.g., the L1 cache, to be invalidated, making eviction-based attacks in the higher-level cache possible (e.g., the attacker is able to evict the victim's security-critical cache line). In RIC, each cache line is extended with a single bit to set relaxed inclusion. Once relaxed inclusion is set for a cache line, eviction of the corresponding LLC line does not cause the same line in the higher-level cache to be invalidated. Two kinds of data have the relaxed inclusion bit set when they are loaded into the cache: read-only data and thread-private data. These two kinds of data are claimed by the paper to cover all the critical data for ciphers. Therefore, RIC does not protect writable non-private critical data, which is currently not found in any ciphers. Apart from that, RIC requires flushing of the corresponding cache lines when the RIC bits are modified, and on thread migration events, to avoid timing leakage during the transition time.
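The relaxed-inclusion behavior can be sketched with a two-level model (a simplification of ours, not RIC's implementation): back-invalidation of the L1 copy is skipped when the line's relaxed bit is set.

```python
class TwoLevel:
    """Toy inclusive L1/LLC pair with a per-line relaxed inclusion bit."""
    def __init__(self):
        self.l1 = set()
        self.llc = {}                     # addr -> relaxed inclusion bit

    def load(self, addr, relaxed=False):
        """Read-only or thread-private lines are loaded with the bit set."""
        self.l1.add(addr)
        self.llc[addr] = relaxed

    def evict_llc(self, addr):
        if not self.llc.pop(addr):        # strict inclusion: back-invalidate L1
            self.l1.discard(addr)

c = TwoLevel()
c.load("key_table", relaxed=True)
c.evict_llc("key_table")
print("key_table" in c.l1)                # True: the L1 copy survives
```

An attacker who evicts the LLC line therefore no longer gains the ability to evict the victim's L1 copy.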
PL Cache [49] provides isolation by partitioning the cache based on cache blocks. It extends each cache block with a process ID and a lock status bit. The process ID and the lock status bits are controlled by extended load and store instructions (ld.lock/ld.unlock and st.lock/st.unlock), which allow the programmer and compiler to set or reset the lock bit through use of the right load or store instruction. In terms of the cache replacement policy, on a cache hit PL cache performs the normal cache hit handling procedure, and the instructions with locking or unlocking capability can update the process ID and the lock status bits while the hit is processed. On a cache miss, locked data cannot be evicted by data that is not locked, and locked data of different processes cannot evict each other; in these cases, the new data is loaded or stored without caching. In other cases, data eviction is possible. This cache lists an extra software assumption, as follows:
Assumption 1: Security-critical data is always preloaded into the cache at the beginning of the whole program's execution.
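The lock-bit replacement check might be sketched as below. This is a single-line toy model; the `PLLine` class and the `lock` flag (standing in for the ld.lock/st.lock instructions) are assumptions of this sketch.

```python
# Sketch of PL cache's replacement rule on one cache line: a locked line
# is never evicted by unlocked data, nor by locked data of another process.

class PLLine:
    def __init__(self):
        self.tag = None; self.pid = None; self.locked = False

    def access(self, pid, tag, lock=False):
        if self.tag == tag and self.pid == pid:
            self.locked = lock or self.locked   # a hit may update the lock bit
            return "hit"
        # Miss: if the resident line is locked and the new access may not
        # evict it, serve the data without caching it.
        if self.locked and (not lock or pid != self.pid):
            return "miss-uncached"
        self.tag, self.pid, self.locked = tag, pid, lock
        return "miss-cached"

line = PLLine()
line.access(pid=0, tag="AES_table", lock=True)   # victim preloads and locks
r = line.access(pid=1, tag="probe", lock=False)  # attacker cannot evict it
print(r, line.tag)  # miss-uncached AES_table
```

Under Assumption 1, the locked victim lines are resident before the attacker runs, so the attacker's probes can neither evict them nor learn from their eviction.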
RP Cache [49] uses randomization to de-correlate the memory address being accessed from the timing of the cache access. Each block of RP cache carries a process ID and one protection bit P that indicates whether the cache block needs to be protected. A permutation table (PT) stores each cache set's pre-computed permuted set number, and the number of tables depends on the number of protected processes. For memory access operations, a cache hit requires both the process ID and the address to match. When a cache miss happens for data D of a cache set S, if the to-be-evicted data and to-be-brought-in data belong to the same process but have
J Hardw Syst Secur (2019) 3:397–425 409
different protection bits, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mappings of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
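The miss-handling decision just described can be summarized in a small decision function. This is a sketch of the decision logic only; the full permutation-table state is not modeled, and the function name is illustrative.

```python
# Sketch of RP cache's miss-handling decision. evicted/incoming are
# (pid, protection_bit) pairs for the conflicting lines.

def rp_miss_action(evicted, incoming):
    if evicted[0] == incoming[0] and evicted[1] != incoming[1]:
        # Same process, different protection bits: evict a random set S'
        # and access the incoming data without caching it.
        return "evict-random-set, access-without-caching"
    if evicted[0] != incoming[0]:
        # Different processes: place D in a random set S' and swap the
        # permutation-table mappings of S and S'.
        return "store-in-random-set, swap-mappings"
    return "normal-replacement"

print(rp_miss_action((7, 1), (7, 0)))  # evict-random-set, access-without-caching
print(rp_miss_action((7, 0), (9, 0)))  # store-in-random-set, swap-mappings
print(rp_miss_action((7, 0), (7, 0)))  # normal-replacement
```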
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of the memory address are used to look up an entry in the RMT to find the cache block that should be accessed. The cache thus stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) that indicates whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit requires both the process ID and the address to match. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
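Newcache's analogous miss-handling decision, as described above, might look like the following. Again this captures only the decision logic, and the names are illustrative.

```python
# Sketch of Newcache's miss-handling decision. evicted/incoming are
# (pid, protection_bit) pairs for the line to evict and the data to bring in.

def newcache_miss_action(evicted, incoming):
    same_process = evicted[0] == incoming[0]
    if same_process and (evicted[1] or incoming[1]):
        # Either line is protected: evict arbitrary data and serve the
        # access without caching it.
        return "evict-random-line, access-without-caching"
    if not same_process:
        # The actual cache is fully associative, so any line can be chosen.
        return "replace-random-line"
    return "normal-direct-mapped-replacement"

print(newcache_miss_action((1, True), (1, False)))   # protected: uncached
print(newcache_miss_action((1, False), (2, False)))  # cross-process: random
```

The key contrast with RP cache is the first branch: whenever either the evicted or the incoming line is protected, the replacement is refused outright, which is why Newcache prevents the miss-based internal interference discussed in Section 6.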
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For accesses to the victim's security-critical data, a Nofill request is executed and the requested data access is performed without caching. Meanwhile, on a Random Fill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, Normal requests are used, which follow the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not undermine the prevention of timing-based side-channel attacks (the random filling technique takes care of this).
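The three request types can be sketched as follows. This is a toy model: the cache is a plain set of resident lines, and `window` stands for the random-fill neighborhood; the names are assumptions of this sketch.

```python
# Sketch of Random Fill cache's request handling: on a random-fill miss,
# the demanded (secret-dependent) line is NOT cached; a random neighbor is.
import random

cache = set()

def request(addr, kind, window=None):
    if addr in cache:
        return "hit"                      # hits are handled as usual
    if kind == "normal":
        cache.add(addr)                   # normal demand fill
    elif kind == "nofill":
        pass                              # data returned but never cached
    elif kind == "random":
        # The demand line stays uncached; a random line from the window
        # of nearby addresses is brought in instead.
        cache.add(random.choice(window))
    return "miss"

request(0x100, "random", window=[0x140, 0x180, 0x1C0])
print(0x100 in cache)  # False: the accessed line itself is not cached
```

Since what gets cached is independent of which secret-dependent address was demanded, the attacker's later probes carry no information about the victim's access pattern.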
CEASER Cache [38] is able to mitigate conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and spatially clustered addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using the LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction. The periodic re-keying causes the address remapping to change dynamically.
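A rough sketch of the encrypted indexing and re-keying follows. The toy keyed mixing function below merely stands in for the LLBC of the real design; the constants and names are illustrative.

```python
# Sketch of CEASER-style index derivation: the set index is computed from
# an encrypted address, and changing the key remaps every address.

def toy_encrypt(addr, key):
    # Placeholder keyed permutation of the address bits (LLBC in the paper).
    x = (addr ^ key) & 0xFFFF
    x = ((x * 0x9E37) ^ (x >> 7)) & 0xFFFF
    return x

def cache_set(addr, key, n_sets=64):
    return toy_encrypt(addr, key) % n_sets

key_epoch0, key_epoch1 = 0x1234, 0xBEEF   # periodic re-keying
a = cache_set(0x40, key_epoch0)
b = cache_set(0x40, key_epoch1)
print(a, b)  # the same address lands in different sets under the two keys
```

Because an attacker's carefully built eviction set is defined in terms of set indices, each re-keying invalidates whatever conflict sets the attacker has learned so far.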
SCATTER Cache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each of them. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software entities and assign them different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. SCATTER cache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
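The keyed per-way index derivation could be sketched as below, where SHA-256 stands in for the keyed hash or permutation-derivation function of the paper; the sizes and names are assumptions of this sketch.

```python
# Sketch of SCATTERCACHE-style indexing: each (key, address, way) triple
# yields its own set index, so eviction sets differ per way and per domain.
import hashlib

def way_index(addr, way, key, n_sets=64):
    msg = f"{key}|{addr}|{way}".encode()
    return int.from_bytes(hashlib.sha256(msg).digest()[:4], "big") % n_sets

key_victim, key_attacker = "domainA", "domainB"
victim_sets   = [way_index(0x40, w, key_victim)   for w in range(8)]
attacker_sets = [way_index(0x40, w, key_attacker) for w in range(8)]
print(victim_sets)
print(attacker_sets)  # same address, but a different set per way and per domain
```

An attacker who wants to contend with one victim line must therefore find addresses colliding in every way under a key it does not know, rather than simply matching index bits.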
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching based on process ID or on whether the data is secure. A per-cache-block counter records the interval of its data's activeness and is increased on each global counter clock tick while the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter. In this way, the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and therefore de-correlates any cache
access time from the address being accessed. However, the performance degradation is tremendous.
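The counter-based invalidation can be sketched as below. This is a single-line toy model: `max_interval` plays the role of the predefined bound, and the class name is illustrative.

```python
# Sketch of Non Deterministic Cache's counter decay: each fill draws a
# random initial counter, so an untouched line's lifetime is randomized.
import random

class NDLine:
    def __init__(self, max_interval=8):
        self.max_interval = max_interval
        self.valid = False

    def fill(self):
        self.valid = True
        # Random initial value below the global maximum.
        self.counter = random.randrange(self.max_interval)

    def tick(self):  # one global-counter tick with the line untouched
        if self.valid:
            self.counter += 1
            if self.counter >= self.max_interval:
                self.valid = False  # invalidate on reaching the bound

line = NDLine()
line.fill()
lifetime = 0
while line.valid:
    line.tick()
    lifetime += 1
print(0 < lifetime <= 8)  # True; the exact lifetime varies run to run
```

Since a line may vanish after anywhere from one to `max_interval` ticks, the attacker's hit/miss observations no longer determine whether the victim touched the line.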
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the tables. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, where the cache is not able to prevent the vulnerability, this is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. The RP cache [49], however, makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original corresponding relation between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A_d^inv (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that the address d has the same index as the secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and another random datum would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V_d^inv (slow) (one type of Prime + Time Invalidation) are not possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case, A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker can use the timing-based vulnerability that corresponds to the red cell to attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous cost to performance when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name).

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level cache
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probability of the vulnerability being successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the attacker's probability of success in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns from CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim software, the attacker software, or the management software to explicitly label the range of the victim's data that they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's flushing of its own labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so that V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns from SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation or the out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference among the victim's own data is hard to prevent with partitioning-based caches. What is more, partitioning is recognized to be wasteful in terms of cache space and to inherently degrade system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size is adjusted for each part, and it does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the V_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting cache hits due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A_a^inv ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and its process ID match. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A_d^inv (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be placed in the unreserved ways and be open to eviction by the attacker. SecDCP has no unreserved ways; all the space in the cache belongs to either the High or the Low partition.

Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculation or out-of-order loads and stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting the initial state (Step 0) of the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or out-of-order loads or stores and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different pieces of sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal interference vulnerabilities such as A_a^inv ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into a cache block in Step 1, Random Fill cache does not randomly load the security-critical data, and allows vulnerabilities such as the V_u ⇝ V_a^inv ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V_u^inv ⇝ V_a^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into the cache index, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all the attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information whenever either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from the other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain cache levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort a transaction and roll back its modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.

J Hardw Syst Secur (2019) 3:397-425
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then, the victim accesses the secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then, the victim accesses the secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
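The three steps of Flush + Reload can be played out on a toy, single-set cache model. A minimal sketch follows; the `ToyCache` class, the "fast"/"slow" timing labels, and the concrete addresses are illustrative assumptions, not part of the paper's model:

```python
# Toy single-set cache model illustrating the three steps of
# Flush + Reload. All names here are illustrative.

HIT, MISS = "fast", "slow"

class ToyCache:
    def __init__(self):
        self.lines = set()          # addresses currently cached

    def flush(self, addr):          # invalidate a (shared) address
        self.lines.discard(addr)

    def access(self, addr):         # returns the observable timing
        timing = HIT if addr in self.lines else MISS
        self.lines.add(addr)        # the access caches the address
        return timing

cache = ToyCache()
shared = 0x40                       # address shared by attacker and victim
secret_addr = 0x40                  # victim's secret-dependent address

cache.flush(shared)                 # Step 1: attacker flushes
cache.access(secret_addr)           # Step 2: victim's secret access
t = cache.access(shared)            # Step 3: attacker reloads
print(t)                            # "fast" => secret_addr == shared
```

The attacker observes "fast" only because the victim's Step 2 access touched the same shared address that was flushed in Step 1; had the victim touched a different address, the reload would miss.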
Reload + Time (New Name Assigned in this Paper) In Step 1, the secret data is invalidated by the victim. Then, the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates the secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Evict + Time the attacker evicts a cache set, only learning the secret data's cache index, instead of invalidating data at a known address to find the full address of the secret data, as in the Flush + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
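As a sketch, these three steps can be simulated on a toy one-set, two-way cache with LRU replacement; the `ToySet` class and the concrete addresses below are illustrative assumptions:

```python
# Toy 1-set, 2-way LRU cache illustrating Prime + Probe.
from collections import deque

WAYS = 2

class ToySet:
    def __init__(self):
        self.ways = deque(maxlen=WAYS)   # LRU order; full deque evicts oldest

    def access(self, addr):
        hit = addr in self.ways
        if hit:
            self.ways.remove(addr)
        self.ways.append(addr)           # move to most-recently-used
        return "fast" if hit else "slow"

s = ToySet()
# Step 1: attacker primes the set with known addresses
for a in (0x10, 0x20):
    s.access(a)
# Step 2: victim's secret access evicts one attacker line
s.access(0x30)
# Step 3: attacker probes; any "slow" access reveals the victim
# touched this set
probe = [s.access(a) for a in (0x10, 0x20)]
print(probe)
```

Note that in this toy model the probe accesses themselves evict lines, so both probes miss once the victim has touched the set; without the victim's Step 2 access, both probes would hit.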
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, aalias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → Aa → Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ⋆, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as Ainv/Vinv, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability Ainv → Vu → Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
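A minimal sketch of how the two subdivision rules might be applied mechanically, assuming steps are encoded as strings and "*", "Ainv", and "Vinv" are the splitting steps (the encoding and function name are illustrative, not from the paper):

```python
# Hedged sketch of Subdivision Rules 1 and 2: split a long step
# sequence at "*" and at "Ainv"/"Vinv", making the splitting step
# Step 1 of the following sub-pattern; a trailing splitter is deleted.

SPLITTERS = ("*", "Ainv", "Vinv")

def subdivide(seq):
    patterns, current = [], []
    for step in seq:
        if step in SPLITTERS:
            if current:
                patterns.append(current)
            current = [step]          # splitter becomes Step 1
        else:
            current.append(step)
    if current:
        patterns.append(current)
    # a splitter as the very last step gives no information: delete it
    if patterns and len(patterns[-1]) == 1 and patterns[-1][0] in SPLITTERS:
        patterns.pop()
    return patterns

print(subdivide(["Aa", "Vu", "*", "Ainv", "Vu", "Aa", "*"]))
```

In this example the sequence is cut at the ⋆ and at the Ainv, and the trailing ⋆ is dropped, mirroring the recursive application of Rules 1 and 2 described above.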
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M, N and N, M will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting different pairs of steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; these patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
(Table 7: Rules for combining two adjacent steps. For each possible pair of adjacent steps, regardless of whether a step is the attacker's access (A) or the victim's access (V), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and, where applicable, the resulting reduced or unioned combined step.)

If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (these are the two-step patterns with "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access-related operations (known access operations) always result in a deterministic state of the second accessed cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequences containing M, N and containing Union(M, N) will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, denoted Union(M, N). Two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: both known inv operations invalidate some known address, so they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
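A hedged sketch of how Reduction Rules 1-3 and Union Rule 1 could be applied to a symbolic step sequence; the `(address, kind)` encoding and the function names are illustrative assumptions, and the commute step is omitted:

```python
# Steps are (addr, kind) pairs; kind is "acc" or "inv"; addr "u" is
# the unknown address. Encoding is illustrative, not from the paper.

KNOWN = {"a", "aalias", "d"}

def reduce_pair(m, n):
    """Return the single step replacing adjacent (m, n), or None."""
    (ma, mk), (na, nk) = m, n
    if ma == "u" and na == "u" and mk == nk == "acc":
        return n                      # Rule 1: same unknown address
    if ma in KNOWN and na in KNOWN and mk == nk == "acc":
        return n                      # Rule 2: deterministic final state
    if ma == na and ma in KNOWN and {mk, nk} == {"acc", "inv"}:
        return n                      # Rule 3: same address, acc vs inv
    return None

def union_pair(m, n):
    """Union Rule 1: two invalidations of different known addresses."""
    if m[1] == n[1] == "inv" and m[0] != n[0] and \
       m[0] in KNOWN and n[0] in KNOWN:
        return ("union(%s,%s)" % (m[0], n[0]), "inv")
    return None

def simplify(seq):
    """One left-to-right pass applying reduce, then union, on pairs."""
    out = list(seq)
    i = 0
    while i + 1 < len(out):
        step = reduce_pair(out[i], out[i + 1]) or \
               union_pair(out[i], out[i + 1])
        if step is not None:
            out[i:i + 2] = [step]     # re-check around the merged step
            i = max(i - 1, 0)
        else:
            i += 1
    return out

seq = [("a", "acc"), ("a", "acc"), ("u", "acc"), ("d", "inv"),
       ("aalias", "inv")]
print(simplify(seq))
```

Applied to the example sequence, one pass reduces the two accesses to a into one (Rule 2) and unions the two invalidations of the known addresses d and aalias (Union Rule 1), leaving three steps.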
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rule is applied to the processed memory sequence.
The recursion over these three categories of rules should always be applied, reducing at least one step at each application, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence with at most three steps, in the following patterns or fewer:

– u operation, not u operation, u operation
– not u operation, u operation, not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case will be Ainv/Vinv → Vu → not u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv → Vu → Ainv_d/Vinv_d → u operation, where Vu → Ainv_d/Vinv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied, which requires the Rest Checking.

The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias → Vu
– Aa/Va/Aaalias/Vaalias → Vinv_u
– Vu → Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias
– Vinv_u → Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; patterns: a two-dimensional dynamic-size array; patterns[0] contains the states of each step of the original pattern in order; patterns[1], patterns[2], ... are empty initially.

Output: vul_patterns: the reduced effective vulnerability pattern(s) array. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: vul_patterns = Ø
2: while patterns contain ⋆ and its index is not 0 do
3:   patterns = Subdivision_Rule_1(patterns)
4: end while
5: while (patterns contain Ainv and its index is not 0) or (patterns contain Vinv and its index is not 0) do
6:   patterns = Subdivision_Rule_2(patterns)
7: end while
8: while not (patterns is ineffective or has interval effective three steps) do
9:   patterns = Commute_Rules(patterns)
10:  patterns = Reduction_Rules(patterns)
11:  patterns = Union_Rules(patterns)
12:  if patterns is ineffective or has interval effective three steps then
13:    vul_patterns += effective_patterns(patterns)
14:  end if
15: end while
16: return vul_patterns
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112-121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29-45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1-7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201-215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE: a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208-225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42-56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601-608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857-874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406-421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217-233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279-299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897-912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38-55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490-505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341-353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1-6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221-230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338-359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974-987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1-19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2-13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199-202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406-418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203-215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8-16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190-200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865-880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1-20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050-1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6-20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775-787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475-486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279-299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169-178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
different protection bit, arbitrary data of a random cache set S′ will be evicted and D will be accessed without caching. If they belong to different processes, D will be stored in an evicted cache block of S′, and the mapping of S and S′ will be swapped as well. Otherwise, the normal replacement policy is executed.
Newcache [31, 50] dynamically randomizes the memory-to-cache mapping. It introduces a ReMapping Table (RMT); the mapping between memory addresses and the RMT is as in a direct-mapped cache, while the mapping between the RMT and the actual cache is fully associative. The index bits of a memory address are used to look up an entry in the RMT to find the cache block that should be accessed, so the cache stores the most useful cache lines rather than holding a fixed set of cache lines. The index stored in the RMT, combined with the process ID, is used to look up the actual cache, where each cache line is associated with its real index and process ID. Each cache block is also associated with a protection bit (P) to indicate whether it is security critical. The cache replacement policy is very similar to that of RP cache: a cache hit requires both the process ID and the address to match. When a cache miss happens for data D, arbitrary data will be evicted and D will be accessed without caching if they belong to the same process but either one of their protection bits is set. If the evicted data and the brought-in data have different process IDs, D will randomly replace a cache line, since the actual cache is fully associative. Otherwise, the normal replacement policy for a direct-mapped cache is executed.
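The lookup and miss-handling flow described above can be sketched as follows. This is a minimal, illustrative Python model, not the actual hardware: the class and method names are our own, and the random victim choice stands in for the full replacement logic of [50].

```python
import random

class NewcacheSketch:
    """Illustrative model: a per-process ReMapping Table (RMT) maps index
    bits to a line in a fully associative array; a hit requires both the
    process ID and the index (address) to match."""
    def __init__(self, n_lines=8, index_bits=4):
        self.lines = [None] * n_lines       # each entry: (pid, index, P bit)
        self.rmt = {}                       # (pid, index) -> physical line number
        self.mask = (1 << index_bits) - 1

    def access(self, pid, addr, protected=False):
        idx = addr & self.mask
        ln = self.rmt.get((pid, idx))
        if ln is not None and self.lines[ln] and self.lines[ln][:2] == (pid, idx):
            return "hit"
        victim = random.randrange(len(self.lines))   # random replacement
        old = self.lines[victim]
        if old and old[0] == pid and (protected or old[2]):
            # same process and a P bit set: evict arbitrary data, access uncached
            self.lines[victim] = None
            self.rmt.pop((old[0], old[1]), None)
            return "miss-no-fill"
        if old:
            self.rmt.pop((old[0], old[1]), None)
        self.lines[victim] = (pid, idx, protected)
        self.rmt[(pid, idx)] = victim
        return "miss-fill"
```

Note how an access by a different process to the same index never yields a hit, which is the property the later analysis relies on.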
Random Fill Cache [30] de-correlates cache fills from memory accesses using a random filling technique. New instructions used by applications on Random Fill cache control whether the requested data belongs to a normal request or a random fill request. Cache hits are processed as in a normal cache. For the victim's security-critical data accesses, a NoFill request is executed and the requested data access is performed without caching. Meanwhile, on a RandomFill request, arbitrary data from a range of addresses is brought into the cache. In the paper [30], the authors show that random fill of spatially near data does not hurt performance. For other processes' memory accesses and the victim's normal memory accesses, a Normal request is used, which follows the normal replacement policy. The victim and the attacker are able to remove the victim's own security-critical data, including by using clflush instructions or the cache coherence protocol, since such flushes do not influence the timing-based side-channel attack prevention (the random filling technique is used for this).
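The three request types can be sketched as follows (an illustrative Python model; the function name, the `window` parameter, and the set-of-addresses cache are our own stand-ins for the design's neighborhood-window fill):

```python
import random

def random_fill_access(cache, addr, req_type, window=4):
    """Illustrative handling of the three request types: 'normal',
    'nofill' (access without caching), and 'randomfill' (cache an
    arbitrary nearby line instead of the requested one)."""
    if addr in cache:
        return "hit"                        # hits are processed as usual
    if req_type == "normal":
        cache.add(addr)                     # ordinary demand fill
    elif req_type == "randomfill":
        # fill from a window of nearby addresses, de-correlating the
        # fill from the address actually accessed
        cache.add(addr + random.randint(-window, window))
    # 'nofill': the access is performed but nothing is cached
    return "miss"
```

The key point is that after a `nofill` or `randomfill` miss, the cache state carries little or no information about the address the victim actually touched.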
CEASER Cache [38] mitigates conflict-based LLC timing-based side-channel attacks using address encryption and dynamic remapping. CEASER cache does not differentiate whom an address belongs to or whether the address is security critical. When a memory access tries to modify the cache state, the address is first encrypted using a Low-Latency BlockCipher (LLBC) [6], which not only randomizes the cache set it maps to, but also scatters the originally ordered and location-intensive addresses across different cache sets, decreasing the probability of conflict misses. The encryption and decryption can be done within two cycles using LLBC. Furthermore, the encryption key is periodically changed to avoid key reconstruction; the periodic re-keying causes the address remapping to change dynamically.
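The indexing idea can be sketched as follows (illustrative Python only: a keyed hash stands in for the LLBC cipher, and the epoch length and class name are our own assumptions):

```python
import hashlib

class CeaserSketch:
    """Illustrative CEASER-style set indexing: the line address is
    'encrypted' under the current key before indexing, and the key is
    changed every `epoch_len` accesses, remapping every address."""
    def __init__(self, n_sets=64, epoch_len=1000):
        self.n_sets, self.epoch_len = n_sets, epoch_len
        self.key, self.accesses = 0, 0

    def set_index(self, addr):
        self.accesses += 1
        if self.accesses % self.epoch_len == 0:
            self.key += 1                   # periodic re-key -> dynamic remap
        digest = hashlib.sha256(f"{self.key}:{addr}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.n_sets
```

Within one key epoch the mapping is stable, so the cache still works; across epochs any eviction set the attacker has learned becomes stale.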
ScatterCache [51] uses cache set randomization to prevent timing-based attacks. It builds upon two ideas. First, a mapping function is used to translate the memory address and process information to cache set indices; the mapping is different for each program or security domain. Second, the mapping function calculates a different index for each cache way, in a way similar to skewed-associative caches [41]. The mapping function can be a keyed hash or a keyed permutation derivation function; a different key is used for each application or security domain, resulting in a different mapping from addresses to cache sets for each. Software (e.g., the operating system) is responsible for managing the security domains and process IDs, which are used to differentiate the different software and assign it different keys for the mapping. As hardware extensions, a cryptographic primitive, such as hashing, and an index decoder for each scattered cache way are added. ScatterCache also stores the index bits of the physical address to efficiently perform lookups and writebacks. There is also one bit added per page-table entry to allow the kernel to communicate with user space for security domain identification.
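The per-way index derivation can be sketched as follows (illustrative Python; SHA-256 stands in for the keyed hash or permutation derivation function, and all names are our own):

```python
import hashlib

def scatter_indices(addr, domain_key, n_ways=8, n_sets=64):
    """Illustrative index derivation: a keyed hash gives each cache way
    its own candidate set, and each security domain its own key."""
    indices = []
    for way in range(n_ways):
        h = hashlib.sha256(f"{domain_key}:{addr}:{way}".encode()).digest()
        indices.append(int.from_bytes(h[:4], "big") % n_sets)
    return indices
```

Because two domains use different keys, the same address lands in a different set for each way and for each domain, so an attacker cannot build a useful eviction set from its own address space.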
Non Deterministic Cache [23] uses cache access delay to randomize the relation between cache block accesses and cache access timing. There is no differentiation of data caching by process ID or by whether the data is secure. A per-cache-block counter records the interval of its data activeness and is increased on each global counter clock tick when the data is untouched. When the counter reaches a predefined value, the corresponding cache line is invalidated. Non Deterministic Cache randomly sets each local counter's initial value to be less than the maximum value of the global counter, so the cache delay is randomized. The cache delay interval controlled by this non-deterministic execution can lead to different cache hit and miss statistics, because the invalidation is determined by the randomized counter of each cache line, and this therefore de-correlates the cache access time from the address being accessed. However, the performance degradation is tremendous.
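The decay-counter mechanism can be sketched as follows (illustrative Python; the class name, the dictionary representation, and the `limit` parameter are our own assumptions):

```python
import random

class NonDetCacheSketch:
    """Illustrative decay model: each cached line carries a counter that
    starts at a random value and is bumped on every global tick the line
    goes untouched; at `limit` the line is invalidated."""
    def __init__(self, limit=8):
        self.limit = limit
        self.counters = {}                  # addr -> decay counter

    def access(self, addr):
        hit = addr in self.counters
        # (re)start the counter at a random value below the limit
        self.counters[addr] = random.randrange(self.limit)
        return "hit" if hit else "miss"

    def tick(self):                         # one global counter clock tick
        for addr in list(self.counters):
            self.counters[addr] += 1
            if self.counters[addr] >= self.limit:
                del self.counters[addr]     # invalidate the decayed line
```

Because each line's lifetime starts from a random value, whether a later access hits or misses no longer deterministically reflects the earlier access pattern.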
5 Analysis of the Secure Caches
In this section, we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the Vu Ad Vu (slow) vulnerability (one type of Evict + Time [34]). For other caches and vulnerabilities, the cache is not able to prevent the vulnerability, indicated by × and red color. For example, SecDCP cache cannot defend against the Vu Va Vu (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:

1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant, so the attacker can never observe a fast versus slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the Vd Vu Aa (fast) vulnerability (one type of Flush + Reload [55]) allows the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because RP cache does not allow data of different processes to derive cache hits from each other.

2. A cache can prevent a timing attack if the timing of the last step is randomized, breaking the original correspondence between the victim's behavior and the attacker's observation. For instance, the Ad Vu A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, allows the attacker to learn that address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other, random data would be cached instead.

3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, thus preventing the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as Ad Vu V^inv_d (slow) (one type of Prime + Time Invalidation) become impossible, since preloaded and locked security-critical data will not allow Ad in Step 1 to replace it. In this case, Ad cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
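Case 1 can be made concrete with a toy timing model (illustrative Python; all names are our own). Disallowing cross-process hits, as in an RP-cache-like policy, makes the attacker's last-step observation constant regardless of the victim's secret-dependent behavior:

```python
def probe_time(cached, owner, probing_pid, addr, cross_process_hits):
    """Toy timing model for the attacker's last-step probe: 'fast' on a
    cache hit, 'slow' on a miss. Disallowing cross-process hits (an
    RP-cache-like policy) makes the observation constant."""
    if addr in cached and (cross_process_hits or owner[addr] == probing_pid):
        return "fast"
    return "slow"

# The victim has left a secret-dependent line at address "a" in the cache;
# the attacker then probes "a".
cached, owner = {"a"}, {"a": "victim"}
leaky = probe_time(cached, owner, "attacker", "a", cross_process_hits=True)
safe = probe_time(cached, owner, "attacker", "a", cross_process_hits=False)
```

In the leaky configuration the attacker sees "fast" exactly when the victim touched the line; in the safe configuration the attacker sees "slow" either way, so the channel carries no information.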
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability: an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed; disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns for each cache are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
J Hardw Syst Secur (2019) 3397ndash425 411
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body with per-cache, per-vulnerability entries not reproduced here.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree. A × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (Table body with per-cache, per-vulnerability entries not reproduced here.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusive last-level caches
i The technique targets the data cache hierarchy
j For the last-level cache, the cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared read-only memory, while not working in shared writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can use special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be implemented to be disabled for the victim's sensitive data in PL cache, where Vu V^inv_a Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu Va Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu Ad Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculation or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when the partition size of each part is adjusted, and it does not help with internal interference prevention.
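Static way-partitioning can be sketched as follows (illustrative Python; the data layout and the trivial fill-way choice are our own stand-ins for a real replacement policy):

```python
def lookup_partitioned(sets, addr, domain, ways_of, n_sets=64):
    """Illustrative static way-partitioning: each security domain may only
    hit in, and fill, its own ways, so it can never evict the other
    domain's lines."""
    s = addr % n_sets
    my_ways = ways_of[domain]
    for w in my_ways:
        if sets[s][w] == addr:
            return "hit"
    # miss: fill only within the domain's own ways (trivial victim choice)
    sets[s][my_ways[addr % len(my_ways)]] = addr
    return "miss"
```

Because the attacker's fills land only in the attacker's ways, the victim's lines survive the attacker's accesses, which is exactly why external eviction-based (Step 0) interference fails, while the victim's own lines can still evict one another.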
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., Vd Vu Va (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but, for confidentiality, prevents the attacker from directly getting cache hits due to the victim's data, and therefore prevents vulnerabilities such as A^inv_a Vu Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd Vu A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu A^inv_a Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and the attacker will be allowed to evict it. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad Vu Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when the outside world is involved, external interference vulnerabilities such as the Vd Vu Ad (slow) (one type of Evict + Probe) vulnerability are prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the Ad Vu Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or out-of-order loads or stores and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information carried by the timing observation, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu Va Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a Vu Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory accesses and cache accesses of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu V^inv_a Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va V^inv_u V^inv_a (fast) (one type of Flush + Probe Invalidation). ScatterCache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. (Table body not reproduced here.)
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% performance degradation, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance than the other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks.
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu Va Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad Vu Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from the other data in the victim code. However, identification of sensitive information needs to be carefully used; e.g., Random Fill cache is vulnerable to Vu Ad Vu (fast) (one type of Evict + Time [34]), which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
J Hardw Syst Secur (2019) 3:397–425 417
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, a cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
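The three steps of Flush + Reload can be illustrated with a toy single-line cache model (a sketch for intuition only; the CacheLine class and the HIT_TIME/MISS_TIME constants are our own illustrative assumptions, not the paper's simulator):

```python
# Toy single-line cache used to illustrate the three Flush + Reload steps.
HIT_TIME, MISS_TIME = 1, 100   # illustrative latencies, not real cycle counts

class CacheLine:
    def __init__(self):
        self.tag = None                    # address currently cached, if any

    def flush(self):
        self.tag = None                    # invalidate the line

    def access(self, addr):
        """Return the access latency and cache the address."""
        latency = HIT_TIME if self.tag == addr else MISS_TIME
        self.tag = addr
        return latency

def flush_reload(victim_secret_addr, attacker_probe_addr):
    line = CacheLine()
    line.flush()                           # Step 1: flush (A^inv or V^inv)
    line.access(victim_secret_addr)        # Step 2: victim's secret access (V_u)
    # Step 3: attacker reloads a known address (A_a) and times it.
    return line.access(attacker_probe_addr) == HIT_TIME

print(flush_reload("x", "x"))   # True:  fast reload, the addresses collide
print(flush_reload("x", "y"))   # False: slow reload, no collision
```

A fast (hit) observation in Step 3 is exactly the attacker learning that the probed known address matches the victim's secret address.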
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates data at some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
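The miss-based Prime + Probe strategy can similarly be sketched with a small W-way LRU set model (again an illustrative sketch with assumed class and address names, not the paper's simulator):

```python
from collections import OrderedDict

WAYS = 2

class CacheSet:
    """One W-way cache set with LRU replacement (illustrative only)."""
    def __init__(self):
        self.lines = OrderedDict()           # address -> None, kept in LRU order

    def access(self, addr):
        """Return True on hit; update LRU state and evict the LRU line if full."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)     # refresh as most-recently used
        else:
            if len(self.lines) >= WAYS:
                self.lines.popitem(last=False)   # evict least-recently used line
            self.lines[addr] = None
        return hit

def prime_probe(victim_maps_to_set):
    s = CacheSet()
    primed = ["p0", "p1"]
    for a in primed:                         # Step 1: attacker primes the whole set
        s.access(a)
    if victim_maps_to_set:                   # Step 2: victim's secret access (V_u)
        s.access("victim_line")              # evicts one of the primed lines
    misses = sum(not s.access(a) for a in primed)   # Step 3: attacker probes
    return misses > 0                        # any miss reveals the secret's cache set

print(prime_probe(True))    # True:  the victim evicted a primed line
print(prime_probe(False))   # False: the set is untouched, all probes hit
```

Observing even a single probe miss tells the attacker that the victim's secret address maps to the primed set.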
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, it tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses a, a_alias, and d, which interfere with one another in the cache; the unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step of the longer pattern (a ⋆ in the last step will be deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
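A minimal sketch of these two subdivision rules, assuming steps are encoded as strings with "*" standing for ⋆ and "INV" standing for an A^inv/V^inv flush (the encoding and function names are our own, not the paper's):

```python
def split_at(pattern, sep):
    """Split one step pattern at every occurrence of sep; the separator
    becomes Step 1 of the following sub-pattern, and a separator in the
    last position is simply dropped (as the rules prescribe)."""
    parts, cur = [], []
    for step in pattern:
        if step == sep and cur:
            parts.append(cur)        # close the current sub-pattern
            cur = [step]             # sep starts the next sub-pattern
        else:
            cur.append(step)
    if cur and cur != [sep]:         # a trailing lone separator is deleted
        parts.append(cur)
    return parts

def subdivide(pattern):
    """Apply Subdivision Rule 1 (split at '*') fully, then Rule 2
    (split at 'INV') on each resulting sub-pattern."""
    out = []
    for p in split_at(pattern, "*"):
        out.extend(split_at(p, "INV"))
    return out

print(subdivide(["Aa", "*", "Vu", "INV", "Vu", "Aa"]))
# [['Aa'], ['*', 'Vu'], ['INV', 'Vu', 'Aa']]
```

Note how the ⋆ and the flush each end up as Step 1 of their own sub-pattern, matching the two rules above.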
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the column "Union Rule or
[Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (columns First and Second), the table indicates whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether the Union Rule or Reduce Rule applies, and gives the combined step where one exists (for example, two adjacent accesses to the same known address a reduce to a single A_a/V_a step, and two adjacent invalidations of different known addresses union into one joint invalidation step).]
Reduce Rule" and no Union result in the "Combined Step" column of Table 7).
– Reduction Rule 1 For two u operations: although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, in either order, and the address they refer to is the same, these two steps can be reduced to one step, which is the second step.
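The three reduction rules can be sketched as a check over adjacent step pairs, assuming each step is encoded as a (kind, address) tuple with "u" standing for the unknown address (an illustrative encoding of ours, not the paper's):

```python
def reducible(m, n):
    """Return True if adjacent steps m, n collapse into n alone.
    Steps are (kind, addr) tuples with kind in {"access", "inv"}."""
    (k1, a1), (k2, a2) = m, n
    if a1 == "u" and a2 == "u":
        return True                      # Rule 1: two u operations, same u
    if k1 == k2 == "access" and "u" not in (a1, a2):
        return True                      # Rule 2: two known accesses
    if {k1, k2} == {"access", "inv"} and a1 == a2 != "u":
        return True                      # Rule 3: access + inv of the same known address
    return False

def reduce_once(pattern):
    """Collapse the first reducible adjacent pair, keeping the second step."""
    for i in range(len(pattern) - 1):
        if reducible(pattern[i], pattern[i + 1]):
            return pattern[:i] + pattern[i + 1:]
    return pattern

p = [("access", "a"), ("access", "d"), ("inv", "d"), ("access", "u")]
print(reduce_once(reduce_once(p)))
# [('inv', 'd'), ('access', 'u')]
```

Here Rule 2 first collapses the two known accesses, then Rule 3 collapses the access/invalidation pair on d, leaving a two-step remainder.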
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can be combined by Union Rule 1: two known_inv operations both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where Commute Rule 2 can further be applied to V_u ⇝ A_d^inv/V_d^inv to reduce the sequence to within three steps.

In this case, the steps finally fit within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V_u^inv
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv
– V_u^inv ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two-adjacent-step patterns
above, and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially
Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vul_list = Ø
2: while pattern contains ⋆ at an index that is not 0 do
3: pattern = subdivision_rule_1(pattern)
4: end while
5: while (pattern contains A^inv at an index that is not 0) or (pattern contains V^inv at an index that is not 0) do
6: pattern = subdivision_rule_2(pattern)
7: end while
8: while ¬(pattern is ineffective or has interval effective three steps) do
9: pattern = commute_rules(pattern)
10: pattern = reduction_rules(pattern)
11: pattern = union_rules(pattern)
12: if (pattern is ineffective or has interval effective three steps) then
13: vul_list += pattern
14: end if
15: end while
16: return vul_list
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
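The overall loop of Algorithm 2 can be sketched as follows, with the subdivision, commute, reduce, and union helpers passed in as functions (all names here are placeholders of ours for the rule implementations described above, not the paper's code):

```python
def reduce_pattern(pattern, subdivide, commute, reduce_step, union_step):
    """Sketch of the beta > 3 reduction loop: first subdivide the long
    pattern into sub-patterns, then repeatedly commute, reduce, and
    union each sub-pattern until a fixed point is reached, and collect
    the resulting (at most three-step) patterns."""
    results = []
    for sub in subdivide(pattern):
        prev = None
        while sub != prev:           # keep applying rules until nothing changes
            prev = sub
            sub = union_step(reduce_step(commute(sub)))
        results.append(sub)
    return results

# Usage with identity placeholders standing in for the real rule functions:
identity = lambda p: p
out = reduce_pattern(["A_a", "V_u", "A_a"], lambda p: [p], identity, identity, identity)
print(out)   # [['A_a', 'V_u', 'A_a']]
```

The fixed-point loop mirrors the requirement that each pass of the rules must shrink the sequence until either a three-step pattern or an irreducible pair remains.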
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush + Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non-deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014 – 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990. Proceedings. IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium. USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35. ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41). IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture. ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA). IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43. ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO). IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference. ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
access time from the address being accessed. However, the performance degradation is tremendous.
5 Analysis of the Secure Caches
In this section we manually evaluate the effectiveness of the 18 secure caches [7, 10, 12, 22, 23, 25, 27, 29, 30, 38, 48–53, 56, 57]. We analyze how well the different caches can protect against the 72 types of vulnerabilities defined in Tables 2 and 3, which cover all the possible Strong (according to the definition in Section 3) cache timing-based vulnerabilities. Following the analysis, we discuss what types of secure caches and features are best suited for defending against different types of timing-based attacks.
5.1 Effectiveness of the Secure Caches Against Timing-Based Attacks
Tables 4 and 5 list the results of our analysis of which caches can prevent which types of attacks. Some caches are able to prevent certain vulnerabilities, denoted by a checkmark and green color in the table. For example, SP* cache can defend against the V_u ⇝ A_d ⇝ V_u (slow) vulnerability (one type of Evict + Time [34]). For some other caches and vulnerabilities, the cache is not able to prevent the vulnerability, which is indicated by × and red color. For example, SecDCP cache cannot defend against the V_u ⇝ V_a ⇝ V_u (slow) vulnerability (one type of Bernstein's Attack [3]).
Each cache is analyzed for each type of vulnerability listed in Tables 2 and 3. A cache is judged to be able to prevent a type of cache timing-based vulnerability in three cases:
1. A cache can prevent a timing attack if the timing of the last step in a vulnerability is always constant and the attacker can never observe a fast vs. slow timing difference for the given set of three steps. For instance, in a regular set-associative cache, the V_d ⇝ V_u ⇝ A_a (fast) vulnerability (one type of Flush + Reload [55]) will allow the attacker to learn that address a maps to secret u when the attacker observes fast timing, compared with observing slow timing in the other cases. However, the RP cache [49] makes the timing of the last step always slow, because RP cache does not allow data of different processes to get cache hits on each other's data.
2. A cache can prevent a timing attack if the timing of the last step is randomized and no longer has the original correspondence between the victim's behavior and the attacker's observation. For instance, the A_d ⇝ V_u ⇝ A^inv_d (fast) vulnerability (one type of Prime + Probe Invalidation), when executed on a normal set-associative cache, will allow the attacker to learn that the address d has the same index as secret u when observing fast timing, compared with slow timing in the other cases. However, when executing this attack on the Random Fill cache [30], for example, a slow timing will not determine that u and d have the same index, since in Random Fill cache u would be accessed without caching and other random data would be cached instead.
3. A cache can prevent a timing attack if it disallows certain steps of the three-step model from being executed, and thus prevents the corresponding vulnerability. For instance, when PL cache [49] preloads and locks the security-critical data in the cache, vulnerabilities such as A_d ⇝ V_u ⇝ V^inv_d (slow) (one type of Prime + Time Invalidation) will not be possible, since preloaded and locked security-critical data will not allow A_d in Step 1 to replace it. In this case A_d cannot be in the cache, so this vulnerability cannot be triggered in PL cache.
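The first prevention criterion above (a constant-time last step) can be illustrated with a toy model. This is only a sketch under assumptions: a 4-set, 1-way cache with index = address mod 4; the `Cache` class, its `shared_hits` switch (modeling an RP-like "no cross-process hits" policy), and the `flush_reload` driver are illustrative names, not constructs from the paper.

```python
# Toy three-step check (illustrative model, not any real cache design).
class Cache:
    def __init__(self, sets=4, shared_hits=True):
        self.sets = sets
        self.lines = {}                  # set index -> (owner, addr)
        # shared_hits=False models an RP-like policy: no process may get a
        # cache hit on another process's line, so Step 3 timing is constant.
        self.shared_hits = shared_hits

    def access(self, owner, addr):
        idx = addr % self.sets
        line = self.lines.get(idx)
        hit = (line is not None and line[1] == addr
               and (self.shared_hits or line[0] == owner))
        self.lines[idx] = (owner, addr)  # fill/replace the line
        return 'fast' if hit else 'slow'

    def flush(self, addr):
        self.lines.pop(addr % self.sets, None)

def flush_reload(cache, secret_addr):
    guess = secret_addr                     # attacker guesses shared address a == u
    cache.flush(guess)                      # Step 1: invalidate the line
    cache.access('victim', secret_addr)     # Step 2: victim touches secret u
    return cache.access('attacker', guess)  # Step 3: attacker reloads and times

print(flush_reload(Cache(shared_hits=True), 5))   # 'fast': guess confirmed
print(flush_reload(Cache(shared_hits=False), 5))  # 'slow': constant-time last step
```

With shared hits allowed, the attacker's reload in Step 3 is fast exactly when the guess matches the victim's secret access; under the RP-like policy the last step is always slow, matching case 1 above.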
From the security perspective, the entries for a secure cache in Tables 4 and 5 should have as many green colored cells as possible. If a cache design has any red cells, then it cannot defend against that type of vulnerability; an attacker using the timing-based vulnerability that corresponds to the red cell can attack the system.
The third column in Tables 4 and 5 shows a normal set-associative cache, which cannot defend against any type of timing-based vulnerability. Meanwhile, the last column of Tables 4 and 5 shows the situation where the cache is fully disabled. As expected, the timing-based vulnerabilities are eliminated and timing-based attacks will not succeed. Disabling caches, however, carries a tremendous performance penalty. Similarly, the second-to-last column shows the Non Deterministic Cache, which totally randomizes cache access time. It can defend against all the attacks, but again at a tremendous performance cost when the application is complex.
For each entry that shows the effectiveness of a secure cache against a vulnerability, two results are listed: the left one is for normal execution and the right one is for speculative execution. Some secure caches, such as InvisiSpec cache, target timing-based channels in speculative execution. For most of the caches, which do not differentiate speculative execution from normal execution, the two sub-columns are the same.
6 Secure Cache Techniques
Among the secure cache designs presented in the prior section, there are three main techniques that the caches
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]
^a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
^b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
^c Flush is disabled, but cache coherence might be used to do the data removal
^d For L1 cache and TLB, flushing is done during context switch
^e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
^f The technique is now only implemented in last-level cache
^g The technique now only targets shared cache
^h The technique only targets inclusion last-level cache
^i The technique targets data cache hierarchy
^j For the last-level cache, cache is partitioned between the victim and the attacker
^k The technique can control the probabilities of the vulnerability to be successful to be extremely small
^l The technique can work in shared read-only memory while not working in shared writable memory
^m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
Table 5 Existing secure caches' protection against all possible timing-based vulnerabilities with last step to be invalidation related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not recoverable from the extracted text.]
^a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
^b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
^c Flush is disabled, but cache coherence might be used to do the data removal
^d For L1 cache and TLB, flushing is done during context switch
^e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
^f The technique is now only implemented in last-level cache
^g The technique now only targets shared cache
^h The technique only targets inclusion last-level cache
^i The technique targets data cache hierarchy
^j For the last-level cache, cache is partitioned between the victim and the attacker
^k The technique can control the probabilities of the vulnerability to be successful to be extremely small
^l The technique can work in shared read-only memory while not working in shared writable memory
^m Random delay but not random mapping can only decrease the probabilities of attacker in some limited degree
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data and whether this identification process is reliable are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, so V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacement as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to PL cache in Tables 4 and 5) usually limit the victim and the attacker to be able to access only a limited set of cache blocks. For example, there can be static or dynamic partitioning of caches, which allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
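The way-partitioning idea can be sketched as follows. This is illustrative only: the `PartitionedSet` model and its FIFO replacement are assumptions, not any specific secure cache's policy. Each domain may fill only its own ways, so the attacker cannot displace the victim's line no matter how many accesses it makes (no external miss-based interference), while collisions among the victim's own data remain possible.

```python
# Sketch of static way partitioning within one cache set (illustrative).
class PartitionedSet:
    def __init__(self, ways_per_domain):
        # Each security domain gets its own private ways.
        self.ways = {d: [None] * n for d, n in ways_per_domain.items()}

    def access(self, domain, addr):
        ways = self.ways[domain]
        if addr in ways:
            return 'hit'
        ways.pop(0)                      # evict only within the domain's ways
        ways.append(addr)
        return 'miss'

s = PartitionedSet({'victim': 2, 'attacker': 2})
s.access('victim', 0xA)
for addr in range(16):                   # attacker tries to thrash the set
    s.access('attacker', addr)
print(s.access('victim', 0xA))           # 'hit': the victim's line survived
```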
In terms of the three-step model, the partitioning-based caches excel at using partitioning to disallow the attacker from setting the initial state (Step 1) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits on each other's data, which enables hit-based vulnerabilities; e.g., the V_d ⇝ V_u ⇝ A_a (fast) (one type of Cache Internal Collision [5]) vulnerability is one example that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit on the victim's data for confidentiality, and therefore prevents vulnerabilities such as A^inv_a ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A^inv_d (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side channels, which manifest themselves when the number of ways assigned to the victim and attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation that involves the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during speculative or out-of-order loads or stores. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallows the attacker or non-secure victim data from setting the initial state (Step 1) of the victim's partition, and therefore brings uncertainty to the final observation by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the timing observed from cache hits or misses, or between the address and the timing observed from flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative or out-of-order loads or stores and the timing observed from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the timing observed due to having or not having data in the cache could be reduced to 0, if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; they may still suffer from hit-based vulnerabilities, however, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
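A randomized address-to-set mapping in the spirit of CEASER or ScatterCache can be sketched with a keyed hash. This is illustrative only: the real designs use dedicated low-latency ciphers (not BLAKE2), and remap epochs are managed in hardware; the key names here are made up.

```python
# Keyed, remappable address-to-set mapping sketch (illustrative).
import hashlib

def set_index(key: bytes, addr: int, n_sets: int = 64) -> int:
    # Hash the line address under a secret key; the attacker cannot
    # predict which addresses collide in a set without the key.
    h = hashlib.blake2b(addr.to_bytes(8, 'little'), key=key, digest_size=2)
    return int.from_bytes(h.digest(), 'little') % n_sets

# Remapping with a fresh key moves lines to new sets, invalidating any
# eviction sets the attacker has built up under the old mapping.
print(set_index(b'epoch-0', 0x1000), set_index(b'epoch-1', 0x1000))
```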
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities, such as A^inv_a ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V^inv_u ⇝ V^inv_a (fast) (one type of Flush + Probe Invalidation). ScatterCache encrypts both the cache address and the process ID when mapping into different cache indices to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
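The random fill behavior discussed above can be sketched as a toy model. This is an assumption-laden simplification of the actual Random Fill cache: here the demand line of a sensitive access is never cached and a different line from a fixed neighborhood window is filled instead, so the resulting cache state no longer encodes the accessed address.

```python
# Toy sketch of the random fill idea (illustrative, not the real design).
import random

cache = set()                            # cached addresses (line granularity)

def access(addr, sensitive=False, window=8):
    hit = addr in cache
    if not hit:
        if sensitive:
            fill = addr
            while fill == addr:          # fill a *different* nearby line
                fill = addr + random.randint(-window, window)
            cache.add(fill)              # the demand line itself is not cached
        else:
            cache.add(addr)
    return hit

access(100, sensitive=True)              # victim's secret access
print(100 in cache)                      # False: state no longer encodes addr
```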
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary. [Table body not recoverable from the extracted text.]
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches from L1 to last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
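The three steps can be traced on a toy model that tracks cached 64-byte lines. This is illustrative only: a "hit" here simply means the two addresses fall in the same line, as in the AES table-lookup scenario; the constants are made up.

```python
# Toy trace of the Cache Internal Collision steps (illustrative model).
LINE = 64
cache = set()                            # cached line numbers

def access(addr):
    line = addr // LINE
    hit = line in cache
    cache.add(line)
    return hit

known, secret = 0x1000, 0x1010           # same 64-byte line: a collision
cache.clear()                            # Step 1: the line is invalidated
access(secret)                           # Step 2: victim touches secret data
print(access(known))                     # Step 3: True (fast) leaks the collision
```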
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 to the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in Evict + Time.
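A minimal sketch of Evict + Time (illustrative only; a 1-way cache with 4 sets is assumed, the set index is address mod 4, and the function name is hypothetical):

```python
# Toy illustration of Evict + Time with a 1-way, 4-set cache.
# The attacker learns the secret's cache set index from whether the
# victim's reload in Step 3 hits or misses.

NUM_SETS = 4

def evict_time(secret_addr, evicted_set):
    cache = {secret_addr % NUM_SETS: secret_addr}  # Step 1: victim loads secret
    cache.pop(evicted_set, None)                   # Step 2: attacker evicts one set
    # Step 3: victim reloads the secret; True = hit (fast), False = miss (slow)
    return cache.get(secret_addr % NUM_SETS) == secret_addr
```

A miss in Step 3 tells the attacker that the secret address maps to the evicted set.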
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
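The three steps of Prime + Probe can be sketched in the same toy setting (1-way cache with 4 sets, hypothetical names; a probe miss is modeled as the attacker's line no longer being present):

```python
# Toy Prime + Probe with a 1-way, 4-set cache. Set index = addr % NUM_SETS.
# Returns the list of set indices where the attacker's probe misses.

NUM_SETS = 4

def prime_probe(secret_addr):
    cache = {}
    for s in range(NUM_SETS):                 # Step 1: attacker primes every set
        cache[s] = ("attacker", s)
    cache[secret_addr % NUM_SETS] = ("victim", secret_addr)  # Step 2: victim access
    misses = []
    for s in range(NUM_SETS):                 # Step 3: attacker probes each set
        if cache[s] != ("attacker", s):       # attacker's line was evicted -> miss
            misses.append(s)
    return misses
```

The set that misses in Step 3 is the one the secret address maps to.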
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.

J Hardw Syst Secur (2019) 3:397–425
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 to the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 to the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation related operation in the last step to derive the timing information, rather than the normal memory access related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (⋆ in the last step will be deleted) in the longer pattern.
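The splitting in Subdivision Rule 1 can be sketched as follows (an illustrative Python rendering, not the paper's implementation; "*" stands for the ⋆ step, and other strings stand for memory operations):

```python
# Split a step pattern at every '*' in the middle, making the '*' the
# first step of the following sub-pattern; a lone trailing '*' is deleted.

def subdivision_rule_1(pattern):
    subpatterns, current = [], []
    for step in pattern:
        if step == "*" and current:      # a '*' after other steps: cut here
            subpatterns.append(current)
            current = [step]             # '*' becomes Step 1 of the next part
        else:
            current.append(step)
    if current == ["*"]:                 # '*' as the last step is deleted
        return subpatterns
    if current:
        subpatterns.append(current)
    return subpatterns
```

A single left-to-right pass already cuts at every interior ⋆, so no explicit recursion is needed in this rendering.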
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or the last step (A^inv/V^inv in the last step will be deleted).
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or the unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following Rules. In the possible commuting process, we will try every possible combination to commute different pairs of two steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and another step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position within the whole memory sequence the two steps are in. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns that can apply Commute Rule 1 can be commuted if the second step of the two adjacent steps is not the last step in the whole memory sequence. That is, there are some two adjacent steps that can only be commuted if the second of the two is not the last step in the whole memory sequence; in Table 7, these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column in Table 7):

Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second), the table lists whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply ("yes"/"no"), and the resulting Combined Step where one exists (e.g., Union(a^inv, d^inv), Union(a^inv, aalias^inv), Union(d^inv, aalias^inv)). (The table body is not cleanly recoverable from the text extraction; only its caption and column structure are reproduced here.)
– Reduction Rule 1 For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two known adjacent memory access–related operations (known_access operations) always result in a deterministic state of the second memory access–related cache block, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other one is a known_inv operation, in either order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
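The three Reduction Rules applied to one pair of adjacent steps can be sketched as follows (an illustrative encoding, simplified relative to the full applicability conditions in Table 7; the function name and step encoding are assumptions, not from the paper):

```python
# A step is modeled as (op, addr): op is "access" or "inv"; addr is a
# concrete known address such as "a" or "d", or "u" for the unknown
# (secret) address. Returns the single remaining step when one of the
# Reduction Rules applies to the pair, or None otherwise.

def reduce_pair(first, second):
    (op1, addr1), (op2, addr2) = first, second
    # Rule 1: two u operations target the same unknown address u.
    if addr1 == addr2 == "u" and op1 == op2:
        return second
    # Rule 2: two known memory accesses leave a deterministic state
    # for the second accessed block.
    if op1 == op2 == "access" and "u" not in (addr1, addr2):
        return second
    # Rule 3: a known access and a known invalidation of the same
    # address, in either order, reduce to the second step.
    if {op1, op2} == {"access", "inv"} and addr1 == addr2 != "u":
        return second
    return None  # no Reduction Rule applies to this pair
```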
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two different known memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be continuously applied to union all adjacent invalidation steps that invalidate different known memory addresses.
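Union Rule 1 can be sketched as a pass that folds adjacent runs of known-address invalidations into one combined step (an illustrative encoding with hypothetical names; each step is (op, addrs) with addrs a tuple of the addresses it touches, and for brevity the sketch does not check that the merged addresses differ):

```python
# Fold every run of adjacent invalidation steps that target known
# addresses into a single combined Union step. "u" marks the unknown
# address and is never folded.

def union_invalidations(pattern):
    out = []
    for op, addrs in pattern:
        if (op == "inv" and "u" not in addrs and out
                and out[-1][0] == "inv" and "u" not in out[-1][1]):
            out[-1] = ("inv", out[-1][1] + addrs)  # merge into previous step
        else:
            out.append((op, addrs))
    return out
```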
B.4.2.4 Final Check Rules
Each long memory sequence will recursively apply these three categories of rules in order: the Commute Rules first, to put known_access operations and known_inv operations that target the same address as near as possible, and to put u operations together and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps, in the following patterns or less:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, A^inv/V^inv can be removed, due to the actual state further put into the cache block.
– If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied, to reduce and be within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above Rules can be applied, and which require the rest of the checking.
The only left two adjacent steps that cannot be appliedby any of the three categorizations of the Rules are thefollowing
ndash AaVaAaalias Vaalias AdVdAinva V inv
a
Ainvaalias V inv
aalias Vu ndash AaVaAaalias Vaalias V inv
u ndash Vu AaVaAaalias Vaalias AdVd
Ainva V inv
a Ainvaalias V inv
aalias ndash V inv
u AaVaAaalias Vaalias We manually checked all of the two adjacent step patterns
above and found that adding extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 -Step ( 3) pattern reduction
Input number of steps of the pattern
a two-dimensional dynamic-size array [0] contains the states of each step of the original pattern in
order [1] [2] are empty initially
Output reduced effective vulnerability pattern(s) array It will be an empty list if the original pattern does not
correspond to an effective vulnerability
1 = Oslash
2 while contain( ) and index not 0 do3 = 1 ( )
4 end while5 while ( contain( ) and index not 0) or ( contain( ) and index not 0) do6 = 2 ( )
7 end while8 while ( is ineffecitve or has interval effective three steps) do9 = ( )
10 = ( )
11 = ( )
12 if ( is ineffecitve or has interval effective three steps) then13 += ( )
14 end if15 end while16 return
two steps can either generate two adjacent step patterns thatbe processed by the three Rules where further step can bereduced or construct effective vulnerability according toTable 2 and reduction rules shown in Section 33 wherethe corresponding pattern can be treated effective and thechecking is done
B.4.3 Algorithm for Reducing and Checking a Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the Rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks whether a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities which are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2 Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3 Bernstein DJ (2005) Cache-timing attacks on AES
4 Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5 Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6 Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7 Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8 Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 on Asia conference on computer and communications security, ACM, pp 601–608
9 Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10 Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11 Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12 Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13 Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14 Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15 Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16 Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17 Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18 Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19 Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20 He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21 Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22 Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23 Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24 Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25 Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26 Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27 Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28 Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29 Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30 Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31 Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32 Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33 Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34 Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35 Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36 Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37 Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38 Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39 Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40 Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41 Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42 Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43 Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44 Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45 Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46 Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47 Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48 Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49 Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50 Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51 Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52 Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53 Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54 Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55 Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56 Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57 Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58 Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59 Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60 Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 4 Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being memory access related operations. A single ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). (The table body is not recoverable from the text extraction; only the caption and footnotes are reproduced.)

a Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage
b Some software assumptions listed in the entries in this column have been implemented by the cache's related software
c Flush is disabled, but cache coherence might be used to do the data removal
d For L1 cache and TLB, flushing is done during context switch
e The techniques are implemented in L1 cache, TLB, and last-level cache, which consist of the whole cache hierarchy, where L1 cache and TLB require software flush protection, and the last-level cache can be achieved by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added
f The technique is now only implemented in last-level cache
g The technique now only targets shared cache
h The technique only targets inclusion last-level cache
i The technique targets data cache hierarchy
j For the last-level cache, cache is partitioned between the victim and the attacker
k The technique can control the probabilities of the vulnerability to be successful to be extremely small
l The technique can work in shared, read-only memory, while not working in shared, writable memory
m Random delay, but not random mapping, can only decrease the probabilities of the attacker in some limited degree
[Table 5] Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A single "✓" in a green cell means this cache can prevent the corresponding vulnerability; a "✓" in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a "×" in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache, we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). Footnotes a)–m) are identical to those of Table 4.
utilize differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software can use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that can label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, careful use is required when identifying the actual sensitive data and when implementing the corresponding security features in the cache.
Comparing the PL cache with the SP* cache: although both use partitioning, flushing can be disabled for the victim's sensitive data in the PL cache, where V_u → V_a^inv → V_u (slow) (one type of Flush + Time) is prevented. Newcache can prevent V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, the Random Fill cache cannot prevent V_u → A_d → V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be the part of the speculation, or an out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partitioning size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at making use of partitioning techniques to disallow the attacker to set the initial state (Step 1) of the victim's partition by use of flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
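As a toy illustration of this property, the sketch below models a single set of a way-partitioned cache. The class name, replacement policy, and domain labels are illustrative assumptions, not any specific design from the tables: fills are confined to the requester's own ways, so the attacker cannot evict the victim's line (blocking external miss-based interference), yet cross-partition hits remain observable, which is exactly the hit-based internal/external leakage partitioning alone does not remove.

```python
# Toy model of one set of a way-partitioned cache (illustrative sketch).
# Fills may only evict lines in the ways owned by the requester's domain,
# but lookups hit on any way.

class PartitionedSet:
    def __init__(self, ways_per_domain):
        # ways_per_domain: e.g. {"victim": 1, "attacker": 1}
        self.owner = {}   # way index -> owning domain
        self.lines = {}   # way index -> cached address (or None)
        w = 0
        for dom, n in ways_per_domain.items():
            for _ in range(n):
                self.owner[w] = dom
                self.lines[w] = None
                w += 1

    def access(self, domain, addr):
        """Return 'hit' or 'miss'; on a miss, fill only into own ways."""
        for w, a in self.lines.items():
            if a == addr:
                return "hit"              # hits cross partition boundaries
        own = [w for w in self.lines if self.owner[w] == domain]
        self.lines[own[0]] = addr         # simplistic replacement in own ways
        return "miss"

s = PartitionedSet({"victim": 1, "attacker": 1})
s.access("victim", "V_secret")            # victim line cached
s.access("attacker", "A1")                # attacker fills...
s.access("attacker", "A2")                # ...but cannot touch the victim's way
assert s.access("victim", "V_secret") == "hit"    # external eviction blocked
assert s.access("attacker", "V_secret") == "hit"  # hit-based leak remains
```

The two final assertions mirror the text: the miss-based external channel is closed, while the hit-based channel survives.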
The SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) is one of the vulnerabilities that the SP* cache cannot prevent. The SecVerilog cache is similar to the SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data (for confidentiality), and therefore prevents vulnerabilities such as A_a^inv → V_u → A_a (fast) (one type of Flush + Reload [55]). The SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data, and prevents external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]). The DAWG cache only allows data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as the SP* cache, it can prevent vulnerabilities such as V_d → V_u → A_d^inv (fast) (one type of Prime + Flush).
The SecDCP and NoMo caches both leverage dynamic partitioning to improve performance. Compared with the SecVerilog cache, the SecDCP cache introduces certain side
channels that manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u → A_a^inv → V_u (slow) (one type of Flush + Time) vulnerability. The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions; otherwise, the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d → V_u → A_a (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution, it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as V_d → V_u → A_d (slow) (one type of Evict + Probe) will be prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set the initial state (Step 1) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d → V_u → V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache to columns for Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during a speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23], and it requires a fast and secure random number generator. Most of the randomization is cache-line-based and can be combined with differentiation of sensitive data to be more efficient.
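A minimal sketch of such a randomized address-to-set mapping is shown below. It is written in the spirit of keyed index randomization with periodic remapping; the keyed hash and the epoch keys are illustrative assumptions, not the actual low-latency ciphers used by designs such as CEASER or ScatterCache.

```python
# Sketch: keyed address-to-set mapping with per-epoch remapping keys.
import hashlib

NUM_SETS = 64

def set_index(addr: int, key: bytes) -> int:
    """Map an address to a cache set under a secret key."""
    h = hashlib.blake2b(addr.to_bytes(8, "little"), key=key, digest_size=2)
    return int.from_bytes(h.digest(), "little") % NUM_SETS

# Under different keys (e.g., after a remapping epoch), the same addresses
# generally land in different sets, so an eviction set an attacker built
# under one key is useless under the next.
idx_epoch0 = [set_index(a, b"epoch-0-key") for a in range(16)]
idx_epoch1 = [set_index(a, b"epoch-1-key") for a in range(16)]
assert all(0 <= i < NUM_SETS for i in idx_epoch0 + idx_epoch1)
assert idx_epoch0 != idx_epoch1   # overwhelmingly likely across 16 addresses
```

A hardware design would replace the hash with a pipelined block cipher or keyed permutation; the point here is only that the set index is no longer a public function of the address.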
The RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both the RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A_a^inv → V_u → V_a (fast) (one type of Cache Internal Collision [5]). The Random Fill cache can use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, the Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as V_u → V_a^inv → V_u (slow) (one type of Flush + Time) to exist. The CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V_u^inv → V_a^inv (fast) (one type of Flush + Probe Invalidation). The SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. The Non Deterministic cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
[Table 6] Existing secure caches' implementation method, performance, power, and area summary.
the Non Deterministic cache: with random delay, this secure cache can prevent all the cache timing-based vulnerabilities in some degree. While their paper reports only a 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in other words, more security seems to imply less performance.
6.2 Towards an Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and it is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
  – Miss-based internal interference can be solved by randomly evicting data to de-correlate memory access with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
  – Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., the Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
  – To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., the Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are many existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting, and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there was an internal collision, and it leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
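The three steps of this strategy can be sketched on a toy cache model; the line names and the boolean "secret-dependent access" flag below are illustrative, and the hit/miss result stands in for the measured timing.

```python
# Minimal sketch of Flush + Reload (A_a^inv -> V_u -> A_a) on a toy cache.
cache = set()                        # addresses currently cached

def flush(addr):
    cache.discard(addr)

def access(addr):
    t = "fast" if addr in cache else "slow"
    cache.add(addr)
    return t

SHARED = "shared_line"               # e.g., a line in a shared library page

def victim(secret_uses_shared: bool):
    access(SHARED if secret_uses_shared else "other_line")

def flush_reload(secret_uses_shared: bool) -> bool:
    flush(SHARED)                    # Step 1: attacker invalidates known line
    victim(secret_uses_shared)       # Step 2: victim's secret-dependent access
    return access(SHARED) == "fast"  # Step 3: fast reload => victim touched it

assert flush_reload(True) is True    # attacker infers the secret bit
assert flush_reload(False) is False
```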
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known-data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time, the attacker invalidates some known address, allowing it to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
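On a toy direct-mapped cache, these three steps can be sketched as follows (the set count, attacker addresses, and secret address are illustrative); the attacker recovers the set index of the victim's secret-dependent access:

```python
# Sketch of Prime + Probe (A_a -> V_u -> A_a) on a toy direct-mapped cache.
NUM_SETS = 8
cache = [None] * NUM_SETS            # one line per set (direct-mapped toy)

def access(addr):
    s = addr % NUM_SETS              # trivial address-to-set mapping
    t = "fast" if cache[s] == addr else "slow"
    cache[s] = addr
    return t

def prime_probe(secret_addr):
    for s in range(NUM_SETS):        # Step 1: prime every set (0x1000+s -> set s)
        access(0x1000 + s)
    access(secret_addr)              # Step 2: victim's secret-dependent access
    return [s for s in range(NUM_SETS)        # Step 3: probe; a miss marks
            if access(0x1000 + s) == "slow"]  # the set the victim touched

assert prime_probe(0x2003) == [0x2003 % NUM_SETS]   # leaked set index: 3
```

Randomized set mappings, discussed in Section 6, break exactly the `addr % NUM_SETS` predictability this sketch relies on.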
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known-data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set via an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set that was primed. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using an access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities with the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the "invalidation" variants use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory-access-related operations.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If a vulnerability is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and for β > 3, the pattern can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆ (since the ⋆ cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations, and it corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities, shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
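The exhaustive enumeration itself can be sketched as below. The state alphabet here is a simplified, illustrative stand-in for the model's (actor A or V; address a, a_alias, d, or u; access or invalidation; full-cache flushes; and the ⋆ "any state" step, written `"*"`), and the sketch deliberately omits the effectiveness filtering by the cache simulator and reduction rules that yields the 72/64 counts.

```python
# Skeleton of the exhaustive three-step pattern enumeration (illustrative
# state alphabet; effectiveness filtering is not reproduced here).
from itertools import product

addresses = ["a", "a_alias", "d", "u"]
ops = [f"{who}_{addr}{suffix}"
       for who in ("A", "V")          # attacker or victim step
       for addr in addresses          # which address the step touches
       for suffix in ("", "^inv")]    # access vs. invalidation of that address
states = ops + ["A_inv", "V_inv", "*"]   # full-cache flushes and the star step

candidates = list(product(states, repeat=3))   # every (Step1, Step2, Step3)
print(len(states), "states ->", len(candidates), "three-step candidates")
```

Each candidate triple would then be fed to the cache-state simulator, which keeps only those where the attacker's final timing observation actually depends on u.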
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with ⋆ in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as A_inv/V_inv, the longer pattern can be divided into two separate patterns, where A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv flushes all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A_inv → V_u → A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be applied recursively until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step is deleted).
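Subdivision Rule 1 can be sketched as code (step names are illustrative strings, and `"*"` stands for the ⋆ step): the long pattern is split at each ⋆, the ⋆ becomes Step 1 of the following sub-pattern, and a trailing ⋆ is deleted.

```python
# Sketch of Subdivision Rule 1: split a long pattern at "*" steps.
def subdivide_at_star(pattern):
    """pattern: list of step names, e.g. ["A_a", "*", "V_u", "A_a"]."""
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]            # star becomes Step 1 of next pattern
        else:
            current.append(step)
    if current and current != ["*"]:   # a lone trailing star is deleted
        parts.append(current)
    return parts

assert subdivide_at_star(["A_a", "*", "V_u", "A_a"]) == [["A_a"], ["*", "V_u", "A_a"]]
assert subdivide_at_star(["A_a", "V_u", "*"]) == [["A_a", "V_u"]]
```

Rule 2 would follow the same shape, splitting instead at full-cache flush steps (A_inv/V_inv).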
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. This gives more chances to Reduce and Union steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; these show a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
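The applicability test for Commute Rule 1 can be sketched as follows. This is an illustrative encoding of our own (not from the paper): a step is a (kind, address) tuple, where kind is "access" or "inv", and address is a concrete label such as "a" or "d", or None for the unknown address u.

```python
# Sketch of Commute Rule 1's condition under a hypothetical
# (kind, address) step encoding; address None stands for the unknown
# address u, which never satisfies the "known address" requirement.

def commute_rule_1(s, t):
    """A known access and a known invalidation of two different known
    addresses commute anywhere in the memory sequence."""
    return ({s[0], t[0]} == {"access", "inv"}
            and s[1] is not None and t[1] is not None
            and s[1] != t[1])
```

Commute Rule 2 covers a superset of these pairs but additionally depends on position: its test would also take as input whether the second step is the last step of the whole sequence.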
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules below (if the two-step pattern has "yes" in the column "Union Rule or
J Hardw Syst Secur (2019) 3:397–425
[Table 7: Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 applies, whether Commute Rule 2 applies, whether a Union Rule or Reduce Rule applies, and the resulting combined step; e.g., a ⇝ a reduces to a, and a^inv ⇝ d^inv unions to Union(a^inv, d^inv). The table body is not reproduced here.]
Reduce Rule" and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both accesses target the same address u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block of the second access, so these two steps can be reduced to only one step, the second one.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, in either order, and the address they refer to is the same, these two steps can be reduced to one step, namely the second step.
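The three Reduction Rules can be sketched together. As before, this is a hypothetical encoding of our own: a step is a (kind, address) tuple with kind in {"access", "inv"} and address None standing for the unknown address u.

```python
# Sketch of Reduction Rules 1-3 over (kind, address) step tuples.
# Returns the single surviving step, or None if no rule applies.

def reduce_pair(s, t):
    # Rule 1: two accesses to the same unknown address u -> keep second.
    if s == t == ("access", None):
        return t
    # Rule 2: two known accesses -> the second access determines the
    # final state of its cache block -> keep the second.
    if s[0] == t[0] == "access" and s[1] is not None and t[1] is not None:
        return t
    # Rule 3: a known access and a known invalidation of the SAME
    # known address, in either order -> keep the second step.
    if {s[0], t[0]} == {"access", "inv"} and s[1] is not None and s[1] == t[1]:
        return t
    return None
```

In every case the second step survives, matching the "Combined Step" column of Table 7 for the reducible pairs.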
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). The two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations each invalidate some known address, so they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
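A matching sketch of Union Rule 1 under the same hypothetical (kind, address) step encoding (kind "access" or "inv", address a concrete label or None for the unknown address u); the "union_inv" label for the joint step is our own invention for this sketch.

```python
# Sketch of Union Rule 1: merge two adjacent invalidations of known,
# different addresses into one joint step Union(s, t).

def union_rule_1(s, t):
    """Return the joint step, or None if the rule does not apply."""
    if (s[0] == t[0] == "inv"
            and s[1] is not None and t[1] is not None
            and s[1] != t[1]):
        # Use a frozenset so Union(a_inv, d_inv) == Union(d_inv, a_inv).
        return ("union_inv", frozenset({s[1], t[1]}))
    return None
```

Repeated application then folds any run of invalidations of distinct known addresses into a single joint flushing step.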
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations targeting the same address as near to each other as possible, and to group u operations together and non-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence. Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and non-u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):

  – u operation ⇝ non-u operation ⇝ u operation
  – non-u operation ⇝ u operation ⇝ non-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

  – If followed by a known access operation, the A^inv/V^inv can be removed, because an actual state is subsequently put into the cache block.
  – If followed by a known inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
  – If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ non-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can have Commute Rule 2 further applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two adjacent step patterns to which none of the three categories of Rules can be applied are the following:
– {A_a / V_a / A_aalias / V_aalias / A_d / V_d / A^inv_a / V^inv_a / A^inv_aalias / V^inv_aalias} ⇝ V_u
– {A_a / V_a / A_aalias / V_aalias} ⇝ V^inv_u
– V_u ⇝ {A_a / V_a / A_aalias / V_aalias / A_d / V_d / A^inv_a / V^inv_a / A^inv_aalias / V^inv_aalias}
– V^inv_u ⇝ {A_a / V_a / A_aalias / V_aalias}

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2: β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; p: a two-dimensional dynamic-size array; p[0] contains the states of each step of the original pattern, in order; p[1], p[2], ... are empty initially.

Output: reduced effective vulnerability pattern(s) array V. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  V = Ø
2:  while p contains ⋆ (at an index other than 0) do
3:      p = Subdivision_Rule_1(p)
4:  end while
5:  while (p contains A^inv at an index other than 0) or (p contains V^inv at an index other than 0) do
6:      p = Subdivision_Rule_2(p)
7:  end while
8:  while (p is_ineffective or p has_interval_effective_three_steps) do
9:      p = Commute_Rules(p)
10:     p = Reduction_Rules(p)
11:     p = Union_Rules(p)
12:     if (p is_ineffective or p has_interval_effective_three_steps) then
13:         V += effective_three_step_patterns(p)
14:     end if
15: end while
16: return V
two steps either generates two adjacent step patterns that can be processed by the three Rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
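The reduce-until-fixed-point core of this procedure can be sketched in Python. This is a deliberately trimmed illustration of our own, not the authors' implementation: it applies only Reduction-style merging (a step is a (kind, address) tuple, address None for the unknown address u), while Algorithm 2 additionally interleaves Commute and Union passes.

```python
# Minimal sketch of Algorithm 2's reduce-to-fixed-point loop,
# restricted to Reduction-style rules over (kind, address) steps.

def reduce_pass(seq):
    """One left-to-right pass: merge the first adjacent pair for which
    a reduction fires (two known accesses; a known access/invalidation
    pair on the same address; or two u accesses), keeping the second."""
    for i in range(len(seq) - 1):
        s, t = seq[i], seq[i + 1]
        both_known_access = (s[0] == t[0] == "access"
                             and s[1] is not None and t[1] is not None)
        mixed_same_known = (s[0] != t[0]
                            and s[1] is not None and s[1] == t[1])
        both_u = s == t == ("access", None)
        if both_known_access or mixed_same_known or both_u:
            return seq[:i] + seq[i + 1:]   # keep the second step
    return seq

def reduce_to_core(seq):
    """Apply passes until a fixed point is reached."""
    while True:
        nxt = reduce_pass(seq)
        if nxt == seq:
            return seq
        seq = nxt
```

A β-step sequence that reaches a fixed point with more than three steps would then go to the manual rest checking described above.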
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 5: Existing secure caches' protection against all possible timing-based vulnerabilities with the last step being invalidation-related operations. A ✓ in a green cell means this cache can prevent the corresponding vulnerability; a ✓ in a pink cell means this cache can prevent the corresponding vulnerability in some degree; a × in a red cell means this cache cannot prevent this vulnerability. Furthermore, for each cache we analyze normal execution (left column under the cache name) and speculative execution (right column under the cache name). [Table body not reproduced here.]

a. Dynamic adjustment of ways for different threads is assumed to be properly used according to the running program's cache usage.
b. Some software assumptions listed in the entries in this column have been implemented by the cache's related software.
c. Flush is disabled, but cache coherence might be used to do the data removal.
d. For the L1 cache and TLB, flushing is done during context switch.
e. The techniques are implemented in the L1 cache, TLB, and last-level cache, which constitute the whole cache hierarchy, where the L1 cache and TLB require software flush protection and the last-level cache can be covered by simple hardware partitioning. To protect all levels of caches, the software assumptions need to be added.
f. The technique is now only implemented in the last-level cache.
g. The technique now only targets shared caches.
h. The technique only targets the inclusive last-level cache.
i. The technique targets the data cache hierarchy.
j. For the last-level cache, the cache is partitioned between the victim and the attacker.
k. The technique can control the probability of the vulnerability being successful to be extremely small.
l. The technique can work in shared read-only memory, while not working in shared writable memory.
m. Random delay, but not random mapping, can only decrease the attacker's probability of success to some limited degree.
utilize: differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they consider sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference among the victim's own data. For example, it is possible to disable the victim's own flushing of the victim's labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they are able to differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
Comparing PL cache with SP* cache: although both of them use partitioning, in PL cache flushing can be disabled for the victim's sensitive data, so that V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), while most caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent V_u ⇝ A_d ⇝ V_u (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, either static or dynamic partitioning of the cache allocates some blocks to the High victim and others to the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., a High partition determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, interference within the victim's own data is hard to prevent with partitioning-based caches. What's more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could come at the cost of revealing some information when adjusting the partition size for each part. It also does not help with preventing internal interference.
In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by flushing or eviction, and they therefore bring uncertainty to the final timing observation made by the attacker.
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., V_d ⇝ V_u ⇝ A_a (fast) (one type of Cache Internal Collision [5]) is one example of a vulnerability that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as A^inv_a ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and to prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal-interference vulnerabilities, such as the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as V_d ⇝ V_u ⇝ A^inv_d (fast) (one type of Prime + Flush).

SecDCP and NoMo cache both leverage dynamic
partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and allow eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between victim and attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer during a speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) of the victim partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (the columns for SHARP cache, and the columns from RP cache to Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
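As an illustration of this de-correlation argument (our own sketch, not from the paper), the mutual information between a victim's secret set index and the attacker's hit/miss observation can be estimated on a toy single-probe model; the set count, sample sizes, and mapping are all made up:

```python
import random
from math import log2
from collections import Counter

def mutual_info(pairs):
    """Plug-in estimate of I(X;Y) in bits from (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

random.seed(0)
N_SETS = 4

def attacker_sees_eviction(secret, randomized):
    # The attacker watches set 0; the victim's line lands in a set that is
    # either a fixed function of the secret or freshly randomized.
    victim_set = random.randrange(N_SETS) if randomized else secret % N_SETS
    return victim_set == 0

fixed = [(s, attacker_sees_eviction(s, False))
         for s in (random.randrange(16) for _ in range(4000))]
rand = [(s, attacker_sees_eviction(s, True))
        for s in (random.randrange(16) for _ in range(4000))]

assert mutual_info(fixed) > 0.5    # fixed mapping leaks the set index
assert mutual_info(rand) < 0.05    # per-access randomization: ~0 bits
```

With a fixed address-to-set mapping the observation is a deterministic function of the secret, so the estimated mutual information stays high; re-randomizing the mapping on every access drives it toward zero, matching the argument above.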
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as A^inv_a ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access of the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as V_u ⇝ V^inv_a ⇝ V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a ⇝ V^inv_u ⇝ V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
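The Non Deterministic Cache approach can be illustrated with a toy sketch (ours; the latencies and delay bound are invented) showing how a large enough random delay makes a single timing observation ambiguous:

```python
import random

HIT, MISS = 10, 200        # hypothetical base latencies in cycles

def observed_latency(is_hit, max_delay=0):
    """Base hit/miss latency plus a uniformly random added delay."""
    base = HIT if is_hit else MISS
    return base + random.randint(0, max_delay)

random.seed(1)

# Without added delay, a single observation separates hit from miss.
assert observed_latency(True) < observed_latency(False)

# With delays larger than the hit/miss gap, the distributions overlap,
# so one observation no longer identifies hit vs. miss.
hits = [observed_latency(True, max_delay=1000) for _ in range(500)]
misses = [observed_latency(False, max_delay=1000) for _ in range(500)]
assert max(hits) > min(misses)
```

Averaging many samples would still recover the signal unless the delay dominates the base latency, which is one reason this defense carries such a large performance cost.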
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the respective papers. At the extreme end, there is
J Hardw Syst Secur (2019) 3:397–425
Table 6 Existing secure caches' implementation method, performance, power, and area summary
the Non Deterministic cache: with its random delays, this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the observed timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d ⇝ V_u ⇝ V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u ⇝ A_d ⇝ V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
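The random-eviction suggestion above can be sketched as a toy replacement policy (our illustration only; a real design would also need a fast, secure hardware random number generator):

```python
import random

class RandomEvictSet:
    """Toy cache set with random replacement: on a miss with a full set,
    a uniformly random way is evicted, de-correlating which line gets
    displaced from the access pattern."""

    def __init__(self, ways=4):
        self.lines = [None] * ways

    def access(self, tag):
        if tag in self.lines:
            return "hit"
        if None in self.lines:
            way = self.lines.index(None)              # fill an empty way first
        else:
            way = random.randrange(len(self.lines))   # random eviction
        self.lines[way] = tag
        return "miss"

random.seed(0)
s = RandomEvictSet(ways=4)
for tag in ("a", "b", "c", "d"):
    s.access(tag)
assert s.access("a") == "hit"   # all four lines are resident
s.access("e")                   # evicts one line chosen at random
assert "e" in s.lines
```

Because the evicted way is chosen uniformly at random, observing which of its own lines later miss tells the attacker nothing deterministic about the victim's access.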
It should be noted that some cache designs only focus on certain levels of the cache hierarchy, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as those with Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to capture all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail to do so. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there was an internal collision, leaking the value of u.
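As a toy illustration of these three steps (our own sketch; the cache geometry and addresses are made up), a hit in Step 3 on a minimal direct-mapped cache occurs exactly when the secret access in Step 2 touched the known address:

```python
class ToyCache:
    """Minimal direct-mapped cache: one address tag per set."""

    def __init__(self, n_sets=4):
        self.n_sets = n_sets
        self.lines = {}                  # set index -> cached address

    def flush(self, addr):
        self.lines.pop(addr % self.n_sets, None)

    def access(self, addr):
        idx = addr % self.n_sets
        hit = self.lines.get(idx) == addr
        self.lines[idx] = addr
        return hit

cache = ToyCache()
known = 8                                # known address (hypothetical)
secret_u = 8                             # secret address; equals `known` here

cache.flush(known)                       # Step 1: invalidate the block
cache.access(secret_u)                   # Step 2: victim accesses secret u
assert cache.access(known) is True       # Step 3: hit => collision => u leaked
```

Had secret_u been any other address, Step 3 would miss, so the hit/miss timing in Step 3 distinguishes whether u equals the known address.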
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Steps 1 and 2 to the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Steps 1 and 3 to the Flush + Time vulnerability; but in Step 2, in Flush + Time the attacker invalidates some known address, allowing them to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index, as in Evict + Time.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
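A minimal simulation of these three steps (our own sketch with an invented 8-set, direct-mapped cache) shows how the probing step recovers the secret's set index:

```python
N_SETS = 8
cache = [None] * N_SETS                  # toy 1-way (direct-mapped) cache

def access(owner, addr):
    """Return True on hit; on miss, install (owner, addr) in its set."""
    idx = addr % N_SETS
    line = (owner, addr)
    hit = cache[idx] == line
    cache[idx] = line
    return hit

# Step 1: the attacker primes every set with its own data.
for a in range(N_SETS):
    access("attacker", a)

# Step 2: the victim makes one secret-dependent access.
secret_u = 13                            # hypothetical secret address
access("victim", secret_u)

# Step 3: the attacker probes; the set that misses reveals u's set index.
missed_sets = [i for i in range(N_SETS) if not access("attacker", i)]
assert missed_sets == [secret_u % N_SETS]
```

Only the set touched by the victim's access misses during the probe, so the slow probe timing pinpoints the secret's cache set.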
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Steps 2 and 3 to the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Steps 1 and 2 to the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "Invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "Invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "Invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory-access-related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted a known-access operation, and an invalidation of a known memory address is denoted a known-inv operation. The known-access and known-inv operations together make up the not-u operations. A memory-related operation on the unknown address (containing u) is denoted a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since ⋆ gives no information about the cache state) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Steps 2 and 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3, using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a ⋆ sub-pattern in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until no sub-pattern is left with a ⋆ in the middle or as the last step (a ⋆ in the last step is simply deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains an A^inv/V^inv sub-pattern in the middle, the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can therefore serve as the flushing step of Step 1, e.g., as in the A^inv_a ⇝ V_u ⇝ A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be applied recursively until no sub-pattern is left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is deleted).
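Subdivision Rule 1 can be sketched in a few lines (our own encoding, with "*" standing for the ⋆ step); Rule 2 works analogously with A^inv/V^inv as the split point:

```python
def subdivide_at_star(pattern):
    """Split a pattern at every '*' step (Subdivision Rule 1 sketch):
    '*' carries no timing information, so it restarts the attacker's view
    and becomes Step 1 of the next sub-pattern; a trailing lone '*' is
    deleted."""
    parts, current = [], []
    for step in pattern:
        if step == "*":
            if current:
                parts.append(current)
            current = ["*"]              # '*' starts the next sub-pattern
        else:
            current.append(step)
    if current and current != ["*"]:
        parts.append(current)
    return parts

assert subdivide_at_star(["A_a", "*", "V_u", "A_a"]) == \
    [["A_a"], ["*", "V_u", "A_a"]]
assert subdivide_at_star(["V_u", "A_a", "*"]) == [["V_u", "A_a"]]
```

Each resulting sub-pattern can then be processed independently by the simplification rules that follow.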
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in the remaining patterns. In Table 7, we show all the possible cases in which the rules apply for each pair of adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of adjacent steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute was effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known-access operation and the other is a known-inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the two-step patterns that can be commuted this way.
– Commute Rule 2 A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to one step, following the reduction rules (if the two-step pattern has a "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each possible pair of adjacent steps (First, Second), the table indicates whether Commute Rule 1 and Commute Rule 2 apply, whether the Union Rule or Reduce Rule applies, and the resulting combined step.
Reduce Rule" and no Union result in the "Combined Step" column in Table 7):
– Reduction Rule 1 For two u operations: although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory-access operations (known-access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to one step.
– Reduction Rule 3 For two adjacent steps where one is a known-access operation and the other is a known-inv operation, in either order, if the address they refer to is the same, the two can be reduced to one step, namely the second step.
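The three reduction rules can be sketched symbolically (our own encoding: steps are strings such as "A_a", "V_u", "A_a^inv"; combinations not covered by Rules 1–3, such as mixed u/inv pairs, are left to Table 7 and return None here):

```python
def reduce_pair(first, second):
    """Return the single step replacing the adjacent pair (first, second),
    or None if none of Reduction Rules 1-3 applies."""
    def is_u(step):                      # targets the unknown address u
        return "_u" in step
    def is_inv(step):                    # invalidation operation
        return step.endswith("^inv")
    def addr(step):                      # 'A_a^inv' -> 'a'
        return step.split("_", 1)[1].replace("^inv", "")

    if is_u(first) and is_u(second) and is_inv(first) == is_inv(second):
        return second                    # Rule 1: both target the same u
    if not is_u(first) and not is_u(second):
        if not is_inv(first) and not is_inv(second):
            return second                # Rule 2: two known accesses
        if is_inv(first) != is_inv(second) and addr(first) == addr(second):
            return second                # Rule 3: access + inv, same address
    return None

assert reduce_pair("A_u", "V_u") == "V_u"          # Rule 1
assert reduce_pair("A_a", "V_d") == "V_d"          # Rule 2
assert reduce_pair("A_a", "A_a^inv") == "A_a^inv"  # Rule 3
assert reduce_pair("A_a", "V_d^inv") is None       # no rule applies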
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1 Two invalidations of two different known memory addresses can be combined: the two known-inv operations both invalidate some known address, so they can be combined into a single step. This rule can be applied continuously to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known-access and known-inv operations that target the same address as close together as possible, and to group u operations and not-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
The recursion, at each application of these three categories of rules, should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence of u operations and not-u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):
– u operation ⇝ not-u operation ⇝ u operation
– not-u operation ⇝ u operation ⇝ not-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra A^inv/V^inv in the first step:
– If followed by a known-access operation, the A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.
– If followed by a known-inv operation or V^inv_u, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further be commuted by Commute Rule 2 and reduced to be within three steps.
In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:
– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias
We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; P, a two-dimensional dynamic-size array, where P[0] contains the states of each step of the original pattern in order, and P[1], P[2], ... are empty initially.
Output: R, an array of the reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: R = ∅
2: while contain(P, ⋆) and the index of ⋆ is not 0 do
3:   P = Subdivision_Rule_1(P)
4: end while
5: while (contain(P, A^inv) and the index of A^inv is not 0) or (contain(P, V^inv) and the index of V^inv is not 0) do
6:   P = Subdivision_Rule_2(P)
7: end while
8: while not (is_ineffective(P) or has_interval_effective_three_steps(P)) do
9:   P = Commute_Rules(P)
10:  P = Reduction_Rules(P)
11:  P = Union_Rules(P)
12:  if is_ineffective(P) or has_interval_effective_three_steps(P) then
13:    R += effective_three_step_patterns(P)
14:  end if
15: end while
16: return R
two steps can either generate two-adjacent-step patterns that can be processed by the three categories of rules, so that further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security. Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems. Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems. Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security. Springer, pp 208–225
J Hardw Syst Secur (2019) 3:397–425
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
utilize differentiating sensitive data, partitioning, and randomization.
Differentiating Sensitive Data (columns for CATalyst cache to columns for Random Fill cache in Tables 4 and 5) allows the victim or attacker software, or management software, to explicitly label a certain range of the victim's data which they think is sensitive. The victim process or management software is able to use cache-specific instructions to protect the data and limit internal interference between the victim's own data. For example, it is possible to disable the victim's own flushing of its labeled data, and therefore prevent vulnerabilities that leverage flushing. This technique allows the designer to have stronger control over security-critical data, rather than forcing the system to assume all of the victim's data is sensitive. However, how to identify sensitive data, and whether this identification process is reliable, are open research questions for caches that support differentiation of sensitive data.
This technique is independent of whether a cache uses partitioning or randomization techniques to eliminate side channels between the attacker and the victim. Caches that are able to label and identify sensitive data have an advantage in preventing internal interference, since they can differentiate sensitive data from normal data and can make use of special instructions to give more privileges to sensitive data. However, it requires careful use when identifying the actual sensitive data and implementing the corresponding security features in the cache.
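As a rough illustration of the idea, the following sketch models a cache whose lines carry a "sensitive" label and whose flush operation is disabled for labeled victim data. The class and method names are our own, not any specific design's API:

```python
# Rough sketch (names are ours, not a real design's API) of a cache that
# differentiates sensitive data: flushing labeled victim data is refused,
# which blocks Step 1 of flush-based attack patterns.

class LabeledCache:
    def __init__(self):
        self.lines = {}                    # line address -> is_sensitive flag

    def load(self, addr, sensitive=False):
        self.lines[addr] = sensitive       # victim may label critical data

    def flush(self, addr):
        if self.lines.get(addr):           # flushing labeled data is refused
            return False
        self.lines.pop(addr, None)
        return True

    def hit(self, addr):
        return addr in self.lines

c = LabeledCache()
c.load(0x1000, sensitive=True)             # victim labels security-critical data
assert not c.flush(0x1000)                 # the flush is ignored...
assert c.hit(0x1000)                       # ...so the line is still cached
```

Unlabeled data still flushes normally, so only the identified security-critical range pays the cost of the restriction.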
Comparing PL cache with SP* cache: although both of them use partitioning, flushing can be disabled for the victim's sensitive data in PL cache, where Vu ⇝ Va^inv ⇝ Vu (slow) (one type of Flush + Time) is prevented. Newcache is able to prevent Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), while most of the caches without the ability to differentiate sensitive data cannot, because Newcache disallows replacing data as long as either the data to be evicted or the data to be cached is identified as sensitive. However, permitting differentiation of sensitive data can potentially backfire on the cache itself. For example, Random Fill cache cannot prevent Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]), which most of the other caches can prevent or avoid, because the random fill technique loses its intended random behavior when the security-critical data is initially loaded into the cache in Step 1.
Partitioning-Based Caches (columns for SP* cache to the column for PL cache in Tables 4 and 5) usually limit the victim and the attacker to accessing only a limited set of cache blocks. For example, there is either static or dynamic partitioning of caches, which allocates some blocks to the High victim and the Low attacker. The partitioning can be based not just on whether the memory access is the victim's or the attacker's, but also on where the access is to (e.g., the High partition is determined by the data address). For speculative execution, the attacker's code can be part of the speculative or out-of-order load or store, which can be partitioned (e.g., using a speculative load buffer) from other normal operations. The partitioning granularity can be cache sets, cache lines, or cache ways. Partitioning-based secure caches are usually able to prevent external interference by partitioning, but are weak at preventing internal interference. When partitioning is used, interference between the attacker and the victim, or between data belonging to different security levels, should not be possible, and attacks based on external interference between the victim and the attacker will fail. However, internal interference of the victim's own data is hard for partitioning-based caches to prevent. What is more, partitioning is recognized to be wasteful in terms of cache space and inherently degrades system performance [49]. Dynamic partitioning can help limit the negative performance and space impacts, but it could be at the cost of revealing some information when adjusting the partition size for each part. It also does not help with internal interference prevention.
In terms of the three-step model, the partitioning-based caches excel at using partitioning techniques to disallow the attacker from setting initial states (Step 0) of the victim's partition by flushing or eviction, and therefore bring uncertainty to the final timing observation made by the attacker.
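A minimal sketch of static way partitioning follows; the parameters (4 ways per set, ways 0 and 1 reserved for the High/victim partition) and the replacement choice are assumptions for illustration, not taken from any specific design:

```python
# Minimal sketch of static way partitioning: each domain may only fill
# (and therefore evict) its own ways, so the attacker can never evict a
# victim-owned line, blocking external miss-based interference.

class WayPartitionedSet:
    """One cache set whose ways are statically split between two domains."""
    def __init__(self, ways=4, high_ways=(0, 1)):
        self.ways = [None] * ways                   # way -> (domain, addr)
        self.high = set(high_ways)                  # ways reserved for the victim

    def insert(self, domain, addr):
        legal = self.high if domain == 'victim' \
                else set(range(len(self.ways))) - self.high
        way = min(legal)                            # toy replacement choice
        evicted, self.ways[way] = self.ways[way], (domain, addr)
        return evicted                              # only same-domain evictions

s = WayPartitionedSet()
s.insert('victim', 0xA)
assert s.insert('attacker', 0xB) is None            # attacker cannot evict victim
assert s.insert('victim', 0xC) == ('victim', 0xA)   # internal interference remains
```

The last assertion shows the limitation discussed above: partitioning stops the attacker's evictions, but the victim's own accesses still evict the victim's own lines, so internal interference survives.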
SP* cache can prevent external miss-based interference, but it still allows the victim and the attacker to get cache hits due to each other's data, which makes hit-based vulnerabilities possible; e.g., the Vd ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]) vulnerability is one of the examples that SP* cache cannot prevent. SecVerilog cache is similar to SP* cache, but prevents the attacker from directly getting a cache hit due to the victim's data, for confidentiality, and therefore prevents vulnerabilities such as Aa^inv ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). SHARP cache mainly uses partitioning combined with random eviction to minimize the probability of evicting the victim's data and prevent external miss-based vulnerabilities. It is vulnerable to hit-based or internal interference vulnerabilities, such as the Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) vulnerability. DAWG cache will only allow the data to get a cache hit if both its address and the process ID are the same. Therefore, compared with a normal partitioning cache such as SP* cache, it is able to prevent vulnerabilities such as Vd ⇝ Vu ⇝ Ad^inv (fast) (one type of Prime + Flush).
SecDCP and NoMo cache both leverage dynamic partitioning to improve performance. Compared with SecVerilog cache, SecDCP cache introduces certain side
channels which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the Vu ⇝ Aa^inv ⇝ Vu (slow) (one type of Flush + Time) vulnerability. NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved way and allow eviction by the attacker. SecDCP does not have an unreserved way; all the space in the cache belongs to either the High or the Low partition.
Sanctum cache and CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as Ad ⇝ Vu ⇝ Aa (fast) (one type of Flush + Reload [55]). Sanctum cache does not consider internal interference, while CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far supporting prevention of all of the vulnerabilities; but it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation involving the outside world, external interference vulnerabilities such as the Vd ⇝ Vu ⇝ Ad (slow) (one type of Evict + Probe) vulnerability will be prevented.
InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer partition during the speculative or out-of-order load or store. Since the cache state is not actually updated during speculation, the speculative execution cannot trigger any of the steps in the three-step model. RIC cache focuses on eviction-based attacks, and therefore is good at preventing even some internal miss-based vulnerabilities, such as Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]), but is bad at all hit-based vulnerabilities. PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data from setting initial states (Step 0) for the victim's partition, and therefore bring uncertainty to the final observation made by the attacker; e.g., the Ad ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache and columns for RP cache to columns for Non Deterministic cache in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing from a cache hit or miss, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during the speculative or out-of-order load or store and the observed timing from a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities; however, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and can be combined with differentiation of sensitive data to be more efficient.
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Both RP cache and Newcache are unable to prevent hit-based internal-interference vulnerabilities such as Aa^inv ⇝ Vu ⇝ Va (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache access for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially directly loaded into the cache block for Step 1, Random Fill cache will not randomly load the security-critical data, and allows vulnerabilities such as the Vu ⇝ Va^inv ⇝ Vu (slow) (one type of Flush + Time) vulnerability to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as Va ⇝ Vu^inv ⇝ Va^inv (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays, and can prevent all attacks (but at tremendous performance cost).
6.1 Estimated Performance and Security Tradeoffs
Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is
Table 6 Existing secure caches' implementation method, performance, power, and area summary (table content not reproduced here)
the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much more for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory access with the timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory access with timing information, when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu ⇝ Va ⇝ Vu (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad ⇝ Vu ⇝ Va (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu ⇝ Ad ⇝ Vu (slow) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
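The two axes used in these suggestions (internal vs. external interference, and hit-based vs. miss-based observation) can be read directly off a three-step pattern. The following toy classifier is our own illustration of that reading, not a reproduction of the paper's tables:

```python
# Toy classifier (our illustration, not the paper's tables): label a
# three-step pattern as internal vs. external interference and as
# hit-based vs. miss-based, from the actors and the final timing.

def classify(step1, step2, step3, timing):
    """timing is the attacker's observation in Step 3: 'fast' or 'slow'."""
    actors = {actor for actor, _ in (step1, step2, step3)}
    interference = 'internal' if actors == {'V'} else 'external'
    basis = 'hit-based' if timing == 'fast' else 'miss-based'
    return interference, basis

# Vu -> Va -> Vu (slow): one type of Bernstein's Attack
assert classify(('V', 'u'), ('V', 'a'), ('V', 'u'), 'slow') == \
       ('internal', 'miss-based')
# Ad -> Vu -> Aa (fast): one type of Flush + Reload
assert classify(('A', 'd'), ('V', 'u'), ('A', 'a'), 'fast') == \
       ('external', 'hit-based')
```

A design review could run every pattern a cache fails to prevent through such a classifier to see which quadrant (e.g., hit-based internal interference) dominates its weaknesses.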
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16-19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning, to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of the existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3, using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, in Flush + Time, the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
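The three Prime + Probe steps can be walked through on a toy direct-mapped cache; this is our own illustration, and the sizes and addresses are arbitrary:

```python
# Toy direct-mapped cache walking through the three Prime + Probe steps
# (our illustration; sizes and addresses are arbitrary).

SETS = 8
LINE = 64

def set_of(addr):
    return (addr // LINE) % SETS

cache = {}                                   # set index -> (owner, line tag)

def access(owner, addr):
    """True on hit; on a miss the line is filled, evicting the old one."""
    s = set_of(addr)
    hit = cache.get(s) == (owner, addr // LINE)
    cache[s] = (owner, addr // LINE)
    return hit

# Step 1: the attacker primes every set with his or her own data.
for i in range(SETS):
    access('A', i * LINE)

# Step 2: the victim makes one secret-dependent access (unknown to attacker).
secret_addr = 5 * LINE
access('V', secret_addr)

# Step 3: the attacker probes; the single slow (missing) set leaks the index.
misses = [s for s in range(SETS) if not access('A', s * LINE)]
assert misses == [set_of(secret_addr)]       # the attacker learns set 5
```

The same toy model, with the roles of the attacker and the victim swapped in Step 1 or Step 3, reproduces the Evict + Probe and Prime + Time variants described below.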
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
J Hardw Syst Secur (2019) 3:397–425
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
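The difference in what Step 2 reveals in the flush-based versus eviction-based variants can be sketched in a toy single-level model (a hedged illustration with our own function names, not the paper's simulator):

```python
NUM_SETS = 8

def flush_time_miss(secret, known):
    """Flush + Time: Step 2 invalidates one known full address; the victim's
    Step 3 reload misses only if that exact address is the secret's line,
    so trying candidates recovers the full address."""
    return known == secret

def evict_time_miss(secret, known_set):
    """Evict + Time: Step 2 evicts a whole set; the victim's Step 3 reload
    misses whenever the secret merely maps to that set, so the attacker
    learns only the cache index."""
    return secret % NUM_SETS == known_set
```

Two different secret addresses that share a cache index are indistinguishable under eviction, but distinguishable under a targeted flush.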
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities whose names end with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in a representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown addresses refer to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ ⇝ A_a ⇝ V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ A^inv/V^inv ⇝ ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., the vulnerability A^inv ⇝ V_u ⇝ A_a (fast) shown in Table 2. A^inv/V^inv cannot be a candidate for middle steps or the last step because it flushes all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step will be deleted).
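The two subdivision rules can be sketched with one splitting helper (a possible implementation under our own encoding, not the paper's code: a pattern is a list of step labels, `"*"` stands for ⋆, and `"A_inv"`/`"V_inv"` stand for a full flush):

```python
def subdivide(pattern, splitters):
    """Split `pattern` so that every splitter that is not already Step 1
    becomes Step 1 of the next sub-pattern; a splitter left as the last
    step is deleted, as the rules require."""
    out, cur = [], []
    for step in pattern:
        if step in splitters:
            if cur:
                out.append(cur)    # close the sub-pattern before the splitter
            cur = [step]           # the splitter becomes Step 1 of the next
        else:
            cur.append(step)
    if cur and not (len(cur) == 1 and cur[0] in splitters):
        out.append(cur)            # a lone trailing splitter is dropped
    return out
```

Rule 1 then corresponds to `subdivide(pattern, {"*"})`, and Rule 2 is applied to each resulting piece with splitters `{"A_inv", "V_inv"}`.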
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each two adjacent steps, regardless of whether it is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting the pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commuting is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1 For two adjacent steps, if one step is a known_access operation and the other step is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for the corresponding two-step pattern if the two steps can be commuted.
– Commute Rule 2 A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted if the second step is not the last step in the whole memory sequence; such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern in which two adjacent steps are both related to known addresses or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one step following the reduction rules (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7):

Table 7 Rules for combining two adjacent steps. [The table lists, for every possible pair of adjacent steps (First, Second) over the states a, a_alias, d, u and their invalidation counterparts, whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and the resulting combined step where applicable; e.g., the pair (a, a) reduces to a, while the pair (a^inv, a_alias^inv) combines to Union(a^inv, a_alias^inv). Pairs whose first step involves u generally admit no rule.]
– Reduction Rule 1 For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2 Two adjacent known memory access-related operations (known_access operations) always result in a deterministic state of the cache block of the second memory access, so these two steps can be reduced to only one step.
– Reduction Rule 3 For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, no matter in which order, and the address they refer to is the same, the two steps can be reduced to one step, namely the second step.
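The three reduction rules over one adjacent pair can be sketched as follows (a possible encoding of our own: labels such as `"A_a"` or `"A_inv_a"` carry the accessor, an optional `inv`, and the address):

```python
def is_access(step):
    """True for a memory access, False for an invalidation."""
    return "inv" not in step

def addr(step):
    """Extract the address suffix, e.g. 'a', 'd', or 'u'."""
    return step.rsplit("_", 1)[-1]

def reduce_pair(first, second):
    """Return the reduced single step, or None if no reduction rule applies."""
    a1, a2 = addr(first), addr(second)
    # Rule 1: two u accesses target the same unknown u -> keep the second.
    if a1 == "u" and a2 == "u" and is_access(first) and is_access(second):
        return second
    # Rule 2: two known accesses leave the second access's cache block in a
    # deterministic state -> keep the second.
    if a1 != "u" and a2 != "u" and is_access(first) and is_access(second):
        return second
    # Rule 3: a known access and a known invalidation of the same address,
    # in either order -> keep the second.
    if a1 != "u" and a1 == a2 and is_access(first) != is_access(second):
        return second
    return None
```

Pairs of invalidations of different known addresses are deliberately left to the Union Rule below.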
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1 Two invalidations of two known, different memory addresses can have Union Rule 1 applied: the two known_inv operations both invalidate some known address, therefore they can be combined into only one step. The Union Rule can be applied continuously to union all adjacent invalidation steps that invalidate known, different memory addresses.
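Union Rule 1 can be sketched with the same label encoding as the reduction sketch above (a hedged illustration, not the paper's code):

```python
def union_pair(first, second):
    """Combine two adjacent invalidations of different known addresses into
    one joint step; return None if Union Rule 1 does not apply."""
    def known_inv(step):
        # an invalidation whose address is known (i.e., not the unknown u)
        return "inv" in step and not step.endswith("_u")
    if known_inv(first) and known_inv(second) and first != second:
        return f"Union({first},{second})"
    return None
```

Applying it repeatedly over adjacent steps merges whole runs of known-address invalidations into one joint step.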
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known_access and known_inv operations that target the same address as near as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then, the Union Rules are applied to the processed memory sequence.
The recursion at each application of these three categories of rules should always apply and reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps in the following patterns, or fewer:

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:

– If followed by a known_access operation, A^inv/V^inv can be removed due to the actual state subsequently put into the cache block.
– If followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A^inv_d/V^inv_d ⇝ u operation, where V_u ⇝ A^inv_d/V^inv_d can further have Commute Rule 2 applied to reduce the sequence to within three steps.

In this case, the steps are finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the above rules can be applied; these require the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias ⇝ V_u
– A_a/V_a/A_aalias/V_aalias ⇝ V^inv_u
– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A^inv_a/V^inv_a/A^inv_aalias/V^inv_aalias
– V^inv_u ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β – the number of steps of the pattern; pattern – a two-dimensional dynamic-size array; pattern[0] contains the states of each step of the original pattern in order; pattern[1], pattern[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array results. It will be an empty list if the original pattern does not correspond to an effective vulnerability.

1: results = ∅
2: while contain(pattern, ⋆) and its index is not 0 do
3:     pattern = Subdivision_Rule_1(pattern)
4: end while
5: while (contain(pattern, A^inv) and its index is not 0) or (contain(pattern, V^inv) and its index is not 0) do
6:     pattern = Subdivision_Rule_2(pattern)
7: end while
8: while not (pattern is_ineffective or has_interval_effective_three_steps) do
9:     pattern = Commute_Rules(pattern)
10:    pattern = Reduction_Rules(pattern)
11:    pattern = Union_Rules(pattern)
12:    if pattern is_ineffective or has_interval_effective_three_steps then
13:        results += pattern
14:    end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to either output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
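The reduce-to-fixpoint core of the algorithm can be sketched in a simplified, self-contained form (our own helper names and a toy pair rule, not the paper's implementation):

```python
def reduce_adjacent(pattern, pair_rule):
    """One pass: replace the first reducible adjacent pair with its result."""
    for i in range(len(pattern) - 1):
        merged = pair_rule(pattern[i], pattern[i + 1])
        if merged is not None:
            return pattern[:i] + [merged] + pattern[i + 2:]
    return pattern

def reduce_to_fixpoint(pattern, pair_rule):
    """Keep applying passes until the pattern stops shrinking."""
    while True:
        nxt = reduce_adjacent(pattern, pair_rule)
        if nxt == pattern:
            return pattern
        pattern = nxt

# Toy pair rule for illustration: two adjacent operations on the same
# address keep only the second one.
same_addr = lambda x, y: y if x.split("_")[-1] == y.split("_")[-1] else None
```

In the full algorithm the pair rule is the combination of the Commute, Reduction, and Union Rules, and the loop stops once the pattern is ineffective or maps to effective three-step vulnerabilities.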
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture (ISCA), IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
channels, which manifest themselves when the number of ways assigned to the victim and the attacker changes, e.g., the V_u ⇝ A^inv_a ⇝ V_u (slow) (one type of Flush + Time) vulnerability. The NoMo cache behaves more carefully when changing the number of ways during dynamic partitioning; however, it requires the victim's sensitive data to fit into the assigned partitions, otherwise the data will be put into the unreserved ways and be open to eviction by the attacker. SecDCP does not have unreserved ways: all the space in the cache belongs to either the High or the Low partition.
The Sanctum cache and the CATalyst cache are both controlled by a powerful software monitor, and they disallow secure page sharing between the victim and the attacker to prevent vulnerabilities such as A_d ⇝ V_u ⇝ A_a (fast) (one type of Flush + Reload [55]). The Sanctum cache does not consider internal interference, while the CATalyst cache is more carefully designed to prevent different vulnerabilities with the implemented software system, so far preventing all of the vulnerabilities; however, it only works for the LLC, with high software implementation complexity and some assumptions that might be hard to achieve in other scenarios, e.g., assuming the secure partition is big enough to fit all the secure data. The MI6 cache is the combination of Sanctum and disabling speculation when interacting with the outside world. Therefore, in normal execution it behaves the same as Sanctum. For speculative execution, because it simply disables all speculation when the outside world is involved, external interference vulnerabilities such as the V_d ⇝ V_u ⇝ A_d (slow) (one type of Evict + Probe) vulnerability will be prevented.
The InvisiSpec cache does not modify the original cache state, but places the data in a speculative buffer during speculation or an out-of-order load or store. Since the cache state is not actually updated during speculation, speculative execution cannot trigger any of the steps in the three-step model. The RIC cache focuses on eviction-based attacks, and is therefore good at preventing even some internal miss-based vulnerabilities, such as V_u ⇝ V_a ⇝ V_u (slow) (one type of Bernstein's Attack [3]), but bad at all hit-based vulnerabilities. The PL cache is line-partitioned and uses locking techniques for the victim's security-critical data. It can prevent many vulnerabilities, because preloading and locking secure data disallow the attacker or non-secure victim data to set initial states (Step 0) for the victim partition, and therefore bring uncertainty to the final observation by the attacker; e.g., the A_d ⇝ V_u ⇝ V_a (fast) (one type of Cache Internal Collision [5]) vulnerability is prevented.
Randomization-Based Caches (columns for SHARP cache, and columns for RP cache through Non Deterministic cache, in Tables 4 and 5) inherently de-correlate the relationship between the address of the victim's security-critical data and the observed timing of cache hits or misses, or between the address and the observed timing of flush or cache coherence operations. For speculative execution, they also de-correlate the relationship between the address of the data being accessed during speculative execution or an out-of-order load or store and the observed timing of a cache hit or miss. Randomization can be used when bringing data into the cache, when evicting data, or both. Some designs randomize the address-to-cache-set mapping. As a result of the randomization, the mutual information from the observed timing, due to having or not having data in the cache, could be reduced to 0 if randomization is done on every memory access. Some secure caches use randomization to avoid many of the miss-based internal interference vulnerabilities. However, they may still suffer from hit-based vulnerabilities, especially when the vulnerabilities are related to internal interference. Randomization is likewise recognized to increase performance overheads [23]. It also requires a fast and secure random number generator. Most of the randomization is cache-line-based, and it can be combined with differentiation of sensitive data to be more efficient.
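The de-correlation argument can be made concrete with a toy experiment (entirely illustrative; the 8-set cache model, the single probed set, and the sample counts are assumptions of this sketch, not from the paper): estimate the mutual information between the victim's secret cache set and the attacker's hit/miss observation, with and without re-randomizing the address-to-set mapping on every access.

```python
import random
from collections import Counter
from math import log2

def mutual_information(samples):
    """Estimate I(secret; observation) in bits from (secret, obs) pairs."""
    n = len(samples)
    joint = Counter(samples)
    ps = Counter(s for s, _ in samples)
    po = Counter(o for _, o in samples)
    return sum(c / n * log2((c / n) / ((ps[s] / n) * (po[o] / n)))
               for (s, o), c in joint.items())

def observe(secret_set, probed_set=0, randomize=False, sets=8):
    """The attacker primes probed_set; the victim's access evicts it iff the
    victim's data maps there. With randomize=True the victim's set is
    re-drawn on every access, as in per-access randomized set indexing."""
    if randomize:
        secret_set = random.randrange(sets)
    return "miss" if secret_set == probed_set else "hit"

random.seed(1)
for randomize in (False, True):
    secrets = [random.randrange(8) for _ in range(20000)]
    samples = [(s, observe(s, randomize=randomize)) for s in secrets]
    print(randomize, round(mutual_information(samples), 3))
```

With deterministic indexing the estimate is about 0.54 bits per observation; with per-access randomization it drops to roughly zero, since the observation becomes independent of the secret.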
RP cache allows eviction between different sensitive data, which leaves vulnerabilities such as V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) still possible, while Newcache prevents this. Neither RP cache nor Newcache is able to prevent hit-based internal-interference vulnerabilities such as A^inv_a → V_u → V_a (fast) (one type of Cache Internal Collision [5]). Random Fill cache is able to use total de-correlation of memory access and cache fill for the victim's security-critical data to prevent most of the internal and external interference. However, when security-critical data is initially loaded directly into the cache block in Step 1, Random Fill cache will not randomly load the security-critical data, and it allows vulnerabilities such as V_u → V^inv_a → V_u (slow) (one type of Flush + Time) to exist. CEASER cache uses an encryption scheme plus dynamic remapping to randomize the mapping from memory addresses to cache sets. However, this targets eviction-based attacks and cannot prevent hit-based vulnerabilities such as V_a → V^inv_u → V^inv_a (fast) (one type of Flush + Probe Invalidation). SCATTER cache encrypts both the cache address and the process ID when mapping into different cache indices, to further prevent more hit-based vulnerabilities for shared and read-only memory. Non Deterministic Cache totally randomizes the timing of cache accesses by adding delays and can prevent all attacks (but at tremendous performance cost).
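To illustrate the idea behind encrypted or keyed set-index mapping (a toy stand-in for the actual low-latency ciphers of CEASER or SCATTER cache; the `keyed_index` function and its integer seeding are assumptions of this sketch): two addresses that always collide under conventional modulo indexing rarely collide under a keyed mapping, and changing the key plays the role of CEASER's periodic dynamic remapping.

```python
import random

SETS = 64

def keyed_index(addr, key, sets=SETS):
    """Toy keyed address-to-set mapping: a deterministic pseudo-random draw
    seeded by (addr, key). A real design would use a low-latency cipher."""
    return random.Random(addr * 1_000_003 + key).randrange(sets)

# Address pairs that always collide under conventional modulo indexing.
pairs = [(a, a + SETS * k) for a in range(32) for k in (1, 2)]
same_plain = sum(x % SETS == y % SETS for x, y in pairs) / len(pairs)
same_keyed = sum(keyed_index(x, key=7) == keyed_index(y, key=7)
                 for x, y in pairs) / len(pairs)
print(same_plain, same_keyed)   # 1.0 vs. roughly 1/64
```

An attacker who builds an eviction set for one key loses it after remapping, which is why this technique targets eviction-based attacks in particular.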
6.1 Estimated Performance and Security Tradeoffs
J Hardw Syst Secur (2019) 3:397–425

Table 6 Existing secure caches' implementation method, performance, power, and area summary

Table 6 shows the implementation and performance results of the secure caches, as listed by the designers in the different papers. At the extreme end, there is the Non Deterministic cache with random delay; this secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the observed timing. The secure caches that are able to prevent more vulnerabilities usually have weaker performance compared with the other secure caches. In general, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:

– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate the truly sensitive data from other data in the victim code. However, identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.

• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels, e.g., CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model, and it inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as through Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this respect. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs, to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have them) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision and leaks the value of u.
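The three steps above can be sketched with a toy direct-mapped cache model (entirely hypothetical; a real attack distinguishes hits from misses by measuring access latency rather than reading a return value):

```python
def internal_collision(secret_addr, known_addr=3, sets=8):
    lines = {}                                # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag                        # line is filled on a miss
        return hit
    # Step 1: the known address starts out uncached (it was flushed/evicted)
    access(secret_addr)                       # Step 2: victim accesses secret u
    return access(known_addr)                 # Step 3: hit iff u == known address

print(internal_collision(3), internal_collision(5))   # prints: True False
```

A hit in Step 3 occurs only when the secret-dependent access in Step 2 brought the known address back, i.e., the two addresses collide.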
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker performs some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss lets the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2, the Flush + Time attacker invalidates some known address, allowing it to find the full address of the secret data, whereas the Evict + Time attacker evicts a cache set and can only find the secret data's cache index.
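A minimal sketch of these steps, using a toy direct-mapped cache where the attacker evicts a set by accessing a congruent address (the cache model and the address arithmetic are assumptions of this sketch, and the attacker's eviction address is assumed to differ from the secret's):

```python
def evict_time(secret_addr, evict_set, sets=8):
    lines = {}                               # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    access(secret_addr)                      # Step 1: victim loads its secret data
    access(evict_set + 2 * sets)             # Step 2: attacker accesses a congruent
                                             # address, evicting set evict_set
    return not access(secret_addr)           # Step 3: miss iff secret is in that set

print(evict_time(10, 2), evict_time(10, 3))   # prints: True False
```

Repeating this over every candidate set recovers the secret's cache index, which is exactly the set-granularity leakage described above.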
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
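These steps can be sketched with the same kind of toy direct-mapped cache model (hypothetical; a real attacker infers hits and misses from measured latency, and primes whole associative sets rather than single lines):

```python
def prime_probe(victim_secret_addr, sets=8):
    lines = {}                                  # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    attacker_addrs = range(100, 100 + sets)     # one attacker address per set
    for a in attacker_addrs:                    # Step 1: prime every set
        access(a)
    access(victim_secret_addr)                  # Step 2: victim's secret access
    # Step 3: probe; a miss reveals the set the secret data maps to
    return [a % sets for a in attacker_addrs if not access(a)]

print(prime_probe(42))   # prints: [2]
```

The returned list contains the index of every set where the attacker's primed line was evicted, i.e., where the victim's secret access landed.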
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim performs the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, it tells the attacker that this cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that this cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that performs the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using accesses to data at addresses known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that performs the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, it tells the attacker that the address he or she invalidated in Step 2 maps to the secret data.
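A toy sketch of these steps (the cache model is an assumption of this sketch): unlike the set-wide eviction in Evict + Time, the flush in Step 2 only removes the line when the full address matches, which is why a miss in Step 3 reveals the exact secret address.

```python
def flush_time(secret_addr, flushed_addr, sets=8):
    lines = {}                                # direct-mapped: set index -> tag
    def access(addr):
        s, tag = addr % sets, addr // sets
        hit = lines.get(s) == tag
        lines[s] = tag
        return hit
    access(secret_addr)                       # Step 1: victim accesses secret data
    s, tag = flushed_addr % sets, flushed_addr // sets
    if lines.get(s) == tag:                   # Step 2: attacker flushes a known
        del lines[s]                          # address (removed only on full match)
    return not access(secret_addr)            # Step 3: miss iff addresses matched

print(flush_time(10, 10), flush_time(10, 18))   # prints: True False
```

Address 18 maps to the same set as 10 in this model but has a different tag, so the flush is a no-op and Step 3 still hits.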
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (one containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆ (since the cache state gives no information) and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference between memory-related operations and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern ..., ⋆, ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step is simply deleted).
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern ..., A^inv/V^inv, ..., the longer pattern can be divided into two separate patterns, where A^inv/V^inv is assigned as Step 1 of the second pattern. This is because A^inv/V^inv flushes all the timing information of the current block, and it can serve as the flushing step for Step 1, e.g., in the A^inv → V_u → A_a (fast) vulnerability shown in Table 2. A^inv/V^inv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the victim's sensitive address information from the final timing. This rule should be recursively applied until there are no sub-patterns left with A^inv/V^inv in the middle or as the last step (an A^inv/V^inv in the last step is simply deleted).
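As a toy illustration of the two subdivision rules (the list encoding, the step labels such as "STAR" for ⋆ and "A_inv"/"V_inv" for A^inv/V^inv, and the merging of both rules into one pass are all assumptions of this sketch, not the paper's implementation):

```python
def subdivide(pattern, splitters=("STAR", "A_inv", "V_inv")):
    """Split a long pattern at every ⋆ / A^inv / V^inv (Subdivision Rules 1
    and 2): the splitter becomes Step 1 of the next sub-pattern, and a
    splitter left as the very last step is deleted."""
    out, cur = [], []
    for step in pattern:
        if step in splitters and cur:
            out.append(cur)       # close the current sub-pattern
            cur = [step]          # splitter heads the next sub-pattern
        else:
            cur.append(step)
    if cur and not (len(cur) == 1 and cur[0] in splitters):
        out.append(cur)           # drop a trailing splitter-only pattern
    return out

print(subdivide(["A_a", "STAR", "V_u", "A_a", "A_inv"]))
# prints: [['A_a'], ['STAR', 'V_u', 'A_a']]
```

Each resulting sub-pattern is then handed to the simplification rules of the next subsection.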
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. Table 7 shows all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether a step is the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ..., M, N, .... If commuting M and N leads to the same observation result, i.e., ..., M, N, ... and ..., N, M, ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of commuting pairs of steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: If one of the two adjacent steps is a known_access operation, the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter which positions they occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column if the two steps can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted when the second of the two adjacent steps is not the last step of the whole memory sequence. That is, some pairs of adjacent steps can only be commuted if the second step is not the last step of the whole memory sequence; such pairs have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column of Table 7.
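In a toy encoding where each step is a (kind, address) pair (the encoding is my own, not the paper's), Commute Rule 1 can be sketched as a predicate:

```python
def commute_rule_1(step1, step2):
    """Commute Rule 1 in a toy (kind, address) encoding: a known access and a
    known invalidation that refer to different addresses can be swapped."""
    (k1, a1), (k2, a2) = step1, step2
    both_known = "u" not in (a1, a2)           # neither step targets u
    return both_known and {k1, k2} == {"access", "inv"} and a1 != a2

print(commute_rule_1(("access", "a"), ("inv", "d")))   # prints: True
print(commute_rule_1(("access", "a"), ("inv", "a")))   # prints: False
```

Two operations of the same kind, or operations on the same address, do not commute freely; those cases are handled by the Reduction and Union Rules instead.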
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern with two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules below (if the two-step pattern has "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps (for each pair of adjacent steps, the columns First and Second give the pair, and the columns Commute Rule 1, Commute Rule 2, Union Rule or Reduce Rule, and Combined Step give whether each rule applies and the combined result)
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both operations target the same u, so the two steps can be reduced to keep only the second one in the memory sequence.

– Reduction Rule 2: For two adjacent known memory access–related operations (known_access operations), they always result in a deterministic state of the second operation's cache block, so the two steps can be reduced to only the second one.

– Reduction Rule 3: For two adjacent steps where one is a known_access operation and the other is a known_inv operation referring to the same address, in either order, the two steps can be reduced to one step, namely the second one.
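The three reduction rules can be sketched as a single pass over adjacent steps (the (kind, address) encoding and the one-pass structure are assumptions of this sketch; Table 7 in the paper is the authoritative case table):

```python
def reduce_once(pattern):
    """One pass of the three Reduction Rules. Each step is a (kind, address)
    pair with kind in {"access", "inv"} and address "u" for the unknown
    address, or a concrete label such as "a" or "d"."""
    out = []
    for step in pattern:
        if out:
            (k1, a1), (k2, a2) = out[-1], step
            rule1 = a1 == a2 == "u"                               # two u operations
            rule2 = k1 == k2 == "access" and "u" not in (a1, a2)  # two known accesses
            rule3 = a1 == a2 != "u" and {k1, k2} == {"access", "inv"}
            if rule1 or rule2 or rule3:
                out[-1] = step        # each rule keeps only the second step
                continue
        out.append(step)
    return out

print(reduce_once([("access", "a"), ("access", "d"), ("inv", "d"),
                   ("access", "u"), ("access", "u")]))
# prints: [('inv', 'd'), ('access', 'u')]
```

Repeated application of such passes, interleaved with the Commute and Union Rules, is what shrinks a β > 3 pattern toward three steps.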
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ..., M, N, .... If combining M and N leads to the same timing observation result, i.e., ..., M, N, ... and ..., Union(M, N), ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). The two adjacent steps that can be combined are discussed in the following case:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined: both known_inv operations invalidate some known address, so they can be combined into a single step. This rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules are applied first, to put known_access and known_inv operations that target the same address as near to each other as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.

Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not_u operations is further reduced to a sequence with at most three steps, in one of the following patterns (or fewer steps):

– u operation, not_u operation, u operation
– not_u operation, u operation, not_u operation

There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known_access operation, A^inv/V^inv can be removed, because the actual state is subsequently put into the cache block.

– If it is followed by a known_inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv → V_u → not_u operation → u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv → V_u → A^inv_d/V^inv_d → u operation, where V_u → A^inv_d/V^inv_d can further be reduced by applying Commute Rule 2, bringing the sequence within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied; these require the remaining checks below.
The only remaining pairs of two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a, A_{a_alias}/V_{a_alias}, A_d/V_d, A^inv_a/V^inv_a, or A^inv_{a_alias}/V^inv_{a_alias} followed by V_u
– A_a/V_a or A_{a_alias}/V_{a_alias} followed by V^inv_u
– V_u followed by A_a/V_a, A_{a_alias}/V_{a_alias}, A_d/V_d, A^inv_a/V^inv_a, or A^inv_{a_alias}/V^inv_{a_alias}
– V^inv_u followed by A_a/V_a or A_{a_alias}/V_{a_alias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these two steps can either generate two-adjacent-step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; pattern, a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: reduced effective vulnerability pattern(s) array vul_list; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  vul_list = Ø
2:  while pattern contains ⋆ and its index is not 0 do
3:      pattern = Subdivision_Rule_1(pattern)
4:  end while
5:  while (pattern contains A^inv and its index is not 0) or (pattern contains V^inv and its index is not 0) do
6:      pattern = Subdivision_Rule_2(pattern)
7:  end while
8:  while not (pattern is ineffective or pattern has interval effective three steps) do
9:      pattern = Commute_Rules(pattern)
10:     pattern = Reduction_Rules(pattern)
11:     pattern = Union_Rules(pattern)
12:     if (pattern is ineffective or pattern has interval effective three steps) then
13:         vul_list += effective_three_steps(pattern)
14:     end if
15: end while
16: return vul_list
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall under neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern is either not a vulnerability, or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC 2014), 40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) Cacti 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table 6 Existing secure caches' implementation method, performance, power, and area summary (table body omitted)
the Non Deterministic cache with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only a 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses with timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches. In short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, i.e., both external and internal interference, and both hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design which can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks, and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data to de-correlate memory accesses with timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the V_u → V_a → V_u (slow) (one type of Bernstein's Attack [3]) vulnerability.
– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the A_d → V_u → V_a (fast) (Cache Internal Collision) vulnerability.
– To limit internal interference at a lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate real sensitive data from other data in the victim code. However, identification of sensitive information needs to be used carefully; e.g., Random Fill cache is vulnerable to the V_u → A_d → V_u (fast) (one type of Evict + Time [34]) vulnerability, which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at disallowing the attacker to set a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
It should be noted that some cache designs only focus on certain levels; e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing–based side-channel vulnerabilities using a three-step model, and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures, and leveraged mutual information to measure potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph, and used the attacker's success probability to estimate different caches' ability to defend against some cache timing–based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing–based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategy Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, it reveals that there is an internal collision, and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3 it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading of Step 1's data and observation of a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Evict + Time the attacker evicts a cache set, allowing it to find only the secret data's cache index, whereas in the Flush + Time attack the attacker invalidates some known address and can thus find the full address of the secret data.
Prime + Probe In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
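As an illustration, such three-step strategies can be encoded as (Step 1, Step 2, Step 3) tuples and matched against a concrete trace. The step names follow the paper's notation (A = attacker, V = victim, u = unknown/secret address), but the dictionary and helper below are our own hypothetical sketch, not the paper's tables.

```python
# Hypothetical encoding of a few three-step attack strategies as
# (Step 1, Step 2, Step 3) tuples; illustrative values only.

STRATEGIES = {
    "Cache Internal Collision": ("V_inv", "V_u", "V_a"),  # hit in Step 3
    "Flush + Reload":           ("A_inv", "V_u", "A_a"),  # hit in Step 3
    "Prime + Probe":            ("A_a",   "V_u", "A_a"),  # miss in Step 3
}

def classify(three_steps):
    """Return the name of the strategy matching a three-step trace."""
    for name, pattern in STRATEGIES.items():
        if pattern == three_steps:
            return name
    return None
```

For instance, classify(("A_a", "V_u", "A_a")) returns "Prime + Probe".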
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger the miss-based attack. For one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker that the cache set is the one the secret data maps to. For another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker that the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts the data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than the normal memory access–related operations.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model, to demonstrate that the three-step model can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the below analysis, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities, but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. The unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and with Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of these types is found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, it satisfies the minimum requirement of an interference for memory-related operations, and corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ → A_a → V_u. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
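The exhaustive test can be pictured as enumerating every three-step combination over a step alphabet. The alphabet below is simplified for illustration (the paper's simulator uses the full set of attacker/victim access, alias, and invalidation states), so the candidate count here differs from the paper's full enumeration.

```python
# Sketch of the exhaustive enumeration behind the beta = 3 analysis,
# over a deliberately simplified 7-symbol step alphabet.

from itertools import product

STEPS = ["A_a", "A_d", "A_inv", "V_a", "V_u", "V_inv", "star"]

candidates = list(product(STEPS, repeat=3))
print(len(candidates))  # 343 candidate patterns (7^3) to examine
```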
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1 If the longer pattern contains a sub-pattern with a ⋆ step, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2 Next, if a pattern (derived after recursive application of Rule 1) contains an A_inv/V_inv sub-pattern, the longer pattern can be divided into two separate patterns, where A_inv/V_inv is assigned as Step 1 of the second pattern. This is because A_inv/V_inv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1; e.g., the vulnerability A_inv → V_u → A_a (fast) shown in Table 2. A_inv/V_inv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing together with the victim's sensitive address information. This rule should be recursively applied until there are no sub-patterns left with an A_inv/V_inv in the middle or the last step (an A_inv/V_inv in the last step will be deleted).
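Subdivision Rule 1 can be sketched as follows, assuming a pattern is a list of step names with "star" standing for the ⋆ step; this encoding is our own illustration, not the paper's implementation.

```python
# Sketch of Subdivision Rule 1: an interior star closes the current
# sub-pattern and becomes Step 1 of the next one; a lone trailing
# star is deleted.

def subdivide(pattern):
    sub_patterns, current = [], []
    for step in pattern:
        if step == "star":
            if current:                  # close the pattern before the star
                sub_patterns.append(current)
            current = ["star"]           # star opens the next sub-pattern
        else:
            current.append(step)
    if current and current != ["star"]:  # drop a lone trailing star
        sub_patterns.append(current)
    return sub_patterns
```

For example, subdivide(["A_a", "star", "V_u", "A_a"]) yields [["A_a"], ["star", "V_u", "A_a"]].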
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7 we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether they are the attacker's accesses (A) or the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence. If commuting M and N leads to the same observation result, i.e., the sequences containing M followed by N and N followed by M give the attacker the same timing observation information in the final step, then we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try all possible combinations, commuting different pairs of steps that are able to apply the Commute Rules, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the proper commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter in which positions of the whole memory sequence the two steps are. Table 7 shows a "yes" in the Commute Rule 1 column for two-step patterns that can be commuted in this way.
– Commute Rule 2: A superset of the two-step patterns to which Commute Rule 1 applies can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some pairs of adjacent steps can only be commuted under this condition; these have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
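Commute Rule 1 can be sketched with a simple (operation, address) encoding of steps; the encoding is ours, for illustration only.

```python
# Illustrative check for Commute Rule 1: a known access and a known
# invalidation of *different* addresses may swap places anywhere in
# the sequence.  A step is an (op, address) pair with op in
# {"access", "inv"} and address in {"a", "a_alias", "d", "u"}.

def can_commute_rule1(m, n):
    m_op, m_addr = m
    n_op, n_addr = n
    both_known = "u" not in (m_addr, n_addr)
    access_inv_pair = {m_op, n_op} == {"access", "inv"}
    return both_known and access_inv_pair and m_addr != n_addr
```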
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern with two adjacent steps that are both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules (if the two-step pattern has a "yes" in the "Union Rule or
Table 7 Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table indicates whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step where applicable (table body omitted)
Reduce Rule" column and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access-related operations (known access operations), the second operation always leaves the accessed cache block in a deterministic state, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps where one is a known access operation and the other is a known inv operation, if both refer to the same address then, regardless of their order, the two steps can be reduced to one step, namely the second step.
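A minimal sketch of these three Reduction Rules as a pass over a memory sequence. The tuple encoding (actor, address, kind) and the helper names are our own illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of the three Reduction Rules. Steps are encoded as
# (actor, address, kind), where kind is "access" or "inv" and the address
# "u" marks an unknown address (an assumed encoding for this sketch).
def reduce_once(seq):
    for i in range(len(seq) - 1):
        (a1, addr1, k1), (a2, addr2, k2) = seq[i], seq[i + 1]
        # Rule 1: two u operations targeting the same unknown address u;
        # keep only the second access.
        if addr1 == "u" and addr2 == "u" and k1 == k2 == "access":
            return seq[:i] + seq[i + 1:]
        # Rule 2: two adjacent known access operations; the second leaves
        # the block in a deterministic state, so keep one step.
        if addr1 != "u" and addr2 != "u" and k1 == k2 == "access":
            return seq[:i] + seq[i + 1:]
        # Rule 3: a known access and a known invalidation of the same
        # address, in either order; keep only the second step.
        if addr1 == addr2 != "u" and {k1, k2} == {"access", "inv"}:
            return seq[:i] + seq[i + 1:]
    return seq

def reduce_fixpoint(seq):
    # Apply the rules repeatedly until no further reduction is possible.
    prev = None
    while prev != seq:
        prev, seq = seq, reduce_once(seq)
    return seq
```

Each rule removes the first step of the offending pair, matching the "keep the second step" wording above.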
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence {..., M, N, ...}. If combining M and N leads to the same timing observation result, i.e., {..., M, N, ...} and {..., Union(M, N), ...} give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can be combined by Union Rule 1: two known inv operations each invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
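Union Rule 1 can likewise be sketched as a single pass. The encoding of a merged step (addresses joined with "+") and the helper name are our own illustrative choices:

```python
# Illustrative sketch of Union Rule 1. Steps are (actor, address, kind);
# "u" marks an unknown address (assumed encoding for this sketch).
def union_rule_1(seq):
    """Merge runs of adjacent invalidations of known, distinct addresses
    into a single Union(...) step."""
    out = []
    for step in seq:
        actor, addr, kind = step
        if (out and kind == "inv" and addr != "u"
                and out[-1][2] == "inv" and out[-1][1] != "u"
                and addr not in out[-1][1].split("+")):
            # Combine with the previous invalidation step.
            prev_actor, prev_addr, _ = out.pop()
            out.append((prev_actor + "/" + actor, prev_addr + "+" + addr, "inv"))
        else:
            out.append(step)
    return out
```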
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.
At each application of these three categories of rules, the recursion should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer steps):

– u operation, not u operation, u operation

– not u operation, u operation, not u operation
There might be a possible extra ⋆ or A_inv/V_inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A_inv/V_inv in the first step:

– If it is followed by a known access operation, the A_inv/V_inv can be removed, due to the actual state the access puts into the cache block.

– If it is followed by a known inv operation or V_inv_u, the A_inv/V_inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be {A_inv/V_inv, V_u, not u operation, u operation}, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or {A_inv/V_inv, V_u, A_inv_d/V_inv_d, u operation}, where {V_u, A_inv_d/V_inv_d} can further have Commute Rule 2 applied and be reduced to within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied, requiring the Rest Checking.
The only two adjacent step patterns that cannot be processed by any of the three categories of Rules are the following:
– {A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_inv_a/V_inv_a/A_inv_aalias/V_inv_aalias, V_u}

– {A_a/V_a/A_aalias/V_aalias, V_inv_u}

– {V_u, A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_inv_a/V_inv_a/A_inv_aalias/V_inv_aalias}

– {V_inv_u, A_a/V_a/A_aalias/V_aalias}

We manually checked all of the two adjacent step patterns
above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; pattern: a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: vulnerabilities: the reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: vulnerabilities = Ø
2: while contain(pattern, ⋆) and the index of ⋆ is not 0 do
3:     pattern = subdivision_rule_1(pattern)
4: end while
5: while (contain(pattern, A_inv) and its index is not 0) or (contain(pattern, V_inv) and its index is not 0) do
6:     pattern = subdivision_rule_2(pattern)
7: end while
8: while not (is_ineffective(pattern) or has_interval_effective_three_steps(pattern)) do
9:     pattern = commute_rules(pattern)
10:    pattern = reduction_rules(pattern)
11:    pattern = union_rules(pattern)
12:    if has_interval_effective_three_steps(pattern) then
13:        vulnerabilities += interval_effective_three_steps(pattern)
14:    end if
15: end while
16: return vulnerabilities
two steps either generates two adjacent step patterns that can be processed by the three categories of Rules, so that further steps can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
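The reduce-and-check loop of Algorithm 2 can be sketched in runnable form as below. The step encoding, the stand-in rule pass, and the helper names are our own illustrative assumptions, not the paper's implementation; the real Commute, Reduction, and Union Rule passes would be plugged in as rule_passes:

```python
# Sketch of Algorithm 2's reduce-and-check loop (illustrative only).
def reduce_and_check(pattern, rule_passes, is_effective_three_step, max_iters=100):
    """Apply rule passes until the pattern stops shrinking; collect any
    effective three-step windows found along the way."""
    vulnerabilities = []
    for _ in range(max_iters):
        before = len(pattern)
        for rule in rule_passes:           # Commute, then Reduction, then Union
            pattern = rule(pattern)
        for i in range(len(pattern) - 2):  # scan every three-step window
            window = pattern[i:i + 3]
            if is_effective_three_step(window) and window not in vulnerabilities:
                vulnerabilities.append(window)
        if len(pattern) == before:         # fixed point: no rule reduced a step
            break
    return pattern, vulnerabilities

# Stand-in rule pass: drop a step that is immediately repeated
# (in the spirit of Reduction Rule 2; not the paper's actual pass).
def drop_adjacent_duplicates(pattern):
    out = []
    for s in pattern:
        if not out or out[-1] != s:
            out.append(s)
    return out
```

In use, is_effective_three_step would consult the table of effective three-step vulnerability types produced by the cache three-step simulator.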
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE: a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: 40th European solid state circuits conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
the Non Deterministic cache with random delay, the secure cache can prevent all the cache timing-based vulnerabilities to some degree. While their paper reports only 7% degradation in performance, we expect it to be much higher for applications more complex than the AES algorithm. Disabling caches eliminates the attacks, but at a huge performance cost. Normally, a secure cache needs to sacrifice some performance in order to de-correlate memory accesses from the timing. The secure caches that tend to be able to prevent more vulnerabilities usually have weaker performance compared with other secure caches; in short, more security seems to imply less performance.
6.2 Towards Ideal Secure Cache
Based on the above analysis, a good secure cache should consider all the 72 types of Strong vulnerabilities, e.g., external and internal interference, and hit-based and miss-based vulnerabilities. Considering all factors, and based on Tables 4 and 5, we have several suggestions and observations for a secure cache design that can defend against timing-based attacks:
• Internal interference is important for caches to prevent timing-based attacks and is the weak point of most of the secure caches. To prevent it, the following three subpoints should be considered:
– Miss-based internal interference can be solved by randomly evicting data, to de-correlate memory accesses from timing information when either the data to be evicted or the data to be cached is sensitive; e.g., Newcache prevents the Vu → Va → Vu (slow) vulnerability (one type of Bernstein's Attack [3]).

– Hit-based internal interference can be solved by randomly bringing data into the cache; e.g., Random Fill cache prevents the Ad → Vu → Va (fast) vulnerability (Cache Internal Collision).

– To limit internal interference at lower performance cost, rather than simply assuming all of the victim's data is sensitive, it is better to differentiate truly sensitive data from other data in the victim code. However, the identification of sensitive information needs to be done carefully; e.g., Random Fill cache is vulnerable to the Vu → Ad → Vu (fast) vulnerability (one type of Evict + Time [34]), which most of the secure caches are able to prevent.
• Direct partitioning between the victim and the attacker, although it may hurt cache space utilization or performance, is good at preventing the attacker from setting a known initial state in the victim's partition, and therefore prevents external interference. Alternatively, careful use of randomization can also prevent external interference.
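A toy sketch of two of the techniques above, static partitioning against external interference and random fill against miss-based correlation. The PartitionedCache and random_fill names and models are our own illustrative assumptions, not any specific secure cache design:

```python
import random

# Toy statically partitioned cache: each security domain owns a fixed set of
# ways, so the attacker's accesses can never evict the victim's lines
# (illustrative model only).
class PartitionedCache:
    def __init__(self, ways_per_domain=2):
        self.lines = {"victim": [None] * ways_per_domain,
                      "attacker": [None] * ways_per_domain}

    def access(self, domain, addr):
        ways = self.lines[domain]      # each domain can only use its own ways
        hit = addr in ways
        if not hit:
            ways.pop(0)                # miss: evict within this domain only
            ways.append(addr)
        return hit

# Toy random fill: on a miss, bring in a random line from a neighborhood of
# the requested address instead of the requested line itself, de-correlating
# the access from the cached line (in the spirit of the Random Fill idea).
def random_fill(cache_set, addr, neighborhood, rng=random):
    cache_set.append(rng.choice(neighborhood))
```

In the partitioned model, an attacker filling its own ways cannot change the hit/miss outcome of a later victim access, which is exactly the external-interference property discussed above.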
It should be noted that some cache designs only focus on certain cache levels; e.g., the CATalyst cache only works at the last-level cache. In order to fully protect the whole cache system from timing-based attacks, all levels of caches in the hierarchy should be protected with the related security features. For example, Sanctum is able to protect all levels of caches, from the L1 to the last-level cache. Consequently, a secure cache design needs to be realizable at all levels of the cache hierarchy.
7 Related Work
There are a lot of existing attacks exploring timing-based cache channels, e.g., [1, 3, 5, 16–19, 34, 36, 54, 55]. Furthermore, our recent paper [11] summarized cache timing-based side-channel vulnerabilities using a three-step model and inspired this work on checking which vulnerability types are truly defeated by the secure caches in the context of timing-based attacks. In other work, Zhang and Lee [59] used finite-state machines to model cache architectures and leveraged mutual information to measure the potential side-channel leakage of the modeled cache architectures. Meanwhile, He and Lee [20] modeled interference using a probabilistic information flow graph and used the attacker's success probability to estimate different caches' ability to defend against some cache timing-based side-channel attacks. However, they did not explore all possible vulnerabilities due to cache timing-based channels.
There is also some other work focusing on cache side-channel verification [13, 14, 47]. Among these, CacheAudit [14] efficiently computes possible side-channel observations using abstractions in a modular way. Bit-level and arithmetic reasoning is used in [13] for memory accesses in the presence of dynamic memory allocation. CacheD [47] detects potential cache differences at each program point, leveraging symbolic execution and constraint solving.
Hardware transactional memory has also been leveraged to prevent timing-based cache side-channel attacks [8, 15]. Hardware transactional memory (HTM) is available on modern commercial processors, such as Intel's Transactional Synchronization Extensions (TSX). Its main feature is to abort the transaction and roll back the modifications whenever a cache block contained in the read set or write set is evicted out of the cache. In [15], HTM was combined with a preloading strategy for code and data to prevent Flush + Reload attacks in the local setting and Prime + Probe attacks in the cloud setting. In [8], the software-level
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive the effective vulnerabilities, allowing us to find ones that have not been exploited in the literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail here. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through the Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A: Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of its three steps is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and the names added by us for strategies which did not have names) may be useful to recall each attack's high-level operation.
Cache Internal Collision: In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload: In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but in Step 3 it is the attacker who does the reload access.
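The three Flush + Reload steps can be traced on a toy one-block cache model. This is an illustrative sketch; the Block class and the hit/miss labels standing in for timing are our own assumptions, not a real timing measurement:

```python
# Toy single-block cache used to trace the three Flush + Reload steps
# (illustrative only; "timing" is modeled as a hit/miss label).
class Block:
    def __init__(self):
        self.tag = None

    def flush(self):
        self.tag = None

    def access(self, tag):
        hit = (self.tag == tag)
        self.tag = tag
        return "hit" if hit else "miss"

def flush_reload(block, victim_secret_addr, attacker_probe_addr):
    block.flush()                        # Step 1: invalidate the block
    block.access(victim_secret_addr)     # Step 2: victim touches secret u
    # Step 3: attacker reloads a known address; a hit means u equals it
    return block.access(attacker_probe_addr)
```

A "hit" in Step 3 reveals that the victim's secret address matches the attacker's known probe address.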
Reload + Time (New Name Assigned in this Paper): In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper): In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will let the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time: In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but in Step 2 of Flush + Time the attacker invalidates data at a known address, allowing the attacker to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe: In Step 1, the attacker primes the cache set using data at addresses known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
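The three Prime + Probe steps can be traced on a toy 2-way LRU cache set. This is an illustrative sketch; the CacheSet model and the tag names are our own assumptions:

```python
# Toy 2-way LRU cache set used to trace the three Prime + Probe steps
# (illustrative model, not a real cache implementation).
class CacheSet:
    def __init__(self, ways=2):
        self.ways, self.lines = ways, []

    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)     # refresh the LRU position
        elif len(self.lines) == self.ways:
            self.lines.pop(0)          # evict the least recently used line
        self.lines.append(tag)
        return "hit" if hit else "miss"

def prime_probe(cache_set, victim_tags):
    primes = ["atk0", "atk1"]
    for t in primes:                   # Step 1: prime the whole set
        cache_set.access(t)
    for t in victim_tags:              # Step 2: victim's (secret) accesses
        cache_set.access(t)
    # Step 3: probe; any miss means the victim touched this set
    return [cache_set.access(t) for t in primes]
```

Observing a miss in the probe list tells the attacker that the victim's secret data maps to this cache set; an all-hit probe means the set was untouched.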
Bernstein's Attack: This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data through known data accesses. If a cache miss is observed in Step 3, that tells the attacker which cache set the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker which cache set the secret data maps to.
Evict + Probe (New Name Assigned in this Paper): In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to that cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but in Step 1 it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper): In Step 1, the attacker primes the cache set using accesses to data at addresses known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but in Step 3 it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper): The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that tells the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation related (new names assigned in this paper): Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each pair of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access-related operation.
Appendix B: Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability; β = 2 covers some vulnerabilities but not all; β = 3 represents all the vulnerabilities; and β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to all the cache states that interfere with the data: a, a_alias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known access operation, and an invalidation of a known memory address is denoted as a known inv operation. The known access operations and known inv operations together make up the not u operations. An unknown memory-related operation (containing u) is denoted as a u operation.
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for interference. Furthermore, β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of interference between memory-related operations is satisfied, and this corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to the Weak Vulnerabilities, like {⋆, A_a, V_u}. Therefore, the three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3397ndash425 419
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B4 Patterns with β gt 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B41 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the below rules. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be recursively applied until there are no sub-patterns left with a ⋆ in the middle or as the last step (a ⋆ in the last step will be deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ Ainv/Vinv ⇝ ..., the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., vulnerability Ainv/Vinv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for middle steps or the last step, because it will flush all timing information, making the attacker unable to deduce the final timing with the victim's sensitive address translation information. This rule should be recursively applied until there are no sub-patterns left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step will be deleted).
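As a concrete illustration, the two subdivision rules can be sketched in a few lines of code. The list encoding, the '*' marker for the ⋆ step, and the 'flush' marker for an Ainv/Vinv step are our own illustrative assumptions, not part of the paper's tooling:

```python
# Sketch of the two subdivision rules. A pattern is a list of step labels;
# '*' stands for the star step and 'flush' for an Ainv/Vinv whole-cache flush.

def subdivide(pattern, marker):
    """Split a pattern at each occurrence of `marker` that is not Step 1.
    The marker becomes Step 1 of the following sub-pattern; a marker left
    alone in the last position is deleted, as the rules require."""
    pieces, cur = [], []
    for i, step in enumerate(pattern):
        if step == marker and i != 0:
            if cur:
                pieces.append(cur)
            cur = [marker]          # marker becomes Step 1 of the next pattern
        else:
            cur.append(step)
    if cur and cur != [marker]:     # a trailing lone marker is dropped
        pieces.append(cur)
    return pieces

def apply_subdivision(pattern):
    """Rule 1 (split on '*') is applied before Rule 2 (split on 'flush')."""
    out = []
    for piece in subdivide(pattern, '*'):
        out.extend(subdivide(piece, 'flush'))
    return out
```

For example, `apply_subdivision(['A_a', '*', 'V_u', 'flush', 'A_a'])` yields the three shorter patterns `[['A_a'], ['*', 'V_u'], ['flush', 'A_a']]`, each of which can then be checked independently.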
B42 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each set of two adjacent steps in these remaining patterns. In Table 7, we show all the possible rule-applying conditions for each two adjacent steps, regardless of the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B421 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... will have the same timing observation information in the final step for the attacker, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence by the following Rules. In the commuting process, we will try every possible combination, commuting different pairs of two steps to which the Commute Rules can be applied, and then further applying the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other step is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which position within the whole memory sequence the two steps are in. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some two adjacent steps can only be commuted if the second of them is not the last step in the whole memory sequence. These patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
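Under an assumed (operation, address) encoding of a single step, the Commute Rule 1 column of Table 7 can be summarized by a small predicate. This sketch is ours, not from the paper's simulator, and it folds in the table's convention that two identical steps trivially commute:

```python
# A step is (op, addr): op in {'access', 'inv'}, addr in {'a', 'aalias', 'd', 'u'};
# 'u' is the victim's unknown (secret) address.

def commutes_rule1(m, n):
    """Commute Rule 1 (sketch): identical steps trivially commute; otherwise
    the pair commutes only when both addresses are known, the addresses
    differ, and at least one of the two steps is an invalidation.
    (Commute Rule 2 extends this to more pairs, but only when the second
    step is not the last step of the whole sequence.)"""
    (op_m, addr_m), (op_n, addr_n) = m, n
    if m == n:
        return True
    if 'u' in (addr_m, addr_n):      # unknown addresses never qualify here
        return False
    return addr_m != addr_n and 'inv' in (op_m, op_n)
```

For instance, an access to a and an invalidation of d commute, while two accesses to different known addresses do not, since swapping them can change which line gets evicted.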
B422 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern in which two adjacent steps are both related to known addresses, or both related to unknown addresses (including repeating states), the two adjacent steps can be reduced to only one, following the reduction rules (if the two-step pattern has "yes" in the column "Union Rule or
Table 7  Rules for combining two adjacent steps (a "−" in the Combined Step column indicates that the two steps cannot be reduced or unioned)

| First     | Second    | Commute Rule 1 | Commute Rule 2 | Union Rule or Reduce Rule | Combined Step |
|-----------|-----------|----------------|----------------|---------------------------|---------------|
| a         | a         | yes | yes | yes | a |
| a         | aalias    | no  | no  | yes | aalias |
| a         | d         | no  | no  | yes | d |
| a         | u         | no  | no  | no  | − |
| a         | ainv      | no  | no  | yes | ainv |
| a         | aaliasinv | yes | yes | yes | a |
| a         | dinv      | yes | yes | yes | a |
| a         | uinv      | no  | no  | no  | − |
| aalias    | a         | no  | no  | yes | a |
| aalias    | aalias    | yes | yes | yes | aalias |
| aalias    | d         | no  | no  | yes | d |
| aalias    | u         | no  | no  | no  | − |
| aalias    | ainv      | yes | yes | yes | aalias |
| aalias    | aaliasinv | no  | no  | yes | aaliasinv |
| aalias    | dinv      | yes | yes | yes | aalias |
| aalias    | uinv      | no  | no  | no  | − |
| d         | a         | no  | no  | yes | a |
| d         | aalias    | no  | no  | yes | aalias |
| d         | d         | yes | yes | yes | d |
| d         | u         | no  | no  | no  | − |
| d         | ainv      | yes | yes | yes | d |
| d         | aaliasinv | yes | yes | yes | d |
| d         | dinv      | no  | no  | yes | dinv |
| d         | uinv      | no  | yes | no  | − |
| u         | a         | no  | no  | no  | − |
| u         | aalias    | no  | no  | no  | − |
| u         | d         | no  | no  | no  | − |
| u         | u         | yes | yes | yes | u |
| u         | ainv      | no  | no  | no  | − |
| u         | aaliasinv | no  | no  | no  | − |
| u         | dinv      | no  | yes | no  | − |
| u         | uinv      | no  | no  | yes | uinv |
| ainv      | a         | no  | no  | yes | a |
| ainv      | aalias    | yes | yes | yes | aalias |
| ainv      | d         | yes | yes | yes | d |
| ainv      | u         | no  | no  | no  | − |
| ainv      | ainv      | yes | yes | yes | ainv |
| ainv      | aaliasinv | yes | yes | yes | Union(ainv, aaliasinv) |
| ainv      | dinv      | yes | yes | yes | Union(ainv, dinv) |
| ainv      | uinv      | no  | yes | no  | − |
| aaliasinv | a         | yes | yes | yes | a |
| aaliasinv | aalias    | no  | no  | yes | aalias |
| aaliasinv | d         | yes | yes | yes | d |
| aaliasinv | u         | no  | no  | no  | − |
| aaliasinv | ainv      | yes | yes | yes | Union(ainv, aaliasinv) |
| aaliasinv | aaliasinv | yes | yes | yes | aaliasinv |
| aaliasinv | dinv      | yes | yes | yes | Union(dinv, aaliasinv) |
| aaliasinv | uinv      | no  | yes | no  | − |
| dinv      | a         | yes | yes | yes | a |
| dinv      | aalias    | yes | yes | yes | aalias |
| dinv      | d         | no  | no  | yes | d |
| dinv      | u         | no  | yes | no  | − |
| dinv      | ainv      | yes | yes | yes | Union(ainv, dinv) |
| dinv      | aaliasinv | yes | yes | yes | Union(dinv, aaliasinv) |
| dinv      | dinv      | yes | yes | yes | dinv |
| dinv      | uinv      | no  | yes | no  | − |
| uinv      | a         | no  | no  | no  | − |
| uinv      | aalias    | no  | no  | no  | − |
| uinv      | d         | no  | yes | no  | − |
| uinv      | u         | no  | no  | yes | u |
| uinv      | ainv      | no  | yes | no  | − |
| uinv      | aaliasinv | no  | yes | no  | − |
| uinv      | dinv      | no  | yes | no  | − |
| uinv      | uinv      | yes | yes | yes | uinv |
Reduce Rule" and no Union result in the "Combined Step" column in Table 7).
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so they can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the second operation's cache block, so these two steps can be reduced to only one step, the second one.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, these two can be reduced to one step, which is the second step.
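The three reduction rules can be sketched as a single pair-reduction function. The (operation, address) tuple encoding is an assumption of ours, not the paper's artifact:

```python
def reduce_pair(m, n):
    """Return the single step that replaces the adjacent pair (m, n) under
    Reduction Rules 1-3, or None if the pair is not reducible by them."""
    (op_m, addr_m), (op_n, addr_n) = m, n
    # Rule 1: two accesses to the unknown address u -> keep the second.
    if op_m == op_n == 'access' and addr_m == addr_n == 'u':
        return n
    # Rule 2: two known accesses -> keep the second.
    if op_m == op_n == 'access' and 'u' not in (addr_m, addr_n):
        return n
    # Rule 3: an access and an invalidation of the same known address,
    # in either order -> keep the second.
    if {op_m, op_n} == {'access', 'inv'} and addr_m == addr_n != 'u':
        return n
    return None
```

In each reducible case the second step is kept, matching the Combined Step entries of Table 7 for those pairs (e.g., reducing ainv followed by an access to a keeps the access).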
B423 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... will have the same timing observation information in the final step for the attacker, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases.
– Union Rule 1: Union Rule 1 applies to two invalidations of two known, different memory addresses. Two known inv operations both invalidate some known address; therefore, they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
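Under the same assumed (operation, address) encoding as above, Union Rule 1 can be sketched as follows; the 'union_inv' result tag is our own illustrative choice:

```python
def union_rule1(m, n):
    """Union Rule 1 (sketch): two invalidations of two different *known*
    addresses merge into one joint invalidation step; anything else is
    left for the other rules."""
    (op_m, addr_m), (op_n, addr_n) = m, n
    if op_m == op_n == 'inv' and 'u' not in (addr_m, addr_n) and addr_m != addr_n:
        return ('union_inv', frozenset((addr_m, addr_n)))
    return None
```

This matches the Union(...) entries of Table 7, which appear exactly for pairs of known invalidations with differing addresses (e.g., ainv and dinv), while two invalidations of the same address are handled by reduction instead.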
B424 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation

– not u operation ⇝ u operation ⇝ not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra Ainv/Vinv in the first step:

– If it is followed by a known access operation, the Ainv/Vinv can be removed, due to the actual state that the access then puts into the cache block.

– If it is followed by a known inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.
In this case, the sequence is finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of Rules can be applied are the following:
– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias ⇝ Vu

– Aa/Va/Aaalias/Vaalias ⇝ Vinv_u

– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias

– Vinv_u ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2  β-step (β > 3) pattern reduction

Input: β: the number of steps of the pattern; p: a two-dimensional dynamic-size array, where p[0] contains the states of each step of the original pattern, in order, and p[1], p[2], ... are empty initially

Output: R: array of the reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1:  R = Ø
2:  while contain(p, ⋆) and its index is not 0 do
3:      p = Subdivision_Rule_1(p)
4:  end while
5:  while (contain(p, Ainv) and its index is not 0) or (contain(p, Vinv) and its index is not 0) do
6:      p = Subdivision_Rule_2(p)
7:  end while
8:  while not (is_ineffective(p) or has_interval_effective_three_steps(p)) do
9:      p = Commute_Rules(p)
10:     p = Reduction_Rules(p)
11:     p = Union_Rules(p)
12:     if is_ineffective(p) or has_interval_effective_three_steps(p) then
13:         R += effective_patterns(p)
14:     end if
15: end while
16: return R
two steps either generates two adjacent step patterns that can be processed by the three categories of Rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B43 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is actually equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three steps; and has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
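A much-simplified sketch of the reduction loop at the heart of Algorithm 2 is shown below. Only the Reduction Rules are applied here; the subdivision, commute, and union handling, as well as the effectiveness checks, are omitted, and the (operation, address) encoding is our own assumption:

```python
def reduce_pair(m, n):
    """Pair reduction per Reduction Rules 1-3 (sketch): keep the second step
    for two u-accesses, two known accesses, or an access/invalidation pair
    on the same known address; otherwise the pair is not reducible."""
    (op_m, a_m), (op_n, a_n) = m, n
    if op_m == op_n == 'access' and (a_m == a_n == 'u' or 'u' not in (a_m, a_n)):
        return n                          # Rules 1 and 2
    if {op_m, op_n} == {'access', 'inv'} and a_m == a_n != 'u':
        return n                          # Rule 3
    return None

def reduce_pattern(pattern):
    """Repeatedly merge the first reducible adjacent pair until at most
    three steps remain or no pair can be reduced."""
    pattern = list(pattern)
    while len(pattern) > 3:
        for i in range(len(pattern) - 1):
            merged = reduce_pair(pattern[i], pattern[i + 1])
            if merged is not None:
                pattern[i:i + 2] = [merged]
                break
        else:
            break                         # no adjacent pair is reducible
    return pattern
```

For example, a five-step access sequence over a, d, u collapses to a three-step not u ⇝ u ⇝ not u pattern, which can then be looked up against the effective three-step vulnerabilities.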
B44 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıicmez O, Koc CK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Guneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) Prince – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
solution targets system calls, page faults, code refactoring, and abort reasoning to eliminate not only Prime + Probe and Flush + Reload, but also Evict + Time and Cache Collision attacks.
8 Conclusion
This paper first proposed a new three-step model in order to model all possible cache timing vulnerabilities. It further provided a cache three-step simulator and reduction rules to derive effective vulnerabilities, allowing us to find ones that have not been exploited in literature. With the exhaustive list of effective vulnerability types, this paper presented an analysis of 18 secure processor cache designs with respect to how well they can defend against these timing-based vulnerabilities. Our work showed that vulnerabilities based on internal interference of the victim application are difficult to protect against, and many secure cache designs fail in this. We also provided a summary of secure processor cache features that could be integrated to make an ideal secure cache that is able to defend against timing-based attacks. Overall, implementing a secure cache in a processor can be a viable alternative for defending against timing-based attacks. However, it requires the design of an ideal secure cache, or the correction of existing secure cache designs to eliminate the few attacks that they do not protect against.
Acknowledgments This work was supported in part by NSF grants 1651945 and 1813797, and through Semiconductor Research Corporation (SRC) under GRC Task 2844.001.
Appendix A Attack Strategies Descriptions
This appendix gives an overview of the attack strategies shown in Tables 2 and 3 in Section 3. For each attack strategy, an overview of the three steps of the strategy is given. Some of the strategies are similar, and some names may not be precise, but we keep and use the original names as they were assigned in prior work. One advantage of our three-step model is that it gives a precise definition of each attack. Nevertheless, the attack strategy names used before (and added by us for strategies which did not have such names) may be useful to recall the attacks' high-level operation.
Cache Internal Collision In Step 1, the cache block's data is invalidated by flushing or eviction, done by either the attacker or the victim. Then the victim accesses secret data in Step 2. Finally, the victim accesses data at a known address in Step 3; if there is a cache hit, then it reveals that there is an internal collision and leaks the value of u.
Flush + Reload In Step 1, either the attacker or the victim invalidates the cache block's data by flushing or eviction. Then the victim accesses secret data in Step 2. Finally, the attacker tries to access some data in Step 3 using a known address. If a cache hit is observed, then the addresses from the last two steps are the same, and the attacker learns the secret address. This strategy has similar Step 1 and Step 2 as the Cache Internal Collision vulnerability, but for Step 3, it is the attacker who does the reload access.
Reload + Time (New Name Assigned in this Paper) In Step 1, secret data is invalidated by the victim. Then the attacker does some known data access in Step 2 that could possibly bring back the victim's secret data invalidated in Step 1. In Step 3, if the victim reloads the secret data, a cache hit is observed, and the attacker can derive the secret data's address.
Flush + Probe (New Name Assigned in this Paper) In Step 1, the victim or the attacker accesses some known address. In Step 2, the victim invalidates secret data. In Step 3, reloading Step 1's data and observing a cache miss will help the attacker learn that the secret data maps to the known address from Step 1.
Evict + Time In Step 1, some of the victim's secret data is put into the cache by the victim itself. In Step 2, the attacker evicts a specific cache set by performing a memory-related operation that is not a flush. In Step 3, the victim reloads the secret data, and if a cache miss is observed, the attacker will learn the secret data's cache set information. This attack has similar Step 1 and Step 3 as the Flush + Time vulnerability, but for Step 2, in Flush + Time the attacker invalidates some known address, allowing him or her to find the full address of the secret data, instead of evicting a cache set to only find the secret data's cache index as in the Evict + Time attack.
Prime + Probe In Step 1, the attacker primes the cache set using data at an address known to the attacker. In Step 2, the victim accesses the secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set, and if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed.
Bernstein's Attack This attack strategy leverages the victim's internal interference to trigger a miss-based attack. In one case, the victim does the same secret data access in Step 1 and Step 3, while in Step 2 the victim tries to evict one whole cache set's data by known data accesses. If a cache miss is observed in Step 3, that will tell the attacker the cache set is the one the secret data maps to. In another case, the victim primes and probes a cache set in Step 1 and Step 3, driven by the attacker, while in Step 2 the victim tries to
access the secret data. Similar to the first case, observing a cache miss in Step 3 tells the attacker the cache set is the one the secret data maps to.
Evict + Probe (New Name Assigned in this Paper) In Step 1, the victim evicts the cache set by accessing data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the attacker probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the evicted cache set. This attack strategy has similar Step 2 and Step 3 as the Prime + Probe attack, but for Step 1, it is the victim that does the eviction accesses.
Prime + Time (New Name Assigned in this Paper) In Step 1, the attacker primes the cache set using access to data at an address known to the attacker. In Step 2, the victim accesses secret data, which possibly evicts data from Step 1. In Step 3, the victim probes each cache set using the same data as in Step 1; if a cache miss is observed, the attacker knows the secret data maps to the cache set he or she primed in Step 1. This attack strategy has similar Step 1 and Step 2 as the Prime + Probe attack, but for Step 3, it is the victim that does the probing accesses.
Flush + Time (New Name Assigned in this Paper) The victim accesses the same secret data in Step 1 and Step 3, while in Step 2 the attacker tries to invalidate data at a known address. If a cache miss is observed in Step 3, that will tell the attacker that the data address he or she invalidated in Step 2 maps to the secret data.
Invalidation Related (New Names Assigned in this Paper) Vulnerabilities that have names ending with "invalidation" in Table 3 correspond to the vulnerabilities that have the same name (except for the "invalidation" part) in Table 2. The difference between each set of corresponding vulnerabilities is that the vulnerabilities ending with "invalidation" use an invalidation-related operation in the last step to derive the timing information, rather than a normal memory access–related operation.
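To make the three-step structure of these strategies concrete, a few of them can be written down directly as data. The (actor, operation, address) tuple notation below is our own illustrative encoding of the descriptions above, not taken from the paper's artifact:

```python
# Each strategy is three steps of (actor, operation, address); 'u' is the
# victim's secret (unknown) address, 'a' and 'd' are attacker-known addresses.
# "A/V" marks a step that either the attacker or the victim may perform.
STRATEGIES = {
    "Cache Internal Collision": [("A/V", "inv", "a"), ("V", "access", "u"), ("V", "access", "a")],
    "Flush + Reload":           [("A/V", "inv", "a"), ("V", "access", "u"), ("A", "access", "a")],
    "Prime + Probe":            [("A", "access", "a"), ("V", "access", "u"), ("A", "access", "a")],
    "Flush + Time":             [("V", "access", "u"), ("A", "inv", "a"), ("V", "access", "u")],
}

def final_observer(pattern):
    """Whoever performs Step 3 is the party whose timing is measured."""
    return pattern[-1][0]
```

Note how Cache Internal Collision and Flush + Reload differ only in who performs the final (timed) access, exactly as the descriptions above state.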
Appendix B Soundness Analysis of the Three-Step Model
In this section, we analyze the soundness of the three-step model to demonstrate that it can cover all possible timing-based cache vulnerabilities in normal caches. If there is a vulnerability that is represented using more than three steps, the steps can either be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation.
In the analysis below, we use β to denote the number of memory-related operations, i.e., steps, in the representation of a vulnerability. We show that β = 1 is not sufficient to represent a vulnerability, β = 2 covers some vulnerabilities but not all, β = 3 represents all the vulnerabilities, and any pattern with β > 3 can be reduced to only three steps, or a three-step sub-pattern can be found in the longer representation. Known addresses refer to the addresses that interfere with the data under attack: a, aalias, and d. Unknown address refers to u. An access to a known memory address is denoted as a known_access operation, and an invalidation of a known memory address is denoted as a known_inv operation. The known_access and known_inv operations together make up the not_u operations. A memory-related operation on the unknown address (containing u) is denoted as a u operation.
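The operation classes defined above can be sketched as a small classifier. The string encoding and helper names below are ours (the paper writes the states with sub- and superscripts, e.g. A_a, V_u, A_d^inv):

```python
# Illustrative classifier for the operation classes defined above.
# A step is written as '<actor>_<address>' with an optional '^inv'
# suffix for invalidations, e.g. 'A_a', 'V_u', 'A_d^inv'.
KNOWN_ADDRESSES = {"a", "aalias", "d"}  # addresses known to the attacker

def address_of(step):
    # strip the actor prefix and any '^inv' suffix
    return step.split("_", 1)[1].replace("^inv", "")

def is_inv(step):
    return step.endswith("^inv")

def is_u_operation(step):
    # any operation on the unknown address u
    return address_of(step) == "u"

def is_known_access(step):
    return address_of(step) in KNOWN_ADDRESSES and not is_inv(step)

def is_known_inv(step):
    return address_of(step) in KNOWN_ADDRESSES and is_inv(step)

def is_not_u(step):
    # known_access and known_inv together form the not_u operations
    return is_known_access(step) or is_known_inv(step)
```

For instance, `A_d^inv` is a known_inv operation (hence not_u), while both `V_u` and `V_u^inv` count as u operations.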
B.1 Patterns with β = 1
When β = 1, there is only one memory-related operation, and it is not possible to create interference between memory-related operations, since two memory-related operations are the minimum requirement for an interference. Furthermore, β = 1 corresponds to the three-step patterns with both Step 1 and Step 2 being ⋆, since the ⋆ cache state gives no information, and Step 3 being the one operation. These types of patterns are all examined by the cache three-step simulator, and none of them are found to be effective. Consequently, a vulnerability cannot exist when β = 1.
B.2 Patterns with β = 2
When β = 2, the minimum requirement of an interference between memory-related operations is satisfied, and it corresponds to the three-step cases where Step 1 is ⋆ and Step 2 and Step 3 are the two operations. These types are all examined by the cache three-step simulator, and some of them belong to Weak Vulnerabilities, like ⋆ ⇝ Aa ⇝ Vu. Therefore, three-step cases where Step 1 is ⋆ have corresponding effective vulnerabilities shown in Table 2. Consequently, β = 2 can represent some weak vulnerabilities, but not all vulnerabilities, as there exist some that are represented with three steps, as discussed next.
B.3 Patterns with β = 3
When β = 3, we have tested all possible combinations of three-step memory-related operations in Section 3.3 using our cache simulator for the three-step model. We found that there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3:397–425
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern with ⋆ in the middle, the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information, and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until the longer pattern has no sub-pattern left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted).
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern with Ainv/Vinv in the middle, the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it flushes all timing information, making the attacker unable to deduce the final timing related to the victim's sensitive address information. This rule should be applied recursively until the longer pattern has no sub-pattern left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step is deleted).
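Both subdivision rules have the same shape: split the pattern at a designated step, make the splitting step the first step of the next sub-pattern, and delete the splitter if it is the last step. A minimal sketch in our own encoding (the step strings and helper name are assumptions, not the paper's notation):

```python
def subdivide(pattern, is_splitter):
    """Split a long pattern at every splitter step. The splitter becomes
    Step 1 of the following sub-pattern; a splitter that would be the
    last step of the sequence is deleted (it carries no information)."""
    parts, current = [], []
    for step in pattern:
        if is_splitter(step):
            if current:
                parts.append(current)
            current = [step]      # splitter starts the next sub-pattern
        else:
            current.append(step)
    # drop a lone trailing splitter, keep anything else
    if current and not (len(current) == 1 and is_splitter(current[0])):
        parts.append(current)
    return parts
```

Subdivision Rule 1 corresponds to `is_splitter = lambda s: s == "*"`, and Rule 2 to splitting on the full-flush steps `"A^inv"`/`"V^inv"`.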
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each pair of two adjacent steps in the remaining patterns. In Table 7 we show all the possible cases of the rule-applying conditions for each pair of two adjacent steps, for both the attacker's accesses (A) and the victim's accesses (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps, M and N, in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation in the final step, we can freely exchange the places of M and N in this pattern. In this case, we have more chances to Reduce and Union the steps within the memory sequence using the following rules. In the commuting process, we try every possible combination of commuting different pairs of two steps to which the Commute Rules apply, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting process. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known_access operation and the other is a known_inv operation, and the addresses they refer to are different, the two steps can be commuted, no matter where the two steps are located within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. Some two adjacent steps can only be commuted under this condition; for these, Table 7 shows a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column.
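Commute Rule 1 is simple enough to state as a predicate. The sketch below uses our own string encoding; Commute Rule 2 depends on the per-pair entries of Table 7 and is not reproduced here:

```python
def parse(step):
    """Split a step like 'A_d^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def commutes_by_rule_1(m, n):
    """Commute Rule 1: a known-address access and a known-address
    invalidation commute whenever they target different addresses."""
    (addr_m, inv_m), (addr_n, inv_n) = parse(m), parse(n)
    if "u" in (addr_m, addr_n):
        return False              # the rule covers known addresses only
    # exactly one of the two is an invalidation, and addresses differ
    return inv_m != inv_n and addr_m != addr_n
```

So `A_a` followed by `V_d^inv` commutes anywhere in the sequence, while `A_a` followed by `V_a^inv` (same address) does not; the latter is instead a candidate for Reduction Rule 3 below.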
B.4.2.2 Reduction Rules
If the memory sequence after applying the Commute Rules has a sub-pattern of two adjacent steps that are both related to known addresses or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the reduction rules below (if the two-step pattern has a "yes" in the "Union Rule or Reduce Rule" column and no Union result in the "Combined Step" column of Table 7).

Table 7 Rules for combining two adjacent steps. For each pair of adjacent steps (First, Second), the table lists whether Commute Rule 1 and Commute Rule 2 apply, whether the Union Rule or Reduce Rule applies, and the resulting combined step when it does.
– Reduction Rule 1: For two u operations, although u is unknown, both accesses target the same u, so the two steps can be reduced to only the second access in the memory sequence.

– Reduction Rule 2: Two adjacent known memory access–related operations (known_access operations) always result in a deterministic state of the cache block related to the second access, so the two steps can be reduced to one step.

– Reduction Rule 3: For two adjacent steps where one is a known_access operation and the other is a known_inv operation, if the address they refer to is the same, regardless of the order of the two steps, they can be reduced to one step, namely the second step.
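The three reduction rules can be sketched as a single pair-combining function (our own encoding; in every reducible case the second step is the one kept, as the rules state):

```python
def parse(step):
    """Split a step like 'A_a^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def reduce_pair(m, n):
    """Return [n] if the adjacent pair (m, n) reduces to its second step,
    otherwise return the pair unchanged."""
    (addr_m, inv_m), (addr_n, inv_n) = parse(m), parse(n)
    if addr_m == "u" and addr_n == "u":
        return [n]                            # Reduction Rule 1: same u
    if addr_m != "u" and addr_n != "u":
        if not inv_m and not inv_n:
            return [n]                        # Reduction Rule 2: two known accesses
        if inv_m != inv_n and addr_m == addr_n:
            return [n]                        # Reduction Rule 3: access+inv, same address
    return [m, n]
```

Note that `A_a` followed by `V_d^inv` is left unchanged: different known addresses with mixed access/invalidation are the Commute Rule 1 case, not a reduction.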
B.4.2.3 Union Rules
Suppose there are two adjacent steps, M and N, in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation in the final step, we can combine steps M and N into one joint step in the memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined by Union Rule 1: both known_inv operations invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known_access and known_inv operations that target the same address as close together as possible, and to group u operations and not_u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules should reduce at least one step, and the recursion continues until the resulting sequence matches one of the two following cases:
• The long (β > 3) memory sequence of u operations and not_u operations is reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ not_u operation ⇝ u operation
– not_u operation ⇝ u operation ⇝ not_u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.
• For an extra Ainv/Vinv in the first step:

– If followed by a known_access operation, the Ainv/Vinv can be removed, because the subsequent access puts an actual state into the cache block.
– If followed by a known_inv operation or Vinv_u, the Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by Vu, the worst case is Ainv/Vinv ⇝ Vu ⇝ not_u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted using Commute Rule 2 and reduced to within three steps.

In this case, the steps are finally within three steps and the checking is done.

• There exist two adjacent steps to which none of the above rules can be applied; these require the Rest Checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias ⇝ Vu
– Aa/Va/Aaalias/Vaalias ⇝ Vinv_u
– Vu ⇝ Aa/Va/Aaalias/Vaalias/Ad/Vd/Ainv_a/Vinv_a/Ainv_aalias/Vinv_aalias
– Vinv_u ⇝ Aa/Va/Aaalias/Vaalias

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these two steps either generates two-adjacent-step patterns that can be processed by the three rules, so that a further step can be reduced, or constructs an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.

Algorithm 2: β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; pattern, a two-dimensional dynamic-size array, where pattern[0] contains the states of each step of the original pattern in order, and pattern[1], pattern[2], ... are empty initially

Output: results, the array of reduced effective vulnerability pattern(s). It will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while pattern contains ⋆ at an index other than 0 do
3:   pattern = Subdivision_Rule_1(pattern)
4: end while
5: while pattern contains Ainv at an index other than 0, or pattern contains Vinv at an index other than 0 do
6:   pattern = Subdivision_Rule_2(pattern)
7: end while
8: while pattern is_ineffective or has_interval_effective_three_steps do
9:   pattern = Commute(pattern)
10:  pattern = Reduce(pattern)
11:  pattern = Union(pattern)
12:  if pattern is_ineffective or has_interval_effective_three_steps then
13:    results += mapped_three_step_vulnerabilities(pattern)
14:  end if
15: end while
16: return results
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after applying the rules. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-step pattern; and has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
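The core of the reduction loop can be sketched as follows. This is a deliberately simplified, self-contained version in our own encoding: the Commute step and the effectiveness checks against Table 2 are omitted, and the Reduce/Union helpers from the rules above are repeated so the block runs on its own:

```python
def parse(step):
    """Split a step like 'A_a^inv' into (address, is_invalidation)."""
    addr = step.split("_", 1)[1].replace("^inv", "")
    return addr, step.endswith("^inv")

def reduce_pair(m, n):
    (am, im), (an, iN) = parse(m), parse(n)
    if am == "u" and an == "u":
        return [n]                                   # Reduction Rule 1
    if am != "u" and an != "u":
        if (not im and not iN) or (im != iN and am == an):
            return [n]                               # Reduction Rules 2 and 3
    return [m, n]

def union_pair(m, n):
    (am, im), (an, iN) = parse(m), parse(n)
    if im and iN and "u" not in (am, an) and am != an:
        return ["Union(%s,%s)" % (m, n)]             # Union Rule 1
    return [m, n]

def reduce_pattern(pattern):
    """Apply Reduce/Union to adjacent pairs until the pattern stops
    shrinking or has at most three steps."""
    changed = True
    while len(pattern) > 3 and changed:
        changed = False
        for i in range(len(pattern) - 1):
            for rule in (reduce_pair, union_pair):
                merged = rule(pattern[i], pattern[i + 1])
                if len(merged) == 1:
                    pattern = pattern[:i] + merged + pattern[i + 2:]
                    changed = True
                    break
            if changed:
                break
    return pattern
```

For example, the five-step pattern Aa ⇝ Ad ⇝ Vu ⇝ Aa ⇝ Ainv_a collapses to the three-step Ad ⇝ Vu ⇝ Ainv_a, matching the claim that longer patterns reduce to three-step ones.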
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International Conference on Information and Communications Security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International Workshop on Cryptographic Hardware and Embedded Systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International Workshop on Cryptographic Hardware and Embedded Systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International Conference on the Theory and Application of Cryptology and Information Security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia Conference on Computer and Communications Security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX Workshop on Health Information Technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX Security Symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX Security Symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX Security Symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE Symposium on Security and Privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE Symposium on Security and Privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European Solid State Circuits Conference (ESSCIRC 2014), IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX Security Symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' Track at the RSA Conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European Symposium on Research in Computer Security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In: 17th Annual International Symposium on Computer Architecture, 1990. Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX Security Symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX Security Symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX Security Symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Notices 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind A (2018) Composable building blocks to open up processor design. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th Annual Computer Security Applications Conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
access the secret data Similar to the first case observingcache miss in Step 3 tells the attacker the cache set is theone secret data maps to
Evict + Probe (NewName Assigned in this Paper) In Step 1victim evict the cache set using the access to a data at anaddress known to the attacker In Step 2 the victim accessessecret data which possibly evicts data from Step 1 In Step
3 the attacker probes each cache set using the same data asin Step 1 if a cache miss is observed the attacker knownsthe secret data maps to the cache set he or she primed Thisattack strategy has similar Step 2 and Step 3 as Prime +Probe attack but for Step 1 it is the victim that does theeviction accesses
Prime + Time (NewName Assigned in this Paper) In Step 1the attacker primes the cache set using access to data at anaddress known to the attacker In Step 2 the victim accessessecret data which possibly evicts data from Step 1 In Step
3 the victim probes each cache set using the same data Step
1 if a cache miss is observed the attacker knowns the secretdata maps to the cache set he or she primed in Step 1 Thisattack strategy has similar Step 1 and Step 2 as Prime +Probe attack but for Step 3 it is the victim that does theprobing accesses
Flush + Time (NewNameAssigned in this Paper) The victimaccesses the same secret data in Step 1 and Step 3 whilein Step 2 the attacker tries to invalidate data at a knownaddress If cache miss is observed in Step 3 that will tellthe attacker the data address he or she invalidated in Step 2maps to the secret data
Invalidation related (new names assigned in this paper)Vulnerabilities that have names ending with ldquoinvalidationrdquoin Table 3 correspond to the vulnerabilities that havethe same name (except for the ldquoinvalidationrdquo part) inTable 2 The difference between each set of correspondingvulnerabilities is that the vulnerabilities ending withldquoinvalidationrdquo use invalidation related operation in the laststep to derive the timing information rather than the normalmemory access related operations
Appendix B Soundness Analysisof the Three-StepModel
In this section we analyze the soundness of the three-stepmodel to demonstrate that the three-step model can coverall possible timing-based cache vulnerabilities in normalcaches If there is a vulnerability that is represented usingmore than three steps the steps can be reduced to only three
steps or a three-step sub-pattern can be found in the longerrepresentation
In the below analysis we use β to denote the numberof memory-related operations ie steps in a representationof a vulnerability We show that β = 1 is not sufficient torepresent a vulnerability β = 2 covers some vulnerabilitiesbut not all β = 3 represents all the vulnerabilities andβ gt 3 can be reduced to only three steps or a three-step sub-pattern can be found in the longer representation Knownaddresses refer to all the cache states that interferencewith the data a aalias and d Unknown address refersu An access to a known memory address is denoted asknown access operation and an invalidation of a knownmemory address is denoted as known inv operationThe known access operation and known inv operation
together make up not u operations An unknown memoryrelated operation (containing u) is denoted as u operation
B1 Patterns with β = 1
When β = 1 there is only one memory-relatedoperation and it is not possible to create interferencebetween memory-related operations since two memory-related operations are the minimum requirement for aninterference Furthermore β = 1 corresponds to the three-step pattern with both Step 1 and Step 2 being since thecache state gives no information and Step 3 being theone operation These types of patterns are all examined bythe cache three-step simulator and none of these types arefound to be effective Consequently a vulnerability cannotexit when β = 1
B2 Patterns with β = 2
When β = 2 it satisfies the minimum requirementof an interference for memory related operations andcorresponds to the three-step cases where Step 1 is and Step 2 and Step 3 are the two operations Thesetypes are all examined by the cache three-step simulatorand some of them belong to Weak Vulnerabilities like Aa Vu Therefore three-step cases where Step
1 is have corresponding effective vulnerabilities shownin Table 2 Consequently β = 2 can represent someweak vulnerabilities but not all vulnerabilities as there existsome that are represented with three steps as discussednext
B3 Patterns with β = 3
When β = 3 we have tested all possible combinations ofthree-step memory related operations in Section 33 usingour cache simulator for the three-step model We foundthat there are in total 72 types of Strong Vulnerabilities
J Hardw Syst Secur (2019) 3397ndash425 419
and 64 types of Weak Vulnerabilities that are represented by patterns with β = 3 steps. Consequently, β = 3 can represent all the vulnerabilities (including some weak ones where Step 1 is ⋆). Using more steps to represent vulnerabilities is not necessary, as discussed next.
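The exhaustive check behind these counts can be sketched as a simple enumeration; the step alphabet below is a reduced, hypothetical one for illustration (the model's full alphabet also includes aliased addresses and invalidation operations), and deciding which candidates are effective is left to a cache-state simulator:

```python
from itertools import product

# Hypothetical, reduced step alphabet: attacker (A) and victim (V)
# accesses to the secret-related address a, an unrelated address d,
# the unknown address u, plus the "any state" step (star).
STEPS = ["A_a", "V_a", "A_d", "V_d", "A_u", "V_u", "star"]

def enumerate_three_step_patterns(steps):
    """Yield every candidate (Step 1, Step 2, Step 3) pattern; a cache
    simulator would then decide which candidates are effective
    vulnerabilities (Strong or Weak)."""
    yield from product(steps, repeat=3)

candidates = list(enumerate_three_step_patterns(STEPS))
print(len(candidates))   # 7**3 = 343 candidates to simulate
```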
B.4 Patterns with β > 3
When β > 3, the pattern of memory-related operations for a vulnerability can be reduced using the following rules.
B.4.1 Subdivision Rules
First, a set of subdivision rules is used to divide the long pattern into shorter patterns, following the rules below. Each subdivision rule should be applied recursively before applying the next rule.
Subdivision Rule 1: If the longer pattern contains a sub-pattern such as ... ⇝ ⋆ ⇝ ..., the longer pattern can be divided into two separate patterns, where ⋆ is assigned as Step 1 of the second pattern. This is because ⋆ gives no timing information and the attacker loses track of the cache state after ⋆. This rule should be applied recursively until no sub-patterns are left with a ⋆ in the middle or as the last step (a ⋆ in the last step is deleted) in the longer pattern.
Subdivision Rule 2: Next, if a pattern (derived after recursive application of Rule 1) contains a sub-pattern such as ... ⇝ Ainv/Vinv ⇝ ..., the longer pattern can be divided into two separate patterns, where Ainv/Vinv is assigned as Step 1 of the second pattern. This is because Ainv/Vinv will flush all the timing information of the current block, and it can be used as the flushing step for Step 1, e.g., in the vulnerability Ainv ⇝ Vu ⇝ Aa (fast) shown in Table 2. Ainv/Vinv cannot be a candidate for the middle steps or the last step, because it would flush all timing information, making the attacker unable to deduce the final timing from the victim's sensitive address translation information. This rule should be applied recursively until no sub-patterns are left with an Ainv/Vinv in the middle or as the last step (an Ainv/Vinv in the last step is deleted).
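For illustration only (this is not the authors' tooling), Subdivision Rule 1 can be sketched as a single pass that splits a step pattern at every ⋆, encoded here as the hypothetical string "star":

```python
def subdivide_at_star(pattern):
    """Split a step pattern at every 'star' (the ⋆ step) that is not Step 1.

    A star in the middle starts a new sub-pattern (the attacker loses
    track of the cache state), and a star as the last step is deleted.
    """
    out, current = [], []
    for i, step in enumerate(pattern):
        if step == "star" and i != 0 and current:
            out.append(current)      # close the pattern before the star
            current = ["star"]       # star becomes Step 1 of the next one
        else:
            current.append(step)
    if current and current[-1] == "star":
        current = current[:-1]       # a star in the last step is deleted
    if current:
        out.append(current)
    return out

# e.g. A_a ⇝ V_u ⇝ ⋆ ⇝ A_a ⇝ V_u splits into two sub-patterns
print(subdivide_at_star(["A_a", "V_u", "star", "A_a", "V_u"]))
```

The same scan-and-split shape, keyed on Ainv/Vinv instead of ⋆, would serve for Subdivision Rule 2.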
B.4.2 Simplification Rules
For each of the patterns resulting from the subdivision of the original pattern, we define Commute Rules, Union Rules, and Reduction Rules for each pair of two adjacent steps in these remaining patterns. In Table 7, we show all the possible cases of the rule-applying conditions for each pair of adjacent steps, regardless of whether they are the attacker's access (A) or the victim's access (V). The table shows whether the corresponding two steps can be commuted, reduced, or unioned (and the reduced or unioned result, if the rules can be applied).
B.4.2.1 Commute Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If commuting M and N leads to the same observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ N ⇝ M ⇝ ... give the attacker the same timing observation information in the final step, we can freely exchange the places of M and N in this pattern. This gives us more opportunities to Reduce and Union the steps within the memory sequence by the following rules. In the commuting process, we try every possible combination of pairs of adjacent steps to which the Commute Rules can be applied, and then further apply the Reduction Rules and Union Rules to see whether the commute is effective, i.e., whether steps can be reduced or unioned after the commuting. The following two adjacent memory-related operations can be commuted:
– Commute Rule 1: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, and the addresses they refer to are different, these two steps can be commuted, no matter which positions the two steps occupy within the whole memory sequence. Table 7 shows a "yes" in the Commute Rule 1 column for each two-step pattern that can be commuted this way.
– Commute Rule 2: A superset of the two-step patterns covered by Commute Rule 1 can be commuted if the second of the two adjacent steps is not the last step in the whole memory sequence. That is, some adjacent step pairs can be commuted only under this condition; such patterns have a "yes" in the Commute Rule 2 column and a "no" in the Commute Rule 1 column in Table 7.
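The two commute rules amount to a table lookup plus a position check. The sketch below encodes a few illustrative (first, second) entries consistent with the rule statements; the full rule table is Table 7, and the lowercase step names are an assumed encoding covering both attacker (A) and victim (V) operations:

```python
# Illustrative excerpt of the commute columns of Table 7:
# (first, second) -> (commute_rule_1, commute_rule_2).
# A known access and a known invalidation of a *different* address
# always commute (Rule 1); some pairs commute only when the second
# step is not the last step of the whole sequence (Rule 2).
COMMUTE = {
    ("a", "d_inv"): (True, True),    # different addresses: Rule 1
    ("a", "a_inv"): (False, False),  # same address: cannot commute
    ("d", "u_inv"): (False, True),   # Rule 2 only
}

def can_commute(seq, i):
    """Return True if adjacent steps seq[i] and seq[i+1] may be exchanged."""
    rule1, rule2 = COMMUTE.get((seq[i], seq[i + 1]), (False, False))
    if rule1:
        return True
    # Rule 2 applies only when seq[i+1] is not the last step overall
    return rule2 and i + 1 < len(seq) - 1

print(can_commute(["a", "d_inv", "u"], 0))   # True, by Commute Rule 1
print(can_commute(["d", "u_inv"], 0))        # False: second step is last
print(can_commute(["d", "u_inv", "a"], 0))   # True, by Commute Rule 2
```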
B.4.2.2 Reduction Rules
If the memory sequence, after applying the Commute Rules, has a sub-pattern of two adjacent steps both related to known addresses, or both related to the unknown address (including repeating states), the two adjacent steps can be reduced to only one step, following the Reduction Rules (if the two-step pattern has "yes" for the column "Union Rule or
Table 7 Rules for combining two adjacent steps. For each ordered pair of adjacent steps (First, Second) over the operations a, a_alias, d, u and their invalidation counterparts a^inv, a_alias^inv, d^inv, u^inv (covering both the attacker's (A) and the victim's (V) operations), the table indicates with "yes" or "no" whether Commute Rule 1, Commute Rule 2, and the Union Rule or Reduce Rule apply, and gives the combined step when a rule applies, e.g., Union(a^inv, a_alias^inv) or Union(a^inv, d^inv).
Reduce Rule" column and has no Union result in the "Combined Step" column of Table 7):
– Reduction Rule 1: For two adjacent u operations, although u is unknown, both accesses target the same u, so the pair can be reduced to keep only the second access in the memory sequence.

– Reduction Rule 2: Two adjacent known memory access–related operations (known access operations) always result in a deterministic state of the cache block related to the second memory access, so these two steps can be reduced to only one step.

– Reduction Rule 3: For two adjacent steps where one is a known access operation and the other is a known inv operation, if the addresses they refer to are the same, the two steps can be reduced to one step, namely the second step, regardless of their order.
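A sketch of the three reduction rules on a pair of adjacent steps, with steps encoded as hypothetical (kind, address) tuples; this encoding is an assumption made for illustration:

```python
def reduce_pair(first, second):
    """Apply Reduction Rules 1-3 to two adjacent steps, if possible.

    Steps are (kind, address) tuples with kind in {"access", "inv"};
    the address "u" stands for the unknown address.  Returns the single
    surviving step, or None when no reduction rule applies.
    """
    (k1, a1), (k2, a2) = first, second
    # Rule 1: two u operations of the same kind target the same u
    if a1 == a2 == "u" and k1 == k2:
        return second
    if a1 != "u" and a2 != "u":
        # Rule 2: two known accesses leave a deterministic final state
        if k1 == k2 == "access":
            return second
        # Rule 3: a known access and a known invalidation of the same
        # address reduce to the second step, in either order
        if k1 != k2 and a1 == a2:
            return second
    return None

print(reduce_pair(("access", "a"), ("access", "d")))  # Rule 2 applies
print(reduce_pair(("access", "a"), ("inv", "a")))     # Rule 3 applies
print(reduce_pair(("access", "a"), ("inv", "d")))     # None: commutes instead
```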
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence ... ⇝ M ⇝ N ⇝ .... If combining M and N leads to the same timing observation result, i.e., ... ⇝ M ⇝ N ⇝ ... and ... ⇝ Union(M, N) ⇝ ... give the attacker the same timing observation information in the final step, we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two different known memory addresses can be combined: two known inv operations that each invalidate some known address can be merged into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate different known memory addresses.
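Union Rule 1 can be sketched as a left-to-right merge of adjacent invalidation steps; here an invalidation step is encoded, purely for illustration, as the list of known addresses it invalidates, and any other operation as a plain string:

```python
def union_invalidations(pattern):
    """Union Rule 1: repeatedly merge adjacent invalidation steps that
    invalidate *different* known addresses into one joint step.

    An invalidation step is a list of invalidated known addresses,
    e.g. ["a"]; any other operation is a plain string.
    """
    out = []
    for step in pattern:
        if (out and isinstance(step, list) and isinstance(out[-1], list)
                and not set(step) & set(out[-1])):
            out[-1] = out[-1] + step   # joint Union(...) step
        else:
            out.append(step)
    return out

# A_a_inv ⇝ A_d_inv ⇝ V_u  becomes  Union(A_a_inv, A_d_inv) ⇝ V_u
print(union_invalidations([["a"], ["d"], "V_u"]))
```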
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules in order: the Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the steps. Then the Union Rules are applied to the processed memory sequence.
Each recursive application of these three categories of rules must reduce at least one step, and the recursion continues until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not u operations is further reduced to a sequence of at most three steps, in one of the following patterns (or fewer steps):

– u operation ⇝ not u operation ⇝ u operation

– not u operation ⇝ u operation ⇝ not u operation

There might be a possible extra ⋆ or Ainv/Vinv before these three-step patterns, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra Ainv/Vinv in the first step:

– If it is followed by a known access operation, Ainv/Vinv can be removed, because the access puts an actual state into the cache block.

– If it is followed by a known inv operation or Vinv_u, Ainv/Vinv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by Vu, the worst case will be Ainv/Vinv ⇝ Vu ⇝ not u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or Ainv/Vinv ⇝ Vu ⇝ Ainv_d/Vinv_d ⇝ u operation, where Vu ⇝ Ainv_d/Vinv_d can further be commuted by Commute Rule 2 and reduced to within three steps.

In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, and which require the rest checking.

The only remaining adjacent two-step patterns to which none of the three categories of rules can be applied are the following:

– Aa/Va, Aa_alias/Va_alias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_a_alias/Vinv_a_alias ⇝ Vu

– Aa/Va or Aa_alias/Va_alias ⇝ Vinv_u

– Vu ⇝ Aa/Va, Aa_alias/Va_alias, Ad/Vd, Ainv_a/Vinv_a, or Ainv_a_alias/Vinv_a_alias

– Vinv_u ⇝ Aa/Va or Aa_alias/Va_alias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; p, a two-dimensional dynamic-size array, where p[0] contains the states of each step of the original pattern, in order, and p[1], p[2], ... are empty initially

Output: array out of reduced effective vulnerability pattern(s); it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: out = Ø
2: while p contains ⋆ at an index other than 0 do
3:     p = SubdivisionRule1(p)
4: end while
5: while (p contains Ainv at an index other than 0) or (p contains Vinv at an index other than 0) do
6:     p = SubdivisionRule2(p)
7: end while
8: while (p is ineffective or has interval effective three steps) do
9:     p = CommuteRules(p)
10:    p = ReductionRules(p)
11:    p = UnionRules(p)
12:    if (p is ineffective or has interval effective three steps) then
13:        out += effective three-step pattern(s) extracted from p
14:    end if
15: end while
16: return out
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to be neither (i) nor (ii) after the rules have been applied. The key outcome of our analysis is that any β-step pattern either is not a vulnerability, or, if it is a vulnerability, it maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
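As an illustrative reading of the last helper (not the authors' implementation), the interval check can be viewed as a sliding-window membership test against the set of effective three-step vulnerabilities from Table 2:

```python
def has_interval_effective_three_steps(seq, effective):
    """True if any three consecutive steps of seq form a pattern from
    the given set of effective three-step vulnerabilities."""
    return any(tuple(seq[i:i + 3]) in effective
               for i in range(len(seq) - 2))

# Hypothetical set with one effective pattern, Ainv ⇝ Vu ⇝ Aa
EFFECTIVE = {("A_inv", "V_u", "A_a")}
print(has_interval_effective_three_steps(
    ["V_d", "A_inv", "V_u", "A_a"], EFFECTIVE))   # True
```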
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References

1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
J Hardw Syst Secur (2019) 3397ndash425 425
and 64 types of Weak Vulnerabilities that are representedby patterns with β = 3 steps Consequently β = 3can represent all the vulnerabilities (including some weakones where Step 1 is ) Using more steps to representvulnerabilities is not necessary as discussed next
B4 Patterns with β gt 3
When β gt 3 the pattern of memory-related operations fora vulnerability can be reduced using the following rules
B41 Subdivision Rules
First a set of subdivision rules is used to divide the longpattern into shorter patterns following the below rulesEach subdivision rule should be applied recursively beforeapplying the next rule
Subdivision Rule 1 If the longer pattern contains a sub-pattern such as the longer pattern canbe divided into two separate patterns where is assignedas Step 1 of the second pattern This is because givesno timing information and the attacker loses track of thecache state after This rule should be recursively applieduntil there are no sub-patterns left with a in the middleor as last step ( in the last step will be deleted) in thelonger pattern
Subdivision Rule 2 Next if a pattern (derived afterrecursive application of the Rule 1 contains a sub-patternsuch as AinvVinv the longer pattern canbe divided into two separate patterns where AinvVinv isassigned as Step 1 of the second pattern This is becauseAinvVinv will flush all the timing information of thecurrent block and it can be used as the flushing step forStep 1 eg vulnerability Ainv Vu Aa(f ast)shown in Table 2 AinvVinv cannot be a candidate formiddle steps or the last step because it will flush alltiming information making the attacker unable to deducethe final timing with victimrsquos sensitive address translationinformation This rule should be recursively applied untilthere are no sub-patterns left with a AinvVinv in themiddle or the last step (AinvVinv in the last step will bedeleted)
B42 Simplification Rules
For each of the patterns resulting from the subdivisionof the original pattern we define Commute Rules UnionRules and Reduction Rules for a each set of two adjacentsteps in these remaining patterns In Table 7 we showall the possible cases of the rule applying conditions for
each adjacent two steps regardless of the attackerrsquos access(A) or the victimrsquos access (V ) The table shows whetherthe corresponding two steps can be commuted reduced orunioned (and the reduced or the unioned result if the rulescan be applied)
B421 Commute Rules
Suppose there are two adjacent steps M and N for a memorysequences M N If commuting M andN lead to the same observation result ie M N and N M will have thesame timing observation information in the final step for theattacker we can freely exchange the place of M and N inthis pattern In this case we have more chance to Reduceand Union the steps within the memory sequence by thefollowing Rules In the possible commuting process we willtry every possible combinations to commute different pairsof two steps that are able to apply the Commute Rules andthen further apply Reduce Rules and Union Rules to seewhether the commute is effective ie there can be stepsreduced or unioned after the proper commuting process Thefollowing two adjacent memory-related operations can becommuted
ndash Commute Rule 1 For two adjacent steps if one stepis a known access operation and another step is aknown inv operation and the addresses they referto are different these two steps can be commuted nomatter which position of the two steps they are in withinthe whole memory sequence It will show a ldquoyesrdquo forthe corresponding two-step pattern for the CommuteRule 1 column if these two can be commuted in Table 7
ndash Commute Rule 2 A superset of two-step patterns thatcan apply Commute Rule 1 can be commuted if thesecond step of these two adjacent steps is not thelast step in the whole memory sequence There aresome two adjacent steps that can only be commutedif the second step of these two adjacent steps is notthe last step in the whole memory sequence Therewill be a ldquoyesrdquo for the corresponding two-step patternfor the Commute Rule 2 column and a ldquonordquo for thecorresponding two-step pattern for the Commute Rule 1column in Table 7
B422 Reduction Rules
If the memory sequence after applying Commute Ruleshave a sub-pattern that has two adjacent steps both relatedto known addresses or both related to unknown address(including repeating states) the two adjacent steps can bereduced to only one following the reduction rules (if thetwo-step pattern has ldquoyesrdquo for the Column ldquoUnion Rule or
J Hardw Syst Secur (2019) 3397ndash425420
Table7
Rul
esfo
rco
mbi
ning
two
adja
cent
step
s
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Rule1
Rule2
ReduceRule
Step
Rule1
Rule2
ReduceRule
Step
aa
yes
yes
yes
aa
inv
ano
noye
sa
aa
ali
as
nono
yes
aali
as
ain
va
ali
as
yes
yes
yes
aali
as
ad
nono
yes
da
inv
dye
sye
sye
sd
au
nono
nominus
ain
vu
nono
nominus
aa
inv
nono
yes
ain
va
inv
ain
vye
sye
sye
sa
inv
aa
ali
asin
vye
sye
sye
sa
ain
va
ali
asin
vye
sye
sye
sU
nio
n(a
inva
ali
asin
v)
ad
inv
yes
yes
yes
aa
inv
din
vye
sye
sye
sU
nio
n(a
invd
inv)
au
inv
nono
nominus
ain
vu
inv
noye
sno
minusa
ali
as
ano
noye
sa
aali
asin
va
yes
yes
yes
a
aali
as
aali
as
yes
yes
yes
aali
as
aali
asin
va
ali
as
nono
yes
aali
as
aali
as
dno
noye
sd
aali
asin
vd
yes
yes
yes
d
aali
as
uno
nono
minusa
ali
asin
vu
nono
nominus
aali
as
ain
vye
sye
sye
sa
ali
as
aali
asin
va
inv
yes
yes
yes
Unio
n(a
inva
ali
asin
v)
aali
as
aali
asin
vno
noye
sa
ali
asin
va
ali
asin
va
ali
asin
vye
sye
sye
sa
ali
asin
v
aali
as
din
vye
sye
sye
sa
ali
as
aali
asin
vd
inv
yes
yes
yes
Unio
n(d
inva
ali
asin
v)
aali
as
uin
vno
nono
minusa
ali
asin
vu
inv
noye
sno
minusd
ano
noye
sa
din
va
yes
yes
yes
a
da
ali
as
nono
yes
aali
as
din
va
ali
as
yes
yes
yes
aali
as
dd
yes
yes
yes
dd
inv
dno
noye
sd
du
nono
nominus
din
vu
noye
sno
minusd
ain
vye
sye
sye
sd
din
va
inv
yes
yes
yes
Unio
n(a
invd
inv)
da
ali
asin
vye
sye
sye
sd
din
va
ali
asin
vye
sye
sye
sU
nio
n(d
inva
ali
asin
v)
dd
inv
nono
yes
din
vd
inv
din
vye
sye
sye
sd
inv
du
inv
noye
sno
minusd
inv
uin
vno
yes
nominus
ua
nono
nominus
uin
va
nono
nominus
ua
ali
as
nono
nominus
uin
va
ali
as
nono
nominus
ud
nono
nominus
uin
vd
noye
sno
minusu
uye
sye
sye
su
uin
vu
nono
yes
u
ua
inv
nono
nominus
uin
va
inv
noye
sno
minusu
aali
asin
vno
nono
minusu
inv
aali
asin
vno
yes
nominus
ud
inv
noye
sno
minusu
inv
din
vno
yes
nominus
uu
inv
nono
yes
uin
vu
inv
uin
vye
sye
sye
su
inv
J Hardw Syst Secur (2019) 3397ndash425 421
Reduce Rulerdquo and has no Union result for the ldquoCombinedSteprdquo Column in Table 7
ndash Reduction Rule 1 For two u operations although u isunknown both of the accesses target on the same u socan be reduced to only keep the second access in thememory sequence
ndash Reduction Rule 2 For two known adjacent memoryaccessndashrelated operations (known access operation)they always result in a deterministic state of the secondmemory accessndashrelated cache block so these two stepscan be reduced to only one step
ndash Reduction Rule 3 For two adjacent steps if onestep is known access operation and another one isknown inv operation no matter what order theyhave and the address they refer to is the same these twocan be reduced to one step which is the second step
B423 Union Rules
Suppose there are two adjacent steps M and N for memorysequences M N If combining M andN leads to the same timing observation result ie M N and Union(M N) will havethe same timing observation information in the final step forthe attacker we can combine step M and N to be a joint onestep for this memory sequence defined as Union(M N)Two adjacent steps that can be combined are discussed inthe following cases
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: two known inv operations both invalidate some known address, so they can be combined into a single step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
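A minimal sketch of Union Rule 1, assuming the same kind of symbolic step encoding as above (all identifiers here are illustrative assumptions, not the paper's notation):

```python
def union_rule_1(seq):
    """Union Rule 1: adjacent invalidation steps of known, different
    addresses are merged into one joint Union(inv, inv, ...) step.

    Each step is (op, addrs), where addrs is a frozenset of address
    symbols; the symbol "u" marks an unknown address and blocks the merge."""
    out = []
    for op, addrs in seq:
        mergeable = (op == "INV" and "u" not in addrs       # known invalidation
                     and out and out[-1][0] == "INV"        # previous step is inv
                     and "u" not in out[-1][1]              # ... of known addresses
                     and out[-1][1].isdisjoint(addrs))      # ... all different
        if mergeable:
            prev_op, prev_addrs = out.pop()
            out.append(("INV", prev_addrs | addrs))  # the joint Union step
        else:
            out.append((op, addrs))
    return out
```

Applied repeatedly (the loop already scans left to right), a run of invalidations inv(a), inv(d) collapses into a single joint step invalidating {a, d}.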
B.4.2.4 Final Check Rules
Each long memory sequence recursively has these three categorizations of rules applied, in order: Commute Rules first, to bring known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not-u operations together as much as possible. The Reduction Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rule is applied to the processed memory sequence.
Each recursive application of these three categorizations of rules should always reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not-u operations is further reduced to a sequence of at most three steps in one of the following patterns, or fewer:

– u operation, not-u operation, u operation
– not-u operation, u operation, not-u operation
There might be a possible extra ⋆ or A^inv/V^inv before these three-step patterns, where:
• An extra ⋆ in the first step will not influence the result and can be directly removed.
• If there is an extra A^inv/V^inv in the first step:
– If followed by a known access operation, A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.
– If followed by a known inv operation or V^inv_u, A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.
– If followed by V_u, the worst case will be A^inv/V^inv, V_u, not-u operation, u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv, V_u, A^inv_d/V^inv_d, u operation, where V_u, A^inv_d/V^inv_d can further have Commute Rule 2 applied, reducing the sequence to within three steps.
In this case the sequence is finally within three steps, and the checking is done.
• There exist two adjacent steps to which none of the Rules above can be applied; these require the Rest Checking.
The only remaining two-adjacent-step patterns that cannot be processed by any of the three categorizations of Rules are the following:
– {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}, V_u
– {A_a, V_a, A_aalias, V_aalias}, V^inv_u
– V_u, {A_a, V_a, A_aalias, V_aalias, A_d, V_d, A^inv_a, V^inv_a, A^inv_aalias, V^inv_aalias}
– V^inv_u, {A_a, V_a, A_aalias, V_aalias}

We manually checked all of the two-adjacent-step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; states, a two-dimensional dynamic-size array; states[0] contains the states of each step of the original pattern in order; states[1] and states[2] are empty initially
Output: reduced effective vulnerability pattern(s) array results; it will be an empty list if the original pattern does not correspond to an effective vulnerability

1: results = Ø
2: while contain(states, …) and index ≠ 0 do
3:     states = Commute_Rule_1(states)
4: end while
5: while (contain(states, …) and index ≠ 0) or (contain(states, …) and index ≠ 0) do
6:     states = Commute_Rule_2(states)
7: end while
8: while (is_ineffective(states) or has_interval_effective_three_steps(states)) do
9:     states = Commute_Rules(states)
10:    states = Reduction_Rules(states)
11:    states = Union_Rules(states)
12:    if (is_ineffective(states) or has_interval_effective_three_steps(states)) then
13:        results += effective_patterns(states)
14:    end if
15: end while
16: return results
two steps can either generate two-adjacent-step patterns that can be processed by the three Rules, where further steps can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, where the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequences
Algorithm 2 is used to (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. It is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii) after the Rule applications. The key outcome of our analysis is that any β-step pattern either is not a vulnerability or, if it is a vulnerability, maps to output (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() is a function that checks whether a list contains a corresponding state; is_ineffective() is a function that checks that the corresponding memory sequence does not contain any effective three-steps; has_interval_effective_three_steps() is a function that checks whether the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
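The overall driver of Algorithm 2 can be sketched as follows, with the rule applications and the effectiveness test passed in as functions; every identifier below is our own illustrative assumption rather than the paper's notation:

```python
def final_check(steps, commute, reduce_rules, union_rules, is_effective):
    """Sketch of the Algorithm 2 driver: apply the three categorizations
    of rules until no further step can be removed, then scan every
    three-step window for patterns that map to an effective vulnerability."""
    results = []
    shrinking = True
    while shrinking:
        before = len(steps)
        steps = union_rules(reduce_rules(commute(steps)))
        shrinking = len(steps) < before   # each pass must drop >= 1 step
    # scan all three-step windows of the fully reduced sequence
    for i in range(max(len(steps) - 2, 0)):
        window = steps[i:i + 3]
        if is_effective(window):
            results.append(window)        # captured by a known 3-step pattern
    return results
```

With trivially simple rule functions (e.g., a reduction that merges identical adjacent steps), a four-step pattern such as inv(a), inv(a), V_u, V_a reduces to three steps and is then checked as a single window; an empty result list means the original pattern is not an effective vulnerability.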
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1. Acıiçmez O, Koç ÇK (2006) Trace-driven cache attacks on AES (short paper). In: International conference on information and communications security, Springer, pp 112–121
2. Agrawal D, Archambeault B, Rao JR, Rohatgi P (2002) The EM side-channel(s). In: International workshop on cryptographic hardware and embedded systems, Springer, pp 29–45
3. Bernstein DJ (2005) Cache-timing attacks on AES
4. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, et al (2011) The gem5 simulator. ACM SIGARCH Computer Architecture News 39(2):1–7
5. Bonneau J, Mironov I (2006) Cache-collision timing attacks against AES. In: International workshop on cryptographic hardware and embedded systems, Springer, pp 201–215
6. Borghoff J, Canteaut A, Güneysu T, Kavun EB, Knezevic M, Knudsen LR, Leander G, Nikov V, Paar C, Rechberger C, et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In: International conference on the theory and application of cryptology and information security, Springer, pp 208–225
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Köpf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Köpf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel Corporation (2015) Improving real-time performance by utilizing cache allocation technology
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH Computer Architecture News, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Müller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH Computer Architecture News, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH Computer Architecture News 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice – a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, Proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH Computer Architecture News, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH Computer Architecture News, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table7
Rul
esfo
rco
mbi
ning
two
adja
cent
step
s
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Firs
tSe
cond
Com
mute
Com
mute
Union
Rule
orC
ombi
ned
Rule1
Rule2
ReduceRule
Step
Rule1
Rule2
ReduceRule
Step
aa
yes
yes
yes
aa
inv
ano
noye
sa
aa
ali
as
nono
yes
aali
as
ain
va
ali
as
yes
yes
yes
aali
as
ad
nono
yes
da
inv
dye
sye
sye
sd
au
nono
nominus
ain
vu
nono
nominus
aa
inv
nono
yes
ain
va
inv
ain
vye
sye
sye
sa
inv
aa
ali
asin
vye
sye
sye
sa
ain
va
ali
asin
vye
sye
sye
sU
nio
n(a
inva
ali
asin
v)
ad
inv
yes
yes
yes
aa
inv
din
vye
sye
sye
sU
nio
n(a
invd
inv)
au
inv
nono
nominus
ain
vu
inv
noye
sno
minusa
ali
as
ano
noye
sa
aali
asin
va
yes
yes
yes
a
aali
as
aali
as
yes
yes
yes
aali
as
aali
asin
va
ali
as
nono
yes
aali
as
aali
as
dno
noye
sd
aali
asin
vd
yes
yes
yes
d
aali
as
uno
nono
minusa
ali
asin
vu
nono
nominus
aali
as
ain
vye
sye
sye
sa
ali
as
aali
asin
va
inv
yes
yes
yes
Unio
n(a
inva
ali
asin
v)
aali
as
aali
asin
vno
noye
sa
ali
asin
va
ali
asin
va
ali
asin
vye
sye
sye
sa
ali
asin
v
aali
as
din
vye
sye
sye
sa
ali
as
aali
asin
vd
inv
yes
yes
yes
Unio
n(d
inva
ali
asin
v)
aali
as
uin
vno
nono
minusa
ali
asin
vu
inv
noye
sno
minusd
ano
noye
sa
din
va
yes
yes
yes
a
da
ali
as
nono
yes
aali
as
din
va
ali
as
yes
yes
yes
aali
as
dd
yes
yes
yes
dd
inv
dno
noye
sd
du
nono
nominus
din
vu
noye
sno
minusd
ain
vye
sye
sye
sd
din
va
inv
yes
yes
yes
Unio
n(a
invd
inv)
da
ali
asin
vye
sye
sye
sd
din
va
ali
asin
vye
sye
sye
sU
nio
n(d
inva
ali
asin
v)
dd
inv
nono
yes
din
vd
inv
din
vye
sye
sye
sd
inv
du
inv
noye
sno
minusd
inv
uin
vno
yes
nominus
ua
nono
nominus
uin
va
nono
nominus
ua
ali
as
nono
nominus
uin
va
ali
as
nono
nominus
ud
nono
nominus
uin
vd
noye
sno
minusu
uye
sye
sye
su
uin
vu
nono
yes
u
ua
inv
nono
nominus
uin
va
inv
noye
sno
minusu
aali
asin
vno
nono
minusu
inv
aali
asin
vno
yes
nominus
ud
inv
noye
sno
minusu
inv
din
vno
yes
nominus
uu
inv
nono
yes
uin
vu
inv
uin
vye
sye
sye
su
inv
J Hardw Syst Secur (2019) 3397ndash425 421
Reduce Rulerdquo and has no Union result for the ldquoCombinedSteprdquo Column in Table 7
ndash Reduction Rule 1 For two u operations although u isunknown both of the accesses target on the same u socan be reduced to only keep the second access in thememory sequence
ndash Reduction Rule 2 For two known adjacent memoryaccessndashrelated operations (known access operation)they always result in a deterministic state of the secondmemory accessndashrelated cache block so these two stepscan be reduced to only one step
ndash Reduction Rule 3 For two adjacent steps if onestep is known access operation and another one isknown inv operation no matter what order theyhave and the address they refer to is the same these twocan be reduced to one step which is the second step
B423 Union Rules
Suppose there are two adjacent steps M and N for memorysequences M N If combining M andN leads to the same timing observation result ie M N and Union(M N) will havethe same timing observation information in the final step forthe attacker we can combine step M and N to be a joint onestep for this memory sequence defined as Union(M N)Two adjacent steps that can be combined are discussed inthe following cases
ndash Union Rule 1 Two invalidations to two knowndifferent memory addresses can be applied UnionRule 1 known inv operation are two operations bothinvalidating some known address therefore they canbe combined to only one step The Union Rule can becontinuously done to union all the adjacent invalidationstep that invalidates known different memory addresses
B424 Final Check Rules
Each long memory sequence will recursively apply thesethree categorizations of the rules in the order Com-mute Rules first to put known access operations andknown inv operation that targets the same address as nearas possible and u operations and not u operations areputting together as much as possible The Reduced Rulesare then checked and applied to the processed memorysequence to reduce the steps Then the Union Rule isapplied to the processed memory sequence
The recursion at each application to these threecategorizations of the rules should be always applied andreduce at least one step until the resulting sequence matchesone of the two possible cases
bull the long (β gt 3) memory sequence with u operation
and not u operation is further reduced to a sequencewhere there are at most three steps in the followingpatterns or less
ndash u operation not u operation u operation
ndash not u operation u operation not u operation
There might be possible extra or AinvV inv beforethese three-step pattern where
bull An extra in the first step will not influencethe result and can be directly removed
bull If an extra AinvV inv in the first step
ndash If followed by known access ope-ration AinvV inv can be removeddue to the actual state further put intothe cache block
ndash If followed by known inv opera-t ion or V inv
u AinvV inv can also beremoved since the memory locationis repeatedly flushed by the two steps
ndash If followed by Vu worst case will beAinvV inv Vu not u operation u operationwhich is either an effective vul-nerability according to Table 2and reduction rules shown inSection 33 or AinvV inv Vu Ainv
d V invd u operation where
Vu Ainvd V inv
d can further beapplied Commute Rule 2 to reduceand be within three steps
In this case the steps are finally within three steps andthe checking is done
bull There exist two adjacent steps that cannot be appliedany Rules above and requires the Rest Checking
The only left two adjacent steps that cannot be appliedby any of the three categorizations of the Rules are thefollowing
ndash AaVaAaalias Vaalias AdVdAinva V inv
a
Ainvaalias V inv
aalias Vu ndash AaVaAaalias Vaalias V inv
u ndash Vu AaVaAaalias Vaalias AdVd
Ainva V inv
a Ainvaalias V inv
aalias ndash V inv
u AaVaAaalias Vaalias We manually checked all of the two adjacent step patterns
above and found that adding extra step before or after these
J Hardw Syst Secur (2019) 3397ndash425422
Algorithm 2 -Step ( 3) pattern reduction
Input number of steps of the pattern
a two-dimensional dynamic-size array [0] contains the states of each step of the original pattern in
order [1] [2] are empty initially
Output reduced effective vulnerability pattern(s) array It will be an empty list if the original pattern does not
correspond to an effective vulnerability
1 = Oslash
2 while contain( ) and index not 0 do3 = 1 ( )
4 end while5 while ( contain( ) and index not 0) or ( contain( ) and index not 0) do6 = 2 ( )
7 end while8 while ( is ineffecitve or has interval effective three steps) do9 = ( )
10 = ( )
11 = ( )
12 if ( is ineffecitve or has interval effective three steps) then13 += ( )
14 end if15 end while16 return
two steps can either generate two adjacent step patterns thatbe processed by the three Rules where further step can bereduced or construct effective vulnerability according toTable 2 and reduction rules shown in Section 33 wherethe corresponding pattern can be treated effective and thechecking is done
B43 Algorithm for Reducing and CheckingMemorySequence
Algorithm 2 is used to (i) reduce a β-step (β gt 3)pattern to a three-step pattern thus demonstrating that thecorresponding β gt 3 step pattern actually is equivalent tothe output three-step pattern and represents a vulnerabilitythat is captured by an existing three-step pattern or (ii)demonstrate that the β-step pattern can be mapped to one ormore three-step vulnerabilities It is not possible for a β-stepvulnerability pattern to not be either (i) or (ii) after doing theRule applications Key outcome of our analysis is that anyβ-step pattern is not a vulnerability or if it is a vulnerabilityit maps to either outputs (i) or (ii) of the algorithm
Inside the Algorithm 2 contain() represents a function tocheck if a list contains a corresponding state is ineffective()represents a function that checks the corresponding memorysequence does not contain any effective three-stepshas interval effective three steps() represents a functionthat check if the corresponding memory sequence can bemapped to one or more three-step vulnerabilities
B44 Summary
In conclusion the three-step model can model all possibletiming-based cache vulnerability in normal caches Vulner-abilities which are represented by more than three steps canbe always reduced to one (or more) vulnerabilities from ourthree-step model and thus using more than three step is notnecessary
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu
A Hestness J Hower DR Krishna T Sardashti S et al (2011)The gem5 simulator ACM SIGARCH computer architecture news39(2)1ndash7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic MKnudsen LR Leander G Nikov V Paar C Rechberger Cet al (2012) Princendasha low-latency block cipher for pervasivecomputing applications In International conference on the theoryand application of cryptology and information security Springerpp 208ndash225
J Hardw Syst Secur (2019) 3397ndash425 423
7 Bourgeat T Lebedev I Wright A Zhang S Devadas S et al (2019)MI6 Secure enclaves in a speculative out-of-order processorIn Proceedings of the 52nd Annual IEEEACM InternationalSymposium on Microarchitecture ACM pp 42ndash56
8 Chen S Liu F Mi Z Zhang Y Lee RB Chen H Wang X(2018) Leveraging hardware transactional memory for cache side-channel defenses In Proceedings of the 2018 on Asia conferenceon computer and communications security ACM pp 601ndash608
9 Clark SS Ransford B Rahmati A Guineau S Sorber J Xu WFu K (2013) Wattsupdoc power side channels to nonintrusivelydiscover untargeted malware on embedded medical devices InPresented as part of the 2013 USENIX workshop on healthinformation technologies
10 Costan V Lebedev IA Devadas S (2016) Sanctum minimalhardware extensions for strong software isolation In USENIXsecurity symposium pp 857ndash874
11 Deng S Xiong W Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic InProceedings of the 7th international workshop on hardware andarchitectural support for security and privacy ACM p 2
12 Domnitser L Jaleel A Loew J Abu-Ghazaleh N Ponomarev D(2012) Non-monopolizable caches low-complexity mitigation ofcache side channel attacks ACM Transactions on Architectureand Code Optimization (TACO) 8(4)35
13 Doychev G Kopf B (2017) Rigorous analysis of softwarecountermeasures against cache attacks In Proceedings of the 38thACM SIGPLAN conference on programming language design andimplementation ACM pp 406ndash421
14 Doychev G Kopf B Mauborgne L Reineke J (2015) CacheAudita tool for the static analysis of cache side channels ACM Trans-actions on information and system security (TISSEC) 18(1)4
15 Gruss D Lettner J Schuster F Ohrimenko O Haller I CostaM (2017) Strong and efficient cache side-channel protectionusing hardware transactional memory In USENIX securitysymposium pp 217ndash233
16 Gruss D Maurice C Wagner K Mangard S (2016) Flush+ flusha fast and stealthy cache attack In International conference ondetection of intrusions and malware and vulnerability assessmentSpringer pp 279ndash299
17 Gruss D Spreitzer R Mangard S (2015) Cache template attacksautomating attacks on inclusive last-level caches In USENIXsecurity symposium pp 897ndash912
18 Guanciale R Nemati H Baumann C Dam M (2016) Cache stor-age channels alias-driven attacks and verified countermeasuresIn 2016 IEEE symposium on security and privacy (SP) IEEEpp 38ndash55
19 Gullasch D Bangerter E Krenn S (2011) Cache gamesndashbringingaccess-based cache attacks on AES to practice In 2011 IEEEsymposium on security and privacy (SP) IEEE pp 490ndash505
20 He Z Lee RB (2017) How secure is your cache against side-channel attacks In Proceedings of the 50th annual IEEEACMinternational symposium on microarchitecture ACM pp 341ndash353
21 Intel C (2015) Improving real-time performance by utilizing cacheallocation technology Intel Corporation
22 Kayaalp M Khasawneh KN Esfeden HA Elwell J Abu-Ghazaleh N Ponomarev D Jaleel A (2017) RIC relaxed inclusioncaches for mitigating LLC side-channel attacks In 2017 54thACMEDACIEEE design automation conference (DAC) IEEEpp 1ndash6
23 Keramidas G Antonopoulos A Serpanos DN Kaxiras S (2008)Non deterministic caches a simple and effective defense againstside channel attacks Des Autom Embed Syst 12(3)221ndash230
24 Kessler RE Hill MD (1992) Page placement algorithms for largerealindexed caches ACM Trans Comput Syst (TOCS) 10(4)338ndash359
25 Kiriansky V Lebedev I Amarasinghe S Devadas S Emer J(2018) DAWG a defense against cache timing attacks in spec-ulative execution processors In 2018 51st Annual IEEEACMinternational symposium on microarchitecture (MICRO) IEEEpp 974ndash987
J Hardw Syst Secur (2019) 3:397–425
Reduce Rule" and has no Union result for the "Combined Step" column in Table 7.
– Reduction Rule 1: For two u operations, although u is unknown, both of the accesses target the same u, so the two steps can be reduced to keep only the second access in the memory sequence.
– Reduction Rule 2: For two adjacent known memory access–related operations (known access operations), the second access always leaves the corresponding cache block in a deterministic state, so these two steps can be reduced to only one step.
– Reduction Rule 3: For two adjacent steps, if one step is a known access operation and the other is a known inv operation, no matter in which order, and the address they refer to is the same, the two can be reduced to one step, namely the second step.
B.4.2.3 Union Rules
Suppose there are two adjacent steps M and N in a memory sequence. If combining M and N leads to the same timing observation result, i.e., the sequence containing M, N and the sequence containing Union(M, N) give the attacker the same timing observation information in the final step, then we can combine steps M and N into one joint step for this memory sequence, defined as Union(M, N). Two adjacent steps that can be combined are discussed in the following cases:
– Union Rule 1: Two invalidations of two known, different memory addresses can have Union Rule 1 applied: known inv operations are two operations that both invalidate some known address, and therefore they can be combined into only one step. The Union Rule can be applied repeatedly to union all adjacent invalidation steps that invalidate known, different memory addresses.
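A minimal sketch of Union Rule 1 follows; the set-valued step encoding `(op, addrs)` and the function name are our illustrative assumptions, where `addrs` collects the known addresses touched in a (possibly joint) step:

```python
# Sketch of Union Rule 1: adjacent invalidation steps of known,
# different memory addresses are folded into one joint step.

def apply_union_rule_1(seq):
    out = []
    for op, addrs in seq:
        prev = out[-1] if out else None
        if (prev is not None and op == "inv" and prev[0] == "inv"
                and "u" not in addrs and "u" not in prev[1]
                and prev[1].isdisjoint(addrs)):
            # Both steps invalidate known, distinct addresses:
            # combine them into a single joint invalidation.
            out[-1] = ("inv", prev[1] | addrs)
        else:
            out.append((op, set(addrs)))
    return out
```

Applied repeatedly down the sequence in one pass, a run of invalidations such as inv(a), inv(b), inv(c) collapses into one joint step invalidating {a, b, c}.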
B.4.2.4 Final Check Rules
Each long memory sequence recursively applies these three categories of rules, in order: the Commute Rules are applied first, to put known access operations and known inv operations that target the same address as near to each other as possible, and to group u operations and not-u operations together as much as possible. The Reduce Rules are then checked and applied to the processed memory sequence to reduce the number of steps. Then the Union Rules are applied to the processed memory sequence.

Each recursive application of these three categories of rules should reduce at least one step, until the resulting sequence matches one of the two possible cases:
• The long (β > 3) memory sequence with u operations and not-u operations is further reduced to a sequence of at most three steps in one of the following patterns (or fewer):

– u operation ⇝ not-u operation ⇝ u operation

– not-u operation ⇝ u operation ⇝ not-u operation
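A check for these two terminal patterns can be sketched as follows; the `(op, addr)` step encoding with `"u"` marking the unknown address is our illustrative convention:

```python
# Does a reduced sequence match one of the two alternating
# three-step terminal patterns, (u, not-u, u) or (not-u, u, not-u)?

def matches_terminal_pattern(seq):
    if len(seq) != 3:
        return False
    u_steps = [addr == "u" for _, addr in seq]
    return u_steps in ([True, False, True], [False, True, False])
```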
There might be a possible extra ⋆ or A^inv/V^inv before this three-step pattern, where:

• An extra ⋆ in the first step will not influence the result and can be directly removed.

• If there is an extra A^inv/V^inv in the first step:

– If it is followed by a known access operation, the A^inv/V^inv can be removed, due to the actual state subsequently put into the cache block.

– If it is followed by a known inv operation or V_u^inv, the A^inv/V^inv can also be removed, since the memory location is repeatedly flushed by the two steps.

– If it is followed by V_u, the worst case will be A^inv/V^inv ⇝ V_u ⇝ not-u operation ⇝ u operation, which is either an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, or A^inv/V^inv ⇝ V_u ⇝ A_d^inv/V_d^inv ⇝ u operation, where V_u ⇝ A_d^inv/V_d^inv can further have Commute Rule 2 applied so that the sequence reduces to within three steps.
In this case, the steps are finally within three steps and the checking is done.
• There exist two adjacent steps to which none of the rules above can be applied, requiring the rest checking.
The only remaining two adjacent steps to which none of the three categories of rules can be applied are the following:

– A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv ⇝ V_u

– A_a/V_a/A_aalias/V_aalias ⇝ V_u^inv

– V_u ⇝ A_a/V_a/A_aalias/V_aalias/A_d/V_d/A_a^inv/V_a^inv/A_aalias^inv/V_aalias^inv

– V_u^inv ⇝ A_a/V_a/A_aalias/V_aalias

We manually checked all of the two adjacent step patterns above and found that adding an extra step before or after these
Algorithm 2 β-step (β > 3) pattern reduction

Input: β, the number of steps of the pattern; a two-dimensional dynamic-size array whose entry [0] contains the states of each step of the original pattern in order, and whose entries [1] and [2] are empty initially.

Output: the reduced effective vulnerability pattern(s) array; it will be an empty list if the original pattern does not correspond to an effective vulnerability.

1:  results = Ø
2:  while the sequence contains a pair matching Commute Rule 1 whose index is not 0 do
3:      apply Commute Rule 1
4:  end while
5:  while the sequence contains a pair matching Commute Rule 2, in either ordering, whose index is not 0 do
6:      apply Commute Rule 2
7:  end while
8:  while not (the sequence is_ineffective or has_interval_effective_three_steps) do
9:      apply the Commute Rules
10:     apply the Reduce Rules
11:     apply the Union Rules
12:     if the sequence is_ineffective or has_interval_effective_three_steps then
13:         results += the resulting pattern(s)
14:     end if
15: end while
16: return results
two steps can either generate two adjacent step patterns that can be processed by the three categories of rules, so that a further step can be reduced, or construct an effective vulnerability according to Table 2 and the reduction rules shown in Section 3.3, in which case the corresponding pattern can be treated as effective and the checking is done.
B.4.3 Algorithm for Reducing and Checking Memory Sequence
Algorithm 2 is used to either (i) reduce a β-step (β > 3) pattern to a three-step pattern, thus demonstrating that the corresponding β > 3 step pattern is equivalent to the output three-step pattern and represents a vulnerability that is captured by an existing three-step pattern, or (ii) demonstrate that the β-step pattern can be mapped to one or more three-step vulnerabilities. After the rule applications, it is not possible for a β-step vulnerability pattern to fall into neither (i) nor (ii). The key outcome of our analysis is that any β-step pattern is either not a vulnerability or, if it is a vulnerability, it maps to one of the outputs (i) or (ii) of the algorithm.
Inside Algorithm 2, contain() represents a function that checks if a list contains a corresponding state; is_ineffective() represents a function that checks that the corresponding memory sequence does not contain any effective three-step patterns; and has_interval_effective_three_steps() represents a function that checks if the corresponding memory sequence can be mapped to one or more three-step vulnerabilities.
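The overall reduce-and-check loop can be sketched as below. This is only a skeleton under our own assumptions: the three rule passes and the two predicates are injected as callables, and the function name `reduce_and_check` and the progress test are ours, not the paper's implementation:

```python
# Skeleton of the reduce-and-check recursion: repeatedly apply the
# Commute, Reduce, and Union rule passes until the sequence is
# classified as ineffective or as containing effective three-step
# vulnerabilities.

def reduce_and_check(seq, commute, reduce_rules, union_rules,
                     is_ineffective, has_interval_effective_three_steps):
    results = []
    while not (is_ineffective(seq)
               or has_interval_effective_three_steps(seq)):
        shorter = union_rules(reduce_rules(commute(seq)))
        if shorter == seq:      # no rule made progress; stop
            break
        seq = shorter
    if has_interval_effective_three_steps(seq):
        results.append(seq)     # maps to three-step vulnerability(ies)
    return results
```

With toy callables, e.g. a reduce pass that drops the leading step while more than three steps remain, the loop terminates once a three-step pattern is reached and records it.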
B.4.4 Summary
In conclusion, the three-step model can model all possible timing-based cache vulnerabilities in normal caches. Vulnerabilities that are represented by more than three steps can always be reduced to one (or more) vulnerabilities from our three-step model, and thus using more than three steps is not necessary.
References
1 Acıicmez O Koc CK (2006) Trace-driven cache attacks on AES(short paper) In International conference on information andcommunications security Springer pp 112ndash121
2 Agrawal D Archambeault B Rao JR Rohatgi P (2002) TheEM side-channel (s) In International workshop on cryptographichardware and embedded systems Springer pp 29ndash45
3 Bernstein DJ (2005) Cache-timing attacks on AES

4 Binkert N Beckmann B Black G Reinhardt SK Saidi A Basu A Hestness J Hower DR Krishna T Sardashti S et al (2011) The gem5 simulator. ACM SIGARCH computer architecture news 39(2):1–7
5 Bonneau J Mironov I (2006) Cache-collision timing attacksagainst AES In International workshop on cryptographichardware and embedded systems Springer pp 201ndash215
6 Borghoff J Canteaut A Guneysu T Kavun EB Knezevic M Knudsen LR Leander G Nikov V Paar C Rechberger C et al (2012) PRINCE – a low-latency block cipher for pervasive computing applications. In International conference on the theory and application of cryptology and information security Springer pp 208–225
7 Bourgeat T Lebedev I Wright A Zhang S Devadas S et al (2019)MI6 Secure enclaves in a speculative out-of-order processorIn Proceedings of the 52nd Annual IEEEACM InternationalSymposium on Microarchitecture ACM pp 42ndash56
8 Chen S Liu F Mi Z Zhang Y Lee RB Chen H Wang X(2018) Leveraging hardware transactional memory for cache side-channel defenses In Proceedings of the 2018 on Asia conferenceon computer and communications security ACM pp 601ndash608
9 Clark SS Ransford B Rahmati A Guineau S Sorber J Xu WFu K (2013) Wattsupdoc power side channels to nonintrusivelydiscover untargeted malware on embedded medical devices InPresented as part of the 2013 USENIX workshop on healthinformation technologies
10 Costan V Lebedev IA Devadas S (2016) Sanctum minimalhardware extensions for strong software isolation In USENIXsecurity symposium pp 857ndash874
11 Deng S Xiong W Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic InProceedings of the 7th international workshop on hardware andarchitectural support for security and privacy ACM p 2
12 Domnitser L Jaleel A Loew J Abu-Ghazaleh N Ponomarev D(2012) Non-monopolizable caches low-complexity mitigation ofcache side channel attacks ACM Transactions on Architectureand Code Optimization (TACO) 8(4)35
13 Doychev G Kopf B (2017) Rigorous analysis of softwarecountermeasures against cache attacks In Proceedings of the 38thACM SIGPLAN conference on programming language design andimplementation ACM pp 406ndash421
14 Doychev G Kopf B Mauborgne L Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on information and system security (TISSEC) 18(1):4
15 Gruss D Lettner J Schuster F Ohrimenko O Haller I CostaM (2017) Strong and efficient cache side-channel protectionusing hardware transactional memory In USENIX securitysymposium pp 217ndash233
16 Gruss D Maurice C Wagner K Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In International conference on detection of intrusions and malware and vulnerability assessment Springer pp 279–299
17 Gruss D Spreitzer R Mangard S (2015) Cache template attacksautomating attacks on inclusive last-level caches In USENIXsecurity symposium pp 897ndash912
18 Guanciale R Nemati H Baumann C Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In 2016 IEEE symposium on security and privacy (SP) IEEE pp 38–55
19 Gullasch D Bangerter E Krenn S (2011) Cache games – bringing access-based cache attacks on AES to practice. In 2011 IEEE symposium on security and privacy (SP) IEEE pp 490–505
20 He Z Lee RB (2017) How secure is your cache against side-channel attacks In Proceedings of the 50th annual IEEEACMinternational symposium on microarchitecture ACM pp 341ndash353
21 Intel C (2015) Improving real-time performance by utilizing cacheallocation technology Intel Corporation
22 Kayaalp M Khasawneh KN Esfeden HA Elwell J Abu-Ghazaleh N Ponomarev D Jaleel A (2017) RIC relaxed inclusioncaches for mitigating LLC side-channel attacks In 2017 54thACMEDACIEEE design automation conference (DAC) IEEEpp 1ndash6
23 Keramidas G Antonopoulos A Serpanos DN Kaxiras S (2008)Non deterministic caches a simple and effective defense againstside channel attacks Des Autom Embed Syst 12(3)221ndash230
24 Kessler RE Hill MD (1992) Page placement algorithms for largerealindexed caches ACM Trans Comput Syst (TOCS) 10(4)338ndash359
25 Kiriansky V Lebedev I Amarasinghe S Devadas S Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In 2018 51st Annual IEEE/ACM international symposium on microarchitecture (MICRO) IEEE pp 974–987
26 Kocher P Genkin D Gruss D Haas W Hamburg M Lipp M Mangard S Prescher T Schwarz M Yarom Y (2018) Spectre attacks: exploiting speculative execution. In 2019 IEEE Symposium on Security and Privacy (SP) IEEE pp 1–19
27 Lee RB Kwan P McGregor JP Dwoskin J Wang Z (2005)Architecture for protecting critical secrets in microprocessorsIn ACM SIGARCH computer architecture news vol 33 IEEEComputer Society pp 2ndash13
28 Lee Y Waterman A Avizienis R Cook H Sun C Stojanovic V Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th IEEE pp 199–202
29 Liu F Ge Q Yarom Y Mckeen F Rozas C Heiser G Lee RB(2016) CATalyst defeating last-level cache side channel attacks incloud computing In 2016 IEEE international symposium on highperformance computer architecture (HPCA) IEEE pp 406ndash418
30 Liu F Lee RB (2014) Random fill cache architecture In2014 47th annual IEEEACM international symposium onmicroarchitecture (MICRO) IEEE pp 203ndash215
31 Liu F Wu H Mai K Lee RB (2016) Newcache secure cachearchitecture thwarting cache side-channel attacks IEEE Micro36(5)8ndash16
32 Luk CK Cohn R Muth R Patil H Klauser A Lowney G WallaceS Reddi VJ Hazelwood K (2005) Pin building customizedprogram analysis tools with dynamic instrumentation In ACMsigplan notices vol 40 ACM pp 190ndash200
33 Masti RJ Rai D Ranganathan A Muller C Thiele L Capkun S(2015) Thermal covert channels on multi-core platforms In 24thUSENIX security symposium (USENIX security 15) pp 865ndash880
34 Osvik DA Shamir A Tromer E (2006) Cache attacks andcountermeasures the case of AES In Cryptographersrsquo track atthe RSA conference Springer pp 1ndash20
35 Patel A Afram F Chen S Ghose K (2011) MARSS Afull system simulator for multicore x86 CPUs In 2011 48thACMEDACIEEE design automation conference (DAC) IEEEpp 1050ndash1055
36 Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa, 2005. http://www.daemonology.net/papers/htt.pdf
37 Hammarlund P Martinez AJ Bajwa AA Hill DL Hallnor E Jiang H Dixon M Derr M Hunsaker M Kumar R Osborne RB (2014) Haswell: the fourth-generation Intel core processor. IEEE Micro 34(2):6–20
38 Qureshi MK (2018) CEASER mitigating conflict-based cacheattacks via encrypted-address and remapping In 2018 51Stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 775ndash787
39 Sanchez D Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In ACM SIGARCH computer architecture news vol 41 ACM pp 475–486
40 Schwarz M Schwarzl M Lipp M Gruss D (2018) NetSpectreread arbitrary memory over network In European Symposium onResearch in Computer Security Springer pp 279ndash299
41 Seznec A (1993) A case for two-way skewed-associative cachesACM SIGARCH computer architecture news 21(2)169ndash178
42 Sharkey J Ponomarev D Ghose K (2005) M-Sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43 Shivakumar P Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44 Taylor G Davies P Farmwald M (1990) The TLB slice – a low-cost, high-speed address translation mechanism. In 17th annual international symposium on computer architecture, 1990, Proceedings IEEE pp 355–363
45 Thoziyoor S Muralimanohar N Ahn JH Jouppi NP (2008)CACTI 51 Technical Report HPL-2008-20 HP Labs
46 Trippel C Lustig D Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47 Wang S Wang P Liu X Zhang D Wu D (2017) CacheDidentifying cache-based timing channels in production softwareIn 26th USENIX security symposium USENIX association
48 Wang Y Ferraiuolo A Zhang D Myers AC Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In 2016 53rd ACM/EDAC/IEEE design automation conference (DAC) IEEE pp 1–6
49 Wang Z Lee RB (2007) New cache designs for thwarting softwarecache-based side channel attacks In ACM SIGARCH computerarchitecture news vol 35 ACM pp 494ndash505
50 Wang Z Lee RB (2008) A novel cache architecture withenhanced performance and security In 2008 41st IEEEACMinternational symposium on Microarchitecture 2008 MICRO-41IEEE pp 83ndash93
51 Werner M Unterluggauer T Giner L Schwarz M Gruss D Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In 28th USENIX security symposium (USENIX Security 19) USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52 Yan M Choi J Skarlatos D Morrison A Fletcher C Torrellas J(2018) InvisiSpec making speculative execution invisible in thecache hierarchy In 2018 51st annual IEEEACM internationalsymposium on microarchitecture (MICRO) IEEE pp 428ndash441
53 Yan M Gopireddy B Shull T Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP) defending againstcache-based side channel attacks In Proceedings of the 44thannual international symposium on computer architecture ACMpp 347ndash360
54 Yao F Doroslovacki M Venkataramani G (2018) Are coherenceprotocol states vulnerable to information leakage In 2018IEEE international symposium on high performance computerarchitecture (HPCA) IEEE pp 168ndash179
55 Yarom Y Falkner K (2014) Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In USENIX security symposium pp 719–732
56 Zhang D Askarov A Myers AC (2012) Language-based controland mitigation of timing channels ACM SIGPLAN Not 47(6)99ndash110
57 Zhang D Wang Y Suh GE Myers AC (2015) A hardware designlanguage for timing-sensitive information-flow security In ACMSIGARCH computer architecture news vol 43 ACM pp 503ndash516
58 Zhang S Wright A Bourgeat T Arvind A (2018) Composablebuilding blocks to open up processor design In 2018 51stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 68ndash81
59 Zhang T Lee RB (2014) New models of cache architecturescharacterizing information leakage from cache side channels InProceedings of the 30th annual computer security applicationsconference ACM pp 96ndash105
60 Zhang Y Parikh D Sankaranarayanan K Skadron K Stan M(2003) HotLeakage a temperature-aware model of subthresholdand gate leakage for architects University of Virginia Dept ofComputer Science Tech Report CS-2003 5
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
50 Wang Z Lee RB (2008) A novel cache architecture withenhanced performance and security In 2008 41st IEEEACMinternational symposium on Microarchitecture 2008 MICRO-41IEEE pp 83ndash93
51 Werner M Unterluggauer T Giner L Schwarz M Gruss D Man-gard S (2019) ScatterCache thwarting cache attacks via cache setrandomization In 28th USENIX security symposium (USENIXSecurity 19) USENIX Association Santa Clara CA httpswwwusenixorgconferenceusenixsecurity19presentationwerner
52 Yan M Choi J Skarlatos D Morrison A Fletcher C Torrellas J(2018) InvisiSpec making speculative execution invisible in thecache hierarchy In 2018 51st annual IEEEACM internationalsymposium on microarchitecture (MICRO) IEEE pp 428ndash441
53 Yan M Gopireddy B Shull T Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP) defending againstcache-based side channel attacks In Proceedings of the 44thannual international symposium on computer architecture ACMpp 347ndash360
54 Yao F Doroslovacki M Venkataramani G (2018) Are coherenceprotocol states vulnerable to information leakage In 2018IEEE international symposium on high performance computerarchitecture (HPCA) IEEE pp 168ndash179
55 Yarom Y Falkner K (2014) FLUSH+ RELOAD a high resolutionlow noise L3 cache side-channel attack In USENIX securitysymposium pp 719ndash732
56 Zhang D Askarov A Myers AC (2012) Language-based controland mitigation of timing channels ACM SIGPLAN Not 47(6)99ndash110
57 Zhang D Wang Y Suh GE Myers AC (2015) A hardware designlanguage for timing-sensitive information-flow security In ACMSIGARCH computer architecture news vol 43 ACM pp 503ndash516
58 Zhang S Wright A Bourgeat T Arvind A (2018) Composablebuilding blocks to open up processor design In 2018 51stannual IEEEACM international symposium on microarchitecture(MICRO) IEEE pp 68ndash81
59 Zhang T Lee RB (2014) New models of cache architecturescharacterizing information leakage from cache side channels InProceedings of the 30th annual computer security applicationsconference ACM pp 96ndash105
60 Zhang Y Parikh D Sankaranarayanan K Skadron K Stan M(2003) HotLeakage a temperature-aware model of subthresholdand gate leakage for architects University of Virginia Dept ofComputer Science Tech Report CS-2003 5
Publisherrsquos Note Springer Nature remains neutral with regard tojurisdictional claims in published maps and institutional affiliations
J Hardw Syst Secur (2019) 3397ndash425 425
7. Bourgeat T, Lebedev I, Wright A, Zhang S, Devadas S, et al (2019) MI6: secure enclaves in a speculative out-of-order processor. In: Proceedings of the 52nd annual IEEE/ACM international symposium on microarchitecture, ACM, pp 42–56
8. Chen S, Liu F, Mi Z, Zhang Y, Lee RB, Chen H, Wang X (2018) Leveraging hardware transactional memory for cache side-channel defenses. In: Proceedings of the 2018 Asia conference on computer and communications security, ACM, pp 601–608
9. Clark SS, Ransford B, Rahmati A, Guineau S, Sorber J, Xu W, Fu K (2013) WattsUpDoc: power side channels to nonintrusively discover untargeted malware on embedded medical devices. In: Presented as part of the 2013 USENIX workshop on health information technologies
10. Costan V, Lebedev IA, Devadas S (2016) Sanctum: minimal hardware extensions for strong software isolation. In: USENIX security symposium, pp 857–874
11. Deng S, Xiong W, Szefer J (2018) Cache timing side-channel vulnerability checking with computation tree logic. In: Proceedings of the 7th international workshop on hardware and architectural support for security and privacy, ACM, p 2
12. Domnitser L, Jaleel A, Loew J, Abu-Ghazaleh N, Ponomarev D (2012) Non-monopolizable caches: low-complexity mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8(4):35
13. Doychev G, Kopf B (2017) Rigorous analysis of software countermeasures against cache attacks. In: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation, ACM, pp 406–421
14. Doychev G, Kopf B, Mauborgne L, Reineke J (2015) CacheAudit: a tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18(1):4
15. Gruss D, Lettner J, Schuster F, Ohrimenko O, Haller I, Costa M (2017) Strong and efficient cache side-channel protection using hardware transactional memory. In: USENIX security symposium, pp 217–233
16. Gruss D, Maurice C, Wagner K, Mangard S (2016) Flush+Flush: a fast and stealthy cache attack. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 279–299
17. Gruss D, Spreitzer R, Mangard S (2015) Cache template attacks: automating attacks on inclusive last-level caches. In: USENIX security symposium, pp 897–912
18. Guanciale R, Nemati H, Baumann C, Dam M (2016) Cache storage channels: alias-driven attacks and verified countermeasures. In: 2016 IEEE symposium on security and privacy (SP), IEEE, pp 38–55
19. Gullasch D, Bangerter E, Krenn S (2011) Cache games: bringing access-based cache attacks on AES to practice. In: 2011 IEEE symposium on security and privacy (SP), IEEE, pp 490–505
20. He Z, Lee RB (2017) How secure is your cache against side-channel attacks? In: Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, ACM, pp 341–353
21. Intel C (2015) Improving real-time performance by utilizing cache allocation technology. Intel Corporation
22. Kayaalp M, Khasawneh KN, Esfeden HA, Elwell J, Abu-Ghazaleh N, Ponomarev D, Jaleel A (2017) RIC: relaxed inclusion caches for mitigating LLC side-channel attacks. In: 2017 54th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
23. Keramidas G, Antonopoulos A, Serpanos DN, Kaxiras S (2008) Non deterministic caches: a simple and effective defense against side channel attacks. Des Autom Embed Syst 12(3):221–230
24. Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359
25. Kiriansky V, Lebedev I, Amarasinghe S, Devadas S, Emer J (2018) DAWG: a defense against cache timing attacks in speculative execution processors. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 974–987
26. Kocher P, Genkin D, Gruss D, Haas W, Hamburg M, Lipp M, Mangard S, Prescher T, Schwarz M, Yarom Y (2018) Spectre attacks: exploiting speculative execution. In: 2019 IEEE symposium on security and privacy (SP), IEEE, pp 1–19
27. Lee RB, Kwan P, McGregor JP, Dwoskin J, Wang Z (2005) Architecture for protecting critical secrets in microprocessors. In: ACM SIGARCH computer architecture news, vol 33, IEEE Computer Society, pp 2–13
28. Lee Y, Waterman A, Avizienis R, Cook H, Sun C, Stojanovic V, Asanovic K (2014) A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators. In: European solid state circuits conference (ESSCIRC), ESSCIRC 2014-40th, IEEE, pp 199–202
29. Liu F, Ge Q, Yarom Y, Mckeen F, Rozas C, Heiser G, Lee RB (2016) CATalyst: defeating last-level cache side channel attacks in cloud computing. In: 2016 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 406–418
30. Liu F, Lee RB (2014) Random fill cache architecture. In: 2014 47th annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 203–215
31. Liu F, Wu H, Mai K, Lee RB (2016) Newcache: secure cache architecture thwarting cache side-channel attacks. IEEE Micro 36(5):8–16
32. Luk CK, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN notices, vol 40, ACM, pp 190–200
33. Masti RJ, Rai D, Ranganathan A, Muller C, Thiele L, Capkun S (2015) Thermal covert channels on multi-core platforms. In: 24th USENIX security symposium (USENIX Security 15), pp 865–880
34. Osvik DA, Shamir A, Tromer E (2006) Cache attacks and countermeasures: the case of AES. In: Cryptographers' track at the RSA conference, Springer, pp 1–20
35. Patel A, Afram F, Chen S, Ghose K (2011) MARSS: a full system simulator for multicore x86 CPUs. In: 2011 48th ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1050–1055
36. Percival C (2005) Cache missing for fun and profit. BSDCan 2005, Ottawa. http://www.daemonology.net/papers/htt.pdf
37. Hammarlund P, Martinez AJ, Bajwa AA, Hill DL, Hallnor E, Jiang H, Dixon M, Derr M, Hunsaker M, Kumar R, Osborne RB (2014) Haswell: the fourth-generation Intel Core processor. IEEE Micro 34(2):6–20
38. Qureshi MK (2018) CEASER: mitigating conflict-based cache attacks via encrypted-address and remapping. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 775–787
39. Sanchez D, Kozyrakis C (2013) ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In: ACM SIGARCH computer architecture news, vol 41, ACM, pp 475–486
40. Schwarz M, Schwarzl M, Lipp M, Gruss D (2018) NetSpectre: read arbitrary memory over network. In: European symposium on research in computer security, Springer, pp 279–299
41. Seznec A (1993) A case for two-way skewed-associative caches. ACM SIGARCH computer architecture news 21(2):169–178
42. Sharkey J, Ponomarev D, Ghose K (2005) M-sim: a flexible, multithreaded architectural simulation environment. Technical report, Department of Computer Science, State University of New York at Binghamton
43. Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power, and area model. Technical Report 2001/2, Compaq Computer Corporation, Aug 2001
44. Taylor G, Davies P, Farmwald M (1990) The TLB slice: a low-cost high-speed address translation mechanism. In: 17th annual international symposium on computer architecture, 1990, proceedings, IEEE, pp 355–363
45. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi NP (2008) CACTI 5.1. Technical Report HPL-2008-20, HP Labs
46. Trippel C, Lustig D, Martonosi M (2018) MeltdownPrime and SpectrePrime: automatically-synthesized attacks exploiting invalidation-based coherence protocols. arXiv:1802.03802
47. Wang S, Wang P, Liu X, Zhang D, Wu D (2017) CacheD: identifying cache-based timing channels in production software. In: 26th USENIX security symposium, USENIX Association
48. Wang Y, Ferraiuolo A, Zhang D, Myers AC, Suh GE (2016) SecDCP: secure dynamic cache partitioning for efficient timing channel protection. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), IEEE, pp 1–6
49. Wang Z, Lee RB (2007) New cache designs for thwarting software cache-based side channel attacks. In: ACM SIGARCH computer architecture news, vol 35, ACM, pp 494–505
50. Wang Z, Lee RB (2008) A novel cache architecture with enhanced performance and security. In: 2008 41st IEEE/ACM international symposium on microarchitecture, MICRO-41, IEEE, pp 83–93
51. Werner M, Unterluggauer T, Giner L, Schwarz M, Gruss D, Mangard S (2019) ScatterCache: thwarting cache attacks via cache set randomization. In: 28th USENIX security symposium (USENIX Security 19), USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/usenixsecurity19/presentation/werner
52. Yan M, Choi J, Skarlatos D, Morrison A, Fletcher C, Torrellas J (2018) InvisiSpec: making speculative execution invisible in the cache hierarchy. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 428–441
53. Yan M, Gopireddy B, Shull T, Torrellas J (2017) Secure hierarchy-aware cache replacement policy (SHARP): defending against cache-based side channel attacks. In: Proceedings of the 44th annual international symposium on computer architecture, ACM, pp 347–360
54. Yao F, Doroslovacki M, Venkataramani G (2018) Are coherence protocol states vulnerable to information leakage? In: 2018 IEEE international symposium on high performance computer architecture (HPCA), IEEE, pp 168–179
55. Yarom Y, Falkner K (2014) FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack. In: USENIX security symposium, pp 719–732
56. Zhang D, Askarov A, Myers AC (2012) Language-based control and mitigation of timing channels. ACM SIGPLAN Not 47(6):99–110
57. Zhang D, Wang Y, Suh GE, Myers AC (2015) A hardware design language for timing-sensitive information-flow security. In: ACM SIGARCH computer architecture news, vol 43, ACM, pp 503–516
58. Zhang S, Wright A, Bourgeat T, Arvind (2018) Composable building blocks to open up processor design. In: 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), IEEE, pp 68–81
59. Zhang T, Lee RB (2014) New models of cache architectures characterizing information leakage from cache side channels. In: Proceedings of the 30th annual computer security applications conference, ACM, pp 96–105
60. Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. University of Virginia, Dept of Computer Science, Tech Report CS-2003-05
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J Hardw Syst Secur (2019) 3:397–425